----- Incomplete Categorical Data Design: Non-Randomized Response Techniques for Sensitive Questions in Surveys
Non-randomized response techniques for sensitive survey questions are methodologies that aim to increase response rates to sensitive questions in survey questionnaires. Sensitive questions are common in many areas of research; questions regarding drug use, alcohol use, and sexual behaviors are only a few examples. While there are several ways to deal with embarrassing questions, such as that proposed by Barton (1958), Warner (1965) developed a method that encourages respondents to give truthful answers. The Warner model, based on a randomized response technique, has been developed and extended substantially over the last five decades. Several books discuss randomized response techniques (see, for example, Fox and Tracy (1986), Chaudhuri and Mukerjee (1988), and the monograph by Chaudhuri (2011)). However, due to several issues, including (1) lack of reproducibility; (2) lack of trust from the interviewee; (3) higher cost; and (4) a narrow range of applications, there was a need for a new methodology: the non-randomized response approach. Non-randomized procedures were introduced by Swensson (1974) and Takahasi and Sakasegawa (1977), and then investigated and extended by several other authors. However, this is the first book on the topic. The authors state that they aimed to provide a systematic introduction to non-randomized response techniques for statisticians and non-statisticians. They suggest that the book can serve as the text for a one-semester course at the advanced master's or PhD level, or as a reference book. I came to this book as a long-time missing data researcher with no previous experience with randomized or non-randomized response techniques. The first chapter gives a brief introduction to the Warner (randomized response) model.
In its simplest form, consider the situation where belonging to group A represents a sensitive topic, and consider two complementary yes/no response statements: (1) I belong to group A (with probability p of being assigned to respond to this statement); (2) I do not belong to group A (with probability 1 − p of being assigned to respond to this statement). The respondent is required to respond "yes" or "no" to either statement (1) or (2), depending on the outcome of a randomization (random number, flip of a coin, toss of a die, etc.) that is not revealed to the interviewer. While there are two statements, the interviewer does not know which of them was answered. The chapter also describes several extensions of the randomized models and the limitations of these models. Finally, the non-randomized response model is presented. For example, consider the same question as above, where an unrelated question is introduced: "Which do you prefer, 'summer' or 'winter'?" The answer to this question is known to the subject but is not revealed to the interviewer. The respondent then has to say 0 or 1 according to the following rules: (1) If you prefer summer and belong to A, say 0; (2) If you prefer summer and do not belong to A, say 1; (3) If you prefer winter and belong to A, say 1; and (4) If you prefer winter and do not belong to A, say 0. If the probability p of preferring summer is known, the non-randomized procedure is equivalent to the randomized procedure. If, however, this probability is unknown, it would have to be estimated from a different sample. In chapter 2 the authors describe the non-randomized crosswise model, which can be viewed as the non-randomized version of the Warner model. In chapter 3 the authors introduce the triangular model, while in chapter 4 they introduce the sample size calculation procedures for these two models. Chapter 5 covers the extension of the triangular model to a multi-category case.
Up to this point all of the sensitive questions had a binary answer (yes/no), but this chapter extends the approach to m > 2 categories. Chapter 6 deals with the case of two sensitive questions with binary outcomes. Chapter 7 discusses the unrelated question model and its non-randomized extension, the parallel model, while chapter 8 introduces the sample size calculation procedures for the parallel model. Chapter 9 extends the parallel model to the multi-category case, while chapter 10 introduces more variants of the parallel model. Finally, chapter 11 discusses the combination questionnaire model, where a main questionnaire is used together with a supplemental questionnaire, without any randomizing devices. Overall I found the presentation clear, if rather brief. Most chapters include a survey design section, maximum likelihood estimation (mostly using EM), Bayesian inference, and one or two examples. There are three appendices: (1) The Expectation Maximization (EM) and Data Augmentation (DA) algorithms; (2) The exact inverse Bayes formula (IBF) sampling; and (3) Some statistical distributions. Although the book was published in August 2013 and the preface states that all the R code and data sets used in the book are available at http://www.saasweb.hku.hk/staff/gltian, they were not yet available at the time of this review. While the examples throughout the book looked interesting, I would have loved to run the code myself and see how easy or complex it is to do so. Without it, I assume almost no one will take it upon themselves to re-develop the code used in the book. (Editor's note: Although the code and data were not available online at the time of the review, they were posted before publication of the review.) The strong points of this text are its timing (it is the first book on its topic) and the fact that it uses both Bayesian and frequentist procedures for solving these interesting problems.
The level of mathematical detail appears appropriate for PhD students in statistics or related disciplines. I believe the text will be useful for self-study by scientists and statisticians, and potentially useful in some applied courses or as a supplement to a more theoretical course (now that the code and data are available). There is much that it doesn't cover in any great depth, but perhaps that is too much to ask of a concise introduction.
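For readers who, like me, are new to these designs, the Warner scheme summarized earlier can be illustrated with a short simulation. What follows is only a minimal sketch in Python, not the authors' R code. The function names are hypothetical, and it uses the standard moment estimator implied by the design rather than the EM or Bayesian procedures of the book. If π is the proportion in group A and p is the probability of being assigned statement (1), then P(yes) = pπ + (1 − p)(1 − π), so π can be recovered as (λ − (1 − p)) / (2p − 1), where λ is the observed proportion of "yes" answers and p ≠ 1/2.

```python
import random


def simulate_warner(n, pi, p, seed=0):
    """Simulate n yes/no answers under the Warner design (illustrative helper).

    pi: true proportion belonging to the sensitive group A (hidden).
    p:  probability of being assigned statement (1), "I belong to group A".
    """
    rng = random.Random(seed)
    answers = []
    for _ in range(n):
        in_a = rng.random() < pi              # true membership, never observed
        gets_statement_1 = rng.random() < p   # randomization outcome, never observed
        # Statement (1): truthful "yes" iff in A.
        # Statement (2): truthful "yes" iff not in A.
        answers.append(in_a if gets_statement_1 else not in_a)
    return answers


def warner_estimate(answers, p):
    """Moment estimator of pi from the observed proportion of "yes" answers."""
    if p == 0.5:
        raise ValueError("p = 0.5 makes the answers uninformative about pi")
    lam = sum(answers) / len(answers)
    # Solve lam = p*pi + (1 - p)*(1 - pi) for pi.
    return (lam - (1 - p)) / (2 * p - 1)


if __name__ == "__main__":
    answers = simulate_warner(n=200_000, pi=0.2, p=0.7, seed=1)
    print(round(warner_estimate(answers, p=0.7), 3))
```

In small samples the estimate can fall outside [0, 1]. The same formula applies to the summer/winter scheme with "say 0" in place of "yes", under the added assumption that season preference is independent of membership in A.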