File `lectures/util_rand_resp.md`: 53 additions and 34 deletions.
@@ -30,7 +30,7 @@ The lecture tells how Ljungqvist used his framework to shed light on alternative
## Privacy measures

We consider randomized response models with only two possible answers, "yes" and "no."

The design determines probabilities
@@ -49,7 +49,7 @@ $$
$$ (eq:util-rand-one)
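
A minimal sketch of how the design probabilities map into the conditional probabilities $\text{Pr}(A|\text{yes})$ and $\text{Pr}(A|\text{no})$ through Bayes' rule.
It assumes the statistician knows the population share $\pi$ of group $A$. The function `posterior_A` and its argument names are illustrative and are not part of the lecture's code.

```python
def posterior_A(pi, p_yes_A, p_yes_notA):
    """Return (Pr(A|yes), Pr(A|no)) for a two-answer randomized response design.

    pi          -- share of the population in the sensitive group A (assumed known)
    p_yes_A     -- design probability Pr(yes | A)
    p_yes_notA  -- design probability Pr(yes | A')
    """
    # Unconditional probability of a "yes" (law of total probability)
    p_yes = pi * p_yes_A + (1 - pi) * p_yes_notA
    p_no = 1 - p_yes
    # Bayes' rule for each possible response
    post_yes = pi * p_yes_A / p_yes
    post_no = pi * (1 - p_yes_A) / p_no
    return post_yes, post_no


# Direct questioning: members of A always say "yes", non-members always say "no"
print(posterior_A(0.3, 1.0, 0.0))  # (1.0, 0.0)
# Uninformative design: the answer does not depend on membership,
# so both posteriors equal the prior 0.3 (up to rounding)
print(posterior_A(0.3, 0.5, 0.5))
```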
## Zoo of concepts

At this point we describe some concepts proposed by various researchers.
@@ -100,7 +100,7 @@ An efficient randomized response model is, therefore, any model that attains the
As a special example, Leysieffer and Warner considered "a problem in which there is no jeopardy in a no answer"; that is, $g(\text{no}|A^{'})$ can be of unlimited magnitude.

Evidently, an optimal design must have

$$
\text{Pr}(\text{yes}|A)=1
@@ -116,7 +116,7 @@ $$
{cite:t}`lanke1975choice` argued that "it is membership in Group A that people may want to hide, not membership in the complementary Group A'."

For that reason, {cite:t}`lanke1976degree` argued that an appropriate measure of protection is to minimize

This measure is just the first term in {eq}`eq:util-rand-seven-a`, i.e., the probability that an individual answers "yes" and is perceived to belong to $A$.
## Respondent's expected utility

### Truth border

Key assumptions that underlie a randomized response technique for estimating the fraction of a population that belongs to $A$ are:

- **Assumption 1**: Respondents feel discomfort from being thought of as belonging to $A$.

- **Assumption 2**: Respondents prefer to answer questions truthfully rather than to lie, so long as the cost of doing so is not too high.
  The cost is taken to be the discomfort in Assumption 1.

Let $r_i$ denote individual $i$'s response to the randomized question.

$r_i$ can only take values "yes" or "no".

For a given design of a randomized response interview and a given belief about the fraction of the population that belongs to $A$, the respondent's answer is associated with a conditional probability $\text{Pr}(A|r_i)$ that the individual belongs to $A$.

Given $r_i$ and complete privacy, the individual's utility is higher if $r_i$ represents a truthful answer rather than a lie.

In terms of a respondent's expected utility as a function of $\text{Pr}(A|r_i)$ and $r_i$
@@ -207,19 +208,19 @@ $$ (eq:util-rand-nine-a)
and

$$
U_i\left(\text{Pr}(A|r_i),\text{truth}\right)>U_i\left(\text{Pr}(A|r_i),\text{lie}\right), \text{ for } \text{Pr}(A|r_i) \in [0,1]
$$ (eq:util-rand-nine-b)

Suppose now that the correct answer for individual $i$ is "yes".

Individual $i$ would choose to answer truthfully if
@@ -235,15 +236,15 @@ so that a "yes" answer increases the odds that an individual belongs to $A$.
Constraint {eq}`eq:util-rand-ten-b` holds for sure.

Consequently, constraint {eq}`eq:util-rand-ten-a` becomes the single necessary condition for individual $i$ always to answer truthfully.

At equality, constraint {eq}`eq:util-rand-ten-a` determines conditional probabilities that make the individual indifferent between telling the truth and lying when the correct answer is "yes":

Equation {eq}`eq:util-rand-eleven` defines a "truth border".

Differentiating {eq}`eq:util-rand-eleven` with respect to the conditional probabilities shows that the truth border has a positive slope in the space of conditional probabilities:
@@ -255,17 +256,25 @@ The source of the positive relationship is:
- The individual is willing to volunteer a truthful "yes" answer so long as the utility from doing so (i.e., the left side of {eq}`eq:util-rand-eleven`) is at least as high as the utility of lying on the right side of {eq}`eq:util-rand-eleven`.

- Suppose now that $\text{Pr}(A|\text{yes})$ increases.
  This reduces the utility of telling the truth.
  To preserve indifference between a truthful answer and a lie, $\text{Pr}(A|\text{no})$ must increase to reduce the utility of lying.
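
The indifference argument can be made concrete with a hypothetical utility function. It is chosen for illustration and is not taken from the lecture: $U_i(p,\text{truth}) = -p + c$ and $U_i(p,\text{lie}) = -p$, where $p$ is the conditional probability of being thought to belong to $A$ and $c>0$ is an assumed utility gain from honesty.
Both utilities fall as $p$ rises, in line with Assumption 1, and the truth beats a lie at equal $p$, in line with Assumption 2.
A respondent whose correct answer is "yes" then tells the truth exactly when $\text{Pr}(A|\text{yes}) - \text{Pr}(A|\text{no}) \le c$. The truth border is therefore the line $\text{Pr}(A|\text{no}) = \text{Pr}(A|\text{yes}) - c$, which has slope one.

```python
C_HONESTY = 0.4  # hypothetical utility gain from telling the truth (an assumption)


def utility(p, answer, c=C_HONESTY):
    """Hypothetical utility: falls one-for-one with the perceived probability p
    of belonging to A, plus a bonus c when the answer is truthful."""
    return -p + (c if answer == "truth" else 0.0)


def answers_truthfully(p_yes, p_no, c=C_HONESTY):
    """A respondent whose correct answer is "yes" tells the truth when a truthful
    "yes" gives at least as much utility as a lying "no"."""
    return utility(p_yes, "truth", c) >= utility(p_no, "lie", c)


def truth_border(p_yes, c=C_HONESTY):
    """Pr(A|no) that leaves the respondent indifferent, given Pr(A|yes)."""
    return p_yes - c


# Raising Pr(A|yes) requires raising Pr(A|no) by the same amount to keep indifference
for p_yes in (0.6, 0.7, 0.8):
    print(p_yes, truth_border(p_yes))
print(answers_truthfully(0.7, 0.4))  # True: the gap 0.3 is at most c
print(answers_truthfully(0.9, 0.2))  # False: the gap 0.7 exceeds c
```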
### Drawing a truth border

We can deduce two things about the truth border:

- The truth border divides the space of conditional probabilities into two subsets: "truth telling" and "lying".

- Thus, sufficient privacy elicits a truthful answer, whereas insufficient privacy results in a lie.
  The truth border depends on a respondent's utility function.

- Assumptions in {eq}`eq:util-rand-nine-a` and {eq}`eq:util-rand-nine-b` are sufficient only to guarantee a positive slope of the truth border.
  The truth border can have either a concave or a convex shape.
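
Whether a border is concave or convex can be checked numerically. This sketch assumes hypothetical utilities $U_i(p,\text{truth}) = -f(p) + c$ and $U_i(p,\text{lie}) = -f(p)$ for an increasing function $f$, so indifference requires $f(\text{Pr}(A|\text{no})) = f(\text{Pr}(A|\text{yes})) - c$.
The code solves that equation by bisection and then inspects discrete second differences. With $f(p)=p^2$ the border is concave, and with $f(p)=\sqrt{p}$ it is convex.
The helper names and the value $c=0.4$ are illustrative assumptions.

```python
import math


def border_point(p_yes, f, c, tol=1e-12):
    """Pr(A|no) on the truth border for a given Pr(A|yes), found by bisection.

    Hypothetical utilities: U(p, truth) = -f(p) + c and U(p, lie) = -f(p), with f
    increasing on [0, 1]. Indifference means f(p_no) = f(p_yes) - c.
    Returns None when f(p_yes) - c < f(0): the respondent then tells the truth
    for every Pr(A|no) >= 0, so the border does not reach this Pr(A|yes).
    """
    target = f(p_yes) - c
    if target < f(0.0):
        return None
    lo, hi = 0.0, p_yes  # invariant: f(lo) <= target < f(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def second_differences(values):
    """Discrete second differences: all negative => concave, all positive => convex."""
    return [values[i - 1] - 2 * values[i] + values[i + 1] for i in range(1, len(values) - 1)]


c = 0.4                                    # hypothetical honesty bonus
grid = [0.7 + 0.05 * k for k in range(7)]  # Pr(A|yes) from 0.7 to 1.0

quadratic = [border_point(x, lambda p: p ** 2, c) for x in grid]  # p_no = sqrt(x**2 - c)
root = [border_point(x, math.sqrt, c) for x in grid]              # p_no = (sqrt(x) - c)**2

print(all(d < 0 for d in second_differences(quadratic)))  # True: concave border
print(all(d > 0 for d in second_differences(root)))       # True: convex border
```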
We can draw some truth borders with the following Python code:
@@ -378,7 +387,7 @@ From expression {eq}`eq:util-rand-thirteen`, {eq}`eq:util-rand-fourteen-a` and {
- Iso-variance curves are always upward-sloping and concave.
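
The variance formula behind these curves can be checked numerically. The sketch assumes the standard unbiased estimator $\hat\pi = (\hat\lambda - \text{Pr}(\text{yes}|A'))/(\text{Pr}(\text{yes}|A) - \text{Pr}(\text{yes}|A'))$, where $\hat\lambda$ is the share of "yes" answers among $n$ independent respondents.
Its variance is $\lambda(1-\lambda)/(n\Delta^2)$, with $\lambda = \text{Pr}(\text{yes})$ and $\Delta = \text{Pr}(\text{yes}|A) - \text{Pr}(\text{yes}|A')$.
By Bayes' rule the same variance equals $\pi^2(1-\pi)^2/[n(\text{Pr}(A|\text{yes})-\pi)(\pi-\text{Pr}(A|\text{no}))]$. Fixing it at $V$ therefore traces the curve $\text{Pr}(A|\text{no}) = \pi - \pi^2(1-\pi)^2/[nV(\text{Pr}(A|\text{yes})-\pi)]$.
The design probabilities below are hypothetical. The code checks that both expressions agree and that the curve rises and is concave.

```python
def variance_from_design(pi, p_yes_A, p_yes_notA, n):
    """Variance of (lam_hat - Pr(yes|A')) / (Pr(yes|A) - Pr(yes|A')), where lam_hat
    is the share of "yes" answers among n independent respondents."""
    lam = pi * p_yes_A + (1 - pi) * p_yes_notA  # Pr(yes)
    delta = p_yes_A - p_yes_notA
    return lam * (1 - lam) / (n * delta ** 2)


def variance_from_posteriors(pi, post_yes, post_no, n):
    """The same variance written in terms of Pr(A|yes) and Pr(A|no)."""
    return pi ** 2 * (1 - pi) ** 2 / (n * (post_yes - pi) * (pi - post_no))


def iso_variance_curve(post_yes, pi, n, V):
    """Pr(A|no) that keeps the variance at V, for a given Pr(A|yes) > pi."""
    return pi - pi ** 2 * (1 - pi) ** 2 / (n * V * (post_yes - pi))


pi, n = 0.3, 100
p_yes_A, p_yes_notA = 0.8, 0.2  # hypothetical design probabilities
lam = pi * p_yes_A + (1 - pi) * p_yes_notA
post_yes = pi * p_yes_A / lam   # Bayes' rule
post_no = pi * (1 - p_yes_A) / (1 - lam)

V = variance_from_design(pi, p_yes_A, p_yes_notA, n)
print(V, variance_from_posteriors(pi, post_yes, post_no, n))  # agree up to rounding

xs = [0.6 + 0.05 * k for k in range(8)]  # Pr(A|yes) values above pi
ys = [iso_variance_curve(x, pi, n, V) for x in xs]
print(all(b > a for a, b in zip(ys, ys[1:])))                                      # upward-sloping
print(all(ys[i - 1] - 2 * ys[i] + ys[i + 1] < 0 for i in range(1, len(ys) - 1)))  # concave
```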
### Drawing iso-variance curves
We use Python code to draw iso-variance curves.
@@ -452,7 +461,7 @@ var = Iso_Variance(π=0.3, n=100)
var.plotting_iso_variance_curve()
```
### Optimal survey

A point on an iso-variance curve can be attained with the unrelated question design.
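
Any target pair $\text{Pr}(A|\text{no}) < \pi < \text{Pr}(A|\text{yes})$ can be turned back into design parameters. This sketch assumes a common form of the unrelated question design. A randomizing device selects the sensitive question with probability $p$ and otherwise an innocuous question whose "yes" probability $q$ is known, so $\text{Pr}(\text{yes}|A) = p + (1-p)q$ and $\text{Pr}(\text{yes}|A') = (1-p)q$.
Running Bayes' rule in reverse gives $\text{Pr}(\text{yes}) = (\pi - \text{Pr}(A|\text{no}))/(\text{Pr}(A|\text{yes}) - \text{Pr}(A|\text{no}))$, and from that both design probabilities. The helper `unrelated_question_design` is hypothetical.

```python
def unrelated_question_design(post_yes, post_no, pi):
    """Hypothetical helper: return (p, q) for an unrelated question design that
    yields the target posteriors Pr(A|yes) = post_yes and Pr(A|no) = post_no.

    p -- probability that the randomizing device selects the sensitive question
    q -- known probability of a "yes" to the innocuous question
    """
    if not 0 <= post_no < pi < post_yes <= 1:
        raise ValueError("need 0 <= Pr(A|no) < pi < Pr(A|yes) <= 1")
    # Implied Pr(yes), from pi = Pr(yes) * post_yes + (1 - Pr(yes)) * post_no
    lam = (pi - post_no) / (post_yes - post_no)
    p_yes_A = lam * post_yes / pi                 # Pr(yes|A) = Pr(A|yes) Pr(yes) / Pr(A)
    p_yes_notA = lam * (1 - post_yes) / (1 - pi)  # Pr(yes|A') = Pr(A'|yes) Pr(yes) / Pr(A')
    p = p_yes_A - p_yes_notA                      # Pr(yes|A) - Pr(yes|A') = p in this design
    if 1 - p < 1e-12:
        return 1.0, 0.0                           # direct questioning; q plays no role
    q = p_yes_notA / (1 - p)                      # Pr(yes|A') = (1 - p) q
    return p, q


pi = 0.3
print(unrelated_question_design(0.24 / 0.38, 0.06 / 0.62, pi))  # approximately (0.6, 0.5)
print(unrelated_question_design(1.0, 0.0, pi))                  # (1.0, 0.0): direct question
```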
@@ -476,27 +485,37 @@ Here are some comments about the model design:
- An equilibrium of the optimal design model is a Nash equilibrium of a noncooperative game.

- Assumption {eq}`eq:util-rand-nine-b` is sufficient to guarantee the existence of an optimal model design.
  By choosing $\text{Pr}(A|\text{yes})$ and $\text{Pr}(A|\text{no})$ sufficiently close to each other, all respondents will find it optimal to answer truthfully.
  The closer these probabilities are to each other, the higher the variance of the estimator.

- If respondents experience a large enough increase in expected utility from telling the truth, then there is no need to use a randomized response model.
  The smallest possible variance of the estimate is then obtained at $\text{Pr}(A|\text{yes})=1$ and $\text{Pr}(A|\text{no})=0$; that is, when respondents answer truthfully to direct questioning.

- A more general design problem would be to minimize some weighted sum of the estimator's variance and bias.
  It would be optimal to accept some lies from the most "reluctant" respondents.
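
Here is a small worked version of the optimal survey. It rests on two assumptions: every respondent shares the hypothetical linear truth border $\text{Pr}(A|\text{no}) = \text{Pr}(A|\text{yes}) - c$, and the estimator variance is proportional to $1/[(\text{Pr}(A|\text{yes})-\pi)(\pi-\text{Pr}(A|\text{no}))]$.
Minimizing variance subject to truth telling then means maximizing $(\text{Pr}(A|\text{yes})-\pi)(\pi-\text{Pr}(A|\text{no}))$ along the border. That product peaks at $\text{Pr}(A|\text{yes}) = \pi + c/2$ unless the bounds $\text{Pr}(A|\text{no}) \ge 0$ or $\text{Pr}(A|\text{yes}) \le 1$ bind.
When $c \ge 1$, direct questioning ($\text{Pr}(A|\text{yes})=1$, $\text{Pr}(A|\text{no})=0$) is already truthful, which matches the comment above about large gains from telling the truth.

```python
def optimal_posteriors(pi, c):
    """Variance-minimizing (Pr(A|yes), Pr(A|no)) among points where every respondent
    tells the truth, assuming the hypothetical border Pr(A|no) = Pr(A|yes) - c."""
    if c >= 1:
        return 1.0, 0.0          # direct questioning already elicits the truth
    lower = max(pi, c)           # keeps Pr(A|no) = Pr(A|yes) - c >= 0
    upper = min(1.0, pi + c)     # keeps Pr(A|yes) <= 1 and Pr(A|no) <= pi
    # On the border, (x - pi)(pi - y) = (x - pi)(pi + c - x): a concave quadratic
    # peaking at pi + c/2, clipped to the feasible interval
    x = min(max(pi + c / 2, lower), upper)
    return x, x - c


def relative_variance(pi, post_yes, post_no):
    """Estimator variance up to the factor pi**2 (1 - pi)**2 / n (smaller is better)."""
    return 1.0 / ((post_yes - pi) * (pi - post_no))


pi = 0.3
for c in (0.3, 0.5, 0.8, 1.2):   # larger c: respondents accept a wider posterior gap
    x, y = optimal_posteriors(pi, c)
    print(f"c={c}: Pr(A|yes)={x:.3f}, Pr(A|no)={y:.3f}, "
          f"relative variance={relative_variance(pi, x, y):.2f}")
```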
## Criticisms of proposed privacy measures

We can use a utilitarian approach to analyze some privacy measures.

We'll enlist Python code to help us.

### Analysis of the method of Lanke (1976)
{cite:t}`lanke1976degree` recommends a privacy protection criterion that minimizes:
Following Lanke's suggestion, the statistician should find the highest possible $\text{Pr}(A|\text{yes})$ consistent with truth telling while $\text{Pr}(A|\text{no})$ is fixed at 0.

The variance is then minimized at point $X$ in {numref}`fig-lanke-analysis`.

However, as shown in {numref}`fig-lanke-analysis`, point $Z$ offers a smaller variance that still allows cooperation of the respondents, and it is achievable following our earlier discussion of the truth border:
@@ -562,7 +581,7 @@ $$
This is not an optimal choice under a utilitarian approach.
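
The comparison between a point like $X$ and a point like $Z$ can be reproduced with illustrative numbers. The sketch assumes a hypothetical linear truth border $\text{Pr}(A|\text{no}) = \text{Pr}(A|\text{yes}) - c$ with $c > \pi$, and a variance proportional to $1/[(\text{Pr}(A|\text{yes})-\pi)(\pi-\text{Pr}(A|\text{no}))]$. It is not the specification behind {numref}`fig-lanke-analysis`.
Lanke's rule fixes $\text{Pr}(A|\text{no})=0$ and moves up to the border, while the utilitarian statistician searches the whole border.

```python
def relative_variance(pi, post_yes, post_no):
    """Estimator variance up to the constant pi**2 (1 - pi)**2 / n."""
    return 1.0 / ((post_yes - pi) * (pi - post_no))


pi, c = 0.3, 0.5  # hypothetical prior share and honesty bonus (c > pi)

# Lanke-style choice: fix Pr(A|no) = 0 and take the largest truthful Pr(A|yes)
lanke = (c, 0.0)

# Utilitarian choice: grid search over points on the border Pr(A|no) = Pr(A|yes) - c
grid = [pi + k * (1 - pi) / 10_000 for k in range(1, 10_001)]
border = [(x, x - c) for x in grid if 0 <= x - c < pi]
utilitarian = min(border, key=lambda point: relative_variance(pi, *point))

print("Lanke-style point:", lanke, relative_variance(pi, *lanke))
print("utilitarian point:", utilitarian, relative_variance(pi, *utilitarian))
```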
### Analysis of the method of Chaudhuri and Mukerjee (1988)

{cite:t}`Chadhuri_Mukerjee_88` argued that, since "yes" may sometimes relate to the sensitive group $A$, a clever respondent may be inclined to falsely but safely always answer "no".
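
The cost of this behavior to the statistician can be sketched under explicit assumptions. These are a design with known $\text{Pr}(\text{yes}|A)$ and $\text{Pr}(\text{yes}|A')$; the standard estimator $(\hat\lambda - \text{Pr}(\text{yes}|A'))/(\text{Pr}(\text{yes}|A) - \text{Pr}(\text{yes}|A'))$, with $\hat\lambda$ the share of "yes" answers; and a hypothetical share $s$ of group-$A$ members who always answer "no".
The expected value of the estimator is then $\pi[(1-s)\text{Pr}(\text{yes}|A) - \text{Pr}(\text{yes}|A')]/[\text{Pr}(\text{yes}|A) - \text{Pr}(\text{yes}|A')]$, which lies below $\pi$ whenever $s > 0$.

```python
def expected_estimate(pi, p_yes_A, p_yes_notA, s):
    """Expected value of (lam_hat - Pr(yes|A')) / (Pr(yes|A) - Pr(yes|A')) when a
    share s of group-A members answer "no" no matter what (hypothetical)."""
    # Compliant members of A say "yes" with probability Pr(yes|A); evasive ones never do
    lam = pi * (1 - s) * p_yes_A + (1 - pi) * p_yes_notA
    return (lam - p_yes_notA) / (p_yes_A - p_yes_notA)


pi = 0.3  # true share of group A
for s in (0.0, 0.2, 0.5):
    # expected estimates 0.3, 0.22, 0.1: the bias is downward and grows with s
    print(s, round(expected_estimate(pi, 0.8, 0.2, s), 4))
```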
@@ -697,7 +716,7 @@ If the individuals are willing to volunteer this information, it seems that the
It ignores the fact that respondents retain the option of lying until they have seen the question to be answered.
## Concluding remarks
The justifications for a randomized response procedure are that