Commit d10dd4d

update one-sentence rule
1 parent 3c7ae02 commit d10dd4d

1 file changed (+53 -34 lines)

lectures/util_rand_resp.md

Lines changed: 53 additions & 34 deletions
@@ -30,7 +30,7 @@ The lecture tells how Ljungqvist used his framework to shed light on alternative
 
 ## Privacy Measures
 
-We consider randomized response models with only two possible answers, "yes" and "no."
+We consider randomized response models with only two possible answers, "yes" and "no."
 
 The design determines probabilities
 
@@ -49,7 +49,7 @@ $$
 $$ (eq:util-rand-one)
 
 
-## Zoo of Concepts
+## Zoo of concepts
 
 At this point we describe some concepts proposed by various researchers.
 
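Several concepts in this zoo, including Lanke's criterion later in this section, are stated in terms of the posterior probabilities $\text{Pr}(A|\text{yes})$ and $\text{Pr}(A|\text{no})$. The following is a minimal sketch of how a design maps into those posteriors. It assumes a Warner-style design in which a respondent answers "Are you in $A$?" with probability $p$ and "Are you in $A'$?" otherwise. The function names and the values of $\pi$ and $p$ are hypothetical illustrations, not the lecture's code.

```python
def warner_posteriors(pi: float, p: float) -> tuple[float, float]:
    """Return (Pr(A|yes), Pr(A|no)) for an assumed Warner-style design.

    With probability p the respondent answers "Are you in A?", otherwise
    "Are you in A'?", so Pr(yes|A) = p and Pr(yes|A') = 1 - p.
    """
    pr_yes_A = p
    pr_yes_notA = 1.0 - p
    pr_yes = pi * pr_yes_A + (1.0 - pi) * pr_yes_notA   # total probability of "yes"
    pr_A_yes = pi * pr_yes_A / pr_yes                     # Bayes' rule
    pr_A_no = pi * (1.0 - pr_yes_A) / (1.0 - pr_yes)      # Bayes' rule
    return pr_A_yes, pr_A_no


def lanke_measure(pr_A_yes: float, pr_A_no: float) -> float:
    """Lanke's criterion max{Pr(A|yes), Pr(A|no)}, which the design should minimize."""
    return max(pr_A_yes, pr_A_no)


pi = 0.3  # hypothetical population share of group A
for p in (0.6, 0.7, 0.8):
    yes, no = warner_posteriors(pi, p)
    print(f"p={p}: Pr(A|yes)={yes:.3f}, Pr(A|no)={no:.3f}, "
          f"Lanke={lanke_measure(yes, no):.3f}")
```

Moving $p$ away from $1/2$ makes the answers more informative. That pushes $\text{Pr}(A|\text{yes})$ up and $\text{Pr}(A|\text{no})$ down, so Lanke's measure worsens as the design becomes more efficient.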
@@ -100,7 +100,7 @@ An efficient randomized response model is, therefore, any model that attains the
 
 As a special example, Leysieffer and Warner considered "a problem in which there is no jeopardy in a no answer"; that is, $g(\text{no}|A^{'})$ can be of unlimited magnitude.
 
-Evidently, an optimal design must have
+Evidently, an optimal design must have
 
 $$
 \text{Pr}(\text{yes}|A)=1
@@ -116,7 +116,7 @@ $$
 
 {cite:t}`lanke1975choice` argued that "it is membership in Group A that people may want to hide, not membership in the complementary Group A'."
 
-For that reason, {cite:t}`lanke1976degree` argued that an appropriate measure of protection is to minimize
+For that reason, {cite:t}`lanke1976degree` argued that an appropriate measure of protection is to minimize
 
 $$
 \max \left\{ \text{Pr}(A|\text{yes}) , \text{Pr}(A|\text{no}) \right\}
@@ -167,24 +167,25 @@ $$ (eq:util-rand-eight-b)
 
 This measure is just the first term in {eq}`eq:util-rand-seven-a`, i.e., the probability that an individual answers "yes" and is perceived to belong to $A$.
 
-## Respondent's Expected Utility
+## Respondent's expected utility
 
-### Truth Border
+### Truth border
 
-Key assumptions that underlie a randomized response technique for estimating the fraction of a population that belongs to $A$ are:
+Key assumptions that underlie a randomized response technique for estimating the fraction of a population that belongs to $A$ are:
 
 - **Assumption 1**: Respondents feel discomfort from being thought of as belonging to $A$.
 
-- **Assumption 2**: Respondents prefer to answer questions truthfully than to lie, so long as the cost of doing so is not too high. The cost is taken to be the discomfort in 1.
+- **Assumption 2**: Respondents prefer to answer questions truthfully than to lie, so long as the cost of doing so is not too high.
+
+- The cost is taken to be the discomfort in 1.
 
 Let $r_i$ denote individual $i$'s response to the randomized question.
 
 $r_i$ can only take values "yes" or "no".
 
-For a given design of a randomized response interview and a given belief about the fraction of the population
-that belongs to $A$, the respondent's answer is associated with a conditional probability $ \text{Pr}(A|r_i)$ that the individual belongs to $A$.
+For a given design of a randomized response interview and a given belief about the fraction of the population that belongs to $A$, the respondent's answer is associated with a conditional probability $\text{Pr}(A|r_i)$ that the individual belongs to $A$.
 
-Given $r_i$ and complete privacy, the individual's utility is higher if $r_i$ represents a truthful answer rather than a lie.
+Given $r_i$ and complete privacy, the individual's utility is higher if $r_i$ represents a truthful answer rather than a lie.
 
 In terms of a respondent's expected utility as a function of $ \text{Pr}(A|r_i)$ and $r_i$
 
@@ -207,19 +208,19 @@ $$ (eq:util-rand-nine-a)
 and
 
 $$
-U_i\left(\text{Pr}(A|r_i),\text{truth}\right)>U_i\left(\text{Pr}(A|r_i),\text{lie}\right) , \text{ for } \text{Pr}(A|r_i) \in [0,1]
+U_i\left(\text{Pr}(A|r_i),\text{truth}\right)>U_i\left(\text{Pr}(A|r_i),\text{lie}\right), \text{ for } \text{Pr}(A|r_i) \in [0,1]
 $$ (eq:util-rand-nine-b)
 
-Suppose now that correct answer for individual $i$ is "yes".
+Suppose now that correct answer for individual $i$ is "yes".
 
-Individual $i$ would choose to answer truthfully if
+Individual $i$ would choose to answer truthfully if
 
 $$
 U_i\left(\text{Pr}(A|\text{yes}),\text{truth}\right)\geq U_i\left(\text{Pr}(A|\text{no}),\text{lie}\right)
 $$ (eq:util-rand-ten-a)
 
 
-If the correct answer is "no", individual $i$ would volunteer the correct answer only if
+If the correct answer is "no", individual $i$ would volunteer the correct answer only if
 
 $$
 U_i\left(\text{Pr}(A|\text{no}),\text{truth}\right)\geq U_i\left(\text{Pr}(A|\text{yes}),\text{lie}\right)
@@ -235,15 +236,15 @@ so that a "yes" answer increases the odds that an individual belongs to $A$.
 
 Constraint {eq}`eq:util-rand-ten-b` holds for sure.
 
-Consequently, constraint {eq}`eq:util-rand-ten-a` becomes the single necessary condition for individual $i$ always to answer truthfully.
+Consequently, constraint {eq}`eq:util-rand-ten-a` becomes the single necessary condition for individual $i$ always to answer truthfully.
 
-At equality, constraint {eq}`eq:util-rand-ten-a` determines conditional probabilities that make the individual indifferent between telling the truth and lying when the correct answer is "yes":
+At equality, constraint {eq}`eq:util-rand-ten-a` determines conditional probabilities that make the individual indifferent between telling the truth and lying when the correct answer is "yes":
 
 $$
 U_i\left(\text{Pr}(A|\text{yes}),\text{truth}\right)= U_i\left(\text{Pr}(A|\text{no}),\text{lie}\right)
 $$ (eq:util-rand-eleven)
 
-Equation {eq}`eq:util-rand-eleven` defines a "truth border".
+Equation {eq}`eq:util-rand-eleven` defines a "truth border".
 
 Differentiating {eq}`eq:util-rand-eleven` with respect to the conditional probabilities shows that the truth border has a positive slope in the space of conditional probabilities:
 
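The differentiation step can be written out with the implicit function theorem. Write $x = \text{Pr}(A|\text{yes})$ and $y = \text{Pr}(A|\text{no})$, and let $U_{i,1}$ denote the partial derivative of $U_i$ with respect to its first argument. Assume, as {eq}`eq:util-rand-nine-a` is taken to say here, that $U_{i,1} < 0$:

$$
F(x,y) \equiv U_i\left(x,\text{truth}\right) - U_i\left(y,\text{lie}\right) = 0
$$

$$
\frac{dy}{dx}
= -\frac{\partial F/\partial x}{\partial F/\partial y}
= -\frac{U_{i,1}\left(x,\text{truth}\right)}{-U_{i,1}\left(y,\text{lie}\right)}
= \frac{U_{i,1}\left(x,\text{truth}\right)}{U_{i,1}\left(y,\text{lie}\right)} > 0
$$

Both partial derivatives are negative, so their ratio is positive. The assumption $U_{i,1} < 0$ also keeps the denominator nonzero, as the implicit function theorem requires.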
@@ -255,17 +256,25 @@ The source of the positive relationship is:
 
 - The individual is willing to volunteer a truthful "yes" answer so long as the utility from doing so (i.e., the left side of {eq}`eq:util-rand-eleven`) is at least as high as the utility of lying on the right side of {eq}`eq:util-rand-eleven`.
 
-- Suppose now that $\text{Pr}(A|\text{yes})$ increases. That reduces the utility of telling the truth. To preserve indifference between a truthful answer and a lie, $\text{Pr}(A|\text{no})$ must increase to reduce the utility of lying.
+- Suppose now that $\text{Pr}(A|\text{yes})$ increases.
+
+- This reduces the utility of telling the truth.
+
+- To preserve indifference between a truthful answer and a lie, $\text{Pr}(A|\text{no})$ must increase to reduce the utility of lying.
 
-### Drawing a Truth Border
+### Drawing a truth border
 
 We can deduce two things about the truth border:
 
 - The truth border divides the space of conditional probabilities into two subsets: "truth telling" and "lying".
 
-- Thus, sufficient privacy elicits a truthful answer, whereas insufficient privacy results in a lie. The truth border depends on a respondent's utility function.
+- Thus, sufficient privacy elicits a truthful answer, whereas insufficient privacy results in a lie.
+
+- The truth border depends on a respondent's utility function.
+
+- Assumptions in {eq}`eq:util-rand-nine-a` and {eq}`eq:util-rand-nine-a` are sufficient only to guarantee a positive slope of the truth border.
 
-- Assumptions in {eq}`eq:util-rand-nine-a` and {eq}`eq:util-rand-nine-a` are sufficient only to guarantee a positive slope of the truth border. The truth border can have either a concave or a convex shape.
+The truth border can have either a concave or a convex shape.
 
 We can draw some truth borders with the following Python code:
 
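The truth-border bullets in this hunk say the border slopes upward but may be concave or convex. A short numeric sketch, separate from the lecture's own plotting code, illustrates both shapes under a hypothetical utility:

$$U_i(\phi,\text{truth}) = \delta - \phi^k, \qquad U_i(\phi,\text{lie}) = -\phi^k$$

Here $\phi = \text{Pr}(A|r_i)$. This utility is decreasing in $\phi$ and favors truth at equal $\phi$. The parameters $\delta$ and $k$ and the helpers `truth_border` and `tells_truth` are assumptions chosen for illustration.

```python
import numpy as np


def truth_border(x, delta, k):
    """Pr(A|no) on the truth border for Pr(A|yes) = x.

    Hypothetical utility (an assumption, not the lecture's):
        U(phi, truth) = delta - phi**k,   U(phi, lie) = -phi**k,
    which is decreasing in phi and higher for truth at equal phi.
    Indifference, delta - x**k = -y**k, gives y = (x**k - delta)**(1/k).
    """
    x = np.asarray(x, dtype=float)
    return (x**k - delta) ** (1.0 / k)


def tells_truth(x, y, delta, k):
    """True where a respondent whose correct answer is "yes" answers truthfully."""
    return delta - np.asarray(x) ** k >= -np.asarray(y) ** k


delta = 0.1
x = np.linspace(0.5, 1.0, 51)  # Pr(A|yes) grid; x**k >= delta for every k below

for k in (0.5, 1.0, 2.0):
    y = truth_border(x, delta, k)
    upward = np.all(np.diff(y) > 0)   # positive slope along the grid
    second = np.diff(y, n=2)          # discrete second differences
    if np.all(second > 1e-12):
        shape = "convex"
    elif np.all(second < -1e-12):
        shape = "concave"
    else:
        shape = "linear"
    print(f"k={k}: upward-sloping={upward}, shape={shape}")
```

With this utility, $k<1$ gives a convex border, $k=1$ a straight line and $k>1$ a concave one. Pairs $(x, y)$ on or above the border elicit a truthful "yes".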
@@ -341,9 +350,9 @@ plt.legend(loc=0, fontsize='large')
 plt.show()
 ```
 
-## Utilitarian View of Survey Design
+## Utilitarian view of survey design
 
-### Iso-variance Curves
+### Iso-variance curves
 
 A statistician's objective is
 
@@ -378,7 +387,7 @@ From expression {eq}`eq:util-rand-thirteen`, {eq}`eq:util-rand-fourteen-a` and {
 
 - Iso-variance curves are always upward-sloping and concave.
 
-### Drawing Iso-variance Curves
+### Drawing iso-variance curves
 
 We use Python code to draw iso-variance curves.
 
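The two iso-variance properties listed in this hunk can be checked numerically, independently of the lecture's `Iso_Variance` plotting class. The sketch assumes that the estimator's variance takes the form

$$\frac{\pi^2(1-\pi)^2}{n\,(\text{Pr}(A|\text{yes})-\pi)(\pi-\text{Pr}(A|\text{no}))}$$

This is what $\lambda(1-\lambda)/[n(\text{Pr}(\text{yes}|A)-\text{Pr}(\text{yes}|A'))^2]$ becomes after applying Bayes' rule, with $\lambda$ the probability of a "yes". The helper names and the level `V` are hypothetical. The values $\pi = 0.3$ and $n = 100$ mirror the `Iso_Variance(π=0.3, n=100)` call in this commit.

```python
import numpy as np


def estimator_variance(x, y, pi, n):
    """Assumed variance of the estimator of pi, for y < pi < x,
    where x = Pr(A|yes) and y = Pr(A|no)."""
    return pi**2 * (1 - pi) ** 2 / (n * (x - pi) * (pi - y))


def iso_variance_curve(x, pi, n, V):
    """Pr(A|no) along the iso-variance curve at level V, for x > pi.

    Solves estimator_variance(x, y, pi, n) = V for y.
    """
    c = pi**2 * (1 - pi) ** 2 / (n * V)
    return pi - c / (x - pi)


pi, n, V = 0.3, 100, 0.005         # V is a hypothetical variance level
x = np.linspace(0.6, 1.0, 41)      # Pr(A|yes) values for which 0 <= y < pi on this curve
y = iso_variance_curve(x, pi, n, V)

print(np.allclose(estimator_variance(x, y, pi, n), V))  # every point has variance V
print(np.all(np.diff(y) > 0))                            # upward-sloping
print(np.all(np.diff(y, n=2) < 0))                       # concave
print(estimator_variance(1.0, 0.0, pi, n), pi * (1 - pi) / n)
```

At $\text{Pr}(A|\text{yes})=1$ and $\text{Pr}(A|\text{no})=0$, the assumed formula reduces to $\pi(1-\pi)/n$. That is the variance under truthful direct questioning, which makes it a useful sanity check.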
@@ -452,7 +461,7 @@ var = Iso_Variance(π=0.3, n=100)
 var.plotting_iso_variance_curve()
 ```
 
-### Optimal Survey
+### Optimal survey
 
 A point on an iso-variance curves can be attained with the unrelated question design.
 
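Attaining a given point can be made concrete by solving for the design parameters. The sketch assumes the common form of the unrelated question design: with probability $p$ the respondent answers "Are you in $A$?", and otherwise answers an innocuous question whose "yes" probability $q$ is known. Under that design, $\text{Pr}(\text{yes}|A) = p + (1-p)q$ and $\text{Pr}(\text{yes}|A') = (1-p)q$. The helpers `design_for_point` and `posteriors` are hypothetical names.

```python
def design_for_point(pi, x, y):
    """Return (p, q) for an assumed unrelated-question design that yields
    Pr(A|yes) = x and Pr(A|no) = y when the share of group A is pi.

    Assumed design: with probability p answer "Are you in A?", otherwise
    answer an unrelated question whose "yes" probability q is known.
    """
    if not (0 <= y < pi < x <= 1):
        raise ValueError("need 0 <= y < pi < x <= 1")
    lam = (pi - y) / (x - y)                 # Pr(yes), from pi = lam*x + (1-lam)*y
    pr_yes_A = x * lam / pi                  # Bayes' rule solved for Pr(yes|A)
    pr_yes_notA = (1 - x) * lam / (1 - pi)   # Bayes' rule solved for Pr(yes|A')
    p = pr_yes_A - pr_yes_notA               # Pr(yes|A) - Pr(yes|A') = p
    q = pr_yes_notA / (1 - p) if p < 1 else 0.0  # Pr(yes|A') = (1-p)q; q irrelevant if p = 1
    return p, q


def posteriors(pi, p, q):
    """Pr(A|yes), Pr(A|no) implied by the assumed unrelated-question design."""
    pr_yes_A = p + (1 - p) * q
    pr_yes_notA = (1 - p) * q
    lam = pi * pr_yes_A + (1 - pi) * pr_yes_notA
    return pi * pr_yes_A / lam, pi * (1 - pr_yes_A) / (1 - lam)


p, q = design_for_point(pi=0.3, x=0.6, y=0.15)
print(f"p={p:.4f}, q={q:.4f}")
print(posteriors(0.3, p, q))  # recovers (0.6, 0.15) up to rounding
```

Under these assumptions, any point with $\text{Pr}(A|\text{no}) < \pi < \text{Pr}(A|\text{yes})$ maps to some $p \in (0, 1]$ and $q \in [0, 1]$.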
@@ -476,27 +485,37 @@ Here are some comments about the model design:
 
 - An equilibrium of the optimal design model is a Nash equilibrium of a noncooperative game.
 
-- Assumption {eq}`eq:util-rand-nine-b` is sufficient to guarantee existence of an optimal model design. By choosing $\text{ Pr}(A|\text{yes})$ and $\text{ Pr}(A|\text{no})$ sufficiently close to each other, all respondents will find it optimal to answer truthfully. The closer are these probabilities, the higher the variance of the estimator becomes.
+- Assumption {eq}`eq:util-rand-nine-b` is sufficient to guarantee existence of an optimal model design.
 
-- If respondents experience a large enough increase in expected utility from telling the truth, then there is no need to use a randomized response model. The smallest possible variance of the estimate is then obtained at $\text{ Pr}(A|\text{yes})=1$ and $\text{ Pr}(A|\text{no})=0$ ; that is, when respondents answer truthfully to direct questioning.
+- By choosing $\text{ Pr}(A|\text{yes})$ and $\text{ Pr}(A|\text{no})$ sufficiently close to each other, all respondents will find it optimal to answer truthfully.
 
-- A more general design problem would be to minimize some weighted sum of the estimator's variance and bias. It would be optimal to accept some lies from the most "reluctant" respondents.
+- The closer are these probabilities, the higher the variance of the estimator becomes.
 
-## Criticisms of Proposed Privacy Measures
+- If respondents experience a large enough increase in expected utility from telling the truth, then there is no need to use a randomized response model.
+
+The smallest possible variance of the estimate is then obtained at $\text{ Pr}(A|\text{yes})=1$ and $\text{ Pr}(A|\text{no})=0$; that is, when respondents answer truthfully to direct questioning.
+
+- A more general design problem would be to minimize some weighted sum of the estimator's variance and bias.
+
+- It would be optimal to accept some lies from the most "reluctant" respondents.
+
+## Criticisms of proposed privacy measures
 
 We can use a utilitarian approach to analyze some privacy measures.
 
 We'll enlist Python Code to help us.
 
-### Analysis of Method of Lanke (1976)
+### Analysis of method of Lanke (1976)
 
 {cite:t}`lanke1976degree` recommends a privacy protection criterion that minimizes:
 
 $$
 \max \left\{ \text{Pr}(A|\text{yes}) , \text{Pr}(A|\text{no}) \right\}
 $$ (eq:util-rand-five-b)
 
-Following Lanke's suggestion, the statistician should find the highest possible $\text{ Pr}(A|\text{yes})$ consistent with truth telling while $\text{ Pr}(A|\text{no})$ is fixed at 0. The variance is then minimized at point $X$ in {numref}`fig-lanke-analysis`.
+Following Lanke's suggestion, the statistician should find the highest possible $\text{ Pr}(A|\text{yes})$ consistent with truth telling while $\text{ Pr}(A|\text{no})$ is fixed at 0.
+
+The variance is then minimized at point $X$ in {numref}`fig-lanke-analysis`.
 
 However, as shown in {numref}`fig-lanke-analysis`, point $Z$ offers a smaller variance that still allows cooperation of the respondents, and it is achievable following our earlier discussion of the truth border:
 
@@ -562,7 +581,7 @@ $$
 
 This is not an optimal choice under a utilitarian approach.
 
-### Analysis on the Method of Chadhuri and Mukerjee (1988)
+### Analysis on the method of Chadhuri and Mukerjee (1988)
 
 {cite}`Chadhuri_Mukerjee_88` argued that the individual may find that since "yes" may sometimes relate to the sensitive group A, a clever respondent may falsely but safely always be inclined to respond "no".
 
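Bayes' rule explains why the clever respondent in this objection always says "no". Whenever $\text{Pr}(\text{yes}|A) > \text{Pr}(\text{yes}|A')$, a "no" always carries a lower posterior $\text{Pr}(A|r_i)$ than a "yes". The following is a minimal sketch. It assumes a hypothetical respondent who cares only about that posterior and gets no utility from telling the truth. The helper `safer_answer` and the grid of designs are illustrative.

```python
def safer_answer(pi, pr_yes_A, pr_yes_notA):
    """Answer with the lower posterior Pr(A|r) for a hypothetical
    privacy-only respondent (no utility from truth telling)."""
    lam = pi * pr_yes_A + (1 - pi) * pr_yes_notA   # Pr(yes)
    pr_A_yes = pi * pr_yes_A / lam
    pr_A_no = pi * (1 - pr_yes_A) / (1 - lam)
    return "no" if pr_A_no < pr_A_yes else "yes"


grid = [i / 10 for i in range(1, 10)]
# Every design in which "yes" is more likely from group A than from A'
answers = {safer_answer(0.3, a, b) for a in grid for b in grid if a > b}
print(answers)  # {'no'}
```

The result follows from $\text{Pr}(A|\text{yes}) - \pi = \pi(1-\pi)\left[\text{Pr}(\text{yes}|A)-\text{Pr}(\text{yes}|A')\right]/\text{Pr}(\text{yes})$. This difference is positive whenever the design makes "yes" more likely from group $A$, and $\text{Pr}(A|\text{no})$ then falls below $\pi$.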
@@ -697,7 +716,7 @@ If the individuals are willing to volunteer this information, it seems that the
 
 It ignores the fact that respondents retain the option of lying until they have seen the question to be answered.
 
-## Concluding Remarks
+## Concluding remarks
 
 
 The justifications for a randomized response procedure are that
