
Commit 2c9421c

committed
Update section on using multiple files
1 parent d544c79 commit 2c9421c


8 files changed

+287
-32
lines changed

Lines changed: 63 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,63 @@
1+
# Using Multiple Files
2+
3+
When analyzing DHS data, you will sometimes need to use more than one file.
4+
For instance, some information about the household (such as water and sanitation) can only be found in the HR or PR file.
5+
However, you may need this information for an analysis of women, in which case you would need to merge the HR or PR file to the IR file.
6+
7+
For a more detailed explanation of merging, see the DHS Program website page [Merging DHS data](https://www.dhsprogram.com/data/Merging-Datasets.cfm).
8+
9+
There are several types of merges that can be performed. The code for the most common types is available here.
10+
When possible, the DHS model datasets are used.
11+
12+
You may also need to use multiple files when performing a multi-country analysis or trend analysis.
13+
This involves appending or pooling two or more surveys of the same file type.
14+
Appending can also be used to combine data from men (MR) and women (IR), as shown in the code for merging AR to IR or MR files.
15+
16+
### mergePR_toKRorIR
17+
The code will show how to merge a PR to a KR file, and a PR to an IR file.
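
A minimal sketch of the PR to IR case, not the repository's own code. It assumes the model datasets `ZZPR62FL.DTA` and `ZZIR62FL.DTA` are in the working directory; the temporary file name `PRtemp.dta` and the choice of `hv105` and `hv106` as the variables to bring over are illustrative only. The merge is 1:1 because each interviewed woman corresponds to one household member line in the PR file.

```stata
* open the PR file and rename the identifiers to match the IR file
use "ZZPR62FL.DTA", clear
rename hv001 v001      // cluster number
rename hv002 v002      // household number
rename hvidx v003      // line number of the household member

* keep only the identifiers and the variables you need (illustrative choice)
keep v001 v002 v003 hv105 hv106
save "PRtemp.dta", replace

* open the IR file and merge one woman to one household member
use "ZZIR62FL.DTA", clear
merge 1:1 v001 v002 v003 using "PRtemp.dta"
tab _merge

* _merge==2 are household members who are not in the IR file
keep if _merge==3
drop _merge
```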
18+
19+
### mergeHR_toIR
20+
The purpose of this code is to show an example of a many to one merge by merging the HR file to the IR file.
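
As a rough sketch of the many-to-one logic, assuming the model datasets `ZZHR62FL.DTA` and `ZZIR62FL.DTA` are in the working directory (the HR file name, the temporary file name `HRtemp.dta`, and the selection of the water and sanitation variables `hv201` and `hv205` are assumptions made for illustration):

```stata
* open the HR file and rename the identifiers to match the IR file
use "ZZHR62FL.DTA", clear
rename hv001 v001      // cluster number
rename hv002 v002      // household number

* keep the household variables you need (illustrative choice)
keep v001 v002 hv201 hv205
save "HRtemp.dta", replace

* the IR file is the master; many women can belong to one household
use "ZZIR62FL.DTA", clear
merge m:1 v001 v002 using "HRtemp.dta"
tab _merge

* _merge==2 are households with no interviewed women
keep if _merge==3
drop _merge
```

The IR file is kept as the master so that the unit of analysis remains the woman, with household information repeated for every woman in the same household.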
21+
22+
### mergeAR_toIRorMR
23+
This shows how to merge the HIV test results from the AR file to the IR file or the MR file.
24+
The code also shows how to append the merged IR and MR files to a dataset that contains both men and women.
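
The two steps might look roughly like the sketch below, which is not the repository's own code. It assumes an AR model dataset named `ZZAR61FL.DTA` alongside `ZZIR62FL.DTA` and `ZZMR61FL.DTA`; the temporary file names and the `sex` indicator are hypothetical. The AR identifiers `hivclust`, `hivnumb`, and `hivline` are renamed to match the cluster, household, and line number variables of the IR and MR files.

```stata
* prepare the AR file with identifiers named as in the IR file
use "ZZAR61FL.DTA", clear
rename hivclust v001
rename hivnumb  v002
rename hivline  v003
save "ARtemp.dta", replace

* women: merge HIV test results to the IR file
use "ZZIR62FL.DTA", clear
merge 1:1 v001 v002 v003 using "ARtemp.dta"
keep if _merge==3
drop _merge
gen sex = 2            // hypothetical indicator: 2 = women
save "IRARtemp.dta", replace

* men: rename mv* to v* so variable names match the IR file, then merge
use "ZZMR61FL.DTA", clear
rename mv* v*
merge 1:1 v001 v002 v003 using "ARtemp.dta"
keep if _merge==3
drop _merge
gen sex = 1            // hypothetical indicator: 1 = men

* append the women's file to obtain one dataset with both men and women
append using "IRARtemp.dta"
tab sex
```

Renaming `mv*` to `v*` before appending is what lets the men's and women's variables line up in the same columns; variables that exist in only one of the two files are left missing for the other group.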
25+
26+
### mergeHW
27+
Older datasets may not contain the anthropometry variables calculated according to the WHO reference.
28+
HW files are created with the WHO reference nutrition indicators that can be merged to datasets.
29+
Here we show an example using the Bangladesh 1999-2000 survey since there is no HW file for model datasets.
30+
31+
### mergeWI
32+
Older datasets sometimes do not contain the wealth index.
33+
To include the wealth index, a WI file was created for these surveys.
34+
The Bangladesh 1999-2000 survey is used as an example since there is no WI file for model datasets.
35+
The code will show how to merge the WI with an IR file and notes on how to merge with MR or PR/HR files.
36+
37+
### mergeGPS
38+
This shows how to merge an IR file with the GC (points) file in Stata.
39+
The Nigeria 2015 MIS data is used for this example, but the code should work for any DHS survey.
40+
41+
### mergeHHhead_toIR
42+
Construct a file that attaches the characteristics of the household head to each eligible woman in the household.
43+
44+
### mergeChild01_17_parents
45+
Merges children 0-17 in the PR file with all the data about their mothers and fathers and all the data in the BR file.
46+
This will produce three different data files. See description in the file for more detail.
47+
The execution begins at the bottom of the do file where you need to specify your paths.
48+
49+
### mergeIR_toSR
50+
This is an example of how to produce an IRSR file, using separate IR and SR files.
51+
The Cote d'Ivoire 2022 survey is used since the model datasets do not have an SR file.
52+
53+
### merge_loop
54+
This code shows how to perform a merge in a loop in the case you need to perform merges for several countries.
55+
56+
### pool_trends
57+
This shows how to append two surveys from one country for a trend analysis.
58+
The Nigeria 2013 and 2018 DHS surveys are used as the one-country example. Code is also provided to show how to append surveys for several countries in a loop.
59+
60+
61+
62+
##### author: Shireen Assaf
63+
##### last updated: July 26, 2023 by Shireen Assaf

Intro_DHSdata_Analysis/4_Using_Multiple_Files/merge_children_mothers_fathers_do_14Feb2023.txt renamed to Intro_DHSdata_Analysis/4_Using_Multiple_Files/mergeChild01_17_parents.do

Lines changed: 28 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,12 @@
1-
* do e:\DHS\programs\merging\merge_children_mothers_fathers_do_14Feb2023.txt
2-
3-
/*
4-
Program to merge children 0-17 in the PR file with ALL the data about their
1+
/*****************************************************************************************************
2+
Program: mergeChild01_17_parents.do
3+
Purpose: Program to merge children 0-17 in the PR file with ALL the data about their
54
mothers and fathers and ALL the data in the BR file
5+
Author: Tom Pullum
6+
Date last modified: Feb 14, 2023 by Tom Pullum
7+
*****************************************************************************************************/
68

9+
/* Description:
710
Here are the category labels for "Mother's status", with more explanation:
811
0 "Dead" hv111==0
912
1 "Not in HH" (hv111>0 & hv112==0) | hv112==.
@@ -272,18 +275,18 @@ end
272275
******************************************************************************
273276
* Execution begins here
274277

275-
* Specify workspace
276-
cd e:\DHS\DHS_data\scratch
278+
* Specify your workspace, change to your own path
279+
*cd "C:\.."
277280

278-
* Specify path to the data
279-
scalar spath="C:\Users\26216\ICF\Analysis - Shared Resources\Data\DHSdata"
281+
* Specify path to the data, change to your own
282+
scalar spath="C:\Data"
280283

281284
* specify the first six characters of the PR, IR, MR, and BR file names.
282285
* Note that characters 5-6 may not be the same in all four files.
283-
scalar sfn_PR="RWPR61"
284-
scalar sfn_IR="RWIR61"
285-
scalar sfn_MR="RWMR61"
286-
scalar sfn_BR="RWBR61"
286+
scalar sfn_PR="ZZPR62"
287+
scalar sfn_IR="ZZIR62"
288+
scalar sfn_MR="ZZMR61"
289+
scalar sfn_BR="ZZBR62"
287290

288291
make_workfiles
289292
prepare_workfiles_for_merges
@@ -292,10 +295,17 @@ merge_with_IR
292295
merge_with_MR
293296
merge_with_BR
294297

295-
296-
297-
298-
299-
300-
298+
*erase all temporary files
299+
erase BR.dta
300+
erase BRtemp.dta
301+
erase IR.dta
302+
erase IRtemp.dta
303+
erase MR.dta
304+
erase MRtemp.dta
305+
erase PR.dta
306+
erase PRtemp_ch.dta
307+
erase PRtemp_ch_mo.dta
308+
erase PRtemp_ch_mo_fa.dta
309+
erase PRtemp_fa.dta
310+
erase PRtemp_mo.dta
301311

Intro_DHSdata_Analysis/4_Using_Multiple_Files/GPS_Merge.do renamed to Intro_DHSdata_Analysis/4_Using_Multiple_Files/mergeGPS.do

Lines changed: 6 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -3,22 +3,15 @@
33
* Created by: Tom Fish <gpsrequests@dhsprogram.com>
44
* Created: 30 October 2018
55
* Updated: 06 April 2021
6-
* Purpose: Explan how to merge an IR file with the GC (points) file in STATA
7-
* Notes: The Nigeria data is used for this example, but the code should work for any DHS survey
6+
* Purpose: Explain how to merge an IR file with the GC (points) file in STATA
7+
* Notes: The Nigeria 2015 MIS data is used for this example, but the code should work for any DHS survey
88
******************************************************************************************/
99
clear all
1010
set more off
1111

12-
* Set folders that I am working on
13-
local ptsDir "C:\Data\DHSdata\GPS\"
14-
local dataDir "C:\Data\DHSdata\Recode\"
15-
local working "C:\Working\"
16-
17-
* Make sure we are working in our working directory
18-
cd "`working'"
1912

2013
* Convert the shapefile into a dta file to merge in STATA
21-
shp2dta using "`ptsDir'NGGE71FL.shp", database(ngpts) coordinates(ngcoord) genid(id)
14+
shp2dta using "NGGE71FL.shp", database(ngpts) coordinates(ngcoord) genid(id)
2215

2316
* Open up the table portion of the shapefile
2417
use ngpts
@@ -31,9 +24,9 @@ sort v001
3124
save ngpts, replace
3225

3326
* Open the IR file
34-
use "`dataDir'NGIR71FL.DTA", clear
27+
use "NGIR71FL.DTA", clear
3528

36-
* Do a 1 to Many merge/join by joining the GPS points (ngpts.dta) to the IR file
29+
* Do a 1 to many merge/join by joining the GPS points (ngpts.dta) to the IR file
3730
sort v001
3831
merge v001 using ngpts.dta
3932

@@ -42,4 +35,4 @@ tab _merge
4235
drop _merge id
4336

4437
* Save the merged file
45-
save NG_Merged, replace
38+
save NG_merged, replace
Lines changed: 38 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,38 @@
1+
/*****************************************************************************************************
2+
Program: mergeHHhead_toIR.do
3+
Purpose: Construct a file that attaches the characteristics of the household head to each eligible woman in the household.
4+
Author: Tom Pullum
5+
Date last modified: June 16, 2023 by Tom Pullum
6+
*****************************************************************************************************/
7+
8+
use "ZZPR62FL.DTA", clear
9+
10+
* reduce to just household heads
11+
keep if hv101==1
12+
13+
* select variables you need, not necessarily as illustrated here
14+
keep hvidx hv0* hv1*
15+
16+
* rename to avoid possible confusion
17+
rename hv* head_hv*
18+
19+
* specify variables for the merge
20+
rename head_hv001 cluster
21+
rename head_hv002 hh
22+
sort cluster hh
23+
save temp.dta, replace
24+
25+
* open IR file
26+
use "ZZIR62FL.DTA", clear
27+
rename v001 cluster
28+
rename v002 hh
29+
sort cluster hh
30+
31+
* merge is m:1 because there may be more than 1 eligible woman per household
32+
merge m:1 cluster hh using temp.dta
33+
tab _merge
34+
35+
* _merge=2 for households that do not include any eligible women
36+
drop if _merge==2
37+
38+
drop _merge
Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,39 @@
1+
/*****************************************************************************************************
2+
Program: mergeIR_toSR.do
3+
Purpose: This is an example of how to produce an IRSR file, using separate IR and SR files.
4+
Author: Tom Pullum
5+
Date last modified: May 26, 2023 by Tom Pullum
6+
*****************************************************************************************************/
7+
8+
* This is an example of how to produce an IRSR file, using separate IR and SR files.
9+
10+
* This is useful for calculating adult and maternal mortality rates with a program
11+
* designed for surveys in which the sibling histories are in the IR file.
12+
13+
* Other modifications may be needed.
14+
* Illustrated with preliminary IR and SR files from Cote d'Ivoire 2022
15+
* The new file is saved with "IRSR" in the filename
16+
17+
* path to your working directory where your data files should also be stored.
18+
cd "C:\.."
19+
20+
use CIIR80FL.dta, clear
21+
sort v001 v002 v003
22+
save temp.dta, replace
23+
24+
use CISR80FL.dta, clear
25+
26+
keep v001 v002 v003 mm*
27+
rename mm* mm*_
28+
rename mmidx_ index
29+
quietly summarize index
30+
local lmax_index=r(max)
31+
reshape wide mm*_ , i(v001 v002 v003) j(index)
32+
merge v001 v002 v003 using temp.dta
33+
34+
tab _merge
35+
drop _merge
36+
* _merge=2 for women who did not report any siblings; they must be kept in the data.
37+
* mmidx_* is still in the data.
38+
39+
save CIIRSR80FL.dta, replace

Intro_DHSdata_Analysis/4_Using_Multiple_Files/mergeWI.do

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ Purpose: example shown to merge WR to IR file. Notes given on how to merge wi
44
Author: Shireen Assaf
55
Date last modified: May 27, 2021 by Shireen Assaf
66
7-
Notes: Older datasets can sometimes not contain the wealth index. To include the wealth index a WI file was created for these surveys. Will use the Bangladesh 1999-2000 survey as an example since model datasets do not have an WI file.
7+
Notes: Older datasets sometimes do not contain the wealth index. To include the wealth index, a WI file was created for these surveys. Will use the Bangladesh 1999-2000 survey as an example since model datasets do not have a WI file.
88
99
Notes are given at the end of the file on how to perform other merges of WI file with other file types.
1010
*****************************************************************************************************/
Lines changed: 46 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,46 @@
1+
/*****************************************************************************************************
2+
Program: merge_loop.do
3+
Purpose: Merge coded KR files to HR files in a loop for several countries
4+
Author: Shireen Assaf
5+
Date last modified: July 26, 2023 by Shireen Assaf
6+
Notes: This assumes that you have HR and KR coded files from several countries saved, for example, as CDHRcoded.dta and CDKRcoded.dta. The master file in this case is the KR file. The same logic can be used for other merges.
7+
*****************************************************************************************************/
8+
9+
* list of HR surveys from 6 countries
10+
*CDHR61FL ETHR71FL KEHR72FL NGHR7BFL TZHR7BFL UGHR7BFL
11+
12+
* list of KR surveys from the same 6 countries
13+
global krdata "CDKR61FL ETKR71FL KEKR72FL NGKR7BFL TZKR7BFL UGKR7BFL"
14+
15+
*list your HR files in this format
16+
tokenize "CDHR ETHR KEHR NGHR TZHR UGHR"
17+
18+
*begin loop
19+
foreach c in $krdata {
20+
21+
*getting country two letter acronym. See notes, the assumption here is that you have KR and HR coded files.
22+
local cn=substr("`c'",1,2)
23+
24+
use "`cn'KRcoded.dta", clear
25+
26+
*perform a many to one merge since you are merging household data to KR data (many children in one household).
27+
*this merge would be 1:1 if it was PR to KR merge for instance. See code for other merges in this section.
28+
* the local `1' here refers to the first survey in the tokenize list.
29+
30+
merge m:1 v001 v002 using "`1'coded.dta"
31+
32+
*renaming the _merge created by the merge in case you want to perform other merges.
33+
rename _merge KRmerge
34+
35+
* keep only matched cases
36+
keep if KRmerge==3
37+
38+
*save merged file
39+
save "`cn'HRKRmerged.dta", replace
40+
41+
* this shifts to the next survey in the tokenize list and the loop continues until the list is completed.
42+
mac shift
43+
44+
}
45+
46+
Lines changed: 66 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,66 @@
1+
/*****************************************************************************************************
2+
Program: pool_trends.do
3+
Purpose: This is an example of how to append two surveys from different years for the same country for a trend analysis. We use DHS data from Nigeria 2013 and 2018 as an example and also show how to append in a loop for several countries. See notes below.
4+
Author: Shireen Assaf
5+
Date last modified: July 26, 2023 by Shireen Assaf
6+
*****************************************************************************************************/
7+
8+
* Append two surveys from the same country
9+
10+
use NGIR6AFL.dta, clear // open first survey
11+
gen yr=1 // create a year variable
12+
13+
append using NGIR7BFL.dta // append using second survey
14+
replace yr=2 if yr==. // replace values for second survey
15+
16+
label var yr "year of survey"
17+
label define yr 1 2013 2 2018
18+
label values yr yr
19+
tab yr
20+
21+
* can also just check v007
22+
tab v007
23+
24+
gen wt=v005/1000000
25+
26+
*run programs to code country-specific strata
27+
* Use the Survey_strata.do file from section 2
28+
quietly do "Survey_strata.do"
29+
30+
* svy set for combined data
31+
egen strata2=group(yr strata)
32+
egen v021r = group(yr v021)
33+
svyset v021r [pw=wt], strata(strata2) singleunit(centered)
34+
35+
save NGappend.dta, replace // save appended file
36+
37+
38+
*********************************************
39+
40+
* how to run a loop for appending files for trend analysis.
41+
* the code will append the two surveys for each country.
42+
43+
* create two globals, one for each survey list. The irdata1 global lists the most recent survey for each country, to which the earlier survey listed in irdata2 is appended. The list of countries should be in the same order for both globals.
44+
45+
global irdata1 "BDIR7RFL BJIR71FL CMIR71FL KEIR72FL LBIR7AFL RWIR81FL SLIR7AFL UGIR7BFL ZMIR71FL ZWIR72FL"
46+
global irdata2 "BDIR51FL BJIR51FL CMIR44FL KEIR42FL LBIR51FL RWIR53FL SLIR51FL UGIR52FL ZMIR51FL ZWIR52FL"
47+
48+
49+
tokenize $irdata1
50+
51+
foreach c in $irdata2 {
52+
use "`1'coded.dta", clear
53+
gen yr=1
54+
55+
append using "`c'coded.dta"
56+
replace yr=2 if yr==.
57+
58+
local cn = substr(v000,1,2)
59+
save "`cn'append.dta", replace
60+
61+
mac shift 1
62+
}
63+
64+
*/
65+
66+
