disability weight location processing bugfix #354
src/vivarium_inputs/extract.py
    "key", axis=1
)
data = pd.merge(
    data.drop("year_id", axis=1, errors="ignore"), year_df, on="key"
Here I'm just adding errors='ignore' so we don't error out if year_id isn't in data. Based on the disability weight flat file, we usually expect it not to be there, but to be safe I'm dropping it if it is present.
Ignoring errors is pretty scary. I think it would be a lot better to do a simple check for this column before merging so you don't inadvertently cover up a different error caused by a data schema issue.
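For illustration, the explicit check suggested above could look something like the sketch below. The helper name `drop_year_id_if_present` and the toy frames are hypothetical and are not taken from extract.py; only `year_id`, `year_df`, and the `key` merge column come from the diff.

```python
import pandas as pd


def drop_year_id_if_present(data: pd.DataFrame) -> pd.DataFrame:
    """Drop ``year_id`` only when the column exists (hypothetical helper)."""
    if "year_id" in data.columns:
        return data.drop(columns="year_id")
    # Column is absent, so there is nothing to drop and nothing is silenced.
    return data


# Toy frames standing in for the disability weight data and year_df.
data = pd.DataFrame(
    {"key": [1, 1], "healthstate_id": [10, 11], "year_id": [2021, 2021]}
)
year_df = pd.DataFrame({"key": [1, 1], "year_id": [2019, 2021]})

# Each data row is paired with every requested year through the shared key.
merged = pd.merge(drop_year_id_if_present(data), year_df, on="key")
```

Unlike errors="ignore", the membership test only covers the one case we expect (a missing year_id column), so any other schema problem still surfaces as an error.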
    disability_weights.healthstate_id == entity.healthstate.gbd_id, :
]
# Update location_id to match original location id
We already do this in gbd access - Zeb was seeing duplicates in our data from these lines
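A rough sketch of how duplicates like these might be spotted in the extracted data. The helper `find_duplicate_rows`, the `draw_0` column, and all values are assumptions made for illustration; they are not part of vivarium_inputs or gbd access.

```python
import pandas as pd


def find_duplicate_rows(data: pd.DataFrame, id_columns: list) -> pd.DataFrame:
    """Return every row whose identifying columns appear more than once."""
    # keep=False marks all members of a duplicated group, not just the repeats.
    return data[data.duplicated(subset=id_columns, keep=False)]


# Toy data with arbitrary ids: the first two rows share the same identifiers,
# as they would if each location's rows were expanded twice.
data = pd.DataFrame(
    {
        "location_id": [1, 1, 1],
        "healthstate_id": [10, 10, 11],
        "year_id": [2021, 2021, 2021],
        "draw_0": [0.1, 0.1, 0.2],
    }
)

dupes = find_duplicate_rows(data, ["location_id", "healthstate_id", "year_id"])
```

An empty result after removing the extra location step would be consistent with the duplication having come from these lines.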
disability weight location processing bugfix
Description
Changes and notes
Don't duplicate data for locations since this is already done in gbd access.
Only drop year_id column if it is present in data.
Testing
Pulled disability weight for diarrheal diseases with no year specified, for 2021, for 2019 and 2021, and for all years.