This bug can be seen in the two episode*.csv files generated for patient 49037. In both files, no diagnosis column has the label 1, which is clearly wrong.
The cause is in `preprocessing.py`. In the function `extract_diagnosis_labels`, the `ICD9_CODE` column of the input dataframe `diagnoses` has a numeric dtype. As a result, the column names of `labels` are numeric too. However, the match condition on Line 82 compares them against the hardcoded list `diagnosis_labels`, which contains strings. The string entries therefore never match the numeric column names, and no diagnosis value is ever set to 1.
This affects every episode whose diagnosis ICD9 codes are all numeric (i.e., no alphanumeric codes such as V28492). In these cases pandas infers the column dtype as int64 rather than object/str, which triggers the bug.
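The dtype inference can be reproduced with a few lines of pandas. The sketch below uses made-up two-row tables (the codes and `ICUSTAY_ID` values are illustrative, not taken from patient 49037) and pivots one of them so that each ICD9 code becomes a column name, in the spirit of the label extraction. It is a minimal reproduction under those assumptions, not the repository's code.

```python
import io

import pandas as pd

# Hypothetical miniature diagnosis tables; the real files have more columns.
numeric_only = "ICUSTAY_ID,ICD9_CODE\n1,4019\n1,42731\n"
mixed = "ICUSTAY_ID,ICD9_CODE\n2,4019\n2,V4581\n"

df_numeric = pd.read_csv(io.StringIO(numeric_only))
df_mixed = pd.read_csv(io.StringIO(mixed))

print(df_numeric["ICD9_CODE"].dtype)  # int64: every code parsed as an integer
print(df_mixed["ICD9_CODE"].dtype)    # non-numeric (object, or str on newer pandas)

# Pivot so each ICD9 code becomes a column, as the label extraction does.
labels = df_numeric.assign(VALUE=1).pivot(
    index="ICUSTAY_ID", columns="ICD9_CODE", values="VALUE"
)
print("4019" in labels.columns)  # False: the column name is the integer 4019
print(4019 in labels.columns)    # True
```

Any lookup keyed by the string `"4019"` misses the integer column, which is exactly the mismatch against the string-valued label list.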
However, the labels in the task-specific datasets do not seem to be affected and still look correct.
A fix is to add the line
`diagnoses['ICD9_CODE'] = diagnoses['ICD9_CODE'].astype(str)`
before `diagnoses['VALUE'] = 1`.
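To check that the proposed line is enough, here is a simplified, hypothetical stand-in for `extract_diagnosis_labels`. The name `extract_labels_sketch` and the three-entry `DIAGNOSIS_LABELS` list are assumptions for illustration; the repository's function and its full `diagnosis_labels` list differ. The sketch applies the `astype(str)` cast before `VALUE` is set, so the pivoted column names are strings and match the label list.

```python
import io

import pandas as pd

# Hypothetical subset of the hardcoded label list (the real list is much longer).
DIAGNOSIS_LABELS = ["4019", "42731", "V4581"]


def extract_labels_sketch(diagnoses: pd.DataFrame) -> pd.DataFrame:
    """Simplified stand-in for extract_diagnosis_labels, not the repo's code."""
    diagnoses = diagnoses.copy()  # avoid mutating the caller's frame
    # The proposed fix: cast codes to str so they compare equal to the labels.
    diagnoses["ICD9_CODE"] = diagnoses["ICD9_CODE"].astype(str)
    diagnoses["VALUE"] = 1
    labels = (
        diagnoses[["ICUSTAY_ID", "ICD9_CODE", "VALUE"]]
        .drop_duplicates()
        .pivot(index="ICUSTAY_ID", columns="ICD9_CODE", values="VALUE")
        .fillna(0)
        .astype(int)
    )
    # Add a zero column for any label no episode has, then keep only the labels.
    for code in DIAGNOSIS_LABELS:
        if code not in labels:
            labels[code] = 0
    return labels[DIAGNOSIS_LABELS]


numeric_only = pd.read_csv(io.StringIO("ICUSTAY_ID,ICD9_CODE\n1,4019\n1,42731\n"))
result = extract_labels_sketch(numeric_only)
print(result)
```

With the cast, episode 1 gets 1 for `4019` and `42731` and 0 for `V4581`. Without it, in this sketch, the pivoted columns would be the integers 4019 and 42731, the loop would add separate all-zero string columns for every label, and the selected row would be all zeros, matching the symptom described above.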