Events tables : "inputevents_cv", "inputevents_mv", "chartevents", "labevents", "outputevents" , "prescriptions"
Run ./create_tables.py
in which we change units to be consistent,
and drop records whose units we cannot change by rules.
Events tables : "inputevents_cv", "inputevents_mv", "chartevents", "labevents", "outputevents"
Run ./extract_tables.py
in which we filter useful itemids for further extration.
Run ./extract_subjects.py
in which we extract the features from first icu admission for 38597 distinct adult subjects.
For icustays
, add age, and given-hours mortality, only keep age > 15
For prescriptions
, diagnoses
and procedures
, filter them on icustays
,and count the number of occurrences of ICD-9 code
which we can use to keep only icd9 code by the number of occurrences above threshold.
Break up basic tables above and events tables by subject respectively:
Filter records whose value is nan or empty string.
Run ./validate_events.py
-
drop all events whose HADM_ID is empty or not listed in
stays.csv
-
recover ICUSTAY_ID if it is empty in
stays.csv
-
drop all events whose ICUSTAY_ID is not the same as that of stays.csv for the same HADM_ID
-
filter events with nan VALUE
-
save at "X_verified.csv"
Run ./extract_episodes_from_subjects.py
in which we extract raw timeseries data in episodeN_timeseries_processed.csv
, processed timeseries data in episodeN_timeseries_raw.csv
, and static episode including basic info and diagnoses
icd9 codes group in episodeN.csv
.
Besides, there is episodes_index
in output path which save 38466 SUBJECT_ID to its episode path.
stays.csv
+ diagnoses.csv
+ procedures.csv
>> episode.csv
prescriptions
, inputevents_cv
, inputevents_mv
, labevents
,outputevents
, chartevents
>>
episode_timeseries.csv
Run ./create_dataset.py
in which we creat dataset from 38462 subjects for experiments.
EOF