Medical Model Monitor | M3

Capstone 4: Applied Model Application

Use Case & Scenario

Problem Statement: Hospitals struggle to continuously monitor patients' vital signs and promptly identify those at risk of adverse events. Complex data and environmental factors can delay intervention, increase risk, and incur higher expense.
Solution: A mobile and desktop application that notifies staff of patients needing assistance. From a technical perspective, this involves training a range of models to establish a baseline of knowledge, predicting values from historical data, and handling real-time inputs for validation.

Data Authorization and Access

Research-grade medical data is considered public, but retains certain safeguards to deter improper use and mitigate risk. Completing the required steps, and waiting for third-party teams and systems to process each request, introduced some delay. These prerequisites were anticipated, however, as outlined in the project proposal.

To summarize, the following steps were completed to gain access to the MIMIC-IV dataset:

Registration on PhysioNet website
PhysioNet application review and approval; use case and reference evaluation
Training Completion; CITI 'Data or Specimens Only Research Training'
Code of Conduct agreement
Credentialed Health Data Use Agreement (per dataset)

After all steps were completed, access was granted to credentialed datasets, including the MIMIC-IV dataset.

Data Extraction and Loading

Post-authorization, I elected to store the ~120 GB dataset locally in PostgreSQL rather than use cloud-based access. The download was rate-limited (500 kb/s) and took ~19 hours for all zipped tables to complete. The end result is two datasets comprising over half a million records for over a quarter-million individuals.

Records (Qty)	Scope
364,627	unique individuals
546,028	hospitalizations
94,458	unique ICU stays

hosp contains 546,028 hospitalizations for 223,452 unique individuals
icu contains 94,458 ICU stays for 65,366 unique individuals
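
Counts like these can be checked with a simple aggregate query once the tables are loaded. The following is a minimal sketch rather than the project's actual loader: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL, and the rows are invented toy data. The column names subject_id and hadm_id follow the MIMIC-IV admissions table; the helper name count_scope is hypothetical.

```python
import sqlite3

def count_scope(conn, table, id_column):
    """Return (record count, distinct individual count) for one table."""
    # Table and column names are interpolated directly, so pass trusted names only.
    query = f"SELECT COUNT({id_column}), COUNT(DISTINCT subject_id) FROM {table}"
    return conn.execute(query).fetchone()

# In-memory stand-in for the PostgreSQL instance; the rows are invented toy data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE admissions (subject_id INTEGER, hadm_id INTEGER)")
conn.executemany(
    "INSERT INTO admissions VALUES (?, ?)",
    [(1, 100), (1, 101), (2, 102), (3, 103)],
)

records, individuals = count_scope(conn, "admissions", "hadm_id")
print(f"admissions: {records} hospitalizations for {individuals} unique individuals")
```

Against the real database, the same query shape would apply to the hospitalization and ICU stay tables, with the schema and table names adjusted to match the local import.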

Data Visual Inspection

With records unzipped and imported into PostgreSQL, I could begin inspecting table columns.

High frequency of `null` values

Despite their high frequency, null values should be retained in most cases.
Larger tables combine various events that are inherently unique, and their quality could be degraded by rounding, interpolation, or similar data manipulation. While handling varies case by case, high-level analysis will generally be inclined to drop such columns, whereas fine-grained analysis may filter records on specific data types and values.
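
To illustrate why a column can look sparse across a combined table yet be complete for a single event type, here is a minimal sketch on invented rows. The field names itemid, valuenum, and comments mirror MIMIC-IV's labevents table, but the values, the item IDs, and the helper names null_fraction and events_for_item are hypothetical.

```python
def null_fraction(rows, column):
    """Share of rows in which `column` is None (the Python analogue of SQL NULL)."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if row[column] is None) / len(rows)

def events_for_item(rows, itemid):
    """Keep only rows for one event type, leaving their nulls untouched."""
    return [row for row in rows if row["itemid"] == itemid]

# Invented toy events: item 1 carries a numeric value, item 2 a free-text result.
events = [
    {"itemid": 1, "valuenum": 7.4, "comments": None},
    {"itemid": 1, "valuenum": 7.1, "comments": None},
    {"itemid": 2, "valuenum": None, "comments": "specimen hemolyzed"},
    {"itemid": 2, "valuenum": None, "comments": "see note"},
]

# Across the combined table, valuenum is null in half of the rows.
overall = null_fraction(events, "valuenum")
# Within the numeric event type alone, valuenum has no nulls.
numeric_only = null_fraction(events_for_item(events, 1), "valuenum")
print(overall, numeric_only)
```

Filtering first and measuring afterwards keeps the nulls that are meaningful for other event types, instead of imputing or dropping them globally.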

Depersonalization; Modification of Date and Age values

As outlined in the official documentation, all personally identifiable information has been scrubbed from MIMIC-IV, and date/age values have been shifted at random while retaining their relation to one another. These transformed values map to the anchor columns explained below:

The anchor_year column is a deidentified year occurring sometime between 2100 - 2200.
The anchor_year_group column is one of the following values: "2008 - 2010", "2011 - 2013", "2014 - 2016", "2017 - 2019", and "2020 - 2022".

Example: if a patient's anchor_year is 2158, and their anchor_year_group is 2011 - 2013, then any hospitalizations for the patient occurring in the year 2158 actually occurred sometime between 2011 - 2013.
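
The example above can be worked through in code. This is a minimal sketch that assumes a patient's dates keep their offset relative to anchor_year, so an event N years after anchor_year falls roughly N years after the anchor_year_group range. The function name real_year_range is hypothetical, and the result is approximate at year boundaries.

```python
def real_year_range(anchor_year, anchor_year_group, shifted_year):
    """Approximate real-world year range for an event in a shifted year."""
    # anchor_year_group looks like "2011 - 2013"; int() tolerates the padding.
    start, end = (int(part) for part in anchor_year_group.split("-"))
    # Assumption: the offset from anchor_year is preserved by the date shift.
    offset = shifted_year - anchor_year
    return start + offset, end + offset

# The documented example: an event in the anchor year itself.
print(real_year_range(2158, "2011 - 2013", 2158))
# A hypothetical event two shifted years later.
print(real_year_range(2158, "2011 - 2013", 2160))
```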

The anchor_age provides the patient age in the given anchor_year.

Example: If the patient was over 89 in the anchor_year, this anchor_age has been set to 91 (i.e. all patients over 89 have been grouped together into a single group with value 91, regardless of what their real age was).
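
A patient's age in another shifted year can be derived from these columns. The sketch below is an illustration under the assumption that age advances one year per shifted calendar year; the function name age_in_year and the constant TOP_CODED_AGE are hypothetical. It also flags the grouped over-89 patients, whose derived age does not reflect their real age.

```python
TOP_CODED_AGE = 91  # value assigned to every patient over 89 in their anchor_year

def age_in_year(anchor_age, anchor_year, shifted_year):
    """Return (age, is_exact) for a patient in a given shifted year."""
    age = anchor_age + (shifted_year - anchor_year)
    # A top-coded anchor_age hides the real age, so the derived age is not exact.
    is_exact = anchor_age != TOP_CODED_AGE
    return age, is_exact

print(age_in_year(60, 2158, 2160))
print(age_in_year(91, 2158, 2160))
```

Carrying the is_exact flag alongside the age lets downstream analysis decide whether to exclude or bucket the top-coded group.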

Summary

To reiterate, whether to retain null values depends on context. Basic analysis suggests they can be ignored or dropped in broad-spectrum queries, where such detailed values are irrelevant. Conversely, specific analyses, such as queries on a particular condition, medication, or person, may benefit from retaining them, since they provide detailed insight into both condition and treatment.

This review affirms the machine learning algorithms and models chosen in the project proposal. The data shows substantial variability, with high-dimensional tables spanning a very broad range of topics and events. Random Forest, Gradient Descent, and Neural Networks are far better suited to this data than their linear counterparts, which would struggle with overfitting and context development.

