A simple project to download, process, and analyze CMS (Centers for Medicare & Medicaid Services) claims data for Multiple Sclerosis (MS) patients.
The included scripts perform the following steps:
- Download CMS claims data (in zip files) from the CMS website (https://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/SynPUFs/DESample01).
- Extract the data (CSV files) from the zip files.
- Load the CSV files into a SQLite database.
- Perform data transformations on the loaded data using SQL.
- Create and persist new tables with the transformed data.
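The extract and load steps above can be sketched as follows. This is a minimal illustration that uses only the Python standard library so it runs without the project dependencies; the project's own scripts use pandas and SQLAlchemy and may work differently. The function names extract_csvs and load_csv are hypothetical and do not come from the repository.

```python
import csv
import sqlite3
import zipfile
from pathlib import Path


def extract_csvs(zip_path, out_dir):
    """Extract every CSV member of zip_path into out_dir and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".csv"):
                extracted.append(Path(archive.extract(name, out_dir)))
    return extracted


def load_csv(conn, csv_path, table):
    """Create `table` from the CSV header row and insert all rows as text.

    Hypothetical helper: column names are taken from the header as-is,
    and any existing table with the same name is replaced.
    """
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        columns = ", ".join(f'"{col}"' for col in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})', reader
        )
    conn.commit()
```

Streaming rows from csv.reader straight into executemany keeps memory use flat, which matters for claims files that are large once unzipped.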
To run this project, you will need Python 3.6+ installed on your system. Additionally, the following Python libraries are required:
- pandas
- requests
- sqlalchemy
- tqdm
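Before running the pipeline, it can help to confirm that the libraries listed above are importable in the active environment. The snippet below is a small optional check, not part of the repository; the function name missing_packages is hypothetical.

```python
from importlib.util import find_spec

# Top-level import names of the libraries listed above.
REQUIRED = ["pandas", "requests", "sqlalchemy", "tqdm"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```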
To set up the project, follow these steps:
- Clone this repository:
  git clone https://github.com/IrvicRodriguez/CMS-MS-Finder
- Create a virtual environment (you can name it differently):
  python -m venv venv
- Activate the virtual environment. On macOS or Linux use:
  source venv/bin/activate
  On Windows use:
  venv\Scripts\activate
- Install the required packages from the requirements.txt file:
  pip install -r requirements.txt
After setting up the project, run the main.py script to execute the pipeline:
  python main.py
This script downloads the necessary data files, processes them, and stores the results in the SQLite database (cms_data.db). Run time depends on the local system. For example, the pipeline takes around 10 to 15 minutes on a 3.6 GHz quad-core Intel Mac with 64 GB of RAM.
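Because run time varies by machine, it can be useful to see how long each stage takes locally. The sketch below is one way to time a sequence of steps; run_timed is a hypothetical helper that is not in the repository, and the placeholder steps stand in for the real pipeline functions, whose signatures are not shown here.

```python
import time


def run_timed(steps):
    """Run (label, callable) pairs in order and return {label: seconds}."""
    timings = {}
    for label, step in steps:
        start = time.perf_counter()
        step()
        timings[label] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    # Placeholder callables; swap in the project's own functions.
    results = run_timed([
        ("download", lambda: time.sleep(0.01)),
        ("load", lambda: time.sleep(0.01)),
    ])
    for label, seconds in results.items():
        print(f"{label}: {seconds:.2f}s")
```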
- main.py: The main script that executes the data processing pipeline.
- config.py: Contains the zip file URLs to download from CMS, which the rest of the scripts depend on.
- downloader.py: Contains the function download_and_extract_data for downloading and extracting data files.
- db_loader.py: Contains the function load_data_to_db for loading CSV data into the SQLite database.
- transformations.py: Contains the function create_and_persist_transformations for creating and persisting data transformations in the database.
- requirements.txt: Lists the packages to install in the environment that runs the project.
- data: Folder that holds the data files locally after the zip files are downloaded and extracted.
Each file contains comments that explain the purpose of each function and how the code works.
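As an illustration of what creating and persisting a transformation can look like in SQLite, the sketch below builds a table of per-patient claim counts for one diagnosis code. It is not the project's actual transformation: the function name persist_ms_patients, the table names claims and ms_patients, and the column names patient_id and diagnosis_code are all hypothetical, and it assumes the claims are coded in ICD-9-CM, where code 340 denotes multiple sclerosis.

```python
import sqlite3

MS_ICD9_CODE = "340"  # ICD-9-CM diagnosis code for multiple sclerosis


def persist_ms_patients(conn, code=MS_ICD9_CODE):
    """Rebuild ms_patients with one row per patient who has a matching claim.

    Assumes a hypothetical `claims` table with text columns
    `patient_id` and `diagnosis_code`.
    """
    conn.execute("DROP TABLE IF EXISTS ms_patients")
    conn.execute(
        "CREATE TABLE ms_patients ("
        "patient_id TEXT PRIMARY KEY, claim_count INTEGER)"
    )
    conn.execute(
        """
        INSERT INTO ms_patients (patient_id, claim_count)
        SELECT patient_id, COUNT(*)
        FROM claims
        WHERE diagnosis_code = ?
        GROUP BY patient_id
        """,
        (code,),
    )
    conn.commit()
```

Dropping and recreating the table makes the step safe to rerun, at the cost of recomputing the result each time.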