We're Hunter College Computer Science students working on a data science project together through CUNY Tech Prep 2024. View weekly updates below.
In this team: Tedd Lee, Aaleia Fernando
- EMS personnel respond to a wide variety of calls, including both physical and mental health emergencies. Our project aims to predict the NYC EMS call type more accurately before the call is completed, so that the right personnel and resources reach the scene as soon as possible. Currently, the call type assigned when a call is first placed is about 90% accurate; we believe we can do better.
- Tedd:
- Data organization
- Visualization and analysis
- Backend development
- Aaleia:
- Data preprocessing
- Feature engineering, hyperparameter tuning
- Frontend build
- How to Build a Data Pipeline (Tedd)
- End To End Machine Learning | Classification & Regression (Aaleia)
- A supervised model that predicts the final emergency call type with >90% accuracy.
- A simple frontend application that accepts user input and outputs the predicted label.
- At least one major visualization of an insight we gained from exploring and analyzing the data.
- Hunt down datasets that are directly related to our project ideas
- Find additional datasets that might be useful or add useful insights to our analysis
- Get familiar with how the raw data might break things.
- Initial glances at trends and outliers
- Data visualization focusing on specific potential insights
- Identify features to be used in training our model
- Start with a bare-bones random forest
- Adjust features used and hyperparameters to achieve higher accuracy
- Add additional features from external data sources (e.g. weather and/or special event data)
- Evaluation and optimization
- Develop a simple web app to display our results
- Connect model and allow for user input and model predictions
- Add final insights and visualizations to the site
- Develop a final presentation
- Invest time in label grouping
- Initial random forest model complete
- Optimize for higher accuracy (try gradient boosting) (T)
- Incorporate new data (start with weather data) (A)
- Ideation for frontend build
- Outline the rest of the project MVP, weekly goals, project direction
- Analyze data and create visualizations
- Group classifications
- One-hot encoding
- Build initial model
- Regrouping - find new dataset and new MVP
- Completed data understanding and exploration
- Completed initial model build (A)
- Download the data
- Initial visualization (A)
- + bonus ideation
- Determine MVP
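
To make the preprocessing steps mentioned in these updates (grouping classifications, one-hot encoding) and the roughly 90% baseline more concrete, here is a minimal sketch in plain Python. It is not our actual pipeline: the call-type codes, the group names in `LABEL_GROUPS`, the record fields (`initial`, `final`, `borough`), and the toy records are all hypothetical placeholders. In practice a library routine such as `pandas.get_dummies` or scikit-learn's `OneHotEncoder` would likely handle the encoding.

```python
# Hypothetical mapping from raw call-type codes to broader label groups.
# These codes and group names are placeholders, not the project's real labels.
LABEL_GROUPS = {
    "CARD": "cardiac",
    "ARREST": "cardiac",
    "EDP": "mental_health",
    "DRUG": "mental_health",
    "INJURY": "trauma",
}


def group_label(raw_label, groups=LABEL_GROUPS, default="other"):
    """Collapse a raw call-type code into its group; unknown codes map to `default`."""
    return groups.get(raw_label, default)


def one_hot(values):
    """One-hot encode a list of categorical values.

    Returns the sorted category names and one 0/1 row per input value,
    with a single 1 in the column of that value's category.
    """
    categories = sorted(set(values))
    column = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[column[value]] = 1
        rows.append(row)
    return categories, rows


def baseline_accuracy(initial, final):
    """Fraction of calls whose initial label already equals the final label.

    This is the figure a trained model would need to beat.
    """
    if len(initial) != len(final):
        raise ValueError("initial and final must have the same length")
    if not initial:
        raise ValueError("at least one call is required")
    matches = sum(i == f for i, f in zip(initial, final))
    return matches / len(initial)


# Toy records (made up for illustration only).
calls = [
    {"initial": "CARD", "final": "ARREST", "borough": "BRONX"},
    {"initial": "EDP", "final": "EDP", "borough": "QUEENS"},
    {"initial": "INJURY", "final": "DRUG", "borough": "BRONX"},
    {"initial": "SICK", "final": "SICK", "borough": "MANHATTAN"},
]

# Group the raw labels before comparing or training on them.
initial_groups = [group_label(c["initial"]) for c in calls]
final_groups = [group_label(c["final"]) for c in calls]

# Encode a categorical feature as 0/1 columns for the model.
borough_categories, borough_rows = one_hot([c["borough"] for c in calls])

# Share of calls where the grouped initial label matches the grouped final label.
baseline = baseline_accuracy(initial_groups, final_groups)
```

Grouping labels before measuring the baseline matters: two raw codes that differ can still fall in the same group, so the grouped baseline may be higher than the raw one, and the model should be compared against whichever version matches its target labels.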