I'm a data analyst passionate about coding in Python, R, and Go.
Some of my projects include:
Calibrated Uncertainty (NumPy, SciPy, scikit-learn, PyMC3, JAX-based NumPyro)
- Conducted in-depth analysis of an uncertainty calibration algorithm for Bayesian neural networks.
- Identified the advantages of the approach and its modes of failure.
- Received the highest grade and an invitation to continue research at the Harvard Data to Actionable Knowledge lab.
Galaxy Measurements (TensorFlow, pandas, Streamlit, Docker, Heroku)
- Used TensorFlow to estimate the shape and brightness of simulated galaxies.
- Handled data generation, an exploratory data analysis web app, neural architecture search, and denoising pipelines.
- Received the top grade and an opportunity to continue the research.
NBA Conference Advantage (R, tidyverse, Scrapy, LaTeX)
- Modeled a potential bias in the NBA that gives teams in one conference an easier path to success because of differences in travel and scheduling.
- Wrote the web scrapers and the feature engineering code, plus most of the analysis code in R.
- Built linear regression models, ran diagnostics, and authored around 70% of the report.
Meteorological Observatory (Python, TCP sockets, regex, pytest)
- Implemented streaming data collection from weather instruments in Python.
- The code has a suite of unit tests and is deployed at a university meteorological station.
- Serves as the basis for experimental studies of turbulence by five scientific institutions.
Open Source Contributions
- Contributed bug fixes and documentation updates to open source projects such as Apache Arrow, scikit-learn, ThunderSVM (GPU-accelerated SVM), Picasso (sparse regression algorithm), and Gap Statistic (clustering metric).