This repository contains the code and analysis for my Data Modeling graduate course at Harvard University. Each assignment is provided both as an R Markdown source file (.rmd) and as a rendered PDF, so the code and its results can be read without running anything.
• /code/: Contains R Markdown files (.rmd) used for various data modeling assignments and projects.
• /output/: Contains the rendered PDF versions of the .rmd files for easy viewing of code and results.
• /data/: Contains the datasets that are not bundled with the R packages loaded in the .rmd files.
The course focuses on:
• Building and analyzing statistical models using R.
• Applying techniques such as regression, ANOVA, and data visualization.
• Understanding and interpreting model diagnostics and results.
Key Topics Covered:
• Linear Regression Analysis: Modeling relationships between variables.
• ANOVA: Analysis of variance for hypothesis testing.
• Residual Analysis: Evaluating model fit through residuals and plots.
• Regularization Techniques: Ridge, Lasso, and Elastic Net.
• Exploratory Data Analysis (EDA): Visual and statistical exploration of data.
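As a rough illustration of three of the topics above (linear regression, ANOVA, and residual analysis), here is a minimal sketch in base R. It is not taken from the course files: the built-in mtcars dataset and the chosen variables (mpg, wt, hp, cyl) are assumptions made only for the example. Regularization is left out because it would need an extra package.

```r
# Built-in dataset, used here only for illustration (not from /data/).
data(mtcars)

# Linear regression: model fuel economy as a function of weight and horsepower.
fit <- lm(mpg ~ wt + hp, data = mtcars)
print(summary(fit))

# ANOVA: test whether mean mpg differs across cylinder counts.
aov_fit <- aov(mpg ~ factor(cyl), data = mtcars)
print(summary(aov_fit))

# Residual analysis: residuals vs. fitted values, plus a normal Q-Q plot.
res <- residuals(fit)
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
qqnorm(res)
qqline(res)
```

A visible pattern in the residuals-vs-fitted plot, or strong departures from the Q-Q line, would suggest the linear model is missing structure in the data.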
Example Files:
• linear_model_analysis.rmd & linear_model_analysis.pdf: Code and report for linear model building.
• anova_project.rmd & anova_project.pdf: Analysis of variance project and findings.
To run the .rmd files locally, ensure you have:
• R installed (version X.X or higher)
• RStudio for easy editing and rendering of .rmd files
• Required R packages: ggplot2, dplyr, tidyverse, etc.
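One way to check the package requirement before opening any file is sketched below. The package vector only repeats the names listed above; since that list ends in "etc.", individual .rmd files may load others, so treat it as a starting point rather than a complete list.

```r
# Packages named in the requirements; extend this vector as needed.
required <- c("ggplot2", "dplyr", "tidyverse")

# Which of them are not installed in the current library paths?
missing_pkgs <- setdiff(required, rownames(installed.packages()))

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # Uncomment to install them from CRAN:
  # install.packages(missing_pkgs)
} else {
  message("All required packages are installed.")
}
```

Installing tidyverse also installs ggplot2 and dplyr, so on a fresh setup that one package covers all three names.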
- Clone the repository:
  git clone https://github.com/break/datamod.git
- Open and run .rmd files in RStudio to reproduce the analysis.
- View PDF reports for a snapshot of the analysis and results.
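For anyone who prefers the R console to the RStudio Knit button, the rendering step could look roughly like the sketch below. The path assumes that linear_model_analysis.rmd sits directly under code/ and that the working directory is the repository root. Both are assumptions, as is the choice to write the PDF into output/.

```r
# Assumed location of one of the example files, relative to the repository root.
input_file <- file.path("code", "linear_model_analysis.rmd")

if (requireNamespace("rmarkdown", quietly = TRUE) && file.exists(input_file)) {
  # Knit the R Markdown source to PDF and place the result in output/.
  rmarkdown::render(input_file,
                    output_format = "pdf_document",
                    output_dir = "output")
} else {
  message("Install the rmarkdown package and run this from the repository root.")
}
```

Rendering to PDF also requires a LaTeX installation (for example, one set up through the tinytex package).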
This repository is for educational purposes related to coursework at Harvard University. Feel free to use the code as a reference, but please give proper attribution for any direct use.