This repository walks through preparing data for a statistical model in the R programming language.
The dataset describes parameters of the beer brewing process. Of its 29 variables, 11 were selected arbitrarily. They are used to build a statistical model that examines how the selected variables affect alcohol by volume (ABV).
The process includes:
- descriptive analysis of the selected variables, determination of each variable's measurement scale, and visualization
- imputation of missing data
- identification of outliers
- analysis of correlations between variables
- data sampling
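library(dplyr)      # data manipulation
library(ggplot2)    # plotting
library(VIM)        # visualization and imputation of missing values
library(gridExtra)  # arranging several plots on one page
library(corrplot)   # correlation matrix plots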
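
The steps listed above can be sketched end to end as follows. This is a minimal illustration, not the repository's actual code: it uses base R only and a small synthetic data frame, and the column names (`OG`, `BoilTime`, `ABV`), the median imputation, the 1.5 * IQR outlier rule and the 70/30 split are all assumptions chosen for the example. The packages loaded above offer richer alternatives for several of these steps.

```r
set.seed(42)  # make the synthetic data and the sampling reproducible

# Hypothetical stand-in for the beer data; column names are assumptions
n <- 200
beer <- data.frame(
  OG       = rnorm(n, mean = 1.060, sd = 0.010),   # original gravity
  BoilTime = sample(c(60, 75, 90), n, replace = TRUE)
)
beer$ABV <- 130 * (beer$OG - 1) - 2 + rnorm(n, sd = 0.3)
beer$ABV[sample(n, 10)] <- NA   # inject missing values
beer$OG[1] <- 1.200             # inject one obvious outlier

# 1. Descriptive analysis: summary statistics and storage type per variable
print(summary(beer))
print(sapply(beer, class))

# 2. Imputation: replace missing ABV values with the observed median
beer_imp <- beer
beer_imp$ABV[is.na(beer_imp$ABV)] <- median(beer$ABV, na.rm = TRUE)

# 3. Outlier identification: flag OG values outside the 1.5 * IQR fences
q <- unname(quantile(beer_imp$OG, c(0.25, 0.75)))
fence <- 1.5 * (q[2] - q[1])
is_outlier <- beer_imp$OG < q[1] - fence | beer_imp$OG > q[2] + fence
beer_clean <- beer_imp[!is_outlier, ]

# 4. Correlation analysis: Pearson correlation matrix of all variables
cor_mat <- cor(beer_clean, method = "pearson")
print(round(cor_mat, 2))

# 5. Data sampling: random 70/30 split into training and test sets
train_idx <- sample(nrow(beer_clean), size = floor(0.7 * nrow(beer_clean)))
train <- beer_clean[train_idx, ]
test  <- beer_clean[-train_idx, ]
```

Median imputation and the IQR rule are used here only because they are simple and need no extra packages. Which methods are appropriate depends on each variable's distribution and measurement scale, which is why the descriptive step comes first.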