Commit 064b262

Merge pull request #47 from UBC-MDS/cor_map
Added scaling example for readme
2 parents 14dc655 + 9ba3c23 commit 064b262

2 files changed: +29 -12 lines changed

README.md

Lines changed: 29 additions & 12 deletions
@@ -15,10 +15,10 @@ $ pip install -i https://test.pypi.org/simple/ eda_utils_py
 ## Functions

 The four functions contained in this package are as follows:
-- `cor_map`: A function to plot a correlation matrix of numeric columns in the dataframe
+- `imputer`: A function to impute missing values
 - `outlier_identifier`: A function to identify and deal with outliers
+- `cor_map`: A function to plot a correlation matrix of numeric columns in the dataframe
 - `scale` A function to scale numerical values in the dataset
-- `imputer`: A function to impute missing values


 ## Our Place in the Python Ecosystem
## Our Place in the Python Ecosystem
@@ -33,9 +33,9 @@ While Python packages with similar functionalities exist, this package aims to s
 - Please see a list of dependencies [here](pyproject.toml).

 ## Usage
-The eda_utils_py package help you to build exploratory data analysis.
+The eda_utils_py package will help you in your exploratory data analysis portion of your work.

-eda_utils_py includes multiple custom functions to perform initial exploratory analysis on any input data describing the structure and the relationships present in the data. The generated output can be obtained in both object and graphical form.
+eda_utils_py includes multiple custom functions to perform initial exploratory analysis on any input data describing the structure and the relationships present in the data. Depending on the function, the generated output can be obtained in object or graphical form.

 ```python
 import pandas as pd
@@ -59,39 +59,56 @@ data_with_outlier = pd.DataFrame({
 'SepalWidthCm':[1.4, 1.4, 1.3, 1.2, 1.2, 1.3, 1.6, 1.3],
 'PetalWidthCm':[0.2, 0.1, 30, 0.2, 0.3, 0.1, 0.4, 0.5]
 })
+
+data_with_scale = pd.DataFrame({'SepalLengthCm':[1, 0, 0, 3, 4],
+'SepalWidthCm':[4, 1, 1, 0, 1],
+'PetalWidthCm':[2, 0, 0, 2, 1],
+'Species':['Iris-setosa','Iris-virginica', 'Iris-germanica', 'Iris-virginica','Iris-germanica']})
 ```

-The eda_utils_py will help you to:
-- Diagnose data quality: Resolve skewed data by identifing missing data and outlier and provide corresponding remedy.
+The eda_utils_py package contains functions that will help you to:
+- **Impute**: Resolve skewed data by identifying missing data and outlier and provide corresponding remedy.

 ```python
 imputer(data_with_NA)
 ```
-Output:
+Output of `imputer()`:

 ![imputer_output](images/imputer_output.png)

+- **Identify Outliers**: Identify and deal with outliers in the dataset.
+
 ```python
 outlier_identifier(data_with_outlier, method = "median")
 ```
-Output:
+Output of `outlier_identifier()`:

 ![outlier_output](images/outlier_output.png)

-- This package can help you easily plot a correlation matrix along with its values to help explore data.
+- **Correlation Heatmap Plotting**: Easily plot a correlation matrix along with its values to help explore data.

 ```python
 numerical_columns = ['SepalLengthCm','SepalWidthCm','PetalWidthCm']

 cor_map(data, numerical_columns, col_scheme = 'purpleorange')

 ```
-Output:
+Output of `cor_map()`:

 ![cor_map_output](images/cor_map.output.png)

-- Machine learning pereperation: Perform column transformations, derive scaler automatically to fulfill further machine learning need
-
+- **Scaling**: Scale the data in preperation for future use in machine learning projects.
+
+```python
+numerical_columns = ['SepalLengthCm','SepalWidthCm','PetalWidthCm']
+
+scale(data, numerical_columns, scaler="minmax")
+
+```
+Output of `scale()`:
+
+![scale_output](images/scale_output.png)
+
 ## Documentation

 The official documentation is hosted on Read the Docs: https://eda_utils_py.readthedocs.io/en/latest/
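
Note for readers of this diff: the new README example passes `scaler="minmax"`, which presumably refers to min-max scaling (rescaling each numeric column to the range 0 to 1). The sketch below illustrates that general technique on the same values as the `data_with_scale` frame added in this commit. It is not the eda_utils_py implementation: `min_max_scale` is a hypothetical helper, and it works on a plain dict of lists rather than a DataFrame so that it runs without pandas.

```python
# Illustrative only: a hypothetical helper, not the eda_utils_py implementation.
def min_max_scale(columns, names):
    """Rescale each named column to the [0, 1] range."""
    scaled = dict(columns)  # shallow copy so untouched columns pass through
    for name in names:
        values = columns[name]
        lo, hi = min(values), max(values)
        span = hi - lo
        # A constant column has no spread; map it to 0.0 to avoid dividing by zero.
        scaled[name] = [(v - lo) / span if span else 0.0 for v in values]
    return scaled


# Same values as the data_with_scale frame added in this commit.
data_with_scale = {
    "SepalLengthCm": [1, 0, 0, 3, 4],
    "SepalWidthCm": [4, 1, 1, 0, 1],
    "PetalWidthCm": [2, 0, 0, 2, 1],
    "Species": ["Iris-setosa", "Iris-virginica", "Iris-germanica",
                "Iris-virginica", "Iris-germanica"],
}

numerical_columns = ["SepalLengthCm", "SepalWidthCm", "PetalWidthCm"]
result = min_max_scale(data_with_scale, numerical_columns)
print(result["SepalLengthCm"])  # [0.25, 0.0, 0.0, 0.75, 1.0]
```

Columns not listed in `names` (here `Species`) are returned unchanged, which mirrors the README's pattern of passing an explicit list of numerical columns.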

images/scale_output.png

14 KB