To read the given data and perform Feature Encoding and Transformation process. ( 19AI403 - Introduction to Data Science )

Harsayazheni/Introduction-to-data-science-3

 
 

EXNO-3-DS

AIM:

To read the given data and perform Feature Encoding and Transformation process and save the data to a file.

ALGORITHM:

STEP 1: Read the given data.

STEP 2: Clean the data set using the data cleaning process.

STEP 3: Apply feature encoding to the features in the data set.

STEP 4: Apply feature transformation to the features in the data set.

STEP 5: Save the data to a file.
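The five steps can be sketched end to end on a small in-memory data set. This is only an illustration under stated assumptions: the column names (`temperature`, `income`), the toy values, and the output file name are hypothetical and are not taken from the CSV files used in this exercise.

```python
import io
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the CSV files used in this exercise.
raw_csv = io.StringIO(
    "id,temperature,income\n"
    "1,Hot,52000\n"
    "2,Cold,\n"
    "3,Warm,250000\n"
    "3,Warm,250000\n"
    "4,Cold,31000\n"
)

# STEP 1: read the data.
df = pd.read_csv(raw_csv)

# STEP 2: clean it (drop exact duplicates and rows with missing values).
df = df.drop_duplicates().dropna().reset_index(drop=True)

# STEP 3: encode the ordered categorical feature with an explicit mapping.
order = {"Cold": 0, "Warm": 1, "Hot": 2}
df["temperature_encoded"] = df["temperature"].map(order)

# STEP 4: transform the right-skewed numeric feature.
df["income_log"] = np.log1p(df["income"])

# STEP 5: save the result to a file (a temporary directory is used here).
out_path = os.path.join(tempfile.mkdtemp(), "encoded_transformed.csv")
df.to_csv(out_path, index=False)
print(df)
```

Passing `index=False` to `to_csv` keeps the row index out of the saved file, so reading it back gives the same columns.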

FEATURE ENCODING:

  1. Ordinal Encoding: An ordinal encoding maps each unique label to an integer value. This type of encoding is only appropriate when there is a known order among the categories. Such an order does exist for some of the variables in our dataset, and ideally it should be used when preparing the data.
  2. Label Encoding: Label encoding is a simple and straightforward approach that converts each value in a categorical column into an integer. Each distinct value in a categorical column is called a label. The assigned integers do not reflect any meaningful order of the categories.
  3. Binary Encoding: Binary encoding first assigns each category an integer and then writes that integer in binary digits. Each binary digit becomes one feature column. If there are n unique categories, binary encoding needs only about ⌈log₂ n⌉ feature columns, compared with n columns for one hot encoding.
  4. One Hot Encoding: We use this encoding technique when the features are nominal (they do not have any order). In one hot encoding, we create a new variable for each level of a categorical feature. Each category is mapped to a binary variable containing either 0 or 1, where 0 represents the absence and 1 represents the presence of that category.
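The four encodings above can be compared side by side on a toy column. The sketch below uses only pandas so that each step is visible. The column name `temp` and its category names are hypothetical, and the binary encoding is written by hand for illustration; it is not the implementation used by the `category_encoders` package.

```python
import math

import pandas as pd

# Hypothetical toy column; the category names are illustrative only.
df = pd.DataFrame({"temp": ["Hot", "Cold", "Warm", "Freezing", "Boiling", "Cold"]})

# Ordinal encoding: the integers follow a known order of the categories.
order = ["Freezing", "Cold", "Warm", "Hot", "Boiling"]
df["temp_ordinal"] = df["temp"].map({cat: i for i, cat in enumerate(order)})

# Label encoding: integers are assigned in sorted (alphabetical) order of the
# labels, so they carry no meaning about temperature.
df["temp_label"] = df["temp"].astype("category").cat.codes

# Binary encoding: write each integer label in binary, one column per digit.
n_categories = df["temp"].nunique()
n_bits = max(1, math.ceil(math.log2(n_categories)))
for bit in range(n_bits):
    shift = n_bits - 1 - bit  # most significant digit first
    df[f"temp_bin_{bit}"] = (df["temp_label"] // 2**shift) % 2

# One hot encoding: one 0/1 column per category.
one_hot = pd.get_dummies(df["temp"], prefix="temp", dtype=int)

print(df)
print(one_hot)
```

With five categories, the hand-written binary encoding needs three columns while one hot encoding needs five, which is the saving the list above describes.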

Methods Used for Data Transformation:

1. FUNCTION TRANSFORMATION

  • Log Transformation
  • Reciprocal Transformation
  • Square Root Transformation
  • Square Transformation

2. POWER TRANSFORMATION

  • Box-Cox method
  • Yeo-Johnson method
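To see what these methods do to skewness, the following sketch applies each of them to a synthetic right-skewed sample and prints the resulting skewness. The lognormal sample is an assumption made for illustration; it is not the `Data_to_Transform.csv` data used later. Note that the log, reciprocal, and Box-Cox transformations require strictly positive values, while Yeo-Johnson also accepts zero and negative values.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical right-skewed, strictly positive sample.
rng = np.random.default_rng(0)
x = pd.Series(rng.lognormal(mean=0.0, sigma=1.0, size=1000), name="x")

# Function transformations.
results = {
    "original": x,
    "log": np.log(x),
    "reciprocal": 1.0 / x,
    "sqrt": np.sqrt(x),
    "square": np.square(x),
}

# Power transformations: scipy estimates the lambda parameter from the data.
boxcox_values, boxcox_lambda = stats.boxcox(x)
yeojohnson_values, yeojohnson_lambda = stats.yeojohnson(x)
results["boxcox"] = pd.Series(boxcox_values)
results["yeojohnson"] = pd.Series(yeojohnson_values)

# Skewness close to 0 indicates a roughly symmetric distribution.
skewness = pd.Series({name: values.skew() for name, values in results.items()})
print(skewness.round(3))
```

On right-skewed data like this, the log and Box-Cox transformations should bring the skewness close to zero, whereas squaring typically makes the skew worse.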

CODING AND OUTPUT:

import pandas as pd
df=pd.read_csv("/content/Encoding Data.csv")
df

Screenshot 2024-05-04 084550

from sklearn.preprocessing import LabelEncoder,OrdinalEncoder
pm=['Hot','Warm','Cold']
e1=OrdinalEncoder(categories=[pm])
e1.fit_transform(df[["ord_2"]])

Screenshot 2024-05-04 084558

df['bo2']=e1.fit_transform(df[["ord_2"]])
df

Screenshot 2024-05-04 084609

le=LabelEncoder()
dfc=df.copy()
dfc['ord_2']=le.fit_transform(dfc['ord_2'])
dfc

Screenshot 2024-05-04 084627

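from sklearn.preprocessing import OneHotEncoder
# scikit-learn 1.2 renamed `sparse` to `sparse_output`; use sparse=False on older versions
ohe=OneHotEncoder(sparse_output=False)
df2=df.copy()
# One 0/1 column per category of nom_0
enc=pd.DataFrame(ohe.fit_transform(df2[['nom_0']]))
df2=pd.concat([df2,enc],axis=1)
df2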

Screenshot 2024-05-04 084635

pd.get_dummies(df2,columns=["nom_0"])

Screenshot 2024-05-04 084645

pip install --upgrade category_encoders

Screenshot 2024-05-04 084654

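from category_encoders import BinaryEncoder
df=pd.read_csv("/content/data.csv")
be=BinaryEncoder()
# Binary-encode the column; the result has one 0/1 column per binary digit
nd=be.fit_transform(df['Ord_2'])
# Append the encoded columns to the original data and display the combined frame
dfb=pd.concat([df,nd],axis=1)
dfb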

Screenshot 2024-05-04 084702

from category_encoders import TargetEncoder
te=TargetEncoder()
cc=df.copy()
new=te.fit_transform(X=cc["City"],y=cc["Target"])
cc=pd.concat([cc,new],axis=1)
cc

Screenshot 2024-05-04 084711
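Target encoding replaces each category with a statistic of the target computed over the rows in that category. A minimal sketch of the idea, using a plain per-category mean on hypothetical toy data, is shown below. The `TargetEncoder` from `category_encoders` also applies smoothing toward the overall target mean, so its output will generally differ from these unsmoothed values.

```python
import pandas as pd

# Hypothetical toy data; the column names mirror those used above.
cc = pd.DataFrame({
    "City": ["Delhi", "Chennai", "Delhi", "Mumbai", "Chennai", "Delhi"],
    "Target": [1, 0, 0, 1, 1, 1],
})

# Mean of the target within each city (no smoothing).
city_means = cc.groupby("City")["Target"].mean()

# Replace each city by the mean target of its group.
cc["City_target_mean"] = cc["City"].map(city_means)
print(cc)
```

Because the encoding is computed from the target itself, it is usually fitted on training data only to avoid leaking target information into evaluation data.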

import pandas as pd
from scipy import stats
import numpy as np
df=pd.read_csv("/content/Data_to_Transform.csv")
df

Screenshot 2024-05-04 084721

df.skew()

Screenshot 2024-05-04 084729

np.log(df["Highly Positive Skew"])

Screenshot 2024-05-04 084739

np.reciprocal(df["Moderate Positive Skew"])

Screenshot 2024-05-04 084747

np.sqrt(df["Highly Positive Skew"])

Screenshot 2024-05-04 084755

np.square(df["Highly Positive Skew"])

Screenshot 2024-05-04 084805

df["Highly Positive Skew_boxcox"],parameters=stats.boxcox(df["Highly Positive Skew"])

Screenshot 2024-05-04 084815

df["Moderate Negative Skew_yeojohnson"],parameters=stats.yeojohnson(df["Moderate Negative Skew"])
df.skew()

Screenshot 2024-05-04 084822

df["Highly Negative Skew_yeojohnson"],parameters=stats.yeojohnson(df["Highly Negative Skew"])
df.skew()

Screenshot 2024-05-04 084829

import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import scipy.stats as stats

sm.qqplot(df["Moderate Negative Skew"],line='45')

plt.show()

Screenshot 2024-05-04 084838


sm.qqplot(np.reciprocal(df["Moderate Negative Skew"]),line='45')
plt.show()

Screenshot 2024-05-04 084858

from sklearn.preprocessing import QuantileTransformer
qt=QuantileTransformer(output_distribution='normal',n_quantiles=891)

df["Moderate Negative Skew"]=qt.fit_transform(df[["Moderate Negative Skew"]])

sm.qqplot(df["Moderate Negative Skew"],line='45')
plt.show()

Screenshot 2024-05-04 084905
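The idea behind a quantile transformation to a normal distribution can be sketched by hand: rank the values, convert the ranks to probabilities strictly between 0 and 1, and map those probabilities through the inverse normal CDF. The code below is a rank-based approximation on a hypothetical left-skewed sample. It is not scikit-learn's implementation, which interpolates between `n_quantiles` reference points.

```python
import numpy as np
from scipy import stats

# Hypothetical left-skewed sample standing in for "Moderate Negative Skew".
rng = np.random.default_rng(0)
x = 10.0 - rng.gamma(shape=2.0, scale=1.0, size=500)

ranks = stats.rankdata(x)          # ranks 1..n (average rank for ties)
uniform = (ranks - 0.5) / len(x)   # probabilities strictly inside (0, 1)
z = stats.norm.ppf(uniform)        # matching standard normal quantiles

print("skew before:", stats.skew(x))
print("skew after: ", stats.skew(z))
```

Because only the ranks are used, the transformed values are approximately standard normal whatever the shape of the original distribution, which is why the Q-Q plot after the transformation follows the 45-degree line closely.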

RESULT:

Thus, feature encoding and feature transformation were performed successfully on the given data.
