Predictions-From-Descriptions


Predicts the primary category of a product from its description.
This is a multi-class classification problem.

Cleaning it up

First, the primary product category is extracted from the product category tree.
Extra characters such as '.' are then removed, along with any numbers.
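
The cleaning step above could look roughly like the sketch below. The function name `primary_category`, the `>>` separator, and the choice to keep only letters and spaces are assumptions made for illustration; the actual format of the category tree is not shown in this README.

```python
import re


def primary_category(category_tree: str, separator: str = ">>") -> str:
    """Return the cleaned top-level entry of a category tree string.

    Assumption: the tree is one string whose levels are joined by
    `separator`, with the primary category first.
    """
    # Take the first level of the tree.
    first_level = category_tree.split(separator)[0]
    # Drop everything that is not a letter or whitespace
    # (this removes digits and characters such as '.', '[' or '"').
    letters_only = re.sub(r"[^A-Za-z\s]", "", first_level)
    # Collapse repeated whitespace and trim the ends.
    return " ".join(letters_only.split())


if __name__ == "__main__":
    print(primary_category('["Home Furnishing >> Curtains >> Door Curtains"]'))
```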

Throwing problematic data out

According to the plot, the data is highly imbalanced.
Only the top 5 categories are kept for prediction.
The remaining categories are dropped for this analysis.
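
One way to sketch this filtering step, assuming "top 5" means the five categories with the most rows and that the data is held as `(description, category)` pairs. Both the helper name `keep_top_categories` and that data layout are hypothetical.

```python
from collections import Counter


def keep_top_categories(rows, top_n=5):
    """Keep only the rows whose category is among the `top_n` most frequent.

    Assumption: `rows` is a list of (description, category) tuples.
    """
    # Count how many rows each category has.
    counts = Counter(category for _, category in rows)
    # Names of the most frequent categories.
    top = {category for category, _ in counts.most_common(top_n)}
    # Preserve the original row order while filtering.
    return [(text, category) for text, category in rows if category in top]


if __name__ == "__main__":
    sample = [("a", "Clothing"), ("b", "Clothing"), ("c", "Toys")]
    print(keep_top_categories(sample, top_n=1))
```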

Giving sentences meaning

A sentence cannot be used directly for classification, so it has to be tokenized first.
Each sentence (here, a description) is converted to an integer matrix of tokens.
This is done for the training and testing descriptions.
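
The following is a minimal bag-of-words sketch of that conversion, assuming the integers are per-word counts and that the vocabulary is built from the training descriptions only. The function names are illustrative; scikit-learn's `CountVectorizer` is a library implementation of the same idea.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def fit_vocabulary(train_texts):
    """Map every token seen in the training texts to a column index."""
    tokens = sorted({tok for text in train_texts for tok in tokenize(text)})
    return {tok: index for index, tok in enumerate(tokens)}


def transform(texts, vocabulary):
    """Turn each text into a row of token counts (one column per token)."""
    matrix = []
    for text in texts:
        row = [0] * len(vocabulary)
        for tok, count in Counter(tokenize(text)).items():
            # Tokens unseen during training have no column and are skipped.
            if tok in vocabulary:
                row[vocabulary[tok]] = count
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    vocab = fit_vocabulary(["red cotton shirt", "cotton curtain"])
    print(transform(["cotton cotton shirt", "blue curtain"], vocab))
```

Reusing the training vocabulary for the test descriptions keeps the columns aligned between the two matrices.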

Making the machine learn it

The dataset is split. 80% of the dataset is used for training and 20% is used for testing.
Multinomial Naive Bayes and Linear Support Vector Machines are used.
They reach accuracies of 99.27% and 99.84%, respectively.
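
To illustrate the split and the first of the two models, here is a small standard-library sketch of an 80/20 split and a multinomial Naive Bayes classifier with Laplace smoothing. It is not the code behind the accuracies quoted above, it does not cover the linear SVM, and every name in it (`train_test_split`, `fit_naive_bayes`, `predict`, `accuracy`, the `alpha` parameter) is an assumption for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of `rows` and return (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_naive_bayes(rows):
    """Collect the counts needed by multinomial Naive Bayes.

    Assumption: `rows` is a list of (tokens, label) pairs.
    """
    class_counts = Counter()
    token_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in rows:
        class_counts[label] += 1
        token_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, token_counts, vocabulary


def predict(model, tokens, alpha=1.0):
    """Return the label with the highest log-posterior for `tokens`."""
    class_counts, token_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # Log prior of the class.
        score = math.log(n_docs / total_docs)
        # Laplace-smoothed denominator shared by all tokens of this class.
        denom = sum(token_counts[label].values()) + alpha * len(vocabulary)
        for tok in tokens:
            if tok in vocabulary:  # ignore tokens never seen in training
                score += math.log((token_counts[label][tok] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(model, t) == y for t, y in rows) / len(rows)


if __name__ == "__main__":
    data = [(["cotton", "shirt"], "Clothing"), (["gold", "ring"], "Jewellery")] * 5
    train, test = train_test_split(data)
    print(accuracy(fit_naive_bayes(train), test))
```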

Greedy for more accuracy?

Use LSTMs and GRUs.
Make use of more features.

Clone this repository

$ git clone "https://github.com/Yukti-09/Predictions-From-Descriptions.git"