Predictions-From-Descriptions


Prediction of the primary product category from its description.
A multi-class classification problem.

Cleaning it up

First, the primary product category is extracted from the product category tree.
Extra characters such as '.' are then removed from the text, along with numbers.
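The two cleaning steps above can be sketched roughly as follows. This is only an illustration: the `>>` separator, the list-style wrapping of the tree string, the sample strings, and the helper names `primary_category` and `clean_text` are all assumptions, not taken from the repository's code.

```python
import re


def primary_category(category_tree: str, separator: str = ">>") -> str:
    """Return the first (top-level) node of a category tree string."""
    # Assumed format: '["A >> B >> C"]'. Strip the wrapping brackets and quotes.
    unwrapped = category_tree.strip().strip("[]").strip("\"'")
    return unwrapped.split(separator)[0].strip()


def clean_text(text: str) -> str:
    """Drop digits and punctuation such as '.', then collapse whitespace."""
    letters_only = re.sub(r"[^A-Za-z\s]", " ", text)
    return re.sub(r"\s+", " ", letters_only).strip()


# Hypothetical sample values, for illustration only.
tree = '["Clothing >> Women\'s Clothing >> Shorts"]'
print(primary_category(tree))
print(clean_text("Pack of 3 cotton shorts. Size: 32."))
```

Replacing unwanted characters with a space (rather than deleting them) keeps words on either side of a '.' from being glued together.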

Throwing problematic data out

According to the plot, the data is highly imbalanced.
Only the top 5 categories are kept for prediction.
All other categories are dropped for this analysis.
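One way to keep only the most frequent categories is sketched below, using just the standard library. The `(description, category)` row layout, the helper name `keep_top_categories`, and the sample rows are hypothetical; the repository may do this differently (for example with pandas).

```python
from collections import Counter


def keep_top_categories(rows, top_n=5):
    """Keep only rows whose category is among the top_n most frequent."""
    counts = Counter(category for _, category in rows)
    top = {category for category, _ in counts.most_common(top_n)}
    return [(text, category) for text, category in rows if category in top]


# Hypothetical (description, category) rows; top_n=2 keeps the example small.
rows = [
    ("cotton shirt", "Clothing"),
    ("denim jeans", "Clothing"),
    ("gold ring", "Jewellery"),
    ("silver chain", "Jewellery"),
    ("toy car", "Toys"),
]
filtered = keep_top_categories(rows, top_n=2)
print(len(filtered))
```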

Giving sentences meaning

A sentence cannot be used directly for classification, so it must first be tokenized.
Each sentence (here, a description) is converted to an integer matrix of tokens.
This is done for both the training and the testing descriptions.
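A minimal sketch of this step, assuming scikit-learn's `CountVectorizer` is the tokenizer (the README does not name the tool, so this is a guess). The sample descriptions are made up. The vocabulary is learned from the training descriptions only and then reused for the testing descriptions, so both matrices share the same columns.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical descriptions, for illustration only.
train_descriptions = ["blue cotton shirt", "gold plated ring", "cotton shirt for men"]
test_descriptions = ["silver ring", "red cotton shirt"]

vectorizer = CountVectorizer()
# Learn the vocabulary from the training text and build its count matrix.
X_train = vectorizer.fit_transform(train_descriptions)
# Reuse the same vocabulary for the test text; unseen words are ignored.
X_test = vectorizer.transform(test_descriptions)

print(X_train.shape, X_test.shape)
```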

Making the machine learn it

The dataset is split: 80% is used for training and 20% for testing.
Multinomial Naive Bayes and Linear Support Vector Machines are used.
They reach accuracies of 99.27% and 99.84%, respectively.
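The split and the two models could look roughly like this with scikit-learn. The toy data, `random_state`, and `stratify` settings are assumptions added for the sketch, and the accuracies it prints come from the toy data, not from the figures reported above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Hypothetical toy data, repeated to get enough samples for an 80/20 split.
clothing = ["cotton shirt for men", "blue denim jeans", "printed cotton kurta",
            "women cotton top", "slim fit denim shirt"]
jewellery = ["gold plated ring", "silver necklace set", "gold earrings for women",
             "silver plated bangle", "diamond ring gold"]
descriptions = clothing * 4 + jewellery * 4
labels = ["Clothing"] * 20 + ["Jewellery"] * 20

# 80% training, 20% testing; stratify and random_state are assumptions.
train_texts, test_texts, y_train, y_test = train_test_split(
    descriptions, labels, test_size=0.2, stratify=labels, random_state=0
)

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

results = {}
for name, model in [("MultinomialNB", MultinomialNB()), ("LinearSVC", LinearSVC())]:
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))

for name, accuracy in results.items():
    print(f"{name}: {accuracy:.4f}")
```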

Greedy for more accuracy?

Use LSTMs and GRUs.
Make use of more features.

Clone this repository

$ git clone "https://github.com/Yukti-09/Predictions-From-Descriptions.git"