Predictions-From-Descriptions

Predicts the primary category of a product from its description.
This is a multi-class classification problem.

Cleaning it up

First, the primary product category is extracted from the product category tree.
Next, extra characters such as '.' are stripped from the text, and numbers are removed as well.
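A minimal sketch of the two cleaning steps. It assumes a hypothetical category-tree format such as '["Clothing >> Women's Clothing >> Tops"]' (levels separated by ">>"), where the first level is the primary category. The helper names `primary_category` and `clean_text` are illustrative, and lower-casing and dropping every non-letter (not just '.') are assumptions, not necessarily what this repository does.

```python
import re


def primary_category(category_tree: str, sep: str = ">>") -> str:
    """Return the top-level node of a category-tree string.

    Assumes a hypothetical format like '["Clothing >> Women's Clothing >> Tops"]':
    surrounding brackets and quotes are stripped, then the string is split on `sep`.
    """
    stripped = category_tree.strip().strip("[]").strip().strip("\"'")
    return stripped.split(sep)[0].strip()


def clean_text(text: str) -> str:
    """Lower-case the text, remove numbers and punctuation such as '.', collapse spaces."""
    text = text.lower()                        # assumption: case is not informative
    text = re.sub(r"\d+", " ", text)           # remove numbers
    text = re.sub(r"[^a-z\s]", " ", text)      # remove '.', ',' and other non-letters (ASCII only)
    return re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace


print(primary_category('["Clothing >> Women\'s Clothing >> Tops"]'))
print(clean_text("Soft Cotton Shirt. Pack of 2."))
```

With these inputs the primary category comes out as "Clothing" and the description becomes "soft cotton shirt pack of".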

Throwing problematic data out

According to the plot, the data is highly imbalanced across categories.
Only the 5 most frequent categories are kept for prediction.
The remaining categories are dropped for this analysis.
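Keeping only the top 5 categories can be sketched with `collections.Counter`. This assumes "top" means most frequent; `keep_top_k` is a hypothetical helper working on parallel lists of descriptions and labels.

```python
from collections import Counter


def keep_top_k(descriptions, labels, k=5):
    """Keep only the rows whose label is among the k most frequent labels."""
    top = {label for label, _ in Counter(labels).most_common(k)}
    kept = [(d, y) for d, y in zip(descriptions, labels) if y in top]
    return [d for d, _ in kept], [y for _, y in kept]


# Toy, made-up labels: "F" is the rarest of six categories and gets dropped.
labels = ["A"] * 6 + ["B"] * 5 + ["C"] * 4 + ["D"] * 3 + ["E"] * 2 + ["F"]
descriptions = [f"description {i}" for i in range(len(labels))]
kept_desc, kept_labels = keep_top_k(descriptions, labels, k=5)
print(len(kept_labels), sorted(set(kept_labels)))
```

When counts tie at the cut-off, `most_common` keeps whichever label was seen first, so a real pipeline may want an explicit tie-breaking rule.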

Giving sentences meaning

A sentence cannot be fed to a classifier directly, so I need to tokenize it first.
Each sentence (here, a product description) is converted into an integer matrix of tokens.
This is done for both the training and the testing descriptions.
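One common way to get an integer matrix from text is a bag-of-words count matrix, similar in spirit to scikit-learn's `CountVectorizer`. The sketch below assumes that representation (rather than, say, padded sequences of token indices) and assumes the text has already been cleaned, so splitting on whitespace is enough. `build_vocab` and `to_count_matrix` are hypothetical names. The vocabulary is built from the training descriptions only and then reused for the testing descriptions.

```python
def build_vocab(texts):
    """Map each distinct whitespace-separated token in the training texts to a column index."""
    vocab = {}
    for text in texts:
        for token in text.split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def to_count_matrix(texts, vocab):
    """Turn each text into a row of token counts; tokens not in vocab are ignored."""
    matrix = []
    for text in texts:
        row = [0] * len(vocab)
        for token in text.split():
            idx = vocab.get(token)
            if idx is not None:
                row[idx] += 1
        matrix.append(row)
    return matrix


train_texts = ["soft cotton shirt", "cotton shorts"]
vocab = build_vocab(train_texts)          # fit on training descriptions only
X_train = to_count_matrix(train_texts, vocab)
X_test = to_count_matrix(["cotton cotton socks"], vocab)
print(vocab)
print(X_test)
```

Here the vocabulary is {"soft": 0, "cotton": 1, "shirt": 2, "shorts": 3} and the test row is [0, 2, 0, 0]: "socks" never appeared in training, so it is dropped.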

Making the machine learn it

The dataset is split: 80% is used for training and 20% for testing.
Two classifiers are used: Multinomial Naive Bayes and a Linear Support Vector Machine.
They obtain accuracies of 99.27% and 99.84%, respectively.
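The README does not show the training code, so here is a from-scratch sketch of the 80/20 split and of Multinomial Naive Bayes with add-alpha (Laplace) smoothing over count vectors. It is illustrative only. `split_dataset`, `TinyMultinomialNB` and `accuracy` are hypothetical names, the toy counts are made up, and the linear SVM is not sketched. In practice, scikit-learn's `train_test_split`, `MultinomialNB` and `LinearSVC` cover these steps.

```python
import math
import random
from collections import Counter


def split_dataset(rows, labels, test_fraction=0.2, seed=0):
    """Shuffle row indices and hold out `test_fraction` of them (20% by default) for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([rows[i] for i in train_idx], [rows[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])


class TinyMultinomialNB:
    """Multinomial Naive Bayes with add-alpha smoothing over token-count vectors."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        n_features = len(X[0])
        class_counts = Counter(y)
        # Total count of each token within each class.
        feature_counts = {c: [0] * n_features for c in class_counts}
        for row, label in zip(X, y):
            for j, count in enumerate(row):
                feature_counts[label][j] += count
        self.log_prior = {c: math.log(n / len(y)) for c, n in class_counts.items()}
        self.log_likelihood = {}
        for c, counts in feature_counts.items():
            denom = sum(counts) + self.alpha * n_features
            self.log_likelihood[c] = [math.log((k + self.alpha) / denom) for k in counts]
        return self

    def predict(self, X):
        preds = []
        for row in X:
            # log P(c) + sum_j count_j * log P(token_j | c), then take the best class.
            scores = {
                c: self.log_prior[c] + sum(n * w for n, w in zip(row, self.log_likelihood[c]))
                for c in self.log_prior
            }
            preds.append(max(scores, key=scores.get))
        return preds


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy count vectors: columns 0 and 3 are "clothing" words, 1 and 2 are "footwear" words.
X = [[2, 0, 0, 1], [1, 0, 0, 2], [3, 0, 0, 0], [0, 2, 1, 0], [0, 1, 2, 0], [0, 3, 0, 0]]
y = ["Clothing"] * 3 + ["Footwear"] * 3
model = TinyMultinomialNB().fit(X, y)
print(model.predict([[1, 0, 0, 1], [0, 1, 1, 0]]))
```

On this toy data the two query rows are classified as "Clothing" and "Footwear". Working in log space avoids underflow when many small probabilities are multiplied over a long description.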

Greedy for more accuracy?

Use LSTMs and GRUs.
Make use of more features.

Clone this repository

$ git clone "https://github.com/Yukti-09/Predictions-From-Descriptions.git"
