Language Classification System

This Jupyter notebook implements a system that identifies the language of a piece of text. It covers data preprocessing, feature engineering, correlation reduction, and machine learning model training.

Modules and Packages Used

The following modules and packages were used in this notebook:

Matplotlib
pickle
scikit-learn
NumPy
pandas
math
collections
itertools
seaborn

Dataset

The dataset is stored in a CSV file named 'language.csv'. It was preprocessed by removing rows with missing data and converting the 'text' and 'language' columns to strings.
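
The preprocessing described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper name `load_dataset` and the inline sample rows are assumptions, and the sample stands in for 'language.csv', whose contents are not shown here.

```python
import io

import pandas as pd

# Hypothetical stand-in for 'language.csv'; the real rows are not shown in this README.
SAMPLE_CSV = """text,language
Hello world,English
Bonjour le monde,French
,Spanish
Hola mundo,
Guten Tag,German
"""


def load_dataset(source):
    """Read the CSV, drop rows with a missing value, and cast both columns to str."""
    df = pd.read_csv(source)
    # Rows with an empty 'text' or 'language' field are read as NaN and dropped.
    df = df.dropna(subset=["text", "language"]).reset_index(drop=True)
    df["text"] = df["text"].astype(str)
    df["language"] = df["language"].astype(str)
    return df


df = load_dataset(io.StringIO(SAMPLE_CSV))
print(df)
```

With the real file, the call would be `load_dataset("language.csv")`.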

Creating Set of Features

A set of numeric features was derived from the text data. It includes word count, character count, word density, punctuation count, vowel and consonant character counts, exclamation and question mark counts, unique word count, repeated word count, and more.
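
As an illustration of how such features might be computed, here is a small sketch using only the standard library. The function name `extract_features`, the feature keys, and the exact definitions (for example, word density as words per character, and vowels limited to `aeiou`) are assumptions; the notebook may define them differently.

```python
import string
from collections import Counter

VOWELS = set("aeiou")  # assumption: accented vowels are not treated as vowels here


def extract_features(text):
    """Return a dict of simple count-based features for one text sample."""
    words = text.split()
    letters = [c.lower() for c in text if c.isalpha()]
    # Normalize words (lowercase, strip surrounding punctuation) before counting repeats.
    word_counts = Counter(w.lower().strip(string.punctuation) for w in words)
    word_counts.pop("", None)  # discard tokens that were pure punctuation
    vowel_count = sum(1 for c in letters if c in VOWELS)
    return {
        "word_count": len(words),
        "char_count": len(text),
        # Assumed definition: words per character (+1 avoids division by zero).
        "word_density": len(words) / (len(text) + 1),
        "punctuation_count": sum(1 for c in text if c in string.punctuation),
        "vowel_count": vowel_count,
        # Every alphabetic character outside VOWELS is counted as a consonant.
        "consonant_count": len(letters) - vowel_count,
        "exclamation_count": text.count("!"),
        "question_count": text.count("?"),
        "unique_word_count": len(word_counts),
        "repeat_word_count": sum(1 for n in word_counts.values() if n > 1),
    }


features = extract_features("Hello hello world! Why?")
print(features)
```

Applied to a pandas column, something like `df["text"].apply(extract_features).apply(pd.Series)` would produce one feature column per key.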

Correlation Reduction

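Many of the features are strongly correlated with one another (for example, word count and character count). Principal Component Analysis (PCA) was applied to transform them into a set of uncorrelated components.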
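
The effect of PCA on correlated features can be seen in a small sketch like the one below. The synthetic feature matrix, the standardization step, and the choice of two components are all assumptions made for illustration; the notebook's actual settings are not stated here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in features: char_count is built to track word_count closely.
rng = np.random.default_rng(0)
word_count = rng.integers(5, 50, size=200).astype(float)
char_count = word_count * 5 + rng.normal(0, 2, size=200)
punct_count = rng.integers(0, 10, size=200).astype(float)
X = np.column_stack([word_count, char_count, punct_count])

# Standardize first so no feature dominates only because of its scale.
X_scaled = StandardScaler().fit_transform(X)

# Assumed setting: keep two components.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

corr_before = np.corrcoef(X, rowvar=False)
corr_after = np.corrcoef(X_pca, rowvar=False)
print("correlation(word_count, char_count):", corr_before[0, 1])
print("correlation(component 1, component 2):", corr_after[0, 1])
print("explained variance ratio:", pca.explained_variance_ratio_)
```

The principal components are uncorrelated by construction, so the off-diagonal entry of `corr_after` is zero up to floating-point error.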

Machine Learning Model

A Decision Tree Classifier was trained on the dataset and used to predict the language of text data. The trained model was saved using pickle. The model is evaluated with an accuracy score, and its predictions are displayed in a confusion matrix.
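
A rough sketch of the train, evaluate, and save steps is shown below. The data is synthetic (generated with `make_classification`), and the split ratio and `random_state` values are assumptions; only the use of a decision tree, pickle, an accuracy score, and a confusion matrix comes from this README.

```python
import pickle

import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for the engineered text features.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Accuracy is a single number; the confusion matrix breaks results down per class.
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print("accuracy:", accuracy)
print(cm)

# Serialize the trained model; pickle.dump(clf, file) would write it to disk instead.
model_bytes = pickle.dumps(clf)
restored = pickle.loads(model_bytes)
```

The accuracy equals the sum of the confusion matrix diagonal divided by the total number of test samples. The matrix could be plotted with, for example, `seaborn.heatmap(cm, annot=True)`.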

Usage

To use this system, you can run the code in the Jupyter notebook and provide your own text data to predict its language.
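
Predicting the language of new text might look like the sketch below. Everything here is hypothetical: `featurize`, `predict_language`, the three toy features, and the four training sentences are placeholders, and a model trained on so little data is not expected to be accurate. The pickled bytes stand in for the saved model file.

```python
import pickle

from sklearn.tree import DecisionTreeClassifier


def featurize(text):
    """Hypothetical feature extractor: length, word count, vowel ratio."""
    letters = [c.lower() for c in text if c.isalpha()]
    vowels = sum(1 for c in letters if c in "aeiou")
    return [len(text), len(text.split()), vowels / max(len(letters), 1)]


# Toy training data, for illustration only.
train_texts = [
    "The cat sat on the mat",
    "Where is the train station",
    "El gato duerme en la casa",
    "Donde esta la estacion de tren",
]
train_labels = ["English", "English", "Spanish", "Spanish"]

model = DecisionTreeClassifier(random_state=0)
model.fit([featurize(t) for t in train_texts], train_labels)
saved_model = pickle.dumps(model)  # stands in for the pickled model file


def predict_language(text, model_bytes=saved_model):
    """Load the pickled model and predict the language label for one text."""
    loaded = pickle.loads(model_bytes)
    return loaded.predict([featurize(text)])[0]


print(predict_language("La casa es grande"))
```

New text must pass through the same feature extraction (and any PCA transform) that was applied to the training data before it reaches the model.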

Link

The Google Colab notebook is available at https://colab.research.google.com/drive/1M_zRJwISxTOL4SU2Yo9V4ZcQHzYqpozU

