FastText Application

📌 Project Description

This project is an application for processing data and training a FastText model. It was built as part of a university academic project and is no longer under development.

The application provides a complete set of tools for data preparation, cleaning, tokenization, lemmatization, and training a FastText model. The process also includes splitting the dataset into training and test sets, as well as exporting the trained model.

⚡ Features

📂 1. Data Loading

- Support for CSV and JSON files.
- Automatic data type conversion.
- Basic dataset statistics.
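
As an illustration of this step, the sketch below reads a CSV file or a JSON array of objects into a list of row dictionaries using only the Python standard library, converts numeric-looking strings, and reports simple statistics. The names `coerce`, `load_records`, and `basic_stats` are hypothetical and not taken from the application's source, which may well rely on a different library.

```python
import csv
import json
from pathlib import Path


def coerce(value):
    """Convert numeric-looking strings to int or float; leave other values unchanged."""
    if not isinstance(value, str):
        return value
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def load_records(path):
    """Load a CSV file or a JSON array of objects into a list of dicts."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as handle:
            rows = list(csv.DictReader(handle))
    elif suffix == ".json":
        with path.open(encoding="utf-8") as handle:
            rows = json.load(handle)
    else:
        raise ValueError(f"Unsupported file type: {suffix}")
    return [{key: coerce(value) for key, value in row.items()} for row in rows]


def basic_stats(records):
    """Return the row count and the sorted column names."""
    columns = sorted({key for row in records for key in row})
    return {"rows": len(records), "columns": columns}
```

Dispatching on the file extension keeps the sketch short; a real loader would also need to handle malformed files and inconsistent columns.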

🔍 2. Data Exploration and Cleaning

- Display of missing value information.
- Filling missing values using:
  - Forward fill,
  - Backward fill,
  - Manual input.
- Removal of duplicates and unnecessary columns.
- Text data cleaning:
  - Case normalization,
  - Removal of excessive spaces,
  - Removal of special characters and numbers.
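
The fill strategies and text cleaning listed above can be sketched in plain Python as follows. This is a minimal illustration, not the application's code: the function names are hypothetical, missing values are assumed to be represented as `None`, and the cleaning regex assumes ASCII-only text.

```python
import re


def forward_fill(values):
    """Replace each None with the most recent non-missing value before it."""
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def backward_fill(values):
    """Replace each None with the next non-missing value after it."""
    return forward_fill(values[::-1])[::-1]


def clean_text(text):
    """Lowercase, drop digits and special characters, collapse repeated spaces."""
    text = text.lower()
    # Assumption: only ASCII letters are kept; other alphabets would need a wider pattern.
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()
```

For example, `forward_fill([None, 1, None, 2, None])` gives `[None, 1, 1, 2, 2]`, and `clean_text("  Hello,   World 123! ")` gives `"hello world"`. A leading gap stays missing under forward fill, and a trailing gap stays missing under backward fill.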

📝 3. Text Processing

- Text tokenization.
- Stop-word removal.
- Text lemmatization.
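
A minimal sketch of tokenization and stop-word removal is shown below. The tokenizer pattern and the tiny stop-word set are assumptions made for illustration only. Lemmatization is left out because it normally depends on an external NLP library, and this README does not say which one the application uses.

```python
import re

# Assumption: a tiny illustrative stop-word list; a real list would be much longer.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}


def tokenize(text):
    """Split text into lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop-word set."""
    return [token for token in tokens if token not in stop_words]
```

With these definitions, `remove_stop_words(tokenize("The model is trained on the data"))` returns `["model", "trained", "on", "data"]`.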

🎯 4. Preparing Data for FastText

- Adding the `__label__` prefix to labels.
- Converting tokens into text format.
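
fastText's supervised mode expects one example per line, with the label marked by the `__label__` prefix and followed by the text. The sketch below builds such a line from a label and a token list. The helper name `to_fasttext_line` is hypothetical, and replacing spaces inside labels with underscores is an assumption made here so that each label remains a single word.

```python
def to_fasttext_line(label, tokens, prefix="__label__"):
    """Build one training line in fastText's supervised input format."""
    # Assumption: spaces inside a label are replaced so the label stays one token.
    label = str(label).strip().replace(" ", "_")
    text = " ".join(tokens)
    return f"{prefix}{label} {text}"
```

For example, `to_fasttext_line("sports news", ["great", "match"])` produces `__label__sports_news great match`.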

🔀 5. Data Splitting

- Splitting data into training and test sets with configurable proportions.
- Preview of the resulting data split.
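
One simple way to implement a configurable split is to shuffle the prepared lines with a fixed seed and cut the list at the requested proportion. The sketch below does this with the standard library; the function name, the default ratio, and the seed are illustrative assumptions rather than the application's actual settings.

```python
import random


def train_test_split(lines, test_ratio=0.2, seed=42):
    """Shuffle the lines reproducibly and return (train, test) lists."""
    if not 0 < test_ratio < 1:
        raise ValueError("test_ratio must be between 0 and 1")
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]
```

Using a dedicated `random.Random(seed)` instance keeps the split reproducible without changing the global random state.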

🚀 6. FastText Model Training

- Customizable training parameters:
  - Number of epochs,
  - Learning rate,
  - N-grams,
  - Embedding dimension,
  - Loss function.
- Training the model on user data.
- Display of training process information.
- Exporting the trained model.
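
The parameters listed above correspond to keyword arguments of `fasttext.train_supervised` in the official Python bindings (`epoch`, `lr`, `wordNgrams`, `dim`, `loss`). The sketch below groups them in a small config object and exports the result with `save_model`. The `TrainingConfig` class and `train_and_export` function are hypothetical names, the defaults shown are the library defaults, and whether the application wires its settings this way is an assumption.

```python
from dataclasses import asdict, dataclass


@dataclass
class TrainingConfig:
    """Training options named after fasttext.train_supervised keyword arguments."""
    epoch: int = 5          # number of epochs
    lr: float = 0.1         # learning rate
    wordNgrams: int = 1     # maximum length of word n-grams
    dim: int = 100          # embedding dimension
    loss: str = "softmax"   # fastText also accepts "ns", "hs", and "ova"


def train_and_export(train_path, model_path, config=None):
    """Train a supervised model on a prepared file and save it to model_path."""
    # Imported lazily so the config can be built without the fasttext package installed.
    import fasttext

    config = config or TrainingConfig()
    model = fasttext.train_supervised(input=train_path, **asdict(config))
    model.save_model(model_path)
    return model
```

A call such as `train_and_export("train.txt", "model.bin", TrainingConfig(epoch=25, wordNgrams=2))` would train on a hypothetical `train.txt` file and write `model.bin`.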

📊 7. Model Evaluation

- Testing the model on the test dataset.
- Calculating accuracy and classification performance.
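
In the fastText Python bindings, `model.test(path, k)` returns the number of examples together with precision and recall at `k`. The sketch below wraps that call and derives an F1 score from the two values. The helper name `summarize_test` is hypothetical, and the exact metrics the application reports are not specified in this README.

```python
def summarize_test(model, test_path, k=1):
    """Run model.test on a prepared file and return the metrics as a dict."""
    n_examples, precision, recall = model.test(test_path, k=k)
    total = precision + recall
    # F1 is the harmonic mean of precision and recall; defined as 0 when both are 0.
    f1 = 2 * precision * recall / total if total else 0.0
    return {"examples": n_examples, "precision": precision, "recall": recall, "f1": f1}
```

When every example carries exactly one label, precision at `k=1` equals the share of correctly classified examples, so it can be read as accuracy.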

🔮 8. Prediction

Ability to make predictions on new text data.
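
For a single input string, `model.predict(text, k=1)` in the fastText Python bindings returns a tuple of labels and a sequence of probabilities. The sketch below takes the top prediction and strips the `__label__` prefix for display. The helper name `predict_label` is hypothetical, and new text is assumed to need the same cleaning steps that were applied to the training data.

```python
def predict_label(model, text, prefix="__label__"):
    """Return the top predicted label (without its prefix) and its probability."""
    # fastText predicts one line at a time, so newlines are replaced with spaces.
    labels, probabilities = model.predict(text.replace("\n", " "), k=1)
    label = labels[0]
    if label.startswith(prefix):
        label = label[len(prefix):]
    return label, float(probabilities[0])
```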

🛠️ Requirements

To run the application, install the required dependencies listed in requirements.txt:

pip install -r requirements.txt

🚀 How to Run the Project?

Clone the repository:

git clone https://github.com/baarteek/fastTextProject
cd fastTextProject

Install dependencies:
```
pip install -r requirements.txt
```
Run the application:
```
python main.py
```

🖼️ Screenshots

The application includes the following views:

- Data Loading
- Data Exploration
- Data Cleaning
- Text Processing
- Label Preparation
- Data Splitting
- Model Configuration
- Model Training

Screenshots are available in the docs/ directory.
