AI Text Detection Web App

This project is a web application that predicts whether a piece of text was generated by AI or written by a human. Users enter text, choose one of two models (unigram or bigram), and receive a prediction. Both models use machine learning classifiers trained on TF-IDF features of the text.

What are Unigrams and Bigrams?

• Unigram: A unigram is a single word, and a unigram model analyzes text by considering each word individually. For example, in the sentence "AI is powerful," the unigrams are "AI," "is," and "powerful." Unigram models capture word frequency, but they don't account for word order or relationships between words.
• Bigram: A bigram is a pair of consecutive words, and a bigram model analyzes text by considering pairs of adjacent words. For the same sentence "AI is powerful," the bigrams are "AI is" and "is powerful." Bigram models capture more context than unigram models because they see which words follow each other, which can improve accuracy on tasks such as AI text detection.
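
The two definitions above can be illustrated with a few lines of Python. This is only a minimal sketch: the helper name extract_ngrams and the simple regex tokenizer are assumptions made for illustration, not code from this repository, and the TF-IDF vectorizers used by the app may tokenize differently (for example, by lowercasing).

```python
import re

def extract_ngrams(text, n):
    """Return the list of n-grams (as space-joined strings) found in text.

    Hypothetical helper for illustration only. Tokens are runs of word
    characters, so punctuation is dropped and case is preserved.
    """
    tokens = re.findall(r"\w+", text)
    # Slide a window of size n over the token list.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

unigrams = extract_ngrams("AI is powerful", 1)  # single words
bigrams = extract_ngrams("AI is powerful", 2)   # adjacent word pairs
```

Here unigrams holds the three single words and bigrams holds the two adjacent pairs from the example sentence.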

Unigram and Bigram Models

• Unigram Model: Effective when individual word frequencies are sufficient for classification. It is the simpler model and performs well on tasks where word context is less important. In this project, the unigram model is stored in the "uni model pkl" folder and uses both Logistic Regression and Naive Bayes models for AI text detection.
• Bigram Model: Captures relationships between adjacent words, providing more context. It is especially useful for more complex text, where word pairings can improve prediction accuracy. The bigram model is stored in the "bi model pkl" folder and combines LightGBM and Random Forest models for more nuanced AI text detection.

The original code for training these models can be found in my other repository: AI-Text-Detector-Model. In this repository, the models have been converted into Python scripts and serialized into pickle files for use in the backend of this application.
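
As a rough sketch of how the serialized files might be read back by a backend, the snippet below loads every pickle file found in a folder. The helper name load_pickles and the assumption that the files use the .pkl extension are hypothetical; the actual file names and loading code in app.py may differ. Only unpickle files from a source you trust, since unpickling can execute arbitrary code.

```python
import pickle
from pathlib import Path

def load_pickles(folder):
    """Load every .pkl file in folder into a dict keyed by file stem.

    Hypothetical helper: the real backend may load its files one by one
    under specific names instead.
    """
    loaded = {}
    for path in sorted(Path(folder).glob("*.pkl")):
        with path.open("rb") as fh:
            loaded[path.stem] = pickle.load(fh)
    return loaded

# Example call, assuming the folder name used in this repository:
# unigram_artifacts = load_pickles("uni model pkl")
```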

Model Description

• Unigram Model: Combines Logistic Regression and Naive Bayes models.
• Bigram Model: Combines LightGBM and Random Forest models.

The selected model takes the input text, transforms it with the matching TF-IDF vectorizer (unigram or bigram, depending on the selection), and returns a combined prediction from its two classifiers.
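
The README does not say how the two classifiers' outputs are combined, so the following is only one plausible sketch: it averages the predicted probabilities (often called soft voting). The function name combined_ai_probability, the averaging rule, and the assumption that class index 1 means "AI-generated" are all hypothetical. The function relies only on the scikit-learn-style transform and predict_proba methods, so it would accept any vectorizer and classifiers that follow that interface.

```python
from statistics import mean

def combined_ai_probability(text, vectorizer, models):
    """Average the AI-class probability predicted by several classifiers.

    Assumptions (not confirmed by the repository):
    - vectorizer has a scikit-learn-style transform(list_of_texts) method
    - each model has predict_proba(features) returning rows of
      [p_human, p_ai], so index 1 is the AI-generated class
    - the combination rule is a plain average (soft voting)
    """
    features = vectorizer.transform([text])
    return mean(float(m.predict_proba(features)[0][1]) for m in models)

def label_for(probability, threshold=0.5):
    """Turn a probability into a label; the 0.5 threshold is an assumption."""
    return "AI-generated" if probability >= threshold else "Human-written"
```

With the unigram option, vectorizer would be the unigram TF-IDF vectorizer and models would be the Logistic Regression and Naive Bayes classifiers; with the bigram option, the bigram vectorizer with LightGBM and Random Forest.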

Dataset Information

The models were trained on a dataset of 4.5 lakh (450,000+) text samples, including both AI-generated and human-written content. The dataset covers a variety of topics and text lengths to ensure robustness, and it is available on Kaggle. However, because AI-generated content keeps growing more complex, the models need further training to improve accuracy across a wider range of texts.

Installation

To run the project locally, follow these steps:

Clone the repository:

git clone https://github.com/shashankrxj/AI-Text-Detector.git

Navigate to the project directory:

cd AI-Text-Detector

Install dependencies:

pip install -r requirements.txt

Start the Flask server:

python app.py

Open your browser and visit:

http://127.0.0.1:5000 or http://localhost:5000

Usage

Open the app in your browser.
Choose between the Unigram or Bigram model.
Enter the text you want to analyze.
Click the "Submit" button.
The model will provide a prediction for the input text.

Image

[Screenshot of the frontend]

Disclaimer

The models were trained on a dataset of 4.5 lakh (450,000+) text samples, but they are still evolving and may need more data to predict more accurately whether text is AI-generated or human-written. Predictions may therefore be incorrect in some cases. Please use the results with caution, especially in critical applications.

License

This project is licensed under the MIT License. See the LICENSE file for details.

