shashankrxj/AI-Text-Detector

AI Text Detection Web App

This project is a web application that predicts whether a piece of text was generated by AI or written by a human. Users enter text, select one of two models (unigram or bigram), and receive a prediction. Both models are machine learning classifiers that work on word-level n-gram features of the text.

What are Unigrams and Bigrams?

Unigram: A unigram is a single word, and a unigram model analyzes text by considering each word individually. For example, in the sentence "AI is powerful", the unigrams are "AI", "is", and "powerful". Unigram models capture word frequency, but they do not account for word order or relationships between words.
Bigram: A bigram consists of two consecutive words, and a bigram model analyzes text by considering pairs of adjacent words. For the same sentence, "AI is powerful", the bigrams are "AI is" and "is powerful". Bigram models capture more context than unigram models because they see which words follow each other, which can improve accuracy on tasks such as AI text detection.
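The definitions above can be illustrated with a few lines of Python. This is a minimal sketch, not the project's own tokenizer: the `ngrams` helper and its naive whitespace tokenization are assumptions made for illustration.

```python
def ngrams(text, n):
    """Return the n-grams of `text` as space-joined strings."""
    # Naive tokenization: split on whitespace and trim surrounding punctuation.
    tokens = [tok.strip(".,!?;:\"'") for tok in text.split()]
    tokens = [tok for tok in tokens if tok]
    # Slide a window of n tokens across the sentence.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = "AI is powerful"
unigrams = ngrams(sentence, 1)
bigrams = ngrams(sentence, 2)
print(unigrams)  # ['AI', 'is', 'powerful']
print(bigrams)   # ['AI is', 'is powerful']
```

A sentence with fewer than `n` words simply yields an empty list.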

Unigram and Bigram Models

Unigram Model: Effective when individual word frequencies are important and sufficient for classification. It's a simpler model that performs well in tasks where word context isn't as crucial. In this project, the unigram model is stored in the uni model pkl folder and utilizes both Logistic Regression and Naive Bayes models for AI text detection.
Bigram Model: Captures relationships between words, providing more context. This model is especially useful for complex text analysis, where understanding the pairing of words improves prediction accuracy. The bigram model is stored in the bi model pkl folder and combines LightGBM and Random Forest models for more nuanced AI text detection.

The original code for training these models can be found in my other repository: AI-Text-Detector-Model. In this repository, the models have been converted into Python scripts and serialized into pickle files for use in the backend of this application.
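As a rough illustration of what serializing a model to a pickle file involves, the sketch below round-trips an object through Python's standard `pickle` module. The `save_model` and `load_model` helpers, the file name, and the dictionary standing in for a trained model are all hypothetical; the actual file layout of this repository is not shown here.

```python
import os
import pickle
import tempfile


def save_model(obj, path):
    """Serialize `obj` to `path` in binary pickle format."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_model(path):
    """Load a pickled object. Only unpickle files you trust:
    pickle can execute arbitrary code while loading."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical stand-in for a fitted model or vectorizer.
stand_in = {"kind": "unigram", "weights": [0.25, 0.75]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_model.pkl")  # illustrative file name
    save_model(stand_in, path)
    restored = load_model(path)

print(restored == stand_in)  # True
```

Pickled estimators generally need to be loaded with library versions compatible with the ones used to create them, which is one reason to install the pinned dependencies before starting the app.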

Model Description

Unigram Model: Combines Logistic Regression and Naive Bayes models.
Bigram Model: Combines LightGBM and Random Forest models.
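This README does not say how the two classifiers in each pair are combined. One common approach is soft voting, where the probabilities from both classifiers are averaged. The sketch below assumes that approach purely for illustration; `combine_probabilities`, `label`, and the 0.5 threshold are hypothetical and may not match what the app does.

```python
def combine_probabilities(p_first, p_second):
    """Average two estimates of P(text is AI-generated) (simple soft voting)."""
    for p in (p_first, p_second):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return (p_first + p_second) / 2.0


def label(p_ai, threshold=0.5):
    """Turn a combined probability into a human-readable label."""
    return "AI-generated" if p_ai >= threshold else "Human-written"


# Example: one classifier is fairly sure, the other is undecided.
combined = combine_probabilities(0.75, 0.5)
print(combined, label(combined))  # 0.625 AI-generated
```

Averaging lets a confident classifier outweigh an uncertain one, whereas a majority vote over hard labels would discard that information.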

The model takes the input text, processes it with TF-IDF vectorizers (unigram or bigram depending on the selection), and provides a combined prediction result.
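To make the TF-IDF step concrete, here is a small pure-Python sketch of the textbook weighting, tf * log(N / df), applied to unigrams or bigrams. It is an illustration under stated assumptions, not the vectorizer used by this project: the function names are hypothetical, and libraries such as scikit-learn's `TfidfVectorizer` use a smoothed, normalized variant of this formula by default.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def ngram_counts(tokens, n):
    """Count each n-gram (space-joined) in a token list."""
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def tfidf(documents, n=1):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    counts = [ngram_counts(tokenize(doc), n) for doc in documents]
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for c in counts:
        df.update(c.keys())
    total = len(documents)
    return [
        {term: tf * math.log(total / df[term]) for term, tf in c.items()}
        for c in counts
    ]


docs = ["AI is powerful", "AI is everywhere"]
weights = tfidf(docs, n=2)
for doc, w in zip(docs, weights):
    print(doc, "->", w)
```

The bigram "ai is" occurs in both documents, so its weight is zero, while "is powerful" and "is everywhere" each occur in only one document and receive a positive weight. Terms that distinguish one text from another are the ones a classifier can use.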

Dataset Information

The models were trained on a dataset consisting of 4.5 lakh (450,000+) text samples, including both AI-generated and human-written content. The dataset covers a variety of topics and text lengths to ensure robustness. The dataset used for training can be found here on Kaggle. However, due to the growing complexity of AI-generated content, the model requires further training to enhance its accuracy across a wider range of texts.

Installation

To run the project locally, follow these steps:

  1. Clone the repository:
git clone https://github.com/shashankrxj/AI-Text-Detector.git
  2. Navigate to the project directory:
cd AI-Text-Detector
  3. Install dependencies:
pip install -r requirements.txt
  4. Start the Flask server:
python app.py
  5. Open your browser and visit:
http://127.0.0.1:5000 or http://localhost:5000

Usage

  1. Open the app in your browser.
  2. Choose between the Unigram or Bigram model.
  3. Enter the text you want to analyze.
  4. Click the "Submit" button.
  5. The model will provide a prediction for the input text.
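The flow in the steps above (validate the text, pick the selected model, return its prediction) could be sketched as follows. Everything here is hypothetical: `classify`, the `models` registry, and the placeholder predictors are illustrative names, and the real request handling lives in the app's backend code.

```python
def classify(text, model_choice, models):
    """Validate the submitted form values and route them to the chosen model."""
    text = text.strip()
    if not text:
        raise ValueError("please enter some text to analyze")
    key = model_choice.lower()
    if key not in models:
        raise ValueError(f"unknown model: {model_choice!r}")
    return models[key](text)


# Placeholder predictors for illustration only; they do not detect anything.
models = {
    "unigram": lambda text: "prediction from unigram model",
    "bigram": lambda text: "prediction from bigram model",
}

result = classify("Some text to analyze.", "Bigram", models)
print(result)  # prediction from bigram model
```

Keeping the models in a dictionary keyed by the form value means that adding a third model would not require changing the routing logic.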

Image

The frontend looks like this: [image: AI-Text-Detector-Image]

Disclaimer

The models are trained on a dataset of 4.5 lakh (450,000+) text samples, but they are still evolving and may need more data to predict more accurately whether text is AI-generated or human-written. Predictions may therefore be incorrect in some cases. Please use the results with caution, especially in critical applications.

License

This project is licensed under the MIT License. See the LICENSE file for details.

About

AI Text Detection Web App identifies whether text is AI-generated or human-written. It offers unigram and bigram models, combining Logistic Regression, Naive Bayes, Random Forest, and LightGBM to provide accurate predictions based on text structure and context.
