Text_detect is an exam project for the DTU course 02476 MLOps.
The goal is to build a machine learning pipeline around a model that detects whether a text was generated by an AI or written by a human.
The data we have used is the Kaggle dataset "DAIGT Proper Train Dataset" (https://www.kaggle.com/datasets/thedrcat/daigt-proper-train-dataset/data?select=train_drcat_04.csv), and the base model used for the classification is the transformer model BERT-Tiny with pretrained weights (https://huggingface.co/prajjwal1/bert-tiny).
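The classification setup described above can be sketched roughly as follows. This is a minimal sketch, not the project's actual code: it assumes the Hugging Face transformers library, a two-class head on top of BERT-Tiny, and a hypothetical label convention where index 0 means "Human" and index 1 means "AI". The model-loading helpers import transformers and torch lazily, so the pure logits-to-label step can be tried without them.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical label convention: index 0 = human-written, index 1 = AI-generated.
ID2LABEL: Dict[int, str] = {0: "Human", 1: "AI"}


def load_classifier(model_name: str = "prajjwal1/bert-tiny"):
    """Load pretrained BERT-Tiny with a freshly initialised 2-class classification head."""
    # Lazy import: transformers is only needed when the model is actually loaded.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=len(ID2LABEL),
        id2label=ID2LABEL,
        label2id={label: idx for idx, label in ID2LABEL.items()},
    )
    return tokenizer, model


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_label(logits: Sequence[float]) -> Tuple[str, float]:
    """Map the two output logits to ("Human" or "AI", probability of that class)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return ID2LABEL[best], probs[best]


def classify(text: str, tokenizer, model) -> Tuple[str, float]:
    """Tokenize one text, run the model and return the predicted label."""
    import torch  # lazy import, see load_classifier

    # BERT-Tiny has 512 position embeddings, so longer texts are truncated.
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    model.eval()
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return logits_to_label(logits)
```

Usage would look like `tokenizer, model = load_classifier()` followed by `classify(text, tokenizer, model)`. Note that the classification head created this way is randomly initialised, so the model has to be fine-tuned on the DAIGT data before its predictions are meaningful.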
Clone the repository and follow these steps:
- pip install invoke
- invoke run_bentoml
- pip install streamlit
- streamlit run src/text-detect/frontend.py
- Open the link that the frontend application prints in the terminal.
- Upload a .txt file in the upload box.
- The prediction is shown as soon as the backend has processed the file. The output is either "Human" or "AI".
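The upload flow in the steps above can be sketched from the client side as follows. This is a hedged sketch, not the project's frontend: the endpoint path `/predict` and the JSON body `{"text": ...}` are hypothetical, and only port 3000 reflects BentoML's default serving port. It uses only the Python standard library.

```python
import json
import urllib.request
from pathlib import Path

# Hypothetical endpoint: 3000 is BentoML's default port, but the /predict path is an assumption.
BACKEND_URL = "http://localhost:3000/predict"


def read_txt(path: str) -> str:
    """Read the uploaded .txt file as UTF-8 text."""
    return Path(path).read_text(encoding="utf-8")


def build_request(text: str, url: str = BACKEND_URL) -> urllib.request.Request:
    """Wrap the text in a JSON POST request for the (assumed) backend API."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text: str, url: str = BACKEND_URL, timeout: float = 30.0) -> str:
    """Send the text to the running backend and return the raw response body."""
    with urllib.request.urlopen(build_request(text, url), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the backend running, `predict(read_txt("essay.txt"))` would return the backend's answer, which according to the steps above should be "Human" or "AI". Building the request separately from sending it keeps the payload format easy to inspect and test without a live server.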
Authors:
- Artur Adam Habuda (s233190)
- Eline Siegumfeldt (s183540)
- Franciszek Marek Gorczyca (s233664)
- Max-Peter Schrøder (s214238)