text_detect

Text_detect is an exam project for the DTU course 02476 MLOps. The goal is to build a machine learning pipeline around a model that detects whether a text was generated by an AI or written by a human.
The data comes from the Kaggle dataset "DAIGT Proper Train Dataset" (https://www.kaggle.com/datasets/thedrcat/daigt-proper-train-dataset/data?select=train_drcat_04.csv), and the base model used for classification is the transformer Bert-Tiny with pretrained weights (https://huggingface.co/FacebookAI/roberta-large-mnli?text=The+dog+was+lost.+Nobody+lost+any+animal).
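
As a rough illustration of the classification step, the sketch below turns the two raw scores (logits) of a binary classifier into one of the two labels the application reports. It is not taken from the project code: the class order (index 0 for human-written, index 1 for AI-generated) and the helper names `softmax` and `to_label` are assumptions made for this example.

```python
import math

# Assumption for this sketch: index 0 = human-written, index 1 = AI-generated.
LABELS = ("Human", "AI")


def softmax(logits):
    """Convert raw scores to probabilities, shifting by the max for stability."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_label(logits):
    """Return the most probable label together with its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]
```

Calling `to_label` on the logits of a single text gives the label plus a confidence value, which could be used to flag borderline predictions.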

How to use

Clone the repo. In a terminal, run:

pip install invoke
invoke run_bentoml
pip install streamlit
streamlit run src/text-detect/frontend.py
Open the link that the frontend application prints in the terminal.
Upload a .txt file using the file upload box.
The answer is returned as soon as the backend has processed the file. The output is either "Human" or "AI".
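
For scripted use without the frontend, the steps above could in principle be reduced to a direct HTTP call to the backend. The sketch below is an assumption-heavy illustration, not the project's API: it assumes the backend listens on BentoML's default port 3000, and the endpoint name `predict` and the JSON field `text` are hypothetical placeholders that must be replaced with the names the service actually defines.

```python
import json
import urllib.request
from pathlib import Path

# Assumptions (not taken from the repo): BentoML's default port 3000,
# a hypothetical endpoint "predict", and a hypothetical JSON field "text".
BACKEND_URL = "http://localhost:3000/predict"


def build_request(text, url=BACKEND_URL):
    """Build a JSON POST request carrying the text to classify."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify_file(path, url=BACKEND_URL):
    """Read a .txt file, send it to the backend, and return the raw response."""
    text = Path(path).read_text(encoding="utf-8")
    with urllib.request.urlopen(build_request(text, url), timeout=30) as response:
        return response.read().decode("utf-8")
```

With the backend running, `classify_file("essay.txt")` would return whatever the service responds with; `essay.txt` is a placeholder file name.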

Artur Adam Habuda (s233190), Eline Siegumfeldt (s183540), Franciszek Marek Gorczyca (s233664), Max-Peter Schrøder (s214238)
