In recent years, large language models (LLMs) have become increasingly sophisticated and can generate text that is difficult to distinguish from human-written text. This code develops a model that detects whether a paper was written by a student or generated by an LLM.
The code runs in the default Kaggle environment.
We used five datasets as the training set, but only three of them are included as files in the datasets. The other two are available at https://www.kaggle.com/datasets/kagglemini/train-00000-of-00001-f9daec1515e5c4b9 (sourced from an open-source dataset on Hugging Face: https://huggingface.co/datasets/dim/essayforum_writing_prompts_6k/tree/main/) and https://www.kaggle.com/datasets/thedrcat/daigt-v2-train-dataset.
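Combining several sources into one training set could look roughly like the sketch below. It is only an illustration, not the project's actual loading code: the column names (`text`, `label`, `essay`), the label convention (0 = student, 1 = LLM), and the in-memory CSV contents are all assumptions, and the real files may be laid out differently.

```python
import csv
import io


def load_rows(handle, text_col, label_col=None, default_label=None):
    """Read one CSV source and normalise it to {'text', 'label'} rows.

    text_col / label_col are hypothetical column names; a source that has
    no label column gets default_label for every row.
    """
    rows = []
    for record in csv.DictReader(handle):
        text = (record.get(text_col) or "").strip()
        if not text:
            continue  # skip rows without any text
        label = int(record[label_col]) if label_col else default_label
        rows.append({"text": text, "label": label})
    return rows


def merge_sources(sources):
    """Concatenate sources, dropping exact duplicate texts (first one wins)."""
    seen, merged = set(), []
    for rows in sources:
        for row in rows:
            if row["text"] not in seen:
                seen.add(row["text"])
                merged.append(row)
    return merged


# Tiny in-memory stand-ins for real files (hypothetical content).
labelled_csv = io.StringIO("text,label\nEssay A,0\nEssay B,1\n")
human_only_csv = io.StringIO("essay\nEssay A\nEssay C\n")

train = merge_sources([
    load_rows(labelled_csv, text_col="text", label_col="label"),
    # Assumed convention: 0 = written by a student, 1 = generated by an LLM.
    load_rows(human_only_csv, text_col="essay", default_label=0),
])
```

To use real files, each `io.StringIO` object would be replaced by a file opened with `open(path, newline="", encoding="utf-8")`, with the column names adjusted to match each dataset.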