In recent years, large language models (LLMs) have become increasingly sophisticated and can generate text that is difficult to distinguish from human writing. This code develops a model that detects whether an essay was written by a student or generated by an LLM.

wcqy001028/LLM_detect_AI

LLM_detect_AI

Hardware and Software

Kaggle default environment

Datasets

We trained on five datasets, only three of which are included as files in datasets. The other two are available at https://www.kaggle.com/datasets/kagglemini/train-00000-of-00001-f9daec1515e5c4b9 (sourced from an open dataset on Hugging Face: https://huggingface.co/datasets/dim/essayforum_writing_prompts_6k/tree/main/) and https://www.kaggle.com/datasets/thedrcat/daigt-v2-train-dataset.

Train and Test

The preprocessing and concatenation steps for the daigt-v2-train-dataset, argugpt, and train-00000-of-00001-f9daec1515e5c4b9 datasets are in this notebook: https://www.kaggle.com/wcqyfly/notebook95c85fa3c6
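
For readers who cannot open the notebook, here is a minimal sketch of what concatenating several essay datasets into one training table might look like. It is not the notebook's code. The column names (text, label, essay), the helper functions to_common_schema and concat_sources, and the convention that label 0 means student-written and 1 means LLM-generated are all assumptions made for illustration, and the toy frames stand in for the real files.

```python
import pandas as pd


def to_common_schema(df, text_col, label_col=None, fixed_label=None):
    """Map a source-specific frame to two columns: text and label.

    Hypothetical helper: pass label_col when the source already has labels,
    or fixed_label when every row in the source shares one label.
    """
    if label_col is None and fixed_label is None:
        raise ValueError("provide either label_col or fixed_label")
    out = pd.DataFrame({"text": df[text_col].astype(str).str.strip()})
    if label_col is not None:
        out["label"] = df[label_col].astype(int)
    else:
        out["label"] = int(fixed_label)
    return out


def concat_sources(frames):
    """Stack the frames, drop empty texts, and drop duplicate essays."""
    merged = pd.concat(frames, ignore_index=True)
    merged = merged[merged["text"].str.len() > 0]
    merged = merged.drop_duplicates(subset="text", keep="first")
    return merged.reset_index(drop=True)


# Toy stand-ins for the real files (assumed columns, assumed label meaning:
# 0 = student-written, 1 = LLM-generated).
labeled_source = pd.DataFrame({"text": ["Essay A", "Essay B "], "label": [0, 1]})
human_only_source = pd.DataFrame({"essay": ["Essay A", "Essay C", ""]})

train = concat_sources([
    to_common_schema(labeled_source, text_col="text", label_col="label"),
    to_common_schema(human_only_source, text_col="essay", fixed_label=0),
])
print(train)
```

Mapping every source to the same two columns before stacking keeps the merge step independent of how each dataset names its fields, and dropping duplicates on text avoids counting an essay twice when sources overlap.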
