Model for sentence-level AI-generated text detection. The implementation is based on the paper SeqXGPT, which in turn builds on DetectGPT.

harsha148/Sentence-Level-AI-Generated-Text-Detection


Sentence-Level-AI-Generated-Text-Detection

Our project is based on SeqXGPT, a framework for AI-generated text detection.

The implementation follows the paper SeqXGPT: Sentence-Level AI-Generated Text Detection.

Setting up the project

  1. Clone this repository and change directory to the project directory.

  2. Set up and activate a virtual environment:

python3 -m venv myenv
source myenv/bin/activate

  3. Install the dependencies:

pip install -r requirements.txt

Note: If you face issues running the scripts, use python or python3 in the commands below depending on which one your environment provides.

Methodology

Our approach is based on SeqXGPT, an advanced technique for sentence-level AI-generated text detection. The model architecture comprises the following three parts:

  1. Perplexity(Feature) Extraction
  2. Feature Encoder
  3. Linear Classification Layer
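The three parts above can be sketched as a small PyTorch module. This is our own illustrative sketch, not the repository's src/model.py: all hyperparameters (hidden size, number of heads and layers) are made up, and we assume the feature encoder is a convolution followed by self-attention, as described in the SeqXGPT paper.

```python
import torch
import torch.nn as nn

class SeqXGPTSketch(nn.Module):
    """Illustrative sketch of the three-part SeqXGPT architecture."""

    def __init__(self, n_llms=3, hidden=64, n_labels=6):
        super().__init__()
        # 1. Perplexity (feature) extraction happens outside the model: each
        #    of the n_llms LLMs contributes a per-word loss, so the input is
        #    a (batch, seq_len, n_llms) tensor of word-wise losses.
        # 2. Feature encoder: convolution over the sequence, then
        #    self-attention layers.
        self.conv = nn.Conv1d(n_llms, hidden, kernel_size=3, padding=1)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # 3. Linear classification layer: one label per word position.
        self.classifier = nn.Linear(hidden, n_labels)

    def forward(self, feats):
        # feats: (batch, seq_len, n_llms)
        x = self.conv(feats.transpose(1, 2)).transpose(1, 2)
        x = self.encoder(x)
        return self.classifier(x)  # (batch, seq_len, n_labels)
```

Word-level predictions are then aggregated into sentence-level labels.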

Datasets

Each dataset contains six files: within each dataset folder, documents are organized into separate files based on the source of their AI-generated sentences. SeqXGPT-Bench is described below.

SeqXGPT-Bench

A sentence-level AI-generated text (AIGT) detection dataset used for the study of fine-grained AIGT detection.

data format:

{
    "text": "Media playback is unsupported on your device 21 June 2013 Last updated at 12:31 BST The Market Hall Cinema in Brynmawr used to be run by the local council but when it announced its funding would stop last month, work began to find a way to keep it going. Thanks to the efforts of a group of local volunteers, the cinema has been saved and reopened under a new community initiative. The group, called \"Brynmawr Foundation\", raised enough funds to take over the lease of the building and purchase new equipment. They plan to show a mix of classic and new films, as well as host events and live performances. The Market Hall Cinema has been an important part of the town's history since it first opened in 1894, and this new initiative ensures that it will continue to be a valuable resource for the community.", 
    "prompt_len": 254, 
    "label": "gpt2"
}

text refers to an entire document.

prompt_len marks the boundary between the sentences generated by humans and those generated by AI. The first prompt_len characters of the input text, i.e., text[:prompt_len], are the sentences generated by humans, while the rest are generated by a particular language model.

label is the label for each sentence, and there are six types of labels in total: gpt2, gptneo, gptj, llama, gpt3re, human.
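As a concrete illustration of how prompt_len splits a document, the following snippet (using a made-up record, not one from the dataset) separates the human-written prefix from the AI-generated continuation:

```python
record = {
    "text": "Human-written prompt. AI-generated continuation follows here.",
    "prompt_len": 21,  # character boundary, as in the dataset
    "label": "gpt2",
}

# text[:prompt_len] is human-written; the rest is model-generated.
human_part = record["text"][:record["prompt_len"]]
ai_part = record["text"][record["prompt_len"]:]
```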

Note: For the human_lines.jsonl file, the prompt_len field is not present by default, since the text is entirely human-written and there is no human/AI boundary to mark.

To add that in the human_lines.jsonl file use the following script:

python3 -m dataset.add_prompt_len_to_human_data --input_file human_lines.jsonl --output_file output.jsonl

The above step is crucial before feature extraction and, subsequently, training, as prompt_len is used by our data handler (src/utilities/datahandler.py) when processing text segments.
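For fully human-written documents the human/AI boundary is simply the end of the text, so the script above presumably sets prompt_len to the text length. A minimal sketch of that idea (our assumption about dataset.add_prompt_len_to_human_data, not its actual code):

```python
import json

def add_prompt_len(jsonl_lines):
    """Add a prompt_len field to fully human-written jsonl records.

    Assumption: since no part of the text is AI-generated, the boundary
    is the end of the document, i.e. prompt_len == len(text).
    """
    out = []
    for line in jsonl_lines:
        record = json.loads(line)
        record["prompt_len"] = len(record["text"])
        out.append(json.dumps(record))
    return out
```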

Feature Extraction

We extract features from each input file with each of the three LLMs (GPT-2, GPT-J, GPT-Neo) using dataset/features_extractor_from_data.py.

Models used: GPT2-xl (1.5B), GPT-Neo (2.7B), GPT-J (6B)

Note: The original paper also uses LLaMA (7B) for feature extraction, but since the pretrained weights were not openly released by Meta, we were unable to use it.

We used an A100 GPU, available with Google Colab Pro, for this part, as feature extraction with these LLMs has high VRAM requirements.

To extract the features from any input data file, please use the following script:

# --model: [GPT2, GPTJ, GPTNeo]
python3 -m dataset.features_extractor_from_data --input_file input.jsonl --output_file output.jsonl --get_features --model model_name

You can store the features produced by each LLM (GPT-2, GPT-J, GPT-Neo) for each of the six input files. For example, from gpt2_lines.jsonl you get gpt2_lines_gpt2_features.jsonl, gpt2_lines_gptj_features.jsonl, and gpt2_lines_gptneo_features.jsonl, which you can store under dataset/features/gpt2_lines.

Each of these feature files stores the features in jsonl format, where each JSON line has the following structure:

{
  "wordwise_loss_list": [[0.0, 6.484006881713867, 6.484006881713867, 10.804593086242676, 10.804593086242676, 6.741743087768555, .........]],
  "text": "high - salt has been shown to play a role in the pathogenesis of autoimmune disease . in this study , we investigated the effect of high - salt on the production of inflammatory mediators by arpe-19 cells and the possible mechanisms involved . arpe-19 cells were cultured with lps in dmem to which extra nacl had been added ( 20  mm and 40  mm ) . a) The   addition of sodium chloride ( 1 mmol/ l ) did not influence their growth and (b) lps treated with salt ( 40 mm ) significantly (p<0.01) induced an IC 50 value of 11 nmol TNFa/mL. The addition of lps in dmem for the addition of the salt ( 40 mm ) resulted in a significant increase in the IC 50 in the range of 25 - 33 nmol TNFa/ml.\n\nThe mechanism by which the      salt influences arpe-19 cells immunity is not known, however, the authors suggested that the increase in the production of TNFa may result from lysosome-remodeling effect of lps. In this study, arpe-19 cells have been treated for 24 hours with lps at different concentrations which resulted to significantly different results. lps-treated arpe-19 cells grew as       compared to untreated cells in the range of 40.3 - 57.3% of the initial culture volume. This increase in proliferation indicates that lps acts via the activation of arpe-19 cells immune mechanism.\n\nLps has also been recently used for the anti-tumour effect in cancer cell line. in this study, we found that Lps can bind to EBOV-1 and -2 and promote the cell        death", 
  "prompt_len": 347, 
  "label": "gpt2",
  "label_int": 0
}
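The wordwise_loss_list is produced by aligning the LLM's token-level losses back to whitespace-separated words. The helper below is our own hypothetical sketch of that alignment step (not the repository's code), assuming the tokenizer provides character offsets for each token:

```python
def token_to_word_losses(text, token_spans, token_losses):
    """Average token losses per whitespace-separated word.

    token_spans:  (start, end) character offsets for each token
    token_losses: one loss value per token
    """
    # Character offset ranges for the whitespace-separated words of `text`.
    spans, pos = [], 0
    for word in text.split():
        start = text.index(word, pos)
        spans.append((start, start + len(word)))
        pos = start + len(word)
    # Assign each token to the word containing its start offset and average.
    word_losses = []
    for ws, we in spans:
        hits = [loss for (ts, _), loss in zip(token_spans, token_losses)
                if ws <= ts < we]
        word_losses.append(sum(hits) / len(hits) if hits else 0.0)
    return word_losses
```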

Then we merge the wordwise_loss_lists from the per-model feature files to get the complete features file for a particular input file (gpt2_lines_features.jsonl in this case) using the following script:

python3 -m dataset.combine_features --directory dataset/gpt2_lines --output_file gpt2_lines_features.jsonl

Then we save the features.jsonl file for each input file in the dataset/features directory.

A sample JSON line in this features.jsonl file after combining would look like:

{
  "wordwise_loss_list": [[0.0, 6.484006881713867, 6.484006881713867, 10.804593086242676, .........], [7.503107070922852, 7.478128910064697, 4.111625671386719, 4.111625671386719,....], [0.1008826345205307, 0.1008826345205307, 3.0469260215759277, 3.046926259994507, 0.04809051379561424,....]],
  "text": "high - salt has been shown to play a role in the pathogenesis of autoimmune disease . in this study , we investigated the effect of high - salt on the production of inflammatory mediators by arpe-19 cells and the possible mechanisms involved . arpe-19 cells were cultured with lps in dmem to which extra nacl had been added ( 20  mm and 40  mm ) . a) The   addition of sodium chloride ( 1 mmol/ l ) did not influence their growth and (b) lps treated with salt ( 40 mm ) significantly (p<0.01) induced an IC 50 value of 11 nmol TNFa/mL. The addition of lps in dmem for the addition of the salt ( 40 mm ) resulted in a significant increase in the IC 50 in the range of 25 - 33 nmol TNFa/ml.\n\nThe mechanism by which the      salt influences arpe-19 cells immunity is not known, however, the authors suggested that the increase in the production of TNFa may result from lysosome-remodeling effect of lps. In this study, arpe-19 cells have been treated for 24 hours with lps at different concentrations which resulted to significantly different results. lps-treated arpe-19 cells grew as       compared to untreated cells in the range of 40.3 - 57.3% of the initial culture volume. This increase in proliferation indicates that lps acts via the activation of arpe-19 cells immune mechanism.\n\nLps has also been recently used for the anti-tumour effect in cancer cell line. in this study, we found that Lps can bind to EBOV-1 and -2 and promote the cell        death", 
  "prompt_len": 347, 
  "label": "gpt2",
  "label_int": 0
}
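The combine step presumably performs a line-by-line merge of the per-model feature files. A minimal sketch (our assumption about dataset.combine_features, assuming every file lists the same documents in the same order):

```python
import json

def combine_feature_lines(per_model_lines):
    """Merge per-model jsonl lines (one list of lines per LLM) into combined
    records whose wordwise_loss_list holds one loss list per model."""
    combined = []
    for records in zip(*[[json.loads(line) for line in lines]
                         for lines in per_model_lines]):
        merged = dict(records[0])
        # Each per-model file stores its losses as [[...]]; take the inner list
        # so the combined record holds one list per model.
        merged["wordwise_loss_list"] = [r["wordwise_loss_list"][0]
                                        for r in records]
        combined.append(merged)
    return combined
```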

These feature files are used as the dataset for the SeqXGPT model.

Feature Encoder and Linear Classification Layer

As per the architecture discussed in the paper, the code for the model can be found in src/model.py.

Training the SeqXGPT model

# Pass the number of training epochs; otherwise it defaults to 20
python3 -m driver --data_path dataset/features --train_path dataset/train/train.jsonl --test_path dataset/test/test.jsonl --split_dataset --num_train_epochs=100 --gpu=gpu_count

Other arguments such as train_ratio, learning rate, warmup_ratio, and weight_decay can also be changed in the above script.

Evaluation

For evaluation, we have provided the trained model file (saved_model.pt), which you can use to run the test scripts directly.

We have also provided dataset/test_data.jsonl (generated and processed after the feature extraction step), which can be used with the script below. You can also generate your own test files via the process discussed above.

# Pass --document_level_eval as an arg if you want document-level evaluation; otherwise evaluation is at sentence level by default
python3 -m driver --test_path dataset/test/test_data.jsonl --gpu=gpu_count --inference 

Miscellaneous

To generate results for particular-model binary AIGT detection, the test data must contain only one pair of labels at a time:

(gpt2, human), (gpt3, human), (gptneo, human), (gptj, human), (llama, human)

To filter the test.jsonl file we had generated during the training, we use the following script:

# Pass the path to the test.jsonl file and the path of the output file, which will contain only the given pair of labels
# e.g. --labels gpt2 human, --labels gpt3 human, --labels gptneo human, --labels gptj human, --labels llama human
python3 -m dataset.filter_test_data --input_file dataset/test/test.jsonl --output_file dataset/test/output_test.jsonl --labels name_label_1 name_label_2

For evaluation, evaluator.py provides different implementations of the calculate_metrics method (multi-class vs. binary); uncomment the one matching the evaluation you want to run, as per the comments in the file.
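The filtering itself amounts to keeping only the jsonl lines whose label belongs to the requested pair. A minimal sketch (hypothetical helper mirroring what dataset.filter_test_data does, not its actual code):

```python
import json

def filter_by_labels(jsonl_lines, labels):
    """Keep only records whose label is one of the requested pair,
    e.g. labels = ("gpt2", "human")."""
    keep = set(labels)
    return [line for line in jsonl_lines
            if json.loads(line)["label"] in keep]
```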

Results

Note: All values in the tables below are F1 scores, with Macro-F1 reported to compare overall performance.

1. Sentence-level results

The sentence-level results are as follows.

For the SeqXGPT-Bench dataset:

Mixed-Model Multi-Class AIGT Detection

| Method             | GPT-2 | GPT-Neo | GPT-J | LLaMA | GPT-3 | Human | Macro-F1 |
|--------------------|-------|---------|-------|-------|-------|-------|----------|
| Sniffer            | 51.5  | 45.5    | 36.0  | 23.1  | 54.1  | 58.1  | 44.7     |
| Sent-RoBERTa       | 43.1  | 31.6    | 31.5  | 42.4  | 78.2  | 90.5  | 52.9     |
| Seq-RoBERTa        | 55.5  | 36.7    | 32.0  | 78.6  | 94.4  | 92.3  | 64.9     |
| SeqXGPT            | 98.5  | 98.7    | 97.2  | 93.2  | 93.9  | 92.9  | 95.7     |
| Our implementation | 97.6  | 97.5    | 92.3  | 79.3  | 90.4  | 92.0  | 91.6     |

Mixed-Model Binary AIGT Detection

| Method             | AI   | Human | Macro-F1 |
|--------------------|------|-------|----------|
| Sniffer            | 87.7 | 54.3  | 71.0     |
| Sent-RoBERTa       | 97.6 | 92.6  | 95.1     |
| Seq-RoBERTa        | 97.4 | 91.8  | 94.6     |
| SeqXGPT            | 97.6 | 92.9  | 95.3     |
| Our implementation | 94.9 | 92.0  | 93.49    |

Particular-Model Binary AIGT Detection

| Method             | GPT-2 | Human | Macro-F1 |
|--------------------|-------|-------|----------|
| $\log p(x)$        | 78.4  | 47.9  | 63.1     |
| DetectGPT          | 65.8  | 42.9  | 54.3     |
| Sent-RoBERTa       | 92.9  | 75.8  | 84.4     |
| SeqXGPT            | 98.6  | 95.8  | 97.2     |
| Our implementation | 97.9  | 97.3  | 97.6     |

| Method             | GPT-Neo | Human | Macro-F1 |
|--------------------|---------|-------|----------|
| $\log p(x)$        | 73.9    | 41.2  | 57.5     |
| DetectGPT          | 57.6    | 41.3  | 49.4     |
| Sent-RoBERTa       | 92.6    | 73.4  | 83.0     |
| SeqXGPT            | 98.8    | 96.4  | 97.6     |
| Our implementation | 97.8    | 97.4  | 97.6     |

| Method             | GPT-J | Human | Macro-F1 |
|--------------------|-------|-------|----------|
| $\log p(x)$        | 76.5  | 34.4  | 55.5     |
| DetectGPT          | 66.8  | 37.0  | 51.9     |
| Sent-RoBERTa       | 93.1  | 71.8  | 82.4     |
| SeqXGPT            | 97.9  | 92.9  | 95.4     |
| Our implementation | 94.4  | 96.1  | 95.3     |

| Method             | LLaMA | Human | Macro-F1 |
|--------------------|-------|-------|----------|
| $\log p(x)$        | 69.1  | 27.1  | 48.1     |
| DetectGPT          | 52.8  | 47.6  | 50.2     |
| Sent-RoBERTa       | 89.7  | 69.6  | 79.6     |
| SeqXGPT            | 96.0  | 89.9  | 92.9     |
| Our implementation | 82.1  | 92.9  | 87.5     |

For the Out-of-Distribution (OOD) dataset:

Mixed-Model Multi-Class AIGT Detection on the Out-of-Distribution (OOD) dataset

For the dataset, refer to dataset/OOD.

| Method             | GPT-2 | GPT-Neo | GPT-J | LLaMA | GPT-3 | Human | Macro-F1 |
|--------------------|-------|---------|-------|-------|-------|-------|----------|
| Sniffer            | 7.8   | 50.8    | 28.3  | 22.9  | 61.9  | 44.8  | 36.1     |
| Sent-RoBERTa       | 32.5  | 18.2    | 24.8  | 30.8  | 74.6  | 30.5  | 35.2     |
| Seq-RoBERTa        | 53.7  | 29.0    | 29.9  | 75.2  | 92.3  | 83.6  | 60.6     |
| SeqXGPT            | 98.9  | 90.7    | 95.2  | 90.3  | 93.7  | 88.2  | 92.8     |
| Our implementation | 94.4  | 90.4    | 93.4  | 75.0  | 90.9  | 84.5  | 88.1     |

2. Document-level results on the document-level detection dataset

For the dataset, refer to dataset/document-level detection dataset.

| Method             | GPT-2 | GPT-Neo | GPT-J | LLaMA | GPT-3 | Human | Macro-F1 |
|--------------------|-------|---------|-------|-------|-------|-------|----------|
| Sniffer            | 85.4  | 84.5    | 74.6  | 13.0  | 81.3  | 66.2  | 67.5     |
| Sent-RoBERTa       | 55.8  | 42.8    | 24.4  | 18.3  | 84.7  | 94.6  | 53.4     |
| Seq-RoBERTa        | 63.4  | 41.0    | 32.6  | 67.0  | 91.9  | 51.2  | 57.9     |
| SeqXGPT            | 99.5  | 99.5    | 98.0  | 93.3  | 90.4  | 84.8  | 94.2     |
| Our implementation | 98.47 | 97.8    | 92.3  | 79.1  | 90.1  | 88.9  | 91.1     |

References

@misc{wang2023seqxgpt,
      title={SeqXGPT: Sentence-Level AI-Generated Text Detection}, 
      author={Pengyu Wang and Linyang Li and Ke Ren and Botian Jiang and Dong Zhang and Xipeng Qiu},
      year={2023},
      eprint={2310.08903},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
