
apang024/CS205AIGeneratedCode


CS205AIGeneratedCode

Project Title: Detecting AI-Generated Code

Project Description: With the surge of tools that help developers program, such as generative AI (large language models) and AI assistants (GitHub Copilot), it is becoming harder to tell whether code was written by a human. In this project, we investigate whether we can determine if code was generated by AI, specifically by GitHub Copilot.

File Folder: CS205_Final Project

Video: *INSERT LINK (3 min or less)

Question: Can we detect whether a given piece of code was generated by AI or written by a human?

Dataset: GitHub repositories under the category of *INSERT CATEGORY

Method:

  1. Crawl GitHub with a specific search category to build our dataset
  2. Remove functions from the code and ask Copilot to complete it
  3. Label the original code as human-written and the Copilot code as AI-generated
  4. Split the data into training, test, and validation sets (75%, 15%, and 10%, respectively)
  5. Tokenize all code and create features for our ML models
  6. Train ML models (e.g., Random Forest (RF), XGBoost, and Support Vector Machine (SVM))
  7. Report precision, recall, accuracy, and F1-score for each model
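Steps 2 and 3 can be sketched for Python sources with the standard `ast` module. This is a minimal illustration, not code from this repository: it assumes the crawled files are Python and that a Copilot prompt is built by keeping each function's signature and docstring while blanking its body. `BodyStripper`, `make_prompt`, and `label_samples` are hypothetical helpers, and `ast.unparse` requires Python 3.9+.

```python
import ast


class BodyStripper(ast.NodeTransformer):
    """Replace every function body with `...`, keeping the signature and docstring."""

    def _strip(self, node):
        new_body = []
        first = node.body[0] if node.body else None
        # Keep a leading docstring so Copilot still sees the function's intent.
        if (isinstance(first, ast.Expr) and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)):
            new_body.append(first)
        # Placeholder body; Copilot would be asked to fill this in.
        new_body.append(ast.Expr(value=ast.Constant(value=...)))
        node.body = new_body
        return node

    # Methods inside classes are reached through the default generic_visit.
    visit_FunctionDef = _strip
    visit_AsyncFunctionDef = _strip


def make_prompt(source: str) -> str:
    """Return `source` with all function bodies removed (hypothetical step 2 helper)."""
    tree = BodyStripper().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


def label_samples(human_sources, ai_sources):
    """Pair each snippet with a label: 0 = human-written, 1 = AI-generated (step 3)."""
    return [(code, 0) for code in human_sources] + [(code, 1) for code in ai_sources]


if __name__ == "__main__":
    original = (
        "def add(a, b):\n"
        '    """Return the sum of a and b."""\n'
        "    return a + b\n"
    )
    print(make_prompt(original))
```

The printed prompt keeps `def add(a, b):` and its docstring but replaces `return a + b` with `...`. The original file would then be labeled 0 and Copilot's completion of the prompt labeled 1.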
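Steps 4, 5, and 7 can be illustrated with the Python standard library alone. The README does not say which features the project actually uses, so this sketch assumes Python input and uses token-type counts plus a few line statistics as a stand-in; `split_dataset`, `token_features`, and `binary_metrics` are hypothetical helpers. The 75/15/10 split and the four metrics follow the method list above, with AI-generated code (label 1) treated as the positive class.

```python
import io
import random
import tokenize
from collections import Counter


def split_dataset(samples, seed=0):
    """Shuffle, then split into 75% train, 15% test, 10% validation (step 4)."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = n * 75 // 100  # integer arithmetic avoids float rounding surprises
    n_test = n * 15 // 100
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    validation = items[n_train + n_test:]  # remainder, roughly 10%
    return train, test, validation


def token_features(code):
    """Count Python token types plus simple layout statistics (one possible step 5)."""
    counts = Counter()
    try:
        for tok in tokenize.generate_tokens(io.StringIO(code).readline):
            counts[tokenize.tok_name[tok.type]] += 1
    except (tokenize.TokenError, SyntaxError):
        counts["TOKENIZE_ERROR"] += 1  # keep malformed snippets instead of dropping them
    lines = code.splitlines() or [""]
    counts["n_lines"] = len(lines)
    counts["avg_line_len"] = sum(len(line) for line in lines) / len(lines)
    counts["comment_ratio"] = counts["COMMENT"] / len(lines)
    return dict(counts)


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one model's predictions (step 7)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For step 6, the feature dictionaries could be turned into a matrix with scikit-learn's `DictVectorizer` and passed to `RandomForestClassifier`, `SVC`, or `xgboost.XGBClassifier`. That part is omitted here to keep the sketch dependency-free.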

Application: Detecting AI-generated code can help in many scenarios. For example, in academia, instructors could use such a tool to check whether code submitted by a student is plagiarized.
