
XiAnzheng-ID/RansomPyShield-Model

RansomPyShield-Model

This model is created and maintained by me. I WAS thinking of focusing on ransomware only, but I had a hard time telling which files are real ransomware among thousands of malware files, and it was not worth the trouble and time.

I want to say THANK YOU to abuse.ch for providing a free and easy way to get real malware samples, unlike some other sites out there (cough cough VirusTotal, Any.run, vx-underground cough cough).

Q: What makes this ML model different from other public and free models out there?
A: This model integrates YARA rules, CAPA (WIP, experimental), and blint (WIP, experimental) for its indicators, not just static PE information from lief, pefile, or similar libraries.

Q: Why? If you are using machine learning, why not just use the file metadata and structure itself, like other people do?
A: Some malware can spoof, hide, encrypt, or obfuscate most of its data, which makes extraction hard. Some models, or even "bad" AVs, can be bypassed or fooled this way.

Q: Why those tools?
A: With the help of YARA rules, CAPA, and blint we can extract other crucial data or indicators that the threat actor tries to hide, which may help the ML model decide whether a file is malicious.

Not just that, these tools can also extract data that would otherwise take too much time to extract, or that is hard for some people to extract themselves.

Q: So it doesn't have any weaknesses?
A: Good question. It does: some threat actors can bypass it with a fileless attack vector (e.g. a CMD/PowerShell attack), which needs a different approach than static identification.
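As a rough illustration of how YARA results could feed the ML model, the sketch below turns a list of matched rule names into a fixed-length numeric row. This is a minimal sketch under assumptions, not the project's actual feature layout: the rule names in `KNOWN_RULES` and the helper `matches_to_features` are hypothetical.

```python
from typing import Iterable, List

# Hypothetical rule names, for illustration only. The project's real rules
# live in its YARA files and will have different names.
KNOWN_RULES = ["packer_upx", "ransom_note_strings", "crypto_api_usage"]


def matches_to_features(matched_rules: Iterable[str],
                        known_rules: List[str] = KNOWN_RULES) -> List[int]:
    """Return one 0/1 flag per known rule, followed by the total match count.

    With yara-python, the names could come from something like
    `[m.rule for m in rules.match(path)]` (assumed usage, not shown here).
    """
    matched = set(matched_rules)
    flags = [1 if rule in matched else 0 for rule in known_rules]
    # The count includes rules that are not in known_rules.
    return flags + [len(matched)]


row = matches_to_features(["packer_upx", "crypto_api_usage"])
print(row)  # [1, 0, 1, 2]
```

Keeping the column order fixed matters: the model sees the same rule in the same position for every file, whether it matched or not.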

⚠️ WARNING: IF YOU TRY TO REPLICATE WHAT I AM DOING, OR TRY TO TRAIN USING NEW OR OTHER SAMPLES ⚠️

  • BE VERY CAREFUL, AS YOU ARE DEALING WITH REAL AND LIVE MALWARE SAMPLES. MALWAREBAZAAR AND I ARE NOT RESPONSIBLE FOR ANY DAMAGE TO YOUR DEVICE, PC, FILES, ETC.
  • USE THE CMD "ren" COMMAND TO RENAME ALL SAMPLE EXTENSIONS TO SOMETHING ELSE, TO REDUCE THE RISK OF ACCIDENTAL EXECUTION (REPEAT FOR .dll AND .sys)
ren *.exe *.malware
  • Some of the YARA rules used by this project come from other public repos, people, and websites. Credit goes to their respective owners (check the YARA files for more information about them).
  • I am providing some basic preprocessing scripts in case some people need them.
  • The parameters I provide in train_xgboost.py come from the Optuna log of my own training run, so I recommend using train_xgboost_optuna.py if you want to use this with new samples or another dataset.
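For folders with mixed extensions, the renaming step above can also be done in one pass from Python. This is a minimal sketch, assuming the samples sit directly in one folder; the helper name `neutralize_extensions` and the `RISKY_SUFFIXES` set are illustrative, not part of this repository. It only renames files and never opens or executes them.

```python
from pathlib import Path

# Extensions that Windows may execute or load; taken from the warning above.
RISKY_SUFFIXES = {".exe", ".dll", ".sys"}


def neutralize_extensions(folder, new_suffix=".malware"):
    """Rename risky files in `folder` to `new_suffix` and return the new paths."""
    renamed = []
    # sorted() materializes the listing first, so renaming is safe mid-loop.
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in RISKY_SUFFIXES:
            target = path.with_suffix(new_suffix)
            if target.exists():
                continue  # never overwrite an existing sample
            path.rename(target)
            renamed.append(target)
    return renamed

# Example (hypothetical path): neutralize_extensions(r"D:\Path\to\samples")
```

Files whose new name would collide with an existing file are skipped instead of overwritten, so check the returned list against the folder afterwards.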

How to use?

  • mb_datalake.py
    Get your malware samples from MalwareBazaar's datalake by date (YYYY, YYYY-MM, or YYYY-MM-DD). You can add the --extract argument to make the script extract the zip files automatically.
script.py --start YYYY-MM-DD --end YYYY-MM-DD --out "D:\Path\to\save"
  • delete_invalid_and_dupe_file.py
    Use this script if you need help filtering out invalid and duplicate PE files. You can add the --workers argument to set the number of threads.
script.py "D:\path\to\Folder" 
  • extract.py
    Then use the extract script to extract all the necessary data and create the dataset we need. --capa and --blint are optional, experimental (WIP) arguments, but if you want to make your own model, feel free to use them as they are right now.
script.py --malware "C:\path\to\sample" --benign "C:\path\to\benign" --yara_rules "C:\path\to\yara_rules"
  • train_xgboost.py or train_xgboost_optuna.py
    Just run the script as a normal script. Don't forget to change the dataset.csv file name, though.
  • run.py
    Runs the model and tests its accuracy.
run.py --folder/--files "/Target/Folder/Files" --model "ransompyshield.pkl" --yara_rules "Path/to/Rules" --label "malware/benign"
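To make the filtering step in the list above more concrete, here is a minimal sketch of one way to spot invalid and duplicate PE files. It is an assumption about the general technique (a header check plus SHA-256 hashing), not a copy of delete_invalid_and_dupe_file.py. The helper names are hypothetical, and it only reports files instead of deleting them.

```python
import hashlib
from pathlib import Path


def looks_like_pe(data: bytes) -> bool:
    """Cheap sanity check: 'MZ' DOS header and a 'PE\\0\\0' signature at e_lfanew."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew (offset of the PE header) is stored at 0x3C, little-endian.
    pe_offset = int.from_bytes(data[0x3C:0x40], "little")
    return data[pe_offset:pe_offset + 4] == b"PE\0\0"


def find_invalid_and_dupes(folder):
    """Return (invalid, dupes): files failing the PE check and repeated content."""
    seen = {}  # sha256 hex digest -> first path seen with that content
    invalid, dupes = [], []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        data = path.read_bytes()  # loads the whole file; fine for a sketch
        if not looks_like_pe(data):
            invalid.append(path)
            continue
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            dupes.append(path)
        else:
            seen[digest] = path
    return invalid, dupes
```

The header check is only a heuristic. A full PE parser such as pefile or lief would reject more malformed files, at the cost of speed.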

Want to try the model?

RansomPyShield-Model

Proof & Information

As of now, this model has been tested against 1500+ real malware samples and 1500+ benign files. It was trained with 60k+ real malware samples and 60k+ benign files (130k+ samples in total, from MalwareBazaar and the Windows System, Program Files, and AppData directories).
I AM NOT GOING TO PROVIDE THE BENIGN FILES I USED; THE RISK OF COPYRIGHT PROBLEMS IS TOO HIGH.

Keep in mind that some files failed to be processed, so the actual counts may be lower.

  • Malware proof

  • Benign proof

  • Confusion Matrix
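For anyone who wants to reproduce a confusion matrix like the one listed above from their own test run, the sketch below counts the four cells and derives accuracy, precision, and recall. The function names are hypothetical, and the labels in the example are toy data, not results from this model.

```python
def confusion_counts(y_true, y_pred, positive="malware"):
    """Count true/false positives and negatives, treating `positive` as the hit class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


def summarize(c):
    """Derive accuracy, precision, and recall; empty denominators give 0.0."""
    total = c["tp"] + c["fp"] + c["tn"] + c["fn"]
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total if total else 0.0,
        "precision": c["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": c["tp"] / actual_pos if actual_pos else 0.0,
    }


# Toy labels for illustration only.
truth = ["malware", "malware", "benign", "benign"]
preds = ["malware", "benign", "benign", "benign"]
print(summarize(confusion_counts(truth, preds)))
# {'accuracy': 0.75, 'precision': 1.0, 'recall': 0.5}
```

For a malware classifier, recall (how many real malware files were caught) and the false positive count on benign files are usually worth reporting alongside plain accuracy.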

About

Malware Classifier Trained using XGBoost with Optuna and YARA Integration
