RansomPyShield-Model

This model is created and maintained by me. I "WAS" thinking of focusing on ransomware only, but I am having a hard time working out which files are real ransomware among thousands of malware files, and it is not worth the trouble and time to do it.

I want to say THANK YOU to abuse.ch for providing a free and easy way to get real malware samples, unlike some other sites out there (cough cough VirusTotal, ANY.RUN, vx-underground cough cough).

Q: What makes this ML model different from other public and free models out there?
A: This model integrates YARA rules, CAPA (WIP, experimental), and blint (WIP, experimental) as indicators, not just static PE information from lief, pefile, or similar libraries.

Q: Why? If you are using machine learning, why not just use the file metadata and structure itself, like other people do?
A: Some malware can spoof, hide, encrypt, or obfuscate most of its data, which makes extraction hard. Some models, or even "bad" AVs, can be bypassed or fooled this way.

Q: Why those tools?
A: With the help of YARA rules, CAPA, and blint we can extract other crucial data or indicators that the threat actor tries to hide, which may help the ML model decide whether a file is malicious or not.

Not just that, these tools can also extract data that would otherwise take too much time to extract, or that is hard for some people to extract themselves.
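To make the idea of "indicators next to static PE information" concrete, here is a minimal sketch of how YARA hits could be turned into model features. The rule names, the `KNOWN_RULES` list, and both helper functions are hypothetical; the real feature layout produced by extract.py may look different.

```python
from typing import Iterable, List

# Hypothetical rule names, for illustration only.
# The real rule set lives in the project's yara files.
KNOWN_RULES = ["suspicious_packer", "ransom_note_strings", "anti_debug"]


def yara_matches_to_features(matched: Iterable[str],
                             known_rules: List[str] = KNOWN_RULES) -> List[int]:
    """Return one 0/1 flag per known rule, in a fixed column order."""
    matched_set = set(matched)
    return [1 if rule in matched_set else 0 for rule in known_rules]


def build_feature_row(pe_features: List[float], matched: Iterable[str]) -> List[float]:
    """Append the YARA flags to whatever static PE features already exist."""
    flags = yara_matches_to_features(matched)
    return list(pe_features) + [float(flag) for flag in flags]


# Two made-up PE features plus one matching rule; unknown rules are ignored.
row = build_feature_row([6.8, 5.0], ["anti_debug", "unknown_rule"])
print(row)  # [6.8, 5.0, 0.0, 0.0, 1.0]
```

Keeping the rule order fixed matters: the model sees columns, not names, so the same rule has to land in the same column at training time and at prediction time.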

Q: So it doesn't have any weaknesses?
A: Good question. It does. Some threat actors can bypass it with a fileless attack vector, e.g. a CMD/PowerShell attack, which needs a different approach than static identification.

⚠️WARNING, IF YOU TRY TO REPLICATE WHAT I AM DOING OR TRY TO TRAIN USING NEW OR OTHER SAMPLES⚠️

BE VERY CAREFUL, AS YOU ARE DEALING WITH REAL AND LIVE MALWARE SAMPLES. MALWAREBAZAAR AND I ARE NOT RESPONSIBLE FOR ANY DAMAGE TO YOUR DEVICE, PC, FILES, ETC.
USE THE CMD "ren" COMMAND TO RENAME ALL THE SAMPLE EXTENSIONS TO SOMETHING ELSE TO REDUCE THE RISK OF ACCIDENTAL EXECUTION

ren *.exe *.malware
ren *.dll *.malware
ren *.sys *.malware
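If you would rather do the rename from Python (for example on a system without CMD), the same idea can be sketched as below. This is not part of the project; `defang_samples` and `RISKY_EXTENSIONS` are names made up for this example, and it only looks at files directly inside the given folder.

```python
from pathlib import Path

# Extensions that Windows would execute or load; extend as needed.
RISKY_EXTENSIONS = {".exe", ".dll", ".sys"}


def defang_samples(folder, new_ext=".malware"):
    """Rename risky files in `folder` so a double click cannot run them.

    Returns the list of new paths. Files whose new name already exists
    are skipped instead of being overwritten.
    """
    renamed = []
    # sorted() takes a snapshot, so renaming does not disturb the iteration.
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in RISKY_EXTENSIONS:
            target = path.with_suffix(new_ext)
            if target.exists():
                continue  # e.g. a.exe and a.dll would both become a.malware
            path.rename(target)
            renamed.append(target)
    return renamed


# Usage (the path is a placeholder):
# defang_samples(r"D:\Path\to\samples")
```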

Some of the YARA rules used by this project come from other public repos, people, and websites. Credit goes to their respective owners (check the yara files for more information about them).
I am providing some basic preprocessing scripts in case some people need them.
The parameters I provide in train_xgboost.py come from the Optuna log of my own training run, so I recommend using train_xgboost_optuna.py if you want to use this with new samples or another dataset.

How to use?

mb_datalake.py
Get your malware samples from MalwareBazaar's datalake (YYYY, YYYY-MM, or YYYY-MM-DD). You can add the --extract argument to make this script extract the zips automatically.

script.py --start YYYY-MM-DD --end YYYY-MM-DD --out "D:\Path\to\save"
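To get a feel for how many daily archives a --start/--end range covers before you download anything, a small sketch like this can list the days in the range. It does not talk to MalwareBazaar at all, `days_between` is a made-up helper, and how mb_datalake.py walks the range internally is an assumption here.

```python
from datetime import date, timedelta


def days_between(start: str, end: str):
    """Return every day from start to end (inclusive) as YYYY-MM-DD strings."""
    first = date.fromisoformat(start)
    last = date.fromisoformat(end)
    if last < first:
        raise ValueError("end date is before start date")
    span = (last - first).days
    return [(first + timedelta(days=i)).isoformat() for i in range(span + 1)]


print(days_between("2024-02-28", "2024-03-01"))
# ['2024-02-28', '2024-02-29', '2024-03-01']
```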

delete_invalid_and_dupe_file.py
Use this script if you need help filtering out invalid and duplicate PE files. You can add the --workers argument to set the number of threads.

script.py "D:\path\to\Folder"
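For anyone curious what "invalid" and "dupe" can mean in practice, here is a rough single-threaded sketch. It assumes "invalid" means "does not have the MZ and PE signatures" and "dupe" means "same SHA-256"; the real script may use different checks. All function names are made up for this example, and it only reports files, it does not delete anything.

```python
import hashlib
from pathlib import Path


def looks_like_pe(path):
    """Cheap check: 'MZ' magic, then 'PE\\0\\0' at the offset stored at 0x3C."""
    with open(path, "rb") as f:
        header = f.read(64)
        if len(header) < 64 or header[:2] != b"MZ":
            return False
        pe_offset = int.from_bytes(header[60:64], "little")  # e_lfanew
        f.seek(pe_offset)
        return f.read(4) == b"PE\x00\x00"


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so large samples do not fill the memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_invalid_and_duplicates(folder):
    """Return (invalid, duplicates) as two lists of paths."""
    seen = {}
    invalid, duplicates = [], []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        if not looks_like_pe(path):
            invalid.append(path)
            continue
        file_hash = sha256_of(path)
        if file_hash in seen:
            duplicates.append(path)  # the first file with this hash is kept
        else:
            seen[file_hash] = path
    return invalid, duplicates
```

The signature check is only a quick filter. A file can pass it and still be a broken PE that a parser such as lief or pefile rejects later.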

extract.py
Then use the extract script to extract all the necessary data and create the dataset we need. --capa and --blint are optional, experimental (WIP) arguments, but if you want to make your own model, feel free to use them as they are right now.

script.py --malware "C:\path\to\sample" --benign "C:\path\to\benign" --yara_rules "C:\path\to\yara_rules"

train_xgboost.py or train_xgboost_optuna.py

Just run the script like a normal script. Don't forget to change the dataset.csv file name though.

run.py
Runs the model and tests its accuracy.

run.py --folder/--files "/Target/Folder/Files" --model "ransompyshield.pkl" --yara_rules "Path/to/Rules" --label "malware/benign"
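If you want to double check the numbers yourself, the usual metrics can be derived from the four confusion matrix counts, as in the sketch below. Here "positive" is assumed to mean "malware", `classification_metrics` is a made-up helper, and the counts in the example are placeholders, not results from this model.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F1 from confusion matrix counts.

    tp: malware flagged as malware      fp: benign flagged as malware
    tn: benign flagged as benign        fn: malware flagged as benign
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Placeholder counts, for illustration only.
metrics = classification_metrics(tp=8, fp=2, tn=9, fn=1)
print(metrics["accuracy"])   # 0.85
print(metrics["precision"])  # 0.8
```

Accuracy alone can look good even when the model misses malware, so it is worth looking at recall (how much malware was caught) and precision (how many alarms were real) as well.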

Wanna try the model???:

RansomPyShield-Model

Proof & Information

As of now, this model has been tested against 1500+ real malware samples and 1500+ benign files. It was trained with 60k+ real malware samples and 60k+ benign files (130k+ samples in total, from MalwareBazaar and the Windows System, Program Files, and AppData directories).
I AM NOT GOING TO PROVIDE THE BENIGN FILES THAT I AM USING, IT IS TOO RISKY AND COULD CAUSE A COPYRIGHT PROBLEM

Do remember that some files failed to be processed, so the actual numbers can be lower.

Malware proof

Benign proof

Confusion Matrix

