ExeRay 2.0 🏥

Advanced X-ray Vision for Windows Executables

Detect malicious .exe files using machine learning. Extracts static features (entropy, imports, metadata) and combines ML with heuristic rules for fast, automated classification.

🚀 What's New in v2.0

50+ New Detection Features (VM detection, anti-debugging, API call chains)
Enhanced Prediction Engine with detailed suspicious behavior reports if malware found
Recall-Optimized Training: Custom scorer prioritizing malware detection
Streamlined 3-Script Architecture (faster workflow)
Improved Accuracy (F1-score up to 0.99 in testing)
Dataset Provided !!

📊 Dataset Information

- Source & Composition:

Dataset	From	Examples	Total
Malicious Dataset	DikeDataset, theZoo, MalwareBazaar	WannaCry.exe, njRAT.exe	10,925
Benign Dataset	Windows Files, Ninite.com, PortableApps.com	Putty.exe, notepad.exe, ida.exe	3,590
Total			14,515

Dataset Processing: From 10,925 malware samples, we processed 4,200 for feature extraction, then applied Undersampling to balance with 3,500 benign samples (7,000 total). Used RandomUnderSampler (random_state=42) to prevent malware bias while preserving key patterns.

⚙️ Enhanced Features

Hybrid AI detection (XGBoost + Random Forest)
Detailed Malware Fingerprinting:
- VM/Sandbox detection markers
- Anti-debugging technique identification
- Suspicious API call patterns
Confidence Scoring with threat level classification

🔧 Upgraded Tech Stack

New Components:

Advanced PE Analysis: Full directory parsing (TLS, Debug, Resources)
String Analysis: Unicode/ASCII pattern detection
Behavioral Indicators: 15+ new malware behavior signatures

Key Improvements:

- Structural Features:

# PE File Structure
'num_sections', 
'num_unique_sections',
'section_names_entropy',
'avg_section_size',
'min_section_size',
'max_section_size',
'total_section_size',
'avg_entropy',
'min_entropy',
'max_entropy',
'has_packed_sections',
'has_executable_sections',
'writable_executable_sections',
'is_dll',
'is_executable',
'is_system_file',
'has_aslr',
'has_dep',
'is_signed',
'has_rich_header',
'rich_header_entries',
'has_resources',
'num_resources',
'has_embedded_exe',
'has_debug',
'has_tls',
'has_relocations',
'ep_in_first_section',
'ep_in_last_section',
'ep_section_entropy',
'has_suspicious_sections'

- Behavioral Features:

# API/Import Analysis
'num_imports',
'num_unique_dlls',
'num_unique_imports',
'imports_to_dlls_ratio',
'has_import_name_mismatches',
'suspicious_imports_count',
'num_exports',
'suspicious_exports',
'suspicious_api_chains',
'has_delayed_imports',
'has_vm_detection_imports',
'has_anti_debug_imports',
'has_process_creation_imports',
'has_createprocess',
'has_setwindowshookex',

# String Patterns
'num_strings',
'avg_string_length',
'has_suspicious_strings',
'has_anti_debug',
'has_vm_detection_strings',
'has_vm_mac_addresses',
'has_anti_debug_strings',
'has_nop_sleds',
'has_anti_debug_strings'

- Detection Signatures:

vm_detection_strings = {
    b'vbox', b'vmware', b'virtualbox', b'qemu', b'xen', b'hypervisor',
    b'virtual machine', b'vmcheck', b'vboxguest', b'vboxsf', b'vboxvideo'
}

vm_mac_prefixes = {
    b'00:0C:29', b'00:1C:14', b'00:05:69', b'00:50:56',  # VMware
    b'08:00:27',  # VirtualBox
    b'00:16:3E',  # Xen
    b'00:1C:42',  # Parallels
    b'00:15:5D'   # Hyper-V
}

anti_debug_strings = {
    b'IsDebuggerPresent', b'CheckRemoteDebuggerPresent', b'OutputDebugString',
    b'NtQueryInformationProcess', b'NtSetInformationThread', b'ZwSetInformationThread'
}

suspicious_patterns = {
    b'payload', b'malware', b'inject', b'virus', b'trojan',
    b'backdoor', b'rat', b'worm', b'spyware', b'keylog',
    b'xored', b'encrypted', b'packed', b'obfus'
}

# API Groups
vm_detection_apis = {
    'cpuid', 'hypervisor', 'vmcheck', 'vbox', 'vmware', 'virtualbox',
    'wine_get_unix_file_name', 'wine_get_dos_file_name'
}

anti_debug_apis = {
    'IsDebuggerPresent', 'CheckRemoteDebuggerPresent', 'OutputDebugStringA',
    'NtQueryInformationProcess', 'NtSetInformationThread', 'NtQuerySystemInformation',
    'GetTickCount', 'QueryPerformanceCounter', 'RDTSC', 'GetProcessHeap',
    'ZwSetInformationThread', 'DbgBreakPoint', 'DbgUiRemoteBreakin'
}

process_creation_apis = {
    'CreateProcessA', 'CreateProcessW', 'CreateProcessAsUserA', 'CreateProcessAsUserW',
    'SetWindowsHookExA', 'SetWindowsHookExW', 'ShellExecuteA', 'ShellExecuteW',
    'WinExec', 'System'
}

# Suspicious API Chains
api_sequences = {
    ('VirtualAlloc', 'WriteProcessMemory', 'CreateRemoteThread'): 'Process Injection',
    ('RegCreateKey', 'RegSetValue', 'RegCloseKey'): 'Registry Persistence',
    ('LoadLibraryA', 'GetProcAddress', 'VirtualProtect'): 'Dynamic API Resolution',
    ('OpenProcess', 'ReadProcessMemory', 'WriteProcessMemory'): 'Process Hollowing',
    ('NtUnmapViewOfSection', 'MapViewOfFile', 'ResumeThread'): 'RunPE Technique',
    ('CreateProcessA', 'WriteProcessMemory', 'ResumeThread'): 'Process Injection',
    ('SetWindowsHookExA', 'GetMessage', 'DispatchMessage'): 'Hook Injection'
}

📁 Directory Structure

ExeShield_AI/
├── assets/                      # Repo Images
├── data/                        # Raw Samples  
│   ├── malware/                 # Malicious Executables  
│   └── benign/                  # Clean Executables
├── dependencies/                # Installation Dependencies
├── models/                      # Saved Models/Thresholds  
│   ├── malware_detector.joblib  
│   └── optimal_threshold.npy  
├── output/                      # Processed Data (CSV/features)
│   └── processed_features_dataset.csv
├── scripts/                     # Core Scripts  
│   ├── extract_features.py  
│   ├── train_model.py  
│   └── predict.py  
└── README.md

💻 Installation and Usage (Commands & Outputs)

1. Clone the repository:

git clone https://github.com/MohamedMostafa010/ExeRay.git
cd ExeRay

2. Install dependencies:

pip install -r dependencies/requirements.txt

3. Extract Features:

> python extract_features.py
[*] Processing benign samples from ../data\benign...
[!] Not a valid PE file: adaminstall.exe
[!] Not a valid PE file: adamsync.exe
[!] Not a valid PE file: AddSuggestedFoldersToLibraryDialog.exe
[!] Not a valid PE file: AgentService.exe
[!] Not a valid PE file: AggregatorHost.exe
[!] Not a valid PE file: appcmd.exe
[!] Not a valid PE file: AppHostRegistrationVerifier.exe
[!] Not a valid PE file: ApplySettingsTemplateCatalog.exe
[!] Not a valid PE file: ApplyTrustOffline.exe
[!] Not a valid PE file: ApproveChildRequest.exe
[!] Not a valid PE file: AppVClient.exe
[!] Not a valid PE file: ARPPRODUCTICON.exe
[!] Not a valid PE file: audit.exe
[!] Not a valid PE file: AuditShD.exe
[!] Not a valid PE file: autofstx.exe
...
[*] Processing malware samples (limited to 3500) from ../data\malware...

[+] Processed Features Dataset saved to ../output/processed_features_dataset.csv
[+] Total samples: 6857
[+] Malware samples: 3500
[+] Benign samples: 3357

4. Train Model (Metrics Also Provided After Training to Know Your Model's Performance):

> python train_model.py
Training models:   0%|                                                                                                                                                 | 0/2 [00:00<?, ?it/s]
New best model: XGBoost (Recall=0.990)
Training models: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:07<00:00,  3.90s/it]

=== Evaluation ===
              precision    recall  f1-score   support

           0       0.99      0.99      0.99       672
           1       0.99      0.99      0.99       700

    accuracy                           0.99      1372
   macro avg       0.99      0.99      0.99      1372
weighted avg       0.99      0.99      0.99      1372

ROC AUC: 1.000

Model saved to ../models/malware_detector.joblib

5. Predict Executable:

> python predict.py "path/to/[benign_file]"
Malware Detection Results:
========================================
File: CapCut.exe
Prediction: BENIGN
Malware Probability: 0.41%
Confidence Level: VERY_LOW
Decision Threshold: 35.93%

> python predict.py "path/to/[suspicious_file]"
Malware Detection Results:
========================================
Malware Detection Results:
========================================
File: Mh1.exe
Prediction: MALWARE
Malware Probability: 98.39%
Confidence Level: VERY_HIGH
Decision Threshold: 35.93%

Top Suspicious Features:
- High maximum section entropy: 7.887
- Suspicious API imports: 6
- High average section entropy: 5.743
- High section name entropy: 2.807
- Writable and executable sections: 2
- Suspicious API call chains: 1
- Anti-debugging API imports: 1
- Anti-debugging strings in code: 1

🔍 Handling False Positives

While ExeShield AI achieves high accuracy, occasional false positives (legitimate files flagged as malware) may occur. Common causes:
- Legitimate tools with behaviors resembling malware (e.g., putty.exe).
- Packed/obfuscated benign files (high entropy).

- Example False Positive Output (Below is an example of a malicious file (1.exe) predicted as BENIGN:

> python predict.py python predict.py "C:\Users\[USERNAME]\Documents\Maicious_TEST\1.exe"
Malware Detection Results:
========================================
File: 1.exe
Prediction: BENIGN
Malware Probability: 1.56%
Confidence Level: VERY_LOW
Decision Threshold: 35.93%

Mitigation Strategies:

Adjust Threshold:
- Lower the decision threshold in predict.py for stricter filtering
Whitelist Trusted Files:
- Manually verify and exclude known-safe executables.
Retrain the Model:
- Add misclassified samples to your dataset and rerun train_model.py.

🤝 ROC Curve Analysis

Model Performance Metrics:

🟦 Our Model (AUC = 1.00): Perfect classification capability
🟥 Random Guess (AUC = 0.5): Baseline for comparison
📊 Optimal Threshold: 36% (0.36 in probability units)

Key Interpretation:

X-axis (False Positive Rate): Lower values = fewer false alarms
Y-axis (True Positive Rate): Higher values = more malware caught
Perfect Score: Curve touching top-left corner (achieved in our case)

🤝 Contributing

Pull requests are welcome! If you have ideas for new user profiles, simulation modes, or forensic artifacts, feel free to contribute.

📖 License

This project is released under the MIT License.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

ExeRay 2.0 🏥

🚀 What's New in v2.0

📊 Dataset Information

⚙️ Enhanced Features

🔧 Upgraded Tech Stack

New Components:

Key Improvements:

- Structural Features:

- Behavioral Features:

- Detection Signatures:

📁 Directory Structure

💻 Installation and Usage (Commands & Outputs)

1. Clone the repository:

2. Install dependencies:

3. Extract Features:

4. Train Model (Metrics Also Provided After Training to Know Your Model's Performance):

5. Predict Executable:

🔍 Handling False Positives

Mitigation Strategies:

🤝 ROC Curve Analysis

🤝 Contributing

📖 License

About

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
assets		assets
data		data
dependencies		dependencies
models		models
output		output
scripts		scripts
LICENSE		LICENSE
README.md		README.md

License

MohamedMostafa010/ExeRay

Folders and files

Latest commit

History

Repository files navigation

ExeRay 2.0 🏥

🚀 What's New in v2.0

📊 Dataset Information

⚙️ Enhanced Features

🔧 Upgraded Tech Stack

New Components:

Key Improvements:

- Structural Features:

- Behavioral Features:

- Detection Signatures:

📁 Directory Structure

💻 Installation and Usage (Commands & Outputs)

1. Clone the repository:

2. Install dependencies:

3. Extract Features:

4. Train Model (Metrics Also Provided After Training to Know Your Model's Performance):

5. Predict Executable:

🔍 Handling False Positives

Mitigation Strategies:

🤝 ROC Curve Analysis

🤝 Contributing

📖 License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Languages