🤖 An AI deobfuscator that translates minified/obfuscated JavaScript back into human-readable code.
Have you ever viewed the source of a website, only to be met by a wall of meaningless, machine-generated code?
```js
function _0x5dcf(_0x1a2b, _0x3c4d) {
    var _0x7e8f = _0x1a2b["data"][0];
    var _0x9b1a = _0x1a2b["key"];
    if (_0x3c4d > _0x7e8f) {
        for (var _0x5f8d = 0; _0x5f8d < _0x3c4d; _0x5f8d++) {
            console.log(_0x9b1a + _0x5f8d);
        }
    }
    return _0x7e8f;
}
```

This is a nightmare for debugging and analysis. Readable.ai is the answer.
Our mission is simple: We use the power of AI (LLMs) to reverse-engineer this digital nightmare, restoring logic and human-readable meaning to obfuscated code.
We treat this as a "machine translation" problem:
- Source Language: "Minified" or "obfuscated" code.
- Target Language: The clean, original code.
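For illustration only, a single training pair in this framing could be represented as the record below. The field names are hypothetical (not the dataset's actual schema), and the code strings are truncated versions of the example used throughout this README.

```python
# Hypothetical shape of one training record (field names and truncation are illustrative only).
training_pair = {
    # Source language: the obfuscated function shown above (truncated).
    "input": 'function _0x5dcf(_0x1a2b, _0x3c4d){var _0x7e8f=_0x1a2b["data"][0]; /* ... */ return _0x7e8f;}',
    # Target language: its human-readable equivalent (truncated).
    "output": 'function checkThreshold(config, limit){var firstValue = config["data"][0]; /* ... */ return firstValue;}',
}
```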
Our goal is to turn the cryptic block above into its logical equivalent:
```js
function checkThreshold(config, limit) {
    var firstValue = config["data"][0];
    var prefixKey = config["key"];
    if (limit > firstValue) {
        for (var index = 0; index < limit; index++) {
            console.log(prefixKey + index);
        }
    }
    return firstValue;
}
```

We aren't training a model from scratch. We are fine-tuning a powerful, pre-existing model to specialize in this one task.
- Base Model: TinyLlama/TinyLlama-1.1B-Chat-v1.0 (a 1.1-billion-parameter chat model).
- Training Technique: LoRA (Low-Rank Adaptation), a Parameter-Efficient Fine-Tuning (PEFT) technique. It allows us to "teach" the base model a new skill by training only a tiny fraction (< 2%) of its parameters (see the sketch after this list).
- Fuel (The Dataset): The model is trained on a large-scale, custom-built dataset of real-world, obfuscated-to-clean code pairs, currently spanning hundreds of thousands of samples.
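As a rough illustration of the PEFT setup, a LoRA configuration with the peft library might look like the sketch below. The hyperparameters (rank, alpha, target modules, dropout) are assumptions for the example, not the project's exact training configuration.

```python
# Minimal LoRA sketch with peft; hyperparameter values are assumed, not the project's actual settings.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

lora_config = LoraConfig(
    r=16,                                 # low-rank dimension (assumed)
    lora_alpha=32,                        # LoRA scaling factor (assumed)
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt (assumed)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # prints trainable vs. total parameters
```

Calling print_trainable_parameters() is a quick way to confirm that only a small fraction of the weights is actually trainable.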
This project is trained in multiple stages due to dataset size and compute limitations:
- Stage 1 (In Progress): Train on the first 500,000 samples of the dataset.
  - Result: adapter_v1 (initial learning on a substantial portion of the data)
- Stage 2 (Upcoming): Load adapter_v1 and continue training on the next segment of the dataset (e.g., samples 500,001 to 1,000,000).
  - Result: adapter_v2
- Stage 3 (Upcoming): Load adapter_v2 and train on the subsequent segment, repeating until the full dataset is processed.
  - Result: adapter_vX (the final adapter after processing all data)
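To make the staged hand-off concrete, a later stage can resume from the previous stage's adapter roughly as sketched below. The adapter paths are placeholders, and the actual training loop (e.g., a transformers Trainer run over the next dataset segment) is omitted.

```python
# Sketch: continue training from the previous stage's adapter (paths are placeholders).
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

BASE_MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID, torch_dtype=torch.float16)

# is_trainable=True keeps the loaded adapter weights trainable, so Stage 2 continues from Stage 1.
model = PeftModel.from_pretrained(base_model, "adapters/adapter_v1", is_trainable=True)

# ... run the usual fine-tuning loop here on the next slice of the dataset ...

# Save only the updated adapter; the base model itself is never modified.
model.save_pretrained("adapters/adapter_v2")
```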
Your fine-tuned model consists of two parts: the base model (TinyLlama-1.1B-Chat-v1.0) and the adapter (your trained knowledge). To run inference, you must load the base model, then apply your adapter on top of it.
This is the standard 5-step procedure:
Ensure you have the necessary libraries installed:
```bash
pip install transformers peft accelerate torch
```

First, load the original TinyLlama/TinyLlama-1.1B-Chat-v1.0 model from Hugging Face. This is the foundation your adapter will be applied to.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL_ID = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    torch_dtype=torch.float16,
    device_map="auto",
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)

# TinyLlama uses an explicit chat template. We set pad_token for consistency.
tokenizer.pad_token = tokenizer.eos_token
```

Next, using PeftModel, load your trained weights (from the adapter_model.safetensors file) and "graft" them onto the base model.
```python
# Provide the path to your trained adapter directory
ADAPTER_PATH = "/path/to/your/trained_adapter/"

model = PeftModel.from_pretrained(base_model, ADAPTER_PATH)
```

This is a critical optimization step. The merge_and_unload() command permanently fuses the adapter's weights into the base model. This creates a single, highly efficient model and significantly speeds up inference.

```python
model = model.merge_and_unload()
print("✅ Adapter merged. Model is ready!")
```

The model is now ready. You must format your request as a prompt that matches the structure the model was trained on, then call model.generate().
```python
# 1. Define the obfuscated code
obfuscated_code = 'function _0x5dcf(_0x1a2b, _0x3c4d){var _0x7e8f=_0x1a2b["data"][0];var _0x9b1a=_0x1a2b["key"];if(_0x3c4d>_0x7e8f){for(var _0x5f8d=0;_0x5f8d<_0x3c4d;_0x5f8d++){console.log(_0x9b1a+_0x5f8d);}}return _0x7e8f;}'

# 2. Format the prompt
# **NOTE:** The prompt structure must match the one used during training.
prompt = f"### Input:\n{obfuscated_code}\n\n### Output:\n"

# 3. Tokenize and run generation
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=150,
    temperature=0.7,
    do_sample=True,
)

# 4. Decode the result
response_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print("--- Deobfuscation Result ---")
print(response_text)
```
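Note that the decoded text still contains the prompt, because generate() returns the prompt tokens followed by the new tokens. If your training data used the ### Input / ### Output structure shown above, a simple (assumed) post-processing step can isolate the model's answer:

```python
# Keep only the text after the "### Output:" marker (assumes the prompt format shown above).
deobfuscated_code = response_text.split("### Output:")[-1].strip()
print(deobfuscated_code)
```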
Understanding the resource needs is critical. Here is the breakdown based on our tests.

- Training (LoRA): The fine-tuning process is highly efficient. By using LoRA, fp16, and an efficient optimizer, a training session (like Stage 1 or 2) requires approximately 10-12 GB of VRAM. This is generally suitable for a high-end Colab GPU (such as an A100 or V100) or a mid-range local GPU.
- Inference (After Merging): This is the key benefit. After using merge_and_unload() (Step 4), the adapter is fused into the base model.
  - The final merged model (Base + Adapter) requires the exact same VRAM as the original TinyLlama-1.1B base model (approx. 2.4 GB in fp16).
  - You pay no extra VRAM cost for the adapter's new knowledge.
  - Inference speed is also identical to the original base model, as it's a single, unified model.
- Portability: The merged model can be saved and deployed as a standard transformers model, without needing the peft library for inference, as shown in the sketch below.
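For example, after merge_and_unload() the fused model can be saved and reloaded like any ordinary transformers checkpoint; the output directory name below is just a placeholder.

```python
# Save the merged model as a standard transformers checkpoint (directory name is a placeholder).
MERGED_DIR = "readable-ai-merged"
model.save_pretrained(MERGED_DIR)
tokenizer.save_pretrained(MERGED_DIR)

# Reload later with transformers alone; no peft import is required.
from transformers import AutoModelForCausalLM, AutoTokenizer

reloaded_model = AutoModelForCausalLM.from_pretrained(MERGED_DIR, torch_dtype=torch.float16, device_map="auto")
reloaded_tokenizer = AutoTokenizer.from_pretrained(MERGED_DIR)
```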
If you find this project useful, want to support server costs, or just want to buy me a coffee, donations are appreciated.
BSC (Binance Smart Chain) Address:
0x1f7fa6d01f02583b48e0343a9e42cbd408ef3bfb
This project is licensed under the MIT License.