Local-first LoRA fine-tuning Studio with web UI
Windows-first • No Docker required • Runs on your machine
NamelyCorp LLM Studio is an end-to-end system for building document-grounded fine-tuned language models using Low-Rank Adaptation (LoRA). It provides a complete workflow from document ingestion to deployable models—all running locally on your machine.
Key Features:
- 📄 Dataset Builder — Generate Q&A training pairs from PDF, DOCX, TXT, CSV, and XLSX files
- ✅ Validation — Check dataset quality, remove duplicates, validate token counts, and filter low-quality entries
- ⚡ LoRA Training — Fine-tune with masked loss (assistant tokens only), configurable parameters, and training presets
- 📦 Export — Save LoRA adapters, merge full models, optional GGUF conversion for llama.cpp
- 🧪 Testing — Smoke-test inference locally before deployment
- 🖥️ Web UI — Intuitive dashboard for the complete workflow with real-time GPU monitoring
Privacy: This project is designed to run locally. No telemetry is included. Your documents and training data stay on your machine unless you choose to share them.
Screenshots (see assets/):
- Dashboard: overview with project statistics, quick actions, and system status monitoring
- Dataset Builder: generate Q&A pairs from your documents with OCR support
- Validation: clean and validate your training data
- Training: LoRA fine-tuning with configurable parameters and presets
Who It's For:
- ML engineers and technical practitioners
- Developers training models on their own documents
- Teams who want repeatable, auditable fine-tuning workflows
- Windows users with NVIDIA GPUs (RTX 30/40 series recommended)
This is not a hosted service and does not include base model weights or trained artifacts.
Requirements:
- Operating System: Windows 11 recommended (Windows 10 also supported)
- Python: Python 3.11 (the Python launcher `py` must be in PATH)
- Disk Space: 50GB+ recommended (for models, datasets, and outputs)
- GPU: NVIDIA GPU with CUDA support (RTX 3060 or higher recommended)
- RTX 30/40 series with 8GB+ VRAM strongly recommended
- CPU-only training is possible but very slow
- CUDA Drivers: CUDA 12.1 compatible drivers
- RAM: 16GB+ system memory
- Tesseract OCR: For processing scanned PDFs
- Download from: https://github.com/UB-Mannheim/tesseract/wiki
- Add to PATH after installation
Clone the repository:

```
git clone https://github.com/NamelyCorp/NamelyCorp-LLM-Studio.git
cd NamelyCorp-LLM-Studio
```

Run the setup script to create a virtual environment and install core dependencies (including CUDA PyTorch):

```
setup_llm.bat
```

This will:

- Create `.venv` with Python 3.11
- Install CUDA 12.1 PyTorch and core libraries
- Install document processing and OCR dependencies
- Create a `docs/` folder for your documents
Activate the virtual environment.

PowerShell:

```
.\.venv\Scripts\Activate.ps1
```

Command Prompt:

```
.venv\Scripts\activate.bat
```

Log in to Hugging Face (needed to download gated base models such as Llama):

```
huggingface-cli login
```

Then set up and launch the Studio UI:

```
cd studio
setup_studio.bat
launch_studio.bat
```

The Studio UI will be available at: http://localhost:7860
The Studio guides you through a complete fine-tuning pipeline:
- Dashboard — View project statistics, system status, and quick actions
- Dataset Builder — Generate Q&A pairs from your documents
- Validation — Clean and validate your training data
- Training — Fine-tune with LoRA using configurable parameters
- Export & Test — Export adapters, merge models, convert to GGUF, and test inference
Dashboard:
- Project Statistics: Document count, dataset size, trained models
- Quick Actions: One-click shortcuts to common tasks
- System Status: Real-time GPU monitoring, CUDA toolkit status, Tesseract OCR availability, Python environment
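For reference, real-time GPU stats like these can be read through NVML; below is a minimal sketch using the `pynvml` package. This is an assumption about tooling, not necessarily how `studio/app.py` gathers its numbers.

```python
import pynvml

# Illustrative only; the Studio backend may collect GPU stats differently.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
util = pynvml.nvmlDeviceGetUtilizationRates(handle)
print(f"VRAM: {mem.used / 2**30:.1f} / {mem.total / 2**30:.1f} GiB, GPU util: {util.gpu}%")
pynvml.nvmlShutdown()
```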
Generate Q&A training pairs from your source documents.
Supported Input Formats:
- PDF (with optional OCR for scanned documents)
- DOCX (Microsoft Word)
- TXT (plain text)
- CSV (comma-separated values)
- XLSX (Microsoft Excel)
Configuration:
- Documents Directory: Path to your source documents
- Output CSV Path: Where to save the generated dataset
- Base Model: Model to use for tokenization
- Chunk Size: Characters per document chunk (default: 1400)
- Max Tokens: Maximum token length (default: 1024)
- OCR Configuration: Enable for scanned PDFs, select language
Process:
- Place your documents in the `docs/` folder
- Configure parameters in the UI
- Click "Build Dataset"
- The generated dataset is saved as CSV with `question` and `answer` columns (see the quick check below)
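As a quick sanity check of the output, you can load the CSV and inspect the two columns. A minimal sketch; `data_qa.csv` is a placeholder for whatever output path you configured:

```python
import csv

# Confirm the generated dataset has the expected question/answer columns.
with open("data_qa.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

print(f"{len(rows)} Q&A pairs")
print(rows[0]["question"])
print(rows[0]["answer"][:100])
```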
Check dataset quality and optionally clean problematic entries.
Validations Performed:
- Header validation (correct column names)
- Duplicate detection (exact duplicates)
- Token count validation (within max_tokens limit)
- Low-quality entry detection (very short or malformed content)
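Conceptually, these checks reduce to a few lines. An illustrative sketch only; `validate_qa.py` is the authoritative implementation, and the model path, file name, and length threshold here are placeholders:

```python
import csv
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(r"C:\models\Llama-3.2-3B-Instruct")

with open("data_qa.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))  # header check: 'question'/'answer' keys must exist

seen, kept = set(), []
for row in rows:
    pair = (row["question"].strip(), row["answer"].strip())
    if pair in seen:                                            # exact-duplicate detection
        continue
    if len(tok(pair[0] + "\n" + pair[1])["input_ids"]) > 1024:  # token-count validation
        continue
    if len(pair[1]) < 20:                                       # low-quality: very short answer
        continue
    seen.add(pair)
    kept.append(row)

print(f"kept {len(kept)} of {len(rows)} rows")
```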
Configuration:
- Input CSV Path: Dataset to validate
- Model: Model to use for token counting
- Max Tokens: Token limit for validation
- Write Cleaned CSV: Optionally output cleaned dataset
- Cleaned Output Path: Where to save the cleaned dataset
Output:
- Markdown report with statistics and issues found
- Optional cleaned CSV with invalid entries removed
Fine-tune your model using LoRA (Low-Rank Adaptation).
Training Presets:
- Quick Test: Fast training with minimal resources (2 epochs, batch size 2)
- Balanced: Good balance of quality and speed (default settings)
- High Quality: More thorough training for production use
Training Parameters:
- Epochs: Number of training passes through the data
- Batch Size: Samples per batch
- Gradient Accumulation: Steps to accumulate gradients (effective batch = batch_size × grad_accum)
- Learning Rate: Step size for optimization (default: 1e-5)
- Max Sequence Length: Maximum token length
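For example, batch size 2 with gradient accumulation 16 (the values used in the CLI example later in this README) gives an effective batch size of 2 × 16 = 32.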
LoRA Configuration:
- Rank (r): LoRA rank parameter (default: 16)
- Alpha: LoRA alpha scaling (default: 32)
- Dropout: LoRA dropout rate (default: 0.05)
- Include MLP layers: Target both attention and MLP layers (recommended for content learning)
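These settings correspond directly to a PEFT `LoraConfig`. A minimal sketch; the target module names assume a Llama-style architecture and are illustrative, not copied from `train_ft.py`:

```python
from peft import LoraConfig

# Illustrative only: train_ft.py constructs its own configuration.
lora_config = LoraConfig(
    r=16,                 # LoRA rank
    lora_alpha=32,        # alpha scaling
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP layers (include for content learning)
    ],
    task_type="CAUSAL_LM",
)
```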
Post-Training:
- Merge & save full model: Combine LoRA adapters with base model for standalone deployment
- Output directory contains both adapters and merged model (if enabled)
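Under the hood, merging with PEFT looks roughly like this. A sketch with placeholder paths; `train_ft.py --merge-full` performs the equivalent step for you:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Placeholder paths; adjust to your model and output locations.
base = AutoModelForCausalLM.from_pretrained(r"C:\models\Llama-3.2-3B-Instruct")
model = PeftModel.from_pretrained(base, "ft_out/adapter")  # adapter dir is hypothetical
merged = model.merge_and_unload()                          # fold LoRA weights into the base
merged.save_pretrained("ft_out/merged")
```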
Export trained models and test inference.
Export Options:
- LoRA Adapters: Lightweight adapter files (~100MB)
- Merged Full Model: Combined base model + adapters (~7GB for 3B models)
- GGUF Conversion: Convert to GGUF format for llama.cpp/CPU inference
Testing:
- Run quick inference tests to verify model quality
- Test on sample questions from your domain
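A minimal smoke test with Transformers might look like this. A sketch with placeholder paths and prompt; `test_inference.py` is the supported entry point:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: point at the merged model directory from training.
model_dir = "ft_out/merged"
tok = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto")

prompt = "What does our travel policy say about per diems?"  # sample domain question
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
```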
For automation or scripting, all functionality is available via command-line scripts:
Build a dataset:

```
python make_dataset_from_docs.py
```

Configuration is in the script. Edit paths for your setup.

Validate and clean it:

```
python validate_qa.py --write-clean data_qa.cleaned.csv --report qa_report.md
```

Train:

```
python train_ft.py --merge-full
```

Common Training Options:

```
python train_ft.py \
  --csv data_qa.cleaned.csv \
  --model "C:\models\Llama-3.2-3B-Instruct" \
  --output-dir ft_out \
  --epochs 3 \
  --batch-size 2 \
  --grad-accum 16 \
  --learning-rate 1e-5 \
  --lora-r 16 \
  --lora-alpha 32 \
  --merge-full
```

Test inference:

```
python test_inference.py
```

Outputs are written to the `ft_out/` directory (adapters, merged models, logs).
This repository intentionally does not include:
- Base Model Weights: You must download base models separately (e.g., Llama 3.2 from Hugging Face)
- Training Data: You must provide your own documents
- Trained Adapters/Models: No pre-trained fine-tuned models are included
- Model Licenses: You are responsible for complying with base model licenses (e.g., Meta LLaMA)
Why?
- Base models are typically 5-10GB+ and subject to their own licenses
- Training data is domain-specific and private to your use case
- This keeps the repository lightweight and license-compliant
LoRA (Low-Rank Adaptation) is an efficient fine-tuning method that:
- Updates only ~0.1-1% of model parameters (vs. 100% in full fine-tuning)
- Requires significantly less memory and training time
- Produces lightweight adapter files (~100MB vs. multi-GB full models)
- Can be merged back into the base model for standalone deployment
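In matrix terms, LoRA freezes a pretrained weight matrix W and learns only a low-rank update:

$$W' = W + \frac{\alpha}{r} BA, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)$$

Only A and B are trained, so the trainable parameter count scales with the rank r (the Rank and Alpha settings in the Training tab) rather than with the full weight dimensions.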
When to use LoRA fine-tuning:
- You have domain-specific documents (policies, FAQs, technical docs)
- You want the model to answer from your content authoritatively
- You need reproducible, auditable training workflows
LoRA vs. RAG (Retrieval-Augmented Generation):
- RAG: Retrieves relevant documents at inference time; good for large, changing knowledge bases
- LoRA: Embeds knowledge directly in model weights; good for stable, authoritative content
- Use both: RAG for broad context, LoRA for domain expertise and style
The Studio UI server can be configured for access from other devices on your local network, which is useful for:
- Monitoring training progress from another computer
- Mobile access to check GPU status
- Team development environments
Security Note: The server runs on localhost:7860 by default. To enable LAN access, you would need to explicitly bind to 0.0.0.0, but DO NOT expose this to the public internet without:
- Authentication/authorization
- TLS/HTTPS encryption
- Firewall allowlist rules
- Network isolation
See SECURITY.md for detailed security guidance.
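If you do enable LAN access, the change is typically a one-line host binding. A hypothetical sketch, assuming `studio/app.py` starts uvicorn directly; verify against the actual file:

```python
import uvicorn
from fastapi import FastAPI

app = FastAPI()  # stand-in for the app defined in studio/app.py

# Local-only binding (the default behavior described above):
uvicorn.run(app, host="127.0.0.1", port=7860)
# For LAN access, use host="0.0.0.0" instead, and apply the safeguards listed above first.
```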
Problem: "CUDA not available" or GPU not detected
Solutions:
- Verify NVIDIA drivers are installed: run `nvidia-smi` in a terminal
- Reinstall CUDA-enabled PyTorch:

```
pip install --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```
- Check CUDA version compatibility: https://pytorch.org/get-started/locally/
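After reinstalling, confirm PyTorch sees the GPU:

```python
import torch

print(torch.__version__)              # should end in +cu121 for the CUDA 12.1 build
print(torch.cuda.is_available())      # should print True
print(torch.cuda.get_device_name(0))  # e.g. "NVIDIA GeForce RTX 3060"
```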
Problem: "Tesseract not found" error
Solutions:
- Install Tesseract: https://github.com/UB-Mannheim/tesseract/wiki
- Add Tesseract to PATH:
  - Default location: `C:\Program Files\Tesseract-OCR`
  - Add to system PATH in Windows settings
- Verify installation: run `tesseract --version` in a terminal
Problem: "Address already in use" when launching Studio
Solutions:
- Check if port 7860 is in use: `netstat -ano | findstr :7860`
- Kill the process using the port or choose a different port
- Edit `studio/app.py` to change the port (search for `7860`)
Problem: Training crashes with CUDA out of memory error
Solutions:
- Reduce batch size (try 1)
- Increase gradient accumulation steps (try 32 or 64)
- Reduce max sequence length (try 512 or 768)
- Use a smaller base model (e.g., 1B instead of 3B)
- Enable gradient checkpointing (enabled by default in `train_ft.py`)
Problem: "ModuleNotFoundError" when running scripts
Solutions:
- Ensure the virtual environment is activated: `.\.venv\Scripts\Activate.ps1`
- Reinstall dependencies: `pip install -r requirements.txt`
- For the Studio UI, run `studio/setup_studio.bat` again
Problem: Validation fails or training produces poor results
Solutions:
- Check dataset format: must have `question` and `answer` columns
- Remove duplicates: use the Validation tab with "Write cleaned CSV" enabled
- Verify encoding: Save CSV as UTF-8
- Check content quality: Remove very short or nonsensical entries
- Increase dataset size: Aim for 100+ quality Q&A pairs minimum
```
NamelyCorp-LLM-Studio/
├── setup_llm.bat              # One-shot environment setup (core dependencies)
├── make_dataset_from_docs.py  # Generate Q&A dataset from documents
├── validate_qa.py             # Validate and clean datasets
├── train_ft.py                # LoRA fine-tuning script
├── test_inference.py          # Test model inference
├── requirements.txt           # Python dependencies
├── studio/                    # Web UI application
│   ├── app.py                 # FastAPI backend server
│   ├── setup_studio.bat       # UI dependencies setup
│   ├── launch_studio.bat      # Launch Studio server
│   └── static/                # Frontend assets
│       ├── index.html         # Main UI
│       ├── app.js             # Frontend logic
│       ├── style.css          # Styling
│       └── logo.svg           # NamelyCorp logo
├── assets/                    # Documentation screenshots
├── docs/                      # Place your documents here (created by setup)
├── ft_out/                    # Training outputs (created during training)
├── LICENSE                    # MIT License
├── SECURITY.md                # Security policy and best practices
└── README.md                  # This file
```
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
Areas for Contribution:
- Additional document format support
- More training presets
- UI/UX improvements
- Documentation and tutorials
- Bug fixes and testing
- Only train on documents you have the right to use
- Do not include personal, confidential, or regulated data unless you have proper controls
- Respect base model licenses (e.g., Meta LLaMA, Mistral)
- Review exports before sharing—trained models can encode sensitive patterns
- Do not upload trained artifacts publicly unless you are sure your dataset is safe to disclose
This project is licensed under the MIT License. See LICENSE for details.
Important: This license applies to the NamelyCorp LLM Studio software only. Base models you use (e.g., Llama 3.2) are subject to their own licenses. You are responsible for compliance with all applicable licenses.
See SECURITY.md for security policy, reporting vulnerabilities, and best practices.
- Issues: Report bugs or request features via GitHub Issues
- Discussions: Community support via GitHub Discussions
- Documentation: Check this README and inline documentation in scripts
Built with:
- PyTorch - Deep learning framework
- Transformers - Hugging Face Transformers library
- PEFT - Parameter-Efficient Fine-Tuning
- FastAPI - Web framework for Studio UI
- Tesseract OCR - Optical character recognition
© 2026 NamelyCorp. All rights reserved.