AI-Agent Powered Computer-Assisted Structure Elucidation for Organic Natural Products
lucy-ng is a next-generation structure elucidation system designed for AI-agent driven workflows. Unlike traditional GUI-focused NMR processing tools, lucy-ng is built for programmatic, unattended operation where an AI agent (such as Claude) can iterate through the elucidation process until a structure is determined.
- Vision
- Key Features
- Scientific Background
- Installation
- Quick Start
- Usage
- Architecture
- Supported Data
- Development
- License
Lucy-ng represents a complete reimagining of Computer-Assisted Structure Elucidation (CASE) for the AI era. The original Lucy software was created by the project author and sold to Bruker. Lucy-ng builds on decades of CASE experience but prioritizes:
- AI-First Design: Every feature is accessible via structured APIs suitable for LLM agents
- Unattended Operation: No human intervention required during the elucidation process
- Iterative Refinement: AI agents can run multiple cycles of hypothesis generation and testing
- Transparent Reasoning: All processing steps are logged and explainable
Existing NMR processing tools like NMRium are GUI-focused, making it difficult for AI agents to interact with them programmatically. Structure elucidation typically requires:
- Loading multiple NMR spectra (1D and 2D)
- Peak picking with appropriate thresholds
- Cross-validation between experiments
- Generation of structural constraints
- Execution of constraint-based solvers
- Validation of proposed structures
Lucy-ng automates this entire pipeline while exposing each step as a programmable tool.
- Bruker Format Support: Read 1D and 2D NMR spectra from Bruker TopSpin format
- Automatic Experiment Detection: Identifies 1H, 13C, DEPT, COSY, HSQC, HMBC from pulse programs
- Processed Data Reading: Works with processed spectra (pdata/1/) for immediate use
- Adaptive Peak Picker: Automatically adjusts minimum distance based on line width (FWHM)
- DEPT-Guided HSQC Picking: Uses DEPT-135 as ground truth to ensure all protonated carbons are found
- HMBC-Guided Filtering: Removes noise peaks by validating against known carbon and proton positions
- Multiplicity Assignment: Automatically determines CH, CH2, CH3 from DEPT-135/90
- COCONUT Database: Match against ~895,000 natural products with predicted 13C shifts
- NMRShiftDB: Fallback database with ~33,000 experimental spectra
- Streaming Mode: Efficiently handles multi-GB databases without loading into memory
- Formula-Based Filtering: Only processes compounds matching the target molecular formula
- LSD Integration: Generate input files and execute the LSD solver
- Symmetry Detection: Identifies equivalent atoms from hydrogen budget and intensity analysis
- Constraint Generation: Automatic MULT, HSQC, HMBC, BOND constraints from spectroscopic data
- Solution Ranking: Rank LSD solutions by comparing predicted vs experimental 13C spectra
- HOSE Code Prediction: Predict 13C shifts from molecular structure using HOSE codes
- Fallback Strategy: Automatic radius reduction (6→1) for maximum coverage
- Pre-built Lookup Table: Fast predictions using nmrshiftdb2 reference data
- MCP Server: Model Context Protocol tools for Claude Desktop and other AI agents
- CLI Interface: Full command-line interface for scripting and testing
- Python API: Direct library access for custom workflows
Computer-Assisted Structure Elucidation follows a systematic approach:
NMR Spectra → Peak Picking → Constraint Extraction → Solver → Candidate Structures → Validation
Lucy-ng implements each step with careful attention to scientific accuracy:
Raw 2D peak picking produces many noise peaks and artifacts. Lucy-ng uses a guided peak picking strategy:
- DEPT as Ground Truth: DEPT-135 definitively shows all protonated carbons
- Adaptive Thresholding: HSQC threshold is lowered iteratively until all DEPT carbons are matched
- Cross-Validation: HMBC peaks are filtered to require matching carbon (from 13C) AND proton (from HSQC)
This dramatically reduces false correlations that would otherwise produce thousands of incorrect structures.
Equivalent carbons due to molecular symmetry appear as single NMR signals:
Ibuprofen (C13H18O2):
- Expected carbons from formula: 13
- Observed 13C signals: 10-11
- Reason: Para-disubstituted benzene has 2 pairs of equivalent carbons
Lucy-ng detects symmetry by:
- Comparing molecular formula H count with observed C-H count
- Analyzing relative peak intensities (doubled signals have ~2x intensity)
- Recognizing common symmetric motifs
The number of LSD solutions depends heavily on constraint quality:
| Constraint Source | Typical Solutions |
|---|---|
| Manual HMBC (16 correlations) | 900+ structures |
| Guided HMBC (28-30 correlations) | 1-10 structures |
Real experimental data with proper filtering provides much stronger constraints than manually constructed correlations.
# Basic installation
pip install "lucy-ng @ git+https://github.com/steinbeck/lucy-ng.git"
# With MCP server support (recommended)
pip install "lucy-ng[mcp] @ git+https://github.com/steinbeck/lucy-ng.git"Note for macOS/zsh users: The quotes are required because zsh interprets square brackets as glob patterns.
git clone https://github.com/steinbeck/lucy-ng.git
cd lucy-ng
pip install -e ".[dev,mcp]"The 13C shift prediction feature requires the hose-code-generator package. On Python 3.12, install it separately:
pip install git+https://github.com/Ratsemaat/HOSE_code_generator.git --no-depsNote: The
--no-depsflag skips a broken test dependency (xmlrunner) that doesn't work with Python 3.12. The prediction feature works fine without it.
LSD Solver (optional, for structure generation):
# Download from http://eos.univ-reims.fr/LSD/
# Extract and add to PATH, or specify location when runningFor dereplication, download one or both databases:
| Database | Size | Entries | Download |
|---|---|---|---|
| COCONUT | ~4.8 GB | ~895,000 | coconut.naturalproducts.net |
| NMRShiftDB | ~100 MB | ~33,000 | nmrshiftdb.nmr.uni-koeln.de |
Place in data/reference/ or ~/.lucy/ for auto-discovery.
from lucy_ng import BrukerReader
# Read 1D carbon spectrum
spectrum = BrukerReader.read_1d("data/Ibuprofen/2")
print(f"Nucleus: {spectrum.nucleus}")
print(f"Points: {len(spectrum.data)}")
print(f"PPM range: {spectrum.ppm_scale.min():.1f} - {spectrum.ppm_scale.max():.1f}")from lucy_ng import BrukerReader, AdaptivePeakPicker
spectrum = BrukerReader.read_1d("data/Ibuprofen/2")
peaks = AdaptivePeakPicker.pick_peaks(spectrum, threshold=0.05)
for peak in peaks.peaks[:5]:
print(f"{peak.position:.2f} ppm, intensity: {peak.intensity:.2e}")from lucy_ng import BrukerReader
from lucy_ng.processing import DEPTGuidedPicker
hsqc = BrukerReader.read_2d("data/Ibuprofen/6")
dept135 = BrukerReader.read_1d("data/Ibuprofen/3")
result = DEPTGuidedPicker.pick_hsqc_peaks(hsqc, dept135)
print(result.summary())
# Access: result.peaks, result.carbon_multiplicitieslucy dereplicate c13 data/Ibuprofen/2 C13H18O2from lucy_ng.dereplication import DereplicationService, NMRShiftDBLoader
from lucy_ng import BrukerReader
spectrum = BrukerReader.read_1d("data/Ibuprofen/2")
loader = NMRShiftDBLoader("data/reference/nmrshiftdb2withsignals.sd")
loader.load()
service = DereplicationService(loader)
result = service.dereplicate_from_spectrum(spectrum, "C13H18O2")
if result.is_match:
print(f"Match found: {result.top_matches[0].entry.name}")Lucy-ng provides a comprehensive CLI with five command groups:
# Read 1D spectrum
lucy read 1d data/Ibuprofen/2
# Read 2D spectrum
lucy read 2d data/Ibuprofen/6
# Output as JSON
lucy read 1d data/Ibuprofen/2 --format json# 1D peak picking
lucy pick 1d data/Ibuprofen/2
# DEPT-guided HSQC picking
lucy pick hsqc data/Ibuprofen/6 --dept135 data/Ibuprofen/3
# With DEPT-90 for CH/CH3 disambiguation
lucy pick hsqc data/Ibuprofen/6 --dept135 data/Ibuprofen/3 --dept90 data/Ibuprofen/4
# Guided HMBC picking
lucy pick hmbc data/Ibuprofen/7 --c13 data/Ibuprofen/2 --hsqc data/Ibuprofen/6lucy analyze symmetry data/Ibuprofen C13H18O2# Default uses COCONUT database
lucy dereplicate c13 data/Ibuprofen/2 C13H18O2
# Specify database
lucy dereplicate c13 data/Ibuprofen/2 C13H18O2 --database data/reference/nmrshiftdb2withsignals.sd# Generate LSD input file
lucy lsd generate data/Ibuprofen C13H18O2 -o ibuprofen.lsd
# Run LSD solver
lucy lsd run ibuprofen.lsd
# Rank solutions by 13C prediction
lucy lsd rank output/ data/Ibuprofen/2 --top 10
# Check LSD availability
lucy lsd check# Predict shifts from SMILES
lucy predict c13 "CC(C)Cc1ccc(cc1)C(C)C(=O)O"
# Build lookup table (one-time setup)
lucy predict build-table data/reference/nmrshiftdb2withsignals.sd
# Show table info
lucy predict table-infoSee docs/USER_GUIDE.md for comprehensive Python API documentation.
Lucy-ng works seamlessly with Claude Code for interactive structure elucidation. Copy this prompt to set up a new machine:
Set up this machine for NMR structure elucidation with lucy-ng:
1. Check/install lucy-ng: `lucy --version || pip install lucy-ng`
2. Check LSD solver: `lucy lsd check` - if missing, install from http://eos.univ-reims.fr/LSD/
3. Download CLAUDE.md to .claude/CLAUDE.md: https://raw.githubusercontent.com/steinbeck/lucy-ng/master/CLAUDE.md
4. Create .claude/settings.json with: {"permissions":{"allow":["Bash(lucy:*)","Bash(python3:*)"]}}
Once ready, place Bruker NMR data in a folder and I'll perform complete structure elucidation.
After setup, simply ask:
Perform structure elucidation on the NMR data in this folder. Molecular formula: C13H18O2
Claude will automatically run dereplication, pick peaks, analyze symmetry, generate LSD constraints, solve structures, and rank solutions by 13C shift prediction.
Lucy-ng includes an MCP (Model Context Protocol) server for AI agent integration.
| Tool | Description |
|---|---|
read_spectrum_1d |
Read 1D NMR spectrum metadata |
read_spectrum_2d |
Read 2D NMR spectrum metadata |
pick_peaks_1d |
Pick peaks from 1D spectrum |
pick_hsqc_peaks |
DEPT-guided HSQC peak picking |
pick_hmbc_peaks |
Guided HMBC peak picking |
analyze_symmetry |
Detect molecular symmetry |
dereplicate_c13 |
Match against reference database |
generate_lsd_input |
Generate LSD input from NMR data |
run_lsd |
Execute LSD solver |
check_lsd_availability |
Check if LSD is installed |
rank_lsd_solutions |
Rank solutions by 13C prediction similarity |
predict_c13_shifts |
Predict 13C shifts from SMILES |
Add to ~/.config/Claude/claude_desktop_config.json:
{
"mcpServers": {
"lucy-ng": {
"command": "lucy-mcp"
}
}
}Then ask Claude: "Use lucy-ng to analyze the NMR data in data/Ibuprofen and identify the compound"
See docs/MCP_INTEGRATION.md for detailed setup instructions.
AI Agent Guide: For comprehensive guidance on structure elucidation workflow, pitfalls, and best practices, see CLAUDE.md.
lucy-ng/
├── src/lucy_ng/
│ ├── models/ # Pydantic data models (Spectrum1D, Peak1D, etc.)
│ ├── readers/ # NMR file readers (BrukerReader)
│ ├── processing/ # Peak picking and signal processing
│ │ ├── peak_picker.py # AdaptivePeakPicker
│ │ ├── peak_picker_2d.py # PeakPicker2D
│ │ ├── dept_guided_picker.py # DEPTGuidedPicker
│ │ └── hmbc_guided_picker.py # HMBCGuidedPicker
│ ├── analysis/ # Symmetry analysis tools
│ ├── dereplication/ # Database matching
│ │ ├── coconut.py # COCONUT loader (streaming)
│ │ ├── nmrshiftdb.py # NMRShiftDB loader
│ │ └── service.py # DereplicationService
│ ├── lsd/ # LSD solver integration
│ │ ├── generator.py # LSD input file generation
│ │ └── runner.py # LSD execution
│ ├── prediction/ # 13C shift prediction
│ │ ├── hose.py # HOSE code generation
│ │ ├── lookup.py # Lookup table management
│ │ └── predictor.py # C13Predictor
│ ├── ranking/ # Solution ranking
│ │ └── ranker.py # SolutionRanker
│ ├── cli/ # Click-based CLI
│ └── mcp/ # MCP server
│ └── server.py # FastMCP tools (12 tools)
├── tests/ # pytest test suite
├── data/ # Test NMR datasets
└── docs/ # Documentation
- Static Methods for Convenience: Core operations like
AdaptivePeakPicker.pick_peaks()are static for easy use - Pydantic Models: All data structures use Pydantic v2 for validation and serialization
- Structured Results: Every operation returns a result object with success status and detailed data
- Error Tolerance: Operations return error information rather than raising exceptions
- Bruker TopSpin 1D and 2D spectra (processed data from pdata/1/)
- SD files for reference databases
| Type | Detection | Notes |
|---|---|---|
| 1H | Nucleus | Proton spectrum |
| 13C | Nucleus | Carbon spectrum |
| DEPT-135 | Pulse program | CH/CH3 up, CH2 down |
| DEPT-90 | Pulse program | CH only |
| COSY | Pulse program | H-H correlations |
| HSQC | Pulse program (inv4) | Direct C-H |
| HMBC | Pulse program (inv4lr) | Long-range C-H |
# All tests
pytest
# With coverage
pytest --cov=lucy_ng
# Excluding slow dereplication tests
pytest --ignore=tests/test_dereplication.py# Type checking
mypy src/lucy_ng
# Linting
ruff check src tests
# Format checking
ruff format --check src testshatch buildMIT License - see LICENSE for details.
- Original Lucy concept by Christoph Steinbeck
- LSD solver by Jean-Marc Nuzillard
- nmrglue library for NMR file parsing
- COCONUT and NMRShiftDB for reference data
If you use lucy-ng in your research, please cite:
@software{lucy-ng,
author = {Steinbeck, Christoph},
title = {lucy-ng: AI-Agent Powered Computer-Assisted Structure Elucidation},
url = {https://github.com/steinbeck/lucy-ng},
year = {2026}
}