OpenRamanDatabase is a Flask-based web application that analyzes Raman spectroscopy data and automatically matches samples against a database of known microplastic reference spectra. It provides an automated sample nomenclature system, several baseline correction algorithms, and Gaussian-weighted peak matching.
- Features
- Application Architecture
- Prerequisites
- Installation
- Usage Guide
- Sample Nomenclature System
- Baseline Correction Algorithms
- Database Structure
- API Endpoints
- Development
- Troubleshooting
- Automated Sample ID Generation: Implements LOC-TYP-YYMMDD-SEQ nomenclature system
- Multi-Algorithm Baseline Correction: Polynomial fitting, rolling ball, wavelet, and derivative methods
- Intelligent Peak Matching: Gaussian-weighted similarity scoring with position and intensity analysis
- Interactive Web Interface: Modern, responsive UI with real-time sample ID preview
- Reference Library Management: Browse and visualize all reference spectra with detailed metadata
- Sample History Tracking: Two-panel interface for managing analyzed samples
- Manual Match Override: Allow users to manually select best matches with plot regeneration
- Performance Monitoring: Built-in timing and performance metrics
- Session-Based Sequence Tracking: Automatic sample numbering per browser session
- Multiple File Format Support: CSV, TXT, and other common spectroscopy formats
- Real-Time Plot Generation: Dynamic visualization of spectra comparisons
- Database-Driven Architecture: SQLite backend with optimized queries
- Docker Containerization: Easy deployment and environment consistency
```
OpenRamanDatabase/
├── Frontend (Flask Templates)
│   ├── Sample Upload Interface
│   ├── Reference Library Browser
│   ├── Sample History Manager
│   └── Spectrum Visualization
├── Backend (Flask Application)
│   ├── Route Handlers (main.py)
│   ├── Processing Engine (utils.py)
│   ├── Database Layer (SQLite)
│   └── Plot Generation (Matplotlib)
└── Data Storage
    ├── Reference Database
    ├── Sample Bank
    └── Generated Plots
```
- Flask Routes: Handle HTTP requests and responses
- Session Management: Track user sessions and sequence counters
- File Upload Processing: Handle multipart form data and file validation
- Template Rendering: Serve dynamic HTML with Jinja2 templating
- Spectrum Processing: Peak detection, baseline correction, normalization
- Similarity Calculation: Advanced Gaussian-weighted matching algorithms
- Plot Generation: Matplotlib-based visualization with peak annotations
- Database Operations: CRUD operations for samples and references
- Reference Spectra: Pre-calculated peak data and metadata
- Sample Bank: User-uploaded samples with match results
- Performance Optimization: Indexed queries and pre-computed values
- Responsive Design: Bootstrap-based modern UI
- Real-Time Updates: JavaScript for live sample ID preview
- Interactive Elements: Click-to-view functionality and search filters
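For the Spectrum Processing component listed above, the following is a minimal sketch of normalization followed by peak detection. It is an illustration only: the function names `normalize` and `detect_peaks` and the `min_height` threshold are assumptions, not taken from `utils.py`, and the real implementation may use a different peak finder.

```python
def normalize(intensities):
    """Min-max scale intensities to the range [0, 1]."""
    lo, hi = min(intensities), max(intensities)
    if hi == lo:
        # A flat spectrum has no peaks; avoid dividing by zero.
        return [0.0] * len(intensities)
    return [(y - lo) / (hi - lo) for y in intensities]


def detect_peaks(wavelengths, intensities, min_height=0.1):
    """Return (wavelength, normalized intensity) for each local maximum.

    min_height is a hypothetical cutoff on the normalized scale that
    filters out small noise bumps.
    """
    norm = normalize(intensities)
    peaks = []
    for i in range(1, len(norm) - 1):
        is_local_max = norm[i] > norm[i - 1] and norm[i] >= norm[i + 1]
        if is_local_max and norm[i] >= min_height:
            peaks.append((wavelengths[i], norm[i]))
    return peaks


wavelengths = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0]
intensities = [0.0, 2.0, 10.0, 2.0, 0.0, 5.0, 0.0]
print(detect_peaks(wavelengths, intensities))  # [(102.0, 1.0), (105.0, 0.5)]
```

Storing peaks as (position, normalized intensity) pairs keeps them comparable across spectra recorded at different absolute intensities.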
```
[User Upload] → [File Processing] → [Baseline Correction] → [Peak Detection]
                                                                     ↓
[Plot Generation] ← [Database Storage] ← [Similarity Matching] ← [Normalization]
        ↓
[Web Interface Display] → [Manual Override Option] → [Plot Regeneration]
```
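The Similarity Matching stage is described as Gaussian-weighted scoring over peak position and intensity. The sketch below shows one way such a score could be computed; the function name, the `sigma` tolerance, and the weighting scheme are assumptions made for illustration and may differ from the application's actual formula.

```python
import math


def gaussian_peak_similarity(sample_peaks, ref_peaks, sigma=5.0):
    """Score how well sample peaks match reference peaks (0.0 to 1.0).

    Each peak is a (position, intensity) pair with intensity normalized
    to [0, 1]. sigma is a hypothetical position tolerance in the same
    units as the peak positions.
    """
    if not sample_peaks or not ref_peaks:
        return 0.0
    score = 0.0
    total_weight = 0.0
    for ref_pos, ref_int in ref_peaks:
        # Pair each reference peak with the closest sample peak by position.
        pos, inten = min(sample_peaks, key=lambda p: abs(p[0] - ref_pos))
        # Gaussian weight decays as the position offset grows.
        position_term = math.exp(-((pos - ref_pos) ** 2) / (2 * sigma ** 2))
        # Intensity agreement: 1 when equal, falling to 0 as they diverge.
        intensity_term = max(0.0, 1.0 - abs(inten - ref_int))
        # Stronger reference peaks contribute more to the final score.
        score += ref_int * position_term * intensity_term
        total_weight += ref_int
    return score / total_weight if total_weight else 0.0


reference = [(1000.0, 1.0), (1500.0, 0.5)]
print(gaussian_peak_similarity(reference, reference))  # 1.0
```

Weighting by reference intensity means a missing strong peak lowers the score more than a missing weak one.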
Before installing OpenRamanDatabase, ensure you have the following prerequisites:
- Operating System: Windows 10/11, macOS 10.14+, or Linux (Ubuntu 18.04+)
- Storage: At least 2GB free space for application and data
- Python: Version 3.8 or higher (if running without Docker)
- Docker Desktop: Latest version from Docker Official Site
- Docker Compose: Usually bundled with Docker Desktop
- Python 3.8+: From python.org
- Git: For cloning the repository
- Virtual Environment: Recommended for dependency isolation
- Install Docker Desktop
  - Windows/Mac: Download from Docker Official Site
  - Linux (Ubuntu):
    ```bash
    sudo apt-get update
    sudo apt-get install docker-ce docker-ce-cli containerd.io
    sudo curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
    sudo chmod +x /usr/local/bin/docker-compose
    ```
- Clone and Setup
  ```bash
  git clone https://github.com/Sailowtech/OpenRamanDatabase.git
  cd OpenRamanDatabase
  docker compose build
  docker compose up
  ```
- Access Application
  - Open browser to http://localhost:5000
  - Application will be ready for use
- Clone Repository
  ```bash
  git clone https://github.com/Sailowtech/OpenRamanDatabase.git
  cd OpenRamanDatabase
  ```
- Create Virtual Environment
  ```bash
  python -m venv venv
  # Windows
  venv\Scripts\activate
  # macOS/Linux
  source venv/bin/activate
  ```
- Install Dependencies
  ```bash
  pip install -r requirements.txt
  ```
- Run Application
  ```bash
  python -m app.main
  ```
- Navigate to Homepage: Open http://localhost:5000
- Enter Sample Information:
  - Location: Laboratory or sampling location (e.g., "LAB1", "FIELD2")
  - Sample Type: Type of sample (e.g., "Microplastic", "Fiber", "Particle")
  - Live Preview: Sample ID is generated automatically as you type
- Select Processing Algorithm:
  - Polynomial: Best for smooth baseline variations
  - Rolling Ball: Ideal for complex baseline structures
  - Wavelet: Advanced denoising capabilities
  - Derivative: Simple slope-based correction
- Upload Spectrum File: CSV or TXT format with wavelength and intensity columns
- View Results: Automatic matching with similarity scores and visualizations
```
Wavelength,Intensity
400.5,1250.3
401.0,1255.7
401.5,1248.9
...
```
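A file in the layout shown above can be read with the standard library alone. This is a small sketch rather than the application's own loader; the function name `read_spectrum` is hypothetical, and it assumes the header row uses exactly the column names `Wavelength` and `Intensity`.

```python
import csv
import io


def read_spectrum(text):
    """Parse 'Wavelength,Intensity' CSV text into two lists of floats."""
    reader = csv.DictReader(io.StringIO(text))
    wavelengths, intensities = [], []
    for row in reader:
        wavelengths.append(float(row["Wavelength"]))
        intensities.append(float(row["Intensity"]))
    return wavelengths, intensities


sample = "Wavelength,Intensity\n400.5,1250.3\n401.0,1255.7\n401.5,1248.9\n"
wavelengths, intensities = read_spectrum(sample)
print(wavelengths)  # [400.5, 401.0, 401.5]
```

A missing or misspelled column name raises `KeyError` here, which is one reason to check the header row before uploading.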
- Search Functionality: Filter by material name or ID
- Quick Preview: Click any reference to view spectrum
- Detailed Information: Material properties and peak data
- Performance Metrics: Load times and database statistics
- Two-Panel Interface: Sample list on left, spectrum display on right
- Search and Filter: Find samples by ID or characteristics
- Click-to-View: Interactive spectrum display
- Sample Management: Delete outdated or incorrect samples
- Automatic matching seems incorrect
- Domain expertise suggests different match
- Quality control and validation
- From results page, click "Select Different Match"
- Choose alternative reference from similarity rankings
- System automatically regenerates plots
- Updated match is saved to database
- LOC: Location code (user-defined, uppercase)
- TYP: Sample type (user-defined)
- YYMMDD: Date in 2-digit year, month, day format
- SEQ: 3-digit sequence number (000-999)
- LAB1-Microplastic-250604-001: First microplastic sample from LAB1 on June 4, 2025
- FIELD2-Fiber-250604-003: Third fiber sample from FIELD2 on the same day
- OCEAN-Particle-250605-012: Twelfth particle sample from OCEAN on June 5, 2025
- Session-Based Sequencing: Counter resets per browser session
- Automatic Generation: No manual ID entry required
- Collision Prevention: Unique identifiers prevent database conflicts
- Traceability: Clear connection between sample origin and analysis date
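The LOC-TYP-YYMMDD-SEQ format can be assembled in a few lines. The sketch below is illustrative: `make_sample_id` is a hypothetical name, and it assumes the location is upper-cased while the sample type is kept as entered, which matches the examples above.

```python
from datetime import date


def make_sample_id(location, sample_type, seq, on=None):
    """Build a LOC-TYP-YYMMDD-SEQ identifier.

    seq is the per-session sequence number (0 to 999); on defaults to today.
    """
    if not 0 <= seq <= 999:
        raise ValueError("sequence number must fit in three digits")
    on = on or date.today()
    # %y%m%d gives the 2-digit year, month, and day; :03d zero-pads the sequence.
    return f"{location.upper()}-{sample_type}-{on:%y%m%d}-{seq:03d}"


print(make_sample_id("lab1", "Microplastic", 1, date(2025, 6, 4)))
# LAB1-Microplastic-250604-001
```

Because the fields are joined with hyphens, a location or type that itself contains a hyphen would make the ID ambiguous to split later.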
Polynomial: best for smooth, predictable baseline variations

```python
import numpy as np

# Mathematical basis: least-squares polynomial fitting
# (wavelengths and intensities are 1-D arrays; degree is the polynomial order)
baseline = np.polyval(np.polyfit(wavelengths, intensities, degree), wavelengths)
corrected = intensities - baseline
```

Rolling Ball: best for complex baseline structures with multiple curves
- Simulates rolling a ball under the spectrum
- Effectively removes broad background features
- Preserves sharp peaks and valleys
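To make the rolling-ball idea concrete, here is a sketch that uses a flat sliding window (a minimum filter, then a maximum filter, then a moving average). This is a common simplification of a true rolling ball and is not necessarily what the application implements; the function names and the `radius` parameter are assumptions.

```python
def _window_apply(values, radius, func):
    """Apply func to the window [i - radius, i + radius] around each index."""
    n = len(values)
    return [func(values[max(0, i - radius):min(n, i + radius + 1)])
            for i in range(n)]


def _mean(window):
    return sum(window) / len(window)


def rolling_ball_baseline(intensities, radius=50):
    """Estimate a baseline that follows the spectrum from below."""
    # Minimum filter: peaks narrower than the window are flattened away.
    eroded = _window_apply(intensities, radius, min)
    # Maximum filter: restores the level of broad background features.
    opened = _window_apply(eroded, radius, max)
    # Moving average: smooths the stepped result into a gentle curve.
    return _window_apply(opened, radius, _mean)


def rolling_ball_correct(intensities, radius=50):
    """Subtract the estimated baseline from the spectrum."""
    baseline = rolling_ball_baseline(intensities, radius)
    return [y - b for y, b in zip(intensities, baseline)]


spectrum = [100.0] * 101
spectrum[50] = 150.0  # one sharp peak on a flat background
corrected = rolling_ball_correct(spectrum, radius=10)
print(corrected[50])  # 50.0
```

The window should be wider than the widest real peak; otherwise the peak itself is treated as background and partly removed.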
Wavelet: best for noisy spectra requiring denoising
- Uses discrete wavelet transforms
- Separates signal from noise components
- Configurable decomposition levels
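The sketch below illustrates the wavelet approach with a hand-written Haar transform: decompose, shrink the detail coefficients, reconstruct. It is a simplified stand-in that assumes the Haar wavelet and soft thresholding; the application may rely on a wavelet library and a different wavelet family. All function names and the `threshold` parameter are hypothetical.

```python
import math

_SQRT2 = math.sqrt(2)


def haar_forward(signal):
    """One Haar level: split an even-length list into (approximation, detail)."""
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) / _SQRT2 for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / _SQRT2 for i in pairs]
    return approx, detail


def haar_inverse(approx, detail):
    """Undo one Haar level."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / _SQRT2, (a - d) / _SQRT2])
    return out


def soft_threshold(value, threshold):
    """Shrink a coefficient toward zero; small ones become exactly zero."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)


def haar_denoise(signal, levels=2, threshold=1.0):
    """Denoise a signal whose length is divisible by 2**levels."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_forward(approx)
        details.append([soft_threshold(d, threshold) for d in detail])
    # Rebuild from the coarsest level back to full resolution.
    for detail in reversed(details):
        approx = haar_inverse(approx, detail)
    return approx
```

With `threshold=0` the signal is reconstructed unchanged; raising the threshold removes progressively larger fluctuations, and `levels` controls how coarse the retained structure is.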
Derivative: best for simple linear baseline slopes
- Calculates local derivatives
- Removes linear trends
- Fastest processing option
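The derivative method is not spelled out in detail here, so the following is only one plausible reading of "calculates local derivatives, removes linear trends": estimate the baseline slope as the median of the point-to-point derivatives, subtract that line, and shift the result so its minimum sits at zero. The function name and the use of the median are assumptions.

```python
from statistics import median


def remove_linear_trend(wavelengths, intensities):
    """Subtract a linear baseline whose slope is the median local derivative."""
    # Local derivative between each pair of neighbouring points.
    slopes = [
        (intensities[i + 1] - intensities[i]) / (wavelengths[i + 1] - wavelengths[i])
        for i in range(len(intensities) - 1)
    ]
    # The median ignores the steep rise and fall around sharp peaks.
    slope = median(slopes)
    detrended = [y - slope * (x - wavelengths[0])
                 for x, y in zip(wavelengths, intensities)]
    # Shift so the lowest point of the corrected spectrum is zero.
    offset = min(detrended)
    return [v - offset for v in detrended]


x = [float(i) for i in range(11)]
y = [2.0 * xi + 5.0 for xi in x]  # sloped baseline
y[5] += 10.0                      # one peak of height 10
print(remove_linear_trend(x, y)[5])  # 10.0
```

This needs only a single pass over the data, which is consistent with it being the fastest option, but it cannot follow a curved baseline.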
```sql
CREATE TABLE reference_spectra (
    id TEXT PRIMARY KEY,
    wavelength REAL,
    intensity REAL,
    comment TEXT
);
```

```sql
CREATE TABLE reference_peaks (
    id TEXT,
    wavelength REAL,
    intensity REAL,
    FOREIGN KEY(id) REFERENCES reference_spectra(id)
);
```

```sql
CREATE TABLE sample_bank (
    sample_id TEXT,
    wavelength REAL,
    intensity REAL,
    best_match TEXT,
    similarity_score REAL
);
```

- Pre-calculated Peaks: Reference peaks stored separately for faster matching
- Indexed Queries: Database indexes on frequently accessed columns
- Batch Operations: Efficient bulk data insertion and retrieval
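The optimizations listed above can be tried out against an in-memory SQLite database. This is a sketch under stated assumptions: the index name `idx_reference_peaks_id` is invented for the example, the foreign key is omitted for brevity, and the peak values are made-up placeholders rather than real reference data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reference_peaks (id TEXT, wavelength REAL, intensity REAL)"
)
# Index the column used to look up all peaks of one reference.
conn.execute("CREATE INDEX idx_reference_peaks_id ON reference_peaks (id)")

# Placeholder rows (not real spectra), inserted in one batch.
peaks = [
    ("REF_A", 1001.0, 0.35),
    ("REF_A", 1450.0, 1.00),
    ("REF_B", 620.0, 0.80),
]
with conn:  # commits once for the whole batch
    conn.executemany("INSERT INTO reference_peaks VALUES (?, ?, ?)", peaks)


def peaks_for(reference_id):
    """Fetch the pre-calculated peaks of one reference, ordered by position."""
    cursor = conn.execute(
        "SELECT wavelength, intensity FROM reference_peaks "
        "WHERE id = ? ORDER BY wavelength",
        (reference_id,),
    )
    return cursor.fetchall()


print(peaks_for("REF_A"))  # [(1001.0, 0.35), (1450.0, 1.0)]
```

Using `executemany` inside a single transaction avoids one commit per row, which is where most of the cost of bulk inserts in SQLite tends to come from.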
| Route | Method | Description |
|---|---|---|
| `/` | GET/POST | Main upload and analysis interface |
| `/library_list` | GET | Browse reference spectra library |
| `/sample_history` | GET | View and manage analyzed samples |
| `/spectrum/<id>` | GET | View individual spectrum details |
| `/save_manual_selection` | POST | Override automatic match selection |
| `/delete_sample` | POST | Remove sample from database |
| `/plots/<filename>` | GET | Serve generated plot images |
```json
{
  "sample_id": "LAB1-Microplastic-250604-001",
  "best_match": "Polyethylene Reference",
  "similarity_score": 0.847,
  "processing_time": "2.345",
  "plot_file": "sample_LAB1-Microplastic-250604-001_with_match.png"
}
```

- Clone and Setup
  ```bash
  git clone https://github.com/Sailowtech/OpenRamanDatabase.git
  cd OpenRamanDatabase
  python -m venv venv
  source venv/bin/activate  # Windows: venv\Scripts\activate
  pip install -r requirements.txt
  ```
- Development Mode
  ```bash
  export FLASK_ENV=development  # Windows: set FLASK_ENV=development
  python -m app.main
  ```
- `app/main.py`: Flask routes and application logic
- `app/utils.py`: Data processing and analysis functions
- `app/templates/`: HTML templates with Jinja2
- `app/static/`: CSS, JavaScript, and static assets
- `requirements.txt`: Python dependencies
```python
def new_algorithm_baseline(intensities, parameter):
    """Implement new baseline correction method"""
    # Add implementation
    return corrected_intensities
```

```python
def new_similarity_method(sample_peaks, ref_peaks):
    """Implement alternative similarity calculation"""
    # Add implementation
    return similarity_score
```

```bash
python -c "from app.utils import generate_plots; generate_plots()"
```

```bash
# Add new references from CSV
python create_db_from_csv.py
# Remove specific reference
python delete_id_from_db.py
# Database structure modifications
python alter_db.py
```

```bash
# Check port availability
netstat -ano | findstr :5000
# Kill existing process
taskkill /PID <PID> /F
# Restart application
python -m app.main
```

```bash
# Check database file permissions
ls -la app/database/microplastics_reference.db
# Recreate database if corrupted
python create_db_from_csv.py
```

```bash
# Install missing matplotlib backends
pip install matplotlib
```

```python
# Set matplotlib backend (in utils.py)
import matplotlib
matplotlib.use('Agg')
```

- Supported Formats: CSV, TXT with wavelength/intensity columns
- File Size Limit: Default 16MB (configurable in Flask)
- Column Headers: Ensure proper wavelength and intensity column names
- Large Datasets: Consider pagination for sample history
- Memory Usage: Monitor RAM usage with large spectral files
- Database Optimization: Regular database maintenance and indexing
```bash
export FLASK_DEBUG=1  # Windows: set FLASK_DEBUG=1
python -m app.main
```

- Check console output for processing times
- Monitor database query performance
- Review matplotlib backend compatibility
- Fork the repository
- Create feature branch (`git checkout -b feature/amazing-feature`)
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push to branch (`git push origin feature/amazing-feature`)
- Open Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Built for low-cost Raman spectroscopy applications
- Designed for microplastics identification and analysis
- Community-driven reference database expansion
