Yoko286/Hg-MC-Auto


Hg-MC-Auto: An End-to-End Self-Driving Pipeline for Mercury Isotope Analysis

Hg-MC-Auto is a comprehensive, intelligent pipeline for automated mercury isotope analysis by MC-ICP-MS, integrating robotic data extraction, expert-informed quality control, and machine learning diagnostics. Online paper: https://doi.org/10.1039/D5JA00519A

Graphic Abstract

✨ Features

  • Automated Data Processing: Robotic export from proprietary software to structured formats
  • Intelligent Quality Control: Hierarchical ML models reaching a test F1-score of 0.996
  • Root Cause Diagnosis: Multi-class classification for anomaly identification
  • User-Friendly Interface: Interactive GUI with configurable options
  • Scalable Framework: Modular design for extension to other isotope systems
  • Expert-Validated: Built on 26,000+ historical MC-ICP-MS measurements

📋 Table of Contents

  1. Installation
  2. Architecture
  3. Project Structure
  4. Usage Guide
  5. Models
  6. Citation
  7. Contact

🚀 Installation

Prerequisites

  • Python 3.9 or higher
  • 8GB RAM minimum (16GB recommended for large datasets)
  • MC-ICP-MS raw data files (.dat and .log formats)

Step-by-Step Setup

1. Clone the repository

git clone https://github.com/Yoko286/Hg-MC-Auto.git

2. Navigate to the project directory

cd Hg-MC-Auto

3. Create and activate a virtual environment

python -m venv hg-auto-env

On Windows:

hg-auto-env\Scripts\activate

On macOS/Linux:

source hg-auto-env/bin/activate

4. Install dependencies

pip install -r requirements.txt

5. Run the application

python src/main.py

🏗️ Architecture

Hg-MC-Auto employs a three-tier architecture that transforms raw MC-ICP-MS data into quality-assured isotopic results:

Overall Architecture Diagram

Core Components

Data Automation Layer

  • Robotic Process Automation (RPA) for data extraction
  • Automated file conversion and merging
  • Batch processing of historical data

Intelligence Layer

  • Expert rule-based filtering
  • Ensemble machine learning models
  • Hierarchical anomaly detection

Application Layer

  • Interactive graphical interface
  • Configurable quality thresholds
  • Comprehensive reporting
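
The Intelligence Layer can be pictured as a chain of gates: expert rules first, then a binary Normal/Abnormal classifier, then a multi-class diagnosis for anything flagged. The sketch below shows one plausible ordering only. The function `hierarchical_qc`, the three callables, the record fields `d202` and `se202`, and the toy thresholds are all hypothetical stand-ins, not code or values from `src/`.

```python
from typing import Callable, Dict

Record = Dict[str, float]


def hierarchical_qc(
    record: Record,
    passes_expert_rules: Callable[[Record], bool],
    is_abnormal: Callable[[Record], bool],
    diagnose: Callable[[Record], str],
) -> str:
    """Run one measurement through three gates and return a label."""
    if not passes_expert_rules(record):
        return "Rejected by expert rules"
    if not is_abnormal(record):
        return "Normal"
    # Only measurements flagged as abnormal reach the diagnostic stage.
    return diagnose(record)


# Toy stand-ins with invented thresholds, for illustration only.
def toy_rules(r: Record) -> bool:
    return -5.0 <= r["d202"] <= 5.0


def toy_binary(r: Record) -> bool:
    return r["se202"] > 0.1


def toy_diagnose(r: Record) -> str:
    return "Possible instrument instability"


print(hierarchical_qc({"d202": 0.5, "se202": 0.02}, toy_rules, toy_binary, toy_diagnose))
```

Passing the stages in as callables keeps the sketch independent of any particular model library; in practice each callable would wrap one of the components listed above.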

📁 Project Structure

Hg-MC-Auto/
│
├── data/                     # Sample datasets for testing
│   ├── train dataset.xlsx             # Training dataset, available on request: zhouchufan@mail.gyig.ac.cn
│   └── validation dataset.xlsx        # Model validation dataset
│
├── docs/                     # Documentation and figures
│
├── model/                    # Pre-trained ML models
│   ├── Exter_ML_model/      # Binary classification models
│   │   ├── top3_model_1_Random_Forest_SMOTE.pkl
│   │   ├── top3_model_2_Bagging_RF_UnderSampling.pkl
│   │   └── top3_model_3_XGBoost_UnderSampling.pkl
│   │
│   └── Inter_ML_model/      # Multi-class diagnostic models
│       ├── Basic_features/  # Core feature-based model
│       └── Enhanced_features/ # Advanced feature-based model
│
├── results/                  # Output directory for analysis results
│
├── src/                      # Source code
│   ├── main.py              # Main application entry point
│   ├── 1_Automatic_export.py          # Data export automation
│   ├── 2_Automatic_calculation.py     # Isotope ratio calculation
│   ├── 3_Empirical_model.py           # Expert rule-based classification
│   ├── 4_ML_Predict.py                # ML model prediction interface
│   ├── 5_Exter_ML_train.py            # Binary classifier training
│   └── 6_Inter_ML_train.py            # Multi-class classifier training
│
├── custom_ranges_config.json          # User-configurable acceptance ranges
├── mouse_coordinates.config           # RPA coordinate settings
├── requirements.txt                   # Python dependencies
├── LICENSE                            # MIT License
└── README.md                          # This file

📖 Usage Guide

Launching the Application

After installation, run:

python src/main.py

You will see the interactive interface:

Welcome to Hg_MC_Auto!
============================================================

Please select a task:
1. Automatically export isotope data
2. Automatically export instrument parameters, merge isotope data, and calculate isotope fractionation values
3. Classify data using an empirical model
4. Classify data using a machine learning model
5. Train your own machine learning expert model
0. Exit

Task Options Explained

Option 1: Automated Data Export

  • Converts proprietary .dat files to structured CSV format
  • Merges with corresponding instrument log files
  • Uses RPA for vendor software interaction

Option 2: Isotope Calculation

  • Calculates δ202Hg values relative to NIST SRM 3133
  • Computes mass-independent fractionation anomalies (Δ-values)
  • Batch processes entire datasets
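
As a worked illustration of the quantities named in Option 2, the sketch below uses the conventional Hg isotope notation: δ values in per mil relative to the standard, and Δ values obtained by subtracting the mass-dependent contribution with the commonly used scaling factors 0.2520, 0.5024, and 0.7520 for 199Hg, 200Hg, and 201Hg. Averaging two bracketing standards is an assumption about the workflow, and the function names are hypothetical; `2_Automatic_calculation.py` may organize the calculation differently.

```python
# Conventional mass-dependent scaling factors for Hg (relative to delta-202).
MDF_SCALING = {"199": 0.2520, "200": 0.5024, "201": 0.7520}


def delta_permil(ratio_sample: float, ratio_std_before: float, ratio_std_after: float) -> float:
    """Delta value (per mil) of a sample ratio relative to the mean of two bracketing standards."""
    ratio_std = (ratio_std_before + ratio_std_after) / 2.0
    return (ratio_sample / ratio_std - 1.0) * 1000.0


def capital_delta(delta_xxx: float, delta_202: float, isotope: str) -> float:
    """Mass-independent anomaly: measured delta minus the mass-dependent prediction."""
    return delta_xxx - MDF_SCALING[isotope] * delta_202


# Illustrative numbers only, not measured data.
d202 = delta_permil(2.9650, 2.9620, 2.9622)   # xxxHg/198Hg ratios
D199 = capital_delta(0.50, 1.00, "199")       # 0.50 - 0.2520 * 1.00
```

Both inputs to `delta_permil` must be ratios against the same reference isotope (198Hg by convention) for the result to be meaningful.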

Option 3: Empirical Model Classification

  • Applies literature-based acceptance ranges (Table 1 in manuscript)
  • Flags measurements outside 95% confidence intervals
  • User-configurable thresholds via custom_ranges_config.json
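
A range check of this kind can be sketched in a few lines. The JSON layout, the key names `D199Hg` and `D200Hg`, and the numeric limits below are invented for illustration; the real `custom_ranges_config.json` and the ranges in Table 1 of the manuscript may look quite different.

```python
import json

# Hypothetical config content; the real custom_ranges_config.json may use other keys and layout.
EXAMPLE_CONFIG = '{"D199Hg": [-0.10, 0.10], "D200Hg": [-0.05, 0.05]}'


def out_of_range(measurement: dict, ranges: dict) -> list:
    """Return the names of values that are missing or fall outside [low, high]."""
    flagged = []
    for name, (low, high) in ranges.items():
        value = measurement.get(name)
        if value is None or not (low <= value <= high):
            flagged.append(name)
    return flagged


ranges = json.loads(EXAMPLE_CONFIG)
print(out_of_range({"D199Hg": 0.02, "D200Hg": 0.12}, ranges))  # prints ['D200Hg']
```

Treating a missing value as a flag is a conservative design choice in this sketch: a measurement is accepted only when every configured quantity is present and within range.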

Option 4: ML Model Prediction

  • Uses pre-trained ensemble models for quality assessment
  • Provides confidence scores for each prediction
  • Identifies probable causes for abnormal measurements
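
One common way to turn several models into a single prediction with a confidence score is soft voting, that is, averaging each model's predicted probability. Whether `4_ML_Predict.py` combines its models this way is an assumption; the function `soft_vote` and the 0.5 threshold are hypothetical.

```python
from statistics import mean


def soft_vote(probabilities: list, threshold: float = 0.5) -> tuple:
    """Average per-model P(abnormal) and return (label, confidence in that label)."""
    p_abnormal = mean(probabilities)
    if p_abnormal >= threshold:
        return "Abnormal", p_abnormal
    return "Normal", 1.0 - p_abnormal


# Three models (for example, the three top3_model_*.pkl files) each report P(abnormal).
label, confidence = soft_vote([0.9, 0.8, 0.7])
print(label, round(confidence, 3))
```

With scikit-learn-style models, each probability would typically come from `predict_proba`; the averaging step itself does not depend on the library.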

Option 5: Custom Model Training

  • Train laboratory-specific models using your annotated data
  • Supports both binary and multi-class classification
  • Adapts to different instrument performances and sample matrices

🤖 Models

Binary Classification Models

  • Purpose: Distinguish between "Normal" and "Abnormal" measurements
  • Performance: Test F1-score: 0.9960, AUC: 0.999-1.0
  • Algorithms: Random Forest, XGBoost, Bagging Classifiers
  • Sampling Strategies: SMOTE, ADASYN, SMOTEENN, UnderSampling
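
Of the sampling strategies listed, random undersampling is the simplest to show without extra dependencies: every class is reduced to the size of the rarest one before training. The sketch below is a plain-Python illustration with a hypothetical function name; the training scripts may rely on a library implementation instead, and SMOTE-style oversampling works differently (it synthesizes new minority samples).

```python
import random
from collections import defaultdict


def random_undersample(rows: list, labels: list, seed: int = 0) -> tuple:
    """Downsample every class to the size of the rarest class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    n_min = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        for row in rng.sample(group, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data: 8 "Normal" and 2 "Abnormal" rows become 2 of each.
rows = list(range(10))
labels = ["Normal"] * 8 + ["Abnormal"] * 2
balanced_rows, balanced_labels = random_undersample(rows, labels)
```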

Multi-class Diagnostic Models

  • Purpose: Identify root causes of abnormalities
  • Categories:
      • "Possible instrument instability"
      • "Potential concentration anomaly"
      • "Combined factors"
      • "Other reasons, retesting recommended"
  • Features: Internal precision metrics, concentration mismatch ratios
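
To make the two feature families more concrete, the sketch below computes one plausible example of each: a two-standard-error internal precision over the integration cycles of a measurement, and a sample-to-standard signal ratio as a concentration mismatch indicator. These definitions and function names are assumptions for illustration; the features actually used by the Basic_features and Enhanced_features models are not specified here.

```python
from math import sqrt
from statistics import stdev


def internal_precision_2se(cycle_values: list) -> float:
    """Two standard errors of the mean over the cycles of one measurement."""
    return 2.0 * stdev(cycle_values) / sqrt(len(cycle_values))


def concentration_mismatch(sample_signal: float, standard_signal: float) -> float:
    """Ratio of sample to bracketing-standard signal intensity; 1.0 means matched."""
    return sample_signal / standard_signal


# Illustrative numbers only, not measured data.
features = {
    "se2_d202": internal_precision_2se([1.02, 0.98, 1.01, 0.99]),
    "conc_ratio": concentration_mismatch(0.9, 1.0),
}
```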

📊 Performance Highlights

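| Metric          | Binary Classification | Multi-class Diagnosis |
|-----------------|-----------------------|-----------------------|
| Accuracy        | 99.61%                | 99.84%                |
| F1-Score        | 0.9960                | 0.9909 (balanced)     |
| Recall (Normal) | 99.8%                 | -                     |
| AUC             | 0.999-1.0             | -                     |

Based on validation with 26,218 historical measurements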
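
For readers less familiar with these metrics, the sketch below shows how accuracy, recall, and F1 follow from the four cells of a binary confusion matrix. The counts in the example are invented for the arithmetic and are not the confusion matrix behind the figures reported above.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Precision, recall, F1, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Invented counts: 90 true positives, 10 false positives,
# 10 false negatives, 890 true negatives.
metrics = binary_metrics(tp=90, fp=10, fn=10, tn=890)
```

With these counts precision and recall are both 0.9, so F1 is also 0.9, while accuracy is 0.98. On imbalanced data accuracy can look better than F1, which is why both are worth reporting.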

📝 Citation

If you use Hg-MC-Auto in your research, please cite:

@article{zhou2025selfdriving,
  title={A Data-Driven, Post-Acquisition Quality Diagnostic Pipeline for Isotope Analysis by MC-ICP-MS},
  author={Zhou, Chufan and Huang, Qiang and Tang, Yang and Zhong, Ying and Feng, Xinbin},
  journal={Journal of Analytical Atomic Spectrometry},
  year={2025},
  doi={10.1039/D5JA00519A}
}

🤝 Contributing

We welcome contributions! Please:

  • Fork the repository
  • Create a feature branch
  • Submit a pull request
  • Ensure code follows PEP 8 guidelines
  • Include tests for new functionality

🐛 Issues and Support

Bug Reports: Use the GitHub Issues page

Questions: Check the Wiki or open a discussion

Feature Requests: Submit via GitHub Issues with the "enhancement" label

📧 Contact

Laboratory of Karst Environmental Evolution and Ecological Security, Institute of Geochemistry, Chinese Academy of Sciences, Guiyang, Guizhou 550081, China

We welcome experts from other laboratories to contribute their expertise toward building an intelligent geochemistry laboratory. Join us and help make a change.

Chufan Zhou: 📧 zhouchufan@mail.gyig.ac.cn 🔗 ORCID: 0009-0008-0144-9017

Qiang Huang (Corresponding Author): 📧 huangqiang@mail.gyig.ac.cn 🔗 ORCID: 0000-0003-1568-9042

📄 License

This project is licensed under the MIT License; see the LICENSE file for details.
