QuPath Cohort Metadata Extractor

Acknowledgments

This project was developed with the assistance of Claude, an AI assistant by Anthropic.

A comprehensive metadata extraction toolkit for digital pathology research, designed to standardize cohort definition and image analysis workflows in QuPath.

🎯 Overview

The QuPath Cohort Metadata Extractor is a robust workflow designed for pathologists and researchers who need to systematically analyze large collections of whole slide images (WSI). It automatically extracts comprehensive metadata from images in QuPath projects, enabling efficient cohort definition, quality control, and standardized analysis workflows.

✨ Key Features

Comprehensive Metadata Extraction: Captures 50+ metadata fields including scanner details, acquisition settings, calibration data, and technical specifications
Multi-Scanner Support: Compatible with major digital pathology scanners (Aperio, Hamamatsu, Ventana, Philips, etc.)
Quality Control: Automated quality assessment and issue detection
Batch Processing: Processes entire QuPath projects automatically
Export Flexibility: Outputs data in CSV format for easy analysis in Excel, R, Python, or other tools
Analysis Recommendations: Provides optimal analysis level suggestions for each image
Interoperability: Designed to integrate with existing digital pathology workflows

🏥 Use Cases

Clinical Research

Multi-center studies: Standardize image analysis across different institutions and scanners
Retrospective studies: Analyze historical slide collections with consistent metadata
Quality assurance: Identify and flag images requiring manual review

Biomarker Research

Cohort selection: Define homogeneous patient cohorts based on image characteristics
Batch effect analysis: Identify and control for technical variations
Protocol optimization: Determine optimal analysis parameters for different image types

Educational Applications

Teaching collections: Organize and categorize educational slide sets
Research training: Provide students with comprehensive image metadata for analysis projects

📋 Requirements

QuPath: Version 0.6.0 or later (tested with 0.6.0-rc3)
Image formats: SVS, TIFF, NDPI, VSI, SCN, and other QuPath-supported formats
System requirements: Standard QuPath installation requirements

🚀 Installation

Option 1: Direct Script Installation (Recommended)

Download the scripts:

git clone https://github.com/sbalci/metadata-qupath.git
cd metadata-qupath

Copy to QuPath scripts directory:
- Windows: %USERPROFILE%\.qupath\scripts\
- macOS: ~/.qupath/scripts/
- Linux: ~/.qupath/scripts/

Available script versions:
- QuPathCohortExtractor.groovy - Full-featured version
- SimpleMetadataExtractor.groovy - Lightweight version for testing
- QuPath_v06_Compatible.groovy - Optimized for QuPath 0.6+

Option 2: Menu Integration

Copy MenuSetup.groovy to your QuPath scripts directory

Add the following to your QuPath startup scripts:

runScript(new File(QPEx.getQuPathUserDirectory(), "scripts/MenuSetup.groovy"))

Restart QuPath to see the new "Cohort Analysis" menu

📖 Usage

Basic Workflow

Prepare your project:
- Open QuPath and create/load a project with your WSI files
- Ensure all images are properly imported and accessible

Run the extraction:

// For menu-integrated version
// Navigate to: Analyze > Cohort Analysis > Extract Cohort Metadata

// For direct script execution
// Run the QuPath_v06_Compatible.groovy script

Review the output:
- Find results in the cohort_metadata/ directory within your project folder
- Open cohort_metadata_v06.csv in Excel or your preferred analysis tool

Command Examples

Single Image Analysis

// Analyze currently open image
def projectEntry = QPEx.getProjectEntry()
def extractor = new CohortMetadataExtractor(projectEntry)
def metadata = extractor.extractMetadata()
println("Metadata extracted: ${metadata.size()} fields")

Batch Processing with Filtering

// Load exported metadata
def cohortData = CohortUtils.loadCohortMetadata("cohort_metadata_v06.csv")

// Filter high-quality images
def highQualityImages = CohortUtils.filterImages(cohortData, [
    has_pyramid: true,
    scan_warning: "NONE",
    estimated_magnification: 40
])

println("Found ${highQualityImages.size()} high-quality 40x images")
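To apply the same kind of filter outside QuPath, here is a minimal Python sketch of an equality filter over the exported rows. It is not the implementation of `CohortUtils.filterImages`; the functions `normalize` and `filter_images` and the sample rows are assumptions made for illustration. The point it shows is that values read from a CSV are strings, so criteria such as `true` or `40` need to be compared in a normalized form.

```python
import csv
import io


def normalize(value):
    """Return a canonical lowercase string so True/'true' and 40/'40.0' compare equal."""
    text = str(value).strip().lower()
    try:
        number = float(text)
    except ValueError:
        return text  # not numeric: compare as plain text
    return str(int(number)) if number.is_integer() else str(number)


def filter_images(rows, criteria):
    """Keep the rows whose fields equal every criterion (hypothetical helper)."""
    wanted = {key: normalize(value) for key, value in criteria.items()}
    return [
        row for row in rows
        if all(normalize(row.get(key, "")) == value for key, value in wanted.items())
    ]


# Made-up rows standing in for cohort_metadata_v06.csv
sample_csv = """image_name,has_pyramid,scan_warning,estimated_magnification
a.svs,true,NONE,40
b.svs,false,NONE,40
c.svs,true,NONE,20
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))

high_quality = filter_images(rows, {
    "has_pyramid": True,
    "scan_warning": "NONE",
    "estimated_magnification": 40,
})
print([row["image_name"] for row in high_quality])
```

With the sample rows above, only `a.svs` satisfies all three criteria.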

📊 Output Data

Primary Output File: `cohort_metadata_v06.csv`

Contains 50+ columns of metadata including:

Basic Image Properties

| Field | Description | Example |
|-------|-------------|---------|
| `image_name` | Filename of the image | `kontrol15.01.25_14_6_134952.svs` |
| `width_pixels` | Image width in pixels | `47622` |
| `height_pixels` | Image height in pixels | `63413` |
| `pixel_width_um` | Pixel size in micrometers | `0.263312` |
| `estimated_magnification` | Calculated magnification | `40` |
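The relationship between `pixel_width_um` and `estimated_magnification` can be illustrated with a common rule of thumb: about 0.25 µm per pixel corresponds to 40x and about 0.5 µm per pixel to 20x, so magnification is roughly 10 divided by the pixel size. The sketch below applies that heuristic and snaps the result to the nearest standard objective. Whether the extractor scripts compute the field this way is an assumption; the function name and the list of standard magnifications are illustrative only.

```python
import math

# Assumed set of standard objective magnifications
STANDARD_MAGNIFICATIONS = (1.25, 2.5, 5, 10, 20, 40, 80)


def estimate_magnification(pixel_width_um):
    """Estimate objective magnification from pixel size (heuristic: 10 / pixel size)."""
    if pixel_width_um is None or pixel_width_um <= 0:
        return None  # uncalibrated image: no estimate possible
    raw = 10.0 / pixel_width_um
    # Pick the standard value closest on a log scale, since magnifications are ratios
    return min(STANDARD_MAGNIFICATIONS, key=lambda m: abs(math.log(m / raw)))


print(estimate_magnification(0.263312))
```

For the example pixel size in the table, the raw estimate is about 38x, which snaps to 40.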

Scanner Information

| Field | Description | Example |
|-------|-------------|---------|
| `scanner_type` | Scanner model | `GT450` |
| `scanscope_id` | Scanner identifier | `1111111` |
| `scan_date` | Date of image acquisition | `01/07/2025` |
| `scan_time` | Time of image acquisition | `08:29:16` |
| `apparent_magnification` | Scanner-reported magnification | `40X` |

Quality Metrics

| Field | Description | Example |
|-------|-------------|---------|
| `has_pyramid` | Whether image has pyramid structure | `true` |
| `scan_warning` | Any scanner warnings | `NONE` |
| `compression_quality` | JPEG compression quality | `91` |
| `file_size_mb` | File size in megabytes | `563.87` |

Analysis Recommendations

| Field | Description | Example |
|-------|-------------|---------|
| `suggested_analysis_level` | Optimal pyramid level for analysis | `1` |
| `needs_pyramid` | Whether image needs pyramid for performance | `false` |

Additional Output Files

detailed_summary_v06.txt: Human-readable summary with statistics
processing_log.txt: Detailed processing log with any errors

🔬 Analysis Examples

Python Integration

import pandas as pd
import matplotlib.pyplot as plt

# Load cohort data
df = pd.read_csv('cohort_metadata_v06.csv')

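# Basic statistics
print(f"Total images: {len(df)}")
print(f"Scanners: {df['scanner_type'].unique()}")

# Parse dates before taking min/max; comparing the raw strings would sort them as text.
# Assumes MM/DD/YYYY; adjust `format` if your scanner writes dates differently.
scan_dates = pd.to_datetime(df['scan_date'], format='%m/%d/%Y', errors='coerce')
print(f"Date range: {scan_dates.min()} to {scan_dates.max()}")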

# Quality assessment
quality_issues = df[
    (df['scan_warning'] != 'NONE') | 
    (df['compression_quality'] < 85) |
    (~df['has_pyramid'])
]
print(f"Images with quality concerns: {len(quality_issues)}")

# Magnification distribution
df['estimated_magnification'].hist(bins=20)
plt.title('Magnification Distribution')
plt.xlabel('Magnification')
plt.ylabel('Number of Images')
plt.show()

R Integration

library(dplyr)
library(ggplot2)

# Load data
cohort_data <- read.csv("cohort_metadata_v06.csv")

# Scanner analysis
scanner_summary <- cohort_data %>%
  group_by(scanner_type, scan_date) %>%
  summarise(
    image_count = n(),
    avg_file_size = mean(file_size_mb, na.rm = TRUE),
    .groups = 'drop'
  )

# Visualization
ggplot(cohort_data, aes(x = pixel_width_um, y = estimated_magnification)) +
  geom_point(aes(color = scanner_type)) +
  labs(title = "Pixel Size vs Magnification by Scanner",
       x = "Pixel Width (μm)", y = "Estimated Magnification")

Excel Analysis

Open the CSV file in Excel
Create pivot tables for:
- Scanner type distribution
- Acquisition date analysis
- Quality metrics summary
Apply filters to define your cohort:
- Magnification range
- Scanner type
- Date range
- Quality criteria
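The pivot tables and filters listed above can also be scripted when a repeatable version is preferred over manual steps in Excel. The following is a minimal sketch using only the Python standard library. The column names come from the tables in this README; the sample rows, the `AT2` scanner entry, and the thresholds are made up for illustration.

```python
import csv
import io
from collections import Counter

# Made-up rows standing in for cohort_metadata_v06.csv
sample_csv = """image_name,scanner_type,scan_date,estimated_magnification,compression_quality
a.svs,GT450,01/07/2025,40,91
b.svs,GT450,01/07/2025,40,80
c.svs,AT2,01/08/2025,20,91
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Pivot-style summary: number of images per (scanner type, scan date)
counts = Counter((row["scanner_type"], row["scan_date"]) for row in rows)
for (scanner, date), n in sorted(counts.items()):
    print(f"{scanner}\t{date}\t{n}")

# Cohort definition: the Excel filters expressed as explicit conditions
cohort = [
    row["image_name"]
    for row in rows
    if int(row["estimated_magnification"]) == 40
    and row["scanner_type"] == "GT450"
    and int(row["compression_quality"]) >= 85
]
print(cohort)
```

Keeping the cohort criteria in a script makes the selection reproducible and easy to record alongside the analysis.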

🛠️ Advanced Configuration

Custom Metadata Fields

Add custom extraction logic by extending the CohortMetadataExtractor class:

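class CustomExtractor extends CohortMetadataExtractor {
    def extractStainInfo() {
        // Custom stain detection logic.
        // Match "he" or "h&e" as whole tokens; a plain contains('he') would also
        // match unrelated words such as "the" or "header".
        def description = (metadata.description ?: '').toLowerCase()
        if (description =~ /\b(he|h&e)\b/) {
            metadata.stain_type = 'H&E'
        }
    }
}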

Integration with Analysis Workflows

// Use metadata to set analysis parameters
def cohortData = CohortUtils.loadCohortMetadata("cohort_metadata_v06.csv")
// ImageServerMetadata exposes the image name through getName()
def currentName = getCurrentImageData().getServer().getMetadata().getName()
def currentImage = cohortData.find { it.image_name == currentName }

if (currentImage) {
    def analysisLevel = currentImage.suggested_analysis_level
    def pixelSize = currentImage.pixel_width_um

    // Configure your analysis based on metadata
    println("Using analysis level: ${analysisLevel}")
    // Assumes each pyramid level halves the resolution (downsample factor of 2)
    println("Target pixel size: ${pixelSize * Math.pow(2, analysisLevel)} μm")
}
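The pixel-size calculation above multiplies by 2 raised to the level, which assumes every pyramid level halves the resolution. That holds for many files but not all; Aperio SVS files, for example, commonly use a factor of 4 between levels. As a sketch of the arithmetic, the hypothetical helper below takes the per-level downsample factors explicitly instead of assuming powers of two. In QuPath these factors would come from the image server; the lists used here are illustrative.

```python
def pixel_size_at_level(base_pixel_size_um, downsamples, level):
    """Return the effective pixel size (µm) at a pyramid level.

    `downsamples` lists the downsample factor of each level relative to
    full resolution, e.g. [1, 4, 16].
    """
    if not 0 <= level < len(downsamples):
        raise ValueError(f"level {level} not available (0..{len(downsamples) - 1})")
    return base_pixel_size_um * downsamples[level]


base = 0.263312  # example pixel_width_um from the metadata table

# Same level index, different pyramids, different effective pixel sizes
print(pixel_size_at_level(base, [1, 2, 4, 8], 1))
print(pixel_size_at_level(base, [1, 4, 16], 1))
```

For the same `suggested_analysis_level`, the second pyramid yields twice the pixel size of the first, so it is worth checking the actual downsample factors before fixing analysis parameters.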

⚠️ Troubleshooting

Common Issues

Issue: "No signature of method getImageType()"

Solution: Use QuPath_v06_Compatible.groovy for QuPath 0.6+

Issue: CSV file has only 4 columns

Solution: API compatibility issue - use the v0.6+ compatible version

Issue: "Could not load server" errors

Solution: Check that image files are accessible and not corrupted

Issue: Missing scanner metadata

Solution: Some formats may have limited metadata; this is normal
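One quick way to tell the "only 4 columns" problem apart from ordinary missing scanner metadata is to inspect the exported CSV itself. The sketch below reports which expected columns are absent from the header and which are present but empty. The expected column list is a subset taken from the tables in this README; the helper name `check_columns` and the sample rows are illustrative assumptions.

```python
import csv
import io

# Subset of columns documented in this README
EXPECTED_COLUMNS = [
    "image_name", "width_pixels", "height_pixels", "pixel_width_um",
    "estimated_magnification", "scanner_type", "scan_date",
    "has_pyramid", "scan_warning", "compression_quality",
]


def check_columns(csv_text, expected=EXPECTED_COLUMNS):
    """Return (missing column names, count of empty values per present column)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    header = reader.fieldnames or []
    missing = [name for name in expected if name not in header]
    empty_counts = {
        name: sum(1 for row in rows if not (row.get(name) or "").strip())
        for name in expected
        if name in header
    }
    return missing, empty_counts


# Made-up export with only four columns and one blank scanner value
sample = """image_name,width_pixels,height_pixels,scanner_type
a.svs,47622,63413,GT450
b.tif,1000,1000,
"""
missing, empty_counts = check_columns(sample)
print("Missing columns:", missing)
print("Empty values:", {name: n for name, n in empty_counts.items() if n})
```

Columns missing from the header point to the API compatibility issue, while columns that exist but are blank for some images are more likely the normal case of formats with limited metadata.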

Performance Optimization

Large projects: Process in batches or use filters
Network storage: Copy files locally before processing
Memory issues: Increase QuPath memory allocation in preferences

🤝 Contributing

We welcome contributions from the digital pathology community!

How to Contribute

Fork the repository
Create a feature branch: git checkout -b feature/amazing-feature
Make your changes and test thoroughly
Commit your changes: git commit -m 'Add amazing feature'
Push to the branch: git push origin feature/amazing-feature
Open a Pull Request

Areas for Contribution

Support for additional scanner types
Integration with other digital pathology platforms
Enhanced quality control metrics
Documentation improvements
Bug fixes and performance optimizations

📚 Citation

If you use this workflow in your research, please cite:

@software{qupath_cohort_extractor,
  title={QuPath Cohort Metadata Extractor},
  author={[Your Name/Institution]},
  year={2025},
  url={https://github.com/sbalci/metadata-qupath}
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

QuPath Development Team for creating an excellent open-source platform
Digital pathology community for feedback and testing
Bio-Formats library for supporting multiple image formats
OpenSlide library for WSI format support

📞 Support

Issues: Report bugs and request features via GitHub Issues
Discussions: Join the conversation in GitHub Discussions
QuPath Forum: For general QuPath questions, use the QuPath Forum

🔄 Version History

v2.0.0: QuPath 0.6+ compatibility, enhanced metadata extraction
v1.1.0: Added menu integration and configuration options
v1.0.0: Initial release with basic metadata extraction

Made with ❤️ for the digital pathology community

Star ⭐ this repository if you find it useful!
