This project was developed with the assistance of Claude, an AI assistant by Anthropic.
A comprehensive metadata extraction toolkit for digital pathology research, designed to standardize cohort definition and image analysis workflows in QuPath.
The QuPath Cohort Metadata Extractor is a robust workflow designed for pathologists and researchers who need to systematically analyze large collections of whole slide images (WSI). It automatically extracts comprehensive metadata from images in QuPath projects, enabling efficient cohort definition, quality control, and standardized analysis workflows.
- Comprehensive Metadata Extraction: Captures 50+ metadata fields including scanner details, acquisition settings, calibration data, and technical specifications
- Multi-Scanner Support: Compatible with major digital pathology scanners (Aperio, Hamamatsu, Ventana, Philips, etc.)
- Quality Control: Automated quality assessment and issue detection
- Batch Processing: Processes entire QuPath projects automatically
- Export Flexibility: Outputs data in CSV format for easy analysis in Excel, R, Python, or other tools
- Analysis Recommendations: Provides optimal analysis level suggestions for each image
- Interoperability: Designed to integrate with existing digital pathology workflows
Typical use cases:
- Multi-center studies: Standardize image analysis across different institutions and scanners
- Retrospective studies: Analyze historical slide collections with consistent metadata
- Quality assurance: Identify and flag images requiring manual review
- Cohort selection: Define homogeneous patient cohorts based on image characteristics
- Batch effect analysis: Identify and control for technical variations
- Protocol optimization: Determine optimal analysis parameters for different image types
- Teaching collections: Organize and categorize educational slide sets
- Research training: Provide students with comprehensive image metadata for analysis projects
Requirements:
- QuPath: Version 0.6.0 or later (tested with 0.6.0-rc3)
- Image formats: SVS, TIFF, NDPI, VSI, SCN, and other QuPath-supported formats
- System requirements: Standard QuPath installation requirements
Installation:
- Download the scripts:
  ```bash
  git clone https://github.com/sbalci/metadata-qupath.git
  cd metadata-qupath
  ```
- Copy the scripts to your QuPath scripts directory:
  - Windows: %USERPROFILE%\.qupath\scripts\
  - macOS: ~/.qupath/scripts/
  - Linux: ~/.qupath/scripts/
- Available script versions:
  - QuPathCohortExtractor.groovy: full-featured version
  - SimpleMetadataExtractor.groovy: lightweight version for testing
  - QuPath_v06_Compatible.groovy: optimized for QuPath 0.6+
Optional menu integration:
- Copy MenuSetup.groovy to your QuPath scripts directory.
- Add the following line to your QuPath startup script:
  ```groovy
  runScript(new File(QPEx.getQuPathUserDirectory(), "scripts/MenuSetup.groovy"))
  ```
- Restart QuPath; a new "Cohort Analysis" menu should appear.
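If you install on several machines, the copy step can be scripted. The sketch below is a minimal Python example that assumes the Groovy scripts sit at the top level of the cloned repository and that the target is the ~/.qupath/scripts location listed above; `qupath_scripts_dir` and `install_scripts` are hypothetical helpers, not part of this project.

```python
import shutil
from pathlib import Path
from typing import List

# Script names listed in this README; trim to the ones you actually use.
SCRIPT_FILES = (
    "QuPathCohortExtractor.groovy",
    "SimpleMetadataExtractor.groovy",
    "QuPath_v06_Compatible.groovy",
    "MenuSetup.groovy",
)


def qupath_scripts_dir(home: Path) -> Path:
    """Return <home>/.qupath/scripts, the directory listed above.

    On Windows, Path.home() resolves to %USERPROFILE%, so one relative
    path covers Windows, macOS, and Linux.
    """
    return home / ".qupath" / "scripts"


def install_scripts(repo_dir: Path, home: Path) -> List[Path]:
    """Copy the Groovy scripts from a cloned repository into the scripts directory.

    Assumption: the scripts live at the repository root.
    """
    target = qupath_scripts_dir(home)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in SCRIPT_FILES:
        source = repo_dir / name
        if source.is_file():  # skip scripts not present in this checkout
            shutil.copy2(source, target / name)
            copied.append(target / name)
    return copied
```

After the clone step, `install_scripts(Path("metadata-qupath"), Path.home())` would copy whichever of the listed scripts exist.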
Quick start:
- Prepare your project:
  - Open QuPath and create or load a project containing your WSI files.
  - Ensure all images are properly imported and accessible.
- Run the extraction, either from the menu (Analyze > Cohort Analysis > Extract Cohort Metadata) if you installed the menu integration, or by running QuPath_v06_Compatible.groovy directly in the script editor.
- Review the output:
  - Find the results in the cohort_metadata/ directory within your project folder.
  - Open cohort_metadata_v06.csv in Excel or your preferred analysis tool.
To extract metadata from the currently open image:

```groovy
// Analyze the currently open image
def projectEntry = QPEx.getProjectEntry()
def extractor = new CohortMetadataExtractor(projectEntry)
def metadata = extractor.extractMetadata()
println("Metadata extracted: ${metadata.size()} fields")
```
To filter a previously exported cohort:

```groovy
// Load exported metadata
def cohortData = CohortUtils.loadCohortMetadata("cohort_metadata_v06.csv")
// Keep pyramidal 40x images with no scanner warnings
def highQualityImages = CohortUtils.filterImages(cohortData, [
    has_pyramid: true,
    scan_warning: "NONE",
    estimated_magnification: 40
])
println("Found ${highQualityImages.size()} high-quality 40x images")
```
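Outside QuPath, the same filter can be reproduced on the exported CSV. This Python sketch assumes `CohortUtils.filterImages` keeps rows whose fields exactly match every criterion (its actual semantics are not documented here) and that booleans are written as true/false; `coerce`, `load_cohort_metadata`, and `filter_images` are hypothetical names. CSV cells arrive as text, so they are converted before comparison.

```python
import csv
from typing import Any, Dict, List


def coerce(value: Any) -> Any:
    """Convert a CSV cell to bool, int, or float where possible."""
    if value is None:
        return None
    text = value.strip()
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def load_cohort_metadata(path: str) -> List[Dict[str, Any]]:
    """Read the exported CSV into a list of row dictionaries with typed values."""
    with open(path, newline="", encoding="utf-8-sig") as fh:
        return [
            {key: coerce(val) for key, val in row.items() if key is not None}
            for row in csv.DictReader(fh)
        ]


def filter_images(rows: List[Dict[str, Any]], criteria: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Keep rows whose fields equal every value in `criteria`."""
    return [row for row in rows if all(row.get(k) == v for k, v in criteria.items())]
```

`filter_images(load_cohort_metadata("cohort_metadata_v06.csv"), {"has_pyramid": True, "scan_warning": "NONE", "estimated_magnification": 40})` mirrors the Groovy call.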
cohort_metadata_v06.csv contains 50+ columns of metadata, including:
Basic image properties:

| Field | Description | Example |
|---|---|---|
| image_name | Filename of the image | kontrol15.01.25_14_6_134952.svs |
| width_pixels | Image width (pixels) | 47622 |
| height_pixels | Image height (pixels) | 63413 |
| pixel_width_um | Pixel width (µm) | 0.263312 |
| estimated_magnification | Calculated magnification | 40 |
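The extractor's exact formula for estimated_magnification is not documented here. A common convention treats about 0.25 µm/pixel as 40x and 0.5 µm/pixel as 20x, i.e. raw magnification ≈ 10 / pixel width in µm. The Python sketch below applies that convention and snaps to the nearest standard objective on a log scale; both the convention and the `STANDARD_MAGNIFICATIONS` list are assumptions, and `estimate_magnification` is a hypothetical name. For the example row, 10 / 0.263312 ≈ 37.98, which snaps to 40.

```python
import math

# Hypothetical set of standard objective magnifications to snap to.
STANDARD_MAGNIFICATIONS = (5, 10, 20, 40, 60, 80)


def estimate_magnification(pixel_width_um: float, standard=STANDARD_MAGNIFICATIONS) -> int:
    """Estimate objective magnification from pixel width.

    Assumes the ~0.25 um/pixel at 40x convention, so the raw value is
    10 / pixel_width_um. The closest standard value is chosen on a log
    scale, because magnifications differ by ratios rather than offsets.
    """
    if pixel_width_um <= 0:
        raise ValueError("pixel width must be positive")
    raw = 10.0 / pixel_width_um
    return min(standard, key=lambda mag: abs(math.log(mag / raw)))
```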
Scanner and acquisition:

| Field | Description | Example |
|---|---|---|
| scanner_type | Scanner model | GT450 |
| scanscope_id | Scanner identifier | 1111111 |
| scan_date | Date of image acquisition | 01/07/2025 |
| scan_time | Time of image acquisition | 08:29:16 |
| apparent_magnification | Scanner-reported magnification | 40X |
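scan_date and scan_time are exported as text, so date-range selection needs them parsed first; compared as strings, "01/07/2025" sorts alphabetically, not chronologically. This README does not state the date order, so the Python sketch below assumes month-first dates with a 4- or 2-digit year and lets you pass other formats; confirm the order against your scanner. `parse_scan_timestamp` is a hypothetical helper.

```python
from datetime import datetime
from typing import Optional, Sequence

# Assumed formats: month-first with a 4- or 2-digit year. Pass
# date_formats=("%d/%m/%Y",) if your scanner writes day-first dates.
DEFAULT_DATE_FORMATS = ("%m/%d/%Y", "%m/%d/%y")


def parse_scan_timestamp(
    scan_date: Optional[str],
    scan_time: Optional[str] = "",
    date_formats: Sequence[str] = DEFAULT_DATE_FORMATS,
) -> Optional[datetime]:
    """Combine scan_date and scan_time into a datetime, or None if unparseable."""
    date_part = (scan_date or "").strip()
    time_part = (scan_time or "").strip()
    if not date_part:
        return None
    for date_fmt in date_formats:
        text, fmt = date_part, date_fmt
        if time_part:
            text, fmt = f"{date_part} {time_part}", f"{date_fmt} %H:%M:%S"
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next candidate format
    return None
```

Rows can then be sorted or bounded with ordinary datetime comparisons.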
Quality indicators:

| Field | Description | Example |
|---|---|---|
| has_pyramid | Whether the image has a pyramid (multi-resolution) structure | true |
| scan_warning | Scanner warnings, if any | NONE |
| compression_quality | JPEG compression quality | 91 |
| file_size_mb | File size (MB) | 563.87 |
Analysis recommendations:

| Field | Description | Example |
|---|---|---|
| suggested_analysis_level | Recommended pyramid level for analysis | 1 |
| needs_pyramid | Whether the image needs a pyramid for adequate performance | false |
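The rule behind suggested_analysis_level is not spelled out in this README. One plausible rule, sketched below in Python under that assumption, picks the coarsest pyramid level whose effective pixel size still meets a target resolution. Pyramid levels are not always 2x apart (many SVS pyramids step by 4x), so the sketch takes the actual per-level downsamples as input instead of assuming powers of two; `suggest_level`, `effective_pixel_size`, and the target value are hypothetical.

```python
from typing import Sequence


def effective_pixel_size(base_pixel_um: float, downsamples: Sequence[float], level: int) -> float:
    """Pixel width (um) at a pyramid level: base pixel width times that level's downsample."""
    return base_pixel_um * downsamples[level]


def suggest_level(base_pixel_um: float, downsamples: Sequence[float], target_pixel_um: float) -> int:
    """Return the coarsest level whose pixels are no larger than the target.

    Falls back to level 0 (full resolution) when even level 0 is coarser
    than the target. Assumes downsamples are ordered from level 0 upward.
    """
    best = 0
    for level in range(len(downsamples)):
        if effective_pixel_size(base_pixel_um, downsamples, level) <= target_pixel_um:
            best = level
    return best
```

With the example pixel width of 0.263312 µm and downsamples of 1, 4, and 16, a 1.1 µm target selects level 1 (about 1.05 µm per pixel).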
Additional output files:
- detailed_summary_v06.txt: human-readable summary with statistics
- processing_log.txt: detailed processing log, including any errors
```python
import pandas as pd
import matplotlib.pyplot as plt

# Load cohort data
df = pd.read_csv('cohort_metadata_v06.csv')

# Parse dates so min/max are chronological rather than alphabetical
scan_dates = pd.to_datetime(df['scan_date'], errors='coerce')

# Basic statistics
print(f"Total images: {len(df)}")
print(f"Scanners: {df['scanner_type'].unique()}")
print(f"Date range: {scan_dates.min()} to {scan_dates.max()}")

# Quality assessment; has_pyramid is normalized to bool in case it was
# read as text (e.g. when the column contains blanks)
has_pyramid = df['has_pyramid'].astype(str).str.lower() == 'true'
quality_issues = df[
    (df['scan_warning'] != 'NONE') |
    (df['compression_quality'] < 85) |
    (~has_pyramid)
]
print(f"Images with quality concerns: {len(quality_issues)}")

# Magnification distribution
df['estimated_magnification'].hist(bins=20)
plt.title('Magnification Distribution')
plt.xlabel('Magnification')
plt.ylabel('Number of Images')
plt.show()
```
```r
library(dplyr)
library(ggplot2)

# Load data
cohort_data <- read.csv("cohort_metadata_v06.csv")

# Scanner analysis
scanner_summary <- cohort_data %>%
  group_by(scanner_type, scan_date) %>%
  summarise(
    image_count = n(),
    avg_file_size = mean(file_size_mb, na.rm = TRUE),
    .groups = 'drop'
  )
print(scanner_summary)

# Visualization
ggplot(cohort_data, aes(x = pixel_width_um, y = estimated_magnification)) +
  geom_point(aes(color = scanner_type)) +
  labs(title = "Pixel Size vs Magnification by Scanner",
       x = "Pixel Width (µm)", y = "Estimated Magnification")
```
- Open the CSV file in Excel.
- Create pivot tables for:
  - Scanner type distribution
  - Acquisition date analysis
  - Quality metrics summary
- Apply filters to define your cohort by:
  - Magnification range
  - Scanner type
  - Date range
  - Quality criteria
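The same count-style pivot tables can be produced without Excel. This Python sketch uses only the standard library and mirrors a pivot table with "Count" as the summary; blank cells are grouped under "(blank)". `load_rows` and `pivot_counts` are hypothetical helpers, and the column names come from the field tables in this README.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List, Optional


def load_rows(path: str) -> List[Dict[str, str]]:
    """Read the exported CSV as a list of dictionaries of strings."""
    with open(path, newline="", encoding="utf-8-sig") as fh:
        return list(csv.DictReader(fh))


def pivot_counts(
    rows: Iterable[Dict[str, str]], row_field: str, col_field: Optional[str] = None
) -> Counter:
    """Count images per value of row_field, or per (row_field, col_field) pair."""

    def value(row: Dict[str, str], field: str) -> str:
        return (row.get(field) or "").strip() or "(blank)"

    if col_field is None:
        return Counter(value(row, row_field) for row in rows)
    return Counter((value(row, row_field), value(row, col_field)) for row in rows)
```

For example, `pivot_counts(load_rows("cohort_metadata_v06.csv"), "scanner_type")` gives the scanner type distribution, and adding `"scan_warning"` as the second field cross-tabulates scanners against warnings.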
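Add custom extraction logic by extending the CohortMetadataExtractor class:

```groovy
class CustomExtractor extends CohortMetadataExtractor {
    // Assumes the base constructor takes a project entry, as in the usage example
    CustomExtractor(projectEntry) { super(projectEntry) }
    def extractStainInfo() {
        // Match "H&E" or "HE" as whole words; contains('he') would match
        // "the" yet miss "H&E" entirely
        def description = metadata.description ?: ''
        if (description =~ /(?i)\b(H&E|HE)\b/) {
            metadata.stain_type = 'H&E'
        }
    }
}
```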
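A word-boundary pattern is easy to prototype outside QuPath before adding it to a Groovy extractor. The Python sketch below compares a plain substring check with the word-boundary pattern; for plain ASCII descriptions, Python's `re` and Java's regex engine (used by Groovy) treat `\b` the same way for this pattern. `naive_is_he`, `is_he`, and the sample descriptions are illustrative only.

```python
import re
from typing import Optional

# "H&E" or "HE" as whole words, case-insensitive.
HE_PATTERN = re.compile(r"\b(?:H&E|HE)\b", re.IGNORECASE)


def naive_is_he(description: str) -> bool:
    """Substring check: matches 'he' anywhere, including inside 'the'."""
    return "he" in description.lower()


def is_he(description: Optional[str]) -> bool:
    """Word-boundary check that ignores 'he' inside other words."""
    return bool(HE_PATTERN.search(description or ""))


for text in ("H&E stain, 40x", "HE-stained section", "Scanned on the GT450"):
    print(f"{text!r}: naive={naive_is_he(text)} word-boundary={is_he(text)}")
```

Note that the substring check both misses "H&E" (the letters are separated by "&") and fires on "the".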
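```groovy
// Use metadata to set analysis parameters for the currently open image
def cohortData = CohortUtils.loadCohortMetadata("cohort_metadata_v06.csv")
def imageName = getProjectEntry().getImageName()
def currentImage = cohortData.find { it.image_name == imageName }
if (currentImage) {
    // CSV values may arrive as strings; convert before doing arithmetic
    def analysisLevel = currentImage.suggested_analysis_level as Integer
    def pixelSize = currentImage.pixel_width_um as Double
    // Pyramid levels are not always 2x apart, so use the server's downsample
    def downsample = getCurrentServer().getDownsampleForResolution(analysisLevel)
    // Configure your analysis based on metadata
    println("Using analysis level: ${analysisLevel}")
    println("Effective pixel size: ${pixelSize * downsample} µm")
}
```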
Issue: "No signature of method getImageType()"
- Solution: Use QuPath_v06_Compatible.groovy with QuPath 0.6+.

Issue: The CSV file has only 4 columns
- Solution: This indicates an API compatibility problem; use the QuPath 0.6+ compatible version (QuPath_v06_Compatible.groovy).

Issue: "Could not load server" errors
- Solution: Check that the image files are accessible and not corrupted.

Issue: Missing scanner metadata
- Solution: Some formats store only limited metadata, so empty scanner fields are expected for them.
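The 4-column symptom can be caught automatically before downstream analysis. This Python sketch reads only the CSV header and reports problems. The 50-column threshold comes from the "50+ columns" figure in this README, and the required-column list is a hypothetical choice taken from the field tables, so adjust both to your export; `check_metadata_csv` is a hypothetical helper.

```python
import csv
from typing import List, Sequence

# This README describes "50+ columns"; adjust if your export differs.
MIN_EXPECTED_COLUMNS = 50
# A few core fields from the field tables (hypothetical selection).
REQUIRED_COLUMNS = ("image_name", "width_pixels", "height_pixels", "pixel_width_um")


def check_metadata_csv(
    path: str,
    min_columns: int = MIN_EXPECTED_COLUMNS,
    required: Sequence[str] = REQUIRED_COLUMNS,
) -> List[str]:
    """Return a list of problems found in the CSV header (empty if none)."""
    with open(path, newline="", encoding="utf-8-sig") as fh:
        header = next(csv.reader(fh), [])
    problems = []
    if len(header) < min_columns:
        problems.append(
            f"only {len(header)} columns (expected {min_columns}+); "
            "re-run with QuPath_v06_Compatible.groovy"
        )
    missing = [name for name in required if name not in header]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))
    return problems
```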
- Large projects: Process in batches or use filters
- Network storage: Copy files locally before processing
- Memory issues: Increase QuPath memory allocation in preferences
We welcome contributions from the digital pathology community!
- Fork the repository
- Create a feature branch:
git checkout -b feature/amazing-feature
- Make your changes and test thoroughly
- Commit your changes:
git commit -m 'Add amazing feature'
- Push to the branch:
git push origin feature/amazing-feature
- Open a Pull Request
Areas where contributions are especially welcome:
- Support for additional scanner types
- Integration with other digital pathology platforms
- Enhanced quality control metrics
- Documentation improvements
- Bug fixes and performance optimizations
If you use this workflow in your research, please cite:
```bibtex
@software{qupath_cohort_extractor,
  title={QuPath Cohort Metadata Extractor},
  author={[Your Name/Institution]},
  year={2025},
  url={https://github.com/sbalci/qupath-cohort-extractor}
}
```
This project is licensed under the MIT License - see the LICENSE file for details.
- QuPath Development Team for creating an excellent open-source platform
- Digital pathology community for feedback and testing
- Bio-Formats library for supporting multiple image formats
- OpenSlide library for WSI format support
- Issues: Report bugs and request features via GitHub Issues
- Discussions: Join the conversation in GitHub Discussions
- QuPath Forum: For general QuPath questions, use the QuPath Forum
- v2.0.0: QuPath 0.6+ compatibility, enhanced metadata extraction
- v1.1.0: Added menu integration and configuration options
- v1.0.0: Initial release with basic metadata extraction
Made with ❤️ for the digital pathology community
Star ⭐ this repository if you find it useful!