Add PyHealth contributions for chest X-ray analysis #464
Chest X-Ray Analysis with PyHealth
Introduction
This example demonstrates how to use PyHealth to perform chest X-ray classification, focusing on detecting abnormalities such as pneumonia and edema. It builds on the reproducibility efforts for the UniXGen model (Lee et al., 2023), a vision-language generative model for view-specific chest X-ray generation. The example leverages the CheXpert dataset and introduces two new PyHealth contributions:
- A task (chest_xray_classification_fn) to label chest X-rays based on diagnoses.
- A metric (radiographic_agreement) to evaluate inter-rater agreement for radiographic findings.
Setup
First, ensure PyHealth and its dependencies are installed. Then, import the required modules and set up logging.
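As a minimal sketch of the logging and path setup, using only the standard library: the PyHealth imports are left out because the module paths added by this PR are not shown here. The logger name and the `check_dataset_root` helper are hypothetical, introduced only for illustration.

```python
import logging
from pathlib import Path

# Placeholder path: point this at your local CheXpert directory.
CHEXPERT_ROOT = Path("/path/to/chexpert")

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("chest_xray_example")  # hypothetical logger name


def check_dataset_root(root: Path) -> bool:
    """Return True if the dataset directory exists, logging the outcome."""
    if root.is_dir():
        logger.info("Using CheXpert data at %s", root)
        return True
    logger.warning("CheXpert directory not found: %s", root)
    return False


if __name__ == "__main__":
    check_dataset_root(CHEXPERT_ROOT)
```

Checking the path up front surfaces a wrong dataset location before any loading work starts.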
Notes
- Replace /path/to/chexpert with the actual path to your CheXpert dataset directory.
Data Preprocessing
Use the chest_xray_classification_fn task to process the dataset and label X-ray images based on the presence of pneumonia or edema.
Expected Output
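To make the expected records concrete, the labeling rule can be sketched in plain Python. This is not the task's actual implementation. The `label_record` helper and the input row layout are hypothetical, and treating uncertain (-1.0) or missing findings as negative is an assumption chosen to match the "0 otherwise" rule.

```python
from typing import Dict, List

POSITIVE = 1.0  # CheXpert encodes a positive finding as 1.0


def label_record(row: Dict[str, object]) -> Dict[str, object]:
    """Build one labeled sample from a hypothetical input row."""
    findings = (row.get("Pneumonia"), row.get("Edema"))
    # Label is 1 only when at least one finding is explicitly positive;
    # uncertain (-1.0) and missing (None) values count as 0 (assumption).
    label = int(any(value == POSITIVE for value in findings))
    return {
        "patient_id": row["patient_id"],
        "visit_id": row["visit_id"],
        "xray_path": row["xray_path"],
        "view_position": row["view_position"],
        "label": label,
    }


# Made-up rows for illustration only; not real CheXpert data.
rows: List[Dict[str, object]] = [
    {"patient_id": "p1", "visit_id": "v1", "xray_path": "p1/v1/view1.jpg",
     "view_position": "PA", "Pneumonia": 1.0, "Edema": 0.0},
    {"patient_id": "p2", "visit_id": "v1", "xray_path": "p2/v1/view1.jpg",
     "view_position": "AP", "Pneumonia": -1.0, "Edema": None},
    {"patient_id": "p3", "visit_id": "v2", "xray_path": "p3/v2/view1.jpg",
     "view_position": "PA", "Pneumonia": 0.0, "Edema": 1.0},
]

samples = [label_record(row) for row in rows]
print([sample["label"] for sample in samples])  # [1, 0, 1]
```

A list of such dictionaries can be passed to a DataFrame constructor to obtain a table with the same columns.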
The df DataFrame will contain columns such as patient_id, visit_id, xray_path, view_position, and label (1 if pneumonia or edema is present, 0 otherwise).
Visualization
Visualize a sample X-ray image along with its label and view position.
Notes
- Ensure cv2 (OpenCV) is installed to load and display images.
- Ensure xray_path points to a valid file.
Model Training (Simple Example)
Train a basic PyHealth model (e.g., LogisticRegression) to classify X-rays. Note that this is a placeholder; in practice, you’d likely use a deep learning model (e.g., a CNN) to extract features from X-ray images.
To-Do
- Replace LogisticRegression with a more suitable model (e.g., a CNN like ResNet) and preprocess X-ray images into feature vectors.
- Adjust feature_dims based on your feature extraction method.
Evaluation
Evaluate the model’s predictions using the radiographic_agreement metric to measure inter-rater agreement between true and predicted labels.
Expected Output
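As a point of reference for the two reported values, here is a minimal standard-library sketch of Cohen's kappa and percent agreement. It is not the code behind radiographic_agreement. The `agreement_scores` name and its return keys are hypothetical, and returning kappa = 1.0 when both label sequences contain a single identical class is a convention chosen here to avoid a zero division.

```python
from collections import Counter
from typing import Dict, Sequence


def agreement_scores(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute Cohen's kappa and percent agreement for two label sequences."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    # Observed agreement: fraction of positions where the labels match.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement: sum over classes of the product of marginal rates.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    classes = set(true_counts) | set(pred_counts)
    expected = sum(true_counts[c] * pred_counts[c] for c in classes) / (n * n)

    if expected == 1.0:
        kappa = 1.0  # degenerate single-class case (convention, see lead-in)
    else:
        kappa = (observed - expected) / (1.0 - expected)

    return {"cohen_kappa": kappa, "percent_agreement": 100.0 * observed}


# Made-up labels for illustration only.
scores = agreement_scores([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1, 1, 0])
print(scores)  # {'cohen_kappa': 0.5, 'percent_agreement': 75.0}
```

In this example six of eight labels match (75%), while balanced classes give a chance agreement of 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.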
- Cohen's Kappa: A value between -1 and 1, where 1 indicates perfect agreement.
- Percent Agreement: Percentage of matching labels (0-100%).
Conclusion
This example demonstrates how PyHealth can be used to classify chest X-ray abnormalities using the CheXpert dataset. The chest_xray_classification_fn task simplifies data preprocessing, while the radiographic_agreement metric provides a robust evaluation of model performance. Future work could integrate these outputs with UniXGen’s generated X-rays for enhanced analysis.
References