
@jalengg jalengg commented May 8, 2025

Authors

Jalen Jiang - jalenj4
Rodrigo Mata - mata6 @rodrigomata9

What

  • New dataset loader for joining MIMIC-3 discharge summaries to the SBDH labels provided by
    MIMIC-SBDH: A Dataset for Social and Behavioral Determinants of Health
  • Joins NOTEEVENTS.csv to MIMIC-SBDH.csv on ROW_ID, and also includes attributes that relate each note to its patient id and charttime. Importantly, TEXT is truncated to only the Social History portion and is joined to the SBDH labels: community-present, community-absent, education, economics, environment, alcohol, tobacco, and drug.
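
For readers who want a feel for the join and truncation described above, here is a minimal standard-library sketch. It is not the loader's actual implementation: the regex, the helper names `extract_social_history` and `join_notes_to_labels`, the toy rows, and the placeholder label column `tobacco` are all assumptions made for illustration. Only the ROW_ID join key and the TEXT, SUBJECT_ID, and CHARTTIME columns come from the description.

```python
import re

# Hypothetical pattern (not the PR's actual regex): capture everything after a
# "Social History:" header up to the next "Some Header:" line or end of note.
SOCIAL_HISTORY_RE = re.compile(
    r"social history:\s*(.*?)(?=\n\s*[A-Za-z][A-Za-z /]*:|\Z)",
    re.IGNORECASE | re.DOTALL,
)


def extract_social_history(text):
    """Return the Social History section of a note, or "" if none is found."""
    match = SOCIAL_HISTORY_RE.search(text)
    return match.group(1).strip() if match else ""


def join_notes_to_labels(notes, labels):
    """Inner-join note rows to label rows on ROW_ID, replacing TEXT with
    only its Social History portion."""
    labels_by_id = {row["ROW_ID"]: row for row in labels}
    joined = []
    for note in notes:
        label = labels_by_id.get(note["ROW_ID"])
        if label is None:
            continue  # this note has no SBDH annotation, so it is dropped
        joined.append(
            {**label, **note, "TEXT": extract_social_history(note["TEXT"])}
        )
    return joined


# Made-up rows for illustration only; real rows would come from the two CSVs.
notes = [
    {
        "ROW_ID": "1",
        "SUBJECT_ID": "10",
        "CHARTTIME": "",
        "TEXT": (
            "Past Medical History:\nHTN\n\n"
            "Social History:\nLives with wife. Denies tobacco.\n\n"
            "Family History:\nNon-contributory"
        ),
    },
    {"ROW_ID": "2", "SUBJECT_ID": "11", "CHARTTIME": "", "TEXT": "No sections."},
]
labels = [{"ROW_ID": "1", "tobacco": "0"}]  # "tobacco" is a placeholder column

for row in join_notes_to_labels(notes, labels):
    print(row["ROW_ID"], repr(row["TEXT"]))
```

One limitation of this sketch: a line inside the section that itself looks like a header (for example "Tobacco: denies") would end the match early, so a real extractor would likely need a list of known section headers.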

Usage

import os

from pyhealth.datasets import SBDHDataset

data_dir = "/path/to/data"  # directory containing NOTEEVENTS.csv and MIMIC-SBDH.csv
output_dir = "/path/to/output"  # assumed to exist already

# Initialize the dataset. This may take a while, because social history
# extraction runs when the class is instantiated.
dataset = SBDHDataset(
    root=data_dir,
)

# Display stats; should show 7025 rows
dataset.stats()

# Export the extracted social history to CSV, ready for training a model
# that classifies text into SBDH labels
social_history_path = os.path.join(output_dir, "social_history.csv")
dataset.export_social_history(social_history_path)

@linjc16 linjc16 added the Highlight for TAs to highlight label May 10, 2025
@jhnwu3 jhnwu3 requested a review from Copilot June 14, 2025 19:57

Copilot AI left a comment

Pull Request Overview

This PR introduces a new dataset loader class, SBDHDataset, for processing MIMIC-III discharge summaries by extracting the social history section and joining it with SBDH labels from the MIMIC-SBDH dataset.

  • Introduces the SBDHDataset class with methods for extracting social history and exporting the processed data.
  • Incorporates a new configuration file (mimic3_sbdh.yml) to define file paths, join parameters, and relevant attributes.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| pyhealth/datasets/mimic3_sbdh.py | New dataset loader class with extraction and export functions |
| pyhealth/datasets/configs/mimic3_sbdh.yml | New configuration file defining dataset join parameters and attributes |

@staticmethod
def _extract_social_history(text: str) -> str:
    """
    Extract the social history section from a sinngle value for TEXT

Copilot AI Jun 14, 2025

Typo: 'sinngle' should be corrected to 'single'.

Suggested change
- Extract the social history section from a sinngle value for TEXT
+ Extract the social history section from a single value for TEXT
