This section contains detailed link-check results for the datasets on CMS's Provider Data Catalog (PDC). Each dataset has its own report detailing the status of its links and the accessibility of its data.
Here are the datasets we currently monitor:
- DAC: Doctors and Clinicians
- DF: Dialysis Facilities
- HC: Hospice Care
- HHS: Home Health Services
- HOS: Hospitals
- IRF: Inpatient Rehabilitation Facilities
- LTCH: Long-Term Care Hospitals
- NH: Nursing Homes Including Rehab Services
- PPL: Physician Office Visit Costs
- SUP: Supplier Directory
Each dataset report follows a consistent template to provide a comprehensive overview of the dataset's status and details. Below is a description of the sections included in each dataset markdown file:
- **Dataset Title**
  - Brief description of the dataset and its scope.
  - Dataset ID: Unique identifier for the dataset.
  - Status: Current status of the dataset (e.g., ✅ for accessible, ❌ for issues).
- **Dataset Details**
  - File History: Detailed history of the dataset file, including creation, modification, release, and last checked dates.

    | Activity | Description | Date |
    | --- | --- | --- |
    | Issued Date | When the dataset was created | YYYY-MM-DD |
    | Modified Date | When it was last modified | YYYY-MM-DD |
    | Release Date | When the dataset was made public | YYYY-MM-DD |
    | Last Checked | When this dataset was last tested | YYYY-MM-DD |

  - File Overview: Metrics related to the dataset file, such as filesize, row count, and column count.

    | Metric | Result |
    | --- | --- |
    | Filesize | 0.0 MB |
    | Row Count | 55 |
    | Column Count | 8 |

- **Data Integrity Tests**
  - Summary and results of basic data integrity tests, including column count consistency, header validation, and encoding validation (a code sketch of these checks, along with the file overview metrics, follows this list).

    | Test | Description | Result |
    | --- | --- | --- |
    | Column Count Consistency | Verify that all rows have the same number of columns. | ✅ |
    | Header Validation | Ensure the CSV has a header row and all headers are unique and meaningful. | ✅ |
    | Encoding Validation | Verify that the CSV file uses UTF-8 encoding. | UTF-8 |

- **Public Access Tests**
  - Tests for public accessibility and A11y (Accessibility) compliance for dataset resources.

    | Page | Status | A11y Test |
    | --- | --- | --- |
    | [PDC Page](#) | ✅ | [![W3C Validation](https://img.shields.io/w3c-validation/default?targetUrl=#)](#) |
    | [Landing Page](#) | ✅ | [![W3C Validation](https://img.shields.io/w3c-validation/default?targetUrl=#)](#) |
    | [Direct Download](#) | ✅ | |
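To make the file overview metrics and the data integrity tests concrete, here is a minimal, std-only sketch of how they could be computed. It is illustrative, not the project's actual implementation: it assumes the CSV fits in memory and splits fields on commas naively, whereas the real Rust module (described below) may parse CSVs differently.

```rust
use std::collections::HashSet;
use std::fs;

/// Illustrative summary combining the file overview metrics and the
/// three data integrity checks described above.
struct CsvCheck {
    filesize_mb: f64,
    row_count: usize,
    column_count: usize,
    utf8_valid: bool,
    headers_unique: bool,
    column_counts_consistent: bool,
}

fn check_csv(path: &str) -> std::io::Result<CsvCheck> {
    let bytes = fs::read(path)?;
    let filesize_mb = bytes.len() as f64 / 1_048_576.0;

    // Encoding validation: the file must be valid UTF-8.
    let Ok(text) = String::from_utf8(bytes) else {
        return Ok(CsvCheck {
            filesize_mb,
            row_count: 0,
            column_count: 0,
            utf8_valid: false,
            headers_unique: false,
            column_counts_consistent: false,
        });
    };

    // Header validation: a header row exists and all names are non-empty and unique.
    // Naive comma splitting; quoted fields would need a real CSV parser.
    let headers: Vec<&str> = text
        .lines()
        .next()
        .unwrap_or("")
        .split(',')
        .map(str::trim)
        .collect();
    let headers_unique = headers.iter().all(|h| !h.is_empty())
        && headers.iter().collect::<HashSet<_>>().len() == headers.len();

    // Column count consistency: every data row has as many fields as the header.
    let column_count = headers.len();
    let column_counts_consistent = text
        .lines()
        .skip(1)
        .all(|line| line.split(',').count() == column_count);

    Ok(CsvCheck {
        filesize_mb,
        row_count: text.lines().skip(1).count(),
        column_count,
        utf8_valid: true,
        headers_unique,
        column_counts_consistent,
    })
}
```

A quoted field containing a comma would defeat the naive splitting above, which is why a dedicated CSV parser is the safer choice in practice.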
Our GitHub Actions workflow automatically runs these link checks every three hours and sends notifications if any issues are detected. You can view the latest workflow run results by clicking the badge above.
If you notice any issues or have suggestions for additional datasets to monitor, please open an issue or submit a pull request. We appreciate your contributions!
The dataset reports are generated using a Rust module that performs the following tasks:
- **Fetching Datasets**
  - The module fetches a list of datasets from the PDC API.
  - Datasets are deserialized into a `Dataset` struct (see the sketch below).
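As a rough illustration of this step, the sketch below assumes the serde crate for deserialization and reqwest (with its blocking and json features) for the HTTP call; the field names are hypothetical, and the real `Dataset` struct mirrors whatever metadata the PDC API returns.

```rust
use serde::Deserialize;

// Hypothetical fields for illustration only; the actual `Dataset` struct is
// defined in the source code and matches the PDC API's metadata schema.
#[derive(Debug, Deserialize)]
struct Dataset {
    identifier: String,
    title: String,
    description: String,
    modified: String,
}

fn fetch_datasets(api_url: &str) -> Result<Vec<Dataset>, reqwest::Error> {
    // Fetch the catalog listing and deserialize the JSON array into `Dataset` values.
    reqwest::blocking::get(api_url)?.json()
}
```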
- **Processing Datasets**
  - Each dataset is processed in parallel to improve efficiency.
  - The module checks the status of the dataset's download URL and landing page (see the sketch below).
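One way to express this step, assuming the rayon crate for parallelism and blocking reqwest calls for the status checks; the `check_urls` helper below is hypothetical and stands in for the module's real logic.

```rust
use rayon::prelude::*;

/// Return true if a URL responds with a 2xx status.
/// Sketch only: the real module may issue HEAD requests, retry, or record the exact status code.
fn url_is_ok(url: &str) -> bool {
    reqwest::blocking::get(url)
        .map(|resp| resp.status().is_success())
        .unwrap_or(false)
}

/// Check (label, url) pairs in parallel on rayon's work-stealing thread pool.
fn check_urls(targets: &[(String, String)]) -> Vec<(String, bool)> {
    targets
        .par_iter()
        .map(|(label, url)| (label.clone(), url_is_ok(url)))
        .collect()
}
```

Each dataset contributes its download URL and landing page as separate entries, so both links are checked independently.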
- **Generating Reports**
  - The module constructs a markdown report for each dataset, including:
    - Dataset details (e.g., ID, title, description, issued date, modified date, release date).
    - File history and overview (e.g., filesize, row count, column count).
    - Data integrity tests (e.g., column count consistency, header validation, encoding validation).
    - Public access tests (e.g., status of PDC page, landing page, and direct download link).
  - Reports are saved to the `datasets` directory (see the sketch below).
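A minimal sketch of the report-writing step, using only the standard library; the hypothetical `write_report` function below emits just the title, description, ID, and status, while the real reports also contain the tables described earlier.

```rust
use std::fs;
use std::io;

/// Write a minimal markdown report into the `datasets` directory.
fn write_report(id: &str, title: &str, description: &str) -> io::Result<()> {
    let report = format!(
        "# {title}\n\n{description}\n\n- **Dataset ID**: {id}\n- **Status**: ✅\n"
    );
    fs::create_dir_all("datasets")?;
    fs::write(format!("datasets/{id}.md"), report)
}
```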
- **Error Handling and Logging**
  - The module uses Sentry for error tracking and performance monitoring.
  - Detailed logging is performed using the `tracing` crate (see the sketch below).
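A sketch of how that setup might look, assuming the sentry, tracing, and tracing-subscriber crates; the DSN is a placeholder and the real configuration lives in the source code.

```rust
fn init_observability() -> sentry::ClientInitGuard {
    // Placeholder DSN; the real value comes from configuration or the environment.
    let guard = sentry::init((
        "https://examplePublicKey@o0.ingest.sentry.io/0",
        sentry::ClientOptions {
            // Enables Sentry performance monitoring alongside error tracking.
            traces_sample_rate: 1.0,
            ..Default::default()
        },
    ));

    // Structured logging via the `tracing` ecosystem.
    tracing_subscriber::fmt::init();
    tracing::info!("link checker started");

    // Keep the guard alive for the program's lifetime so events are flushed on exit.
    guard
}
```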
This process ensures that every dataset listed on the Provider Data Catalog is regularly tested for accessibility and data integrity, with results documented in a consistent and transparent manner.
For more details on the implementation, refer to the source code.
This README was generated with AI because I'm tired and don't want to do the documenting part. Bite me