# Dataset Link Check Results 📊

GitHub Repo stars GitHub forks

GitHub License Last Commit Link Checker

This directory contains detailed link check results for datasets on CMS's Provider Data Catalog (PDC). Each dataset has its own report detailing link status and data accessibility.

## Available Datasets

Here are the datasets we currently monitor:

- DAC: Doctors and Clinicians
- DF: Dialysis Facilities
- HC: Hospice Care
- HHS: Home Health Services
- HOS: Hospitals
- IRF: Inpatient Rehabilitation Facilities
- LTCH: Long-Term Care Hospitals
- NH: Nursing Homes Including Rehab Services
- PPL: Physician Office Visit Costs
- SUP: Supplier Directory

## Template for Dataset Reports

Each dataset report follows a consistent template to provide a comprehensive overview of the dataset's status and details. Below is a description of the sections included in each dataset markdown file:

### Dataset Report Structure

1. **Dataset Title**

   - Brief description of the dataset and its scope.
   - **Dataset ID:** unique identifier for the dataset.
   - **Status:** current status of the dataset (e.g., ✅ for accessible, ❌ for issues).

2. **Dataset Details**

   - **File History:** detailed history of the dataset file, including creation, modification, release, and last checked dates.

     | Activity | Description | Date |
     | --- | --- | --- |
     | Issued Date | When the dataset was created | YYYY-MM-DD |
     | Modified Date | When it was last modified | YYYY-MM-DD |
     | Release Date | When the dataset was made public | YYYY-MM-DD |
     | Last Checked | When this dataset was last tested | YYYY-MM-DD |

   - **File Overview:** metrics for the dataset file, such as filesize, row count, and column count.

     | Metric | Result |
     | --- | --- |
     | Filesize | 0.0 MB |
     | Row Count | 55 |
     | Column Count | 8 |

3. **Data Integrity Tests**

   - Summary and results of basic data integrity tests, including column count consistency, header validation, and encoding validation.

     | Test | Description | Result |
     | --- | --- | --- |
     | Column Count Consistency | Verify that all rows have the same number of columns. | |
     | Header Validation | Ensure the CSV has a header row and all headers are unique and meaningful. | |
     | Encoding Validation | Verify that the CSV file uses UTF-8 encoding. | UTF-8 |

4. **Public Access Tests**

   - Status of the PDC page, landing page, and direct download link.
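The three data integrity tests can be condensed into a small, standard-library-only sketch. This is illustrative, not the module's actual implementation; a production check would use a real CSV parser that handles quoted fields, which this naive comma split does not.

```rust
use std::collections::HashSet;

/// Returns (encoding_ok, headers_ok, columns_ok) for a CSV byte buffer.
fn check_integrity(csv_bytes: &[u8]) -> (bool, bool, bool) {
    // Encoding validation: the file must be valid UTF-8.
    let text = match std::str::from_utf8(csv_bytes) {
        Ok(t) => t,
        Err(_) => return (false, false, false),
    };
    let mut lines = text.lines();
    // Header validation: a header row must exist, with unique, non-empty names.
    let headers: Vec<&str> = match lines.next() {
        Some(h) => h.split(',').collect(),
        None => return (true, false, false),
    };
    let mut seen = HashSet::new();
    let headers_ok = headers
        .iter()
        .all(|h| !h.trim().is_empty() && seen.insert(h.trim()));
    // Column count consistency: every data row has as many fields as the header.
    let columns_ok = lines.all(|row| row.split(',').count() == headers.len());
    (true, headers_ok, columns_ok)
}

fn main() {
    let (encoding_ok, headers_ok, columns_ok) = check_integrity(b"id,name\n1,Alice\n2,Bob");
    println!("encoding: {encoding_ok}, headers: {headers_ok}, columns: {columns_ok}");
}
```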

## Automated Checks

Our GitHub Actions workflow automatically runs these link checks every three hours and sends notifications if any issues are detected. You can view the latest workflow run results by clicking the badge above.

## Contributing

If you notice any issues or have suggestions for additional datasets to monitor, please open an issue or submit a pull request. We appreciate your contributions!

## How the Dataset Reports are Generated

The dataset reports are generated using a Rust module that performs the following tasks:

1. **Fetching Datasets**

   - The module fetches the list of datasets from the PDC API.
   - Datasets are deserialized into a `Dataset` struct.

2. **Processing Datasets**

   - Datasets are processed in parallel to improve efficiency.
   - The module checks the status of each dataset's download URL and landing page.

3. **Generating Reports**

   - The module constructs a markdown report for each dataset, including:
     - Dataset details (e.g., ID, title, description, issued date, modified date, release date).
     - File history and overview (e.g., filesize, row count, column count).
     - Data integrity tests (e.g., column count consistency, header validation, encoding validation).
     - Public access tests (e.g., status of the PDC page, landing page, and direct download link).
   - Reports are saved to the `datasets` directory.

4. **Error Handling and Logging**

   - The module uses Sentry for error tracking and performance monitoring.
   - Detailed logging is performed using the `tracing` crate.
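The fetch → process → report pipeline above can be sketched in a few lines of dependency-free Rust. Everything here is an assumption for illustration: the `Dataset` fields, the report layout, and the offline `download_ok` flag stand in for the real deserialized API type and the real HTTP status checks.

```rust
use std::thread;

// Hypothetical, trimmed-down stand-in for the module's `Dataset` struct;
// the real type is deserialized from the PDC API response.
#[derive(Debug, Clone)]
struct Dataset {
    id: String,
    title: String,
    landing_page: String,
}

// Build the markdown report for one dataset. The real module determines
// `download_ok` by issuing an HTTP request to the download URL; the flag
// is passed in here so the sketch stays offline.
fn build_report(ds: &Dataset, download_ok: bool) -> String {
    let status = if download_ok { "✅" } else { "❌" };
    format!(
        "# {} {status}\n\n- **Dataset ID:** `{}`\n- **Landing page:** {}\n\n\
         | Test | Result |\n| --- | --- |\n| Direct download link | {status} |\n",
        ds.title, ds.id, ds.landing_page
    )
}

fn main() {
    // Illustrative input; real datasets come from the PDC API.
    let datasets = vec![Dataset {
        id: "example-id".into(),
        title: "Doctors and Clinicians".into(),
        landing_page: "https://data.cms.gov/provider-data/".into(),
    }];

    // Process datasets in parallel, one thread each (the real module may
    // use async tasks instead; threads keep this sketch dependency-free).
    let handles: Vec<_> = datasets
        .into_iter()
        .map(|ds| thread::spawn(move || build_report(&ds, true)))
        .collect();

    for h in handles {
        print!("{}", h.join().unwrap());
    }
}
```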

This process ensures that every dataset listed on the Provider Data Catalog is regularly tested for accessibility and data integrity, with results documented in a consistent, transparent manner.

For more details on the implementation, refer to the source code.


This README was generated with AI because I'm tired and don't want to do the documenting part. Bite me