Programmatic Access to CRS PDF/HTML Documents Referenced by API #413

@AndrVelich

Dear Congress.gov API Team,

We are currently using the Congress.gov API to ingest structured metadata for CRS (Congressional Research Service) reports. The API endpoints for structured data function as expected and meet our requirements.

However, we are encountering issues when attempting to programmatically retrieve the associated PDF and HTML documents referenced in the API responses. The URLs provided point to the congress.gov domain, which appears to be protected by Cloudflare. Automated requests to download these files are blocked or challenged, preventing reliable programmatic access.
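To make the failure mode concrete, the following is a minimal Python sketch of how a client might tell a genuine document apart from a blocked or challenged response. It is illustrative only: `classify_response` is a hypothetical helper, and the heuristics (a `cf-mitigated: challenge` header, or a 403/429/503 status served by Cloudflare) are assumptions about how the blocking presents itself, not behavior confirmed for congress.gov. The only format check relied on is that PDF files begin with the bytes `%PDF-`.

```python
from typing import Mapping


def classify_response(status: int, headers: Mapping[str, str], body: bytes) -> str:
    """Classify a download response as 'pdf', 'html', 'challenged', or 'error'.

    Hypothetical helper: the challenge heuristics below are assumptions,
    not documented congress.gov behavior.
    """
    # Header names are case-insensitive, so normalize before lookup.
    h = {k.lower(): v.lower() for k, v in headers.items()}

    # Assumption: a Cloudflare challenge is flagged via this header.
    if h.get("cf-mitigated") == "challenge":
        return "challenged"

    # Assumption: a block without that header still shows up as a
    # 403/429/503 served by Cloudflare.
    if status in (403, 429, 503) and "cloudflare" in h.get("server", ""):
        return "challenged"

    if status != 200:
        return "error"

    # PDF files start with the "%PDF-" signature.
    if body.startswith(b"%PDF-"):
        return "pdf"

    if "text/html" in h.get("content-type", ""):
        return "html"

    return "error"
```

With any HTTP client, the status code, the response headers, and the first few bytes of the body would be passed in; a `challenged` result for a URL taken from an API response corresponds to the behavior described above.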

We would appreciate clarification on the following:

1. Is there an officially supported mechanism for programmatic retrieval of CRS PDF and HTML documents?
2. Are alternative endpoints, bulk download options, or authenticated access methods available for accessing these materials?
3. If access to congress.gov-hosted documents is intentionally restricted, is there a recommended approach for compliant automated retrieval?

Our use case involves systematic ingestion of publicly available CRS materials for research and indexing purposes. We are prepared to adhere to any applicable rate limits, authentication requirements, or usage policies.

We would appreciate your guidance on how best to proceed.
