Programmatic Access to CRS PDF/HTML Documents Referenced by API

Dear Congress.gov API Team,

We are currently using the Congress.gov API to ingest structured metadata for CRS (Congressional Research Service) reports. The API endpoints for structured data function as expected and meet our requirements.

However, we are encountering issues when attempting to programmatically retrieve the associated PDF and HTML documents referenced in the API responses. The URLs provided point to the congress.gov domain, which appears to be protected by Cloudflare. Automated requests to download these files are blocked or challenged, preventing reliable programmatic access.
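
For illustration, a blocked or challenged response can be recognized roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes the block shows up as an HTTP 403 or 503 from a server identifying itself as `cloudflare`, or as a `cf-mitigated: challenge` response header. The function names and the User-Agent string are hypothetical.

```python
import urllib.error
import urllib.request


def looks_like_challenge(status, headers):
    """Heuristic: True if a response looks like a Cloudflare block or challenge.

    Assumption: a challenge is signalled either by a `cf-mitigated: challenge`
    header, or by a 403/503 status from a server named "cloudflare".
    """
    lowered = {key.lower(): value.lower() for key, value in headers.items()}
    if lowered.get("cf-mitigated") == "challenge":
        return True
    return status in (403, 503) and "cloudflare" in lowered.get("server", "")


def fetch_document(url, timeout=30):
    """Try to download one document.

    Returns (status, is_challenge, body_bytes). The User-Agent value is a
    hypothetical placeholder, not a required or recommended one.
    """
    request = urllib.request.Request(
        url, headers={"User-Agent": "crs-ingest-example/0.1"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            headers = dict(response.headers.items())
            return (
                response.status,
                looks_like_challenge(response.status, headers),
                response.read(),
            )
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the error object still carries headers and body.
        headers = dict(err.headers.items())
        return err.code, looks_like_challenge(err.code, headers), err.read()
```

Calling `fetch_document` with a PDF or HTML URL taken from an API response and checking the second element of the result is enough to tell a challenged request from a successful download.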

We would appreciate clarification on the following:

1. Is there an officially supported mechanism for programmatic retrieval of CRS PDF and HTML documents?
2. Are alternative endpoints, bulk download options, or authenticated access methods available for accessing these materials?
3. If access to congress.gov-hosted documents is intentionally restricted, is there a recommended approach for compliant automated retrieval?

Our use case involves systematic ingestion of publicly available CRS materials for research and indexing purposes. We are prepared to adhere to any applicable rate limits, authentication requirements, or usage policies.
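
As an illustration of what adhering to a rate limit could look like on the client side, the sketch below enforces a minimum delay between consecutive requests. It is a minimal example under stated assumptions: the `Throttle` class is hypothetical, and the interval is a placeholder to be replaced by whatever limit is actually specified.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait().

    `clock` and `sleep` are injectable so the behavior can be tested
    without real delays; both default to the standard library functions.
    """

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        """Block until at least min_interval_s has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A downloader would create one instance, for example `Throttle(2.0)` for a hypothetical two-second spacing, and call `wait()` before each request.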

Thank you in advance for any guidance on how best to proceed.

Programmatic Access to CRS PDF/HTML Documents Referenced by API #413
