enhancement: select crawl by exact timestamp #66

@laurieburchell

Querying the index brings back a status, timestamp, url triple, e.g.:

$ cdxt --cc --crawl CC-MAIN-2025-43 iter 'commoncrawl.org/get-started'  

status 200, timestamp 20251014220259, url https://www.commoncrawl.org/get-started
status 200, timestamp 20251016192109, url https://commoncrawl.org/get-started

It would be good to have a direct way to retrieve a particular record by its timestamp alone. I'm aware you can do something like

$ cdxt --cc --crawl CC-MAIN-2025-43 --from 20251016192109 --limit 1 warc 'commoncrawl.org/get-started'

but a direct --timestamp flag or similar would be useful, given that the index output already shows each record's timestamp.
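To make the request concrete, here is a minimal sketch of the exact-match behaviour a timestamp flag might provide. It assumes index records are dict-like with a "timestamp" key, as the iter output suggests. The helper name select_by_timestamp is hypothetical and not part of cdx_toolkit. Unlike a lower bound combined with a limit of 1, which presumably returns the first record at or after the given time, this returns nothing when no record has exactly that timestamp.

```python
from typing import Iterable, Mapping, Optional


def select_by_timestamp(
    records: Iterable[Mapping[str, str]], timestamp: str
) -> Optional[Mapping[str, str]]:
    """Return the first record whose 'timestamp' equals `timestamp`, else None.

    Hypothetical helper for illustration only; not a cdx_toolkit function.
    """
    for record in records:
        if record["timestamp"] == timestamp:
            return record
    return None


# The two index records from the example output, written out as plain dicts.
records = [
    {"status": "200", "timestamp": "20251014220259",
     "url": "https://www.commoncrawl.org/get-started"},
    {"status": "200", "timestamp": "20251016192109",
     "url": "https://commoncrawl.org/get-started"},
]

match = select_by_timestamp(records, "20251016192109")
print(match["url"] if match else "no record with that timestamp")
```

Returning None on a miss, rather than the nearest later capture, is the design choice that distinguishes this from the existing workaround.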
