Description
EDGAR publishes a "full" index, that is, an index of all filings in the current quarter, which `pandas-datareader` now supports. Once a quarter ends, its indices are moved to an archive folder and presented as daily indices. So this enhancement will support pulling together a historical index for those of us who would like to work with historical filings; I'm currently working on building it.
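For a sense of what that involves, here is a minimal sketch of pulling a single quarter of the full index by hand. The URL layout and the pipe-delimited `master.idx` format come from EDGAR's public archive; the example quarter, the User-Agent string, and the preamble length are assumptions for illustration, not anything this enhancement would be committed to.

```python
import io
import urllib.request

import pandas as pd

# Hypothetical example quarter; a historical index would loop over every
# year/quarter in the requested date range.
URL = "https://www.sec.gov/Archives/edgar/full-index/2015/QTR1/master.idx"

# SEC asks automated clients to identify themselves via the User-Agent header.
req = urllib.request.Request(URL, headers={"User-Agent": "example-user example@example.com"})
raw = urllib.request.urlopen(req).read().decode("latin-1")

# master.idx is pipe-delimited after a short preamble; skiprows=11 matches the
# preamble length in the files I have looked at, but treat it as an assumption.
index_df = pd.read_csv(
    io.StringIO(raw),
    sep="|",
    skiprows=11,
    names=["cik", "company_name", "form_type", "date_filed", "filename"],
)
print(index_df.head())
```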
At the moment, I'm not planning on building in document retrieval directly, but I do want to make that reasonably easy. In my mind, a workflow would look like this:
- Use `pandas-datareader` to pull the EDGAR index for the time period of interest. (This is where `pandas-datareader`'s role ends.)
- Use a list of CIK identifiers, particular filing types, or another dataset (via merge) to filter down the list of documents that you'd like to retrieve (see the sketch after this list).
- Use `wget`, `curl`, or whatever to pull the documents.
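To make the filtering step concrete, here is a rough sketch that continues from the `index_df` above. The column names (`cik`, `form_type`) and the example CIK/form lists are assumptions for illustration, not a settled API:

```python
# Hypothetical CIKs and filing types to keep; in practice these might come
# from another dataset merged in on CIK.
ciks_of_interest = [320193, 789019]
forms_of_interest = ["10-K", "10-Q"]

filtered = index_df[
    index_df["cik"].isin(ciks_of_interest)
    & index_df["form_type"].isin(forms_of_interest)
]

# Equivalently, filter by merging against another DataFrame keyed on CIK:
# filtered = index_df.merge(other_df, on="cik", how="inner")
```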
When I say "make it reasonably easy," I mean I'm doing things like making the filename paths in the returned index consistent, so that you can simply concatenate the server name and the path to get a full link. In older data, the directory paths are missing a directory (presumably because they hadn't named it "EDGAR" yet).
My thinking on keeping document retrieval out is that these should be one-time pulls that we shouldn't cache, and a dedicated tool for pulling (potentially hundreds of thousands of) documents should perform far better than our readily available options. Still, it should be trivial to create a download list from the index we return.
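Creating that download list might look something like the following, assuming the index carries a `filename` column holding server-relative paths (with the normalization mentioned above already applied) and that those paths hang off `https://www.sec.gov/Archives/`:

```python
# Concatenate the server name and the (now consistent) path to get full links,
# then write a plain-text download list that wget or curl can consume.
SERVER = "https://www.sec.gov/Archives/"  # assumed base for the index's relative paths

urls = SERVER + filtered["filename"].str.lstrip("/")
urls.to_csv("edgar_downloads.txt", index=False, header=False)

# Outside Python, something like:
#   wget --input-file=edgar_downloads.txt --wait=1
```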
Any thoughts?