ENH: Add support for EDGAR daily indices. #147

Closed
@jtkiley


EDGAR publishes a "full" index, which covers all filings in the current quarter and which pandas-datareader now supports. Once the quarter ends, its indices are moved to a folder and presented as daily indices. This enhancement would add support for those daily indices, so that those of us who work with historical filings can assemble a historical index. I'm currently working on building this.

At the moment, I'm not planning on building in document retrieval directly, but I do want to make that reasonably easy. In my mind, a workflow would look like this:

  1. Use pandas-datareader to pull the EDGAR index for the time period of interest. (This is where pandas-datareader's role ends.)
  2. Use a list of CIK identifiers, particular filing types, or another dataset (via merge) to filter down the list of documents that you'd like to retrieve.
  3. Use wget, curl, or whatever to pull the documents.
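
Step 2 of the workflow above could look roughly like the sketch below. It does not call pandas-datareader; it assumes only that the returned index is a pandas DataFrame. The column names (`cik`, `form_type`, `filename`), the helper `filter_index`, and all of the sample rows are made up for illustration and may not match what the reader actually returns.

```python
import pandas as pd

# Hypothetical index layout with made-up rows; real column names may differ.
index = pd.DataFrame({
    "cik": [1001, 1002, 1001, 1003],
    "form_type": ["10-K", "10-K", "8-K", "10-Q"],
    "filename": [
        "edgar/data/1001/0000000000-00-000001.txt",
        "edgar/data/1002/0000000000-00-000002.txt",
        "edgar/data/1001/0000000000-00-000003.txt",
        "edgar/data/1003/0000000000-00-000004.txt",
    ],
})


def filter_index(index, ciks=None, form_types=None):
    """Keep rows matching the given CIKs and/or form types (hypothetical helper)."""
    mask = pd.Series(True, index=index.index)
    if ciks is not None:
        mask &= index["cik"].isin(ciks)
    if form_types is not None:
        mask &= index["form_type"].isin(form_types)
    return index[mask]


# Filter by a list of CIK identifiers and particular filing types.
wanted = filter_index(index, ciks=[1001, 1002], form_types=["10-K"])

# Or filter by merging with another dataset keyed on CIK.
firms = pd.DataFrame({"cik": [1001], "name": ["Example Corp"]})
merged = index.merge(firms, on="cik", how="inner")
```

An inner merge keeps only the index rows whose CIK appears in the other dataset, which is why it doubles as a filter here.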

When I say "make it easy," I mean things like making the filename paths consistent in the returned index, so that you can just concatenate the server name and the path to get a full link. In older data, the directory paths are missing a directory (presumably because they hadn't named it "EDGAR" yet).
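
As a rough illustration of that path consistency, here is a minimal sketch. It assumes the missing directory is a leading `edgar/` component and that the server prefix is `https://www.sec.gov/Archives/`. Both are assumptions made for the example, not something this issue specifies, and `full_url` is a hypothetical helper name.

```python
def full_url(path, server="https://www.sec.gov/Archives/", root="edgar/"):
    """Return a full link for an index path (hypothetical helper).

    Assumption: older index rows omit the leading `root` directory,
    so it is added when absent before joining with the server name.
    """
    path = path.lstrip("/")
    if not path.startswith(root):
        path = root + path
    return server.rstrip("/") + "/" + path


# Both an older-style and a newer-style path resolve to the same link.
old_style = full_url("data/1001/0000000000-00-000001.txt")
new_style = full_url("edgar/data/1001/0000000000-00-000001.txt")
```

If the index already stores normalized paths, the user-side step reduces to plain string concatenation of server and path.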

My thinking on keeping document retrieval out is that these should be one-time pulls that we shouldn't cache, and a dedicated tool for pulling (potentially hundreds of thousands of) documents should perform far better than our readily available options. Still, it should be trivial to create a download list from the index we return.
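
Creating such a download list might be as simple as the sketch below, which writes one URL per line to a text file that a dedicated downloader can read. The helper name `write_download_list`, the file name `urls.txt`, and the sample URLs are all hypothetical.

```python
import os
import tempfile


def write_download_list(urls, dest):
    """Write unique URLs to `dest`, one per line; return how many were written."""
    seen = set()
    count = 0
    with open(dest, "w", encoding="utf-8") as fh:
        for url in urls:
            if url in seen:
                continue  # skip duplicates so each document is fetched once
            seen.add(url)
            fh.write(url + "\n")
            count += 1
    return count


# Made-up URLs standing in for links built from the filtered index.
urls = [
    "https://www.sec.gov/Archives/edgar/data/1001/0000000000-00-000001.txt",
    "https://www.sec.gov/Archives/edgar/data/1002/0000000000-00-000002.txt",
    "https://www.sec.gov/Archives/edgar/data/1001/0000000000-00-000001.txt",
]

list_path = os.path.join(tempfile.mkdtemp(), "urls.txt")
written = write_download_list(urls, list_path)
```

The resulting file can then be handed to an external tool, for example `wget -i urls.txt`, which keeps the actual retrieval outside pandas-datareader.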

Any thoughts?
