sp500-scraper

Constituent history of the S&P 500 from various data sources.

Usage

Each data source is queried once a day, during weekdays. The data is saved in two formats - csv and parquet. Generally, each run is saved with the file name convention YYYYMMDD. I prefer to interact with this data using arrow.

First clone the repo:

git clone https://github.com/riazarbi/sp500-scraper

Then, in R (adapt to your language)

library(arrow)
open_dataset("sp500-scraper/wikipedia/sp500/parquet, unify_schemas = TRUE")

Data Source Notes

How far back this data goes depends on how far back I could parse data from websites.

iShares

STATUS: working
FIRST DATE: 2006-10-31

The iShares symbols differ sometimes from SEC tickers (see, for example, Visa Corp in each dataset). This source does include ISIN though.

Wikipedia

STATUS: working
FIRST DATE: 2007-03-07

Seems to conform to SEC tickers. The data structure has evolved over time; I've kept all the columns but tried to rename to line them up as best as possible. CIK was added on 2014-05-12, which makes symbol joining much easier.

Pre 2022-11-01 was collected by traversing Wikipedia commit history. We pulled all the revisions, omitted any changes that changed a large percentage of symbols, or were themselves overwritten within 12 hours. We hope this will eliminate spurious changes.

Post 2022-11-01 parsed html tables, pre 2022-11-01 parsed wikitext.

Tidyquant

STATUS: working
FIRST DATE: 2022-11-08

We simply run tidyquant::tq_index("SP500") and save the result to file. It contains the SEDOL and CUSIP numbers.

Useful links

SEC ticker <-> CIK mappings (clean, no completeness guarantee): https://www.sec.gov/files/company_tickers.json
All CIK matched to entity name (messy, complete): https://www.sec.gov/Archives/edgar/cik-lookup-data.txt

Name		Name	Last commit message	Last commit date
Latest commit History 2,190 Commits
.github/workflows		.github/workflows
ishares/sp500		ishares/sp500
tidyquant/sp500		tidyquant/sp500
wikipedia/sp500		wikipedia/sp500
DESCRIPTION		DESCRIPTION
README.md		README.md
ishares.R		ishares.R
tidyquant.R		tidyquant.R
wikipedia.R		wikipedia.R

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

sp500-scraper

Usage

Data Source Notes

iShares

Wikipedia

Tidyquant

Useful links

About

Releases

Packages

Languages

riazarbi/sp500-scraper

Folders and files

Latest commit

History

Repository files navigation

sp500-scraper

Usage

Data Source Notes

iShares

Wikipedia

Tidyquant

Useful links

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages