Django app to consume and store 990 data and metadata. Depends on [IRSx](https://github.com/jsfenfen/990-xml-reader).

2. Run `$ python manage.py enter_yearly_submissions <YYYY>` where YYYY is the year corresponding to a yearly index file that has already been downloaded. (If it hasn't been downloaded, you can retrieve it with `irsx_index --year=YYYY`.) This script checks whether the IRS' index file is any bigger than the one on disk, and only runs if it is. You can force it to try to enter any new filings (regardless of whether the file has been updated) with the `--enter` option, as in the example below.
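For example, to fetch the 2017 index with IRSx and then load its filings (a sketch; it assumes IRSx is installed and `irsx_index` is on your path):

    $ irsx_index --year=2017
    $ python manage.py enter_yearly_submissions 2017
    $ python manage.py enter_yearly_submissions 2017 --enter  # force entry even if the index file hasn't grown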

__There's a problem with the 2014 index file.__ An internal comma has "broken" the .csv format for some time. You can fix it with a perl one liner:
__There's a problem with the 2014 index file.__ An internal comma has "broken" the .csv format for some time. You can fix it with a perl one-liner (which first backs the file up to index_2014.csv.bak before modifying it):

$ perl -i.bak -p -e 's/SILVERCREST ASSET ,AMAGEMENT/SILVERCREST ASSET MANAGEMENT/g' index_2014.csv

We can see that it worked by diffing the fixed file against the backup; the only change is the broken company name:

    $ diff index_2014.csv index_2014.csv.bak
    < 11146506,EFILE,136171217,201212,1/14/2014,MOSTYN FOUNDATION INC CO SILVERCREST ASSET MANAGEMENT,990PF,93491211007003,201302119349100700
    ---
    > 11146506,EFILE,136171217,201212,1/14/2014,MOSTYN FOUNDATION INC CO SILVERCREST ASSET ,AMAGEMENT,990PF,93491211007003,201302119349100700

For more details see [here](https://github.com/jsfenfen/990-xml-reader/blob/master/2014_is_broken.md).

#### Generate the schema files - Not required

With most hosting providers, you'll need to configure additional storage to support the full dataset.

You may want to look into tuning your database parameters to better support data loading. You'll also get better performance if you create indexes only after loading is complete (and drop them before bulk loads take place), along the lines of the sketch below.
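A minimal sketch of that drop-then-recreate pattern, assuming postgres; the database, index, table, and column names here (`irs990`, `filing_ein_idx`, `filing`, `ein`) are hypothetical placeholders, not this app's actual schema:

    $ psql irs990 -c "DROP INDEX IF EXISTS filing_ein_idx;"         # drop before the bulk load
    $ python manage.py enter_yearly_submissions 2017                 # run the bulk load
    $ psql irs990 -c "CREATE INDEX filing_ein_idx ON filing (ein);"  # recreate after loading completes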

One random datapoint: on an Amazon t2.medium ec2 server (~$38/month) with 150 gigs of additional storage and postgres running on the default configs and writing to an SSD EBS volume, load time for the complete set of about 490,000 filings from 2017 took about 3 hours.
