Django app to consume and store 990 data and metadata. Depends on [IRSx](https://github.com/jsfenfen/990-xml-reader).

2. Run `$ python manage.py enter_yearly_submissions <YYYY>` where YYYY is the year corresponding to a yearly index file that has already been downloaded. (If it hasn't been downloaded, you can retrieve it with `irsx_index --year=YYYY`.) This script checks whether the IRS' index file is any bigger than the one on disk, and only runs if it is. You can force it to try to enter any new filings (regardless of whether the file has been updated) with the `--enter` option, as in the example below.
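For example, to fetch the 2017 index with IRSx and then load its filings (a sketch; it assumes IRSx is installed and `irsx_index` is on your path):

    $ irsx_index --year=2017
    $ python manage.py enter_yearly_submissions 2017
    $ python manage.py enter_yearly_submissions 2017 --enter  # force entry even if the index file hasn't grown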

__There's a problem with the 2014 index file.__ An internal comma has "broken" the .csv format for some time. You can fix it with a perl one liner:
__There's a problem with the 2014 index file.__ An internal comma has "broken" the .csv format for some time. You can fix it with a perl one-liner (which first backs the file up to index_2014.csv.bak before modifying it):

$ perl -i.bak -p -e 's/SILVERCREST ASSET ,AMAGEMENT/SILVERCREST ASSET MANAGEMENT/g' index_2014.csv

We can see that it worked by diffing the fixed file against the backup; the only change is the broken company name:

    $ diff index_2014.csv index_2014.csv.bak
    < 11146506,EFILE,136171217,201212,1/14/2014,MOSTYN FOUNDATION INC CO SILVERCREST ASSET MANAGEMENT,990PF,93491211007003,201302119349100700
    ---
    > 11146506,EFILE,136171217,201212,1/14/2014,MOSTYN FOUNDATION INC CO SILVERCREST ASSET ,AMAGEMENT,990PF,93491211007003,201302119349100700

For more details see [here](https://github.com/jsfenfen/990-xml-reader/blob/master/2014_is_broken.md).

#### Generate the schema files - Not required

With most hosting providers, you'll need to configure additional storage to support the full dataset.

You may want to look into tuning your database parameters to better support data loading. You'll also get better performance if you create indexes only after loading is complete (and drop them before bulk loads take place), along the lines of the sketch below.
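A minimal sketch of that drop-then-recreate pattern, assuming postgres; the database, index, table, and column names here (`irs990`, `filing_ein_idx`, `filing`, `ein`) are hypothetical placeholders, not this app's actual schema:

    $ psql irs990 -c "DROP INDEX IF EXISTS filing_ein_idx;"         # drop before the bulk load
    $ python manage.py enter_yearly_submissions 2017                 # run the bulk load
    $ psql irs990 -c "CREATE INDEX filing_ein_idx ON filing (ein);"  # recreate after loading completes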

One random datapoint: on an Amazon t2.medium ec2 server (~$38/month) with 150 gigs of additional storage and postgres running on the default configs and writing to an SSD EBS volume, load time for the complete set of about 490,000 filings from 2017 took about 3 hours.
