This repository has been archived by the owner on Jan 18, 2024. It is now read-only.

Optimize how files are stored or processed #33

Open
schnuerle opened this issue Apr 23, 2018 · 1 comment
Labels: Help Wanted (Good items to start with if you are looking to help with the project), Phase 1 RDS (End to end data processor with hooks and alarms)

Comments

@schnuerle (Contributor) commented Apr 23, 2018

Once the RDS is complete in Phase 1, we'd like to optimize data storage and processing where possible. If you have ideas on how to do this, open a pull request adding support for them, with examples, to this repo so everyone can benefit.

One idea is to compress each JSON file (for example, into a zip archive) to reduce storage costs.

Use this issue as a place to discuss and collaborate.

schnuerle added the Help Wanted and Phase 1 RDS labels on Apr 23, 2018
@Riebart commented Jun 20, 2018

Having done this before, I can say LZ4 is pure magic: it compresses fast and decompresses really fast. By storing LZ4-compressed CSVs in S3, we lifted our batch-processing throughput out of S3 by almost an order of magnitude.
