This repository has been archived by the owner on Jan 18, 2024. It is now read-only.

Optimize how files are stored or processed #33

Open
schnuerle opened this issue Apr 23, 2018 · 1 comment
Labels: Help Wanted (Good items to start with if you are looking to help with the project), Phase 1 RDS (End to end data processor with hooks and alarms)

Comments

@schnuerle (Contributor) commented Apr 23, 2018

Once the RDS is complete in Phase 1, we'd like to optimize data storage and processing where possible. If you have ideas on how to do this, open a pull request adding support for them, with examples, to this repo so everyone can benefit.

One idea is to compress each JSON file (for example, into a zip archive) to reduce storage costs.

Use this issue as a place to discuss and collaborate.

schnuerle added the Help Wanted and Phase 1 RDS labels on Apr 23, 2018
@Riebart commented Jun 20, 2018

Having done this before, I can say LZ4 is pure magic: it compresses fast and decompresses really fast. By storing LZ4-compressed CSVs in S3, we lifted our batch-processing throughput out of S3 by almost an order of magnitude.
