Processing a large dataset
- clean the local copy of the table
- compute some basic information about the files in the table
- add these features to the table
- use the original raw data to create a second table
- make some pretty plots
The file- and dataset-level features computed in this repo were stored in a database and used for a dashboard.
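
The first three steps in the list (cleaning the table, computing basic file information, and adding it back as features) might look roughly like the sketch below. This is not the code from this repo: it assumes the table is a list of dicts with a hypothetical `path` column, and the function names and feature names (`exists`, `size_bytes`, `extension`) are invented for illustration.

```python
from pathlib import Path


def clean_table(rows):
    """Drop rows without a path and remove duplicate paths (first one wins)."""
    seen = set()
    cleaned = []
    for row in rows:
        # "path" is an assumed column name, not one taken from the repo.
        path = (row.get("path") or "").strip()
        if not path or path in seen:
            continue
        seen.add(path)
        cleaned.append({**row, "path": path})
    return cleaned


def file_features(path):
    """Return basic per-file information; missing files get size None."""
    p = Path(path)
    exists = p.is_file()
    return {
        "exists": exists,
        "size_bytes": p.stat().st_size if exists else None,
        "extension": p.suffix.lower(),
    }


def add_features(rows):
    """Return new rows with the file-level features merged in."""
    return [{**row, **file_features(row["path"])} for row in rows]
```

Calling `add_features(clean_table(rows))` returns a new list and leaves the input rows unchanged, so the raw table stays available for building the second table later.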