MOAR work!
Here is how things look in the tiq-test data directory right now:
aperture-2:data alexcp$ ls
enriched population raw
aperture-2:data alexcp$ ls raw
public_inbound public_outbound
aperture-2:data alexcp$ ls raw/pu
public_inbound/ public_outbound/
aperture-2:data alexcp$ ls raw/public_inbound/
20140615.csv.gz 20140618.csv.gz 20140622.csv.gz 20140625.csv.gz 20140628.csv.gz 20140701.csv.gz 20140704.csv.gz 20140707.csv.gz 20140710.csv.gz 20140713.csv.gz
20140616.csv.gz 20140619.csv.gz 20140623.csv.gz 20140626.csv.gz 20140629.csv.gz 20140702.csv.gz 20140705.csv.gz 20140708.csv.gz 20140711.csv.gz 20140714.csv.gz
20140617.csv.gz 20140620.csv.gz 20140624.csv.gz 20140627.csv.gz 20140630.csv.gz 20140703.csv.gz 20140706.csv.gz 20140709.csv.gz 20140712.csv.gz 20140715.csv.gz
Basically we have the following structure:
data/[DATATYPE]/[DATAGROUP]/[YYYYMMDD].csv.gz
considering that:
DATATYPE should be either raw or enriched. The names refer to the data structure to expect in the CSVs inside (as described in the README). Disregard the population type; it should not be a target for this presentation.
DATAGROUP refers to the group name of the combine output (currently the "inbound" and "outbound" separation). They can be whatever you like, I am using public_inbound and public_outbound for the presentation data.
YYYYMMDD is the way dates should be represented in the whole world.
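
As a rough illustration of the layout above (this is not the tiq-test code itself, just a Python sketch), a path could be built and checked like this. The names build_path, parse_path and VALID_DATATYPES are made up for the example, and the root directory is assumed to be called data as in the listing.

```python
import re
from datetime import date, datetime
from pathlib import PurePosixPath

# Assumption: only these two DATATYPE values are targets (population is skipped).
VALID_DATATYPES = {"raw", "enriched"}
_FILENAME_RE = re.compile(r"^(\d{8})\.csv\.gz$")


def build_path(datatype, datagroup, day, root="data"):
    """Return root/DATATYPE/DATAGROUP/YYYYMMDD.csv.gz for the given day."""
    if datatype not in VALID_DATATYPES:
        raise ValueError(f"unexpected DATATYPE: {datatype!r}")
    return PurePosixPath(root) / datatype / datagroup / f"{day:%Y%m%d}.csv.gz"


def parse_path(path):
    """Split a path into (datatype, datagroup, date); raise ValueError if it does not fit."""
    parts = PurePosixPath(path).parts
    if len(parts) < 3:
        raise ValueError(f"path too short: {path!r}")
    datatype, datagroup, filename = parts[-3:]
    match = _FILENAME_RE.match(filename)
    if datatype not in VALID_DATATYPES or match is None:
        raise ValueError(f"path does not follow the layout: {path!r}")
    # strptime also rejects impossible dates such as 20141340.
    day = datetime.strptime(match.group(1), "%Y%m%d").date()
    return datatype, datagroup, day
```

For example, build_path("raw", "public_inbound", date(2014, 6, 15)) gives data/raw/public_inbound/20140615.csv.gz, and parse_path on that same string gives the three components back.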
Please note the CSVs are gzipped. The code expects that as well.
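
If you want to peek at one of these files outside the project code, a minimal Python sketch (again, not what tiq-test does internally) could look like the following. The function name read_gzipped_csv is hypothetical, and the sketch assumes the first row of each CSV is a header and the files are UTF-8.

```python
import csv
import gzip


def read_gzipped_csv(path):
    """Read a .csv.gz file and return its rows as a list of dicts keyed by header."""
    # Mode "rt" decompresses and decodes to text in one step;
    # newline="" lets the csv module handle line endings itself.
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Calling read_gzipped_csv("data/raw/public_inbound/20140615.csv.gz") would return one dict per data row, with whatever column names the file's header defines.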