23 changes: 22 additions & 1 deletion README.md
@@ -1,8 +1,10 @@
## Campaign Lab Data Pipeline

For context, see [Campaign Lab Guide](https://github.com/CampaignLab/Campaign-Lab-Guide/blob/master/Campaign%20Lab%20Guide.md0).

#### What?

* We want to be able to structure our dataset from the Data Inventory (see "Campaign Lab Data Inventory").
* To do this, we first need to define the structure (schema) of each data source.
* This will help us down the line to create modules that transform our raw data into our target data, for later export into a database, an R package, or any other tool for using the data in a highly structured and annotated format (a rough sketch of such a transformer follows this list).
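
As a rough illustration (not the project's actual code), a transformer could be a small Python module exposing a `get_data()` function that reads a raw file and returns records shaped to the target schema. The file path and column names below are made up for the sketch:

```python
# Hypothetical transformer sketch: reads a raw CSV and returns rows shaped
# to the target schema. The path and column names are placeholders.
import csv


def get_data(path="../raw/example_election_results.csv"):
    """Return a list of dicts matching the target schema."""
    records = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            records.append({
                "ward": row.get("Ward"),
                "party": row.get("Party"),
                "votes": int(row.get("Votes", 0) or 0),
            })
    return records


if __name__ == "__main__":
    print(get_data()[:3])  # eyeball the first few transformed rows
```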

@@ -55,3 +57,22 @@
* *source* is a link (if available) to the actual dataset.
* *description* is a one-liner that describes the dataset.
* *properties* is a list of the *datapoints* that we want to *end up with after transforming the raw dataset* (an illustrative entry follows this list).
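
Purely for illustration, a single entry following the fields above might look like the snippet below; the dataset name, link, and properties are placeholders rather than real Inventory entries:

```python
# Hypothetical schema entry; all values below are placeholders.
local_election_results = {
    # link to the actual dataset, if available
    "source": "https://example.org/local-election-results-2018.csv",
    # one-liner describing the dataset
    "description": "Ward-level results for the 2018 local elections",
    # datapoints we want to end up with after transforming the raw data
    "properties": ["ward", "party", "candidate", "votes"],
}
```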


### Toolset

(The author is still learning his way around data science and Python; better approaches are welcome.)

Datasets are expected to be largely static; transformers are intended to be run manually and eyeballed as needed, rather than automated.
They can be run in a local environment.
For reproducibility and dev tooling, you can also use a containerised environment via Docker.

Run a specific command:
`docker-compose run datascience python -c 'from london_election_results import get_data; print(get_data())'`

Running the environment:

* `docker-compose up`
* `http://localhost:9200` #elasticsearch
* `http://localhost:5601` #kibana
* Can import a CSV with e.g.
* `docker-compose exec datascience elasticsearch_loader --es-host http://elasticsearch:9200 --index campaignlab --type campaignlab csv ../schemas/local_election_results_2018-05-03.csv`
Follow the [Kibana dashboard tutorial](https://www.elastic.co/guide/en/kibana/current/tutorial-build-dashboard.html) to visualise the data; a quick check that the import succeeded is sketched below.
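
After an import, one quick way to confirm that documents actually landed in the index is to hit Elasticsearch's `_count` endpoint from the host. This sketch assumes the `campaignlab` index from the command above and uses the `requests` library:

```python
# Sanity check: count documents in the campaignlab index after an import.
# Assumes Elasticsearch is reachable on localhost:9200 (i.e. docker-compose up is running).
import requests

resp = requests.get("http://localhost:9200/campaignlab/_count")
resp.raise_for_status()
print(resp.json()["count"], "documents indexed")
```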
33 changes: 33 additions & 0 deletions docker-compose.yml
@@ -0,0 +1,33 @@
version: '3.1'
services:
  datascience:
    image: civisanalytics/datascience-python:4.2.0
    container_name: datascience-python
    ports:
      - "8888:8888"
    volumes:
      - ./:/pipeline
    working_dir: "/pipeline/transformers"
    tty: true
    # Keep container running idle.
    command: [ "/bin/sh", "-c", "pip install elasticsearch-loader; tail -f /dev/null"]

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.6.0
    container_name: elasticsearch
    ports:
      - "9200:9200"
    environment:
      CLUSTER_NAME: "campaignlab"
      HTTP_PORT: "9200"
      DISCOVERY_TYPE: "single-node"
      ES_JAVA_OPTS: "-Xmx256m -Xms256m"

  kibana:
    image: docker.elastic.co/kibana/kibana-oss:6.6.0
    container_name: kibana
    ports:
      - "5601:5601"
      - "8080:8080"
    environment:
      SERVER_NAME: "kibana"
      ELASTICSEARCH_HOSTS: "http://elasticsearch:9200"