-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathmain.Rmd
196 lines (133 loc) · 4.63 KB
/
main.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
---
title: "Opensource Driven Data Pipes"
subtitle: "How KOF uses R, PostgreSQL, Git and Co."
author: "Matthias Bannert"
date: "Jan 11, 2018"
output: ioslides_presentation
widescreen: true
runtime: shiny
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```
## Background Poll
```{r,echo=FALSE}
library(grid)
library(shiny)
source("R/bullet.R")
fluidPage(sidebarLayout(sidebarPanel(
numericInput("members", "Number of participants", 25),
numericInput("rstats", "R", 10),
numericInput("git", "Git", 10),
numericInput("sql", "PostgreSQL", 10)
),
mainPanel(
renderPlot({
techavg <- mean(c(input$prof,
input$data,
input$stats) / input$members)*100
df1 <- data.frame(units = c("R (%)","git (%)","pgSQL (%)"),
low = c(25,25,25),
mean = c(50,50,50),
high = c(100,100,100),
target = c(techavg,techavg,techavg),
value = c(100*(input$prog/input$members),
100*(input$data/input$members),
100*(input$rstats/input$members)
)
)
g <- gridBulletGraphH(df1,bcol = c("#999999","#CCCCCC","#E1E1E1"),
vcol = "#333333",font = 20)
g + title(paste("Usage of Technologies Among Participants", sep=" "))
})
)
))
```
## Outline
- about KOF -> requirements
- big picture + stack
- application examples along the flow of data
## About KOF
[=> KOF about KOF](https://www.kof.ethz.ch/en/)
## Requirements
- ensure documentation and long term knowledge transfer
- timely data imports & persistent storage
- reproducibly process data
- disseminate data
- publish KOF products
- enable language agnostic consumption
## Overview
<img src="overview.png">
## Stack
<img src="images/python.png" width="200">
<img src="images/git.png">
<img src="images/github.png">
<img src="images/gitlab.png">
<img src="images/julia.png" width="130">
<img src="images/rstats.png" height="80">
<img src="images/rstudio.png" width="130">
<img src="images/d3.png" height="80">
<img src="images/postgresql.png" width="150">
<img src="images/nginx.png" width="150">
<img src="images/javascript.jpeg" height="80">
<img src="images/kong.png">
<img src="images/rhel.jpeg">
## Robustness & Longevity
- de-centralized version control /w git
- create packages to force documentation
- [mkdocs](http://127.0.0.1:8000) (markdown based documentation), mediawiki
- use a database, not filesystem based organization
- use servers to avoid maintenance issues in production
## Git - a micro excursion
- applied example: [*diff*erence between versions](https://github.com/mbannert/tstools/commits/master)
- git is not a dropbox-style-sync
- git is decentralized
- git is not github
- git itself is a console program, GUIs come on top
## Data imports
- use APIs whenever possible
- [SNB data](https://data.snb.ch)
- [BFS PX web](https://www.pxweb.bfs.admin.ch/)
- time controlled R scripts (cronjobs) on servers
- assign unique keys following (see also [KOF Key Explorer](https://datenservice.kof.ethz.ch/static/keyexplorer/index.html)):
country.provider.source.level.instance.variable.item
- store to postgreSQL database using the [timeseriesdb](https://github.com/mbannert/timeseriesdb) R package
-> snb_demo.R
## Package timeseriesdb
- map R time series to PostgreSQL relations
- available on github
```{r,eval=FALSE}
library(timeseriesdb)
tslist <- readTimeseries(vector_of_keys, connection)
storeTimeSeries(vector_of_keys, connection, list_of_series)
```
## Reproducibly Process Data
- scripts, mostly packages
- [unit tests](https://github.com/KOF-ch/kofdata/tree/master/tests/testthat) (introduction phase)
- no point and click processes
- log files
- error notifications mails
- GUI for production processes
- data.table, seasonal
- use tryCatch
- knitr / rmarkdown, tstools
## Publish Data
- KOF Website [e.g. KOF Barometer](https://www.kof.ethz.ch/en/forecasts-and-indicators/indicators/kof-economic-barometer.html)
- [KOF Indicator Dashboard](http://127.0.0.1:1234/)
- move timeseries data from one schema to another (updates all links and graphs)
- publish / subscribe (introduction phase)
## Disseminate Data: datenservice.kof.ethz.ch API
- REST API (compose URL dynamically)
- ordinary links
- use curl based, automated downloads
- file based endpoints
- dynamic endpoints
- kofdata R and Python packages (API wrapper)
## KOF Reports
- rmarkdown templates
- time series from database
- plot /w tstools
-> knitr_demo.R
-> dlu_report.pdf
## Thank you !
https://twitter.com/whatsgoodio