forked from h2oai/db-benchmark
-
Notifications
You must be signed in to change notification settings - Fork 0
/
tech.Rmd
89 lines (74 loc) · 3.13 KB
/
tech.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
---
title: "Technical measures of db-benchmark"
output:
html_document:
self_contained: no
toc: true
includes:
in_header: ga.html
---
```{r render, include=FALSE}
# Rscript -e 'rmarkdown::render("tech.Rmd", output_dir="public")' # has to be output_dir='public' as there is hardcode in benchplot for that path
```
```{r opts, echo=FALSE}
knitr::opts_chunk$set(echo=FALSE, cache=FALSE)
```
```{r init}
library(lattice)
source("report.R")
ld = time_logs()
recent_nodename = ld[script_recent==TRUE, unique(nodename)]
stopifnot(length(recent_nodename)==1L)
ld = ld[nodename==recent_nodename]
```
## Incompleted timings of last run
```{r completed}
ll = ld[script_recent==TRUE, {
n_na = is.na(c(time_sec_1, time_sec_2))
n_completed=sum(!n_na)
n_failed=sum(n_na)
.(n_completed=n_completed, n_failed=n_failed, q_failed=if(n_failed==0L) NA_character_ else paste(paste0("q", iquestion[is.na(time_sec_1) | is.na(time_sec_2)]), collapse=","))
},
c("nodename","batch","solution","task","data","in_rows","k","nasorted")]
stopifnot(length(unique(ll$nodename))==1L)
```
### groupby
```{r completed_groupby}
kk(ll[task=="groupby"
][n_failed>0L, .(solution, data, in_rows, k, `NA, sorted`=nasorted, n_completed, n_failed, q_failed)])
```
## Full scripts executions
Things to consider when looking at below plots.
- Red dotted line refers to script timeout which initially was not set up. Later it was set to 60 minutes, more recently, after adding new set of questions, it was increased to 120 minutes. Up to date timeout value can be looked up in `timeout.csv` file.
- It might happen that script was terminated by _out of memory killer_ an OS feature. In result script timing will be smaller than in reality it should be.
Refer to table above to see which script has been fully completed.
### groupby
```{r logs_plot, fig.width=8, fig.height=48}
timeout = fread("timeout.csv")
timeout = timeout["groupby", on="task", nomatch=NULL] # filter for env var RUN_TASKS
stopifnot(nrow(timeout)==1L)
timeout_m = timeout[["minutes"]]
p = sapply(setNames(nm=as.character(unique(ld$solution))), simplify = FALSE, function(s)
lattice::xyplot(script_time_sec/60 ~ ibatch | k+in_rows, ld[task=="groupby" & substr(data,1,2)=="G1"],
type="l", grid=TRUE, groups=nasorted,
subset=solution==s, main=s,
panel=panel.superpose,
panel.groups=function(x, y, col, col.symbol, ...) {
panel.lines(x, y, col=col.symbol, ...)
panel.abline(h=timeout_m, col="red", lty=3)
},
xlab = "benchmark run",
ylab = "minutes",
scales=list(y=list(
relation="free",
limits=rep(ld[solution==s, list(list(c(0, max(script_time_sec)/60))), in_rows]$V1, each=3)
)),
auto.key=list(points=FALSE, lines=TRUE))
)
sapply(seq_along(p), function(i) print(p[[i]], split=c(1, i, 1, length(p)), more=i!=length(p))) -> nul
```
------
Report was generated on: `r format(Sys.time(), usetz=TRUE)`.
```{r status_set_success}
cat("tech\n", file=get_report_status_file(), append=TRUE)
```