# R programming concepts and tools
> This section is composed of various sections on more advanced
> programming topics from
> the [Teaching Material](https://lgatto.github.io/TeachingMaterial/)
> page.
## Defensive programming
Before even debugging, let's look at ways to prevent bugs in the first
place.
**Defensive programming:**
- making the code work in a predictable manner
- writing code that fails in a well-defined manner
- if something *weird* happens, either properly deal with it, or fail
  quickly and loudly

The level of defensiveness will depend on whether you write a function
for interactive or programmatic usage.
### Talking to users {-}
#### Diagnostic messages {-}
```{r, eval=FALSE}
message("This is a message for our dear users.")
```
```{r, eval=FALSE}
sw <- "utils"  ## name of an installed package, chosen for illustration
message("This is a message for our dear users. ",
        paste("Thank you for using our software",
              sw, "version", packageVersion(sw)))
```
Do not use `print` or `cat`:
```{r, eval=FALSE}
f1 <- function() {
    cat("I AM LOUD AND YOU CAN'T HELP IT.\n")
    ## do stuff
    invisible(TRUE)
}
f1()
```
```{r, eval=FALSE}
f2 <- function() {
    message("Sorry to interrupt, but...")
    ## do stuff
    invisible(TRUE)
}
f2()
suppressMessages(f2())
```
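
One practical difference can be sketched as follows: `cat` writes to standard output, whereas `message` signals a condition that can be suppressed and that does not end up on standard output. The function names `loud` and `polite` are made up for this illustration.

```{r, eval=FALSE}
## Hypothetical functions, mirroring f1() and f2() above
loud <- function() {
    cat("I am written to stdout.\n")
    invisible(TRUE)
}
polite <- function() {
    message("I am signalled as a message condition.")
    invisible(TRUE)
}

## capture.output() collects what is written to stdout only
out_cat <- capture.output(loud())
out_msg <- capture.output(suppressMessages(polite()))

out_cat          ## the text produced by cat()
length(out_msg)  ## 0: nothing was written to stdout
```

Output from `cat` therefore mixes with the results a user might want to capture or redirect, while messages stay separate.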
Of course, it is also possible to control verbosity manually with a
function argument. This means writing more code for a feature that is
already available through `suppressMessages`, and even then it is
better to emit the text with `message`.
```{r, eval=FALSE}
f3 <- function(verbose = TRUE) {
    if (verbose)
        message("I am being verbose because you let me.")
    ## do stuff
    invisible(TRUE)
}
f3()
f3(verbose = FALSE)
```
#### Warning {-}
> There is a problem with warnings. No one reads them. Pat Burns, in
> *R inferno*.
```{r, eval=FALSE}
warning("Do not ignore me. Something bad might have happened.")
warning("Do not ignore me. Something bad might be happening.", immediate. = TRUE)
```
```{r, eval=FALSE}
f <- function(...)
    warning("Attention, attention, ...!", ...)
f()
f(call. = FALSE)
```
Print warnings after they have been thrown.
```{r, eval=FALSE}
warnings()
last.warning
```
See also the `warn` option in `?options`.
```{r, eval=FALSE}
options("warn")
```
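
As a sketch of what the `warn` option changes: setting it to `2` turns warnings into errors, which can help to locate where a warning originates. The helper name `risky` is an assumption made for this example.

```{r, eval=FALSE}
## Hypothetical function that emits a warning
risky <- function() as.numeric("not a number")

## With the warning muffled, the call simply returns NA
res_default <- suppressWarnings(risky())

## warn = 2: the warning is converted into an error
old <- options(warn = 2)
res_strict <- tryCatch(risky(), error = function(e) "error")
options(old)  ## always restore the previous setting

res_default  ## NA
res_strict   ## "error"
```

Saving the value returned by `options()` and restoring it afterwards avoids leaving the session in a stricter mode than the user expects.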
#### Error {-}
```{r, eval=FALSE}
stop("This is the end, my friend.")
```
```{r, eval=FALSE}
log(c(2, 1, 0, -1, 2)); print('end') ## warning
xor(c(TRUE, FALSE)); print ('end') ## error
```
`stop` also has a `call.` parameter.
```{r, eval=FALSE}
geterrmessage()
```
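
To illustrate the `call.` argument of `stop`, the sketch below catches two errors with `tryCatch` and inspects the call stored in each condition object. The function names are hypothetical.

```{r, eval=FALSE}
## Two hypothetical functions that differ only in the call. argument
f_with_call <- function() stop("Something went wrong.")
f_without_call <- function() stop("Something went wrong.", call. = FALSE)

## Return the condition object instead of stopping
err1 <- tryCatch(f_with_call(), error = function(e) e)
err2 <- tryCatch(f_without_call(), error = function(e) e)

conditionCall(err1)     ## the call that raised the error
conditionCall(err2)     ## NULL: the call is not recorded
conditionMessage(err2)  ## the message is the same in both cases
```

Dropping the call can make error messages easier to read for end users when the failing call is an internal helper they never typed themselves.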
#### Progress bars {-}
- `utils::txtProgressBar` function
```{r, eval=FALSE}
n <- 10
pb <- txtProgressBar(min = 0, max = n, style = 3)
for (i in 1:n) {
    setTxtProgressBar(pb, i)
    Sys.sleep(0.5)
}
close(pb)
```
- [`progress`](https://github.com/gaborcsardi/progress) package
```{r, eval=FALSE}
library("progress")
pb <- progress_bar$new(total = n)
for (i in 1:n) {
    pb$tick()
    Sys.sleep(0.5)
}
```
Tip: do not overuse progress bars. Ideally, a user should be
confident that everything is under control and that progress is being
made while waiting for a function to return. In my experience, a
progress bar is useful when there is a specific and/or user-defined
number of iterations, such as *iterating over n files* or *running a
simulation n times*.
## KISS
Keep your functions simple and stupid (and short).
## Failing fast and well
> Bounds errors are ugly, nasty things that should be stamped out
> whenever possible. One solution to this problem is to use the
> `assert` statement. The `assert` statement tells C++, "This can
> never happen, but if it does, abort the program in a nice way." One
> thing you find out as you gain programming experience is that things
> that can "never happen" happen with alarming frequency. So just to
> make sure that things work as they are supposed to, it’s a good idea
> to put lots of self checks in your program. -- Practical C++
> Programming, Steve Oualline, O'Reilly.
```{r, eval=FALSE}
if (!condition) stop(...)
```
```{r, eval=FALSE}
stopifnot(TRUE)
stopifnot(TRUE, FALSE)
```
For example to test input classes, lengths, ...
```{r, eval=FALSE}
f <- function(x) {
    stopifnot(is.numeric(x), length(x) == 1)
    invisible(TRUE)
}
f(1)
f("1")
f(1:2)
f(letters)
```
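
Since R 4.0.0, the arguments of `stopifnot` can be named, and the name is then used as the error message. A minimal sketch, assuming a recent enough R version; `check_scalar` is a hypothetical name.

```{r, eval=FALSE}
## Requires R >= 4.0.0; check_scalar is a hypothetical name
check_scalar <- function(x) {
    stopifnot("x must be numeric" = is.numeric(x),
              "x must be of length 1" = length(x) == 1)
    invisible(TRUE)
}

check_scalar(1)  ## passes silently

## Capture the error message instead of stopping
msg <- tryCatch(check_scalar("1"), error = function(e) conditionMessage(e))
msg
```

The checks are evaluated in order, so the first failing condition determines the message the user sees.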
The [`assertthat`](https://github.com/hadley/assertthat) package:
```{r, eval=FALSE}
x <- "1"
library("assertthat")
stopifnot(is.numeric(x))
assert_that(is.numeric(x))
assert_that(length(x) == 2)
```
* `assert_that()` signals an error.
* `see_if()` returns a logical value, with the error message as an attribute.
* `validate_that()` returns `TRUE` on success, otherwise returns the error as
a string.
* `is.flag(x)`: is x `TRUE` or `FALSE`? (a boolean flag)
* `is.string(x)`: is x a length 1 character vector?
* `has_name(x, nm)`, `x %has_name% nm`: does `x` have component `nm`?
* `has_attr(x, attr)`, `x %has_attr% attr`: does `x` have attribute `attr`?
* `is.count(x)`: is x a single positive integer?
* `are_equal(x, y)`: are `x` and `y` equal?
* `not_empty(x)`: are all dimensions of `x` greater than 0?
* `noNA(x)`: is `x` free from missing values?
* `is.dir(path)`: is `path` a directory?
* `is.writeable(path)`/`is.readable(path)`: is `path` writeable/readable?
* `has_extension(path, extension)`: does `path` have the given `extension`?
## Consistency and predictability
**Interactive use vs programming**: Moving from using R to programming
R is about *abstraction*, *automation* and *generalisation*.
#### `drop` {-}
```{r, eval=FALSE}
head(cars)
head(cars[, 1])
head(cars[, 1, drop = FALSE])
```
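
Why this matters when programming: code that expects a data frame breaks when a single-column selection silently becomes a vector. A minimal sketch with the built-in `cars` data (the helper `n_obs` is hypothetical):

```{r, eval=FALSE}
## Hypothetical helper that expects a data frame
n_obs <- function(df) nrow(df)

n_obs(cars[, 1:2])              ## two columns: still a data frame
n_obs(cars[, 1])                ## one column: dropped to a vector, nrow() is NULL
n_obs(cars[, 1, drop = FALSE])  ## one column, data frame preserved
```

When the column selection is computed rather than typed, `drop = FALSE` keeps the return type independent of how many columns happen to be selected.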
#### `sapply/lapply` {-}
```{r, eval=FALSE}
df1 <- data.frame(x = 1:3, y = LETTERS[1:3])
sapply(df1, class)
df2 <- data.frame(x = 1:3, y = Sys.time() + 1:3)
sapply(df2, class)
```
Rather use a form where the return data structure is known...
```{r, eval=FALSE}
lapply(df1, class)
lapply(df2, class)
```
or one that will break if the result is not what is expected
```{r, eval=FALSE}
vapply(df1, class, "1")
vapply(df2, class, "1")
```
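
The third argument of `vapply` (`FUN.VALUE`) is a template for each result; `"1"` above stands for a character vector of length 1. The sketch below makes the template explicit and keeps only the first class of each column, so that it also works for date-time columns. The name `first_class` is an assumption.

```{r, eval=FALSE}
df2 <- data.frame(x = 1:3, y = Sys.time() + 1:3)

## Hypothetical helper: return exactly one class name per column
first_class <- function(col) class(col)[1]

res <- vapply(df2, first_class, character(1))
res
```

Whether taking the first class is the right choice depends on the use case; the point is that the shape of the result is fixed in advance.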
Reminder of the interactive use vs programming examples:
- `[` and `drop`
- `sapply`, `lapply`, `vapply`
Remember also the concept of *tidy data*.
## Comparisons
### Floating point issues to be aware of {-}
See R FAQ [7.31](http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f).
```{r, eval=FALSE}
a <- sqrt(2)
a * a == 2
a * a - 2
```
```{r, eval=FALSE}
1L + 2L == 3L
1.0 + 2.0 == 3.0
0.1 + 0.2 == 0.3
```
### Floating point: how to compare {-}
- `all.equal` compares R objects for *near equality*. It takes into
  account whether object attributes and names ought to be taken into
  consideration (`check.attributes` and `check.names` parameters) and
  a tolerance, which is machine dependent.
```{r, eval=FALSE}
all.equal(0.1 + 0.2, 0.3)
all.equal(0.1 + 0.2, 3.0)
isTRUE(all.equal(0.1 + 0.2, 3)) ## when you just want TRUE/FALSE
```
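
The tolerance can also be set explicitly through the `tolerance` argument of `all.equal`. A small sketch, with values chosen only for illustration:

```{r, eval=FALSE}
x <- 1
y <- 1 + 1e-10

x == y                                      ## FALSE: not exactly equal
isTRUE(all.equal(x, y))                     ## TRUE: within the default tolerance
isTRUE(all.equal(x, y, tolerance = 1e-12))  ## FALSE: stricter tolerance
```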
### Exact identity {-}
`identical`: test objects for exact equality
```{r, eval=FALSE}
1 == NULL
all.equal(1, NULL)
identical(1, NULL)
identical(1, 1.) ## TRUE in R (both are stored as doubles)
all.equal(1, 1L)
identical(1, 1L) ## stored as different types
```
`identical` is appropriate within `if` and `while` condition statements
(`all.equal` is not, unless wrapped in `isTRUE`).
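
The reason is that `all.equal` does not return `FALSE` when objects differ, but a character description of the difference. A short sketch of what then happens in a condition:

```{r, eval=FALSE}
cmp <- all.equal(1, 2)
is.character(cmp)  ## TRUE: a description of the difference, not FALSE

## Using the raw result as a condition fails
res <- tryCatch(if (cmp) "equal" else "different",
                error = function(e) "error")
res

## Wrapping in isTRUE() gives a proper logical
if (isTRUE(all.equal(1, 2))) "equal" else "different"
```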
## Exercise
From [Advanced R](http://adv-r.had.co.nz/Exceptions-Debugging.html#defensive-programming) by Hadley Wickham.
The `col_means` function computes the means of all numeric columns in
a data frame.
```{r, eval=FALSE}
col_means <- function(df) {
    numeric <- sapply(df, is.numeric)
    numeric_cols <- df[, numeric]
    data.frame(lapply(numeric_cols, mean))
}
```
Is it a robust function? What happens if there are unusual inputs?
```{r, eval=FALSE}
col_means(mtcars)
col_means(mtcars[, 0])
col_means(mtcars[0, ])
col_means(mtcars[, "mpg", drop = FALSE])
col_means(1:10)
col_means(as.matrix(mtcars))
col_means(as.list(mtcars))
mtcars2 <- mtcars
mtcars2[-1] <- lapply(mtcars2[-1], as.character)
col_means(mtcars2)
```
Using some of the concepts and tips above, re-write `col_means` to
make it more robust.
## Debugging: techniques and tools
**Shit happens!**
> Finding your bug is a process of confirming the many things that you
> believe are true - until you find one which is not true. -- Norm Matloff
#### 1. Identify the bug (the difficult part) {-}
- Something went wrong!
- Where in the code does it happen?
- Does it happen every time?
- What input triggered it?
- Report it (even if it is in your code - use github issues, for
example).
**Tip**: Beware of your intuition. As a scientist, do what you are
used to: generate hypotheses, *design experiments* to test them,
and record the results.
#### 2. Fix it (the less difficult part) {-}
- Correct the bug.
- Make sure that bug will not repeat itself!
- How can we be confident that we haven't introduced new bugs?
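
One possible way to keep a fixed bug from coming back is to record the expected behaviour as a few checks that are rerun after every change. The sketch below uses `stopifnot` and a made-up function, `last_element`, whose earlier version is assumed to have returned the wrong element.

```{r, eval=FALSE}
## Hypothetical function; an earlier version used x[length(x) - 1] by mistake
last_element <- function(x) x[length(x)]

## Checks that document the fix; they fail loudly if the bug returns
stopifnot(last_element(1:4) == 4,
          last_element(7) == 7,
          identical(last_element(letters), "z"))
```

Dedicated unit testing packages offer more structure, but even a handful of such checks gives some confidence that a later change has not reintroduced the problem.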
### Tools {-}
- `print`/`cat`
- `traceback()`
- `browser()`
- IDE: RStudio, StatET, Emacs ESS tracebug.
#### Manually {-}
Inserting `print` and `cat` statements in the code. Works, but is
time-consuming.
#### Finding the bug {-}
Bugs are shy, and are generally hidden, deep down in your code, to
make it as difficult as possible for you to find them.
```{r, echo=TRUE}
e <- function(i) {
    x <- 1:4
    if (i < 5) x[1:2]
    else x[-1:2]
}
f <- function() sapply(1:10, e)
g <- function() f()
```
`traceback`: lists the sequence of calls that led to the error
```{r, eval=FALSE}
g()
traceback()
```
If the source code is available (for example for `source()`d code),
then traceback will display the exact location in the function, in the
form `filename.R#linenum`.
#### Browsing the error {-}
- Register the function for debugging: `debug(g)`. This adds a call to
  the `browser()` function (see also below) at the very beginning of
  the function `g`.
- Every call to `g()` will now be run interactively.
- To finish debugging: `undebug(g)`.
```{r, eval=FALSE}
debug(g)
g()
```
How to debug:
- `n` executes the next step of the function. Because `n` is a debugger
  command, use `print(n)` or `get("n")` to print/access a variable named `n`.
- `s` to step into the next function. If it is not a function, same as
`n`.
- `f` to finish execution of the current loop or function.
- `c` to leave interactive debugging and continue regular execution of
the function.
- `Q` to stop debugging, terminate the function and return to the
global workspace.
- `where` prints a stack trace of all active function calls.
- `Enter` same as `n` (or `s`, if it was used most recently), unless
`options(browserNLdisabled = TRUE)` is set.
To fix a function when the source code is not directly available, use
`fix(fun)`. This will open the function's source code for editing and,
after saving and closing, store the updated function in the global
workspace.
#### Breakpoints {-}
- Add a call to `browser()` anywhere in the source code to execute the
rest of the code interactively.
- To run breakpoints conditionally, wrap the call to `browser()` in a
condition.
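
A conditional breakpoint can be sketched as follows. The function `cumulate` is hypothetical, and the additional `interactive()` guard is an assumption made here so that the code also runs unattended, for example when knitting.

```{r, eval=FALSE}
## Hypothetical function with a conditional breakpoint
cumulate <- function(n) {
    total <- 0
    for (i in seq_len(n)) {
        ## enter the browser only at the fifth iteration,
        ## and only when a user is there to interact with it
        if (interactive() && i == 5) browser()
        total <- total + i
    }
    total
}

cumulate(10)
```

In an interactive session, execution pauses at the fifth iteration, where `i` and `total` can be inspected before continuing with `c`.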
#### Debugging with IDEs {-}
- RStudio: `Show Traceback`, `Rerun with Debug` and interactive debugging.
![RStudio debugging 1](./figs/debugRStudio1.png)
![RStudio debugging 2](./figs/debugRStudio2.png)
- StatET (Eclipse plugin)
- [emacs ESS and tracebug](http://ess.r-project.org/Manual/ess.html#Developing-with-ESS)
### Exercise {-}
1. Your turn - play with `traceback`, `recover` and `debug`:
(Example originally by Martin Morgan and Robert Gentleman.)
```{r, echo=TRUE}
e <- function(i) {
    x <- 1:4
    if (i < 5) x[1:2]
    else x[-1:2] # oops! x[-(1:2)]
}
f <- function() sapply(1:10, e)
g <- function() f()
```
2. Fix `readFasta2`.
Preparing the ground
```{r, eval=FALSE}
## make sure you have the 'sequences' package.
library("devtools")
install_github("lgatto/sequences") ## from github
## or
install.packages("sequences") ## from CRAN
```
A working example: reading a single sequence from a fasta file to
create an object of class `DnaSeq`, representing the DNA string:
```{r}
library("sequences")
f <- dir(system.file("extdata", package = "sequences"),
         full.names = TRUE, pattern = "aDnaSeq.fasta")
readFasta(f)
```
A bug, trying to read multiple sequences from a fasta file. The
expected behaviour would be to return a list of `DnaSeq` objects:
```{r, eval=FALSE}
## Get readFasta2, the function to debug
sequences:::debugme()
## Get an example file
f <- dir(system.file("extdata", package = "sequences"),
         full.names = TRUE, pattern = "moreDnaSeqs.fasta")
## BANG!
readFasta2(f)
```