
# R programming concepts and tools

> This chapter brings together several sections on more advanced
> programming topics from
> the [Teaching Material](https://lgatto.github.io/TeachingMaterial/)
> page.

## Defensive programming

Before even debugging, let's look at ways to prevent bugs in the first
place.

**Defensive programming:**  

- making the code work in a predictable manner
- writing code that fails in a well-defined manner
- if something *weird* happens, either properly deal with it, or fail
  quickly and loudly

The level of defensiveness will depend on whether you write a function
for interactive or programmatic usage.

### Talking to users {-}

#### Diagnostic messages {-}

```{r, eval=FALSE}
message("This is a message for our dear users.")
```

```{r, eval=FALSE}
sw <- "stats"  ## package name, chosen here for illustration
message("This is a message for our dear users. ",
        paste("Thank you for using our software",
              sw, "version", packageVersion(sw)))
```

Do not use `print` or `cat`:

```{r, eval=FALSE}
f1 <- function() {
    cat("I AM LOUD AND YOU CAN'T HELP IT.\n")
    ## do stuff
    invisible(TRUE)
}
f1()
```

```{r, eval=FALSE}
f2 <- function() {
    message("Sorry to interrupt, but...")
    ## do stuff
    invisible(TRUE)
}
f2()
suppressMessages(f2())
```
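
The difference can be checked directly. The sketch below uses two
made-up functions, `loud()` and `polite()`, that mirror `f1()` and
`f2()` above: `suppressMessages()` silences `message()` but has no
effect on `cat()`.

```{r, eval=FALSE}
## Hypothetical helpers, for illustration only
loud <- function() {
    cat("I use cat.\n")
    invisible(TRUE)
}
polite <- function() {
    message("I use message.")
    invisible(TRUE)
}

## capture.output() records what is written to standard output
out_loud <- capture.output(suppressMessages(loud()))
out_polite <- capture.output(suppressMessages(polite()))

out_loud    ## the cat() output got through
out_polite  ## empty: the message was suppressed
```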

Of course, it is also possible to define verbosity manually. This
means writing more code for a feature that is readily available, and
it is still better to emit the output with `message`.

```{r, eval=FALSE}
f3 <- function(verbose = TRUE) {
    if (verbose)
        message("I am being verbose because you let me.")
    ## do stuff
    invisible(TRUE)
}
f3()
f3(verbose = FALSE)
```

#### Warning {-}

> There is a problem with warnings. No one reads them. Pat Burns, in
> *R inferno*.

```{r, eval=FALSE}
warning("Do not ignore me. Something bad might have happened.")
warning("Do not ignore me. Something bad might be happening.", immediate. = TRUE)
```

```{r, eval=FALSE}
f <- function(...)
    warning("Attention, attention, ...!", ...)
f()
f(call. = FALSE)
```

Print warnings after they have been thrown:

```{r, eval=FALSE}
warnings()
last.warning
```

See also the `warn` option in `?options`.

```{r, eval=FALSE}
options("warn")
```
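
As a sketch of what the `warn` option changes: with `warn = 2`,
warnings are turned into errors, which can help to locate where a
warning comes from. The function `risky()` is a made-up name for this
example.

```{r, eval=FALSE}
## Hypothetical function that warns, then carries on
risky <- function() {
    warning("Something looks wrong.")
    "finished"
}

old <- options(warn = 2)  ## promote warnings to errors
res <- tryCatch(risky(), error = function(e) "stopped by an error")
options(old)              ## restore the previous setting

res
```

Restoring the previous value matters here, because `options()`
changes the setting for the whole session.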

#### Error {-}

```{r, eval=FALSE}
stop("This is the end, my friend.")
```

```{r, eval=FALSE}
log(c(2, 1, 0, -1, 2)); print("end") ## warning
xor(c(TRUE, FALSE)); print("end")    ## error
```

`stop` also has a `call.` parameter. The last error message can be
retrieved with `geterrmessage()`:

```{r, eval=FALSE}
geterrmessage()
```
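
A small sketch of what the `call.` parameter of `stop` changes. The
two functions are made up for this example, and `tryCatch()` is used
here only to get hold of the error object instead of aborting.

```{r, eval=FALSE}
## Hypothetical functions, for illustration only
with_call <- function() stop("Something went wrong.")
without_call <- function() stop("Something went wrong.", call. = FALSE)

## the handler returns the condition object itself
e1 <- tryCatch(with_call(), error = function(e) e)
e2 <- tryCatch(without_call(), error = function(e) e)

conditionMessage(e1)  ## the message is the same in both cases
conditionCall(e1)     ## the call that failed
conditionCall(e2)     ## NULL: the call is not part of the error
```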

#### Progress bars {-}

- `utils::txtProgressBar` function

```{r, eval=FALSE}
n <- 10
pb <- txtProgressBar(min = 0, max = n, style = 3)
for (i in 1:n) {
    setTxtProgressBar(pb, i)
    Sys.sleep(0.5)
}
close(pb)
```

- [`progress`](https://github.com/gaborcsardi/progress) package

```{r, eval=FALSE}
library("progress")
pb <- progress_bar$new(total = n)
for (i in 1:n) {
    pb$tick()
    Sys.sleep(0.5)
}
```

Tip: do not overuse progress bars. Ideally, a user should be
confident that everything is under control and progress is made while
waiting for a function to return. In my experience, a progress bar is
useful when there is a specific and/or user-defined number of
iterations, such as *iterating over n files*, or *running a simulation
n times*.

## KISS

Keep your functions simple and stupid (and short). 

## Failing fast and well

> Bounds errors are ugly, nasty things that should be stamped out
> whenever possible. One solution to this problem is to use the
> `assert` statement. The `assert` statement tells C++, "This can
> never happen, but if it does, abort the program in a nice way." One
> thing you find out as you gain programming experience is that things
> that can "never happen" happen with alarming frequency. So just to
> make sure that things work as they are supposed to, it’s a good idea
> to put lots of self checks in your program. -- Practical C++
> Programming, Steve Oualline, O'Reilly.

```{r, eval=FALSE}
if (!condition) stop(...)
```

```{r, eval=FALSE}
stopifnot(TRUE)
stopifnot(TRUE, FALSE)
```

For example, to test input classes, lengths, ...:

```{r, eval=FALSE}
f <- function(x) {
    stopifnot(is.numeric(x), length(x) == 1)
    invisible(TRUE)
}

f(1)
f("1")
f(1:2)
f(letters)
```
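
`stopifnot()` is concise, but its error message only echoes the
expression that failed. The `if (!condition) stop(...)` form allows a
message that tells the user what to fix. A minimal sketch, with a
made-up function `scale_by()`:

```{r, eval=FALSE}
## Hypothetical function, for illustration only
scale_by <- function(x, by) {
    ## check the inputs first, and fail before any work is done
    if (!is.numeric(x))
        stop("'x' must be numeric, not ", class(x)[1], ".", call. = FALSE)
    if (!is.numeric(by) || length(by) != 1)
        stop("'by' must be a single number.", call. = FALSE)
    x * by
}

ok <- scale_by(1:3, 2)
## keep only the error message, to look at it
msg <- tryCatch(scale_by("a", 2), error = conditionMessage)
msg
```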

The [`assertthat`](https://github.com/hadley/assertthat) package:

```{r, eval=FALSE}
x <- "1"
library("assertthat")
stopifnot(is.numeric(x))
assert_that(is.numeric(x))
assert_that(length(x) == 2)
```

* `assert_that()` signals an error.
* `see_if()` returns a logical value, with the error message as an attribute.
* `validate_that()` returns `TRUE` on success, otherwise returns the error as
  a string.

The package also provides additional assertion helpers:

* `is.flag(x)`: is x `TRUE` or `FALSE`? (a boolean flag)
* `is.string(x)`: is x a length 1 character vector?
* `has_name(x, nm)`, `x %has_name% nm`: does `x` have component `nm`?
* `has_attr(x, attr)`, `x %has_attr% attr`: does `x` have attribute `attr`?
* `is.count(x)`: is x a single positive integer?
* `are_equal(x, y)`: are `x` and `y` equal?
* `not_empty(x)`: are all dimensions of `x` greater than 0?
* `noNA(x)`: is `x` free from missing values?
* `is.dir(path)`: is `path` a directory?
* `is.writeable(path)`/`is.readable(path)`: is `path` writeable/readable?
* `has_extension(path, extension)`: does `path` have the given `extension`?


## Consistency and predictability

**Interactive use vs programming**: Moving from using R to programming
R is about *abstraction*, *automation*, *generalisation*.

#### `drop` {-}

```{r, eval=FALSE}
head(cars)
head(cars[, 1])
head(cars[, 1, drop = FALSE])
```
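
Why this matters when programming: the columns to extract are often
computed, and code that works with several columns can break when only
one is selected. A sketch with a hypothetical helper, `n_selected()`:

```{r, eval=FALSE}
## Hypothetical helper: number of columns left after selection
n_selected <- function(df, cols, drop = TRUE) {
    ncol(df[, cols, drop = drop])
}

n_selected(cars, c("speed", "dist"))     ## 2
n_selected(cars, "speed")                ## NULL: the result is a vector
n_selected(cars, "speed", drop = FALSE)  ## 1: still a data frame
```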

#### `sapply/lapply` {-}

```{r, eval=FALSE}
df1 <- data.frame(x = 1:3, y = LETTERS[1:3])
sapply(df1, class)
df2 <- data.frame(x = 1:3, y = Sys.time() + 1:3)
sapply(df2, class)
```

Rather use a form where the return data structure is known...

```{r, eval=FALSE}
lapply(df1, class)
lapply(df2, class)
```

or that will break if the result is not what is expected:

```{r, eval=FALSE}
vapply(df1, class, "1")
vapply(df2, class, "1")
```
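
Empty input is another case where `sapply` is unpredictable: it then
returns an empty list, whatever the function would have returned.
`vapply` always returns the declared type. A short sketch:

```{r, eval=FALSE}
empty <- list()

res_s <- sapply(empty, length)              ## list()
res_v <- vapply(empty, length, integer(1))  ## integer(0)

## the declared type is also checked for non-empty input
res_n <- vapply(list(1:3, letters), length, integer(1))
res_n
```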


Reminder of the interactive use vs programming examples:

- `[` and `drop`
- `sapply`, `lapply`, `vapply`

Remember also the concept of *tidy data*.

## Comparisons

### Floating point issues to be aware of {-}

See R FAQ [7.31](http://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f).


```{r, eval=FALSE}
a <- sqrt(2)
a * a == 2
a * a - 2
```

```{r, eval=FALSE}
1L + 2L == 3L
1.0 + 2.0 == 3.0
0.1 + 0.2 == 0.3
```

### Floating point: how to compare {-}

- `all.equal` compares R objects for *near equality*. It takes into
  account whether object attributes and names ought to be taken into
  consideration (`check.attributes` and `check.names` parameters) and
  a tolerance, which is machine dependent.

```{r, eval=FALSE}
all.equal(0.1 + 0.2, 0.3)
all.equal(0.1 + 0.2, 3.0)
isTRUE(all.equal(0.1 + 0.2, 3)) ## when you just want TRUE/FALSE
```
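
`all.equal` returns a single value for the whole object. For an
element-wise comparison of numeric vectors, one option is to compare
absolute differences against an explicit tolerance. The helper
`nearly_equal()` below is a made-up name for this sketch; its default
tolerance is the same value as the default of `all.equal` for numeric
vectors, but it is applied to absolute rather than relative
differences.

```{r, eval=FALSE}
## Hypothetical helper, for illustration only
nearly_equal <- function(x, y, tol = sqrt(.Machine$double.eps)) {
    abs(x - y) < tol  ## absolute difference, element by element
}

x <- c(0.1 + 0.2, sqrt(2)^2, 1)
y <- c(0.3, 2, 1.1)

x == y              ## FALSE FALSE FALSE
nearly_equal(x, y)  ## TRUE TRUE FALSE
```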

### Exact identity {-}

`identical`: test objects for exact equality

```{r, eval=FALSE}
1 == NULL
all.equal(1, NULL)
identical(1, NULL)
identical(1, 1.)   ## TRUE in R (both are stored as doubles)
all.equal(1, 1L)
identical(1, 1L)   ## stored as different types
```

`identical` is appropriate within `if` and `while` conditions;
`all.equal` is not, unless it is wrapped in `isTRUE`.
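
The reason for wrapping `all.equal`: when the objects differ, it
returns a character string describing the difference rather than
`FALSE`, and `if` cannot interpret that string. A short sketch:

```{r, eval=FALSE}
res <- all.equal(1, 2)
res                ## a description of the difference, not FALSE
is.character(res)  ## TRUE

## if (all.equal(1, 2)) ... would fail with an error;
## these two forms always give a single TRUE or FALSE
if (isTRUE(all.equal(0.1 + 0.2, 0.3))) message("nearly equal")
if (!identical(1, 1L)) message("not identical")
```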

## Exercise

From [Advanced R](http://adv-r.had.co.nz/Exceptions-Debugging.html#defensive-programming) by Hadley Wickham.

The `col_means` function computes the means of all numeric columns in
a data frame.

```{r, eval=FALSE}
col_means <- function(df) {
  numeric <- sapply(df, is.numeric)
  numeric_cols <- df[, numeric]
  data.frame(lapply(numeric_cols, mean))
}
```

Is it a robust function? What happens if there are unusual inputs?

```{r, eval=FALSE}
col_means(mtcars)
col_means(mtcars[, 0])
col_means(mtcars[0, ])
col_means(mtcars[, "mpg", drop = FALSE])
col_means(1:10)
col_means(as.matrix(mtcars))
col_means(as.list(mtcars))

mtcars2 <- mtcars
mtcars2[-1] <- lapply(mtcars2[-1], as.character)
col_means(mtcars2)
```

Using some of the concepts and tips above, re-write `col_means` to
make it more robust.

## Debugging: techniques and tools

**Shit happens!**

> Finding your bug is a process of confirming the many things that you
> believe are true - until you find one which is not true. -- Norm Matloff

#### 1. Identify the bug (the difficult part) {-}
- Something went wrong!
- Where in the code does it happen?
- Does it happen every time?
- What input triggered it?
- Report it (even if it is in your code - use github issues, for
  example).

**Tip**: Beware of your intuition. As a scientist, do what you are
used to: generate hypotheses, *design an experiment* to test them,
and record the results.

#### 2. Fix it (the less difficult part) {-}
- Correct the bug.
- Make sure that bug will not repeat itself!
- How can we be confident that we haven't introduced new bugs?

### Tools {-}

- `print`/`cat`
- `traceback()`
- `browser()`
- IDE: RStudio, StatET, Emacs ESS tracebug.


#### Manually {-}

Inserting `print` and `cat` statements in the code. This works, but is
time-consuming.

#### Finding the bug {-}

Bugs are shy, and are generally hidden, deep down in your code, to
make it as difficult as possible for you to find them.

```{r, echo=TRUE}
e <- function(i) {
  x <- 1:4
  if (i < 5) x[1:2]
  else x[-1:2]
}
f <- function() sapply(1:10, e)
g <- function() f()
```

`traceback`: lists the sequence of calls that led to the error

```{r, eval=FALSE}
g()
traceback()
```

If the source code is available (for example for `source()`d code),
then traceback will display the exact location in the function, in the
form `filename.R#linenum`.

#### Browsing the error {-}

- Register the function for debugging: `debug(g)`. This adds a call to
  the `browser()` function (see also below) at the very beginning of
  the function `g`.
  
- Every call to `g()` will now be run interactively.

- To finish debugging: `undebug(g)`.


```{r, eval=FALSE}
debug(g)
g()
```

How to debug:

- `n` executes the next step of the function. Use `print(n)` or
  `get("n")` to print/access the variable `n`.
- `s` to step into the next function. If it is not a function, same as
  `n`.
- `f` to finish execution of the current loop or function.
- `c` to leave interactive debugging and continue regular execution of
  the function. 
- `Q` to stop debugging, terminate the function and return to the
  global workspace.
- `where` prints a stack trace of all active function calls.
- `Enter` same as `n` (or `s`, if it was used most recently), unless
  `options(browserNLdisabled = TRUE)` is set.

To fix a function when the source code is not directly available, use
`fix(fun)`. This will open the function's source code for editing and,
after saving and closing, store the updated function in the global
workspace.

#### Breakpoints {-}

- Add a call to `browser()` anywhere in the source code to execute the
  rest of the code interactively.
  
- To run breakpoints conditionally, wrap the call to `browser()` in a
  condition.
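
A sketch of a conditional breakpoint, using a copy of the function `e`
from above under the made-up name `e_dbg`. The debugger only starts
for the inputs suspected to trigger the bug. The `interactive()` guard
is an addition for this example, so that code run in batch mode is not
interrupted.

```{r, eval=FALSE}
## Copy of e() with a conditional breakpoint (hypothetical name)
e_dbg <- function(i) {
  x <- 1:4
  ## enter the debugger only for suspicious input
  if (interactive() && i >= 5) browser()
  if (i < 5) x[1:2]
  else x[-1:2]
}

e_dbg(1)  ## runs normally, the breakpoint is skipped
## calling e_dbg(5) in an interactive session stops in the browser,
## just before the faulty line is evaluated
```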

#### Debugging with IDEs {-}

- RStudio: `Show Traceback`, `Rerun with Debug` and interactive debugging.

![RStudio debugging 1](./figs/debugRStudio1.png)
![RStudio debugging 2](./figs/debugRStudio2.png)

- StatET (Eclipse plugin)

- [emacs ESS and tracebug](http://ess.r-project.org/Manual/ess.html#Developing-with-ESS)

### Exercise {-}

1. Your turn - play with `traceback`, `recover` and `debug`:

(Example originally by Martin Morgan and Robert Gentleman.)

```{r, echo=TRUE}
e <- function(i) {
  x <- 1:4
  if (i < 5) x[1:2]
  else x[-1:2] # oops! x[-(1:2)]
}
f <- function() sapply(1:10, e)
g <- function() f()
```

2. Fix `readFasta2`.

Preparing the ground:

```{r, eval=FALSE}
## make sure you have the 'sequences' package.
library("devtools")
install_github("lgatto/sequences") ## from github
## or 
install.packages("sequences") ## from CRAN
```

A working example: reading a single sequence from a fasta file to
create an object of class `DnaSeq`, representing the DNA string:

```{r}
library("sequences")
f <- dir(system.file("extdata", package = "sequences"),
         full.names=TRUE, pattern = "aDnaSeq.fasta")
readFasta(f)

```

A bug, trying to read multiple sequences from a fasta file. The
expected behaviour would be to return a list of `DnaSeq` objects:

```{r, eval=FALSE}
## Get readFasta2, the function to debug
sequences:::debugme()
## Get an example file
f <- dir(system.file("extdata", package = "sequences"),
         full.names=TRUE, pattern = "moreDnaSeqs.fasta")
## BANG!
readFasta2(f)
```