```{r, echo = FALSE}
library(circlize)
```
# Modes for `circos.genomicTrack()` {#modes-of-input}
The behaviour of `circos.genomicTrack()` and `panel.fun` depends on the input
data (e.g. is it a single data frame or a list of data frames? If it is a data
frame, how many numeric columns does it have?) and on the settings used.
## Normal mode
### Input is a data frame
If input `data` is a data frame in _BED_ format, `region` in `panel.fun` is a
data frame containing the start and end positions in the current chromosome,
extracted from `data`. `value` is also a data frame, which contains the columns
of `data` excluding the first three. The index of the numeric columns to use
is passed through `...` if `numeric.column` is set in
`circos.genomicTrack()`. To make use of this information, users need to pass
`...` on to low-level genomic functions such as `circos.genomicPoints()` as
well.

If there is more than one numeric column, graphics are added for each column
repeatedly (at the same genomic positions).
```{r, eval = FALSE}
data = generateRandomBed(nc = 2)
circos.genomicTrack(data, numeric.column = 4,
panel.fun = function(region, value, ...) {
circos.genomicPoints(region, value, ...)
circos.genomicPoints(region, value)
# 1st column in `value` while 4th column in `data`
circos.genomicPoints(region, value, numeric.column = 1)
})
```
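The split between `region` and `value` can be reproduced by hand. The following
is a minimal sketch in base R, assuming a small hand-built BED-like data frame
with made-up values; it only mimics the column split described above and is not
how circlize implements it internally.

```{r, eval = FALSE}
# Hand-built BED-like data frame (hypothetical values):
# chr, start, end, followed by two numeric columns
data = data.frame(
    chr    = c("chr1", "chr1", "chr2"),
    start  = c(100, 500, 200),
    end    = c(300, 900, 600),
    value1 = c(0.4, -1.2, 0.7),
    value2 = c(1.5, 0.3, -0.8)
)

# Rows that belong to the current chromosome
current = data[data$chr == "chr1", ]

# Roughly what panel.fun receives: positions in `region`, the rest in `value`
region = current[, 2:3]
value  = current[, -(1:3), drop = FALSE]

# Column 4 of `data` is column 1 of `value`
identical(current[[4]], value[[1]])
```

This is why `numeric.column = 4` in `circos.genomicTrack()` and
`numeric.column = 1` in `circos.genomicPoints()` refer to the same column.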
### Input is a list of data frames
If input data is a list of data frames, `panel.fun` is applied to each data
frame iteratively in the current cell. In this case, `region` and `value`
contain the corresponding data from the current data frame and the current
chromosome. The index of the current data frame can be obtained with
`getI(...)`. Note that `getI(...)` can only be used inside `panel.fun` and the
`...` argument is mandatory.

When `numeric.column` is specified in `circos.genomicTrack()`, its length must
be either one or the number of data frames, which means that only one numeric
column is used from each data frame. If it is not specified, the first numeric
column in each data frame is used.
```{r, eval = FALSE}
bed_list = list(generateRandomBed(), generateRandomBed())
circos.genomicTrack(bed_list,
panel.fun = function(region, value, ...) {
i = getI(...)
circos.genomicPoints(region, value, col = i, ...)
})
# column 4 in the first bed and column 5 in the second bed
circos.genomicTrack(bed_list, numeric.column = c(4, 5),
panel.fun = function(region, value, ...) {
i = getI(...)
circos.genomicPoints(region, value, col = i, ...)
})
```
## Stack mode
`circos.genomicTrack()` also supports a `stack` mode, enabled by setting
`stack = TRUE`. Under `stack` mode, `ylim` is re-defined inside the function:
the y-axis is split into several bins of equal height and graphics are put
onto "horizontal" bins (at positions `y = 1, 2, ...`).
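
The bin layout can be worked out by hand. The following is a minimal sketch of
the arithmetic only, assuming `n` stacked units and the `ylim` of
`c(0.5, n+0.5)` used by `stack` mode; `stack_layout()` is a hypothetical helper
written for illustration and is not a circlize function.

```{r, eval = FALSE}
# Hypothetical helper (not part of circlize): y-axis layout for n stacked units
stack_layout = function(n) {
    y = seq_len(n)                   # bin centres at y = 1, 2, ..., n
    list(
        ylim = c(0.5, n + 0.5),      # data range covering all bins
        bins = data.frame(y = y, ybottom = y - 0.5, ytop = y + 0.5)
    )
}

layout = stack_layout(4)
layout$ylim   # lower and upper limit of the y-axis
layout$bins   # one row per bin, each of height 1
```
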
### Input is a data frame
Under `stack` mode, when input data is a single data frame containing one or
more numeric columns, each numeric column defined in `numeric.column` is
treated as a single unit (recall that when `numeric.column` is not specified,
all numeric columns are used). `ylim` is re-defined to `c(0.5, n+0.5)`, where
`n` is the number of numeric columns specified. `panel.fun` is applied
iteratively to each numeric column and adds graphics to the horizontal line
`y = i`. In this case, `value` in e.g. `circos.genomicPoints()` is not actually
used for mapping the y positions; it is replaced with `y = i` internally.

In each iteration, in `panel.fun`, `region` is still the genomic regions in the
current chromosome, but `value` only contains the current numeric column plus
all non-numeric columns. The index of the "current" numeric column can be
obtained with `getI(...)`.
```{r, eval = FALSE}
data = generateRandomBed(nc = 2)
circos.genomicTrack(data, stack = TRUE,
panel.fun = function(region, value, ...) {
i = getI(...)
circos.genomicPoints(region, value, col = i, ...)
})
```
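The per-column iteration can be pictured with plain data frames. The following
is a minimal sketch in base R, assuming a hand-built stand-in for `value` with
made-up numbers; it mimics the behaviour described above and does not reproduce
circlize internals.

```{r, eval = FALSE}
# Hand-built stand-in for the `value` data frame of one chromosome
value = data.frame(value1 = c(0.2, -0.5, 1.1), value2 = c(-0.3, 0.8, 0.1))

# One iteration per numeric column, as described for stack mode
for (i in seq_len(ncol(value))) {
    current = value[, i, drop = FALSE]  # what `value` would hold in iteration i
    y = rep(i, nrow(current))           # y positions used in place of the values
    cat("i =", i, "| column:", names(current), "| y:", y, "\n")
}
```
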
### Input is a list of data frames
When input data is a list of data frames, each data frame is treated as a
single unit. `ylim` is re-defined to `c(0.5, n+0.5)`, where `n` is the
number of data frames. `panel.fun` is applied iteratively to each data
frame. In each iteration, in `panel.fun`, `region` is still the genomic
regions in the current chromosome, and `value` contains the columns of the
current data frame excluding the first three. Graphics drawn by low-level
genomic functions are added to the "horizontal" bins.
```{r, eval = FALSE}
bed_list = list(generateRandomBed(), generateRandomBed())
circos.genomicTrack(bed_list, stack = TRUE,
panel.fun = function(region, value, ...) {
i = getI(...)
circos.genomicPoints(region, value, ...)
})
```
Under `stack` mode, if a data frame with multiple numeric columns is used,
graphics on all horizontal bins share the same genomic positions, whereas with
a list of data frames, the genomic positions can differ between bins.
## Applications
In this section, we will show several real examples of adding genomic graphics
under different modes. Again, if you are not happy with these functionalities,
you can simply re-implement your plot with the basic circlize functions.
### Points {#modes-points}
To make the plots easier to read, we only add graphics in the first quarter
of the circle and initialize the plot with chromosome 1 only.
```{r genomic_application_points_0, eval = FALSE}
set.seed(999)
circos.par("track.height" = 0.1, start.degree = 90,
canvas.xlim = c(0, 1), canvas.ylim = c(0, 1), gap.degree = 270)
circos.initializeWithIdeogram(chromosome.index = "chr1", plotType = NULL)
```
In the example figure (Figure \@ref(fig:genomic-application-points)) below, each track
contains points added under a different mode.

In track A, points are added in the most common way. Here `bed` contains only
one numeric column and points are drawn at the midpoints of the regions.
```{r genomic_application_points_A, eval = FALSE, echo = 1:4}
bed = generateRandomBed(nr = 300)
circos.genomicTrack(bed, panel.fun = function(region, value, ...) {
circos.genomicPoints(region, value, pch = 16, cex = 0.5, ...)
})
pos = get.cell.meta.data("yplot")
text(0, mean(pos), "A", adj = c(1.1, 0.5))
```
In track B, `stack` mode is specified, so points are added along a
horizontal line (or visually, a circular line).
```{r genomic_application_points_B, eval = FALSE, echo = 1:6}
circos.genomicTrack(bed, stack = TRUE,
panel.fun = function(region, value, ...) {
circos.genomicPoints(region, value, pch = 16, cex = 0.5, ...)
i = getI(...)
circos.lines(CELL_META$cell.xlim, c(i, i), lty = 2, col = "#00000040")
})
pos = get.cell.meta.data("yplot")
text(0, mean(pos), "B", adj = c(1.1, 0.5))
```
In track C, the input data is a list of two data frames. `panel.fun` is applied
iteratively to each data frame. The index of the "current" data frame can be
obtained with `getI(...)`.
```{r genomic_application_points_C, eval = FALSE, echo = 1:8}
bed1 = generateRandomBed(nr = 300)
bed2 = generateRandomBed(nr = 300)
bed_list = list(bed1, bed2)
circos.genomicTrack(bed_list,
panel.fun = function(region, value, ...) {
i = getI(...)
circos.genomicPoints(region, value, pch = 16, cex = 0.5, col = i, ...)
})
pos = get.cell.meta.data("yplot")
text(0, mean(pos), "C", adj = c(1.1, 0.5))
```
In track D, the list of data frames is plotted under `stack` mode. Graphics
corresponding to each data frame are added to a horizontal line.
```{r genomic_application_points_D, eval = FALSE, echo = 1:6}
circos.genomicTrack(bed_list, stack = TRUE,
panel.fun = function(region, value, ...) {
i = getI(...)
circos.genomicPoints(region, value, pch = 16, cex = 0.5, col = i, ...)
circos.lines(CELL_META$cell.xlim, c(i, i), lty = 2, col = "#00000040")
})
pos = get.cell.meta.data("yplot")
text(0, mean(pos), "D", adj = c(1.1, 0.5))
```
In track E, the data frame has four numeric columns. Under normal mode, all four
columns are used with the same genomic coordinates.
```{r genomic_application_points_E, eval = FALSE, echo = 1:5}
bed = generateRandomBed(nr = 300, nc = 4)
circos.genomicTrack(bed,
panel.fun = function(region, value, ...) {
circos.genomicPoints(region, value, pch = 16, cex = 0.5, col = 1:4, ...)
})
pos = get.cell.meta.data("yplot")
text(0, mean(pos), "E", adj = c(1.1, 0.5))
```
In track F, the data frame has four numeric columns but is plotted under
`stack` mode. Graphics for each column are added to a horizontal line. The
index of the current column can be obtained with `getI(...)`. Note that here
`value` in `panel.fun` is a data frame with only one column (the current
numeric column).
```{r genomic_application_points_F, eval = FALSE, echo = c(1:7, 10)}
bed = generateRandomBed(nr = 300, nc = 4)
circos.genomicTrack(bed, stack = TRUE,
panel.fun = function(region, value, ...) {
i = getI(...)
circos.genomicPoints(region, value, pch = 16, cex = 0.5, col = i, ...)
circos.lines(CELL_META$cell.xlim, c(i, i), lty = 2, col = "#00000040")
})
pos = get.cell.meta.data("yplot")
text(0, mean(pos), "F", adj = c(1.1, 0.5))
circos.clear()
```
```{r genomic-application-points, echo = FALSE, fig.cap = "Add points under different modes."}
chunks <- knitr:::knit_code$get()
eval(parse(text = chunks[["genomic_application_points_0"]]))
eval(parse(text = chunks[["genomic_application_points_A"]]))
eval(parse(text = chunks[["genomic_application_points_B"]]))
eval(parse(text = chunks[["genomic_application_points_C"]]))
eval(parse(text = chunks[["genomic_application_points_D"]]))
eval(parse(text = chunks[["genomic_application_points_E"]]))
eval(parse(text = chunks[["genomic_application_points_F"]]))
```
### Lines {#modes-lines}
As in the previous figure, only the first quarter of the circle is
visualized. Examples are shown in Figure \@ref(fig:genomic-application-lines).
```{r genomic_application_lines_0, eval = FALSE}
circos.par("track.height" = 0.08, start.degree = 90,
canvas.xlim = c(0, 1), canvas.ylim = c(0, 1), gap.degree = 270,
cell.padding = c(0, 0, 0, 0))
circos.initializeWithIdeogram(chromosome.index = "chr1", plotType = NULL)
```
Track A shows the simplest way to add lines. The midpoints of the regions
are used as the x-values.
```{r genomic_application_lines_A, eval = FALSE, echo = 1:5}
bed = generateRandomBed(nr = 500)
circos.genomicTrack(bed,
panel.fun = function(region, value, ...) {
circos.genomicLines(region, value)
})
pos = get.cell.meta.data("yplot")
text(0, mean(pos), "A", adj = c(1.1, 0.5))
```
`circos.genomicLines()` is implemented with `circos.lines()`, so arguments
supported by `circos.lines()` can also be used in `circos.genomicLines()`. In
track B, the area under the line is filled with color and in track C, the line
type is set to `h`.
```{r genomic_application_lines_BC, eval = FALSE, echo = c(1:4, 7:10)}
circos.genomicTrack(bed,
panel.fun = function(region, value, ...) {
circos.genomicLines(region, value, area = TRUE)
})
pos = get.cell.meta.data("yplot")
text(0, mean(pos), "B", adj = c(1.1, 0.5))
circos.genomicTrack(bed,
panel.fun = function(region, value, ...) {
circos.genomicLines(region, value, type = "h")
})
pos = get.cell.meta.data("yplot")
text(0, mean(pos), "C", adj = c(1.1, 0.5))
```
In track D, the input is a list of data frames. `panel.fun` is applied to each
data frame iteratively.
```{r genomic_application_lines_D, eval = FALSE, echo = 1:8}
bed1 = generateRandomBed(nr = 500)
bed2 = generateRandomBed(nr = 500)
bed_list = list(bed1, bed2)
circos.genomicTrack(bed_list,
panel.fun = function(region, value, ...) {
i = getI(...)
circos.genomicLines(region, value, col = i, ...)
})
pos = get.cell.meta.data("yplot")
text(0, mean(pos), "D", adj = c(1.1, 0.5))
```
In track E, the input is a list of data frames drawn under `stack` mode. Each
genomic region is drawn as a horizontal segment placed on a horizontal line,
where the width of the segment corresponds to the width of the genomic region.
Under `stack` mode, `circos.genomicLines()` only supports segments as the line
type.
```{r genomic_application_lines_E, eval = FALSE, echo = 1:5}
circos.genomicTrack(bed_list, stack = TRUE,
panel.fun = function(region, value, ...) {
i = getI(...)
circos.genomicLines(region, value, col = i, ...)
})
pos = get.cell.meta.data("yplot")
text(0, mean(pos), "E", adj = c(1.1, 0.5))
```
In track F, the input is a data frame with four numeric columns. Each column
is drawn under the normal mode where the same genomic coordinates are shared.
```{r genomic_application_lines_F, eval = FALSE, echo = 1:5}
bed = generateRandomBed(nr = 500, nc = 4)
circos.genomicTrack(bed,
panel.fun = function(region, value, ...) {
circos.genomicLines(region, value, col = 1:4, ...)
})
pos = get.cell.meta.data("yplot")
text(0, mean(pos), "F", adj = c(1.1, 0.5))
```
In track G, the data frame with four numeric columns is drawn under `stack` mode.
The four columns are drawn on four horizontal lines.
```{r genomic_application_lines_G, eval = FALSE, echo = 1:6}
bed = generateRandomBed(nr = 500, nc = 4)
circos.genomicTrack(bed, stack = TRUE,
panel.fun = function(region, value, ...) {
i = getI(...)
circos.genomicLines(region, value, col = i, ...)
})
pos = get.cell.meta.data("yplot")
text(0, mean(pos), "G", adj = c(1.1, 0.5))
```
In track H, we set `type` to `segment` and use a different color for each
segment. Note that each segment is located at the y position defined in the
numeric column.
```{r genomic_application_lines_H, eval = FALSE, echo = c(1:6, 9)}
bed = generateRandomBed(nr = 200)
circos.genomicTrack(bed,
panel.fun = function(region, value, ...) {
circos.genomicLines(region, value, type = "segment", lwd = 2,
col = rand_color(nrow(region)), ...)
})
pos = get.cell.meta.data("yplot")
text(0, mean(pos), "H", adj = c(1.1, 0.5))
circos.clear()
```
```{r genomic-application-lines, echo = FALSE, fig.cap = "Add lines under different modes."}
chunks <- knitr:::knit_code$get()
eval(parse(text = chunks[["genomic_application_lines_0"]]))
eval(parse(text = chunks[["genomic_application_lines_A"]]))
eval(parse(text = chunks[["genomic_application_lines_BC"]]))
eval(parse(text = chunks[["genomic_application_lines_D"]]))
eval(parse(text = chunks[["genomic_application_lines_E"]]))
eval(parse(text = chunks[["genomic_application_lines_F"]]))
eval(parse(text = chunks[["genomic_application_lines_G"]]))
eval(parse(text = chunks[["genomic_application_lines_H"]]))
```
### Rectangles {#modes-rectangles}
Again, only the first quarter of the circle is initialized. For rectangles,
the fill colors are used to represent numeric values. Here we define
a color mapping function `col_fun` to map values to colors. Examples are in
Figure \@ref(fig:genomic-application-rect).
```{r genomic_application_rect_0, eval = FALSE}
circos.par("track.height" = 0.15, start.degree = 90,
canvas.xlim = c(0, 1), canvas.ylim = c(0, 1), gap.degree = 270)
circos.initializeWithIdeogram(chromosome.index = "chr1", plotType = NULL)
col_fun = colorRamp2(breaks = c(-1, 0, 1), colors = c("green", "black", "red"))
```
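Calling the mapping function directly shows what it returns. The following is a
minimal sketch that rebuilds `col_fun` as defined above and evaluates it on a
few made-up values; the input values are chosen only for illustration.

```{r, eval = FALSE}
library(circlize)

# Same mapping as above: -1 -> green, 0 -> black, 1 -> red
col_fun = colorRamp2(breaks = c(-1, 0, 1), colors = c("green", "black", "red"))

# Returns one color string per input value
col_fun(c(-1, -0.5, 0, 0.5, 1))

# Values beyond the outermost breaks get the color of the nearest break
identical(col_fun(2), col_fun(1))
```

Because the function is vectorized, `col_fun(value[[1]])` yields one fill color
per genomic region.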
To draw heatmaps, you probably want to use the `stack` mode. In track A, `bed`
has four numeric columns and `stack` mode is used to arrange the heatmap. You
can see that the rectangles for a given genomic region are stacked.
```{r genomic_application_rect_A, eval = FALSE, echo = 1:5}
bed = generateRandomBed(nr = 100, nc = 4)
circos.genomicTrack(bed, stack = TRUE,
panel.fun = function(region, value, ...) {
circos.genomicRect(region, value, col = col_fun(value[[1]]), border = NA, ...)
})
pos = get.cell.meta.data("yplot")
text(0, mean(pos), "A", adj = c(1.1, 0.5))
```
In track B, the input is a list of data frames. Under `stack` mode, each data
frame is added to a horizontal line. Since genomic positions can differ between
data frames, you can see in the figure that the positions of the two sets of
rectangles are different.

Under `stack` mode, by default, the height of rectangles is internally set to
make them completely fill the cell in the vertical direction. `ytop` and
`ybottom` can be used to adjust the height of rectangles. Note that each line
of rectangles is at `y = i` and the default height of rectangles is 1.
```{r genomic_application_rect_B, eval = FALSE, echo = 1:9}
bed1 = generateRandomBed(nr = 100)
bed2 = generateRandomBed(nr = 100)
bed_list = list(bed1, bed2)
circos.genomicTrack(bed_list, stack = TRUE,
panel.fun = function(region, value, ...) {
i = getI(...)
circos.genomicRect(region, value, ytop = i + 0.3, ybottom = i - 0.3,
col = col_fun(value[[1]]), ...)
})
pos = get.cell.meta.data("yplot")
text(0, mean(pos), "B", adj = c(1.1, 0.5))
```
In track C, we implement the same graphics as in track B, but under the normal
mode. Under `stack` mode, the data range on the y-axis and the positions of
rectangles are adjusted internally. Here we set them explicitly under the
normal mode.
```{r genomic_application_rect_C, eval = FALSE, echo = 1:6}
circos.genomicTrack(bed_list, ylim = c(0.5, 2.5),
panel.fun = function(region, value, ...) {
i = getI(...)
circos.genomicRect(region, value, ytop = i + 0.3, ybottom = i - 0.3,
col = col_fun(value[[1]]), ...)
})
pos = get.cell.meta.data("yplot")
text(0, mean(pos), "C", adj = c(1.1, 0.5))
```
In track D, rectangles are used to make barplots. We specify the position of
the top of bars by `ytop.column` (1 means the first column in `value`).
```{r genomic_application_rect_D, eval = FALSE, echo = c(1:7, 10)}
bed = generateRandomBed(nr = 200)
circos.genomicTrack(bed,
panel.fun = function(region, value, ...) {
circos.genomicRect(region, value, ytop.column = 1, ybottom = 0,
col = ifelse(value[[1]] > 0, "red", "green"), ...)
circos.lines(CELL_META$cell.xlim, c(0, 0), lty = 2, col = "#00000040")
})
pos = get.cell.meta.data("yplot")
text(0, mean(pos), "D", adj = c(1.1, 0.5))
circos.clear()
```
```{r genomic-application-rect, echo = FALSE, fig.cap = "Add rectangles under different modes."}
chunks <- knitr:::knit_code$get()
eval(parse(text = chunks[["genomic_application_rect_0"]]))
eval(parse(text = chunks[["genomic_application_rect_A"]]))
eval(parse(text = chunks[["genomic_application_rect_B"]]))
eval(parse(text = chunks[["genomic_application_rect_C"]]))
eval(parse(text = chunks[["genomic_application_rect_D"]]))
```