Commit 7881eb2

Add tiled raster overview.
1 parent 199ca19 commit 7881eb2

1 file changed: +156 -0 lines changed

overviews/tiledrasters.md

Lines changed: 156 additions & 0 deletions
@@ -0,0 +1,156 @@
---
layout: overviews
title: Tiled Rasters

tutorial: overviews
num: 6
outof: 6
---

To understand the importance of tiled rasters in GeoTrellis, it's helpful
to think about the difference between task parallelism and data parallelism.
Task parallelism involves taking a single piece of work, dividing it into
different kinds of tasks, and then, where possible, performing some of those
tasks at the same time. In GeoTrellis, each operation is a single unit of
work, and compound geoprocessing operations (made out of individual
operations) usually contain chains of operations that can be executed at the
same time. Data parallelism, on the other hand, involves executing the same
work on different data at the same time.

In most real-world problems that benefit from distributed or parallel
computing, both task and data parallelism are useful. In GeoTrellis, the
primary method of data parallelism (with raster data) is executing operations
on tiled rasters, which are individual raster layers that have been
divided into regular grids of smaller rasters. For example, a 10k by 10k cell
raster could be divided into 100 smaller rasters that are 1k by 1k cells each.
You can think of a tiled raster as a distributed data source, because we
cannot assume that the entire dataset fits in memory at any one time.
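As a rough illustration of data parallelism over tiles, the sketch below splits a small in-memory grid into fixed-size tiles and sums each tile concurrently using standard Scala futures. It is plain Scala rather than GeoTrellis code: `Tile`, `splitIntoTiles`, `parallelSums`, and `demoSums` are hypothetical names, and a real tiled raster would read its tiles from disk instead of slicing one in-memory grid.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object TileParallelismSketch {
  // A tile is a small rectangular block of cells cut from a larger grid.
  final case class Tile(col: Int, row: Int, cells: Vector[Vector[Int]])

  // Cut a grid into tiles of tileCols x tileRows cells.
  // Assumes a non-empty grid whose dimensions divide evenly by the tile size.
  def splitIntoTiles(grid: Vector[Vector[Int]], tileCols: Int, tileRows: Int): Vector[Tile] =
    for {
      tr <- (0 until grid.length / tileRows).toVector
      tc <- (0 until grid.head.length / tileCols).toVector
    } yield {
      val cells = grid
        .slice(tr * tileRows, (tr + 1) * tileRows)
        .map(_.slice(tc * tileCols, (tc + 1) * tileCols))
      Tile(tc, tr, cells)
    }

  // Data parallelism: the same work (a sum) is submitted for every tile,
  // and the thread pool runs as many of those tasks at once as it can.
  def parallelSums(tiles: Vector[Tile]): Vector[Long] = {
    val futures = tiles.map(t => Future(t.cells.flatten.map(_.toLong).sum))
    Await.result(Future.sequence(futures), 10.seconds)
  }

  // A 4x4 grid holding 0..15, split into four 2x2 tiles and summed per tile.
  def demoSums(): Vector[Long] = {
    val grid = Vector.tabulate(4, 4)((r, c) => r * 4 + c)
    parallelSums(splitIntoTiles(grid, 2, 2))
  }
}
```

Each future only ever sees one tile, which is the property that lets the same pattern scale to rasters that do not fit in memory as a whole.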

At the current stage of development, using tiled rasters in GeoTrellis does
unfortunately require specialized data processing and programming. One reason
is that it is far faster to operate on arrays in memory when possible.
As we approach GeoTrellis 1.0, we hope to make the process more transparent for
simple models. But while there is still much work to be done, there is useful
infrastructure in place for developing services that operate on tiled raster
sets. As this functionality is still a work in progress, I'll mention plans
for future development alongside the current functionality.

## The tiled ARG format

On disk, a tiled ARG is a directory full of individual ARG files (each with
its own JSON metadata) plus a single master metadata file. If a tile is
entirely NoData (every cell has no value), there will be no data file for that
tile. In the future, we will support this same functionality for tiles with a
single value. The filename of each tile has a suffix with the row and column
of the tile. Every tile has the same width and the same height in pixels:
common tile sizes are 512 by 512 and 1024 by 1024.

## Creating a Tiled Raster

You will need to use the command-line tool gt-tool to create a tiled raster.
See the getting started guide for more details about installing gt-tool.

At the current time, creating a tiled raster from a source raster that is
larger than 2.3 GB requires GDAL 1.10 to be installed on your system. (In
the future, this will not be necessary.)

You will need to decide what size tiles to create. The ideal size will vary
based on your specific application: you may wish to create several tilesets and
test the performance of each. Generally speaking, 512x512 may be a good choice
to start with. It is important to keep in mind that every tile is the same
size regardless of the source data. If your source raster is 5500x5500 and
you create 1000x1000 tiles, the tiles on the right and bottom edges will each
be padded with 500 extra columns or rows of NoData cells, which can create
extra work for your service.
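The padding arithmetic above can be checked with a few lines of plain Scala. This is only a sketch for comparing candidate tile sizes: `tilesNeeded`, `padding`, and `paddedCellCount` are hypothetical helper names and are not part of gt-tool or GeoTrellis.

```scala
object TilePaddingSketch {
  // Tiles needed to cover `cells` cells with tiles of `tileSize` cells (rounds up).
  def tilesNeeded(cells: Int, tileSize: Int): Int =
    (cells + tileSize - 1) / tileSize

  // NoData cells added along one dimension so that the last tile is full size.
  def padding(cells: Int, tileSize: Int): Int =
    tilesNeeded(cells, tileSize) * tileSize - cells

  // Total padded NoData cells for a cols x rows raster cut into square tiles.
  def paddedCellCount(cols: Int, rows: Int, tileSize: Int): Long = {
    val paddedCols = (cols + padding(cols, tileSize)).toLong
    val paddedRows = (rows + padding(rows, tileSize)).toLong
    paddedCols * paddedRows - cols.toLong * rows.toLong
  }

  def main(args: Array[String]): Unit = {
    println(tilesNeeded(5500, 1000))            // tiles per row and per column
    println(padding(5500, 1000))                // NoData cells added on each padded edge
    println(paddedCellCount(5500, 5500, 1000))  // total NoData cells added
  }
}
```

For the 5500x5500 raster with 1000x1000 tiles this gives 6 tiles per dimension (36 tiles), 500 padded cells along each padded edge, and 5,750,000 padded cells in total, roughly 16% of the 36,000,000 cells stored. With 500x500 tiles the padding drops to zero, which is the kind of trade-off worth testing when choosing a tile size.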

To create the tiles, use the "import-tile" gt-tool task.

The options for gt-tool are:

    -i     Path of the input raster file (GeoTIFF, ARG, or another GDAL-supported raster format)
    -d     Output directory for tiles
    -n     Name of the output raster
    -cols  Pixel columns (pixel width) per tile
    -rows  Pixel rows (pixel height) per tile

For example, a sample usage could look like:

    gt-tool import-tile -i /var/geotrellis/input/mydata.tif -d /var/geotrellis/data/mydata/ -n mydata -cols 512 -rows 512

There is also a gt-tool task called "import" that creates a tiled raster
directly from an ARG file. It does not depend on GDAL, but it can only process
raster files smaller than 2.3 GB.

## Loading a Tiled Raster

The operation io.LoadTileSet will load a tiled raster into memory given a path
to a tile directory.

Raster.loadUncachedTileSet() will load an uncached tileset: it creates a
Raster object for transformation that loads individual tiles from disk only
when necessary.

In the future, this functionality will be expanded to allow tiled rasters to be
included in the GeoTrellis catalog, and to provide a third alternative that
temporarily caches tiles for efficient focal operations on tiled rasters on disk
(see raster.TileNeighborhood).

## Treating an untiled raster as a tiled raster

If you have an untiled raster but want to treat it as a tiled raster for
parallelization purposes, you can either create a tiled raster in memory or
simulate one by using the LazyTiledWrapper and TiledLayout classes directly
(at a performance cost). In the future, there will be operations to implement
these transformations.

    // Build a tiled copy of the raster in memory.
    def buildTiledRaster(r:Raster, pixelCols:Int, pixelRows:Int) =
      Tiler.createTiledRaster(r, pixelCols, pixelRows)

    // Wrap the untiled data so that it is presented as tiles (at a performance cost).
    def untiledRasterAsTiledRaster(r:Raster, pixelCols:Int, pixelRows:Int):Raster = {
      val tileLayout = Tiler.buildTileLayout(r.rasterExtent, pixelCols, pixelRows)
      Raster(LazyTiledWrapper(r.data, tileLayout), r.rasterExtent)
    }


# Operations on Tiled Rasters

While in the future we hope to unify all operations to seamlessly operate on tiled and
untiled rasters, at the moment it is necessary to know whether individual operations
have been implemented to handle tiled rasters.

## Map/Reduce style Summary operations

As designed, a core function of tiled rasters is to allow operations on rasters too large to fit in memory.
The logic.TileReducer class can be extended to create operations that perform an operation on each individual
tile (the mapper) and then perform an operation to combine the results of those operations (the reducer).
Examples of operations that implement this interface are stats.Min and stats.GetHistogram.
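The mapper/reducer split can be sketched in plain Scala as follows. This is not the logic.TileReducer API: the trait `TileSummary` and its method names are hypothetical, tiles are modelled as bare `Array[Int]` values, and NoData handling is left out for brevity.

```scala
object TileReduceSketch {
  // Hypothetical stand-in for a map/reduce summary over tiles.
  trait TileSummary[A] {
    def mapper(tile: Array[Int]): A    // runs once per tile
    def reducer(results: Seq[A]): A    // combines the per-tile results
    def run(tiles: Seq[Array[Int]]): A = reducer(tiles.map(mapper))
  }

  // Minimum: the minimum of the per-tile minimums.
  object MinSummary extends TileSummary[Int] {
    def mapper(tile: Array[Int]): Int = tile.min
    def reducer(results: Seq[Int]): Int = results.min
  }

  // Histogram: count values per tile, then add the counts together.
  object HistogramSummary extends TileSummary[Map[Int, Int]] {
    def mapper(tile: Array[Int]): Map[Int, Int] =
      tile.groupBy(identity).map { case (value, cells) => value -> cells.length }

    def reducer(results: Seq[Map[Int, Int]]): Map[Int, Int] =
      results.foldLeft(Map.empty[Int, Int]) { (acc, hist) =>
        hist.foldLeft(acc) { case (m, (value, count)) =>
          m.updated(value, m.getOrElse(value, 0) + count)
        }
      }
  }

  // Two small tiles standing in for a tiled raster.
  val tiles: Seq[Array[Int]] = Seq(Array(3, 7, 7), Array(2, 3, 9))
}
```

Because each mapper result is small (one number, or one histogram), only a single tile plus the accumulated results need to be in memory at any time.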

## Local operations

By default, local operations on tiled rasters are lazy: chained local operations are combined into
a single compound operation that is only run when necessary. Because of this, it is safe to use
local operations as part of your tiled raster operations. However, you should not use *only* local
operations on a tiled raster, because you cannot assume that the resulting raster will fit into memory
on a single machine. Instead, follow them with an operation that reduces the data; for example, you
might summarize the raster produced by your local operations.
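The idea of fusing chained local operations and then summarizing can be sketched with ordinary function composition. This is an analogy in plain Scala, not how GeoTrellis implements lazy local operations: `LocalOp`, `chain`, and `maxAfter` are hypothetical names.

```scala
object LazyLocalSketch {
  // A local operation maps each cell value independently of its neighbours.
  type LocalOp = Int => Int

  // Chaining only composes functions; no cells are touched yet.
  def chain(ops: LocalOp*): LocalOp =
    ops.foldLeft((x: Int) => x)((f, g) => f andThen g)

  // Apply the fused operation tile by tile and keep only a small summary,
  // so the full transformed raster never has to sit in memory at once.
  // Assumes at least one tile and no empty tiles.
  def maxAfter(op: LocalOp, tiles: Iterator[Array[Int]]): Int =
    tiles.map(tile => tile.map(op).max).max
}
```

For instance, `LazyLocalSketch.chain(_ + 1, _ * 2)` builds one function that adds one and then doubles, and `maxAfter` runs it over an iterator of tiles while retaining only the running maximum.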

## Focal operations

Most focal operations can be used on tiled raster data through the "TileFocalOp" operation.
Instead of passing the raster as an argument to the focal operation, put an underscore in its
place, as in the following example:

    val tileFocalOp = TileFocalOp(tiledR, Min(_, Square(1)))

In the future, we hope to eliminate the TileFocalOp operation and allow focal operations to work
directly with tiled rasters.
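One general reason focal operations need special handling on tiles is that cells near a tile edge have neighbours in adjacent tiles. The sketch below illustrates that point only and says nothing about how TileFocalOp works internally: `focalMin` is a hypothetical plain-Scala function computing a minimum over a square neighbourhood of a given radius, clipped at the edges of whatever grid it is given.

```scala
object FocalEdgeSketch {
  // Minimum over the square neighbourhood of the given radius, clipped to the grid.
  // Assumes a non-empty rectangular grid.
  def focalMin(grid: Vector[Vector[Int]], radius: Int): Vector[Vector[Int]] = {
    val rows = grid.length
    val cols = grid.head.length
    Vector.tabulate(rows, cols) { (r, c) =>
      val values = for {
        rr <- math.max(0, r - radius) to math.min(rows - 1, r + radius)
        cc <- math.max(0, c - radius) to math.min(cols - 1, c + radius)
      } yield grid(rr)(cc)
      values.min
    }
  }

  // A one-row raster of four cells, and its left half viewed as a separate tile.
  val whole: Vector[Vector[Int]] = Vector(Vector(5, 9, 1, 8))
  val leftTile: Vector[Vector[Int]] = Vector(Vector(5, 9))

  val fromWhole: Vector[Vector[Int]] = focalMin(whole, 1)       // Vector(Vector(5, 1, 1, 1))
  val fromTileOnly: Vector[Vector[Int]] = focalMin(leftTile, 1) // Vector(Vector(5, 5))
}
```

The second cell comes out as 1 when the whole raster is visible but as 5 when the left tile is processed in isolation, because its right-hand neighbour lives in the next tile.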

## Polygonal/Zonal Summary

The zonal summary (with polygonal area) operations take advantage of tiled rasters by allowing
the service builder to cache intermediate results by tile. The following example
creates a map of intermediate results by tile (finding the minimum value in each tile).

    object MinService {
      // Create the result cache once, when this object is first loaded.
      // rData and rasterExtent are assumed to be defined elsewhere in the service.
      val tileMins = zonal.summary.Min.createTileResults(rData, rasterExtent)
    }

    class MinService {
      // Use the cached per-tile results when creating the operation for a specific raster and polygon.
      def minOperation(raster:Raster, zone:Polygon[_]) =
        zonal.summary.Min(raster, zone, MinService.tileMins)
    }
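Why per-tile caching helps can be seen in a deliberately simplified one-dimensional model. This is an assumption-laden sketch, not the zonal.summary implementation: the "raster" is a single row of cells, the "zone" is an index range instead of a polygon, and `TiledRow` and `zonalMin` are hypothetical names. Tiles that lie fully inside the zone are answered from the cache, and only partially covered tiles are scanned cell by cell.

```scala
object ZonalCacheSketch {
  final class TiledRow(cells: Vector[Int], tileSize: Int) {
    // Computed once: the minimum of every tile.
    val tileMins: Vector[Int] = cells.grouped(tileSize).map(_.min).toVector

    // Minimum over the cells in [start, end). Assumes 0 <= start < end <= cells.length.
    def zonalMin(start: Int, end: Int): Int = {
      val perTile = for (t <- start / tileSize to (end - 1) / tileSize) yield {
        val tileStart = t * tileSize
        val tileEnd = math.min(tileStart + tileSize, cells.length)
        if (start <= tileStart && tileEnd <= end)
          tileMins(t) // tile fully inside the zone: reuse the cached result
        else
          cells.slice(math.max(start, tileStart), math.min(end, tileEnd)).min // partial tile: scan
      }
      perTile.min
    }
  }
}
```

With cells `Vector(4, 8, 6, 2, 9, 7, 3, 5)` and a tile size of 2, a zone covering indices 1 to 5 scans only the single cell it shares with the first tile and takes the cached minimums of the next two tiles.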
