|
| 1 | +--- |
| 2 | +layout: overviews |
| 3 | +title: Tiled Rasters |
| 4 | + |
| 5 | +tutorial: overviews |
| 6 | +num: 6 |
| 7 | +outof: 6 |
| 8 | +--- |
| 9 | + |
| 10 | +To understand the importance of tiled rasters in GeoTrellis, it's helpful |
| 11 | +to think about the difference between task parallelism and data parallelism. |
| 12 | +Task parallelism involves taking a single piece of work and dividing it up |
| 13 | +into different kinds of tasks, and then, where possible, performing some of |
| 14 | +those tasks at the same time. In GeoTrellis, each operation is a single |
| 15 | +unit of work, and compound geoprocessing operations (built from individual operations) usually contain chains of operations that can be |
| 16 | +executed at the same time. Data parallelism, on the other hand, involves |
| 17 | +executing the same work on different data at the same time. |
| 18 | + |
| 19 | +In most real-world problems that benefit from distributed or parallel |
| 20 | +computing, both task and data parallelism are useful. In GeoTrellis, the |
| 21 | +primary method of data parallelism (with raster data) is executing operations |
| 22 | +on tiled rasters, which are individual raster layers that have been |
| 23 | +divided into regular grids of smaller rasters. For example, a 10k by 10k cell |
| 24 | +raster could be divided into 100 smaller rasters that are 1k by 1k cells each. |
| 25 | +You can think about a tiled raster as a distributed data source, as we cannot |
| 26 | +assume we can load the entire dataset in memory at any one time. |
| 27 | + |
| 28 | +At the current stage of development, using tiled rasters in GeoTrellis does |
| 29 | +unfortunately require specialized data processing and programming. One reason |
| 30 | +for this is that it is far faster to operate on arrays in memory when possible. |
| 31 | +As we approach GeoTrellis 1.0, we hope to make the process more transparent for |
| 32 | +simple models. But while there is still much work to be done, there is useful |
| 33 | +infrastructure in place for developing services that operate on tiled raster |
| 34 | +sets. As this functionality is still a work in progress, I'll mention plans |
| 35 | +for future development alongside the current functionality. |
| 36 | + |
| 37 | +## The tiled ARG format |
| 38 | + |
| 39 | +On disk, a tiled ARG is a directory full of individual ARG files (each with its own JSON metadata) with a single master metadata file. If a tile is entirely |
| 40 | +NoData (every cell has no value), there will be no data file for that tile. In the future, we will support this same functionality for tiles with a |
| 41 | +single value. The filename of each tile has a suffix with the row and column of the tile. Each |
| 42 | +tile has the same width in pixels and the same height in pixels: for example, |
| 43 | +common tile sizes are 512 by 512 and 1024 by 1024. |
| 44 | + |
| 45 | +## Creating a Tiled Raster |
| 46 | + |
| 47 | +You will need to use the command-line tool gt-tool to create a tiled raster. See |
| 48 | +the getting started guide for more details about installing gt-tool. |
| 49 | + |
| 50 | +At the current time, creating a tiled raster from a source raster that is |
| 51 | +larger than 2.3 GB requires GDAL 1.10 to be installed on your system. (In |
| 52 | +the future, this will not be necessary.) |
| 53 | + |
| 54 | +You will need to decide what size tiles to create. The ideal size will vary |
| 55 | +based on your specific application: you may wish to create several tilesets and |
| 56 | +test the performance of each. Generally speaking, 512x512 may be a good choice |
| 57 | +to start with. It is important to keep in mind that every tile is the same |
| 58 | +size regardless of the source data. If your source raster is 5500x5500 and |
| 59 | +you create 1000x1000 tiles, the tiles on the right and bottom edges will be padded |
| 60 | +with 500 extra columns or rows of NoData cells, which can create extra work for your service. |
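The edge padding in the 5500x5500 example can be checked with a short, self-contained sketch. It does not use the GeoTrellis API: `TileGrid` and `TileGridSketch.tileGrid` are hypothetical names introduced only for illustration, and the sketch assumes tiles are laid out from the top-left corner so that any padding falls on the right and bottom edges.

```scala
// Hypothetical helper type, not part of GeoTrellis: the tile grid needed to
// cover a raster when every tile has the same fixed size.
case class TileGrid(tileCols: Int, tileRows: Int, padCols: Int, padRows: Int)

object TileGridSketch {
  def tileGrid(rasterCols: Int, rasterRows: Int, pixelCols: Int, pixelRows: Int): TileGrid = {
    require(pixelCols > 0 && pixelRows > 0, "tile dimensions must be positive")
    // Round up so that partially covered edge tiles are counted.
    val tileCols = (rasterCols + pixelCols - 1) / pixelCols
    val tileRows = (rasterRows + pixelRows - 1) / pixelRows
    // Columns and rows of NoData padding carried by the edge tiles.
    TileGrid(tileCols, tileRows,
      tileCols * pixelCols - rasterCols,
      tileRows * pixelRows - rasterRows)
  }

  def main(args: Array[String]): Unit = {
    println(tileGrid(5500, 5500, 1000, 1000))   // prints TileGrid(6,6,500,500)
    println(tileGrid(10000, 10000, 1000, 1000)) // prints TileGrid(10,10,0,0)
  }
}
```

Running the same calculation for a few candidate tile sizes is a quick way to see how much padding each choice would add before building a tileset.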
| 61 | + |
| 62 | +To create the tiles, use the "import-tile" gt-tool task. |
| 63 | + |
| 64 | +The options for the "import-tile" task are: |
| 65 | + -i Path of the input raster file (which can be GeoTIFF, ARG, or another GDAL-supported raster format) |
| 66 | + -d Output directory for tiles |
| 67 | + -n Name of the output raster |
| 68 | + -cols Pixel columns (pixel width) per tile |
| 69 | + -rows Pixel rows (pixel height) per tile |
| 70 | + |
| 71 | +For example, a sample usage could look like: |
| 72 | + |
| 73 | +    gt-tool import-tile -i /var/geotrellis/input/mydata.tif -d /var/geotrellis/data/mydata/ -n mydata -cols 512 -rows 512 |
| 74 | + |
| 75 | +There is also a gt-tool task called "import" that creates a tiled raster |
| 76 | +directly from an ARG file. It does not depend on GDAL, but it can only process |
| 77 | +raster files smaller than 2.3 GB. |
| 78 | + |
| 79 | +## Loading a Tiled Raster |
| 80 | + |
| 81 | +The operation io.LoadTileSet will load a tiled raster into memory given a path |
| 82 | +to a tile directory. |
| 83 | + |
| 84 | +Raster.loadUncachedTileSet() creates a Raster object backed by an uncached tileset. |
| 85 | +The object can be used in transformations, and it loads individual tiles from |
| 86 | +disk only when necessary. |
| 87 | + |
| 88 | +In the future, this functionality will be expanded to allow tiled rasters to be |
| 89 | +included in the GeoTrellis catalog, as well as to provide a third alternative that |
| 90 | +temporarily caches tiles for efficient focal operations on tiled rasters on disk |
| 91 | +(see raster.TileNeighborhood). |
| 92 | + |
| 93 | +## Treating an untiled raster as a tiled raster |
| 94 | + |
| 95 | +If you have an untiled raster, but you want to treat it as a tiled raster for parallelization purposes, |
| 96 | +you can either create a tiled raster in memory or you can simulate one by using the LazyTiledWrapper |
| 97 | +and TiledLayout classes directly (at a performance cost). In the future, there will be operations |
| 98 | +to implement these transformations. |
| 99 | + |
| 100 | +    def buildTiledRaster(r:Raster, pixelCols:Int, pixelRows:Int) = |
| 101 | +      Tiler.createTiledRaster(r, pixelCols, pixelRows) |
| 102 | + |
| 103 | +    def untiledRasterAsTiledRaster(r:Raster, pixelCols:Int, pixelRows:Int):Raster = { |
| 104 | +      val tileLayout = Tiler.buildTileLayout(r.rasterExtent, pixelCols, pixelRows) |
| 105 | +      Raster(LazyTiledWrapper(r.data, tileLayout), r.rasterExtent) |
| 106 | +    } |
| 107 | + |
| 108 | + |
| 109 | +# Operations on Tiled Rasters |
| 110 | + |
| 111 | +While in the future we hope to unify all operations to seamlessly operate on tiled and |
| 112 | +untiled rasters, at the moment it is necessary to know whether individual operations |
| 113 | +have been implemented to handle tiled rasters. |
| 114 | + |
| 115 | +## Map/Reduce style Summary operations |
| 116 | + |
| 117 | +As designed, a core function of tiled rasters is to allow operations on rasters too large to fit in memory. |
| 118 | +The logic.TileReducer class can be extended to create operations that perform an operation on each individual |
| 119 | +tile (the mapper) and then perform an operation to combine the results of those operations (the reducer). |
| 120 | +Examples of operations that implement this interface are stats.Min and stats.GetHistogram. |
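The mapper/reducer split can be illustrated without GeoTrellis. The sketch below is not the logic.TileReducer interface, whose signature is not shown here. It models each tile as a plain `Array[Int]`, assumes `Int.MinValue` marks a NoData cell, and uses the hypothetical names `tileMin` and `combineMins`.

```scala
object TileMinSketch {
  // Assumption for this sketch only: Int.MinValue marks a NoData cell.
  val NoData: Int = Int.MinValue

  // Mapper: the minimum of one tile, or None if the tile is entirely NoData.
  def tileMin(tile: Array[Int]): Option[Int] = {
    val values = tile.filter(_ != NoData)
    if (values.isEmpty) None else Some(values.min)
  }

  // Reducer: combine the per-tile results into one result for the whole raster.
  def combineMins(mins: Seq[Option[Int]]): Option[Int] = {
    val present = mins.flatten
    if (present.isEmpty) None else Some(present.min)
  }

  def main(args: Array[String]): Unit = {
    val tiles = Seq(
      Array(7, 3, NoData, 9),
      Array(NoData, NoData, NoData, NoData), // a tile that is entirely NoData
      Array(12, 5, 8, 4)
    )
    // Each tile is reduced to one small value independently of the others,
    // so the map step could run one tile at a time or in parallel.
    println(combineMins(tiles.map(tileMin))) // prints Some(3)
  }
}
```

Because the reducer only sees one small value per tile, the full raster never has to be held in memory at once.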
| 121 | + |
| 122 | +## Local operations |
| 123 | + |
| 124 | +By default, local operations on tiled rasters are lazy -- chained local operations are combined into |
| 125 | +a single compound operation that is only run when necessary. Because of this, it is safe to use |
| 126 | +local operations as part of your tiled raster operations. However, you should not *only* use local |
| 127 | +operations on a tiled raster as you cannot assume that the raster can fit into memory on a single |
| 128 | +machine to produce a result. Instead, finish with an operation that reduces the data: for example, summarize |
| 129 | +the raster after it has been transformed by local operations. |
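As a rough illustration of why chained lazy local operations are cheap until a result is requested, the sketch below composes per-cell functions and only applies them when a summary is computed. `LazyTiles`, `localMap`, and `sum` are hypothetical names for this example and do not reflect how GeoTrellis implements lazy local operations.

```scala
object LazyLocalSketch {
  // Hypothetical stand-in for a lazily transformed tiled raster: the source
  // tiles plus a per-cell function that has not been applied yet.
  case class LazyTiles(tiles: Seq[Array[Int]], f: Int => Int = (c: Int) => c) {
    // Chaining a local operation only composes functions; no cells are read.
    def localMap(g: Int => Int): LazyTiles = copy(f = f.andThen(g))

    // A summary forces evaluation tile by tile and keeps only a small result.
    def sum: Long = tiles.map(tile => tile.map(c => f(c).toLong).sum).sum
  }

  def main(args: Array[String]): Unit = {
    val raster = LazyTiles(Seq(Array(1, 2), Array(3, 4)))
    val transformed = raster.localMap(_ + 1).localMap(_ * 10) // nothing computed yet
    println(transformed.sum) // prints 140
  }
}
```

Ending the chain with a summary keeps the output small, so the transformed raster never needs to be materialized in full.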
| 130 | + |
| 131 | +## Focal operations |
| 132 | + |
| 133 | +Most focal operations can be used on tiled raster data, through the use of the "TileFocalOp" operation. |
| 134 | +Instead of specifying the raster as an argument to the focal operation, use an underscore, as in the following |
| 135 | +example: |
| 136 | + |
| 137 | +    val tileFocalOp = TileFocalOp(tiledR, Min(_, Square(1))) |
| 138 | + |
| 139 | +In the future, we hope to eliminate the TileFocalOp operation and allow focal operations to work |
| 140 | +directly with tiled rasters. |
| 141 | + |
| 142 | +## Polygonal/Zonal Summary |
| 143 | + |
| 144 | +The zonal summary (with polygonal area) operations take advantage of tiled rasters by allowing |
| 145 | +the service builder to cache intermediate results by tile. For example, the following code |
| 146 | +creates a map of intermediate results by tile (finding the minimum value in each tile). |
| 147 | + |
| 148 | +    object MinService { |
| 149 | +      // create the result cache once when loading this class |
| 150 | +      val tileMins = zonal.summary.Min.createTileResults(rData, rasterExtent) |
| 151 | +    } |
| 152 | + |
| 153 | +    class MinService { |
| 154 | +      // use result cache when creating this operation with specific raster & polygons |
| 155 | +      def minOperation(raster:Raster, zone:Polygon[_]) = zonal.summary.Min(raster, zone, MinService.tileMins) |
| 156 | +    } |