JenksGTiff

Apply Jenks natural breaks to GeoTIFF files and get an output image with graduated symbology.

Compute "natural breaks" (Jenks algorithm) on a GeoTIFF by first reducing the image to a smaller sample. This cuts the time needed to calculate the breaks while keeping the output about 90% or more accurate relative to the natural break values of the original dataset.

Intended compatibility: CPython 2.7+ and 3.4+

Required Dependencies:

GDAL :

pip install GDAL

NumPy :

pip install numpy

matplotlib :

pip install matplotlib
or    
sudo apt-get install python-matplotlib

jenkspy :

pip install jenkspy

Installation:

Download the zip file for the python package from github.

Unzip the folder to temporary location.

ubuntu@ubuntu:~$ cd tmp    
ubuntu@ubuntu:~/tmp$ unzip jenksGTiff.zip    
ubuntu@ubuntu:~/tmp$ cd jenksGTiff    
ubuntu@ubuntu:~/tmp/jenksGTiff$ pip install .

If you get an "EnvironmentError: [Errno 13] Permission denied" error, use:

ubuntu@ubuntu:~/tmp/jenksGTiff$ pip install . --user

Usage :

>>> import jenksGTiff
>>> jenksGTiff.__all__
['clear_all', 'importGTiff', 'RemoveNoData', 'ReducedArray', 'JenksGTiff', 'DataStats', 'compareStats', 'histogram', 'exportGTiff']
>>> breaks, array, array_short = jenksGTiff.JenksGTiff(r'\pwd\input.tif', n_classes=5, NoDataVal=0, sample_size_ratio=0.1)
>>> breaks
[-0.9921568632125854, -0.37254902720451355, -0.05882352963089943, 0.13725490868091583, 0.26274511218070984, 0.40392157435417175]
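
The reduction step behind the call above can be pictured roughly as follows. This is a minimal sketch, not the package's implementation: it assumes NoData pixels are dropped first and that the sample is drawn uniformly at random without replacement. The helper name `reduce_values` and the synthetic pixel list are hypothetical, and plain lists stand in for the raster array.

```python
import random

def reduce_values(values, nodata=0, sample_size_ratio=0.1, seed=42):
    """Drop NoData entries, then draw a random sample without replacement.

    Hypothetical helper: the sampling strategy is an assumption,
    not taken from the jenksGTiff source.
    """
    valid = [v for v in values if v != nodata]
    sample_size = max(1, int(round(len(valid) * sample_size_ratio)))
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return valid, rng.sample(valid, sample_size)

# Synthetic stand-in for flattened raster pixels (0 marks NoData here).
gen = random.Random(0)
pixels = [0] * 1000 + [gen.uniform(-1.0, 0.4) for _ in range(10000)]

valid, sample = reduce_values(pixels, nodata=0, sample_size_ratio=0.1)
print(len(valid), len(sample))
```

The breaks would then be computed on `sample` instead of `valid`, which is where the runtime saving would come from.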

Since the image dataset was reduced to a small sample dataset, we compare the statistics of the two datasets and plot their histograms.

>>> jenksGTiff.compareStats(array, array_short)
Stats Measures - Value (original dataset) - Value (Sample dataset) 

DataCount : 393868 : 39387
Minimum : -0.99215686 : -0.99215686
Maximum : 0.40392157 : 0.40392157
Sum : 102931.59580400819 : 10252.694481091574
Mean : 0.26133528 : 0.26030654
Median : 0.29411766 : 0.29411766
StandardDeviation : 0.11551328 : 0.117199086
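
For readers who want to reproduce a comparison like this outside the package, the measures can be sketched with the standard library. This is an illustration only: the function names `summarize` and `compare` are hypothetical, the `statistics` module requires Python 3.4+, and the use of the population standard deviation is an assumption about what the package reports.

```python
import statistics

def summarize(values):
    """Return (name, value) pairs for the measures listed above."""
    return [
        ("DataCount", len(values)),
        ("Minimum", min(values)),
        ("Maximum", max(values)),
        ("Sum", sum(values)),
        ("Mean", statistics.mean(values)),
        ("Median", statistics.median(values)),
        # Population standard deviation; the variant used by the package is an assumption.
        ("StandardDeviation", statistics.pstdev(values)),
    ]

def compare(original, sample):
    """Print one 'name : original : sample' row per measure."""
    for (name, a), (_, b) in zip(summarize(original), summarize(sample)):
        print("{} : {} : {}".format(name, a, b))

# Tiny made-up datasets, only to show the call.
full = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
subset = [2.0, 4.0, 6.0]
compare(full, subset)
```

Close agreement in mean, median and standard deviation, as in the output above, suggests the sample preserves the shape of the original distribution.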

>>> jenksGTiff.histogram(array, 'Image Dataset', bins=134)

>>> jenksGTiff.histogram(array_short, 'Sample Dataset', bins=134)

>>> new_value = jenksGTiff.exportGTiff(r'\pwd\input.tif', r'\cwd\output.tif', breaks, NoDataVal=0)

This should produce the output GeoTIFF file.
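
Graduated symbology amounts to assigning each pixel to the class whose break interval contains it. The sketch below shows one way to do that with `bisect`, using the breaks printed earlier. How the package actually encodes classes in the output file is not stated here, so the conventions are assumptions: classes are numbered 1 to n, upper bounds are inclusive, and 0 is kept for NoData. The function name `classify` is hypothetical.

```python
import bisect

def classify(value, breaks, nodata=0):
    """Map a pixel value to a class number using Jenks break values.

    breaks holds n + 1 values: the minimum, the interior breaks, the maximum.
    Assumed convention: classes are 1..n, upper bounds inclusive, 0 = NoData.
    """
    if value == nodata:
        return 0
    interior = breaks[1:-1]  # only the interior breaks separate classes
    return bisect.bisect_left(interior, value) + 1

# Break values from the usage example above (5 classes).
breaks = [-0.9921568632125854, -0.37254902720451355, -0.05882352963089943,
          0.13725490868091583, 0.26274511218070984, 0.40392157435417175]

pixels = [-0.5, -0.2, 0.1, 0.2, 0.3, 0]
print([classify(p, breaks) for p in pixels])  # [1, 2, 3, 4, 5, 0]
```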

Benchmark against original or larger dataset:

In [1]: import timeit

In [2]: %timeit jenksGTiff.JenksGTiff(r'\pwd\input.tif', n_classes=5, NoDataVal=0, sample_size_ratio=0.1)
5.62 s ± 45.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [3]: %timeit jenksGTiff.JenksGTiff(r'\pwd\input.tif', n_classes=5, NoDataVal=0, sample_size_ratio=0.2)
25.7 s ± 1.51 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

The natural breaks can be obtained from a sample that is just 10% of the original dataset. Running the algorithm on a 10% sample is ~4.6x faster than running it on a 20% sample (5.62 s versus 25.7 s), so sampling significantly reduces the time needed to calculate the breaks compared with processing the whole dataset.
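
The speedup figure follows directly from the two timings above. The short calculation below also estimates how runtime grows with sample size, assuming a power-law relationship. With only two measurements this exponent is indicative at best and should not be read as a property of the algorithm.

```python
import math

t_10 = 5.62  # seconds at sample_size_ratio=0.1 (from the benchmark above)
t_20 = 25.7  # seconds at sample_size_ratio=0.2 (from the benchmark above)

# How many times faster the 10% run is than the 20% run.
speedup = t_20 / t_10

# Assumed model: runtime ~ size ** exponent, fitted to the two points.
exponent = math.log(speedup) / math.log(0.2 / 0.1)

print(round(speedup, 2), round(exponent, 2))  # 4.57 2.19
```

An exponent a little above 2 would mean that doubling the sample size more than quadruples the runtime, which is consistent with small samples being much cheaper to process.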

Author:

Nikhil S Hubballi
