Inspired by OfflineReverseGeocode found at this Reddit post. You may also be interested in GeoSharp. Uses KdTree. Built and tested on .Net 4.5.
This library provides classes for downloading, reading and parsing, writing and composing files from GeoNames.org and provides (reverse) geocoding methods like NearestNeighbourSearch()
and RadialSearch()
on the downloaded dataset(s).
This library is available as NuGet package.
var datadir = @"D:\test\geo\";
// Download file (optional; you can point a GeoFileReader to existing files ofcourse)
var downloader = GeoFileDownloader.CreateGeoFileDownloader();
downloader.DownloadFile("NL.zip", datadir); // Zipfile will be automatically extracted
// Read NL.txt file to memory (NL = ISO3166-2:The Netherlands)
var nldata = GeoFileReader.ReadExtendedGeoNames(Path.Combine(datadir, "NL.txt")).ToArray();
// Note: we "Materialize" the file to memory by calling ToArray()
// We're going to use Amsterdam as "search-center"
var amsterdam = nldata.Where(n =>
n.Name.Equals("Amsterdam", StringComparison.OrdinalIgnoreCase)
&& n.FeatureCode.Equals("PPLC")
).First();
// Initialize a reversegeocoder with our geo-items from The Netherlands
var reversegeocoder = new ReverseGeoCode<ExtendedGeoName>(nldata);
// Locate 250 geo-items near the center of Amsterdam
var results = reversegeocoder.RadialSearch(amsterdam, 250);
// Print the results
foreach (var r in results) {
Console.WriteLine(
string.Format(
CultureInfo.InvariantCulture, "{0}, {1} {2} ({3:F4}Km)",
r.Latitude, r.Longitude, r.Name, r.DistanceTo(amsterdam) / 1000d
)
);
}
The library provides for the following main operations:
- Downloading / retrieving data from geonames.org (Optional)
- Reading / parsing geonames.org data
- Utilizing geonames.org data
- Writing / composing geonames.org data
The library consists mainly of parsers, composers and entities (in their respective namespaces) and a GeoFileReader
and GeoFileWriter
to read/parse and write/compose geonames.org compatible files, a GeoFileDownloader
to retrieve files from geonames.org and a ReverseGeoCode<T>
class to do the heavy lifting of the reverse geocoding itself.
Because some "geoname files" can be very large (like allcountries.txt
) we have a GeoName
entity which is a simplified version (and baseclass) of an ExtendedGeoName
. The GeoName
class contains a unique id which can be used to resolve the ExtendedGeoName
easily for more information when required. It is, however, recommended to use <countrycode>.txt
(e.g. GB.txt
) cities15000.txt
or cities1000.txt
for example to reduce the dataset to a smaller size, You can also compose your own custom datasets using the GeoFileWriter
and composers.
Also worth noting is that the readers return an IEnumerable<SomeEntity>
; make sure that you materialize these enumerables to a list, array or other datastructure (using .ToList()
, .ToArray()
, .ToDictionary()
etc.) if you access it more than once to avoid file I/O to the underlying file each time you access the data.
To download files from geonames.org you can use the GeoFileDownloader
class which is, in essence, a wrapper for a basic WebClient
. The simplest form is:
// Downloads (and extracts) geoname data in NL.zip from geonames.org
GeoFileDownloader.CreateGeoFileDownloader()
.DownloadFile("NL.zip", @"D:\my\geodata\geo");
// Downloads (and extracts) postalcode data in NL.zip from geonames.org
GeoFileDownloader.CreatePostalcodeDownloader()
.DownloadFile("NL.zip", @"D:\my\geodata\postalcode");
You can specify the BaseUri in the GeoFileDownloader
constructor or just pass an absolute url to the DownloadFile()
method if you want to use another location than the default http://download.geonames.org/export/dump/
. The static 'factory methods' CreateGeoFileDownloader()
and CreatePostalcodeDownloader
are the easiest way to create a GeoFileDownloader
; these use the built-in values for the BaseUri. The GeoFileDownloader
has properties to set a (HTTP) CachePolicy
, Proxy
and Credentials
to use when downloading the file. The filedownloader, by default, downloads a file only if the destination file doesn't exist or when the destination file has "expired" (by default 24 hours). It uses the file's CreationDate to determine when the file was downloaded and if a newer version should be downloaded. The "TTL", how long a file will be 'valid', can be set using the DefaultTTL
property of the GeoFileDownloader
class. You can also use the DownloadFileWhenOlderThan()
method which allows you to explicitly set a TTL. When a filename is specified (e.g. d:\folder\foo.txt
) the file will be named accordingly.
ZIP files are automatically extracted in the destinationfolder; the original zipfile is preserved because the GeoFileDownloader
needs to know which files are supposed to be in the zipfile and thus in the destinationdirectory in their extracted form.
Once files are downloaded using the GeoFileDownloader
, or by using your own custom/specific implementation, the files can be accessed using the GeoFileReader
class. This class contains a number of static "convenience methods" like ReadGeoNames()
and it's "sibling" ReadExtendedGeoNames()
. but also ReadCountryInfo()
, ReadAlternateNames()
etc. There is a "convenience method" for each entity.
// Open file "cities1000.txt" and retrieve only cities in the US
var cities_in_us = GeoFileReader.ReadExtendedGeoNames(@"D:\my\geodata\cities1000.txt")
.Where(p => p.CountryCode.Equals("US", StringComparison.OrdinalIgnoreCase))
.OrderBy(p => p.Name);
Again, please note that Read<Something>
methods return an IEnumerable<T>
. Whenever you want to access the data more than once you will probably want to call .ToArray()
or similar to materialize the data into memory. The GeoFileReader
class has two static method (ReadBuiltInContinents()
and ReadBuiltInFeatureClasses()
) that can be used to use built-in values for continents and feature codes which are not provided by geonames.org as downloadable files. You can, however, craft your own files for this purpose and use the ReadContinents()
and ReadFeatureClasses()
if you want to specify your own values / update built-in values (should NGeoNames
's values be outdated for example).
You can also add your own entities and, as long as you provide a parser for it, use the GeoFileReader
class to read/parse files for these entities as well:
var data = new GeoFileReader().ReadRecords<MyEntity>("d:\foo\bar.txt", new MyEntityParser());
As long as your parser implements IParser<MyEntity>
you're good to go. A parser can skip a fixed number of lines in a file (for example a 'header' record), skip comments (for example lines starting with #
) and you can even specify the encoding to use etc. Examples and more information can be found in the unittests.
Another thing to note is that the GeoFileReader
will try to "autodetect" if the file is a plain text file (.txt
extension) or a GZipped file (.gz
extension). Support for GZip was added to keep the footprint of the files lower when desired. This will, however, trade-off I/O speed and CPU load for space. The ReadRecords<T>()
method has an overload where you can explicitly specify the type of the file (should you want to use your own file-extensions like .dat
for example).
Support for compressing downloaded files using the
GeoFileDownloader
on the fly is planned for a later version; for now you will have to GZip the files manually.
The GeoFileReader
also supports the use of Stream
s so you can provide data from a MemoryStream for example or any other source that can be wrapped in a stream.
As you'll probably realize by now, the GeoFileReader
class combined with LINQ allows for very powerful querying, filtering and sorting of the data. Combine it with the GeoFileWriter
to persist custom datasets (custom "materialized views") and the sky is the limit.
The 'heart' of the library is the ReverseGeoCode<T>
class. When you supply it with either IEnumerable<GeoNames>
or IEnumerable<ExtendedGeoNames>
it can be used to do a RadialSearch()
or NearestNeighbourSearch()
. Supplying the class with data can be done by either passing it to the class constructor or by using the Add()
or AddRange()
methods. You may want to call the Balance()
method to balance the internal KD-tree, however; this is done automatically when the data is supplied via the constructor. Even if you choose to store your data in a database or custom (binary?) fileformat or anything else; as long as you provide an IEnumerable
to this class you'll be able to use it.
// Create our ReverseGeoCode class and supply it with data
var r = new ReverseGeoCode<ExtendedGeoName>(
GeoFileReader.ReadExtendedGeoNames(@"D:\foo\cities1000.txt")
);
// Create a point from a lat/long pair from which we want to conduct our search(es) (center)
var new_york = r.CreateFromLatLong(40.7056308, -73.9780035);
// Find 10 nearest
r.NearestNeighbourSearch(new_york, 10);
Ofcourse there's no need to dabble with lat/long at all:
// Read data into memory
var data = GeoFileReader.ReadExtendedGeoNames(@"D:\foo\cities1000.txt")
.ToDictionary(p => p.Id);
// Find New York by it's geoname ID (O(1) lookup)
var new_york = data[5128581];
// Find 10 nearest
var r = new ReverseGeoCode<ExtendedGeoName>(data.Values);
r.NearestNeighbourSearch(new_york, 10);
Or simply find by name:
// Read data into memory
var data = GeoFileReader.ReadExtendedGeoNames(@"D:\foo\cities1000.txt")
.ToArray();
// Find New York by it's name (linear search, O(n))
var new_york = data.Where(p => p.Name.Equals("New York City")).First();
// Find 10 nearest
var r = new ReverseGeoCode<ExtendedGeoName>(data);
r.NearestNeighbourSearch(new_york, 10);
Depending on how you want to search/use the underlying data you may want to use other, more optimal, datastructures than demonstrated above. It's up to you!
Note that the library is based on the International System of Units (SI); units of distance are specified in meters. If you want to use the imperial system (e.g. miles, nautical miles, yards, foot and whathaveyou's) you need to convert to/from meters. The GeoUtil
class provides helper-methods for converting miles/yards to meters and vice versa.
The GeoName
class (and, by extension, the ExtendedGeoName
class) has a DistanceTo()
method which can be used to determine the exact distance betweem two points.
Both the NearestNeighbourSearch()
and RadialSearch()
methods have some overloads that accept lat/long pairs as doubles as well.
The NGeoNames.Composers
namespace holds composers (the opposite of parsers) to enable you to write geoname.org datafiles. For this you can use the GeoNameFileWriter
class which, like the GeoNameFileReader
class, has generic methods for writing records (WriteRecords<T>
) and static "convenience methods" to write specific entities to a file. If you wanted to 'transform' a file like allcountries.txt
to a file with data from, say, the Benelux) you could supply the GeoNameFileWriter
with data from BE.txt
, NL.txt
and LU.txt
or data from allcountries.txt
or cities1000.txt
filtered with a LINQ query to only data from these countries.
Below is an example of what this would look like (with an extra filter added to filter out records with population < 1000
):
// Filter 'allcountries.txt' to only BE, NL, LU entries with a population of >= 1000
GeoFileWriter.WriteExtendedGeoNames(@"d:\foo\benelux1000.txt",
GeoFileReader.ReadExtendedGeoNames(@"d:\foo\allcountries.txt")
.Where(e => new[] { "BE", "NL", "LU" }.Contains(e.CountryCode) && e.Population >= 1000)
.OrderBy(e => e.CountryCode).ThenBy(e => e.Name)
);
// ...or...
// Join BE, NL en LU datasets, filter records with a population of >= 1000
GeoFileWriter.WriteExtendedGeoNames(@"d:\foo\benelux1000.txt",
GeoFileReader.ReadExtendedGeoNames(@"d:\foo\BE.txt")
.Union(GeoFileReader.ReadExtendedGeoNames(@"d:\foo\NL.txt"))
.Union(GeoFileReader.ReadExtendedGeoNames(@"d:\foo\LU.txt"))
.Where(e => e.Population >= 1000)
.OrderBy(e => e.CountryCode).ThenBy(e => e.Name)
);
The GeoNamesReader
and GeoNamesWriter
and the (Extended)GeoName parsers/composers always assume the ExtendedGeoName
format (e.g. 19 fields of data) unless explicitly specified. The parameter extendedfileformat may pop-up on some method overloads; whenever this parameter is passed false
the class will assume a 'simple' (or non-extended) format with only 4 fields of data: Id, Name, Latitude and Longitude. This format is more compact; especially when writing GeoName
entities instead of ExtendedGeoName
entities to a file. However, to remain compatible with the original files you probably don't want to use this 'simple' format. Make sure you understand the consequences before you do!
The NuGet package comes with a Windows Help File (NGeonames.chm
) with lots more information. You can also build this help file, or other formats, yourself using Sandcastle Help File Builder. And finally you can use the richly commented code if you don't want to build or use help files.
The project will be updated from time-to-time when required. I am happy to accept pull-requests; if you're interested in contributing to this library please contact me. If you have any issues please open an issue.
Licensed under MIT license. See LICENSE for details.
Logo / icon sourced from iconninja.com (Archived page)