Simple Ruby point clustering
In a nutshell, Clusto is a simple way to assign many points to a few clusters, ideally for rendering on a map. It's an alternative to client-side clustering, which can cause performance problems and typically requires duplicating code when your data is consumed by several client-side languages or frameworks (JavaScript in the browser, Objective-C on iOS, Java on Android).
If you're familiar with the k-means algorithm then Clusto's clustering process should be broadly recognisable, although it does take some liberties in the name of performance.
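For readers who haven't met k-means before, here is a minimal plain-Ruby sketch of a single k-means style pass: each point is assigned to its nearest centre, then each centre moves to the mean of its points. This is not Clusto's code, and it doesn't show whatever shortcuts Clusto takes. The method names (squared_distance, assign, update) and the [x, y] array representation are hypothetical, chosen only for illustration.

```ruby
# Plain-Ruby sketch of one k-means style pass; not Clusto's actual code.
# Points and centres are [x, y] arrays; all names here are hypothetical.

def squared_distance(a, b)
  (a[0] - b[0])**2 + (a[1] - b[1])**2
end

# Assignment step: group each point under the index of its nearest centre.
def assign(points, centres)
  points.group_by do |point|
    centres.each_index.min_by { |i| squared_distance(point, centres[i]) }
  end
end

# Update step: move each centre to the mean of its assigned points.
# Centres with no assigned points keep their previous position.
def update(centres, groups)
  centres.each_with_index.map do |centre, i|
    members = groups[i]
    next centre if members.nil? || members.empty?

    [members.sum { |p| p[0] } / members.size.to_f,
     members.sum { |p| p[1] } / members.size.to_f]
  end
end

points  = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
centres = [[1.0, 1.0], [9.0, 9.0]]

groups  = assign(points, centres)  # centre index => points nearest to it
centres = update(centres, groups)  # centres moved to the mean of their points
```

Classic k-means repeats these two steps until the centres stop moving.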
The Clusto.cluster method accepts an array of Clusto::Point values, and returns an array of Clusto::Cluster objects. Each cluster has one or more Clusto::Point values.
points = data.map { |p| Clusto::Point.new(p.x, p.y) }
clusters = Clusto.cluster(points)
The cluster method also accepts a :bounds parameter, which will determine the initial cluster positions, and a :grid_size parameter, the square of which determines the number of clusters. By default, the :bounds is set to the minimum bounding rectangle of the supplied points and the :grid_size is set to 8.
bounds = [40.51, -74.39, 40.84, -73.44]
clusters = Clusto.cluster(points, :bounds => bounds, :grid_size => 10)
Each point is assigned to the nearest cluster using the Haversine formula, but you can change the :distance_method to :euclidian if you wish:
clusters = Clusto.cluster(points, :distance_method => :euclidian)
To keep track of points, you can assign each point an optional tag value:
tag = 123
point = Clusto::Point.new(x, y, tag)
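To make the difference between the two distance methods mentioned above concrete, here is a rough plain-Ruby sketch of a Haversine (great-circle) distance next to a Euclidean one. It isn't taken from Clusto's source. The method names (to_radians, haversine_km, euclidean), the [latitude, longitude] ordering in degrees, and the 6371 km Earth radius are all assumptions made for illustration.

```ruby
# Sketch of the two distance measures; not taken from Clusto's source.
# Assumes coordinates are [latitude, longitude] pairs in degrees.

def to_radians(degrees)
  degrees * Math::PI / 180.0
end

# Great-circle distance in kilometres between two [lat, lng] pairs.
# radius_km defaults to the mean Earth radius (an assumption of this sketch).
def haversine_km(a, b, radius_km: 6371.0)
  lat1, lng1 = a.map { |v| to_radians(v) }
  lat2, lng2 = b.map { |v| to_radians(v) }
  h = Math.sin((lat2 - lat1) / 2)**2 +
      Math.cos(lat1) * Math.cos(lat2) * Math.sin((lng2 - lng1) / 2)**2
  2 * radius_km * Math.asin(Math.sqrt(h))
end

# Straight-line distance in raw coordinate units (degrees here).
def euclidean(a, b)
  Math.hypot(a[0] - b[0], a[1] - b[1])
end

# Two corners of the example bounds, read as [lat, lng] (an assumption).
a = [40.51, -74.39]
b = [40.84, -73.44]

puts haversine_km(a, b) # distance over the Earth's surface, in km
puts euclidean(a, b)    # distance in degrees, ignoring curvature
```

The Euclidean version is cheaper to compute, but it treats a degree of longitude as the same length at every latitude, so it suits small areas or non-geographic data better than large map extents.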