Skip to content

ST_Geometry in Hive versus SQL

Randall Whitman edited this page Dec 21, 2021 · 13 revisions

The UDF's for Hive and SparkSQL are modeled after existing implementations of ST_Geometry. Listed below are some differences from the SQL-based implementations.

Additional ST_Geometry Functions in Hive

ST_Geometry Functions with Different Behavior

  • Overloaded constructors - These overloaded constructors differ from other ST_Geometry implementations in how the caller can specify the spatial-reference ID. Default SRID is plane, when the SRID is not specified.

    • Hive: does not accept SRID in second argument - wrap with ST_SetSRID or use ST_GeomFromText;

    • PostgreSQL: SRID in second argument is optional;

    • Oracle: SRID in second argument is required.

      Applies To: - ST_Point - ST_LineString - ST_Polygon - ST_MultiPoint - ST_MultiLineString - ST_MultiPolygon

  • ST_PointN - Return type varies in the case of index out of range - Hive: null; PostgreSQL: empty; Oracle: out-of-band value.

  • ST_AsText - The OGC WKT standard dictates that a MultiPoint is represented as MULTIPOINT ((1 2),(3 4)); however some existing WKT parsers accept only MULTIPOINT (1 2, 3 4). ST_AsText outputs the former, compliant format, with the nested parentheses.

  • ST_Centroid - In older versions of Spatial Framework for Hadoop, returned center of the polygon's envelope instead of the centroid.

  • ST_Envelope - In the case of a point or a vertical or horizontal line, ST_Envelope may either apply a tolerance or return an empty envelope.

  • ST_Intersection - In the case where the two geometries intersect in a lower dimension, ST_Intersection may drop the lower-dimension intersections, or output a closed linestring. ST_Intersection(ST_Polygon(2,0, 3,1, 2,1), ST_Polygon(1,1, 4,1, 4,4, 1,4))) -- MULTIPOLYGON EMPTY or LINESTRING (2 1, 3 1, 2 1)

  • ST_Union - may drop lower-dimension members of the union ST_Union(ST_LineString(2,3, 4,5), ST_Point(1,1)) -- MULTILINESTRING ((2 3, 4 5))

  • ST_SymmetricDiff - Hive-spatial follows the naming in the Esri implemention of ST_Geometry. For the OGC naming, use an alias: create temporary function ST_SymDifference as 'com.esri.hadoop.hive.ST_SymmetricDiff';

ST_Geometry Functions Not Supported in Hive

(but may be in some SQL implementations)

  • ST_GeomCollection, ST_GeomFromText, ST_NumGeometries, ST_GeometryN - collection of varying geometry types is not supported; hive supports arrays
  • ST_Geometry - no constructor of this name - use one of the other constructors, e.g. ST_GeomFromText
  • ST_Curve, ST_Surface, ST_MultiCurve, ST_MultiSurface - Curve and Surface constructors not supported
  • ST_PointOnSurface - ST_PointOnSurface is not supported on Hive
  • ST_GeoSize - ST_GeoSize is not supported on Hive
  • ST_Transform - ST_Transform is not supported on Hive
  • ST_Equalsrs - ST_Equalsrs is not supported on Hive