Update Documentation with Pydata Sphinx Theme, and more #523

Merged
merged 9 commits into from
May 30, 2022
Changes from 6 commits
1 change: 1 addition & 0 deletions conda/environments/cuspatial_dev_cuda11.0.yml
@@ -15,3 +15,4 @@ dependencies:
- cython>=0.29,<0.30
- gtest=1.10.0
- gmock=1.10.0
- pydata-sphinx-theme
1 change: 1 addition & 0 deletions conda/environments/cuspatial_dev_cuda11.1.yml
@@ -15,3 +15,4 @@ dependencies:
- cython>=0.29,<0.30
- gtest=1.10.0
- gmock=1.10.0
- pydata-sphinx-theme
1 change: 1 addition & 0 deletions conda/environments/cuspatial_dev_cuda11.2.yml
@@ -15,3 +15,4 @@ dependencies:
- cython>=0.29,<0.30
- gtest=1.10.0
- gmock=1.10.0
- pydata-sphinx-theme
84 changes: 0 additions & 84 deletions docs/source/api.rst

This file was deleted.

18 changes: 18 additions & 0 deletions docs/source/api_docs/geopandas_compatibility.rst
@@ -0,0 +1,18 @@
GeoPandas Compatibility
-----------------------

We support any geometry format supported by GeoPandas. Load geometry information from a :class:`geopandas.GeoSeries` or :class:`geopandas.GeoDataFrame`.

>>> import geopandas
>>> import cuspatial
>>> gpdf = geopandas.read_file('arbitrary.txt')
>>> cugpdf = cuspatial.from_geopandas(gpdf)

or

>>> cugpdf = cuspatial.GeoDataFrame(gpdf)

.. currentmodule:: cuspatial

.. autoclass:: cuspatial.GeoDataFrame
   :members:
.. autoclass:: cuspatial.GeoSeries
   :members:
9 changes: 9 additions & 0 deletions docs/source/api_docs/gis.rst
@@ -0,0 +1,9 @@
GIS
---

Two GIS functions make it easier to compute distances with geographic coordinates.
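As background, the haversine formula computes great-circle distance directly from longitude/latitude pairs. The following is a minimal pure-Python sketch of that formula, not cuSpatial's implementation; the helper name `haversine_km` and the 6371 km mean Earth radius are assumptions for illustration, so check the `haversine_distance` docstring for the units and argument order cuSpatial actually uses.

```python
import math

# Mean Earth radius in kilometers (an assumption for this sketch).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp against rounding, then central angle times radius = arc length.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Usage: one degree of longitude along the equator (about 111 km).
d = haversine_km(0.0, 0.0, 1.0, 0.0)
```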

.. currentmodule:: cuspatial

.. autofunction:: cuspatial.haversine_distance
.. autofunction:: cuspatial.lonlat_to_cartesian
100 changes: 100 additions & 0 deletions docs/source/api_docs/internals.rst
@@ -0,0 +1,100 @@
Internals
---------

This page includes information to help users understand the internal
data structures of cuSpatial.

GeoArrow Format
+++++++++++++++

Geospatial data is context-rich: beyond being a set of numbers
representing coordinates, the points together represent geometries
that require grouping. For example, given 5 points in a plane,
they could be 5 separate points, 2 line segments, a single linestring,
or a pentagon. Many geometry libraries store the points in an
array of geometric objects, commonly known as an "Array of Structures" (AoS).
AoS is not efficient for accelerated computing on parallel devices such
as GPUs. Therefore, the GeoArrow format stores geodata in a
densely packed layout, commonly known as a "Structure of Arrays" (SoA).
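A small pure-Python sketch of the two layouts, with plain lists standing in for device arrays. The helper names `aos_to_soa` and `soa_to_aos` are hypothetical illustrations, not cuSpatial APIs. In the SoA form each coordinate field is contiguous, which is what lets neighboring GPU threads read neighboring values.

```python
# Hypothetical example data: five (x, y) points, e.g. the vertices of a pentagon.
points_aos = [(0.0, 0.0), (1.0, 0.0), (1.5, 1.0), (0.5, 1.5), (-0.5, 1.0)]


def aos_to_soa(points):
    """Convert an Array of Structures into a Structure of Arrays.

    Each (x, y) record is split so that all x values are stored
    contiguously in one list and all y values in another.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return xs, ys


def soa_to_aos(xs, ys):
    """Inverse conversion: zip the per-field arrays back into records."""
    return list(zip(xs, ys))


xs, ys = aos_to_soa(points_aos)
```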

The GeoArrow format specifies a tabular data format for geometry
information. Supported types include `Point`, `MultiPoint`, `LineString`,
`MultiLineString`, `Polygon`, and `MultiPolygon`. In order to store
these geometry types in a strictly tabular fashion, columns are
created for Points, MultiPoints, LineStrings, and Polygons.
MultiLineStrings and MultiPolygons are stored in the same data structures
as LineStrings and Polygons, respectively.

The GeoArrow format packs complex geometry types into 14 single-column Arrow
tables. See the :class:`GeoArrowBuffers<cuspatial.GeoArrowBuffers>` docstring
for the complete list of column keys.

Examples
********

The `Point` geometry is the simplest: N points are stored in a buffer of
length 2*N with interleaved x, y coordinates. An optional z buffer of
length N can also be provided.
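A minimal sketch of indexing this layout in plain Python. `num_points` and `get_point` are hypothetical helpers that only illustrate the 2*N interleaving and the length-N z buffer described above.

```python
def num_points(xy):
    """Number of points stored in an interleaved xy buffer (length 2*N)."""
    if len(xy) % 2 != 0:
        raise ValueError("interleaved xy buffer must have even length")
    return len(xy) // 2


def get_point(xy, i, z=None):
    """Return point i from an interleaved xy buffer and optional z buffer."""
    # Point i occupies positions 2*i (x) and 2*i + 1 (y).
    x, y = xy[2 * i], xy[2 * i + 1]
    if z is None:
        return (x, y)
    # The z buffer has one value per point, so it is indexed by i directly.
    return (x, y, z[i])


xy = [0.0, 0.0, 1.0, 2.0, 3.0, 4.0]  # three points: (0, 0), (1, 2), (3, 4)
z = [10.0, 20.0, 30.0]               # optional z buffer, length N = 3
```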

The `MultiPoint` geometry is the second simplest: it is identical to `Point`,
with the addition of a `multipoints_offsets` buffer. For N multipoints, the
offsets buffer stores N+1 entries. `offsets[0]` is always 0, and multipoint
i spans the range from `offsets[i]` to `offsets[i+1]`, so each offset is the
prefix sum of the lengths of the preceding multipoints. In the example below
the offsets index into the interleaved `multipoints_xy` buffer, so each
3-point multipoint spans 6 values.


Consider::

    from cuspatial import GeoArrowBuffers

    buffers = GeoArrowBuffers({
        "multipoints_xy":
            [0, 0, 0, 1, 0, 2, 1, 0, 1, 1, 1, 2, 2, 0, 2, 1, 2, 2],
        "multipoints_offsets":
            [0, 6, 12, 18]
    })

which encodes the following GeoPandas GeoSeries::

    import geopandas
    from shapely.geometry import MultiPoint

    series = geopandas.GeoSeries([
        MultiPoint([(0, 0), (0, 1), (0, 2)]),
        MultiPoint([(1, 0), (1, 1), (1, 2)]),
        MultiPoint([(2, 0), (2, 1), (2, 2)]),
    ])
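A minimal pure-Python sketch of decoding these buffers. It assumes, as the example does, that the offsets index into the interleaved `multipoints_xy` buffer; `offsets_from_lengths` and `decode_multipoints` are hypothetical helpers, not cuSpatial functions. The first helper also shows how the offsets arise as a prefix sum of per-geometry lengths.

```python
from itertools import accumulate


def offsets_from_lengths(lengths):
    """Prefix sum of lengths with a leading 0: N lengths -> N+1 offsets."""
    return [0] + list(accumulate(lengths))


def decode_multipoints(xy, offsets):
    """Group an interleaved xy buffer into multipoints using offsets.

    Assumes offsets index into the interleaved buffer (2 values per point).
    """
    result = []
    for start, end in zip(offsets[:-1], offsets[1:]):
        coords = xy[start:end]
        # Pair consecutive values up as (x, y) tuples.
        result.append(list(zip(coords[0::2], coords[1::2])))
    return result


multipoints_xy = [0, 0, 0, 1, 0, 2, 1, 0, 1, 1, 1, 2, 2, 0, 2, 1, 2, 2]
multipoints_offsets = offsets_from_lengths([6, 6, 6])
multipoints = decode_multipoints(multipoints_xy, multipoints_offsets)
```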

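The `LineString` geometry is more complicated than `MultiPoint` because the
format allows `LineString` and `MultiLineString` geometries to share the same
buffer, via the `mlines` key. In this example, `mlines` holds the start and
end (exclusive) linestring indices of each MultiLineString, so `[1, 3]`
groups the second and third linestrings into one MultiLineString::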

    from cuspatial import GeoArrowBuffers

    buffers = GeoArrowBuffers({
        "lines_xy":
            [0, 0, 0, 1, 0, 2, 1, 0, 1, 1, 1, 2, 2, 0, 2, 1, 2, 2, 3, 0,
             3, 1, 3, 2, 4, 0, 4, 1, 4, 2],
        "lines_offsets":
            [0, 6, 12, 18, 24, 30],
        "mlines":
            [1, 3]
    })

which encodes the following GeoPandas GeoSeries::

    import geopandas
    from shapely.geometry import LineString, MultiLineString

    series = geopandas.GeoSeries([
        LineString([(0, 0), (0, 1), (0, 2)]),
        MultiLineString([
            [(1, 0), (1, 1), (1, 2)],
            [(2, 0), (2, 1), (2, 2)],
        ]),
        LineString([(3, 0), (3, 1), (3, 2)]),
        LineString([(4, 0), (4, 1), (4, 2)]),
    ])
Comment on lines +80 to +93
Member

It's unclear how this works. What does the 3 in "mlines": [1, 3]" represent? Is that a length or an offset?

Member

Perhaps link to the docs for the function where this is explained.

Contributor Author

isVoid, May 11, 2022

It seems like an offset. I think the key difference between offsets and mlines is that offsets is N+1 in length, because all points in the point array belong to some linestring. But mlines is in 2N format, because not all linestrings are multilinestrings, so you need to explicitly specify the start and end (linestring) offset of each multilinestring. cc. @thomcom for detail.

Contributor Author

Meanwhile, check out the updated paragraph to see if that's clear.

Contributor

Heya, sorry for not being in this convo earlier. I'll be looking into dropping the 'bounding regions' in the mlines object to fit GeoArrow format correctly.
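A pure-Python sketch of decoding the LineString example above. It interprets `mlines` as flat [start, end) pairs of linestring indices, one pair per MultiLineString, following the author's reply in this thread; since the contributor plans to change this representation to match GeoArrow, treat that interpretation as an assumption. `decode_lines` is a hypothetical helper, not a cuSpatial function.

```python
def decode_lines(xy, lines_offsets, mlines):
    """Group linestrings into LineString / MultiLineString geometries.

    Assumes mlines holds flat [start, end) pairs of linestring indices,
    one pair per MultiLineString (an interpretation, not a spec).
    """
    # Split the interleaved coordinate buffer into individual linestrings.
    linestrings = []
    for start, end in zip(lines_offsets[:-1], lines_offsets[1:]):
        coords = xy[start:end]
        linestrings.append(list(zip(coords[0::2], coords[1::2])))

    # Map each MultiLineString's first linestring index to its end index.
    multi_ranges = {mlines[k]: mlines[k + 1] for k in range(0, len(mlines), 2)}

    geometries = []
    i = 0
    while i < len(linestrings):
        if i in multi_ranges:
            end = multi_ranges[i]
            geometries.append(("MultiLineString", linestrings[i:end]))
            i = end
        else:
            geometries.append(("LineString", linestrings[i]))
            i += 1
    return geometries


lines_xy = [0, 0, 0, 1, 0, 2, 1, 0, 1, 1, 1, 2, 2, 0, 2, 1, 2, 2, 3, 0,
            3, 1, 3, 2, 4, 0, 4, 1, 4, 2]
lines_offsets = [0, 6, 12, 18, 24, 30]
mlines = [1, 3]
geometries = decode_lines(lines_xy, lines_offsets, mlines)
```

Under this reading, the five linestrings decode into the four geometries of the GeoSeries shown earlier.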


Polygon geometry includes an `mpolygons` buffer for MultiPolygons, similar to
the LineString geometry. Polygons are encoded using the same ring convention
as Shapefiles: exterior rings are wound clockwise and interior rings (holes)
counterclockwise.
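A ring's winding direction can be checked with the shoelace signed-area formula. This is a generic pure-Python sketch with hypothetical helper names, not cuSpatial code; under the Shapefile convention (exterior rings clockwise, holes counterclockwise) an exterior ring should report clockwise here.

```python
def signed_area(ring):
    """Shoelace formula for a closed ring (first vertex repeated at the end).

    In a y-up coordinate system the result is positive for counterclockwise
    rings and negative for clockwise rings.
    """
    total = 0.0
    for (x0, y0), (x1, y1) in zip(ring[:-1], ring[1:]):
        total += x0 * y1 - x1 * y0
    return total / 2.0


def is_clockwise(ring):
    """True when the ring's signed area is negative (clockwise winding)."""
    return signed_area(ring) < 0


# Hypothetical unit-square exterior ring, listed clockwise.
exterior = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
hole = list(reversed(exterior))  # the same ring, listed counterclockwise
```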

GeoArrow Internal APIs
**********************

.. autoclass:: cuspatial.GeoArrowBuffers
   :members:
.. autoclass:: cuspatial.geometry.geocolumn.GeoMeta
.. autoclass:: cuspatial.geometry.geocolumn.GeoColumn
   :members:
10 changes: 10 additions & 0 deletions docs/source/api_docs/io.rst
@@ -0,0 +1,10 @@
IO
--

cuSpatial offers native GPU-accelerated reading of polygon shapefiles. In addition, any host-side GeoPandas GeoDataFrame can be copied into GPU memory for use with cuSpatial
algorithms.

.. currentmodule:: cuspatial

.. autofunction:: cuspatial.read_polygon_shapefile
.. autofunction:: cuspatial.from_geopandas
16 changes: 16 additions & 0 deletions docs/source/api_docs/spatial_indexing.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
Spatial Indexing
----------------

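Spatial indexing functions provide fast on-GPU point-in-polygon,
point-to-nearest-polyline, and spatial window queries, along with the
quadtree and bounding-box routines used to build the index.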
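For readers unfamiliar with the underlying test, here is a scalar pure-Python sketch of the classic even-odd (ray casting) point-in-polygon rule. It is an illustration only; the helper name is hypothetical, and cuSpatial's GPU kernels are not claimed to be implemented this way.

```python
def point_in_ring(px, py, ring):
    """Even-odd test: cast a ray toward +x and count edge crossings.

    ring is a closed list of (x, y) vertices (first vertex repeated last).
    """
    inside = False
    for (x0, y0), (x1, y1) in zip(ring[:-1], ring[1:]):
        # Only edges that straddle the horizontal line y = py can cross the ray.
        if (y0 > py) != (y1 > py):
            # x coordinate where this edge crosses y = py.
            x_cross = x0 + (py - y0) * (x1 - x0) / (y1 - y0)
            if px < x_cross:
                inside = not inside
    return inside


# Hypothetical unit square.
square = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
```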

.. currentmodule:: cuspatial

.. autofunction:: cuspatial.quadtree_point_in_polygon
.. autofunction:: cuspatial.quadtree_point_to_nearest_polyline
.. autofunction:: cuspatial.point_in_polygon
.. autofunction:: cuspatial.polygon_bounding_boxes
.. autofunction:: cuspatial.polyline_bounding_boxes
.. autofunction:: cuspatial.quadtree_on_points
.. autofunction:: cuspatial.join_quadtree_and_bounding_boxes
.. autofunction:: cuspatial.points_in_spatial_window
14 changes: 14 additions & 0 deletions docs/source/api_docs/trajectory.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
Trajectory
----------

Trajectory functions make it easy to identify and group trajectories from point data.
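A rough pure-Python sketch of the grouping idea: sort points by object ID and then by timestamp, and record where each object's run begins. This is an illustrative assumption about what grouping trajectories involves; `group_trajectories` is a hypothetical helper, so see the `derive_trajectories` docstring for the actual inputs and outputs.

```python
def group_trajectories(object_ids, timestamps):
    """Sort points by (object id, timestamp) and return (order, offsets).

    order[k] is the original index of the k-th sorted point; offsets[t]
    is the position where trajectory t starts in the sorted order.
    """
    order = sorted(range(len(object_ids)),
                   key=lambda i: (object_ids[i], timestamps[i]))
    offsets = []
    previous = object()  # sentinel that never equals a real id
    for position, i in enumerate(order):
        if object_ids[i] != previous:
            offsets.append(position)
            previous = object_ids[i]
    return order, offsets


object_ids = [1, 0, 1, 0]
timestamps = [3, 1, 2, 0]
order, offsets = group_trajectories(object_ids, timestamps)
```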

.. currentmodule:: cuspatial

.. autofunction:: cuspatial.derive_trajectories
.. autofunction:: cuspatial.trajectory_distances_and_speeds
.. autofunction:: cuspatial.directed_hausdorff_distance
.. autofunction:: cuspatial.trajectory_bounding_boxes
.. autoclass:: CubicSpline
.. automethod:: CubicSpline.__init__
.. automethod:: CubicSpline.__call__
32 changes: 15 additions & 17 deletions docs/source/conf.py
@@ -62,7 +62,7 @@

# General information about the project.
project = "cuspatial"
copyright = "2019, NVIDIA"
copyright = "2019-2022, NVIDIA"
author = "NVIDIA"

# The version info for the project you're documenting, acts as replacement for
@@ -92,27 +92,21 @@
# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = False

html_theme_options = {
    "external_links": [],
    "github_url": "https://github.com/rapidsai/cuspatial",
    "twitter_url": "https://twitter.com/rapidsai",
    "show_toc_level": 1,
    "navbar_align": "right",
}

# -- Options for HTML output ----------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#

html_theme = "sphinx_rtd_theme"

# on_rtd is whether we are on readthedocs.org
on_rtd = os.environ.get("READTHEDOCS", None) == "True"

if not on_rtd:
    # only import and set the theme if we're building docs locally
    # otherwise, readthedocs.org uses their theme by default,
    # so no need to specify it
    import sphinx_rtd_theme

    html_theme = "sphinx_rtd_theme"
    html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]

html_theme = "pydata_sphinx_theme"

# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
@@ -189,8 +183,12 @@


# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {"https://docs.python.org/": None}

intersphinx_mapping = {
    "python": ("https://docs.python.org/3", None),
    "pandas": ("https://pandas.pydata.org/pandas-docs/stable/", None),
    "geopandas": ("https://geopandas.readthedocs.io/en/latest/", None),
    "cudf": ("https://docs.rapids.ai/api/cudf/stable/", None),
}

# Config numpydoc
numpydoc_show_inherited_class_members = False
29 changes: 19 additions & 10 deletions docs/source/index.rst
@@ -15,26 +15,35 @@ geometries.
GeoArrow
--------

cuSpatial proposes a new GeoArrow format from the fruit of discussions with the GeoPandas team. GeoArrow is a packed columnar data format for the six fundamental geometry types: Point, MultiPoint, Lines, MultiLines, Polygons, and MultiPolygons. MultiGeometry is a possibility that may be implemented in the future. GeoArrow uses packed coordinate and offset columns to define objects, which enables very-fast copy between CPU, GPU, and NIC.
cuSpatial proposes a new GeoArrow format from the fruit of discussions
with the GeoPandas team. GeoArrow is a packed columnar data format
Comment on lines +18 to +19
Member

GeoArrow exists independently of cuSpatial. This makes it sound like we invented GeoArrow. I don't think this sentence belongs in our documentation. (CC @thomcom )

Suggested change
cuSpatial proposes a new GeoArrow format from the fruit of discussions
with the GeoPandas team. GeoArrow is a packed columnar data format
GeoArrow is a packed columnar data format

Contributor

That's fine. I had the first implementation, though GeoArrow diverged from my implementation a little as per the previous discussions. :D :D

for the six fundamental geometry types:
Point, MultiPoint, Lines, MultiLines, Polygons, and MultiPolygons.
MultiGeometry is a possibility that may be implemented in the future.
GeoArrow uses packed coordinate and offset columns to define objects,
which enables very fast copies between CPU, GPU, and NIC.

Any data source that is loaded into cuSpatial via :func:`cuspatial.from_geopandas` can then take advantage of `cudf`'s GPU-accelerated Arrow I/O routines.
Any data source that is loaded into cuSpatial via :func:`cuspatial.from_geopandas`
can then take advantage of `cudf`'s GPU-accelerated Arrow I/O routines.

Read more about GeoArrow format in :ref:`GeoArrow Format`.

Read more about GeoArrow format in :func:`GeoArrowBuffers<cuspatial.GeoArrowBuffers>`

cuSpatial API Reference
~~~~~~~~~~~~~~~~~~~~~~~
-----------------------

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   api.rst

~~~~~~~~~~~~~~~~~~~~~~~
   api_docs/gis.rst
   api_docs/spatial_indexing.rst
   api_docs/trajectory.rst
   api_docs/geopandas_compatibility.rst
   api_docs/io.rst
   api_docs/internals.rst


Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
1 change: 1 addition & 0 deletions python/cuspatial/cuspatial/core/interpolate.py
@@ -49,6 +49,7 @@ class CubicSpline:
However, cuSpatial massively outperforms scipy when many splines are fit
simultaneously. Data must be arranged in a SoA format, and the exclusive
`prefix_sum` of the separate curves must also be passed to the function.::

NUM_SPLINES = 100000
SPLINE_LENGTH = 101
t = cudf.Series(