[DOCS] Enable markdownlint rule MD022 (#1194)
jbampton authored Jan 10, 2024
1 parent 2a5f80f commit 415ca3e
Showing 31 changed files with 81 additions and 3 deletions.
3 changes: 0 additions & 3 deletions .github/linters/.markdown-lint.yml
Original file line number Diff line number Diff line change
Expand Up @@ -15,9 +15,6 @@ MD010: false
# line-length - Line length
MD013: false

# blanks-around-headings - Headings should be surrounded by blank lines
MD022: false

# no-duplicate-heading - Multiple headings with the same content
MD024: false

Expand Down
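For context, the MD022 rule being enabled by this commit ("blanks-around-headings") requires every heading to be preceded and followed by a blank line. A minimal Python sketch of the check — handling ATX (`#`) headings only, and not markdownlint's actual implementation:

```python
def md022_violations(lines):
    """Return 1-based line numbers of headings not surrounded by blank lines.

    A simplified sketch of markdownlint's MD022 check; the real rule also
    handles Setext headings, code fences, and front matter.
    """
    issues = []
    for i, line in enumerate(lines):
        if line.lstrip().startswith("#"):  # treat as an ATX heading
            before_ok = i == 0 or lines[i - 1].strip() == ""
            after_ok = i == len(lines) - 1 or lines[i + 1].strip() == ""
            if not (before_ok and after_ok):
                issues.append(i + 1)
    return issues

doc = ["Intro text", "## Installation", "Install the package."]
print(md022_violations(doc))  # the heading on line 2 lacks surrounding blank lines
```

The blank lines added throughout this commit are exactly the fix this check would demand.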
3 changes: 3 additions & 0 deletions R/README.md
Original file line number Diff line number Diff line change
@@ -1,11 +1,13 @@

# apache.sedona <img src="man/figures/logo.png" align="right" width="120"/>

[Apache Sedona](https://sedona.apache.org/) is a cluster computing system for processing large-scale spatial data. Sedona extends existing cluster computing systems, such as Apache Spark and Apache Flink, with a set of out-of-the-box distributed Spatial Datasets and Spatial SQL that efficiently load, process, and analyze large-scale spatial data across machines.

The apache.sedona R package exposes an interface to Apache Sedona through `{sparklyr}`
enabling higher-level access through a `{dplyr}` backend and familiar R functions.

## Installation

To use Apache Sedona from R, you just need to install the apache.sedona package; Spark dependencies are managed directly by the package.

```r
Expand All @@ -14,6 +16,7 @@ install.packages("apache.sedona")
```

#### Development version

To use the development version, you will need both the latest version of the package and of the Apache Sedona jars.

To get the latest R package from GitHub:
Expand Down
5 changes: 5 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,11 +17,13 @@
* [Join the community](#join-the-community)

## What is Apache Sedona?

Apache Sedona™ is a spatial computing engine that enables developers to easily process spatial data at any scale within modern cluster computing systems such as Apache Spark and Apache Flink. Sedona developers can express their spatial data processing tasks in Spatial SQL, Spatial Python or Spatial R. Internally, Sedona provides spatial data loading, indexing, partitioning, and query processing/optimization functionality that enables users to efficiently analyze spatial data at any scale.

<img alt="Sedona Ecosystem" src="docs/image/sedona-ecosystem.png" width="800" class="center">

### Features

Some of the key features of Apache Sedona include:

* Support for a wide range of geospatial data formats, including GeoJSON, WKT, and ESRI Shapefile.
Expand Down Expand Up @@ -53,6 +55,7 @@ Apache Sedona is a widely used framework for working with spatial data, and it h
This example loads NYC taxi trip records and taxi zone information stored as .CSV files on AWS S3 into Sedona spatial dataframes. It then performs a spatial SQL query on the taxi trip dataset to filter out all records except those within the Manhattan area of New York. The example also shows a spatial join operation that matches taxi trip records to zones based on whether the taxi trip lies within the geographical extents of the zone. Finally, the last code snippet integrates the output of Sedona with GeoPandas and plots the spatial distribution of both datasets.

#### Load NYC taxi trips and taxi zones data from CSV Files Stored on AWS S3

```python
taxidf = sedona.read.format('csv').option("header","true").option("delimiter", ",").load("s3a://your-directory/data/nyc-taxi-data.csv")
taxidf = taxidf.selectExpr('ST_Point(CAST(Start_Lon AS Decimal(24,20)), CAST(Start_Lat AS Decimal(24,20))) AS pickup', 'Trip_Pickup_DateTime', 'Payment_Type', 'Fare_Amt')
Expand All @@ -69,6 +72,7 @@ taxidf_mhtn = taxidf.where('ST_Contains(ST_PolygonFromEnvelope(-74.01,40.73,-73.
```

#### Spatial Join between Taxi Dataframe and Zone Dataframe to Find taxis in each zone

```python
taxiVsZone = sedona.sql('SELECT zone, zipcode, pickup, Fare_Amt FROM zoneDf, taxiDf WHERE ST_Contains(zone, pickup)')
```
Expand All @@ -88,6 +92,7 @@ zone.set_ylim(40.65, 40.9)

taxi = taxiGpd.plot(ax=zone, alpha=0.01, color='red', zorder=3)
```

## Docker image

We provide a Docker image for Apache Sedona with Python JupyterLab and a single-node cluster. The images are available on [DockerHub](https://hub.docker.com/r/apache/sedona).
Expand Down
5 changes: 5 additions & 0 deletions docs/api/flink/Function.md
Original file line number Diff line number Diff line change
Expand Up @@ -754,6 +754,7 @@ Output:
```
POLYGON ((125 100, 20 40, 50 60, 175 150, 125 100))
```

## ST_ConvexHull

Introduction: Return the Convex Hull of polygon A
Expand Down Expand Up @@ -1089,6 +1090,7 @@ POLYGON((0 0,0 5,5 0,0 0),(1 1,3 1,1 3,1 1))
```

## ST_Force3D

Introduction: Forces the geometry into a 3-dimensional model so that all output representations will have X, Y and Z coordinates.
An optionally given zValue is appended to the geometry if the geometry is 2-dimensional. The default zValue is 0.0.
If the given geometry is 3-dimensional, no change is performed on it.
Expand Down Expand Up @@ -1293,6 +1295,7 @@ To understand the cell statistics please refer to [H3 Doc](https://h3geo.org/doc
H3 native fill functions don't guarantee full coverage of the shapes.

### Cover Polygon

When fullCover = false, for polygons Sedona will use [polygonToCells](https://h3geo.org/docs/api/regions#polygontocells).
This can't guarantee full coverage but will guarantee no false positives.

Expand All @@ -1302,6 +1305,7 @@ This will lead to redundancy but can guarantee full coverage.
Choose the option according to your use case.

### Cover LineString

For LineStrings, Sedona will call [gridPathCells](https://h3geo.org/docs/api/traversal#gridpathcells) per segment.
From H3's documentation:
> This function may fail to find the line between two indexes, for example if they are very far apart. It may also fail when finding distances for indexes on opposite sides of a pentagon.
Expand Down Expand Up @@ -2550,6 +2554,7 @@ POLYGON ((8766047.980342899 17809098.336766362, 5122546.516721856 18580261.91252
```

## ST_Translate

Introduction: Returns the input geometry with its X, Y and Z coordinates (if present in the geometry) translated by deltaX, deltaY and deltaZ (if specified)

If the geometry is 2D, and a deltaZ parameter is specified, no change is done to the Z coordinate of the geometry and the resultant geometry is also 2D.
Expand Down
1 change: 1 addition & 0 deletions docs/api/flink/Predicate.md
Original file line number Diff line number Diff line change
Expand Up @@ -179,6 +179,7 @@ false
```

## ST_OrderingEquals

Introduction: Returns true if the geometries are equal and the coordinates are in the same order

Format: `ST_OrderingEquals(A: geometry, B: geometry)`
Expand Down
1 change: 1 addition & 0 deletions docs/api/sql/Constructor.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
## Read ESRI Shapefile

Introduction: Construct a DataFrame from a Shapefile

Since: `v1.0.0`
Expand Down
5 changes: 5 additions & 0 deletions docs/api/sql/Function.md
Original file line number Diff line number Diff line change
Expand Up @@ -1097,6 +1097,7 @@ POLYGON((0 0,0 5,5 0,0 0),(1 1,3 1,1 3,1 1))
```

## ST_Force3D

Introduction: Forces the geometry into a 3-dimensional model so that all output representations will have X, Y and Z coordinates.
An optionally given zValue is appended to the geometry if the geometry is 2-dimensional. The default zValue is 0.0.
If the given geometry is 3-dimensional, no change is performed on it.
Expand Down Expand Up @@ -1301,6 +1302,7 @@ To understand the cell statistics please refer to [H3 Doc](https://h3geo.org/doc
H3 native fill functions don't guarantee full coverage of the shapes.

### Cover Polygon

When fullCover = false, for polygons Sedona will use [polygonToCells](https://h3geo.org/docs/api/regions#polygontocells).
This can't guarantee full coverage but will guarantee no false positives.

Expand All @@ -1310,6 +1312,7 @@ This will lead to redundancy but can guarantee full coverage.
Choose the option according to your use case.

### Cover LineString

For LineStrings, Sedona will call [gridPathCells](https://h3geo.org/docs/api/traversal#gridpathcells) per segment.
From H3's documentation:
> This function may fail to find the line between two indexes, for example if they are very far apart. It may also fail when finding distances for indexes on opposite sides of a pentagon.
Expand Down Expand Up @@ -2123,6 +2126,7 @@ Output:
```

## ST_NumPoints

Introduction: Returns number of points in a LineString

!!!note
Expand Down Expand Up @@ -2658,6 +2662,7 @@ POLYGON ((8766047.980342899 17809098.336766362, 5122546.516721856 18580261.91252
```

## ST_Translate

Introduction: Returns the input geometry with its X, Y and Z coordinates (if present in the geometry) translated by deltaX, deltaY and deltaZ (if specified)

If the geometry is 2D, and a deltaZ parameter is specified, no change is done to the Z coordinate of the geometry and the resultant geometry is also 2D.
Expand Down
2 changes: 2 additions & 0 deletions docs/api/sql/Optimizer.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@ Sedona Spatial operators fully support the Apache SparkSQL query optimizer. It has
Sedona join performance is heavily affected by the number of partitions. If the join performance is not ideal, please increase the number of partitions by doing `df.repartition(XXX)` right after you create the original DataFrame.

## Range join

Introduction: Find geometries from A and geometries from B such that each geometry pair satisfies a certain predicate. Most predicates supported by SedonaSQL can trigger a range join.

Spark SQL Example:
Expand Down Expand Up @@ -280,6 +281,7 @@ FROM lefts
```

## Regular spatial predicate pushdown

Introduction: Given a join query and a predicate in the same WHERE clause, Sedona first executes the predicate as a filter, then executes the join query.

Spark SQL Example:
Expand Down
1 change: 1 addition & 0 deletions docs/api/sql/Overview.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
# Introduction

## Function list

SedonaSQL supports SQL/MM Part3 Spatial SQL Standard. It includes four kinds of SQL operators as follows. All these operators can be directly called through:
```scala
var myDataFrame = sedona.sql("YOUR_SQL")
Expand Down
2 changes: 2 additions & 0 deletions docs/api/sql/Parameter.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
## Usage

SedonaSQL supports many parameters. To change their values,

1. Set it through SparkConf:
Expand All @@ -18,6 +19,7 @@ println(sedonaConf)
```scala
sparkSession.conf.set("sedona.global.index","false")
```

## Explanation

* sedona.global.index
Expand Down
1 change: 1 addition & 0 deletions docs/api/sql/Predicate.md
Original file line number Diff line number Diff line change
Expand Up @@ -119,6 +119,7 @@ true
```

## ST_OrderingEquals

Introduction: Returns true if the geometries are equal and the coordinates are in the same order

Format: `ST_OrderingEquals(A: geometry, B: geometry)`
Expand Down
4 changes: 4 additions & 0 deletions docs/api/sql/Raster-operators.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ POINT (156.5 -75.5)
```

### RS_PixelAsCentroids

Introduction: Returns a list of the centroid point geometry, the pixel value and its raster X and Y coordinates for each pixel in the raster at the specified band.
Each centroid represents the geometric center of the corresponding pixel's area.

Expand Down Expand Up @@ -104,6 +105,7 @@ IndexOutOfBoundsException: Specified pixel coordinates (6, 2) do not lie in the
```

### RS_PixelAsPoints

Introduction: Returns a list of the pixel's upper-left corner point geometry, the pixel value and its raster X and Y coordinates for each pixel in the raster at the specified band.

Format: `RS_PixelAsPoints(raster: Raster, band: Integer)`
Expand Down Expand Up @@ -173,6 +175,7 @@ POLYGON ((131 -246, 139 -246, 139 -254, 131 -254, 131 -246))
```

### RS_PixelAsPolygons

Introduction: Returns a list of the polygon geometry, the pixel value and its raster X and Y coordinates for each pixel in the raster at the specified band.

Format: `RS_PixelAsPolygons(raster: Raster, band: Integer)`
Expand Down Expand Up @@ -1450,6 +1453,7 @@ Output:
```
4
```

### RS_Resample

Introduction:
Expand Down
3 changes: 3 additions & 0 deletions docs/api/sql/Raster-visualizer.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,11 @@
Sedona offers some APIs to aid in easy visualization of a raster object.

## Image-based visualization

Sedona offers APIs to visualize a raster in an image form. This API only works for rasters with byte data, and bands <= 4 (Grayscale - RGBA). You can check the data type of an existing raster by using [RS_BandPixelType](../Raster-operators/#rs_bandpixeltype) or create your own raster by passing 'B' while using [RS_MakeEmptyRaster](../Raster-loader/#rs_makeemptyraster).

### RS_AsBase64

Introduction: Returns a base64-encoded string of the given raster. If the data type is integral then this function internally takes the first 4 bands as RGBA, converts them to the PNG format, and finally produces a base64 string. When the data type is not integral, the function converts the raster to TIFF format, and then generates a base64 string. To visualize other bands, please use it together with `RS_Band`. You can take the resulting base64 string to [an online viewer](https://base64-viewer.onrender.com/) to check how the image looks.

!!!Warning
Expand All @@ -26,6 +28,7 @@ iVBORw0KGgoAAAA...
```

### RS_AsImage

Introduction: Returns HTML that, when rendered using an HTML viewer or via a Jupyter Notebook, displays the raster as a square image of side length `imageWidth`. Optionally, an imageWidth parameter can be passed to RS_AsImage in order to increase the size of the rendered image (default: 200).

Format: `RS_AsImage(raster: Raster, imageWidth: Integer = 200)`
Expand Down
2 changes: 2 additions & 0 deletions docs/api/sql/Visualization_SedonaKepler.md
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@ A map config can be passed optionally to apply pre-apply customizations to the m
```

### **Adding SedonaDataFrame to a map object using SedonaKepler.add_df**

SedonaKepler exposes an add_df API with the following signature:

```python
Expand All @@ -60,6 +61,7 @@ The parameters name has the same conditions as 'create_map'
```

### **Setting a config via the map**

A map rendered by accessing the map object created by SedonaKepler includes a config panel which can be used to customize the map.

### **Saving and setting config**
Expand Down
2 changes: 2 additions & 0 deletions docs/community/develop.md
Original file line number Diff line number Diff line change
Expand Up @@ -85,11 +85,13 @@ Re-run the test case. Do NOT right click the test case to re-run. Instead, click
To run all Python test cases, follow steps mentioned [here](../../setup/compile/#run-python-test).

#### Run all python tests in a single test file

To run a particular python test file, specify the path of the .py file to pipenv.

For example, to run all tests in `test_function.py` located in `python/tests/sql/`, use: `pipenv run pytest tests/sql/test_function.py`.

#### Run a single test

To run a particular test in a particular .py test file, specify `file_name::class_name::test_name` to the pytest command.

For example, to run the test on ST_Contains function located in sql/test_predicate.py, use: `pipenv run pytest tests/sql/test_predicate.py::TestPredicate::test_st_contains`
Expand Down
2 changes: 2 additions & 0 deletions docs/community/publication.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
Apache Sedona was formerly called GeoSpark, initiated by Arizona State University [Data Systems Lab](https://www.datasyslab.net/).

## Key publications

**"Spatial Data Management in Apache Spark: The
GeoSpark Perspective and Beyond"** is the full research paper that talks about the entire GeoSpark ecosystem. Please cite this paper if your work mentions GeoSpark core system.

Expand All @@ -20,6 +21,7 @@ GeoSpark was evaluated by papers published in top database venues. It is worth
> GeoSpark comes close to a complete spatial analytics system. It also exhibits the best performance in most cases.
## Full publications

### GeoSpark Ecosystem

["Spatial Data Management in Apache Spark: The
Expand Down
4 changes: 4 additions & 0 deletions docs/index.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,9 @@
### 10/12/2023: Sedona 1.5.0 released. It adds comprehensive raster data ETL and analytics, native support of Uber H3 functions, and SedonaKepler / SedonaPyDeck for interactive map visualization

### 06/25/2023: Sedona 1.4.1 released. It adds geodesic / geography functions, more raster functions and support Spark 3.4.

### 03/19/2023: Sedona 1.4.0 released. It provides GeoParquet filter pushdown (10X less memory footprint), faster serialization (3X speed), S2-based fast approximate join and enhanced R language support

### 01/2023: Apache Sedona graduated to an Apache Top Level Project!

### 12/23/2022: Sedona 1.3.1-incubating is released. It adds native support of GeoParquet, DataFrame style API, Scala 2.13, Python 3.10, spatial aggregation on Flink. Please check Sedona release notes.
2 changes: 2 additions & 0 deletions docs/setup/cluster.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
Download a Spark distribution from [Spark download page](http://spark.apache.org/downloads.html).

## Preliminary

1. Set up password-less SSH on your cluster. Each master-worker pair should have bi-directional password-less SSH.
2. Make sure you have installed JRE 1.8 or later.
3. Add the list of your workers' IP addresses in ./conf/slaves
Expand All @@ -23,6 +24,7 @@ spark.driver.maxResultSize 5g
For more details of Spark parameters, please visit [Spark Website](https://spark.apache.org/docs/latest/configuration.html).

## Start your cluster

Go to the root folder of the uncompressed Apache Spark distribution. Start your Spark cluster via a terminal

```
Expand Down
3 changes: 3 additions & 0 deletions docs/setup/compile.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
[![Scala and Java build](https://github.com/apache/sedona/actions/workflows/java.yml/badge.svg)](https://github.com/apache/sedona/actions/workflows/java.yml) [![Python build](https://github.com/apache/sedona/actions/workflows/python.yml/badge.svg)](https://github.com/apache/sedona/actions/workflows/python.yml) [![R build](https://github.com/apache/sedona/actions/workflows/r.yml/badge.svg)](https://github.com/apache/sedona/actions/workflows/r.yml) [![Example project build](https://github.com/apache/sedona/actions/workflows/example.yml/badge.svg)](https://github.com/apache/sedona/actions/workflows/example.yml) [![Docs build](https://github.com/apache/sedona/actions/workflows/docs.yml/badge.svg)](https://github.com/apache/sedona/actions/workflows/docs.yml) [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/apache/sedona/HEAD?filepath=binder)

## Compile Scala / Java source code

Sedona Scala/Java code is a project with multiple modules. Each module is a Scala/Java mixed project which is managed by Apache Maven 3.

* Make sure your Linux/Mac machine has Java 1.8, Apache Maven 3.3.1+, and Python 3.7+. The compilation of Sedona is not tested on Windows machines.
Expand Down Expand Up @@ -64,6 +65,7 @@ Users can specify `-Dspark` and `-Dscala` command line options to compile with di
Sedona uses GitHub Actions to automatically generate jars per commit. You can go [here](https://github.com/apache/sedona/actions/workflows/java.yml) and download the jars by clicking the commits ==Artifacts== tag.

## Run Python test

1. Set up the environment variable SPARK_HOME and PYTHONPATH

For example,
Expand Down Expand Up @@ -103,6 +105,7 @@ cd python
pipenv run python setup.py build_ext --inplace
pipenv run pytest tests
```

## Compile the documentation

The website is automatically built after each commit. The built website can be downloaded here:
Expand Down
2 changes: 2 additions & 0 deletions docs/setup/docker.md
Original file line number Diff line number Diff line change
Expand Up @@ -104,12 +104,14 @@ This docker image can only be built against Sedona 1.4.1+ and Spark 3.0+
## Cluster Configuration

### Software

* OS: Ubuntu 22.04
* JDK: openjdk-19
* Python: 3.10
* Spark 3.4.1

### Web UI

* JupyterLab: http://localhost:8888/
* Spark master URL: spark://localhost:7077
* Spark job UI: http://localhost:4040
Expand Down
1 change: 1 addition & 0 deletions docs/setup/install-scala.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ Please refer to [Sedona Maven Central coordinates](maven-coordinates.md) to sele
```

### Download Sedona jar manually

1. Have your Spark cluster ready.

2. Download Sedona jars:
Expand Down
4 changes: 4 additions & 0 deletions docs/setup/maven-coordinates.md
Original file line number Diff line number Diff line change
Expand Up @@ -282,12 +282,16 @@ Under BSD 3-clause (compatible with Apache 2.0 license)
```

## SNAPSHOT versions

Sometimes Sedona has a SNAPSHOT version for the upcoming release. It follows the same naming convention but has a "SNAPSHOT" suffix in the version. For example, `{{ sedona_create_release.current_snapshot }}`

In order to download SNAPSHOTs, you need to add the following repositories in your pom.xml or build.sbt

### build.sbt

resolvers +=
"Apache Software Foundation Snapshots" at "https://repository.apache.org/content/groups/snapshots"

### pom.xml

```xml
Expand Down