Skip to content

Conversation

@nyurik
Copy link

@nyurik nyurik commented Dec 6, 2019

Patch researched and created by @jsanz. It has been merged to OpenMapTiles,
and should probably be merged to the origin as well. Notes from @jsanz:

Since Postgres 9.6 the PARALLEL SAFE modifiers for function definition enables the execution of queries using one or more workers in parallel (more details here).

This straight forward PR adds this modifier to this repo functions to avoid them potentially blocking parallel execution when present.

I've done tests on all functions to ensure they don't block parallel executions with the following procedure:

  • Created a sample database on a postgres:12 Docker container with Postgis 3 installed
  • Created a sample table points with 1M random points and a random kpi integer column
  • Imported the layer ne_10m_admin_1_states_provinces from Natural Earth to have also a multipolygon table
  • Relaxed the conditions for parallel execution:
[local] postgres@bench=#  set parallel_setup_cost = 10;
SET
Time: 0.322 ms
[local] postgres@bench=# set parallel_tuple_cost = 0.001;
SET
  • Run sample queries with the keyword enabled until the parallel execution was activated
  • Removed the keyword and run the same query for comparison.

The tests below shared are just meant to showcase that PARALLEL SAFE can help the faster execution of queries and is not an indicator of the improvement introduced since there are many factors that can affect that.

It's worth noting that this makes these functions only available for Postgres 9.6 onwards so maybe we could consider having both versions in separated folders. Let me know if that would make sense to modify the PR that way.

Query examples

For each query, the execution time in milliseconds with and without the PARALLEL SAFE keyword is shared.

MakeArc

explain analyze 
select makearc(geom_lag, geom, geom_lead)
from (
        select lag(geom) over (partition by kpi % 3 order by kpi) geom_lag,
                geom,
                lead(geom) over (partition by kpi % 3 order by kpi) geom_lead
        from points
) a 
union all
select makearc(geom_lag, geom, geom_lead)
from (
        select lag(geom) over (partition by kpi % 4 order by kpi) geom_lag,
                geom,
                lead(geom) over (partition by kpi % 4 order by kpi) geom_lead
        from points
) a;
  • With parallel: 1148
  • Without parallel: 19395

MercBuffer

explain analyze verbose 
select count(*) 
from points 
where st_area(mercbuffer(geom,kpi)) > kpi^10;
  • With parallel: 8942
  • Without parallel: 25351

MercDWithin

explain analyze verbose 
select count(*) 
from points 
where
MercDWithin(
        st_transform(geom,3857),
        st_transform(
                ST_SetSRID(
                        ST_MakePoint( -71.1043443253471,42.3150676015829)
                        ,4326)
                ,3857)
        ,100000);
  • With parallel: 1222
  • Without parallel: 2885

MercLength

explain analyze verbose 
select count(*) 
from points 
where
MercLength(
        ST_MakeLine(
                st_transform(geom,3857),
                st_transform(
                        ST_SetSRID(ST_MakePoint(-71,42),4326)
                        ,3857)
        )
) > kpi^10;
  • With parallel: 2043
  • Without parallel: 4978

OrientedEnvelope

explain analyze  
select sum(r) 
from
(
        select sum(st_area(orientedenvelope(geom))) as r
        from points 
        group by kpi % 50000
        union all
        select sum(st_area(orientedenvelope(geom))) as r
        from points 
        group by kpi % 70000
) a;
  • With parallel: 8727
  • Without parallel: 22693

Sieve

explain analyze
select sum(st_area(sieve(geom,0.1)))
from countries
where iso_a2 like 'ES'
union all
select sum(st_area(sieve(geom,0.1)))
from countries
where iso_a2 like 'IT';
  • With parallel: 8368
  • Without parallel: 12244

SmartShrink

explain analyze
select sum(st_area(smartshrink(geom,0.1)))
from countries where ogc_fid % 7 = 0
UNION ALL
select sum(st_area(smartshrink(geom,0.1)))
from countries where ogc_fid % 5 = 0;
  • With parallel: 15129
  • Without parallel: 21061

TileBBox

explain analyze
select count(*)
from countries
where st_centroid(geom) && tilebbox(6,28,26)
union
select count(*)
from points
where st_centroid(geom) && tilebbox(6,28,27);
  • With parallel: 470
  • Without parallel: 879

ToPoint

explain analyze
select count(*)
from countries
where ToPoint(geom) && tilebbox(6,28,26)
union
select count(*)
from points
where ToPoint(geom) && tilebbox(6,28,27);
  • With parallel: 3857
  • Without: 4948

jsanz and others added 6 commits November 29, 2019 15:46
Added parallel safe keyword to function definitions

Since Postgres 9.6 the `PARALLEL SAFE` modifiers for function definition enables the execution of queries using one or more workers in parallel (more details [here](https://www.postgresql.org/docs/current/parallel-safety.html)).

This straight forward PR adds this modifier to this repo functions to avoid them potentially blocking parallel execution when present.

I've done tests on all functions to ensure they don't block parallel executions with the following procedure:

* Created a sample database on a `postgres:12` Docker container with Postgis 3 installed
* Created a sample table `points` with 1M random points and a random `kpi` integer column
* Imported the layer `ne_10m_admin_1_states_provinces` from Natural Earth to have also a multipolygon table
* Relaxed the conditions for parallel execution:
```
[local] postgres@bench=#  set parallel_setup_cost = 10;
SET
Time: 0.322 ms
[local] postgres@bench=# set parallel_tuple_cost = 0.001;
SET
```

* Run sample queries with the keyword enabled until the parallel execution was activated
* Removed the keyword and run the same query for comparison.

The tests below shared are just meant to showcase that `PARALLEL SAFE` can help the faster execution of queries and is not an indicator of the improvement introduced since there are many factors that can affect that.

It's worth noting that this makes these functions only available for [Postgres 9.6](https://www.postgresql.org/docs/9.6/sql-createfunction.html) onwards so maybe we could consider having both versions in separated folders. Let me know if that would make sense to modify the PR that way.

<details><summary>Query examples</summary>

For each query, the execution time in milliseconds with and without the `PARALLEL SAFE` keyword is shared.

### MakeArc

    explain analyze 
    select makearc(geom_lag, geom, geom_lead)
    from (
            select lag(geom) over (partition by kpi % 3 order by kpi) geom_lag,
                    geom,
                    lead(geom) over (partition by kpi % 3 order by kpi) geom_lead
            from points
    ) a 
    union all
    select makearc(geom_lag, geom, geom_lead)
    from (
            select lag(geom) over (partition by kpi % 4 order by kpi) geom_lag,
                    geom,
                    lead(geom) over (partition by kpi % 4 order by kpi) geom_lead
            from points
    ) a;

- With parallel: 1148
- Without parallel: 19395

### MercBuffer

    explain analyze verbose 
    select count(*) 
    from points 
    where st_area(mercbuffer(geom,kpi)) > kpi^10;

- With parallel: 8942
- Without parallel: 25351

### MercDWithin

    explain analyze verbose 
    select count(*) 
    from points 
    where
    MercDWithin(
            st_transform(geom,3857),
            st_transform(
                    ST_SetSRID(
                            ST_MakePoint( -71.1043443253471,42.3150676015829)
                            ,4326)
                    ,3857)
            ,100000);

- With parallel: 1222
- Without parallel: 2885

### MercLength

    explain analyze verbose 
    select count(*) 
    from points 
    where
    MercLength(
            ST_MakeLine(
                    st_transform(geom,3857),
                    st_transform(
                            ST_SetSRID(ST_MakePoint(-71,42),4326)
                            ,3857)
            )
    ) > kpi^10;

- With parallel: 2043
- Without parallel: 4978

### OrientedEnvelope

    explain analyze  
    select sum(r) 
    from
    (
            select sum(st_area(orientedenvelope(geom))) as r
            from points 
            group by kpi % 50000
            union all
            select sum(st_area(orientedenvelope(geom))) as r
            from points 
            group by kpi % 70000
    ) a;

- With parallel: 8727
- Without parallel: 22693

### Sieve

    explain analyze
    select sum(st_area(sieve(geom,0.1)))
    from countries
    where iso_a2 like 'ES'
    union all
    select sum(st_area(sieve(geom,0.1)))
    from countries
    where iso_a2 like 'IT';

- With parallel: 8368
- Without parallel: 12244

### SmartShrink

    explain analyze
    select sum(st_area(smartshrink(geom,0.1)))
    from countries where ogc_fid % 7 = 0
    UNION ALL
    select sum(st_area(smartshrink(geom,0.1)))
    from countries where ogc_fid % 5 = 0;

- With parallel: 15129
- Without parallel: 21061

### TileBBox

    explain analyze
    select count(*)
    from countries
    where st_centroid(geom) && tilebbox(6,28,26)
    union
    select count(*)
    from points
    where st_centroid(geom) && tilebbox(6,28,27);

- With parallel: 470
- Without parallel: 879

### ToPoint

    explain analyze
    select count(*)
    from countries
    where ToPoint(geom) && tilebbox(6,28,26)
    union
    select count(*)
    from points
    where ToPoint(geom) && tilebbox(6,28,27);

- With parallel: 3857
- Without: 4948

</details>
I realized that I skipped `CleanInt` and `CleanNumeric` on my [last PR](#1) 😞. These two functions implemented the same approach as the [`as_numeric`](https://github.com/openmaptiles/openmaptiles/blob/master/layers/building/building.sql#L4-L14) on OpenMapTiles repo, with the only difference being that these return `NULL` instead of `-1`. The approach of using an exception avoids running them in parallel.

Hence, I decided to bring the regular expression approach to these functions I proposed on openmaptiles/openmaptiles#728. Here there is already a testing mechanism, so I added more checks to ensure more cases are covered. I also improved a bit the tests to allow compare rounded results against a given number of decimals, instead of using `floor`.

Finally, I updated the Travis configuration, since the current one was not working properly. This took more than expected (in fact ended up almost using the same configuration used at [CartoDB/cartodb-postgresql](https://github.com/CartoDB/cartodb-postgresql/blob/master/.travis.yml), but the outcome is good in my opinion, since we now can test all the functions in PostGIS 2.5 on Postgres 9.6, 10, 11, and 12 and PostGIS 3 on 12. You can see the build execution in my [Travis account](https://travis-ci.org/jsanz/postgis-vt-util).

I also found another issue, the function `OrientedEnvelope`, equivalent to  PostGIS 3 [`ST_OrientedEnvelope`](https://postgis.net/docs/ST_OrientedEnvelope.html) but implemented in SQL, yields different results in PostGIS 2.5 and 3. I haven't put time debugging the algorithm, we may want to open a new issue for this, but not sure about how much is worth, being PostGIS 3 the current stable release.

If this PR is merged, we can then make OpenMapTiles `as_numeric` to be just a wrapper to return a `-1`, or if it makes more sense, just replace the function to use `CleanNumeric` I actually prefer a function like this to return a `null` value instead of a value that is a valid input.


* Replaced CleanInt/CleanNumeric by parallel safe versions, improved tests
* fixes in test and travis config
* fix postgis version test for orientedenvelope check
* Update .travis.yml
Co-Authored-By: Yuri Astrakhan <yuriastrakhan@gmail.com>
* added new line at the end of the file
Co-authored-by: Yuri Astrakhan <yuriastrakhan@gmail.com>
Current implementation crashes if the value is a single '-' or '+' char, or in other edge cases like 'e2'.
The code has been rewritten from openmaptiles/openmaptiles#785 (comment) suggestion based on @RhodiumToad suggested code.

This change was fully tested locally using `npm test`

P.S. Admins, please enable [travis-ci build](https://travis-ci.org/github/openmaptiles/postgis-vt-util) for this repo.

cc @jsanz @sfkeller
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants