Skip to content
This repository was archived by the owner on May 19, 2020. It is now read-only.

Conversation

@jsanz
Copy link

@jsanz jsanz commented Nov 29, 2019

Since Postgres 9.6 the PARALLEL SAFE modifiers for function definition enables the execution of queries using one or more workers in parallel (more details here).

This straight forward PR adds this modifier to this repo functions to avoid them potentially blocking parallel execution when present.

I've done tests on all functions to ensure they don't block parallel executions with the following procedure:

  • Created a sample database on a postgres:12 Docker container with Postgis 3 installed
  • Created a sample table points with 1M random points and a random kpi integer column
  • Imported the layer ne_10m_admin_1_states_provinces from Natural Earth to have also a multipolygon table
  • Relaxed the conditions for parallel execution:
[local] postgres@bench=#  set parallel_setup_cost = 10;
SET
Time: 0.322 ms
[local] postgres@bench=# set parallel_tuple_cost = 0.001;
SET
  • Run sample queries with the keyword enabled until the parallel execution was activated
  • Removed the keyword and run the same query for comparison.

The tests below shared are just meant to showcase that PARALLEL SAFE can help the faster execution of queries and is not an indicator of the improvement introduced since there are many factors that can affect that.

It's worth noting that this makes these functions only available for Postgres 9.6 onwards so maybe we could consider having both versions in separated folders. Let me know if that would make sense to modify the PR that way.

Query examples

For each query, the execution time in milliseconds with and without the PARALLEL SAFE keyword is shared.

MakeArc

explain analyze 
select makearc(geom_lag, geom, geom_lead)
from (
        select lag(geom) over (partition by kpi % 3 order by kpi) geom_lag,
                geom,
                lead(geom) over (partition by kpi % 3 order by kpi) geom_lead
        from points
) a 
union all
select makearc(geom_lag, geom, geom_lead)
from (
        select lag(geom) over (partition by kpi % 4 order by kpi) geom_lag,
                geom,
                lead(geom) over (partition by kpi % 4 order by kpi) geom_lead
        from points
) a;
  • With parallel: 1148
  • Without parallel: 19395

MercBuffer

explain analyze verbose 
select count(*) 
from points 
where st_area(mercbuffer(geom,kpi)) > kpi^10;
  • With parallel: 8942
  • Without parallel: 25351

MercDWithin

explain analyze verbose 
select count(*) 
from points 
where
MercDWithin(
        st_transform(geom,3857),
        st_transform(
                ST_SetSRID(
                        ST_MakePoint( -71.1043443253471,42.3150676015829)
                        ,4326)
                ,3857)
        ,100000);
  • With parallel: 1222
  • Without parallel: 2885

MercLength

explain analyze verbose 
select count(*) 
from points 
where
MercLength(
        ST_MakeLine(
                st_transform(geom,3857),
                st_transform(
                        ST_SetSRID(ST_MakePoint(-71,42),4326)
                        ,3857)
        )
) > kpi^10;
  • With parallel: 2043
  • Without parallel: 4978

OrientedEnvelope

explain analyze  
select sum(r) 
from
(
        select sum(st_area(orientedenvelope(geom))) as r
        from points 
        group by kpi % 50000
        union all
        select sum(st_area(orientedenvelope(geom))) as r
        from points 
        group by kpi % 70000
) a;
  • With parallel: 8727
  • Without parallel: 22693

Sieve

explain analyze
select sum(st_area(sieve(geom,0.1)))
from countries
where iso_a2 like 'ES'
union all
select sum(st_area(sieve(geom,0.1)))
from countries
where iso_a2 like 'IT';
  • With parallel: 8368
  • Without parallel: 12244

SmartShrink

explain analyze
select sum(st_area(smartshrink(geom,0.1)))
from countries where ogc_fid % 7 = 0
UNION ALL
select sum(st_area(smartshrink(geom,0.1)))
from countries where ogc_fid % 5 = 0;
  • With parallel: 15129
  • Without parallel: 21061

TileBBox

explain analyze
select count(*)
from countries
where st_centroid(geom) && tilebbox(6,28,26)
union
select count(*)
from points
where st_centroid(geom) && tilebbox(6,28,27);
  • With parallel: 470
  • Without parallel: 879

ToPoint

explain analyze
select count(*)
from countries
where ToPoint(geom) && tilebbox(6,28,26)
union
select count(*)
from points
where ToPoint(geom) && tilebbox(6,28,27);
  • With parallel: 3857
  • Without: 4948

Copy link
Member

@nyurik nyurik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome research job, thank you @jsanz !!!

@nyurik nyurik merged commit bd99a71 into openmaptiles:master Dec 6, 2019
@jsanz jsanz deleted the parallel-safe branch December 24, 2019 12:07
nyurik added a commit that referenced this pull request Jan 13, 2020
I realized that I skipped `CleanInt` and `CleanNumeric` on my [last PR](#1) 😞. These two functions implemented the same approach as the [`as_numeric`](https://github.com/openmaptiles/openmaptiles/blob/master/layers/building/building.sql#L4-L14) on OpenMapTiles repo, with the only difference being that these return `NULL` instead of `-1`. The approach of using an exception avoids running them in parallel.

Hence, I decided to bring the regular expression approach to these functions I proposed on openmaptiles/openmaptiles#728. Here there is already a testing mechanism, so I added more checks to ensure more cases are covered. I also improved a bit the tests to allow compare rounded results against a given number of decimals, instead of using `floor`.

Finally, I updated the Travis configuration, since the current one was not working properly. This took more than expected (in fact ended up almost using the same configuration used at [CartoDB/cartodb-postgresql](https://github.com/CartoDB/cartodb-postgresql/blob/master/.travis.yml), but the outcome is good in my opinion, since we now can test all the functions in PostGIS 2.5 on Postgres 9.6, 10, 11, and 12 and PostGIS 3 on 12. You can see the build execution in my [Travis account](https://travis-ci.org/jsanz/postgis-vt-util).

I also found another issue, the function `OrientedEnvelope`, equivalent to  PostGIS 3 [`ST_OrientedEnvelope`](https://postgis.net/docs/ST_OrientedEnvelope.html) but implemented in SQL, yields different results in PostGIS 2.5 and 3. I haven't put time debugging the algorithm, we may want to open a new issue for this, but not sure about how much is worth, being PostGIS 3 the current stable release.

If this PR is merged, we can then make OpenMapTiles `as_numeric` to be just a wrapper to return a `-1`, or if it makes more sense, just replace the function to use `CleanNumeric` I actually prefer a function like this to return a `null` value instead of a value that is a valid input.


* Replaced CleanInt/CleanNumeric by parallel safe versions, improved tests
* fixes in test and travis config
* fix postgis version test for orientedenvelope check
* Update .travis.yml
Co-Authored-By: Yuri Astrakhan <yuriastrakhan@gmail.com>
* added new line at the end of the file
Co-authored-by: Yuri Astrakhan <yuriastrakhan@gmail.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants