Pooling by KoalaGeo · Pull Request #2345 · geopython/pygeoapi

KoalaGeo · 2026-05-19T14:18:41Z

Overview

Makes the SQLAlchemy connection pool of the SQL provider configurable per provider via the existing options: block, exposing pool_size, max_overflow, pool_recycle, pool_timeout and pool_pre_ping.

Previously get_engine() called create_engine(conn_str, connect_args=connect_args, pool_pre_ping=True) with no pool sizing or recycle, so the default QueuePool held pool_size connections open for the life of each worker process and never recycled them. In multi-process deployments this produces a large number of permanently-IDLE server-side connections (we saw connections idle for days, eventually exhausting max_connections). There was no way to bound or recycle the pool from configuration.

Changes:

store_db_parameters() now extracts the five pool keys from options, coerces them to their declared types, and stores them as a sorted, hashable tuple (self.db_pool_options). They are popped out of options so they are not forwarded to the DBAPI as connect_args.
get_engine() takes a pool_options tuple parameter and applies **dict(pool_options) to create_engine(). Because the parameter is a hashable tuple, get_engine() remains compatible with @functools.cache, so engine sharing per process is preserved; providers with differing pool config correctly get distinct engines.
pygeoapi/process/manager/postgresql.py also calls get_engine(); its call site is updated to pass self.db_pool_options so the manager does not lose pool_pre_ping or skip recycling.
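
A minimal sketch of the extraction step described above, assuming a hypothetical helper named split_pool_options (the PR does this inside store_db_parameters(), whose full body is not shown here, and this sketch skips its other duties such as search_path handling and type coercion). The five pool keys are popped from a copy of the options so that only the remainder would reach the DBAPI as connect_args, and the pool settings come back as a sorted tuple so they are hashable. The connect_timeout key is only an illustrative leftover option.

```python
# Defaults taken from the PR description; the helper name is hypothetical.
POOL_DEFAULTS = {
    'pool_size': 5,
    'max_overflow': 10,
    'pool_recycle': -1,
    'pool_timeout': 30,
    'pool_pre_ping': True,
}


def split_pool_options(options):
    """Return (connect_args, pool_options) without mutating the input."""
    remaining = dict(options)  # work on a copy
    pool_options = tuple(sorted(
        (key, remaining.pop(key, default))
        for key, default in POOL_DEFAULTS.items()
    ))
    return remaining, pool_options


connect_args, pool_options = split_pool_options(
    {'pool_size': 2, 'pool_recycle': 300, 'connect_timeout': 10}
)
print(connect_args)        # pool keys are no longer present
print(dict(pool_options))  # overrides applied, other keys keep defaults
```

Returning a copy rather than mutating the caller's dict is a choice made for this sketch; the PR description says the keys are popped out of options directly.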

Backward compatibility: defaults preserve current behaviour exactly — pool_size=5, max_overflow=10, pool_pre_ping=True, and pool_recycle=-1 (SQLAlchemy's default, i.e. the current effective behaviour).

This PR is therefore a purely opt-in feature addition with no behaviour change for existing users. (See the issue for discussion of whether a finite default pool_recycle should be adopted as a separate follow-up.)

New tests and documentation are included.

Related Issue / discussion

Closes #2344.

Additional information

Example configuration:

providers:
  - type: feature
    name: PostgreSQL
    data:
      host: 127.0.0.1
      port: 5432
      dbname: test
      user: postgres
      password: postgres
      search_path: [osm, public]
    options:
      pool_size: 2          # persistent connections per worker process
      max_overflow: 3       # short-lived burst capacity
      pool_recycle: 300     # recycle connections older than 5 minutes
      pool_timeout: 30
    id_field: osm_id
    table: hotosm_bdi_waterways
    geom_field: foo_geom
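
For a rough sense of what these values bound: a SQLAlchemy QueuePool opens at most pool_size + max_overflow connections, and that limit applies per engine and per worker process. The worker count below is a made-up figure for illustration, not something stated in this PR, and max_connections is a hypothetical helper name.

```python
def max_connections(pool_size, max_overflow, workers, engines_per_worker=1):
    """Upper bound on server-side connections opened by pooled engines."""
    return (pool_size + max_overflow) * workers * engines_per_worker


# Pool values from the example configuration above; 4 workers is an assumption.
tuned = max_connections(pool_size=2, max_overflow=3, workers=4)
default = max_connections(pool_size=5, max_overflow=10, workers=4)
print(tuned, default)  # 20 60
```

Only the pool_size portion is held open persistently; overflow connections are closed when they are returned to the pool.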

Note (documented): because get_engine() is decorated with @functools.cache and keyed on its full argument set, providers that share a database must use identical pool options to continue sharing a single engine per worker; differing options intentionally yield separate engines.
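
The sharing behaviour in this note can be sketched without a database. In the example below, FakeEngine is a stand-in for a SQLAlchemy Engine and get_engine has a simplified, assumed signature (the real function takes more arguments); only the use of functools.cache with a hashable tuple reflects what the PR describes.

```python
import functools


class FakeEngine:
    """Stand-in for a SQLAlchemy Engine; it only records its settings."""

    def __init__(self, conn_str, **pool_kwargs):
        self.conn_str = conn_str
        self.pool_kwargs = pool_kwargs


@functools.cache
def get_engine(conn_str, pool_options):
    # pool_options must be hashable (a tuple of pairs); passing a dict
    # would raise TypeError when the cache key is built.
    return FakeEngine(conn_str, **dict(pool_options))


conn = 'postgresql://127.0.0.1:5432/test'  # illustrative connection string
a = get_engine(conn, (('pool_recycle', 300), ('pool_size', 2)))
b = get_engine(conn, (('pool_recycle', 300), ('pool_size', 2)))
c = get_engine(conn, (('pool_recycle', 300), ('pool_size', 9)))

print(a is b)  # True
print(a is c)  # False
```

Two providers with the same connection details and the same pool tuple get the same cached object, while a single differing value produces a second engine and therefore a second pool.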

Dependency policy (RFC2)

I have ensured that this PR meets RFC2 requirements

No new dependencies are introduced; only the standard library and the already-required SQLAlchemy are used.

Updates to public demo

I have ensured that breaking changes to the pygeoapi master demo server have been addressed
No changes required: defaults preserve existing behaviour, so the demo local.config.yml does not need to change.

Contributions and licensing

I'd like to contribute a bugfix/feature (configurable SQL connection pool) to pygeoapi. I confirm that my contributions to pygeoapi will be compatible with the pygeoapi license guidelines at the time of contribution
I have already previously agreed to the pygeoapi Contributions and Licensing Guidelines

test_sql_pool_options.py exercises `store_db_parameters()` directly, requires no database, and runs in standard CI. It asserts that the defaults leave behaviour unchanged, that overrides are applied and typed, that pool keys do not leak to the DBAPI, that dict-valued options are still filtered, that cache keys are hashable and deterministic, and that pool options coexist with search_path.

webb-ben · 2026-05-20T22:37:20Z

Is there a reason not to pop the attributes from connect_args inside get_engine? That would consolidate some of the complications noted in the PR around hashing and the manager's use of get_engine. Maybe I am missing something.

ricardogsilva

Just leaving my two cents here - I'm not a core committer so take these with a grain of salt.

Overall I agree with the PR, as adding these connection-related options seems relevant - thanks for your work and I look forward to having it merged!

Personally, I would simplify the implementation a bit, by relying on pygeoapi's JSON Schema document for the validation of the config.

And I would not include most of these tests, which I see as not being relevant.

ricardogsilva · 2026-05-21T13:43:51Z

+    # Defaults keep SQLAlchemy's QueuePool sizing but, unlike SQLAlchemy's
+    # default of -1, recycle connections after an hour so that pooled
+    # connections cannot sit IDLE on the server indefinitely.


This part of the comment seems to be outdated, as you end up setting the default value of pool_recycle to -1.

ricardogsilva · 2026-05-21T14:02:09Z

+             # SQLAlchemy connection-pool tuning (optional). Defaults match
+             # SQLAlchemy's QueuePool and preserve previous behaviour.
+             # Persistent connections held open per worker process.
+             pool_size: 5
+             # Extra short-lived connections allowed above pool_size.
+             max_overflow: 10
+             # Recreate connections older than this many seconds. -1 (the
+             # default) never recycles; set a finite value (e.g. 300) so
+             # pooled connections cannot sit IDLE on the server indefinitely.
+             pool_recycle: -1
+             # Seconds to wait for a connection from the pool before erroring.
+             pool_timeout: 30
+             # Test connections with a lightweight ping before use.
+             pool_pre_ping: true


All of these new parameters need to be added to the config schema at

pygeoapi/resources/schemas/config/pygeoapi-config-0.x.yml

This will make it possible to test a pygeoapi configuration for correctness even before starting up the server.
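
As a rough illustration of what schema-level validation could catch, the sketch below writes the five options as JSON Schema style property definitions in a Python dict and checks a set of options against them with a small hand-rolled type check. The property definitions, where they would sit in the schema file, and the check_options helper are all assumptions made for illustration; pygeoapi's real schema and validator are not reproduced here.

```python
# Hypothetical schema fragment (JSON Schema keywords) for the new options.
POOL_SCHEMA = {
    'pool_size': {'type': 'integer'},
    'max_overflow': {'type': 'integer'},
    'pool_recycle': {'type': 'integer'},
    'pool_timeout': {'type': 'integer'},
    'pool_pre_ping': {'type': 'boolean'},
}


def _matches(value, json_type):
    if json_type == 'boolean':
        return isinstance(value, bool)
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def check_options(options):
    """Return a list of error messages; empty means the options are valid."""
    return [
        f"{key}: expected {spec['type']}, got {options[key]!r}"
        for key, spec in POOL_SCHEMA.items()
        if key in options and not _matches(options[key], spec['type'])
    ]


print(check_options({'pool_size': 2, 'pool_pre_ping': True}))  # []
print(check_options({'pool_pre_ping': 'False'}))  # one error reported
```

With the types enforced at validation time, a string such as 'False' is rejected before the server starts instead of being silently coerced.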

ricardogsilva · 2026-05-21T14:10:20Z

+        (key, type(default)(options.pop(key, default)))
+        for key, default in pool_defaults.items()
+    ))
+


In my opinion this could be made easier to read and less complex by:

Storing self.db_pool_options as a dict instead of a tuple, and defer tuple creation to when get_engine is called;

Relying on the types of passed options already being correct. Adding these new parameters to the config JSON Schema (as I mentioned in my other comment) would mean that the type of each parameter would already be documented and would be enforceable by doing a validation of the config.

Also, note that your implementation contains a subtle bug when trying to parse pool_pre_ping. If the original value had been:

{'pool_pre_ping': 'False'} # I'm passing a string with the value of "False"

then the outcome would be:

type(True)("False")  # evaluates to True

In other words, bool("False") is actually True because non-empty strings are truthy.

-pool_defaults = {
-    'pool_size': 5,
-    'max_overflow': 10,
-    'pool_recycle': -1,  # SQLAlchemy default; preserves current behaviour
-    'pool_timeout': 30,
-    'pool_pre_ping': True,
-}
-self.db_pool_options = tuple(sorted(
-    (key, type(default)(options.pop(key, default)))
-    for key, default in pool_defaults.items()
-))
+self.pool_defaults = {
+    'pool_size': options.pop('pool_size', 5),
+    'max_overflow': options.pop('max_overflow', 10),
+    'pool_recycle': options.pop('pool_recycle', -1),  # SQLAlchemy default - never release connections
+    'pool_timeout': options.pop('pool_timeout', 30),
+    'pool_pre_ping': options.pop('pool_pre_ping', True),
+}
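
The coercion pitfall described in this comment is easy to reproduce. The sketch below contrasts type(default)(value) with a stricter parser; parse_bool is a hypothetical helper, not something this PR or pygeoapi provides, and the set of accepted strings is an arbitrary choice.

```python
def parse_bool(value):
    """Parse a config value into a bool, rejecting ambiguous input."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered in ('true', 'yes', '1'):
            return True
        if lowered in ('false', 'no', '0'):
            return False
    raise ValueError(f'not a boolean: {value!r}')


default = True
naive = type(default)('False')  # bool('False'): a non-empty string is truthy
strict = parse_bool('False')

print(naive)   # True
print(strict)  # False
```

If the configuration is validated against a schema first, as suggested in this review, no parsing of this kind is needed because the value already arrives as a real boolean.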

ricardogsilva · 2026-05-21T14:15:47Z

            self.db_user,
            self._db_password,
            self.db_conn,
+            self.db_pool_options,


As per my other comment, in my opinion it would be clearer if the tuple were generated here, perhaps accompanied by a comment mentioning that this is done to enable the use of functools.cache.

Also, in modern Python, a dict's insertion ordering is preserved, so I don't think sorting the tuple would be needed.

Suggested change

-            self.db_pool_options,
+            tuple(self.db_pool_options.items()),  # convert to hashable type, for using with functools.cache

I just found out that the upcoming Python 3.15 will come with a new frozendict type. This means this new dict type will be hashable, thus being usable as part of a cacheable function.

It will likely take a long while before pygeoapi can rely on this though, but nice to see this becoming a standard feature in Python.
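
On the ordering point raised earlier in this comment: tuple(d.items()) follows insertion order, so it gives a stable cache key as long as the dict is always built with its keys in the same order, which is the case for a fixed dict literal. The sketch below shows that case alongside one where the order varies; the variable names are illustrative only.

```python
fixed_a = {'pool_size': 2, 'max_overflow': 3}
fixed_b = {'pool_size': 2, 'max_overflow': 3}
reordered = {'max_overflow': 3, 'pool_size': 2}

# Same literal order: identical tuples, so identical cache keys.
print(tuple(fixed_a.items()) == tuple(fixed_b.items()))  # True

# The dicts compare equal, but their item tuples do not.
print(fixed_a == reordered)  # True
print(tuple(fixed_a.items()) == tuple(reordered.items()))  # False

# Sorting makes the key independent of insertion order.
print(tuple(sorted(fixed_a.items())) == tuple(sorted(reordered.items())))  # True
```

So dropping the sort is safe when the dict is constructed in one place with a fixed key order, and sorting only matters if the key order can depend on user input.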

ricardogsilva · 2026-05-21T14:21:52Z

+def test_pool_options_defaults_preserve_current_behaviour():
+    obj = _Dummy()
+    store_db_parameters(obj, dict(CONN), {})
+    pool = dict(obj.db_pool_options)
+    # Defaults must match pre-existing effective behaviour:
+    # pool_pre_ping was hardcoded True; pool_recycle was unset (-1).
+    assert pool['pool_size'] == 5
+    assert pool['max_overflow'] == 10
+    assert pool['pool_timeout'] == 30
+    assert pool['pool_pre_ping'] is True
+    assert pool['pool_recycle'] == -1


This test seems unnecessary to me: once this PR is merged, the behavior it implements becomes the current behavior, so the test loses its relevance.

ricardogsilva · 2026-05-21T14:30:42Z

+def test_pool_options_are_overridable_and_typed():
+    obj = _Dummy()
+    store_db_parameters(
+        obj, dict(CONN),
+        {'pool_size': 2, 'max_overflow': 3, 'pool_recycle': 300},
+    )
+    pool = dict(obj.db_pool_options)
+    assert pool['pool_size'] == 2 and isinstance(pool['pool_size'], int)
+    assert pool['max_overflow'] == 3
+    assert pool['pool_recycle'] == 300
+    # untouched keys keep defaults
+    assert pool['pool_timeout'] == 30
+    assert pool['pool_pre_ping'] is True


This test would be unnecessary if you went with my suggestion above of storing db_pool_options as a dict instead of a tuple, and relied on the configuration being valid once the missing JSON Schema entries have been added.

ricardogsilva · 2026-05-21T14:32:21Z

+def test_dict_valued_options_still_filtered():
+    obj = _Dummy()
+    store_db_parameters(
+        obj, dict(CONN),
+        {'pool_size': 2, 'zoom': {'min': 0, 'max': 22}},
+    )
+    assert 'zoom' not in obj.db_options
+    assert dict(obj.db_pool_options)['pool_size'] == 2
+


This test seems to be unnecessary, as it is not testing the changes you made in this PR. It verifies that the contents of obj.db_options are correct.

IMO this PR does not make any changes that would warrant this verification, unless you do not trust the behavior of dict.pop, which is a Python builtin.

ricardogsilva · 2026-05-21T14:35:05Z

+def test_pool_options_hashable_and_deterministic():
+    a, b = _Dummy(), _Dummy()
+    store_db_parameters(a, dict(CONN), {'pool_size': 2})
+    store_db_parameters(b, dict(CONN), {'pool_size': 2})
+    # identical config -> identical key -> shared engine via functools.cache
+    assert a.db_pool_options == b.db_pool_options
+    assert hash(a.db_pool_options) == hash(b.db_pool_options)
+
+    c = _Dummy()
+    store_db_parameters(c, dict(CONN), {'pool_size': 9})
+    # differing pool config -> distinct key (separate engine, by design)
+    assert c.db_pool_options != a.db_pool_options
+


This is testing Python's own implementation of how tuples are hashed, so I don't think it is relevant to include in pygeoapi.

ricardogsilva · 2026-05-21T14:39:17Z

+def test_pool_options_coexist_with_search_path():
+    obj = _Dummy()
+    store_db_parameters(
+        obj, dict(CONN),
+        {'search_path': ['published', 'public'], 'pool_size': 4},
+    )
+    assert obj.db_search_path == ('published', 'public')
+    assert dict(obj.db_pool_options)['pool_size'] == 4
+


This test seems unnecessary, as it does not test the functionality introduced in this PR.

KoalaGeo · 2026-05-24T04:45:34Z

> Is there a reason to not pop the attributes from the connect_args inside of get_engine? This would consolidate a bit of the complications noted in the PR between hashing and the manager using get_engine. Maybe I am missing something

That's a good shout, I'll refactor

KoalaGeo added 5 commits May 19, 2026 14:56

Enhance SQL Alchemy engine with connection pool options

37f428c

Added connection pool options for SQL Alchemy engine.

Add db_pool_options to PostgreSQL connection

5841ed9

Update pool_recycle to SQLAlchemy default value

bc68af4

Change pool_recycle to -1 to preserve current behavior.

Enhance SQLAlchemy connection pooling settings

cd9c836

Added SQLAlchemy connection-pool tuning options to configuration.

tomkralidis requested review from francbartoli, tomkralidis and webb-ben May 20, 2026 12:01

tomkralidis added this to the 0.24.0 milestone May 20, 2026

ricardogsilva reviewed May 21, 2026
