Skip to content

Use RelidByRelfilenumberWithTempTables instead of RelidByRelfilenumber#8303

Closed
naisila wants to merge 8 commits intom3hm3t/pg18_beta_confsfrom
naisila/pg18_columnar_temp
Closed

Use RelidByRelfilenumberWithTempTables instead of RelidByRelfilenumber#8303
naisila wants to merge 8 commits intom3hm3t/pg18_beta_confsfrom
naisila/pg18_columnar_temp

Conversation

@naisila
Copy link
Contributor

@naisila naisila commented Oct 30, 2025

Fixes #8235

PG18 and PG latest minors ignore temporary relations in RelidByRelfilenumber (RelidByRelfilenode in PG15)
Relevant PG commit:
postgres/postgres@86831952
Based on the PG18 commit description, the window where the ERRORs happen with temporary tables is very narrow, requiring an OID wraparound to create a lookup conflict in RelidByRelfilenumber() with a temporary table reusing the same OID as another relation already cached. We don't expect that high OID consumption rate with a higher number of temporary relations created in citus columnar, so we can just go back to previous behavior and include temp tables in the check, as shown below:

We write a new function, RelidByRelfilenumberWithTempTables, which does an extra check in case RelidByRelfilenumber returned InvalidOid (i.e. 0).
This extra check is essentially a copy paste of previous RelidByRelfilenumber behavior in case of a temp table.

m3hm3t and others added 7 commits October 27, 2025 18:51
Pg18 beta conf file updated

(cherry picked from commit c36410c)

Update image suffix in build and test workflow

Update image suffix in build configuration

Update image suffix in build configuration

Update image suffix in build configuration

(cherry picked from commit 7dbb946)

Update image suffix in build_and_test.yml to reflect latest development version

Update PostgreSQL version to 18beta3 in Dockerfile and CI workflow
… WHERE … IS NOT NULL (PG15–PG18) (#8139)

DESCRIPTION: Stabilize multi_insert_select expected: accept unqualified
columns in WHERE … IS NOT NULL

fixes #8133 


**Context**

* With PG18, ruleutils adds a GROUP RTE and improves column-name dedup.
As a side-effect, Vars that point at the GROUP RTE print as unqualified
column names even when `varprefix` is true.
* In Citus’ vendored `ruleutils_18.c` we already flattened GROUP Vars in
`targetList` and `havingQual`, but not in `jointree->quals`.
* For queries like `INSERT … SELECT … GROUP BY …`, Citus injects an
implicit null-guard on the group key in the WHERE clause. Because that
Var was still referencing the GROUP RTE, the deparser emitted `WHERE
(user_id IS NOT NULL)` instead of `WHERE (raw_events_first.user_id IS
NOT NULL)`, causing regress diffs only in grouped SELECTs.
* Related upstream change: PostgreSQL commit
`52c707483ce4d0161127e4958d981d1b5655865e` (ruleutils column-name de-dup
/ GROUP RTE exposure).

**What changed**

* Added an alternative expected file
`src/test/regress/expected/multi_insert_select_0.out` to keep CI green
across mixed environments where the qualified form may still be
produced.
The change in `merge_planner.c` fixes _unrecognized range table entry_
diffs in merge regress tests (category 2 diffs in #7992), the change in
`multi_router_planner.c` fixes _column reference ... is ambiguous_ diffs
in `multi_insert_select` and `multi_insert_select_window` (category 3
diffs in #7992). Edit to `common.py` enables standalone regress tests
with pg18 (e..g `citus_tests/run_test.py merge`).
Probably Colm's commits on `GROUP BY RTE` fixes have fixed the test
diffs we used to have.
@codecov
Copy link

codecov bot commented Oct 30, 2025

Codecov Report

❌ Patch coverage is 86.20690% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.99%. Comparing base (188c182) to head (ab71b7c).
⚠️ Report is 18 commits behind head on m3hm3t/pg18_beta_confs.

Additional details and impacted files
@@                    Coverage Diff                     @@
##           m3hm3t/pg18_beta_confs    #8303      +/-   ##
==========================================================
+ Coverage                   88.96%   88.99%   +0.03%     
==========================================================
  Files                         287      287              
  Lines                       63134    63217      +83     
  Branches                     7941     7955      +14     
==========================================================
+ Hits                        56165    56260      +95     
+ Misses                       4660     4633      -27     
- Partials                     2309     2324      +15     
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

PG18 and PG latest minors ignore temporary relations in
RelidByRelfilenumber (RelidByRelfilenode in PG15)
Relevant PG commit:
postgres/postgres@86831952

We write a new function, RelidByRelfilenumberWithTempTables,
which does an extra check in case RelidByRelfilenumber returned
InvalidOid (i.e. 0).
This extra check is essentially a copy paste of previous
RelidByRelfilenumber behavior in case of a temp table.
@naisila naisila force-pushed the naisila/pg18_columnar_temp branch from a43d81f to ab71b7c Compare October 30, 2025 11:53
@naisila naisila marked this pull request as ready for review October 30, 2025 12:11
@naisila naisila requested review from colm-mchugh and m3hm3t October 30, 2025 12:11
relid = RelidByRelfilenode(reltablespace, relfilenumber);
#endif

if (relid == InvalidOid)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this if statement be wrapped in a PG version check for PG18 ? Or is the ignoring of temp tables backported ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nm, saw its backported to PG13.

}

systable_endscan(scandesc);
table_close(relation, AccessShareLock);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything to be said for caching the entry and looking up the cache, as RelidByRelfilenode() does ? It extends the duplication, but we'll always scan pg_class for temp tables.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to cache the entry since Postgres is no longer caching entries for temp tables. There is the disadvantage of scanning pg_class always for temp tables. The other suggested solution doesn't have the disadvantage #8309
What do you think?

Copy link
Contributor

@colm-mchugh colm-mchugh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm, its a reasonable solution for a tricky problem. Preventing columnar on temp tables would risk breaking existing apps but may be a cleaner solution long term but that's a topic for another day. Can we update the PR description to:

  1. Include the limitation details on temp tables from the PG commit; uniquely identifying a temp table requires the backend's proc number
  2. Reference the still outstanding issue with concurrent sessions, that will be fixed in PR #8252

@m3hm3t m3hm3t force-pushed the m3hm3t/pg18_beta_confs branch 4 times, most recently from 5f5da94 to 15ecb4b Compare November 5, 2025 09:11
@naisila
Copy link
Contributor Author

naisila commented Nov 5, 2025

Reference the still outstanding issue with concurrent sessions, that will be fixed in PR #8252

Thanks @colm-mchugh for your review, for sharing your concerns on reverting behavior that PG identified as broken, and also for the internal discussion on which approach to choose. Since we decided to go with #8309 instead, I will apply your suggestion to that PR.

@naisila naisila closed this Nov 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PG18 - columnar temp tables cannot be accessed "could not open relation with oid 0"

3 participants