Conversation

rbrw
Contributor

@rbrw rbrw commented Jul 1, 2024

No description provided.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@rbrw rbrw force-pushed the pdb-5747-batch-db-inserts branch from cc89ad8 to 80d5135 Compare July 11, 2024 20:53
rbrw added 2 commits July 16, 2024 13:06
So we'll always see stack traces during development.
Rename test from exceeding-db-index-limit-produces-annotated-error and
make sure the type and title in the key match the type and title in
the resource.
@rbrw rbrw force-pushed the pdb-5747-batch-db-inserts branch 3 times, most recently from 8bfe5c0 to 6fb376a Compare July 16, 2024 21:15
@austb austb removed the don't merge label Jul 16, 2024
@austb austb marked this pull request as ready for review July 16, 2024 21:16
@austb austb requested review from a team as code owners July 16, 2024 21:16
rbrw added 17 commits July 16, 2024 16:55
When reporting an index error, get the file and line from the resource
itself, not just the attempted updates.
Drop the code we copied from java.jdbc (in order to add support for
on-conflict) in favor of hsql's :on-conflict support.

Change realize paths to batch and to be more efficient.
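
For readers unfamiliar with the pattern, here is a minimal sketch of a batched insert that relies on the database's own on-conflict handling rather than a per-row existence check. It is not the PuppetDB code: it uses Python's sqlite3 module as a stand-in for PostgreSQL, and the `paths` table, its columns, and `insert_paths` are hypothetical names chosen for illustration.

```python
import sqlite3


def insert_paths(conn, paths):
    """Insert all paths in one batch, silently skipping ones already stored."""
    conn.executemany(
        # ON CONFLICT ... DO NOTHING lets the database resolve duplicates,
        # so no SELECT-before-INSERT round trip is needed per row.
        "INSERT INTO paths (path) VALUES (?) ON CONFLICT (path) DO NOTHING",
        [(p,) for p in paths],
    )
    conn.commit()


# Hypothetical schema, in-memory database for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE paths (id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL)")

insert_paths(conn, ["a", "b"])
insert_paths(conn, ["b", "c"])  # "b" already exists and is skipped

stored = [row[0] for row in conn.execute("SELECT path FROM paths ORDER BY path")]
```

After both calls, `stored` holds each path once, even though "b" was submitted twice.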
In order to support upcoming changes to batch inserts, adjust the
index error handler to also accept a collection of resources and
report any that might be large enough to have provoked the error.

Even though the actual pg limit is (currently) 8191 bytes, we can't be
certain of the culprit because if nothing else, that limit applies
after possible in-line compression:
https://www.postgresql.org/message-id/8326.1289618101%40sss.pgh.pa.us

Treat a resource as suspicious if the UTF-8 encoded length of any
indexed key exceeds 8000 bytes.  Just use UTF-8 for now since the other
encodings seem unlikely:
https://www.postgresql.org/docs/current/multibyte.html#MULTIBYTE-CHARSET-SUPPORTED
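
The check described above could look roughly like the following sketch. It is an illustration only, not the code in this PR: the function names are hypothetical, and treating `type` and `title` as the indexed keys is an assumption. The 8000-byte threshold is the one stated in the commit message.

```python
# Threshold from the commit message; deliberately below the 8191-byte limit.
SUSPICIOUS_KEY_BYTES = 8000


def utf8_len(value):
    """Length of the value in bytes once encoded as UTF-8."""
    return len(str(value).encode("utf-8"))


def suspicious_resources(resources, indexed_keys=("type", "title")):
    """Return the resources that might have provoked the index error.

    indexed_keys is an assumption made for this sketch.
    """
    return [
        r for r in resources
        if any(utf8_len(r.get(k, "")) > SUSPICIOUS_KEY_BYTES for k in indexed_keys)
    ]


resources = [
    {"type": "File", "title": "/etc/motd"},
    {"type": "File", "title": "a" * 8001},       # 8001 bytes
    {"type": "Notify", "title": "\u00e9" * 3000},  # 3000 chars, 6000 bytes
    {"type": "Notify", "title": "\u00e9" * 4001},  # 4001 chars, 8002 bytes
]
flagged = suspicious_resources(resources)
```

Note that the measurement is in bytes, not characters: the last resource has only 4001 characters but is still flagged because each character takes two bytes in UTF-8.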
(Use plan to avoid creating unnecessary clojure data structures.)
Switch to a map/update so we'll be able to compose a whole set of the
operations as transducers.
Filter before remove-dupes so we can compose it with the other
operations when we switch them to transducers.
rbrw added 8 commits July 16, 2024 16:56
Use a composed transducer for much of the work, avoiding multiple
layers of transient garbage (per-map data structures).  Can't use a
simple composed function even if we wanted to, given the filter.

Handle the filtering more directly.
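
To illustrate the idea for readers who have not used transducers, here is a minimal Python sketch of the pattern (the PR itself is Clojure, and none of these names come from it). Each step transforms a reducing function, so the composed steps run in a single pass without building an intermediate collection per step. It also shows why a plain composed function is not enough: a filter has to be able to drop an item, which a function returning exactly one value per input cannot do.

```python
from functools import reduce


def mapping(f):
    """Transducer that applies f to each item before passing it on."""
    def xform(rf):
        return lambda acc, x: rf(acc, f(x))
    return xform


def filtering(pred):
    """Transducer that passes an item on only when pred(item) is truthy."""
    def xform(rf):
        return lambda acc, x: rf(acc, x) if pred(x) else acc
    return xform


def compose(*xforms):
    """Compose transducers so items flow through them left to right."""
    def xform(rf):
        for xf in reversed(xforms):
            rf = xf(rf)
        return rf
    return xform


def append(acc, x):
    """Reducing function that collects items into a list."""
    acc.append(x)
    return acc


def transduce(xform, rf, init, coll):
    return reduce(xform(rf), coll, init)


# Hypothetical rows: drop the ones without a title, then normalize the rest.
rows = [{"title": "a"}, {"title": ""}, {"title": "b"}]
xform = compose(
    filtering(lambda r: r["title"]),
    mapping(lambda r: {**r, "title": r["title"].upper()}),
)
result = transduce(xform, append, [], rows)
```

Here `result` contains only the two rows with non-empty titles, each upper-cased, and no intermediate list was created between the filter and the map.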
Apparently clojure-mode (now?) handles time! like time.
Merge the existing "maybe" helpers into the code, and only add keys
when needed, instead of adding and then conditionally removing them.
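
As a rough illustration of the difference (in Python rather than the PR's Clojure, with hypothetical function and key names), the first function below adds every optional key and then strips the empty ones, while the second adds a key only when there is a value for it:

```python
def row_add_then_remove(resource):
    """Add every optional key, then build a second map without the empty ones."""
    row = {
        "type": resource["type"],
        "title": resource["title"],
        "file": resource.get("file"),
        "line": resource.get("line"),
    }
    return {k: v for k, v in row.items() if v is not None}


def row_add_when_needed(resource):
    """Add an optional key only when the resource actually has a value for it."""
    row = {"type": resource["type"], "title": resource["title"]}
    for key in ("file", "line"):
        value = resource.get(key)
        if value is not None:
            row[key] = value
    return row


with_location = {"type": "File", "title": "/etc/motd", "file": "site.pp", "line": 3}
without_location = {"type": "File", "title": "/etc/motd"}
```

Both produce the same rows; the second avoids the extra pass and the throwaway intermediate map.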
@rbrw rbrw force-pushed the pdb-5747-batch-db-inserts branch from 6fb376a to ae097c7 Compare July 16, 2024 22:00
@austb austb merged commit f5be6a1 into puppetlabs:main Jul 16, 2024
3 participants