Conversation

@msfstef
Contributor

@msfstef msfstef commented Dec 9, 2025

Closes #3451

With @magnetised's move of the handle -> shape and shape -> handle lookups into the ShapeDb abstraction, we no longer need to persist anything else in ShapeStatus.

For snapshot_started?, we can read directly from the storage, which has its own read-through cache. Since this is only called in await_snapshot_start?, we can call it there directly and get rid of all the now-unused APIs.

For latest_offset, I've taken that out of ShapeStatus; instead we read it from storage through the same read-through cache. The ShapeCache APIs are responsible for collating the shape handles found in the indices with an offset from the storage, to maintain the same outward-facing API behaviour.

With this change, ShapeStatus neither recovers any of the metadata (getting rid of 66% of reads on a cold restore) nor persists anything other than the lookups. Everything else is populated upon recovery.

I've gotten rid of all unused APIs and calls in the consumer and shape status, and I took the chance to do some API renaming for consistent behaviour (e.g. fetch_handle_by_shape and fetch_shape_by_handle, both of which return {:ok, res} or :error).

I've also split Storage.get_current_position into Storage.get_latest_offset and Storage.get_pg_snapshot to avoid doing two reads when only one is needed, since in most places we need either one or the other. Only the consumer on initialization needs both, and it can simply call both APIs with the same performance.
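
Roughly, the split looks like this (a sketch only; the delegation through a {module, opts} storage tuple is an assumption, not copied from the actual module):

defmodule Storage do
  # before: get_current_position/1 returned both values, forcing two reads
  # even when the caller only needed one of them
  def get_latest_offset({mod, shape_opts}), do: mod.get_latest_offset(shape_opts)
  def get_pg_snapshot({mod, shape_opts}), do: mod.get_pg_snapshot(shape_opts)
end

# only the consumer needs both, on initialization:
#   latest_offset = Storage.get_latest_offset(storage)
#   pg_snapshot = Storage.get_pg_snapshot(storage)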

I've also changed most ETS tables from ordered_set to set (basically everything except the relation lookups), since most of them don't do any ordered/range lookups.
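
For reference, the distinction at the :ets level (a generic example, not Electric's actual tables):

# ordered_set keeps keys sorted, which only pays off when you need ordered
# or range traversal (e.g. :ets.next/2 or matching on a key range)
_relations = :ets.new(:relation_lookup, [:ordered_set, :public])

# set is a plain hash table: cheaper point lookups and inserts, no ordering
handles = :ets.new(:handle_lookup, [:set, :public])
:ets.insert(handles, {"shape-handle-1", :some_shape})
[{"shape-handle-1", :some_shape}] = :ets.lookup(handles, "shape-handle-1")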

If I have time I will also benchmark cold shape recovery for 50k shapes with this change.

@msfstef msfstef requested review from alco and magnetised December 9, 2025 17:32
@codecov

codecov bot commented Dec 9, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.13%. Comparing base (850ad3d) to head (fe0d5c8).
⚠️ Report is 4 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3572   +/-   ##
=======================================
  Coverage   88.13%   88.13%           
=======================================
  Files          18       18           
  Lines        1643     1643           
  Branches      409      412    +3     
=======================================
  Hits         1448     1448           
  Misses        193      193           
  Partials        2        2           
Flag Coverage Δ
packages/experimental 87.73% <ø> (ø)
packages/react-hooks 86.48% <ø> (ø)
packages/typescript-client 93.76% <ø> (ø)
packages/y-electric 55.66% <ø> (ø)
typescript 88.13% <ø> (ø)
unit-tests 88.13% <ø> (ø)


@msfstef msfstef force-pushed the msfstef/use-storage-directly-for-metadata branch from 20001a4 to 8ee5277 on December 9, 2025 17:34
@msfstef msfstef self-assigned this Dec 9, 2025
@msfstef
Contributor Author

msfstef commented Dec 10, 2025

benchmark this

@msfstef
Contributor Author

msfstef commented Dec 10, 2025

One thing I figured out is that the read-through cache is not actually active if the shape consumer is not "alive", because when a consumer is shut down we also take down the stack ETS entry:

def terminate(writer_state(opts: opts) = state) do
  close_all_files(state)
  delete_shape_ets_entry(opts.stack_id, opts.shape_handle)
end

defp terminate_writer(state) do
  {writer, state} = Map.pop(state, :writer)

  try do
    if writer, do: ShapeCache.Storage.terminate(writer)
  rescue
    # In the case of shape removal, the deletion of the storage directory
    # may happen before we have a chance to terminate the storage
    File.Error -> :ok
  end

  state
end

So we somehow need to keep this read-through cache alive, or use a different one, since this one is also not alive if the consumer is never started (which would be common with lazy initialization). Open to ideas; ideally I would like to have a single cache rather than a separate one. Perhaps upon recovering shapes we could do an initialization on them that doesn't necessarily start a whole process?

@magnetised would like to hear your thoughts

Contributor

@magnetised magnetised left a comment

Very nice. Feels like we're working with the grain of this thing

end

defp mark_snapshot_started(%State{stack_id: stack_id, shape_handle: shape_handle} = state) do
  :ok = ShapeCache.ShapeStatus.mark_snapshot_as_started(stack_id, shape_handle)
Contributor

re our conversation yesterday about giving shape db some kind of validity flag, is it worth keeping this?

Contributor Author

I think not, because the validity will rely on a snapshot_finished? rather than this snapshot-started flag, so it needs to be implemented in a different way anyway.

ctx do
  expect_shape_status(
    set_latest_offset: fn _, @shape_handle1, _ ->
      raise "The unexpected error"
Contributor

could we keep this test but just move the crash onto a function that is called?

Contributor Author

I tried but couldn't find something that made sense - we are literally crashing the process in the next test (which is exactly what a raise does: it sends an exit signal to itself, AFAIK), so I figured this is unnecessary coverage

@magnetised
Contributor

benchmark this

@magnetised
Contributor

So we somehow need to keep this read-through cache alive, or use a different one, since this one is also not alive if the consumer is never started (which would be common with lazy initialization). Open to ideas; ideally I would like to have a single cache rather than a separate one. Perhaps upon recovering shapes we could do an initialization on them that doesn't necessarily start a whole process?

Good catch. We need some stack-level equivalent to the shape status ets for the storage. We have the Storage.stack_start_link/1 function so that shouldn't be too hard to wrangle in
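
Something along these lines could work (a sketch; StackStorageCache, its table name, and the key layout are all hypothetical, only the stack-level ETS idea is from the discussion):

defmodule StackStorageCache do
  # hypothetical stack-scoped read-through cache for storage metadata;
  # created once per stack (e.g. from Storage.stack_start_link/1) so it
  # survives individual shape consumers shutting down
  def create(stack_id),
    do: :ets.new(table(stack_id), [:set, :public, :named_table])

  def get(stack_id, shape_handle, key, read_from_disk) do
    case :ets.lookup(table(stack_id), {shape_handle, key}) do
      [{_, value}] ->
        value

      [] ->
        # cache miss: fall back to disk and remember the value
        value = read_from_disk.()
        :ets.insert(table(stack_id), {{shape_handle, key}, value})
        value
    end
  end

  defp table(stack_id), do: :"storage_stack_cache_#{stack_id}"
end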

@msfstef msfstef force-pushed the msfstef/use-storage-directly-for-metadata branch from 8ee5277 to 1f87aca on December 11, 2025 09:20
@electric-sql electric-sql deleted a comment from github-actions bot Dec 11, 2025
@msfstef
Contributor Author

msfstef commented Dec 11, 2025

benchmark this

@msfstef msfstef requested a review from magnetised December 11, 2025 14:07
@msfstef
Contributor Author

msfstef commented Dec 11, 2025

@magnetised I've added a read-through cache on the storage layer so that readers read directly from memory when the shape is "sleeping"

@github-actions
Contributor

github-actions bot commented Dec 11, 2025

Benchmark results, triggered for 43ec0

  • write fanout completed

write fanout results

  • unrelated shapes one client latency completed

unrelated shapes one client latency results

  • many shapes one client latency completed

many shapes one client latency results

  • concurrent shape creation completed

concurrent shape creation results

  • diverse shape fanout completed

diverse shape fanout results

Contributor

@magnetised magnetised left a comment

Looks good, though I'm confused why we're only reading from the read-through cache when the writer is inactive. Feels messy to have two modes. Would it be horrible to remove the split?

end
end

defp delete_shape_ets_read_through_cache_entry(stack_id, shape_handle) do
Contributor

nit: I'm all for clear function names but this is a little verbose, no?

Contributor Author

very verbose - but couldn't think of a better name that retains the clarity

WriteLoop.cached_chunk_boundaries(initial_acc)
)

# remove any existing read-through cache now that writer is active
Contributor

because?

Contributor Author

the read-through cache is now unnecessary, and will also be outdated, so I'd rather remove it than leave outdated data hanging in memory

@msfstef
Contributor Author

msfstef commented Dec 11, 2025

Looks good, though I'm confused why we're only reading from the read-through cache when the writer is inactive. Feels messy to have two modes. Would it be horrible to remove the split?

@magnetised the read-through cache is a cache that gets populated on read - the storage_meta cache is a purely write-populated cache. I was slightly worried about any potential conflicts, but that's easily resolvable.

Because storage_meta stores a bunch of things that are only relevant when a writer is active, I wasn't sure if it was reasonable to keep the whole thing around for as long as the shape itself is around. I think you're right that we can consolidate the two; I suppose I was trying not to "touch" the main implementation too much while still implementing the desired behaviour.

If you're willing to have a go at consolidating them I'd be happy to accept some help on this. We could also merge this and do it in a separate PR (?)

Member

@alco alco left a comment

Lovely!

@magnetised
Contributor

@msfstef I switched things so that some keys are always written to the stack cache, even if there's a writer active. Simplifies things at the expense of splitting reads/writes across two tables.

Contributor Author

@msfstef msfstef left a comment

@magnetised love the change - had some comments about naming that makes it very hard to decode what the intent was, and questions around what the existing ETS storage does now that it is not getting updated for these keys

) do
  if old_last_persisted_txn_offset != last_persisted_txn_offset do
    write_metadata!(opts, :last_persisted_txn_offset, last_persisted_txn_offset)
    write_cached_metadata!(opts, :last_persisted_txn_offset, last_persisted_txn_offset)
Contributor Author

Not sure how this affects performance; I had decided to leave it as is purely to avoid doing additional roundtrips to ETS, as there seemed to have been an intentional effort to avoid them.

This call is on the hot path so it's an important thing to note.

Contributor

this last_persisted_txn_offset is available to the read path, so if we don't do a write-through here then reads will go to the disk (or will be stale).
also note that the original does a write to the disk here, then a write to ets immediately below, so there's not a huge difference (apart from having to split the writes between tables, so not being able to issue a single update_element call).

do: Postgrex.query!(ctx.db_conn, stmt)

Process.sleep(100)
Process.sleep(120)
Contributor Author

we could alternatively just have a helper, assert_with_timeout, that retries the assertion for a maximum of 500ms
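
Something like the following could do it (a hypothetical test helper, not existing code; the module name and publication_updated?/1 are made up for illustration):

defmodule Support.AssertWithTimeout do
  # retry the assertions in `fun` until they pass or the timeout elapses
  def assert_with_timeout(fun, timeout \\ 500, interval \\ 20) do
    fun.()
  rescue
    error in [ExUnit.AssertionError] ->
      if timeout > 0 do
        Process.sleep(interval)
        assert_with_timeout(fun, timeout - interval, interval)
      else
        reraise error, __STACKTRACE__
      end
  end
end

# instead of Process.sleep(120) followed by a plain assertion:
#   assert_with_timeout(fn -> assert publication_updated?(ctx) end)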

]

# Record that caches read-through metadata to avoid disk reads
defrecord :storage_meta_stack,
Contributor Author

I have to admit the name storage_meta_stack, as opposed to storage_meta, is a bit confusing - what was the reasoning for this name?

Contributor

not sure what you mean by "as opposed to" here. I renamed it from storage_meta_read_through to storage_meta_stack - the reasoning is that these are the values maintained at the stack level, not by the writer.

I'll admit that the naming isn't 100%. perhaps storage_meta_global or something

Contributor Author

That's what I figured; the confusing aspect is that storage_meta is also stored in what we call the "stack ets" - so the whole thing is a bit hard to decode

a reference to it being a cache record (similar to how you renamed the ets to a cache table) might work?

end

defp write_metadata_cache(%__MODULE__{} = opts, key, value)
     when key in @stack_cached_keys do
Contributor Author

if we're always writing to this, and not the other, should the other not be storing them in ETS at all? This looks like a good change but I'm confused that it's leaving "code residue" in the existing ETS that's a bit hard to decode

Contributor

the existing ets is still used for everything that isn't made available for the read path. also, what do you mean by "the other" here? wanna be sure what you mean.

Contributor Author

by "the other" I do mean the original stack ets - basically I'm asking, given that we're storing the same keys in two different ETS tables and records, how are we ensuring they are never out of sync?

@magnetised magnetised force-pushed the msfstef/use-storage-directly-for-metadata branch from 6f73ec4 to 32829d4 on December 17, 2025 16:25
@magnetised
Contributor

benchmark this

@github-actions
Contributor

github-actions bot commented Dec 17, 2025

Benchmark results, triggered for 32829

  • write fanout completed

write fanout results

  • unrelated shapes one client latency completed

unrelated shapes one client latency results

  • many shapes one client latency completed

many shapes one client latency results

  • concurrent shape creation completed

concurrent shape creation results

  • diverse shape fanout completed

diverse shape fanout results

@magnetised
Contributor

magnetised commented Dec 18, 2025

@msfstef this now uses the stack ets for the read cache and write cache, i've merged it into this commit:

fe0d5c8 (#3572)

the read path is careful about initializing the cache from disk -- there's a potential race with the writer initialising, so it uses insert_new; the write path isn't - it just initializes the cached values because it knows that it's in charge. rather than deleting the cached values when the writer terminates, it just nils the cached values that aren't needed for the write path.
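
For illustration, the difference between the two paths looks roughly like this (table, key, and offset values are made up; only the insert_new vs insert distinction is from the comment above):

# a toy stack-level cache table (names and values are illustrative only)
stack_ets = :ets.new(:storage_meta_stack_cache, [:set, :public])
shape_handle = "shape-1"

# read path: may race with a writer initializing the same keys, so it only
# fills the cache if nothing is there yet
offset_read_from_disk = {0, 0}
:ets.insert_new(stack_ets, {{shape_handle, :latest_offset}, offset_read_from_disk})

# write path: the writer owns these keys and can overwrite unconditionally
new_offset = {100, 0}
:ets.insert(stack_ets, {{shape_handle, :latest_offset}, new_offset})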

i've looked at cleanup as well and decided to always explicitly delete the writer's ets table at the cost of a few extra file writes.

@magnetised magnetised force-pushed the msfstef/use-storage-directly-for-metadata branch from 28d702d to fe0d5c8 on December 18, 2025 10:03
Contributor Author

@msfstef msfstef left a comment

Beautiful, much better @magnetised

My only question is around populating the read-through cache - it seems that if it is missing, we populate it in its entirety with all read-path keys (rather than lazily populating each key) - was this deliberate? Why not load only the desired key with actual data and the rest with placeholders?

:removed -> :ok
end
# always need to terminate writer to remove the writer ets (which belongs
# to this process). leads to unecessary writes in the case of a deleted
Contributor Author

to clarify - by the unnecessary writes you mean that after the shape has been marked for removal, it might receive some events that it will write to the log? in a future PR we could probably just fix that, since we mark the removal anyway

otherwise the terminate_writer technically just closes files (which is a no-op if they were already closed)

Contributor

a notionally deleted shape may have some buffered writes that the Storage.terminate call will flush to disk. we could save those writes, which is what I was doing by deleting the writer, but in the end I think it's worth knowing that the cleanup will always execute.

defp populate_read_through_cache!(%__MODULE__{} = opts, extra_keys) do
  %{shape_handle: handle, stack_ets: stack_ets} = opts

  read_keys = Enum.into(extra_keys, MapSet.new(@read_path_keys))
Contributor Author

so to clarify - if there's no active "cache", then any request for a single key will trigger as many reads as there are @read_path_keys? I thought the idea was to populate it only with the desired key, with everything else getting a :not_cached placeholder

Unless the thought here is that if someone is doing a read, they will probably need all the read path keys anyway, so we might as well load them all up front. Would appreciate knowing what the logic is

Contributor

Unless the thought here is that if someone is doing a read, they will probably need all the read path keys anyway, so we might as well load them all up front

That's the logic -- there is no partial read: if you need one value you'll need them all.

@magnetised magnetised merged commit 3c88770 into main Dec 18, 2025
56 of 57 checks passed
@magnetised magnetised deleted the msfstef/use-storage-directly-for-metadata branch December 18, 2025 10:48
@github-actions
Contributor

This PR has been released! 🚀

The following packages include changes from this PR:

  • @core/sync-service@1.2.10

Thanks for contributing to Electric!
