[data ingestion] change remote reader implementation #16469
Conversation
```rust
if self.remote_fetcher_receiver.is_none() {
    self.remote_fetcher_receiver = Some(self.start_remote_fetcher());
}
Ok(checkpoints)
```

```rust
while !self.exceeds_capacity(self.current_checkpoint_number + checkpoints.len() as u64) {
```
I'm not sure this does exactly what you expect: you may have some configured "limit", but the underlying fetcher disregards it and will always try to fetch and enqueue `batch_size * 2` checkpoints, one `batch_size` due to the buffered stream and one `batch_size` due to the size of the channel. I suppose if this is sufficiently smaller than `MAX_CHECKPOINTS_IN_PROGRESS` this could be fine, but it seems wasteful.
That's a totally different check, which gives the overall guarantee that the system doesn't have more than `MAX_CHECKPOINTS_IN_PROGRESS` tasks in progress. Here `batch_size << MAX_CHECKPOINTS_IN_PROGRESS`; to give specific numbers, it's currently 100 vs 10_000.
To elaborate: this check is required so we don't attempt to read a message from the remote fetcher actor that we are not allowed to process at the moment.
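The gating condition being discussed can be sketched as follows. This is a hypothetical, simplified model: the field names and the `exceeds_capacity` signature mirror the snippet above, but the internals and the use of `MAX_CHECKPOINTS_IN_PROGRESS = 10_000` are assumptions based on the numbers quoted in the thread:

```rust
// Assumed limit, taken from the "100 vs 10_000" comparison in the thread.
const MAX_CHECKPOINTS_IN_PROGRESS: u64 = 10_000;

struct Reader {
    current_checkpoint_number: u64,
    processed_checkpoint_number: u64,
}

impl Reader {
    // Hypothetical implementation: true if accepting work up to
    // `checkpoint_number` would exceed the global in-flight limit.
    fn exceeds_capacity(&self, checkpoint_number: u64) -> bool {
        checkpoint_number - self.processed_checkpoint_number > MAX_CHECKPOINTS_IN_PROGRESS
    }
}

fn main() {
    let r = Reader {
        current_checkpoint_number: 100,
        processed_checkpoint_number: 0,
    };
    // 100 checkpoints in flight is far below the cap, so draining continues.
    assert!(!r.exceeds_capacity(r.current_checkpoint_number + 1));
    // A jump past the cap would stop the drain loop.
    assert!(r.exceeds_capacity(10_100));
}
```

The `while !self.exceeds_capacity(...)` loop simply stops pulling from the fetcher channel once this returns true.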
```rust
Ok(Ok(checkpoint)) => checkpoints.push(checkpoint),
Ok(Err(err)) => {
    info!("remote reader transient error {:?}", err);
    self.remote_fetcher_receiver = None;
    return checkpoints;
}
Err(_) => break,
```
There seem to be two different error conditions here that probably need to be handled differently. The first, `Ok(Err(err))`, appears to be a fetch that errored. `Err(_)` covers two cases: one where the channel is empty and we just need to wait some time until the other side can fill it, and disconnected, where the channel will never have any more data.
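The two `Err(_)` cases can be seen directly in a minimal example. This sketch uses std's `mpsc` channel for illustration; the PR's channel type may differ, but async channels expose the same `Empty`/`Disconnected` distinction on a non-blocking receive:

```rust
use std::sync::mpsc::{self, TryRecvError};

fn main() {
    let (tx, rx) = mpsc::channel::<u64>();
    tx.send(1).unwrap();
    assert_eq!(rx.try_recv(), Ok(1));
    // Channel is empty, but the sender is still alive: data may come later.
    assert_eq!(rx.try_recv(), Err(TryRecvError::Empty));
    drop(tx);
    // Sender dropped: the channel can never yield more data.
    assert_eq!(rx.try_recv(), Err(TryRecvError::Disconnected));
}
```

Matching `Err(_)` collapses these two outcomes, which is exactly the concern raised above.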
Yeah, `Ok(Err(err))` is a fetch that errored. It's ok to just log it and move on: `self.remote_fetcher_receiver` is getting canceled further down in the block and will get respawned on the next tick. The other `break` clause was written specifically for the `Empty` case (i.e. move on until the next tick so the fetcher can accumulate more tasks). So yeah, `Disconnected` is probably not handled properly, although I'm not sure how it would get triggered without an error in the corresponding task. But I will add an explicit handler.
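An explicit handler along the lines discussed could look like the sketch below. This is a hypothetical simplification, not the PR's actual code: it uses std's `mpsc` instead of an async channel, `u64` stands in for the real checkpoint type, and `eprintln!` stands in for `info!`:

```rust
use std::sync::mpsc::{Receiver, TryRecvError};

struct Fetcher {
    remote_fetcher_receiver: Option<Receiver<Result<u64, String>>>,
}

impl Fetcher {
    fn drain(&mut self) -> Vec<u64> {
        let mut checkpoints = Vec::new();
        let mut reset = false;
        if let Some(rx) = &self.remote_fetcher_receiver {
            loop {
                match rx.try_recv() {
                    Ok(Ok(checkpoint)) => checkpoints.push(checkpoint),
                    Ok(Err(err)) => {
                        // Transient fetch error: log it, cancel the receiver
                        // so the fetcher is respawned on the next tick.
                        eprintln!("remote reader transient error {:?}", err);
                        reset = true;
                        break;
                    }
                    // Nothing buffered right now; retry on the next tick.
                    Err(TryRecvError::Empty) => break,
                    // Fetcher task is gone; reset so it gets respawned.
                    Err(TryRecvError::Disconnected) => {
                        reset = true;
                        break;
                    }
                }
            }
        }
        if reset {
            self.remote_fetcher_receiver = None;
        }
        checkpoints
    }
}

fn main() {
    let (tx, rx) = std::sync::mpsc::channel();
    tx.send(Ok::<u64, String>(1)).unwrap();
    tx.send(Ok(2)).unwrap();
    drop(tx); // simulate the fetcher task dying
    let mut fetcher = Fetcher { remote_fetcher_receiver: Some(rx) };
    assert_eq!(fetcher.drain(), vec![1, 2]);
    // Disconnected was observed, so the receiver was reset for respawn.
    assert!(fetcher.remote_fetcher_receiver.is_none());
}
```

The key difference from the original snippet is that `Empty` and `Disconnected` now take different paths: `Empty` simply waits for the next tick, while `Disconnected` resets the receiver so the fetcher is restarted.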
This PR makes the remote fetch implementation more similar to the indexer implementation.