parachain informant #8332


Merged
merged 31 commits into master from sistemd/parachain-informant on May 14, 2025

Conversation

@sistemd (Contributor) commented Apr 24, 2025

Closes #8216.

@sistemd sistemd requested review from skunert and iulianbarbu April 24, 2025 22:57
@alindima (Contributor):

I think this has an overlap with #8230, which adds several similar metrics and logs many of the same events.

@sandreim (Contributor):

> I think this has an overlap with #8230, which adds several similar metrics and logs many of the same events.

Yes, there is indeed some overlap, but it seems to focus on other metrics.

@sistemd thanks for the PR, please also add an unincluded segment size metric. You could also check if we already have another metric for block authorship duration; if we don't, we should add one. We already have the log for it.

@skunert (Contributor) commented Apr 25, 2025

Ah, I was not aware of the overlap. However, the scope seems slightly different. In the parachain informant we can also give a bit more detailed info, since we decode the header.

Otherwise this is already looking pretty good; some points:

  • The log messages should be a bit more sparse; we can put both the backed and included block in one message, together with the relay parent. Maybe something like Parachain status changed at #relay_num (0xrelay_block): backed #x (0xbacked) included #y (0xincluded).
  • The parachain informant itself should live in the cumulus-client-service crate. This is the crate where we have the components that are instantiated during node construction. Then we can also instantiate it in cumulus-test-service and the parachain-template nodes (this should be added to this PR).
  • The informant should then not depend on cumulus-client-pov-recovery. IMO transforming the stream of relay blocks into a stream of backed blocks and a stream of finalized blocks can be done in some utility crate, which is then used by the previous users and the informant. I looked around a bit but nothing stuck out. Maybe we should create cumulus-client-relay-chain-streams or something (see the sketch after this list).
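For illustration, a hedged sketch of what such a utility's core could look like, turning a stream of imported relay chain blocks into a stream of backed parachain blocks. The types and the `backed_candidates_at` helper are hypothetical placeholders, not the actual cumulus API:

```rust
use futures::{stream::Stream, StreamExt};

#[derive(Clone)]
struct RelayBlock {
    number: u32,
}

struct ParaHeader {
    number: u32,
}

// Hypothetical helper: look up which para blocks were backed at this relay
// block. A real implementation would query the relay chain interface.
fn backed_candidates_at(_relay: &RelayBlock) -> Vec<ParaHeader> {
    Vec::new() // placeholder
}

/// Flatten relay chain block imports into individual backed-block events,
/// pairing each backed para block with the relay block where it was backed.
fn backed_blocks(
    relay_blocks: impl Stream<Item = RelayBlock>,
) -> impl Stream<Item = (ParaHeader, RelayBlock)> {
    relay_blocks.flat_map(|relay| {
        let pairs: Vec<_> = backed_candidates_at(&relay)
            .into_iter()
            .map(|backed| (backed, relay.clone()))
            .collect();
        futures::stream::iter(pairs)
    })
}
```

The same shape would work for the included and finalized streams, so both the informant and cumulus-client-pov-recovery could consume them.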

@iulianbarbu (Contributor) left a comment

I don't have much context on this, but I am catching up as I go. Nothing outstanding from me, just a few comments. Looking forward to the rest of the changes.

@sistemd sistemd marked this pull request as ready for review April 27, 2025 20:43
@sistemd (Contributor, Author) commented Apr 27, 2025

@skunert (in the order of your points above):

@sistemd (Contributor, Author) commented Apr 27, 2025

> I think this has an overlap with #8230, which adds several similar metrics and logs many of the same events.
>
> Yes, there is indeed some overlap, but it seems to focus on other metrics.
>
> @sistemd thanks for the PR, please also add an unincluded segment size metric. You could also check if we already have another metric for block authorship duration; if we don't, we should add one. We already have the log for it.

@sandreim Does this make sense? I'm not sure if those are the right metrics, but hopefully it's on the right track.
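For reference, a minimal sketch of how two such metrics could be registered with substrate-prometheus-endpoint; the metric names and the struct are illustrative assumptions, not necessarily what this PR implements:

```rust
use substrate_prometheus_endpoint::{
    register, Gauge, Histogram, HistogramOpts, PrometheusError, Registry, U64,
};

struct InformantMetrics {
    unincluded_segment_size: Gauge<U64>,
    authorship_duration: Histogram,
}

impl InformantMetrics {
    fn register(registry: &Registry) -> Result<Self, PrometheusError> {
        Ok(Self {
            // Gauge: current number of para blocks in the unincluded segment.
            unincluded_segment_size: register(
                Gauge::new(
                    "parachain_unincluded_segment_size",
                    "Number of para blocks in the unincluded segment",
                )?,
                registry,
            )?,
            // Histogram: how long block authorship took, in seconds.
            authorship_duration: register(
                Histogram::with_opts(HistogramOpts::new(
                    "parachain_block_authorship_duration_seconds",
                    "Time spent authoring a parachain block",
                ))?,
                registry,
            )?,
        })
    }
}
```

A gauge suits the unincluded segment size (a level that rises and falls), while a histogram captures the distribution of authorship durations.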

@sistemd (Contributor, Author) commented Apr 29, 2025

Sorry for re-requesting your review @skunert, I haven't addressed your comments above yet.

@sistemd (Contributor, Author) commented Apr 29, 2025

@skunert Should be ready now 👍

loop {
    select! {
        (backed_block, relay_block) = backed_blocks.select_next_some() => {
            match &last_backed_block {
Contributor:

What if there is a new included block but nothing backed? This can happen as well. In that case we should still print the included update.

@sistemd (Author):

I did not know this was possible. In that case I think it's better to separate the included and backed logs again, because what do you log if you reach included block N at relay block 15 and after that get backed block 0 at relay block 1, for example? How are you supposed to log both at the same time in that case? It starts to get a bit confusing; I think separating them is the simplest option. 24747e7
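A self-contained sketch (assuming futures = "0.3") of the separated handling that commit describes: backed and included updates live in independent select! arms, so an included-only update still produces a log line. The channel-fed streams below are stand-ins for the real relay chain streams:

```rust
use futures::{channel::mpsc, executor::block_on, StreamExt};

struct Block {
    number: u32,
}

fn main() {
    let (backed_tx, mut backed_rx) = mpsc::unbounded::<(Block, Block)>();
    let (included_tx, mut included_rx) = mpsc::unbounded::<(Block, Block)>();

    // An included update can arrive with no matching backed update,
    // e.g. included block 7 observed at relay block 15.
    included_tx.unbounded_send((Block { number: 7 }, Block { number: 15 })).unwrap();
    backed_tx.unbounded_send((Block { number: 8 }, Block { number: 16 })).unwrap();
    drop(backed_tx);
    drop(included_tx);

    block_on(async {
        loop {
            futures::select! {
                (backed, relay) = backed_rx.select_next_some() => {
                    println!("New backed block at relay #{}: #{}", relay.number, backed.number);
                },
                (included, relay) = included_rx.select_next_some() => {
                    println!("New included block at relay #{}: #{}", relay.number, included.number);
                },
                // Both streams exhausted: stop the informant loop.
                complete => break,
            }
        }
    });
}
```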

@iulianbarbu (Contributor) left a comment

Looks pretty nice! I am wondering if you tried running a parachain with zombienet to check how the logs and Prometheus metrics show up while relay chain & parachain blocks get produced and finalized. I would just test it at the end to wrap everything up, once the open comments are resolved. LMK what you think.

Comment on lines 668 to 674
log::info!(
    "New backed block at relay block #{} ({}): #{} ({})",
    relay_block.number(),
    relay_block.hash(),
    backed_block.number(),
    backed_block.hash(),
);
Contributor:

Why not use tracing? I think that even if it is not structured, it should still work.

@sistemd (Author) commented May 7, 2025:

This crate uses log everywhere, that's why I used log as well. Maybe the whole crate should change? BTW, why is this not standard across the entire codebase? Why would you ever use log instead of tracing anyway?

@iulianbarbu (Contributor) commented May 7, 2025:

I think this codebase started long ago, when maybe tracing wasn't stabilized or not so popular (just a guess). Currently the binaries we have use a tracing-subscriber that is configured to capture logs coming from both tracing & log. I noticed the initiative is to use more of tracing (some places had recent changes from log to tracing, e.g. transaction-pool). All in all, I think the tracing ecosystem is a better fit for what we do here. I haven't used log much in the past; I find it useful mainly for small projects like simple tools that aren't oriented toward running in production (although that's certainly not the case here, and it can still be used in production).
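A minimal sketch of that setup (assuming the tracing, tracing-subscriber, tracing-log, and log crates): a single subscriber receives events from both ecosystems by routing log records through LogTracer:

```rust
use tracing::info;

fn main() {
    // Route `log` records into the `tracing` ecosystem.
    tracing_log::LogTracer::init().expect("failed to set log forwarder");

    // One formatting subscriber receives events from both sources.
    let subscriber = tracing_subscriber::fmt().finish();
    tracing::subscriber::set_global_default(subscriber)
        .expect("failed to set subscriber");

    // Both lines end up in the same output stream.
    info!("emitted via tracing");
    log::info!("emitted via log");
}
```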

Contributor:

> This crate uses log everywhere, that's why I used log as well. Maybe the whole crate should change?

Looks like it pulls in log via sc_telemetry. However, its purpose is not to provide logging utilities for consumer crates; it is meant to consume tracing-based events via the workers it defines, as far as I understand from the public docs. I think we should use sc_tracing, even if our public docs mention the crate is unstable. So yeah, maybe the whole crate needs to change, but of course that should be separate work. I guess we can live with the log crate here for a while.

Contributor:

I can resolve this once a new issue is created that tracks it (we can also add a getting-started label to the issue since it should be trivial; it can be done by you or anyone else who has time).

@sandreim (Contributor):

> I think this has an overlap with #8230, which adds several similar metrics and logs many of the same events.
>
> Yes, there is indeed some overlap, but it seems to focus on other metrics.
>
> @sistemd thanks for the PR, please also add an unincluded segment size metric. You could also check if we already have another metric for block authorship duration; if we don't, we should add one. We already have the log for it.
>
> @sandreim Does this make sense? I'm not sure if those are the right metrics, but hopefully it's on the right track.

Yeah, looks good, thx

    },
};
log::info!(
    "Candidate #{} ({}) backed with parent {}",
Contributor:

Hmm, even though we now use candidate_events to fetch the events, the semantics should not change.

You are now logging the relay_parent of the parachain block. This is not the same as the relay chain block that emitted the events.

The previous logic was correct: first assemble which events were emitted in a relay chain block, then log all of them in a single log line for the relay block which contained the events. The relay parent contained in the descriptor provides the context for validation of this parachain block, but it is not what we want to know for debugging. There we are interested in the block at which the event was emitted.
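A hypothetical sketch of that ordering: candidate events are first grouped by the relay chain block that emitted them, then logged in one line per relay block. The types and names are illustrative, not the actual cumulus API:

```rust
struct CandidateEvent {
    para_number: u32,
    para_hash: String,
}

/// Log all candidate-backed events emitted at one relay chain block in a
/// single line, keyed by that relay block rather than by each candidate's
/// relay parent (which only provides the validation context).
fn log_backed_at_relay_block(relay_number: u32, relay_hash: &str, backed: &[CandidateEvent]) {
    if backed.is_empty() {
        return;
    }
    let summary: Vec<String> = backed
        .iter()
        .map(|e| format!("#{} ({})", e.para_number, e.para_hash))
        .collect();
    log::info!(
        "Relay block #{relay_number} ({relay_hash}): backed {}",
        summary.join(", ")
    );
}
```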


@sistemd sistemd force-pushed the sistemd/parachain-informant branch from 1b595dd to 2fa4062 Compare May 9, 2025 15:01
@sistemd (Contributor, Author) commented May 9, 2025

/cmd prdoc --audience runtime_dev --audience node_operator --bump patch

github-actions bot and others added 2 commits May 9, 2025 15:08
@skunert (Contributor) left a comment

Last nits, then this is ready to merge IMO.

Please add it to the parachain template as @iulianbarbu suggested here:

pub async fn start_parachain_node(

Nice work!

@skunert (Contributor) commented May 12, 2025

> We should probably start thinking about how long omni-node and parachain-template-node will be "in sync". I would open an issue and tackle it sometime soon, where we deprecate the node in the README, set a due date for this, and clearly recommend omni-node usage (which should be 100% the same in terms of usage and features). There are other template nodes like minimal or solochain which showcase aspects of implementing a node from scratch, but parachain-template-node is mostly replaced by omni-node from what I can tell.

Hmm, I am not fully convinced that we can remove it. For sure usage should be discouraged if one can get away with omni-node. But in the end we still need a way to start a fresh parachain, and the simple node-template will not be a good starting point. People would need to look at omni-node-lib and copy their service from there.

@sistemd (Contributor, Author) commented May 12, 2025

> Last nits, then this is ready to merge IMO.
>
> Please add it to the parachain template as @iulianbarbu suggested here:
>
> pub async fn start_parachain_node(
>
> Nice work!

I think start_parachain_node calls start_relay_chain_tasks here, which in turn spawns the parachain informant. Is this all right? Am I missing something?

@iulianbarbu (Contributor):

> Last nits, then this is ready to merge IMO.
>
> Please add it to the parachain template as @iulianbarbu suggested here:
>
> pub async fn start_parachain_node(
>
> Nice work!

> I think start_parachain_node calls start_relay_chain_tasks here, which in turn spawns the parachain informant. Is this all right? Am I missing something?

Upsy, I think you are right. I looked at it wrong and thought we were adding the task to omni-node only, but now I see it is in cumulus/client/service. We should be fine.

@iulianbarbu (Contributor) left a comment

Nice work 💪! Just a few more nits.

LE: it would be good to include @skunert's suggestions before merging; CI needs some love too.

@sistemd sistemd force-pushed the sistemd/parachain-informant branch from 130d6ef to 7713c0a Compare May 13, 2025 13:29
@paritytech-workflow-stopper
All GitHub workflows were cancelled due to the failure of one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/14997912118
Failed job name: test-linux-stable

@sistemd sistemd enabled auto-merge May 14, 2025 08:59
@sistemd sistemd added this pull request to the merge queue May 14, 2025
Merged via the queue into master with commit fc3a207 May 14, 2025
247 of 249 checks passed
@sistemd sistemd deleted the sistemd/parachain-informant branch May 14, 2025 13:21
Labels
T4-runtime_API (This PR/Issue is related to runtime APIs.), T9-cumulus (This PR/Issue is related to cumulus.)
Development

Successfully merging this pull request may close these issues.

cumulus: Add better debug log output
5 participants