anlowee
Contributor

@anlowee anlowee commented Sep 3, 2025

Description

This PR adds a getter to Deserializer that returns the number of log events deserialized so far. The Velox CLP connector will use it to obtain the number of scanned log events for batch control. This is consistent with reading archives, where the total number of log events in an archive is already available.
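
To illustrate the batch-control use case, here is a minimal sketch. MockDeserializer, deserialize_next(), and read_batch() are hypothetical stand-ins invented for this example; they are not the CLP or Velox connector API. Only the getter's name and its backing counter m_next_log_event_idx come from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the IR stream Deserializer. Only the getter's name and the
// m_next_log_event_idx counter mirror this PR; the rest is assumed for illustration.
class MockDeserializer {
public:
    explicit MockDeserializer(std::vector<std::string> events) : m_events{std::move(events)} {}

    // Deserializes the next log event into `out`. Returns false at the end of the stream.
    [[nodiscard]] auto deserialize_next(std::string& out) -> bool {
        if (m_next_log_event_idx >= m_events.size()) {
            return false;
        }
        out = m_events[m_next_log_event_idx];
        ++m_next_log_event_idx;
        return true;
    }

    // The accessor added by this PR: number of log events deserialized so far.
    [[nodiscard]] auto get_num_log_events_deserialized() const -> std::size_t {
        return m_next_log_event_idx;
    }

private:
    std::vector<std::string> m_events;
    std::size_t m_next_log_event_idx{0};
};

// Reads until `batch_size` more events have been scanned or the stream ends.
// Returns the number of events scanned during this call.
auto read_batch(
        MockDeserializer& deserializer,
        std::size_t batch_size,
        std::vector<std::string>& batch
) -> std::size_t {
    auto const num_scanned_before = deserializer.get_num_log_events_deserialized();
    std::string event;
    while (deserializer.get_num_log_events_deserialized() - num_scanned_before < batch_size) {
        if (false == deserializer.deserialize_next(event)) {
            break;
        }
        batch.push_back(event);
    }
    return deserializer.get_num_log_events_deserialized() - num_scanned_before;
}
```

The caller does not keep its own counter here; it takes the difference between two readings of the getter, which is the pattern the description suggests for batch control.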

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Passed the CI.

Summary by CodeRabbit

  • New Features
    • Adds a public API to retrieve the current count of deserialized log events at runtime.
    • Exposes this metric via the public interface for easier access by clients and integrations.
    • Improves observability for tooling and dashboards to display deserialization progress and status.

@anlowee anlowee requested a review from a team as a code owner September 3, 2025 19:08
Contributor

coderabbitai bot commented Sep 3, 2025

Walkthrough

Added a new public accessor get_num_log_events_deserialized() to Deserializer in ir_stream/Deserializer.hpp, returning the internal counter m_next_log_event_idx. The method is const, [[nodiscard]], and placed after get_metadata() in the public interface.

Changes

Cohort / File(s) | Summary of changes
IR stream deserializer API: components/core/src/clp/ffi/ir_stream/Deserializer.hpp | Added public method: [[nodiscard]] auto get_num_log_events_deserialized() const -> size_t, returning m_next_log_event_idx.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes


📥 Commits

Reviewing files that changed from the base of the PR and between bb7f3d7 and a8ae8fc.

📒 Files selected for processing (1)
  • components/core/src/clp/ffi/ir_stream/Deserializer.hpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}

⚙️ CodeRabbit configuration file

  • Prefer false == <expression> rather than !<expression>.

Files:

  • components/core/src/clp/ffi/ir_stream/Deserializer.hpp
🔇 Additional comments (1)
components/core/src/clp/ffi/ir_stream/Deserializer.hpp (1)

165-167: Accessor implementation LGTM

Const, [[nodiscard]], and placement in the public API look consistent with existing style.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📥 Commits

Reviewing files that changed from the base of the PR and between de378af and bb7f3d7.

📒 Files selected for processing (1)
  • components/core/src/clp/ffi/ir_stream/Deserializer.hpp (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
  • GitHub Check: musllinux_1_2-x86_64-static-linked-bins
  • GitHub Check: centos-stream-9-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-static-linked-bins
  • GitHub Check: centos-stream-9-static-linked-bins
  • GitHub Check: ubuntu-jammy-lint
  • GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
  • GitHub Check: ubuntu-jammy-dynamic-linked-bins
  • GitHub Check: lint-check (ubuntu-24.04)
  • GitHub Check: build-macos (macos-15, false)

Comment on lines 161 to 167
    /**
     * @return The number of deserialized log events so far.
     */
    [[nodiscard]] auto get_num_log_events_deserialized() const -> size_t {
        return m_next_log_event_idx;
    }

🧹 Nitpick (assertive)

Accessor implementation LGTM; clarify semantics in docstring

The getter correctly exposes the running count via m_next_log_event_idx and matches the “scanned/deserialized” semantics: the counter is incremented before query filtering (lines 316–318). Consider tightening the comment to remove ambiguity about query filtering and 0-based indexing.

-    /**
-     * @return The number of deserialized log events so far.
-     */
+    /**
+     * @return The number of log events successfully deserialized (scanned) so far,
+     *         regardless of query filtering. This equals the next 0-based log event index.
+     */
🤖 Prompt for AI Agents
In components/core/src/clp/ffi/ir_stream/Deserializer.hpp around lines 161 to
167, tighten the getter docstring to state explicitly that
get_num_log_events_deserialized() returns the total number of log events
deserialized so far (equal to m_next_log_event_idx), that m_next_log_event_idx
is incremented before any query filtering so the count includes events that may
later be filtered out, and that the value is a size_t count that starts at 0 and
equals the next 0-based log event index; update the comment accordingly to
remove ambiguity about indexing and filtering semantics.
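
The semantics described in this review comment can be sketched with a small mock. This is an illustration under stated assumptions, not the actual Deserializer code: FilteringMockDeserializer, its Filter type, and deserialize_next() are hypothetical. The only behaviour taken from the review is that the counter advances before the query filter is applied, so the count includes filtered-out events and equals the next 0-based log event index.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mock: a deserializer with a query filter applied after counting.
class FilteringMockDeserializer {
public:
    using Filter = std::function<bool(std::string const&)>;

    FilteringMockDeserializer(std::vector<std::string> events, Filter filter)
            : m_events{std::move(events)},
              m_filter{std::move(filter)} {}

    // Deserializes one event. Returns its 0-based index if it matches the filter, or
    // std::nullopt if it was filtered out or the stream has ended.
    [[nodiscard]] auto deserialize_next() -> std::optional<std::size_t> {
        if (m_next_log_event_idx >= m_events.size()) {
            return std::nullopt;
        }
        auto const idx = m_next_log_event_idx;
        // The counter advances before the filter runs, as the review describes.
        ++m_next_log_event_idx;
        if (false == m_filter(m_events[idx])) {
            return std::nullopt;
        }
        return idx;
    }

    // Counts every event scanned so far, whether or not it matched the filter.
    [[nodiscard]] auto get_num_log_events_deserialized() const -> std::size_t {
        return m_next_log_event_idx;
    }

private:
    std::vector<std::string> m_events;
    Filter m_filter;
    std::size_t m_next_log_event_idx{0};
};
```

With three events of which only the second matches, the getter reports 1, 2, and 3 after successive calls even though only one event is returned to the caller.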

Member

@LinZhihao-723 LinZhihao-723 left a comment

Otherwise lgtm.

Co-authored-by: Lin Zhihao <59785146+LinZhihao-723@users.noreply.github.com>
Member

@LinZhihao-723 LinZhihao-723 left a comment

For the PR title, how about:

feat(kv-ir): Add support for getting the number of log events read from the deserializer.

@junhaoliao
Member

@coderabbitai y-scope/clp-ffi-js can be updated to replace the implementation of StructuredIrStreamReader::get_num_events_buffered() with this method. Currently, StructuredIrStreamReader has to maintain its own counter.

create an issue in y-scope/clp-ffi-js to track the effort to update the implementation

Contributor

coderabbitai bot commented Sep 3, 2025

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Create an issue in y-scope/clp-ffi-js to track updating StructuredIrStreamReader implementation
gh issue create \
  --repo y-scope/clp-ffi-js \
  --title "Update StructuredIrStreamReader to use Deserializer::get_num_log_events_deserialized()" \
  --body "## Description

The CLP core library has added a new getter method \`get_num_log_events_deserialized()\` to the Deserializer class (see y-scope/clp#1282). This method returns the number of deserialized log events so far.

Currently, \`StructuredIrStreamReader::get_num_events_buffered()\` maintains its own counter to track this information. This implementation can be updated to use the new getter method from the Deserializer class instead of maintaining a separate counter.

## Tasks

- [ ] Update \`StructuredIrStreamReader::get_num_events_buffered()\` to use \`Deserializer::get_num_log_events_deserialized()\`
- [ ] Remove the separate counter maintained in StructuredIrStreamReader
- [ ] Test the updated implementation

## References

- Original PR: https://github.com/y-scope/clp/pull/1282
- Comment: https://github.com/y-scope/clp/pull/1282#discussion_r1895471937
- Requested by: @junhaoliao" \
  --assignee junhaoliao

Length of output: 1114


@junhaoliao I've created issue y-scope/clp-ffi-js#111 in the y-scope/clp-ffi-js repository to track updating the StructuredIrStreamReader::get_num_events_buffered() implementation to use the new Deserializer::get_num_log_events_deserialized() method.

The issue includes:

  • Description of the new getter method from this PR
  • Tasks to update the implementation and remove the separate counter
  • References to this PR and your comment
  • Assignment to you for tracking

This will help consolidate the event counting logic and eliminate the need for StructuredIrStreamReader to maintain its own counter.
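
A rough sketch of that consolidation follows. It does not reflect the actual clp-ffi-js code: both reader classes, read_all(), and MockDeserializer are hypothetical, and whether "events buffered" matches "events deserialized" exactly in StructuredIrStreamReader is an assumption that would need to be checked in that repository.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal deserializer exposing the getter added in this PR.
class MockDeserializer {
public:
    explicit MockDeserializer(std::size_t num_events) : m_num_events{num_events} {}

    // Consumes one event. Returns false once the stream is exhausted.
    [[nodiscard]] auto deserialize_next() -> bool {
        if (m_next_log_event_idx >= m_num_events) {
            return false;
        }
        ++m_next_log_event_idx;
        return true;
    }

    [[nodiscard]] auto get_num_log_events_deserialized() const -> std::size_t {
        return m_next_log_event_idx;
    }

private:
    std::size_t m_num_events;
    std::size_t m_next_log_event_idx{0};
};

// Before (assumed shape): the wrapper keeps a second counter in sync by hand.
class ReaderWithOwnCounter {
public:
    explicit ReaderWithOwnCounter(std::size_t num_events) : m_deserializer{num_events} {}

    auto read_all() -> void {
        while (m_deserializer.deserialize_next()) {
            ++m_num_events_buffered;
        }
    }

    [[nodiscard]] auto get_num_events_buffered() const -> std::size_t {
        return m_num_events_buffered;
    }

private:
    MockDeserializer m_deserializer;
    std::size_t m_num_events_buffered{0};
};

// After (assumed shape): the wrapper delegates to the deserializer's getter.
class ReaderUsingGetter {
public:
    explicit ReaderUsingGetter(std::size_t num_events) : m_deserializer{num_events} {}

    auto read_all() -> void {
        while (m_deserializer.deserialize_next()) {
        }
    }

    [[nodiscard]] auto get_num_events_buffered() const -> std::size_t {
        return m_deserializer.get_num_log_events_deserialized();
    }

private:
    MockDeserializer m_deserializer;
};
```

Delegating removes the risk of the two counters drifting apart, at the cost of tying the wrapper's reported value to the deserializer's definition of "deserialized".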

🐰✨ (thoughtful - adding useful tracking for future improvements)

@anlowee anlowee changed the title from "feat: Add a getter for IR deserializer to get the number of deserialized log events." to "feat(kv-ir): Add support for getting the number of log events read from the deserializer." Sep 3, 2025
@anlowee anlowee merged commit 0de99a9 into y-scope:main Sep 3, 2025
26 checks passed
@anlowee anlowee deleted the xwei/add-logevents branch September 3, 2025 21:23