invalid lso fix #29112
Conversation
Pull request overview
This PR fixes an issue where rm_stm incorrectly initializes the last stable offset (LSO) to -1 instead of 0, which is the proper value for an invalid LSO. Since invalid LSO (0) is used throughout the codebase for error translation into retry-able consumer errors, this incorrect initialization prevents proper error handling.
Key Changes
- Updated `rm_stm::last_stable_offset()` to initialize the LSO with `model::invalid_lso` instead of -1
- Updated the `source_partition_offsets::last_stable_offset` default value to use `model::offset_cast(model::invalid_lso)` instead of -1 (see the sketch just below)
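A minimal sketch of the second change, trimmed down for illustration; the field name comes from the PR summary, while the member type and the rest of the struct are assumptions rather than the actual contents of `api_types.h`:

```cpp
// Hypothetical, trimmed-down sketch of the struct touched in
// src/v/kafka/client/direct_consumer/api_types.h; the real struct carries
// more fields and the member type here is an assumption.
struct source_partition_offsets {
    // Before: last_stable_offset{-1}
    // After: use the named sentinel so invalid-LSO comparisons stay consistent.
    kafka::offset last_stable_offset{model::offset_cast(model::invalid_lso)};
};
```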
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/v/cluster/rm_stm.cc | Changed local variable initialization from -1 to model::invalid_lso in last_stable_offset() function |
| src/v/kafka/client/direct_consumer/api_types.h | Changed struct member default initialization from -1 to model::offset_cast(model::invalid_lso) |
CI test results: test results on build #78548
link the jira?
  auto synced_leader = _raft->is_leader() && _raft->term() == _insync_term;
- model::offset lso{-1};
+ model::offset lso{model::invalid_lso};
q: the new behavior is that it returns offset_not_available and that triggers a retry?
return error_code::offset_not_available;
If yes, I wonder (without this fix) if the translation here would throw an exception because it attempts translation before the start offset? How is it returning -1?
from the logs
TRACE 2025-12-23 01:45:37,297 [shard 1:fetc] tx - [{kafka/source-topic/0}] - rm_stm.cc:1322 - lso update in progress, last_known_lso: -1, last_applied: 0
from the code
auto maybe_lso = _partition->last_stable_offset();
if (maybe_lso == model::invalid_lso) {
return error_code::offset_not_available;
}
return _translator->from_log_offset(maybe_lso);
and then the translator is actually just an offset_translator_state, code here:
model::offset offset_translator_state::from_log_offset(model::offset o) const {
const auto d = delta(o);
return model::offset(o - d);
}
no checks for a valid offset, and no exceptions
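As a minimal, standalone illustration (plain integers instead of Redpanda's offset types, not the actual implementation) of why a -1 slips through this path silently rather than throwing:

```cpp
#include <cstdint>
#include <iostream>

// Simplified stand-in for offset_translator_state::from_log_offset: it only
// subtracts a delta and never validates the input offset.
int64_t from_log_offset(int64_t log_offset, int64_t delta) {
    return log_offset - delta;
}

int main() {
    // A stale LSO of -1 fails the `maybe_lso == invalid_lso` check (which
    // expects 0), gets translated anyway, and comes back as -1 with no error.
    std::cout << from_log_offset(-1, 0) << "\n"; // prints -1, no exception
}
```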
IMO, this is a +1 for using named constants as much as possible instead of magic numbers
Let's try to be consistent. We track the high watermark as the inclusive offset and add one when it is returned. Would it make sense to track the LSO in the same way, i.e. leave it as -1 and only add 1 when we need to retrieve it?
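A rough sketch of that convention (names are illustrative, not the actual rm_stm members): track the offset inclusively so the empty state is naturally -1, and add one only when handing the value out:

```cpp
#include <cstdint>

// Illustrative only: inclusive tracking makes "nothing applied yet" equal -1,
// and the exclusive "next offset" view is produced at read time by adding one.
struct inclusive_offset_tracker {
    int64_t last = -1; // -1 == empty, mirroring the high-watermark convention

    void advance_to(int64_t o) { last = o; }

    // What a caller (e.g. the fetch path) would see.
    int64_t next() const { return last + 1; } // -1 + 1 == 0 when empty
};
```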
> We track high watermark as the inclusive offset and add one when it is returned

Can you point me in the direction of where this occurs?
I poked around and it seems hwm and lso get the same treatment almost everywhere in the code. It's possible that lso actually gets an increment on some fetch path, in which case maybe invalid_lso should actually be -1.
> in which case, maybe invalid_lso should actually be -1

that makes sense.. hopefully nothing breaks :)
tests pass, looks good afaik
no jira link, this was a failure flagged in private team core
ecfa185 to 4fab444 (Compare)
4fab444 to c5f985a (Compare)
Use named constants when possible.
c5f985a to faf1868 (Compare)
letting ci run its course
/backport v25.3.x
Failed to create a backport PR to v25.3.x branch. I tried:
rm_stm instantiated its lso to -1, whereas invalid_lso is 0.
Normally, invalid_lso gets converted into lso_unavailable in the fetch path, but -1 != 0, so the direct consumer received an lso of -1, thus firing an error log.
This PR changes invalid_lso to be -1, in parity with the hwm, and then spot-updates the places where lsos are being instantiated to the magic number "-1" to instead use invalid_lso.
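A rough sketch of that direction; the exact location and spelling of the constant in the real tree may differ, and the call site shown is illustrative:

```cpp
// Sketch only: define the sentinel once as -1 (parity with the hwm convention)...
namespace model {
inline constexpr offset invalid_lso{-1};
} // namespace model

// ...and replace the scattered magic numbers at the call sites:
// before: model::offset lso{-1};
// after:
model::offset lso{model::invalid_lso};
```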
Backports Required
Release Notes
Bug Fixes