Skip to content

Fixing index mode serialization in stats#149391

Open
benwtrent wants to merge 4 commits into
elastic:mainfrom
benwtrent:bugfix-only-serialize-available-modes
Open

Fixing index mode serialization in stats#149391
benwtrent wants to merge 4 commits into
elastic:mainfrom
benwtrent:bugfix-only-serialize-available-modes

Conversation

@benwtrent
Copy link
Copy Markdown
Member

It seems like we are serializing index modes that are protected behind a feature flag. This doesn't seem right and it has caused some weirdness in rolling upgrades where stats calls will fail with the following, though no index has the mode set and the cluster never had the feature flag enabled:

java.lang.IllegalArgumentException: cannot send index mode [vectordb_document] to a node that does not support it
	at org.elasticsearch.server@9.5.0/org.elasticsearch.index.IndexMode.writeTo(IndexMode.java:880)
	at org.elasticsearch.server@9.5.0/org.elasticsearch.monitor.metrics.IndexModeStatsActionType$NodeResponse.lambda$writeTo$0(IndexModeStatsActionType.java:117)
	at org.elasticsearch.server@9.5.0/org.elasticsearch.common.io.stream.StreamOutput.writeMap(StreamOutput.java:646)
	at org.elasticsearch.server@9.5.0/org.elasticsearch.monitor.metrics.IndexModeStatsActionType$NodeResponse.writeTo(IndexModeStatsActionType.java:117)
	at org.elasticsearch.server@9.5.0/org.elasticsearch.transport.OutboundHandler.serializeMessageBody(OutboundHandler.java:372)
	at org.elasticsearch.server@9.5.0/org.elasticsearch.transport.OutboundHandler.serialize(OutboundHandler.java:313)
	at org.elasticsearch.server@9.5.0/org.elasticsearch.transport.OutboundHandler.sendMessage(OutboundHandler.java:238)
	at org.elasticsearch.server@9.5.0/org.elasticsearch.transport.OutboundHandler.sendResponse(OutboundHandler.java:150)
	at org.elasticsearch.server@9.5.0/org.elasticsearch.transport.TcpTransportChannel.sendResponse(TcpTransportChannel.java:57)
	at org.elasticsearch.server@9.5.0/org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:35)
	at org.elasticsearch.server@9.5.0/org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:33)
	at org.elasticsearch.server@9.5.0/org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:20)
	at org.elasticsearch.server@9.5.0/org.elasticsearch.action.ActionListener.respondAndRelease(ActionListener.java:411)
	at

@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label May 19, 2026
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@mayya-sharipova mayya-sharipova added the test-release Trigger CI checks against release build label May 19, 2026
public Map<IndexMode, IndexStats> stats() {
final Map<IndexMode, IndexStats> stats = new EnumMap<>(IndexMode.class);
for (IndexMode mode : IndexMode.values()) {
for (IndexMode mode : IndexMode.availableModes()) {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we shouldn't add a bunch of index modes that won't actually be serializable by other nodes because of a feature flag right?

Comment thread server/src/main/java/org/elasticsearch/monitor/metrics/IndicesMetrics.java Outdated
Copy link
Copy Markdown
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

static Map<IndexMode, IndexStats> getStatsWithoutCache(IndicesService indicesService) {
Map<IndexMode, IndexStats> stats = new EnumMap<>(IndexMode.class);
for (IndexMode mode : IndexMode.values()) {
for (IndexMode mode : IndexMode.availableModes()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be for in a follow up PR:
what happens with BWC stats tests in snapshot builds where all modes are available or when COLUMNAR or VECTORDB comes out of FF?

It looks to me that during rolling upgrade we will still have this error?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mayya-sharipova yeah, I am not sure we get away with this unless we have cluster features that are enabled and protecting serialization of stats. I opted for the feature flag route here, but maybe this needs stronger TV serialization protection.

stats.put(indexMode, new IndexStats());
}
final DiscoveryNode node = DiscoveryNodeUtils.create("node");
final var nodeResponse = new IndexModeStatsActionType.NodeResponse(node, stats);
Copy link
Copy Markdown
Contributor

@mayya-sharipova mayya-sharipova May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use different transport versions to exercise BWC behaviour:

 final TransportVersion olderVersion = TransportVersionUtils.getPreviousVersion(
          IndexMode.VECTORDB_DOCUMENT_INDEX_MODE, true
      );

try (BytesStreamOutput out = new BytesStreamOutput()) {
          out.setTransportVersion(olderVersion);

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will still fail. Maybe we need stronger tv serialization protection. But my focus here was ensuring we don't do the silly thing and initialize with feature flagged items by default (which is confusing and wrong).

Copy link
Copy Markdown
Contributor

@mayya-sharipova mayya-sharipova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Ben, it is good enough to unblock release.
I will follow up with an issue to add more protection of serialization of stats.

benwtrent added 2 commits May 19, 2026 09:56
…nwtrent/elasticsearch into bugfix-only-serialize-available-modes
@benwtrent benwtrent added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label May 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >bug :Search Relevance/Vectors Vector search Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch test-release Trigger CI checks against release build v9.5.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants