Skip to content

Adding dry_run mode for setting data stream settings #128269

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
May 23, 2025

Conversation

masseyke
Copy link
Member

@masseyke masseyke commented May 21, 2025

This adds a dry_run parameter to the REST method added in #127858. If dry_run is true, it simulates setting the settings on the data stream and indices, but does not actually change any state. For example:

PUT _data_stream/my-data-stream/_settings?dry_run=true
{
    "index.lifecycle.name" : "new-test-policy",
    "index.number_of_shards": 11
}

returns:

{
  "data_streams": [
    {
      "name": "my-data-stream",
      "applied_to_data_stream": true,
      "settings": {
        "index": {
          "lifecycle": {
            "name": "new-test-policy"
          },
          "number_of_shards": "11"
        }
      },
      "effective_settings": {
        "index": {
          "lifecycle": {
            "name": "new-test-policy"
          },
          "mode": "standard",
          "number_of_shards": "11",
          "number_of_replicas": "0"
        }
      },
      "index_settings_results": {
        "applied_to_data_stream_only": [
          "index.number_of_shards"
        ],
        "applied_to_data_stream_and_backing_indices": [
          "index.lifecycle.name"
        ]
      }
    }
  ]
}

But the settings are not actually applied to the data stream or any indices.

@masseyke masseyke marked this pull request as ready for review May 21, 2025 20:48
@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label May 21, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@lukewhiting lukewhiting requested a review from Copilot May 22, 2025 09:42
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds a dry_run mode to the data stream settings update API so that when enabled, settings are validated and simulated without being applied to the underlying data stream or its indices. Key changes include:

  • Updating request classes, transport actions, and REST handlers to accept a dry_run parameter.
  • Modifying the metadata update service to support a simulated dry run flow.
  • Enhancing tests and YAML-based integration tests to validate the dry_run behavior.

Reviewed Changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
server/src/test/java/org/elasticsearch/action/datastreams/UpdateDataStreamSettingsActionRequestTests.java Updated test instance creation and mutation logic to include dry_run.
server/src/main/java/org/elasticsearch/cluster/metadata/MetadataDataStreamsService.java Added dry_run handling in updateSettings and simulation of settings update.
server/src/main/java/org/elasticsearch/action/datastreams/UpdateDataStreamSettingsAction.java Modified request, equals, and hashCode to include dry_run.
server/src/main/java/org/elasticsearch/TransportVersions.java Introduced a new transport version constant for dry_run.
modules/data-streams/src/yamlRestTest/resources/rest-api-spec/test/data_stream/240_data_stream_settings.yml Added tests for verifying proper dry_run behavior.
modules/data-streams/src/main/java/org/elasticsearch/datastreams/rest/RestUpdateDataStreamSettingsAction.java Updated REST handler to parse the dry_run parameter.
modules/data-streams/src/main/java/org/elasticsearch/datastreams/action/TransportUpdateDataStreamSettingsAction.java Adjusted transport action methods to use the dry_run flag in processing logic.
Files not reviewed (1)
  • rest-api-spec/src/main/resources/rest-api-spec/api/indices.put_data_stream_settings.json: Language not supported
Comments suppressed due to low confidence (1)

modules/data-streams/src/main/java/org/elasticsearch/datastreams/action/TransportUpdateDataStreamSettingsAction.java:324

  • [nitpick] Although code logic is correct in the dry run branch, adding an explicit return statement after listener.onResponse(null) could improve clarity and reduce potential future maintenance errors.
if (dryRun) { listener.onResponse(null); }

if (response.isAcknowledged()) {
return clusterService.state().projectState(projectId).metadata().dataStreams().get(dataStreamName);
} else {
throw new ElasticsearchException("Updating settings not accepted for unknown reasons");
Copy link
Preview

Copilot AI May 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider improving the error message here to provide more specific context about why the settings update was not accepted, which can aid in troubleshooting in both dry_run and regular flows.

Suggested change
throw new ElasticsearchException("Updating settings not accepted for unknown reasons");
throw new ElasticsearchException(
"Updating settings not accepted for project [" + projectId + "], data stream [" + dataStreamName +
"], with settings overrides: " + settingsOverrides.toString()
);

Copilot uses AI. Check for mistakes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @masseyke, I have been using your PR's as a bit of a test bed for copilot-trial group. Feel free to ignore these and/or give feedback on how useful they are.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I don't agree with every comment copilot has made, it has already found several bugs (or at least typos) in my PRs, and has made some good suggestions.

Copy link
Contributor

@lukewhiting lukewhiting left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Underlying logic and approach look sound. Just some outstanding questions about code cleanliness to explore.

@Override
public void onResponse(AcknowledgedResponse response) {
UpdateDataStreamSettingsAction.DataStreamSettingsResponse.IndexSettingError error;
if (response.isAcknowledged() == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're getting quite deeply nested at this point. Is it worth pulling out some of these blocks into methods / inner classes for readability?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I switched everything over to ActionListeners helper methods rather than creating my own ActionListeners. I think it improves readability a bit. Let me know what you think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks :-) Yeah I think that's a little easier on the eyes 👍🏻

Comment on lines +452 to +470
ProjectMetadata projectMetadata = clusterState.metadata().getProject(projectId);
Map<String, DataStream> dataStreamMap = projectMetadata.dataStreams();
DataStream dataStream = dataStreamMap.get(dataStreamName);
Settings existingSettings = dataStream.getSettings();

Template.Builder templateBuilder = Template.builder();
Settings.Builder mergedSettingsBuilder = Settings.builder().put(existingSettings).put(settingsOverrides);
Settings mergedSettings = mergedSettingsBuilder.build();

final ComposableIndexTemplate template = lookupTemplateForDataStream(dataStreamName, projectMetadata);
ComposableIndexTemplate mergedTemplate = template.mergeSettings(mergedSettings);
MetadataIndexTemplateService.validateTemplate(
mergedTemplate.template().settings(),
mergedTemplate.template().mappings(),
indicesService
);

templateBuilder.settings(mergedSettingsBuilder);
return dataStream.copy().setSettings(mergedSettings).build();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to duplicate a lot of the task implementation starting at line 148. Anything we can do to reflector that to reduce the duplication?

Is it possible to push down the dry run logic all the way to the task execution so there's no diverging path?

Copy link
Member Author

@masseyke masseyke May 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to push down the dry run logic all the way to the task execution so there's no diverging path?

Is it possible to push down the dry run logic all the way to the task execution so there's no diverging path?
I'm on the fence about that. That's the way I had originally written it. In dry-run mode I would just return the original cluster state. But it felt odd running through the cluster state update mechanics just to effectively do template validation. So I pulled the template validation logic out and just ran it locally in dry-run mode. But I could see arguments either way.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to duplicate a lot of the task implementation starting at line 148. Anything we can do to reflector that to reduce the duplication?

Which parts are you referring to? This code is all code I had moved out of that task logic around line 148 so that I could call it from both places w/o duplication.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which parts are you referring to? This code is all code I had moved out of that task logic around line 148 so that I could call it from both places w/o duplication.

Sorry my bad -.- I cross referenced the class in my IDE but forgot I had checked out another branch. The looks good 👍🏻

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to push down the dry run logic all the way to the task execution so there's no diverging path?
I'm on the fence about that. That's the way I had originally written it. In dry-run mode I would just return the original cluster state. But it felt odd running through the cluster state update mechanics just to effectively do template validation. So I pulled the template validation logic out and just ran it locally in dry-run mode. But I could see arguments either way.

That's fair enough. If you have tried it and it didn't look noticeably better then I'm happy to stick with it how it is unless anyone else has strong opinions :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have tried it and it didn't look noticeably better then I'm happy to stick with it how it is unless anyone else has strong opinions :-)

I tried it again just to be sure. The problem is that when the updateSettingsExecutor is executed, the listener on my UpdateSettingsTask is notified with the updated cluster state. And since I don't want to actually update the cluster state in dry_run mode, that's the old cluster state. So I have no access to the data stream. I could do something like modify the UpdateSettingsTask with the new data stream so that it is available to the listener, but that gets kind of ugly (and those tasks aren't meant to be updated).

@masseyke masseyke requested a review from lukewhiting May 22, 2025 22:11
Copy link
Contributor

@lukewhiting lukewhiting left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM 👍🏻 Nice work!

@masseyke masseyke merged commit 7207692 into elastic:main May 23, 2025
18 checks passed
@masseyke masseyke deleted the data-stream-settings-dry-run branch May 23, 2025 16:29
@@ -72,13 +83,21 @@ public Request(StreamInput in) throws IOException {
super(in);
this.dataStreamNames = in.readStringArray();
this.settings = Settings.readSettingsFromStream(in);
if (in.getTransportVersion().onOrAfter(TransportVersions.SETTINGS_IN_DATA_STREAMS)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be TransportVersions.SETTINGS_IN_DATA_STREAMS_DRY_RUN?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/Data streams Data streams and their lifecycles >non-issue Team:Data Management Meta label for data/management team v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants