Adding dry_run mode for setting data stream settings #128269
Conversation
Pinging @elastic/es-data-management (Team:Data Management)
Pull Request Overview
This PR adds a dry_run mode to the data stream settings update API so that when enabled, settings are validated and simulated without being applied to the underlying data stream or its indices. Key changes include:
- Updating request classes, transport actions, and REST handlers to accept a dry_run parameter (see the REST-handler sketch after this list).
- Modifying the metadata update service to support a simulated dry run flow.
- Enhancing tests and YAML-based integration tests to validate the dry_run behavior.
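A minimal sketch of the REST-handler side mentioned in the first bullet, assuming the usual Elasticsearch RestRequest API for query parameters; the helper name is hypothetical and this is not the PR's exact code:

```java
import org.elasticsearch.rest.RestRequest;

// Hedged sketch: the typical way a REST handler reads an optional boolean query
// parameter. In this PR's handler, the parsed value would be copied onto the
// transport-level update-settings request.
static boolean isDryRun(RestRequest restRequest) {
    // ?dry_run=true|false, defaulting to false so callers that omit the parameter
    // still get the original "actually apply the settings" behavior.
    return restRequest.paramAsBoolean("dry_run", false);
}
```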
Reviewed Changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.
File | Description |
---|---|
server/src/test/java/org/elasticsearch/action/datastreams/UpdateDataStreamSettingsActionRequestTests.java | Updated test instance creation and mutation logic to include dry_run. |
server/src/main/java/org/elasticsearch/cluster/metadata/MetadataDataStreamsService.java | Added dry_run handling in updateSettings and simulation of settings update. |
server/src/main/java/org/elasticsearch/action/datastreams/UpdateDataStreamSettingsAction.java | Modified request, equals, and hashCode to include dry_run. |
server/src/main/java/org/elasticsearch/TransportVersions.java | Introduced a new transport version constant for dry_run. |
modules/data-streams/src/yamlRestTest/resources/rest-api-spec/test/data_stream/240_data_stream_settings.yml | Added tests for verifying proper dry_run behavior. |
modules/data-streams/src/main/java/org/elasticsearch/datastreams/rest/RestUpdateDataStreamSettingsAction.java | Updated REST handler to parse the dry_run parameter. |
modules/data-streams/src/main/java/org/elasticsearch/datastreams/action/TransportUpdateDataStreamSettingsAction.java | Adjusted transport action methods to use the dry_run flag in processing logic. |
Files not reviewed (1)
- rest-api-spec/src/main/resources/rest-api-spec/api/indices.put_data_stream_settings.json: Language not supported
Comments suppressed due to low confidence (1)
modules/data-streams/src/main/java/org/elasticsearch/datastreams/action/TransportUpdateDataStreamSettingsAction.java:324
- [nitpick] Although the code logic is correct in the dry run branch, adding an explicit return statement after listener.onResponse(null) could improve clarity and reduce potential future maintenance errors.
if (dryRun) { listener.onResponse(null); }
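The shape the nitpick is pointing at, as a hedged sketch (dryRun and listener come from the quoted snippet; the surrounding method is hypothetical, not the PR's exact code):

```java
// Hedged sketch of the suggested shape, not the PR's code.
if (dryRun) {
    // Nothing is applied in dry-run mode, so answer the caller and stop here;
    // the explicit return makes it impossible to fall through into the apply path.
    listener.onResponse(null);
    return;
}
// ...normal (non-dry-run) settings-update handling continues below...
```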
if (response.isAcknowledged()) {
    return clusterService.state().projectState(projectId).metadata().dataStreams().get(dataStreamName);
} else {
    throw new ElasticsearchException("Updating settings not accepted for unknown reasons");
Consider improving the error message here to provide more specific context about why the settings update was not accepted, which can aid in troubleshooting in both dry_run and regular flows.
Suggested change:
-    throw new ElasticsearchException("Updating settings not accepted for unknown reasons");
+    throw new ElasticsearchException(
+        "Updating settings not accepted for project [" + projectId + "], data stream [" + dataStreamName +
+            "], with settings overrides: " + settingsOverrides.toString()
+    );
Sorry @masseyke, I have been using your PRs as a bit of a test bed for the copilot-trial group. Feel free to ignore these and/or give feedback on how useful they are.
While I don't agree with every comment copilot has made, it has already found several bugs (or at least typos) in my PRs, and has made some good suggestions.
Underlying logic and approach look sound. Just some outstanding questions about code cleanliness to explore.
@Override
public void onResponse(AcknowledgedResponse response) {
    UpdateDataStreamSettingsAction.DataStreamSettingsResponse.IndexSettingError error;
    if (response.isAcknowledged() == false) {
We're getting quite deeply nested at this point. Is it worth pulling out some of these blocks into methods / inner classes for readability?
OK, I switched everything over to ActionListener helper methods rather than creating my own ActionListeners. I think it improves readability a bit. Let me know what you think.
Thanks :-) Yeah I think that's a little easier on the eyes 👍🏻
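For context on the pattern discussed above, a hedged sketch of what using an ActionListener helper instead of a hand-rolled anonymous listener can look like; finalListener and the class name are hypothetical, and this is not the PR's exact code:

```java
import org.elasticsearch.ElasticsearchException;
import org.elasticsearch.action.ActionListener;
import org.elasticsearch.action.support.master.AcknowledgedResponse;

// Hedged sketch: ActionListener.wrap builds a listener from two lambdas,
// replacing an anonymous ActionListener subclass with nested onResponse/onFailure
// overrides. Package paths are assumptions based on recent Elasticsearch releases.
class ListenerExample {
    static ActionListener<AcknowledgedResponse> acked(ActionListener<AcknowledgedResponse> finalListener) {
        return ActionListener.wrap(response -> {
            if (response.isAcknowledged()) {
                finalListener.onResponse(response);
            } else {
                finalListener.onFailure(new ElasticsearchException("settings update was not acknowledged"));
            }
        }, finalListener::onFailure);
    }
}
```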
ProjectMetadata projectMetadata = clusterState.metadata().getProject(projectId);
Map<String, DataStream> dataStreamMap = projectMetadata.dataStreams();
DataStream dataStream = dataStreamMap.get(dataStreamName);
Settings existingSettings = dataStream.getSettings();

Template.Builder templateBuilder = Template.builder();
Settings.Builder mergedSettingsBuilder = Settings.builder().put(existingSettings).put(settingsOverrides);
Settings mergedSettings = mergedSettingsBuilder.build();

final ComposableIndexTemplate template = lookupTemplateForDataStream(dataStreamName, projectMetadata);
ComposableIndexTemplate mergedTemplate = template.mergeSettings(mergedSettings);
MetadataIndexTemplateService.validateTemplate(
    mergedTemplate.template().settings(),
    mergedTemplate.template().mappings(),
    indicesService
);

templateBuilder.settings(mergedSettingsBuilder);
return dataStream.copy().setSettings(mergedSettings).build();
This seems to duplicate a lot of the task implementation starting at line 148. Anything we can do to refactor that to reduce the duplication?
Is it possible to push down the dry run logic all the way to the task execution so there's no diverging path?
Is it possible to push down the dry run logic all the way to the task execution so there's no diverging path?
I'm on the fence about that. That's the way I had originally written it. In dry-run mode I would just return the original cluster state. But it felt odd running through the cluster state update mechanics just to effectively do template validation. So I pulled the template validation logic out and just ran it locally in dry-run mode. But I could see arguments either way.
This seems to duplicate a lot of the task implementation starting at line 148. Anything we can do to refactor that to reduce the duplication?
Which parts are you referring to? This code is all code I had moved out of that task logic around line 148 so that I could call it from both places w/o duplication.
Which parts are you referring to? This code is all code I had moved out of that task logic around line 148 so that I could call it from both places w/o duplication.
Sorry, my bad -.- I cross-referenced the class in my IDE but forgot I had checked out another branch. That looks good 👍🏻
Is it possible to push down the dry run logic all the way to the task execution so there's no diverging path?
I'm on the fence about that. That's the way I had originally written it. In dry-run mode I would just return the original cluster state. But it felt odd running through the cluster state update mechanics just to effectively do template validation. So I pulled the template validation logic out and just ran it locally in dry-run mode. But I could see arguments either way.
That's fair enough. If you have tried it and it didn't look noticeably better then I'm happy to stick with it how it is unless anyone else has strong opinions :-)
If you have tried it and it didn't look noticeably better then I'm happy to stick with it how it is unless anyone else has strong opinions :-)
I tried it again just to be sure. The problem is that when the updateSettingsExecutor is executed, the listener on my UpdateSettingsTask is notified with the updated cluster state. And since I don't want to actually update the cluster state in dry_run mode, that's the old cluster state, so I have no access to the data stream. I could do something like modify the UpdateSettingsTask with the new data stream so that it is available to the listener, but that gets kind of ugly (and those tasks aren't meant to be updated).
Changes LGTM 👍🏻 Nice work!
@@ -72,13 +83,21 @@ public Request(StreamInput in) throws IOException {
    super(in);
    this.dataStreamNames = in.readStringArray();
    this.settings = Settings.readSettingsFromStream(in);
    if (in.getTransportVersion().onOrAfter(TransportVersions.SETTINGS_IN_DATA_STREAMS)) {
Shouldn't this be TransportVersions.SETTINGS_IN_DATA_STREAMS_DRY_RUN?
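A hedged sketch of the version-gated wire handling this comment refers to, using the constant name the comment suggests; field names follow the surrounding diff, but this is not necessarily the PR's final code:

```java
// Hedged sketch, not the PR's final code. Older nodes do not know the dry_run
// flag, so it is only read/written when the wire format is new enough.
public Request(StreamInput in) throws IOException {
    super(in);
    this.dataStreamNames = in.readStringArray();
    this.settings = Settings.readSettingsFromStream(in);
    if (in.getTransportVersion().onOrAfter(TransportVersions.SETTINGS_IN_DATA_STREAMS_DRY_RUN)) {
        this.dryRun = in.readBoolean();
    } else {
        this.dryRun = false; // default when reading from an older node
    }
}

@Override
public void writeTo(StreamOutput out) throws IOException {
    super.writeTo(out);
    out.writeStringArray(dataStreamNames);
    settings.writeTo(out);
    if (out.getTransportVersion().onOrAfter(TransportVersions.SETTINGS_IN_DATA_STREAMS_DRY_RUN)) {
        out.writeBoolean(dryRun);
    }
}
```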
This adds a dry_run parameter to the REST method added in #127858. If dry_run is true, it simulates setting the settings on the data stream and indices, but does not actually change any state: the request returns the simulated result, but the settings are not actually applied to the data stream or any indices.
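As a hedged illustration of the kind of request this enables; the endpoint path, data stream name, and setting are assumptions inferred from the API spec file (indices.put_data_stream_settings.json) and the PR summary, not a verified example:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hedged illustration only: send a settings update for a data stream with
// dry_run=true, so Elasticsearch validates and simulates the change without
// applying it. Host, path, data stream name, and setting are assumptions.
public class DryRunSettingsExample {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:9200/_data_stream/my-data-stream/_settings?dry_run=true"))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString("{\"index.number_of_replicas\": 0}"))
            .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        // The response reports what would change; cluster state stays untouched.
        System.out.println(response.body());
    }
}
```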