-
Notifications
You must be signed in to change notification settings - Fork 613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kafka
: improve CreateTopics
error and boolean handling
#23682
kafka
: improve CreateTopics
error and boolean handling
#23682
Conversation
non flaky failures in https://buildkite.com/redpanda/redpanda/builds/56026#01926d55-7886-4ec7-9d34-7c7b67f77be9:
non flaky failures in https://buildkite.com/redpanda/redpanda/builds/56026#01926d55-7883-4cfd-9b69-d1abb3340db3:
non flaky failures in https://buildkite.com/redpanda/redpanda/builds/56026#01926d55-788a-4412-8a4e-6e65644fc108:
non flaky failures in https://buildkite.com/redpanda/redpanda/builds/56026#01926d55-788d-4fc3-9720-49e098f057eb:
|
Retry command for Build#56026please wait until all jobs are finished before running the slash command
|
b4dee11
to
9aec1e9
Compare
Force push to:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generally looks good. the first commit could be split into 3 commits (maybe four)
config = {"cleanup.policy": TopicSpec.CLEANUP_DELETE} | ||
|
||
# None case means use cluster default for topic creation. | ||
if topic_remote_write is not None: | ||
config |= {"redpanda.remote.write": topic_remote_write} | ||
|
||
self.rpk.create_topic(topic=self.topic_name, | ||
partitions=1, | ||
replicas=1, | ||
config={ | ||
"cleanup.policy": TopicSpec.CLEANUP_DELETE, | ||
"redpanda.remote.write": topic_remote_write | ||
}) | ||
config=config) | ||
|
||
if topic_remote_write != "-1": | ||
if topic_remote_write is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- is this change related to the handling of bools, or is it a cleanup?
- prior to this change -1 was the actual value passed in as the value for
redpanda.remote.write
, but that seems to no longer be the case. was that a value that was being tested for explicitly for a particular expected behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test fails with the changes in this PR if the test changes are not made.
You are correct in seeing that topic create topic -c redpanda.remote.write=-1
would be the original command executed in this test, which previously would succeed because we would hit the "silently ignore" case discussed in the cover letter. That was the author's intention, to use the cluster default, but this behavior is not supported in Kafka and therefore shouldn't be supported in redpanda
.
Using -1
as the value for a boolean returns an error in Kafka:
$ ./kafka-topics.sh --create --topic tapioca --bootstrap-server localhost:9092 --config preallocate=-1
Error while executing topic command : Invalid value -1 for configuration preallocate: Expected value to be either true or false
[2024-10-08 15:31:09,189] ERROR org.apache.kafka.common.config.ConfigException: Invalid value -1 for configuration preallocate: Expected value to be either true or false
at org.apache.kafka.common.config.ConfigDef.parseType(ConfigDef.java:706)
at org.apache.kafka.common.config.ConfigDef.parseValue(ConfigDef.java:531)
at org.apache.kafka.common.config.ConfigDef.parse(ConfigDef.java:524)
at org.apache.kafka.storage.internals.log.LogConfig.validate(LogConfig.java:692)
at org.apache.kafka.storage.internals.log.LogConfig.validate(LogConfig.java:684)
at org.apache.kafka.tools.TopicCommand.parseTopicConfigsToBeAdded(TopicCommand.java:176)
at org.apache.kafka.tools.TopicCommand.access$000(TopicCommand.java:81)
at org.apache.kafka.tools.TopicCommand$CommandTopicPartition.<init>(TopicCommand.java:267)
at org.apache.kafka.tools.TopicCommand$TopicService.createTopic(TopicCommand.java:460)
at org.apache.kafka.tools.TopicCommand.execute(TopicCommand.java:105)
at org.apache.kafka.tools.TopicCommand.mainNoExit(TopicCommand.java:90)
at org.apache.kafka.tools.TopicCommand.main(TopicCommand.java:85)
(org.apache.kafka.tools.TopicCommand)
The test was written with this assumed behavior for -1
meaning "cluster default" in mind, and they were accidentally correct.
I fixed the test here by just not setting a value for redpanda.remote.write
, which achieves the same thing as originally intended.
return string_switch<std::optional<bool>>(str_value) | ||
.match("true", true) | ||
.match("false", false); | ||
} catch (const std::runtime_error&) { | ||
throw boost::bad_lexical_cast(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not leave default_match
and then throw bad_lexical_cast if std::nullopt is returned? this formulation seems to assume that runtime_error can only be thrown due to a bad match. that's probably the case, but i'm not sure string_switch has any sort of strict contract.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was following the same pattern I was familiar with from
redpanda/src/v/kafka/server/handlers/configs/config_utils.h
Lines 654 to 656 in df1e247
} catch (const std::runtime_error&) { | |
// Our callers expect this exception type on malformed values | |
throw boost::bad_lexical_cast(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think string_switch
has a strict contract per se, but if
redpanda/src/v/strings/string_switch.h
Lines 195 to 202 in 497c5de
constexpr operator R() { // NOLINT | |
if (!result) { | |
throw std::runtime_error( | |
std::string( | |
"Fell off the end of a string-switch while matching: ") | |
+ std::string(view)); | |
} | |
return std::move(*result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent find and fix! 🙇♂️
9aec1e9
to
8f9cfcf
Compare
Force push to:
|
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/56117#0192720f-3a80-4948-b6e7-20f2bb78fd6a |
In the case that some topic configurations were invalid while others were valid, the `create_topics_response` vector may have had results pushed back to it in an order that is different than the order in which the topics were supplied in the request. Add a new function `sort_topic_response()` that sorts the response's `topics` vector based on the order of the `create_topics_request` provided before returning it.
This commit changes the behavior of `to_cluster_type()`, as well as the `create_topics` handler as a whole. Before, if any invalid topic configurations in a `create_topics_request` lead to a function called from `to_cluster_type()` throwing an exception, the request would continually be retried until a timeout was hit. This had the effect of "all-or-nothing" behavior with topic creation, in which just one invalid topic config would result in none of the topics being created. By catching and handling the exception in `to_cluster_type()`, we now change the behavior to allow for partial creation of topics, in which those topics with valid configurations are succesfully created, while the invalid topics have their errors logged and returned. An additional iterator is passed in order to generate any necessary errors in the `create_topics_response`, which is returned back to the `kafka::request_context`.
To ensure a dry run `CreateTopics` request reflects the proper result, try parsing the passed config parameters. Any errors found in the configuration will be logged and returned.
Invalid boolean property values will no longer be silently ignored due to the previous use of `default_match`. An exception is handled and a `kafka::error_code::invalid_config` is returned to the user.
Boolean property values are now case-insensitive, just like in Kafka. This means any of `true`, `True`, `TRUE`, `tRuE` are all evaluated the same way when creating a topic.
8f9cfcf
to
fa9a760
Compare
Force push to:
|
kafka
: improve CreateTopics
boolean handlingkafka
: improve CreateTopics
error and boolean handling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
// order as the topics were passed. This would be out of order in the case that | ||
// some topics were not successfully created (error response returned) while | ||
// others were. | ||
static void sort_topic_response( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be useful to note why we are sorting these. Do some clients expect these to be ordered? Observed behavior with kafka? Us being worried about changing the behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Test cases can be written more easily knowing that the topic responses are returned in a sane ordering (the same order as the request).
- Kafka seems to also respond with this ordering.
// others were. | ||
static void sort_topic_response( | ||
const create_topics_request& req, create_topics_response& resp) { | ||
absl::flat_hash_map<model::topic, size_t> topic_name_order_map; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine. Consider in the future naming maps something like pos_by_name
. Code reads more fluently with names like these I think.
@@ -111,7 +111,14 @@ static std::optional<bool> | |||
get_bool_value(const config_map_t& config, std::string_view key) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like not all bools are treated equally 😠. We can create another ticket for write.caching. No need to address it in this PR.
→ ~/redpanda/vbuild/go/linux/amd64/bin/rpk topic create baz -c write.caching=TruE
TOPIC STATUS
baz INVALID_CONFIG: Unsupported write caching configuration.
(This comment is edited because initially i used wrong commands to test).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
write.caching
is an enum
, disguised as a bool
. We don't actually make calls to get_bool_value()
for this property. This INVALID_CONFIG
in particular comes from the validate_write_caching
check:
redpanda/src/v/kafka/server/handlers/topics/validators.h
Lines 320 to 323 in 078ba49
auto mode = model::write_caching_mode_from_string(it->value.value()); | |
// Failed to parse the value. | |
if (!mode) { | |
return false; |
I'm glad it's getting caught here though, because write_caching_mode_from_string()
also has a default match case...:
Lines 551 to 565 in c3ed8fe
write_caching_mode_from_string(std::string_view s) { | |
return string_switch<std::optional<write_caching_mode>>(s) | |
.match( | |
model::write_caching_mode_to_string( | |
model::write_caching_mode::default_true), | |
model::write_caching_mode::default_true) | |
.match( | |
model::write_caching_mode_to_string( | |
model::write_caching_mode::default_false), | |
model::write_caching_mode::default_false) | |
.match( | |
model::write_caching_mode_to_string( | |
model::write_caching_mode::disabled), | |
model::write_caching_mode::disabled) | |
.default_match(std::nullopt); |
Agreed that this should be cleaned up in the future as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
This PR covers three cases:
1. Invalid boolean properties
Previously, invalid boolean property values would be silently ignored during the handling of a
CreateTopics
request, e.gwould result in the topic
tapioca
being created instead of returning an error. The topic's value forredpanda.remote.read
silently defaults to the cluster default (false
in the example above).This is unlike Kafka, in which an invalid configuration error is propagated back to the user:
Fix this inconsistency by returning an
kafka::error_code::invalid_config
when an invalid boolean value is provided.2. Case-insensitive boolean properties
Previously, the only boolean values that would be considered valid in a
CreateTopics
request would betrue
andfalse
. All other cases, such asTrue
,TRUE
,fAlSe
, etc., would again map to the "silently ignored" state mentioned above, e.g.This is unlike Kafka, in which the boolean property's value is case-insensitive:
Fix this inconsistency by making
CreateTopics
requests inredpanda
case-insensitive for boolean values.3. Other invalid topic configuration error handling
A few functions use unprotected
boost::lexical_cast
calls to convert input parameters to their required type.Before, when these calls failed and threw a
bad_lexical_cast
exception, it would not be caught and instead propagated to therequest_context
layer, retrying the command until it eventually times out.$ rpk topic create tapioca -c retention.bytes=this_should_be_an_integer
redpanda
output:rpk
output:This also has the effect of "all-or-nothing" topic creation, in which one bad configuration results in none of the topics being created, which is unlike the Kafka behaviour.
Fix this inconsistency by catching and gracefully returning error codes instead for invalid topic configurations.
Backports Required
Release Notes
Improvements
CreateTopics
request by making parsing case-insensitive.CreateTopics
request by no longer silently ignoring an invalid value, instead throwing a configuration error.CreateTopics
request.