fix(codecs): Improve deserialization of EncodingConfigAdapter / fix overriding framing #12750

Merged
pablosichert merged 2 commits into master from pablosichert/adapter-deserializer-improvements
May 17, 2022

@pablosichert pablosichert commented May 17, 2022

Closes #12473.

The error messages are improved insofar as sinks reject unknown fields again, and missing framing and encoding fields are now mentioned explicitly.

Unfortunately, deserializing the encoding field itself will still give vague error messages due to serde-rs/serde#1544 (some more details in #12162).
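As a rough illustration of the behavior described above (unknown fields rejected, missing fields named explicitly), here is a minimal standard-library sketch. It does not use serde or any Vector code; the function `validate_fields`, the field lists, and the exact message wording are assumptions made for this example only.

```rust
// Hypothetical field lists for a sink config; not taken from Vector's source.
const KNOWN_FIELDS: [&str; 2] = ["framing", "encoding"];
const REQUIRED_FIELDS: [&str; 1] = ["encoding"];

/// Checks the top-level keys of a config and reports the first problem found.
/// The message wording is illustrative, not the real output of any library.
fn validate_fields(present: &[&str]) -> Result<(), String> {
    // Reject anything that is not a known field, naming the offender.
    for field in present.iter() {
        if !KNOWN_FIELDS.contains(field) {
            return Err(format!(
                "unknown field `{}`, expected one of `{}`",
                field,
                KNOWN_FIELDS.join("`, `")
            ));
        }
    }
    // Name every required field that is absent instead of failing vaguely.
    for required in REQUIRED_FIELDS.iter() {
        if !present.contains(required) {
            return Err(format!("missing field `{}`", required));
        }
    }
    Ok(())
}
```

Calling `validate_fields(&["framing"])` in this sketch yields an error that names `encoding` directly, which is the kind of message the PR aims for at the sink level.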

Additionally, this PR fixes a bug where the framing config could not be overridden when a legacy codec was used.
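The intended precedence for the framing fix can be sketched as follows: an explicitly configured framing wins, and the default implied by a legacy codec is only a fallback. This is a simplified, standard-library-only model; the `Framing` enum and `resolve_framing` function are hypothetical names and do not reflect the adapter's actual implementation.

```rust
/// Simplified stand-in for a framing configuration (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
enum Framing {
    Bytes,
    NewlineDelimited,
}

/// An explicit `framing` setting takes precedence; otherwise fall back to
/// whatever default the legacy codec would have implied.
fn resolve_framing(explicit: Option<Framing>, legacy_default: Option<Framing>) -> Option<Framing> {
    explicit.or(legacy_default)
}
```

With no explicit framing, a legacy codec whose default is newline-delimited resolves to `NewlineDelimited`; with `"method": "bytes"` configured, the result is `Bytes`, matching what the override test in this PR asserts.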

Signed-off-by: Pablo Sichert <mail@pablosichert.com>

netlify bot commented May 17, 2022

Deploy Preview for vector-project ready!

Name Link
🔨 Latest commit 840dc57
🔍 Latest deploy log https://app.netlify.com/sites/vector-project/deploys/6283cb498f62810008d4935c
😎 Deploy Preview https://deploy-preview-12750--vector-project.netlify.app

@github-actions github-actions bot added the "domain: sinks" label (Anything related to the Vector's sinks) May 17, 2022

@JeanMertz JeanMertz left a comment

That's some interesting adapter-fu going on there 👀 😄

It all looks okay to me, but the diff is hard to follow, so I'd wait for one more pair of eyes.

@pablosichert
Contributor Author

> That's some interesting adapter-fu going on there 👀 😄

Yeah, it's... involved. Skimming the tests is probably a good start:

#[test]
fn deserialize_encoding_with_transformation() {
    let string = r#"
        {
            "codec": "raw_message",
            "only_fields": ["a.b[0]"],
            "except_fields": ["ignore_me"],
            "timestamp_format": "unix"
        }
    "#;
    let adapter = serde_json::from_str::<
        EncodingConfigAdapter<EncodingConfig<FooLegacyEncoding>, FooMigrator>,
    >(string)
    .unwrap();
    let serializer = adapter.config();
    assert!(matches!(serializer, SerializerConfig::RawMessage));
    let transformer = adapter.transformer();
    assert_eq!(transformer.only_fields(), &Some(vec![parse_path("a.b[0]")]));
    assert_eq!(
        transformer.except_fields(),
        &Some(vec!["ignore_me".to_owned()])
    );
    assert_eq!(transformer.timestamp_format(), &Some(TimestampFormat::Unix));
}

#[test]
fn deserialize_encoding_with_framing_and_transformation() {
    let string = r#"
        {
            "framing": {
                "method": "character_delimited",
                "character_delimited": {
                    "delimiter": ","
                }
            },
            "encoding": {
                "codec": "raw_message",
                "only_fields": ["a.b[0]"],
                "except_fields": ["ignore_me"],
                "timestamp_format": "unix"
            }
        }
    "#;
    let adapter = serde_json::from_str::<
        EncodingConfigWithFramingAdapter<
            EncodingConfig<FooLegacyEncoding>,
            FooWithFramingMigrator,
        >,
    >(string)
    .unwrap();
    let (framing, serializer) = adapter.config();
    assert!(matches!(
        framing,
        Some(FramingConfig::CharacterDelimited {
            character_delimited: CharacterDelimitedEncoderOptions { delimiter: b',' }
        })
    ));
    assert!(matches!(serializer, SerializerConfig::RawMessage));
    let transformer = adapter.transformer();
    assert_eq!(transformer.only_fields(), &Some(vec![parse_path("a.b[0]")]));
    assert_eq!(
        transformer.except_fields(),
        &Some(vec!["ignore_me".to_owned()])
    );
    assert_eq!(transformer.timestamp_format(), &Some(TimestampFormat::Unix));
}

#[test]
fn deserialize_legacy_config() {
    for string in [r#""foo""#, r#"{ "codec": "foo" }"#] {
        let adapter = serde_json::from_str::<
            EncodingConfigAdapter<EncodingConfig<FooLegacyEncoding>, FooMigrator>,
        >(string)
        .unwrap();
        let serializer = adapter.config();
        assert!(matches!(serializer, SerializerConfig::Json));
    }
}

#[test]
fn deserialize_legacy_config_with_framing() {
    for string in [
        r#"{ "encoding": "foo" }"#,
        r#"{ "encoding": { "codec": "foo" } }"#,
    ] {
        let adapter = serde_json::from_str::<
            EncodingConfigWithFramingAdapter<
                EncodingConfig<FooLegacyEncoding>,
                FooWithFramingMigrator,
            >,
        >(string)
        .unwrap();
        let (framing, serializer) = adapter.config();
        assert!(matches!(framing, Some(FramingConfig::NewlineDelimited)));
        assert!(matches!(serializer, SerializerConfig::Json));
    }
}

#[test]
fn deserialize_legacy_config_with_framing_override() {
    for string in [
        r#"{ "framing": { "method": "bytes" }, "encoding": "foo" }"#,
        r#"{ "framing": { "method": "bytes" }, "encoding": { "codec": "foo" } }"#,
    ] {
        let adapter = serde_json::from_str::<
            EncodingConfigWithFramingAdapter<
                EncodingConfig<FooLegacyEncoding>,
                FooWithFramingMigrator,
            >,
        >(string)
        .unwrap();
        let (framing, serializer) = adapter.config();
        assert!(matches!(framing, Some(FramingConfig::Bytes)));
        assert!(matches!(serializer, SerializerConfig::Json));
    }
}

#[test]
fn serialize_encoding_with_transformation() {
    let string = r#"{"codec":"raw_message","only_fields":["a.b[0]"],"except_fields":["ignore_me"],"timestamp_format":"unix"}"#;
    let adapter = serde_json::from_str::<
        EncodingConfigAdapter<EncodingConfig<FooLegacyEncoding>, FooMigrator>,
    >(string)
    .unwrap();
    let serialized = serde_json::to_string(&adapter).unwrap();
    assert_eq!(string, serialized);
}

#[test]
fn serialize_encoding_with_framing_and_transformation() {
    let string = r#"{"framing":{"method":"character_delimited","character_delimited":{"delimiter":","}},"encoding":{"codec":"raw_message","only_fields":["a.b[0]"],"except_fields":["ignore_me"],"timestamp_format":"unix"}}"#;
    let adapter = serde_json::from_str::<
        EncodingConfigWithFramingAdapter<
            EncodingConfig<FooLegacyEncoding>,
            FooWithFramingMigrator,
        >,
    >(string)
    .unwrap();
    let serialized = serde_json::to_string(&adapter).unwrap();
    assert_eq!(string, serialized);
}


@jszwedko jszwedko left a comment

Awesome, thanks @pablosichert. This is much improved.

I was thinking, to save us some support effort if people run into this, it might be a good idea to add this to the docs so that if users search "data did not match any variant of untagged enum EncodingConfig" they find an explanation (also it gives us something to easily link to). Maybe around here: https://master.vector.dev/docs/reference/configuration/sinks/http/#encoding by adding it to https://github.com/vectordotdev/vector/blob/master/website/cue/reference/components/sinks.cue#L177

@github-actions

Soak Test Results

Baseline: f238e5e
Comparison: af7514b
Total Vector CPUs: 4

Explanation

A soak test is an integrated performance test for vector in a repeatable rig, with varying configuration for vector. What follows is a statistical summary of a brief vector run for each configuration across the SHAs given above. The goal of these tests is to determine, quickly, whether vector performance is changed by a pull request, and to what degree. Where appropriate, units are scaled per-core.

The table below, if present, lists those experiments that have experienced a statistically significant change in their throughput performance between the baseline and comparison SHAs, with 90.0% confidence, OR that have been detected as newly erratic. Negative values mean that the baseline is faster, positive values that the comparison is faster. Results that do not exhibit more than a ±8.87% change in mean throughput are discarded. An experiment is erratic if its coefficient of variation is greater than 0.3. The abbreviated table will be omitted if no interesting changes are observed.
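The filtering rules in this explanation can be written down as a small sketch. The code below is not the soak analysis tooling itself; the function names and constants are assumptions for illustration, and the strict "more than" comparison for the ±8.87% cutoff follows the wording of the paragraph above. The coefficient of variation is taken to be the standard deviation divided by the mean, which agrees with the CoV figures reported in this thread.

```rust
// Thresholds quoted in the soak test explanation (names are hypothetical).
const MIN_CONFIDENCE: f64 = 0.90;
const MIN_ABS_DELTA_PCT: f64 = 8.87;
const ERRATIC_COV: f64 = 0.3;

/// Coefficient of variation: standard deviation relative to the mean.
/// Both arguments must be expressed in the same unit.
fn coefficient_of_variation(mean: f64, stdev: f64) -> f64 {
    stdev / mean
}

/// An experiment counts as erratic when its CoV exceeds 0.3.
fn is_erratic(cov: f64) -> bool {
    cov > ERRATIC_COV
}

/// A change is reported only if it is both statistically confident and
/// larger in magnitude than the ±8.87% cutoff.
fn is_interesting(delta_mean_pct: f64, confidence: f64) -> bool {
    confidence >= MIN_CONFIDENCE && delta_mean_pct.abs() > MIN_ABS_DELTA_PCT
}
```

For example, a 9.27% change at 100% confidence passes `is_interesting`, while a 1.67% change at the same confidence does not. A baseline mean of 18.03MiB with a standard deviation of 894.64KiB gives a CoV of roughly 0.048, well below the erratic threshold.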

Changes in throughput with confidence ≥ 90.00% and absolute Δ mean >= ±8.87%:

| experiment | Δ mean | Δ mean % | confidence |
| --- | --- | --- | --- |
| http_pipelines_no_grok_blackhole | 1.67MiB | 9.27 | 100.00% |

Fine details of change detection per experiment.
| experiment | Δ mean | Δ mean % | confidence | baseline mean | baseline stdev | baseline stderr | baseline outlier % | baseline CoV | comparison mean | comparison stdev | comparison stderr | comparison outlier % | comparison CoV | erratic | declared erratic |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| http_pipelines_no_grok_blackhole | 1.67MiB | 9.27 | 100.00% | 18.03MiB | 894.64KiB | 18.74KiB | 0 | 0.0484486 | 19.7MiB | 755.55KiB | 15.82KiB | 0 | 0.0374453 | False | False |
| socket_to_socket_blackhole | 414.22KiB | 1.67 | 100.00% | 24.19MiB | 565.99KiB | 11.7KiB | 0 | 0.0228472 | 24.59MiB | 669.83KiB | 13.97KiB | 0 | 0.0265943 | False | False |
| syslog_splunk_hec_logs | 301.64KiB | 1.61 | 100.00% | 18.31MiB | 421.7KiB | 8.7KiB | 0 | 0.0224871 | 18.6MiB | 395.11KiB | 8.16KiB | 0 | 0.0207352 | False | False |
| syslog_log2metric_splunk_hec_metrics | 279.92KiB | 1.45 | 100.00% | 18.85MiB | 388.01KiB | 8.1KiB | 0 | 0.0200997 | 19.12MiB | 378.81KiB | 7.93KiB | 0 | 0.0193426 | False | False |
| syslog_humio_logs | 240.17KiB | 1.26 | 100.00% | 18.62MiB | 181.23KiB | 3.78KiB | 0 | 0.00950081 | 18.86MiB | 158.98KiB | 3.25KiB | 0 | 0.00823087 | False | False |
| syslog_loki | 189.97KiB | 1.08 | 100.00% | 17.13MiB | 140.75KiB | 2.89KiB | 0 | 0.00802109 | 17.32MiB | 120.08KiB | 2.49KiB | 0 | 0.00676945 | False | False |
| datadog_agent_remap_datadog_logs_acks | 725.24KiB | 0.93 | 100.00% | 76.32MiB | 392.57KiB | 8.16KiB | 0 | 0.00502207 | 77.03MiB | 317.85KiB | 6.51KiB | 0 | 0.00402879 | False | False |
| http_pipelines_blackhole_acks | 30.05KiB | 0.67 | 92.80% | 4.35MiB | 557.86KiB | 11.67KiB | 0 | 0.125166 | 4.38MiB | 570.65KiB | 11.95KiB | 0 | 0.127178 | False | True |
| datadog_agent_remap_datadog_logs | 515.3KiB | 0.67 | 100.00% | 75.67MiB | 1.45MiB | 30.68KiB | 0 | 0.0192109 | 76.17MiB | 1.47MiB | 31.21KiB | 0 | 0.0192759 | False | False |
| http_pipelines_blackhole | 19.72KiB | 0.43 | 71.72% | 4.44MiB | 627.05KiB | 12.93KiB | 0 | 0.138009 | 4.46MiB | 624.47KiB | 13.04KiB | 0 | 0.136846 | False | True |
| syslog_regex_logs2metric_ddmetrics | 13.33KiB | 0.09 | 67.07% | 13.82MiB | 464.91KiB | 9.53KiB | 0 | 0.0328333 | 13.84MiB | 470.2KiB | 9.8KiB | 0 | 0.0331759 | False | False |
| http_to_http_noack | -1.04KiB | -0 | 11.39% | 23.85MiB | 245.85KiB | 5.08KiB | 0 | 0.0100665 | 23.84MiB | 250.93KiB | 5.2KiB | 0 | 0.0102747 | False | False |
| fluent_elasticsearch | 555.14B | 0 | 27.57% | 79.47MiB | 54.5KiB | 1.11KiB | 0 | 0.000669534 | 79.47MiB | 52.41KiB | 1.07KiB | 0 | 0.0006439 | False | False |
| splunk_hec_indexer_ack_blackhole | 1.15KiB | 0 | 9.56% | 23.84MiB | 332.84KiB | 6.81KiB | 0 | 0.0136336 | 23.84MiB | 328.79KiB | 6.72KiB | 0 | 0.0134671 | False | False |
| splunk_hec_to_splunk_hec_logs_acks | -313.39B | -0 | 2.56% | 23.84MiB | 330.92KiB | 6.76KiB | 0 | 0.0135544 | 23.84MiB | 330.15KiB | 6.75KiB | 0 | 0.0135229 | False | False |
| splunk_hec_to_splunk_hec_logs_noack | -1.31KiB | -0.01 | 10.68% | 23.84MiB | 332.16KiB | 6.87KiB | 0 | 0.0136012 | 23.84MiB | 333.37KiB | 6.91KiB | 0 | 0.0136515 | False | False |
| http_to_http_json | -2.26KiB | -0.01 | 17.82% | 23.84MiB | 341.94KiB | 7.03KiB | 0 | 0.0140049 | 23.84MiB | 344.72KiB | 7.13KiB | 0 | 0.0141203 | False | False |
| datadog_agent_remap_blackhole | -97.63KiB | -0.15 | 99.98% | 63.92MiB | 891.66KiB | 18.67KiB | 0 | 0.0136201 | 63.82MiB | 857.19KiB | 17.95KiB | 0 | 0.0131132 | False | False |
| datadog_agent_remap_blackhole_acks | -168.32KiB | -0.24 | 100.00% | 67.88MiB | 941.64KiB | 19.35KiB | 0 | 0.0135432 | 67.72MiB | 1016.32KiB | 21.17KiB | 0 | 0.0146527 | False | False |
| splunk_transforms_splunk3 | -142.8KiB | -0.86 | 96.72% | 16.26MiB | 2.24MiB | 47.12KiB | 0 | 0.137719 | 16.13MiB | 2.24MiB | 47.49KiB | 0 | 0.138756 | False | False |
| syslog_log2metric_humio_metrics | -119.95KiB | -0.86 | 100.00% | 13.64MiB | 270.89KiB | 5.55KiB | 0 | 0.0193874 | 13.53MiB | 282.69KiB | 5.79KiB | 0 | 0.0204074 | False | False |
| splunk_hec_route_s3 | -240.63KiB | -1.15 | 99.98% | 20.42MiB | 2.13MiB | 45.06KiB | 0 | 0.104125 | 20.19MiB | 2.12MiB | 45.01KiB | 0 | 0.104999 | False | False |
| http_to_http_acks | -226.45KiB | -1.2 | 67.51% | 18.44MiB | 7.77MiB | 163.73KiB | 0 | 0.420957 | 18.22MiB | 7.62MiB | 161.52KiB | 0 | 0.41826 | True | True |

Signed-off-by: Pablo Sichert <mail@pablosichert.com>
@github-actions github-actions bot added the "domain: external docs" label (Anything related to Vector's external, public documentation) May 17, 2022

@jszwedko jszwedko left a comment

Thanks for the docs note!

@pablosichert pablosichert enabled auto-merge (squash) May 17, 2022 17:23
@github-actions

Soak Test Results

Baseline: f238e5e
Comparison: 840dc57
Total Vector CPUs: 4

No interesting changes in throughput with confidence ≥ 90.00% and absolute Δ mean >= ±8.87%:

Fine details of change detection per experiment.
| experiment | Δ mean | Δ mean % | confidence | baseline mean | baseline stdev | baseline stderr | baseline outlier % | baseline CoV | comparison mean | comparison stdev | comparison stderr | comparison outlier % | comparison CoV | erratic | declared erratic |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| http_pipelines_no_grok_blackhole | 749.49KiB | 4.23 | 100.00% | 17.32MiB | 1.94MiB | 41.53KiB | 0 | 0.111798 | 18.05MiB | 2.15MiB | 46.16KiB | 0 | 0.119228 | False | False |
| socket_to_socket_blackhole | 541.9KiB | 2.15 | 100.00% | 24.6MiB | 280.71KiB | 5.8KiB | 0 | 0.0111392 | 25.13MiB | 376.1KiB | 7.84KiB | 0 | 0.0146098 | False | False |
| syslog_humio_logs | 319.68KiB | 1.75 | 100.00% | 17.84MiB | 459.8KiB | 9.57KiB | 0 | 0.0251642 | 18.15MiB | 457.87KiB | 9.36KiB | 0 | 0.0246276 | False | False |
| syslog_log2metric_splunk_hec_metrics | 316.8KiB | 1.73 | 100.00% | 17.83MiB | 278.38KiB | 5.81KiB | 0 | 0.0152423 | 18.14MiB | 270.92KiB | 5.67KiB | 0 | 0.0145809 | False | False |
| syslog_splunk_hec_logs | 226.34KiB | 1.21 | 100.00% | 18.34MiB | 570.68KiB | 11.77KiB | 0 | 0.0303795 | 18.56MiB | 591.39KiB | 12.22KiB | 0 | 0.0311075 | False | False |
| syslog_loki | 189.24KiB | 1.17 | 100.00% | 15.83MiB | 361.41KiB | 7.41KiB | 0 | 0.0222913 | 16.01MiB | 369.05KiB | 7.66KiB | 0 | 0.0225001 | False | False |
| datadog_agent_remap_datadog_logs_acks | 685.63KiB | 0.93 | 100.00% | 72.13MiB | 1.2MiB | 25.51KiB | 0 | 0.0166174 | 72.8MiB | 1.08MiB | 22.74KiB | 0 | 0.0148881 | False | False |
| datadog_agent_remap_datadog_logs | 612.51KiB | 0.79 | 100.00% | 76.09MiB | 1.12MiB | 23.72KiB | 0 | 0.0147687 | 76.69MiB | 999.68KiB | 20.75KiB | 0 | 0.012727 | False | False |
| syslog_regex_logs2metric_ddmetrics | 110.44KiB | 0.79 | 100.00% | 13.62MiB | 515.53KiB | 10.57KiB | 0 | 0.0369427 | 13.73MiB | 501.46KiB | 10.46KiB | 0 | 0.0356521 | False | False |
| http_pipelines_blackhole | 35.48KiB | 0.78 | 96.23% | 4.44MiB | 572.93KiB | 11.81KiB | 0 | 0.125863 | 4.48MiB | 590.85KiB | 12.33KiB | 0 | 0.128795 | False | True |
| http_pipelines_blackhole_acks | 17.94KiB | 0.38 | 73.16% | 4.55MiB | 541.32KiB | 11.33KiB | 0 | 0.116072 | 4.57MiB | 553.32KiB | 11.59KiB | 0 | 0.118191 | False | True |
| splunk_hec_indexer_ack_blackhole | -26.09B | -0 | 0.21% | 23.84MiB | 330.83KiB | 6.77KiB | 0 | 0.013551 | 23.84MiB | 332.02KiB | 6.79KiB | 0 | 0.0135999 | False | False |
| fluent_elasticsearch | 456.08B | 0 | 22.04% | 79.47MiB | 53.99KiB | 1.09KiB | 0 | 0.000663242 | 79.47MiB | 56.97KiB | 1.16KiB | 0 | 0.000699881 | False | False |
| splunk_hec_to_splunk_hec_logs_acks | 1.2KiB | 0 | 9.95% | 23.84MiB | 333.4KiB | 6.82KiB | 0 | 0.0136557 | 23.84MiB | 329.52KiB | 6.74KiB | 0 | 0.0134961 | False | False |
| http_to_http_noack | -3.06KiB | -0.01 | 32.40% | 23.85MiB | 253.46KiB | 5.24KiB | 0 | 0.0103768 | 23.85MiB | 246.93KiB | 5.12KiB | 0 | 0.0101107 | False | False |
| splunk_hec_to_splunk_hec_logs_noack | -4.97KiB | -0.02 | 38.06% | 23.84MiB | 335.1KiB | 6.94KiB | 0 | 0.0137233 | 23.84MiB | 347.33KiB | 7.2KiB | 0 | 0.0142271 | False | False |
| http_to_http_json | -4.77KiB | -0.02 | 35.88% | 23.84MiB | 348.22KiB | 7.16KiB | 0 | 0.0142609 | 23.84MiB | 352.98KiB | 7.3KiB | 0 | 0.0144588 | False | False |
| datadog_agent_remap_blackhole | -81.15KiB | -0.13 | 99.82% | 61.47MiB | 879.28KiB | 18.41KiB | 0 | 0.0139655 | 61.39MiB | 877.19KiB | 18.37KiB | 0 | 0.0139503 | False | False |
| splunk_transforms_splunk3 | -98.12KiB | -0.6 | 85.20% | 15.97MiB | 2.26MiB | 47.59KiB | 0 | 0.141681 | 15.88MiB | 2.28MiB | 48.3KiB | 0 | 0.143263 | False | False |
| datadog_agent_remap_blackhole_acks | -420.4KiB | -0.63 | 100.00% | 64.75MiB | 860.85KiB | 17.69KiB | 0 | 0.0129809 | 64.34MiB | 872.82KiB | 18.18KiB | 0 | 0.0132452 | False | False |
| syslog_log2metric_humio_metrics | -98.99KiB | -0.7 | 100.00% | 13.76MiB | 333.73KiB | 6.84KiB | 0 | 0.0236758 | 13.67MiB | 340.57KiB | 6.98KiB | 0 | 0.0243314 | False | False |
| splunk_hec_route_s3 | -163.94KiB | -0.79 | 98.92% | 20.28MiB | 2.14MiB | 45.31KiB | 0 | 0.105517 | 20.12MiB | 2.15MiB | 45.57KiB | 0 | 0.106598 | False | False |
| http_to_http_acks | -333.07KiB | -1.78 | 85.09% | 18.23MiB | 7.97MiB | 168.1KiB | 0 | 0.437374 | 17.9MiB | 7.47MiB | 158.21KiB | 0 | 0.41716 | True | True |

@pablosichert pablosichert merged commit f269e8d into master May 17, 2022
@pablosichert pablosichert deleted the pablosichert/adapter-deserializer-improvements branch May 17, 2022 18:40
Labels
domain: external docs (Anything related to Vector's external, public documentation)
domain: sinks (Anything related to the Vector's sinks)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Poor error message when missing required encoding configuration for aws_kinesis_firehose
3 participants