Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unnecessary escaping in influx line protocol #13162

Open
Fiery-Fenix opened this issue Jun 15, 2022 · 5 comments
Open

Unnecessary escaping in influx line protocol #13162

Fiery-Fenix opened this issue Jun 15, 2022 · 5 comments
Labels
sink: influxdb_logs Anything `influxdb_logs` sink related type: bug A code related bug.

Comments

@Fiery-Fenix
Copy link

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

sinks.influx is doing unnecessary escaping which breaks further pipeline (like telegraf)
According to official specification of influx protocol - it's not needed to escape "." (dot) in any parts of influx line protocol. But vector is escaping it those making produced event incompatible with other solutions that's supports Influx line protocol.
Example (see provided configuration for pipeline):
echo '{"event.name":"event"}' | vector --config influx.toml
Expected result:
vector,metric_type=logs event.name="event" 1655286885923535800
Actual result:
vector,metric_type=logs event\\\\.name="event" 1655286885923535800

Configuration

[sources.stdin]
type = "stdin"

[transforms.parse]
type = "remap"
inputs = [ "stdin" ]
source = """
. = parse_json!(.message)"""

[sinks.influx]
type = "influxdb_logs"
inputs = ["parse"]
consistency = "any"
database = "default"
endpoint = "http://localhost:8080/"
retention_policy_name = "autogen"
measurement = "vector"

Version

vector 0.22.1 (x86_64-unknown-linux-gnu b633e95 2022-06-10)

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

@Fiery-Fenix Fiery-Fenix added the type: bug A code related bug. label Jun 15, 2022
@jszwedko jszwedko added the sink: influxdb_logs Anything `influxdb_logs` sink related label Jun 15, 2022
@jszwedko
Copy link
Member

Thanks for the reproduction @Fiery-Fenix ! That is odd that we are escaping .s there.

@shymega
Copy link

shymega commented Jun 26, 2022

I'm looking to take this one on. I'm still investigating, but I am also pursuing the line of inquiry into this being an issue with an external crate(s).

@shymega
Copy link

shymega commented Jul 19, 2022

I haven't had time to fix this yet, focusing on job hunt, but my approach to debugging would be a mixture of gdb, packet sniffing, and dbg!(). I'm quite sure this is an issue with either serde or the HTTP client. I have not yet spotted any parsing issues in Vector, in which case, some low-level debugging would be required.

@juvenn
Copy link
Contributor

juvenn commented Apr 1, 2023

With some naive debugging, I was able to pinpoint the culprit. A \ is introduced when trying to iterate the LogEvent, then at influxdb encoding time, an extra \ is introduced to escape it.

if key.contains('.') {
res.push_str(&key.replace('.', "\\."));
} else {
res.push_str(key);
}

Field::String(s) => {
output.put_u8(b'"');
for c in s.chars() {
if "\\\"".contains(c) {
output.put_u8(b'\\');
}
let mut c_buffer: [u8; 4] = [0; 4];
output.put_slice(c.encode_utf8(&mut c_buffer).as_bytes());

As far as I can tell, influx is encoding correctly to escape \ and ", and it has tests to ensure that

line_protocol.2,
[
"a=1i",
"nested.array[0]=\"example-value\"",
"nested.array[1]=\"<null>\"",
"nested.array[2]=\"another-value\"",
"nested.array[3]=15i",
"nested.bool=true",
"nested.field=\"2\"",
]

Though I feel I don't have enough knowledge to figure the LogEvent iteration part, yet.

@jszwedko
Copy link
Member

jszwedko commented Apr 7, 2023

It seems like this is yet another case of using String for paths, making it hard to understand how to interpret that value. In this case, it looks like all_fields is the root cause here, which recursively iterates an event to return a list of (path, value). all_fields has been a problem for multiple different parts of Vector and really just needs to be re-written to return a real path type so each user of that function can handle it appropriately. For influx specifically it seems like we want to encode the path to a string differently than all_fields is doing by default.

(from a Slack message)

This will be a bit of a big change unfortunately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sink: influxdb_logs Anything `influxdb_logs` sink related type: bug A code related bug.
Projects
None yet
Development

No branches or pull requests

4 participants