[pkg/ottl] Support json flatten operations #29283
Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.
I definitely prefer format 2. I hadn't thought about flattening slices like that. Is that the norm? I expected slices to be left alone, but your suggestion makes sense.
Is there any standard here when it comes to flattening? Follow-up question: would we want to support both types of flattening?
I think the second solution is probably better; that's how CloudWatch would parse these logs. The first one only exists because I wrote a simple processor for this on my own, and that was the easier solution for me at the time.
I also prefer option 2.

I am not sure if it is desirable. A long array would cause an explosion of attributes, which may cause other issues. That being said, some backends only support arrays of primitive types, so this works:

```json
{
  "items": ["value1", "value2"]
}
```

but this does not:

```json
{
  "items": [
    { "key": "value1" },
    { "key": "value2" }
  ]
}
```

It would be nice to have some options to handle this.
I think this is a good point, but I think it is an inherent risk when flattening data like this. With option 2, wouldn't the non-primitive array above just become indexed keys, as sketched below?
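Presumably something like this (an illustrative reading; the indexed key names follow the `slice.0`-style flattening discussed below):

```yaml
# Illustrative: option-2 flattening of the slice-of-maps example above
items.0.key: value1
items.1.key: value2
```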
@bryan-aguilar I feel like option 1 is more of a stringify solution than flattening; I'd rather see that accomplished via a separate function. I agree with your interpretation of how the slice would be flattened.
If the desire is to prevent backends from rejecting data, then this might be trading one problem for another. For instance, my backend doesn't support JSON maps or more than 255 attributes. If I had (pseudo JSON)

```json
{
  "key": "value",
  "slice": [ <300 items in here> ]
}
```

then my backend would drop "slice" but allow the rest of the payload. But if we turned this into

```json
{
  "key": "value",
  "slice.0": "<item0>",
  "slice.1": "<item1>",
  ...
  "slice.299": "<item299>"
}
```

then my backend would drop the entire payload. Could we support some type of depth parameter for nested levels and some kind of limit parameter for slice length to help guard against this? I think you can use
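A sketch of how such guards might look as OTTL statements (the parameter syntax here is hypothetical; the PR referenced at the end of this thread ultimately added optional `prefix` and `depth` parameters, but no slice-length limit):

```yaml
# Hypothetical: stop flattening beyond two levels of nesting
- flatten(attributes, depth=2)
# Hypothetical: a slice-length guard was discussed but never implemented
# - flatten(attributes, max_slice_len=100)
```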
@danelson this is the same issue with objects though, right? You could have an object with 350 nested fields that would flatten to 350 separate fields.

```json
{
  "name": "test",
  "bigobj": {
    "field1": 1,
    "field2": 2,
    ...,
    "field350": 350
  }
}
```
@danelson wouldn't the existing
Yes, you are correct. I guess I wasn't thinking about that since it hasn't been an issue for us.

I think you mean limit? I would like to give precedence to attributes that do not originate in maps/slices, but I don't think that is possible: since I don't know the names of the incoming attributes, I have to apply it blindly.
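For reference, the existing `limit` editor takes explicit priority keys, which is exactly what is hard to supply for attributes with unknown names (this assumes the `limit(target, limit, priority_keys)` signature from the transformprocessor docs):

```yaml
# Keep at most 255 attributes, preferring "key"; the rest are dropped arbitrarily
- limit(attributes, 255, ["key"])
```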
I realize my use case may not be standard; I just thought it would help the discussion. I would be happy if there were support for the following, which I think solves a lot of generic use cases:
@danelson I wonder if your use case should be considered a completely separate concern. Should the flatten function only worry about flattening? Does the functionality to trim down the JSON object before or after flattening exist already?
It is best to have these types of checks in the Where clause instead of the function itself. To enable this use case we could have a
This makes sense to me. This may be veering off track now, but is there a way to write transforms that apply this to subsets of data? If I want to flatten a specific map, I think we are saying that this proposal would allow for something like the statement sketched below. But would it be able to flatten all maps under some depth when I don't know the name?
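Presumably something like the following (illustrative; `attributes["kubernetes"]` is a made-up key, and the `where` guard mirrors the usage shown in the PR description at the end of this thread):

```yaml
# Illustrative: flatten one specific, known map attribute
- flatten(attributes["kubernetes"]) where IsMap(attributes["kubernetes"])
```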
At the moment you would need to know the name.

@TylerHelmuth just to clarify, we're saying that

I think we are saying that
I think if we have a high degree of confidence that most users will want to limit map depth or list length when flattening a map of arbitrary keys, adding optional parameters would be an okay move here, since it would simplify the overall UX for this functionality. If we can't do that, I think we could use #29289 to solve this case. Some combination of statements/conditions like the ones sketched below could be used to limit depth or remove keys that won't pass validation.
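A sketch of what such statements might look like (hypothetical: this assumes a length/size converter along the lines of what #29289 proposes; none of these exact statements appear in the thread):

```yaml
# Hypothetical guards, assuming a Len-style converter for maps and slices
- flatten(attributes) where Len(attributes) < 255
- delete_key(attributes, "slice") where Len(attributes["slice"]) > 100
```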
There should be an ability to prefix all the keys in the resulting flattened map, so that flattening with a prefix results in something like the sketch below. This keeps my keys neatly packaged in a namespace so as not to be confused with other attributes.
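Something along these lines (illustrative only; the dotted output and the `prefix` idea match the optional parameter described in the PR at the end of this thread):

```yaml
# Illustrative: flattening {"key1": {"key2": "value"}} with prefix "app"
app.key1.key2: value
```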
I can see that being an optional parameter. It gets more complicated if you want to namespace the nested maps; I think in that case the best solution is to call flatten multiple times with different prefixes. I don't want to support namespacing the sub-maps.
I'd just like to chime in that I am looking for this as well. My particular use case is that I have JSON-encoded logs that are being scraped by OTel using the journald receiver, and I want to collapse all the log elements that are currently shipped under the log body.
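For the journald case, something like the following would presumably work (illustrative; it assumes the JSON fields have already been parsed into a map, and mirrors the `flatten(body) where IsMap(body)` usage from the PR description below):

```yaml
- flatten(body) where IsMap(body)
```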
I've created #30455 to add this function. All interested parties please review.
**Description:** Adds a `flatten` function that allows flattening maps. I went with an editor instead of a converter, but I'm open to debate. Using an editor means that a user can do `flatten(body) where IsMap(body)` instead of `set(body, Flatten(body)) where IsMap(body)`. When using `ParseJSON` you have to do:

```
- merge_maps(cache, ParseJSON(body), "upsert")
- flatten(cache)
```

instead of `merge_maps(cache, Flatten(ParseJSON(body)), "upsert")`. Ultimately I went with an editor for similar reasons that `merge_maps` is an editor: chaining too many functions together is messy, and updating maps is very fast with pdata.

The function supports 2 optional parameters, `prefix` and `depth`. Use `prefix` to add a "namespace" to the values that are being flattened. Use `depth` to prevent trying to flatten maps that are too deep. See the function doc for examples.

**Link to tracking Issue:** Closes #29283

**Testing:** Added new unit and e2e tests. Please scrutinize.

**Documentation:** Added function doc.

Co-authored-by: Evan Bradley <11745660+evan-bradley@users.noreply.github.com>
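Put together, a minimal end-to-end configuration might look like the sketch below (the transform-processor scaffolding and the final `set` statement are illustrative additions; the two middle statements come straight from the description above):

```yaml
processors:
  transform:
    log_statements:
      - context: log
        statements:
          # parse the JSON body into the cache map, then flatten it in place
          - merge_maps(cache, ParseJSON(body), "upsert")
          - flatten(cache)
          # hypothetical follow-up: replace the body with the flattened map
          - set(body, cache)
```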
Component(s): pkg/ottl
Is your feature request related to a problem? Please describe.
Original discussion in https://cloud-native.slack.com/archives/C01N6P7KR6W/p1699999733202679
Some backends do not support JSON slices/maps. It would be nice if there were functions to support flattening the data. Consider the following:
Possible outputs:

1. Flatten to stringified JSON
2. Flatten to modified attribute names
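For illustration (a hypothetical attribute; the issue's original examples are not reproduced here):

```yaml
# Hypothetical input attribute
input:
  nested:
    key1:
      key2: value

# Output 1: flatten to stringified JSON
output_stringified:
  nested: '{"key1":{"key2":"value"}}'

# Output 2: flatten to modified attribute names
output_flattened:
  nested.key1.key2: value
```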
Initial questions:
Describe the solution you'd like
A function such as `FlattenJSON`, `FlattenMap`, or `FlattenSlice`.