-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Windows event_data format is difficult to consume #32952
Comments
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
I actually was unable to figure out how to work with the event_data in ottl as merge_maps doesn't work and flatten produces |
Thanks for reporting this @strawgate. I agree it's difficult to work with but it looks like you have identified a potential change to ottl which would help. Let us know if you have other ideas or requests to help with this. |
Yeah though having everyone have to merge maps to look at event data doesn't seem ideal, especially when there are potential a dozen plus properties with empty keys that won't merge well. I think we should go back to creating a single map and come up with an incrementing identifier like param1 for those unnamed parameters. I'd be happy to try to put together a pr |
Or maybe we just separate named and unnamed parameters - a map for named, an array for unnamed? I don't know if there's appetite to switch it again but maybe @swiatekm-sumo @pjanotti have opinions too. |
I have very little appetite for a breaking change here, but I wouldn't mind a configuration option to switch between two well-defined formats. |
I'm not sure how you would actually even use the structure that exists today. Can you provide an example of how you can consume the data today? As far as I can tell, you can't "find " keys or values that are in a map that's in an arbitrary array of maps so you're relying on pulling maps by the index in the array? Which I don't think is stable for named parameters but is stable for unnamed parameters. It seems to me that it would be very hard for someone to consume as it is today as they would be taking a dependency on the order of the maps in the array? In event_data XML these are largely accessed by value or by key and only sometimes by index. Having them in the map would provide a unified access pattern. Perhaps the event_data unnamed keys being converted to array vs map should be a configuration? |
Thinking back to #28587 (comment) I opted to go with maps to keep it compact, ie.: avoiding naming the element fields, but, if we were to follow the actual XML event definition we would need to respect that. So if we were to follow the XML specification it would have to be something like: {
"event_data": {
"data": [
{
"name": "ProcessId",
"value": "7924"
},
{
"name": "Application",
"value": "\\device\\harddiskvolume2\\users\test\\desktop\\netcat\\nc.exe"
},
]
} it means that for the Data elements without name it will be something like: {
"event_data": {
"data": [
{
"name": "",
"value": "7924"
},
{
"name": "",
"value": "\\device\\harddiskvolume2\\users\test\\desktop\\netcat\\nc.exe"
},
]
} The map is really convenient, but, it is not the actual data format. The events without name are the oldest format for Windows events, but, as shown in the comment above are still used. I want to improve the usage for @strawgate, but, let's decide this carefully to avoid having to revisit it later. 🤔 |
I went looking at how this works across other tools, it looks like there are 3 main methods utilized:
I think this may be a good enough argument on its own that there should be 3 modes and that perhaps leaving it as xml might make sense as the default. The complaints around 3 (what is currently the default) is typically that it relies on the order of xml elements to produce the array of maps. Lots of event types can contain the same xml elements but in different orders. A user who wants to pull say "TargetUserSid" using method 3 and work with it using OTTL will have to either:
|
One idea regarding a configuration option: a versioned data struct option seems a good choice in this case. It allows users to stay on a desired format while allowing possible future breaking changes. Each version represents relatively a small piece of code and having a handful doesn't seem a problem and we can always drop some less popular options later. |
Are you aware of any examples of this implemented across any component that I can stare at for a while? I may start putting together a PR for how this might look, so any guidance would be lovely. |
@strawgate sorry, I don't know one from the top of my mind. @djaglowski may know some component doing that. |
@swiatekm-sumo do you view the introduction of the current behavior as a setting the user would have to explicitly pick as acceptable re: breaking change? As long as the actual event_data can come in the previous format with that setting? |
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping Pinging code owners: See Adding Labels via Comments if you do not have permissions to add labels yourself. |
I'm still a bit stuck on this as using the current format requires taking a dependency on an unstable feature of the event_data format (the order of the values with no keys). I would like to address this if possible via a configuration flag that determines how event_data is "map"ped. Are there any objections to me putting together a PR that offers the alternative format? |
@strawgate making a PR will, at minimum, move the discussion ahead, however, there is the obvious risk of not being accepted. If you think the risk is too high for your time investment, let me (re)review this item before you embark on that. |
With #34720 merged, I think users will now have a simple way to opt into 1. (This was sort of true before, but the I want to offer a 4th option, after having spent some time on #35281
Starting from the original example, say we have a few unnamed elements in the result: <EventData>
<Data Name="ProcessId">7924</Data>
<Data Name="Application">\device\harddiskvolume2\users\test\desktop\netcat\nc.exe</Data>
<Data Name="SourceAddress">0.0.0.0</Data>
<Data Name="SourcePort">5555</Data>
<Data Name="Protocol">6</Data>
<Data Name="FilterRTID">84614</Data>
<Data Name="LayerName">%%14608</Data>
<Data Name="LayerRTID">36</Data>
<Data>val 1</Data>
<Data>val 2</Data>
</EventData> The output would be: "event_data": {
"ProcessId": "7924",
"Application": "\\device\\harddiskvolume2\\users\test\\desktop\\netcat\\nc.exe",
"SourceAddress": "0.0.0.0",
"SourcePort": "5555",
"Protocol": "6",
"FilterRTID": "84614",
"LayerName": "%%14608",
"LayerRTID": "36"
"Unnamed": [ "val 1", "val 2" ]
} We may need to support the old format (3) for backwards compatibility, but I'm wondering if 4 would be superior to 2. |
One observation @djaglowski, there shouldn't be a mix of named and unnamed entries in the same event. That is related to the code used to generate the actual event. So when doing option 4, if possible, the map can be created with the generated names, perhaps matching the legacy event format, IIRC, |
This matches my understanding too that the unnamed ones should be from XP/2003 events. @pjanotti by generated names do you mean something like param1, param2, param3? Isn't that option 2? |
Given that, I would be in favor of adding a config flag, e.g. |
Sorry, for the delay...
Yes, correct - just using the place holder of the message in the old format. @djaglowski just to confirm if I'm understanding your proposal correct: if |
✅
I was thinking it would be a map or a slice, depending on whether the name attribute is present.
This is where I would have said the result is a slice containing only the values.
✅ That said, I'm not opposed to having a map w/ some kind of numbered keys. The benefit of this would be that the result is structurally for all events. I think you should make this decision, and the rest we seem to be on the same page about already. |
@djaglowski I do prefer that we always have a map, this should simplify any ottl processing. @strawgate any preferences? |
Consistent use of a map seems ideal to me |
I'm good with using a map consistently. |
Component(s)
receiver/windowseventlog
Describe the issue you're reporting
Prior to #28587 the format for event_data on windows events was easy to consume as it was a map:
would produce
In order to support xml entries with no names, this PR #28587 switched the format to be an array of maps:
Which seems to require a mergemaps(?) to be consumable via OTTL (though I'm not sure how that would work with multiple name-less entries).
We should consider handling the happy case (where event_data has a name and value) in a way that makes the data easy to consume.
Perhaps we could generate names (value1, value2, value3) for the name-less values to offer a consistent access experience. This would still allow whatever patterns are used today to identify the right event_data value to read while making it very easy to access by the order of the event_data.
The text was updated successfully, but these errors were encountered: