-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pkg/ottl] Enhance ParseXML function with intuitive parse format and add MarshalXML function to convert parsed output back to XML #35075
Conversation
…cluding the option to flatten single attribute arrays. Also add MarshalXML function to marshal pcommon.Map into XML
@shazlehu can you please sign the CLA: https://github.com/open-telemetry/community/blob/main/guides/contributor/CLA.md |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple nits
#### Parameters | ||
| Name | Type | Description | | ||
|------|------|-------------| | ||
| `target` | Getter | A string that should be in XML format.If `target` is not a string, nil, or cannot be parsed as XML, `ParseXML` will return an error. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| `target` | Getter | A string that should be in XML format.If `target` is not a string, nil, or cannot be parsed as XML, `ParseXML` will return an error. | | |
| `target` | Getter | A string that should be in XML format. If `target` is not a string, nil, or cannot be parsed as XML, `ParseXML` will return an error. | |
#### Version 2 Parsing | ||
|
||
Unmarshalling XML is done using the following rules: | ||
1. All character data for an XML element is trimmed, joined, and placed into the `xml_value` field, or as the value of the map if the element has attributes or children. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 'placed into the XML field' and 'as the value of the map' should be swapped here
"00001": "Sam", | ||
"00002": "Bob", | ||
"00003": "Alice", | ||
"xml_ordering": "00001,00003,00003", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"xml_ordering": "00001,00003,00003", | |
"xml_ordering": "00001,00002,00003", |
@@ -304,6 +306,698 @@ func Test_ParseXML(t *testing.T) { | |||
require.True(t, ok) | |||
|
|||
require.Equal(t, tt.want, resultMap.AsRaw()) | |||
// json, err := jsoniter.MarshalIndent(resultMap.AsRaw(), "", " ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove
{ | ||
xml: `<Log>This record has a collision<User id="0001"/><User id="0002"/></Log>`, | ||
want: map[string]any{ | ||
"Log": map[string]any{ | ||
"xml_value": "This record has a collision", | ||
"User": []any{map[string]any{ | ||
"xml_attributes": map[string]any{ | ||
"id": "0001", | ||
}, | ||
}, map[string]any{ | ||
"xml_attributes": map[string]any{ | ||
"id": "0002", | ||
}, | ||
}}, | ||
"xml_ordering": "User.0,User.1", | ||
}, | ||
}, | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a duplicate of the test below but with no name
}, | ||
}, | ||
}, | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add name here
// require.Equal(t, tt.want, rawMap) | ||
t.Logf("Result: %s", formatMap(rawMap)) | ||
|
||
// re-marshal the result to compare the output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can remove but good to test this for sure.
Also re-add the require.Equal above
return true | ||
} | ||
|
||
func (a *xmlElement) MarshalXML(e *xml.Encoder, start xml.StartElement) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can remove but good for testing
Description:
Current implementation of ParseXML
The current implementation of the ParseXML produces output that is difficult for users to manipulate, particularly when the XML is complicated. The current design loses the sense of key:value pairs which are helpful for understanding the data, and, more importantly, always parses the XML into arrays, which are difficult to manipulate in OTTL, and don't provide a stable path that can be used to access the data.
This change proposes an additional parsing version for the ParseXML function that would parse XML into a nested map, where the keys are the paths to the data, and the values are the data itself, or in the case of nodes with both values and attributes, using the special keys
xml_attributes
andxml_value
. It still maintains the order of nodes using the special keyxml_ordering
. Additionally, the proposed implementation can optionally "flatten" arrays, such as theEventData
node in the following XML, into a single map. The arrays must have a single, common attribute, and no other children (in this case,EventData
has onlyData
children). This would allow users to access the data more intuitively and would allow for easier manipulation of the data in OTTL.This proposed change also includes a new function,
MarshalXML
, to reconstruct the XML from the parsed map. This would allow users to manipulate the data in OTTL, and then convert it back to XML, typically for ingestion into another system.The proposed implementation would be backward compatible with the current implementation, allowing the user to specify the new implementation as an optional
version
parameter to the ParseXML function.Consider the following XML from a Windows event log, a common telemetry source:
The current
ParseXML
function renders this output, in a flattened view, using dot-notation:This view is extremely opaque, and difficult to manipulate in OTTL. The current design also doesn't provide a stable path that can be used to access the data. For example, the path to the
EventID
ischildren.0.children.1["content"]
, which is not intuitive.Proposed implementation:
The proposed implementation would parse the XML into nested maps. In the case where a node has multiple children with the same name, the children would be stored in an array. If a node has no attributes or children, its value would be stored as a string. Otherwise, it would be stored in the special key
xml_value
. Thexml_attributes
key would store the attributes of the node, and the special keyxml_ordering
would store the order of the children of the node, which is important for marshaling the nodes in the same order back into XML. For example, the schema definition for<EventData>
in a Windows Event log specifies that the<Data>
nodes are a sequence (<xs:sequence>
), so the order of the nodes should be preserved:We prefix the special keys with
xml_
to avoid conflicts with the actual data. It is a limitation that if the original XML uses tagsxml_value
,xml_attributes
,xml_ordering
, orxml_flattened_array
, it will not parse correctly. A future enhancement could be to add an optional parameter to the function to allow users to specify a different prefix.Additionally, the proposed implementation can optionally "flatten" arrays, such as the
EventData
node in the XML, into a single map. The arrays must have a single, common attribute, and no other children (in this case,EventData
has onlyData
children). This would allow users to access the data in a more intuitive way, and would allow for easier manipulation of the data in OTTL.Flattened Parse
MarshalXML
The MarshalXML function is able to reliably reconstruct the XML, with the limitation imposed by the GO XML library, which doesn't support self closing tags (In this example, tags like
<Security />
.) The Unit tests for the MarshalXML function use theParseXML
function to parse the XML, and then compare the output of theMarshalXML
function to the original XML. The testXML
purposely omits self-closing tags.Link to tracking Issue: #35076
Testing:
Added Unit tests for the new version, previous tests still pass for version 1.
Documentation:
Added documentation for the new parse version to pkg/ottl/ottlfuncs/README.md