Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flatten() function does not resolve attribute key conflicts when flattening #35793

Open
odubajDT opened this issue Oct 15, 2024 · 9 comments · May be fixed by #35831
Open

flatten() function does not resolve attribute key conflicts when flattening #35793

odubajDT opened this issue Oct 15, 2024 · 9 comments · May be fixed by #35831
Labels
enhancement New feature or request pkg/ottl

Comments

@odubajDT
Copy link
Contributor

Component(s)

pkg/ottl

Is your feature request related to a problem? Please describe.

The flatten() function is not able to flatten a map, which has conflicting keys after flattening. In the current state, the latest processed key in the map takes precedence. An example is

{
  "name": "test",
  "address": {
    "street": "first",
    "house": 1234
  },
  "address.street": 2
}

which will result in

{
  "name": "test",
  "address.street": 2,
  "address.house": 1234,
}

and therefore we have data loss

Describe the solution you'd like

I would like to introduce another optional parameter of type boolean, which will enable/disable the resolution of conflicts by adding a suffix for the conflicting keys, similar to how flatten() handles slices. The definition might look like this

flatten(target, Optional[prefix], Optional[depth], Optional[resolve_conflict])

Where the 3rd parameter is type boolean and if set to false the behavior stays as it is. If set to true, it will keep all the conflicting keys and use a suffix to distinguish them. A solution for the mentioned example would be

{
  "name": "test",
  "address.street.00": "first",
  "address.house": 1234,
  "address.street.01": 2,
}

We can use a prefix 0 to the suffix to distinguish the array conversions and map key conflict resolutions

Describe alternatives you've considered

No response

Additional context

No response

@odubajDT odubajDT added enhancement New feature or request needs triage New item requiring triage labels Oct 15, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@joaopgrassi
Copy link
Member

Would also be nice to keep the "original" attribute as is and only add the suffix to the new ones that arrived after flattening. This way it is clear which one was the original vs modified.

@odubajDT
Copy link
Contributor Author

odubajDT commented Oct 16, 2024

created a first version of solution how this can potentially work #35831. Please consider it as a draft of an idea and not a PR ready for review.

Happy to hear about your feedback!

@evan-bradley evan-bradley removed the needs triage New item requiring triage label Oct 17, 2024
@evan-bradley
Copy link
Contributor

I agree we should have a conflict resolution option available for this function, thanks for opening a draft PR to look over.

Can we offer more than just a suffixing strategy? I can think of a few others that we should offer:

  1. Merge the two keys and put the values in a list.
  2. First wins
  3. Last wins (we should keep this as the default for now)

@odubajDT
Copy link
Contributor Author

I agree we should have a conflict resolution option available for this function, thanks for opening a draft PR to look over.

Can we offer more than just a suffixing strategy? I can think of a few others that we should offer:

1. Merge the two keys and put the values in a list.

Interesting idea, but doesn't this go against the functionality of flatten()? Since the function is actually converting slice to a flat map, it seems to me that this will go against the principle of the function

2. First wins

3. Last wins (we should keep this as the default for now)

This might be valid, but how can you put an order on a map (unordered datatype)? Here we might be able to choose between the deeper/shallower value, but ordering might be problematic, since it will be non-deterministic.

@evan-bradley
Copy link
Contributor

Interesting idea, but doesn't this go against the functionality of flatten()? Since the function is actually converting slice to a flat map, it seems to me that this will go against the principle of the function

I see what you mean, I had forgotten slices are flattened as well. In that case, I think we just need to make sure that suffixing keeps slices in mind, for cases like this:

{
  "name": "test",
  "address": {
    "street": "first",
    "house": 1234
  },
  "address.street": ["fifth", "avenue"]
}

This might be valid, but how can you put an order on a map (unordered datatype)? Here we might be able to choose between the deeper/shallower value, but ordering might be problematic, since it will be non-deterministic.

That's a fair point, but pdata maps are actually essentially just wrappers around slices to key-value objects, so there is technically a deterministic ordering. Maybe providing options for preferring deeper/shallower values is a better approach. If you don't provide first-wins/last-wins options, I think we should deprecate the current behavior.

@odubajDT
Copy link
Contributor Author

Interesting idea, but doesn't this go against the functionality of flatten()? Since the function is actually converting slice to a flat map, it seems to me that this will go against the principle of the function

I see what you mean, I had forgotten slices are flattened as well. In that case, I think we just need to make sure that suffixing keeps slices in mind, for cases like this:

{
  "name": "test",
  "address": {
    "street": "first",
    "house": 1234
  },
  "address.street": ["fifth", "avenue"]
}

This might be valid, but how can you put an order on a map (unordered datatype)? Here we might be able to choose between the deeper/shallower value, but ordering might be problematic, since it will be non-deterministic.

That's a fair point, but pdata maps are actually essentially just wrappers around slices to key-value objects, so there is technically a deterministic ordering. Maybe providing options for preferring deeper/shallower values is a better approach. If you don't provide first-wins/last-wins options, I think we should deprecate the current behavior.

Ok let's get some conflict examples solved here, so I can proceed with the implementation.

  1. Use case with conflicting maps
"address": map[string]any{
	"street": "first",
	"house":  int64(1234),
},
"address.street": "second",

results in

"address.street":   "first",
"address.house":    int64(1234),
"address.street.1": "second",
  1. Use case with conflicting slices
"address": map[string]any{
	"street": []any{"first"},
	"house":  int64(1234),
},
"address.street": []any{"second"},

results in

"address.street.0":   "first",
"address.house":      int64(1234),
"address.street.0.1": "second",
  1. Use case with conflicting slices and maps
"address": map[string]any{
	"street": map[string]any{
		"number": "first",
	},
	"house": int64(1234),
},
"address.street": map[string]any{
	"number": []any{"second", "third"},
},
"address.street.number": "fourth",

results in

"address.street.number":   "first",
"address.house":           int64(1234),
"address.street.number.0": "second",
"address.street.number.1": "third",
"address.street.number.2": "fourth",

Would you expect a different behavior in any of the use-cases? Especially the third use-case would need some additional attention, since I am not sure, if the information that parameters were an array is important in the resulting outcome. If not, we can just bump the suffix counter without actually preserving the information in any way.

Thanks!

@odubajDT
Copy link
Contributor Author

Ok, looked at a little bit different approach how to resolve conflicts between slices and maps, please have a look at the PR.

If you agree, I can have a look at the potential first/last wins (only when using no conflicts resolution - here it makes no sense, since we do not loose any data)

Thanks!

@evan-bradley @TylerHelmuth

@odubajDT
Copy link
Contributor Author

odubajDT commented Nov 5, 2024

cc @evan-bradley @TylerHelmuth

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request pkg/ottl
Projects
None yet
3 participants