
Add an option to count unique values of specified key(s) to CountAggregateAction #4644

Closed
kkondaka opened this issue Jun 19, 2024 · 3 comments · Fixed by #4652
Labels: enhancement (New feature or request)

@kkondaka
Collaborator

Is your feature request related to a problem? Please describe.
The aggregate processor's count action counts events that share the same identification_keys values. In some cases we need to count based on secondary keys. For example, when OTel traces with serviceName and traceId are sent through the aggregate processor, it is not possible to count the number of traces in each service: using both serviceName and traceId as identification_keys would send each unique combination of those values to a different node. If we use just serviceName as identification_keys, no action currently implemented in the aggregate processor can send ALL events of a service to one node.

Describe the solution you'd like
The solution is to add an option such as unique_keys to the count aggregate action, which counts the number of unique values of those keys within each group defined by identification_keys.

A configuration like this:

    processor:
      - aggregate:
          identification_keys: ["serviceName"]
          action:
            count:
              unique_keys: ["traceId"]

The above config would count the number of unique traceId values for each serviceName.
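To make the proposed semantics concrete, here is a minimal in-memory sketch in Java of what the config above would compute: group events by serviceName and count the distinct traceId values in each group. The class and method names are hypothetical; this is not the processor's actual implementation, and it ignores peer forwarding, aggregation windows, and multiple identification_keys.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class UniqueCountSketch {
    // Hypothetical helper: groups events by one identification key and
    // collects the distinct values of one unique key within each group.
    static Map<String, Integer> countUnique(List<Map<String, String>> events,
                                            String identificationKey,
                                            String uniqueKey) {
        Map<String, Set<String>> seen = new HashMap<>();
        for (Map<String, String> event : events) {
            String group = event.get(identificationKey);
            String value = event.get(uniqueKey);
            // A HashSet keeps each traceId only once per serviceName.
            seen.computeIfAbsent(group, k -> new HashSet<>()).add(value);
        }
        // The count for each group is the number of distinct values seen.
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : seen.entrySet()) {
            counts.put(e.getKey(), e.getValue().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, String>> events = List.of(
            Map.of("serviceName", "cart", "traceId", "t1"),
            Map.of("serviceName", "cart", "traceId", "t1"),
            Map.of("serviceName", "cart", "traceId", "t2"),
            Map.of("serviceName", "checkout", "traceId", "t3"));
        Map<String, Integer> counts = countUnique(events, "serviceName", "traceId");
        System.out.println(counts.get("cart"));     // 2
        System.out.println(counts.get("checkout")); // 1
    }
}
```

Note that the two "t1" events for "cart" are counted once, which is the difference from the existing count action, where every event increments the count.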

Describe alternatives you've considered (Optional)
An alternative is to have an action like all_events, which passes on all events matching the serviceName identification_keys, followed by another aggregate processor with traceId as its identification_keys. Currently, two aggregate processors of the "remote peer" type are not allowed, which makes this solution infeasible.


@dlvenable
Member

I think this proposal makes sense. Even aside from the limitation of aggregate processors, it would be easier for users to have a way to select unique values.

I do want to clarify the behavior with multiple unique_keys, since this is an array. Will it be the same approach as with identification_keys? That is, are two Events considered the same only if all values for all unique_keys are the same?

@kkondaka kkondaka self-assigned this Jun 19, 2024
@dlvenable dlvenable added this to the v2.9 milestone Jun 19, 2024
@dlvenable dlvenable added the enhancement New feature or request label Jun 19, 2024
@kkondaka
Collaborator Author

@dlvenable, yes, with multiple keys the approach will be the same as for identification_keys. I am thinking of implementing it the same way, using hashing, which means two events count as one unique value only if all values for all unique_keys are the same. So, if unique_keys is ["srcIp", "srcPort"], each unique combination of srcIp and srcPort is counted.
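The multi-key behavior described above can be sketched the same way: build the tuple of values for all unique_keys and store it in a hash-based set, so two events land in the same bucket only when every value matches. This is a hypothetical illustration, not the actual implementation. Storing the value list itself in the set (rather than a raw hash code) avoids miscounting on hash collisions, and treating a missing key as null is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CompositeUniqueKeySketch {
    // Builds the tuple of values for the configured unique_keys.
    // Assumption: a missing key contributes null to the tuple.
    static List<String> compositeKey(Map<String, String> event, List<String> uniqueKeys) {
        List<String> values = new ArrayList<>(uniqueKeys.size());
        for (String key : uniqueKeys) {
            values.add(event.get(key));
        }
        return values;
    }

    // Counts distinct combinations. HashSet uses List.hashCode/equals, which
    // compare element by element, so events match only if every value matches.
    static int countUniqueCombinations(List<Map<String, String>> events,
                                       List<String> uniqueKeys) {
        Set<List<String>> seen = new HashSet<>();
        for (Map<String, String> event : events) {
            seen.add(compositeKey(event, uniqueKeys));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<String> uniqueKeys = List.of("srcIp", "srcPort");
        List<Map<String, String>> events = List.of(
            Map.of("srcIp", "10.0.0.1", "srcPort", "443"),
            Map.of("srcIp", "10.0.0.1", "srcPort", "443"),
            Map.of("srcIp", "10.0.0.1", "srcPort", "8080"),
            Map.of("srcIp", "10.0.0.2", "srcPort", "443"));
        System.out.println(countUniqueCombinations(events, uniqueKeys)); // 3
    }
}
```

Here 10.0.0.1:443 appears twice but is counted once, while 10.0.0.1:8080 and 10.0.0.2:443 are distinct combinations even though each shares one value with another event.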

@dlvenable
Member

This sounds great! Thanks for the proposal!
