[chore][processor/transform] fix markdown for convert_exponential_histogram_to_histogram #35414

Open
wants to merge 2 commits into
base: main
[chore][processor/transform] fix markdown for convert_exponential_histogram_to_histogram
daidokoro committed Sep 25, 2024
commit e11f6c7697d2cf38647e192c59792a9bb12043d9
12 changes: 6 additions & 6 deletions processor/transformprocessor/README.md
@@ -369,7 +369,7 @@ The `convert_exponential_histogram_to_histogram` function converts an Exponentia
This function requires 2 arguments:

- `distribution` - This argument defines the distribution algorithm used to allocate the exponential histogram datapoints into a new Explicit Histogram. There are 4 options:
<br>

- __upper__ - This approach identifies the highest possible value of each exponential bucket (_the upper bound_) and distributes the datapoints by comparing that upper bound with each of the ExplicitBounds provided. The count is allocated to the first boundary that is greater than or equal to the upper bound. This approach works better for small/narrow exponential histograms, where the difference between the upper and lower bounds is small.

_For example, Given:_
@@ -384,14 +384,14 @@ This function requires 2 arguments:
- $15<=15$ (allocate count to this boundary)
6. Allocate count: [0, 0, __10__, 0, 0]
7. Final Counts: [0, 0, __10__, 0, 0]
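
The comparison steps above can be sketched in a few lines of Python. This is only an illustration, not the processor's implementation: the function name `distribute_upper` is hypothetical, and the inputs (a count of 10, an upper bound of 15, and boundaries `[5, 10, 15, 20, 25]`) are assumed values chosen to reproduce the final counts shown above. For simplicity, the sketch leaves the count unallocated when the upper bound exceeds every boundary.

```python
from typing import List


def distribute_upper(count: int, upper: float, boundaries: List[float]) -> List[int]:
    """Allocate `count` to the first explicit boundary that is >= `upper`.

    Hypothetical helper for illustration; one output slot per boundary.
    """
    counts = [0] * len(boundaries)
    for i, boundary in enumerate(boundaries):
        if upper <= boundary:
            counts[i] += count  # first boundary that can hold the upper bound
            break
    return counts


print(distribute_upper(10, 15, [5, 10, 15, 20, 25]))  # [0, 0, 10, 0, 0]
```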
<br>

- __midpoint__ - This approach works in a similar way to the __upper__ approach, but instead of using the upper bound, it uses the midpoint of each exponential bucket. The midpoint is identified by calculating the average of the upper and lower bounds. This approach also works better for small/narrow exponential histograms.
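
A minimal Python sketch of the midpoint approach, assuming the same comparison rule as __upper__ (allocate to the first boundary that is greater than or equal to the reference value). The function name `distribute_midpoint` and the sample values are assumptions made for illustration, not part of the processor.

```python
from typing import List


def distribute_midpoint(count: int, lower: float, upper: float,
                        boundaries: List[float]) -> List[int]:
    """Allocate `count` using the average of the bucket's lower and upper bounds."""
    midpoint = (lower + upper) / 2
    counts = [0] * len(boundaries)
    for i, boundary in enumerate(boundaries):
        if midpoint <= boundary:
            counts[i] += count  # first boundary that can hold the midpoint
            break
    return counts


# A bucket spanning 10..20 has midpoint 15, so the count lands on boundary 15.
print(distribute_midpoint(10, 10, 20, [5, 10, 15, 20, 25]))  # [0, 0, 10, 0, 0]
```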
<br>


>The __uniform__ and __random__ distribution algorithms both utilise the concept of intersecting boundaries.
Intersecting boundaries are the boundaries in the `boundaries array` that fall between, or on, the lower and upper bounds of an Exponential Histogram bucket.
_For example:_ if an Exponential Histogram bucket has a lower bound of 10 and an upper bound of 20, and the boundaries array is [5, 10, 15, 20, 25], the intersecting boundaries are 10, 15, and 20 because they lie within the range [10, 20].
<br>
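
The definition of intersecting boundaries can be expressed as a one-line filter. The helper name `intersecting_boundaries` is hypothetical; the values are the ones used in the example above.

```python
from typing import List


def intersecting_boundaries(lower: float, upper: float,
                            boundaries: List[float]) -> List[float]:
    """Return the boundaries that lie within the closed range [lower, upper]."""
    return [b for b in boundaries if lower <= b <= upper]


print(intersecting_boundaries(10, 20, [5, 10, 15, 20, 25]))  # [10, 15, 20]
```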

- __uniform__ - This approach distributes the datapoints for each bucket uniformly across the intersecting __ExplicitBounds__. The algorithm works as follows:

- If there are valid intersecting boundaries, the function evenly distributes the count across these boundaries.
@@ -410,13 +410,13 @@ This function requires 2 arguments:
8. Allocate 3 to each: [0, 3, 3, 3, 0]
9. Distribute remainder $r$ 1: [0, 4, 3, 3, 0]
10. Final Counts: [0, 4, 3, 3, 0]
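
The even split and remainder handling can be sketched as follows. This is a rough illustration under stated assumptions: the name `distribute_uniform` is hypothetical, the inputs (count 10, bucket range 10 to 20, boundaries `[5, 10, 15, 20, 25]`) are assumed values that reproduce the final counts above, and the remainder is handed out one unit at a time starting from the first intersecting boundary. Only the case with intersecting boundaries is covered.

```python
from typing import List


def distribute_uniform(count: int, lower: float, upper: float,
                       boundaries: List[float]) -> List[int]:
    """Split `count` evenly across the boundaries inside [lower, upper]."""
    hits = [i for i, b in enumerate(boundaries) if lower <= b <= upper]
    if not hits:
        # The non-intersecting case is outside the scope of this sketch.
        raise ValueError("no intersecting boundaries")
    counts = [0] * len(boundaries)
    share, remainder = divmod(count, len(hits))
    for position, index in enumerate(hits):
        # Every intersecting boundary gets `share`; the first `remainder`
        # boundaries get one extra (an assumption about remainder order).
        counts[index] = share + (1 if position < remainder else 0)
    return counts


print(distribute_uniform(10, 10, 20, [5, 10, 15, 20, 25]))  # [0, 4, 3, 3, 0]
```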
<br>

- __random__ - This approach distributes the datapoints for each bucket randomly across the intersecting __ExplicitBounds__. This approach works in a similar manner to the uniform distribution algorithm with the main difference being that points are distributed randomly instead of uniformly. This works as follows:
- If there are valid intersecting boundaries, calculate the proportion of the count that should be allocated to each boundary based on the overlap of the boundary with the provided range (lower to upper).
- For each boundary, a random fraction of the calculated proportion is allocated.
- Any remaining count (_due to rounding or random distribution_) is then distributed randomly among the intersecting boundaries.
- If the bucket range does not intersect with any boundaries, the entire count is assigned to the start boundary.
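
The four rules above could look roughly like the Python sketch below. Several details are assumptions made for illustration rather than the processor's actual behaviour: the name `distribute_random`, treating each boundary as the upper edge of the interval that starts at the previous boundary, interpreting the "start boundary" as the first boundary greater than or equal to the lower bound, and the optional `rng` parameter used to make runs repeatable. The boundaries list is assumed to be non-empty.

```python
import math
import random
from typing import List, Optional


def distribute_random(count: int, lower: float, upper: float,
                      boundaries: List[float],
                      rng: Optional[random.Random] = None) -> List[int]:
    """Randomly spread `count` across the boundaries inside [lower, upper]."""
    rng = rng or random.Random()
    counts = [0] * len(boundaries)
    hits = [i for i, b in enumerate(boundaries) if lower <= b <= upper]
    if not hits:
        # No intersection: give everything to the assumed "start" boundary.
        start = next((i for i, b in enumerate(boundaries) if b >= lower),
                     len(boundaries) - 1)
        counts[start] = count
        return counts
    width = upper - lower
    allocated = 0
    for i in hits:
        interval_lower = boundaries[i - 1] if i > 0 else -math.inf
        overlap = min(boundaries[i], upper) - max(interval_lower, lower)
        if overlap > 0 and width > 0:
            # A random fraction of this boundary's proportional share.
            portion = int(count * (overlap / width) * rng.random())
            counts[i] += portion
            allocated += portion
    # Hand out whatever is left, one unit at a time, at random.
    for _ in range(count - allocated):
        counts[rng.choice(hits)] += 1
    return counts


result = distribute_random(10, 10, 20, [5, 10, 15, 20, 25], random.Random(42))
print(result, sum(result))
```

Whatever the random draws, the total count is preserved and only the intersecting boundaries receive datapoints.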
<br>

- `ExplicitBounds` represents the list of bucket boundaries for the new histogram. This argument is __required__ and __cannot be empty__.

__WARNINGS:__