Skip to content
This repository was archived by the owner on Jun 26, 2024. It is now read-only.

chore: convert 'null' string array value to list with 'null' value #186

Merged
merged 9 commits into from
Dec 7, 2023

Conversation

saxenakshitiz
Copy link
Contributor

No description provided.

@saxenakshitiz saxenakshitiz requested a review from a team as a code owner December 6, 2023 12:15
Copy link

github-actions bot commented Dec 6, 2023

Test Results

364 tests  +3   364 ✔️ +3   8s ⏱️ ±0s
  66 suites ±0       0 💤 ±0 
  66 files   ±0       0 ±0 

Results for commit a17bd2b. ± Comparison against base commit 9f2aa9d.

This pull request removes 29 and adds 17 tests. Note that renamed tests count towards both.

      long: 60
      string: "PT1M"
      valueType: LONG
      valueType: STRING
    alias: "numCalls"
    columnName: "SERVICE.numCalls"
    value {
    }
  columnIdentifier {
…
org.hypertrace.gateway.service.baseline.BaselineServiceImplTest ‑ [1] function: AVGRATE
arguments {
  columnIdentifier {
    columnName: "SERVICE.numCalls"
    alias: "numCalls"
  }
}
arguments {
  literal {
    value {
      valueType: STRING
      string: "PT1M"
    }
  }
}
alias: "numCalls"

org.hypertrace.gateway.service.baseline.BaselineServiceImplTest ‑ [2] function: AVGRATE
arguments {
  columnIdentifier {
    columnName: "SERVICE.numCalls"
    alias: "numCalls"
  }
}
arguments {
  literal {
    value {
      valueType: LONG
      long: 60
    }
  }
}
alias: "numCalls"

org.hypertrace.gateway.service.explore.ExploreServiceTest ‑ [10] group-by-with-offset
org.hypertrace.gateway.service.explore.ExploreServiceTest ‑ [11] simple-aggregations
org.hypertrace.gateway.service.explore.ExploreServiceTest ‑ [12] group-by-with-group-limit-and-rest
org.hypertrace.gateway.service.explore.ExploreServiceTest ‑ [13] grouped-time-aggregation-with-group-limit-with-null-value-group
org.hypertrace.gateway.service.explore.ExploreServiceTest ‑ [14] time-aggregations
org.hypertrace.gateway.service.explore.ExploreServiceTest ‑ [15] aggregations-with-multiple-groupbys-and-the-rest-deprecated
org.hypertrace.gateway.service.explore.ExploreServiceTest ‑ [1] grouped-time-aggregation-with-group-limit-with-null-and-nonnull-value-group
org.hypertrace.gateway.service.explore.ExploreServiceTest ‑ [2] time-aggregations-space-filtered
…

♻️ This comment has been updated with latest results.

Copy link

codecov bot commented Dec 6, 2023

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (9f2aa9d) 81.48% compared to head (a17bd2b) 81.47%.

Files Patch % Lines
...ay/service/explore/TheRestGroupRequestHandler.java 60.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #186      +/-   ##
============================================
- Coverage     81.48%   81.47%   -0.01%     
- Complexity     1329     1334       +5     
============================================
  Files           125      125              
  Lines          5963     5971       +8     
  Branches        490      490              
============================================
+ Hits           4859     4865       +6     
- Misses          863      864       +1     
- Partials        241      242       +1     
Flag Coverage Δ
unit 81.47% <80.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -92,6 +94,12 @@ private List<String> convertToArray(String jsonString) {
return List.of();
}

// handle special case when "null" string is returned as string array value(default value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline - concerned about this behavior change, as every query that was previously returning an empty array (e.g. give me the labels for this API) will now start containing nulls. Existing behavior seemed more accurate, we just need to make sure we appropriately handle group by case and either return no groups (short circuit) or translate it back if we think we should look for explicitly empty groups as part of a group by query. I'm not sure what the actual correct expectation should be there.

Copy link
Contributor

@kotharironak kotharironak Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you mentioned, the Gateway (GW) is not handling null (or empty) translations, especially for string_array attributes on both sides (input/output). While outputting, it treats null from downstream as an empty list, but when making a query, it does not do so. The GW should handle to its own definition of handling null (or empty) when sending requests to downstream translation as well.

So, would suggest that when converting a query to Query Service (QS), if there is a string_array attribute and its right-hand side (RHS) value is empty (empty list), it should be translated to null before sending query to QS.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@saxenakshitiz Sorry, was in a hurry yesterday when suggesting to output null in the toArrayConvert method. This makes sense - link to the GitHub discussion.

aaron-steinfeld
aaron-steinfeld previously approved these changes Dec 6, 2023
// Handle special case when "null" string is returned as string array value(default value
// scenario). This will be converted to empty list in value converter. In such a case use
// list with "null" value
if (value.getStringArrayList().isEmpty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so one behavior discrepancy this would lead to to - let's say we have 10 rows, and 9 of them have no values for a multivalued col X and 1 of them does. Both before and after this change, if we grouped by X we'd get a single returned grouped with a count of 1. Now, let's say we filter for where X = null. Before this change, we'd error out (the in [] case we're trying to fix), after we'd return a new group of [] -> 9. So this is problematic for two different reasons

  1. Filtering is actually introducing rather than removing results
  2. There's a group that's only discoverable when no other values are present.

WDYT about just always adding null here (if not already present)?

Copy link
Contributor

@kotharironak kotharironak Dec 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so one behavior discrepancy this would lead to to - let's say we have 10 rows, and 9 of them have no values for a multivalued col X and 1 of them does. Both before and after this change, if we grouped by X we'd get a single returned grouped with a count of 1.

From the Query Service (QS) point of view, it will return two groups. Let's assume that 1 row which is not empty has value - ["hello"]. The output will be as below.
Returned groups:

{
  {"null", 9},
  {"hello", 1}
}

And if the none empty row of X has more than one value, say - ["hello", "world"], QS will return 3 groups:

{
  {"null", 9},
  {"hello", 1},
  {"world", 1}
}

Moving on to Pinot, if the column (or multi-value column) has no value, it will return the default value. The default value for a String column is null. For a multi-value column, if not preset, it uses the default of a single-value column of the same type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The described behavior makes sense, the first issue to resolve however is how GW should behave because it's not consistent. If we do a group by with time series, it won't return that null (or empty list); if we do it without, it returns an empty list as one value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that both should return empty list

@aaron-steinfeld aaron-steinfeld dismissed their stale review December 6, 2023 22:57

inadvertent approval

Copy link
Contributor

@kotharironak kotharironak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@@ -0,0 +1,45 @@
{
"row": [{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whats the query look like for this for gateway?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! okey, it will be an internal query. Or would be IN query from GraphQL to gateway too. That is currently not expressible, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if a gateway client needs to explicitly ask for a null group, it should create a filter with the right-hand side (RHS) containing only the valueType. However, it can't query the null group along with other groups explicitly in the IN clause, right? This is same refer in the comment by @aaron-steinfeld here - #186 (comment)

WDYT about just always adding null here (if not already present)?

This will be opposite to the current case - where we don't want the null group; then we will always return it. I think both seem the same to me. So, I would say we can stick to the current behavior here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have all queries in the test

  1. GraphQL query to GW
  2. GW query to QS and its response
  3. GW response to GraphQL

@@ -127,4 +128,17 @@ public static Expression createTimeColumnGroupByExpression(
.addArguments(createStringLiteralExpression(periodSecs + ":SECONDS")))
.build();
}

public static List<String> convertStringArrayValue(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we adjust this naming and maybe put it with the conversion in the other direction (is that in this file?) - it's usage and implementation make sense, but the naming still feels very unintuitive. Not required if this introduces and meaningful refactoring.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not refactoring. Added a link to method which handles conversion in other direction.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants