-
Notifications
You must be signed in to change notification settings - Fork 8
chore: convert 'null' string array value to list with 'null' value #186
Conversation
Test Results364 tests +3 364 ✔️ +3 8s ⏱️ ±0s Results for commit a17bd2b. ± Comparison against base commit 9f2aa9d. This pull request removes 29 and adds 17 tests. Note that renamed tests count towards both.
♻️ This comment has been updated with latest results. |
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #186 +/- ##
============================================
- Coverage 81.48% 81.47% -0.01%
- Complexity 1329 1334 +5
============================================
Files 125 125
Lines 5963 5971 +8
Branches 490 490
============================================
+ Hits 4859 4865 +6
- Misses 863 864 +1
- Partials 241 242 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
@@ -92,6 +94,12 @@ private List<String> convertToArray(String jsonString) { | |||
return List.of(); | |||
} | |||
|
|||
// handle special case when "null" string is returned as string array value(default value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline - concerned about this behavior change, as every query that was previously returning an empty array (e.g. give me the labels for this API) will now start containing nulls. Existing behavior seemed more accurate, we just need to make sure we appropriately handle group by case and either return no groups (short circuit) or translate it back if we think we should look for explicitly empty groups as part of a group by query. I'm not sure what the actual correct expectation should be there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you mentioned, the Gateway (GW) is not handling null (or empty) translations, especially for string_array attributes on both sides (input/output). While outputting, it treats null
from downstream as an empty list, but when making a query, it does not do so. The GW should handle to its own definition of handling null (or empty) when sending requests to downstream translation as well.
So, would suggest that when converting a query to Query Service (QS), if there is a string_array attribute and its right-hand side (RHS) value is empty (empty list), it should be translated to null
before sending query to QS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@saxenakshitiz Sorry, was in a hurry yesterday when suggesting to output null
in the toArrayConvert method. This makes sense - link to the GitHub discussion.
// Handle special case when "null" string is returned as string array value(default value | ||
// scenario). This will be converted to empty list in value converter. In such a case use | ||
// list with "null" value | ||
if (value.getStringArrayList().isEmpty()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so one behavior discrepancy this would lead to to - let's say we have 10 rows, and 9 of them have no values for a multivalued col X and 1 of them does. Both before and after this change, if we grouped by X we'd get a single returned grouped with a count of 1. Now, let's say we filter for where X = null. Before this change, we'd error out (the in [] case we're trying to fix), after we'd return a new group of [] -> 9. So this is problematic for two different reasons
- Filtering is actually introducing rather than removing results
- There's a group that's only discoverable when no other values are present.
WDYT about just always adding null here (if not already present)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so one behavior discrepancy this would lead to to - let's say we have 10 rows, and 9 of them have no values for a multivalued col X and 1 of them does. Both before and after this change, if we grouped by X we'd get a single returned grouped with a count of 1.
From the Query Service (QS) point of view, it will return two groups. Let's assume that 1 row which is not empty has value - ["hello"]. The output will be as below.
Returned groups:
{
{"null", 9},
{"hello", 1}
}
And if the none empty row of X has more than one value, say - ["hello", "world"], QS will return 3 groups:
{
{"null", 9},
{"hello", 1},
{"world", 1}
}
Moving on to Pinot, if the column (or multi-value column) has no value, it will return the default value. The default value for a String column is null
. For a multi-value column, if not preset, it uses the default of a single-value column of the same type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The described behavior makes sense, the first issue to resolve however is how GW should behave because it's not consistent. If we do a group by with time series, it won't return that null (or empty list); if we do it without, it returns an empty list as one value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed that both should return empty list
...n/java/org/hypertrace/gateway/service/explore/TimeAggregationsWithGroupByRequestHandler.java
Outdated
Show resolved
Hide resolved
...ce-impl/src/main/java/org/hypertrace/gateway/service/explore/TheRestGroupRequestHandler.java
Show resolved
Hide resolved
...ce-impl/src/main/java/org/hypertrace/gateway/service/explore/TheRestGroupRequestHandler.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@@ -0,0 +1,45 @@ | |||
{ | |||
"row": [{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whats the query look like for this for gateway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh! okey, it will be an internal query. Or would be IN query from GraphQL to gateway too. That is currently not expressible, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, if a gateway client needs to explicitly ask for a null
group, it should create a filter with the right-hand side (RHS) containing only the valueType. However, it can't query the null
group along with other groups explicitly in the IN
clause, right? This is same refer in the comment by @aaron-steinfeld here - #186 (comment)
WDYT about just always adding null here (if not already present)?
This will be opposite to the current case - where we don't want the null
group; then we will always return it. I think both seem the same to me. So, I would say we can stick to the current behavior here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have all queries in the test
- GraphQL query to GW
- GW query to QS and its response
- GW response to GraphQL
...ce-impl/src/main/java/org/hypertrace/gateway/service/explore/TheRestGroupRequestHandler.java
Show resolved
Hide resolved
@@ -127,4 +128,17 @@ public static Expression createTimeColumnGroupByExpression( | |||
.addArguments(createStringLiteralExpression(periodSecs + ":SECONDS"))) | |||
.build(); | |||
} | |||
|
|||
public static List<String> convertStringArrayValue( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we adjust this naming and maybe put it with the conversion in the other direction (is that in this file?) - it's usage and implementation make sense, but the naming still feels very unintuitive. Not required if this introduces and meaningful refactoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not refactoring. Added a link to method which handles conversion in other direction.
No description provided.