-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Description
Hi.
While fixing our integration with PinotDb I have found what seem to be some bugs in the query compilation or execution process.
For context, we (OpenMetadata) have a stack of metrics we usually compute for our users' tables as part of our data profiling feature. I reckon non-OpenMetadata users might find themselves writing similar queries for their data observability needs.
I'll detail them separately in the order I have discovered them:
Comparing with floats in some nested functions throws a NumberFormatException
One of those metrics is actually a histogram, which for all other DBs we compute with a query roughly like this:
select
sum(case when (age < 35.0) then 1 else 0 end) as age_group_1,
sum(case when (age >= 35.0 and age < 45.0) then 1 else 0 end) age_group_2,
sum(case when (age >= 45.0) then 1 else 0 end) age_group_3
from data_source;Which returns the following error:
Error Code: 200
QueryExecutionError:
Received error query execution result block: {200=QueryExecutionError:
org.apache.pinot.spi.exception.BadQueryRequestException: java.lang.NumberFormatException: For input string: "35.0"
at org.apache.pinot.core.operator.filter.predicate.PredicateEvaluatorProvider.getPredicateEvaluator(PredicateEvaluatorProvider.java:94)
at org.apache.pinot.core.operator.filter.predicate.PredicateEvaluatorProvider.getPredicateEvaluator(PredicateEvaluatorProvider.java:100)
at org.apache.pinot.core.plan.FilterPlanNode.constructPhysicalOperator(FilterPlanNode.java:310)
at org.apache.pinot.core.plan.FilterPlanNode.run(FilterPlanNode.java:93)
...
Caused by: java.lang.NumberFormatException: For input string: "35.0"
at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
at java.base/java.lang.Integer.parseInt(Integer.java:668)
at java.base/java.lang.Integer.parseInt(Integer.java:786)
at org.apache.pinot.segment.local.segment.index.readers.IntDictionary.insertionIndexOf(IntDictionary.java:44)}
org.apache.pinot.query.service.dispatch.QueryDispatcher.runReducer(QueryDispatcher.java:306)
org.apache.pinot.query.service.dispatch.QueryDispatcher.submitAndReduce(QueryDispatcher.java:96)
org.apache.pinot.broker.requesthandler.MultiStageBrokerRequestHandler.handleRequest(MultiStageBrokerRequestHandler.java:219)
org.apache.pinot.broker.requesthandler.BaseBrokerRequestHandler.handleRequest(BaseBrokerRequestHandler.java:133)
This is solved when changing the boundaries to integers.
I initially thought it was an issue with comparing different types but, on top of discovering that it also happens on float<->float comparisons, I discovered another issue:
Nested casting can also break query execution
I attempted to cast the column to float as a fix (before realizing float<->float comparisons also fail), and got the following:
select
sum(case when (cast(age as float) < 35.0) then 1 else 0 end) as age_group_1,
sum(case when (cast(age as float) >= 35.0 and cast(age as float) < 45.0) then 1 else 0 end) age_group_2,
sum(case when (cast(age as float) >= 45.0) then 1 else 0 end) age_group_3
from data_source;ProcessingException(errorCode:150, message:SQLParsingError:
java.lang.Exception: Unable to find table for this query
at org.apache.pinot.controller.api.resources.PinotQueryResource.getMultiStageQueryResponse(PinotQueryResource.java:222)
at org.apache.pinot.controller.api.resources.PinotQueryResource.executeSqlQuery(PinotQueryResource.java:179)
at org.apache.pinot.controller.api.resources.PinotQueryResource.handlePostSql(PinotQueryResource.java:127)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
...
Caused by: java.lang.RuntimeException: Error composing query plan for: select
sum(case when (cast(age as float) < 35.0) then 1 else 0 end) as age_group_1,
sum(case when (cast(age as float) >= 35.0 and cast(age as float) < 45.0) then 1 else 0 end) age_group_2,
sum(case when (cast(age as float) >= 45.0) then 1 else 0 end) age_group_3
from data_source;
...
Caused by: java.lang.IllegalArgumentException: Table does not exist: 'data_source'
at org.apache.pinot.shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:145)
at org.apache.pinot.query.catalog.PinotCatalog.getTable(PinotCatalog.java:78)
at org.apache.calcite.jdbc.SimpleCalciteSchema.getImplicitTable(SimpleCalciteSchema.java:128)
at org.apache.calcite.jdbc.CalciteSchema.getTable(CalciteSchema.java:283))
Finally, I discovered the source of these errors: SUM breaks the whole thing
If I changed the query to:
select
case when (cast(age as float) < 35.0) then 1 else 0 end as age_group_1,
case when (cast(age as float) >= 35.0 and cast(age as float) < 45.0) then 1 else 0 end age_group_2,
case when (cast(age as float) >= 45.0) then 1 else 0 end age_group_3
from data_source;And got 1's and 0's where I expected. So there's must be something fishy going about SUM.
For the record I am using PinotDB 1.2.0 in a simple docker compose deployment.