-
Notifications
You must be signed in to change notification settings - Fork 181
Description
Query Information
PPL Command/Query:
source=product_reviews
| stats COUNT() AS RatingCount, AVG(rating) AS Avg_ComputedRating by product_id
| where RatingCount >= 100
| join type=left left=RATING right=PRODUCT ON RATING.product_id = PRODUCT.product_id
[
source=product_info | fields product_id, average_rating, title
]
| eval DiffRating=Avg_ComputedRating - PRODUCT.average_rating
| fields RATING.product_id, Avg_ComputedRating, PRODUCT.average_rating, DiffRating, PRODUCT.title
Expected Result:
The query should execute successfully using the PPL 3.3.0+ extended syntax with type=left option, as documented in the join command documentation.
Actual Result:
{
"error": {
"reason": "Invalid Query",
"details": "Join type is ambiguous, remove either the join type before JOIN keyword or 'type=' option.",
"type": "SemanticCheckException"
},
"status": 400
}The error message incorrectly suggests that both syntaxes are present (join type before JOIN keyword AND type= option), when only the type= option is used.
Dataset Information
Dataset/Schema Type
- OpenTelemetry (OTEL)
- Simple Schema for Observability (SS4O)
- Open Cybersecurity Schema Framework (OCSF)
- Custom (details below)
Index Mapping
product_reviews:
{
"mappings": {
"properties": {
"product_id": { "type": "text" },
"rating": { "type": "long" }
}
}
}product_info:
{
"mappings": {
"properties": {
"product_id": { "type": "text" },
"average_rating": { "type": "float" },
"title": { "type": "text" }
}
}
}Sample Data
product_reviews (105 documents with product_id="P001" and ratings 3-5):
{"product_id": "P001", "rating": 4}product_info:
{"product_id": "P001", "average_rating": 4.2, "title": "Sample Product"}Bug Description
Issue Summary:
PPL parser incorrectly rejects the new PPL 3.3.0+ join syntax join type=left ... ON ... with a misleading error about ambiguous join type, even though only the type= option is specified and no join type appears before the JOIN keyword.
Steps to Reproduce:
- Create two indices with sample data (product_reviews and product_info)
- Execute a PPL query using the new syntax:
join type=left left=RATING right=PRODUCT ON ... - Observe the SemanticCheckException error
Impact:
- Users cannot use the documented PPL 3.3.0+ extended syntax with
type=option when using ON/WHERE join criteria - The error message is misleading, suggesting both syntaxes are present when only one is used
- Forces users to use the older syntax (e.g.,
left joininstead ofjoin type=left)
Tentative Root Cause:
Based on preliminary analysis, the issue appears to be in /ppl/src/main/java/org/opensearch/sql/ppl/parser/AstBuilder.java at lines 214-232:
public UnresolvedPlan visitJoinCommand(OpenSearchPPLParser.JoinCommandContext ctx) {
boolean sqlLike = ctx.joinCriteria() != null;
Join.JoinType joinType = null;
if (sqlLike) {
joinType = ArgumentFactory.getJoinType(ctx.sqlLikeJoinType()); // Line 219
}
// ...
if (argumentMap.get("type") != null) {
Join.JoinType joinTypeFromArgument = ArgumentFactory.getJoinType(argumentMap);
if (sqlLike && joinType != joinTypeFromArgument) { // Line 226
throw new SemanticCheckException(
"Join type is ambiguous, remove either the join type before JOIN keyword or 'type='"
+ " option.");
}
joinType = joinTypeFromArgument;
}
}The problem occurs because:
- When
ONclause is present,sqlLikeis set totrue ArgumentFactory.getJoinType(ctx.sqlLikeJoinType())is called withnullcontext- In
/ppl/src/main/java/org/opensearch/sql/ppl/utils/ArgumentFactory.java, when context is null, it defaults toJoin.JoinType.INNER - The comparison at line 226 compares this default
INNERwith the explicitLEFTfromtype=left, finds they differ, and throws the ambiguity error
The logic incorrectly treats the absence of a join type before JOIN as an implicit INNER join, then compares it against the explicit type= option.
Tentative Proposed Fix:
This is a preliminary analysis and requires further validation
The fix should distinguish between an explicitly specified join type and a null/absent join type. One approach:
In AstBuilder.java, modify the logic to only check for ambiguity when sqlLikeJoinType context is explicitly provided (not null):
if (sqlLike) {
if (ctx.sqlLikeJoinType() != null) {
joinType = ArgumentFactory.getJoinType(ctx.sqlLikeJoinType());
}
}
// ...
if (argumentMap.get("type") != null) {
Join.JoinType joinTypeFromArgument = ArgumentFactory.getJoinType(argumentMap);
if (sqlLike && joinType != null && joinType != joinTypeFromArgument) {
throw new SemanticCheckException(
"Join type is ambiguous, remove either the join type before JOIN keyword or 'type='"
+ " option.");
}
joinType = joinTypeFromArgument;
}
if (joinType == null) {
joinType = Join.JoinType.INNER; // Apply default only if no type was specified
}Workaround:
Use the older PPL 3.0.0 syntax with join type before the JOIN keyword
Environment Information
OpenSearch Version:
OpenSearch 3.3.0-SNAPSHOT (build hash: e6382333112715aee285cb81097a6ee51fb6802f, build date: 2025-10-07T01:45:05.033269Z)
Additional Details:
- Tested on local environment
- SQL plugin version: bundled with OpenSearch 3.3.0-SNAPSHOT
- The issue affects the PPL parser/semantic checker, not the execution engine
Additional Notes
Documentation Clarity Issue:
The documentation at /docs/user/ppl/cmd/join.rst uses type=outer in examples, which is ambiguous. The term "outer" doesn't clearly indicate whether it's a left outer join or a right outer join. According to the code, OUTER is treated as an alias for LEFT, but this should be explicitly documented. Consider using more explicit type names, such as type=left or type=right, in documentation examples to avoid confusion.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status