Add configurable system limitations for subsearch and join command
#4501
Signed-off-by: Lantao Jin <ltjin@amazon.com>
dd4332e to 7bee225
opensearch/src/main/java/org/opensearch/sql/opensearch/executor/OpenSearchExecutionEngine.java
children.forEach(c -> analyze(c, context));
// add join.subsearch_maxout limit to subsearch side
if (context.sysLimit.joinSubsearchLimit() >= 0) {
  replaceTop(
nit: is it possible to avoid accessing the private method?
My 2 cents: add a frame in CalcitePlanContext, where a frame is the boundary of a subsearch, and define the limit on the frame. When visiting a subsearch, append LogicalSystemLimit to the subsearch on each frame.
> nit, is it possible to avoid access private method?
I don't think so.
When visiting the subsearch side (the right side of a join, for example), the right plan is pushed onto the stack:
public RelNode analyze(UnresolvedPlan unresolved, CalcitePlanContext context) {
return unresolved.accept(this, context);
}
RelBuilder.pop() is private too, so we have no way to replace it.
Here is my previous attempt for join:
public RelNode visitJoin(Join node, CalcitePlanContext context) {
// visit the main side
analyze(node.getLeft(), context);
if (context.sysLimit.joinSubsearchLimit() >= 0) {
// add join.subsearch_maxout limit to subsearch side
RelNode withLimit = context.relBuilder.with(
analyze(node.getRight(), context),
r -> LogicalSystemLimit.create(
SystemLimitType.JOIN_SUBSEARCH_MAXOUT,
r.peek(),
r.literal(context.sysLimit.joinSubsearchLimit())));
context.relBuilder.push(withLimit); // push the new subsearch plan
} else {
// visit the subsearch side
analyze(node.getRight(), context);
}
The code uses relBuilder.with(), but its first argument, analyze(node.getRight(), context), already pushes the subsearch onto the stack, and with() then pushes it a second time.
/** Evaluates an expression with a relational expression temporarily on the
* stack. */
public <E> E with(RelNode r, Function<RelBuilder, E> fn) {
try {
push(r);
return fn.apply(this);
} finally {
stack.pop();
}
}
- push left plan by `analyze(node.getLeft(), context)`, stack size is 1
- push right plan by the first parameter of `with(analyze(node.getRight(), context))`, stack size is 2
- push duplicated right plan by `push` in `with`, stack size is 3
- pop duplicated right plan by `pop` in `with`, stack size is 2
- push new right plan by `context.relBuilder.push(withLimit)`, stack size is 3 (incorrect)
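The stack trace above can be reproduced with a small toy model. This is only a sketch: `WithDoublePushDemo` is a hypothetical stand-in that uses strings instead of RelNode and is not Calcite's RelBuilder. It assumes, as described in this thread, that analyze() pushes its result as a side effect and that with() pushes and pops around the function call.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Function;

/** Toy model of a builder stack (hypothetical, not Calcite's RelBuilder). */
class WithDoublePushDemo {
    private final Deque<String> stack = new ArrayDeque<>();

    void push(String plan) {
        stack.push(plan);
    }

    /** Same shape as with(): push, apply the function, pop in finally. */
    <E> E with(String plan, Function<WithDoublePushDemo, E> fn) {
        try {
            push(plan);
            return fn.apply(this);
        } finally {
            stack.pop();
        }
    }

    /** Stand-in for analyze(): pushes the plan as a side effect and returns it. */
    String analyze(String plan) {
        push(plan);
        return plan;
    }

    int size() {
        return stack.size();
    }

    /** Replays the attempted visitJoin flow and returns the final stack size. */
    static int runAttempt() {
        WithDoublePushDemo b = new WithDoublePushDemo();
        b.analyze("left");                          // stack size 1
        String withLimit = b.with(
            b.analyze("right"),                     // stack size 2 (side effect)
            r -> "limit(" + r.stack.peek() + ")");  // size 3 inside with(), 2 after
        b.push(withLimit);                          // stack size 3, a join expects 2
        return b.size();
    }

    public static void main(String[] args) {
        System.out.println("stack size after attempt: " + runAttempt());
    }
}
```

The extra entry is the original right plan that analyze() pushed before with() ran; nothing in the attempted flow ever pops it.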
Would relBuilder.build() + relBuilder.push(newTop) work? relBuilder.build() pops the top of the stack and is public.
Synced offline. We still cannot use relBuilder.build() + relBuilder.push(newTop), since that would empty the fields of the Frame.
private void replaceTop(RelNode node) {
final Frame frame = stack.pop();
stack.push(new Frame(node, frame.fields)); // <--- frame.fields will be kept all the time
}
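To make the difference concrete, here is a minimal sketch with a toy frame stack. Everything in it is hypothetical: `ReplaceTopDemo`, its `Frame`, and the string field names are simplified stand-ins, and the lost field metadata is modeled as an empty list purely for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Toy model only: Frame and its fields are simplified stand-ins, not Calcite types. */
class ReplaceTopDemo {
    /** A stack entry: a plan plus the field metadata attached to it. */
    static final class Frame {
        final String plan;
        final List<String> fields;

        Frame(String plan, List<String> fields) {
            this.plan = plan;
            this.fields = fields;
        }
    }

    private final Deque<Frame> stack = new ArrayDeque<>();

    void push(String plan, List<String> fields) {
        stack.push(new Frame(plan, fields));
    }

    /** Models a plain push(newTop): the new frame does not inherit earlier field metadata. */
    void pushFresh(String plan) {
        stack.push(new Frame(plan, List.of()));
    }

    /** Models build(): pops the frame and returns only the plan. */
    String build() {
        return stack.pop().plan;
    }

    /** Models replaceTop(): swaps the plan but carries the old frame's fields over. */
    void replaceTop(String plan) {
        Frame frame = stack.pop();
        stack.push(new Frame(plan, frame.fields));
    }

    List<String> topFields() {
        return stack.peek().fields;
    }

    static List<String> viaBuildAndPush() {
        ReplaceTopDemo d = new ReplaceTopDemo();
        d.push("right", List.of("r.id", "r.name"));
        String top = d.build();
        d.pushFresh("limit(" + top + ")");
        return d.topFields(); // field metadata is gone
    }

    static List<String> viaReplaceTop() {
        ReplaceTopDemo d = new ReplaceTopDemo();
        d.push("right", List.of("r.id", "r.name"));
        d.replaceTop("limit(right)");
        return d.topFields(); // field metadata is preserved
    }

    public static void main(String[] args) {
        System.out.println("build + push: " + viaBuildAndPush());
        System.out.println("replaceTop:   " + viaReplaceTop());
    }
}
```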
How does a SQL join get translated to a RelNode? Does it use the private method?
Description
-----------
The size configures the maximum of rows to return from subsearch. The default value is: ``10000``. A value of ``-1`` indicates that the restriction is unlimited.
What if it is set to 0? Will the join/subquery be optimized by Calcite?
For now, I followed the standard database behaviour:
select * from t_outer where exists (select 1 from t_inner where t_outer.id = t_inner.id limit 0);
select * from t_outer where id in (select id from t_inner limit 0);
select * from t_outer where id = (select id from t_inner limit 0);
select * from t_outer where id = (select count(*) from t_inner limit 0);
All of the above queries return empty results in SQL (PostgreSQL).
The implementation is here https://github.com/opensearch-project/sql/pull/4501/files#diff-e5198d773af75bf3173ef25676a2803a0091cb51e32d6ae30241273519d30261R601-R605
What are your thoughts on treating both 0 and negative values as unlimited? @penghuo
Discussed offline: 0 and -1 mean unlimited.
> discussed offline, 0 and -1 means unlimited.
Sure, let me update the code and doc so that 0 means unlimited, with minValue=0 in Settings.
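The semantics agreed on here could look roughly like the sketch below. It is illustrative only: `SubsearchLimitDemo`, `isUnlimited`, and `applyLimit` are hypothetical names, and the sketch treats every value of 0 or below as unlimited, which is an assumption based on this thread rather than the merged code.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical illustration of the "0 or negative means unlimited" rule. */
class SubsearchLimitDemo {
    /** Assumption: any value <= 0 disables the limit. */
    static boolean isUnlimited(int maxout) {
        return maxout <= 0;
    }

    /** Returns at most maxout rows, or all rows when the limit is disabled. */
    static <T> List<T> applyLimit(List<T> rows, int maxout) {
        if (isUnlimited(maxout)) {
            return rows;
        }
        return rows.stream().limit(maxout).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> rows = List.of(1, 2, 3);
        System.out.println(applyLimit(rows, 0));  // limit disabled
        System.out.println(applyLimit(rows, 2));  // first two rows only
    }
}
```

With minValue=0 enforced in Settings, negative values would be rejected before reaching code like this; the `<= 0` check simply keeps the helper safe if one slips through.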
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
@Override
public RelNode visit(RelNode other) {
  RelNode newInput =
      other.getInputs().isEmpty() ? null : other.getInput(0).accept(this);
[question] Could there be a join or union inside a subsearch? In those cases the operator has more than one input. If so, the current code will construct an incorrect plan.
Fixed in the latest commit. For BiRel or SetOp, just return.
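The "just return" guard can be sketched on a toy plan tree. This is not the PR's visitor: `SingleInputRewriteDemo` and its `Node` are hypothetical, and wrapping the leaf in a "limit" node is only a placeholder for whatever rewrite the real code performs. The point is that only the first input is followed, so a node with more than one input must be returned unchanged.

```java
import java.util.List;

/** Toy plan tree (hypothetical), showing a rewrite that bails out on multi-input nodes. */
class SingleInputRewriteDemo {
    static final class Node {
        final String op;
        final List<Node> inputs;

        Node(String op, List<Node> inputs) {
            this.op = op;
            this.inputs = inputs;
        }

        @Override
        public String toString() {
            return inputs.isEmpty() ? op : op + inputs;
        }
    }

    /** Placeholder rewrite: wraps the leaf of a single-input chain in a "limit" node. */
    static Node rewrite(Node node) {
        if (node.inputs.size() > 1) {
            return node; // join/union-like node: following only input 0 would drop the rest
        }
        if (node.inputs.isEmpty()) {
            return new Node("limit", List.of(node));
        }
        return new Node(node.op, List.of(rewrite(node.inputs.get(0))));
    }

    static String rewriteSingleChain() {
        Node scan = new Node("scan", List.of());
        return rewrite(new Node("filter", List.of(scan))).toString();
    }

    static boolean joinIsUntouched() {
        Node join = new Node("join",
            List.of(new Node("a", List.of()), new Node("b", List.of())));
        return rewrite(join) == join;
    }

    public static void main(String[] args) {
        System.out.println(rewriteSingleChain());
        System.out.println("join untouched: " + joinIsUntouched());
    }
}
```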
});
planVisitor.replaceTop(context.relBuilder, replacement);
}
if (subqueryExpression instanceof InSubquery) {
SQL supports correlated conditions for IN subqueries and scalar subqueries, so Calcite should support them as well.
e.g.
SELECT * FROM EMPLOYEE WHERE location in (select location from DEPART where EMPLOYEE.dept = DEPART.name) limit 1
If there is a correlated condition in an IN or scalar subsearch, should we do a similar operation as above?
Added the same logic for the correlated in-subquery. For a correlated scalar subquery, an aggregation is always performed in the subquery, so sysLimit is not necessary.
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteExplainIT.java
opensearch/src/main/java/org/opensearch/sql/opensearch/setting/OpenSearchSettings.java
Signed-off-by: Lantao Jin <ltjin@amazon.com>
To backport manually, run these commands in your terminal:
# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.19-dev 2.19-dev
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.19-dev
# Create a new branch
git switch --create backport/backport-4501-to-2.19-dev
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 fddbb705a6aeae138915e2174d5d7ea3ccbd3e9e
# Push it to GitHub
git push --set-upstream origin backport/backport-4501-to-2.19-dev
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.19-dev
Then, create a pull request where the
…opensearch-project#4501)
* Add configurable sytem limitations for subsearch and join command
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* Fix IT
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* typo
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* fix IT
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* remove rollback in doc
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* address comments
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* fix typo
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* Fix IT
Signed-off-by: Lantao Jin <ltjin@amazon.com>
---------
Signed-off-by: Lantao Jin <ltjin@amazon.com>
(cherry picked from commit fddbb70)
Signed-off-by: Lantao Jin <ltjin@amazon.com>
…` and `join` command (#4501) (#4535)
* Add configurable sytem limitations for `subsearch` and `join` command (#4501)
* Add configurable sytem limitations for subsearch and join command
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* Fix IT
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* typo
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* fix IT
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* remove rollback in doc
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* address comments
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* fix typo
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* Fix IT
Signed-off-by: Lantao Jin <ltjin@amazon.com>
---------
Signed-off-by: Lantao Jin <ltjin@amazon.com>
(cherry picked from commit fddbb70)
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* migrate java 21 to 11
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* Fix conflicts
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* Fix IT
Signed-off-by: Lantao Jin <ltjin@amazon.com>
---------
Signed-off-by: Lantao Jin <ltjin@amazon.com>
commit cba8d02
Author: Tomoyuki MORITA <moritato@amazon.com>
Date: Wed Oct 15 13:08:05 2025 -0700
Add MAP_APPEND internal function to Calcite PPL (opensearch-project#4515)
* Add MAP_APPEND internal function to Calcite PPL
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
* Minor fix
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
* Address comment
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
* Rebase and fix IT issue
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
---------
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

commit 3388dc7
Author: Lantao Jin <ltjin@amazon.com>
Date: Thu Oct 16 01:45:29 2025 +0800
Use `_doc` + `_shard_doc` as sort tiebreaker to get better performance (opensearch-project#4569)
* Use _shard_doc as sort tiebreaker
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* _doc as a part of tie-breaker have better performance
Signed-off-by: Lantao Jin <ltjin@amazon.com>
---------
Signed-off-by: Lantao Jin <ltjin@amazon.com>

commit 5630119
Author: qianheng <qianheng@amazon.com>
Date: Wed Oct 15 16:40:41 2025 +0800
Fix sort push down into agg after project already pushed (opensearch-project#4546)
* Fix sort push down into agg
Signed-off-by: Heng Qian <qianheng@amazon.com>
* Change some json files to yaml format
Signed-off-by: Heng Qian <qianheng@amazon.com>
---------
Signed-off-by: Heng Qian <qianheng@amazon.com>

commit 1e62fba
Author: Tomoyuki MORITA <moritato@amazon.com>
Date: Tue Oct 14 17:20:38 2025 -0700
Fix JsonExtractAllFunctionIT failure (opensearch-project#4556)
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

commit 02ee33e
Author: Kai Huang <105710027+ahkcs@users.noreply.github.com>
Date: Tue Oct 14 14:28:53 2025 -0700
Add more examples to the `where` command doc (opensearch-project#4457)
Co-authored-by: Manasvini B S <manasvis@amazon.com>

commit 0b7e86c
Author: Jialiang Liang <jiallian@amazon.com>
Date: Tue Oct 14 10:46:01 2025 -0700
[Enhancement] Error handling for illegal character usage in java regex named capture group (opensearch-project#4434)
Co-authored-by: Simeon Widdis <sawiddis@amazon.com>

commit 9c97cfb
Author: Tomoyuki MORITA <moritato@amazon.com>
Date: Tue Oct 14 08:36:43 2025 -0700
Add JSON_EXTRACT_ALL internal function for Calcite PPL (opensearch-project#4489)
* Add JSON_EXTRACT_ALL internal function for Calcite PPL
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
* Address comments
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
* Minor fix
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
---------
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

commit 89dbc31
Author: Lantao Jin <ltjin@amazon.com>
Date: Tue Oct 14 18:24:52 2025 +0800
Check server status before starting Prometheus (opensearch-project#4537)
* Check server status before starting Prometheus
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* Change to func call
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* Fix doc
Signed-off-by: Lantao Jin <ltjin@amazon.com>
---------
Signed-off-by: Lantao Jin <ltjin@amazon.com>

commit fe62472
Author: Lantao Jin <ltjin@amazon.com>
Date: Tue Oct 14 18:10:27 2025 +0800
Update request builder after pushdown sort into agg buckets (opensearch-project#4541)
Signed-off-by: Lantao Jin <ltjin@amazon.com>

commit 42a415f
Author: qianheng <qianheng@amazon.com>
Date: Tue Oct 14 17:42:45 2025 +0800
Including metadata fields type when doing agg/filter script push down (opensearch-project#4522)
* Including metadata fields type when doing agg/filter script push down
Signed-off-by: Heng Qian <qianheng@amazon.com>
* Fix IT
Signed-off-by: Heng Qian <qianheng@amazon.com>
---------
Signed-off-by: Heng Qian <qianheng@amazon.com>

commit 8de0386
Author: Xinyuan Lu <xinyual@amazon.com>
Date: Tue Oct 14 16:41:08 2025 +0800
Fix percentile bug (opensearch-project#4539)
* fix percentile bug
Signed-off-by: xinyual <xinyual@amazon.com>
* add IT
Signed-off-by: xinyual <xinyual@amazon.com>
* optimize it
Signed-off-by: xinyual <xinyual@amazon.com>
---------
Signed-off-by: xinyual <xinyual@amazon.com>

commit de2fdc8
Author: Lantao Jin <ltjin@amazon.com>
Date: Tue Oct 14 12:29:03 2025 +0800
[FollowUp] Set 0 and negative value of subsearch.maxout as unlimited (opensearch-project#4534)
* [FollowUp] Set 0 and negative value of subsearch.maxout as unlimited
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* fix doctest
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* Fix conflicts
Signed-off-by: Lantao Jin <ltjin@amazon.com>
---------
Signed-off-by: Lantao Jin <ltjin@amazon.com>

commit 977b7ab
Author: Simeon Widdis <sawiddis@gmail.com>
Date: Mon Oct 13 20:23:10 2025 -0700
Update stalled action (opensearch-project#4485)

commit fddbb70
Author: Lantao Jin <ltjin@amazon.com>
Date: Tue Oct 14 10:23:12 2025 +0800
Add configurable sytem limitations for `subsearch` and `join` command (opensearch-project#4501)
* Add configurable sytem limitations for subsearch and join command
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* Fix IT
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* typo
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* fix IT
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* remove rollback in doc
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* address comments
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* fix typo
Signed-off-by: Lantao Jin <ltjin@amazon.com>
* Fix IT
Signed-off-by: Lantao Jin <ltjin@amazon.com>
---------
Signed-off-by: Lantao Jin <ltjin@amazon.com>

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Description
Add two configurable limitations for PPL.
- maxout in [subsearch], ref)
- subsearch_maxout in [join], ref)
Related Issues
Resolves #3731 and #4430
Check List
--signoff or -s.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.