Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] OpenSearch SQL/PPL multiple indices support #3099

Open
penghuo opened this issue Oct 23, 2024 · 2 comments
Open

[FEATURE] OpenSearch SQL/PPL multiple indices support #3099

penghuo opened this issue Oct 23, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@penghuo
Copy link
Collaborator

penghuo commented Oct 23, 2024

Is your feature request related to a problem?
Currently, OpenSearch SQL/PPL support query multiple indices. for instance

  • case-1, source=acc* | stats count() / select count() from acc*
  • case-2, source=accounts, account2 | stats count() / select count() from accounts,account2
  • case-3, ```source=accounts,account2 | stats count(); / select count from `accounts,account2````

The OpenSearch _search API supports comma-separated lists of indices, wildcard (*) expressions, and table relations in the source and from clauses, following the same conventions. This applies to both OpenSearch SQL and PPL. However, the case-2 grammar contradicts the SQL comma join syntax.

Related issue, #51
Early discussion, opensearch-project/opensearch-spark#775

What solution would you like?
I propose a nameing conveing to map opensearch index to table relation.
You can use a comma-separated list indices, remote index as table relation, for instance index1,remote:test2,test3. Index name could also include wildcard () and question mark (?) to target index that match a pattern: index?, ind.
In SQL and PPL, if a table name or relation includes special characters (e.g., ','), it should be enclosed in backticks (`).

tableRelation := indexNameExpression [,indexNameExpression]
indexNameExpression := [-][cluster:]:[localIndexNameExpression]
localIndexNameExpression := [*?\w\d]
@penghuo penghuo added enhancement New feature or request untriaged and removed untriaged labels Oct 23, 2024
@penghuo penghuo removed the untriaged label Oct 23, 2024
@LantaoJin
Copy link
Member

LantaoJin commented Oct 24, 2024

Copy some context in comment to here:

PPL on OpenSearch supports:

  1. source=accounts, account2
  2. source=`accounts`,`account2`
  3. source=`account*`
  4. source=`accounts, account2`

But PPL on Spark supports the first two currently.
I think the third one (wildcard) should be supported in PPL on Spark too. But I would suggest to mark the fourth one as invalid since users will treat the content in backticks as a whole as usual. We have some examples from real, users from CloudWatch provided a table pattern:

`_CWLBasic`.default.`Canary/service.log`

And quote their requests:

So if query is select a.b.Service.Log where service.log is table name I am sure sql will also split everything by dot and consider only Log as table name while preceding stuff as namespace. However any entity enclosed in backticks sql considers as a single entity. So if query is select a.b.’Service.log’ where table name is enclosed in backticks sql will only consider a and b as name space and ignore everything which is within backticks

So I think sql considers everything enclosed in backticks as a single entity. `Canary/service.log` should be one whole.

My suggestion is treating content in backticks as a whole. For instance:

  • ✅ source=`catalog`.`namespace`.`index-2024*`, `catalog`.`namespace`.`index-2023-12`
  • ✅ source=`catalog`.`namespace`.index-2024*, index-2023-12
  • ❌ source=`catalog`.`namespace`.`index-2024*, index-2023-12`

@YANG-DB
Copy link
Member

YANG-DB commented Oct 24, 2024

@penghuo I agree with @LantaoJin - we should support content in backticks as a whole

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants