@ahkcs ahkcs (Contributor) commented Aug 22, 2025

Description

Enhanced the DISTINCT_COUNT/DC function documentation in eventstats.rst with detailed information about the Cardinality Aggregation.

Also added fields to the test to keep the output consistent across different Java versions.

Also updated bin.rst to make it clear that the command is available since 3.3.

@ahkcs ahkcs changed the title Add distinct_count doc for eventstats Enhance distinct_count doc for eventstats Aug 24, 2025
@ahkcs ahkcs changed the title Enhance distinct_count doc for eventstats Doc enhancement for distinct_count for eventstats and bin command Aug 25, 2025
@ahkcs ahkcs changed the title Doc enhancement for distinct_count for eventstats and bin command Doc/test enhancement for eventstats and bin command Aug 25, 2025
@ahkcs ahkcs force-pushed the feat/eventstats_dc_doc branch from 98c0710 to 2e8b383 Compare August 29, 2025 19:10
@ahkcs ahkcs marked this pull request as ready for review August 29, 2025 19:52
@ahkcs ahkcs changed the title Doc/test enhancement for eventstats and bin command Doc for eventstats and bin command Sep 3, 2025
@ahkcs ahkcs changed the title Doc for eventstats and bin command Doc enhancement for eventstats and bin command Sep 3, 2025
@Swiddis Swiddis added the documentation Improvements or additions to documentation label Sep 3, 2025
@ahkcs ahkcs force-pushed the feat/eventstats_dc_doc branch from f6f45b8 to e5643ed Compare September 3, 2025 17:58
Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs force-pushed the feat/eventstats_dc_doc branch from e5643ed to cbce5ef Compare September 3, 2025 19:25
Swiddis previously approved these changes Sep 3, 2025
dai-chen previously approved these changes Sep 3, 2025
Usage: DISTINCT_COUNT(expr), DC(expr). Returns the approximate number of distinct values of expr using HyperLogLog++ algorithm. Both ``DISTINCT_COUNT`` and ``DC`` are equivalent and provide the same functionality.
Usage: DISTINCT_COUNT(expr), DC(expr). Returns the approximate number of distinct values using the HyperLogLog++ algorithm. Both functions are equivalent.

**Algorithm & Accuracy:**
Collaborator

maybe better link to OpenSearch doc?

Contributor Author

Added link

Collaborator

@dai-chen dai-chen Sep 4, 2025

Sorry, I meant probably we don't need to copy these 2 sections from OS-core doc?

Signed-off-by: Kai Huang <ahkcs@amazon.com>
@ahkcs ahkcs dismissed stale reviews from dai-chen and Swiddis via ccbd1cb September 4, 2025 02:36
@Swiddis Swiddis enabled auto-merge (squash) September 4, 2025 16:30
Swiddis previously approved these changes Sep 4, 2025
Signed-off-by: Kai Huang <ahkcs@amazon.com>
auto-merge was automatically disabled September 4, 2025 16:50

Head branch was pushed to by a user without write access

@Swiddis Swiddis merged commit ca4d6c1 into opensearch-project:main Sep 4, 2025
42 of 43 checks passed
joshuali925 pushed a commit that referenced this pull request Sep 16, 2025
* distinct_count doc for eventstats

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* doc enhancement

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* add fields for consistency between different Java versions

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* remove changes

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* add bin to index.rst

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* add link

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* fix

Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <ahkcs@amazon.com>
joshuali925 pushed a commit that referenced this pull request Sep 24, 2025
* Doc enhancement for eventstats and bin command (#4117)

* distinct_count doc for eventstats

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* doc enhancement

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* add fields for consistency between different Java versions

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* remove changes

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* add bin to index.rst

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* add link

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* fix

Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Implement `Append` command with Calcite (#4123)

* Implement Append Command

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Fix spotless check

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Rephrase append.rst

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Support subsearch different index for append command

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Fix some tests and add cross cluster IT

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Not support empty subsearch input for now

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Fix doctest

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Support empty source edge case

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Fix anonymizer tests

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Add missing test cases for nested join or lookup command in appended subsearch

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Fix compile issue

Signed-off-by: Songkan Tang <songkant@amazon.com>

---------

Signed-off-by: Songkan Tang <songkant@amazon.com>

* `Bin` command big5 queries (#4163)

* Bin command big5 queries

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* update IT

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* fix

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* remove tests

Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Kai Huang <105710027+ahkcs@users.noreply.github.com>

* Don't recreate indices on every test (#4222)

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Enable pushdown optimization for filtered aggregation (#4213)

* Enable filtered aggregation pushdown

Signed-off-by: Chen Dai <daichen@amazon.com>

* Add basic UT and ignore IT for now

Signed-off-by: Chen Dai <daichen@amazon.com>

* Enable aggregate case to filter rule and fix UT and IT

Signed-off-by: Chen Dai <daichen@amazon.com>

* Add expected json file for no pushdown test

Signed-off-by: Chen Dai <daichen@amazon.com>

* Remove unnecessary aggregate case to filter rule

Signed-off-by: Chen Dai <daichen@amazon.com>

* Add UT for IS_TRUE support

Signed-off-by: Chen Dai <daichen@amazon.com>

* Add more explain IT

Signed-off-by: Chen Dai <daichen@amazon.com>

* Refactor UT

Signed-off-by: Chen Dai <daichen@amazon.com>

* Extract aggregate filter analyzer abstraction

Signed-off-by: Chen Dai <daichen@amazon.com>

* Add more UT

Signed-off-by: Chen Dai <daichen@amazon.com>

* Refactor UT with fluent API

Signed-off-by: Chen Dai <daichen@amazon.com>

* Add UT for distinct count

Signed-off-by: Chen Dai <daichen@amazon.com>

* Address comment by adding UT for script filter pushdown

Signed-off-by: Chen Dai <daichen@amazon.com>

* Fix spotless

Signed-off-by: Chen Dai <daichen@amazon.com>

---------

Signed-off-by: Chen Dai <daichen@amazon.com>

* Split up our test actions into unit, integ, and doctest. (#4193)

* Run unit test suites in parallel

Signed-off-by: Simeon Widdis <sawiddis@gmail.com>

* Split out our test actions

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Make unit test step run in parallel

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Fix removed bwc tests

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Add another missing parallel flag

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

---------

Signed-off-by: Simeon Widdis <sawiddis@gmail.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* [Feature] Core Implementation of `rex` Command In PPL (#4109)

* rex - initial implementation

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* stop using utils

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* fix spotless check

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* offset_field - initial implementation

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* max_match - initial implementation

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* sed - initial implementation

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* fix name capture group for extraction

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* add rex rst doc

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* IT - initial setup

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* add an analyzer test for legacy engine

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* Add UT for rex

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* sed - add pushdown for sed and explain IT and IT with fix

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* anonymizer - add rex for anonymizer and test

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* Add cross cluster IT for rex

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* peng - resolve comments for rst doc 0

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* peng - address some comments 1

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* peng - resolve comment in rst doc to add a java doc link

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* kai - modify the bin ast builder test

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* peng - fix the extraction behavior without filter even when there is zero match

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* fix rex explain no pushdown

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* change the offset val output format

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* fix rst file

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* peng - SWITCH TO USE CALCITE NATIVE OPERATORS

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* Peng - fix tests after operator change

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* support mode=extract and update doc

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* fix the issue after rebase

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* peng - enforce specifying field in antlr for now

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* relocate rex cmd IT

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* peng - simplify visitFunction

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* peng - add UT for RexExtractMultiFunction

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* peng - add UT RexOffsetFunction

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* fix some tests

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* DECOUPLE SED + OFFSET FIELD

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* Improve error handling for extract

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* add this rex rst into index

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* fix return type in extract multi

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* add rex doc into doc test

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* fix doc test

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* Fix linting

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* fix rebase issue

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* fix regex anonymizer tests

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* fix analyzer test and setup to use util function

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* lint fix

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* fix doc test

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* Add max match limit implementation

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* fix anonymizer test

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* peng - simplify if

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* peng - make extract multi to only handle the case of max_match > 1

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

---------

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* Add wildcard support for rename command (#4019)

* add wildcard support for rename

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix calcite wildcard support and add tests

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix formatting

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* add check to analyzer

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* update doc formatting

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* remove v2 engine wildcard support

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* update doc

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix formatting

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* support cascading rename

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* update formatting

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* add cross cluster test

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* add test for cascading rename

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix formatting

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* add test for cascading rename

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* change behavior for renaming existing fields

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* add tests and update docs

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* update docs

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* update docs

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix renaming to same name

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix behavior for consecutive wildcards/address comments

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* add back import

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix doc

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix doc

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix formatting

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

---------

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Signed-off-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com>

* Add support for `median(<value>)` (#4234)

* First revision

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>

* Fixing documentation

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>

* Removing unnecessary comments

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>

* Fixing stats.rst documentation

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>

* Fixing documentation

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>

* Addressing comments

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>

---------

Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Signed-off-by: Aaron Alvarez <900908alvarezaaron@gmail.com>
Co-authored-by: Aaron Alvarez <aaarone@amazon.com>

* Dynamic source selector (#4116)

Signed-off-by: Vamsi Manohar <reddyvam@amazon.com>

* Add gitignore (#4258)

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Support join field list and join options (#3803)

* Support join field list and join options

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Add SPL-compatible syntax setting

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Revert SPL settings

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix IT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix IT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Support max=n option

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* support max=n in sql-like join syntax

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Add Explain IT for new join syntax

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Refactor the user doc

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix conflicts

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix conflicts

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Disable the collapse pushdown

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* refactor

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix IT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Support first/last aggregate functions for PPL (#4223)

* Support first/last aggregation functions for PPL

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Support null

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* remove legacy

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* update doc

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* fix doctest

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* fix stats.rst file

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* fixes

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* move pushdown logic to AggregateAnalyzer

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* fix IT and update null handling

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* add test cases for null handling

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* handle parallelism

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Simplify CalciteExplainIT and add UT for AggregateAnalyzer

Signed-off-by: Kai Huang <ahkcs@amazon.com>

# Conflicts:
#	opensearch/src/test/java/org/opensearch/sql/opensearch/request/AggregateAnalyzerTest.java

* fixes

Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Fix gitignore to ignore symbolic link (#4263)

add comment

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Push down limit operator into aggregation bucket size (#4228)

* Push down limit operator into aggregation bucket size

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix IT

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix robust issue in OpenSearchLimitIndexScanRule

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Refine comments

Signed-off-by: Heng Qian <qianheng@amazon.com>

---------

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix the IT issue caused by merging conflict (#4270)

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Print links to test logs after integTest (#4273)

* Print links to test logs after integTest

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* print even when tests failed

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

---------

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* [Feature] Implementation of mode `sed` and `offset_field` in rex PPL command (#4241)

* [Feature] Implementation of mode sed and offset_field in rex PPL command

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* update rex rst doc

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* chen - address comment and merge grammar in parser

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* chen - limit offset field only in extraction mode

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* chen - specify exception type of o_f UDF

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* chen - add exception type of o_f UDF - 2

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* chen - add exception type of o_f UDF - also fix the test

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* chen - alphabetical order of o_f return

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

---------

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* Add earliest/latest aggregate function for eventstats PPL command (#4212)

* Add earliest/latest aggregate function for eventstats command

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* update docs

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Minor refactoring

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix doctest

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Simplify logics

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Revert visitWindowFunction

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Add sort to some examples

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Refactor tests

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix argument validation error (WIP)

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Add argument validation for window functions

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix validation

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix tests

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix tests and refactor

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix test

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix merge issue

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

---------

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Speed up aggregation pushdown for single group-by expression (#3550)

* Speed up aggregation pushdown for single group-by expression

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Add configs nullable_bucket

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix IT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* revert typo

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix conflicts error

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* fix unit tests

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix order

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix UT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix UT in windows

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* fix compile error of conflicts

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Add more ITs after merging push down limit to agg buckets

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* fix IT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* address comments

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Clear sorts in source builder for aggregation pushdown

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Delete the TODO of v2, it's resolved now

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* fix doctest

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Introduce YAML formatter for better testing/debugging (#4274)

* Implement YamlFormatter

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Enable YAML based plan comparison in tests

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix line break issue in Windows

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Minor fix in test case

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix line break issue

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* Fix comment

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

---------

Signed-off-by: Tomoyuki Morita <moritato@amazon.com>

* doctest: Use 1.0 branch instead of main (#4219)

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Fix doctest (#4292)

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Search Command Revamp (#4152)

Signed-off-by: Vamsi Manohar <reddyvam@amazon.com>

* `mvjoin` support in PPL Calcite (#4217)

* mvjoin support in PPL Calcite

Signed-off-by: ps48 <pshenoy36@gmail.com>

* fix texts

Signed-off-by: ps48 <pshenoy36@gmail.com>

* update docs

Signed-off-by: ps48 <pshenoy36@gmail.com>

* update doc examples

Signed-off-by: ps48 <pshenoy36@gmail.com>

* rebase main, update test

Signed-off-by: ps48 <pshenoy36@gmail.com>

* update test with real array fields

Signed-off-by: ps48 <pshenoy36@gmail.com>

* use verifyQueryThrowsException in CalcitePPLFunctionTypeTest

Signed-off-by: ps48 <pshenoy36@gmail.com>

* spotless check fix

Signed-off-by: ps48 <pshenoy36@gmail.com>

* remove string,string registration for mvjoin

Signed-off-by: ps48 <pshenoy36@gmail.com>

* remove string,string test

Signed-off-by: ps48 <pshenoy36@gmail.com>

---------

Signed-off-by: ps48 <pshenoy36@gmail.com>

* strftime function implementation (#4106)

Signed-off-by: Vamsi Manohar <reddyvam@amazon.com>

* Add non-numeric field support for max/min functions (#4281)

* add non-numeric support for max/min

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix mixed field behavior

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* update doc

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* update doc

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* update formatting

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* add tests

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix formatting

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* empty

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix formatting

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* support ip type max/min

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix formatting

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* use tophitsparser

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* remove v2 explain

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* check for numeric fields for native max/min

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* change names

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

* fix type checking

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

---------

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Signed-off-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com>

* Add  `values` stats function with UDAF (#4276)

* Add  stats function

Signed-off-by: ps48 <pshenoy36@gmail.com>

* add settings for max values

Signed-off-by: ps48 <pshenoy36@gmail.com>

* update functiontypetest IT

Signed-off-by: ps48 <pshenoy36@gmail.com>

* update documentation for values settings

Signed-off-by: ps48 <pshenoy36@gmail.com>

* update the rst docs, remove settingsholder

Signed-off-by: ps48 <pshenoy36@gmail.com>

* update AST additions

Signed-off-by: ps48 <pshenoy36@gmail.com>

* updated the IT testValuesFunctionGroupBy

Signed-off-by: ps48 <pshenoy36@gmail.com>

---------

Signed-off-by: ps48 <pshenoy36@gmail.com>

* Support ISO8601-formatted string in PPL (#4246)

* Support parsing ISO 8601 datetime format for timestamp value

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Modify tests for ISO 8601 timestamp input

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Add support of iso 8601 date string to date and time

- add an IT for date time comparison with iso 8601 formatted literal

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

---------

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Push down project operator with non-identity projections into scan (#4279)

* Support project push down after aggregation

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Push down project operator with non-identity projections into scan

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix IT

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Also changing plan from merging main

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix IT

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix 4296

Signed-off-by: Heng Qian <qianheng@amazon.com>

---------

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Add spotless precommit hook + license check (#4306)

* Add spotless precommit hook

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Decouple plugin spotless versions + upgrade spotless

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Enable license headers everywhere

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Remove a redundant comment

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Fix removed additional licenses

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

---------

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Add Ryan as a maintainer (#4257)

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>

* Spotless precommit: apply instead of check (#4320)

* Add merge_group trigger to test workflows (#4216)

* Update grammar files and developer guide (#4301)

* Update grammar files and developer guide

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* fix

Signed-off-by: Kai Huang <ahkcs@amazon.com>

---------

Signed-off-by: Kai Huang <ahkcs@amazon.com>

* Fix geopoint issue in complex data types (#4325)

Signed-off-by: Vamsi Manohar <reddyvam@amazon.com>

* [Doc] Correct the comparison table for rex doc (#4321)

* [Doc] Correct the comparison table for rex doc

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* peng - remove non support feature from comparison table

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

---------

Signed-off-by: Jialiang Liang <jiallian@amazon.com>

* Add splunk to ppl cheat sheet (#3726)

* update with latest ppl commands and function improvement

Signed-off-by: Peng Huo <penghuo@gmail.com>

* Address comments

Signed-off-by: Peng Huo <penghuo@gmail.com>

---------

Signed-off-by: Peng Huo <penghuo@gmail.com>

* Date/Time based Span aggregation should never present a null bucket (#4327)

* Updating coalesce documentation (#4305)

Co-authored-by: Aaron Alvarez <aaarone@amazon.com>

* Support serializing & deserializing UDTs when pushing down scripts (#4245)

* Support serializing & deserializing UDTs

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Update explain ITs

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Push down UDT types as string types for comparison operators

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Separate test cases and add an ignored IT

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Correct the handling of UDT in CalciteScriptEngine by substituting calcite's type factory with OpenSearchTypeFactory

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Fix deserialization for IP

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Remove testExplainPushDownScriptsContainingUDT in v2

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Enable testLimitAfterAggregation in CalcitePPLAggregationIT

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Unit test serialize map and array types

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Fix deeper level deserialization of UDTs

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Add a yaml test for issue 4322

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Add a test case for issue 4340

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* Remove redundant classes

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

---------

Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>

* change Anonymizer to mask PPL (#4352)

* change Anonymizer

Signed-off-by: xinyual <xinyual@amazon.com>

* fix case

Signed-off-by: xinyual <xinyual@amazon.com>

---------

Signed-off-by: xinyual <xinyual@amazon.com>

* [Feature][Enhancement] Enhance patterns command with additional sample_logs output field (#4155)

* Enhance patterns command with additional sample_logs output field

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Reorder agg fields for simple_pattern

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Test fix after previous fix to not drop group by list

Signed-off-by: Songkan Tang <songkant@amazon.com>

---------

Signed-off-by: Songkan Tang <songkant@amazon.com>

* Optimize count aggregation performance by utilizing native doc_count in v3 (#4337)

* Optimize bucket aggregation performance by utilizing native doc_count in v3

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* fix UT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix issue of count(FIELD)

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* fix comments

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Fix typo

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* revert the doc_count pushdown for count(FIELD) by EXPR

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Support pushdown count aggregation in no bucket aggregation to hits.total.value

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* No index found with given index pattern should throw IndexNotFoundException (#4369)

* No index found with given index pattern should throw IndexNotFoundException

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Add UT

Signed-off-by: Lantao Jin <ltjin@amazon.com>

---------

Signed-off-by: Lantao Jin <ltjin@amazon.com>

* Push down stats with bins on time field into auto_date_histogram (#4329)

* Push down stats with bins on time field into auto_date_histogram

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Prevent pushing down multiple group-by with bins in advance.

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Remove useless code

Signed-off-by: Heng Qian <qianheng@amazon.com>

* Fix IT after merging main

Signed-off-by: Heng Qian <qianheng@amazon.com>

---------

Signed-off-by: Heng Qian <qianheng@amazon.com>

---------

Signed-off-by: Kai Huang <ahkcs@amazon.com>
Signed-off-by: Songkan Tang <songkant@amazon.com>
Signed-off-by: Kai Huang <105710027+ahkcs@users.noreply.github.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Chen Dai <daichen@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@gmail.com>
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Signed-off-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com>
Signed-off-by: Aaron Alvarez <aaarone@amazon.com>
Signed-off-by: Aaron Alvarez <900908alvarezaaron@gmail.com>
Signed-off-by: Vamsi Manohar <reddyvam@amazon.com>
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
Signed-off-by: Lantao Jin <ltjin@amazon.com>
Signed-off-by: Heng Qian <qianheng@amazon.com>
Signed-off-by: ps48 <pshenoy36@gmail.com>
Signed-off-by: Yuanchun Shen <yuanchu@amazon.com>
Signed-off-by: Peng Huo <penghuo@gmail.com>
Signed-off-by: xinyual <xinyual@amazon.com>
Co-authored-by: Kai Huang <105710027+ahkcs@users.noreply.github.com>
Co-authored-by: Songkan Tang <songkant@amazon.com>
Co-authored-by: Simeon Widdis <sawiddis@gmail.com>
Co-authored-by: Chen Dai <daichen@amazon.com>
Co-authored-by: Jialiang Liang <jiallian@amazon.com>
Co-authored-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com>
Co-authored-by: Aaron Alvarez <900908alvarezaaron@gmail.com>
Co-authored-by: Aaron Alvarez <aaarone@amazon.com>
Co-authored-by: Vamsi Manohar <reddyvam@amazon.com>
Co-authored-by: Tomoyuki MORITA <moritato@amazon.com>
Co-authored-by: Lantao Jin <ltjin@amazon.com>
Co-authored-by: qianheng <qianheng@amazon.com>
Co-authored-by: Shenoy Pratik <sgguruda@amazon.com>
Co-authored-by: Yuanchun Shen <yuanchu@amazon.com>
Co-authored-by: Peng Huo <penghuo@gmail.com>
Co-authored-by: Xinyuan Lu <xinyual@amazon.com>
Labels

documentation Improvements or additions to documentation

4 participants