Add support for custom date formats and OpenSearch date formats for date fields as part of Lucene query #2762

Merged
merged 1 commit into from
Jul 19, 2024

Conversation

@manasvinibs (Member) commented Jun 18, 2024

Description

This change adds support for OpenSearch date formats and custom date formats in SQL Lucene queries.
Without this change, only a selected list of formats is supported, and values are always reformatted to an ISO local string or an epoch value before being submitted in the OpenSearch DSL query.

  • Refactors the ExprType passed as a parameter to the Lucene QueryBuilder to be an OpenSearchDateType object instead of an ExprCoreType enum.
  • Extracts OpenSearchDateNamedFormatters and OpenSearchDateCustomFormatters from the OpenSearchDateType object, which is set for a field type during IndexMapping field parsing.
  • Passes format information to the ExprValue implementation constructor so that the date string is parsed with the field-specific formatters.
  • Adds unit tests for the different format use cases.
  • Adds the OpenSearch package dependency to the core module to use the OpenSearch-provided DateFormatter and DateFormatters classes instead of Java's built-in DateTimeFormatter.

Follow-up: I'll add integration tests in a separate PR.

Issues Resolved

#2700

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.


/** Constructor of ExprDateValue. */
public ExprDateValue(String date) {
  try {
    this.date = LocalDate.parse(date, DATE_TIME_FORMATTER_VARIABLE_NANOS_OPTIONAL);
    this.datePattern = determineDatePattern(date);
Collaborator

Does the parse execute twice? LocalDate.parse already parses the date string.

Member Author

On line #44 here, yes, we already parse the date string into a LocalDate object. But I need to know the exact pattern that matched the input date string. So inside the determineDatePattern function I use the parse functions (LocalDate/LocalDateTime) to find the matching pattern and return it to set this.datePattern. That pattern is then used inside value() to format the LocalDate object, instead of the default ISO_LOCAL_DATE string.
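A minimal sketch of the idea described in this reply, using only java.time. The class name, the candidate pattern list, and the method bodies are assumptions for illustration; the PR's actual determineDatePattern implementation is not shown in this thread.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Optional;

class DatePatternSketch {
  // Hypothetical candidate patterns; the real list used by the PR is not shown here.
  static final List<String> CANDIDATE_PATTERNS =
      List.of("uuuu-MM-dd", "MM/dd/uuuu", "uuuu/MM/dd");

  /** Returns the first candidate pattern that parses the input, if any. */
  static Optional<String> determineDatePattern(String input) {
    for (String pattern : CANDIDATE_PATTERNS) {
      try {
        LocalDate.parse(input, DateTimeFormatter.ofPattern(pattern));
        return Optional.of(pattern);
      } catch (DateTimeParseException e) {
        // This pattern did not match; try the next one.
      }
    }
    return Optional.empty();
  }

  /** Formats with the matched pattern, falling back to ISO_LOCAL_DATE when none matched. */
  static String value(LocalDate date, Optional<String> pattern) {
    return pattern
        .map(p -> date.format(DateTimeFormatter.ofPattern(p)))
        .orElseGet(() -> date.format(DateTimeFormatter.ISO_LOCAL_DATE));
  }

  public static void main(String[] args) {
    String input = "07/19/2024";
    Optional<String> pattern = determineDatePattern(input);
    LocalDate parsed = LocalDate.parse(input, DateTimeFormatter.ofPattern(pattern.orElseThrow()));
    // The value is rendered in the same shape the caller supplied.
    System.out.println(value(parsed, pattern));
  }
}
```

The second parse pass is the cost the reviewer asks about: the input is parsed once to build the value and again to discover which pattern matched.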

@ykmr1224 (Collaborator) left a comment

Seems you need to reformat the code using ./gradlew spotlessApply.

@manasvinibs manasvinibs force-pushed the Issue-2700 branch 2 times, most recently from a2c9f68 to c49231d Compare July 8, 2024 06:07
@manasvinibs manasvinibs changed the title Fix to support different date formats in the sql query without date casting Add support for custom date format and openSearch date format for date fields as part of Lucene query Jul 8, 2024
@manasvinibs manasvinibs changed the title Add support for custom date format and openSearch date format for date fields as part of Lucene query Add support for custom date formats and OpenSearch date formats for date fields as part of Lucene query Jul 8, 2024
Comment on lines 255 to 264
List<DateFormatter> dateFormatters = this.getAllNamedFormatters();
dateFormatters.addAll(this.getAllCustomFormatters());
ZonedDateTime zonedDateTime = null;

// check if dateFormatters are empty, then set default ones
if (dateFormatters.isEmpty()) {
  dateFormatters = initializeDateFormatters();
}
Collaborator

List<DateFormatter> dateFormatters = this.getAllNamedFormatters();
dateFormatters.addAll(this.getAllCustomFormatters());

Will this code duplicate the formatters from getAllCustomFormatters?

Would using Stream.concat to concatenate the two lists work?
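A sketch of the Stream.concat suggestion, with plain strings standing in for DateFormatter instances. Whether getAllNamedFormatters returns a fresh list or a shared one is not shown in this thread, so the mutation concern below is an assumption: if the returned list were shared, addAll would grow it on every call, while building a new list avoids touching either input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FormatterListSketch {
  /** Returns a new list holding both inputs in order, leaving the inputs untouched. */
  static <T> List<T> combine(List<T> named, List<T> custom) {
    return Stream.concat(named.stream(), custom.stream()).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Strings are placeholders for formatter objects.
    List<String> named = new ArrayList<>(List.of("strict_date_optional_time", "epoch_millis"));
    List<String> custom = List.of("MM/dd/uuuu");

    List<String> all = combine(named, custom);
    System.out.println(all);
    System.out.println(named); // unchanged by combine
  }
}
```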

Collaborator

The formats are fetched from the OpenSearch index mapping. If formats is null, does that mean the OpenSearchDataType is a wrapper of a CoreType? In that case, should we use OPENSEARCH_DEFAULT_FORMATS to parse the data?

Member Author

When format is null, the field can still be an OpenSearchDateType field, because date fields can be defined without a format (in which case the default ones apply). So I have used OPENSEARCH_DEFAULT_FORMATS to format the date object in those cases.
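A sketch of the fallback described here, under stated assumptions: the class and method names are hypothetical, and the default string below is the one OpenSearch documents for date fields with no explicit format, which may or may not match the PR's OPENSEARCH_DEFAULT_FORMATS constant. A mapping's format value can list several formats separated by `||`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DefaultFormatSketch {
  // Documented OpenSearch default for date fields that declare no format.
  static final String DEFAULT_FORMAT = "strict_date_optional_time||epoch_millis";

  /** Splits a mapping's format string on "||", using the default when none is given. */
  static List<String> resolveFormats(String mappingFormat) {
    String effective =
        (mappingFormat == null || mappingFormat.isBlank()) ? DEFAULT_FORMAT : mappingFormat;
    return Arrays.stream(effective.split("\\|\\|"))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    System.out.println(resolveFormats(null));
    System.out.println(resolveFormats("MM/dd/uuuu||epoch_second"));
  }
}
```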

…e fields as part of Lucene query

Github Issue - opensearch-project#2700

Signed-off-by: Manasvini B S <manasvis@amazon.com>
@@ -400,6 +400,48 @@ Querying such index will provide a response with ``schema`` block as shown below
"status": 200
}

If the sql query contains an `IndexDateField` and a literal value with an operator (such as a term query or a range query), then the literal value can be in the `IndexDateField` format.
Collaborator

Term query and range query are OpenSearch concepts; SQL has no function or operator named term or range.

@@ -230,6 +234,9 @@ public String legacyTypeName() {
if (mappingType == null) {
return exprCoreType.typeName();
}
if (mappingType.toString().equalsIgnoreCase("DATE")) {
Collaborator

Change to mappingType == Date?

Member

Should we check mappingType == DateNanos as well?

if (mappingType == Date || mappingType == DateNanos) {
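A small sketch contrasting the two checks discussed above. The enum here is a hypothetical stand-in with only a few constants (Date and DateNanos appear in the test data in this PR; the rest of the real MappingType enum is not shown). Comparing enum constants with == is null-safe and is checked by the compiler, while the string comparison matches only the one name and needs its own null guard.

```java
class MappingTypeSketch {
  // Hypothetical stand-in for the project's MappingType enum.
  enum MappingType { Text, Keyword, Date, DateNanos }

  /** Enum identity check covering both date-like mapping types. */
  static boolean isDateLike(MappingType mappingType) {
    return mappingType == MappingType.Date || mappingType == MappingType.DateNanos;
  }

  /** String-based check as in the diff under review; matches only the name "Date". */
  static boolean isDateByName(MappingType mappingType) {
    return mappingType != null && mappingType.toString().equalsIgnoreCase("DATE");
  }

  public static void main(String[] args) {
    System.out.println(isDateLike(MappingType.DateNanos));
    System.out.println(isDateByName(MappingType.DateNanos));
  }
}
```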

@@ -104,8 +104,8 @@ private static Stream<Arguments> getTestDataWithType() {
Arguments.of(MappingType.ScaledFloat, "scaled_float", DOUBLE),
Arguments.of(MappingType.Double, "double", DOUBLE),
Arguments.of(MappingType.Boolean, "boolean", BOOLEAN),
Arguments.of(MappingType.Date, "date", TIMESTAMP),
Arguments.of(MappingType.DateNanos, "date", TIMESTAMP),
Arguments.of(MappingType.Date, "timestamp", TIMESTAMP),
Collaborator

Please double-check the UT change. Are all these date -> timestamp changes necessary?

Member Author

All these assertions compare directly against legacyTypeName and hence need this update. I think these changes are necessary, and they are limited to the UTs, to avoid any breaking changes from the type change from ExprCoreType to OpenSearchDateType inside the fieldMappings map.

@LantaoJin (Member) Jul 19, 2024

I only find the legacy name change for Date in production code. Why did DateNanos change in the test as well? https://github.com/opensearch-project/sql/pull/2762/files/ea3e205184dda4e8e0d5ba55cd9cf18eac82c30e#r1683753172

Member Author

I think this is because when we create an OpenSearchDataType instance for each field mapping type, we create an OpenSearchDateType instance for both MappingType.Date and MappingType.DateNanos - https://github.com/opensearch-project/sql/blob/main/opensearch/src/main/java/org/opensearch/sql/opensearch/data/type/OpenSearchDataType.java#L165
and set the mapping type to Date even for DateNanos (not sure why it is this way) - https://github.com/opensearch-project/sql/blob/main/opensearch/src/main/java/org/opensearch/sql/opensearch/data/type/OpenSearchDateType.java#L143
So I believe the UT assertion here still looks for mapping type date even for date nanos.

@penghuo (Collaborator) left a comment

Please double-check the UT assert change.

@penghuo penghuo merged commit 0fad56d into opensearch-project:main Jul 19, 2024
14 of 15 checks passed
@opensearch-trigger-bot (Contributor)

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.x
# Create a new branch
git switch --create backport/backport-2762-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 0fad56db4b3e8983e2e7fafcf9fb80e592d97ddb
# Push it to GitHub
git push --set-upstream origin backport/backport-2762-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-2762-to-2.x.

manasvinibs added a commit to manasvinibs/sql that referenced this pull request Jul 22, 2024
…e fields as part of Lucene query (opensearch-project#2762)

Github Issue - opensearch-project#2700

Signed-off-by: Manasvini B S <manasvis@amazon.com>
(cherry picked from commit 0fad56d)
ykmr1224 pushed a commit that referenced this pull request Jul 23, 2024
…e fields as part of Lucene query (#2762) (#2849)

Github Issue - #2700


(cherry picked from commit 0fad56d)

Signed-off-by: Manasvini B S <manasvis@amazon.com>
opensearch-trigger-bot bot pushed a commit that referenced this pull request Jul 23, 2024
…e fields as part of Lucene query (#2762) (#2849)

Github Issue - #2700

(cherry picked from commit 0fad56d)

Signed-off-by: Manasvini B S <manasvis@amazon.com>
(cherry picked from commit 02d57e0)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
ykmr1224 pushed a commit that referenced this pull request Jul 24, 2024
…e fields as part of Lucene query (#2762) (#2849) (#2851)

Github Issue - #2700

(cherry picked from commit 0fad56d)


(cherry picked from commit 02d57e0)

Signed-off-by: Manasvini B S <manasvis@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
manasvinibs added a commit to manasvinibs/sql that referenced this pull request Aug 14, 2024
…e fields as part of Lucene query (opensearch-project#2762)

Github Issue - opensearch-project#2700

Signed-off-by: Manasvini B S <manasvis@amazon.com>
jzonthemtn pushed a commit to jzonthemtn/sql that referenced this pull request Aug 28, 2024
…e fields as part of Lucene query (opensearch-project#2762)

Github Issue - opensearch-project#2700

Signed-off-by: Manasvini B S <manasvis@amazon.com>
Labels
backport 2.x backport-failed v2.16.0 Issues targeting release v2.16.0

4 participants