Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL: Add support for single parameter text manipulating functions #31874

Merged
merged 11 commits into from
Jul 12, 2018

Conversation

astefan
Copy link
Contributor

@astefan astefan commented Jul 6, 2018

Adds support for unary string manipulating functions:

  • ASCII, BIT_LENGTH, CHAR
  • CHAR_LENGTH, LCASE, LENGTH
  • LTRIM, RTRIM, SPACE, UCASE

Part of #31604.

@astefan astefan requested review from costin and nik9000 July 6, 2018 18:17
@astefan astefan added the :Analytics/SQL SQL querying label Jul 6, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

Copy link
Member

@costin costin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some minor comments - otherwise LGTM.

//;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file shouldn't be updated.

@@ -0,0 +1,30 @@
bitLengthGroupByAndOrderBy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nitpick - put all functions into a functions file as oppose to string-functions.

@@ -0,0 +1,80 @@
stringAscii
// tag::ascii
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the tag/end is used in the docs - not sure whether this is the case here so they should be removed (since it's just noise).

SELECT SPACE("languages") s, COUNT(*) count FROM "test_emp" GROUP BY SPACE("languages") ORDER BY SPACE(languages);

spaceGroupByAndOrderByWithCharLength
SELECT SPACE("languages") s, COUNT(*) count, CAST(CHAR_LENGTH(SPACE("languages")) AS INT) cls FROM "test_emp" WHERE "languages" IS NOT NULL GROUP BY SPACE("languages") ORDER BY SPACE("languages");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍


return new String(spaces);
}),
BIT_LENGTH((String s) -> s.getBytes(StandardCharsets.UTF_8).length * 8),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the getBytes should be replaced by UnicodeUtils#calcUTF16toUTF8Length instead.


@Override
protected ScriptTemplate asScriptFrom(FieldAttribute field) {
//TODO change this to use _source instead of the exact form (aka field.keyword for text fields)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an issue raised for this one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we should do that. I mean, we should file an issue, but I think we should be very careful about using _source in script queries.

@@ -57,6 +57,33 @@ public static String camelCaseToUnderscore(String string) {
}
return sb.toString().toUpperCase(Locale.ROOT);
}

//CAMEL_CASE to camelCase
public static String underscoreToLowerCamelCase(String string) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm kinda of the fence with this method - maybe it's worth using a different convention but if we stick with it, have an updated comment (the one above is wrong) and some tests.

assertTrue(p instanceof Filter);
Expression condition = ((Filter) p).condition();
SqlIllegalArgumentException ex = expectThrows(SqlIllegalArgumentException.class, () -> QueryTranslator.toQuery(condition, false));
assertEquals("Scalar functions (LTRIM(keyword)) not allowed (yet) as arguments for LIKE", ex.getMessage());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nitpick, the plural in functions should be optional (it's easy to check the number of arguments and add it only if needed). Also the function is LTRIM not LTRIM(keyword) - namely functionName() not function().
Lastly I would pick a different function to check whether the proper name is returned (say CHAR_LENGTH) without the casing.

@Override
public DataType dataType() {
//TODO investigate if a data type Long (BIGINT) wouldn't be more appropriate here
return DataType.INTEGER;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PostgreSQL returns an int so we're probably safe doing it. They aren't always right, but they usually are.


@Override
protected ScriptTemplate asScriptFrom(FieldAttribute field) {
//TODO change this to use _source instead of the exact form (aka field.keyword for text fields)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we should do that. I mean, we should file an issue, but I think we should be very careful about using _source in script queries.

}

@Override
protected TypeResolution resolveType() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I've made changes to there I've been trying to make sure methods in these were either abstract or final. That sort of forces things to be the same as one another and not "almost the same". Like Char doesn't feel like a real UnaryStringFunction to me because it overrides this method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nik9000 agree with you that it doesn't feel like a true unary string function, judging by the type of the parameter. And it would be an awkward move to override the method checking the parameter type for this reason. But in SQL, CHAR and SPACE (the only two functions that do not receive at least one string as parameter) are considered to be string functions probably because they create text in one form or another, irrespective of the parameters count and type. Do you think it's worth taking these two out of the UnaryStringFunction base class and handle them differently?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, yeah, I'd stop extending that class. Or at least see what that looks like.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nik9000 I've made the changes and introduced a new UnaryStringIntFunction class to handle the Char and Space functions. Let me know what you think.

astefan added 2 commits July 9, 2018 15:53
Changed BIT_LENGTH function way of calculating the bits length.
…rameter a non-text (numeric): classes structure change, functionality remains the same.
@astefan
Copy link
Contributor Author

astefan commented Jul 10, 2018

@costin I've addressed your concerns.

@costin
Copy link
Member

costin commented Jul 11, 2018

Thanks

@astefan astefan merged commit edf83c1 into elastic:master Jul 12, 2018
astefan added a commit that referenced this pull request Jul 12, 2018
…1874)

Added support for ASCII, BIT_LENGTH, CHAR, CHAR_LENGTH, LCASE, LENGTH, LTRIM, RTRIM, SPACE, UCASE functions.
Wherever Painless scripting is necessary (WHERE conditions, ORDER BY etc), those scripts are being used.
@astefan
Copy link
Contributor Author

astefan commented Jul 12, 2018

Pushed to 6.x, as well.

dnhatn added a commit that referenced this pull request Jul 12, 2018
* master:
  [TEST] Mute SlackMessageTests.testTemplateRender
  Docs: Explain closing the high level client
  [ML] Re-enable memory limit integration tests (#31328)
  [test] disable packaging tests for suse boxes
  Add nio transport to security plugin (#31942)
  XContentTests : Insert random fields at random positions (#30867)
  Force execution of fetch tasks (#31974)
  Fix unreachable error condition in AmazonS3Fixture (#32005)
  Tests: Fix SearchFieldsIT.testDocValueFields (#31995)
  Add Expected Reciprocal Rank metric (#31891)
  [ML] Get ForecastRequestStats doc in RestoreModelSnapshotIT (#31973)
  SQL: Add support for single parameter text manipulating functions (#31874)
  [ML] Ensure immutability of MlMetadata (#31957)
  Tests: Mute SearchFieldsIT.testDocValueFields()
  muted tests due to #31940
  Work around reported problem in eclipse (#31960)
  Move build integration tests out of :buildSrc project (#31961)
  Tests: Remove use of joda time in some tests (#31922)
  [Test] Reactive 3rd party tests on CI (#31919)
  SQL: Support for escape sequences (#31884)
  SQL: HAVING clause should accept only aggregates (#31872)
  Docs: fix typo in datehistogram (#31972)
  Switch url repository rest tests to new style requests (#31944)
  Switch reindex tests to new style requests (#31941)
  Docs: Added note about cloud service to installation and getting started
  [DOCS] Removes alternative docker pull example (#31934)
  Add Snapshots Status API to High Level Rest Client (#31515)
  ingest: date_index_name processor template resolution (#31841)
  Test: fix null failure in watcher test (#31968)
  Switch test framework to new style requests (#31939)
  Switch low level rest tests to new style Requests (#31938)
  Switch high level rest tests to new style requests (#31937)
  [ML] Mute test failing due to Java 11 date time format parsing bug (#31899)
  [TEST] Mute SlackMessageTests.testTemplateRender
  Fix assertIngestDocument wrongfully passing (#31913)
  Remove unused reference to filePermissionsCache (#31923)
  rolling upgrade should use a replica to prevent relocations while running a scroll
  HLREST: Bundle the x-pack protocol project (#31904)
  Increase logging level for testStressMaybeFlush
  Added lenient flag for synonym token filter (#31484)
  [X-Pack] Beats centralized management: security role + licensing (#30520)
  HLRest: Move xPackInfo() to xPack().info() (#31905)
  Docs: add security delete role to api call table (#31907)
  [test] port archive distribution packaging tests (#31314)
  Watcher: Slack message empty text (#31596)
  [ML] Mute failing DetectionRulesIT.testCondition() test
  Fix broken NaN check in MovingFunctions#stdDev() (#31888)
  Date: Add DateFormatters class that uses java.time (#31856)
  [ML] Switch native QA tests to a 3 node cluster (#31757)
  Change trappy float comparison (#31889)
  Fix building AD URL from domain name (#31849)
  Add opaque_id to audit logging (#31878)
  re-enable backcompat tests
  add support for is_write_index in put-alias body parsing (#31674)
  Improve release notes script (#31833)
  [DOCS] Fix broken link in painless example
  Handle missing values in painless (#30975)
  Remove the ability to index or query context suggestions without context (#31007)
  Ingest: Enable Templated Fieldnames in Rename (#31690)
  [Docs] Fix typo in the Rollup API Quick Reference (#31855)
  Ingest: Add ignore_missing option to RemoveProc (#31693)
  Add template config for Beat state to X-Pack Monitoring (#31809)
  Watcher: Add ssl.trust email account setting (#31684)
  Remove link to oss-MSI (#31844)
  Painless: Restructure Definition/Whitelist (#31879)
  HLREST: Add x-pack-info API (#31870)
dnhatn added a commit that referenced this pull request Jul 12, 2018
* 6.x:
  Force execution of fetch tasks (#31974)
  [TEST] Mute SlackMessageTests.testTemplateRender
  Docs: Explain closing the high level client
  [test] disable java packaging tests for suse
  XContentTests : Insert random fields at random positions (#30867)
  Add Get Snapshots High Level REST API (#31980)
  Fix unreachable error condition in AmazonS3Fixture (#32005)
  [6.x][ML] Ensure immutability of MlMetadata (#31994)
  Add Expected Reciprocal Rank metric (#31891)
  SQL: Add support for single parameter text manipulating functions (#31874)
  muted tests due to #31940
  Work around reported problem in eclipse (#31960)
  Move build integration tests out of :buildSrc project (#31961)
  [Test] Reactive 3rd party tests on CI (#31919)
  Fix assertIngestDocument wrongfully passing (#31913) (#31951)
  SQL: Support for escape sequences (#31884)
  SQL: HAVING clause should accept only aggregates (#31872)
  Docs: fix typo in datehistogram (#31972)
  Switch url repository rest tests to new style requests (#31944)
  Switch reindex tests to new style requests (#31941)
  Switch test framework to new style requests (#31939)
  Docs: Added note about cloud service to installation and getting started
  [DOCS] Removes alternative docker pull example (#31934)
  ingest: date_index_name processor template resolution (#31841)
  Test: fix null failure in watcher test (#31968)
  Watcher: Slack message empty text (#31596)
  Switch low level rest tests to new style Requests (#31938)
  Switch high level rest tests to new style requests (#31937)
  HLREST: Bundle the x-pack protocol project (#31904)
  [ML] Mute test failing due to Java 11 date time format parsing bug (#31899)
  Increase logging level for testStressMaybeFlush
  rolling upgrade should use a replica to prevent relocations while running a scroll
  [test] port archive distribution packaging tests (#31314)
  HLRest: Move xPackInfo() to xPack().info() (#31905)
  Increase logging level for :qa:rolling-upgrade
  Backport: Add template config for Beat state to X-Pack Monitoring (#31809) (#31893)
  Fix building AD URL from domain name (#31849)
  Fix broken NaN check in MovingFunctions#stdDev() (#31888)
  Change trappy float comparison (#31889)
  Add opaque_id to audit logging (#31878)
  add support for is_write_index in put-alias body parsing (#31674)
  Ingest: Enable Templated Fieldnames in Rename (#31690) (#31896)
  Ingest: Add ignore_missing option to RemoveProc (#31693) (#31892)
  [Docs] Fix typo in the Rollup API Quick Reference (#31855)
  Watcher: Add ssl.trust email account setting (#31684)
  [PkiRealm] Invalidate cache on role mappings change (#31510)
  [Security] Check auth scheme case insensitively (#31490)
  HLREST: Add x-pack-info API (#31870)
  Remove link to oss-MSI (#31844)
  Painless: Restructure Definition/Whitelist (#31879)
@astefan astefan deleted the 31604_unary_string_functions branch September 10, 2018 16:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants