feat(db engines): add support for Opendistro Elasticsearch (AWS ES) #12602

Merged
merged 7 commits into from
Feb 10, 2021

Conversation

dpgaspar
Member

@dpgaspar dpgaspar commented Jan 19, 2021

SUMMARY

Adds support for Open Distro for Elasticsearch. Most of the work was done upstream, in the elasticsearch-dbapi 0.2.0 release.
Check out https://github.com/preset-io/elasticsearch-dbapi/blob/master/README.md for more details.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Changes UI
  • Requires DB Migration.
  • Confirm DB Migration upgrade and downgrade tested.
  • Introduces new feature or API
  • Removes existing feature or API

Member

@villebro villebro left a comment

Awesome! Quick first pass review

Comment on lines 61 to 63
def make_label_compatible(cls, label: str) -> str:
    new_label = super().make_label_compatible(label)
    return new_label.replace(".", "_")
Member

Instead of overriding make_label_compatible, we should define a custom _mutate_label method that does this. See e.g.

@staticmethod
def _mutate_label(label: str) -> str:
    """
    BigQuery field_name should start with a letter or underscore and contain only
    alphanumeric characters. Labels that start with a number are prefixed with an
    underscore. Any unsupported characters are replaced with underscores and an
    md5 hash is added to the end of the label to avoid possible collisions.
    :param label: Expected expression label
    :return: Conditionally mutated label
    """
    label_hashed = "_" + hashlib.md5(label.encode("utf-8")).hexdigest()
    # if label starts with number, add underscore as first character
    label_mutated = "_" + label if re.match(r"^\d", label) else label
    # replace non-alphanumeric characters with underscores
    label_mutated = re.sub(r"[^\w]+", "_", label_mutated)
    if label_mutated != label:
        # add first 5 chars from md5 hash to label to avoid possible collisions
        label_mutated += label_hashed[:6]
    return label_mutated
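
As a rough sketch of how that suggestion could apply to the Elasticsearch spec (this is not code from the PR): the dot replacement moves into a `_mutate_label` hook, and `make_label_compatible` is left to the base class. `BaseSpecSketch` and `ElasticSearchSpecSketch` are hypothetical stand-ins so the snippet runs on its own; the real Superset base class may do more in `make_label_compatible` than shown here.

```python
class BaseSpecSketch:
    """Hypothetical stand-in for the base engine spec, not the real Superset class."""

    @classmethod
    def make_label_compatible(cls, label: str) -> str:
        # Assumption: the base class delegates engine-specific changes to a hook.
        return cls._mutate_label(label)

    @staticmethod
    def _mutate_label(label: str) -> str:
        # Default hook: leave the label untouched.
        return label


class ElasticSearchSpecSketch(BaseSpecSketch):
    """Hypothetical spec that only overrides the hook."""

    @staticmethod
    def _mutate_label(label: str) -> str:
        # Same transformation as the reviewed lines: replace dots with underscores.
        return label.replace(".", "_")
```

With this shape, `ElasticSearchSpecSketch.make_label_compatible("geo.src")` goes through the inherited classmethod and only the hook differs per engine.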

Member Author

Thanks! I guess we should rename _mutate_label to mutate_label. This is still in its infancy: the dbapi PR still needs to be merged and released, and on this side the time grains are different for opendistro.

Member

Agree on renaming (let's have a separate PR for that). I guess we should have a separate spec for elasticsearch and odelasticsearch to properly support the differing time grains?

@codecov-io

codecov-io commented Jan 19, 2021

Codecov Report

Merging #12602 (83df757) into master (bab86ab) will increase coverage by 2.31%.
The diff coverage is 80.00%.

@@            Coverage Diff             @@
##           master   #12602      +/-   ##
==========================================
+ Coverage   65.05%   67.36%   +2.31%     
==========================================
  Files        1021      489     -532     
  Lines       50095    28772   -21323     
  Branches     5141        0    -5141     
==========================================
- Hits        32587    19383   -13204     
+ Misses      17332     9389    -7943     
+ Partials      176        0     -176     
| Flag | Coverage Δ | |
|---|---|---|
| cypress | ? | |
| javascript | ? | |
| python | 67.36% <80.00%> (+3.29%) | ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Impacted Files | Coverage Δ | |
|---|---|---|
| superset/db_engine_specs/elasticsearch.py | 89.74% <80.00%> (-4.71%) | ⬇️ |
| superset/db_engines/hive.py | 0.00% <0.00%> (-85.72%) | ⬇️ |
| superset/db_engine_specs/hive.py | 73.84% <0.00%> (-17.31%) | ⬇️ |
| superset/dataframe.py | 91.66% <0.00%> (-8.34%) | ⬇️ |
| superset/db_engine_specs/clickhouse.py | 87.09% <0.00%> (-7.35%) | ⬇️ |
| superset/db_engine_specs/presto.py | 81.38% <0.00%> (-6.71%) | ⬇️ |
| superset/dashboards/commands/update.py | 83.07% <0.00%> (-5.61%) | ⬇️ |
| superset/utils/decorators.py | 94.44% <0.00%> (-5.56%) | ⬇️ |
| superset/views/database/mixins.py | 80.70% <0.00%> (-1.76%) | ⬇️ |
| ... and 565 more | | |

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bab86ab...83df757. Read the comment docs.

@pull-request-size pull-request-size bot added size/M and removed size/S labels Jan 21, 2021
@syamat

syamat commented Jan 24, 2021

When do you plan to merge this PR? Any timelines?

@dpgaspar
Member Author

@syamat this still needs some work on the dialect side as well (https://github.com/preset-io/elasticsearch-dbapi), but I'm hoping to have this merged in 2 weeks.

@dpgaspar dpgaspar marked this pull request as ready for review February 8, 2021 14:01
@dpgaspar dpgaspar requested a review from villebro February 8, 2021 14:25
Member

@villebro villebro left a comment

One quick question/comment

Comment on lines +71 to +75
@classmethod
def convert_dttm(cls, target_type: str, dttm: datetime) -> Optional[str]:
    if target_type.upper() == utils.TemporalType.DATETIME:
        return f"""'{dttm.isoformat(timespec="seconds")}'"""
    return None
Member

Doesn't ES/OD support other temporal types like DATE or TIMESTAMP?

Member Author

To my understanding, by default Elasticsearch only supports the date field type:

https://opendistro.github.io/for-elasticsearch-docs/docs/sql/datatypes/#date-and-time-types

...
By default, the Elasticsearch DSL uses the date type as the only date-time related type that contains all information of an absolute time point.
...

On the SQL endpoint, we can use date functions that derive other date/time types from it.
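
For illustration only, a standalone sketch of what the reviewed `convert_dttm` lines produce. `convert_dttm_sketch` is a hypothetical free function, and the plain string `"DATETIME"` stands in for `utils.TemporalType.DATETIME`, which is an assumption about that constant's value.

```python
from datetime import datetime
from typing import Optional


def convert_dttm_sketch(target_type: str, dttm: datetime) -> Optional[str]:
    """Return a quoted ISO 8601 literal for DATETIME targets, else None."""
    # Assumption: "DATETIME" mirrors utils.TemporalType.DATETIME in the PR.
    if target_type.upper() == "DATETIME":
        return f"""'{dttm.isoformat(timespec="seconds")}'"""
    # Every other target type is left unconverted.
    return None


moment = datetime(2021, 2, 8, 14, 25, 0)
print(convert_dttm_sketch("datetime", moment))  # '2021-02-08T14:25:00'
print(convert_dttm_sketch("DATE", moment))      # None
```

The type check is case-insensitive, and a target type such as DATE falls through to `None`, which is the behaviour the question above is about.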

Member

Ok thanks for clarifying!

Member

@villebro villebro left a comment

LGTM!

@dpgaspar dpgaspar merged commit b3a814f into apache:master Feb 10, 2021
@dpgaspar dpgaspar deleted the feat/opendistro branch February 10, 2021 08:17
amitmiran137 pushed a commit to simcha90/incubator-superset that referenced this pull request Feb 10, 2021
* master:
  fix: UI toast typo (apache#13026)
  feat(db engines): add support for Opendistro Elasticsearch (AWS ES) (apache#12602)
  fix(build): black failing on master, add to required checks (apache#13039)
  fix: time filter db migration optimization (apache#13015)
  fix: untranslated text content of Dashboard page (apache#13024)
  fix(ci): remove signature requirements for commits to master (apache#13034)
  fix: add alerts and report to default config (apache#12999)
  docs(changelog): add entries for 1.0.1 (apache#12981)
  ci: skip cypress if no code changes (apache#12982)
  chore: add cypress required checks for branch protection (apache#12970)
  Refresh dashboard list after bulk delete (apache#12945)
  Updates storybook to version 6.1.17 (apache#13014)
  feat: Save datapanel state in local storage (apache#12996)
  fix: added text and changed margins (apache#12923)
  chore: Swap Slack Url 2 more places (apache#13004)
amitmiran137 pushed a commit to nielsen-oss/superset that referenced this pull request Feb 14, 2021
…pache#12602)

* feat(db engines): add support for Opendistro Elasticsearch (AWS ES)

* add time grains

* lint

* bump elasticsearch-dbapi version

* add tests

* fix test
@Jimmy-Newtron

Jimmy-Newtron commented Feb 15, 2021

@villebro When do you plan to release a new version with this improvement?

henryyeh pushed a commit to preset-io/superset that referenced this pull request Feb 17, 2021
…pache#12602)

* feat(db engines): add support for Opendistro Elasticsearch (AWS ES)

* add time grains

* lint

* bump elasticsearch-dbapi version

* add tests

* fix test

(cherry picked from commit b3a814f)
@vineetalagh

@dpgaspar Please look at issue 14662. I think the conversion for opendistro engine is not working.

@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.2.0 labels Mar 12, 2024
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels preset-io size/M 🚢 1.2.0
7 participants