feat(db engines): add support for Opendistro Elasticsearch (AWS ES) #12602

Merged 7 commits on Feb 10, 2021
15 changes: 15 additions & 0 deletions superset/db_engine_specs/elasticsearch.py
@@ -46,3 +46,18 @@ def convert_dttm(cls, target_type: str, dttm: datetime) -> Optional[str]:
        if target_type.upper() == utils.TemporalType.DATETIME:
            return f"""CAST('{dttm.isoformat(timespec="seconds")}' AS DATETIME)"""
        return None


class OpenDistro(ElasticSearchEngineSpec):

    _time_grain_expressions = {
        None: "{col}",
    }

    engine = "odelasticsearch"
    engine_name = "ElasticSearch"

    @classmethod
    def make_label_compatible(cls, label: str) -> str:
        new_label = super().make_label_compatible(label)
        return new_label.replace(".", "_")
Member

Instead of overriding make_label_compatible, we should define a custom _mutate_label method that does this. See e.g.

    @staticmethod
    def _mutate_label(label: str) -> str:
        """
        BigQuery field_name should start with a letter or underscore and contain only
        alphanumeric characters. Labels that start with a number are prefixed with an
        underscore. Any unsupported characters are replaced with underscores and an
        md5 hash is added to the end of the label to avoid possible collisions.
        :param label: Expected expression label
        :return: Conditionally mutated label
        """
        label_hashed = "_" + hashlib.md5(label.encode("utf-8")).hexdigest()
        # if label starts with number, add underscore as first character
        label_mutated = "_" + label if re.match(r"^\d", label) else label
        # replace non-alphanumeric characters with underscores
        label_mutated = re.sub(r"[^\w]+", "_", label_mutated)
        if label_mutated != label:
            # add first 5 chars from md5 hash to label to avoid possible collisions
            label_mutated += label_hashed[:6]
        return label_mutated
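
For illustration, here is a rough sketch of what that suggestion might look like for the OpenDistro spec. It is not the code that was merged. `BaseSpecSketch` and `OpenDistroSketch` are hypothetical stand-ins so the snippet runs on its own, and it assumes the base `make_label_compatible` delegates to `_mutate_label`, as the suggestion implies.

```python
class BaseSpecSketch:
    """Hypothetical stand-in for the base engine spec (not Superset's real class)."""

    @staticmethod
    def _mutate_label(label: str) -> str:
        # Default hook: leave the label unchanged.
        return label

    @classmethod
    def make_label_compatible(cls, label: str) -> str:
        # Shared entry point (assumed): delegate engine-specific rewriting to the hook.
        return cls._mutate_label(label)


class OpenDistroSketch(BaseSpecSketch):
    """Hypothetical spec that overrides only the hook, not the entry point."""

    engine = "odelasticsearch"

    @staticmethod
    def _mutate_label(label: str) -> str:
        # Replace dots with underscores, as the override in the diff does.
        return label.replace(".", "_")


print(OpenDistroSketch.make_label_compatible("geo.src"))
```

With this shape, any other logic the base class keeps in `make_label_compatible` would still apply to the mutated label without the subclass calling `super()`.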

Member Author

Thanks! I guess we should rename _mutate_label to mutate_label. This is still in its infancy: the dbapi PR still needs to be merged and released, and on this side the time grains are different for opendistro.

Member

Agree on renaming (let's have a separate PR for that). I guess we should have a separate spec for elasticsearch and odelasticsearch to properly support the differing time grains?