Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework on OpenSearchDataType: parse, store and use mapping information #180

Merged
Merged
Show file tree
Hide file tree
Changes from 21 commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
f335101
First PoC.
Yury-Fridlyand Nov 16, 2022
797d2a8
Replace old `typeMapping` by new one in `OpenSearchExprValueFactory`,…
Yury-Fridlyand Nov 17, 2022
4a76716
Minor fix.
Yury-Fridlyand Nov 17, 2022
e4d89de
Rework on `OpenSearchDataType`.
Yury-Fridlyand Dec 1, 2022
16d7ce0
Further rework on `OpenSearchDataType`:
Yury-Fridlyand Dec 20, 2022
ecf60de
Fix failing unit tests.
Yury-Fridlyand Jan 4, 2023
f699c15
Test fixes and coverage.
Yury-Fridlyand Jan 5, 2023
c538ac7
Fix code style.
Yury-Fridlyand Jan 6, 2023
4a6fb9f
Minor fixes.
Yury-Fridlyand Jan 6, 2023
de08526
Fix for code coverage in `OpenSearchDataType.of(Type)`.
Yury-Fridlyand Jan 6, 2023
53c4fdd
Fix reordering columns.
Yury-Fridlyand Jan 6, 2023
2b9f253
Fix doctests.
Yury-Fridlyand Jan 6, 2023
43a0a98
Typo fix.
Yury-Fridlyand Jan 6, 2023
1f7b151
Rework on 53c4fdd. New fix has code coverage and all test passed on it.
Yury-Fridlyand Jan 6, 2023
90bae1f
Typo fix for previous commit.
Yury-Fridlyand Jan 6, 2023
735e9f7
Revert wrong test changes made in 1f7b1516.
Yury-Fridlyand Jan 6, 2023
46f20ee
Remove `TextKeywordValue`.
Yury-Fridlyand Jan 11, 2023
ae11c22
Update comments.
Yury-Fridlyand Jan 11, 2023
84cbc67
Add changes according to the PR review.
Yury-Fridlyand Jan 14, 2023
902929e
Changes according to the PR feedback.
Yury-Fridlyand Jan 27, 2023
733a262
A fix to avoid breaking changes.
Yury-Fridlyand Jan 27, 2023
d2f4ed5
Move `convertTextToKeyword` from `ScriptUtils` to `OpenSearchTextType…
Yury-Fridlyand Jan 27, 2023
8fa98df
Update UT for `typeof` function.
Yury-Fridlyand Jan 27, 2023
6e96dae
Minor commenting.
Yury-Fridlyand Jan 27, 2023
e32798c
Typo fix for `d2f4ed5e`.
Yury-Fridlyand Jan 28, 2023
a41aa28
Make all instances of `OpenSearchDataType` and of derived types singl…
Yury-Fridlyand Jan 28, 2023
b1ffaa3
Address PR feedback.
Yury-Fridlyand Jan 31, 2023
7f614ac
Fix doc and integration tests.
Yury-Fridlyand Jan 31, 2023
4c7e293
Fix a unit test.
Yury-Fridlyand Jan 31, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ public int compareTo(ExprValue other) {
throw new IllegalStateException(
String.format("[BUG] Unreachable, Comparing with NULL or MISSING is undefined"));
}
if ((this.isNumber() && other.isNumber()) || this.type() == other.type()) {
if ((this.isNumber() && other.isNumber()) || this.type().equals(other.type())) {
return compare(other);
} else {
throw new ExpressionEvaluationException(
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ public Pair<FunctionSignature, FunctionBuilder> resolve(
arguments -> new FunctionExpression(BuiltinFunctionName.TYPEOF.getName(), arguments) {
@Override
public ExprValue valueOf(Environment<Expression, ExprValue> valueEnv) {
return new ExprStringValue(getArguments().get(0).type().toString());
return new ExprStringValue(getArguments().get(0).type().legacyTypeName());
}

@Override
Expand Down
2 changes: 1 addition & 1 deletion docs/user/dql/functions.rst
Original file line number Diff line number Diff line change
Expand Up @@ -3125,5 +3125,5 @@ Example::
+----------------+---------------+-----------------+------------------+
| typeof(date) | typeof(int) | typeof(now()) | typeof(column) |
|----------------+---------------+-----------------+------------------|
| DATE | INTEGER | DATETIME | STRUCT |
| DATE | INTEGER | DATETIME | object |
+----------------+---------------+-----------------+------------------+
2 changes: 1 addition & 1 deletion docs/user/ppl/functions/system.rst
Original file line number Diff line number Diff line change
Expand Up @@ -27,5 +27,5 @@ Example::
+----------------+---------------+-----------------+------------------+
| typeof(date) | typeof(int) | typeof(now()) | typeof(column) |
|----------------+---------------+-----------------+------------------|
| DATE | INTEGER | DATETIME | STRUCT |
| DATE | INTEGER | DATETIME | object |
+----------------+---------------+-----------------+------------------+
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ public void typeof_sql_types() throws IOException {
TEST_INDEX_DATATYPE_NUMERIC));
// TODO: test null in PPL
verifyDataRows(response,
rows("STRING", "DOUBLE", "INTEGER", "LONG", "INTERVAL"));
rows("keyword", "DOUBLE", "INTEGER", "LONG", "INTERVAL"));

response = executeQuery(String.format("source=%s | eval "
+ "`timestamp` = typeof(CAST('1961-04-12 09:07:00' AS TIMESTAMP)),"
Expand Down Expand Up @@ -68,7 +68,7 @@ public void typeof_opensearch_types() throws IOException {
+ " | fields `text`, `date`, `boolean`, `object`, `keyword`, `ip`, `binary`, `geo_point`",
TEST_INDEX_DATATYPE_NONNUMERIC));
verifyDataRows(response,
rows("OPENSEARCH_TEXT", "TIMESTAMP", "BOOLEAN", "STRUCT", "STRING",
"OPENSEARCH_IP", "OPENSEARCH_BINARY", "OPENSEARCH_GEO_POINT"));
rows("text", "TIMESTAMP", "BOOLEAN", "object", "keyword",
"ip", "binary", "geo_point"));
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ public void typeof_sql_types() {
JSONObject response = executeJdbcRequest("SELECT typeof('pewpew'), typeof(NULL), typeof(1.0),"
+ "typeof(12345), typeof(1234567891011), typeof(INTERVAL 2 DAY);");
verifyDataRows(response,
rows("STRING", "UNDEFINED", "DOUBLE", "INTEGER", "LONG", "INTERVAL"));
rows("keyword", "UNDEFINED", "DOUBLE", "INTEGER", "LONG", "INTERVAL"));

response = executeJdbcRequest("SELECT"
+ " typeof(CAST('1961-04-12 09:07:00' AS TIMESTAMP)),"
Expand All @@ -54,7 +54,7 @@ public void typeof_opensearch_types() {
//+ ", typeof(nested_value)"
+ " from %s;", TEST_INDEX_DATATYPE_NONNUMERIC));
verifyDataRows(response,
rows("OPENSEARCH_TEXT", "TIMESTAMP", "BOOLEAN", "STRUCT", "STRING",
"OPENSEARCH_IP", "OPENSEARCH_BINARY", "OPENSEARCH_GEO_POINT"));
rows("text", "TIMESTAMP", "BOOLEAN", "object", "keyword",
"ip", "binary", "geo_point"));
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
/*
* Copyright OpenSearch Contributors
* SPDX-License-Identifier: Apache-2.0
*/


package org.opensearch.sql.opensearch.data.type;

import static org.opensearch.sql.data.type.ExprCoreType.UNKNOWN;

import lombok.EqualsAndHashCode;

/**
* The type of a binary value. See
* <a href="https://opensearch.org/docs/latest/opensearch/supported-field-types/binary/">doc</a>
*/
@EqualsAndHashCode(callSuper = false)
acarbonetto marked this conversation as resolved.
Show resolved Hide resolved
public class OpenSearchBinaryType extends OpenSearchDataType {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the difference of OpenSearchBinaryType and OpenSearchGeoPointType? only class name is different?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mapping type is different (line 21 in both files)


public OpenSearchBinaryType() {
super(MappingType.Binary);
exprCoreType = UNKNOWN;
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -6,82 +6,248 @@

package org.opensearch.sql.opensearch.data.type;

import static org.opensearch.sql.data.type.ExprCoreType.STRING;
import static org.opensearch.sql.data.type.ExprCoreType.UNKNOWN;

import com.google.common.collect.ImmutableMap;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.Map;
import lombok.RequiredArgsConstructor;
import java.util.function.BiConsumer;
import lombok.EqualsAndHashCode;
import lombok.Getter;
import org.opensearch.sql.data.type.ExprCoreType;
import org.opensearch.sql.data.type.ExprType;

/**
* The extension of ExprType in Elasticsearch.
* The extension of ExprType in OpenSearch.
Yury-Fridlyand marked this conversation as resolved.
Show resolved Hide resolved
*/
@RequiredArgsConstructor
public enum OpenSearchDataType implements ExprType {
/**
* OpenSearch Text. Rather than cast text to other types (STRING), leave it alone to prevent
* cast_to_string(OPENSEARCH_TEXT).
* Ref: https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html
*/
OPENSEARCH_TEXT(Collections.singletonList(STRING), "string") {
@Override
public boolean shouldCast(ExprType other) {
return false;
}
},
@EqualsAndHashCode
public class OpenSearchDataType implements ExprType, Serializable {

/**
* OpenSearch multi-fields which has text and keyword.
* Ref: https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html
* The mapping (OpenSearch engine) type.
*/
OPENSEARCH_TEXT_KEYWORD(Arrays.asList(STRING, OPENSEARCH_TEXT), "string") {
@Override
public boolean shouldCast(ExprType other) {
return false;
public enum MappingType {
Invalid(null),
Text("text"),
Keyword("keyword"),
Ip("ip"),
GeoPoint("geo_point"),
Binary("binary"),
Date("date"),
Object("object"),
Nested("nested"),
Byte("byte"),
Short("short"),
Integer("integer"),
Long("long"),
Float("float"),
HalfFloat("half_float"),
ScaledFloat("scaled_float"),
Double("double"),
Boolean("boolean");
// TODO: ranges, geo shape, point, shape

private String name;

MappingType(String name) {
this.name = name;
}
},

public String toString() {
return name;
}
}

OPENSEARCH_IP(Arrays.asList(UNKNOWN), "ip"),
@EqualsAndHashCode.Exclude
protected MappingType mappingType;

OPENSEARCH_GEO_POINT(Arrays.asList(UNKNOWN), "geo_point"),
// resolved ExprCoreType
protected ExprCoreType exprCoreType;

OPENSEARCH_BINARY(Arrays.asList(UNKNOWN), "binary");
/**
* Get a simplified type {@link ExprCoreType} if possible.
* To avoid returning `UNKNOWN` for `OpenSearch*Type`s, e.g. for IP, returns itself.
* @return An {@link ExprType}.
*/
public ExprType getExprType() {
if (exprCoreType != ExprCoreType.UNKNOWN) {
return exprCoreType;
}
return this;
}

/**
* The mapping between Type and legacy JDBC type name.
* A constructor function which builds proper `OpenSearchDataType` for given mapping `Type`.
* @param mappingType A mapping type.
* @return An instance or inheritor of `OpenSearchDataType`.
*/
private static final Map<ExprType, String> LEGACY_TYPE_NAME_MAPPING =
new ImmutableMap.Builder<ExprType, String>()
.put(OPENSEARCH_TEXT, "text")
.put(OPENSEARCH_TEXT_KEYWORD, "text")
.build();
public static OpenSearchDataType of(MappingType mappingType) {
ExprCoreType exprCoreType = ExprCoreType.UNKNOWN;
switch (mappingType) {
// TODO update these 2 below #1038 https://github.com/opensearch-project/sql/issues/1038
case Text: return new OpenSearchTextType();
case Keyword: exprCoreType = ExprCoreType.STRING;
break;
case Byte: exprCoreType = ExprCoreType.BYTE;
break;
case Short: exprCoreType = ExprCoreType.SHORT;
break;
case Integer: exprCoreType = ExprCoreType.INTEGER;
break;
case Long: exprCoreType = ExprCoreType.LONG;
break;
case HalfFloat: exprCoreType = ExprCoreType.FLOAT;
break;
case Float: exprCoreType = ExprCoreType.FLOAT;
break;
case ScaledFloat: exprCoreType = ExprCoreType.DOUBLE;
break;
case Double: exprCoreType = ExprCoreType.DOUBLE;
break;
case Boolean: exprCoreType = ExprCoreType.BOOLEAN;
break;
// TODO: check formats, it could allow TIME or DATE only
case Date: exprCoreType = ExprCoreType.TIMESTAMP;
break;
case Object: exprCoreType = ExprCoreType.STRUCT;
break;
case Nested: exprCoreType = ExprCoreType.ARRAY;
break;
case GeoPoint: return new OpenSearchGeoPointType();
case Binary: return new OpenSearchBinaryType();
case Ip: return new OpenSearchIpType();
default:
throw new IllegalArgumentException(mappingType.toString());
}
var res = new OpenSearchDataType(mappingType);
res.exprCoreType = exprCoreType;
return res;
}

/**
* Parent of current type.
* A constructor function which builds proper `OpenSearchDataType` for given mapping `Type`.
* Designed to be called by the mapping parser only (and tests).
* @param mappingType A mapping type.
* @param properties Properties to set.
* @param fields Fields to set.
* @return An instance or inheritor of `OpenSearchDataType`.
*/
private final List<ExprType> parents;
public static OpenSearchDataType of(MappingType mappingType,
Map<String, OpenSearchDataType> properties,
Map<String, OpenSearchDataType> fields) {
var res = of(mappingType);
res.properties = ImmutableMap.copyOf(properties);
res.fields = ImmutableMap.copyOf(fields);
return res;
}

protected OpenSearchDataType(MappingType mappingType) {
this.mappingType = mappingType;
}

/**
* JDBC type name.
* A constructor function which builds proper `OpenSearchDataType` for given {@link ExprType}.
* @param type A type.
* @return An instance of `OpenSearchDataType`.
*/
private final String jdbcType;
public static OpenSearchDataType of(ExprType type) {
if (type instanceof OpenSearchDataType) {
return (OpenSearchDataType) type;
}
return new OpenSearchDataType((ExprCoreType) type);
}

@Override
public List<ExprType> getParent() {
return parents;
protected OpenSearchDataType(ExprCoreType type) {
this.exprCoreType = type;
}

protected OpenSearchDataType() {
}

// object and nested types have properties - info about embedded types
Yury-Fridlyand marked this conversation as resolved.
Show resolved Hide resolved
// a read-only collection
@Getter
@EqualsAndHashCode.Exclude
Map<String, OpenSearchDataType> properties = ImmutableMap.of();

// text could have fields

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically, this is not limited to text unless that is the intention?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only text could have fields

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now. No need to make assumptions for OpenSearch

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So parser tries to read both always, for any type

// a read-only collection
@EqualsAndHashCode.Exclude
Map<String, OpenSearchDataType> fields = ImmutableMap.of();

@Override
public String typeName() {
return jdbcType;
// To avoid breaking changes return `string` for `typeName` call (PPL) and `text` for
// `legacyTypeName` call (SQL). See more: https://github.com/opensearch-project/sql/issues/1296
if (legacyTypeName().equals("text")) {
return "string";
}
return legacyTypeName();
}

@Override
public String legacyTypeName() {
Yury-Fridlyand marked this conversation as resolved.
Show resolved Hide resolved
return LEGACY_TYPE_NAME_MAPPING.getOrDefault(this, typeName());
if (mappingType == null) {
return exprCoreType.typeName();
}
return mappingType.toString().toLowerCase();
}

/**
* Clone type object without {@link #properties} - without info about nested object types.
* @return A cloned object.
*/
protected OpenSearchDataType cloneEmpty() {
var copy = mappingType != null ? of(mappingType) : new OpenSearchDataType(exprCoreType);
copy.fields = fields; //TODO do we need to clone object?
copy.exprCoreType = exprCoreType;
return copy;
}

/**
* Flattens mapping tree into a single layer list of objects (pairs of name-types actually),
* which don't have nested types.
* See {@link OpenSearchDataTypeTest#traverseAndFlatten() test} for example.
* @param tree A list of `OpenSearchDataType`s - map between field name and its type.
* @return A list of all `OpenSearchDataType`s from given map on the same nesting level (1).
* Nested object names are prefixed by names of their host.
*/
public static Map<String, OpenSearchDataType> traverseAndFlatten(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks similar to OpenSearchExprValueFactory.construct/parse/parseArray -- can they be combined?

Map<String, OpenSearchDataType> tree) {
final Map<String, OpenSearchDataType> result = new LinkedHashMap<>();
BiConsumer<Map<String, OpenSearchDataType>, String> visitLevel = new BiConsumer<>() {
@Override
public void accept(Map<String, OpenSearchDataType> subtree, String prefix) {
for (var entry : subtree.entrySet()) {
String entryKey = entry.getKey();
var nextPrefix = prefix.isEmpty() ? entryKey : String.format("%s.%s", prefix, entryKey);
result.put(nextPrefix, entry.getValue().cloneEmpty());
var nextSubtree = entry.getValue().getProperties();
if (!nextSubtree.isEmpty()) {
accept(nextSubtree, nextPrefix);
}
}
}
};
visitLevel.accept(tree, "");
return result;
}

/**
* Resolve type of identified from parsed mapping tree.
* @param tree Parsed mapping tree (not flattened).
* @param id An identifier.
* @return Resolved OpenSearchDataType or null if not found.
*/
public static OpenSearchDataType resolve(Map<String, OpenSearchDataType> tree, String id) {
for (var item : tree.entrySet()) {
if (item.getKey().equals(id)) {
return item.getValue();
}
OpenSearchDataType result = resolve(item.getValue().getProperties(), id);
if (result != null) {
return result;
}
}
return null;
}
}
Loading