Implement In-Memory Table order_by (#3515)
Implemented the `order_by` function with support for all modes of operation.
Added support for case-insensitive natural ordering.

# Important Notes
- Improved `MultiValueIndex`/`MultiValueKey` to avoid creating large numbers of arrays.
- Simplified the `hashCode` algorithm of `MultiValueKey`.
- Added `Text_Utils.compare_normalized_ignoring_case` to support case-insensitive comparisons.
- Fixed issues in `ObjectComparator` and added unit tests for it.
jdunkerley authored Jun 8, 2022
1 parent c602404 commit 8afba43
Showing 24 changed files with 392 additions and 163 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -134,6 +134,7 @@
- [Added rank data, correlation and covariance statistics for `Vector`][3484]
- [Implemented `Table.order_by` for the SQLite backend.][3502]
- [Implemented `Table.order_by` for the PostgreSQL backend.][3514]
- [Implemented `Table.order_by` for the in-memory table.][3515]
- [Renamed `File_Format.Text` to `Plain_Text`, updated `File_Format.Delimited`
API and added builders for customizing less common settings.][3516]

@@ -212,6 +213,7 @@
[3484]: https://github.com/enso-org/enso/pull/3484
[3502]: https://github.com/enso-org/enso/pull/3502
[3514]: https://github.com/enso-org/enso/pull/3514
[3515]: https://github.com/enso-org/enso/pull/3515
[3516]: https://github.com/enso-org/enso/pull/3516

#### Enso Compiler
@@ -1,4 +1,6 @@
from Standard.Base import Any, Ordering, Nothing, Vector
from Standard.Base import all
import Standard.Base.Data.Ordering.Natural_Order
from Standard.Base.Data.Text.Text_Ordering as Text_Ordering_Module import Text_Ordering

polyglot java import org.enso.base.ObjectComparator

@@ -9,10 +11,26 @@ polyglot java import org.enso.base.ObjectComparator
- custom_comparator:
If `Nothing` will get a singleton instance for `.compare_to`.
Otherwise can support a custom fallback comparator.
new : Nothing | (Any->Any->Ordering)
new : Nothing | (Any->Any->Ordering) -> ObjectComparator
new custom_comparator=Nothing =
comparator_to_java cmp x y = Vector.handle_incomparable_value (cmp x y . to_sign)

case custom_comparator of
Nothing -> ObjectComparator.getInstance (comparator_to_java .compare_to)
_ -> ObjectComparator.new (comparator_to_java custom_comparator)

## ADVANCED
Create a Java Comparator with the specified Text_Ordering

Arguments:
- text_ordering:
Specifies how to compare Text values within the Comparator.
for_text_ordering : Text_Ordering -> ObjectComparator
for_text_ordering text_ordering =
case text_ordering.sort_digits_as_numbers of
True ->
txt_cmp a b = Natural_Order.compare a b text_ordering.case_sensitive . to_sign
here.new.withCustomTextComparator txt_cmp
False -> case text_ordering.case_sensitive of
Case_Insensitive locale -> here.new.withCaseInsensitivity locale.java_locale
_ -> here.new
@@ -18,8 +18,12 @@ polyglot java import com.ibm.icu.text.BreakIterator
Sort a vector of texts according to the natural dictionary ordering.

["a2", "a1", "a100", "a001", "a0001"].sort by=Natural_Order.compare . should_equal ["a0001", "a001", "a1", "a2", "a100"]
compare : Text -> Text -> Ordering
compare text1 text2 =
compare : Text -> Text -> (True|Case_Insensitive) -> Ordering
compare text1 text2 case_sensitive=True =
compare_text = case case_sensitive of
Case_Insensitive locale -> a -> b -> a.compare_to_ignore_case b locale
_ -> _.compare_to _

iter1 = BreakIterator.getCharacterInstance
iter1.setText text1

@@ -79,7 +83,7 @@ compare text1 text2 =
if (tmp.first.not && tmp.second) then Ordering.Greater else
case tmp.first.not of
True ->
text_comparison = substring1.compare_to substring2
text_comparison = compare_text substring1 substring2
if text_comparison != Ordering.Equal then text_comparison else
@Tail_Call order next1 iter1.next next2 iter2.next
False ->
@@ -93,7 +97,7 @@ compare text1 text2 =

value_comparison = value1.compare_to value2
if value_comparison != Ordering.Equal then value_comparison else
text_comparison = num_text1.compare_to num_text2
text_comparison = compare_text num_text1 num_text2
if text_comparison != Ordering.Equal then text_comparison else
@Tail_Call order (parsed1.at 2) (parsed1.at 3) (parsed2.at 2) (parsed2.at 3)
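The natural-order logic above splits both texts into digit and non-digit runs, compares numeric runs by value, and falls back to a (possibly case-insensitive) text comparison to break ties such as `"001"` vs `"1"`. The Java sketch below illustrates that idea under simplifying assumptions: it splits on ASCII digits per `char` rather than on grapheme clusters via ICU's `BreakIterator`, and it approximates case-insensitivity with `toLowerCase(locale)`. The class and method names (`NaturalOrderSketch`, `chunks`, `compare`) are hypothetical and not part of the Enso codebase.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class NaturalOrderSketch {
    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Splits text into alternating runs of ASCII digits and non-digits,
    // e.g. "a100b" -> ["a", "100", "b"].
    static List<String> chunks(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            boolean digit = isAsciiDigit(s.charAt(i));
            int j = i + 1;
            while (j < s.length() && isAsciiDigit(s.charAt(j)) == digit) {
                j++;
            }
            out.add(s.substring(i, j));
            i = j;
        }
        return out;
    }

    // Compares one pair of runs as text; a null locale means case-sensitive.
    private static int compareText(String x, String y, Locale locale) {
        if (locale == null) {
            return x.compareTo(y);
        }
        return x.toLowerCase(locale).compareTo(y.toLowerCase(locale));
    }

    // Returns -1, 0 or 1. Pass a locale to compare text runs case-insensitively.
    static int compare(String a, String b, Locale caseInsensitiveLocale) {
        List<String> ca = chunks(a);
        List<String> cb = chunks(b);
        int n = Math.min(ca.size(), cb.size());
        for (int k = 0; k < n; k++) {
            String x = ca.get(k);
            String y = cb.get(k);
            if (isAsciiDigit(x.charAt(0)) && isAsciiDigit(y.charAt(0))) {
                // Numeric runs compare by value first...
                int byValue = new BigInteger(x).compareTo(new BigInteger(y));
                if (byValue != 0) {
                    return byValue;
                }
            }
            // ...and equal values (e.g. "001" vs "1") fall back to text order.
            int byText = compareText(x, y, caseInsensitiveLocale);
            if (byText != 0) {
                return Integer.signum(byText);
            }
        }
        // All shared runs are equal: the text with fewer runs sorts first.
        return Integer.compare(ca.size(), cb.size());
    }

    public static void main(String[] args) {
        List<String> v = new ArrayList<>(List.of("a2", "a1", "a100", "a001", "a0001"));
        v.sort((x, y) -> compare(x, y, null));
        System.out.println(v); // [a0001, a001, a1, a2, a100]
    }
}
```

With a locale such as `Locale.ROOT`, `"file9"` sorts before `"File10"`; without one, the uppercase `'F'` makes `"File10"` sort first.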

@@ -614,6 +614,22 @@ Text.compare_to that =
if comparison_result < 0 then Ordering.Less else
Ordering.Greater

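## Compare two texts to discover their ordering, ignoring case.

Arguments:
- that: The text to order `this` with respect to.
- locale: The locale used for case folding. Defaults to `Locale.default`.

> Example
Checking how "a" orders in relation to "B", ignoring case.

"a".compare_to_ignore_case "B"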
Text.compare_to_ignore_case : Text -> Locale -> Ordering
Text.compare_to_ignore_case that locale=Locale.default =
comparison_result = Text_Utils.compare_normalized_ignoring_case this that locale.java_locale
if comparison_result == 0 then Ordering.Equal else
if comparison_result < 0 then Ordering.Less else
Ordering.Greater

## ALIAS Check Emptiness

Check if `this` is empty.
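`Text.compare_to_ignore_case` delegates to `Text_Utils.compare_normalized_ignoring_case` and maps the sign of the result onto `Ordering`. That helper's implementation is not shown in this commit, so the Java sketch below is only an approximation: it lower-cases with the given locale and then applies canonical decomposition (NFD) via `java.text.Normalizer`, which is simpler than full Unicode case folding. The names `IgnoreCaseCompareSketch`, `fold` and `compareIgnoringCase` are hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

final class IgnoreCaseCompareSketch {
    // Lower-cases with the given locale, then applies canonical decomposition
    // (NFD) so precomposed and combining-mark spellings compare equal.
    private static String fold(String s, Locale locale) {
        return Normalizer.normalize(s.toLowerCase(locale), Normalizer.Form.NFD);
    }

    // Returns -1, 0 or 1, which an Enso wrapper can map onto
    // Ordering.Less, Ordering.Equal and Ordering.Greater.
    static int compareIgnoringCase(String a, String b, Locale locale) {
        return Integer.signum(fold(a, locale).compareTo(fold(b, locale)));
    }

    public static void main(String[] args) {
        System.out.println(compareIgnoringCase("a", "B", Locale.ROOT));            // -1
        System.out.println(compareIgnoringCase("\u00E9", "E\u0301", Locale.ROOT)); // 0
        // In Turkish, "I" lower-cases to dotless "\u0131", so the locale matters.
        System.out.println(compareIgnoringCase("I", "\u0131", Locale.forLanguageTag("tr"))); // 0
    }
}
```

The Turkish case shows why the `locale` argument exists. Under `Locale.ROOT` the same pair is not equal, because `"I"` lower-cases to the dotted `"i"`.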
@@ -156,7 +156,7 @@ make_first_aggregator reverse ignore_null args =
filter_clause = if ignore_null.not then Sql.code "" else
Sql.code " FILTER (WHERE " ++ result_expr.paren ++ Sql.code " IS NOT NULL)"
modified_order_exprs =
order_exprs.map expr-> expr ++ Sql.code " ASC NULLS LAST"
order_exprs.map expr-> expr ++ Sql.code " ASC NULLS FIRST"
order_clause =
Sql.code " ORDER BY " ++ Sql.join "," modified_order_exprs
index_expr = case reverse of
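Switching from `ASC NULLS LAST` to `ASC NULLS FIRST` appears to align the SQL first/last aggregators with the in-memory `ObjectComparator`, whose `compare` now returns `-1` when only the left value is `null`. The sketch below shows the same "nulls first" policy using the standard `java.util.Comparator.nullsFirst` combinator. `NullsFirstSketch` and `sortNullsFirst` are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class NullsFirstSketch {
    // Ascending order with every null placed before the non-null values,
    // the same policy as "ASC NULLS FIRST" in SQL.
    static List<Long> sortNullsFirst(List<Long> input) {
        List<Long> copy = new ArrayList<>(input);
        copy.sort(Comparator.nullsFirst(Comparator.naturalOrder()));
        return copy;
    }

    public static void main(String[] args) {
        System.out.println(sortNullsFirst(Arrays.asList(3L, null, 1L, null, 2L)));
        // [null, null, 1, 2, 3]
    }
}
```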
50 changes: 49 additions & 1 deletion distribution/lib/Standard/Table/0.0.0-dev/src/Data/Table.enso
@@ -19,10 +19,14 @@ from Standard.Table.Data.Data_Formatter as Data_Formatter_Module import Data_For
from Standard.Base.Data.Text.Text_Ordering as Text_Ordering_Module import Text_Ordering
from Standard.Base.Error.Problem_Behavior as Problem_Behavior_Module import Problem_Behavior, Report_Warning
from Standard.Table.Error as Error_Module import Missing_Input_Columns, Column_Indexes_Out_Of_Range, Duplicate_Type_Selector

import Standard.Table.Data.Column_Mapping
import Standard.Table.Data.Position
import Standard.Table.Data.Sort_Column_Selector
import Standard.Table.Data.Sort_Column

import Standard.Table.Data.Aggregate_Column
import Standard.Base.Data.Ordering.Comparator

polyglot java import org.enso.table.data.table.Table as Java_Table
polyglot java import org.enso.table.data.table.Column as Java_Column
@@ -524,7 +528,7 @@ type Table

on_problems.attach_problems_before validated.problems <|
java_key_columns = validated.key_columns.map .java_column
index = this.java_table.indexFromColumns java_key_columns.to_array
index = this.java_table.indexFromColumns java_key_columns.to_array Comparator.new

new_columns = validated.valid_columns.map c->(Aggregate_Column_Helper.java_aggregator c.first c.second)

@@ -535,6 +539,50 @@ type Table
problems = java_table.getProblems
Aggregate_Column_Helper.parse_aggregated_problems problems

## Sorts the rows of the table according to the specified columns and order.

Arguments:
- columns: The columns and order to sort the table.
- text_ordering: The ordering method to use on text values.
- on_problems: Specifies how to handle problems if they occur, reporting
them as warnings by default. The following problems can occur:
- If a column in `columns` is not present in the input table, a
`Missing_Input_Columns`.
- If duplicate columns, names or indices are provided, a
`Duplicate_Column_Selectors`.
- If a column index is out of range, a `Column_Indexes_Out_Of_Range`.
- If two distinct indices refer to the same column, an
`Input_Indices_Already_Matched`.
- If two name matchers match the same column, a
`Column_Matched_By_Multiple_Selectors`.
- If no valid columns are selected, a `No_Input_Columns_Selected`.
- If values do not implement an ordering, an
`Incomparable_Values_Error`.

> Example
Order the table by the column "alpha" in ascending order.

table.order_by (Sort_Column_Selector.By_Name [Sort_Column.Name "alpha"])

> Example
Order the table by the second column in ascending order. In case of any
ties, break them based on the 7th column from the end of the table in
descending order.

table.order_by (Sort_Column_Selector.By_Index [Sort_Column.Index 1, Sort_Column.Index -7 Sort_Direction.Descending])
order_by : Sort_Column_Selector -> Text_Ordering -> Problem_Behavior -> Table
order_by (columns = (Sort_Column_Selector.By_Name [(Sort_Column.Name (this.columns.at 0 . name))])) text_ordering=Text_Ordering on_problems=Report_Warning =
columns_for_ordering = Table_Helpers.prepare_order_by this.columns columns on_problems
selected_columns = columns_for_ordering.map c->c.column.java_column
ordering = columns_for_ordering.map c->
case c.associated_selector.direction of
Sort_Direction.Ascending -> 1
Sort_Direction.Descending -> -1
comparator = Comparator.for_text_ordering text_ordering
Table <|
this.java_table.orderBy selected_columns.to_array ordering.to_array comparator
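`order_by` passes the selected Java columns, a per-column direction (`1` for ascending, `-1` for descending) and a comparator built from the `Text_Ordering` to `Java_Table.orderBy`. That Java method is not part of this excerpt, so the following is only a sketch of the general technique under that assumption: sort row indices with a stable sort, consult columns left to right, and flip the sign of a column's comparison when its direction is `-1`. The data layout (`Object[][]`, column-major) and the names `OrderBySketch` and `orderBy` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class OrderBySketch {
    // columns[c][r] holds the value of column c in row r; ordering[c] is 1
    // (ascending) or -1 (descending). Returns the row indices in sorted order.
    static int[] orderBy(Object[][] columns, int[] ordering, Comparator<Object> comparator) {
        int rowCount = columns.length == 0 ? 0 : columns[0].length;
        List<Integer> rows = new ArrayList<>(rowCount);
        for (int r = 0; r < rowCount; r++) {
            rows.add(r);
        }
        rows.sort((a, b) -> {
            for (int c = 0; c < columns.length; c++) {
                int cmp = comparator.compare(columns[c][a], columns[c][b]);
                if (cmp != 0) {
                    // Later columns are only consulted to break ties.
                    return Integer.signum(cmp) * ordering[c];
                }
            }
            return 0; // List.sort is stable, so fully tied rows keep their order
        });
        return rows.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        Object[][] columns = {
            {"b", "a", "b", "a"}, // name
            {1L, 2L, 3L, 4L}      // score
        };
        Comparator<Object> natural = (x, y) -> {
            @SuppressWarnings("unchecked")
            Comparable<Object> cx = (Comparable<Object>) x;
            return cx.compareTo(y);
        };
        // Sort by name ascending, then by score descending.
        int[] order = orderBy(columns, new int[] {1, -1}, natural);
        System.out.println(Arrays.toString(order)); // [3, 1, 2, 0]
    }
}
```

Because this sketch multiplies the entire comparison by the direction, a nulls-first comparator would place nulls last in descending columns.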


## Parses columns within a Table to a specific value type.
By default, it looks at all `Text` columns and attempts to deduce the
type (columns with other types are not affected). If `column_types` are
@@ -120,7 +120,7 @@ resolve_aggregate table problem_builder aggregate_column =

resolve_selector_to_vector : Column_Selector -> [Column] ! Internal_Missing_Column_Error
resolve_selector_to_vector selector =
resolved = Table_Helpers.select_columns_helper table_columns selector reorder=False problem_builder
resolved = Table_Helpers.select_columns_helper table_columns selector reorder=True problem_builder
if resolved.is_empty then Error.throw Internal_Missing_Column_Error else resolved

resolve_selector_or_nothing selector = case selector of
@@ -175,7 +175,7 @@ java_aggregator name column =
Count _ -> CountAggregator.new name
Count_Distinct columns _ ignore_nothing ->
resolved = columns.map .java_column
CountDistinctAggregator.new name resolved.to_array ignore_nothing
CountDistinctAggregator.new name resolved.to_array ignore_nothing Comparator.new
Count_Not_Nothing c _ -> CountNothingAggregator.new name c.java_column False
Count_Nothing c _ -> CountNothingAggregator.new name c.java_column True
Count_Not_Empty c _ -> CountEmptyAggregator.new name c.java_column False
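`CountDistinctAggregator` now receives `Comparator.new`, which presumably makes distinctness follow the comparator's notion of equality (a result of `0`) rather than Java's `equals`. The sketch below illustrates that idea for a single column with a `java.util.TreeSet` built on a comparator. The real aggregator works over several columns and its internals are not shown here. `CountDistinctSketch` and `countDistinct` are hypothetical names, and the `ignoreNothing` flag mirrors the Enso argument of the same purpose.

```java
import java.util.Comparator;
import java.util.TreeSet;

final class CountDistinctSketch {
    // Values the comparator reports as equal (compare == 0) are counted once.
    static int countDistinct(Object[] values, Comparator<Object> comparator, boolean ignoreNothing) {
        TreeSet<Object> seen = new TreeSet<>(comparator);
        for (Object v : values) {
            if (v == null && ignoreNothing) {
                continue; // skip missing values when asked to
            }
            seen.add(v); // the comparator must handle null if nulls are kept
        }
        return seen.size();
    }

    public static void main(String[] args) {
        Object[] values = {"a", "A", "b", null};
        Comparator<Object> byText = (x, y) -> x.toString().compareTo(y.toString());
        Comparator<Object> byTextIgnoreCase = (x, y) -> x.toString().compareToIgnoreCase(y.toString());
        Comparator<Object> caseSensitive = Comparator.nullsFirst(byText);
        Comparator<Object> caseInsensitive = Comparator.nullsFirst(byTextIgnoreCase);
        System.out.println(countDistinct(values, caseSensitive, true));    // 3
        System.out.println(countDistinct(values, caseInsensitive, true));  // 2
        System.out.println(countDistinct(values, caseInsensitive, false)); // 3
    }
}
```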
99 changes: 58 additions & 41 deletions std-bits/base/src/main/java/org/enso/base/ObjectComparator.java
@@ -4,27 +4,30 @@
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.Comparator;
import java.util.Locale;
import java.util.function.BiFunction;

public class ObjectComparator implements Comparator<Object> {
private static ObjectComparator INSTANCE;

/**
* A singleton instance of an ObjectComparator
* A singleton instance of an ObjectComparator.
*
* @param fallbackComparator this MUST be the default .compare_to function for Enso. Needs to be
* passed to allow calling back from Java.
* @return Comparator object
* @return Comparator object.
*/
public static ObjectComparator getInstance(BiFunction<Object, Object, Long> fallbackComparator) {
if (INSTANCE == null) {
INSTANCE = new ObjectComparator((l, r) -> fallbackComparator.apply(l, r).intValue());
INSTANCE = new ObjectComparator(fallbackComparator);
}

return INSTANCE;
}

private final BiFunction<Object, Object, Integer> fallbackComparator;
private final BiFunction<Object, Object, Long> fallbackComparator;
private final BiFunction<String, String, Long> textComparator;


public ObjectComparator() {
this(
@@ -33,41 +36,60 @@ public ObjectComparator() {
});
}

public ObjectComparator(BiFunction<Object, Object, Integer> fallbackComparator) {
public ObjectComparator(BiFunction<Object, Object, Long> fallbackComparator) {
this(fallbackComparator, (a, b) -> Long.valueOf(Text_Utils.compare_normalized(a, b)));
}

private ObjectComparator(BiFunction<Object, Object, Long> fallbackComparator, BiFunction<String, String, Long> textComparator) {
this.fallbackComparator = fallbackComparator;
this.textComparator = textComparator;
}

/**
* Create a copy of the ObjectComparator with case-insensitive text comparisons.
* @param locale to use for case folding.
* @return Comparator object.
*/
public ObjectComparator withCaseInsensitivity(Locale locale) {
return new ObjectComparator(this.fallbackComparator, (a, b) -> Long.valueOf(Text_Utils.compare_normalized_ignoring_case(a, b, locale)));
}

/**
* Create a copy of the ObjectComparator that uses a custom comparator for text values.
* @param textComparator custom comparator for Text.
* @return Comparator object.
*/
public ObjectComparator withCustomTextComparator(BiFunction<String, String, Long> textComparator) {
return new ObjectComparator(this.fallbackComparator, textComparator);
}

@Override
public int compare(Object thisValue, Object thatValue) throws ClassCastException {
// NULLs
if (thisValue == null) {
if (thatValue != null) {
return 1;
return -1;
}
return 0;
}
if (thatValue == null) {
return -1;
return 1;
}

// Booleans
if (thisValue instanceof Boolean && thatValue instanceof Boolean) {
boolean thisBool = (Boolean) thisValue;
boolean thatBool = (Boolean) thatValue;
if (thisValue instanceof Boolean thisBool && thatValue instanceof Boolean thatBool) {
if (thisBool == thatBool) {
return 0;
}
return thisBool ? 1 : -1;
}

// Long this
if (thisValue instanceof Long) {
Long thisLong = (Long) thisValue;
if (thatValue instanceof Long) {
return thisLong.compareTo((Long) thatValue);
if (thisValue instanceof Long thisLong) {
if (thatValue instanceof Long thatLong) {
return thisLong.compareTo(thatLong);
}
if (thatValue instanceof Double) {
Double thatDouble = (Double) thatValue;
if (thatValue instanceof Double thatDouble) {
if (thisLong > thatDouble) {
return 1;
}
@@ -79,13 +101,11 @@ public int compare(Object thisValue, Object thatValue) throws ClassCastException
}

// Double this
if (thisValue instanceof Double) {
Double thisDouble = (Double) thisValue;
if (thatValue instanceof Double) {
return thisDouble.compareTo((Double) thatValue);
if (thisValue instanceof Double thisDouble) {
if (thatValue instanceof Double thatDouble) {
return thisDouble.compareTo(thatDouble);
}
if (thatValue instanceof Long) {
Long thatLong = (Long) thatValue;
if (thatValue instanceof Long thatLong) {
if (thisDouble > thatLong) {
return 1;
}
@@ -97,39 +117,36 @@ public int compare(Object thisValue, Object thatValue) throws ClassCastException
}

// Text
if (thisValue instanceof String && thatValue instanceof String) {
return Text_Utils.compare_normalized((String) thisValue, (String) thatValue);
if (thisValue instanceof String thisString && thatValue instanceof String thatString) {
return textComparator.apply(thisString, thatString).intValue();
}

// DateTimes
if (thisValue instanceof LocalDate) {
LocalDate thisDate = (LocalDate) thisValue;
if (thatValue instanceof LocalDate) {
return thisDate.compareTo((LocalDate) thatValue);
if (thisValue instanceof LocalDate thisDate) {
if (thatValue instanceof LocalDate thatDate) {
return thisDate.compareTo(thatDate);
}
if (thatValue instanceof LocalDateTime) {
return thisDate.atStartOfDay().compareTo((LocalDateTime) thatValue);
if (thatValue instanceof LocalDateTime thatDateTime) {
return thisDate.atStartOfDay().compareTo(thatDateTime);
}
}
if (thisValue instanceof LocalDateTime) {
LocalDateTime thisDateTime = (LocalDateTime) thisValue;
if (thatValue instanceof LocalDate) {
return thisDateTime.compareTo(((LocalDate) thatValue).atStartOfDay());
if (thisValue instanceof LocalDateTime thisDateTime) {
if (thatValue instanceof LocalDate thatDate) {
return thisDateTime.compareTo(thatDate.atStartOfDay());
}
if (thatValue instanceof LocalDateTime) {
return thisDateTime.compareTo((LocalDateTime) thatValue);
if (thatValue instanceof LocalDateTime thatDateTime) {
return thisDateTime.compareTo(thatDateTime);
}
}

// TimeOfDay
if (thisValue instanceof LocalTime) {
LocalTime thisTime = (LocalTime) thisValue;
if (thatValue instanceof LocalTime) {
return thisTime.compareTo((LocalTime) thatValue);
if (thisValue instanceof LocalTime thisTime) {
if (thatValue instanceof LocalTime thatTime) {
return thisTime.compareTo(thatTime);
}
}

// Fallback to Enso
return fallbackComparator.apply(thisValue, thatValue);
return fallbackComparator.apply(thisValue, thatValue).intValue();
}
}
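The reworked `ObjectComparator` combines four ideas: nulls sort first, `Long` and `Double` compare across types, text goes through a pluggable `textComparator`, and `withCaseInsensitivity`/`withCustomTextComparator` return configured copies instead of mutating the instance. The condensed sketch below shows only that pattern. It is not the class from this commit: the name `MiniObjectComparator` is hypothetical, mixed numbers are compared with `Double.compare`, and unsupported types throw `ClassCastException` instead of calling back into Enso.

```java
import java.util.Comparator;
import java.util.Locale;
import java.util.function.BiFunction;

final class MiniObjectComparator implements Comparator<Object> {
    private final BiFunction<String, String, Integer> textComparator;

    MiniObjectComparator() {
        this(String::compareTo);
    }

    private MiniObjectComparator(BiFunction<String, String, Integer> textComparator) {
        this.textComparator = textComparator;
    }

    // Returns a copy that compares text case-insensitively; this instance is unchanged.
    MiniObjectComparator withCaseInsensitivity(Locale locale) {
        return new MiniObjectComparator(
            (a, b) -> a.toLowerCase(locale).compareTo(b.toLowerCase(locale)));
    }

    @Override
    public int compare(Object l, Object r) {
        if (l == null) {
            return r == null ? 0 : -1; // nulls sort first
        }
        if (r == null) {
            return 1;
        }
        if (l instanceof Long a && r instanceof Long b) {
            return a.compareTo(b);
        }
        // Mixed Long/Double pairs are compared as doubles in this sketch.
        if (l instanceof Number a && r instanceof Number b
                && (l instanceof Double || r instanceof Double)) {
            return Double.compare(a.doubleValue(), b.doubleValue());
        }
        if (l instanceof String a && r instanceof String b) {
            return Integer.signum(textComparator.apply(a, b));
        }
        throw new ClassCastException("Incomparable values: " + l + " and " + r);
    }

    public static void main(String[] args) {
        MiniObjectComparator cmp = new MiniObjectComparator();
        System.out.println(cmp.compare(null, 1L)); // -1
        System.out.println(cmp.compare(2.0, 2L));  // 0
        System.out.println(cmp.compare("a", "B")); // 1
        System.out.println(cmp.withCaseInsensitivity(Locale.ROOT).compare("a", "B")); // -1
    }
}
```

Returning new instances keeps a shared comparator, such as the singleton from `getInstance`, safe to reuse while callers derive text-specific variants from it.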