Add support for Blank_Columns to Table and Database (#3812)
radeusgd authored Oct 20, 2022
1 parent 81e5e77 commit cc76e7d
Showing 24 changed files with 485 additions and 266 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -214,6 +214,9 @@
- [Extended `Filter_Condition` with `Is_In` and `Not_In`.][3790]
- [Replaced `Table.drop_missing_rows` with `filter_blank_rows` with an updated
API.][3805]
- [Replaced `Table.drop_missing_columns` with
`Table.remove_columns Column_Selector.Blank_Columns` by adding the new column
selector variant.][3812]

[debug-shortcuts]:
https://github.com/enso-org/enso/blob/develop/app/gui/docs/product/shortcuts.md#debug
Expand Down Expand Up @@ -344,6 +347,7 @@
[3793]: https://github.com/enso-org/enso/pull/3793
[3790]: https://github.com/enso-org/enso/pull/3790
[3805]: https://github.com/enso-org/enso/pull/3805
[3812]: https://github.com/enso-org/enso/pull/3812

#### Enso Compiler

Expand Down
4 changes: 2 additions & 2 deletions distribution/lib/Standard/Base/0.0.0-dev/src/Data/Any.enso
Original file line number Diff line number Diff line change
Expand Up @@ -259,8 +259,8 @@ type Any
from Standard.Base import all

example_catch =
error = Error.throw (Illegal_Argument_Error "My message")
error.catch Illegal_Argument_Error (err -> err.message)
error = Error.throw (Illegal_Argument_Error_Data "My message")
error.catch Illegal_Argument_Error_Data (err -> err.message)

> Example
Catching any dataflow error and turning it into a regular value.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -160,7 +160,7 @@ type Utf_16_Span
Utility function taking a range pointing at grapheme clusters and converting
to a range on the underlying code units.
range_to_char_indices : Text -> Range -> Range ! (Index_Out_Of_Bounds_Error | Illegal_Argument_Error)
range_to_char_indices text range = if range.step != 1 then Error.throw (Illegal_Argument_Error "Text indexing only supports ranges with step equal to 1.") else
range_to_char_indices text range = if range.step != 1 then Error.throw (Illegal_Argument_Error_Data "Text indexing only supports ranges with step equal to 1.") else
len = text.length
start = if range.start < 0 then range.start + len else range.start
end = if range.end == Nothing then len else (if range.end < 0 then range.end + len else range.end)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -212,7 +212,7 @@ type Date
## Returns the century of the date.
century : Integer
century self = if self.year > 0 then (self.year - 1).div 100 + 1 else
Error.throw (Illegal_Argument_Error "Century can only be given for AD years.")
Error.throw (Illegal_Argument_Error_Data "Century can only be given for AD years.")

## Returns the quarter of the year the date falls into.
quarter : Integer
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ Day_Of_Week.from (that : Integer) (first_day:Day_Of_Week=Day_Of_Week.Sunday) (st
True ->
valid_range = if start_at_zero then "0-6" else "1-7"
message = "Invalid day of week (must be " + valid_range + ")."
Error.throw (Illegal_Argument_Error message)
Error.throw (Illegal_Argument_Error_Data message)
False ->
day_number = if first_day == Day_Of_Week.Sunday then shifted else
(shifted + (first_day.to_integer start_at_zero=True)) % 7
Expand Down
18 changes: 9 additions & 9 deletions distribution/lib/Standard/Base/0.0.0-dev/src/Error/Common.enso
Original file line number Diff line number Diff line change
Expand Up @@ -74,8 +74,8 @@ type Error
from Standard.Base import all

example_catch =
error = Error.throw (Illegal_Argument_Error "My message")
error.catch Illegal_Argument_Error (err -> err.message)
error = Error.throw (Illegal_Argument_Error_Data "My message")
error.catch Illegal_Argument_Error_Data (err -> err.message)

> Example
Catching any dataflow error and turning it into a regular value.
Expand Down Expand Up @@ -299,7 +299,7 @@ type Panic
and rethrow any others, without affecting their stacktraces.

Panic.catch Any (Panic.throw "foo") caught_panic-> case caught_panic.payload of
Illegal_Argument_Error message _ -> "Illegal arguments were provided: "+message
Illegal_Argument_Error_Data message _ -> "Illegal arguments were provided: "+message
other_panic -> Panic.throw other_panic
throw : Any -> Panic
throw payload = @Builtin_Method "Panic.throw"
Expand Down Expand Up @@ -378,13 +378,13 @@ type Panic
> Example
Handling a specific type of panic.

Panic.catch Illegal_Argument_Error (Panic.throw (Illegal_Argument_Error "Oh no!" Nothing)) error->
Panic.catch Illegal_Argument_Error_Data (Panic.throw (Illegal_Argument_Error_Data "Oh no!" Nothing)) error->
"Caught an `Illegal_Argument_Error`: "+error.payload.message

> Example
Handling any panic.

Panic.catch Any (Panic.throw (Illegal_Argument_Error "Oh no!" Nothing)) error->
Panic.catch Any (Panic.throw (Illegal_Argument_Error_Data "Oh no!" Nothing)) error->
"Caught some panic!"

> Example
Expand All @@ -395,7 +395,7 @@ type Panic
polyglot java import java.lang.NumberFormatException
parse str =
Panic.catch NumberFormatException (Long.parseLong str) caught_panic->
Error.throw (Illegal_Argument_Error "The provided string is not a valid number: "+caught_panic.payload.cause.getMessage)
Error.throw (Illegal_Argument_Error_Data "The provided string is not a valid number: "+caught_panic.payload.cause.getMessage)
catch : Any -> Any -> (Caught_Panic -> Any) -> Any
catch panic_type ~action handler =
Panic.catch_primitive action caught_panic->
Expand Down Expand Up @@ -430,7 +430,7 @@ type Panic
polyglot java import java.lang.NumberFormatException
parse str =
Panic.catch_java NumberFormatException (Long.parseLong str) java_exception->
Error.throw (Illegal_Argument_Error "The provided string is not a valid number: "+java_exception.getMessage)
Error.throw (Illegal_Argument_Error_Data "The provided string is not a valid number: "+java_exception.getMessage)
catch_java : Any -> Any -> (Throwable -> Any) -> Any
catch_java panic_type ~action handler =
Panic.catch_primitive action caught_panic-> case caught_panic.payload of
Expand All @@ -457,12 +457,12 @@ type Panic
> Example
Converting an expected panic to a dataflow error.

Panic.recover Illegal_Argument_Error (Panic.throw (Illegal_Argument_Error "Oh!" Nothing))
Panic.recover Illegal_Argument_Error_Data (Panic.throw (Illegal_Argument_Error_Data "Oh!" Nothing))

> Example
Converting one of many expected panic types to a dataflow error.

Panic.recover [Illegal_Argument_Error, Illegal_State_Error] (Panic.throw (Illegal_Argument_Error "Oh!" Nothing))
Panic.recover [Illegal_Argument_Error_Data, Illegal_State_Error_Data] (Panic.throw (Illegal_Argument_Error_Data "Oh!" Nothing))
recover : (Vector.Vector Any | Any) -> Any -> Any
recover expected_types ~action =
types_to_check = case expected_types of
Expand Down
48 changes: 48 additions & 0 deletions distribution/lib/Standard/Database/0.0.0-dev/src/Data/Column.enso
Original file line number Diff line number Diff line change
Expand Up @@ -454,6 +454,35 @@ type Column
not : Column
not self = self.make_unary_op "NOT"

## UNSTABLE
Replaces `True` values with `when_true` and `False` with `when_false`.
Only meant for use with boolean columns.

TODO: Currently `when_true` and `when_false` must each be a single value.
In the future the API will also support row-based IIF when they are columns.
iif : Any -> Any -> Column
iif self when_true when_false =
## TODO we should adjust new_type based on types when_true and
when_false, but this relies on the Value Types design which is still
in progress. This function has status of an internal prototype for
now, so we just rely on a simplified handling. Once Value Types are
properly implemented, this should be accordingly extended for the
full implementation of IIF. We will need to handle when_true and
when_false being either columns or regular values and rely on a
mapping of Enso base types to SQL types, and a rule for extracting a
common type.
approximate_type x = case x of
_ : Integer -> SQL_Type.integer
_ : Decimal -> SQL_Type.real
_ : Text -> SQL_Type.text
_ : Boolean -> SQL_Type.boolean
_ -> Error.throw (Illegal_Argument_Error_Data "Unsupported type.")
left_type = approximate_type when_true
right_type = approximate_type when_false
if left_type != right_type then Error.throw (Illegal_Argument_Error_Data "when_true and when_false types do not match") else
self.make_op "IIF" [when_true, when_false] new_type=left_type


## UNSTABLE

Returns a column of booleans, with `True` items at the positions where
Expand All @@ -473,6 +502,25 @@ type Column
is_empty : Column
is_empty self = self.make_unary_op "IS_EMPTY" new_type=SQL_Type.boolean

## PRIVATE
Returns a column of booleans with `True` at the positions where this
column contains a blank value.

Arguments:
- treat_nans_as_blank: If `True`, then `Number.nan` is also treated as
blank.

? Blank values
Blank values are `Nothing`, `""` and, depending on the `treat_nans_as_blank` setting, `Number.nan`.
is_blank : Boolean -> Column
is_blank self treat_nans_as_blank=False =
is_blank = case self.sql_type.is_definitely_text of
True -> self.is_empty
False -> self.is_missing
case treat_nans_as_blank && self.sql_type.is_definitely_double of
True -> is_blank || self.is_nan
False -> is_blank

## UNSTABLE

Returns a new column where missing values have been replaced with the
Expand Down
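The type check that the database `iif` performs on `when_true` and `when_false` can be modelled outside Enso. The following is a minimal Python sketch, not the library's implementation: the function names mirror the diff, but the string type names and the use of `ValueError` are assumptions made for illustration.

```python
def approximate_type(value):
    """Map a literal to a coarse SQL type name (hypothetical names)."""
    # bool must be tested before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "real"
    if isinstance(value, str):
        return "text"
    raise ValueError("Unsupported type.")


def iif_result_type(when_true, when_false):
    """Return the common type of both branches, or fail if they differ."""
    left_type = approximate_type(when_true)
    right_type = approximate_type(when_false)
    if left_type != right_type:
        raise ValueError("when_true and when_false types do not match")
    return left_type
```

Under this model, `iif_result_type("yes", "no")` yields `"text"`, while mixing an integer with a text value is rejected, which matches the simplified handling described in the TODO comment.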
30 changes: 9 additions & 21 deletions distribution/lib/Standard/Database/0.0.0-dev/src/Data/Table.enso
Original file line number Diff line number Diff line change
Expand Up @@ -145,7 +145,7 @@ type Table
Icon: select_column
select_columns : Vector Text | Column_Selector -> Boolean -> Problem_Behavior -> Table
select_columns self (columns = Column_Selector.By_Index [0]) (reorder = False) (on_problems = Report_Warning) =
new_columns = Table_Helpers.select_columns internal_columns=self.internal_columns selector=columns reorder=reorder on_problems=on_problems
new_columns = self.columns_helper.select_columns selector=columns reorder=reorder on_problems=on_problems
self.updated_columns new_columns

## Returns a new table with the chosen set of columns, as specified by the
Expand Down Expand Up @@ -195,7 +195,7 @@ type Table
table.remove_columns (Column_Selector.By_Column [column1, column2])
remove_columns : Vector Text | Column_Selector -> Problem_Behavior -> Table
remove_columns self (columns = Column_Selector.By_Index [0]) (on_problems = Report_Warning) =
new_columns = Table_Helpers.remove_columns internal_columns=self.internal_columns selector=columns on_problems=on_problems
new_columns = self.columns_helper.remove_columns selector=columns on_problems=on_problems
self.updated_columns new_columns

## Returns a new table with the specified selection of columns moved to
Expand Down Expand Up @@ -250,7 +250,7 @@ type Table
table.reorder_columns (Column_Selector.By_Column [column1, column2])
reorder_columns : Vector Text | Column_Selector -> Position.Position -> Problem_Behavior -> Table
reorder_columns self (columns = Column_Selector.By_Index [0]) (position = Position.Before_Other_Columns) (on_problems = Report_Warning) =
new_columns = Table_Helpers.reorder_columns internal_columns=self.internal_columns selector=columns position=position on_problems=on_problems
new_columns = self.columns_helper.reorder_columns selector=columns position=position on_problems=on_problems
self.updated_columns new_columns

## Returns a new table with the columns sorted by name according to the
Expand Down Expand Up @@ -797,24 +797,7 @@ type Table
Blank values are `Nothing`, `""` and, depending on the `treat_nans_as_blank` setting, `Number.nan`.
filter_blank_rows : Boolean -> Boolean -> Table
filter_blank_rows self when_any=False treat_nans_as_blank=False =
can_contain_text col = col.sql_type.is_definitely_text
can_contain_double col = col.sql_type.is_definitely_double
Table_Helpers.filter_blank_rows self can_contain_text can_contain_double when_any treat_nans_as_blank

## DEPRECATED Will be replaced with `Incomplete_Columns` selector (to be used with `remove_columns`).
drop_missing_columns : Table
drop_missing_columns self =
rows_expr = Expression.Operation "COUNT_ROWS" []
all_rows_column_name = "row_count"
make_count_expr expr = Expression.Operation "COUNT" [expr]
cols = self.internal_columns.map (c -> [c.name, make_count_expr c.expression])
query = Query.Select [[all_rows_column_name, rows_expr]]+cols self.context
sql = self.connection.dialect.generate_sql query
table = self.connection.read_statement sql
all_rows = table.at all_rows_column_name . at 0
kept_columns = self.internal_columns . filter c->
all_rows == table.at c.name . at 0
self.updated_columns kept_columns
Table_Helpers.filter_blank_rows self when_any treat_nans_as_blank

## Returns the number of rows in this table.
row_count : Integer
Expand Down Expand Up @@ -917,6 +900,11 @@ type Table
new_ctx = self.context.set_index ixes
Column.Value internal.name self.connection internal.sql_type internal.expression new_ctx

## PRIVATE
columns_helper : Table_Column_Helper
columns_helper self =
Table_Helpers.Table_Column_Helper.Value self.internal_columns self.make_column self .read

## PRIVATE

Returns a copy of this table with updated internal columns.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,7 @@ make_expression aggregate dialect =
case aggregate of
Group_By c _ -> c.expression
Count _ -> Expression.Operation "COUNT_ROWS" []
Count_Distinct columns _ ignore_nothing -> if columns.is_empty then Error.throw (Illegal_Argument_Error "Count_Distinct must have at least one column.") else
Count_Distinct columns _ ignore_nothing -> if columns.is_empty then Error.throw (Illegal_Argument_Error_Data "Count_Distinct must have at least one column.") else
case ignore_nothing of
True -> Expression.Operation "COUNT_DISTINCT" (columns.map .expression)
False -> Expression.Operation "COUNT_DISTINCT_INCLUDE_NULL" (columns.map .expression)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -170,7 +170,7 @@ base_dialect =
fun = name -> [name, make_function name]

arith = [bin "+", bin "-", bin "*", bin "/", bin "%"]
logic = [bin "AND", bin "OR", unary "NOT"]
logic = [bin "AND", bin "OR", unary "NOT", ["IIF", make_iif]]
compare = [bin "=", bin "!=", bin "<", bin ">", bin "<=", bin ">=", ["BETWEEN", make_between]]
agg = [fun "MAX", fun "MIN", fun "AVG", fun "SUM"]
counts = [fun "COUNT", ["COUNT_ROWS", make_constant "COUNT(*)"]]
Expand All @@ -186,6 +186,17 @@ is_empty = lift_unary_op "IS_EMPTY" arg->
is_empty = (arg ++ " = ''").paren
(is_null ++ " OR " ++ is_empty).paren

## PRIVATE
make_iif : Vector Builder -> Builder
make_iif arguments = case arguments.length of
3 ->
expr = arguments.at 0
when_true = arguments.at 1
when_false = arguments.at 2
(code "CASE WHEN " ++ expr ++ " THEN " ++ when_true ++ " WHEN " ++ expr ++ " IS NULL THEN NULL ELSE " ++ when_false ++ " END").paren
_ ->
Error.throw <| Illegal_State_Error_Data ("Invalid amount of arguments for operation IIF")

## PRIVATE
make_between : Vector Builder -> Builder
make_between arguments = case arguments.length of
Expand Down
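The `CASE` expression built by `make_iif` has an extra `IS NULL` branch so that a NULL condition yields NULL rather than falling through to `when_false`. A small Python sketch using the standard `sqlite3` module illustrates this; the helper `make_iif_sql`, the table `t` and the column `flag` are hypothetical and only mimic the shape of the generated SQL.

```python
import sqlite3


def make_iif_sql(condition, when_true, when_false):
    """Render IIF as a CASE expression that keeps NULL conditions as NULL."""
    return (
        f"(CASE WHEN {condition} THEN {when_true} "
        f"WHEN {condition} IS NULL THEN NULL "
        f"ELSE {when_false} END)"
    )


# Build a tiny in-memory table with a true, a false and a NULL flag.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (flag INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (0,), (None,)])

query = "SELECT " + make_iif_sql("flag", "'yes'", "'no'") + " FROM t ORDER BY rowid"
rows = [row[0] for row in conn.execute(query)]
print(rows)  # ['yes', 'no', None]
```

Without the middle branch, the NULL row would evaluate to `'no'`, because a NULL condition is not true and a plain `CASE WHEN ... ELSE` takes the `ELSE` arm.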
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@
import Standard.Examples

example_drop_missing_cols =
Examples.inventory_table.drop_missing_columns
Examples.inventory_table.remove_columns (Column_Selector.Blank_Columns when_any=True)

> Example
Fill missing values in a column with the value 20.5.
Expand Down
34 changes: 34 additions & 0 deletions distribution/lib/Standard/Table/0.0.0-dev/src/Data/Column.enso
Original file line number Diff line number Diff line change
Expand Up @@ -473,6 +473,22 @@ type Column
not : Column
not self = run_vectorized_unary_op self "not" .not

## UNSTABLE
Replaces `True` values with `when_true` and `False` with `when_false`.
Only meant for use with boolean columns.

TODO: Currently `when_true` and `when_false` must each be a single value.
In the future the API will also support row-based IIF when they are columns.
iif : Any -> Any -> Column
iif self when_true when_false = case self.storage_type of
Storage.Boolean ->
s = self.java_column.getStorage
ix = self.java_column.getIndex
rs = s.iif when_true when_false
Column.Column_Data (Java_Column.new "Result" ix rs)
_ -> Error.throw (Illegal_Argument_Error_Data "`iif` can only be used with boolean columns.")


## Returns a column of booleans, with `True` items at the positions where
this column contains a `Nothing`.

Expand Down Expand Up @@ -513,6 +529,24 @@ type Column
is_present : Column
is_present self = self.is_missing.not

## PRIVATE
Returns a column of booleans with `True` at the positions where this
column contains a blank value.

Arguments:
- treat_nans_as_blank: If `True`, then `Number.nan` is also treated as
blank.

? Blank values
Blank values are `Nothing`, `""` and, depending on the `treat_nans_as_blank` setting, `Number.nan`.
is_blank : Boolean -> Column
is_blank self treat_nans_as_blank=False =
case self.storage_type of
Storage.Text -> self.is_empty
Storage.Decimal -> if treat_nans_as_blank then self.is_missing || self.is_nan else self.is_missing
Storage.Any -> if treat_nans_as_blank then self.is_empty || self.is_nan else self.is_empty
_ -> self.is_missing

## ALIAS Fill Missing

Returns a new column where missing values have been replaced with the
Expand Down
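The in-memory `is_blank` chooses its check from the column's storage type. The Python sketch below models that dispatch under stated assumptions: the `Storage` enum and the helper names are hypothetical stand-ins, and `is_empty` is assumed to mean "missing or empty string", as the database `IS_EMPTY` operation in this commit suggests.

```python
import math
from enum import Enum, auto


class Storage(Enum):
    """Hypothetical stand-in for the column storage types in the diff."""
    TEXT = auto()
    DECIMAL = auto()
    ANY = auto()
    INTEGER = auto()


def is_missing(value):
    return value is None


def is_empty(value):
    # Assumption: empty means missing or the empty string.
    return value is None or value == ""


def is_nan(value):
    return isinstance(value, float) and math.isnan(value)


def is_blank_mask(values, storage, treat_nans_as_blank=False):
    """Return one boolean per value, True where the value counts as blank."""
    if storage is Storage.TEXT:
        check = is_empty
    elif storage is Storage.DECIMAL:
        check = (lambda v: is_missing(v) or is_nan(v)) if treat_nans_as_blank else is_missing
    elif storage is Storage.ANY:
        check = (lambda v: is_empty(v) or is_nan(v)) if treat_nans_as_blank else is_empty
    else:
        # Other storages cannot hold "" or NaN, so only missing values are blank.
        check = is_missing
    return [check(v) for v in values]
```

For example, `is_blank_mask([1.0, float("nan"), None], Storage.DECIMAL, treat_nans_as_blank=True)` marks the last two positions as blank, while the default setting marks only the `None`.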
Original file line number Diff line number Diff line change
Expand Up @@ -25,3 +25,24 @@ type Column_Selector
this approach can be used to match columns with the same names as a set
of columns of some other table, for example, when preparing for a join.
By_Column (columns : Vector Column)

## ALIAS dropna
ALIAS drop_missing_columns
Select columns which are either all blank or contain blank values.

Arguments:
- when_any: By default, only columns consisting of all blank cells are
selected. If set to `True`, columns containing at least one blank value
will be selected too. If there are no rows, the column is treated as
blank regardless of this argument.
- treat_nans_as_blank: If `True`, then `Number.nan` is also treated as
blank.

? Blank values
Blank values are `Nothing`, `""` and, depending on the `treat_nans_as_blank` setting, `Number.nan`.

> Example
Remove completely blank columns from a table.

table.remove_columns Column_Selector.Blank_Columns
Blank_Columns when_any:Boolean=False treat_nans_as_blank:Boolean=False
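The selection rule documented for `Blank_Columns` can be illustrated with a short Python model. This is only a sketch of the described semantics, assuming a table is represented as a dict of column name to list of values; `blank_columns` and `remove_blank_columns` are hypothetical helpers, not part of the Enso API.

```python
import math


def is_blank(value, treat_nans_as_blank=False):
    """A value is blank if it is None, "", or (optionally) a float NaN."""
    if value is None or value == "":
        return True
    return treat_nans_as_blank and isinstance(value, float) and math.isnan(value)


def blank_columns(table, when_any=False, treat_nans_as_blank=False):
    """Return the names of the columns that the selector would pick."""
    selected = []
    for name, values in table.items():
        flags = [is_blank(v, treat_nans_as_blank) for v in values]
        # all([]) is True, so a column with no rows is always treated as blank.
        if all(flags) or (when_any and any(flags)):
            selected.append(name)
    return selected


def remove_blank_columns(table, **options):
    """Model of `remove_columns` applied to the blank-column selection."""
    dropped = set(blank_columns(table, **options))
    return {name: values for name, values in table.items() if name not in dropped}
```

With the defaults only fully blank columns are dropped; passing `when_any=True` also drops columns that contain at least one blank cell, and `treat_nans_as_blank=True` widens what counts as blank.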