All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
v0.5.1 - 2023-02-17
- Add boolean dtype to Series.in/2.
- Add binary dtype to Series.in/2.
- Add Series.day_of_week/1.
- Allow Series.fill_missing/2 to:
  - receive :infinity and :neg_infinity values.
  - receive date and datetime values.
  - receive binary values.
- Add support for time dtype.
- Add version of Series.pow/2 that accepts series on both sides.
- Allow Series.from_list/2 to receive :nan, :infinity and :neg_infinity atoms.
- Add Series.to_date/1 and Series.to_time/1 for datetime series.
- Allow casting of string series to category.
- Accept tensors when creating a new dataframe.
- Add compatibility with Nx v0.5.
- Add support for Nx's serialize and deserialize.
- Add the following function implementations for the Polars' Lazy dataframe backend:
  - arrange_with
  - concat_columns
  - concat_rows
  - distinct
  - drop_nil
  - filter_with
  - join
  - mutate_with
  - pivot_longer
  - rename
  - summarise_with
  - to_parquet
  Only summarise_with supports groups for this version.
- Require version of Rustler to be ~> 0.27.0, which mirrors the NIF requirement.
- Casting to an unknown dtype returns a better error message.
v0.5.0 - 2023-01-12
- Add DataFrame.describe/2 to gather some statistics from a dataframe.
- Add Series.nil_count/1 to count nil values.
- Add Series.in/2 to check if a given value is inside a series.
- Add Series float predicates: is_finite/1, is_infinite/1 and is_nan/1.
- Add Series string functions: contains/2, trim/1, trim_leading/1, trim_trailing/1, upcase/1 and downcase/1.
- Enable slicing of lazy frames (LazyFrame).
- Add IO operations "from/load" to the lazy frame implementation.
- Add support for the :lazy option in the DataFrame.new/2 function.
- Add Series float rounding methods: round/2, floor/1 and ceil/1.
- Add support for precompiling to Linux running on RISCV CPUs.
- Add support for precompiling to Linux - with musl - running on AARCH64 computers.
- Allow DataFrame.new/1 to receive the :dtypes option.
- Accept :nan as an option for Series.fill_missing/2 with float series.
- Add basic support for the categorical dtype - the :category dtype.
- Add Series.categories/1 to return categories from a categorical series.
- Add Series.categorise/2 to categorise a series of integers using predefined categories.
- Add Series.replace/2 to replace the contents of a series.
- Support selecting columns with unusual names (like with spaces) inside Explorer.Query with col/1.
  The usage is like this:
Explorer.DataFrame.filter(df, col("my col") > 42)
- Fix DataFrame.mutate/2 using a boolean scalar value.
- Stop leaking UInt32 series to Elixir.
- Cast numeric columns to our supported dtypes after IO read. This fix is only applied for the eager implementation for now.
- Rename Series.bintype/1 to Series.iotype/1.
v0.4.0 - 2022-11-29
- Add Series.quotient/2 and Series.remainder/2 to work with integer division.
- Add Series.iotype/1 to return the underlying representation type.
- Allow series on both sides of binary operations, like: add(series, 1) and add(1, series).
- Allow comparison, concat and coalesce operations on "(series, lazy series)".
- Add lazy version of Series.sample/3 and Series.size/1.
- Add support for Arrow IPC Stream files.
- Add Explorer.Query and the macros that allow a simplified query API. This is a huge improvement to some of the main functions, and allows referring to columns as if they were variables.
  Before this change we would need to write a filter like this:
  Explorer.DataFrame.filter_with(df, &Explorer.Series.greater(&1["col1"], 42))
  But now it's also possible to write this operation like this:
  Explorer.DataFrame.filter(df, col1 > 42)
  This operation is going to use filter_with/2 underneath, which means that it is going to use lazy series and compute the results at once. Notice that it is mandatory to "require" the DataFrame module, since these operations are implemented as macros.
  The following new macros were added:
  - filter/2
  - mutate/2
  - summarise/2
  - arrange/2
  They substitute older versions that did not accept the new query syntax.
- Add DataFrame.put/3 to enable adding or replacing columns in an eager manner. This works similarly to the previous version of mutate/2.
- Add Series.select/3 operation that enables selecting a value from two series based on a predicate.
- Add "dump" and "load" functions to IO operations. They are useful to load or dump dataframes from/to memory.
- Add Series.to_iovec/2 and Series.to_binary/1. They return the underlying representation of series as binary. The first one returns a list of binaries, possibly with one element if the series is contiguous in memory. The second one returns a single binary representing the series.
- Add Series.shift/2 that shifts the series by an offset with nil values.
- Rename Series.fetch!/2 and Series.take_every/2 to Series.at/2 and Series.at_every/2.
- Add DataFrame.discard/2 to drop columns. This is the opposite of select/2.
- Implement Nx.LazyContainer for Explorer.DataFrame and Explorer.Series so data can be passed into Nx.
- Add Series.not/1 that negates values in a boolean series.
- Add the :binary dtype for Series. This enables the usage of arbitrary binaries.
- Change DataFrame's to_* functions to return only :ok.
- Change series inspect to resemble the dataframe inspect with the backend name.
- Rename Series.var/1 to Series.variance/1.
- Rename Series.std/1 to Series.standard_deviation/1.
- Rename Series.count/2 to Series.frequencies/1 and add a new Series.count/1 that returns the size of an "eager" series, or the count of members in a group for a lazy series. In case there are no groups, it calculates the size of the dataframe.
- Change the option to control direction in Series.sort/2 and Series.argsort/2. Instead of a boolean, now we have a new option called :direction that accepts :asc or :desc.
- Fix the following DataFrame functions to work with groups:
  - filter_with/2
  - head/2
  - tail/2
  - slice/2
  - slice/3
  - pivot_longer/3
  - pivot_wider/4
  - concat_rows/1
  - concat_columns/1
- Improve the documentation of functions that behave differently with groups.
- Fix arrange_with/2 to use a stable "group by", making results more predictable.
- Add nil as a possible return value of aggregations.
- Fix the behaviour of Series.sort/2 and Series.argsort/2 to add nils at the front when direction is descending, or at the back when the direction is ascending. This also adds an option to control this behaviour.
- Remove support for NDJSON read and write for ARM 32 bits targets. This is due to a limitation of a dependency of Polars.
v0.3.1 - 2022-09-09
- Define multiply inside *_with operations.
- Fix column types in several operations, such as n_distinct.
v0.3.0 - 2022-09-01
- Add DataFrame.concat_columns/1 and DataFrame.concat_columns/2 for horizontally stacking dataframes.
- Add compression as an option to write parquet files.
- Add count metadata to DataFrame table reader.
- Add DataFrame.filter_with/2, DataFrame.summarise_with/2, DataFrame.mutate_with/2 and DataFrame.arrange_with/2. They all accept a DataFrame and a function, and they all work with a new concept called "lazy series".
  Lazy Series is an opaque representation of a series that can be used to perform complex operations without pulling data from the series. This is faster than using masks. There is no big difference from the API perspective compared to the functions that were accepting callbacks before (e.g. filter/2 and the new filter_with/2), with the exception being DataFrame.summarise_with/2 that now accepts a lot more operations.
- Bump version requirement of the table dependency to ~> 0.1.2, and raise for non-tabular values.
- Normalize how columns are handled. This changes some functions to accept one column or a list of columns, ranges, indexes and callbacks selecting columns.
- Rename DataFrame.filter/2 to DataFrame.mask/2.
- Rename Series.filter/2 to Series.mask/2.
- Rename take/2 from both Series and DataFrame to slice/2. slice/2 now accepts ranges as well.
- Raise an error if DataFrame.pivot_wider/4 has float columns as IDs. This is because we can't properly compare floats.
- Change DataFrame.distinct/2 to accept columns as an argument instead of receiving them as an option.
- Ensure that we can compare boolean series in functions like Series.equal/2.
- Fix rename of columns after summarise.
- Fix inspect of float series containing NaN or Infinity values. They are represented as atoms.
- Deprecate DataFrame.filter/2 with a callback in favor of DataFrame.filter_with/2.
v0.2.0 - 2022-06-22
- Consistently support ranges throughout the columns API
- Support negative indexes throughout the columns API
- Integrate with the table package
- Add Series.to_enum/1 for lazily traversing the series
- Add Series.coalesce/1 and Series.coalesce/2 for finding the first non-null value in a list of series
- Series.length/1 is now Series.size/1 in keeping with Elixir idioms
- Nx is now an optional dependency
- Minimum Elixir version is now 1.13
- DataFrame.to_map/2 is now DataFrame.to_columns/2 and DataFrame.to_series/2
- Rustler is now an optional dependency
- read_ and write_ IO functions are now from_ and to_
- to_binary is now dump_csv
- Now uses polars's "simd" feature
- Now uses polars's "performant" feature
- Explorer.default_backend/0 is now Explorer.Backend.get/0
- Explorer.default_backend/1 is now Explorer.Backend.put/1
- Series.cum_* functions are now Series.cumulative_* to mirror Nx
- Series.rolling_* functions are now Series.window_* to mirror Nx
- reverse? is now an option instead of an argument in Series.cumulative_* functions
- DataFrame.from_columns/2 and DataFrame.from_rows/2 is now DataFrame.new/2
- Rename "col" to "column" throughout the API
- Remove "with_" prefix in options throughout the API
- DataFrame.table/2 accepts options with :limit instead of single integer
- rename/2 no longer accepts a function, use rename_with/2 instead
- rename_with/3 now expects the function as the last argument
- Explorer now works on Linux with musl
v0.1.1 - 2022-04-27
v0.1.0 - 2022-04-26
First release.