- add option `matchmissing=:notequal` in joins; in `leftjoin`, `semijoin`, and `antijoin` missings are dropped in the right data frame, but preserved in the left; in `rightjoin` missings are dropped in the left data frame, but preserved in the right; in `innerjoin` missings are dropped in both data frames; in `outerjoin` this value of the keyword argument is not supported (#2724)
- correctly handle selectors of the form `:col => AsTable` and `:col => cols` by expanding a single column into multiple columns (#2780)
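
The two entries above can be illustrated with a minimal sketch. It assumes DataFrames.jl 1.1 or later; the data frames and column names (`left`, `right`, `wide`, `first`, `second`) are made up for illustration.

```julia
using DataFrames

# Hypothetical inputs: both tables contain a missing key.
left  = DataFrame(id = [1, 2, missing], a = ["x", "y", "z"])
right = DataFrame(id = [1, missing, 3], b = [10, 20, 30])

# leftjoin keeps the left row with the missing key, but never matches it
# against the missing key on the right.
lj = leftjoin(left, right, on = :id, matchmissing = :notequal)

# innerjoin drops rows with missing keys from both sides.
ij = innerjoin(left, right, on = :id, matchmissing = :notequal)

# A single column of NamedTuples expanded into several columns.
wide = DataFrame(p = [(a = 1, b = 2), (a = 3, b = 4)])
expanded = select(wide, :p => AsTable)            # columns named a and b
renamed  = select(wide, :p => [:first, :second])  # same data, explicit names
```

With `matchmissing=:notequal` a `missing` key never matches anything, so the only difference between join types is which side keeps its unmatched rows.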
- fix bug in how `issorted` handles custom orderings and improve performance of sorting when complex custom orderings are passed (#2746)
- fix bug in `combine`, `select`, `select!`, `transform`, and `transform!` that incorrectly disallowed matrices of `Pair`s in `GroupedDataFrame` processing (#2782)
- `SubDataFrame`, `filter!`, `unique!`, `getindex`, `delete!`, `leftjoin`, `rightjoin`, and `outerjoin` are now more efficient if rows selected in internal operations form a continuous block (#2727, #2769)
- `text/plain` rendering of columns containing complex numbers is now improved (#2756)
- in `text/html` display of a data frame show full type information when hovering over the shortened type with a mouse (#2774)
- fix performance issue when aggregation function produces multiple rows in split-apply-combine (#2749)
- `completecases` is now optimized and only processes columns that can contain missing values; additionally it is now type stable and always returns a `BitVector` (#2726)
- fix performance bottleneck when displaying wide tables (#2750)
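
As a small illustration of the `completecases` entry, the sketch below builds a made-up data frame and uses the returned mask to keep only complete rows. It assumes a DataFrames.jl version that includes the change (1.0.2 or later).

```julia
using DataFrames

# Hypothetical data: column b cannot contain missing values at all.
df = DataFrame(a = [1, missing, 3],
               b = ["x", "y", "z"],
               c = [1.0, 2.0, missing])

mask = completecases(df)       # true where no column is missing in the row
complete = df[mask, :]         # keep only the complete rows

mask_a = completecases(df, :a) # restrict the check to column a
```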
- make sure `subset` checks if the passed condition function returns a vector of values (in the 1.0 release returning a scalar `true`, `false`, or `missing` was also allowed, which was unintended and error prone) (#2744)
- fix of performance issue of `groupby` when using multi-threading (#2736)
- fix of performance issue of `groupby` when using `PooledVector` (#2733)
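
The `subset` check can be sketched as follows, assuming a DataFrames.jl version that includes this fix. The data frame is made up, and the `try`/`catch` only records that some exception is raised for a scalar-returning condition without relying on its exact type.

```julia
using DataFrames

df = DataFrame(x = [1, 2, 3, 4])

# Accepted: the condition returns one Boolean per row.
kept_vec = subset(df, :x => x -> x .> 2)

# Accepted: ByRow turns a scalar predicate into a row-wise one.
kept_row = subset(df, :x => ByRow(>(2)))

# Rejected: the condition returns a single scalar for the whole column.
scalar_rejected = try
    subset(df, :x => x -> true)
    false
catch
    true
end
```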
- No breaking changes are planned for v1.0 release
- DataFrames.jl now checks that passed columns are 1-based as this is a current design assumption (#2594)
- `mapcols!` makes sure not to create columns being `AbstractRange`, consistently with other methods that add columns to a `DataFrame` (#2594)
- `transform` and `transform!` always copy columns when a column renaming transformation is passed. If similar issues are identified after the 1.0 release (i.e. that a copy of data is not made in scenarios where it normally should be made), these will be considered bugs and fixed as non-breaking changes (#2721)
- `firstindex`, `lastindex`, `size`, `ndims`, and `axes` are now consistently defined and documented in the manual for `AbstractDataFrame`, `DataFrameRow`, `DataFrameRows`, `DataFrameColumns`, `GroupedDataFrame`, `GroupKeys`, and `GroupKey` (#2573)
- add `subset` and `subset!` functions that allow subsetting rows (#2496)
- `names` now allows passing a predicate as a column selector (#2417)
- `vcat` now allows a `source` keyword argument that specifies the additional column to be added in the last position in the resulting data frame that will identify the source data frame (#2649)
- `GroupKey` and `DataFrameRow` are consistently behaving like `NamedTuple` in comparisons and they now implement: `hash`, `==`, `isequal`, `<`, `isless` (#2669)
- since Julia 1.7 using broadcasting assignment on a `DataFrame` column selected as a property (e.g. `df.col .= 1`) is allowed when the column does not exist and it allocates a fresh column (#2655)
- `delete!` now correctly handles the case when columns of a data frame are aliased (#2690)
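
A short sketch of several entries from the list above (`vcat` with `source`, `names` with a predicate, `subset`, and row comparison). It assumes DataFrames.jl 1.0 or later; the data frames and the column name `src` are hypothetical.

```julia
using DataFrames

df1 = DataFrame(x1 = [1, 2], y = ["a", "b"])
df2 = DataFrame(x1 = [3], y = ["c"])

# `source` adds a last column identifying which input each row came from.
stacked = vcat(df1, df2, source = :src)

# A predicate on column names (passed as strings) works as a column selector.
x_names = names(stacked, startswith("x"))

# `subset` keeps the rows for which the condition holds.
big = subset(stacked, :x1 => ByRow(>=(2)))

# Rows compare like NamedTuples.
first_is_smaller = isless(df1[1, :], df1[2, :])
```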
- in `leftjoin`, `rightjoin`, and `outerjoin` the `indicator` keyword argument is deprecated in favor of the `source` keyword argument; `indicator` will be removed in the 2.0 release (#2649)
- Using broadcasting assignment on a column of a `SubDataFrame` selected as a property (e.g. `sdf.col .= 1`) is deprecated; it will be disallowed in the future. (#2655)
- Broadcasting assignment to an existing column of a `DataFrame` selected as a property (e.g. `df.col .= 1`) being an in-place operation is deprecated. It will allocate a fresh column in the future (#2655)
- all deprecations present in 0.22 release now throw an error (#2554); in particular `convert` methods, `map` on `GroupedDataFrame` that were deprecated in 0.22.6 release now throw an error (#2679)
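
The sketch below shows the `source` keyword in a join and, as one way to sidestep the deprecated property-style broadcasting, the explicit `df[:, col]` and `df[!, col]` forms. It assumes DataFrames.jl 1.0 or later; the column name `origin` and the data are made up.

```julia
using DataFrames

a = DataFrame(id = [1, 2], l = ["a", "b"])
b = DataFrame(id = [2, 3], r = ["c", "d"])

# `source` replaces the deprecated `indicator` keyword argument.
joined = outerjoin(a, b, on = :id, source = :origin)

df = DataFrame(x = [1, 2, 3])
old = df.x

df[:, :x] .= 0            # in-place: writes into the existing vector
same_vector = df.x === old

df[!, :x] .= 1            # replaces the column with a freshly allocated vector
fresh_vector = df.x !== old
```

Spelling out `:` or `!` makes the intent explicit, so the code does not depend on how `df.col .= 1` behaves in a given release.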
- `innerjoin`, `leftjoin`, `rightjoin`, `outerjoin`, `semijoin`, and `antijoin` are now much faster and check if passed data frames are sorted by the `on` columns and take into account if shorter data frame that is joined has unique values in `on` columns. These aspects of input data frames might affect the order of rows produced in the output (#2612, #2622)
- `DataFrame` constructor, `copy`, `getindex`, `select`, `select!`, `transform`, `transform!`, `combine`, `sort`, and join functions now use multiple threads in selected operations (#2647, #2588, #2574, #2664)
- `convert` methods from `AbstractDataFrame`, `DataFrameRow` and `GroupKey` to `Array`, `Matrix`, `Vector` and `Tuple`, as well as from `AbstractDict` to `DataFrame`, are now deprecated: use corresponding constructors instead. The only conversions that are retained are `convert(::Type{NamedTuple}, dfr::DataFrameRow)`, `convert(::Type{NamedTuple}, key::GroupKey)`, and `convert(::Type{DataFrame}, sdf::SubDataFrame)`; the deprecated methods will be removed in 1.0 release
- as a bug fix `eltype` of vector returned by `eachrow` is now `DataFrameRow` (#2662)
- applying `map` to `GroupedDataFrame` is now deprecated. It will be an error in 1.0 release. (#2662)
- `copycols` keyword argument is now respected when building a `DataFrame` from `Tables.CopiedColumns` (#2656)
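
The constructor-based replacements for the deprecated `convert` methods might look like the sketch below. The data frame is hypothetical; the constructors used (`Matrix`, `Vector`, `Tuple`, `NamedTuple`, `DataFrame`) are the ones the entry above points to.

```julia
using DataFrames

df = DataFrame(a = [1, 2], b = [3, 4])
row = df[1, :]

m  = Matrix(df)        # instead of convert(Matrix, df)
v  = Vector(row)       # instead of convert(Vector, row)
t  = Tuple(row)        # instead of convert(Tuple, row)
nt = NamedTuple(row)   # NamedTuple conversion is retained either way

# instead of convert(DataFrame, dict)
from_dict = DataFrame(Dict(:a => [1, 2], :b => [3, 4]))

row_type = eltype(eachrow(df))
```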
- the rules for transformations passed to `select`/`select!`, `transform`/`transform!`, and `combine` have been made more flexible; in particular now it is allowed to return multiple columns from a transformation function (#2461 and #2481)
- CategoricalArrays.jl is no longer reexported: call `using CategoricalArrays` to use it (#2404). In the same vein, the `categorical` and `categorical!` functions have been deprecated in favor of `transform(df, cols .=> categorical .=> cols)` and similar syntaxes (#2394).
- `stack` now creates a `PooledVector{String}` variable column rather than a `CategoricalVector{String}` column by default; pass `variable_eltype=CategoricalValue{String}` to get the previous behavior (#2391)
- `isless` for `DataFrameRow`s now checks column names (#2292)
- `DataFrameColumns` is now not a subtype of `AbstractVector` (#2291)
- `nunique` is not reported now by `describe` by default (#2339)
- stop reordering columns of the parent in `transform` and `transform!`; always generate columns that were specified to be computed even for `GroupedDataFrame` with zero rows (#2324)
- improve the rule for automatically generated column names in `combine`/`select(!)`/`transform(!)` with composed functions (#2274)
- `:nmissing` in `describe` now produces `0` if the column does not allow missing values; earlier `nothing` was produced in this case (#2360)
- fast aggregation functions for `GroupedDataFrame` now correctly choose the fast path only when it is safe; this resolves inconsistencies with what the same functions not using fast path produce (#2357)
- joins now return `PooledVector` not `CategoricalVector` in indicator column (#2505)
- `GroupKeys` now supports `in` for `GroupKey`, `Tuple`, `NamedTuple` and dictionaries (#2392)
- in `describe` the specification of custom aggregation is now `function => name`; old `name => function` order is now deprecated (#2401)
- in joins passing `NaN` or real or imaginary `-0.0` in `on` column now throws an error; passing `missing` throws an error unless `matchmissing=:equal` keyword argument is passed (#2504)
- `unstack` now produces row and column keys in the order of their first appearance and has two new keyword arguments `allowmissing` and `allowduplicates` (#2494)
- PrettyTables.jl is now the default back-end to print DataFrames to `text/plain`; the print option `splitcols` was removed and the output format was changed (#2429)
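
Two of the changes above can be sketched briefly: a transformation function returning multiple columns, and the `function => name` order for custom aggregations in `describe`. The data and the names `lo`, `hi`, and `total` are made up for illustration.

```julia
using DataFrames

df = DataFrame(g = ["a", "a", "b"], x = [1, 5, 3])

# The function returns a NamedTuple; AsTable spreads it into columns lo and hi.
ranges = combine(groupby(df, :g),
                 :x => (x -> (lo = minimum(x), hi = maximum(x))) => AsTable)

# Custom aggregation in describe is given as function => name.
stats = describe(df[:, [:x]], :mean, sum => :total)
```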
- add `filter` to `GroupedDataFrame` (#2279)
- add `empty` and `empty!` functions for `DataFrame` that remove all rows from it, but keep columns (#2262)
- make `indicator` keyword argument in joins allow passing a string (#2284, #2296)
- add new functions to `GroupKey` API to make it more consistent with `DataFrameRow` (#2308)
- allow column renaming in joins (#2313 and #2398)
- add `rownumber` to `DataFrameRow` (#2356)
- allow passing column name to specify the position where a new column should be inserted in `insertcols!` (#2365)
- allow `GroupedDataFrame`s to be indexed using a dictionary, which can use `Symbol` or string keys and is not dependent on the order of keys. (#2281)
- add `isapprox` method to check for approximate equality between two dataframes (#2373)
- add `columnindex` for `DataFrameRow` (#2380)
- `names` now accepts `Type` as a column selector (#2400)
- `select`, `select!`, `transform`, `transform!` and `combine` now allow `renamecols` keyword argument that makes it possible to avoid adding transformation function name as a suffix in automatically generated column names (#2397)
- `filter`, `sort`, `dropmissing`, and `unique` now support a `view` keyword argument which if set to `true` makes them return a `SubDataFrame` view into the passed data frame.
- add `only` method for `AbstractDataFrame` (#2449)
- passing empty sets of columns in `filter`/`filter!` and in `select`/`transform`/`combine` with `ByRow` is now accepted (#2476)
- add `permutedims` method for `AbstractDataFrame` (#2447)
- add support for `Cols` from DataAPI.jl (#2495)
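
A few of the additions listed above, sketched on a made-up data frame. The variable names are hypothetical, and `insertcols!` is applied to a copy so the grouped data frame built from `df` stays valid.

```julia
using DataFrames

df = DataFrame(g = ["a", "b", "a"], x = [1, 2, 3], y = [1.5, 2.5, 3.5])

int_cols = names(df, Int)                           # select columns by element type

total = combine(df, :x => sum, renamecols = false)  # result column stays named x

v = filter(:x => >(1), df, view = true)             # a SubDataFrame, not a copy

gdf = groupby(df, :g)
group_a = gdf[Dict(:g => "a")]                      # index a group with a dictionary

# Insert the new column z at the position currently held by y.
wide = insertcols!(copy(df), :y, :z => [10, 20, 30])

second = rownumber(df[2, :])                        # row number in the parent
```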
- `DataFrame!` is now deprecated (#2338)
- several non-standard `DataFrame` constructors are now deprecated (#2464)
- all old deprecations now throw an error (#2350)
- Tables.jl version 1.2 is now required.
- DataAPI.jl version 1.4 is now required. It implies that `All(args...)` is deprecated and `Cols(args...)` is recommended instead. `All()` is still supported.
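
A minimal sketch of the recommended selector, using a hypothetical data frame: `Cols` combines several selectors, while the argument-less `All()` still selects every column.

```julia
using DataFrames

df = DataFrame(a = 1:2, x1 = 3:4, b = 5:6, x2 = 7:8)

# Cols takes the union of its selectors, in the order they are given.
picked = select(df, Cols(:a, r"^x"))

# All() with no arguments remains supported and selects all columns.
everything = select(df, All())
```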
- Documentation is now available also in Dark mode (#2315)
- add rich display support for Markdown cell entries in HTML and LaTeX (#2346)
- limit the maximal display width the output can use in `text/plain` before being truncated (in the `textwidth` sense, excluding `…`) to `32` per column by default and fix a corner case when no columns are printed in situations when they are too wide (#2403)
- Common methods are now precompiled to improve responsiveness the first time a method is called in a Julia session. Precompilation takes up to 30 seconds after installing the package (#2456).