Change deprecation warning for CSV.read #687

quinnj · 2020-07-10T16:35:08Z

Fixes JuliaData/DataFrames.jl#2309. After much
discussion, people really like the CSV.read function call naming, so
it was decided to keep it around, while still allowing a break from
DataFrames.jl as a dependency.

I also realized that CSV.read does indeed provide one bit of
value/functionality: we can wrap CSV.File in Tables.CopiedColumns.
What this means is that columnar sinks can safely use the columns passed
without needing to make copies; i.e. they can assume ownership of the
columns. In CSV.read, the user is essentially saying, "I want to make
a CSV.File and pass it directly to sink" which also implies that
CSV.File doesn't need to "own" its own columns.
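
To make the ownership point concrete, here is a minimal sketch that uses only Tables.jl. The sink `take_columns` is hypothetical (it is not part of CSV.jl or Tables.jl), and a plain NamedTuple of vectors stands in for a `CSV.File`. It shows how a columnar sink could skip its defensive copy when the input arrives wrapped in `Tables.CopiedColumns`:

```julia
using Tables

# Hypothetical columnar sink: it copies each column unless the source is
# wrapped in Tables.CopiedColumns, in which case it takes ownership as-is.
function take_columns(table)
    owns = table isa Tables.CopiedColumns
    cols = Tables.columns(table)
    return Dict(name => (owns ? Tables.getcolumn(cols, name) :
                                copy(Tables.getcolumn(cols, name)))
                for name in Tables.columnnames(cols))
end

# Stand-in for a parsed file: a NamedTuple of vectors is a valid Tables.jl table.
src = (a = [1, 2, 3], b = ["x", "y", "z"])

shared = take_columns(Tables.CopiedColumns(src))  # reuses the vectors in src
copied = take_columns(src)                        # makes fresh copies
```

Under this PR, `CSV.read(file, sink)` would presumably do the wrapping on the user's behalf before calling `sink`; the exact internals are not shown in this thread, so treat that as an assumption.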

The only question left in my mind is, with the 1.0 release, what to do
with CSV.read(file) when no sink argument is passed. Suggestions
have included just returning a CSV.File, since it's a valid table
anyway. Or allowing DataFrames.jl to define the no-sink-arg version and
return a DataFrame; that one's a bit awkward as a form of "blessed"
type piracy, but could also be useful for users as convenience (they
just have to remember to do using DataFrames before trying to use it).
The other awkward part is that we're currently warning users of the
deprecation and telling them to explicitly spell out DataFrame,
whereas we might not actually require that.


src/CSV.jl

bkamins · 2020-07-10T16:45:30Z

> what to do with CSV.read(file) when no sink argument is passed

I would disallow it

> one's a bit awkward as a form of "blessed" type piracy, but could also be useful for users as convenience

I think it is better to require specifying sink.

piever · 2020-07-10T17:19:04Z

> The only question left in my mind is, with the 1.0 release, what to do with CSV.read(file) when no sink argument is passed.

Apart from the possibility of returning CSV.File, from the point of view of the casual user who has never heard of DataFrames and just wants to read a .csv file into something they can understand, I imagine it could be useful to default to a simple sink made of simple Base Julia types, like Tables.columntable.

Unfortunately, this could be problematic for wide data (hopefully future Julia versions will be able to deal with big named tuples). Maybe it's safer to disallow it for now and only decide on this later (it can be a post-1.0 decision if it errors on 1.0).
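
For readers unfamiliar with `Tables.columntable`, here is a small sketch of what such a sink returns. It uses only Tables.jl, and the sample rows are made up for illustration:

```julia
using Tables

# A vector of NamedTuples is a valid row-oriented Tables.jl source.
rows = [(id = 1, name = "a"), (id = 2, name = "b")]

# columntable materializes any table as a NamedTuple of column vectors,
# i.e. plain Base Julia types with one field per column.
ct = Tables.columntable(rows)
```

With the two-argument form discussed in this PR, the equivalent call on a file would presumably be `CSV.read(file, Tables.columntable)`. Because every column becomes a field of one NamedTuple type, very wide files produce very large named tuple types, which is the concern raised above.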

bkamins · 2020-07-10T17:25:16Z

> it can be a post 1.0 decision if on 1.0 it errors

This is one of the reasons I prefer throwing an error for now (@nalimilan ™️)

codecov · 2020-07-10T17:40:46Z

Codecov Report

Merging #687 into master will decrease coverage by 0.04%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master     #687      +/-   ##
==========================================
- Coverage   84.28%   84.23%   -0.05%     
==========================================
  Files          10       10              
  Lines        1801     1802       +1     
==========================================
  Hits         1518     1518              
- Misses        283      284       +1

Impacted Files	Coverage Δ
src/CSV.jl	`28.57% <0.00%> (-4.77%)`	⬇️


JeffBezanson · 2020-07-17T02:47:45Z

👍 Very in favor of this. I also think it would be fine to make DataFrame the default sink. The possibility of different types of tables is not SO important that every user must be forced to think about it from the first time they read a csv file in Julia.

JeffBezanson · 2020-07-17T03:05:05Z

Ah, I understand you want to remove the DataFrames dependency. I think JuliaData/DataFrames.jl#1764 is the way to go --- if that happens then maybe we can add back a default sink.

quinnj mentioned this pull request Jul 10, 2020

Do we need CSV.read? JuliaData/DataFrames.jl#2309

Closed

bkamins reviewed Jul 10, 2020

bkamins mentioned this pull request Jul 17, 2020

remove DataFrame!? JuliaData/DataFrames.jl#2317

Closed

quinnj merged commit c5ebf92 into master Jul 28, 2020

quinnj deleted the jq/csvread branch July 28, 2020 04:16

omus mentioned this pull request Jul 31, 2020

Fix DataFrame(CSV.File(...)) with header-only #703

Closed
