-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change deprecation warning for CSV.read #687
Conversation
Fixes JuliaData/DataFrames.jl#2309. After much discussion, people really like the `CSV.read` function call naming, so it was decided to keep it around, while still allowing a break from DataFrames.jl as a dependency. I also realized that `CSV.read` does indeed provide one bit of value/functionality: we can wrap `CSV.File` in `Tables.CopiedColumns`. What this means is that columnar sinks can safely use the columns passed without needing to make copies; i.e. they can assume ownership of the columns. In `CSV.read`, the user is essentially saying, "I want to make a `CSV.File` and pass it directly to `sink`" which also implies that `CSV.File` doesn't need to "own" its own columns. The only question left in my mind is, with the 1.0 release, what to do with `CSV.read(file)` when no `sink` argument is passed. Suggestions have included just returning a `CSV.File`, since it's a valid table anyway. Or allowing DataFrames.jl to define the no-sink-arg version and return a `DataFrame`; that one's a bit awkward as a form of "blessed" type piracy, but could also be useful for users as convenience (they just have to remember to do `using DataFrames` before trying to use it). The other awkward part is that we're currently warning users of the deprecation and that they should explicitly spell out `DataFrame` whereas we might not actually require that.
I would disallow it
I think it is better to require specifying sink. |
A part from the possibility of returning Unfortunately, this could be problematic for wide data (hopefully future julia versions will be able to deal with big named tuples). Maybe it's safer to disallow for now, and only decide on this later (it can be a post 1.0 decision if on 1.0 it errors). |
This is one of the reasons I prefer throwing an error for now (@nalimilan ™️) |
Codecov Report
@@ Coverage Diff @@
## master #687 +/- ##
==========================================
- Coverage 84.28% 84.23% -0.05%
==========================================
Files 10 10
Lines 1801 1802 +1
==========================================
Hits 1518 1518
- Misses 283 284 +1
Continue to review full report at Codecov.
|
👍 Very in favor of this. I also think it would be fine to make |
Ah, I understand you want to remove the DataFrames dependency. I think JuliaData/DataFrames.jl#1764 is the way to go --- if that happens then maybe we can add back a default sink. |
Fixes JuliaData/DataFrames.jl#2309. After much
discussion, people really like the
CSV.read
function call naming, soit was decided to keep it around, while still allowing a break from
DataFrames.jl as a dependency.
I also realized that
CSV.read
does indeed provide one bit ofvalue/functionality: we can wrap
CSV.File
inTables.CopiedColumns
.What this means is that columnar sinks can safely use the columns passed
without needing to make copies; i.e. they can assume ownership of the
columns. In
CSV.read
, the user is essentially saying, "I want to makea
CSV.File
and pass it directly tosink
" which also implies thatCSV.File
doesn't need to "own" its own columns.The only question left in my mind is, with the 1.0 release, what to do
with
CSV.read(file)
when nosink
argument is passed. Suggestionshave included just returning a
CSV.File
, since it's a valid tableanyway. Or allowing DataFrames.jl to define the no-sink-arg version and
return a
DataFrame
; that one's a bit awkward as a form of "blessed"type piracy, but could also be useful for users as convenience (they
just have to remember to do
using DataFrames
before trying to use it).The other awkward part is that we're currently warning users of the
deprecation and that they should explicitly spell out
DataFrame
whereas we might not actually require that.