Closed
Description
The following code:
```julia
using CSV
using DataFrames
using Random

NCOLS = 30
NROWS = 150
Random.seed!(1)

# Write a CSV file: a header row followed by rows of random Int16 values
fname = tempname()
f = open(fname, "w")
write(f, join("col" .* string.(1:NCOLS), ","))
write(f, "\r\n")
for i in 0:NROWS
    write(f, join(string.(rand(Int16, NCOLS)), ","))
    write(f, "\r\n")
end
close(f)

# Read the same file with the default settings and with a single task
df_by_threads = CSV.read(fname, DataFrame)
df_single_threaded = CSV.read(fname, DataFrame; ntasks=1)

print(eltype.(eachcol(df_by_threads)) == eltype.(eachcol(df_single_threaded)))
```
will print false, or at least it does on my Windows box with 8 threads, running CSV v0.10.7.
If I reduce NCOLS or NROWS, it prints true. With a different random seed it may also print true.
Inspecting the columns of df_by_threads shows at least one column of type String7. Which column it is varies between runs, and sometimes there are two such columns, even though the data written to the file is fixed.
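To make that inspection easier, here is a minimal sketch of a helper that lists the columns whose element types differ between the two results. The name `mismatched_columns` and the `left`/`right` field names are hypothetical and not part of CSV.jl or DataFrames.jl. It assumes both data frames have the same column names.

```julia
using DataFrames

# Hypothetical helper (not part of any package): return one entry per column
# whose element type differs between the two data frames.
# Assumes every column name of `left` also exists in `right`.
function mismatched_columns(left::AbstractDataFrame, right::AbstractDataFrame)
    return [
        (name = n, left = eltype(left[!, n]), right = eltype(right[!, n]))
        for n in names(left)
        if eltype(left[!, n]) != eltype(right[!, n])
    ]
end
```

Calling `mismatched_columns(df_by_threads, df_single_threaded)` after running the reproducer should list the affected column names together with the element type each read produced.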