[Bug] CSV.read randomly changes eltype of column  #1089

@hungpham3112

Description

Steps to reproduce:

  • Copy the code below into a Jupyter notebook or Pluto to see the result
using Plots, DataFrames, DataFramesMeta, CSV, HTTP, Statistics
filename = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-SkillsNetwork/labs/Data%20files/auto.csv"
headers = ["symboling","normalized-losses","make","fuel-type","aspiration", "num-of-doors","body-style",
         "drive-wheels","engine-location","wheel-base", "length","width","height","curb-weight","engine-type",
         "num-of-cylinders", "engine-size","fuel-system","bore","stroke","compression-ratio","horsepower",
         "peak-rpm","city-mpg","highway-mpg","price"]
df = CSV.read(HTTP.get(filename).body, DataFrame, header=headers)
eltype(df[!, 1]), eltype(df[!, 2])
  • Run the line df = CSV.read(HTTP.get(filename).body, DataFrame, header=headers) multiple times; the element type of the column sometimes changes between runs.
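
The re-run step above can also be scripted, which makes the variation easier to count than re-executing a cell by hand. The following is only a minimal sketch, not part of the original report: it downloads the file once so that every parse sees identical bytes (ruling out the network as the source of variation), then parses the same buffer repeatedly and tallies the inferred element types of the first two columns. The names `bytes`, `ntrials`, and `counts` are introduced here for illustration.

```julia
using CSV, DataFrames, HTTP

filename = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DA0101EN-SkillsNetwork/labs/Data%20files/auto.csv"
headers = ["symboling","normalized-losses","make","fuel-type","aspiration", "num-of-doors","body-style",
         "drive-wheels","engine-location","wheel-base", "length","width","height","curb-weight","engine-type",
         "num-of-cylinders", "engine-size","fuel-system","bore","stroke","compression-ratio","horsepower",
         "peak-rpm","city-mpg","highway-mpg","price"]

# Download once; every parse below reads exactly the same bytes.
bytes = HTTP.get(filename).body

# Illustrative trial count (an assumption, not from the report).
ntrials = 20

# Tally how often each (eltype of column 1, eltype of column 2) pair is inferred.
counts = Dict{Any,Int}()
for _ in 1:ntrials
    df = CSV.read(bytes, DataFrame; header=headers)
    key = (eltype(df[!, 1]), eltype(df[!, 2]))
    counts[key] = get(counts, key, 0) + 1
end

# More than one entry here means the inferred types differed between parses.
for (key, n) in counts
    println(key, " => ", n, " of ", ntrials, " parses")
end
```

If `counts` ends up with more than one key, the difference comes from parsing alone, since the input buffer never changed.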

I tested the CSV file in Python: the first column always has a fixed data type (Float64), so the CSV file itself is not the problem.
Then I tried the above snippet in both Jupyter notebook and Pluto, and both show the same bug, so the problem is in CSV.read and CSV.File.

Videos:

  • Pluto.jl
bandicam.2023-05-12.08-33-29-925.mp4
  • Jupyter notebook
bandicam.2023-05-12.08-52-40-528.mp4

Versioninfo:

Julia Version 1.9.0
Commit 8e63055292 (2023-05-07 11:25 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i7-11370H @ 3.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, tigerlake)
  Threads: 8 on 8 virtual cores
Environment:
  JULIA_DEPOT_PATH = C:\Users\sofia\.julia;C:\Users\sofia\.julia\juliaup\julia-1.9.0+0.x64.w64.mingw32\local\share\julia;C:\Users\sofia\.julia\juliaup\julia-1.9.0+0.x64.w64.mingw32\share\julia
  JULIA_LOAD_PATH = C:\Users\sofia\AppData\Local\Temp\jl_MjE6XO;@;@v#.#;@stdlib
  JULIA_NUM_THREADS = 8
  JULIA_PROJECT = C:\Users\sofia\JuliaProjects\MachineLearning\LinearRegression\Project.toml
  JULIA_REVISE_WORKER_ONLY = 1
  • CSV: v0.10.10
