Accessing data in a DataFrameColumn is insanely slow. #5966

Open
@DrDryg

System Information (please complete the following information):

  • Windows 10
  • Microsoft.Data.Analysis 0.18.0
  • .NET Framework 4.7.2

Describe the bug
Accessing data in a PrimitiveDataFrameColumn<> is very very very slow.

To Reproduce
int n = 1_000_000;
PrimitiveDataFrameColumn<int> column = new PrimitiveDataFrameColumn<int>("Name", n);

// Set every element through the column indexer.
for (int i = 0; i < n; i++)
{
    column[i] = 1;
}
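
To put a number on the slowdown, a minimal timing sketch along these lines may help. It runs the same fill loop against the column and against a plain array as a baseline. The `int` element type, the `FillBenchmark` class, and the `Report` helper are assumptions made for illustration and are not part of the original report. A `Stopwatch` loop like this gives only a rough figure, so treat the numbers as indicative.

```csharp
using System;
using System.Diagnostics;
using Microsoft.Data.Analysis;

// Hypothetical harness: times the indexer-based fill from the repro
// and the same loop over a plain int[] for comparison.
static class FillBenchmark
{
    static void Main()
    {
        const int n = 1_000_000;

        // Column under test: every write goes through the column indexer.
        // The element type int is an assumption.
        var column = new PrimitiveDataFrameColumn<int>("Name", n);
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < n; i++)
        {
            column[i] = 1;
        }
        sw.Stop();
        Report("PrimitiveDataFrameColumn<int>", n, sw.Elapsed);

        // Baseline: identical loop over a plain array.
        var array = new int[n];
        sw.Restart();
        for (int i = 0; i < n; i++)
        {
            array[i] = 1;
        }
        sw.Stop();
        Report("int[]", n, sw.Elapsed);
    }

    // Prints elapsed time and throughput for one measurement.
    static void Report(string label, int count, TimeSpan elapsed)
    {
        double perSecond = count / elapsed.TotalSeconds;
        Console.WriteLine($"{label}: {elapsed.TotalMilliseconds:F1} ms, {perSecond:N0} values/s");
    }
}
```

Running it in a Release build without a debugger attached should give the most representative comparison between the two loops.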

Expected behavior
Filling in values in a column should cost a few clock cycles per value, so at least 100 million values per second should be achievable on a normal computer. Instead, 1 million elements take around 0.5 s on a new high-performance laptop, which is about 2 million values per second.

Is it simply that nullable objects are this slow? If that is the case, why did you go for such a technology for a data processing library where performance is a key factor?

For perspective, writing the data to disk is 10 times faster!

Labels: Microsoft.Data.Analysis (All DataFrame related issues and PRs), perf (Performance and Benchmarking related)
