[C++] Feather: slow writing of NullArray #22894

@asfimport

From https://stackoverflow.com/questions/57877017/pandas-feather-format-is-slow-when-writing-a-column-of-none

A smaller example using only pyarrow: writing a NullArray takes much longer than writing an array of, for example, int64 type, which seems a bit strange:

In [93]: arr = pa.array([None]*1000, type='int64')

In [94]: %%timeit 
    ...: w = pyarrow.feather.FeatherWriter('__test.feather') 
    ...: w.writer.write_array('x', arr) 
    ...: w.writer.close() 

31.4 µs ± 464 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [95]: arr = pa.array([None]*1000)  

In [96]: arr    
Out[96]: 
<pyarrow.lib.NullArray object at 0x7fa47a23ca40>
1000 nulls

In [97]: %%timeit 
    ...: w = pyarrow.feather.FeatherWriter('__test.feather') 
    ...: w.writer.write_array('x', arr) 
    ...: w.writer.close() 

3.75 ms ± 64.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

So writing a NullArray takes roughly 100x longer than writing an all-null array of the same length with an integer type.
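For anyone who wants to rerun the comparison outside IPython, here is a minimal sketch of a standalone script. It is an assumption-laden variant, not the original reproduction: it uses `pyarrow.feather.write_feather` on a one-column table instead of the `FeatherWriter` internals shown above, so it may exercise a different code path and need not show the same slowdown. The helper names (`per_call_seconds`, `slowdown`, `compare_feather_writes`) are hypothetical and exist only for this illustration. With the numbers reported above, the ratio works out to 3.75 ms / 31.4 µs, or about 119x.

```python
import os
import tempfile
import timeit


def per_call_seconds(fn, number=100, repeat=7):
    """Best observed mean time per call of fn, in seconds."""
    timings = timeit.repeat(fn, number=number, repeat=repeat)
    return min(timings) / number


def slowdown(baseline_seconds, candidate_seconds):
    """How many times slower the candidate is than the baseline."""
    return candidate_seconds / baseline_seconds


def compare_feather_writes(n=1000):
    """Time writing an all-null int64 column vs. a NullArray column."""
    # Imported lazily so the helpers above work without pyarrow installed.
    import pyarrow as pa
    import pyarrow.feather as feather

    # Same data as in the report: n nulls, typed and untyped.
    typed = pa.table({"x": pa.array([None] * n, type="int64")})
    untyped = pa.table({"x": pa.array([None] * n)})  # NullArray column

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.feather")
        t_int = per_call_seconds(lambda: feather.write_feather(typed, path))
        t_null = per_call_seconds(lambda: feather.write_feather(untyped, path))

    print(f"int64 nulls: {t_int * 1e6:.1f} µs per write")
    print(f"NullArray:   {t_null * 1e6:.1f} µs per write")
    print(f"slowdown:    {slowdown(t_int, t_null):.1f}x")
    return t_int, t_null
```

Calling `compare_feather_writes()` prints both per-write timings and their ratio. Taking the minimum over the repeats is a design choice to reduce noise from other processes; `%%timeit` reports mean and standard deviation instead, so the absolute numbers are not directly comparable.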

Reporter: Joris Van den Bossche / @jorisvandenbossche
Assignee: Wes McKinney / @wesm

Note: This issue was originally created as ARROW-6529. Please see the migration documentation for further details.
