Closed
Description
A smaller example using only pyarrow. Writing a NullArray seems to take much longer than writing an array of the same length with a concrete type (for example int64) that contains only nulls, which is surprising:
In [93]: arr = pa.array([None]*1000, type='int64')
In [94]: %%timeit 
    ...: w = pyarrow.feather.FeatherWriter('__test.feather') 
    ...: w.writer.write_array('x', arr) 
    ...: w.writer.close() 
31.4 µs ± 464 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [95]: arr = pa.array([None]*1000)  
In [96]: arr    
Out[96]: 
<pyarrow.lib.NullArray object at 0x7fa47a23ca40>
1000 nulls
In [97]: %%timeit 
    ...: w = pyarrow.feather.FeatherWriter('__test.feather') 
    ...: w.writer.write_array('x', arr) 
    ...: w.writer.close() 
3.75 ms ± 64.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So writing a NullArray of the same length takes more than 100x longer (3.75 ms vs. 31.4 µs) than writing an int64 array that is entirely null.
Reporter: Joris Van den Bossche / @jorisvandenbossche
Assignee: Wes McKinney / @wesm
Note: This issue was originally created as ARROW-6529. Please see the migration documentation for further details.