#### Arctic Version

```
arctic==1.79.4
pandas==1.1.5
```
#### Arctic Store

TickStore
#### Platform and version

Ubuntu Linux 20.04, Python 3.8.8 (Anaconda), running JupyterLab
Modern CPU w/ NVMe
#### Description of problem and/or code sample that reproduces the issue

I am experiencing very slow TickStore reads. In the sample code below, the write of 5 million rows clocks in at 1.2 s, which seems good. However, reading the same data back takes 59 s.
```python
import arctic
import pandas as pd

arctic_host = 'localhost:27017'
test_library_name = 'dev_speed_testing_library'

test_store = arctic.Arctic(arctic_host)
test_store.delete_library(test_library_name)
test_store.initialize_library(test_library_name, 'TickStoreV3')
test_library = test_store[test_library_name]
test_library._chunk_size = 1000000

test_library.list_symbols()
```
```
[]
```
```python
from numpy.random import default_rng

data_length = 5000000
sample_index = pd.date_range(start='1990-01-01', periods=data_length, freq='1ms', tz='UTC')

rng = default_rng()
sample_data = rng.standard_normal(data_length)
sample_data = sample_data * sample_data  # square to get rid of negative numbers

test_df = pd.DataFrame(sample_data, sample_index, columns=['price'])
test_df.dtypes
```
```
price    float64
dtype: object
```
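For reference, the same frame construction can be sanity-checked without a running MongoDB instance; this sketch uses a smaller row count (an illustrative choice, not part of the repro) and asserts the properties the repro relies on:

```python
import numpy as np
import pandas as pd
from numpy.random import default_rng

n = 1000  # smaller than the 5M-row repro, just to check the frame shape/dtypes
idx = pd.date_range(start='1990-01-01', periods=n, freq='1ms', tz='UTC')

rng = default_rng(0)
vals = rng.standard_normal(n)
vals = vals * vals  # squaring guarantees non-negative values

df = pd.DataFrame(vals, idx, columns=['price'])
assert (df['price'] >= 0).all()        # no negative "prices"
assert str(df.index.tz) == 'UTC'       # tz-aware index, as TickStore expects
assert df['price'].dtype == np.float64
```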
```python
%%time  # don't use this magic if you're not running in Jupyter
test_library.write('testsymbol', test_df)
```
```
CPU times: user 1.16 s, sys: 64.1 ms, total: 1.22 s
Wall time: 1.26 s
```
```python
%%time  # don't use this magic if you're not running in Jupyter
tmp = test_library.read('testsymbol')
```
```
CPU times: user 59.3 s, sys: 3.14 s, total: 1min 2s
Wall time: 59.7 s
```
During the read, the process appears to be CPU-bound, with a single Python thread pegged at 100%.

I'm not sure whether I'm missing something obvious here, like using the wrong data types, but reads that are roughly 50x slower than writes seem odd.
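Since the read is CPU-bound in a single thread, `cProfile` should point straight at the hot loop. The pattern below is a sketch using a stand-in workload (the real arctic call needs a live MongoDB, so `workload` here is a hypothetical placeholder); substitute `test_library.read('testsymbol')` for the stand-in to profile the actual read:

```python
import cProfile
import io
import pstats

def workload():
    # stand-in for test_library.read('testsymbol'); replace with the real call
    return sum(i * i for i in range(100000))

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Dump the 10 most expensive calls by cumulative time into a string
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf)
stats.sort_stats('cumulative').print_stats(10)
profile_report = buf.getvalue()
```

Attaching the top of `profile_report` to an issue like this usually makes it obvious whether the time goes into deserialization, pandas index construction, or something else.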