Experiencing extremely slow reads on TickStore -- fully executable example included. #895

Open
@jeffneuen


Arctic Version

arctic==1.79.4
pandas==1.1.5

Arctic Store

TickStore

Platform and version

Ubuntu Linux 20.04, Python 3.8.8 (Anaconda), running JupyterLab
Modern CPU w/ NVMe

Description of problem and/or code sample that reproduces the issue

I am experiencing very slow TickStore reads. In the sample code below, writing 5 million rows takes about 1.2 s, which seems good. Reading the same data back, however, takes about 59 s.

import arctic
import pandas as pd

arctic_host = 'localhost:27017'
test_library_name = 'dev_speed_testing_library'
test_store = arctic.Arctic(arctic_host)
test_store.delete_library(test_library_name)
test_store.initialize_library(test_library_name, 'TickStoreV3')

test_library = test_store[test_library_name]
test_library._chunk_size = 1000000
test_library.list_symbols()

[]

from numpy.random import default_rng
data_length = 5000000
sample_index = pd.date_range(start='1990-01-01', periods=data_length, freq='1ms', tz='UTC')
rng = default_rng()
sample_data = rng.standard_normal(data_length)
sample_data = sample_data * sample_data  # square the values so every price is non-negative
test_df = pd.DataFrame(sample_data, sample_index, columns=['price'])
test_df.dtypes

price float64
dtype: object

%%time
# %%time is a Jupyter cell magic; omit it when running outside Jupyter
test_library.write('testsymbol', test_df)

CPU times: user 1.16 s, sys: 64.1 ms, total: 1.22 s
Wall time: 1.26 s

%%time
# %%time is a Jupyter cell magic; omit it when running outside Jupyter
tmp = test_library.read('testsymbol')

CPU times: user 59.3 s, sys: 3.14 s, total: 1min 2s
Wall time: 59.7 s
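
For anyone reproducing this outside Jupyter, where `%%time` is not available, the two timings could be collected with a small helper along these lines. This is only a sketch: `timed` and `rows_per_second` are hypothetical names introduced here, not part of Arctic, and the commented usage assumes the `test_library` and `test_df` objects defined above plus a running MongoDB instance.

```python
import time


def timed(label, func, *args, **kwargs):
    """Call func(*args, **kwargs), print wall and CPU time, return (result, wall_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args, **kwargs)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    print(f"{label}: wall {wall:.2f} s, CPU {cpu:.2f} s")
    return result, wall


def rows_per_second(n_rows, seconds):
    """Throughput in rows per second for a timed operation."""
    return n_rows / seconds


# Hypothetical usage with the objects from this issue (needs MongoDB running):
# _, write_wall = timed("write", test_library.write, 'testsymbol', test_df)
# tmp, read_wall = timed("read", test_library.read, 'testsymbol')
# print(rows_per_second(len(test_df), read_wall))
```

With the wall times reported above, 5 million rows in 59.7 s works out to roughly 84 thousand rows per second on read, against roughly 4 million rows per second on write.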

During the read, the process appears to be CPU-bound, with a single Python thread pegged at 100%.
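
Since the read looks CPU-bound, one way to narrow down where the time goes would be to run it under the standard-library profiler. The sketch below is an assumption about how that could be done rather than anything taken from Arctic itself: `profile_call` is a hypothetical helper, and the commented usage again relies on the `test_library` object defined above.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func under cProfile and return (result, report_text).

    The report lists the `top` entries sorted by cumulative time.
    Note: `top` is consumed by this helper and is not passed on to func.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


# Hypothetical usage with the library from this issue (needs MongoDB running):
# tmp, report = profile_call(test_library.read, 'testsymbol')
# print(report)
```

The functions with the largest cumulative time in the report should indicate which part of the read path is consuming the CPU.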

I'm not sure whether I'm missing something obvious here, such as using the wrong data types, but reads being nearly 50 times slower than writes seems odd.
