How to address hdf5 latency and spawn errors when loading fragments in torch? #417

mkarikom · 2025-10-03T13:03:15Z

Currently, when using fragments in torch, I'm using only a single torch worker. This avoids the 'unpickleable' errors that occur when torch multiprocessing spawns workers and cannot pickle the fragment file, apparently because of some issue with HDF5.

Even with this setup, loading is still quite slow: fetching fragments at random indices from the fragment file appears to be the bottleneck.

I'm wondering whether you have addressed this, possibly within the Selene SDK or some other framework.

I considered pre-serializing the fragments for each cell using snapatac.export, but I would much prefer suggestions for a low-latency approach to multiprocessing and indexing on the native snapatac fragment files themselves.
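
A common workaround for the pickling error described above is to keep only the file path on the dataset object and open the file handle lazily inside each worker process. The sketch below is a generic illustration of that pattern, not code from SnapATAC2: the class name `LazyHandleDataset` and its `_open` method are hypothetical, and a plain binary file stands in for the HDF5-backed fragment file. In a real pipeline the class would subclass `torch.utils.data.Dataset` and `_open` would open the fragment file instead.

```python
import pickle
import tempfile
from pathlib import Path


class LazyHandleDataset:
    """Map-style dataset that opens its backing file on first access.

    Hypothetical stand-in: a real version would subclass
    torch.utils.data.Dataset and open the fragment file in _open().
    """

    def __init__(self, path, n_rows):
        self.path = str(path)   # only the path is stored at construction
        self.n_rows = n_rows
        self._handle = None     # opened lazily, once per process

    def _open(self):
        # Stand-in for opening an HDF5-backed file.
        return open(self.path, "rb")

    def __getstate__(self):
        # Drop the open handle so the object pickles cleanly when
        # multiprocessing spawns a worker; the worker reopens it itself.
        state = self.__dict__.copy()
        state["_handle"] = None
        return state

    def __len__(self):
        return self.n_rows

    def __getitem__(self, idx):
        if self._handle is None:
            self._handle = self._open()
        self._handle.seek(idx)
        return self._handle.read(1)


tmp = Path(tempfile.mkdtemp()) / "rows.bin"
tmp.write_bytes(b"abcdef")

ds = LazyHandleDataset(tmp, n_rows=6)
first = ds[0]                            # opens the handle in this process
clone = pickle.loads(pickle.dumps(ds))   # roughly what spawn does per worker
third = clone[3]                         # the copy reopens its own handle
```

Without `__getstate__`, pickling `ds` after the first read would fail because open file objects cannot be pickled. Whether this resolves the error with the actual fragment file depends on how the library holds its handle, which the thread does not specify.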

kaizhang (Maintainer) · 2025-10-05T03:49:33Z

First, you need to open the h5ad in memory mode if you want efficient random indexing. Fragment data are stored as a specialized sparse row matrix, so random indexing along the rows should be fast.
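
To illustrate why row access is cheap in a sparse row layout, here is a minimal sketch of a generic compressed sparse row (CSR) structure in plain Python. This is an assumption about the general idea only; it does not reproduce SnapATAC2's specialized on-disk format, and the array contents are made up for the example.

```python
# Generic CSR layout: three flat arrays.
indptr = [0, 2, 2, 5]          # row i occupies [indptr[i], indptr[i + 1])
indices = [10, 40, 5, 7, 90]   # column of each stored entry
data = [1, 1, 2, 1, 3]         # value of each stored entry


def get_row(i):
    """Return the (column, value) pairs of row i.

    Two lookups in indptr locate the row, so the cost depends only on
    the number of entries in that row, not on the size of the matrix.
    """
    start, end = indptr[i], indptr[i + 1]
    return list(zip(indices[start:end], data[start:end]))


row0 = get_row(0)   # two entries
row1 = get_row(1)   # empty row
row2 = get_row(2)   # three entries
```

When the arrays live in memory, each row lookup is a pair of slices. When they live on disk, the same lookup turns into small scattered reads, which is one plausible reason backed mode is slower for random indices.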

mkarikom (Author) · 2025-10-06T20:46:53Z

Hi @kaizhang,
Thanks. I think part of the problem was that, with multiple GPUs, in-memory fragment datasets consumed more memory with each additional worker.
That is why I resorted to loading in backed mode.
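
A rough worst-case estimate of the memory growth described above, assuming one training process per GPU and that every DataLoader worker ends up holding its own full copy of the in-memory dataset (as can happen when workers are spawned rather than forked). The function names and the 8 GB dataset size are hypothetical and only serve the arithmetic.

```python
def worst_case_copies(n_gpus, workers_per_gpu):
    # One training process per GPU, plus its DataLoader workers.
    return n_gpus * (1 + workers_per_gpu)


def worst_case_memory_gb(dataset_gb, n_gpus, workers_per_gpu):
    # Upper bound: every process holds a full copy of the dataset.
    return dataset_gb * worst_case_copies(n_gpus, workers_per_gpu)


# Hypothetical 8 GB in-memory dataset.
single = worst_case_memory_gb(8, n_gpus=1, workers_per_gpu=0)
multi = worst_case_memory_gb(8, n_gpus=4, workers_per_gpu=4)
```

Under these assumptions the footprint grows from 8 GB to 160 GB. Actual usage can be lower if pages are shared between processes.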
