
Speed-up Genomic Benchmarks dataloader #10

Merged: 1 commit merged into HazyResearch:main on Oct 17, 2023


cbirchsy (Contributor) commented Aug 11, 2023

I've modified the dataset class so that sequences are stored in a list and indexed from memory, rather than being read from file each time a sample is accessed. With this change I observed epoch-time speedups ranging from about 1.5x to about 5.6x, depending on the dataset.
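A minimal sketch of the two access patterns, assuming a Genomic Benchmarks-style layout where each sequence lives in its own `.txt` file under `<split>/<class_name>/`. The class and helper names (`FileBackedSequenceDataset`, `InMemorySequenceDataset`, `write_toy_split`) are hypothetical and are not the repository's actual dataset class; tokenization is omitted. Both classes follow the map-style `__len__`/`__getitem__` protocol that `torch.utils.data.DataLoader` accepts.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def write_toy_split(root: Path, data: Dict[str, List[str]]) -> None:
    """Hypothetical helper: write one sequence per .txt file under root/<class_name>/."""
    for name, seqs in data.items():
        class_dir = root / name
        class_dir.mkdir(parents=True, exist_ok=True)
        for i, seq in enumerate(seqs):
            (class_dir / f"{i}.txt").write_text(seq + "\n")


class FileBackedSequenceDataset:
    """Old access pattern (assumed): keep file paths, read a file on every __getitem__."""

    def __init__(self, root: Path, class_names: List[str]) -> None:
        # Only (path, label) pairs are kept; sequence text is not loaded yet.
        self.items: List[Tuple[Path, int]] = [
            (path, label)
            for label, name in enumerate(class_names)
            for path in sorted((root / name).glob("*.txt"))
        ]

    def __len__(self) -> int:
        return len(self.items)

    def __getitem__(self, idx: int) -> Tuple[str, int]:
        path, label = self.items[idx]
        # One filesystem read per sample, repeated every epoch.
        return path.read_text().strip(), label


class InMemorySequenceDataset:
    """New access pattern: read every file once, then index plain Python lists."""

    def __init__(self, root: Path, class_names: List[str]) -> None:
        self.seqs: List[str] = []
        self.labels: List[int] = []
        for label, name in enumerate(class_names):
            for path in sorted((root / name).glob("*.txt")):
                self.seqs.append(path.read_text().strip())
                self.labels.append(label)

    def __len__(self) -> int:
        return len(self.seqs)

    def __getitem__(self, idx: int) -> Tuple[str, int]:
        # Pure list indexing: no I/O after construction.
        return self.seqs[idx], self.labels[idx]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        write_toy_split(root, {"negative": ["ACGT", "TTTT"], "positive": ["GGCC"]})
        old = FileBackedSequenceDataset(root, ["negative", "positive"])
        new = InMemorySequenceDataset(root, ["negative", "positive"])
        # Both datasets yield identical (sequence, label) pairs.
        assert [old[i] for i in range(len(old))] == [new[i] for i in range(len(new))]
```

The in-memory version trades RAM (every sequence string is held for the dataset's lifetime) for skipping a file open and read on every sample in every epoch.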

| Dataset | Epoch time (old) | Epoch time (new) |
|---|---|---|
| dummy_mouse_enhancers_ensembl | 3s | 2s |
| human_enhancers_cohn | 42s | 14s |
| human_nontata_promoters | 49s | 10s |
| demo_coding_vs_intergenomic_seqs | 2m 6s | 25s |
| demo_human_or_worm | 2m 19s | 25s |
| human_enhancers_ensembl | 3m 56s | 1m 20s |
| human_ocr_ensembl | 4m 48s | 1m 32s |
| human_ensembl_regulatory | 8m 21s | 2m 37s |
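As a quick check of the speedup figures, the per-dataset ratio (old epoch time / new epoch time) can be recomputed from the table. This only re-derives numbers already in the table and is not a new measurement; the ratios run from 1.5x (dummy_mouse_enhancers_ensembl) to about 5.6x (demo_human_or_worm). The `to_seconds` helper is a hypothetical parser for the table's time format.

```python
import re
from typing import Dict, Tuple

# Epoch times copied from the table: dataset -> (old, new).
EPOCH_TIMES: Dict[str, Tuple[str, str]] = {
    "dummy_mouse_enhancers_ensembl": ("3s", "2s"),
    "human_enhancers_cohn": ("42s", "14s"),
    "human_nontata_promoters": ("49s", "10s"),
    "demo_coding_vs_intergenomic_seqs": ("2m 6s", "25s"),
    "demo_human_or_worm": ("2m 19s", "25s"),
    "human_enhancers_ensembl": ("3m 56s", "1m 20s"),
    "human_ocr_ensembl": ("4m 48s", "1m 32s"),
    "human_ensembl_regulatory": ("8m 21s", "2m 37s"),
}


def to_seconds(t: str) -> int:
    """Convert strings like '2m 6s' or '42s' into a number of seconds."""
    minutes = re.search(r"(\d+)m", t)
    seconds = re.search(r"(\d+)s", t)
    total = int(minutes.group(1)) * 60 if minutes else 0
    total += int(seconds.group(1)) if seconds else 0
    return total


# Speedup = old epoch time divided by new epoch time.
speedups: Dict[str, float] = {
    name: to_seconds(old) / to_seconds(new) for name, (old, new) in EPOCH_TIMES.items()
}

if __name__ == "__main__":
    for name, s in sorted(speedups.items(), key=lambda kv: kv[1]):
        print(f"{name:34s} {s:.2f}x")
```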

exnx (Collaborator) commented Aug 15, 2023

This looks great, thanks so much!

exnx merged commit 6b3e88e into HazyResearch:main on Oct 17, 2023

2 participants