faster modisco for decima #23

MuhammedHasan · 2025-08-12T04:21:02Z

Speed-up for the decima modisco-patterns function:
Old modisco (with #22):

s          h:m:s    max_rss   max_vms    max_uss   max_pss   io_in  io_out    mean_load  cpu_time
3948.6744  1:05:48  9562.70   18040.41   9521.04   9538.74   0.00   15611.66  261.27     10318.95

New modisco (jmschrei/tfmodisco-lite#71):

s          h:m:s    max_rss   max_vms    max_uss   max_pss   io_in  io_out    mean_load  cpu_time
1033.2864  0:17:13  15015.64  49914.56   12574.11  12719.28  0.00   3122.35   859.06     9723.68

Speed-up for the decima modisco-reports function:
Old modisco:

s          h:m:s    max_rss   max_vms    max_uss   max_pss   io_in  io_out    mean_load  cpu_time
736.9390   0:12:16  1065.41   6516.85    1018.21   1036.29   0.00   0.63      50.92      533.50

New modisco:

s          h:m:s    max_rss   max_vms    max_uss   max_pss   io_in  io_out    mean_load  cpu_time
234.5583   0:03:54  6290.30   40260.50   4804.80   4889.07   0.00   0.70      189.41     548.03
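
To make the comparison concrete, the wall-clock and peak-memory columns of the four benchmark rows above can be reduced to two ratios per step. This is only a small sketch over the numbers already shown; the `benchmarks` dictionary and the `summarize` helper are illustrative names, not part of Decima.

```python
# Wall-clock seconds (column "s") and peak resident memory (column "max_rss"),
# copied from the benchmark rows in this PR description.
benchmarks = {
    "modisco-patterns": {"old": (3948.6744, 9562.70), "new": (1033.2864, 15015.64)},
    "modisco-reports": {"old": (736.9390, 1065.41), "new": (234.5583, 6290.30)},
}


def summarize(old, new):
    """Return (speed-up factor, peak-memory growth factor) for old vs. new."""
    old_s, old_rss = old
    new_s, new_rss = new
    return old_s / new_s, new_rss / old_rss


for step, runs in benchmarks.items():
    speedup, mem_growth = summarize(runs["old"], runs["new"])
    print(f"{step}: {speedup:.1f}x faster, {mem_growth:.1f}x peak RSS")
# modisco-patterns: 3.8x faster, 1.6x peak RSS
# modisco-reports: 3.1x faster, 5.9x peak RSS
```

So both steps run roughly three to four times faster, at the cost of a higher peak memory footprint.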

The attribution prediction step is shared by the old and new pipelines; on an L40 GPU it takes roughly 340 s per replicate:

s          h:m:s    max_rss   max_vms    max_uss   max_pss   io_in  io_out    mean_load  cpu_time
336.5114   0:05:36  45098.63  229566.50  7871.31   12046.93  0.00   3842.39   98.06      334.85
340.7673   0:05:40  46461.95  231611.04  8226.79   12526.48  0.00   3842.41   95.93      331.32
343.4498   0:05:43  46129.14  231169.75  7997.02   12321.96  0.00   3842.39   95.39      331.99
337.7918   0:05:37  46199.02  231601.22  8314.75   12585.25  0.00   3842.39   98.30      337.34
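
For anyone who wants to aggregate such benchmark files, here is a minimal sketch that parses the four replicate rows and reports the mean runtime. It assumes the files are tab-separated with one header row (the column layout matches what Snakemake's benchmark directive writes, but that origin is an assumption), and only a subset of the columns is reproduced here.

```python
import csv
import io
import statistics

# Subset of the columns from the four replicate rows, tab-separated.
raw = (
    "s\th:m:s\tmax_rss\tcpu_time\n"
    "336.5114\t0:05:36\t45098.63\t334.85\n"
    "340.7673\t0:05:40\t46461.95\t331.32\n"
    "343.4498\t0:05:43\t46129.14\t331.99\n"
    "337.7918\t0:05:37\t46199.02\t337.34\n"
)

rows = list(csv.DictReader(io.StringIO(raw), delimiter="\t"))
seconds = [float(row["s"]) for row in rows]

mean_s = statistics.mean(seconds)
spread_s = max(seconds) - min(seconds)
print(f"mean {mean_s:.1f} s, range {spread_s:.1f} s over {len(seconds)} replicates")
# mean 339.6 s, range 6.9 s over 4 replicates
```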

Code to compute the attributions:

from decima.interpret.modisco import predict_save_modisco_attributions


cell_types = [
    "Amygdala excitatory",
    "CGE interneuron",
    "Cerebellar inhibitory",
    "Deep-layer corticothalamic and 6b",
    "Deep-layer intratelencephalic",
    "Deep-layer near-projecting",
    "Eccentric medium spiny neuron",
    "Hippocampal CA1-3",
    "Hippocampal CA4",
    "Hippocampal dentate gyrus",
    "LAMP5-LHX6 and Chandelier",
    "Lower rhombic lip",
    "MGE interneuron",
    "Mammillary body",
    "Medium spiny neuron",
    "Midbrain-derived inhibitory",
    "Splatter",
    "Thalamic excitatory",
    "Upper rhombic lip",
    "Upper-layer intratelencephalic",
]
off_cell_types = [
    "Astrocyte",
    "Bergmann glia",
    "Choroid plexus",
    "Ependymal",
    "Microglia",
    "Oligodendrocyte",
]

predict_save_modisco_attributions(
    output_prefix='neuron',
    tasks=f'cell_type in {cell_types} and organ == "CNS"',
    off_tasks=f'cell_type in {off_cell_types} and organ == "CNS"',
    model=0,
    top_n_markers=250,
    num_workers=16,
)
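
A note on the `tasks` / `off_tasks` arguments: the f-string embeds the Python list's repr, so each argument is a self-contained filter expression over the task metadata columns. The sketch below shows what the string expands to and what such an expression selects on a toy table. How Decima evaluates the expression internally is not shown in this PR; the `select` helper, which uses `eval` on trusted input, is purely illustrative.

```python
# Toy stand-in for the per-task metadata; the real table lives inside Decima.
metadata = [
    {"cell_type": "Splatter", "organ": "CNS"},
    {"cell_type": "Astrocyte", "organ": "CNS"},
    {"cell_type": "Splatter", "organ": "lung"},
]
cell_types = ["Splatter", "Mammillary body"]

# The f-string inserts the list's repr into the expression.
tasks = f'cell_type in {cell_types} and organ == "CNS"'
print(tasks)
# cell_type in ['Splatter', 'Mammillary body'] and organ == "CNS"


def select(rows, expr):
    """Keep rows for which expr is true, using column names as variables.

    eval() is used here only to illustrate the semantics on trusted input.
    """
    return [row for row in rows if eval(expr, {"__builtins__": {}}, row)]


selected = select(metadata, tasks)
print(selected)
# [{'cell_type': 'Splatter', 'organ': 'CNS'}]
```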

The step to run modisco:

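# Assumption: modisco_patterns lives in the same module as
# predict_save_modisco_attributions; cell_types and off_cell_types are the
# lists defined in the attribution snippet.
from decima.interpret.modisco import modisco_patterns

modisco_patterns(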
    output_prefix='neuron',
    attributions=['neuron.attributions_0.h5', 'neuron.attributions_1.h5', 'neuron.attributions_2.h5', 'neuron.attributions_3.h5'],
    tasks=f'cell_type in {cell_types} and organ == "CNS"',
    off_tasks=f'cell_type in {off_cell_types} and organ == "CNS"',
    top_n_markers=250,
    tss_distance=10_000,
    max_seqlets_per_metacluster=10_000,
    num_workers=16,
)
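
The `attributions` list above does not have to be typed out by hand. A minimal sketch, assuming the `{prefix}.attributions_{i}.h5` naming seen in the literal list (inferred from this description, not from Decima's documentation); `attribution_files` and `n_models` are hypothetical names:

```python
def attribution_files(output_prefix, n_models=4):
    """Build the per-replicate attribution paths for a given output prefix.

    The naming pattern is inferred from the file list in the PR description.
    """
    return [f"{output_prefix}.attributions_{i}.h5" for i in range(n_models)]


attributions = attribution_files("neuron")
print(attributions)
# ['neuron.attributions_0.h5', 'neuron.attributions_1.h5',
#  'neuron.attributions_2.h5', 'neuron.attributions_3.h5']
```

The resulting list can then be passed as the `attributions` argument.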

And the step to generate the motif reports:

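# Assumption: modisco_reports is importable from decima.interpret.modisco.
from decima.interpret.modisco import modisco_reports

modisco_reports(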
    output_prefix='neuron',
    modisco_h5='neuron.modisco.h5',
    num_workers=16,
)

Copilot

Pull Request Overview

This PR significantly improves the performance of ModISCo (motif discovery) functions in Decima by upgrading to a faster ModISCo implementation and adding new functionality. The primary changes include updating to a faster ModISCo library version, parallelizing attribution loading, and adding seqlet bed file generation.

Key changes:

Performance optimization through faster ModISCo library and parallel processing
New motif utility functions for information content, trimming, and motif positioning
Addition of seqlet bed file generation functionality
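
For readers unfamiliar with the motif utilities mentioned above, the standard definitions of per-position information content and information-based trimming can be sketched as follows. This is a generic illustration, not the code in src/decima/utils/motifs.py; the function names, the uniform background, and the `min_ic` threshold are assumptions.

```python
import math


def information_content(ppm, background=(0.25, 0.25, 0.25, 0.25)):
    """Per-position information content (bits) of a position probability matrix.

    ppm is a list of positions, each a list of 4 probabilities (A, C, G, T).
    Each value is the KL divergence of that column from the background.
    """
    return [
        sum(p * math.log2(p / b) for p, b in zip(column, background) if p > 0)
        for column in ppm
    ]


def trim_motif(ppm, min_ic=0.5):
    """Drop flanking positions whose information content is below min_ic."""
    ic = information_content(ppm)
    keep = [i for i, bits in enumerate(ic) if bits >= min_ic]
    if not keep:
        return []
    return ppm[keep[0] : keep[-1] + 1]


ppm = [
    [0.25, 0.25, 0.25, 0.25],  # uninformative flank, 0 bits
    [1.00, 0.00, 0.00, 0.00],  # fully conserved A, 2 bits
    [0.00, 0.50, 0.50, 0.00],  # C or G, 1 bit
    [0.25, 0.25, 0.25, 0.25],  # uninformative flank, 0 bits
]
print(information_content(ppm))  # [0.0, 2.0, 1.0, 0.0]
print(len(trim_motif(ppm)))      # 2
```

Trimming only removes low-information flanks; a weak position between two informative ones is kept so the motif stays contiguous.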

Reviewed Changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 5 comments.

File	Description
setup.cfg	Updates to faster ModISCo library dependency
src/decima/utils/motifs.py	New utility functions for motif analysis and trimming
src/decima/interpret/modisco.py	Major refactoring with parallelization and new seqlet bed functionality
src/decima/core/attribution.py	Parallel processing support for attribution loading
src/decima/utils/io.py	BigWig writing improvements and gradient correction options
src/decima/hub/__init__.py	Better error handling for missing model/metadata files
src/decima/cli/*	CLI updates to expose new parameters and functionality
tests/*	New test coverage for motif utilities and updated existing tests

Muhammed Hasan Celik added 8 commits August 7, 2025 15:28

modisco added

4cc2e53

modisco slow version

fa50322

io

f6325c7

add modisco-lite to setup dependencies

a675496

bug fix in bigwig writer

758c401

faster version

5cacde4

check path in env variables

cb9a8e1

avoid zero division warnning and load metadata ones

fafd27b

MuhammedHasan mentioned this pull request Aug 12, 2025

faster tfmodisco-lite jmschrei/tfmodisco-lite#71

Closed

MuhammedHasan requested review from avantikalal and Copilot August 12, 2025 04:36

MuhammedHasan self-assigned this Aug 12, 2025

MuhammedHasan added the enhancement New feature or request label Aug 12, 2025

Muhammed Hasan Celik added 5 commits August 25, 2025 17:21

seqlet bed files

04609c4

setup fix

b76352d

merge with modisco

d735b3a

fix for modisco cli doc

99c7881

printing issue

8ba89fd

MuhammedHasan requested a review from Copilot August 26, 2025 21:27

Copilot AI reviewed Aug 26, 2025

Muhammed Hasan Celik added 3 commits August 27, 2025 14:53

motif fix

5e952c4

merge

c6efd9f

motif functions and seqlet calling update

d6e8a40

avantikalal reviewed Sep 16, 2025

src/decima/core/attribution.py (outdated, resolved)

attributions

a62a005

MuhammedHasan merged commit fc083a6 into main Sep 17, 2025
8 checks passed
