@MuhammedHasan commented Aug 12, 2025

Speedup for the decima modisco-patterns function:
Old modisco, with #22:

s          h:m:s    max_rss  max_vms   max_uss  max_pss  io_in  io_out    mean_load  cpu_time
3948.6744  1:05:48  9562.70  18040.41  9521.04  9538.74  0.00   15611.66  261.27     10318.95

New modisco, with jmschrei/tfmodisco-lite#71:

s          h:m:s    max_rss   max_vms   max_uss   max_pss   io_in  io_out   mean_load  cpu_time
1033.2864  0:17:13  15015.64  49914.56  12574.11  12719.28  0.00   3122.35  859.06     9723.68

Speedup for the decima modisco-reports function:
Old modisco:

s         h:m:s    max_rss  max_vms  max_uss  max_pss  io_in  io_out  mean_load  cpu_time
736.9390  0:12:16  1065.41  6516.85  1018.21  1036.29  0.00   0.63    50.92      533.50

New modisco:

s         h:m:s    max_rss  max_vms   max_uss  max_pss  io_in  io_out  mean_load  cpu_time
234.5583  0:03:54  6290.30  40260.50  4804.80  4889.07  0.00   0.70    189.41     548.03
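
To make the trade-off in these four benchmark rows easier to read, here is a minimal sketch that computes the wall-clock speedup and the growth in peak resident memory for each step. The numbers are copied from the `s` and `max_rss` columns above; the dictionary layout and the `ratios` helper are only illustrative and not part of decima.

```python
# Wall-clock seconds ("s") and peak resident memory ("max_rss") copied from
# the benchmark rows above. Keys and helper names are illustrative only.
benchmarks = {
    "modisco_patterns": {"old": (3948.6744, 9562.70), "new": (1033.2864, 15015.64)},
    "modisco_reports": {"old": (736.9390, 1065.41), "new": (234.5583, 6290.30)},
}


def ratios(old, new):
    """Return (speedup, peak-memory growth) for one step."""
    old_s, old_rss = old
    new_s, new_rss = new
    return old_s / new_s, new_rss / old_rss


summary = {step: ratios(runs["old"], runs["new"]) for step, runs in benchmarks.items()}

for step, (speedup, mem_growth) in summary.items():
    print(f"{step}: {speedup:.2f}x faster, {mem_growth:.2f}x peak max_rss")

# Combined wall-clock time of both steps, old versus new.
old_total = sum(runs["old"][0] for runs in benchmarks.values())
new_total = sum(runs["new"][0] for runs in benchmarks.values())
print(f"both steps: {old_total / new_total:.2f}x faster")
```

Both steps get faster by more than 3x in wall-clock time, while `max_rss` and `mean_load` go up, so the gain appears to come largely from using more cores and memory at once.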

The attribution prediction step is shared by both versions. Its runtime on an L40 GPU for each replicate:

s         h:m:s    max_rss   max_vms    max_uss  max_pss   io_in  io_out   mean_load  cpu_time
336.5114  0:05:36  45098.63  229566.50  7871.31  12046.93  0.00   3842.39  98.06      334.85
340.7673  0:05:40  46461.95  231611.04  8226.79  12526.48  0.00   3842.41  95.93      331.32
343.4498  0:05:43  46129.14  231169.75  7997.02  12321.96  0.00   3842.39  95.39      331.99
337.7918  0:05:37  46199.02  231601.22  8314.75  12585.25  0.00   3842.39  98.30      337.34
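
As a quick check on how stable the attribution step is across replicates, the sketch below summarizes the four wall-clock times from the `s` column above. Whether the replicates run one after another or in parallel is not stated here, so the sequential total is only an upper bound under that assumption.

```python
from statistics import mean, stdev

# Wall-clock seconds ("s" column) of the four attribution runs listed above.
runtimes_s = [336.5114, 340.7673, 343.4498, 337.7918]

avg = mean(runtimes_s)
spread = stdev(runtimes_s)

# Assumption: replicates run sequentially on one GPU. If they run in
# parallel, the wall-clock cost is closer to max(runtimes_s).
total_if_sequential = sum(runtimes_s)

print(f"mean {avg:.1f} s, stdev {spread:.1f} s")
print(f"sequential total {total_if_sequential / 60:.1f} min")
```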

Code to call attributions:

from decima.interpret.modisco import predict_save_modisco_attributions


cell_types = [
    "Amygdala excitatory",
    "CGE interneuron",
    "Cerebellar inhibitory",
    "Deep-layer corticothalamic and 6b",
    "Deep-layer intratelencephalic",
    "Deep-layer near-projecting",
    "Eccentric medium spiny neuron",
    "Hippocampal CA1-3",
    "Hippocampal CA4",
    "Hippocampal dentate gyrus",
    "LAMP5-LHX6 and Chandelier",
    "Lower rhombic lip",
    "MGE interneuron",
    "Mammillary body",
    "Medium spiny neuron",
    "Midbrain-derived inhibitory",
    "Splatter",
    "Thalamic excitatory",
    "Upper rhombic lip",
    "Upper-layer intratelencephalic",
]
off_cell_types = [
    "Astrocyte",
    "Bergmann glia",
    "Choroid plexus",
    "Ependymal",
    "Microglia",
    "Oligodendrocyte",
]

predict_save_modisco_attributions(
    output_prefix='neuron',
    tasks=f'cell_type in {cell_types} and organ == "CNS"',
    off_tasks=f'cell_type in {off_cell_types} and organ == "CNS"',
    model=0,
    top_n_markers=250,
    num_workers=16,
)
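
The `tasks` and `off_tasks` arguments are plain strings built with an f-string, so the Python list is rendered into the expression as text. The sketch below shows what the rendered string looks like and which rows it would select. It assumes the string is evaluated as a boolean expression over cell metadata columns; the toy rows and the use of `eval` are stand-ins for illustration and are not how decima is known to evaluate it.

```python
# Shortened list for illustration; the call above uses the full list.
cell_types = ["Amygdala excitatory", "CGE interneuron"]

# The list is interpolated using its repr, giving a self-contained expression.
tasks = f'cell_type in {cell_types} and organ == "CNS"'
print(tasks)

# Hypothetical metadata rows, not taken from the decima metadata.
rows = [
    {"cell_type": "CGE interneuron", "organ": "CNS"},
    {"cell_type": "CGE interneuron", "organ": "eye"},
    {"cell_type": "Astrocyte", "organ": "CNS"},
]

# Stand-in evaluation: each row supplies the variable names used in the
# expression. Builtins are disabled because the expression needs none.
selected = [row for row in rows if eval(tasks, {"__builtins__": {}}, row)]
print(selected)
```

Only the first row passes, since it is the only one that matches both the cell type list and the organ.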

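The step to run modisco on the four attribution files:

# Assumption: modisco_patterns is importable from the same module as
# predict_save_modisco_attributions. cell_types and off_cell_types are the
# lists defined above.
from decima.interpret.modisco import modisco_patterns

modisco_patterns(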
    output_prefix='neuron',
    attributions=['neuron.attributions_0.h5', 'neuron.attributions_1.h5', 'neuron.attributions_2.h5', 'neuron.attributions_3.h5'],
    tasks=f'cell_type in {cell_types} and organ == "CNS"',
    off_tasks=f'cell_type in {off_cell_types} and organ == "CNS"',
    top_n_markers=250,
    tss_distance=10_000,
    max_seqlets_per_metacluster=10_000,
    num_workers=16,
)
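
The `attributions` list above follows a regular pattern, one file per replicate, so it can be generated rather than typed out. A minimal sketch, assuming four replicates numbered from 0 as in the file names above:

```python
output_prefix = "neuron"
n_replicates = 4  # assumption: matches the four file names listed above

# Build "<prefix>.attributions_<i>.h5" for each replicate index.
attributions = [f"{output_prefix}.attributions_{i}.h5" for i in range(n_replicates)]
print(attributions)
```

The resulting list can be passed as the `attributions` argument in place of the hard-coded file names.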

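And the step to generate the motif reports:

# Assumption: modisco_reports is importable from the same module as
# predict_save_modisco_attributions.
from decima.interpret.modisco import modisco_reports

modisco_reports(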
    output_prefix='neuron',
    modisco_h5='neuron.modisco.h5',
    num_workers=16,
)

This comment was marked as outdated.

@MuhammedHasan self-assigned this Aug 12, 2025
@MuhammedHasan added the enhancement (New feature or request) label Aug 12, 2025
@MuhammedHasan requested a review from Copilot August 26, 2025 21:27
Copilot AI left a comment

Pull Request Overview

This PR speeds up the MoDISco (motif discovery) functions in Decima by switching to a faster MoDISco library version and parallelizing attribution loading. It also adds seqlet BED file generation and new motif utility functions.

Key changes:

  • Performance optimization through the faster MoDISco library and parallel processing
  • New motif utility functions for information content, trimming, and motif positioning
  • Addition of seqlet BED file generation

Reviewed Changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 5 comments.

File                             Description
setup.cfg                        Updates the dependency to the faster MoDISco library
src/decima/utils/motifs.py       New utility functions for motif analysis and trimming
src/decima/interpret/modisco.py  Major refactoring with parallelization and new seqlet BED functionality
src/decima/core/attribution.py   Parallel processing support for attribution loading
src/decima/utils/io.py           BigWig writing improvements and gradient correction options
src/decima/hub/__init__.py       Better error handling for missing model/metadata files
src/decima/cli/*                 CLI updates to expose new parameters and functionality
tests/*                          New test coverage for motif utilities and updated existing tests


@MuhammedHasan merged commit fc083a6 into main Sep 17, 2025
8 checks passed