Refactoring and Performance Updates #12

Merged
r-trimbour merged 24 commits into cantinilab:main from shahrozeabbas:main
Mar 25, 2026

@shahrozeabbas

Hey @r-trimbour,

Opening a PR for the recent changes.

  • performance improvements
  • organizing methods

Thanks for reviewing!

Shahroze Abbas and others added 24 commits January 5, 2026 16:53
- Add circe/vae.py: VAE model adapted from favapy
- Add circe/latent_network.py: TF-IDF + log + minmax preprocessing, latent correlation computation
- Modify compute_atac_network() to accept method parameter (graphical_lasso or vae)
- Export compute_latent_network in __init__.py
- Simplify double negative in local_alpha (circe.py)
- Remove unreachable return statements (quic_graph_lasso.py)
- Convert .format() calls to f-strings across codebase
- Vectorize index mapping in graphical lasso and latent network
- Improve metacell aggregation with batch slicing
- Remove broken inplace parameter from add_region_infos
- Fix invalid exception raises in metacells.py
- Consolidate distance functions and organism config
hasattr(data, "var") incorrectly matched pandas DataFrames since
they have a .var() method for variance calculation. Changed to
isinstance(data, ad.AnnData) for proper type checking.
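
To illustrate why the attribute check misfired, here is a minimal sketch. `FrameLike` and `AnnDataLike` are hypothetical stand-ins for `pandas.DataFrame` and `anndata.AnnData` (so the snippet runs without either library), and the two checker functions are illustrative names, not functions from the codebase.

```python
class FrameLike:
    """Stand-in for pandas.DataFrame: exposes .var() as a method."""

    def var(self):
        return 0.0


class AnnDataLike:
    """Stand-in for anndata.AnnData: exposes .var as a data attribute."""

    def __init__(self):
        self.var = {"region": ["chr1_100_200"]}


def looks_annotated_by_attr(data):
    # Old-style check: true for anything that has a `var` attribute,
    # including objects where `var` is just a method.
    return hasattr(data, "var")


def looks_annotated_by_type(data):
    # New-style check: true only for the intended container type.
    return isinstance(data, AnnDataLike)


frame, adata = FrameLike(), AnnDataLike()
print(looks_annotated_by_attr(frame), looks_annotated_by_type(frame))  # True False
print(looks_annotated_by_attr(adata), looks_annotated_by_type(adata))  # True True
```

The attribute check cannot tell the two objects apart, while the type check can, which is the behavior the commit message describes.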
Create explicit copy of chr_var DataFrame slice before passing
to chr_batch_graphical_lasso to prevent pandas warning when
modifying the slice.
- Add _preprocess_data() for data conversion and normalization
- Add _determine_architecture() for VAE dimension selection
- Add _extract_latent_embeddings() for encoder output processing
- Add _compute_window_indices() for sliding window peak finding
- Add _compute_window_correlation() for single window correlation
- Remove unused Union import and chr_X parameter
- Move global_idx computation outside inner loop
- Use f-strings for window keys consistently
- Remove unused chromosomes_sizes parameter
- Compute chromosome mask once instead of twice per iteration
- Simplify global_idx to direct np.arange (chr_var is already a copy)
- Replace tqdm with Rich Progress for cleaner single-line display
- Process chromosomes sequentially with per-chromosome progress tick
- Remove unused njobs parameter (no longer parallel)
- Simplify chr_latent_correlation signature
- Update numpy constraint from <2.0 to >=1.23
- Bump version to 0.4.0 for compatibility release
- Add prog.refresh() after loops in latent_network.py
- Add prog.refresh() after track() and loops in circe.py
- Add verbose parameter to VAE class (0=silent, 1=progress bar, 2=one line per epoch)
- Pass verbose from compute_latent_network to VAE training
- Disable per-chromosome tqdm progress bars (always disable_tqdm=True)
- Suppress joblib verbose output (verbose=0)
- Add metric parameter to compute_atac_network() and compute_latent_network()
- Support both 'pearson' (default) and 'cosine' similarity metrics
- Update correlation computation to use sklearn cosine_similarity when metric='cosine'
- Maintain backward compatibility with default 'pearson' metric
- Replace joblib.Parallel with ThreadPoolExecutor for chromosome-level parallelism
- Add Rich progress bar showing chromosome completion
- Remove unused njobs/disable_tqdm params from chr_batch_graphical_lasso
- Remove misleading parallel comment from window processing loop
- Add "Done." message after concatenation
- Expand quiet_dask to silence all Dask loggers (scheduler, worker, core, http.proxy)
- Gate Rich progress bars with verbose parameter (disable when verbose < 1)
- Remove unnecessary comment separators in latent_network.py
- Remove separate print statement before GL progress bar
- Update progress bar descriptions to "Calculating co-accessibility scores" for both GL and VAE
- Add flush=True to "Concatenating results..." prints for proper output ordering
- Add "Done." message to VAE path for consistency
Fixes ImplicitModificationWarning when assigning to varp after add_region_infos
- Move GL-specific functions from circe.py to gl_network.py
- Keep only compute_atac_network dispatcher and reconcile in circe.py
- Update imports to avoid circular dependencies
- Maintain backwards compatibility via __init__.py exports
- Move ORGANISM_DEFAULTS and reconcile from circe.py to utils.py to break circular dependency
- Fix broken f-string in log message
- Fix extra quote in docstring
- Fix docstring default value mismatch
- Remove dead code (useless conditional, unreachable code, redundant pass)
- Improve code quality (empty docstring, simplified checks, removed decorative comments, simplified covariance block)
- Add resolve_organism_params helper to utils.py
- Resolve organism params once at top level in compute_atac_network
- Skip resolution in sliding_graphical_lasso if params already provided
- Maintains backward compatibility for direct function calls
- Optimize utils.py: replace set comparison with all(), fix error message, add regex=False
- Modernize package config: move metadata to pyproject.toml (PEP 621), slim setup.py to Cython-only
- Bump minimum Python version to 3.10
- Remove obsolete __future__ imports from inverse_covariance, quic_graph_lasso, rank_correlation
- Modernize type hints: replace Union with | syntax in graphical_lasso.py
- Rename gl_network.py to graphical_lasso.py for clarity
- Remove unused imports across codebase
- Fix ThreadPoolExecutor results being collected in completion order
  instead of chromosome order, causing incorrect matrix concatenation
- Add cmake and numba to dependencies to help pip resolve builds
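
The ThreadPoolExecutor ordering fix in the commit list above can be sketched as follows. This is an illustration of the general pitfall, not the code from this PR: `process_chromosome` and the sleep delays are hypothetical placeholders chosen so that tasks finish out of order.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_chromosome(name, delay):
    # Placeholder for per-chromosome work; the sleep forces
    # later chromosomes to finish before earlier ones.
    time.sleep(delay)
    return f"{name}-block"


# chr1 is deliberately the slowest task.
jobs = [("chr1", 0.3), ("chr2", 0.15), ("chr3", 0.0)]

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(process_chromosome, n, d) for n, d in jobs]

    # Pitfall: as_completed yields futures as they finish, so the
    # collected blocks follow completion order, not chromosome order.
    completion_order = [f.result() for f in as_completed(futures)]

    # Fix: iterate over the futures in the order they were submitted.
    submission_order = [f.result() for f in futures]

print(completion_order)
print(submission_order)  # ['chr1-block', 'chr2-block', 'chr3-block']
```

If the per-chromosome blocks are later concatenated against an index built in chromosome order, only the second list lines up with that index. `pool.map` would be another way to get results back in input order.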
@r-trimbour
Member

Thanks a lot @shahrozeabbas!

I will just rewrite the failing tests so they all pass, and then merge :)

Copilot AI added a commit that referenced this pull request Mar 25, 2026
Incorporates all contributions from shahrozeabbas/Circe#12:
- Performance improvements across graphical lasso and VAE methods
- Extracted graphical_lasso.py module for better code organization
- Added organism-specific parameter defaults (ORGANISM_DEFAULTS)
- Added resolve_organism_params helper function
- Added Rich progress bars for better UX
- NumPy 2.x support (v0.4.0)
- VAE-based co-accessibility method
- Multiple refactoring and bug fixes (sort_regions copy, SettingWithCopyWarning, etc.)
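
As a rough sketch of the "resolve once at the top level, skip if already provided" pattern behind the organism defaults, consider the following. Everything here is an assumption made for illustration: `DEFAULTS_BY_ORGANISM`, `resolve_defaults`, `inner_step`, and `top_level` are hypothetical names, the parameter names are guesses, and the numbers are placeholders rather than Circe's actual defaults or signatures.

```python
# Hypothetical stand-in table; the numbers are placeholders, not Circe's defaults.
DEFAULTS_BY_ORGANISM = {
    "human": {"window_size": 500_000, "distance_constraint": 250_000},
    "mouse": {"window_size": 400_000, "distance_constraint": 200_000},
}


def resolve_defaults(organism=None, window_size=None, distance_constraint=None):
    """Fill parameters left as None from the organism table (hypothetical helper)."""
    defaults = DEFAULTS_BY_ORGANISM.get(organism, {})
    resolved = {
        "window_size": defaults.get("window_size") if window_size is None else window_size,
        "distance_constraint": (
            defaults.get("distance_constraint")
            if distance_constraint is None
            else distance_constraint
        ),
    }
    missing = [name for name, value in resolved.items() if value is None]
    if missing:
        raise ValueError(f"Cannot resolve {missing} for organism={organism!r}")
    return resolved


def inner_step(window_size=None, distance_constraint=None, organism=None):
    # Skip resolution when the caller already supplied both values;
    # still resolve for direct calls that pass only an organism.
    if window_size is None or distance_constraint is None:
        params = resolve_defaults(organism, window_size, distance_constraint)
        window_size = params["window_size"]
        distance_constraint = params["distance_constraint"]
    return window_size, distance_constraint


def top_level(organism, **overrides):
    # Resolve once here, then pass concrete values down.
    params = resolve_defaults(organism, **overrides)
    return inner_step(**params)


print(top_level("human"))
print(top_level("mouse", window_size=10))
```

Keeping the fallback inside `inner_step` is what would preserve backward compatibility for callers that invoke the inner function directly.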

Conflict resolution:
- circe/utils.py: kept PR #12's version (complete additions)
  + fixed typo: 'occurence' -> 'occurrence'
- tests/test_network/test_network.py: kept our fix that resolves the
  UnboundLocalError caused by assigning to a module-level variable
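
For readers unfamiliar with this class of error, a minimal reproduction is sketched below. It is not the test code from this repository: `THRESHOLD`, `broken`, and `fixed` are hypothetical names, and the fix shown is just one common option.

```python
THRESHOLD = 0.5  # module-level variable


def broken():
    # Assigning to THRESHOLD anywhere in the body makes the name local to
    # this function, so reading it on the right-hand side fails before
    # any local value exists.
    THRESHOLD = THRESHOLD * 2
    return THRESHOLD


def fixed():
    # Read the module-level value and bind the result to a different name.
    doubled = THRESHOLD * 2
    return doubled


try:
    broken()
    raised = False
except UnboundLocalError:
    raised = True

print(raised)   # True
print(fixed())  # 1.0
```

Declaring `global THRESHOLD` inside the function would also avoid the error, at the cost of mutating shared module state from within a test.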

Co-authored-by: shahrozeabbas <28969387+shahrozeabbas@users.noreply.github.com>
@r-trimbour r-trimbour merged commit b1cbc4f into cantinilab:main Mar 25, 2026
3 of 9 checks passed
2 participants