Added Support For Merging Multiple DataStore and Handling Different Assay Types (#125)

* __init__.py - added merge imports. datastore.py - added a get_assay() method to get assay metadata. metadata.py - added a method to fetch a polars dataframe. utils.py - added methods for permute_in_chunks to get chunks in random order. writer.py - removed merge code. merge.py - added an alias for zarrmerge, added randomization functionality to assaymerge, initial implementation of multiple modality merge.

* Formatted code

* Fixed polars index issues in AssayMerge. Implemented DatasetMerge class to merge multiple datastores, handling different assay types and generating missing assays on the fly.

* Added unit tests and assertions for new merge function

* code cleanup and condition check for empty permute

* Doc String Update

* Extended test coverage to 3 datasets

* Added 'polars' to requirements. Added descriptions for the tests. Updated permute_into_chunks() with new (and suggested) NumPy functions. Renamed the variables in merge.py to be more descriptive. Added type hints to merge.py. Made DummyAssay class accessible. Added more descriptive assert messages to the perfrom_randomization_rows() method of AssayMerge class.

* Added support for mismatching features. Currently all features are renamed. Working on renaming duplicates only.

* Update: Rename only for the duplicated values when checking mismatching features

* Added support for automatically detecting when id and name are not equal. Resolved an edge case where the chunk size causes issues if one dataset is very small

* Added support for merging duplicate feature IDs into a single ID. Added checks in between. Corrected polars indexing issues for _ref_order_feat_idx(). Changed _dask_to_coo() to perform many-to-one feature mapping on the fly and changed the indexing strategy.

* Cleanup

* Updated return type for get_assay() in datastore.py. Fixed typos and optimized memory allocation for the dummy assay in merge.py

* print() -> logger.info()

* merge conflict resolve

* conflict resolution

* conflict resolution

* fixed import typo
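
The chunk-wise randomization mentioned in the notes above can be illustrated with a minimal sketch. This is not the code from utils.py: the function name `permute_into_chunks_sketch`, its parameters, and the seeding are assumptions made for illustration. It shuffles row indices with NumPy's `Generator.permutation` and splits them into chunks, returning an empty list when there is nothing to permute and a single short chunk when the dataset is smaller than the chunk size.

```python
import numpy as np


def permute_into_chunks_sketch(
    n_rows: int, chunk_size: int, seed: int = 0
) -> list[np.ndarray]:
    """Hypothetical helper: shuffle row indices, then split them into chunks."""
    # Guard against an empty permutation (no rows or a non-positive chunk size).
    if n_rows <= 0 or chunk_size <= 0:
        return []
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_rows)
    # Split points fall at every multiple of chunk_size below n_rows;
    # a dataset smaller than chunk_size yields no split points and one chunk.
    split_points = np.arange(chunk_size, n_rows, chunk_size)
    return np.split(shuffled, split_points)


chunks = permute_into_chunks_sketch(n_rows=10, chunk_size=4, seed=42)
chunk_sizes = [len(c) for c in chunks]
all_rows = np.sort(np.concatenate(chunks))
```

Every row index appears in exactly one chunk, so writing the chunks out in order produces a randomized row order without loading all rows at once.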
Gautam8387 authored Oct 20, 2024
1 parent 8a57bfb commit 393b2cf
Showing 8 changed files with 951 additions and 283 deletions.
1 change: 1 addition & 0 deletions scarf/__init__.py
@@ -51,3 +51,4 @@
 from .meld_assay import *
 from .utils import *
 from .downloader import *
+from .merge import *
20 changes: 19 additions & 1 deletion scarf/datastore/datastore.py
@@ -6,7 +6,7 @@
 from loguru import logger

 from .mapping_datastore import MappingDatastore
-from ..assay import RNAassay, ATACassay
+from ..assay import Assay, RNAassay, ATACassay
 from ..feat_utils import hto_demux
 from ..utils import tqdmbar, controlled_compute, ZARRLOC
 from ..writers import create_zarr_obj_array, create_zarr_dataset
@@ -75,6 +75,24 @@ def __init__(
             synchronizer=synchronizer,
         )

+    def get_assay(
+        self,
+        assay_name: str
+    ) -> Assay:
+        """Returns the assay object for the given assay name.
+
+        Args:
+            assay_name: Name of the assay to be returned.
+
+        Returns:
+            Assay object
+        """
+        if assay_name not in self.assay_names:
+            raise ValueError(f"ERROR: Assay {assay_name} not found in the Zarr file")
+        else:
+            return getattr(self, assay_name)
+
+
     def filter_cells(
         self,
         attrs: Iterable[str],
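
The lookup pattern in get_assay() (check the name against `assay_names`, then resolve it with `getattr`) can be tried in isolation with a small stand-in class. `ToyStore` and its dictionary "assays" are hypothetical and exist only for this sketch; a real DataStore is backed by a Zarr file and returns Assay objects.

```python
class ToyStore:
    """Hypothetical stand-in for a DataStore; not part of scarf."""

    def __init__(self, **assays):
        # Mirror the two things get_assay() relies on: a list of assay
        # names and one attribute per assay.
        self.assay_names = list(assays)
        for name, assay in assays.items():
            setattr(self, name, assay)

    def get_assay(self, assay_name: str):
        # Same logic as the method added in this commit.
        if assay_name not in self.assay_names:
            raise ValueError(f"ERROR: Assay {assay_name} not found in the Zarr file")
        return getattr(self, assay_name)


store = ToyStore(RNA={"n_features": 3}, ADT={"n_features": 2})
rna = store.get_assay("RNA")

try:
    store.get_assay("ATAC")
    missing_error = None
except ValueError as err:
    missing_error = str(err)
```

Checking `assay_names` first means an unknown name raises a descriptive ValueError, and a name that happens to match some other attribute of the object is never returned as if it were an assay.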