Commit
Added Support For Merging Multiple DataStore and Handling Different Assay Types (#125)

* init.py - added merge imports. datastore.py - added a get_assay() method to get assay metadata. metadata.py - added a method to fetch a polars dataframe. utils.py - added methods for permute_in_chunks to get chunks in random order. writer.py - removed merge code. merge.py - added an alias for zarrmerge, added randomization functionality to assaymerge, and added an initial implementation of multiple modality merge.
* Formatted code.
* Fixed polars index issues in AssayMerge. Implemented the DatasetMerge class to merge multiple datastores, handling different assay types and generating missing assays on the fly.
* Added unit tests and assertions for the new merge function.
* Code cleanup and a condition check for empty permute.
* Docstring update.
* Extended test coverage to 3 datasets.
* Added 'polars' to requirements. Added descriptions for the tests. Updated permute_into_chunks() with new (and suggested) NumPy functions. Renamed the variables in merge.py to be more descriptive. Added type hints to merge.py. Made the DummyAssay class accessible. Added more descriptive assert messages to the perfrom_randomization_rows() method of the AssayMerge class.
* Added support for mismatching features. Currently all features are renamed. Working on renaming duplicates only.
* Update: rename only the duplicated values when checking mismatching features.
* Added support for automatic detection of cases where id and name are not equal. Resolved an edge case where the chunk size causes an issue when one dataset is very small.
* Added support for merging duplicate feature IDs into a single ID. Added checks in between. Corrected polars indexing issues for _ref_order_feat_idx(). Changed _dask_to_coo() to perform many-to-one feature mapping on the fly and changed the indexing strategy.
* Cleanup.
* Updated the return type for get_assay() in datastore.py. Fixed typos and optimized memory allocation for the dummy assay in merge.py.
* Replaced print() with logger.info().
* Resolved merge conflicts.
* Conflict resolution.
* Conflict resolution.
* Fixed import typo.
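
For readers unfamiliar with the chunked randomization mentioned above, here is a minimal sketch of one way such a helper could work. It is not the code from this commit: the function name `permute_in_chunks_sketch`, its parameters, and the "shuffle all indices, then slice" strategy are assumptions made for illustration. It uses NumPy's `default_rng` generator API and returns an empty list for empty input, loosely mirroring the "condition check for empty permute" item.

```python
import numpy as np


def permute_in_chunks_sketch(n, chunk_size, seed=None):
    """Hypothetical helper: shuffle the indices 0..n-1 and split them into chunks.

    This is an illustrative assumption, not the project's actual implementation.
    """
    if n < 0 or chunk_size < 1:
        raise ValueError("n must be >= 0 and chunk_size must be >= 1")
    if n == 0:
        # Nothing to permute, so return no chunks at all.
        return []
    rng = np.random.default_rng(seed)  # NumPy's recommended Generator API
    shuffled = rng.permutation(n)      # random ordering of 0..n-1
    # Consecutive slices of the shuffled indices; the last chunk may be shorter.
    return [shuffled[i:i + chunk_size] for i in range(0, n, chunk_size)]


chunks = permute_in_chunks_sketch(10, 4, seed=0)
print([len(c) for c in chunks])
```

Every index appears in exactly one chunk, so iterating over the chunks visits each row once in random order while keeping only `chunk_size` indices in play at a time.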