Create a script that converts CellBrowser config to Anndata-Zarr file which is digestible by Vitessce zero config mode #259
print(f"obsm {key} is an instance of DataFrame, converting it to numpy array.")
self.adata.obsm[key] = self.adata.obsm[key].to_numpy()

self.adata = optimize_adata(
Suggested change:
-self.adata = optimize_adata(
+return optimize_adata(
To make things simpler and easier to test, I would not worry about writing to the Zarr format in this converter, and would instead return the AnnData object.
Are you suggesting that we delete the following lines?
os.makedirs(os.path.dirname(data_dir), exist_ok=True)
self.adata.write_zarr(zarr_filepath, chunks=[self.adata.shape[0], VAR_CHUNK_SIZE])
If we don't write to the Zarr format and just return the adata object instead of writing it to the store, how will the zero config mode functionality on the Vitessce website work? How will it pick up the local AnnData-Zarr object and generate the view config? Or am I misunderstanding your suggestion?
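A minimal sketch of how the two concerns could be separated, assuming the converter returns the AnnData object and a separate step writes it for zero config mode. The helper name write_adata_to_zarr and the VAR_CHUNK_SIZE value below are placeholders, not names or values taken from this PR; only write_zarr and its chunks argument come from the lines quoted above.

```python
import os

# Placeholder value (assumption); the PR defines its own VAR_CHUNK_SIZE.
VAR_CHUNK_SIZE = 10


def write_adata_to_zarr(adata, zarr_filepath, var_chunk_size=VAR_CHUNK_SIZE):
    """Write an AnnData-like object to a Zarr store and return the store path.

    `adata` only needs a `shape` attribute and a `write_zarr` method,
    which is all the quoted lines rely on.
    """
    parent_dir = os.path.dirname(zarr_filepath)
    if parent_dir:
        # Create the parent directory of the store if it is missing.
        os.makedirs(parent_dir, exist_ok=True)
    # One chunk spans all observations; variables are split into fixed-size chunks.
    adata.write_zarr(zarr_filepath, chunks=[adata.shape[0], var_chunk_size])
    return zarr_filepath
```

With a split like this, tests can inspect the returned AnnData object directly, and whichever script prepares data for zero config mode calls the writer afterwards.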
Can we add an example notebook in docs/notebooks, linked from https://github.com/vitessce/vitessce-python/blob/main/docs/widget_examples.rst so that it gets included in the documentation website?
Looking good, and thanks for all of the tests! A few minor comments.
…. Added jupyter notebooks
That is done now.
- pandas>=1.5.3
- anndata==0.8.0
The previous versions were incompatible, and the new scripts were erroring out at the step where I call optimize_adata.
- numba>=0.53.0
- scanpy>=1.6.0
- jupyterlab>=3
- zarr>=2.5.0
- boto3>=1.16.30
- starlette==0.30.0
This dependency is needed when we call anndata_wrapper_inst.auto_view_config(vc), part of the convert_cellbrowser_project_to_vitessce_config function. I upgraded straight to 0.30, instead of 0.14 (as it is in the vitessce-python-dev environment), to avoid having to install 'aiofiles>=0.6.0'.
I think we need to address #254 first.
'ome-zarr==0.2.1',
'tifffile>=2020.10.1',
'jsonschema>=3.2'
This is used by the validator for the CellBrowser config.
@@ -38,7 +38,7 @@ def add_mapping(self, name, coords):
         if len(coords) != len(self._cell_ids):
             raise ArgumentLengthDoesNotMatchCellIdsException(
                 'Coordinates length does not match Cell IDs Length')
-        if type(name) != str:
+        if not isinstance(name, str):
Auto-corrected by the linter.
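For context, a small illustration of why linters tend to prefer isinstance over comparing type() results: the two checks disagree for subclasses of str. The CellLabel class and both function names are hypothetical and exist only for this example.

```python
class CellLabel(str):
    """Hypothetical str subclass, used only for this illustration."""


def rejected_by_type_check(name):
    # Old check: True for anything whose exact type is not str,
    # which includes subclasses of str.
    return type(name) != str


def rejected_by_isinstance_check(name):
    # New check: True only for values that are neither str nor a str subclass.
    return not isinstance(name, str)


label = CellLabel("UMAP")
print(rejected_by_type_check(label))        # True: the subclass is rejected
print(rejected_by_isinstance_check(label))  # False: the subclass is accepted
```

For plain str arguments both checks behave the same, so the change only matters for callers that pass str subclasses.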
Fixes #1228
Changes
vitessce/ now contains a script that takes in project_name and output_dir, downloads all files and configurations for that project, and creates an AnnData object out of them. The result can be used in two ways:
- 1. As an object saved locally and written to the AnnData-Zarr store. The saved object can be loaded with the Vitessce zero config mode functionality.
- 2. As a Vitessce view config that can be loaded into Vitessce directly.
NOTE: For both cases, the user will need to use a local HTTP server to load the files (issue #1278).
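A minimal sketch of such a local server, using only the Python standard library. It assumes the Vitessce page is served from a different origin than the files, so the handler adds a permissive CORS header; that requirement, the class name, and the function name are assumptions for illustration and are not part of this PR.

```python
import functools
import http.server
import threading


class CORSRequestHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that adds a permissive CORS header (assumption)."""

    def end_headers(self):
        # Allow pages from any origin to fetch the served files.
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()


def start_local_server(directory, port=0):
    """Serve `directory` on 127.0.0.1 in a background thread.

    With port=0 the operating system picks a free port, which can be
    read back from `server.server_address[1]`.
    """
    handler = functools.partial(CORSRequestHandler, directory=directory)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


# Example usage (hypothetical directory name):
# server = start_local_server("output_dir")
# print(f"Serving on http://127.0.0.1:{server.server_address[1]}")
```

The background thread is a daemon, so a standalone script would need to keep the main thread alive while the files are in use; call server.shutdown() when done.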
NOTE: For the second option, the user will need to add the correct URL to their file and also define coordinationValues and options.
Tested with the following projects:
Projects are taken from: https://github.com/ucscGenomeBrowser/cellbrowser-confs
They are available in https://cells.ucsc.edu, for example: https://cells.ucsc.edu/?ds=adultPancreas
Successfully processed:
Successfully processed, but there are no cell set colours:
cardiac-differentiation+trajectory+cm-combined-trajectory (35902) - no cell colours and no heatmap
Unsuccessful, because it took too long to load the matrix file:
NOTE: I only ran the script for smaller datasets. For datasets with more than 40 000 cells, loading the expression matrix takes too long (more than half an hour).