Create a script that converts CellBrowser config to AnnData-Zarr file which is digestible by Vitessce zero config mode #259

Merged: 18 commits merged into main on Jul 18, 2023

ivababukova
Contributor

@ivababukova ivababukova commented Jul 11, 2023

Fixes #1228

Changes

  • A new file under vitessce/ containing a script that takes project_name and output_dir, downloads all files and configuration for that project, and creates an AnnData object from them.
  • The script provides two ways to export the project data:
    1. As an object saved locally and written to an AnnData-Zarr store. The saved object can be loaded with the Vitessce zero config mode functionality.
    2. As a Vitessce view config that can be loaded into Vitessce directly.

NOTE: In both cases, the user needs to run a local HTTP server to load the files (issue #1278).
NOTE: For the second option, the user also needs to add the correct URL to their file and define coordinationValues and options.
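
For readers who want to try the first NOTE locally, here is a minimal sketch of a static file server that adds a permissive CORS header, so a page on another origin can fetch the exported store. It uses only the Python standard library. The names CORSRequestHandler and serve_directory are hypothetical and not part of this PR, and the details of issue #1278 are not reproduced here, so treat this as one possible setup rather than the recommended one.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class CORSRequestHandler(SimpleHTTPRequestHandler):
    """Static file handler that allows cross-origin reads (local testing only)."""

    def end_headers(self):
        # Let a page served from another origin fetch these files.
        self.send_header("Access-Control-Allow-Origin", "*")
        super().end_headers()

    def log_message(self, format, *args):
        # Keep the console quiet; remove this override to see request logs.
        pass


def serve_directory(directory, port=8000):
    """Serve `directory` on 127.0.0.1:`port` in a background thread.

    Returns the server object; call .shutdown() and .server_close() to stop it.
    """
    handler = functools.partial(CORSRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Calling serve_directory("path/to/output_dir", 8000) from a notebook or a long-running script would make the files reachable under http://127.0.0.1:8000/ for as long as that process stays alive.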

Tested with the following projects:

Projects are taken from: https://github.com/ucscGenomeBrowser/cellbrowser-confs
They are available in https://cells.ucsc.edu, for example: https://cells.ucsc.edu/?ds=adultPancreas

Successfully processed:

  • adultPancreas (2542 cells) – good
  • brain-dronc-seq (13067 cells) – good
  • adult-retina (19694 cells) – good
  • chporg (32464 cells) – good, but processing takes about 20 minutes because the dataset is big
  • dros-brain+normal (4407 cells) – good
  • dros-brain+merge (8696 cells) – good

Successfully processed, but there are no cell set colours:

  • dental-cells+human-adult-molars (41673 cells) – no cell colours and no heatmap
  • cortex-dev – cell colours are not appearing (no suitable fields in obs)
  • cross-tissue-maps+immune (14156 cells) – cell colours are not appearing (no suitable fields in obs)
  • covid19-smoking (19361 cells) – cell colours are not appearing, which is unexpected because obs has Cluster and Cell Type fields
  • dros-olfac (3833 cells) – cell colours are not appearing because obs doesn’t contain supported props
  • dental-cells+mouse-incisor (2888 cells) – cell set colours don’t show up (no valid obs prop)
  • chi-10x-mouse-cardiomyocytes (9072 cells) – cell colours are not appearing (Cluster is in obs and in the Vitessce view config, but the colours still don’t show up)
  • covid19-cellular-targets+kidney (33872 cells) – cell set colours are not showing up: there are no obsSets under options, and when I add them (cellId or donor), the app crashes. Processing takes more than 10 minutes
  • cardiac-differentiation+trajectory+cm-combined-trajectory (35902 cells) – no cell colours and no heatmap

Unsuccessful, because it took too long to load the matrix file:

  • covid19-periph-immuno (44721 cells) – not processed after 9 minutes of waiting
  • covid19-immuno+b-cells (11377 cells) – took more than 30 minutes to process
  • covid19-bronch-epi – the dataset is really big (the unzipped expression matrix is 3.86 GB); not processed after half an hour of waiting

NOTE: I only ran the script on smaller datasets. For datasets with more than 40 000 cells, loading the expression matrix takes too long (more than half an hour).
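
One possible mitigation for the large matrices, not tested against these datasets, is to stream the expression matrix row by row and keep only the non-zero entries instead of materialising a dense table. The sketch below assumes the matrix is a tab-separated file (optionally gzipped) with one gene per row, a header row of cell IDs, and the gene identifier in the first column; that layout and the function name read_expr_matrix_sparse are assumptions, not something this PR defines. It mainly reduces peak memory; whether it also reduces wall-clock time here is unverified.

```python
import csv
import gzip


def read_expr_matrix_sparse(path):
    """Stream a gene-by-cell TSV (optionally .gz) and keep non-zero values only.

    Returns (genes, cell_ids, (rows, cols, values)), where rows/cols/values are
    parallel lists of gene index, cell index and expression value.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    genes, rows, cols, values = [], [], [], []
    with opener(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        cell_ids = header[1:]  # assumed: first column is the gene identifier
        for record in reader:
            if not record:
                continue  # skip blank lines
            gene_idx = len(genes)
            genes.append(record[0])
            for cell_idx, raw in enumerate(record[1:]):
                value = float(raw)
                if value != 0.0:
                    rows.append(gene_idx)
                    cols.append(cell_idx)
                    values.append(value)
    return genes, cell_ids, (rows, cols, values)
```

The returned triples are in the coordinate layout that scipy.sparse.coo_matrix accepts, so they could be turned into a sparse matrix (and transposed to cells by genes) before building the AnnData object.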

@ivababukova ivababukova changed the title from "Create a script that converts CellBrowser config to Vitessce view config" to "Create a script that converts CellBrowser config to AnnData-Zarr file which is digestible by Vitessce zero config mode" Jul 11, 2023
vitessce/config_converter.py (outdated)
print(f"obsm {key} is an instance of DataFrame, converting it to numpy array.")
self.adata.obsm[key] = self.adata.obsm[key].to_numpy()

self.adata = optimize_adata(
Member

Suggested change
- self.adata = optimize_adata(
+ return optimize_adata(

To make things simpler and easier to test, I would not worry about writing to the Zarr format in this converter, and would instead return the AnnData object.

Contributor Author

Do you suggest that we delete the following lines:

os.makedirs(os.path.dirname(data_dir), exist_ok=True)
self.adata.write_zarr(zarr_filepath, chunks=[self.adata.shape[0], VAR_CHUNK_SIZE])

If we don't write to the Zarr format and only return the adata object, how will the zero config mode functionality on the Vitessce website work? How will it pick up the local AnnData-Zarr object and generate the view config? Or am I misunderstanding your suggestion?
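
As a side note on the chunks argument in the write_zarr call quoted above: chunks=[n_cells, VAR_CHUNK_SIZE] means each Zarr chunk spans all cells and VAR_CHUNK_SIZE genes, so reading one gene across every cell touches a single chunk. The sketch below only does the arithmetic for how many chunks result. The helper name zarr_chunk_grid is hypothetical, and both the gene count and the VAR_CHUNK_SIZE value of 10 are assumed for illustration; the thread does not state them.

```python
import math


def zarr_chunk_grid(shape, chunks):
    """Return the number of chunks along each axis for a given chunk shape."""
    return tuple(math.ceil(dim / chunk) for dim, chunk in zip(shape, chunks))


# Illustrative numbers only: 2542 cells (adultPancreas), an assumed 20000 genes,
# and an assumed VAR_CHUNK_SIZE of 10.
n_cells, n_genes = 2542, 20000
VAR_CHUNK_SIZE = 10
grid = zarr_chunk_grid((n_cells, n_genes), (n_cells, VAR_CHUNK_SIZE))
print(grid)  # (1, 2000): one chunk along cells, 2000 along genes
```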

vitessce/config_converter.py (outdated)
@keller-mark
Member

Can we add an example notebook in docs/notebooks, linked from https://github.com/vitessce/vitessce-python/blob/main/docs/widget_examples.rst so that it gets included in the documentation website?

@keller-mark
Member

Looking good, and thanks for all of the tests! A few minor comments.

@ivababukova
Contributor Author

> Can we add an example notebook in docs/notebooks, linked from https://github.com/vitessce/vitessce-python/blob/main/docs/widget_examples.rst so that it gets included in the documentation website?

That is done now.

Comment on lines +8 to +9
- pandas>=1.5.3
- anndata==0.8.0
Contributor Author

These versions were incompatible, and the new scripts errored out at the step where I call optimize_adata.

- numba>=0.53.0
- scanpy>=1.6.0
- jupyterlab>=3
- zarr>=2.5.0
- boto3>=1.16.30
- starlette==0.30.0
Contributor Author

This dependency is needed when we call anndata_wrapper_inst.auto_view_config(vc), which is part of the convert_cellbrowser_project_to_vitessce_config function. I upgraded straight to 0.30 instead of 0.14 (the version in the vitessce-python-dev environment) to avoid having to install 'aiofiles>=0.6.0'.

Member

I think we need to address #254 first.

'ome-zarr==0.2.1',
'tifffile>=2020.10.1',
'jsonschema>=3.2'
Contributor Author

This is used by the validator for the CellBrowser config.

@@ -38,7 +38,7 @@ def add_mapping(self, name, coords):
         if len(coords) != len(self._cell_ids):
             raise ArgumentLengthDoesNotMatchCellIdsException(
                 'Coordinates length does not match Cell IDs Length')
-        if type(name) != str:
+        if not isinstance(name, str):
Contributor Author

Auto-corrected by the linter.
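
For context on why the rewrite from type(name) != str to isinstance(name, str) is more than cosmetic: the two checks disagree for subclasses of str. The sketch below shows the difference; CellLabel and the two helper functions are hypothetical names used only for illustration and do not appear in the code base.

```python
class CellLabel(str):
    """Hypothetical str subclass, used only to show the behavioural difference."""


def is_str_exact(name):
    # Old check: True only when the type is exactly str.
    return type(name) == str


def is_str_instance(name):
    # New check: True for str and for any subclass of str.
    return isinstance(name, str)


label = CellLabel("UMAP")
print(is_str_exact(label))     # False: a subclass is rejected by the exact check
print(is_str_instance(label))  # True: isinstance accepts subclasses
```

With the old check, a str subclass passed as name would have been rejected; with isinstance it is accepted, which is usually the intended behaviour.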

@ivababukova ivababukova merged commit b5c5795 into main Jul 18, 2023
6 of 10 checks passed

2 participants