Combined datasets #5

Open
ivirshup opened this issue Apr 27, 2022 · 0 comments

I’m wondering whether there has been much thought on how metadata for combined datasets is handled. Here I’m thinking of multiple datasets that measure the same variables and have been combined.

Typically, the result is a single concatenated object with a “batch” or “dataset” annotation. However, it could also be represented as a collection of objects.

Can, or should, there be a convention for maintaining experiment-level metadata when multiple experiments are combined? This is trivial for the “collection of experiments” representation, but more complicated for the concatenated object.

For a more concrete example: what happens to the dataset id, and to external data identified by that dataset id, when we concatenate? Another example is the “files” entry in a muon.atac-generated AnnData: scverse/mudata#20
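To make the concatenation case concrete, here is a minimal pure-Python sketch. `Dataset` and `concatenate` are hypothetical stand-ins, not the anndata API. The `label` column plays the role of the “batch”/“dataset” annotation, and experiment-level metadata is kept only when all inputs agree on it, loosely similar to the "same" option of `uns_merge` in `anndata.concat`. Under that assumption, a per-dataset value such as `dataset_id` is simply dropped.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Dataset:
    """Hypothetical stand-in for an AnnData-like object."""
    obs: list[dict[str, Any]]  # one dict of annotations per observation
    uns: dict[str, Any] = field(default_factory=dict)  # experiment-level metadata


def concatenate(datasets: dict[str, Dataset], label: str = "batch") -> Dataset:
    """Stack observations and record each one's origin in a `label` column.

    Experiment-level metadata is kept only if every input has the same value
    for a key; conflicting or partially present keys are dropped.
    """
    if not datasets:
        return Dataset(obs=[])
    obs = [{**row, label: key} for key, ds in datasets.items() for row in ds.obs]
    metas = [ds.uns for ds in datasets.values()]
    shared = {
        k: v
        for k, v in metas[0].items()
        if all(k in m and m[k] == v for m in metas[1:])
    }
    return Dataset(obs=obs, uns=shared)


a = Dataset(obs=[{"cell": "c1"}], uns={"dataset_id": "A", "organism": "human"})
b = Dataset(obs=[{"cell": "c2"}], uns={"dataset_id": "B", "organism": "human"})
combined = concatenate({"a": a, "b": b})
# combined.uns == {"organism": "human"}: the two dataset_id values are lost
```

The batch column records where each observation came from, but nothing records which `dataset_id` (or which external files) belonged to which batch.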

squidpy's solution for concatenated objects

A similar issue came up in squidpy, which we addressed by essentially requiring a “library_id” annotation for the observations. Image data is stored under .uns/spatial/{library_id}/ to avoid conflicts when merging. For example:

# These do not conflict
uns/spatial/library1/images/hires: "image1.png"
uns/spatial/library2/images/hires: "image2.png"

# These do conflict
uns/spatial/images/hires: "image1.png"
uns/spatial/images/hires: "image2.png"
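The squidpy convention can be sketched as a recursive merge of nested metadata dictionaries that refuses to overwrite an existing leaf. `merge_uns` is a hypothetical helper, not squidpy or anndata code; it only illustrates why the extra `library_id` level keeps the two layouts above from colliding.

```python
from typing import Any


def merge_uns(a: dict[str, Any], b: dict[str, Any], path: str = "") -> dict[str, Any]:
    """Recursively merge two nested metadata dicts, refusing to overwrite leaves."""
    out = dict(a)  # shallow copy; nested dicts are rebuilt below, so `a` is not mutated
    for key, value in b.items():
        where = f"{path}/{key}"
        if key not in out:
            out[key] = value
        elif isinstance(out[key], dict) and isinstance(value, dict):
            out[key] = merge_uns(out[key], value, where)
        elif out[key] != value:
            raise ValueError(f"conflicting values at {where!r}")
    return out


# Namespaced under library_id: the two images live at different paths.
ok = merge_uns(
    {"spatial": {"library1": {"images": {"hires": "image1.png"}}}},
    {"spatial": {"library2": {"images": {"hires": "image2.png"}}}},
)

# Without the library_id level, both objects write to spatial/images/hires.
conflict = None
try:
    merge_uns(
        {"spatial": {"images": {"hires": "image1.png"}}},
        {"spatial": {"images": {"hires": "image2.png"}}},
    )
except ValueError as err:
    conflict = str(err)  # "conflicting values at '/spatial/images/hires'"
```

Raising on a conflicting leaf, rather than silently keeping one value, makes the loss of per-dataset metadata visible at merge time.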

Relevant docs:

Collection of objects

The collection of objects sidesteps this issue by letting each constituent object hold its own metadata. However, my impression is that far more tools expect a single concatenated object. There is also less tooling for collections of objects, though this is changing (e.g. anndata.AnnCollection, snapatac2.AnnDataSet).
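A collection can be sketched in the same toy style: each member keeps its own metadata, and a concatenated view of the observations is generated on demand. `Member`, `Collection`, and the `files` key are hypothetical; this does not mirror the interfaces of anndata.AnnCollection or snapatac2.AnnDataSet.

```python
from collections.abc import Iterator
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Member:
    """Hypothetical constituent object: its own observations and metadata."""
    obs: list[dict[str, Any]]
    uns: dict[str, Any] = field(default_factory=dict)


@dataclass
class Collection:
    """Hypothetical collection: members keyed by dataset id, metadata untouched."""
    members: dict[str, Member]
    label: str = "batch"

    def metadata(self, key: str) -> dict[str, Any]:
        # Experiment-level metadata is looked up per member, never merged.
        return self.members[key].uns

    def iter_obs(self) -> Iterator[dict[str, Any]]:
        # Concatenated view of the observations, produced lazily.
        for key, member in self.members.items():
            for row in member.obs:
                yield {**row, self.label: key}


coll = Collection({
    "A": Member(obs=[{"cell": "c1"}], uns={"dataset_id": "A", "files": ["a.h5"]}),
    "B": Member(obs=[{"cell": "c2"}], uns={"dataset_id": "B", "files": ["b.h5"]}),
})
rows = list(coll.iter_obs())
```

Because the batch label doubles as the key into `members`, per-dataset metadata stays reachable from any observation in the concatenated view.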

Question

Should there be conventions for maintaining metadata with concatenated objects? Should we insist on collections of objects if we want to maintain metadata?

Relatedly, regarding #3: what would the obs_subset of a concatenated object be?
