Passing metadata (/configuration?) through DataFrame.attrs #151

### The feature

It would be really nice to have a way of associating chromosome info with the dataframe containing the ranges. I would propose using `pd.DataFrame.attrs` for storing metadata such as chromosome info and column names.

### Why

`GRanges` objects from Bioconductor have a `@seqinfo` attribute that contains sequence info about the assembly being used. For example:

```R
library(EnsDb.Hsapiens.v86)
ensdb = EnsDb.Hsapiens.v86
g = genes(ensdb)
head(g, 3)
# GRanges object with 3 ranges and 6 metadata columns:
#                   seqnames      ranges strand |         gene_id   gene_name           gene_biotype seq_coord_system      symbol                       entrezid
#                      <Rle>   <IRanges>  <Rle> |     <character> <character>            <character>      <character> <character>                         <list>
#   ENSG00000223972        1 11869-14409      + | ENSG00000223972     DDX11L1 transcribed_unproces..       chromosome     DDX11L1 100287596,100287102,727856,...
#   ENSG00000227232        1 14404-29570      - | ENSG00000227232      WASH7P unprocessed_pseudogene       chromosome      WASH7P                           <NA>
#   ENSG00000278267        1 17369-17436      - | ENSG00000278267   MIR6859-1                  miRNA       chromosome   MIR6859-1                      102466751
#   -------
#   seqinfo: 357 sequences (1 circular) from GRCh38 genome
g@seqinfo
# Seqinfo object with 357 sequences (1 circular) from GRCh38 genome:
#   seqnames seqlengths isCircular genome
#   1         248956422      FALSE GRCh38
#   10        133797422      FALSE GRCh38
#   11        135086622      FALSE GRCh38
#   12        133275309      FALSE GRCh38
#   13        114364328      FALSE GRCh38
#   ...             ...        ...    ...
#   LRG_741      231167      FALSE GRCh38
#   LRG_93        22459      FALSE GRCh38
#   MT            16569       TRUE GRCh38
#   X         156040895      FALSE GRCh38
#   Y          57227415      FALSE GRCh38
```

It would be nice if we could also attach this kind of information to our range dataframe for use with bioframe. This could be done by putting something equivalent to `@seqinfo` into the `pd.DataFrame.attrs` attribute. Something similar could also be done for different range column names.
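As a rough sketch of what this could look like in plain pandas: the `"seqinfo"` key, its layout, and the `chromsizes_from_attrs` helper are all made up for illustration and are not existing bioframe or pandas API. The values are copied from the R output above.

```python
import pandas as pd

# Hypothetical key name; nothing in bioframe or pandas defines it.
SEQINFO_KEY = "seqinfo"

genes = pd.DataFrame(
    {
        "chrom": ["1", "1", "1"],
        "start": [11869, 14404, 17369],
        "end": [14409, 29570, 17436],
        "strand": ["+", "-", "-"],
    }
)

# Plain dicts keep the metadata easy to copy and serialize.
genes.attrs[SEQINFO_KEY] = {
    "genome": "GRCh38",
    "seqlengths": {"1": 248956422, "MT": 16569},
    "is_circular": {"1": False, "MT": True},
}


def chromsizes_from_attrs(df: pd.DataFrame) -> pd.Series:
    """Return chromosome lengths stored in df.attrs (hypothetical helper)."""
    seqinfo = df.attrs.get(SEQINFO_KEY)
    if seqinfo is None:
        raise KeyError(f"no {SEQINFO_KEY!r} entry in DataFrame.attrs")
    return pd.Series(seqinfo["seqlengths"], name="length", dtype="int64")


chromsizes = chromsizes_from_attrs(genes)
```

A function that needs chromosome sizes could then fall back to something like `chromsizes_from_attrs(df)` when the caller does not pass them explicitly.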

#### Current use of global configuration

With `cols`, this library already provides ways of setting different values without needing to pass them all the time ([docs](https://bioframe.readthedocs.io/en/latest/guide-intervalops.html#flexible-column-naming)). These work either by setting a global config or by temporarily modifying that config with a context manager.

I think both of these are less ergonomic than carrying the metadata on the dataframe:

* They require explicit code for something which could be explicit in the data, but implicit in the code.
* They're global, and don't allow working with different configurations at the same time.
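To make the contrast concrete, here is a minimal sketch of per-dataframe column names, assuming a hypothetical `"cols"` key in `attrs`. `resolve_cols` and `interval_lengths` are illustrative helpers, not bioframe functions; only the `("chrom", "start", "end")` default mirrors bioframe's usual column names.

```python
from typing import Optional, Tuple

import pandas as pd

Cols = Tuple[str, str, str]

DEFAULT_COLS: Cols = ("chrom", "start", "end")
COLS_KEY = "cols"  # hypothetical attrs key


def resolve_cols(df: pd.DataFrame, cols: Optional[Cols] = None) -> Cols:
    """Pick column names: explicit argument, then df.attrs, then the default."""
    if cols is not None:
        return tuple(cols)
    return tuple(df.attrs.get(COLS_KEY, DEFAULT_COLS))


def interval_lengths(df: pd.DataFrame, cols: Optional[Cols] = None) -> pd.Series:
    """Compute end - start using whichever column names apply to this frame."""
    _, start, end = resolve_cols(df, cols)
    return df[end] - df[start]


# Two frames with different naming conventions, used side by side
# without touching any global state.
a = pd.DataFrame({"chrom": ["chr1"], "start": [10], "end": [25]})
b = pd.DataFrame({"seqnames": ["chr1"], "lo": [100], "hi": [130]})
b.attrs[COLS_KEY] = ("seqnames", "lo", "hi")

len_a = interval_lengths(a)
len_b = interval_lengths(b)
```

Because the names travel with each dataframe, `a` and `b` can be processed in the same call stack, which a single global setting can't express.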

### Downsides

#### `pd.DataFrame.attrs`

The main downside is `pd.DataFrame.attrs` itself:

* It's still marked as experimental and may change.
* It doesn't show up in the repr, so it's not obvious if anything has been added.

I would hope that usage here could influence further development of the features.
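The repr point is easy to check. The snippet below is only a sketch, and `describe_with_attrs` is a hypothetical helper showing one way the metadata could be surfaced, not something pandas or bioframe provides.

```python
import pandas as pd

df = pd.DataFrame({"chrom": ["chr1"], "start": [0], "end": [100]})
df.attrs["genome"] = "GRCh38"

# The standard repr lists only the columns and values, not attrs.
attrs_in_repr = "GRCh38" in repr(df)


def describe_with_attrs(frame: pd.DataFrame) -> str:
    """Hypothetical helper: append the attrs entries to the usual repr."""
    lines = [repr(frame)]
    if frame.attrs:
        lines.append("attrs:")
        lines.extend(f"  {key}: {value!r}" for key, value in frame.attrs.items())
    return "\n".join(lines)


print(describe_with_attrs(df))
```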

#### May not work with other backends

It's not immediately obvious whether alternative backends (which I proposed in #137) would also support this kind of feature.

### Alternatives

* Do nothing, keep passing this metadata as is.
* Custom class of some sort (like Bioconductor)
  * Instead of a custom dataframe class, this could be a pandas extension array, which would be a lighter touch.
  * But this doesn't fit with the current `bioframe` design


 
