Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add peak variable support to MsBackendMemory and MsBackendDataFrame #297

Merged
merged 19 commits into from
Sep 22, 2023

Conversation

jorainer
Copy link
Member

This PR adds the peak variable functionality mentioned in issues #289 (and #287). In essence the promise of peaksData is to return a 2 dimensional data structure. Can be a matrix or data.frame. If only "mz" and "intensity" are requested MsBackendMemory and MsBackendDataFrame return a list of matrix, otherwise a list of data.frame.

- `peaksData,MsBackendMemory` returns by default a `list` of `matrix` or a
  `list` of `data.frame`s if other peak variables than `"mz"`, `"intensity"`
  are requested. Issue #289.
- Ensure lazy evaluation of processing queue in `peaksData,Spectra` does not
  break if requested `columns` do not contains `"mz"` and `"intensity"`. Issue
  #289.
- `applyProcessing` ensures that all peaks variables are properly updates/subset
  depending on the processing queue (issue #289).
- `spectraData<-,Spectra` and `$<-,Spectra` throw an error if processing queue
  is not empty and values for peaks variables are requested to be
  replaced. Issue #289.
- Add support for peak variables to `MsBackendDataFrame`. Issue #289.
- Add examples for peak variables to `MsBackend` and `Spectra` documentation.
- Expand documentation for peaks variables to the `Spectra` vignette.
- Fix issue in `MsBackendMemory` that allowed to have only either m/z or
  intensity as a peaks variable.
- Add support for peaks variables to `MsBackendDataFrame` and add/test its
  funcitionality (issue #289).
@jorainer jorainer requested a review from lgatto June 15, 2023 12:39
NEWS.md Outdated
- Add `combinePeaks` generics.
- Add `combinePeaks,Spectra` to combine peaks within each spectrum in a
`Spectra`.
>>>>>>> main
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oups

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed now :)

@jorainer jorainer requested a review from lgatto June 16, 2023 19:27
@@ -69,6 +69,26 @@ NULL
NULL
}

.peaks_variables <- function(x) {
if (x@version != "0.1") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we not agree to check for the presence of the peaksVariablesslot here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed now

columns = c("mz", "intensity")) {
if (length(object)) {
cns <- colnames(object@peaksData[[1L]])
if (length(columns) == length(cns) && all(cns == columns))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this not fail if I set peaksData(..., columns = c("intensity", "mz"))? Should cns and columns not be sorted in the last part of the if?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The aim of that if is that @peaksData is only returned if columns is equal to the colnames of the @peaksData matrices, i.e. only if columns = c("mz", "intensity) and colnames of @peaksData is c("mz", "intensity"). If columns is e.g. c("mz", "intensity", "peak_ann") or it is c("intensity", "mz") it should be FALSE to ensure that we combine the @peaksData with the @peaksDataFrame or we iterate over @peaksData to re-order the columns.

So, sorting would not be correct here because then the order of columns in peaks data would be e.g. "mz", "intensity" even if the user requested "intensity", "mz". MsBackend should allow to extract peaks data matrices in any user requested order.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

R/Spectra.R Outdated
#' `data.frame`), with each array providing the values for the requested
#' *peak variables* (by default `"mz"` and `"intensity"`). Optional parameter
#' `columns` is passed to the backend's `peaksData` function to allow
#' selection of specific (or additional) peaks variables (columns) that
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... allow the selection ...

setMethod(
"spectraData", "Spectra",
function(object, columns = spectraVariables(object)) {
if (length(object@processingQueue) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a suggestion, but I would find it useful to have some comments that describe the different situations here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean within the code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added comments in the code. Basically, if the processing queue is empty or the user does not request peak variables with columns the spectraData method from the backend is called. Otherwise we need to execute the processing queue (calling peaksData on the Spectra object) to ensure that e.g. peaks are filtered etc and add the individual peaks variables to the returned DataFrame.

@lgatto
Copy link
Member

lgatto commented Jun 19, 2023

For now, additional peaks variables are only possible for in-memory (i.e. Memory and DataFrame) backends, right? Additional ones could come (for example SQL-based backends), but not (possibly never) for the mzR backend. This could be mentioned/discussed either in the manual page and/or in the vignette.

@jorainer
Copy link
Member Author

For now, additional peaks variables are only possible for in-memory (i.e. Memory and DataFrame) backends, right?

Actually, no. Since e.g. MsBackendMzR and others inherit from MsBackendDataFrame (or MsBackendMemory) they support also peaks variables. For MsBackendMzR they would obviously not (yet) imported from mzML files directly, but it would be possible to add them later (with a future addPeaksVariable function). There is already some notes on peaks variable support of different backends in the (too large) documentation of Spectra objects. But that could then also be improved in the revamp of the docs (issue #288 )

@jorainer jorainer requested a review from lgatto June 20, 2023 15:50
@jorainer
Copy link
Member Author

Thanks @lgatto for the review. I have now addressed the comments.

@jorainer jorainer requested a review from sgibb June 28, 2023 05:20
R/MsBackend.R Outdated
#' `MsBackendMemory` supports also arbitrary peak annotations while the
#' `MsBackendDataFrame` does not have support for such additional peak
#' variables.
#' `"intensity"` is fastest for the `MsBackendMemory`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a missing ) after "intensity".

R/MsBackend.R Outdated
#' The `peaksData` function for `MsBackendMemory` and `MsBackendDataFrame`
#' returns a `list` of `numeric` `matrix` by default (with parameter
#' `columns = c("mz", "intensity")`). If other peak variables are requested,
#' a `list` of `data.frame` is returned (to ensure m/z and intensity values
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure to understand the "to ensure m/z ...". Should it not read "ensuring that m/z ..."?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, thanks

if (missing(data)) data <- DataFrame()
if (is.data.frame(data))
data <- DataFrame(data)
if (!is(data, "DataFrame"))
stop("'data' has to be a 'DataFrame'")
peaksVariables <- intersect(peaksVariables, colnames(data))
if (sum(c("mz", "intensity") %in% peaksVariables) == 1L)
Copy link
Member

@lgatto lgatto Aug 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand this check, it will throw an error if only one of "mz" or "intensity" are in the peaksVariables. If none are, this wouldn't trigger the error. Why not

sum(c("mz", "intensity") %in% peaksVariables) != 2L

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch! thanks, changed that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, after checking again: we were supporting the following situations:

  • user provides m/z and intensity
  • user does not provide neither m/z and intensity (in which case 0-length spectra are available, but the Spectra is still considered valid).

An error is only thrown if either m/z but not intensity, or intensity but not m/z is provided. from a data point of view that does not make sense, you can't have one of the two. you could either have both or none.

Open to discuss @lgatto when you think that's not a good idea.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, fine by me. May be just clarify with a comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jepp, will add that and also check that I mention it in documentation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added as a comment and docu.

columns = c("mz", "intensity")) {
if (length(object)) {
cns <- colnames(object@peaksData[[1L]])
if (length(columns) == length(cns) && all(cns == columns))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@lgatto lgatto self-requested a review September 22, 2023 11:54
@jorainer jorainer merged commit 585abb1 into main Sep 22, 2023
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants