-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add peak variable support to MsBackendMemory and MsBackendDataFrame #297
Conversation
- `peaksData,MsBackendMemory` returns by default a `list` of `matrix` or a `list` of `data.frame`s if other peak variables than `"mz"`, `"intensity"` are requested. Issue #289.
- Ensure lazy evaluation of processing queue in `peaksData,Spectra` does not break if requested `columns` do not contains `"mz"` and `"intensity"`. Issue #289.
- `applyProcessing` ensures that all peaks variables are properly updates/subset depending on the processing queue (issue #289).
- `spectraData<-,Spectra` and `$<-,Spectra` throw an error if processing queue is not empty and values for peaks variables are requested to be replaced. Issue #289.
- Add support for peak variables to `MsBackendDataFrame`. Issue #289. - Add examples for peak variables to `MsBackend` and `Spectra` documentation. - Expand documentation for peaks variables to the `Spectra` vignette.
- Fix issue in `MsBackendMemory` that allowed to have only either m/z or intensity as a peaks variable.
- Add support for peaks variables to `MsBackendDataFrame` and add/test its funcitionality (issue #289).
NEWS.md
Outdated
- Add `combinePeaks` generics. | ||
- Add `combinePeaks,Spectra` to combine peaks within each spectrum in a | ||
`Spectra`. | ||
>>>>>>> main |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oups
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed now :)
R/MsBackendDataFrame-functions.R
Outdated
@@ -69,6 +69,26 @@ NULL | |||
NULL | |||
} | |||
|
|||
.peaks_variables <- function(x) { | |||
if (x@version != "0.1") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we not agree to check for the presence of the peaksVariables
slot here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed now
columns = c("mz", "intensity")) { | ||
if (length(object)) { | ||
cns <- colnames(object@peaksData[[1L]]) | ||
if (length(columns) == length(cns) && all(cns == columns)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this not fail if I set peaksData(..., columns = c("intensity", "mz"))
? Should cns
and columns
not be sorted in the last part of the if
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The aim of that if
is that @peaksData
is only returned if columns
is equal to the colnames of the @peaksData
matrices, i.e. only if columns = c("mz", "intensity)
and colnames of @peaksData
is c("mz", "intensity")
. If columns
is e.g. c("mz", "intensity", "peak_ann")
or it is c("intensity", "mz")
it should be FALSE
to ensure that we combine the @peaksData
with the @peaksDataFrame
or we iterate over @peaksData
to re-order the columns.
So, sorting would not be correct here because then the order of columns in peaks data would be e.g. "mz"
, "intensity"
even if the user requested "intensity"
, "mz"
. MsBackend
should allow to extract peaks data matrices in any user requested order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
R/Spectra.R
Outdated
#' `data.frame`), with each array providing the values for the requested | ||
#' *peak variables* (by default `"mz"` and `"intensity"`). Optional parameter | ||
#' `columns` is passed to the backend's `peaksData` function to allow | ||
#' selection of specific (or additional) peaks variables (columns) that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... allow the selection ...
setMethod( | ||
"spectraData", "Spectra", | ||
function(object, columns = spectraVariables(object)) { | ||
if (length(object@processingQueue) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a suggestion, but I would find it useful to have some comments that describe the different situations here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean within the code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added comments in the code. Basically, if the processing queue is empty or the user does not request peak variables with columns
the spectraData
method from the backend is called. Otherwise we need to execute the processing queue (calling peaksData
on the Spectra
object) to ensure that e.g. peaks are filtered etc and add the individual peaks variables to the returned DataFrame
.
For now, additional peaks variables are only possible for in-memory (i.e. Memory and DataFrame) backends, right? Additional ones could come (for example SQL-based backends), but not (possibly never) for the mzR backend. This could be mentioned/discussed either in the manual page and/or in the vignette. |
Actually, no. Since e.g. |
Thanks @lgatto for the review. I have now addressed the comments. |
R/MsBackend.R
Outdated
#' `MsBackendMemory` supports also arbitrary peak annotations while the | ||
#' `MsBackendDataFrame` does not have support for such additional peak | ||
#' variables. | ||
#' `"intensity"` is fastest for the `MsBackendMemory`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a missing ) after "intensity".
R/MsBackend.R
Outdated
#' The `peaksData` function for `MsBackendMemory` and `MsBackendDataFrame` | ||
#' returns a `list` of `numeric` `matrix` by default (with parameter | ||
#' `columns = c("mz", "intensity")`). If other peak variables are requested, | ||
#' a `list` of `data.frame` is returned (to ensure m/z and intensity values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure to understand the "to ensure m/z ...". Should it not read "ensuring that m/z ..."?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, thanks
if (missing(data)) data <- DataFrame() | ||
if (is.data.frame(data)) | ||
data <- DataFrame(data) | ||
if (!is(data, "DataFrame")) | ||
stop("'data' has to be a 'DataFrame'") | ||
peaksVariables <- intersect(peaksVariables, colnames(data)) | ||
if (sum(c("mz", "intensity") %in% peaksVariables) == 1L) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand this check, it will throw an error if only one of "mz" or "intensity" are in the peaksVariables. If none are, this wouldn't trigger the error. Why not
sum(c("mz", "intensity") %in% peaksVariables) != 2L
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch! thanks, changed that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, after checking again: we were supporting the following situations:
- user provides m/z and intensity
- user does not provide neither m/z and intensity (in which case 0-length spectra are available, but the
Spectra
is still considered valid).
An error is only thrown if either m/z but not intensity, or intensity but not m/z is provided. from a data point of view that does not make sense, you can't have one of the two. you could either have both or none.
Open to discuss @lgatto when you think that's not a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, fine by me. May be just clarify with a comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jepp, will add that and also check that I mention it in documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added as a comment and docu.
columns = c("mz", "intensity")) { | ||
if (length(object)) { | ||
cns <- colnames(object@peaksData[[1L]]) | ||
if (length(columns) == length(cns) && all(cns == columns)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
This PR adds the peak variable functionality mentioned in issues #289 (and #287). In essence the promise of
peaksData
is to return a 2 dimensional data structure. Can be amatrix
ordata.frame
. If only"mz"
and"intensity"
are requestedMsBackendMemory
andMsBackendDataFrame
return alist
ofmatrix
, otherwise alist
ofdata.frame
.