Feature/refactor DIMS PeakFinding #76

mraves2 · 2025-07-11T15:33:59Z

The PeakFinding step in the DIMS pipeline has been refactored. Instead of averaging the intensities for technical replicates and doing PeakFinding on the averages, the new method will do PeakFinding for each technical replicate and then average the peak intensities for every biological sample. To do this, several scripts have been modified:

MakeInit: changed variable name 'tmp' to 'replicates_persample'
GenerateBreaks: breaks and trim parameters put into separate RData files
AssignToBins: trim parameters read in separately, 'sample_name' changed to 'techrep_name', weighted mean for half-bad TICs
AverageTechReplicates: averaging part removed, script renamed to EvaluateTics. The txt file with info on samples and their corresponding technical replicates is updated
PeakFinding: new simplified way to find peaks for every technical replicate. First step: find regions of interest (roi) with some intensity; second step: integrate intensity in each roi by fitting a Gaussian curve
preprocessing/peak_finding_functions: new functions for PeakFinding. Note that two functions are borrowed from other R packages which will be included in the Docker image in the future. These two functions may not adhere to our coding standards
AveragePeaks: averages technical replicates after PeakFinding. Information on the technical replicates is read from a txt file that includes the scanmode.
CollectAveraged: collect averaged peaks for all biological samples
PeakGrouping: input from CollectAveraged
tests/testthat/test_peak_finding_functions: unit tests for PeakFinding functions. Note: no unit tests have been added for functions from external packages.
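
To illustrate the two PeakFinding steps described above, here is a minimal sketch in R. It is not the code from this PR: the function names `find_rois` and `gaussian_area`, the threshold, and the synthetic data are all hypothetical. The Gaussian is fitted by a swapped-in technique (a parabola on the log-intensities via `lm`), which may differ from the fitting method used in `peak_finding_functions`.

```r
# Hypothetical sketch, NOT the pipeline's implementation.

# Step 1: regions of interest = runs of consecutive points above a threshold.
find_rois <- function(intensity, threshold) {
  runs <- rle(intensity > threshold)
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  data.frame(start = starts[runs$values], end = ends[runs$values])
}

# Step 2: fit a Gaussian to one ROI and integrate it analytically.
# log(intensity) of a Gaussian is a parabola in m/z, so a quadratic
# least-squares fit on the log scale recovers height and sigma.
gaussian_area <- function(mz, intensity) {
  stopifnot(length(mz) >= 3, all(intensity > 0))
  width <- max(mz) - min(mz)
  x <- (mz - mean(mz)) / width          # centre and scale for numerical stability
  b <- unname(coef(lm(log(intensity) ~ x + I(x^2))))
  if (is.na(b[3]) || b[3] >= 0) return(NA_real_)  # not peak-shaped
  sigma <- sqrt(-1 / (2 * b[3])) * width          # back to m/z units
  height <- exp(b[1] - b[2]^2 / (4 * b[3]))
  height * sigma * sqrt(2 * pi)                   # area under the Gaussian
}

# Synthetic example: one Gaussian peak at m/z 100 (made-up values).
mz <- seq(99.99, 100.01, by = 0.001)
intensity <- 1000 * exp(-(mz - 100)^2 / (2 * 0.002^2))

rois <- find_rois(intensity, threshold = 10)
areas <- vapply(seq_len(nrow(rois)), function(i) {
  idx <- rois$start[i]:rois$end[i]
  gaussian_area(mz[idx], intensity[idx])
}, numeric(1))
```

In this sketch each ROI yields one integrated intensity per technical replicate; averaging those values per biological sample would then happen in a later step, as described for AveragePeaks.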

…essing

… file with scanmode

…akFinding method

fdekievit

A lot of work!

I left a few comments here and there :)

fdekievit · 2025-07-25T11:25:57Z

DIMS/CollectAveraged.nf

+
+    script:
+        """
+        Rscript ${baseDir}/CustomModules/DIMS/CollectAveraged.R


This Rscript is called without arguments, but the R script consumes cmd_args[1] (DIMS/CollectAveraged.R, line 4). Is this correct?
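
For context, a small sketch of what happens in R when a script indexes a command-line argument that was never passed. The helper `get_required_arg` is hypothetical and not part of the pipeline; it only shows one possible way to fail early instead of silently continuing with `NA`.

```r
# Hypothetical helper, not part of the DIMS scripts.
get_required_arg <- function(cmd_args, position, name) {
  if (length(cmd_args) < position || is.na(cmd_args[position])) {
    stop(sprintf("missing command-line argument %d (%s)", position, name))
  }
  cmd_args[position]
}

# In a script started with Rscript, the vector would normally come from:
#   cmd_args <- commandArgs(trailingOnly = TRUE)
# Without arguments this is an empty character vector:
no_args <- character(0)
is.na(no_args[1])  # TRUE: indexing past the end gives NA, not an error
```

Whether the missing argument matters here depends on how the script uses `cmd_args[1]` afterwards.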

fdekievit · 2025-07-25T11:58:34Z

DIMS/AssignToBins.R


 # get TIC intensities for areas between trim_left and trim_right
-tic_intensity_persample <- cbind(round(raw_data@scantime, 2), raw_data@tic)
+tic_intensity_persample <- cbind(raw_data@scantime, raw_data@tic)


Is it correct that rounding is no longer needed at all?

Correct, these numbers are only used to make a plot, so rounding does not matter.

fdekievit · 2025-07-25T12:02:05Z

DIMS/AveragePeaks.R

+techreps <- cmd_args[2]
+scanmode <- cmd_args[3]
+tech_reps <- strsplit(techreps, ";")[[1]]
+print(sample_name)


Is this only for debugging, or should it stay in for the release?

Print statements are for debugging only. Removed.

fdekievit · 2025-07-25T12:03:49Z

DIMS/AveragePeaks.R

+print(scanmode)
+
+# set ppm as fixed value, not the same ppm as in peak grouping
+ppm_peak <- 2


The comment here says 'not the same ppm as in peak grouping', but in peak grouping this value is assigned dynamically. Can the code there sometimes end up using 2 anyway, and if so, is it a problem if both are the same, and why?

The ppm parameter is used in 4 different places in the workflow and can be set when starting the workflow; it is mainly used for annotation at a given mass plus or minus a tolerance that is calculated from ppm.
The ppm_peak parameter is only used in PeakFinding and has a fixed value. It is used to find matching peaks across different samples and combine them into a peak group. The value of 2 reflects the accuracy of the MS instrument.
It is not a problem if ppm and ppm_peak have the same value, because they are not used in the same script and serve a different purpose.
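
As a worked example of a mass tolerance derived from ppm, here is a short sketch in R. The function name `ppm_window` and the example mass are hypothetical; the sketch uses only the standard definition of parts per million and is not taken from the workflow scripts.

```r
# Hypothetical helper: m/z window for a given mass and ppm tolerance.
ppm_window <- function(mz, ppm) {
  tolerance <- mz * ppm / 1e6   # ppm = parts per million of the mass itself
  c(lower = mz - tolerance, upper = mz + tolerance)
}

# With ppm_peak = 2 at m/z 200 the tolerance is 200 * 2 / 1e6 = 0.0004
window <- ppm_window(200, 2)
```

Because the tolerance scales with the mass, the same ppm value gives a wider absolute window at higher m/z.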

fdekievit · 2025-07-25T13:20:47Z