
@mraves2 (Contributor) commented Jul 17, 2025

The HMDB database (including adducts and isotopes) is split into parts to enable parallel processing. In the old setup, each part contained 20,000 lines; in the new setup, each part covers a specific m/z range. Below m/z 100, narrower ranges are used because peaks are more abundant at lower masses.
No changes to the DIMS repository are needed for this change.
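A minimal R sketch of the difference between the two splitting strategies. The table `hmdb`, its `mz` column, and the segment boundaries are hypothetical illustration values, not the repository's actual data or code. `findInterval()` assigns each row to the segment whose lower bound it lies at or above, and `split()` groups the rows into parts.

```r
# Hypothetical HMDB-like table; the column name `mz` is an assumption.
hmdb <- data.frame(
  name = c("a", "b", "c", "d", "e", "f"),
  mz   = c(71.2, 73.9, 88.0, 104.5, 117.3, 152.8)
)

# Old approach: a fixed number of rows per part
# (20,000 in the PR; 2 here so the toy table yields several parts).
rows_per_part <- 2
old_parts <- split(hmdb, ceiling(seq_len(nrow(hmdb)) / rows_per_part))

# New approach: each part covers an m/z range. Hypothetical segment starts:
# width 5 below m/z 100, width 10 from 100 upward.
segment_starts <- c(seq(70, 95, by = 5), seq(100, 150, by = 10))

# Index of the segment each row falls into, then label parts by lower bound.
part_index <- findInterval(hmdb$mz, segment_starts)
new_parts <- split(hmdb, segment_starts[part_index])

names(new_parts)
```

This sketch assumes every m/z value lies at or above the first boundary; a value below it would get index 0 and drop out of `segment_starts[part_index]`. Values at or above the last start all fall into the last part.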

@rernst (Member) left a comment

One minor comment; let me know if you would like to update the code.

Comment on lines 31 to 41
while (segment_end < max_mz) {
  if (segment_start < 100) {
    mz_segments <- c(mz_segments, segment_start)
    segment_start <- segment_start + 5
    segment_end <- segment_end + 5
  } else {
    mz_segments <- c(mz_segments, segment_start)
    segment_start <- segment_start + 10
    segment_end <- segment_end + 10
  }
}

To separate the logic (in preparation for tests, and to make it easier to maintain), I would write it like this:

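while (segment_end < max_mz) {
  # Use narrower m/z segments below 100, where peaks are more abundant
  if (segment_start < 100) {
    segment_size <- 5
  } else {
    segment_size <- 10
  }
  # Record the start of the current segment, then advance both bounds
  mz_segments <- c(mz_segments, segment_start)
  segment_start <- segment_start + segment_size
  segment_end <- segment_end + segment_size
}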
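Following up on the testing point, here is a minimal R sketch that pulls the segment-size decision and the loop into two small functions. The names `segment_size()` and `mz_segment_starts()` are hypothetical. Because the initial values of `segment_start` and `segment_end` are set before the quoted lines, they are passed in as arguments here.

```r
# Hypothetical helper: width of the m/z segment that starts at `segment_start`.
# Width 5 below m/z 100 and 10 from 100 upward, as in the reviewed code.
segment_size <- function(segment_start) {
  if (segment_start < 100) 5 else 10
}

# Hypothetical wrapper around the reviewed loop. The initial bounds are
# arguments because the real script sets them before the quoted lines.
mz_segment_starts <- function(segment_start, segment_end, max_mz) {
  mz_segments <- c()
  while (segment_end < max_mz) {
    size <- segment_size(segment_start)
    # Record the current segment start, then advance both bounds together
    mz_segments <- c(mz_segments, segment_start)
    segment_start <- segment_start + size
    segment_end <- segment_end + size
  }
  mz_segments
}

mz_segment_starts(90, 95, 130)
```

For example, `mz_segment_starts(90, 95, 130)` returns `c(90, 95, 100, 110, 120)`: two 5-wide steps up to m/z 100, then 10-wide steps until the segment end reaches `max_mz`. Because `segment_size()` has no side effects, it can be unit-tested on its own, separately from the loop.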

@mraves2 (Contributor, Author) replied:

Good suggestion; I implemented the modification.

@rernst self-requested a review on September 30, 2025, 11:11.