Merged
- Update correlation matrix computation to only calculate off-diagonal elements
- Simplify initialization of the correlation matrix with NaN values
- Remove redundant NaN assignments within the correlation calculation loop
- Introduce additional features including cycle_fwhm, mz_observed, mz_library, and mz_calibrated
- Expand error metrics with mean_ms2_mass_error and top_3_ms2_mass_error
- Include top3_frame_correlation and top3_b_ion_correlation for improved correlation analysis
- Comment out unused correlation features for clarity
- Change LOG2_TRANSFORM_FEATURES and ALIGN_FEATURES from "intensity" to "ms2_intensity" for improved clarity and alignment with data processing standards and to avoid confusion between ms1 and ms2 intensity.
- Change occurrences of "intensity" to "ms2_intensity" and "ms1_intensity" to improve clarity and maintain consistency across data processing.
- Replace occurrences of "intensity" with "ms2_intensity" in the PreprocessingPipeline class for consistency with recent changes in data handling.
- Adjust column dropping and reordering logic to align with the updated feature names.
- Add "height" to LOG2_TRANSFORM_FEATURES and ALIGN_FEATURES for enhanced feature representation and consistency in data processing.
- Update column dropping logic to utilize ColumnConfig.IDENTIFIERS instead of hardcoded categorical features for better maintainability.
- Enhance logging message formatting for clarity.
- Modify load_fragment_data_files and load_precursor_file methods to accept an optional directory argument, defaulting to the current working directory if not provided.
- Move _extract_data out of the _prepare_data method for more dynamic data handling.
- Update _prepare_data method to directly use the log-transformed data, enhancing readability and maintainability.
- Introduce SharedState class to encapsulate state information shared between multiple classes.
- Create a singleton instance of SharedState for easy access in other modules.
…gement
- Replace local variable for sorted_columns with shared_state.sorted_columns to enable cross-module access.
- Store level information in shared_state for improved data extraction consistency.
… representation
- Update the return value of the ensemble prediction method to include a DataFrame with normalized predictions, leveraging shared_state for column names and index information.
- This change improves data organization and consistency across modules.
…d data accuracy
- Modify the _shift_prediction method to accept shared_state.lin_scaled_data, ensuring predictions are accurately transformed back to the original scale.
- Update the model initialization to include a PreprocessingPipeline instance for better data handling.
- Modify the assignment of extracted intensity data to use shared_state.lin_scaled_data, ensuring consistency in data handling across modules.
…ment
- Introduce lin_scaled_data attribute in the SharedState class to facilitate better handling of scaled data across modules, ensuring consistency in data processing.
- Introduce ModelConfig class to encapsulate model configuration parameters, enhancing organization and readability.
- Update the number of engineered features from 2 to 3 and adjust input size calculation accordingly.
- Consolidate configuration parameters into a structured CONFIG dictionary for better maintainability and clarity.
…management
- Remove unused feature names from MS2_FEATURE_NAMES for clarity.
- Update LOG2_TRANSFORM_FEATURES and ALIGN_FEATURES to include additional features for improved data processing.
- Introduce PRECURSOR_IDENTIFIERS in ColumnConfig to expand identifier options for better data handling.
…uced dependencies
…n for cleaner output
- Introduce load_features method to load both precursor and fragment features from specified directories.
- Add logging statements to track the reading and processing of MS1 and MS2 features for better debugging.
- Update feature naming conventions to include prefixes for clarity.
- Remove unused methods to streamline the Loader class.
…a tracking
- Introduce ms2_identifiers and ms1_identifiers attributes to the SharedState class to facilitate better tracking of identifiers across modules.
- Add extracted_ms2_identifiers and extracted_ms1_identifiers attributes to improve data management and consistency in identifier handling.
…ctionality
- Add docstring to the get_logger function for better documentation.
- Ensure consistent logging format with a specified date format.
- Remove commented-out code for cleaner implementation.
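For readers who want a picture of the get_logger change in the last commit, here is a minimal sketch of a documented logger factory with a fixed message and date format. The format strings, the level parameter, and the handler check are assumptions for illustration; the PR page does not show the actual implementation.

```python
import logging

# Hypothetical format strings; the real ones are not visible in this PR page.
LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def get_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a logger that writes to stderr with a fixed message and date format.

    Calling this repeatedly with the same name returns the same logger
    and does not attach duplicate handlers.
    """
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(fmt=LOG_FORMAT, datefmt=DATE_FORMAT))
        logger.addHandler(handler)
    return logger
```

A module would then call something like `logger = get_logger(__name__)` once at import time and reuse it.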
mschwoer
approved these changes
Mar 19, 2025
) # Set to NaN if not enough data points

np.fill_diagonal(correlation_matrix, np.nan)
for j in range(n):
Collaborator
if i==j: continue
saves a level of indent ;-)
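The suggestion can be sketched as follows, assuming the matrix is initialized with NaN so the diagonal never needs to be written. The function name, the row-wise layout, and the min_points threshold are hypothetical; only the early `continue` on `i == j` comes from the comment above.

```python
import numpy as np


def pairwise_correlation(data: np.ndarray, min_points: int = 3) -> np.ndarray:
    """Pearson correlation between the rows of `data`, ignoring NaNs pairwise.

    The diagonal, and any pair with fewer than `min_points` shared
    non-NaN values, stays NaN.
    """
    n = data.shape[0]
    correlation_matrix = np.full((n, n), np.nan)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # diagonal keeps its NaN from the initialization
            valid = ~np.isnan(data[i]) & ~np.isnan(data[j])
            if valid.sum() < min_points:
                continue  # not enough data points, leave as NaN
            correlation_matrix[i, j] = np.corrcoef(data[i, valid], data[j, valid])[0, 1]
    return correlation_matrix
```

The early `continue` keeps the loop body at one indentation level instead of nesting it under an `if i != j:` block.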
@@ -0,0 +1,15 @@
class SharedState:
Collaborator
not sure if this is a good pattern .. could the modules not just pass the data around?
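One way to read "pass the data around" is to have the extraction step return a small immutable container that the prediction step takes as an argument, instead of both reading a module-level singleton. The sketch below uses plain Python types; ExtractedData, extract, and label_predictions are hypothetical names, and only the field names sorted_columns and lin_scaled_data echo the commit messages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedData:
    """Hypothetical container for what one step hands to the next."""
    lin_scaled_data: list[list[float]]  # rows x columns
    sorted_columns: list[str]
    index: list[str]


def extract(table: dict[str, list[float]], index: list[str]) -> ExtractedData:
    """Stand-in for the extraction step: sort columns and collect row values."""
    sorted_columns = sorted(table)
    rows = [[table[col][r] for col in sorted_columns] for r in range(len(index))]
    return ExtractedData(rows, sorted_columns, index)


def label_predictions(
    predictions: list[list[float]], extracted: ExtractedData
) -> dict[str, dict[str, float]]:
    """Stand-in for the prediction step: label values using the passed-in metadata."""
    return {
        row_id: dict(zip(extracted.sorted_columns, row))
        for row_id, row in zip(extracted.index, predictions)
    }
```

With this shape, each function's inputs are visible in its signature, and tests can build an ExtractedData directly without resetting global state.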
@@ -1,41 +1,45 @@
import torch.nn as nn
Collaborator
question on the PR description: this is AI-generated I guess?
if so, please prompt it to be more concise there.. it should be one level above the actual code changes ;-)
if not: please do not invest so much time in that :-D
This pull request includes several changes to improve the configuration, data loading, and preprocessing in the selectlfq package. The most important changes include the introduction of a ModelConfig class, updates to data configuration constants, enhancements to the data loading process, and refinements in the preprocessing pipeline.

Configuration Updates:
- Added ModelConfig class with constants for engineered and removed features in selectlfq/config.py.
- Updated CONFIG dictionary to include comments and changed verbose parameter to True in selectlfq/config.py. [1] [2]

Data Configuration Constants:
- Updated DataConfig to comment out unused MS2 feature names and added new lists for log2 transform and alignment features in selectlfq/constants.py.
- Added PRECURSOR_IDENTIFIERS to ColumnConfig in selectlfq/constants.py.

Data Loading Enhancements:
- Added load_features method to Loader class and updated existing methods to handle default directories and logging in selectlfq/loader.py. [1] [2] [3] [4]
- … selectlfq/loader.py.

Preprocessing Refinements:
- … selectlfq/preprocessing.py. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11]

Miscellaneous Changes: