-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Move NA values to end of table #124
Conversation
WalkthroughThe changes to the Changes
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (1)
workflow/scripts/compare_diffexp.py (1)
49-62
: Improvements in data handling and sortingThe changes in this segment effectively address the PR objective of moving NA values to the end of the table. Here are the key points:
- The added print statement at line 49 may be useful for debugging. Consider removing it or wrapping it in a debug condition for production use.
- Handling of NaN values for 'pi_value' and 'difference' columns (lines 55-56) improves data quality and prevents potential issues in downstream operations.
- The sorting modification with
nulls_last=True
(line 61) ensures that NA values are moved to the end of the table, aligning with the PR objective.These changes enhance the robustness of the data processing pipeline.
Consider wrapping the print statement in a debug condition or removing it for production use.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- workflow/scripts/compare_diffexp.py (2 hunks)
🧰 Additional context used
🪛 Ruff
workflow/scripts/compare_diffexp.py
76-76: SyntaxError: Expected a statement
76-77: SyntaxError: Expected a statement
🔇 Additional comments (1)
workflow/scripts/compare_diffexp.py (1)
Line range hint
1-150
: Overall improvements and addition of data visualizationThe changes in this file effectively address the PR objective of moving NA values to the end of the table. The modifications to the data processing pipeline, particularly the handling of NaN values and the sorting with
nulls_last=True
, achieve this goal.Additionally, the implementation of an Altair chart significantly enhances the data visualization capabilities of the script. This is a valuable addition, although it wasn't explicitly mentioned in the PR objectives.
The changes meet the stated PR objective and provide additional functionality through data visualization.
To ensure that the Altair chart implementation doesn't introduce any new dependencies or conflicts, please run the following script:
This script will help verify that Altair is properly imported and listed as a dependency in the project.
✅ Verification successful
To ensure that Altair is properly imported and listed as a dependency, please run the following updated script:
This script corrects the previous command and expands the search to additional dependency files to verify that Altair is properly included in the project dependencies.
To further verify the Altair import and dependency management, please run the following updated script:
This script adjusts the
rg
command to search for Python files based on their.py
extension and performs a broader search for any references to Altair across all files in the repository. Additionally, it lists all Python files to help understand the project's structure and dependency management approach.
Dependency Verification Successful
The verification confirms that Altair is correctly imported in multiple scripts and is listed in the project's environment YAML files (
QC.yaml
,pystats.yaml
). This ensures that Altair dependencies are properly managed.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify Altair-related imports and check for any new dependencies # Test: Check for Altair import in other files rg --type python "import altair as alt" --files-without-match workflow/scripts/compare_diffexp.py # Test: Check for any new dependencies in the project if [ -f "requirements.txt" ]; then grep "altair" requirements.txt elif [ -f "environment.yml" ]; then grep "altair" environment.yml else echo "No requirements.txt or environment.yml found. Please ensure Altair is properly listed as a dependency." fiLength of output: 404
Script:
#!/bin/bash # Description: Verify Altair-related imports and check for any new dependencies # Corrected Test: Check for Altair import in other Python files rg -t python "import altair as alt" --files-without-match workflow/scripts/compare_diffexp.py # Test: Check for Altair in common dependency files DEPENDENCY_FILES=("requirements.txt" "environment.yml" "setup.py" "pyproject.toml" "Pipfile") found_dependency=0 for file in "${DEPENDENCY_FILES[@]}"; do if [ -f "$file" ]; then grep "altair" "$file" if [ $? -eq 0 ]; then found_dependency=1 break fi fi done if [ $found_dependency -eq 0 ]; then echo "Altair is not listed in any recognized dependency files. Please ensure it is added appropriately." fiLength of output: 791
Script:
#!/bin/bash # Description: Verify Altair-related imports and check for any new dependencies # Corrected Test: Search for Altair import in all Python files by extension rg "import altair as alt" --glob "*.py" --files-with-matches # Comprehensive Test: Search for Altair in all files to identify dependency declarations grep -r "altair" . # Optionally, list all Python files to understand the repository structure fd -e pyLength of output: 1478
workflow/scripts/compare_diffexp.py
Outdated
) | ||
print(combined) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Syntax error and additional print statement
There are two issues in this segment:
- There's an extra closing parenthesis at line 76, which is causing a syntax error. This needs to be removed.
- Another print statement is added at line 77, which might be useful for debugging but could clutter output in production.
To fix the syntax error and address the print statement, apply the following changes:
-)
-print(combined)
+print(combined) # Consider removing or wrapping in a debug condition
This will resolve the syntax error and maintain the print statement. However, consider removing the print statement or wrapping it in a debug condition for production use.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
) | |
print(combined) | |
print(combined) # Consider removing or wrapping in a debug condition |
🧰 Tools
🪛 Ruff
76-76: SyntaxError: Expected a statement
76-77: SyntaxError: Expected a statement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
workflow/rules/meta_comparisons.smk (2)
70-70
: LGTM! Completes consistent logging update.This change finalizes the transition from
.ipynb
to.log
format for all three rules in this file. The consistency in logging format acrossmeta_compare_diffexp
,meta_compare_enrichment
, andmeta_compare_pathways
rules is commendable.Consider updating any documentation or README files to reflect this change in logging format. This will help maintain clear and up-to-date information for other developers or users of the workflow.
Line range hint
14-70
: Overall changes improve logging, but direct connection to PR objective is unclear.The consistent change from
.ipynb
to.log
format across all three rules (meta_compare_diffexp
,meta_compare_enrichment
, andmeta_compare_pathways
) improves the logging mechanism and potentially simplifies log parsing. While this could indirectly support the PR objective of moving NA values to the end of tables by making log analysis easier, the direct connection is not immediately apparent from these changes alone.Consider adding a comment in the file or updating the PR description to clarify how these logging changes contribute to the overall objective of moving NA values to the end of the table. This would help reviewers and future maintainers understand the full context and impact of these modifications.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- workflow/rules/meta_comparisons.smk (3 hunks)
- workflow/scripts/compare_diffexp.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- workflow/scripts/compare_diffexp.py
🧰 Additional context used
🔇 Additional comments (2)
workflow/rules/meta_comparisons.smk (2)
45-45
: LGTM! Consistent change across rules.The modification from
.ipynb
to.log
format for logging is consistent with the change in themeta_compare_diffexp
rule. This uniformity in logging format across rules is a good practice.
14-14
: LGTM! Consider verifying downstream log processing.The change from
.ipynb
to.log
format for logging is appropriate and aligns with standard practices. This modification should make log parsing easier, potentially supporting the PR's objective of reorganizing table data.To ensure this change doesn't break any downstream processes, please run the following script:
✅ Verification successful
Verified: No remaining references to
.ipynb
log format found.The change from
.ipynb
to.log
format has been thoroughly verified. No references to the old.ipynb
log format exist in the codebase, ensuring that downstream processes remain unaffected.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Check for any references to the old .ipynb log format in the codebase # Search for references to the old .ipynb log format rg --type python --type yaml --type snakemake "logs/meta_compare_diffexp/.*\.ipynb" # If the above command returns any results, those files may need to be updated to use the new .log formatLength of output: 116
Script:
#!/bin/bash # Description: Check for any references to the old .ipynb log format in the codebase # Search for references to the old .ipynb log format rg --type py --type yaml --type snakemake 'logs/meta_compare_diffexp/.*\.ipynb' # If the above command returns any results, those files may need to be updated to use the new .log formatLength of output: 115
Script:
#!/bin/bash # Description: Check for any references to the old .ipynb log format in the codebase # Define snakemake file type rg --type-add 'snakemake:*.smk' --type snakemake 'logs/meta_compare_diffexp/.*\.ipynb'Length of output: 88
.with_columns( | ||
pl.col("pi_value").fill_nan(None), | ||
pl.col("difference").fill_nan(None), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems convoluted, but at least it's well documented:
https://docs.pola.rs/user-guide/expressions/missing-data/#notanumber-or-nan-values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Not sure though, why the CodeRabbit summary is so verbose...
🤖 I have created a release *beep* *boop* --- ## [2.8.2](v2.8.1...v2.8.2) (2024-10-18) ### Bug Fixes * Convert pval and qval to heatmap ([#123](#123)) ([478c759](478c759)) * Move NA values to end of table ([#124](#124)) ([df807a5](df807a5)) * Update max in memory rows for diffexp report ([#122](#122)) ([830e678](830e678)) ### Performance Improvements * Update to latest datavzrd wrapper ([#125](#125)) ([3a4c020](3a4c020)) * Update to latest datavzrd wrapper version ([#120](#120)) ([7b2767c](7b2767c)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Summary by CodeRabbit
New Features
Bug Fixes