fix: debug fgsea error #167
base: main
the fgsea script throws: "Error: C stack usage 7974932 is too close to the limit"
Walkthrough
The FGSEA R script was updated to add error handling infrastructure.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
workflow/scripts/fgsea.R(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: dlaehnemann
Repo: snakemake-workflows/rna-seq-kallisto-sleuth PR: 0
File: :0-0
Timestamp: 2025-10-10T07:43:04.543Z
Learning: In this Snakemake workflow project, tidyverse package functions (such as `read_tsv()` from `readr`) provide good error handling and clear error messages. Manual error handling with `tryCatch()` for common cases is not necessary when using tidyverse functions.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: run-three-prime-rna-workflow
- GitHub Check: run-rna-workflow
🔇 Additional comments (1)
workflow/scripts/fgsea.R (1)
17-39: Good: Debug print statements removed.
The removal of interim debug prints (`print(gene_sets)` and `print(ranked_genes)`) improves code cleanliness for production use. The data loading and preprocessing logic looks sound.
sink(log)
sink(log, type="message")

rlang::global_entrace()
🧩 Analysis chain
Verify that rlang::global_entrace() actually resolves the C stack error.
The PR description indicates a "C stack usage is too close to the limit" error. However, rlang::global_entrace() is primarily an error tracing utility that captures better backtraces—it doesn't directly address C stack limit issues, which typically arise from deep recursion or large nested data structures.
Could you confirm that adding this line actually resolves the C stack error? If the issue persists, consider these alternatives:
- Adjust the `nproc` parameter in the `fgsea()` call (line 52) to control parallelization
- Increase expression depth limit with `options(expressions = 50000)` before the `fgsea()` call
- Check if input data size can be reduced or chunked
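To judge how close the reported usage is to the limit, the number from the error message can be compared with the stack limit of the shell that launches R. The following is a minimal sketch, not part of the PR: `stack_usage_percent` is a hypothetical helper, and it assumes that R derives its C stack limit from the shell's soft limit as reported by `ulimit -s` (in KiB).

```shell
#!/bin/bash
# Hypothetical helper: percentage of a stack limit (given in KiB) that a
# reported usage (given in bytes) represents, using integer arithmetic.
stack_usage_percent() {
  local used_bytes=$1 limit_kib=$2
  echo $(( used_bytes * 100 / (limit_kib * 1024) ))
}

reported_bytes=7974932      # value copied from the error message
limit_kib=$(ulimit -s)      # soft stack limit of the current shell, in KiB

if [ "$limit_kib" = "unlimited" ]; then
  echo "stack limit is unlimited"
else
  echo "reported usage is $(stack_usage_percent "$reported_bytes" "$limit_kib")% of ${limit_kib} KiB"
fi
```

With a limit of 8192 KiB, the reported 7974932 bytes come to about 95% of the limit. Inside R, `Cstack_info()` reports the limit that R itself sees, which is the more direct check if the two differ.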
Run the script with the changes to confirm the C stack error is resolved:
🏁 Script executed:
#!/bin/bash
# Description: Check if rlang is listed as a dependency
# This helps verify if rlang::global_entrace() will work as expected
# Search for rlang in package dependencies (e.g., DESCRIPTION, environment.yaml, requirements files)
fd -t f -e yaml -e yml -e txt --exec rg -l "rlang" {}
rg -n "library.*rlang|require.*rlang" --type r
Length of output: 0
🏁 Script executed:
#!/bin/bash
# Find project structure and R package metadata files
echo "=== Project metadata files ==="
fd -t f \( -name "DESCRIPTION" -o -name "renv.lock" -o -name ".Rprofile" -o -name "environment.yml" \)
echo -e "\n=== Listing workflow/scripts directory ==="
fd -t f --base-directory workflow/scripts
echo -e "\n=== Contents of fgsea.R (first 60 lines) ==="
head -60 workflow/scripts/fgsea.R
echo -e "\n=== Search for package dependencies on rlang ==="
rg -n "rlang|fgsea" --type r -C 2 workflow/scripts/fgsea.R | head -80
Length of output: 6701
🏁 Script executed:
#!/bin/bash
# Verify if fgsea depends on rlang or tidyverse
echo "=== Search for package dependency information ==="
# Look for any R package metadata that might list dependencies
fd -t f \( -name "DESCRIPTION" -o -name "*.lock" \)
echo -e "\n=== Check common.R to understand what's loaded ==="
head -30 workflow/scripts/common.R
echo -e "\n=== Search for any version control or environment files ==="
fd -t f \( -name "*.lock" -o -name "*requirements*" -o -name "*.txt" \) --max-depth 2 2>/dev/null | head -20
Length of output: 1263
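The dependency searches above can also be written with plain `find` and `grep`, which avoids relying on `fd` and `rg` being installed. This is only a sketch under stated assumptions: `find_package_mentions` is a hypothetical helper, and `workflow/envs` is assumed to be where the environment YAML files live, following the usual Snakemake layout.

```shell
#!/bin/bash
# Hypothetical helper: list YAML files under a directory that mention a package.
find_package_mentions() {
  local pkg=$1 root=${2:-.}
  find "$root" -type f \( -name '*.yaml' -o -name '*.yml' \) \
    -exec grep -l -- "$pkg" {} +
}

# Assumed location of the environment definitions; skipped if it does not exist.
if [ -d workflow/envs ]; then
  find_package_mentions rlang workflow/envs
fi
```

An empty result would mean no environment file names `rlang` explicitly. The package could still be installed indirectly as a dependency of another package.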
🌐 Web query:
rlang global_entrace C stack error fix
💡 Result:
The C-stack errors are caused by stack overflows (deep/infinitely recursive code) while rlang’s global entrace/error handlers try to capture a backtrace. Fixes:
- Stop the recursion (best fix).
- Temporarily disable rlang’s global entracing when running code that may overflow:
- In your session: rlang::global_entrace(FALSE).
- Or remove/avoid calling rlang::global_entrace() from your Rprofile. [1][2]
- Upgrade rlang and R (rlang’s handlers use R >= 4.0 globalCallingHandlers; newer rlang releases include behavior and docs about stack-overflow handling). If you rely on handlers for diagnostics, keep rlang up to date. [1][3]
- If you must catch stack overflows, use tryCatch/try_fetch around the risky call (note: handlers behave differently for stack overflows and R >= 4.2). See rlang docs on try_fetch and stack overflows. [4]
References:
[1] rlang::global_entrace documentation. [2] rlang::entrace docs (how entrace/handlers work). [3] rlang global_handle / handlers docs. [4] rlang docs on try_fetch and stack overflows.
rlang::global_entrace() likely worsens rather than fixes C stack overflow errors.
Based on rlang documentation, global_entrace() captures detailed error backtraces and adds error handling overhead. For C stack overflow errors (like "C stack usage 7974932 is too close to the limit"), this approach is counterproductive because it increases stack consumption during error processing.
The recommended fixes for C stack overflows are:
- Disable `global_entrace()` rather than enable it
- Reduce `nproc` parameter in the `fgsea()` call (line 48) to decrease parallelization overhead
- Adjust `eps` parameter to affect convergence behavior
- Reduce input data size if possible
Please verify that adding this line actually resolves the reported C stack error. If the error persists, consider removing global_entrace() and implementing one of the alternatives above.
🤖 Prompt for AI Agents
In workflow/scripts/fgsea.R around line 5, the call to rlang::global_entrace()
is likely increasing stack usage and can worsen C stack overflow errors; remove
or comment out this line so backtrace capture is disabled, then in the fgsea()
invocation around line 48 reduce the nproc value (e.g., lower or set to 1) to
limit parallel threads, consider increasing the eps parameter to relax
convergence or otherwise tune it, and if feasible reduce the input data size;
after these changes run the failing analysis to confirm the C stack error is
resolved and only re-enable advanced error tracing if the overflow no longer
occurs.