Description
Summary
The S3 filesystem provider appears to propagate raw AWS SDK exceptions when operations fail due to credential or access issues. This results in cryptic error messages that don't help users understand what went wrong.
Example Error
When a user's AWS credentials cannot access an S3 path, they see errors like:
could not validate file format of 's3://annotation-cache/snpeff_cache/': Unable to marshall request to JSON: Key cannot be empty
A more helpful message would be something like:
Cannot access S3 path 's3://annotation-cache/snpeff_cache/': AWS credentials missing or access denied
Affected Use Cases
This affects nf-core pipelines that have default S3 paths for resources (e.g., annotation caches, iGenomes). Users with their own AWS credentials configured for different accounts see confusing errors when pipeline validation checks these paths.
Related user report: nf-core/sarek#2079
Why I Think This Can't Be Fixed in nf-schema
I attempted to handle this in nf-schema (nextflow-io/nf-schema#191) by catching exceptions during path validation, but couldn't find a reliable way to distinguish credential/access errors from other failures. The exceptions are generic SDK exceptions with varying message text.
I think the fix needs to happen somewhere in Nextflow's S3 handling, but I'm not confident about where or how.
What I'm Trying to Achieve
Tools like nf-schema need to distinguish between:
- Access errors (credentials missing, permission denied) → should skip validation gracefully
- Other errors → should fail or propagate normally
If nf-schema could catch a standard AccessDeniedException, it could skip path validation gracefully when users don't have access to optional S3 resources.
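As a rough illustration of the caller side, here is a minimal Java sketch of that skip-or-propagate behaviour. It assumes "standard AccessDeniedException" means `java.nio.file.AccessDeniedException`; the class `PathValidationSketch`, the `PathCheck` interface, and the `Outcome` enum are hypothetical names made up for this example and are not part of nf-schema or Nextflow.

```java
import java.io.IOException;
import java.nio.file.AccessDeniedException;

// Hypothetical sketch of a validator; not nf-schema's actual code.
final class PathValidationSketch {

    enum Outcome { VALID, SKIPPED }

    /** Hypothetical stand-in for whatever call touches the remote path. */
    interface PathCheck {
        void run() throws IOException;
    }

    /**
     * Runs the check. An access error skips validation with a warning;
     * every other IOException propagates to the caller unchanged.
     */
    static Outcome validate(String path, PathCheck check) throws IOException {
        try {
            check.run();
            return Outcome.VALID;
        } catch (AccessDeniedException e) {
            // Assumption: the filesystem provider raises this type for
            // missing credentials and for permission-denied responses.
            System.err.println("Skipping validation of '" + path + "': access denied");
            return Outcome.SKIPPED;
        }
    }
}
```

This only works if the provider raises a distinct exception type for access problems, which is the behaviour this issue is asking for.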
Speculative Diagnosis
I'm not familiar with the Nextflow codebase, so this may be completely off-base.
Looking at S3ObjectSummaryLookup.lookup(), it looks as though AWS SDK exceptions may propagate uncaught. I put together PR #6669 to illustrate the kind of fix I have in mind, but its approach of matching on error message strings ("Unable to marshall request") is clearly not viable.
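One alternative to string matching might be to translate on a structured field such as the HTTP status code. The Java sketch below shows the idea under heavy assumptions: `RemoteFailure` is a hypothetical stand-in for an SDK exception that carries a status code, and `S3ErrorTranslationSketch` is not an existing Nextflow class. I don't know whether the "Unable to marshall request" failure carries a status code at all; it may be raised client-side before any request is sent, in which case this would not cover it on its own.

```java
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.nio.file.NoSuchFileException;

// Hypothetical sketch; not code from Nextflow or the AWS SDK.
final class S3ErrorTranslationSketch {

    /** Hypothetical stand-in for an SDK exception that exposes an HTTP status. */
    static final class RemoteFailure extends RuntimeException {
        final int statusCode;

        RemoteFailure(int statusCode, String message) {
            super(message);
            this.statusCode = statusCode;
        }
    }

    /** Maps a remote failure to a standard java.nio.file exception, keeping the cause. */
    static IOException translate(String path, RemoteFailure failure) {
        switch (failure.statusCode) {
            case 403: {
                // Forbidden: report it as an access problem on this path.
                AccessDeniedException denied = new AccessDeniedException(
                        path, null, "AWS credentials missing or access denied");
                denied.initCause(failure);
                return denied;
            }
            case 404: {
                // Not found: the path itself does not exist.
                NoSuchFileException missing = new NoSuchFileException(path);
                missing.initCause(failure);
                return missing;
            }
            default:
                // Anything else stays a generic I/O error with the original cause attached.
                return new IOException("Cannot access S3 path '" + path + "'", failure);
        }
    }
}
```

A caller could then distinguish the cases by exception type instead of by message text, for example `catch (AccessDeniedException e)` to skip an optional resource.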
Related Issues
- #4351 "Unhelpful error message when service account is missing access to resource on GCP": similar issue for GCP error messages
- #4639 "Feature request: More explicit error message when publishing to blob storage fails due to insufficient permissions.": related error improvements for publish permissions
- nf-schema#191 "Skip format validation for cloud storage paths (S3, Azure, GCS)": attempted workaround in nf-schema (closed, not feasible)
- nf-schema#179 "Ignore Azure paths when validating directories": Azure-specific handling