
Improve S3 filesystem error messages when credentials are missing or access is denied #6668

@pinin4fjords


Summary

The S3 filesystem provider appears to propagate raw AWS SDK exceptions when operations fail due to credential or access issues. This results in cryptic error messages that don't help users understand what went wrong.

Example Error

When a user's AWS credentials cannot access an S3 path, they see errors like:

could not validate file format of 's3://annotation-cache/snpeff_cache/': Unable to marshall request to JSON: Key cannot be empty

A more helpful message would be something like:

Cannot access S3 path 's3://annotation-cache/snpeff_cache/': AWS credentials missing or access denied

Affected Use Cases

This affects nf-core pipelines that have default S3 paths for resources (e.g., annotation caches, iGenomes). Users with their own AWS credentials configured for different accounts see confusing errors when pipeline validation checks these paths.

Related user report: nf-core/sarek#2079

Why I Think This Can't Be Fixed in nf-schema

I attempted to handle this in nf-schema (nextflow-io/nf-schema#191) by catching exceptions during path validation, but couldn't find a reliable way to distinguish credential/access errors from other failures. The exceptions are generic SDK exceptions with varying message text.

I think the fix needs to happen somewhere in Nextflow's S3 handling, but I'm not confident about where or how.

What I'm Trying to Achieve

Tools like nf-schema need to distinguish between:

  1. Access errors (credentials missing, permission denied) → should skip validation gracefully
  2. Other errors → should fail or propagate normally

If nf-schema could catch a standard java.nio.file.AccessDeniedException, it could skip path validation gracefully when users don't have access to optional S3 resources.
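As a rough sketch of the consumer side, this is how a validator could use such an exception. It assumes the S3 provider throws java.nio.file.AccessDeniedException for credential and permission failures, which is what this issue asks for. PathProbe, OptionalPathValidator and ValidationOutcome are hypothetical names for illustration, not nf-schema or Nextflow APIs.

```java
import java.io.IOException;
import java.nio.file.AccessDeniedException;

// Hypothetical result of validating one optional S3 path.
enum ValidationOutcome { VALID, SKIPPED_NO_ACCESS }

// Hypothetical stand-in for the filesystem call the validator makes on an
// s3:// path (for example, reading its attributes through the S3 provider).
@FunctionalInterface
interface PathProbe {
    void check(String path) throws IOException;
}

final class OptionalPathValidator {
    private OptionalPathValidator() {}

    /**
     * Validates an optional path. Access errors are downgraded to a warning;
     * any other IOException (e.g. NoSuchFileException) propagates unchanged.
     */
    static ValidationOutcome validate(String path, PathProbe probe) throws IOException {
        try {
            probe.check(path);
            return ValidationOutcome.VALID;
        } catch (AccessDeniedException e) {
            // Access problem: warn and skip instead of failing the run.
            System.err.println("WARN: skipping validation of '" + path + "': " + e.getMessage());
            return ValidationOutcome.SKIPPED_NO_ACCESS;
        }
    }
}
```

The point of catching only AccessDeniedException is that case 2 above (other errors) keeps failing loudly, so a genuinely missing or mistyped path is still reported.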

Speculative Diagnosis

I'm not familiar with the Nextflow codebase, so this may be completely off-base.

Looking at S3ObjectSummaryLookup.lookup(), it seems that AWS SDK exceptions may propagate uncaught. I put together PR #6669 to illustrate the sort of solution I have in mind, but its approach of matching on error message strings ("Unable to marshall request") is clearly not viable.
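A more robust direction might be to translate failures using the structured fields the SDK exposes instead of message text. The sketch below assumes the provider can read the HTTP status code and S3 error code from the SDK exception (in the AWS SDK for Java v2, for instance, AwsServiceException exposes statusCode() and awsErrorDetails().errorCode()). S3Failure and S3ErrorTranslator are hypothetical names, and the set of access-related error codes is illustrative rather than exhaustive.

```java
import java.io.IOException;
import java.nio.file.AccessDeniedException;
import java.util.Set;

// Hypothetical, SDK-agnostic view of a failed S3 call. A real provider would
// populate this from the SDK exception's structured fields, not its message.
record S3Failure(Integer statusCode, String errorCode, String message) {}

final class S3ErrorTranslator {
    private S3ErrorTranslator() {}

    // Illustrative S3 error codes that point to a credential or permission problem.
    private static final Set<String> ACCESS_ERROR_CODES = Set.of(
            "AccessDenied", "AllAccessDisabled", "InvalidAccessKeyId",
            "SignatureDoesNotMatch", "ExpiredToken", "InvalidToken");

    static boolean isAccessError(S3Failure f) {
        // A 403 counts even without an error code (e.g. HEAD responses have no body).
        if (f.statusCode() != null && f.statusCode() == 403) return true;
        return f.errorCode() != null && ACCESS_ERROR_CODES.contains(f.errorCode());
    }

    // Maps a failure to a standard java.nio exception that callers can catch by type.
    static IOException translate(String s3Path, S3Failure f) {
        if (isAccessError(f)) {
            String reason = "AWS credentials missing or access denied"
                    + (f.errorCode() != null ? " (" + f.errorCode() + ")" : "");
            return new AccessDeniedException(s3Path, null, reason);
        }
        return new IOException("S3 request for '" + s3Path + "' failed: " + f.message());
    }
}
```

Checking the status code as well as the error code matters because S3 responses to HEAD requests carry no body, so a 403 there may arrive without an error code. A missing-credentials failure raised on the client before any request is sent has no status code at all and would need its own check, which this sketch leaves out.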
