@pinin4fjords commented Dec 17, 2025

Summary

Gracefully handle inaccessible cloud storage paths in format validation, instead of failing with cryptic errors.

Problem

Cloud storage paths (s3://, az://, gs://) may fail validation when users don't have credentials or permissions to access them. This causes errors like:

could not validate file format of 's3://annotation-cache/snpeff_cache/': Unable to marshall request to JSON: Key cannot be empty

This affects multiple nf-core pipelines that have default S3 paths for resources like annotation caches or iGenomes.

Why this affects multiple pipelines

The issue is triggered by format: directory-path (and similar format keywords) even without exists: true. The format evaluator calls file.exists() to verify the path is accessible and is actually a directory:

file = Nextflow.file(value) as Path
file.exists()  // This triggers cloud access → fails without credentials

| Pipeline | Schema pattern | Why it fails |
|---|---|---|
| nf-core/sarek | format: directory-path + exists: true | Both FormatDirectoryPathEvaluator AND ExistsEvaluator try to access S3 |
| nf-core/riboseq | format: directory-path only | FormatDirectoryPathEvaluator tries to access S3 |

The symptom is the same and the triggers differ slightly, but both cases are fixed by handling inaccessible cloud paths gracefully in the evaluators.

Solution

Take a targeted approach that preserves validation when possible:

  1. Still attempt validation for cloud paths (preserves validation for users with proper access)
  2. Catch exceptions from exists()/isDirectory() calls for cloud paths and skip validation gracefully instead of failing
  3. DO NOT give cloud paths a free pass on legitimate validation failures - only skip when access fails

This means:

  • ✅ Users with cloud access → validation works normally
  • ✅ Users without cloud access → validation skipped gracefully (no error)
  • ✅ Cloud path that is accessible but wrong type → validation fails correctly
  • ✅ Local paths → unchanged behavior
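
The four outcomes above can be illustrated with a minimal sketch. This is written in Java for illustration and is not the plugin's Groovy code: the class `DirectoryPathCheck`, the `Outcome` enum, and the injected `isDirectory` predicate are all hypothetical names. The predicate stands in for the real `exists()`/`isDirectory()` calls so the behaviour can be exercised without any cloud access.

```java
import java.util.function.Predicate;

class DirectoryPathCheck {
    // Hypothetical result type, used only in this sketch.
    enum Outcome { VALID, INVALID, SKIPPED }

    // Assumption: a simple prefix test on the schemes named in this PR.
    static boolean isCloudStoragePath(String value) {
        return value.startsWith("s3://") || value.startsWith("az://") || value.startsWith("gs://");
    }

    /**
     * isDirectory stands in for the filesystem check; for a cloud path it may
     * throw when the backend cannot be reached.
     */
    static Outcome validate(String value, Predicate<String> isDirectory) {
        try {
            // Access worked: the result counts, even for cloud paths.
            return isDirectory.test(value) ? Outcome.VALID : Outcome.INVALID;
        } catch (RuntimeException e) {
            if (isCloudStoragePath(value)) {
                // Access itself failed for a cloud path: skip instead of failing.
                return Outcome.SKIPPED;
            }
            // Local paths keep their existing behaviour.
            throw e;
        }
    }

    public static void main(String[] args) {
        Predicate<String> noAccess = p -> { throw new IllegalStateException("Key cannot be empty"); };
        System.out.println(validate("s3://annotation-cache/snpeff_cache/", noAccess));
        System.out.println(validate("s3://annotation-cache/snpeff_cache/", p -> true));
        System.out.println(validate("s3://annotation-cache/snpeff_cache/", p -> false));
    }
}
```

The key design point is that the skip happens only in the `catch` branch. A cloud path that can be reached but is not a directory still yields `INVALID`.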

Changes

  • Added isCloudStoragePath() helper to Common.groovy (single source of truth)
  • FormatDirectoryPathEvaluator.groovy - Graceful handling for inaccessible cloud paths
  • FormatFilePathEvaluator.groovy - Graceful handling for inaccessible cloud paths
  • FormatFilePathPatternEvaluator.groovy - Graceful handling for inaccessible cloud paths
  • ExistsEvaluator.groovy - Graceful handling for inaccessible cloud paths
  • SchemaEvaluator.groovy - Graceful handling for inaccessible cloud paths
  • Added test cases for S3, GCS, and Azure path validation
  • Updated CHANGELOG.md
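
For reference, a helper like isCloudStoragePath() could look roughly like the sketch below. This is a Java illustration under stated assumptions, not the code added to Common.groovy: the class name `CloudPaths`, the case-insensitive match, and the exact prefix list (taken from the schemes named in this PR) are guesses.

```java
import java.util.List;
import java.util.Locale;

class CloudPaths {
    // Assumption: only the three schemes mentioned in this PR are recognised.
    private static final List<String> CLOUD_PREFIXES = List.of("s3://", "az://", "gs://");

    /** Returns true when the raw parameter value starts with a known cloud scheme. */
    static boolean isCloudStoragePath(String value) {
        if (value == null) {
            return false;
        }
        // Assumption: scheme matching is case-insensitive.
        String lower = value.toLowerCase(Locale.ROOT);
        return CLOUD_PREFIXES.stream().anyMatch(lower::startsWith);
    }

    public static void main(String[] args) {
        System.out.println(isCloudStoragePath("s3://annotation-cache/snpeff_cache/"));
        System.out.println(isCloudStoragePath("/data/snpeff_cache"));
    }
}
```

Keeping the check in one place means each evaluator makes the same decision about which values count as cloud paths.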

Relationship with PR #179

PR #179 addresses a related issue specifically for Azure. This PR:

Related

Test plan

  • Added unit tests for S3, GCS, and Azure path validation
  • Existing tests pass

🤖 Generated with Claude Code

@pinin4fjords force-pushed the fix/cloud-storage-path-validation branch from 5ac6f8f to fd3606b on December 17, 2025 12:49
Cloud storage paths (s3://, az://, gs://) may fail validation when users
don't have credentials or permissions to access them. This causes errors
like:

  "could not validate file format of 's3://...': Unable to marshall
   request to JSON: Key cannot be empty"

This fix takes a targeted approach:
- Still attempt validation for cloud paths (preserves validation for
  users with proper access)
- Catch exceptions from exists()/isDirectory() calls for cloud paths
  and skip validation gracefully instead of failing
- Also handle the case where cloud storage paths report incorrect
  file/directory type (similar to PR nextflow-io#179's Azure fix)

This allows pipelines with default cloud storage paths to be launched
without requiring access to those paths during parameter validation,
while still validating when access is available.

Fixes nf-core/sarek#2079

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@pinin4fjords force-pushed the fix/cloud-storage-path-validation branch from fd3606b to f22ddb5 on December 17, 2025 12:53
pinin4fjords and others added 4 commits December 17, 2025 12:57
Complete the cloud storage path graceful handling by adding
the same pattern to ExistsEvaluator. This ensures that the
`exists: true` schema keyword also handles inaccessible
cloud paths (S3, Azure, GCS) without failing validation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The try/catch blocks correctly handle inaccessible cloud paths by
skipping validation when access fails. However, the additional checks
that gave cloud paths a free pass on legitimate validation failures
(e.g., "path is a file not a directory") were too aggressive.

If we can successfully access a cloud path and determine its type,
we should validate it properly - not skip the check.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Move duplicated isCloudStoragePath() function to Common.groovy
to eliminate code duplication across 5 evaluator classes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When catching exceptions for cloud storage paths, we don't actually
know if it's a credentials issue - could be malformed path, network
error, etc. Updated log messages to say "due to exception" instead
of claiming the path is "inaccessible".

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@pinin4fjords
Author

Closing this PR after further analysis. The root cause is that Nextflow/AWS SDK throws unhelpful exceptions for missing credentials, and nf-schema catching these exceptions is a workaround that might hide legitimate errors.

A better fix would be in Nextflow core - e.g., providing a way to check cloud path accessibility without throwing, or better exception types that distinguish 'no credentials' from 'malformed path'.

Will open an issue against Nextflow instead.
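
To make the proposal concrete, here is a rough Java sketch of what a non-throwing accessibility check with distinguishable failure causes might look like. Everything in it is hypothetical: `AccessProbe`, `AccessStatus`, `NoCredentialsException`, and `MalformedPathException` are invented for illustration and do not exist in Nextflow or the AWS SDK.

```java
import java.util.concurrent.Callable;

class AccessProbe {
    // Hypothetical status values a caller could branch on.
    enum AccessStatus { ACCESSIBLE, NOT_FOUND, NO_CREDENTIALS, MALFORMED_PATH, UNKNOWN_ERROR }

    // Hypothetical exception types that would separate the failure causes.
    static class NoCredentialsException extends RuntimeException {
        NoCredentialsException(String message) { super(message); }
    }

    static class MalformedPathException extends RuntimeException {
        MalformedPathException(String message) { super(message); }
    }

    /** Runs an exists-style call and maps any failure to a status instead of throwing. */
    static AccessStatus probe(Callable<Boolean> existsCall) {
        try {
            return existsCall.call() ? AccessStatus.ACCESSIBLE : AccessStatus.NOT_FOUND;
        } catch (NoCredentialsException e) {
            return AccessStatus.NO_CREDENTIALS;
        } catch (MalformedPathException e) {
            return AccessStatus.MALFORMED_PATH;
        } catch (Exception e) {
            // Network errors and anything else remain distinguishable from the cases above.
            return AccessStatus.UNKNOWN_ERROR;
        }
    }

    public static void main(String[] args) {
        System.out.println(probe(() -> true));
        System.out.println(probe(() -> { throw new NoCredentialsException("no credentials configured"); }));
        System.out.println(probe(() -> { throw new MalformedPathException("Key cannot be empty"); }));
    }
}
```

With something like this, a validator could skip only on `NO_CREDENTIALS` and still report `MALFORMED_PATH` as an error, rather than swallowing every exception.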

Development

Successfully merging this pull request may close these issues.

Default values for un-used options break workflow