Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
214 changes: 214 additions & 0 deletions core/core/src/docs/rfcs/7130_route_layer.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,214 @@
- Proposal Name: `route_layer`
- Start Date: 2026-01-04
- RFC PR: [apache/opendal#7130](https://github.com/apache/opendal/pull/7130)
- Tracking Issue: [apache/opendal#7131](https://github.com/apache/opendal/issues/7131)

# Summary

Introduce `RouteLayer`, a layer that dispatches each OpenDAL operation to one of multiple pre-built `Operator` stacks by matching the operation path against user-provided glob patterns.

# Motivation

In practice, OpenDAL users often want different policies for different parts of a namespace:

- Apply caching only for frequently-read objects (e.g., `**/*.parquet`).
- Use a tighter timeout for latency-sensitive paths (e.g., `hot/**`).
- Attach different observability or throttling configurations per dataset.

Today, users can achieve this by building multiple `Operator`s and routing requests in application code. However, this approach duplicates routing logic across projects and makes it harder to share a consistent, well-tested routing behavior across bindings and integrations.

`RouteLayer` centralizes this routing as a reusable primitive while preserving OpenDAL's existing composition model: users still build independent `Operator` stacks for each policy, and `RouteLayer` only decides which stack should handle a request.

# Guide-level explanation

## Enable feature

```toml
opendal = { version = "*", features = ["layers-route"] }
```

## Build per-route operators

Build each routed `Operator` as usual, including its own service configuration and layer stack.

```rust
use std::time::Duration;

use opendal::layers::RouteLayer;
use opendal::services;
use opendal::Operator;
use opendal::Result;
use opendal_layer_timeout::TimeoutLayer;

fn build_default() -> Result<Operator> {
Operator::new(services::Memory::default())?.finish()
}

fn build_parquet_fast_path() -> Result<Operator> {
Operator::new(services::Memory::default())?
.layer(TimeoutLayer::default().with_timeout(Duration::from_secs(3)))
.finish()
}
```

## Combine them with `RouteLayer`

`RouteLayer` is applied to the default operator. When none of the patterns match, the request is handled by the default operator.

Patterns are evaluated in insertion order. The first matching pattern wins.

```rust
use opendal::Operator;
use opendal::Result;

fn build_routed_operator() -> Result<Operator> {
let default_op = build_default()?;
let parquet_op = build_parquet_fast_path()?;

let routed = default_op.layer(
RouteLayer::builder()
.route("**/*.parquet", parquet_op)
.build()?,
);

Ok(routed)
}
```

## Pattern rules

`RouteLayer` uses glob patterns (via `globset`) and does not perform any implicit expansion.

- `*.parquet` matches `file.parquet` in the root, but does not match `dir/file.parquet`.
- `**/*.parquet` matches `dir/file.parquet` and any other depth.

Paths are matched against OpenDAL normalized paths as seen by the accessor:

- The root is represented as `/`.
- Non-root paths do not start with `/` (e.g., `dir/file`).
- Directory paths end with `/` (e.g., `dir/`).

## Routing for `copy` and `rename`

For `copy(from, to, ..)` and `rename(from, to, ..)`, routing is decided only by `from`. The `to` path is forwarded unchanged to the selected operator.

# Reference-level explanation

## Public API

`RouteLayer` is constructed via a fallible builder, because glob compilation can fail and `Layer::layer()` cannot return `Result`.

```rust
pub struct RouteLayer { /* compiled router */ }

impl RouteLayer {
pub fn builder() -> RouteLayerBuilder;
}

pub struct RouteLayerBuilder { /* patterns + operators */ }

impl RouteLayerBuilder {
pub fn route(self, pattern: impl AsRef<str>, op: Operator) -> Self;
pub fn build(self) -> Result<RouteLayer>;
}
```

`RouteLayer` is intended to be applied to a fully-built `Operator` (dynamic dispatch). Routed operators are also fully-built and independent.

## Internal structure

`RouteLayer` holds:

- `glob: globset::GlobSet`, compiled from patterns in insertion order.
- `targets: Vec<Accessor>`, where `targets[i]` is the accessor corresponding to the `i`-th inserted pattern.

At `build()` time:

1. Compile each pattern into a `globset::Glob`.
2. Insert each glob into a `globset::GlobSetBuilder` in insertion order.
3. Convert each routed `Operator` into its `Accessor` and store it in `targets` in the same order.

Any glob compilation error results in `build()` returning an `ErrorKind::InvalidInput` error with the pattern and the underlying source error attached.

## Dispatch algorithm

For an operation that includes a path `p`:

1. Compute `matches = glob.matches(p)`, yielding a set of matching pattern indices.
2. Select `idx = min(matches)` as the chosen route (first match in insertion order).
3. If no match exists, dispatch to the inner accessor (the default operator that `RouteLayer` is applied to).

This approach does not rely on any ordering guarantees from `globset` for returned matches.

### Operations with paths

The following operations are dispatched based on their `path` argument:

- `create_dir(path, ..)`
- `read(path, ..)`
- `write(path, ..)`
- `stat(path, ..)`
- `list(path, ..)`
- `presign(path, ..)`

### Operations with two paths

The following operations are dispatched based on `from` only:

- `copy(from, to, ..)`
- `rename(from, to, ..)`

### Delete semantics

`Access::delete()` returns a deleter that can queue multiple deletions and execute them in `close()`. To preserve this batching behavior while supporting per-path routing, `RouteLayer` returns a `RouteDeleter`:

- `RouteDeleter::delete(path, ..)` routes by `path` and lazily creates (and caches) the underlying deleter for the selected target accessor.
- `RouteDeleter::close()` closes all created deleters and returns an error if any close fails.

This design batches deletions per routed operator and avoids creating deleters for unused routes.

## `info()` behavior

`RouteAccessor::info()` forwards to the default (inner) accessor's `AccessorInfo`. This keeps `Operator::info()` stable and avoids synthesizing a potentially misleading merged capability across routed operators.

Users that need per-route `AccessorInfo` can query the routed `Operator`s directly before passing them into the builder.

# Drawbacks

- Order-dependent routing can be error-prone when multiple patterns overlap; users must manage ordering carefully.
- `Operator::info()` reflects the default operator, not a merged view across routes.
- `copy`/`rename` routing based only on `from` can lead to surprising behavior if the destination logically belongs to a different route; this is accepted for simplicity.
- Adds a new dependency (`globset`) and introduces a small per-operation routing cost.

# Rationale and alternatives

## Why glob patterns via `globset`?

- Glob patterns are a well-known way to express path-based rules and cover common needs like `**/*.parquet`.
- `globset` compiles patterns into an efficient matcher and provides a `matches` API suitable for first-match selection without scanning all patterns with separate matchers.

## Why first-match-wins without priority?

First-match-wins keeps the model explicit and deterministic without additional concepts. Rule ordering is visible in code review and can be reasoned about locally.

## Alternatives considered

- **User-side routing**: build multiple operators and dispatch in application code. Works today but duplicates logic and increases maintenance burden across projects.
- **Prefix-only routing**: efficient but does not satisfy suffix-based rules like `**/*.parquet`.
- **Implicit expansion (e.g., treating `*.parquet` as `**/*.parquet`)**: simplifies some patterns but introduces surprising behavior; this proposal requires explicit `**`.
- **Routing on operation type or options**: valuable, but out of scope for this RFC and reserved for future work.

# Prior art

- `gitignore`-style globs for hierarchical path matching.
- HTTP routers and proxy configuration models (e.g., ordered rules with first-match semantics).
- Data lake engines commonly route by file suffix (e.g., `.parquet`) to specialized code paths.

# Unresolved questions

None

# Future possibilities

- Extend routing conditions beyond path, such as operation type (read/write/list) or per-operation options.
- Support loading rules from configuration files or environment variables for CLI tools and integrations.
4 changes: 4 additions & 0 deletions core/core/src/docs/rfcs/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -292,3 +292,7 @@ pub mod rfc_6817_checksum {}
/// Core
#[doc = include_str!("6828_core.md")]
pub mod rfc_6828_core {}

/// Route Layer
#[doc = include_str!("7130_route_layer.md")]
pub mod rfc_7130_route_layer {}
Loading