@Soap2G
Contributor

@Soap2G Soap2G commented Jan 9, 2026

Closes #709

Documentation on setting up and configuring Rucio Storage Elements (RSEs) from an operator's perspective. Includes:

  • Overview of RSE types (POSIX, WebDAV, Disk, Tape) (removed in f80b901)
  • Two setup methods: CLI and Python API with side-by-side examples
  • Configuration examples for each RSE type
  • WebDAV setup with Apache configuration and davs protocol (removed in 040ee73)
  • EOS disk RSE with https and root protocols
  • CTA tape RSE configuration with staging timeouts
  • RSE attributes, protocols, and account limits reference
  • Best practices and common pitfalls
  • Quick reference commands

The examples use the latest rucio CLI commands

@Soap2G Soap2G self-assigned this Jan 9, 2026
@voetberg
Contributor

voetberg commented Jan 9, 2026

Overall all these changes are really good!!

The only thing I didn't comment on directly in the body of the review is that we might want to mention how RSE-specific limits work vs. account-only limits.

@Soap2G
Contributor Author

Soap2G commented Jan 12, 2026

@voetberg Added a few fixes in 317c0db.

Also, I stumbled upon https://rucio.github.io/documentation/operator/configuration/#creating-new-rses; should I add a link there pointing to this page?

@voetberg
Contributor

> @voetberg Added a few fixes in 317c0db.
>
> Also, I stumbled upon https://rucio.github.io/documentation/operator/configuration/#creating-new-rses; should I add a link there pointing to this page?

I would say this PR completely supersedes that page, and I would link to this page and drastically reduce what's on the config params page. (Maybe just summarizing into a TL;DR with "add rse", "add rse attribute", "add rse protocol", "add account limit")

If you would like, I can take that over and you can just link your page there with something like "An in-depth guide to configuring RSEs can be found here"

@Soap2G
Contributor Author

Soap2G commented Jan 12, 2026

> > @voetberg Added a few fixes in 317c0db.
> > Also, I stumbled upon https://rucio.github.io/documentation/operator/configuration/#creating-new-rses; should I add a link there pointing to this page?
>
> I would say this PR completely supersedes that page, and I would link to this page and drastically reduce what's on the config params page. (Maybe just summarizing into a TL;DR with "add rse", "add rse attribute", "add rse protocol", "add account limit")
>
> If you would like, I can take that over and you can just link your page there with something like "An in-depth guide to configuring RSEs can be found here"

Cool, then I'll let you take care of the summary, and I'll just link this page from there. As soon as I have some info about istape, I'll finish this up. Thanks!!

@panta-123
Contributor

panta-123 commented Jan 12, 2026

@Soap2G, there seems to be an existing section in the docs titled "Creating new RSEs":
https://rucio.cern.ch/documentation/operator/configuration#creating-new-rses

Quotas are also discussed here: https://rucio.cern.ch/documentation/operator/configuration/#setting-quota-and-permissions

We should consolidate the two so that this information lives in a single place.

So I would suggest:
Either we move all of this into https://rucio.cern.ch/documentation/operator/configuration/, or we link to the new docs file from there.

@Soap2G
Contributor Author

Soap2G commented Jan 14, 2026

> @Soap2G, there seems to be an existing section in the docs titled "Creating new RSEs": https://rucio.cern.ch/documentation/operator/configuration#creating-new-rses
>
> Quotas are also discussed here: https://rucio.cern.ch/documentation/operator/configuration/#setting-quota-and-permissions
>
> We should consolidate the two so that this information lives in a single place.
>
> So I would suggest: Either we move all of this into https://rucio.cern.ch/documentation/operator/configuration/, or we link to the new docs file from there.

See this comment; Maggie will take care of that once the page is up.

@Soap2G
Contributor Author

Soap2G commented Jan 16, 2026

@voetberg @panta-123 We should be ready to go, with the reminder of merging the redundant RSE-related pages after this is done.

@voetberg
Contributor

@Soap2G Please rebase to grab the pre-commit ci!

Soap2G and others added 4 commits January 19, 2026 13:40
Documentation on setting up and configuring Rucio Storage
Elements (RSEs) from an operator's perspective. Includes:

- Overview of RSE types (POSIX, WebDAV, Disk, Tape)
- Two setup methods: CLI and Python API with side-by-side examples
- Configuration examples for each RSE type
- WebDAV setup with Apache configuration and davs protocol
- EOS disk RSE with https and root protocols
- CTA tape RSE configuration with staging timeouts
- RSE attributes, protocols, and account limits reference
- Best practices and common pitfalls
- Quick reference commands

The examples use the latest `rucio` CLI commands

Co-authored-by: Nikita Avdeev <naavdeev.astro@gmail.com>
Co-authored-by: Luis Antonio Obis Aparicio <luis.antonio.obis@gmail.com>
…nstead of core

Used RSEClient class for rse operations, and AccountLimitClient for account.
Additionally added a paragraph about configuration concepts.
Removed istape from RSE config guide, as it's not needed by Rucio
and it can be replaced by rse_type.

Additionally, added a clearer description of istape in the attributes page.
@Soap2G Soap2G force-pushed the gguerrie-rse-docs branch from d46af70 to f80b901 Compare January 19, 2026 12:41
@Soap2G
Contributor Author

Soap2G commented Jan 19, 2026

> @Soap2G Please rebase to grab the pre-commit ci!

Hey @voetberg, done 😁

Contributor

@voetberg voetberg left a comment

lgtm - nothing seems obviously wrong and if people want to add more examples we can do that in the future

voetberg previously approved these changes Jan 20, 2026
@voetberg
Contributor

@panta-123 Do you want to give this a read-over and review?

voetberg previously approved these changes Jan 28, 2026
This commit consolidates multiple documentation corrections and clarifications:

- Corrected RSE settings vs attributes distinction
   - Fixed TypedDict field listings to match actual implementation
   - Removed non-existent protocol priority fields (priority_lan, priority_wan)
   - Clarified that lfn2pfn_algorithm is an RSE attribute, not a creation parameter
   - Restored geographic fields (city, country_name, latitude, longitude, region_code, time_zone) as valid RSE settings that can be set via gateway API

- Clarified deterministic vs non-deterministic RSE behavior
   - lfn2pfn_algorithm: for deterministic RSEs (disk), computes paths from scope+name only
   - naming_convention: for non-deterministic RSEs (tape), uses metadata/timestamps
   - Added detailed comparison table explaining the differences
   - Updated all examples to reflect correct usage patterns

- Fixed Python API examples
   - Removed incorrect lfn2pfn_algorithm parameter from add_rse() calls
   - Shows correct attribute-based configuration via add_rse_attribute()
   - Updated workflow examples to match actual client implementation

- CLI fixes
   - Minor fixes to commands structure
@Soap2G
Contributor Author

Soap2G commented Feb 3, 2026

Thanks to @Geogouz for the AI-assisted review. It was very useful for spotting some inconsistencies in the text.
It also helped in cross-checking the structure of the CLI commands (the suggestions sometimes mixed up legacy and new commands, but overall it was useful).

I've also slightly updated configuration_parameters to reflect some clarifications needed in the RSE page.
Most of the content is about RSE settings / attributes, and lfn2pfn algos vs non deterministic RSEs.

Reviews are welcome @voetberg @panta-123

@Geogouz
Contributor

Geogouz commented Feb 3, 2026

> Thanks to @Geogouz for the AI-assisted review. It was very useful for spotting some inconsistencies in the text. It also helped in cross-checking the structure of the CLI commands (the suggestions sometimes mixed up legacy and new commands, but overall it was useful).
>
> I've also slightly updated configuration_parameters to reflect some clarifications needed in the RSE page. Most of the content is about RSE settings / attributes, and lfn2pfn algos vs non deterministic RSEs.
>
> Reviews are welcome @voetberg @panta-123

At Rucio's service upon request :D. For reference, here is what the output was in case others would like to use it too in the future. May not be perfect, but I would say it clearly does more good than harm to have it as an additional review opinion:

1) “RSE Attributes” section mixes up settings vs attributes

Inaccurate / misleading in the guide

  • Treating these as “RSE attributes” you set with rucio rse attribute add / add_rse_attribute(...):
    • rse_type
    • verify_checksum

What’s correct

  • rse_type is an RSE setting/property stored on the RSE record (not an attribute). Update it via rucio rse update / RSEClient.update_rse(...) (or set it at creation time).
  • verify_checksum is an RSE setting/property (not an attribute). Update it via rucio rse update / RSEClient.update_rse(...) and pass a boolean.
  • lfn2pfn_algorithm is documented as an RSE attribute (and is surfaced in the RSE “settings” output). It’s typically set via rucio rse attribute add / RSEClient.add_rse_attribute(...) and is effectively immutable afterwards.

Why this matters: setting rse_type / verify_checksum via the attribute path won’t change the RSE settings you think it changes, so the deployment won’t behave as described.
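
As a rough illustration of the split described in this point, the sketch below partitions a flat configuration into settings and attributes before deciding which update path each key belongs to. The `RSE_SETTINGS` set is assembled from names mentioned in this thread and is an assumption, not Rucio's authoritative schema; `split_rse_config` and the FTS URL are hypothetical and only serve the example.

```python
# Names that this discussion treats as RSE settings (assumption: this list is
# gathered from the thread and may not match the Rucio schema exactly).
RSE_SETTINGS = {
    "deterministic", "rse_type", "staging_area", "volatile", "qos_class",
    "availability_delete", "availability_read", "availability_write",
    "city", "country_name", "latitude", "longitude", "region_code",
    "time_zone", "verify_checksum",
}


def split_rse_config(config):
    """Split a flat config dict into (settings, attributes).

    Hypothetical helper: settings would go through the update path,
    everything else through the attribute path.
    """
    settings = {k: v for k, v in config.items() if k in RSE_SETTINGS}
    attributes = {k: v for k, v in config.items() if k not in RSE_SETTINGS}
    return settings, attributes


settings, attributes = split_rse_config({
    "rse_type": "TAPE",
    "verify_checksum": False,               # a real boolean, not the string 'False'
    "fts": "https://fts.example.org:8446",  # placeholder URL
    "lfn2pfn_algorithm": "identity",
})
print("settings:", settings)
print("attributes:", attributes)
```

Keeping the two groups apart up front makes it harder to push a setting such as `rse_type` through the attribute path by accident.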


2) The “Mandatory attributes” danger box is incorrect

Inaccurate

  • “Mandatory attributes: rse_type, fts, lfn2pfn_algorithm”

What’s correct

  • rse_type:
    • Not an attribute; it’s an RSE setting.
    • Defaults to DISK if not set.
  • fts:
    • Is an RSE attribute, but it’s only needed if your deployment uses FTS for transfers (e.g., third‑party copy via FTS).
  • lfn2pfn_algorithm:
    • Is an RSE attribute, but it’s not mandatory because there is a policy default when none is set.
    • It’s also effectively immutable after creation, so treat it as a creation-time decision.

3) Protocol priority rules in the guide are wrong / misleading

Inaccurate

  • “Higher numbers indicate higher priority”

Misleading / needs tightening

  • “Priority 0 or omitted disables the protocol”

What’s correct directionally

  • Priority ordering is not “bigger number = more preferred”. The transfer tooling considers protocols “ordered by priority”, and examples/documentation consistently use 1 for enabled operations and 0 to disable an operation.
  • Use 0 to disable an operation, and use a positive integer (commonly 1) to enable it. If an operation key is omitted, treat it as “not supported”.
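
The enable/disable rule above can be sketched as follows. This is only an illustration of the "0 or omitted means not offered, positive means enabled" reading; the dictionary layout, the operation names, and both helper functions are hypothetical and are not taken from Rucio's protocol schema.

```python
def is_enabled(priorities, operation):
    """Return True if the operation has a positive priority.

    A missing key is treated the same as 0, i.e. "not supported".
    """
    return priorities.get(operation, 0) > 0


def enabled_operations(priorities):
    """List the operations with a positive priority, sorted by name."""
    return sorted(op for op, prio in priorities.items() if prio > 0)


# Hypothetical per-operation priorities for one protocol in one domain.
wan = {"read": 1, "write": 1, "delete": 0}

print(enabled_operations(wan))                   # ['read', 'write']
print(is_enabled(wan, "delete"))                 # False: explicitly disabled
print(is_enabled(wan, "third_party_copy_read"))  # False: key omitted
```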

4) CLI examples don’t match the current rucio (click) CLI, and the guide mixes rucio vs rucio-admin

Inaccurate / misleading in the guide

  • Using positional RSE arguments where the click-based rucio CLI expects options, e.g.:
    • rucio rse add RSE_NAME
    • rucio rse protocol add ... RSE_NAME
    • rucio rse distance add --distance 1 SOURCE_RSE DEST_RSE
    • rucio account limit add account_name --rse RSE_NAME --bytes quota
  • Treating the legacy rucio-admin syntax and the click-based rucio syntax as interchangeable.

What’s correct directionally

  • In the documented click-based rucio CLI, RSEs are passed via --rse / --rses, distance endpoints via --source/--destination, and protocol host via --host, e.g.:
    • rucio rse add --rse XRD1
    • rucio rse protocol add --host xrd1 --rse XRD1 ...
    • rucio rse distance add --source XRD1 --destination XRD2 --distance 1
    • rucio account limit add --account root --rses XRD1 --bytes infinity
  • rucio-admin is a different (legacy) CLI with different subcommand names and flags (e.g. rucio-admin rse add-protocol --hostname ...). If the guide wants to support both, it must explicitly split “rucio (click)” vs “rucio-admin (legacy)” examples.

5) RSE inspection / “Quick reference” command names are wrong (or at least not the ones documented)

Inaccurate / misleading

  • rucio rse info RSE_NAME
  • rucio rse protocol list RSE_NAME
  • rucio rse usage RSE_NAME

What’s correct directionally

  • For the click-based client, rucio rse show is the documented command for inspecting an RSE.
  • For usage specifically, the documented client command is rucio list-rse-usage RSE_NAME.

6) Python API examples: wrong method for settings + missing required arguments

Inaccurate

  • Setting rse_type via add_rse_attribute('RSE', 'rse_type', ...).
  • Setting verify_checksum via add_rse_attribute(...).
  • Calling AccountLimitClient.set_account_limit(account, rse, bytes_) without a locality.
  • Passing boolean-ish values as strings (e.g. 'False') instead of actual booleans.

What’s correct

  • Set RSE settings via:
    • RSEClient.add_rse('RSE', rse_type='TAPE', ...), or
    • RSEClient.update_rse('RSE', {'rse_type': 'TAPE', 'verify_checksum': False, ...})
  • Use add_rse_attribute for attributes like lfn2pfn_algorithm, fts, archive_timeout, greedyDeletion, etc.
  • Account limits:
    • set_account_limit(account, rse, bytes_, locality) where locality is 'local' or 'global', or use the convenience methods set_local_account_limit(...) / set_global_account_limit(...).

7) Quota sizes: the guide’s TB/PB byte counts are TiB/PiB, not what Rucio parses as “TB/PB”

Inaccurate

  • 1 TB = 1099511627776 bytes
  • 1 PB = 1125899906842624 bytes

What’s correct

  • Rucio’s get_bytes_value_from_string uses decimal multipliers:
    • TB = 10**12
    • PB = 10**15
  • Therefore:
    • 1 TB = 1000000000000 bytes
    • 1 PB = 1000000000000000 bytes
  • If you want binary units, label them as TiB/PiB or provide explicit byte counts.
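
The difference between the two readings is plain arithmetic, sketched below. `to_bytes` is a hypothetical helper written for this comparison; it is not Rucio's `get_bytes_value_from_string` and makes no claim about how that function parses its input.

```python
# Decimal (SI) multipliers, matching the TB/PB values listed above.
DECIMAL_UNITS = {"TB": 10**12, "PB": 10**15}
# Binary (IEC) multipliers; these should be labelled TiB/PiB.
BINARY_UNITS = {"TiB": 2**40, "PiB": 2**50}


def to_bytes(value, unit):
    """Convert a quantity such as (1, "TB") or (1, "TiB") to bytes."""
    multipliers = {**DECIMAL_UNITS, **BINARY_UNITS}
    if unit not in multipliers:
        raise ValueError(f"unknown unit: {unit}")
    return value * multipliers[unit]


print(to_bytes(1, "TB"))   # 1000000000000
print(to_bytes(1, "TiB"))  # 1099511627776
print(to_bytes(1, "PiB") - to_bytes(1, "PB"))  # 125899906842624
```

At the petabyte scale the two interpretations differ by roughly 12.6%, which is large enough to matter when setting account quotas.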

8) Tape-specific: archive_timeout is not “file staging (stage-in)”

Inaccurate

  • “archive_timeout ... maximum time for file staging”

What’s correct

  • archive_timeout is used for transfers with a tape destination to control how long the FTS transfer manager waits for archival completion (terminal FAILED/FINISHED states). It does not control stage-in / bring-online.

9) Example attributes likely deployment-specific / outdated (or at least not part of the documented core set)

Potentially misleading

  • Presenting backend_type and storage_usage_tool as core, required configuration for a POSIX RSE.

Safer / correct framing

  • These keys are not part of the documented RSE settings/attributes list in current upstream docs.
  • If you keep them, document them explicitly as deployment-specific (i.e., only meaningful if your site runs custom integrations which read them).

10) Minor wording-level inaccuracies (optional to fix, but improves correctness)

  • “POSIX RSEs cannot be accessed from remote machines”
    • More accurate: file:// (POSIX) access works from any machine which can see the filesystem path (e.g., via a shared mount). It’s not inherently “single-node”, it’s “same filesystem namespace”.

11) Non-deterministic RSEs: naming_convention is not the PFN naming algorithm

Inaccurate / misleading

  • Implying that setting naming_convention defines how PFNs are generated for a non-deterministic RSE.

What’s correct

  • Non-deterministic behaviour is controlled by the RSE’s deterministic flag (e.g. creating the RSE as non-deterministic).
  • naming_convention is documented as a policy algorithm used to validate DIDs on an RSE; it is not the LFN→PFN mapping algorithm (that role belongs to lfn2pfn_algorithm for deterministic RSEs).
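
To make "deterministic" concrete: on a deterministic RSE the path is a pure function of scope and name, so it can be recomputed at any time without a lookup. The sketch below shows one such mapping in a hash-bucketed style. It is an illustration only, not Rucio's `hash` or `identity` algorithm; the function name, the use of SHA-256, and the sample scope and file name are all assumptions.

```python
import hashlib


def hash_style_path(scope, name):
    """Derive a storage path from scope and name alone (illustrative only).

    Two short hash-derived directory levels spread files across buckets.
    """
    digest = hashlib.sha256(f"{scope}:{name}".encode("utf-8")).hexdigest()
    return f"{scope}/{digest[0:2]}/{digest[2:4]}/{name}"


path = hash_style_path("user.jdoe", "file.root")
print(path)
# The same inputs always give the same path, which is what "deterministic" means here.
print(path == hash_style_path("user.jdoe", "file.root"))
```

A non-deterministic RSE has no such function: the physical path has to be recorded explicitly when the replica is registered.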

- The RSE settings are set separately using `rucio.RSEClient.update_rse` or `rucio rse update`, and specifies RSE configuration used by the Rucio instance.
+ The RSE settings are set separately using `rucio.RSEClient.update_rse` or `rucio rse update`, and specify RSE configuration used by the Rucio instance.
Mutable settings are `deterministic`, `rse_type`, `staging_area`, `volatile`, `qos_class`, `availability_delete`, `availability_read`, `availability_write`, `city`, `country_name`, `latitude`, `longitude`, `region_code`, and `time_zone`.
Geographic fields (`city`, `country_name`, `latitude`, `longitude`, `region_code`, `time_zone`) can also be set at RSE creation time via API parameters, though they are stored and returned as part of the RSESettingsDict structure.
Contributor

The wording here is really confusing (and sort of redundant). Do we mention the RSESettingsDict anywhere else? Is this construct useful for an operator to use or is just part of how they're displayed?

Contributor Author

Removed the last part of the sentence.

- **fts**: String. Specify the REST API URL of the FTS3 transfer manager. No default.
- **greedyDeletion**: Boolean. Allow files without a rule locking them to be deleted by a Reaper Daemon. Default behavior only marks a file for deletion when there is no space on an RSE for a new required file. Default: `False`.
- **group_by_rse_attribute**: String. Control the RSE attribute (such as `country`) which transfer source RSEs will be grouped by when determining an appropriate transfer source. Default: `UNKNOWN`.
- **lfn2pfn_algorithm**: String. Name of the algorithm to be used for generating paths to files on **deterministic RSEs**. Must be defined in the configured policy package. If not set, defaults to `lfn2pfn_algorithm_default` from the `[policy]` section of the config file. Common values: `identity`, `hash`. Default: `default`. Note: This attribute is also included in the RSE settings dictionary when protocols are retrieved.
Contributor

Is the re-ordering just to put it in alphabetic order?

Contributor Author

Part of the edits were to standardise the format (name:type. instead of name:type:). This one specifically was a leftover, I've removed it since lfn2pfn_algorithm is an attribute


# Set backend type
- rucio rse attribute add POSIX_RSE --key backend_type --value POSIX
+ rucio rse attribute add --key backend_type --value POSIX POSIX_RSE
Contributor

fwiw, placement of the RSE name here doesn't matter so either approach is fine (be it rucio rse attribute add POSIX_RSE --key backend_type --value POSIX or rucio rse attribute add --key backend_type --value POSIX POSIX_RSE)

I don't think we have a consistent style for how we write CLI commands bc it's sort of a free-for-all. I guess adding to the comment on 149 to say "Add the RSE, named POSIX_RSE" would avoid all possible confusion

Contributor Author

Yeah, I looked at the CLI help, which reads:

Usage: rucio rse add [OPTIONS] RSE_NAME

So I preferred to stick to the order shown there. Good to know that the order doesn't matter in any case.

- `rse_type` is set to `TAPE`
- `archive_timeout` attribute specifies maximum time for file staging (86400 = 24 hours)
- `rse_type` is set to `TAPE` at creation time
- `archive_timeout` attribute controls maximum time for tape archival operations (86400 = 24 hours)
Contributor

"file staging" is more precise. There are steps taken after the file is staged as part of the archiving process that Rucio wouldn't know about

Contributor Author

Rolled back to the previous phrasing

- Tape systems with internal file organization requiring metadata-driven naming
- Storage systems that don't support directory structures
- Systems requiring custom naming schemes
- Systems where physical file names must be generated using business logic beyond scope/name
Contributor

"Business logic" here feels sticky for a reason I can't place. Either way, this says functionally the same thing as point 1

Contributor Author

Removed the bullet point

For non-deterministic RSEs:
1. Physical file paths must be registered explicitly during replication
2. Files cannot be uploaded directly by clients (only via replication)
3. The `naming_convention` attribute must reference a `non_deterministic_pfn` algorithm in your policy package
Contributor

Would be a good time to link the policy package docs

Contributor Author

Done.


- Set reasonable `archive_timeout` values (24 hours recommended)
- Use staging-aware clients
- Set reasonable `archive_timeout` values (24 hours recommended). This controls how long FTS waits for tape archival operations to complete
Contributor

Points again to this comment - a5a0b21#r2760241802

Contributor Author

Done.

# 2. Add attributes
rse_client.add_rse_attribute('RSE_NAME', 'rse_type', 'DISK')
rse_client.add_rse_attribute('RSE_NAME', 'attribute_name', 'attribute_value')
rse_client.add_rse_attribute('RSE_NAME', 'lfn2pfn_algorithm', 'identity')
Contributor

Sorry this comment is so late in the review process. Would it be better to use full arg names so it's clear what is being set and how here?

Contributor Author

Good point, done.

### RSE Settings vs Attributes

- RSE attributes are key-value pairs that control the behavior and capabilities of an RSE. They define how Rucio interacts with the storage system.
+ Rucio distinguishes between **RSE settings** (properties of the RSE record itself) and **RSE attributes** (key-value metadata).
Contributor

RSE "record"?

Contributor Author

Yeah, with record I wanted to say the RSE itself, but doesn't make much sense and it's redundant. Removed.

:::

- An exhaustive list of RSE attributes can be found in the [RSE attributes page](configuration_parameters/#rse-attributes).
+ An exhaustive list of RSE attributes can be found in the [configuration parameters page](configuration_parameters/#rse-settings).
Contributor

Linking to the settings section but talking about attributes?

Contributor Author

Typo, I forgot to update it after copying the line from above. Fixed.

Development

Successfully merging this pull request may close these issues.

Add RSE operator docs

5 participants