
Progressive File Layout #16

@obilaniu


A few weeks back at the Stammtisch, the idea of a Progressive File Layout (PFL) implementation (à la Lustre, but less complicated) was raised. The user who raised the topic has not yet filed a feature request here, so I am taking the initiative.

  1. Absent PFL, when a filesystem is configured, a choice of default striping must be made, and thus a compromise.
    • If no striping is done (stripe=1), a single large file downloaded onto the filesystem will unbalance the storage targets, and all accesses to it will be directed to one storage target. Smaller files, on the other hand, perform better without striping.
    • If striping is done (stripe>1), a stripe count and size might be found that eases the burden of any one large file on the filesystem, but it may penalize small files on the same filesystem, because more targets must be contacted to piece them together.
      • For example, the Canadian national clusters managed by the Digital Research Alliance of Canada configure their Lustre with a default PFL of 1x (no) striping for the range [0, 128MiB) and 2x1MiB striping for the range [128MiB, end), or suchlike.
  2. Certain files have internal structure (such as a read-mostly header, followed by parallel-access data areas) that could benefit from different striping schemes.
  3. Currently, it is not possible to migrate a file in place from one striping scheme to another. A "deep" copy is required, temporarily taking double the space. Such space may not be available, in which case an even more convoluted migration process is needed.
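
To make the imbalance in point 1 concrete, here is a minimal sketch that counts how many bytes of one file land on each storage target, assuming plain round-robin (RAID-0 style) striping. The function name `bytes_per_target` and the 10 GiB file size are illustrative assumptions, not part of any existing API.

```python
MiB = 1 << 20
GiB = 1 << 30


def bytes_per_target(file_size, stripe_count, stripe_size):
    """Bytes held by each of the file's targets under round-robin striping.

    Hypothetical helper: index i is the i-th target assigned to the file,
    not a global target ID.
    """
    # One "round" writes stripe_size bytes to every target in turn.
    full_rounds, rest = divmod(file_size, stripe_count * stripe_size)
    per_target = [full_rounds * stripe_size] * stripe_count
    # Distribute the final, partial round in target order.
    for i in range(stripe_count):
        take = min(stripe_size, rest)
        per_target[i] += take
        rest -= take
    return per_target


# A 10 GiB file with no striping sits entirely on one target ...
unstriped = bytes_per_target(10 * GiB, stripe_count=1, stripe_size=MiB)
# ... while 4 x 1 MiB striping spreads it evenly over four targets.
striped = bytes_per_target(10 * GiB, stripe_count=4, stripe_size=MiB)
```

The sketch only models where the data lands; it says nothing about the per-file overhead of contacting several targets, which is the cost that striping imposes on small files.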

What I proposed at the Stammtisch is a simplified variant of PFL with 2 (+1) zones. A user would be able to define two zones:

  • A "header" from offset 0 to offset +X blocks and
  • A "payload" from offset +X to the end of the file.

each with independent stripe count/size.
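
As a rough sketch of what such a two-zone layout could look like as a data structure, the following maps a byte offset to its zone, stripe index and offset within that stripe's object. The class and field names (`Zone`, `TwoZoneLayout`, `locate`) are hypothetical, round-robin striping within each zone is assumed, and the example values borrow the 128 MiB boundary from the Lustre configuration mentioned above.

```python
from dataclasses import dataclass

MiB = 1 << 20


@dataclass(frozen=True)
class Zone:
    start: int         # first byte offset covered by this zone
    stripe_count: int  # number of storage targets the zone is striped over
    stripe_size: int   # bytes written to one target before moving to the next


@dataclass(frozen=True)
class TwoZoneLayout:
    header: Zone   # covers [0, payload.start)
    payload: Zone  # covers [payload.start, end of file)

    def locate(self, offset):
        """Return (zone name, stripe index, offset inside that stripe's object)."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        if offset < self.payload.start:
            name, zone = "header", self.header
        else:
            name, zone = "payload", self.payload
        rel = offset - zone.start           # offset relative to the zone start
        chunk = rel // zone.stripe_size     # which stripe-sized chunk of the zone
        stripe_index = chunk % zone.stripe_count
        object_offset = (chunk // zone.stripe_count) * zone.stripe_size + rel % zone.stripe_size
        return name, stripe_index, object_offset


# Unstriped header for [0, 128 MiB), 2 x 1 MiB striping for the rest.
layout = TwoZoneLayout(
    header=Zone(start=0, stripe_count=1, stripe_size=MiB),
    payload=Zone(start=128 * MiB, stripe_count=2, stripe_size=MiB),
)
in_header = layout.locate(5 * MiB)
in_payload = layout.locate(129 * MiB + 10)
```

Here the header's stripe size is arbitrary, since a single-target zone places every byte on the same object regardless of it.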

The additional (+1) zone would be a filesystem-internal zone, not visible to the user, whose purpose is to guarantee that an in-place, server-side migration between arbitrary two-zone striping schemes can always be performed safely. That would be achieved by gradually rewriting the file from one scheme to the other, chunk by chunk, never fully duplicating the file, and atomically updating the "true" PFL with every chunk until it matches the target PFL.
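
The chunk-by-chunk idea can be illustrated with a toy in-memory model. This is only a sketch under strong simplifying assumptions: it restripes a single-zone file to another single-zone scheme, stores one dictionary entry per byte, and represents the "true" layout as a single `frontier` offset (bytes before it follow the new scheme, bytes after it the old one). All names (`place`, `read_all`, `migrate`) are hypothetical.

```python
def place(offset, stripe_count, stripe_size):
    """Round-robin striping: map a file offset to (stripe index, object offset)."""
    chunk = offset // stripe_size
    return chunk % stripe_count, (chunk // stripe_count) * stripe_size + offset % stripe_size


def read_all(store, size, old, new, frontier):
    """Read the whole file through the current "true" layout."""
    out = bytearray()
    for off in range(size):
        if off < frontier:
            key = ("new", *place(off, *new))   # already migrated
        else:
            key = ("old", *place(off, *old))   # not yet migrated
        out.append(store[key])
    return bytes(out)


def migrate(store, size, old, new, chunk):
    """Restripe in place, yielding (frontier, peak stored bytes) after each chunk."""
    frontier = 0
    while frontier < size:
        end = min(frontier + chunk, size)
        # 1. Copy one chunk into the new scheme; the old copy is still valid.
        for off in range(frontier, end):
            store[("new", *place(off, *new))] = store[("old", *place(off, *old))]
        peak = len(store)
        # 2. Publish the updated layout (a single assignment in this model).
        start, frontier = frontier, end
        # 3. Only now release the old copy of that chunk.
        for off in range(start, end):
            del store[("old", *place(off, *old))]
        yield frontier, peak


data = bytes(range(64))
old, new = (1, 4), (2, 8)   # (stripe_count, stripe_size) before and after
store = {("old", *place(i, *old)): b for i, b in enumerate(data)}
history = []
for frontier, peak in migrate(store, len(data), old, new, chunk=16):
    # The file stays fully readable after every step.
    assert read_all(store, len(data), old, new, frontier) == data
    history.append((frontier, peak))
```

In this model the extra space in use never exceeds one chunk, because the old copy of each chunk is released as soon as the layout update covering it has been published. A real implementation would additionally have to make step 2 crash-safe and cope with concurrent writers, which the sketch ignores.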

This would address the target-imbalance and performance issues, and it would enable restriping without deep-copying. Two zones ought to cover most use cases.


Labels: enhancement (New feature or request), gathering-requirements (Issues open for collecting feedback, ideas, and requirements for potential future implementations)
