@geky geky commented May 29, 2025

                  ####                                 
               ######## #####    #####                 
             ################# ########  ####          
       #### ### ############################## ####    
     #############-#### ##_#########################   
    ########'#########_\'|######_####################  
     ##### ##########_.. \ .'#_._#### #/#_##########   
   ################_    \ v .'_____#_.'.'_ ########### 
  ################_ "'. |  .""  _________ '"-##########
  ########"#####..--.  /    .-"'|  | |   '"-####'######
    #############    \     /  .---.|_|_    ########### 
     ###########     |    /  -|        |-   ########   
                     )    |  -|littlefs|-              
                     |    |  -|   v3   |-              
                     |    |   '--------'               
                     |   ||     ' ' '                  
       -------------/-/---\\----------------------     

Note: v3-alpha discussion (#1114)

Unfortunately GitHub made a complete mess of the PR discussion. To try to salvage things, please use #1114 for new comments. Feedback/criticism are welcome and immensely important at this stage.

Table of contents ^

  1. Hello!
  2. Wait, a disk breaking change?
  3. What's new?
    1. Implemented
      1. Efficient metadata compaction
      2. Efficient random writes
      3. Better logging: No more sync-padding issues
      4. Efficient inline files, no more RAM constraints
      5. Independent file caches
      6. Easier logging APIs: lfs3_file_fruncate
      7. Sparse files
      8. Efficient file name lookup
      9. A simpler/more robust metadata tree
      10. A well-defined sync model
      11. Stickynotes, no more 0-sized files
      12. A new and improved compat flag system
      13. Error detection! - Global-checksums
      14. Better traversal APIs
      15. Incremental GC
      16. Better recovery from runtime errors
      17. Standard custom attributes
      18. More tests!
      19. Simple key-value APIs
    2. Planned
      1. Efficient block allocation, via optional on-disk block-map (bmap)
      2. Bad block tracking
      3. Pre-erased block tracking
      4. Error correction! - Metadata redundancy
      5. Error correction! - Data redundancy
      6. Transparent block deduplication
    3. Stretch goals
      1. lfs3_migrate for v2->v3 migration
      2. 16-bit and 64-bit variants
      3. Config API rework
      4. Block device API rework
      5. Custom attr API rework
      6. Alternative (cheaper) write-strategies
      7. Advanced file tree operations
      8. Advanced file copy-on-write operations
      9. Reserved blocks to prevent CoW lockups
      10. Metadata checks to prevent metadata lockups
      11. Integrated block-level ECC
      12. Disk-level RAID
    4. Out-of-scope (for now)
      1. Alternative checksums
      2. Feature-limited configurations for smaller code/stack sizes
      3. lfs3_file_openat for dir-relative APIs
      4. lfs3_file_openn for non-null-terminated-string APIs
      5. Transparent compression
      6. Filesystem shrinking
      7. High-level caches
      8. Symbolic links
      9. 100% line/branch coverage
  4. Code/stack size
    1. Runtime error recovery
    2. B-tree flexibility
    3. Traversal inversion
  5. Benchmarks
    1. Simulated benchmarks
      1. Linear writes
      2. Random writes
      3. Logging
  6. Funding
  7. Next steps

Hello! ^

Hello everyone! As some of you may have already picked up on, there's been a large body of work fermenting in the background for the past couple of years. Originally started as an experiment to try to solve littlefs's $O(n^2)$ metadata compaction, this branch eventually snowballed into more-or-less a full rewrite of the filesystem from the ground up.

There are still several chunks of planned work left, but now that this branch has reached on-disk feature parity with v2, there's nothing really stopping it from being merged eventually.

So I figured it's a good time to start calling this v3, and put together a public roadmap.

NOTE: THIS WORK IS INCOMPLETE AND UNSTABLE

Here's a quick TODO list of planned work before stabilization. More details below:

  • Test framework rework (merged, mostly)
  • Rbyds
  • B-trees
  • M-tree
  • B-shrubs
  • Fruncate
  • Sync model rework
  • Stickynotes
  • Traversal API rework
  • Incremental GC
  • Global-checksums
  • Key-value APIs
  • On-disk block-map (in progress)
  • Metadata redundancy
  • Data redundancy
  • Document, document, document
  • Stretch goals

This work may continue to break the on-disk format.

That being said, I highly encourage others to experiment with v3 where possible. Feedback is welcome, and immensely important at this stage. Once it's stabilized, it's stabilized.

To help with this, the current branch uses v0.0 as its on-disk version to indicate that it is experimental. When v3 is eventually released, it will reject this version and fail to mount.

Unfortunately, the API will be under heavy flux during this period.

A note on benchmarking: The on-disk block-map is key for scalable allocator performance, so benchmarks at this stage need to be taken with a grain of salt when many blocks are involved. Please refer to this version as "v3 (no bmap)" or something similar in any published benchmarks until this work is completed.

Wait, a disk breaking change? ^

Yes. v3 breaks disk compatibility from v2.

I think this is a necessary evil. Attempting to maintain backwards compatibility has a heavy cost:

  1. Development time - The littlefs team is ~1 guy, and v3 has already taken ~2.5 years. The extra work to make everything compatible would stretch this out much longer and likely be unsustainable.

  2. Code cost - The goal of littlefs is to be, well, little. This is unfortunately in conflict with backwards compatibility.

    Take the new B-tree data-structure, for example. It would be easy to support both B-tree and CTZ skip-list files, but now you need ~2x the code. This cost gets worse for the more enmeshed features, and potentially exceeds the cost of just including both v3 and v2 in the codebase.

So I think it's best for both littlefs as a project and long-term users to break things here.

Note v2 isn't going anywhere! I'm happy to continue maintaining the v2 branch, merge bug fixes when necessary, etc. But the economic reality is my focus will be shifting to v3.

What's new ^

Ok, with that out of the way, what does breaking everything actually get us?

Implemented: ^

  • Efficient metadata compaction: $O(n^2) \rightarrow O(n \log n)$ ^

    v3 adopts a new metadata data-structure: Red-black-yellow Dhara trees (rbyds). Based on the data-structure invented by Daniel Beer for the Dhara FTL, rbyds extend log-encoded Dhara trees with self-balancing and self-counting (also called order-statistic) properties.

    This speeds up most metadata operations, including metadata lookup ( $O(n) \rightarrow O(\log n)$ ), and, critically, metadata compaction ( $O(n^2) \rightarrow O(n \log n)$ ).

    This improvement may sound minor on paper, but it's a difference measured in seconds, sometimes even minutes, on devices with extremely large blocks.

  • Efficient random writes: $O(n) \rightarrow O(\log_b^2 n)$ ^

    A much requested feature, v3 adopts B-trees, replacing the CTZ skip-list that previously backed files.

    This avoids needing to rewrite the entire file on random writes, bringing the performance back down into tractability.

    For extra cool points, littlefs's B-trees use rbyds for the inner nodes, which makes CoW updates much cheaper than traditional array-packed B-tree nodes when large blocks are involved ( $O(n) \rightarrow O(\log n)$ ).

  • Better logging: No more sync-padding issues ^

    v3's B-trees support inlining data directly in the B-tree nodes. This gives us a place to store data during sync, without needing to pad things for prog alignment.

    In v2 this padding would force the rewriting of blocks after sync, which had a tendency to wreck logging performance.

  • Efficient inline files, no more RAM constraints: $O(n^2) \rightarrow O(n \log n)$ ^

    In v3, B-trees can have their root inlined in the file's mdir, giving us what I've been calling a "B-shrub". This, combined with the above inlined leaves, gives us a much more efficient inlined file representation, with better code reuse to boot.

    Oh, and B-shrubs also make small B-trees more efficient by avoiding the extra block needed for the root.

  • Independent file caches ^

    littlefs's pcache, rcache, and file caches can be configured independently now. This should allow for better RAM utilization when tuning the filesystem.

  • Easier logging APIs: lfs3_file_fruncate ^

    Thanks to the new self-counting/order-statistic properties, littlefs can now truncate from both the end and front of files via the new lfs3_file_fruncate API.

    Before, the best option for logging was renaming log files when they filled up. Now, maintaining a log/FIFO is as easy as:

    lfs3_file_write(&lfs, &file, entry, entry_size) => entry_size;
    lfs3_file_fruncate(&lfs, &file, log_size) => 0;
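
    For a slightly fuller picture, here's a hedged sketch of a log-append helper. The lfs3_* type names and exact signatures are my guesses from the snippet above and the v2 API, so treat this as pseudocode:

      // append an entry, then trim the oldest data from the front so the
      // log never grows past log_size, and sync to publish the result
      static int log_append(lfs3_t *lfs, lfs3_file_t *file,
              const void *entry, lfs3_size_t entry_size,
              lfs3_off_t log_size) {
          lfs3_ssize_t d = lfs3_file_write(lfs, file, entry, entry_size);
          if (d < 0) {
              return d;
          }
          // fruncate truncates from the front, keeping the most recent bytes
          int err = lfs3_file_fruncate(lfs, file, log_size);
          if (err) {
              return err;
          }
          return lfs3_file_sync(lfs, file);
      }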
  • Sparse files ^

    Another advantage of adopting B-trees: littlefs can now cheaply represent file holes, where contiguous runs of zeros can be implied without actually taking up any disk space.

    Currently this is limited to a couple operations:

    • lfs3_file_truncate
    • lfs3_file_fruncate
    • lfs3_file_seek + lfs3_file_write past the end of the file

    But more advanced hole operations may be added in the future.
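
    As a hedged example of the last case (LFS3_SEEK_SET and the seek/write signatures are assumed from the v2 API):

      // seek well past the current end of the file and write a few bytes;
      // the skipped range becomes a hole of implied zeros, taking no disk space
      lfs3_file_seek(&lfs, &file, 1024*1024, LFS3_SEEK_SET) => 1024*1024;
      lfs3_file_write(&lfs, &file, "end", 3) => 3;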

  • Efficient file name lookup: $O(n) \rightarrow O(\log_b n)$ ^

    littlefs now uses a B-tree (yay code reuse) to organize files by file name. This allows for much faster file name lookup than the previous linked-list of metadata blocks.

  • A simpler/more robust metadata tree ^

    As a part of adopting B-trees for metadata, the previous threaded file tree has been completely ripped out and replaced with one big metadata tree: the M-tree.

    I'm not sure how much users are aware of it, but the previous threaded file tree was a real pain-in-the-ass with the amount of bugs it caused. Turns out having a fully-connected graph in a CoBW filesystem is a really bad idea.

    In addition to removing an entire category of possible bugs, adopting the M-tree allows for multiple directories in a single metadata block, removing the 1-dir = 1-block minimum requirement.

  • A well-defined sync model ^

    One interesting thing about littlefs: it doesn't have a strictly POSIX API. This puts us in a relatively unique position, where we can explore tweaks to the POSIX API that may make it easier to write powerloss-safe applications.

    To leverage this (and because the previous sync model had some real problems), v3 includes a new, well-defined sync model.

    I think this discussion captures most of the idea, but for a high-level overview:

    1. Open file handles are strictly snapshots of the on-disk state. Writes to a file are copy-on-write (CoW), with no immediate effect on the on-disk state or any other file handles.

    2. Syncing or closing an in-sync file atomically updates the on-disk state and any other in-sync file handles.

    3. Files can be desynced, either explicitly via lfs3_file_desync, or because of an error. Desynced files do not receive sync broadcasts, and closing a desynced file has no effect on the on-disk state.

    4. Calling lfs3_file_sync on a desynced file will atomically update the on-disk state, any other in-sync file handles, and mark the file as in-sync again.

    5. Calling lfs3_file_resync on a file will discard its current contents and mark the file as in-sync. This is equivalent to closing and reopening the file (a short usage sketch follows).
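
    To make the model a bit more concrete, here's a rough usage sketch. The sync/desync/resync names come from the list above, but the open flags and exact signatures are my assumptions:

      // writes are CoW, invisible to the on-disk state until sync
      lfs3_file_open(&lfs, &file, "data.bin", LFS3_O_RDWR | LFS3_O_CREAT) => 0;
      lfs3_file_write(&lfs, &file, buf, size) => size;
      lfs3_file_sync(&lfs, &file) => 0;    // atomically publish the new state

      // desynced files drop out of sync broadcasts, writes stay scratch-only
      lfs3_file_desync(&lfs, &file) => 0;
      lfs3_file_write(&lfs, &file, buf, size) => size;

      // either discard the scratch state and return to the synced state...
      lfs3_file_resync(&lfs, &file) => 0;  // same as close + reopen
      // ...or call lfs3_file_sync to publish it and mark the file in-sync again

      lfs3_file_close(&lfs, &file) => 0;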

  • Stickynotes, no more 0-sized files ^

    As an extension of littlefs's new sync model, v3 introduces a new file type: LFS3_TYPE_STICKYNOTE.

    A stickynote represents a file that's in the awkward state of having been created, but not yet synced. If you lose power, stickynotes are hidden from the user and automatically cleaned up on the next mount.

    This avoids the 0-sized file issue, while still allowing most of the POSIX interactions users expect.

  • A new and improved compat flag system ^

    v2.1 was a bit of a mess, but it was a learning experience. v3 still includes a global version field, but also includes a set of compat flags that allow non-linear addition/removal of future features.

    These are probably familiar to users of Linux filesystems, though I've given them slightly different names:

    • rcompat flags - Must understand to read the filesystem (incompat_flags)
    • wcompat flags - Must understand to write to the filesystem (ro_compat_flags)
    • ocompat flags - No understanding necessary (compat_flags)

    This also provides an easy route for marking a filesystem as read-only, non-standard, etc, on-disk.
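
    Not littlefs code, but here's how I'd expect these classes to gate mounting, following the Linux-style semantics above (the mask names are hypothetical):

      static int check_compat(uint32_t rcompat, uint32_t wcompat, bool *read_only) {
          if (rcompat & ~KNOWN_RCOMPAT) {
              return -1;           // unknown rcompat flag => refuse to mount
          }
          if (wcompat & ~KNOWN_WCOMPAT) {
              *read_only = true;   // unknown wcompat flag => mount read-only
          }
          // unknown ocompat flags need no special handling, just ignore them
          return 0;
      }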

  • Error detection! - Global-checksums ^

    v3 now supports filesystem-wide error-detection. This is actually quite tricky in a CoBW filesystem, and required the invention of global-checksums (gcksums) to prevent rollback issues caused by naive checksumming.

    With gcksums, and a traditional Merkle-tree-esque B-tree construction, v3 now provides a filesystem-wide self-validating checksum via lfs3_fs_cksum. This checksum can be stored external to the filesystem to provide protection against last-commit rollback issues, metastability, or just for that extra peace of mind.

    Funny thing about checksums. It's incredibly cheap to calculate checksums when writing, as we're already processing that data anyways. The hard part is, when do you check the checksums?

    This is a problem that mostly ends up on the user, but to help, v3 adds a large number of checksum-checking APIs (probably too many if I'm honest); a rough usage sketch follows the list:

    • LFS3_M_CKMETA/CKDATA - Check checksums during mount
    • LFS3_O_CKMETA/CKDATA - Check checksums during file open
    • lfs3_fs_ckmeta/ckdata - Explicitly check all checksums in the filesystem
    • lfs3_file_ckmeta/ckdata - Explicitly check a file's checksums
    • LFS3_T_CKMETA/CKDATA - Check checksums incrementally during a traversal
    • LFS3_GC_CKMETA/CKDATA - Check checksums during GC operations
    • LFS3_M_CKPROGS - Closed checking of data during progs
    • LFS3_M_CKFETCHES - Optimistic (not closed) checking of data during fetches
    • LFS3_M_CKREADS (planned) - Closed checking of data during reads
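
    (Where exactly these flags get passed is my guess, so don't take the signatures below literally.)

      // check all metadata checksums while mounting
      lfs3_mount(&lfs, LFS3_M_CKMETA, &cfg) => 0;

      // check a file's data checksums when opening it
      lfs3_file_open(&lfs, &file, "data.bin", LFS3_O_RDONLY | LFS3_O_CKDATA) => 0;

      // or explicitly check everything at some convenient point
      lfs3_fs_ckmeta(&lfs) => 0;
      lfs3_fs_ckdata(&lfs) => 0;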
  • Better traversal APIs ^

    The traversal API has been completely reworked to be easier to use (both externally and internally).

    No more callback needed; blocks can now be iterated over via the dir-like lfs3_trv_read function.

    Traversals can also perform janitorial work and check checksums now, based on the flags provided to lfs3_trv_open.
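
    In practice that looks something like the sketch below. lfs3_trv_open/lfs3_trv_read are named above, but the exact arguments, the close call, and the per-block info struct are my guesses:

      lfs3_trv_t trv;
      lfs3_trv_open(&lfs, &trv, LFS3_T_CKMETA) => 0;  // flags select extra work

      // dir-like iteration over every block currently in use
      struct lfs3_tinfo tinfo;                        // hypothetical info struct
      while (lfs3_trv_read(&lfs, &trv, &tinfo) > 0) {
          // tinfo describes one in-use block per iteration
      }

      lfs3_trv_close(&lfs, &trv) => 0;                // assumed to exist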

  • Incremental GC ^

    GC work can now be accomplished incrementally, instead of requiring one big go. This is managed by lfs3_fs_gc, cfg.gc_flags, and cfg.gc_steps.

    Internally, this just shoves one of the new traversal objects into lfs3_t. It's equivalent to managing a traversal object yourself, but hopefully makes it easier to write library code.

    However, this does add a significant chunk of RAM to lfs3_t, so GC is now an opt-in feature behind the LFS3_GC ifdef.
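
    In application code that might look something like this (the gc_flags/gc_steps names come from the text above, everything else is assumed):

      struct lfs3_config cfg = {
          // ... the usual config ...
          .gc_flags = LFS3_GC_CKMETA,  // what work each GC step performs
          .gc_steps = 16,              // roughly how much work per call
      };

      // then, from the application's idle loop, chip away at the pending
      // work a little at a time instead of doing it all in one go
      lfs3_fs_gc(&lfs) => 0;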

  • Better recovery from runtime errors ^

    Since we're already doing a full rewrite, I figured let's actually take the time to make sure things don't break on exceptional errors.

    Most in-RAM filesystem state should now revert to the last known-good state on error.

    The one exception involves file data (not metadata!). Reverting file data correctly turned out to roughly double the cost of files. And now that you can manually revert with lfs3_file_resync, I figured this cost just isn't worth it. So file data remains undefined after an error.

    In total, these changes add a significant amount of code and stack, but I'm of the opinion this is necessary for the maturing of littlefs as a filesystem.

  • Standard custom attributes ^

    Breaking disk gives us a chance to reserve attributes 0x80-0xbf for future standard custom attributes:

    • 0x00-0x7f - Free for user-attributes (uattr)
    • 0x80-0xbf - Reserved for standard-attributes (sattr)
    • 0xc0-0xff - Encouraged for system-attributes (yattr)

    In theory, it was technically possible to reserve these attributes without a disk-breaking change, but it's much safer to do so while we're already breaking the disk.

    v3 also includes the possibility of extending the custom attribute space from 8 bits to ~25 bits in the future, but I'd hesitate to use this, as it risks a significant increase in stack usage.
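
    Not littlefs code, but the ranges above written out for clarity:

      static bool is_uattr(uint8_t type) { return type <= 0x7f; }                  // user
      static bool is_sattr(uint8_t type) { return type >= 0x80 && type <= 0xbf; }  // standard
      static bool is_yattr(uint8_t type) { return type >= 0xc0; }                  // system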

  • More tests! ^

    v3 comes with a couple more tests than v2 (+~6812.2%):

         suites  cases  permutations ;     pls   runtime
    v2:      22    198         11641 ;   28741    54.16s
    v3:      23    784        804655 ; 2228513  1323.18s
    

    You may or may not have seen the test framework rework that went curiously under-utilized. That was actually in preparation for the v3 work.

    The goal is not 100% line/branch coverage, but just to have more confidence in littlefs's reliability.

  • Simple key-value APIs ^

    v3 includes a couple easy-to-use key-value APIs:

    • lfs3_get - Get the contents of a file
    • lfs3_size - Get the size of a file
    • lfs3_set - Set the contents of a file
    • lfs3_remove - Remove a file (this one already exists)

    This API is limited to files that fit in RAM, but if it fits your use case, you can disable the full file API with LFS3_KVONLY to save some code.

    If your filesystem fits in only 2 blocks, you can also define LFS3_2BONLY to save more code.

    These can be useful for creating small key-value stores on systems that already use littlefs for other storage.
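
    A hedged usage sketch; the argument order here is a guess based on the descriptions above:

      // set/get small values by path, no file handles needed
      lfs3_set(&lfs, "boot-count", &boot_count, sizeof(boot_count)) => 0;

      lfs3_size(&lfs, "boot-count") => sizeof(boot_count);
      lfs3_get(&lfs, "boot-count", &boot_count, sizeof(boot_count))
              => sizeof(boot_count);

      lfs3_remove(&lfs, "boot-count") => 0;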

Planned: ^

  • Efficient block allocation, via optional on-disk block-map (bmap) ^

    The one remaining bottleneck in v3 is block allocation. This is a tricky problem for littlefs (and any CoW/CoBW filesystem), because we don't actually know when a block becomes free.

    This is in-progress work, but the solution I'm currently looking at involves 1. adding an optional on-disk block map (bmap) stored in gstate, and 2. updating it via tree diffing on sync. In theory this will drop huge file writes: $O(n^2 \log n) \rightarrow O(n \log_b^2 n)$

    There is also the option of using the bmap as a simple cache, which doesn't avoid the filesystem-wide scan but at least eliminates the RAM constraint of the lookahead buffer.

    As a plus, we should be able to leverage the self-counting property of B-trees to make the on-disk bmap compressible.

  • Bad block tracking ^

    This is a much requested feature, and adding the optional on-disk bmap finally gives us a place to track bad blocks.

  • Pre-erased block tracking ^

    Just like bad-blocks, the optional on-disk bmap gives us a place to track pre-erased blocks. Well, at least in theory.

    In practice it's a bit more of a nightmare. To avoid multiple progs, we need to mark erased blocks as unerased before progging. This introduces an unbounded number of catch-22s when trying to update the bmap itself.

    Fortunately, if instead we store a simple counter in the bmap's gstate, we can resolve things with the mrootanchor as the worst case.

  • Error correction! - Metadata redundancy ^

    Note it's already possible to do error-correction at the block-device level outside of littlefs, see ramcrc32cbd and ramrsbd for examples. Because of this, integrating in-block error correction is low priority.

    But I think there's potential for cross-block error-correction in addition to the in-block error-correction.

    The plan for cross-block error-correction/block redundancy is a bit different for metadata vs data. In littlefs, all metadata is logs, which is a bit of a problem for parity schemes. I think the best we can do is store metadata redundancy as naive copies.

    But we already need two blocks for every mdir; one usually just sits unused when not compacting. This, combined with metadata usually being much smaller than data, makes the naive scheme less costly than one might expect.

  • Error correction! - Data redundancy ^

    For raw data blocks, we can be a bit more clever. If we add an optional dedup tree for block -> parity group mapping, and an optional parity tree for parity blocks, we can implement a RAID-esque parity scheme for up to 3 blocks of data redundancy relatively cheaply.

  • Transparent block deduplication ^

    This one is a bit funny. Originally block deduplication was intentionally out-of-scope, but it turns out you need something that looks a lot like a dedup tree for error-correction to work in a system that allows multiple block references.

    If we already need a virtual -> physical block mapping for error correction, why not make the key the block checksum and get block deduplication for free?

    Though if this turns out to not be as free as I think it is, block deduplication will fall out-of-scope.

Stretch goals: ^

These may or may not be included in v3, depending on time and funding:

  • lfs3_migrate for v2->v3 migration ^

  • 16-bit and 64-bit variants ^

  • Config API rework ^

  • Block device API rework ^

  • Custom attr API rework ^

  • Alternative (cheaper) write-strategies (write-once, global-aligned, eager-crystallization) ^

  • Advanced file tree operations (lfs3_file_punchhole, lfs3_file_insertrange, lfs3_file_collapserange, LFS3_SEEK_DATA, LFS3_SEEK_HOLE) ^

  • Advanced file copy-on-write operations (shallow lfs3_cowcopy + opportunistic lfs3_copy) ^

  • Reserved blocks to prevent CoW lockups ^

  • Metadata checks to prevent metadata lockups ^

  • Integrated block-level ECC (ramcrc32cbd, ramrsbd) ^

  • Disk-level RAID (this is just data redund + a disk aware block allocator) ^

Out-of-scope (for now): ^

If we don't stop somewhere, v3 will never be released. But these may be added in the future:

  • Alternative checksums (crc16, crc64, sha256, etc) ^

  • Feature-limited configurations for smaller code/stack sizes (LFS3_NO_DIRS, LFS3_KV, LFS3_2BLOCK, etc) ^

  • lfs3_file_openat for dir-relative APIs ^

  • lfs3_file_openn for non-null-terminated-string APIs ^

  • Transparent compression ^

  • Filesystem shrinking ^

  • High-level caches (block cache, mdir cache, btree leaf cache, etc) ^

  • Symbolic links ^

  • 100% line/branch coverage ^

Code/stack size ^


littlefs v1, v2, and v3, 1 pixel ~= 1 byte of code, click for a larger interactive codemap (commit)


littlefs v2 and v3 rdonly, 1 pixel ~= 1 byte of code, click for a larger interactive codemap (commit)

Unfortunately, v3 is a little less little than v2:

            code            stack           ctx
v2:        17144             1440           580
v3:        37352 (+117.9%)   2280 (+58.3%)  636 (+9.7%)
            code            stack           ctx
v2-rdonly:  6270              448           580
v3-rdonly: 10616 (+69.3%)     808 (+80.4%)  508 (-12.4%)

On one hand, yes, more features generally means more code.

And it's true there's an opportunity here to carve out more feature-limited builds to save code/stack in the future.

But I think it's worth discussing some of the other reasons for the code/stack increase:

  1. Runtime error recovery ^

    Recovering from runtime errors isn't cheap. We need to track both the before and after state of things during fallible operations, and this adds both stack and code.

    But I think this is necessary for the maturing of littlefs as a filesystem.

    Maybe it will make sense to add a sort of LFS3_GLASS mode in the future, but this is out-of-scope for now.

  2. B-tree flexibility ^

    The bad news: The new B-tree files are extremely flexible. Unfortunately, this is a double-edged sword.

    B-trees, on their own, don't add that much code. They are a relatively poetic data-structure. But deciding how to write to a B-tree, efficiently, with an unknown write pattern, is surprisingly tricky.

    The current implementation, what I've taken to calling the "lazy-crystallization algorithm", leans on the more complicated side to see what is possible performance-wise.

    The good news: The new B-tree files are extremely flexible.

    There's no reason you need the full crystallization algorithm if you have a simple write pattern, or don't care as much about performance. This will either be a future or stretch goal, but it would be interesting to explore alternative write-strategies that could save code in these cases.

  3. Traversal inversion ^

    Inverting the traversal, i.e. moving from a callback to incremental state machine, adds both code and stack as 1. all of the previous on-stack state needs to be tracked explicitly, and 2. we now need to worry about what happens if the filesystem is modified mid-traversal.

    In theory, this could be reverted if you don't need incremental traversals, but extricating incremental traversals from the current codebase would be an absolute nightmare, so this is out-of-scope for now.

Benchmarks ^

A note on benchmarking: The on-disk block-map is key for scalable allocator performance, so benchmarks at this stage need to be taken with a grain of salt when many blocks are involved. Please refer to this version as "v3 (no bmap)" or something similar in any published benchmarks until this work is completed.

First off, I would highly encourage others to do their own benchmarking with v3/v2. Filesystem performance is tricky to measure because it depends heavily on your application's write pattern and hardware nuances. If you do, please share in this thread! Others may find the results useful, and now is the critical time for finding potential disk-related performance issues.

Simulated benchmarks ^

To test the math behind v3, I've put together some preliminary simulated benchmarks.

Note these are simulated and optimistic. They do not take caching or hardware buffers into account, which can have a big impact on performance. Still, I think they provide at least a good first impression of v3 vs v2.

To find an estimate of runtime, I first measured the number of bytes read, progged, and erased, and then scaled based on values found in relevant datasheets. The options here were a bit limited, but WinBond fortunately provides runtime estimates in the datasheets on their website:

  • NOR flash - w25q64jv

  • NAND flash - w25n01gv

  • SD/eMMC - Also w25n01gv, assuming a perfect FTL

    I said optimistic, didn't I? I couldn't find useful estimates for SD/eMMC, so I'm just assuming a perfect FTL here.

These also assume an optimal bus configuration, which, as any embedded engineer knows, is often not the case.
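
To be explicit about what "scaled" means, the estimate is roughly the following (my paraphrase of the method described above, not the actual benchmark code; the per-byte costs are derived from the datasheet timings):

  // runtime ~= read_bytes*read_cost + prog_bytes*prog_cost + erase_bytes*erase_cost
  static double sim_runtime(
          uint64_t read_bytes, uint64_t prog_bytes, uint64_t erase_bytes,
          double read_s_per_byte, double prog_s_per_byte, double erase_s_per_byte) {
      return read_bytes*read_s_per_byte
              + prog_bytes*prog_s_per_byte
              + erase_bytes*erase_s_per_byte;
  }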

Full benchmarks here: https://benchmarks.littlefs.org (repo, commit)

And here are the ones I think are the most interesting:

Note that SD/eMMC is heavily penalized by the lack of on-disk block-map! SD/eMMC breaks flash down into many small blocks, which tends to make block allocator performance dominate.

  1. Linear writes, where we write a 1 MiB file and don't call sync until closing the file. ^

    This one is the most frustrating to compare against v2. CTZ skip-lists are really fast at appending! The problem is they are only fast at appending:

    (commit, reads, progs, erases, sim, usage)

  2. Random writes, note we start with a 1MiB file. ^

    As expected, v2 is comically bad at random writes. v3 is indistinguishable from zero in the NOR case:

    (commit, reads, progs, erases, sim, usage)

  3. Logging, write 4 MiB, but limit the file to 1 MiB. ^

    In v2 this is accomplished by renaming the file, in v3 we can leverage lfs3_file_fruncate.

    v3 performs significantly better with large blocks thanks to avoiding the sync-padding problem:

    (commit, reads, progs, erases, sim, usage)

Funding ^

If you think this work is worthwhile, consider sponsoring littlefs. Current benefits include:

  1. Being able to complain about v3 not being released yet
  2. Being able to complain about the disk breakage v2 -> v3

I joke, but I truly appreciate those who have contributed to littlefs so far. littlefs, in its current form, is a mostly self-funded project, so every little bit helps.

If you would like to contribute in a different way, or have other requests, feel free to reach me at geky at geky.net.

As stabilization gets closer, I will also be open to contract work to help port/integrate/adopt v3. If this is interesting to anyone, let me know.

Thank you @micropython, @fusedFET for sponsoring littlefs, and thank you @Eclo, @kmetabg, and @nedap for your past sponsorships!

Next steps ^

For me, I think it's time to finally put together a website/wiki/discussions/blog. I'm not sure on the frequency quite yet, but I plan to write/publish the new DESIGN.md in chapters in tandem with the remaining work.

EDIT: Pinned codemap/plot links to specific commits via benchmarks.littlefs.org/tree.html
EDIT: Updated with rdonly code/stack sizes
EDIT: Added link to #1114
EDIT: Implemented simple key-value APIs
EDIT: Added lfs3_migrate stretch goal with link to #1120
EDIT: Adopted lfs3_traversal_t -> lfs3_trv_t rename
EDIT: Added link to #1125 to clarify "feature parity"

geky added 30 commits May 15, 2025 14:28
Globs in CLI attrs (-L'*=bs=%(bs)s' for example) have been remarkably
useful. It makes sense to extend this to the other flags that match
against CSV fields, though this does add complexity to a large number of
smaller scripts.

- -D/--define can now use globs when filtering:

    $ ./scripts/code.py lfs.o -Dfunction='lfsr_file_*'

  -D/--define already accepted a comma-separated list of options, so
  extending this to globs makes sense.

  Note this differs from test.py/bench.py's -D/--define. Globbing in
  test.py/bench.py wouldn't really work since -D/--define is generative,
  not matching. But there's already other differences such as integer
  parsing, range, etc. It's not worth making these perfectly consistent
  as they are really two different tools that just happen to look the
  same.

- -c/--compare now matches with globs when finding the compare entry:

    $ ./scripts/code.py lfs.o -c'lfs*_file_sync'

  This is quite a bit less useful than -D/--define, but makes sense for
  consistency.

  Note -c/--compare just chooses the first match. It doesn't really make
  sense to compare against multiple entries.

This raised the question of globs in the field specifiers themselves
(-f'bench_*' for example), but I'm rejecting this for now as I need to
draw the complexity/scope _somewhere_, and I'm worried it's already way
over on the too-complex side.

So, for now, field names must always be specified explicitly. Globbing
field names would add too much complexity. Especially considering how
many flags accept field names in these scripts.
So now the hidden variants of field specifiers can be used to manipulate
by fields and field fields without implying a complete field set:

  $ ./scripts/csv.py lfs.code.csv \
          -Bsubsystem=lfsr_file -Dfunction='lfsr_file_*' \
          -fcode_size

Is the same as:

  $ ./scripts/csv.py lfs.code.csv \
          -bfile -bsubsystem=lfsr_file -Dfunction='lfsr_file_*' \
          -fcode_size

Attempting to use -b/--by here would delete/merge the file field, as
csv.py assumes -b/-f specify all of the relevant field types.

Note that fields can also be explicitly deleted with -D/--define's new
glob support:

  $ ./scripts/csv.py lfs.code.csv -Dfile='*' -fcode_size

---

This solves an annoying problem specific to csv.py, where manipulating
by fields and field fields would often force you to specify all relevant
-b/-f fields. With how benchmarks are parameterized, this list ends up
_looong_.

It's a bit of a hack/abuse of the hidden flags, but the alternative
would be field globbing, which 1. would be a real pain-in-the-ass to
implement, and 2. affect almost all of the scripts. Reusing the hidden
flags for this keeps the complexity limited to csv.py.
This adds __csv__ methods to all Csv* classes to indicate how to write
csv/json output, and adopts Python's default float repr. As a plus, this
also lets us use "inf" for infinity in csv/json files, avoiding
potential unicode issues.

Before this we were reusing __str__ for both table rendering and
csv/json writing, which rounded to a single decimal digit! This made
float output pretty much useless outside of trivial cases.

---

Note Python apparently does some of its own rounding (1/10 -> 0.1?), so
the result may still not be round-trippable, but this is probably fine
for our somewhat hack-infested csv scripts.
Whoops! A missing splat repetition here meant we only ever accepted
floats with a single digit of precision and no e/E exponents.

Humorously this went unnoticed because our scripts were only
_outputting_ single digit floats, but now that that's fixed, float
parsing also needs a fix.

Fixed by allowing >1 digit of precision in our CsvFloat regex.
Before this, the only option for ordering the legend was by specifying
explicit -L/--add-label labels. This works for the most part, but
doesn't cover the case where you don't know the parameterization of the
input data.

And we already have -s/-S flags in other csv scripts, so it makes sense
to adopt them in plot.py/plotmpl.py to allow sorting by one or more
explicit fields.

Note that -s/-S can be combined with explicit -L/--add-labels to order
datasets with the same sort field:

  $ ./scripts/plot.py bench.csv \
          -bBLOCK_SIZE \
          -xn \
          -ybench_readed \
          -ybench_proged \
          -ybench_erased \
          --legend \
          -sBLOCK_SIZE \
          -L'*,bench_readed=bs=%(BLOCK_SIZE)s' \
          -L'*,bench_proged=' \
          -L'*,bench_erased='

---

Unfortunately this conflicted with -s/--sleep, which is a common flag in
the ascii-art scripts. This was bound to conflict with -s/--sort
eventually, so I came up with some alternatives:

- -s/--sleep -> -~/--sleep
- -S/--coalesce -> -+/--coalesce

But I'll admit I'm not the happiest about these...
This was a simple typo. Unfortunately, it went unnoticed because the
lingering dataset assigned in the above for loop made the results look
mostly correct. Yay.
This should be floor (rounds towards -inf), not int (rounds towards
zero), otherwise sub-integer results get funky:

- floor si(0.00001) => 10u
- int   si(0.00001) => 0.01m

- floor si(0.000001) => 1u
- int   si(0.000001) => m (???)
Whoops, looks like cumulative results were overlooked when multiple
bench measurements per bench were added. We were just adding all
cumulative results together!

This led to some very confusing bench results.

The solution here is to keep track of per-measurement cumulative results
via a Python dict. Which adds some memory usage, but definitely not
enough to be noticeable in the context of the bench-runner.
This prevents runaway O(n^2) behavior on devices with extremely large
block sizes (NAND, bs=~128KiB - ~1MiB).

The whole point of shrubs is to avoid this O(n^2) runaway when inline
files become necessarily large. Setting FRAGMENT_SIZE to a factor of the
BLOCK_SIZE humorously defeats this.

The 512 byte cutoff is somewhat arbitrary, it's the natural BLOCK_SIZE/8
FRAGMENT_SIZE on most NOR flash (bs=4096), but it's probably worth
tuning based on actual device performance.
This adds mattr_estimate, which is basically the same as rattr_estimate,
but assumes weight <= 1:

  rattr tag:
  .---+---+---+- -+- -+- -+- -+---+- -+- -+- -.  worst case: <=11 bytes
  |  tag  | weight            | size          |  rattr est:  <=3t + 4
  '---+---+---+- -+- -+- -+- -+---+- -+- -+- -'              <=37 bytes

  mattr tag:
  .---+---+---+---+- -+- -+- -.                  worst case: <=7 bytes
  |  tag  | w | size          |                  mattr est:  <=3t + 4
  '---+---+---+---+- -+- -+- -'                              <=25 bytes

This may seem like only a minor improvement, but with 3 tags for every
attr, this really adds up. And with our compaction estimate overheads we
need every byte of shaving we can get.

---

This ended up necessary to get littlefs running with 512 byte blocks
again. Now that our compaction overheads are so high, littlefs is having
a hard time fitting even just the filesystem config in a single block:

  mroot estimate 512B before: 246/256
  mroot estimate 512B after:  162/256 (-34.1%)

Whether or not it makes sense to run littlefs with 512 byte blocks is
still an open question, even after this tweak.

Note that even if 512 byte blocks ends up intractable, this doesn't mean
littlefs won't be able to run on SD/eMMC! The configured block_size can
always be a multiple, >=, of the physical block_size, and choosing a
larger block_size completely side-steps this problem.

The new design of littlefs is primarily focused on devices with very
large block sizes, so you may want to use larger block sizes on SD/eMMC
for performance reasons anyways.

---

Code changes were pretty minimal. This does add an additional field to
lfs_t, but it's just a byte and fits into padding with the other small
precomputed constants:

           code          stack          ctx
  before: 35824           2368          636
  after:  35836 (+0.0%)   2368 (+0.0%)  636 (+0.0%)
So:

    $(filter-out %.t.c %.b.c %.a.c,$(wildcard bd/*.c))

Instead of:

    $(filter-out $(wildcard bd/*.t.* bd/*.b.*),$(wildcard bd/*.c))

The main benefit is we no longer need to explicitly specify all
subdirectories, though the single wildcard is a bit less flexible if
test.py/bench.py ever end up with other non-C artifacts.

Unfortunately only a single wildcard is supported in filter-out.
This adds --xlim-stddev and --ylim-stddev as alternatives to -X/--xlim
and -Y/--ylim that define the plot limits in terms of standard
deviations from the mean, instead of in absolute values.

So want to only plot data within +-1 standard deviation? Use:

  $ ./scripts/plot.py --ylim-stddev=-1,+1

Want to ignore outliers >3 standard deviations? Use:

  $ ./scripts/plot.py --ylim-stddev=3

This is very useful for plotting the amortized/per-byte benchmarks,
which have a tendency to run off towards infinity near zero.

Before, we could truncate data explicitly with -Y/--ylim, but this was
getting very tedious and doesn't work well when you don't know what the
data is going to look like beforehand.
Mainly to avoid confusion with littlefs's attrs, uattrs, rattrs, etc.

This risked things getting _really_ confusing as the scripts evolve.
- codemapd3.py -> codemapsvg.py
- dbgbmapd3.py -> dbgbmapsvg.py
- treemapd3.py -> treemapsvg.py

Originally these were named this way to match plotmpl.py, but these
names were misleading. These scripts don't actually use the d3 library,
they're just piles of Python, SVG, and Javascript, modelled after the
excellent d3 treemap examples.

Keeping the *d3.py names around also felt a bit unfair to brendangregg's
flamegraph SVGs, which were the inspiration for the interactive
component. With d3 you would normally expect a rich HTML page, which is
how you even include the d3 library.

plotmpl.py is also an outlier in that it supports both .svg and .png
output. So having a different naming convention in this case makes
sense to me.

So, renaming *d3.py -> *svg.py. The inspiration from d3 is still
mentioned in the top-level comments in the relevant files.
This adds an alternative sync path for small in-cache files, where we
combine the shrub commit with the file sync commit, potentially writing
everything out in a single prog.

This is reminiscent of bmoss (old inlined) files, but notably avoids the
additional on-disk data-structure and extra code necessary to manage it.

---

The motivation for this comes from ongoing benchmarking, where we're
seeing a fairly significant regression in small-file performance on NAND
flash. Especially curious since the whole goal of this work was to make
NAND flash tractable.

But it makes sense: 2 commits are more than 1.

While the separate shrub + sync commits are barely noticeable on NOR
flash, on NAND flash, with its huge >512B prog sizes, the extra commit
is hard to miss.

In theory, the most performant solution would be to merge all bshrub
commits with sync commits whenever possible. This is technically doable,
and may make sense for a more performance-focused littlefs driver, but
it would 1. require an invasive code rewrite, 2. entangle lfsr_file_sync
-> lfsr_file_flush -> lfsr_file_carve, and 3. add even more code.

If we only merge shrub + sync commits when the file fits in the cache,
we can skip lfsr_file_flush, craft a simple shrubcommit by hand, and
avoid all of this mess. While still speeding up the most common write
path for small files.

And sure enough, our bench-many benchmark, which creates ~1000 4 byte
files, shows a ~2x speed improvement on bs=128KiB NAND (basically just
because we compact/split ~5 times instead of ~10 times).

---

Unfortunately the shrub commit requires quite a bit of state to set up,
and in the middle of lfsr_file_sync, one of the more critical functions
on our stack hot-path. So this does have a big cost:

           code          stack          ctx
  before: 35836           2368          636
  after:  35992 (+0.4%)   2408 (+1.7%)  636 (+0.0%)

Though this is also a perfect contender to be compile-time ifdefed. It
may be worth adding something like LFS_NO_MERGESHRUBCOMMITS (better
name?) to claw back some of the cost if you don't care about
performance as much.

This could also probably be a bit cheaper if our file write configs were
organized differently... At the moment we need to check inline_size,
fragment_size, _and_ crystal_thresh since these can sometimes overlap.
But this is waiting on the future config rework.

---

Actually... Looking at this closer, I'm not sure the added commit logic
should really be included in the hot-path cost...

lfsr_file_flush is the hot path, and flush -> sync are sequential
operations that don't really share stack (with the shrub commit we
humorously _never_ call flush). The commit logic is only being dragged
in because our stack measurements are pessimistic about shrinkwrapping,
which is a bit frustrating.

I've explored shrinkwrapping in stack.py before, but the idea pretty
much failed. Unfortunately GCC simply doesn't make this info available
short of parsing the per-arch disassembly.
This adds LFS_NOINLINE, and forces lfsr_file_sync_ (the commit logic in
lfsr_file_sync) off the stack hot-path.

This adds a bit of code, function calls are surprisingly expensive, but
saves a nice big chunk of stack:

           code          stack          ctx
  before: 35992           2408          636
  after:  36016 (+0.1%)   2296 (-4.7%)  636 (+0.0%)

Well, maybe not _real_ stack. The fact that this worked suggests the
real stack usage is less than our measured value.

The reason is because our stack.py script is relatively simple. It just
adds together stack frames based on the callgraph at compile time, which
misses shrinkwrapping and similar optimizations. Unfortunately that sort
of information is simply not available via GCC short of parsing the
disassembly.

But this is the number that will be used for statically allocated stacks,
and of course the number that will probably end up associated with
littlefs, so it still seems like a worthwhile number to "optimize" for.

Maybe in the future this will be different as tooling around stack
measurements improves.

---

The other benefit of moving lfsr_file_sync_ off the hot-path is that we
now no longer incorrectly include the sync commit context in the
hot-path. This tells a much different story for the cost of 1-commit
shrubs:

                     code          stack          ctx
  before 1c-shrubs: 35848           2296          636
  after 1c-shrubs:  36016 (+0.5%)   2296 (+0.0%)  636 (+0.0%)
Maybe it's just habit, but the trailing underscores_ felt far more
useful serving only as an out-pointer/new/byproduct hint. Having trailing
underscores_ serve dual purposes as both a new/byproduct hint and an
optional hint just muddies things and makes the hint much less useful.

No code changes.
TLDR: Added file->leaf, which can track file fragments (read only) and
blocks independently from file->b.shrub. This speeds up linear
read/write performance at a heavy code/stack cost.

The jury is still out on if this ends up reverted.

---

This is another change motivated by benchmarking, specifically the
significant regression in linear reads.

The problem is that CTZ skip-lists are actually _really_ good at
appending blocks! (but only appending blocks) The entire state of the
file is contained in the last block, so file writes can resume without
any reads. With B-trees, we need at least 1 B-tree lookup to resume
appending, and this really adds up when writing a large number of blocks.

To try to mitigate this, I added file->leaf, a single in-RAM bptr for
tracking the most recent leaf we've operated on. This avoids B-tree
lookups during linear reads, and allowing the leaf to fall out-of-sync
with the B-tree avoids both B-tree lookups and commits during writes.

Unfortunately this isn't a complete win for writes. If we write
fragments, i.e. cache_size < prog_size, we still need to incrementally
commit to the B-tree. Fragments are a bit annoying for caching as any
B-tree commit can discard the block they reside on.

For reading, however, this brings read performance back to roughly the
same as CTZ skip-lists.

---

This also turned into more-or-less a full rewrite of the lfsr_file_flush
-> lfsr_file_crystallize code path, which is probably a good thing. This
code needed some TLC.

file->leaf also replaces the previous eblock/eoff mechanism for
erased-state tracking via the new LFSR_BPTR_ISERASED flag. This should
be useful when exploring more erased-state tracking mechanisms (ddtree).

Unfortunately, all of this additional in-RAM state is very costly. I
think there's some cleanup that can be done (the current impl is a bit
of a mess/proof-of-concept), but this does add a significant chunk of
both code and stack:

           code          stack          ctx
  before: 36016           2296          636
  after:  37228 (+3.4%)   2328 (+1.4%)  636 (+0.0%)

file->leaf also increases the size of lfsr_file_t, but this doesn't show
up in ctx because struct lfs_info dominates:

  lfsr_file_t before: 116
  lfsr_file_t after:  136 (+17.2%)

Hm... Maybe ctx measurements should use a lower LFS_NAME_MAX?
Mostly adding convenience functions to deduplicate code:

- Adopted lfsr_bptr_claim
- Renamed lfsr_file_graft -> lfsr_file_graft_
- Adopted lfsr_file_graft
- Didn't bother with lfsr_file_discardleaf

This saves a bit of code, though not that much in the context of the
file->leaf code cost:

                      code          stack          ctx
  before cleanup:    37228           2328          636
  after:             37180 (-0.1%)   2360 (+1.4%)  636 (+0.0%)

                      code          stack          ctx
  before file->leaf: 36016           2296          636
  after:             37180 (+3.2%)   2360 (+2.8%)  636 (+0.0%)
This is just a bit simpler/more flexible of an API. Taking flags
directly has worked well for similar functions.

This also drops lfsr_*_mkdirty. I think we should keep the mk* names
reserved for heavy-weight filesystem operations.

That being said, this does add a surprising bit of code. I assume because the
flags end up in literal pools? Doesn't Thumb have a bunch of fancy
single-bit immediate encodings?

           code          stack          ctx
  before: 37180           2360          636
  after:  37192 (+0.0%)   2360 (+0.0%)  636 (+0.0%)
These mostly just help with the mess that is:

  file->leaf.bptr.data.u.disk.block

No code changes.
Except for the unknown flag checks. I don't know why but they really
mess with readability there for me. Maybe because the logic matches
english grammar ("is not any of these" vs "is any not of these")?

No code changes.
This sort of abuses the bptr/data type overlap again, taking an explicit
delta along with a list of datas where:

- data_count=-1 => single bptr
- data_count>=0 => list of concatenated fragments

It's a bit of a hack, but the previous rattr argument it replaces was
an arguably worse hack. I figured if we're going to interrogate the
rattr to figure out what type it is, we might as well just make the type
explicit.

Saved a surprising amount of stack! So that's nice:

           code          stack          ctx
  before: 37192           2360          636
  after:  37080 (-0.3%)   2304 (-2.4%)  636 (+0.0%)
With the new crystallization logic, we have two routes for resuming
crystallization:

1. before finding our crystal heuristic, if buffer is in-block and
   enough for prog alignment

2. after finding our crystal heuristic, if crystal heuristic is in-block
   and enough for prog alignment

But thinking about the second case, when would this happen that isn't
caught by the first case? When there are fragments trailing our buffer?
Are you writing to the file backwards?

This corner case doesn't seem worth the extra logic.

Benchmarking didn't find a noticeable difference in performance, so
removing.

Saves a bit of code:

           code          stack          ctx
  before: 37080           2304          636
  after:  37056 (-0.1%)   2304 (+0.0%)  636 (+0.0%)
In lfsr_mdir_compact__, we rely on shrub_.block != mdir.block to avoid
compacting shrubs multiple times. This works for the most part because
we set shrub_.block = shrub.block (the old mdir block) at the beginning
of lfsr_mdir_commit. We don't actually reset shrub_.block on a bad prog,
but in theory that was ok because we never try to compact into the same
block twice.

But this falls apart if we overrecycle the mdir!

With overrecycling, if we encounter a bad prog during a compaction and
there are no more blocks to relocate to, we try one last time to compact
into the same block (this logic is mainly for recycle overflows, where
it makes a bit more sense).

Of course, compacting into the same block breaks the above shrub_.block
!= mdir.block invariant, which causes the shrub compaction to be
skipped, uses the old shrub_.trunk (which now points to garbage), and
breaks everything.

Fortunately the solution is relatively simple: Just discard any staged
shrubs that have been committed when we relocate/overrecycle.

---

While fixing this I went ahead and renamed overcompaction ->
overrecycling. To me, overcompaction implies something _very_ different,
and I think this better describes the relationship between overrecycling
and block_recycles.

Also added test_ck_ckprogs_overrecycling to nail this down and prevent a
regression in the future. This bug _was_ caught by
test_ck_spam_fwrite_fuzz, but only after unrelated fs changes.

Adds a bit of code, but a smaller + dysfunctional filesystem is not very
useful:

           code          stack          ctx
  before: 37056           2304          636
  after:  37088 (+0.1%)   2304 (+0.0%)  636 (+0.0%)
This tweaks lfsr_mdir_commit_ to avoid overrecycling if we encounter a
bad prog (LFS_ERR_CORRUPT). This avoids compacting to the same block
twice, which risks an undetected prog error and breaks internal
invariants.

Note we still overrecycle if the relocation reason is a recycle
overflow.

---

This is an alternative solution to the previous overrecycling + shrub +
ckprog bug: Just make sure we don't compact to the same block twice!

After all, if we just got a bad prog, why are we trying to prog again?

(There are actually some arguments for multiple prog attempts, bus
errors for example, but I don't think that's a great excuse for littlefs
attempting multiple progs without user input.)

Even though this adds logic to lfsr_mdir_commit_, it ends up saving
code since we can drop the shrub discard pass:

           code          stack          ctx
  before: 37088           2304          636
  after:  37056 (-0.1%)   2304 (+0.0%)  636 (+0.0%)

Not that we _really_ care about this quantity of code. The real
motivation is 1. lowering the risk of a missed prog error, and
2. maintaining the never-compact-same-block invariant in case there
are other invariant-dependent bugs lurking around.
This should better match other relocation loops in the codebase, and is
hopefully a bit more readable.

---

Note we generally have two patterns for relocation loops:

Loops where we unconditionally allocate/relocate:

  relocate:;
      alloc();
      compact();
      if (err) goto relocate;
      commit();
      if (err) goto relocate;
      return;

And loops where we fallback to allocation/relocation:

  while (true) {
      commit();
      if (err) goto relocate;
      return;
  relocate:;
      alloc();
      compact();
      if (err) goto relocate;
  }

lfsr_mdir_commit_ falls into the latter.

No code changes.
- lfsr_file_discardcache
- lfsr_file_discardleaf
- lfsr_file_discardbshrub

The code deduplication saves a bit of code:

           code          stack          ctx
  before: 37056           2304          636
  after:  37012 (-0.1%)   2304 (+0.0%)  636 (+0.0%)
This adopts lazy crystallization in _addition_ to lazy grafting, managed
by separate LFS_o_UNCRYST and LFS_o_UNGRAFT flags:

  LFS_o_UNCRYST  0x00400000  File's leaf not fully crystallized
  LFS_o_UNGRAFT  0x00800000  File's leaf does not match bshrub/btree

This lets us graft not-fully-crystallized blocks into the tree without
needing to fully crystallize, avoiding repeated recrystallizations when
linearly rewriting a file.

Long story short, this gives file rewrites roughly the same performance
as linear file writes.

---

In theory you could also have fully crystallized but ungrafted blocks
(UNGRAFT + ~UNCRYST), but this doesn't happen with the current logic.
lfsr_file_crystallize eagerly grafts blocks once they're crystallized.

Internally, lfsr_file_crystallize replaces lfsr_file_graft for the
"don't care, gimme file->leaf" operation. This is analogous to
lfsr_file_flush for file->cache.

Note we do _not_ use LFS_o_UNCRYST to track erased-state! If we did,
erased-state wouldn't survive lfsr_file_flush!

---

Of course, this adds even more code. Fortunately not _that_ much
considering how many lines of code changed:

           code          stack          ctx
  before: 37012           2304          636
  after:  37084 (+0.2%)   2304 (+0.0%)  636 (+0.0%)

There is another downside however, and that's that our benchmarked disk
usage is slightly worse during random writes.

I haven't fully investigated this, but I think it's due to more
temporary fragments/blocks in the B-tree before flushing. This can cause
B-tree inner nodes to split earlier than when eagerly recrystallizing.

This also leads to higher disk usage pre-flush since we keep both the
old and new blocks around while uncrystallized, but since most rewrites
are probably going to be CoW on top of committed files, I don't think
this will be a big deal.

Note the disk usage ends up the same after lfsr_file_flush.
This reverts most of the lazy-grafting/crystallization logic, but keeps
the general crystallization algorithm rewrite and file->leaf for caching
read operations and erased-state.

Unfortunately lazy-grafting/crystallization is both a code and stack
heavy feature for a relatively specific write pattern. It doesn't even
help if we're forced to write fragments due to prog alignment.

Dropping lazy-grafting/crystallization trades off linear write/rewrite
performance for code and stack savings:

                           code          stack          ctx
  before:                 37084           2304          636
  after:                  36428 (-1.8%)   2248 (-2.4%)  636 (+0.0%)

But with file->leaf we still keep the improvements to linear read
performance!

Compared to pre-file->leaf:

                           code          stack          ctx
  before file->leaf:      36016           2296          636
  after lazy file->leaf:  37084 (+3.0%)   2304 (+0.3%)  636 (+0.0%)
  after eager file->leaf: 36428 (+1.1%)   2248 (-2.1%)  636 (+0.0%)

I'm still on the fence about this, but lazy-grafting/crystallization is
just a lot of code... And the first 6 letters of littlefs don't spell
"speedy" last time I checked...

At the very least we can always add lazy-grafting/crystallization as an
opt-in write strategy later.
geky added 2 commits October 17, 2025 14:02
And:

- Tweaked the behavior of gbmap.window/known to _not_ match disk.
  gbmap.known matching disk is what required a separate
  lookahead.bmapped in the first place, but we never use both fields.

- _Don't_ revert gbmap on failed mdir commits!

  This was broken! If we reverted we risked inheriting outdated
  in-flight block information.

  This could be fixed by also zeroing lookahead.bmapped, but would force
  a gbmap rebuild. And why? The only interaction between mdir commit and
  the gbmap is block allocation, which is intentionally allowed to go
  out-of-sync to relax issues like this.

  Note we still revert in lfs3_fs_grow, since the new gbmap we create
  there is incompatible with the previous disk size.

As a part of these changes, gbmap.window now behaves roughly the same as
gbmap.known and updates eagerly on block allocation.

This makes lookahead.window and gbmap.window somewhat redundant, but
simplifies the relevant logic (especially due to how lookahead.window
lags behind lookahead.off).

---

A bunch of bugs fell out of this, the interactions with lfs3_fs_mkgbmap
and lfs3_fs_grow being especially tricky, but fortunately our testing is
doing a good job.

At least the code changes were minimal, saves a bit of RAM:

                       code          stack          ctx
  no-gbmap before:    37168           2352          684
  no-gbmap after:     37168 (+0.0%)   2352 (+0.0%)  684 (+0.0%)

                       code          stack          ctx
  maybe-gbmap before: 39688           2392          852
  maybe-gbmap after:  39720 (+0.1%)   2376 (-0.7%)  848 (-0.5%)

                       code          stack          ctx
  yes-gbmap before:   39156           2392          852
  yes-gbmap after:    39208 (+0.1%)   2376 (-0.7%)  848 (-0.5%)
lfs3_fs_mkconsistent is already limited to call sites where
lfs3_alloc_ckpoint is valid (lfs3_fs_mkconsistent internally relies on
lfs3_mdir_commit), so might as well include an unconditional
lfs3_alloc_ckpoint to populate allocators and save some code:

                       code          stack          ctx
  no-gbmap before:    37168           2352          684
  no-gbmap after:     37164 (-0.0%)   2352 (+0.0%)  684 (+0.0%)

                       code          stack          ctx
  maybe-gbmap before: 39720           2376          848
  maybe-gbmap after:  39708 (-0.0%)   2376 (+0.0%)  848 (+0.0%)

                       code          stack          ctx
  yes-gbmap before:   39208           2376          848
  yes-gbmap after:    39204 (-0.0%)   2376 (+0.0%)  848 (+0.0%)
geky added 26 commits October 23, 2025 23:39
This adds LFS3_T_REBUILDGBMAP and friends, and enables incremental gbmap
rebuilds as a part of gc/traversal work:

  LFS3_M_REBUILDGBMAP   0x00000400  Rebuild the gbmap
  LFS3_GC_REBUILDGBMAP  0x00000400  Rebuild the gbmap
  LFS3_I_REBUILDGBMAP   0x00000400  The gbmap is not full
  LFS3_T_REBUILDGBMAP   0x00000400  Rebuild the gbmap

On paper, this is more or less identical to repopulating the lookahead
buffer -- traverse the filesystem, mark blocks as in-use, adopt the new
gbmap/lookahead buffer on success -- but a couple nuances make
rebuilding the gbmap a bit trickier:

- Unlike the lookahead buffer, which eagerly zeros during allocation, we
  need an explicit zeroing pass before we start marking blocks as
  in-use. This means multiple traversals can potentially conflict with
  each other, risking the adoption of a clobbered gbmap.

- The gbmap, which stores information on disk, relies on block
  allocation and the temporary "in-flight window" defined by allocator
  ckpoints to avoid circular block states during gbmap rebuilds. This
  makes gbmap rebuilds sensitive to allocator ckpoints, which we
  consider more-or-less a noop in other parts of the system.

  Though now that I'm writing this, it might have been possible to
  instead include gbmap rebuild snapshots in fs traversals... but that
  would probably have been much more complicated.

- Rebuilding the gbmap requires writing to disk and is generally much
  more expensive/destructive. We want to avoid trying to rebuild the
  gbmap when it's not possible to actually make progress.

On top of this, the current trv-clobber system is a delicate,
error-prone mess.

---

To simplify everything related to gbmap rebuilds, I added a new
internal traversal flag: LFS3_t_CKPOINTED:

  LFS3_t_CKPOINTED  0x04000000  Filesystem ckpointed during traversal

LFS3_t_CKPOINTED is set, unconditionally, on all open traversals in
lfs3_alloc_ckpoint, and provides a simple, robust mechanism for checking
if _any_ allocator checkpoints have occurred since a traversal was
started. Since lfs3_alloc_ckpoint is required before any block
allocation, this provides a strong guarantee that nothing funny happened
to any allocator state during a traversal.

This makes lfs3_alloc_ckpoint a bit less cheap, but the strong
guarantees that allocator state is unmodified during traversal are well
worth it.

This makes both lookahead and gbmap passes simpler, safer, and easier to
reason about.
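
As a rough illustration, the mechanism boils down to something like the
following (the struct and helper names are hypothetical; only the flag,
its value, and its role come from above):

  #include <stdbool.h>
  #include <stdint.h>

  #define LFS3_t_CKPOINTED 0x04000000  // value from the listing above

  // a minimal stand-in for the list of open traversals
  struct trv {
      uint32_t flags;
      struct trv *next;
  };

  // every allocator checkpoint taints _all_ open traversals
  static void alloc_ckpoint(struct trv *open_trvs) {
      for (struct trv *t = open_trvs; t; t = t->next) {
          t->flags |= LFS3_t_CKPOINTED;
      }
  }

  // a traversal only adopts its new lookahead/gbmap snapshot if no
  // checkpoint (and therefore no block allocation) happened while it ran
  static bool trv_can_adopt(const struct trv *t) {
      return !(t->flags & LFS3_t_CKPOINTED);
  }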

I'd like to adopt something similar+stronger for LFS3_t_MUTATED, and
reduce this back to two flags, but that can be a future commit.

---

Unfortunately due to the potential for recursion, this ended up reusing
less logic between lfs3_alloc_rebuildgbmap and lfs3_mtree_gc than I had
hoped, but at least the main chunks (lfs3_alloc_remap,
lfs3_gbmap_setbptr, lfs3_alloc_adoptgbmap) could be split out into
common functions.

The result is a decent chunk of code and stack, but the value is high as
incremental gbmap rebuilds are the only option to reduce the latency
spikes introduced by the gbmap allocator (it's not significantly worse
than the lookahead buffer, but both do require traversing the entire
filesystem):

                 code          stack          ctx
  before:       37164           2352          684
  after:        37208 (+0.1%)   2360 (+0.3%)  684 (+0.0%)

                 code          stack          ctx
  gbmap before: 39708           2376          848
  gbmap after:  40100 (+1.0%)   2432 (+2.4%)  848 (+0.0%)

Note the gbmap build is now measured with LFS3_GBMAP=1, instead of
LFS3_YES_GBMAP=1 (maybe-gbmap) as before. This includes the cost of
mkgbmap, lfs3_f_isgbmap, etc.
- lfs3_gbmap_set* -> lfs3_gbmap_mark*
- lfs3_alloc_markfree -> lfs3_alloc_adopt
- lfs3_alloc_mark* -> lfs3_alloc_markinuse*

Mainly for consistency, since the gbmap and lookahead buffer are more or
less the same algorithm, ignoring nuances (lookahead only ors inuse
bits, gbmap rebuilding can result in multiple snapshots, etc).

The rename lfs3_gbmap_set* -> lfs3_gbmap_mark* also makes space for
lfs3_gbmap_set* to be used for range assignments with a payload, which
may be useful for erased ranges (gbmap tracked ecksums?)
A bit less simplified than I hoped: we don't _strictly_ need both
LFS3_t_DIRTY + LFS3_t_MUTATED if we're ok with either (1) making
multiple passes to confirm fixorphans succeeded or (2) clearing the
COMPACT flag after one pass (which may introduce new uncompacted
metadata). But
both of these have downsides, and we're not _that_ stressed for flag
space yet...

So keeping all three of:

  LFS3_t_DIRTY      0x04000000  Filesystem modified outside traversal
  LFS3_t_MUTATED    0x02000000  Filesystem modified during traversal
  LFS3_t_CKPOINTED  0x01000000  Filesystem ckpointed during traversal

But I did manage to get rid of the bit swapping by tweaking LFS3_t_DIRTY
to imply LFS3_t_MUTATED instead of being exclusive. This removes the
"failed" gotos in lfs3_mtree_gc and makes things a bit more readable.

---

I also split lfs3_fs/handle_clobber into separate lfs3_fs/handle_clobber
and lfs3_fs/handle_mutate functions. This added a bit of code, but I
think it's worth it for a simpler internal API. A confusing internal API
is no good.
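
Roughly, the split looks like this (a hedged sketch; the struct and
function bodies are simplified stand-ins for the real lfs3_fs/handle
variants, only the flags and their values come from above):

  #include <stdint.h>

  #define LFS3_t_DIRTY   0x04000000  // values from the listing above
  #define LFS3_t_MUTATED 0x02000000

  struct handle {uint32_t flags;};

  // filesystem modified outside the traversal; DIRTY now implies
  // MUTATED instead of being exclusive, so no bit swapping is needed
  static void handle_clobber(struct handle *h) {
      h->flags |= LFS3_t_DIRTY | LFS3_t_MUTATED;
  }

  // filesystem modified during the traversal
  static void handle_mutate(struct handle *h) {
      h->flags |= LFS3_t_MUTATED;
  }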

In total these simplifications saved a bit of code:

                 code          stack          ctx
  before:       37208           2360          684
  after:        37176 (-0.1%)   2360 (+0.0%)  684 (+0.0%)

                 code          stack          ctx
  gbmap before: 40100           2432          848
  gbmap after:  40060 (-0.1%)   2432 (+0.0%)  848 (+0.0%)
A big downside of LFS3_T_REBUILDGBMAP is the addition of an lfs3_btree_t
struct to _every_ traversal object.

Unfortunately, I don't see a way around this. We need to track the new
gbmap snapshot _somewhere_, and other options (such as a global gbmap.b_
snapshot) just move the RAM around without actually saving anything.

To at least mitigate this internally, this splits lfs3_trv_t into
distinct lfs3_trv_t, lfs3_mgc_t, and lfs3_mtrv_t structs that capture
only the relevant state for internal traversal layers:

- lfs3_mtree_traverse <- lfs3_mtrv_t
- lfs3_mtree_gc       <- lfs3_mgc_t (contains lfs3_mtrv_t)
- lfs3_trv_read       <- lfs3_trv_t (contains lfs3_mgc_t)

This minimizes the impact of the gbmap rebuild snapshots, and saves a
big chunk of RAM. As a plus it also saves RAM in the default build by
limiting the 2-block block queue to the high-level lfs3_trv_read API:

                 code          stack          ctx
  before:       37176           2360          684
  after:        37176 (+0.0%)   2352 (-0.3%)  684 (+0.0%)

                 code          stack          ctx
  gbmap before: 40060           2432          848
  gbmap after:  40024 (-0.1%)   2368 (-2.6%)  848 (+0.0%)

The main downside? Our field names are continuing in their
ridiculousness:

  lfs3.gc.gc.t.b.h.flags // where else would the global gc flags be?
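
The nesting looks roughly like this (only the type names and their
containment come from above; the commented placeholders are
illustrative):

  #include <stdint.h>

  // innermost: raw mtree traversal state
  typedef struct lfs3_mtrv {
      uint32_t flags;
      // mtree position, etc
  } lfs3_mtrv_t;

  // adds gc-only state, which is where the gbmap rebuild snapshot
  // presumably lives
  typedef struct lfs3_mgc {
      lfs3_mtrv_t t;
      // gbmap snapshot, gc flags, etc
  } lfs3_mgc_t;

  // outermost: the public traversal object with the 2-block block queue
  typedef struct lfs3_trv {
      lfs3_mgc_t gc;
      // block queue, etc
  } lfs3_trv_t;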
And tweaked a few related comments.

I'm still on the fence about this name; I don't think it's great, but it
at least describes the "repopulation" operation better than
"rebuilding" does. The important distinction is that we don't throw away
information. Bad/erased block info (future) is still carried over into
the new gbmap snapshot, and persists unless you explicitly call
rmgbmap + mkgbmap.

So, adopting gbmap_repop_thresh for now to see if it's just a habit
thing, but may adopt a different name in the future.

As a plus, gbmap_repop_thresh is two characters shorter.
This really didn't match the use of "flush" elsewhere in the system.
There's a strong argument for naming this inline_size as that's more
likely what users expect, but shrub_size is just the more correct name
and avoids confusion around having multiple names for the same thing.

It also highlights that shrubs in littlefs3 are a bit different than
inline files in littlefs2, and that this config also affects large files
with a shrubbed root.

May rerevert this in the future, but probably only if there is
significant user confusion.
And friends:

  LFS3_M_REPOPLOOKAHEAD   0x00000200  Repopulate lookahead buffer
  LFS3_GC_REPOPLOOKAHEAD  0x00000200  Repopulate lookahead buffer
  LFS3_I_REPOPLOOKAHEAD   0x00000200  Lookahead buffer is not full
  LFS3_T_REPOPLOOKAHEAD   0x00000200  Repopulate lookahead buffer

To match LFS3_T_REPOPGBMAP, which is more-or-less the same operation.
Though this does turn into quite the mouthful...
- LFS3_T_COMPACT -> LFS3_T_COMPACTMETA
- gc_compact_thresh -> gc_compactmeta_thresh

And friends:

  LFS3_M_COMPACTMETA   0x00000800  Compact metadata logs
  LFS3_GC_COMPACTMETA  0x00000800  Compact metadata logs
  LFS3_I_COMPACTMETA   0x00000800  Filesystem may have uncompacted metadata
  LFS3_T_COMPACTMETA   0x00000800  Compact metadata logs

---

This does two things:

1. Highlights that LFS3_T_COMPACTMETA only interacts with metadata logs,
   and has no effect on data blocks.

2. Better matches the verb+noun names used for other gc/traversal flags
   (REPOPGBMAP, CKMETA, etc).

It is a bit more of a mouthful, but I'm not sure that's entirely a bad
thing. These are pretty low-level flags.
This is an alias for all possible gc work, which is a bit more
complicated than you might think due to compile-time features (example:
LFS3_GC_REPOPGBMAP).

The intention is to make loops like the following easy to write:

  struct lfs3_fsinfo fsinfo;
  lfs3_fs_stat(&lfs3, &fsinfo) => 0;

  lfs3_trv_t trv;
  lfs3_trv_open(&lfs3, &trv, fsinfo.flags & LFS3_GC_ALL) => 0;
  ...

It's possible to do this by explicitly setting all gc flags, but that
requires quite a bit of knowledge from the user.

Another option is allowing -1 for gc/traversal flags, but that loses
assert protection against unknown/misplaced flags.

---

This raises more questions about the prefix naming: it feels a bit weird
to take LFS3_I_* flags, mask with LFS3_GC_* flags, and pass them as
LFS3_T_* flags, but it gets the job done.

Limiting LFS3_GC_ALL to the LFS3_GC_* namespace avoids issues with
opt-out/mode flags such as LFS3_T_RDONLY, LFS3_T_MTREEONLY, etc. For
this reason it probably doesn't make sense to add something similar to
the other namespaces.
To allow relaxing when LFS3_I_REPOPLOOKAHEAD and LFS3_I_REPOPGBMAP will
be set, potentially reducing gc workload after allocating only a couple
blocks.

The relevant cfg comments have quite a bit more info.

Note that -1 (not the default, which is 0; maybe we should explicitly
flip this?) restores the previous functionality of setting these flags
on the first block allocation.

---

Also tweaked gbmap repops during gc/traversals to _not_ try to repop
unless LFS3_I_REPOPGBMAP is set. We probably should have done this from
the beginning since repopulating the gbmap writes to disk and is
potentially destructive.

Adds code, though hopefully we can claw this back with future config
rework:

                 code          stack          ctx
  before:       37176           2352          684
  after:        37208 (+0.1%)   2352 (+0.0%)  688 (+0.6%)

                 code          stack          ctx
  gbmap before: 40024           2368          848
  gbmap after:  40120 (+0.2%)   2368 (+0.0%)  856 (+0.9%)
Unfortunately this doesn't work and will need to be ripped-out/reverted.

---

The goal was to limit in-use -> free zeroing to the unknown window, which
would allow the gbmap to be updated in-place, saving the extra RAM we
need to maintain the extra gbmap snapshot during traversals and
lfs3_alloc_zerogbmap.

Unfortunately this doesn't seem to work. If we limit zeroing to the
unknown window, blocks can get stuck in the in-use state as long as they
stay in the known window. Since the gbmap's known window encompasses
most of the disk, this can cause the allocators to lock up and be unable
to make progress.

So will revert, but committing the current implementation in case we
revisit the idea.

As a plus, reverting avoids needing to maintain this unknown window
logic, which is tricky and error-prone.
These are more-or-less equivalent, but:

- Making lfs3_alloc_zerogbmap a non-gbmap function avoids awkward
  conversations about why it's not atomic.

- Making lfs3_alloc_zerogbmap alloc-specific makes room for pererased-
  specific zeroing operations that we might need when adopting bmerased
  ranges (future).

No code changes, which means const-propagation works as expected:

                 code          stack          ctx
  before:       37208           2352          688
  after:        37208 (+0.0%)   2352 (+0.0%)  688 (+0.0%)

                 code          stack          ctx
  gbmap before: 40120           2368          856
  gbmap after:  40120 (+0.0%)   2368 (+0.0%)  856 (+0.0%)
This relaxes errors encountered during lfs3_mtree_gc to _not_ propagate,
but instead just log a warning and prevent the relevant work from being
checked off during EOT.

The idea is this allows other work to make progress in low-space
conditions.

I originally meant to limit this to gbmap repopulations, to match the
behavior of lfs3_alloc_repopgbmap, but I think extending the idea to all
filesystem mutating operations makes sense (LFS3_T_MKCONSISTENT +
LFS3_T_REPOPGBMAP + LFS3_T_COMPACTMETA).

---

To avoid incorrectly marking traversal work as completed, we need to
track if we hit any ENOSPC errors, thus the new LFS3_t_NOSPC flag:

  LFS3_t_NOSPC  0x00800000  Optional gc work ran out of space

Not the happiest just throwing flags at problems, but I can't think of a
better solution at the moment.

This doesn't differentiate between ENOSPC errors during the different
types of work, but in theory if we're hitting ENOSPC errors whatever
work returns the error is a toss-up anyways.
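
A rough sketch of the relaxed handling (the wrapper and the error value
are assumptions; only LFS3_t_NOSPC and the general behavior come from
above):

  #include <stdint.h>

  #define LFS3_ERR_NOSPC  (-28)        // assumed to match the v2-style errno
  #define LFS3_t_NOSPC    0x00800000   // value from the listing above

  struct trv {uint32_t flags;};

  // hypothetical wrapper around one unit of optional gc work
  // (mkconsistent/repopgbmap/compactmeta)
  static int gc_step_soft(struct trv *trv, int err) {
      if (err == LFS3_ERR_NOSPC) {
          // low on space: remember it and keep traversing so other work
          // can still make progress, but don't check this work off at EOT
          trv->flags |= LFS3_t_NOSPC;
          return 0;
      }
      return err;
  }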

---

Adds a bit of code:

                 code          stack          ctx
  before:       37208           2352          688
  after:        37248 (+0.1%)   2352 (+0.0%)  688 (+0.0%)

                 code          stack          ctx
  gbmap before: 40120           2368          856
  gbmap after:  40204 (+0.2%)   2368 (+0.0%)  856 (+0.0%)
This adds test_gc_nospc with more aggressive testing of gc/traversal
operations in low-space conditions. The original intention was to test
the new soft-ENOSPC traversal behavior, but instead it found a couple
unrelated bugs.

In my defense these involve some rather subtle filesystem interactions
and went unnoticed because we don't usually check data checksums:

1. lfs3_bd_flush had a rare chance where it could corrupt our
   prog-aligned pcksum when (1) we bypass the pcache, allowing any
   previous contents to stay there until flush/pcksum, and (2) some
   other failed prog, in this case failing repopgbmaps due to the
   low-space condition, leaves garbage in the pcache. When we flush
   we corrupt the pcksum even though the old data belongs to an
   unrelated block.

   This resulted in CKDATA failing, though the failed check is a false
   positive.

   As a workaround, lfs3_bd_prog and lfs3_bd_prognext now discard _any_
   unrelated pcache, even if bypassing the pcache (see the sketch after
   this list). This should ensure consistent behavior in all cases. Note
   we do something similar with the file cache in lfs3_file_write.

   This means progs may not complete unless lfs3_bd_flush is called, but
   I think we need to call lfs3_bd_flush in all cases anyways to ensure
   power-loss safe behavior.

   The end result should be a more reliable internal bd prog API.

2. On a successful traversal with LFS3_T_REPOPLOOKAHEAD and
   LFS3_T_REPOPGBMAP we adopt both the new gbmap and lookahead buffer.

   This is wrong! The lookahead buffer is not aware of the gbmap during
   the traversal, and _can't_ be aware as the gbmap changes during
   repopulation work. This is the whole reason we have the alloc
   ckpoints and the in-flight window.

   To fix, adopting the lookahead buffer is now conditional on _not_
   adopting a new gbmap.

   It makes the code a bit more messy, but this is the correct behavior.
   Populating both the gbmap and lookahead buffer requires at least two
   passes.
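
Hedged sketches of both fixes (the cache struct and helpers here are
stand-ins, not the real lfs3_bd_* internals):

  #include <stdbool.h>
  #include <stdint.h>

  struct pcache {uint32_t block; uint32_t size;};

  // (1) in lfs3_bd_prog/prognext: drop any _unrelated_ pcache contents,
  // even when this prog bypasses the pcache, so a later flush can't fold
  // stale data from another block into our prog-aligned pcksum
  static void bd_discard_unrelated(struct pcache *pcache, uint32_t block) {
      if (pcache->size != 0 && pcache->block != block) {
          pcache->size = 0;
      }
  }

  // (2) at end-of-traversal: only adopt the lookahead snapshot if we're
  // _not_ also adopting a new gbmap, since the lookahead pass can't see
  // the blocks the gbmap repopulation itself allocated
  static bool can_adopt_lookahead(bool adopting_gbmap) {
      return !adopting_gbmap;
  }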

Code changes minimal:

                 code          stack          ctx
  before:       37248           2352          688
  after:        37260 (+0.0%)   2352 (+0.0%)  688 (+0.0%)

                 code          stack          ctx
  gbmap before: 40204           2368          856
  gbmap after:  40220 (+0.0%)   2368 (+0.0%)  856 (+0.0%)
Note: This affects the blocking lfs3_alloc_repopgbmap as well as
incremental gc/traversal repopulations. Now all repop attempts return
LFS3_ERR_NOSPC when we don't have space for the gbmap, motivation below.

This reverts the previous LFS3_t_NOSPC soft error, in which traversals
were allowed to continue some gc/traversal work when encountering
LFS3_ERR_NOSPC. This results in a simpler implementation and fewer error
cases to worry about.

Observation/motivation:

- The main motivation is noticing that when we're in low-space
  conditions, we just start spamming gbmap repops even if they all fail.

  That's really not great! We might as well just mark the flash as dead
  if we're going to start spamming erases!

  At least with an error the user can call rmgbmap to try to make
  progress.

- If we're in a low-space condition, something else will probably return
  LFS3_ERR_NOSPC anyways. Might as well report this early and simplify
  our system.

- It's a simpler model, and littlefs3 is already much more complicated
  than littlefs2. Maybe we should lean more towards a simpler system
  at the cost of some niche optimizations.

---

This had the side-effect of causing more lfs3_alloc_ckpoint calls to
return errors during testing, which revealed a bug in our uz/uzd_fuzz
tests:

- We weren't flushing after writes to the opened RDWR files, which could
  cause delayed errors to occur during the later read checks in the
  test.

  Fortunately LFS3_O_FLUSH provides a quick and easy fix (sketched
  below)!

  Note we _don't_ adopt this in all uz/uzd_fuzz tests, only those that
  error. It's good to test both with and without LFS3_O_FLUSH to test
  that read-flushing also works under stress.
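
For illustration, in the same style as the test snippets above (the
exact open flags and call shapes are assumed to follow the v2-style
API):

  lfs3_file_t file;
  lfs3_file_open(&lfs3, &file, "fuzz",
          LFS3_O_RDWR | LFS3_O_CREAT | LFS3_O_FLUSH) => 0;

  // with LFS3_O_FLUSH each write is flushed immediately, so low-space
  // errors surface here instead of being delayed to the read checks
  lfs3_file_write(&lfs3, &file, buf, size) => size;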

Saves a bit of code:

                 code          stack          ctx
  before:       37260           2352          688
  after:        37220 (-0.1%)   2352 (+0.0%)  688 (+0.0%)

                 code          stack          ctx
  gbmap before: 40220           2368          856
  gbmap after:  40184 (-0.1%)   2368 (+0.0%)  856 (+0.0%)
This drops LFS3_t_MUTATED in favor of just using LFS3_t_CKPOINTED
everywhere:

1. These meant roughly the same thing, with LFS3_t_MUTATED being a bit
   tighter at the cost of needing to be explicitly set.

2. The implicit setting of LFS3_t_CKPOINTED by lfs3_alloc_ckpoint -- a
   function that already needs to be called before mutation -- means we
   have one less thing to worry about.

   Implicit properties like LFS3_t_CKPOINTED are great for building a
   reliable system. Manual flags like LFS3_t_MUTATED, not so much.

3. Why use two flags when we can get away with one?

The only downside is we may unnecessarily clobber gc/traversal work when
we don't actually mutate the filesystem. Failed file open calls are a
good example.

However this tradeoff seems well worth it for an overall simpler +
more reliable system.

---

Saves a bit of code:

                 code          stack          ctx
  before:       37220           2352          688
  after:        37160 (-0.2%)   2352 (+0.0%)  688 (+0.0%)

                 code          stack          ctx
  gbmap before: 40184           2368          856
  gbmap after:  40132 (-0.1%)   2368 (+0.0%)  856 (+0.0%)
This has just proven much easier to tweak in dbgtag.py, so adopting the
same self-parsing pattern in dbgflags.py/dbgerr.py. This makes editing
easier by (1) not needing to worry about parens/quotes/commas, and
(2) allowing for non-python expressions, such as the mode flags in
dbgflags.py.

The only concern is script startup may be slightly slower, but we really
don't care.
This required a bit of a hack: LFS3_seek_MODE, which is marked internal
to try to minimize confusion, but really doesn't exist in the code at
all.

But a hack is probably good enough for now.
This should make tag editing less tedious/error-prone. We already used
self-parsing to generate -l/--list in dbgtag.py, but this extends the
idea to tagrepr (now Tag.repr), which is used in quite a few more
scripts.

To make this work, the little tag encoding spec had to become a bit more
rigorous; fortunately the only real change was the addition of '+'
characters to mark reserved-but-expected-zero bits.

Example:

  TAG_CKSUM = 0x3000  ## v-11 ---- ++++ +pqq
                         ^--^----^----^--^-^-- valid bit, unmatched
                            '----|----|--|-|-- matches 1
                                 '----|--|-|-- matches 0
                                      '--|-|-- reserved 0, unmatched
                                         '-|-- perturb bit, unmatched
                                           '-- phase bits, unmatched

  dbgtag.py 0x3000  =>  cksumq0
  dbgtag.py 0x3007  =>  cksumq3p
  dbgtag.py 0x3017  =>  cksumq3p 0x10
  dbgtag.py 0x3417  =>  0x3417

Though Tag.repr still does a bit of manual formatting for the
differences between shrub/normal/null/alt tags.

Still, this should reduce the number of things that need to be changed
from 2 -> 1 when adding/editing most new tags.
So:

- cfg.gc_repoplookahead_thresh -> cfg.gc_relookahead_thresh
- cfg.gc_repopgbmap_thresh     -> cfg.gc_regbmap_thresh
- cfg.gbmap_repop_thresh       -> cfg.gbmap_re_thresh
- LFS3_*_REPOPLOOKAHEAD        -> LFS3_*_RELOOKAHEAD
- LFS3_*_REPOPGBMAP            -> LFS3_*_REGBMAP

Mainly trying to reduce the mouthful that is REPOPLOOKAHEAD and
REPOPGBMAP.

As a plus this also avoids potential confusion of "repop" as a push/pop
related operation.
Just to avoid the awkward escaped newlines when possible. Note this has
no effect on the output of dbgflags.py.
This walks back some of the attempt at strict object namespacing in
struct lfs3_cfg:

- cfg.file_cache_size  -> cfg.fcache_size
- filecfg.cache_size   -> filecfg.fcache_size
- filecfg.cache_buffer -> filecfg.fcache_buffer
- cfg.gbmap_re_thresh  -> cfg.regbmap_thresh

Motivation:

- cfg.regbmap_thresh now matches cfg.gc_regbmap_thresh, instead of using
  awkwardly different namespacing patterns.

- Giving fcache a more unique name is useful for discussion. Having
  pcache, rcache, and then file_cache was a bit awkward.

  Hopefully it's also more clear that cfg.fcache_size and
  filecfg.fcache_size are related.

- Config in struct lfs3_cfg is named a bit more consistently, well, if
  you ignore gc_*_* options.

- Less typing.

Though this gets into pretty subjective naming territory. May revert
this if the new terms are uncomfortable after use.
This just fell out-of-sync a bit during the gbmap work. Note we _do_
support LFS3_RDONLY + LFS3_GBMAP, as fetching the gbmap is necessary for
CKMETA to check all metadata. Fortunately this is relatively cheap:

                 code          stack          ctx
  rdonly:       10716            896          532
  rdonly+gbmap: 10988 (+2.5%)    896 (+0.0%)  680 (+27.8%)

Though this does highlight that a sort of LFS3_NO_TRV mode could remove
quite a bit of code.
I think these are good ideas to bring back when littlefs3 is more
mature, but at the moment the number of different builds is creating too
much friction.

LFS3_KVONLY and LFS3_2BONLY in particular _add_ significant chunks of
code (lfs3_file_readget_, lfs3_file_flushset_, and various extra logic
sprinkled throughout the codebase), and the current state of testing
means I have no idea if any of it still works.

These are also low-risk for introducing any disk-related changes.

So, ripping out for now to keep the current experimental development
tractable. May reintroduce in the future (probably after littlefs3 is
stabilized) if there is sufficient user interest. But doing so will
probably also need to come with actual testing in CI.