Skip to content

Conversation

hkBst
Copy link
Member

@hkBst hkBst commented Jan 20, 2025

fixes #134088, though it is a shame to lose some of this wonderful detail.

r? @workingjubilee

EDIT: newest versions keep old detail, but move it down a bit.

@rustbot
Copy link
Collaborator

rustbot commented Jan 20, 2025

Could not assign reviewer from: workingjubilee.
User(s) workingjubilee are either the PR author, already assigned, or on vacation, and there are no other candidates.
Use r? to specify someone else to assign.

@rustbot
Copy link
Collaborator

rustbot commented Jan 20, 2025

r? @tgross35

rustbot has assigned @tgross35.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 20, 2025
Comment on lines 43 to 44
/// A B-tree resembles a [binary search tree], but each leaf (node) contains
/// an entire array (of unspecified size) of elements, instead of just a single element.
Copy link
Contributor

@hanna-kruppe hanna-kruppe Jan 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this talk about leaves is taken from #134088 but it doesn't reflect how BTreeMap works. All nodes (leaf and interior nodes) contain an array of elements and search may terminate in any them, not just in leaves. That's also closer to how binary search trees usually work. B+ trees, a variant of the traditional B-tree, do the "only leaves store elements, interior nodes guide search to the right leaf" thing but Rust's BTree* types are not like that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, so IIUC each node either has the element in its array, or it is in the left child, or it is in the right child.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now corrected to "node" instead of "leaf (node)".

Comment on lines 45 to 46
/// A search first traverses the tree structure to find, in logarithmic time, the correct leaf.
/// This leaf is then searched linearly, which is very fast on modern hardware.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may just be collateral damage from the leaves vs interior distinction, but it seems strange to only mention the linear search for leaves. The current implementation does it for every node, if it's mentioned for one it should be mentioned for both. Alternatively, it may be considered an implementation detail that distracts from the main point (logarithmic time complexity w.r.t. total number of elements, despite linear search in each node). But then it shouldn't be mentioned for leaves either. Certainly the way each node is searched is less central to the mechanical sympathy of B-trees than the "B" constant: fewer and larger nodes are obviously good (up to a point), while linear vs binary search is a more nuanced decision.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is definitely collateral damage from my faulty guessing at the internal workings. I guess I was too hasty dismissing this possibility as not being compatible with logarithmic behavior, but I see now that I was wrong. Thanks for clarifying.
I don't agree about larger nodes being obviously good. I think it has more to do with the minimum number of bits that a real machine can look at, due to the size of its cache lines or some other detail. If your algorithm uses smaller chunks, then it is wasting bandwidth. So I would say smaller is obviously better (since it gets you from linear search to logarithmic search), but only up to a point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed this detail and included something about the natural granularity of data for modern machines.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the context of comparison against BST, anything more than one key per node is a "larger node". You're right that the granularity of memory/storage access (cache lines in main memory, block/page size for disks) is a useful rule of thumb. Larger nodes also make insertion and deletion more expensive. But you're mistaken about search complexity: as the existing comments note, you can maintain optimal number of comparisons (up to a small constant factor) by doing binary search in each node. Sometimes that turns out a bit slower than a linear search, but it's competitive or superior in other cases.

In any case, I don't think it's useful to try and communicate these nuances in the BTreeMap docs.

Comment on lines -53 to -54
/// A B-Tree instead makes each node contain B-1 to 2B-1 elements in a contiguous array. By doing
/// this, we reduce the number of allocations by a factor of B, and improve cache efficiency in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you do want to salvage e.g. a brief mention of fewer nodes with more elements being better for reducing heap allocations and cache misses, consider this suggestions from #134088 (comment)

If we do wish to explain the theory behind B-tree performance, we can do it near the end, when we are not crushed for time and space.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've restored the old comments with the suggested fixes from @danielrainer from #134088 in a new background section below everything.

/// and possibly other factors. Using linear search, searching for a random element is expected
/// to take B * log(n) comparisons, which is generally worse than a BST. In practice,
/// however, performance is excellent.
/// A B-tree resembles a [binary search tree], but each leaf (node) contains
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @workingjubilee (#134088 (comment)) that

our first primary goal is not to explain "why not BinarySearchTreeMap?" We don't have a BinarySearchTreeMap, but we do have HashMap, so our first task is "why should you EVER look at this data structure and not HashMap?" This is true even if we never directly compare, in the type's docs, against HashMap.

So while you're already condensing the existing description, do you think you could also include this aspect? Essentially, say more about this being an "ordered map" before diving into how it maintains order internally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've included a short introduction that focuses on the ordered aspect and lifted the existing text about the order produced by iterators up. Also the section about not mucking about with the order is now much closer to the top and less unexpected (hopefully).

@tgross35
Copy link
Contributor

For any of the details that are still correct but just shouldn't be user-facing, leaving them as a non-doc comment seems preferable to removing them.

Cc @workingjubilee if you want to yoink this review since you were involved in #134088

@hkBst
Copy link
Member Author

hkBst commented Jan 21, 2025

@tgross35 I've restored the old comments (with small corrections from the original bug report) to a new section at the end.

@hkBst hkBst requested a review from hanna-kruppe January 21, 2025 11:11
/// triggers a heap-allocation, and every single comparison should be a cache-miss. Since these
/// are both notably expensive things to do in practice, we are forced to, at the very least,
/// reconsider the BST strategy.
/// An ordered map is a map in which the keys are totally ordered.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've occasionally seen people be confused about ordered maps in this sense vs. "map that maintains some consistent order" (e.g., insertion order) as exemplified by Python's OrderedDict or http://crates.io/crates/ordermap in Rust. I'd throw in something about the entries being stored in sorted order before detailing what determines the sort order.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a valuable insight and I can see how the first sentence can be mistaken for talking about a map with insertion order. It now says: "Given a key type with a [total order], an ordered map stores its entries in key order." which I feel is much harder to misunderstand. Let me know if you have more valuable suggestions.

@hkBst hkBst requested a review from hanna-kruppe February 5, 2025 13:21
Copy link
Contributor

@hanna-kruppe hanna-kruppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have any further suggestions but I gave up my bors permissions in 2020 (wow, that long ago?) when I stopped contributing regularly, so I can't move this any further. It's up to @tgross35 or another reviewer if they're too busy.

///
/// # Background
///
/// A B-tree is (like) a [binary search tree], but adapted to the natural granularity that modern machines like to consume data at.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this line a little too long? But I don't see tidy complaining so idk, just looks out of place 🤷

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think this should be wrapped.

@hkBst
Copy link
Member Author

hkBst commented Feb 19, 2025

@tgross35 or @workingjubilee, would be great if you could take a look.

@tgross35 tgross35 assigned workingjubilee and unassigned tgross35 Mar 4, 2025
@tgross35
Copy link
Contributor

tgross35 commented Mar 4, 2025

I didn't realize this was still assigned to me, I'm not entirely sure what the vision from the original issue was

r? @workingjubilee

@rustbot
Copy link
Collaborator

rustbot commented Mar 4, 2025

Could not assign reviewer from: workingjubilee.
User(s) workingjubilee are either the PR author, already assigned, or on vacation. Please use r? to specify someone else to assign.

@tgross35
Copy link
Contributor

tgross35 commented Mar 4, 2025

Ah sorry, I'll take that back then and try to have a look

@tgross35 tgross35 assigned tgross35 and unassigned workingjubilee Mar 4, 2025
@hkBst
Copy link
Member Author

hkBst commented Jul 14, 2025

@tgross35 friendly ping

@tgross35
Copy link
Contributor

Argh sorry, I thought this got rerolled a long time ago (probably should have). I still have no strong thoughts here so I'm just going to do that now

r? libs

@rustbot rustbot assigned ibraheemdev and unassigned tgross35 Jul 29, 2025
Comment on lines 159 to 160
/// triggers a heap-allocation, and for every comparison a node needs to be loaded,
/// which could result in a cache miss. Since both heap-allocations and cache-misses
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// triggers a heap-allocation, and for every comparison a node needs to be loaded,
/// which could result in a cache miss. Since both heap-allocations and cache-misses
/// triggers a heap-allocation, and comparison is a potential cache-miss due to the indirection.
/// Since both heap-allocations and cache-misses

@ibraheemdev
Copy link
Member

r=me after the nits.

@ibraheemdev ibraheemdev added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 19, 2025
@rustbot

This comment has been minimized.

@rustbot
Copy link
Collaborator

rustbot commented Aug 24, 2025

This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@hkBst
Copy link
Member Author

hkBst commented Aug 24, 2025

@rustbot ready

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Aug 24, 2025
@ibraheemdev
Copy link
Member

@bors r+ rollup

@bors
Copy link
Collaborator

bors commented Aug 24, 2025

📌 Commit 1b77387 has been approved by ibraheemdev

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 24, 2025
jhpratt added a commit to jhpratt/rust that referenced this pull request Aug 24, 2025
Dial down detail of B-tree description

fixes rust-lang#134088, though it is a shame to lose some of this wonderful detail.

r? `@workingjubilee`

EDIT: newest versions keep old detail, but move it down a bit.
jhpratt added a commit to jhpratt/rust that referenced this pull request Aug 24, 2025
Dial down detail of B-tree description

fixes rust-lang#134088, though it is a shame to lose some of this wonderful detail.

r? ``@workingjubilee``

EDIT: newest versions keep old detail, but move it down a bit.
bors added a commit that referenced this pull request Aug 24, 2025
Rollup of 6 pull requests

Successful merges:

 - #135761 (Dial down detail of B-tree description)
 - #144373 (remove deprecated Error::description in impls)
 - #145620 (Account for impossible bounds making seemingly unsatisfyable dyn-to-dyn casts)
 - #145783 (add span to struct pattern rest (..))
 - #145817 (cg_llvm: Replace the `llvm::Bool` typedef with a proper newtype)
 - #145820 (raw-dylib-elf: set correct `DT_VERDEFNUM`)

r? `@ghost`
`@rustbot` modify labels: rollup
jhpratt added a commit to jhpratt/rust that referenced this pull request Aug 24, 2025
Dial down detail of B-tree description

fixes rust-lang#134088, though it is a shame to lose some of this wonderful detail.

r? ```@workingjubilee```

EDIT: newest versions keep old detail, but move it down a bit.
bors added a commit that referenced this pull request Aug 25, 2025
Rollup of 5 pull requests

Successful merges:

 - #135761 (Dial down detail of B-tree description)
 - #144373 (remove deprecated Error::description in impls)
 - #145620 (Account for impossible bounds making seemingly unsatisfyable dyn-to-dyn casts)
 - #145817 (cg_llvm: Replace the `llvm::Bool` typedef with a proper newtype)
 - #145820 (raw-dylib-elf: set correct `DT_VERDEFNUM`)

r? `@ghost`
`@rustbot` modify labels: rollup
Zalathar added a commit to Zalathar/rust that referenced this pull request Aug 25, 2025
Dial down detail of B-tree description

fixes rust-lang#134088, though it is a shame to lose some of this wonderful detail.

r? ````@workingjubilee````

EDIT: newest versions keep old detail, but move it down a bit.
@Zalathar
Copy link
Contributor

Tracking down a failure in rollup #145833.

@bors try jobs=dist-various-2

@rust-bors

This comment has been minimized.

rust-bors bot added a commit that referenced this pull request Aug 25, 2025
Dial down detail of B-tree description

try-job: dist-various-2
@rust-bors
Copy link

rust-bors bot commented Aug 25, 2025

☀️ Try build successful (CI)
Build commit: 776e4d7 (776e4d77395f0403e79a600d5b16c3ca0e6249af, parent: a1dbb443527bd126452875eb5d5860c1d001d761)

bors added a commit that referenced this pull request Aug 25, 2025
Rollup of 10 pull requests

Successful merges:

 - #135761 (Dial down detail of B-tree description)
 - #145620 (Account for impossible bounds making seemingly unsatisfyable dyn-to-dyn casts)
 - #145788 (Fix attribute target checking for macro calls)
 - #145794 (bootstrap.py: Improve CPU detection on NetBSD)
 - #145817 (cg_llvm: Replace the `llvm::Bool` typedef with a proper newtype)
 - #145820 (raw-dylib-elf: set correct `DT_VERDEFNUM`)
 - #145828 (Update `bitflags` to 2.9.3.)
 - #145830 (Remove the lifetime from `ExpTokenPair`/`SeqSep`.)
 - #145836 (Remove outdated bug comments)
 - #145842 (rustc-dev-guide subtree update)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 9b46273 into rust-lang:master Aug 25, 2025
11 checks passed
@rustbot rustbot added this to the 1.91.0 milestone Aug 25, 2025
rust-timer added a commit that referenced this pull request Aug 25, 2025
Rollup merge of #135761 - hkBst:patch-9, r=ibraheemdev

Dial down detail of B-tree description

fixes #134088, though it is a shame to lose some of this wonderful detail.

r? `@workingjubilee`

EDIT: newest versions keep old detail, but move it down a bit.
github-actions bot pushed a commit to model-checking/verify-rust-std that referenced this pull request Aug 26, 2025
Dial down detail of B-tree description

fixes rust-lang#134088, though it is a shame to lose some of this wonderful detail.

r? `@workingjubilee`

EDIT: newest versions keep old detail, but move it down a bit.
github-actions bot pushed a commit to rust-lang/rustc-dev-guide that referenced this pull request Sep 8, 2025
Rollup of 10 pull requests

Successful merges:

 - rust-lang/rust#135761 (Dial down detail of B-tree description)
 - rust-lang/rust#145620 (Account for impossible bounds making seemingly unsatisfyable dyn-to-dyn casts)
 - rust-lang/rust#145788 (Fix attribute target checking for macro calls)
 - rust-lang/rust#145794 (bootstrap.py: Improve CPU detection on NetBSD)
 - rust-lang/rust#145817 (cg_llvm: Replace the `llvm::Bool` typedef with a proper newtype)
 - rust-lang/rust#145820 (raw-dylib-elf: set correct `DT_VERDEFNUM`)
 - rust-lang/rust#145828 (Update `bitflags` to 2.9.3.)
 - rust-lang/rust#145830 (Remove the lifetime from `ExpTokenPair`/`SeqSep`.)
 - rust-lang/rust#145836 (Remove outdated bug comments)
 - rust-lang/rust#145842 (rustc-dev-guide subtree update)

r? `@ghost`
`@rustbot` modify labels: rollup
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BTreeMap documentation has confusing wording about binary search tree performance issues

8 participants