Commit 72a2d42

static_sym!(), documentation
1 parent d259585 commit 72a2d42

15 files changed (+416, -94 lines changed)


Cargo.lock

Lines changed: 0 additions & 25 deletions
Generated file; diff not rendered by default.

README.md

Lines changed: 99 additions & 9 deletions
@@ -1,4 +1,4 @@
-# Stringleton String Interner
+# Stringleton
 
 Extremely efficient string interning solution for Rust crates.
 
@@ -9,22 +9,22 @@ Extremely efficient string interning solution for Rust crates.
 - Symbol literals (`sym!(...)`) are "free" at the call-site. Multiple
   invocations with the same string value are eagerly reconciled on program
   startup, using link-time tricks.
+- Symbols are tiny. Just a single pointer - 8 bytes on 64-bit platforms.
 - Debugger friendly: If your debugger is able to display a plain Rust `&str`, it
   is capable of displaying `Symbol`.
 - Dynamic library support: Symbols can be passed across dynamic linking
   boundaries (terms and conditions apply - see the documentation of
   `stringleton-dylib`).
 - `no_std` support: `std` synchronization primitives used in the symbol registry
-  can be replaced with `once_cell` and `spin`. (`alloc` is still needed by the
-  internal hash table used in the registry.)
+  can be replaced with `once_cell` and `spin`. _See below for caveats._
 - `serde` support - symbols are serialized/deserialized as strings.
 - Fast bulk-insertion of symbols at runtime.
 
 ## Good use cases
 
 - You have lots of little strings that you need to frequently copy and compare.
 - Your strings come from trusted sources.
-- You need good debugger support for your symbols.
+- You want good debugger support for your symbols.
 
 ## Bad use cases
 
@@ -36,7 +36,7 @@ Extremely efficient string interning solution for Rust crates.
 
 ## Usage
 
-Add `stringleton` as a dependency of your project.
+Add `stringleton` as a dependency of your project, and then you can do:
 
 ```rust,ignore
 use stringleton::{sym, Symbol};
@@ -56,13 +56,30 @@ assert_eq!(message, message2);
 assert_eq!(message.as_str().as_ptr(), message2.as_str().as_ptr());
 ```
 
+## Crate features
+
+- **std** _(enabled by default)_: Use synchronization primitives from the
+  standard library. Implies `alloc`. When disabled, `critical-section` and
+  `spin` must both be enabled _(see below for caveats)_.
+- **alloc** _(enabled by default)_: Support creating symbols from `String`.
+- **serde**: Implements `serde::Serialize` and `serde::Deserialize` for symbols,
+  which will be serialized/deserialized as plain strings.
+- **debug-assertions**: Enables expensive debugging checks at runtime - mostly
+  useful to diagnose problems in complicated linker scenarios.
+- **critical-section**: When `std` is not enabled, this enables `once_cell` as a
+  dependency with the `critical-section` feature enabled. Only relevant in
+  `no_std` environments. _[See `critical-section` for more
+  details.](https://docs.rs/critical-section/latest/critical_section/)_
+- **spin**: When `std` is not enabled, this enables `spin` as a dependency,
+  which is used to obtain global read/write locks on the symbol registry. Only
+  relevant in `no_std` environments (and is a pessimization in other
+  environments).
+
 ## Efficiency
 
 Stringleton tries to be as efficient as possible, but it may make different
-tradeoffs than other string interning libraries.
-
-In particular, Stringleton is optimized towards making the use of the
-`sym!(...)` macro practically free.
+tradeoffs than other string interning libraries. In particular, Stringleton is
+optimized towards making the use of the `sym!(...)` macro practically free.
 
 Consider this function:
 
@@ -81,6 +98,79 @@ get_symbol:
 8bf7 ret
 ```
 
+This is "as fast as it gets", but the price is that all symbols in the program
+are deduplicated when the program starts. Any theoretically faster solution
+would need fairly deep cooperation from the compiler aimed at this specific use
+case.
+
+Also, symbol literals are _always_ a memory load. The compiler cannot perform
+optimizations based on the contents of symbols, because it doesn't know how they
+will be reconciled until link time. For example, while `sym!(a) != sym!(a)` is
+always false, the compiler cannot eliminate code paths relying on that.
+
+## Dynamic libraries
+
+Stringleton relies on magical linker tricks (supported by `linkme` and `ctor`)
+to minimize the cost of the `sym!(...)` macro at runtime. These tricks are
+broadly compatible with dynamic libraries, but there are a few caveats:
+
+1. When a Rust `dylib` crate appears in the dependency graph, and it has
+   `stringleton` as a dependency, things should "just work", due to Rust's
+   [linkage rules](https://doc.rust-lang.org/reference/linkage.html).
+2. When a Rust `cdylib` crate appears in the dependency graph, Cargo seems to be
+   a little less clever, and the `cdylib` dependency may need to use the
+   `stringleton-dylib` crate instead. Due to Rust's linkage rules, this will
+   cause the "host" crate to also link dynamically with Stringleton, and
+   everything will continue to work.
+3. When a library is loaded dynamically at runtime, and it does not appear in
+   the dependency graph, the "host" crate must be prevented from linking
+   statically to `stringleton`, because it would either cause duplicate symbol
+   definitions, or worse, the host and client binaries would disagree about
+   which `Registry` to use. To avoid this, the _host_ binary can use
+   `stringleton-dylib` explicitly instead of `stringleton`, which forces dynamic
+   linkage of the symbol registry.
+4. Dynamically _unloading_ libraries is extremely risky (`dlclose()` and
+   similar). Unloading a library that has any calls to the `sym!(..)` or
+   `static_sym!(..)` macros is instant UB. Such a library can in principle use
+   `Symbol::new()`, but probably not `Symbol::new_static()`.
+
+To summarize:
+
+1. When no dynamic libraries are present in the project, it is always best to
+   use `stringleton` directly.
+2. When only normal Rust dynamic libraries (`crate-type = ["dylib"]`) are
+   present, it is also fine to use `stringleton` directly - Cargo and rustc will
+   figure out how to link things correctly.
+3. `cdylib` dependencies should use `stringleton-dylib`. The host can use
+   `stringleton`.
+4. When loading dynamic libraries at runtime, both sides should use
+   `stringleton-dylib` instead of `stringleton`.
+5. Do not unload dynamic libraries at runtime unless you are really, really sure
+   what you are doing.
+
+## `no_std` caveats
+
+Stringleton works in `no_std` environments, but it does fundamentally require
+two things:
+
+1. Allocator support, in order to maintain the global symbol registry. This is a
+   `hashbrown` hash map.
+2. Some synchronization primitives to control access to the global symbol
+   registry when new symbols are created.
+
+The latter can be supported by the `spin` and `critical-section` features:
+
+- `spin` replaces `std::sync::RwLock`, and is almost always a worse choice when
+  `std` is available.
+- `critical-section` replaces `std::sync::OnceLock` with
+  [`once_cell::sync::OnceCell`](https://docs.rs/once_cell/latest/once_cell/sync/struct.OnceCell.html),
+  and enables the `critical-section` feature of `once_cell`. Using
+  `critical-section` requires additional work, because you must manually link in
+  a crate that provides the relevant synchronization primitive for the target
+  platform.
+
+Do not use these features unless you are familiar with the tradeoffs.
+
 ## Name
 
 The name is a portmanteau of "string" and "singleton".
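
Note on the README claims above: symbols are described as a single pointer (8 bytes on 64-bit platforms), and two symbols with the same text share the same string storage. The following is a minimal std-only sketch of that idea, not Stringleton's actual implementation. `Sym`, `table`, and `intern` are hypothetical names introduced for illustration, and this sketch interns lazily at runtime rather than at program startup.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Hypothetical symbol handle: one thin pointer to a leaked `&'static str`.
#[derive(Clone, Copy, Debug)]
pub struct Sym(&'static &'static str);

impl PartialEq for Sym {
    fn eq(&self, other: &Self) -> bool {
        // Interning keeps exactly one slot per distinct string, so comparing
        // slot addresses is sufficient; the string bytes are never inspected.
        std::ptr::eq(self.0, other.0)
    }
}

impl Eq for Sym {}

impl Sym {
    /// Returns the interned text.
    pub fn as_str(self) -> &'static str {
        *self.0
    }
}

type Table = Mutex<HashMap<&'static str, &'static &'static str>>;

/// Process-wide table mapping string contents to their unique slot.
fn table() -> &'static Table {
    static TABLE: OnceLock<Table> = OnceLock::new();
    TABLE.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Returns the unique `Sym` for `s`, leaking storage the first time `s` is seen.
pub fn intern(s: &str) -> Sym {
    let mut map = table().lock().unwrap_or_else(|e| e.into_inner());
    if let Some(slot) = map.get(s) {
        return Sym(*slot);
    }
    // Leak both the text and the slot so they live for the whole program.
    let text: &'static str = Box::leak(s.to_owned().into_boxed_str());
    let slot: &'static &'static str = Box::leak(Box::new(text));
    map.insert(text, slot);
    Sym(slot)
}
```

In this sketch, `intern("hello")` called twice yields handles that compare equal and whose `as_str().as_ptr()` values match, mirroring the `assert_eq!` lines in the README's usage example. Leaking is a deliberate simplification: it is what makes the `'static` lifetime sound here.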

stringleton-dylib/Cargo.toml

Lines changed: 0 additions & 1 deletion
@@ -24,5 +24,4 @@ alloc = ["stringleton-registry/alloc"]
 debug-assertions = ["stringleton-registry/debug-assertions"]
 serde = ["stringleton-registry/serde"]
 critical-section = ["stringleton-registry/critical-section"]
-race = ["stringleton-registry/race"]
 spin = ["stringleton-registry/spin"]

stringleton-dylib/lib.rs

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,7 @@
 //! Dynamic linking support for Stringleton.
 //!
+//! _[See the docs for `stringleton`](../stringleton/index.html)._
+//!
 //! This crate always produces a dynamic library, and it should be used by any
 //! crate that ends up being a `cdylib`. When this appears somewhere in the
 //! dependency graph, it causes the Rust compiler to produce a dynamic version

stringleton-registry/Cargo.toml

Lines changed: 4 additions & 3 deletions
@@ -18,9 +18,11 @@ workspace = true
 [dependencies]
 hashbrown.workspace = true
 # Using once_cell because `std::sync::OnceLock` is not available in no_std.
-once_cell = { version = "1.21.1", optional = true }
+once_cell = { version = "1.21.1", optional = true, default-features = false }
 serde = { workspace = true, optional = true }
-spin = { version = "0.9.8", optional = true }
+spin = { version = "0.9.8", optional = true, default-features = false, features = [
+    "rwlock",
+] }
 
 [features]
 default = ["std"]
@@ -29,5 +31,4 @@ alloc = []
 debug-assertions = []
 serde = ["dep:serde"]
 critical-section = ["once_cell/critical-section"]
-race = ["once_cell/race"]
 spin = ["dep:spin"]

stringleton-registry/lib.rs

Lines changed: 4 additions & 2 deletions
@@ -1,6 +1,6 @@
 //! Registry helper crate for `stringleton`
 //!
-//! You probably don't have a use for this crate directly. Use the
+//! You probably don't need to use this crate directly. Use the
 //! [`stringleton`](../stringleton) crate or the
 //! [`stringleton-dylib`](../stringleton-dylib) crate instead.
 //!
@@ -27,13 +27,15 @@
 #[cfg(feature = "std")]
 extern crate std;
 
-#[cfg(feature = "alloc")]
+#[cfg(any(feature = "std", feature = "alloc"))]
 extern crate alloc;
 
 mod registry;
 mod site;
+mod static_symbol;
 mod symbol;
 
 pub use registry::*;
 pub use site::*;
+pub use static_symbol::*;
 pub use symbol::*;

stringleton-registry/registry.rs

Lines changed: 15 additions & 13 deletions
@@ -6,18 +6,20 @@ use hashbrown::{HashMap, hash_map};
 #[cfg(feature = "alloc")]
 use alloc::{borrow::ToOwned, boxed::Box};
 
-#[cfg(not(any(feature = "std", feature = "critical-section", feature = "race")))]
-compile_error!("Either the `std` or `critical-section` or `race` feature must be enabled");
+#[cfg(not(any(feature = "std", feature = "critical-section")))]
+compile_error!("Either the `std` or `critical-section` feature must be enabled");
 #[cfg(not(any(feature = "std", feature = "spin")))]
 compile_error!("Either the `std` or `spin` feature must be enabled");
 
-#[cfg(feature = "std")]
-use std::sync::{OnceLock, RwLock, RwLockReadGuard, RwLockWriteGuard};
+#[cfg(feature = "spin")]
+use spin::{RwLock, RwLockReadGuard, RwLockWriteGuard};
+#[cfg(not(feature = "spin"))]
+use std::sync::{RwLock, RwLockReadGuard, RwLockWriteGuard};
 
-#[cfg(not(feature = "std"))]
+#[cfg(feature = "critical-section")]
 use once_cell::sync::OnceCell as OnceLock;
-#[cfg(not(feature = "std"))]
-use spin::{RwLock, RwLockReadGuard, RwLockWriteGuard};
+#[cfg(not(feature = "critical-section"))]
+use std::sync::OnceLock;
 
 /// Helper to control the behavior of symbol strings in the registry's hash map.
 #[derive(Clone, Copy, PartialEq, Eq)]
@@ -54,9 +56,9 @@ impl From<&str> for SymbolStr {
 /// This is available for advanced use cases, such as bulk-insertion of many
 /// symbols.
 pub struct Registry {
-    #[cfg(feature = "std")]
+    #[cfg(not(feature = "spin"))]
     store: std::sync::RwLock<Store>,
-    #[cfg(all(feature = "spin", not(feature = "std")))]
+    #[cfg(feature = "spin")]
     store: spin::RwLock<Store>,
 }
 
@@ -100,12 +102,12 @@ impl Registry {
     #[inline]
     pub fn read(&'static self) -> RegistryReadGuard {
         RegistryReadGuard {
-            #[cfg(feature = "std")]
+            #[cfg(not(feature = "spin"))]
             guard: self
                 .store
                 .read()
                 .unwrap_or_else(std::sync::PoisonError::into_inner),
-            #[cfg(not(feature = "std"))]
+            #[cfg(feature = "spin")]
             guard: self.store.read(),
         }
     }
@@ -117,12 +119,12 @@ impl Registry {
     #[inline]
     pub fn write(&'static self) -> RegistryWriteGuard {
         RegistryWriteGuard {
-            #[cfg(feature = "std")]
+            #[cfg(not(feature = "spin"))]
             guard: self
                 .store
                 .write()
                 .unwrap_or_else(std::sync::PoisonError::into_inner),
-            #[cfg(not(feature = "std"))]
+            #[cfg(feature = "spin")]
             guard: self.store.write(),
         }
     }
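
Note on the `read()` and `write()` hunks above: on the `std` path, both recover from a poisoned lock with `unwrap_or_else(std::sync::PoisonError::into_inner)` instead of panicking. The std-only sketch below shows what that pattern does in isolation. `STORE`, `poison_store`, and `store_len` are hypothetical names, and the `Vec` is only a stand-in for the registry's real store.

```rust
use std::sync::{PoisonError, RwLock};
use std::thread;

/// Hypothetical stand-in for the registry's store.
static STORE: RwLock<Vec<&'static str>> = RwLock::new(Vec::new());

/// Panics on another thread while holding the write lock, which poisons it.
pub fn poison_store() {
    let result = thread::spawn(|| {
        let mut guard = STORE.write().unwrap_or_else(PoisonError::into_inner);
        guard.push("inserted before panic");
        panic!("simulated failure while holding the lock");
    })
    .join();
    // The spawned thread panicked, so `join` reports an error.
    assert!(result.is_err());
}

/// Reads the store even if a previous writer panicked.
pub fn store_len() -> usize {
    // `into_inner` extracts the guard from the poison error, so the data
    // stays accessible after another thread's panic.
    STORE
        .read()
        .unwrap_or_else(PoisonError::into_inner)
        .len()
}
```

After `poison_store()` runs, a plain `STORE.read()` returns `Err`, while `store_len()` still succeeds. The tradeoff is that the reader may observe whatever state the panicking writer left behind, so this pattern suits stores whose individual updates cannot be left half-applied.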
