- # Stringleton String Interner
+ # Stringleton

  Extremely efficient string interning solution for Rust crates.
@@ -9,22 +9,22 @@ Extremely efficient string interning solution for Rust crates.
  - Symbol literals (`sym!(...)`) are "free" at the call-site. Multiple
  invocations with the same string value are eagerly reconciled on program
  startup, using link-time tricks.
+ - Symbols are tiny. Just a single pointer - 8 bytes on 64-bit platforms.
  - Debugger friendly: If your debugger is able to display a plain Rust `&str`, it
  is capable of displaying `Symbol`.
  - Dynamic library support: Symbols can be passed across dynamic linking
  boundaries (terms and conditions apply - see the documentation of
  `stringleton-dylib`).
  - `no_std` support: `std` synchronization primitives used in the symbol registry
- can be replaced with `once_cell` and `spin`. (`alloc` is still needed by the
- internal hash table used in the registry.)
+ can be replaced with `once_cell` and `spin`. _See below for caveats._
  - `serde` support - symbols are serialized/deserialized as strings.
  - Fast bulk-insertion of symbols at runtime.
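To make the "single pointer" and debugger-friendliness claims above concrete, here is a minimal std-only sketch. It assumes a symbol can be modeled as a thin pointer to a leaked `&'static str`; the `Sym` type and the `leak_symbol` helper are hypothetical names used for illustration, not Stringleton's actual types or API.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a symbol: a thin pointer to a `&'static str`.
/// (Assumption for illustration; not Stringleton's actual definition.)
#[derive(Clone, Copy, Debug)]
struct Sym(&'static &'static str);

impl Sym {
    /// Read the text back through the pointer.
    fn as_str(self) -> &'static str {
        *self.0
    }
}

impl PartialEq for Sym {
    /// Compare addresses only; the string contents are never inspected.
    fn eq(&self, other: &Self) -> bool {
        std::ptr::eq(self.0, other.0)
    }
}

/// Leak a fresh slot holding `text`. A real interner would first look the
/// string up in a registry so that equal strings share one slot.
fn leak_symbol(text: &'static str) -> Sym {
    Sym(Box::leak(Box::new(text)))
}

fn main() {
    let a = leak_symbol("hello");
    let b = a; // copying a symbol copies one pointer
    assert_eq!(size_of::<Sym>(), size_of::<usize>());
    assert_eq!(a, b);
    assert_eq!(a.as_str(), "hello");
}
```

Because the pointee is an ordinary `&str`, any tool that can print a `&str` can follow the pointer and show the text, which is one plausible reading of the debugger-friendliness claim.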

  ## Good use cases

  - You have lots of little strings that you need to frequently copy and compare.
  - Your strings come from trusted sources.
- - You need good debugger support for your symbols.
+ - You want good debugger support for your symbols.

  ## Bad use cases
@@ -36,7 +36,7 @@ Extremely efficient string interning solution for Rust crates.

  ## Usage

- Add `stringleton` as a dependency of your project.
+ Add `stringleton` as a dependency of your project, and then you can do:

``` rust,ignore
use stringleton::{sym, Symbol};
@@ -56,13 +56,30 @@ assert_eq!(message, message2);
assert_eq!(message.as_str().as_ptr(), message2.as_str().as_ptr());
```

+ ## Crate features
+
+ - **std** _(enabled by default)_: Use synchronization primitives from the
+ standard library. Implies `alloc`. When disabled, `critical-section` and
+ `spin` must both be enabled _(see below for caveats)_.
+ - **alloc** _(enabled by default)_: Support creating symbols from `String`.
+ - **serde**: Implements `serde::Serialize` and `serde::Deserialize` for symbols,
+ which will be serialized/deserialized as plain strings.
+ - **debug-assertions**: Enables expensive debugging checks at runtime - mostly
+ useful to diagnose problems in complicated linker scenarios.
+ - **critical-section**: When `std` is not enabled, this enables `once_cell` as a
+ dependency with the `critical-section` feature enabled. Only relevant in
+ `no_std` environments. _[See `critical-section` for more
+ details.](https://docs.rs/critical-section/latest/critical_section/)_
+ - **spin**: When `std` is not enabled, this enables `spin` as a dependency,
+ which is used to obtain global read/write locks on the symbol registry. Only
+ relevant in `no_std` environments (and is a pessimization in other
+ environments).
+

  ## Efficiency

  Stringleton tries to be as efficient as possible, but it may make different
- tradeoffs than other string interning libraries.
-
- In particular, Stringleton is optimized towards making the use of the
- `sym!(...)` macro practically free.
+ tradeoffs than other string interning libraries. In particular, Stringleton is
+ optimized towards making the use of the `sym!(...)` macro practically free.

  Consider this function:

@@ -81,6 +98,79 @@ get_symbol:
8bf7 ret
```

+ This is "as fast as it gets", but the price is that all symbols in the program
+ are deduplicated when the program starts. Any theoretically faster solution
+ would need fairly deep cooperation from the compiler aimed at this specific use
+ case.
+
+ Also, symbol literals are _always_ a memory load. The compiler cannot perform
+ optimizations based on the contents of symbols, because it doesn't know how they
+ will be reconciled until link time. For example, while `sym!(a) != sym!(a)` is
+ always false, the compiler cannot eliminate code paths relying on that.
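The startup cost described above can be pictured as a single deduplication pass over every call site. The following is a minimal std-only sketch, assuming each hypothetical `sym!(...)` call site owns one slot holding a string pointer. The names `reconcile` and `leaked` are illustrative, and the real crate gathers its call sites through linker tricks rather than a slice.

```rust
use std::collections::HashMap;

/// Point every slot with equal text at the first slot's string, so that
/// later comparisons can use addresses instead of contents.
/// (Illustrative only; not Stringleton's actual startup code.)
fn reconcile(sites: &mut [&'static str]) {
    let mut canonical: HashMap<&'static str, &'static str> = HashMap::new();
    for slot in sites.iter_mut() {
        // The first occurrence of each text becomes the canonical pointer.
        *slot = *canonical.entry(*slot).or_insert(*slot);
    }
}

/// Allocate a distinct copy of `text`, standing in for a string literal
/// emitted separately at each call site.
fn leaked(text: &str) -> &'static str {
    Box::leak(text.to_owned().into_boxed_str())
}

fn main() {
    let mut sites = [leaked("a"), leaked("b"), leaked("a")];
    assert_ne!(sites[0].as_ptr(), sites[2].as_ptr()); // distinct before startup

    reconcile(&mut sites);
    assert_eq!(sites[0].as_ptr(), sites[2].as_ptr()); // shared afterwards
    assert_ne!(sites[0].as_ptr(), sites[1].as_ptr());
}
```

In this model each slot is only finalized at startup, so reading a symbol means loading from its slot. That is consistent with the point that the compiler cannot fold comparisons between symbol literals at compile time.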
+
+ ## Dynamic libraries
+
+ Stringleton relies on magical linker tricks (supported by `linkme` and `ctor`)
+ to minimize the cost of the `sym!(...)` macro at runtime. These tricks are
+ broadly compatible with dynamic libraries, but there are a few caveats:
+
+ 1. When a Rust `dylib` crate appears in the dependency graph, and it has
+ `stringleton` as a dependency, things should "just work", due to Rust's
+ [linkage rules](https://doc.rust-lang.org/reference/linkage.html).
+ 2. When a Rust `cdylib` crate appears in the dependency graph, Cargo seems to be
+ a little less clever, and the `cdylib` dependency may need to use the
+ `stringleton-dylib` crate instead. Due to Rust's linkage rules, this will
+ cause the "host" crate to also link dynamically with Stringleton, and
+ everything will continue to work.
+ 3. When a library is loaded dynamically at runtime, and it does not appear in
+ the dependency graph, the "host" crate must be prevented from linking
+ statically to `stringleton`, because it would either cause duplicate symbol
+ definitions, or worse, the host and client binaries would disagree about
+ which `Registry` to use. To avoid this, the _host_ binary can use
+ `stringleton-dylib` explicitly instead of `stringleton`, which forces dynamic
+ linkage of the symbol registry.
+ 4. Dynamically _unloading_ libraries is extremely risky (`dlclose()` and
+ similar). Unloading a library that has any calls to the `sym!(...)` or
+ `static_sym!(...)` macros is instant undefined behavior. Such a library can in
+ principle use `Symbol::new()`, but probably not `Symbol::new_static()`.
+
+ To summarize:
+
+ 1. When no dynamic libraries are present in the project, it is always best to
+ use `stringleton` directly.
+ 2. When only normal Rust dynamic libraries (`crate-type = ["dylib"]`) are
+ present, it is also fine to use `stringleton` directly - Cargo and rustc will
+ figure out how to link things correctly.
+ 3. `cdylib` dependencies should use `stringleton-dylib`. The host can use
+ `stringleton`.
+ 4. When loading dynamic libraries at runtime, both sides should use
+ `stringleton-dylib` instead of `stringleton`.
+ 5. Do not unload dynamic libraries at runtime unless you are really, really sure
+ what you are doing.
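The registry disagreement mentioned in caveat 3 can be illustrated without any dynamic loading. The sketch below assumes symbols are compared by address; the `Registry` struct here is a hypothetical stand-in, not the crate's `Registry` type. Interning the same text in two separate registries yields two different addresses, which is the failure mode that sharing one dynamically linked registry is meant to avoid.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a symbol registry (not the crate's type).
#[derive(Default)]
struct Registry {
    strings: HashSet<&'static str>,
}

impl Registry {
    /// Return the canonical copy of `text`, leaking a new one on first use.
    fn intern(&mut self, text: &str) -> &'static str {
        if let Some(existing) = self.strings.get(text) {
            return *existing;
        }
        let leaked: &'static str = Box::leak(text.to_owned().into_boxed_str());
        self.strings.insert(leaked);
        leaked
    }
}

fn main() {
    // Imagine `host` and `plugin` each linked their own copy statically.
    let mut host = Registry::default();
    let mut plugin = Registry::default();

    let a = host.intern("message");
    let b = host.intern("message");
    let c = plugin.intern("message");

    assert_eq!(a.as_ptr(), b.as_ptr()); // same registry: same address
    assert_eq!(a, c); // same text...
    assert_ne!(a.as_ptr(), c.as_ptr()); // ...but different addresses
}
```

With address-based equality, `a` and `c` would compare as different symbols even though they spell the same string.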
+
+ ## `no_std` caveats
+
+ Stringleton works in `no_std` environments, but it does fundamentally require
+ two things:
+
+ 1. Allocator support, in order to maintain the global symbol registry. The
+ registry is a `hashbrown` hash map.
+ 2. Some synchronization primitives to control access to the global symbol
+ registry when new symbols are created.
+
+ The latter can be supported by the `spin` and `critical-section` features:
+
+ - `spin` replaces `std::sync::RwLock`, and is almost always a worse choice when
+ `std` is available.
+ - `critical-section` replaces `std::sync::OnceLock` with
+ [`once_cell::sync::OnceCell`](https://docs.rs/once_cell/latest/once_cell/sync/struct.OnceCell.html),
+ and enables the `critical-section` feature of `once_cell`. Using
+ `critical-section` requires additional work, because you must manually link in
+ a crate that provides the relevant synchronization primitive for the target
+ platform.
+
+ Do not use these features unless you are familiar with the tradeoffs.
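To show where the two primitives sit, here is a minimal `std`-based sketch of a global registry: `OnceLock` handles one-time initialization and `RwLock` guards the table. It is an illustration under assumptions, not the crate's implementation. The `registry` and `intern` functions are hypothetical, and `std::collections::HashSet` stands in for the `hashbrown` table.

```rust
use std::collections::HashSet;
use std::sync::{OnceLock, RwLock};

/// Lazily initialized global table of interned strings.
/// (Hypothetical layout; the real registry may differ.)
fn registry() -> &'static RwLock<HashSet<&'static str>> {
    static REGISTRY: OnceLock<RwLock<HashSet<&'static str>>> = OnceLock::new();
    REGISTRY.get_or_init(|| RwLock::new(HashSet::new()))
}

/// Intern `text`, taking the write lock only when the string is new.
fn intern(text: &str) -> &'static str {
    {
        // Fast path: shared read lock.
        let table = registry().read().unwrap();
        if let Some(found) = table.get(text) {
            return *found;
        }
    } // read lock released here

    // Slow path: exclusive write lock; re-check in case another thread won.
    let mut table = registry().write().unwrap();
    if let Some(found) = table.get(text) {
        return *found;
    }
    let leaked: &'static str = Box::leak(text.to_owned().into_boxed_str());
    table.insert(leaked);
    leaked
}

fn main() {
    let a = intern("hello");
    let b = intern("hello");
    assert_eq!(a.as_ptr(), b.as_ptr());
    assert_ne!(a.as_ptr(), intern("world").as_ptr());
}
```

In a `no_std` build, the features described above would swap in `once_cell::sync::OnceCell` and a `spin` lock at these two spots, while the allocator requirement stays the same.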
+
  ## Name

  The name is a portmanteau of "string" and "singleton".