This crate offers a cheaply cloneable and sliceable UTF-8 string type. It is
inspired by the `bytes` crate, which offers zero-copy byte slices, and the
`im` crate, which offers immutable copy-on-write data structures. It offers
a standard-library `String`-compatible API.
Internally, the crate stores a standard-library `String` in a reference-counted
smart pointer, together with a range into that `String`. This allows for cheap,
zero-copy cloning and slicing of the string. This is especially useful for
parsing operations, where a large string needs to be sliced into many substrings.
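A simplified model can make this layout concrete. The sketch below is hypothetical and is not imstr's implementation: `SharedSlice`, `slice` and `split_spaces` are illustrative names, and it hard-codes `Arc<String>` as the smart pointer. Cloning copies only the pointer and the range, and slicing narrows the range, so every substring shares one buffer.

```rust
use std::ops::{Deref, Range};
use std::sync::Arc;

/// Hypothetical, simplified model of the layout described above (not imstr's
/// actual type): a shared `String` behind an `Arc`, plus the byte range of it
/// that this value refers to.
#[derive(Clone, Debug)]
struct SharedSlice {
    data: Arc<String>,
    range: Range<usize>,
}

impl SharedSlice {
    fn new(s: String) -> Self {
        let range = 0..s.len();
        SharedSlice { data: Arc::new(s), range }
    }

    fn as_str(&self) -> &str {
        &self.data[self.range.clone()]
    }

    /// Zero-copy slice: bumps the reference count and narrows the range.
    /// `range` is in bytes, relative to this value; like indexing a `str`,
    /// it panics if the range is out of bounds or splits a UTF-8 character.
    fn slice(&self, range: Range<usize>) -> Self {
        assert!(
            self.as_str().get(range.clone()).is_some(),
            "range out of bounds or not on a char boundary"
        );
        SharedSlice {
            data: Arc::clone(&self.data),
            range: self.range.start + range.start..self.range.start + range.end,
        }
    }

    /// Toy tokenizer (illustration only): splits on spaces, and every
    /// resulting piece shares the same underlying buffer.
    fn split_spaces(&self) -> Vec<SharedSlice> {
        let s = self.as_str();
        let mut parts = Vec::new();
        let mut start = 0;
        for (i, _) in s.match_indices(' ') {
            parts.push(self.slice(start..i));
            start = i + 1; // ' ' is one byte, so i + 1 is a char boundary
        }
        parts.push(self.slice(start..s.len()));
        parts
    }
}

/// Dereferencing to `str` makes all `&str` methods available.
impl Deref for SharedSlice {
    type Target = str;

    fn deref(&self) -> &str {
        self.as_str()
    }
}
```

For example, `SharedSlice::new(String::from("a bc d")).split_spaces()` yields three values that all point at the same `Arc<String>` allocation, which is what makes this approach attractive for parsers.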
TL;DR: This crate offers an `ImString` type that acts as a `String` (in that it
can be modified and used in the same way), an `Arc<String>` (in that it is cheap
to clone) and an `&str` (in that it is cheap to slice), all in one owned type.
This crate offers a safe API that ensures that every string and every string
slice is UTF-8 encoded. It does not allow slicing of strings within UTF-8
multibyte sequences. It offers `try_*` variants of every operation that can
fail, so that panics can be avoided. It is also backed by extensive unit tests
with full test coverage to help ensure that there is no unsoundness.
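The boundary check behind UTF-8-safe slicing can be sketched with the standard library alone. `try_slice`, `slice` and `SliceError` are hypothetical names chosen for illustration; imstr's actual `try_*` functions may use different signatures and error types. The checked version returns an error instead of panicking, and the panicking version is a thin wrapper around it.

```rust
use std::ops::Range;

/// Hypothetical error type for illustration; imstr's own errors may differ.
#[derive(Debug, PartialEq, Eq)]
enum SliceError {
    OutOfBounds,
    NotCharBoundary,
}

/// Checked, non-panicking slice in the spirit of the `try_*` functions:
/// rejects ranges that fall outside the string or split a multibyte character.
fn try_slice(s: &str, range: Range<usize>) -> Result<&str, SliceError> {
    if range.start > range.end || range.end > s.len() {
        return Err(SliceError::OutOfBounds);
    }
    if !s.is_char_boundary(range.start) || !s.is_char_boundary(range.end) {
        return Err(SliceError::NotCharBoundary);
    }
    Ok(&s[range])
}

/// Panicking counterpart, built on top of the checked version.
fn slice(s: &str, range: Range<usize>) -> &str {
    try_slice(s, range).expect("invalid slice range")
}
```

For example, in `"héllo"` the `é` occupies bytes 1..3, so `try_slice("héllo", 0..2)` returns `Err(SliceError::NotCharBoundary)` rather than panicking.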
- **Efficient Cloning**: The crate's architecture makes cloning cheap (zero-copy), which makes it ideal for strings that are widely shared.
- **Efficient Slicing**: Slices are also created cheaply (zero-copy), making the crate ideal for parsing operations where one large input string is sliced into many smaller strings.
- **Copy on Write**: Despite being cheap to clone and slice, strings can be mutated using copy-on-write. Strings that are not shared are mutated in place, which safely avoids unnecessary copying.
- **Compatibility**: The API is designed to closely resemble Rust's standard
library `String`, which eases integration and makes it almost a drop-in
replacement. It also integrates with popular Rust crates such as `serde`,
`peg` and `nom`.
- **Generic over Storage**: The crate is flexible in how the data is stored.
It can use `Arc<String>` for multithreaded applications and `Rc<String>` for
single-threaded use, adapting to different storage requirements and avoiding
the cost of atomic operations when they are not needed.
- **Safety**: The crate enforces that all strings and string slices are UTF-8
encoded. Any methods that might violate this are marked as `unsafe`. All methods
that can fail have a `try_*` variant that will not panic. Use of safe functions
cannot result in unsound behaviour.
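Copy-on-write and storage genericity can be combined in one sketch. It assumes a hypothetical `Storage` trait implemented for `Arc<String>` and `Rc<String>`; `Storage`, `GenericStr`, `SyncStr` and `LocalStr` are illustrative names, and imstr's real storage abstraction may look quite different. The copy-on-write step uses the standard `Arc::make_mut` and `Rc::make_mut`, which clone the inner `String` only when the buffer is shared and otherwise hand back the existing one for in-place mutation.

```rust
use std::ops::Range;
use std::rc::Rc;
use std::sync::Arc;

/// Hypothetical storage abstraction (not imstr's real trait): a cheaply
/// cloneable handle to a `String` that supports copy-on-write.
trait Storage: Clone {
    fn from_string(s: String) -> Self;
    fn as_string(&self) -> &String;
    /// Mutable access that clones the `String` only if the handle is shared.
    fn cow_mut(&mut self) -> &mut String;
}

// Atomic reference counting: the handle can be shared across threads.
impl Storage for Arc<String> {
    fn from_string(s: String) -> Self {
        Arc::new(s)
    }
    fn as_string(&self) -> &String {
        self
    }
    fn cow_mut(&mut self) -> &mut String {
        Arc::make_mut(self)
    }
}

// Non-atomic reference counting: cheaper, but single-threaded only.
impl Storage for Rc<String> {
    fn from_string(s: String) -> Self {
        Rc::new(s)
    }
    fn as_string(&self) -> &String {
        self
    }
    fn cow_mut(&mut self) -> &mut String {
        Rc::make_mut(self)
    }
}

/// Hypothetical simplified string type that is generic over its storage.
#[derive(Clone)]
struct GenericStr<S: Storage> {
    data: S,
    range: Range<usize>,
}

impl<S: Storage> GenericStr<S> {
    fn new(s: &str) -> Self {
        GenericStr {
            data: S::from_string(s.to_string()),
            range: 0..s.len(),
        }
    }

    fn as_str(&self) -> &str {
        &self.data.as_string()[self.range.clone()]
    }

    /// Zero-copy slice (byte range relative to this value); panics on an
    /// invalid range, like indexing a `str`.
    fn slice(&self, range: Range<usize>) -> Self {
        assert!(self.as_str().get(range.clone()).is_some(), "invalid range");
        GenericStr {
            data: self.data.clone(),
            range: self.range.start + range.start..self.range.start + range.end,
        }
    }

    fn push_str(&mut self, extra: &str) {
        // A value that views only part of the buffer first copies just that
        // part into a new buffer of its own.
        if self.range != (0..self.data.as_string().len()) {
            self.data = S::from_string(self.as_str().to_string());
        }
        // Copy-on-write: clones the buffer only if another handle still
        // shares it; otherwise the append happens in place.
        self.data.cow_mut().push_str(extra);
        self.range = 0..self.data.as_string().len();
    }
}

/// Hypothetical aliases: choose the storage that matches the threading needs.
type SyncStr = GenericStr<Arc<String>>;
type LocalStr = GenericStr<Rc<String>>;
```

With this design, `LocalStr` avoids atomic reference-count updates in single-threaded code, while `SyncStr` can be sent across threads; the string logic itself is written once.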
```rust
use imstr::ImString;

fn main() {
    // Create a new ImString; this allocates the data.
    let mut string = ImString::from("Hello, World");

    // Edit: happens in place (because this is the only reference).
    string.push_str("!");

    // Clone: this is zero-copy.
    let clone = string.clone();
    assert_eq!(clone, "Hello, World!");

    // Slice: this is zero-copy.
    let hello = string.slice(0..5);
    assert_eq!(hello, "Hello");

    // Slice: this is zero-copy.
    let world = string.slice(7..12);
    assert_eq!(world, "World");

    // Here only the part that the slice refers to has to be copied so that it can be modified.
    let hello = hello + "!";
    assert_eq!(hello, "Hello!");
}
```
Optional features can be turned on using Cargo feature flags:
| Feature | Description |
|---|---|
| `serde` | Serialize and deserialize `ImString` fields as strings with the `serde` crate. |
| `peg` | Use `ImString` as the data structure that is parsed with the `peg` crate. See `peg-list.rs` for an example. |
| `nom` | Allow `ImString` to be used to build parsers with `nom`. See `nom-json.rs` for an example. |
There are several crates similar to this one, which are listed in the Rust String Benchmarks. You may want to check them out as well.
This is a comparison of this crate to other, similar crates. The comparison is made on these features:
- **Cheap Clone**: is it a zero-copy operation to clone a string?
- **Cheap Slice** 🍕: is it possible to cheaply slice a string?
- **Mutable**: is it possible to modify strings?
- **Generic Storage**: is it possible to swap out the storage mechanism?
- **String Compatible**: is it compatible with `String`?
Here is the data, with links to the crates for further examination:
| Crate | Cheap Clone | Cheap Slice | Mutable | Generic Storage | String Compatible | Notes |
|---|---|---|---|---|---|---|
| imstr | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | This crate. |
| tendril | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | Complex implementation. API not quite compatible with `String`, but otherwise closest to what this crate does. |
| immut_string | ✔️ | ❌ | 🟡 (no optimization) | ❌ | ❌ | Simply a wrapper around `Arc<String>`. |
| immutable_string | ✔️ | ❌ | ❌ | ❌ | ❌ | Wrapper around `Arc<str>`. |
| arccstr | ✔️ | ❌ | ❌ | ❌ | ❌ | Not UTF-8 (null-terminated C string). Hand-written `Arc` implementation. |
| implicit-clone | ✔️ | ❌ | ❌ | 🟡 | ✔️ | Immutable string library. Has `sync` and `unsync` variants. |
| semistr | ❌ | ❌ | ❌ | ❌ | ❌ | Stores short strings inline. |
| quetta | ✔️ | ✔️ | ❌ | ❌ | ❌ | Wrapper around `Arc<String>` that can be sliced. |
| bytesstr | ✔️ | 🟡 | ❌ | ❌ | ❌ | Wrapper around `Bytes`. Cannot be directly sliced. |
| fast-str | ✔️ | ❌ | ❌ | ❌ | ❌ | Looks like there could be some unsafety. |
| flexstr | ✔️ | ❌ | ❌ | ✔️ | ❌ | |
| bytestring | ✔️ | 🟡 | ❌ | ❌ | ❌ | Wrapper around `Bytes`. Used by `actix`. Can be indirectly sliced using `slice_ref()`. |
| arcstr | ✔️ | ✔️ | ❌ | ❌ | ❌ | Can store string literals as `&'static str`. |
| cowstr | ✔️ | ❌ | ✔️ | ❌ | ❌ | Reimplements `Arc`, custom allocation strategy. |
| strck | ❌ | ❌ | ❌ | ✔️ | ❌ | Type-checked string library. |
MIT, see `LICENSE.md`.