xfbs/imstr

Immutable Strings

This crate offers a cheaply cloneable and sliceable UTF-8 string type. It is inspired by the bytes crate, which offers zero-copy byte slices, and the im crate, which offers immutable copy-on-write data structures. It offers a standard-library String-compatible API.

Internally, the crate uses a standard library string stored in a smart pointer, and a range into that String. This allows for cheap zero-copy cloning and slicing of the string. This is especially useful for parsing operations, where a large string needs to be sliced into a lot of substrings.
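The layout described above can be illustrated with a minimal sketch built only on the standard library. The `SharedStr` type and its methods are hypothetical names chosen for this illustration; they are not the crate's actual definitions, and the real implementation may differ.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical stand-in for the layout described above:
/// a shared String plus a byte range into it.
#[derive(Clone, Debug)]
struct SharedStr {
    data: Arc<String>,
    range: Range<usize>,
}

impl SharedStr {
    /// Allocates the backing String once.
    fn new(text: &str) -> Self {
        SharedStr {
            range: 0..text.len(),
            data: Arc::new(text.to_string()),
        }
    }

    /// Borrows the part of the backing String this view covers.
    fn as_str(&self) -> &str {
        &self.data.as_str()[self.range.clone()]
    }

    /// Zero-copy slice; `range` is relative to this view.
    /// Panics if the range is out of bounds or splits a UTF-8 sequence.
    fn slice(&self, range: Range<usize>) -> Self {
        let start = self.range.start + range.start;
        let end = self.range.start + range.end;
        assert!(start <= end && end <= self.range.end, "range out of bounds");
        assert!(
            self.data.is_char_boundary(start) && self.data.is_char_boundary(end),
            "range splits a UTF-8 sequence"
        );
        // Only the reference count and the range change; no bytes are copied.
        SharedStr {
            data: Arc::clone(&self.data),
            range: start..end,
        }
    }
}
```

Cloning or slicing such a value only bumps a reference count and copies two integers, which is why both operations stay cheap regardless of the string's length.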

TL;DR: This crate offers an ImString type that acts as a String (in that it can be modified and used in the same way), an Arc<String> (in that it is cheap to clone) and a &str (in that it is cheap to slice) all in one, owned type.

Diagram of ImString Internals

This crate offers a safe API that ensures that every string and every string slice is UTF-8 encoded. It does not allow slicing of strings within UTF-8 multibyte sequences. It offers try_* functions for every operation that can fail, so that panics can be avoided. It also relies on extensive unit tests with full test coverage to guard against unsoundness.
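To make the boundary rule concrete, here is a small sketch of a fallible slicing function over a plain `&str`, using only the standard library. The names `try_slice` and `SliceError` are assumptions made for this illustration and are not taken from the crate's API.

```rust
use std::ops::Range;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum SliceError {
    OutOfBounds,
    NotCharBoundary,
}

/// Returns the requested byte range, or an error instead of panicking.
fn try_slice(text: &str, range: Range<usize>) -> Result<&str, SliceError> {
    if range.start > range.end || range.end > text.len() {
        return Err(SliceError::OutOfBounds);
    }
    // Reject ranges that would cut a multibyte UTF-8 sequence in half.
    if !text.is_char_boundary(range.start) || !text.is_char_boundary(range.end) {
        return Err(SliceError::NotCharBoundary);
    }
    Ok(&text[range])
}
```

For the input "héllo", where "é" occupies bytes 1 and 2, the range 0..3 is accepted while 0..2 is rejected because it ends inside the two-byte sequence.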

Features

Efficient Cloning: The crate's architecture enables low-cost (zero-copy) clone and slice creation, making it ideal for parsing strings that are widely shared.

Efficient Slicing: The crate's architecture enables low-cost (zero-copy) slice creation, making it ideal for parsing operations where one large input string is sliced into many smaller strings.

Copy on Write: Despite being cheap to clone and slice, it allows for mutation using copy-on-write. For strings that are not shared, it has an optimisation that mutates them in place safely to avoid unnecessary copying.

Compatibility: The API is designed to closely resemble Rust's standard library String, facilitating smooth integration and making it almost a drop-in replacement. It also integrates with many popular Rust crates, such as serde, peg and nom.

Generic over Storage: The crate is flexible in terms of how the data is stored. It allows for using Arc<String> for multithreaded applications and Rc<String> for single-threaded use, providing adaptability to different storage requirements and avoiding the need to pay for atomic operations when they are not needed.

Safety: The crate enforces that all strings and string slices are UTF-8 encoded. Any methods that might violate this are marked as unsafe. All methods that can fail have a try_* variant that will not panic. Use of safe functions cannot result in unsound behaviour.
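The copy-on-write behaviour listed above can be sketched with the standard library's `Arc::make_mut`, which mutates in place when the pointer is unique and clones the contents first when it is shared. This is an illustration of the general technique only; `push_cow` is a hypothetical helper, and the crate's own mutation path (which also has to account for the stored range) may work differently.

```rust
use std::sync::Arc;

/// Hypothetical helper: appends to a shared String with copy-on-write semantics.
fn push_cow(text: &mut Arc<String>, suffix: &str) {
    // Arc::make_mut hands out a mutable reference directly if this is the
    // only reference, and otherwise clones the String into a fresh Arc first.
    Arc::make_mut(text).push_str(suffix);
}
```

`Rc::make_mut` offers the same semantics for single-threaded code, without the cost of atomic reference counting.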

Example

use imstr::ImString;

// Create new ImString, allocates data.
let mut string = ImString::from("Hello, World");

// Edit: happens in-place (because this is the only reference).
string.push_str("!");

// Clone: this is zero-copy.
let clone = string.clone();

// Slice: this is zero-copy.
let hello = string.slice(0..5);
assert_eq!(hello, "Hello");

// Slice: this is zero-copy.
let world = string.slice(7..12);
assert_eq!(world, "World");

// Here we have to copy only the part that the slice refers to so it can be modified.
let hello = hello + "!";
assert_eq!(hello, "Hello!");

Optional Features

The following optional features can be turned on using feature flags.

| Feature | Description |
|---------|-------------|
| serde | Serialize and deserialize ImString fields as strings with the serde crate. |
| peg | Use ImString as the data structure that is parsed with the peg crate. See peg-list.rs for an example. |
| nom | Allow ImString to be used to build parsers with nom. See nom-json.rs for an example. |

Similar

There are several crates similar to this, which are listed in the Rust String Benchmarks. You may want to check the other crates out as well.

This is a comparison of this crate to other, similar crates. The comparison is made on these features:

  • Cheap Clone: is it a zero-copy operation to clone a string?
  • Cheap Slice 🍕: is it possible to cheaply slice a string?
  • Mutable: is it possible to modify strings?
  • Generic Storage: is it possible to swap out the storage mechanism?
  • String Compatible: is it compatible with String?

Here is the data, with links to the crates for further examination:

Crate Cheap Clone Cheap Slice Mutable Generic Storage String Compatible Notes
imstr ✔️ ✔️ ✔️ ✔️ ✔️ This crate.
tendril ✔️ ✔️ ✔️ ✔️ Complex implementation. API not quite compatible with String, but otherwise closest to what this crate does.
immut_string ✔️ 🟡 (no optimization) Simply a wrapper around Arc<String>.
immutable_string ✔️ Wrapper around Arc<str>.
arccstr ✔️ Not UTF-8 (Null-terminated C string). Hand-written Arc implementation.
implicit-clone ✔️ 🟡 ✔️ Immutable string library. Has sync and unsync variants.
semistr Stores short strings inline.
quetta ✔️ ✔️ Wrapper around Arc<String> that can be sliced.
bytesstr ✔️ 🟡 Wrapper around Bytes. Cannot be directly sliced.
fast-str ✔️ Looks like there could be some unsafety.
flexstr ✔️ ✔️
bytestring ✔️ 🟡 Wrapper around Bytes. Used by actix. Can be indirectly sliced using slice_ref().
arcstr ✔️ ✔️ Can store string literal as &'static str.
cowstr ✔️ ✔️ Reimplements Arc, custom allocation strategy.
strck ✔️ Typechecked string library.

License

MIT, see LICENSE.md.