Crate zerovec

Zero-copy vector abstractions for arbitrary types, backed by byte slices.

zerovec enables a far wider range of types (beyond just &[u8] and &str) to participate in zero-copy deserialization from byte slices. It is serde compatible and comes equipped with proc macros.

Clients upgrading to zerovec benefit from zero heap allocations when deserializing read-only data.

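This crate has four main types:

  • ZeroVec<'a, T> for fixed-width types
  • VarZeroVec<'a, T> for variable-width types
  • ZeroMap<'a, K, V>, a map from K to V
  • ZeroMap2d<'a, K0, K1, V>, a two-dimensional map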

The first two are intended as close-to-drop-in replacements for Vec<T> in Serde structs. The third and fourth are intended as a replacement for HashMap or LiteMap. When used with Serde derives, be sure to apply #[serde(borrow)] to these types, same as one would for Cow<'a, T>.

ZeroVec<'a, T>, VarZeroVec<'a, T>, ZeroMap<'a, K, V>, and ZeroMap2d<'a, K0, K1, V> all behave like Cow<'a, T> in that they abstract over either borrowed or owned data. When deserializing from human-readable formats (like JSON and XML), these types will typically allocate and fully own their data. When deserializing from binary formats like bincode and postcard, they borrow data directly from the buffer being deserialized, avoiding allocations and performing only validity checks. As a result, deserialization with this crate can be very fast (see the Performance section below).
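The borrowed-or-owned behavior can be illustrated with a small standard-library sketch. This is not zerovec's implementation: U32Bytes and its methods are hypothetical names invented for this illustration, and the layout (plain little-endian u32 values) is an assumption chosen for simplicity. It only shows the general idea of wrapping a Cow over bytes, validating a borrowed buffer without copying it, and decoding elements on access.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for a zero-copy vector of `u32`, stored as
/// little-endian bytes that are either borrowed or owned.
pub struct U32Bytes<'a> {
    bytes: Cow<'a, [u8]>,
}

impl<'a> U32Bytes<'a> {
    /// Borrow an existing buffer. The only work done is a length check;
    /// nothing is copied or allocated.
    pub fn from_bytes(bytes: &'a [u8]) -> Option<Self> {
        if bytes.len() % 4 == 0 {
            Some(Self { bytes: Cow::Borrowed(bytes) })
        } else {
            None
        }
    }

    /// Build from native integers. This path allocates a new byte buffer.
    pub fn from_values(values: &[u32]) -> Self {
        let bytes: Vec<u8> = values.iter().flat_map(|v| v.to_le_bytes()).collect();
        Self { bytes: Cow::Owned(bytes) }
    }

    /// True if the bytes were allocated rather than borrowed.
    pub fn is_owned(&self) -> bool {
        matches!(self.bytes, Cow::Owned(_))
    }

    /// Number of `u32` elements.
    pub fn len(&self) -> usize {
        self.bytes.len() / 4
    }

    /// Decode element `index` on access; the buffer needs no alignment.
    pub fn get(&self, index: usize) -> Option<u32> {
        let start = index.checked_mul(4)?;
        let end = start.checked_add(4)?;
        let chunk = self.bytes.get(start..end)?;
        Some(u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]))
    }
}
```

In this sketch, U32Bytes::from_bytes(&buffer) mirrors the binary-format path (borrow and validate), while U32Bytes::from_values(&[211, 281]) mirrors the human-readable path (allocate and own).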

See the design doc for details on how this crate works under the hood.

§Cargo features

This crate has several optional Cargo features:

  • serde: Allows serializing and deserializing zerovec’s abstractions via serde
  • yoke: Enables implementations of Yokeable from the yoke crate, which is also useful in situations involving a lot of zero-copy deserialization.
  • derive: Makes it easier to use custom types in these collections by providing the #[make_ule] and #[make_varule] proc macros, which generate appropriate ULE and VarULE-conformant types for a given “normal” type.
  • std: Enables std::error::Error implementations for error types. This crate is by default no_std with a dependency on alloc.
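As a rough picture of what a ULE counterpart of a "normal" type is, the following hand-written sketch converts a struct to and from an unaligned little-endian byte form. It is an illustration only and not the output of #[make_ule]: Point, PointBytes, to_unaligned, and from_unaligned are hypothetical names, and the byte layout is an assumption made for this example.

```rust
/// A hypothetical "normal" type; its integer fields give it an
/// alignment requirement greater than 1.
#[derive(Copy, Clone, Debug, PartialEq)]
pub struct Point {
    pub x: u32,
    pub y: u16,
}

/// Hand-written unaligned little-endian form of `Point`.
/// A byte array has alignment 1, so it can sit at any offset in a buffer.
#[derive(Copy, Clone, Debug, PartialEq)]
#[repr(transparent)]
pub struct PointBytes(pub [u8; 6]);

impl Point {
    /// Encode each field as little-endian bytes, packed with no padding.
    pub fn to_unaligned(self) -> PointBytes {
        let mut out = [0u8; 6];
        out[..4].copy_from_slice(&self.x.to_le_bytes());
        out[4..].copy_from_slice(&self.y.to_le_bytes());
        PointBytes(out)
    }

    /// Decode the packed little-endian bytes back into native fields.
    pub fn from_unaligned(raw: PointBytes) -> Self {
        let b = raw.0;
        Point {
            x: u32::from_le_bytes([b[0], b[1], b[2], b[3]]),
            y: u16::from_le_bytes([b[4], b[5]]),
        }
    }
}
```

Writing such conversions by hand for every type is tedious and easy to get wrong, which is the kind of work the derive feature is described as taking over.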

§Examples

Serialize and deserialize a struct with ZeroVec and VarZeroVec with Bincode:

use zerovec::{VarZeroVec, ZeroVec};

// This example requires the "serde" feature
#[derive(serde::Serialize, serde::Deserialize)]
pub struct DataStruct<'data> {
    #[serde(borrow)]
    nums: ZeroVec<'data, u32>,
    #[serde(borrow)]
    chars: ZeroVec<'data, char>,
    #[serde(borrow)]
    strs: VarZeroVec<'data, str>,
}

let data = DataStruct {
    nums: ZeroVec::from_slice_or_alloc(&[211, 281, 421, 461]),
    chars: ZeroVec::alloc_from_slice(&['ö', '冇', 'म']),
    strs: VarZeroVec::from(&["hello", "world"]),
};
let bincode_bytes =
    bincode::serialize(&data).expect("Serialization should be successful");
assert_eq!(bincode_bytes.len(), 67);

let deserialized: DataStruct = bincode::deserialize(&bincode_bytes)
    .expect("Deserialization should be successful");
assert_eq!(deserialized.nums.first(), Some(211));
assert_eq!(deserialized.chars.get(1), Some('冇'));
assert_eq!(deserialized.strs.get(1), Some("world"));
// The deserialization will not have allocated anything
assert!(!deserialized.nums.is_owned());

Use custom types inside of ZeroVec:

use zerovec::{ZeroVec, VarZeroVec, ZeroMap};
use std::borrow::Cow;
use zerovec::ule::encode_varule_to_box;

// custom fixed-size ULE type for ZeroVec
#[zerovec::make_ule(DateULE)]
#[derive(Copy, Clone, PartialEq, Eq, Ord, PartialOrd, serde::Serialize, serde::Deserialize)]
struct Date {
    y: u64,
    m: u8,
    d: u8
}

// custom variable sized VarULE type for VarZeroVec
#[zerovec::make_varule(PersonULE)]
#[zerovec::derive(Serialize, Deserialize)] // add Serde impls to PersonULE
#[derive(Clone, PartialEq, Eq, Ord, PartialOrd, serde::Serialize, serde::Deserialize)]
struct Person<'a> {
    birthday: Date,
    favorite_character: char,
    #[serde(borrow)]
    name: Cow<'a, str>,
}

#[derive(serde::Serialize, serde::Deserialize)]
struct Data<'a> {
    #[serde(borrow)]
    important_dates: ZeroVec<'a, Date>,
    // note: VarZeroVec always must reference the ULE type directly
    #[serde(borrow)]
    important_people: VarZeroVec<'a, PersonULE>,
    #[serde(borrow)]
    birthdays_to_people: ZeroMap<'a, Date, PersonULE>
}


let person1 = Person {
    birthday: Date { y: 1990, m: 9, d: 7},
    favorite_character: 'π',
    name: Cow::from("Kate")
};
let person2 = Person {
    birthday: Date { y: 1960, m: 5, d: 25},
    favorite_character: '冇',
    name: Cow::from("Jesse")
};

let important_dates = ZeroVec::alloc_from_slice(&[Date { y: 1943, m: 3, d: 20}, Date { y: 1976, m: 8, d: 2}, Date { y: 1998, m: 2, d: 15}]);
let important_people = VarZeroVec::from(&[&person1, &person2]);
let mut birthdays_to_people: ZeroMap<Date, PersonULE> = ZeroMap::new();
// `.insert_var_v()` is slightly more convenient over `.insert()` for custom ULE types
birthdays_to_people.insert_var_v(&person1.birthday, &person1);
birthdays_to_people.insert_var_v(&person2.birthday, &person2);

let data = Data { important_dates, important_people, birthdays_to_people };

let bincode_bytes = bincode::serialize(&data)
    .expect("Serialization should be successful");
assert_eq!(bincode_bytes.len(), 168);

let deserialized: Data = bincode::deserialize(&bincode_bytes)
    .expect("Deserialization should be successful");

assert_eq!(deserialized.important_dates.get(0).unwrap().y, 1943);
assert_eq!(&deserialized.important_people.get(1).unwrap().name, "Jesse");
assert_eq!(&deserialized.important_people.get(0).unwrap().name, "Kate");
assert_eq!(&deserialized.birthdays_to_people.get(&person1.birthday).unwrap().name, "Kate");


§Performance

zerovec is designed for fast deserialization from byte buffers with zero memory allocations while minimizing performance regressions for common vector operations.

Benchmark results on x86_64:

| Operation | Vec<T> | zerovec |
|---|---|---|
| Deserialize vec of 100 u32 | 233.18 ns | 14.120 ns |
| Compute sum of vec of 100 u32 (read every element) | 8.7472 ns | 10.775 ns |
| Binary search vec of 1000 u32 50 times | 442.80 ns | 472.51 ns |
| Deserialize vec of 100 strings | 7.3740 μs* | 1.4495 μs |
| Count chars in vec of 100 strings (read every element) | 747.50 ns | 955.28 ns |
| Binary search vec of 500 strings 10 times | 466.09 ns | 790.33 ns |

* This result is reported for Vec<String>. However, Serde also supports deserializing to the partially-zero-copy Vec<&str>; this gives 1.8420 μs, much faster than Vec<String> but a bit slower than zerovec.

| Operation | HashMap<K,V> | LiteMap<K,V> | ZeroMap<K,V> |
|---|---|---|---|
| Deserialize a small map | 2.72 μs | 1.28 μs | 480 ns |
| Deserialize a large map | 50.5 ms | 18.3 ms | 3.74 ms |
| Look up from a small deserialized map | 49 ns | 42 ns | 54 ns |
| Look up from a large deserialized map | 51 ns | 155 ns | 213 ns |

Small = 16 elements, large = 131,072 elements. Maps contain <String, String>.

The benches used to generate the above table can be found in the benches directory in the project repository. zeromap benches are named by convention, e.g. zeromap/deserialize/small, zeromap/lookup/large. The type is appended for baseline comparisons, e.g. zeromap/lookup/small/hashmap.
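ZeroMap is described in this documentation as built on sorted, binary-searchable vectors. The sketch below shows that general lookup strategy using only the standard library. It is not ZeroMap's code: SortedMap and its methods are hypothetical names, and parallel slices of &str are an assumption made to keep the example short. A binary search takes a number of steps that grows with the logarithm of the map size, which is at least consistent with large-map lookups being slower than a HashMap in the table above.

```rust
/// Hypothetical read-only map: keys and values live in two parallel
/// slices, and the keys are sorted in strictly ascending order.
pub struct SortedMap<'a> {
    keys: &'a [&'a str],
    values: &'a [&'a str],
}

impl<'a> SortedMap<'a> {
    /// Returns `None` if the slices differ in length or the keys are
    /// not strictly sorted; otherwise wraps the slices without copying.
    pub fn new(keys: &'a [&'a str], values: &'a [&'a str]) -> Option<Self> {
        let sorted = keys.windows(2).all(|pair| pair[0] < pair[1]);
        if keys.len() == values.len() && sorted {
            Some(Self { keys, values })
        } else {
            None
        }
    }

    /// Look up `key` by binary search over the sorted keys, then read
    /// the value stored at the same position.
    pub fn get(&self, key: &str) -> Option<&'a str> {
        let index = self
            .keys
            .binary_search_by(|probe| (*probe).cmp(key))
            .ok()?;
        Some(self.values[index])
    }
}
```

Because the structure is just two sorted sequences, constructing it from an already serialized buffer needs no hashing or rebalancing, only validation.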

Modules§

  • This module contains additional utility types and traits for working with ZeroMap and ZeroMap2d. See their docs for more details on the general purpose of these types.
  • Traits over unaligned little-endian data (ULE, pronounced “yule”).
  • This module contains additional utility types for working with ZeroVec and VarZeroVec. See their docs for more details on the general purpose of these types.

Macros§

  • Given Self ($aligned), Self::ULE ($unaligned), and a conversion function ($single or Self::from_aligned), implement from_array for arrays of $aligned to $unaligned.
  • Convenience wrapper for ZeroSlice::from_ule_slice. The value will be created at compile-time, meaning that all arguments must also be constant.
  • Creates a borrowed ZeroVec. Convenience wrapper for zeroslice!(...).as_zerovec(). The value will be created at compile-time, meaning that all arguments must also be constant.

Structs§

  • A zero-copy “slice”, that works for unsized types, i.e. the zero-copy version of [T] where T is not Sized.
  • A zero-copy map data structure, built on sorted binary-searchable ZeroVec and VarZeroVec.
  • A zero-copy, two-dimensional map data structure.
  • A zero-copy “slice”, i.e. the zero-copy version of [T]. This behaves similarly to ZeroVec<T>, however ZeroVec<T> is allowed to contain owned data and as such is ideal for deserialization since most human readable serialization formats cannot unconditionally deserialize zero-copy.
  • A zero-copy, byte-aligned vector for fixed-width types.

Enums§

  • A zero-copy, byte-aligned vector for variable-width types.
  • A generic error type to be used for decoding slices of ULE types.

Attribute Macros§