1// This file is part of ICU4X. For terms of use, please see the file
2// called LICENSE at the top level of the ICU4X source tree
3// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).
45//! # Internal layout of ZeroTrie
6//!
7//! A ZeroTrie is composed of a series of nodes stored in sequence in a byte slice.
8//!
9//! There are 4 types of nodes:
10//!
11//! 1. ASCII (`0xxxxxxx`): matches a literal ASCII byte.
12//! 2. Span (`101xxxxx`): matches a span of non-ASCII bytes.
13//! 3. Value (`100xxxxx`): associates a value with a string
14//! 4. Branch (`11xxxxxx`): matches one of a set of bytes.
15//!
16//! Span, Value, and Branch nodes contain a varint, which has different semantics for each:
17//!
18//! - Span varint: length of the span
19//! - Value varint: value associated with the string
20//! - Branch varint: number of edges in the branch and width of the offset table
21//!
22//! If reading an ASCII, Span, or Branch node, one or more bytes are consumed from the input
23//! string. If the next byte(s) in the input string do not match the node, we return `None`.
24//! If reading a Value node, if the string is empty, return `Some(value)`; otherwise, we skip
25//! the Value node and continue on to the next node.
26//!
27//! When a node is consumed, a shorter, well-formed ZeroTrie remains.
28//!
29//! ### Basic Example
30//!
31//! Here is an example ZeroTrie without branch nodes:
32//!
33//! ```
34//! use zerotrie::ZeroTriePerfectHash;
35//!
36//! let bytes = [
37//! b'a', // ASCII literal
38//! 0b10001010, // value 10
39//! b'b', // ASCII literal
40//! 0b10100011, // span of 3
41//! 0x81, // first byte in span
42//! 0x91, // second byte in span
43//! 0xA1, // third and final byte in span
44//! 0b10000100, // value 4
45//! ];
46//!
47//! let trie = ZeroTriePerfectHash::from_bytes(&bytes);
48//!
49//! // First value: "a" → 10
50//! assert_eq!(trie.get(b"a"), Some(10));
51//!
52//! // Second value: "ab\x81\x91\xA1" → 4
53//! assert_eq!(trie.get(b"ab\x81\x91\xA1"), Some(4));
54//!
55//! // A few examples of strings that do NOT have values in the trie:
56//! assert_eq!(trie.get(b"ab"), None);
57//! assert_eq!(trie.get(b"b"), None);
58//! assert_eq!(trie.get(b"b\x81\x91\xA1"), None);
59//! ```
60//!
61//! ## Branch Nodes
62//!
63//! There are two types of branch nodes: binary search and perfect hash. `ZeroTrieSimpleAscii`
64//! contains only binary search nodes, whereas `ZeroTriePerfectHash` can contain either.
65//!
66//! The head node of the branch has a varint that encodes two things:
67//!
68//! - Bottom 8 bits: number of edges in the branch (`N`); if N = 0, set N to 256
69//! - Bits 9 and 10: width of the offset table (`W`)
70//!
71//! Note that N is always in the range [1, 256]. There can't be more than 256 edges because
72//! there are only 256 unique u8 values.
73//!
74//! A few examples of the head node of the branch:
75//!
76//! - `0b11000000`: varint bits `0`: N = 0 which means N = 256; W = 0
77//! - `0b11000110`: varint bits `110`: N = 6; W = 0
78//! - `0b11100000 0b00000101`: varint bits `1000101`: N = 69; W = 0
79//! - `0b11100010 0b00000000`: varint bits `101000000`: N = 64; W = 1
80//!
81//! In `ZeroTriePerfectHash`, if N <= 15, the branch is assumed to be a binary search, and if
82//! N > 15, the branch is assumed to be a perfect hash.
83//!
84//! ### Binary Search Branch Nodes
85//!
86//! A binary search branch node is used when:
87//!
88//! 1. The trie is a `ZeroTrieSimpleAscii`, OR
89//! 2. There are 15 or fewer items in the branch.
90//!
91//! The head branch node is followed by N sorted bytes. When evaluating a branch node, one byte
92//! is consumed from the input. If it is one of the N sorted bytes (scanned using binary search),
93//! the index `i` of the byte within the list is used to index into the offset table (described
94//! below). If the byte is not in the list, the string is not in the trie, so return `None`.
95//!
96//! ### Perfect Hash Branch Nodes
97//!
98//! A perfect hash branch node is used when:
99//!
100//! 1. The trie is NOT a `ZeroTrieSimpleAscii`, AND
101//! 2. There are 16 or more items in the branch.
102//!
103//! The head branch node is followed by 1 byte containing parameter `p`, N bytes containing
104//! parameters `q`, and N bytes containing the bytes to match. From these parameters, either an
105//! index within the hash table `i` is resolved and used as input to index into the offset
106//! table (described below), or the value is determined to not be present and `None` is
107//! returned. For more detail on resolving the perfect hash function, see [`crate::byte_phf`].
108//!
109//! ### Offset Tables
110//!
111//! The _offset table_ encodes the range of the remaining buffer containing the trie reachable
112//! from the byte matched in the branch node. Both types of branch nodes include an offset
113//! table followig the key lookup. Given the index `i` from the first step, the range
114//! `[s_i, s_(i+1))` brackets the next step in the trie.
115//!
116//! Offset tables utilize the `W` parameter stored in the branch head node. The special case
117//! when `W == 0`, with `N - 1` bytes, is easiest to understand:
118//!
119//! **Offset table, W = 0:** `[s_1, s_2, ..., s_(N-1)]`
120//!
121//! Note that `s_0` is always 0 and `s_N` is always the length of the remaining slice, so those
122//! values are not explicitly included in the offset table.
123//!
124//! When W > 0, the high and low bits of the offsets are in separate bytes, arranged as follows:
125//!
126//! **Generalized offset table:** `[a_1, a_2, ..., a_(N-1), b_1, b_2, ..., b_(N-1), c_1, ...]`
127//!
128//! where `s_i = (a_i << 8 + b_i) << 8 + c_i ...` (high bits first, low bits last)
129//!
130//! ### Advanced Example
131//!
132//! The following trie encodes the following map. It has multiple varints and branch nodes, which
133//! are all binary search with W = 0. Note that there is a value for the empty string.
134//!
135//! - "" → 0
136//! - "axb" → 100
137//! - "ayc" → 2
138//! - "azd" → 3
139//! - "bxe" → 4
140//! - "bxefg" → 500
141//! - "bxefh" → 6
142//! - "bxei" → 7
143//! - "bxeikl" → 8
144//!
145//! ```
146//! use zerotrie::ZeroTrieSimpleAscii;
147//!
148//! let bytes = [
149//! 0b10000000, // value 0
150//! 0b11000010, // branch of 2
151//! b'a', //
152//! b'b', //
153//! 13, //
154//! 0b11000011, // start of 'a' subtree: branch of 3
155//! b'x', //
156//! b'y', //
157//! b'z', //
158//! 3, //
159//! 5, //
160//! b'b', //
161//! 0b10010000, // value 100 (lead)
162//! 0x54, // value 100 (trail)
163//! b'c', //
164//! 0b10000010, // value 2
165//! b'd', //
166//! 0b10000011, // value 3
167//! b'x', // start of 'b' subtree
168//! b'e', //
169//! 0b10000100, // value 4
170//! 0b11000010, // branch of 2
171//! b'f', //
172//! b'i', //
173//! 7, //
174//! 0b11000010, // branch of 2
175//! b'g', //
176//! b'h', //
177//! 2, //
178//! 0b10010011, // value 500 (lead)
179//! 0x64, // value 500 (trail)
180//! 0b10000110, // value 6
181//! 0b10000111, // value 7
182//! b'k', //
183//! b'l', //
184//! 0b10001000, // value 8
185//! ];
186//!
187//! let trie = ZeroTrieSimpleAscii::from_bytes(&bytes);
188//!
189//! // Assert that the specified items are in the map
190//! assert_eq!(trie.get(b""), Some(0));
191//! assert_eq!(trie.get(b"axb"), Some(100));
192//! assert_eq!(trie.get(b"ayc"), Some(2));
193//! assert_eq!(trie.get(b"azd"), Some(3));
194//! assert_eq!(trie.get(b"bxe"), Some(4));
195//! assert_eq!(trie.get(b"bxefg"), Some(500));
196//! assert_eq!(trie.get(b"bxefh"), Some(6));
197//! assert_eq!(trie.get(b"bxei"), Some(7));
198//! assert_eq!(trie.get(b"bxeikl"), Some(8));
199//!
200//! // Assert that some other items are not in the map
201//! assert_eq!(trie.get(b"a"), None);
202//! assert_eq!(trie.get(b"bx"), None);
203//! assert_eq!(trie.get(b"xba"), None);
204//! ```
205206use crate::byte_phf::PerfectByteHashMap;
207use crate::cursor::AsciiProbeResult;
208use crate::helpers::*;
209use crate::options::*;
210use crate::varint::read_varint_meta2;
211use crate::varint::read_varint_meta3;
212213#[cfg(feature = "alloc")]
214use alloc::string::String;
215216/// Given a slice starting with an offset table, returns the trie for the given index.
217///
218/// Arguments:
219/// - `trie` = a trie pointing at an offset table (after the branch node and search table)
220/// - `i` = the desired index within the offset table
221/// - `n` = the number of items in the offset table
222/// - `w` = the width of the offset table items minus one
223#[inline]
224fn get_branch(mut trie: &[u8], i: usize, n: usize, mut w: usize) -> &[u8] {
225let mut p = 0usize;
226let mut q = 0usize;
227loop {
228let indices;
229 (indices, trie) = trie.debug_split_at(n - 1);
230p = (p << 8)
231 + if i == 0 {
2320
233} else {
234*indices.get(i - 1).debug_unwrap_or(&0) as usize235 };
236q = match indices.get(i) {
237Some(x) => (q << 8) + *xas usize,
238None => trie.len(),
239 };
240if w == 0 {
241break;
242 }
243w -= 1;
244 }
245trie.get(p..q).debug_unwrap_or(&[])
246}
247248/// Version of [`get_branch()`] specialized for the case `w == 0` for performance
249#[inline]
250fn get_branch_w0(mut trie: &[u8], i: usize, n: usize) -> &[u8] {
251let indices;
252 (indices, trie) = trie.debug_split_at(n - 1);
253let p = if i == 0 {
2540
255} else {
256*indices.get(i - 1).debug_unwrap_or(&0) as usize257 };
258let q = match indices.get(i) {
259Some(x) => *xas usize,
260None => trie.len(),
261 };
262trie.get(p..q).debug_unwrap_or(&[])
263}
264265/// The node type. See the module-level docs for more explanation of the four node types.
266enum NodeType {
267/// An ASCII node. Contains a single literal ASCII byte and no varint.
268Ascii,
269/// A span node. Contains a varint indicating how big the span is.
270Span,
271/// A value node. Contains a varint representing the value.
272Value,
273/// A branch node. Contains a varint of the number of output nodes, plus W in the high bits.
274Branch,
275}
276277impl core::fmt::Debugfor NodeType {
278fn fmt(&self, f: &mut core::fmt::Formatter) -> core::fmt::Result {
279use NodeType::*;
280f.write_str(match *self {
281Ascii => "a",
282Span => "s",
283Value => "v",
284Branch => "m",
285 })
286 }
287}
288289#[inline]
290fn byte_type(b: u8) -> NodeType {
291match b & 0b11100000 {
2920b10000000 => NodeType::Value,
2930b10100000 => NodeType::Span,
2940b11000000 => NodeType::Branch,
2950b11100000 => NodeType::Branch,
296_ => NodeType::Ascii,
297 }
298}
299300#[inline]
301pub(crate) fn get_parameterized<T: ZeroTrieWithOptions + ?Sized>(
302mut trie: &[u8],
303mut ascii: &[u8],
304) -> Option<usize> {
305loop {
306let (b, x, i, search);
307 (b, trie) = trie.split_first()?;
308let byte_type = byte_type(*b);
309 (x, trie) = match byte_type {
310 NodeType::Ascii => (0, trie),
311 NodeType::Span => {
312if #[allow(non_exhaustive_omitted_patterns)] match T::OPTIONS.ascii_mode {
AsciiMode::BinarySpans => true,
_ => false,
}matches!(T::OPTIONS.ascii_mode, AsciiMode::BinarySpans) {
313read_varint_meta3(*b, trie)
314 } else {
315if true {
if !false {
{
::core::panicking::panic_fmt(format_args!("Span node found in ASCII trie!"));
}
};
};debug_assert!(false, "Span node found in ASCII trie!");
316return None;
317 }
318 }
319 NodeType::Value => read_varint_meta3(*b, trie),
320 NodeType::Branch => read_varint_meta2(*b, trie),
321 };
322if let Some((c, temp)) = ascii.split_first() {
323if #[allow(non_exhaustive_omitted_patterns)] match byte_type {
NodeType::Ascii => true,
_ => false,
}matches!(byte_type, NodeType::Ascii) {
324let is_match = if #[allow(non_exhaustive_omitted_patterns)] match T::OPTIONS.case_sensitivity {
CaseSensitivity::IgnoreCase => true,
_ => false,
}matches!(T::OPTIONS.case_sensitivity, CaseSensitivity::IgnoreCase)325 {
326b.eq_ignore_ascii_case(c)
327 } else {
328b == c329 };
330if is_match {
331// Matched a byte
332ascii = temp;
333continue;
334 } else {
335// Byte that doesn't match
336return None;
337 }
338 }
339if #[allow(non_exhaustive_omitted_patterns)] match byte_type {
NodeType::Value => true,
_ => false,
}matches!(byte_type, NodeType::Value) {
340// Value node, but not at end of string
341continue;
342 }
343if #[allow(non_exhaustive_omitted_patterns)] match T::OPTIONS.ascii_mode {
AsciiMode::BinarySpans => true,
_ => false,
}matches!(T::OPTIONS.ascii_mode, AsciiMode::BinarySpans)344 && #[allow(non_exhaustive_omitted_patterns)] match byte_type {
NodeType::Span => true,
_ => false,
}matches!(byte_type, NodeType::Span)345 {
346let (trie_span, ascii_span);
347 (trie_span, trie) = trie.debug_split_at(x);
348 (ascii_span, ascii) = ascii.split_at_checked(x)?;
349if trie_span == ascii_span {
350// Matched a byte span
351continue;
352 } else {
353// Byte span that doesn't match
354return None;
355 }
356 }
357// Branch node
358let (x, w) = if x >= 256 { (x & 0xff, x >> 8) } else { (x, 0) };
359let w = if #[allow(non_exhaustive_omitted_patterns)] match T::OPTIONS.capacity_mode {
CapacityMode::Extended => true,
_ => false,
}matches!(T::OPTIONS.capacity_mode, CapacityMode::Extended) {
360w361 } else {
362// See the table below regarding this assertion
363if true {
if !(w <= 3) {
{
::core::panicking::panic_fmt(format_args!("get: w > 3 but we assume w <= 3"));
}
};
};debug_assert!(w <= 3, "get: w > 3 but we assume w <= 3");
364w & 0x3
365};
366let x = if x == 0 { 256 } else { x };
367if #[allow(non_exhaustive_omitted_patterns)] match T::OPTIONS.phf_mode {
PhfMode::BinaryOnly => true,
_ => false,
}matches!(T::OPTIONS.phf_mode, PhfMode::BinaryOnly) || x < 16 {
368// binary search
369(search, trie) = trie.debug_split_at(x);
370let bsearch_result =
371if #[allow(non_exhaustive_omitted_patterns)] match T::OPTIONS.case_sensitivity {
CaseSensitivity::IgnoreCase => true,
_ => false,
}matches!(T::OPTIONS.case_sensitivity, CaseSensitivity::IgnoreCase) {
372search.binary_search_by_key(&c.to_ascii_lowercase(), |x| {
373x.to_ascii_lowercase()
374 })
375 } else {
376search.binary_search(c)
377 };
378i = bsearch_result.ok()?;
379 } else {
380// phf
381(search, trie) = trie.debug_split_at(x * 2 + 1);
382i = PerfectByteHashMap::from_store(search).get(*c)?;
383 }
384trie = if w == 0 {
385get_branch_w0(trie, i, x)
386 } else {
387get_branch(trie, i, x, w)
388 };
389ascii = temp;
390continue;
391 } else {
392if #[allow(non_exhaustive_omitted_patterns)] match byte_type {
NodeType::Value => true,
_ => false,
}matches!(byte_type, NodeType::Value) {
393// Value node at end of string
394return Some(x);
395 }
396return None;
397 }
398 }
399}
400401// DISCUSS: This function is 7% faster *on aarch64* if we assert a max on w.
402//
403// | Bench | No Assert, x86_64 | No Assert, aarch64 | Assertion, x86_64 | Assertion, aarch64 |
404// |---------------|-------------------|--------------------|-------------------|--------------------|
405// | basic | ~187.51 ns | ~97.586 ns | ~199.11 ns | ~99.236 ns |
406// | subtags_10pct | ~9.5557 µs | ~4.8696 µs | ~9.5779 µs | ~4.5649 µs |
407// | subtags_full | ~137.75 µs | ~76.016 µs | ~142.02 µs | ~70.254 µs |
408409/// Steps one node into the trie assuming all branch nodes are binary search and that
410/// there are no span nodes.
411///
412/// The input-output argument `trie` starts at the original trie and ends pointing to
413/// the sub-trie reachable by `c`.
414#[inline]
415pub(crate) fn step_parameterized<T: ZeroTrieWithOptions + ?Sized>(
416 trie: &mut &[u8],
417 c: u8,
418) -> Option<u8> {
419// Currently, the only option `step_parameterized` supports is `CaseSensitivity::IgnoreCase`.
420 // `AsciiMode::BinarySpans` is tricky because the state can no longer be simply a trie.
421 // If a span node is encountered, `None` is returned later in this function.
422if true {
if !#[allow(non_exhaustive_omitted_patterns)] match T::OPTIONS.ascii_mode
{
AsciiMode::AsciiOnly => true,
_ => false,
} {
{
::core::panicking::panic_fmt(format_args!("Spans not yet implemented in step function"));
}
};
};debug_assert!(
423matches!(T::OPTIONS.ascii_mode, AsciiMode::AsciiOnly),
424"Spans not yet implemented in step function"
425);
426// PHF can be easily implemented but the code is not yet reachable
427if true {
if !#[allow(non_exhaustive_omitted_patterns)] match T::OPTIONS.phf_mode {
PhfMode::BinaryOnly => true,
_ => false,
} {
{
::core::panicking::panic_fmt(format_args!("PHF not yet implemented in step function"));
}
};
};debug_assert!(
428matches!(T::OPTIONS.phf_mode, PhfMode::BinaryOnly),
429"PHF not yet implemented in step function"
430);
431// Extended Capacity can be easily implemented but the code is not yet reachable
432if true {
if !#[allow(non_exhaustive_omitted_patterns)] match T::OPTIONS.capacity_mode
{
CapacityMode::Normal => true,
_ => false,
} {
{
::core::panicking::panic_fmt(format_args!("Extended capacity not yet implemented in step function"));
}
};
};debug_assert!(
433matches!(T::OPTIONS.capacity_mode, CapacityMode::Normal),
434"Extended capacity not yet implemented in step function"
435);
436let (mut b, x, search);
437loop {
438 (b, *trie) = match trie.split_first() {
439Some(v) => v,
440None => {
441// Empty trie or only a value node
442return None;
443 }
444 };
445match byte_type(*b) {
446 NodeType::Ascii => {
447let is_match = if #[allow(non_exhaustive_omitted_patterns)] match T::OPTIONS.case_sensitivity {
CaseSensitivity::IgnoreCase => true,
_ => false,
}matches!(T::OPTIONS.case_sensitivity, CaseSensitivity::IgnoreCase)448 {
449b.eq_ignore_ascii_case(&c)
450 } else {
451*b == c452 };
453if is_match {
454// Matched a byte
455return Some(*b);
456 } else {
457// Byte that doesn't match
458*trie = &[];
459return None;
460 }
461 }
462 NodeType::Branch => {
463// Proceed to the branch node logic below
464(x, *trie) = read_varint_meta2(*b, trie);
465break;
466 }
467 NodeType::Span => {
468// Question: Should we put the trie back into a valid state?
469 // Currently this code is unreachable so let's not worry about it.
470if true {
if !false {
{
::core::panicking::panic_fmt(format_args!("Span node found in ASCII trie!"));
}
};
};debug_assert!(false, "Span node found in ASCII trie!");
471return None;
472 }
473 NodeType::Value => {
474// Skip the value node and go to the next node
475(_, *trie) = read_varint_meta3(*b, trie);
476continue;
477 }
478 };
479 }
480// Branch node
481let (x, w) = if x >= 256 { (x & 0xff, x >> 8) } else { (x, 0) };
482// See comment above regarding this assertion
483if true {
if !(w <= 3) {
{
::core::panicking::panic_fmt(format_args!("get: w > 3 but we assume w <= 3"));
}
};
};debug_assert!(w <= 3, "get: w > 3 but we assume w <= 3");
484let w = w & 0x3;
485let x = if x == 0 { 256 } else { x };
486// Always use binary search
487(search, *trie) = trie.debug_split_at(x);
488let bsearch_result = if #[allow(non_exhaustive_omitted_patterns)] match T::OPTIONS.case_sensitivity {
CaseSensitivity::IgnoreCase => true,
_ => false,
}matches!(T::OPTIONS.case_sensitivity, CaseSensitivity::IgnoreCase) {
489search.binary_search_by_key(&c.to_ascii_lowercase(), |x| x.to_ascii_lowercase())
490 } else {
491search.binary_search(&c)
492 };
493match bsearch_result {
494Ok(i) => {
495// Matched a byte
496*trie = if w == 0 {
497get_branch_w0(trie, i, x)
498 } else {
499get_branch(trie, i, x, w)
500 };
501#[allow(clippy::indexing_slicing)] // i is from a binary search
502Some(search[i])
503 }
504Err(_) => {
505// Byte that doesn't match
506*trie = &[];
507None508 }
509 }
510}
511512/// Steps one node into the trie, assuming all branch nodes are binary search and that
513/// there are no span nodes, using an index.
514///
515/// The input-output argument `trie` starts at the original trie and ends pointing to
516/// the sub-trie indexed by `index`.
517#[inline]
518pub(crate) fn probe_parameterized<T: ZeroTrieWithOptions + ?Sized>(
519 trie: &mut &[u8],
520 index: usize,
521) -> Option<AsciiProbeResult> {
522// Currently, the only option `step_parameterized` supports is `CaseSensitivity::IgnoreCase`.
523 // `AsciiMode::BinarySpans` is tricky because the state can no longer be simply a trie.
524 // If a span node is encountered, `None` is returned later in this function.
525if true {
if !#[allow(non_exhaustive_omitted_patterns)] match T::OPTIONS.ascii_mode
{
AsciiMode::AsciiOnly => true,
_ => false,
} {
{
::core::panicking::panic_fmt(format_args!("Spans not yet implemented in step function"));
}
};
};debug_assert!(
526matches!(T::OPTIONS.ascii_mode, AsciiMode::AsciiOnly),
527"Spans not yet implemented in step function"
528);
529// PHF can be easily implemented but the code is not yet reachable
530if true {
if !#[allow(non_exhaustive_omitted_patterns)] match T::OPTIONS.phf_mode {
PhfMode::BinaryOnly => true,
_ => false,
} {
{
::core::panicking::panic_fmt(format_args!("PHF not yet implemented in step function"));
}
};
};debug_assert!(
531matches!(T::OPTIONS.phf_mode, PhfMode::BinaryOnly),
532"PHF not yet implemented in step function"
533);
534// Extended Capacity can be easily implemented but the code is not yet reachable
535if true {
if !#[allow(non_exhaustive_omitted_patterns)] match T::OPTIONS.capacity_mode
{
CapacityMode::Normal => true,
_ => false,
} {
{
::core::panicking::panic_fmt(format_args!("Extended capacity not yet implemented in step function"));
}
};
};debug_assert!(
536matches!(T::OPTIONS.capacity_mode, CapacityMode::Normal),
537"Extended capacity not yet implemented in step function"
538);
539let (mut b, x, search);
540loop {
541 (b, *trie) = match trie.split_first() {
542Some(v) => v,
543None => {
544// Empty trie or only a value node
545return None;
546 }
547 };
548match byte_type(*b) {
549 NodeType::Ascii => {
550if index > 0 {
551*trie = &[];
552return None;
553 }
554return Some(AsciiProbeResult {
555 byte: *b,
556 total_siblings: 1,
557 });
558 }
559 NodeType::Branch => {
560// Proceed to the branch node logic below
561(x, *trie) = read_varint_meta2(*b, trie);
562break;
563 }
564 NodeType::Span => {
565// Question: Should we put the trie back into a valid state?
566 // Currently this code is unreachable so let's not worry about it.
567if true {
if !false {
{
::core::panicking::panic_fmt(format_args!("Span node found in ASCII trie!"));
}
};
};debug_assert!(false, "Span node found in ASCII trie!");
568return None;
569 }
570 NodeType::Value => {
571// Skip the value node and go to the next node
572(_, *trie) = read_varint_meta3(*b, trie);
573continue;
574 }
575 };
576 }
577// Branch node
578let (x, w) = if x >= 256 { (x & 0xff, x >> 8) } else { (x, 0) };
579if true {
if !u8::try_from(x).is_ok() {
::core::panicking::panic("assertion failed: u8::try_from(x).is_ok()")
};
};debug_assert!(u8::try_from(x).is_ok());
580let total_siblings = xas u8;
581// See comment above regarding this assertion
582if true {
if !(w <= 3) {
{
::core::panicking::panic_fmt(format_args!("get: w > 3 but we assume w <= 3"));
}
};
};debug_assert!(w <= 3, "get: w > 3 but we assume w <= 3");
583let w = w & 0x3;
584let x = if x == 0 { 256 } else { x };
585if index >= x {
586*trie = &[];
587return None;
588 }
589 (search, *trie) = trie.debug_split_at(x);
590*trie = if w == 0 {
591get_branch_w0(trie, index, x)
592 } else {
593get_branch(trie, index, x, w)
594 };
595Some(AsciiProbeResult {
596#[allow(clippy::indexing_slicing)] // index < x, the length of search
597byte: search[index],
598total_siblings,
599 })
600}
601602/// Steps one node into the trie if the head node is a value node, returning the value.
603/// If the head node is not a value node, no change is made.
604///
605/// The input-output argument `trie` starts at the original trie and ends pointing to
606/// the sub-trie with the value node removed.
607pub(crate) fn take_value(trie: &mut &[u8]) -> Option<usize> {
608let (b, new_trie) = trie.split_first()?;
609match byte_type(*b) {
610 NodeType::Ascii | NodeType::Span | NodeType::Branch => None,
611 NodeType::Value => {
612let x;
613 (x, *trie) = read_varint_meta3(*b, new_trie);
614Some(x)
615 }
616 }
617}
618619#[cfg(feature = "alloc")]
620use alloc::vec::Vec;
621622/// Iterator type for walking the byte sequences contained in a ZeroTrie.
623///
624/// ✨ *Enabled with the `alloc` Cargo feature.*
625#[cfg(feature = "alloc")]
626#[derive(Debug)]
627pub struct ZeroTrieIterator<'a> {
628/// Whether the PHF is enabled on this trie.
629use_phf: bool,
630/// Intermediate state during iteration:
631 /// 1. A trie (usually a slice of the original, bigger trie)
632 /// 2. The string that leads to the trie
633 /// 3. If the trie's lead node is a branch node, the current index being evaluated
634state: Vec<(&'a [u8], Vec<u8>, usize)>,
635}
636637#[cfg(feature = "alloc")]
638impl<'a> ZeroTrieIterator<'a> {
639pub(crate) fn new<S: AsRef<[u8]> + ?Sized>(store: &'a S, use_phf: bool) -> Self {
640 ZeroTrieIterator {
641 use_phf,
642 state: alloc::vec![(store.as_ref(), alloc::vec![], 0)],
643 }
644 }
645}
646647#[cfg(feature = "alloc")]
648impl Iterator for ZeroTrieIterator<'_> {
649type Item = (Vec<u8>, usize);
650fn next(&mut self) -> Option<Self::Item> {
651let (mut trie, mut string, mut branch_idx);
652 (trie, string, branch_idx) = self.state.pop()?;
653loop {
654let (b, x, span, search);
655let return_trie = trie;
656 (b, trie) = match trie.split_first() {
657Some(tpl) => tpl,
658None => {
659// At end of current branch; step back to the branch node.
660 // If there are no more branches, we are finished.
661(trie, string, branch_idx) = self.state.pop()?;
662continue;
663 }
664 };
665let byte_type = byte_type(*b);
666if matches!(byte_type, NodeType::Ascii) {
667 string.push(*b);
668continue;
669 }
670 (x, trie) = match byte_type {
671 NodeType::Ascii => (0, trie),
672 NodeType::Span | NodeType::Value => read_varint_meta3(*b, trie),
673 NodeType::Branch => read_varint_meta2(*b, trie),
674 };
675if matches!(byte_type, NodeType::Span) {
676 (span, trie) = trie.debug_split_at(x);
677 string.extend(span);
678continue;
679 }
680if matches!(byte_type, NodeType::Value) {
681let retval = string.clone();
682// Return to this position on the next step
683self.state.push((trie, string, 0));
684return Some((retval, x));
685 }
686// Match node
687let (x, w) = if x >= 256 { (x & 0xff, x >> 8) } else { (x, 0) };
688let x = if x == 0 { 256 } else { x };
689if branch_idx + 1 < x {
690// Return to this branch node at the next index
691self.state
692 .push((return_trie, string.clone(), branch_idx + 1));
693 }
694let byte = if x < 16 || !self.use_phf {
695// binary search
696(search, trie) = trie.debug_split_at(x);
697debug_unwrap!(search.get(branch_idx), return None)
698 } else {
699// phf
700(search, trie) = trie.debug_split_at(x * 2 + 1);
701debug_unwrap!(search.get(branch_idx + x + 1), return None)
702 };
703 string.push(*byte);
704 trie = if w == 0 {
705 get_branch_w0(trie, branch_idx, x)
706 } else {
707 get_branch(trie, branch_idx, x, w)
708 };
709 branch_idx = 0;
710 }
711 }
712}
713714#[cfg(feature = "alloc")]
715pub(crate) fn get_iter_phf<S: AsRef<[u8]> + ?Sized>(store: &S) -> ZeroTrieIterator<'_> {
716 ZeroTrieIterator::new(store, true)
717}
718719/// # Panics
720/// Panics if the trie contains non-ASCII items.
721#[cfg(feature = "alloc")]
722#[expect(clippy::type_complexity)]
723pub(crate) fn get_iter_ascii_or_panic<S: AsRef<[u8]> + ?Sized>(
724 store: &S,
725) -> core::iter::Map<ZeroTrieIterator<'_>, fn((Vec<u8>, usize)) -> (String, usize)> {
726 ZeroTrieIterator::new(store, false).map(|(k, v)| {
727#[expect(clippy::unwrap_used)] // in signature of function
728let ascii_str = String::from_utf8(k).unwrap();
729 (ascii_str, v)
730 })
731}