diesel/connection/statement_cache.rs

//! Helper types for prepared statement caching
//!
//! A primer on prepared statement caching in Diesel
//! ------------------------------------------------
//!
//! Diesel uses prepared statements for virtually all queries. This is most
//! visible in our lack of any sort of "quoting" API. Values must always be
//! transmitted as bind parameters; we do not support direct interpolation. The
//! only method in the public API that doesn't require the use of prepared
//! statements is [`SimpleConnection::batch_execute`](super::SimpleConnection::batch_execute).
//!
//! In order to avoid the cost of re-parsing and planning subsequent queries,
//! Diesel caches the prepared statement whenever possible. Queries will fall
//! into one of three buckets:
//!
//! - Unsafe to cache
//! - Cached by SQL
//! - Cached by type
//!
//! A query is considered unsafe to cache if it represents a potentially
//! unbounded number of queries. This is communicated to the connection through
//! [`QueryFragment::is_safe_to_cache_prepared`]. While this is done as a full AST
//! pass, after monomorphisation and inlining this will usually be optimized to
//! a constant. Only boxed queries will need to do actual work to answer this
//! question.
//!
//! The majority of AST nodes are safe to cache if their components are safe to
//! cache. There are at least 4 cases where a query is unsafe to cache:
//!
//! - queries containing `IN` with bind parameters
//!     - This requires 1 bind parameter per value, and is therefore unbounded
//!     - `IN` with subselects is cached (assuming the subselect is safe to
//!        cache)
//!     - `IN` statements for PostgreSQL are cached as they use `= ANY($1)` instead,
//!        which does not cause an unbounded number of binds
//! - `INSERT` statements with a variable number of rows
//!     - The SQL varies based on the number of rows being inserted.
//! - `UPDATE` statements
//!     - Technically it's bounded on "number of optional values being passed to
//!       `SET` factorial" but that's still quite high, and not worth caching
//!       for the same reason as single row inserts
//! - `SqlLiteral` nodes
//!     - We have no way of knowing whether the SQL was generated dynamically or
//!       not, so we must assume that it's unbounded
//!
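//! As a rough illustration, the three buckets map to query shapes like the
//! following sketch (the `users` table and the bind values are hypothetical):
//!
//! ```rust,ignore
//! // Hypothetical `users` table as generated by `diesel::table!`.
//!
//! // Unsafe to cache (on backends without `= ANY($1)`): `eq_any` on an
//! // in-memory collection produces one bind parameter per element, so the
//! // number of distinct statements is unbounded.
//! let ids = vec![1, 2, 3];
//! let unsafe_to_cache = users::table.filter(users::id.eq_any(ids));
//!
//! // Cached by SQL: boxing erases the static query type, so the generated
//! // SQL string plus the bind types serve as the cache key.
//! let cached_by_sql = users::table.filter(users::id.eq(1)).into_boxed();
//!
//! // Cached by type: the full query type is known at compile time, so its
//! // type id serves as the cache key.
//! let cached_by_type = users::table.filter(users::id.eq(1));
//! ```
//!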
//! For queries which are unsafe to cache, the statement cache will never insert
//! them. They will be prepared and immediately released after use (or in the
//! case of PG they will use the unnamed prepared statement).
//!
//! For statements which are able to be cached, we then have to determine what
//! to use as the cache key. The standard method that virtually all ORMs or
//! database access layers use in the wild is to store the statements in a
//! hash map, using the SQL as the key.
//!
//! However, the majority of queries using Diesel that are safe to cache as
//! prepared statements will be uniquely identified by their type. For these
//! queries, we can bypass the query builder entirely. Since our AST is
//! generally optimized away by the compiler, for these queries the cost of
//! fetching a prepared statement from the cache is the cost of [`HashMap<u32,
//! _>::get`], where the key we're fetching by is a compile time constant. For
//! these types, the AST pass to gather the bind parameters will also be
//! optimized to accessing each parameter individually.
//!
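//! Concretely, the cache key takes one of two forms, mirroring the
//! [`StatementCacheKey`] enum defined in this module:
//!
//! ```rust,ignore
//! // cached by type: the key is just the query's type id
//! StatementCacheKey::Type(type_id)
//! // cached by SQL: the key is the generated SQL plus the bind types
//! StatementCacheKey::Sql { sql, bind_types }
//! ```
//!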
//! Determining if a query can be cached by type is the responsibility of the
//! [`QueryId`] trait. This trait is quite similar to `Any`, but with a few
//! differences:
//!
//! - No `'static` bound
//!     - Something being a reference never changes the SQL that is generated,
//!       so `&T` has the same query id as `T`.
//! - `Option<TypeId>` instead of `TypeId`
//!     - We need to be able to constrain on this trait being implemented, but
//!       not all types will actually have a static query id. Hopefully once
//!       specialization is stable we can remove the `QueryId` bound and
//!       specialize on it instead (or provide a blanket impl for all `T`)
//! - Implementors give a broader type than `Self`
//!     - This really only affects bind parameters. There are 6 different Rust
//!       types which can be used for a parameter of type `timestamp`. The same
//!       statement can be used regardless of the Rust type, so [`Bound<ST, T>`](crate::expression::bound::Bound)
//!       defines its [`QueryId`] as [`Bound<ST, ()>`](crate::expression::bound::Bound).
//!
//! Whether a type returns `Some(id)` or `None` for its query ID is based on
//! whether the SQL it generates can change without the type changing. At the
//! moment, the only type which is safe to cache as a prepared statement but
//! does not have a static query ID is something which has been boxed.
//!
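//! For reference, the trait has roughly the following shape (simplified here;
//! see [`QueryId`] for the authoritative definition):
//!
//! ```rust,ignore
//! pub trait QueryId {
//!     /// A type that uniquely represents the SQL generated by `Self`,
//!     /// e.g. `Bound<ST, ()>` for any `Bound<ST, T>`.
//!     type QueryId: Any;
//!
//!     /// `false` for boxed queries and other types whose SQL can change
//!     /// without the Rust type changing.
//!     const HAS_STATIC_QUERY_ID: bool;
//!
//!     /// Returns `Some(TypeId)` only if the query is uniquely identified
//!     /// by its type.
//!     fn query_id() -> Option<TypeId> {
//!         if Self::HAS_STATIC_QUERY_ID {
//!             Some(TypeId::of::<Self::QueryId>())
//!         } else {
//!             None
//!         }
//!     }
//! }
//! ```
//!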
//! One potential optimization that we don't perform is storing the queries
//! which are cached by type ID in a separate map. Since a type ID is a u64,
//! this would allow us to use a specialized map which knows that there will
//! never be hashing collisions (also known as a perfect hashing function),
//! which would mean lookups are always constant time. However, this would save
//! nanoseconds on an operation that will take microseconds or even
//! milliseconds.

use std::any::TypeId;
use std::borrow::Cow;
use std::collections::HashMap;
use std::hash::Hash;
use std::ops::{Deref, DerefMut};

use crate::backend::Backend;
use crate::connection::InstrumentationEvent;
use crate::query_builder::*;
use crate::result::QueryResult;

use super::Instrumentation;

/// A prepared statement cache
#[allow(missing_debug_implementations, unreachable_pub)]
#[cfg_attr(
    docsrs,
    doc(cfg(feature = "i-implement-a-third-party-backend-and-opt-into-breaking-changes"))
)]
pub struct StatementCache<DB: Backend, Statement> {
    pub(crate) cache: HashMap<StatementCacheKey<DB>, Statement>,
}

/// A helper type that indicates whether a certain query
/// is cached inside the prepared statement cache or not
///
/// This information can be used by the connection implementation
/// to signal this fact to the database while actually
/// preparing the statement
#[derive(Debug, Clone, Copy)]
#[cfg_attr(
    docsrs,
    doc(cfg(feature = "i-implement-a-third-party-backend-and-opt-into-breaking-changes"))
)]
#[allow(unreachable_pub)]
pub enum PrepareForCache {
    /// The statement will be cached
    Yes,
    /// The statement won't be cached
    No,
}

#[allow(
    clippy::len_without_is_empty,
    clippy::new_without_default,
    unreachable_pub
)]
impl<DB, Statement> StatementCache<DB, Statement>
where
    DB: Backend,
    DB::TypeMetadata: Clone,
    DB::QueryBuilder: Default,
    StatementCacheKey<DB>: Hash + Eq,
{
    /// Create a new prepared statement cache
    #[allow(unreachable_pub)]
    pub fn new() -> Self {
        StatementCache {
            cache: HashMap::new(),
        }
    }

    /// Get the current length of the statement cache
    #[allow(unreachable_pub)]
    #[cfg(any(
        feature = "i-implement-a-third-party-backend-and-opt-into-breaking-changes",
        feature = "postgres",
        all(feature = "sqlite", test)
    ))]
    #[cfg_attr(
        docsrs,
        doc(cfg(feature = "i-implement-a-third-party-backend-and-opt-into-breaking-changes"))
    )]
    pub fn len(&self) -> usize {
        self.cache.len()
    }

    /// Prepare a query as a prepared statement
    ///
    /// This function returns a prepared statement corresponding to the
    /// query passed as `source` with the bind values passed as `bind_types`.
    /// If the query is already cached inside this prepared statement cache,
    /// the cached prepared statement will be returned; otherwise `prepare_fn`
    /// will be called to create a new prepared statement for this query source.
    /// The first parameter of the callback contains the query string, and the
    /// second parameter indicates whether the constructed prepared statement
    /// will be cached or not.
    /// See the [module](self) documentation for details
    /// about which statements are cached and which are not cached.
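    ///
    /// The following is a rough, hypothetical sketch of how a connection
    /// implementation might call this function; the backend, statement type,
    /// and `prepare` helper shown here are placeholders, not real Diesel API:
    ///
    /// ```rust,ignore
    /// let statement = self.statement_cache.cached_statement(
    ///     &source,
    ///     &MyBackend,
    ///     &bind_types,
    ///     // Called only on a cache miss or for uncacheable queries;
    ///     // `PrepareForCache` tells the backend whether to keep the statement.
    ///     |sql, is_for_cache| prepare(sql, is_for_cache),
    ///     &mut instrumentation,
    /// )?;
    /// // `MaybeCached` derefs to the prepared statement either way.
    /// ```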
    #[allow(unreachable_pub)]
    pub fn cached_statement<T, F>(
        &mut self,
        source: &T,
        backend: &DB,
        bind_types: &[DB::TypeMetadata],
        mut prepare_fn: F,
        instrumentation: &mut dyn Instrumentation,
    ) -> QueryResult<MaybeCached<'_, Statement>>
    where
        T: QueryFragment<DB> + QueryId,
        F: FnMut(&str, PrepareForCache) -> QueryResult<Statement>,
    {
        self.cached_statement_non_generic(
            T::query_id(),
            source,
            backend,
            bind_types,
            &mut prepare_fn,
            instrumentation,
        )
    }

    /// Reduce the amount of monomorphized code by factoring this via dynamic dispatch
    fn cached_statement_non_generic(
        &mut self,
        maybe_type_id: Option<TypeId>,
        source: &dyn QueryFragmentForCachedStatement<DB>,
        backend: &DB,
        bind_types: &[DB::TypeMetadata],
        prepare_fn: &mut dyn FnMut(&str, PrepareForCache) -> QueryResult<Statement>,
        instrumentation: &mut dyn Instrumentation,
    ) -> QueryResult<MaybeCached<'_, Statement>> {
        use std::collections::hash_map::Entry::{Occupied, Vacant};

        let cache_key = StatementCacheKey::for_source(maybe_type_id, source, bind_types, backend)?;

        if !source.is_safe_to_cache_prepared(backend)? {
            let sql = cache_key.sql(source, backend)?;
            return prepare_fn(&sql, PrepareForCache::No).map(MaybeCached::CannotCache);
        }

        let cached_result = match self.cache.entry(cache_key) {
            Occupied(entry) => entry.into_mut(),
            Vacant(entry) => {
                let statement = {
                    let sql = entry.key().sql(source, backend)?;
                    instrumentation
                        .on_connection_event(InstrumentationEvent::CacheQuery { sql: &sql });
                    prepare_fn(&sql, PrepareForCache::Yes)
                };

                entry.insert(statement?)
            }
        };

        Ok(MaybeCached::Cached(cached_result))
    }
}

/// Implemented for all `QueryFragment`s, dedicated to dynamic dispatch within the context of
/// `statement_cache`
///
/// We want the generated code to be as small as possible, so for each query passed to
/// [`StatementCache::cached_statement`] the generated assembly will just call a non-generic
/// version with dynamic dispatch pointing to the VTABLE of this minimal trait
///
/// This preserves the opportunity for the compiler to entirely optimize the `construct_sql`
/// function as a function that simply returns a constant `String`.
#[allow(unreachable_pub)]
#[cfg_attr(
    docsrs,
    doc(cfg(feature = "i-implement-a-third-party-backend-and-opt-into-breaking-changes"))
)]
pub trait QueryFragmentForCachedStatement<DB> {
    /// Convert the query fragment into a SQL string for the given backend
    fn construct_sql(&self, backend: &DB) -> QueryResult<String>;
    /// Check whether it's safe to cache the query
    fn is_safe_to_cache_prepared(&self, backend: &DB) -> QueryResult<bool>;
}
impl<T, DB> QueryFragmentForCachedStatement<DB> for T
where
    DB: Backend,
    DB::QueryBuilder: Default,
    T: QueryFragment<DB>,
{
    fn construct_sql(&self, backend: &DB) -> QueryResult<String> {
        let mut query_builder = DB::QueryBuilder::default();
        self.to_sql(&mut query_builder, backend)?;
        Ok(query_builder.finish())
    }

    fn is_safe_to_cache_prepared(&self, backend: &DB) -> QueryResult<bool> {
        <T as QueryFragment<DB>>::is_safe_to_cache_prepared(self, backend)
    }
}

/// Wraps a possibly cached prepared statement
///
/// Essentially a customized version of [`Cow`]
/// that does not depend on [`ToOwned`]
#[allow(missing_debug_implementations, unreachable_pub)]
#[cfg_attr(
    docsrs,
    doc(cfg(feature = "i-implement-a-third-party-backend-and-opt-into-breaking-changes"))
)]
#[non_exhaustive]
pub enum MaybeCached<'a, T: 'a> {
    /// Contains a prepared statement that is not cached
    CannotCache(T),
    /// Contains a reference to a cached prepared statement
    Cached(&'a mut T),
}

impl<T> Deref for MaybeCached<'_, T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        match *self {
            MaybeCached::CannotCache(ref x) => x,
            MaybeCached::Cached(ref x) => x,
        }
    }
}

impl<T> DerefMut for MaybeCached<'_, T> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        match *self {
            MaybeCached::CannotCache(ref mut x) => x,
            MaybeCached::Cached(ref mut x) => x,
        }
    }
}

/// The lookup key used by [`StatementCache`] internally
///
/// This can contain either a type id known at compile time
/// (representing a statically known query) or a query string plus
/// parameter types calculated at runtime (for queries
/// that may change depending on their parameters)
#[allow(missing_debug_implementations, unreachable_pub)]
#[derive(Hash, PartialEq, Eq)]
#[cfg_attr(
    docsrs,
    doc(cfg(feature = "i-implement-a-third-party-backend-and-opt-into-breaking-changes"))
)]
pub enum StatementCacheKey<DB: Backend> {
    /// Represents a query known at compile time
    ///
    /// Calculated via [`QueryId::QueryId`]
    Type(TypeId),
    /// Represents a dynamically constructed query
    ///
    /// This variant is used if [`QueryId::HAS_STATIC_QUERY_ID`]
    /// is `false` and [`AstPass::unsafe_to_cache_prepared`] is not
    /// called for a given query.
    Sql {
        /// contains the sql query string
        sql: String,
        /// contains the types of any bind parameter passed to the query
        bind_types: Vec<DB::TypeMetadata>,
    },
}

impl<DB> StatementCacheKey<DB>
where
    DB: Backend,
    DB::QueryBuilder: Default,
    DB::TypeMetadata: Clone,
{
    /// Create a new statement cache key for the given query source
    // Note: Intentionally monomorphic over source.
    #[allow(unreachable_pub)]
    pub fn for_source(
        maybe_type_id: Option<TypeId>,
        source: &dyn QueryFragmentForCachedStatement<DB>,
        bind_types: &[DB::TypeMetadata],
        backend: &DB,
    ) -> QueryResult<Self> {
        match maybe_type_id {
            Some(id) => Ok(StatementCacheKey::Type(id)),
            None => {
                let sql = source.construct_sql(backend)?;
                Ok(StatementCacheKey::Sql {
                    sql,
                    bind_types: bind_types.into(),
                })
            }
        }
    }

    /// Get the sql for a given query source
    ///
    /// This is an optimization that may skip constructing the query string
    /// twice if it's already part of the current cache key
    // Note: Intentionally monomorphic over source.
    #[allow(unreachable_pub)]
    pub fn sql(
        &self,
        source: &dyn QueryFragmentForCachedStatement<DB>,
        backend: &DB,
    ) -> QueryResult<Cow<'_, str>> {
        match *self {
            StatementCacheKey::Type(_) => source.construct_sql(backend).map(Cow::Owned),
            StatementCacheKey::Sql { ref sql, .. } => Ok(Cow::Borrowed(sql)),
        }
    }
}