diesel/connection/statement_cache/mod.rs

//! Helper types for prepared statement caching
//!
//! A primer on prepared statement caching in Diesel
//! ------------------------------------------------
//!
//! Diesel uses prepared statements for virtually all queries. This is most
//! visible in our lack of any sort of "quoting" API. Values must always be
//! transmitted as bind parameters; we do not support direct interpolation. The
//! only method in the public API that doesn't require the use of prepared
//! statements is [`SimpleConnection::batch_execute`](super::SimpleConnection::batch_execute).
//!
//! In order to avoid the cost of re-parsing and planning subsequent queries,
//! by default Diesel caches the prepared statement whenever possible. This
//! can be customized by calling
//! [`Connection::set_cache_size`](super::Connection::set_cache_size).
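//!
//! For example, an application that never benefits from statement reuse could
//! turn caching off entirely. A usage sketch, assuming an already established
//! connection named `conn` and that `CacheSize` is in scope from
//! `diesel::connection`:
//!
//! ```ignore
//! use diesel::connection::CacheSize;
//!
//! conn.set_cache_size(CacheSize::Disabled);
//! // ... and back to the default, unbounded cache:
//! conn.set_cache_size(CacheSize::Unbounded);
//! ```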
//!
//! Queries will fall into one of three buckets:
//!
//! - Unsafe to cache
//! - Cached by SQL
//! - Cached by type
//!
//! A query is considered unsafe to cache if it represents a potentially
//! unbounded number of queries. This is communicated to the connection through
//! [`QueryFragment::is_safe_to_cache_prepared`]. While this is done as a full AST
//! pass, after monomorphisation and inlining this will usually be optimized to
//! a constant. Only boxed queries will need to do actual work to answer this
//! question.
//!
//! The majority of AST nodes are safe to cache if their components are safe to
//! cache. There are at least 4 cases where a query is unsafe to cache (see the
//! sketch after this list):
//!
//! - Queries containing `IN` with bind parameters
//!     - This requires 1 bind parameter per value, and is therefore unbounded
//!     - `IN` with subselects are cached (assuming the subselect is safe to
//!       cache)
//!     - `IN` statements for PostgreSQL are cached as they use `= ANY($1)` instead,
//!       which does not cause an unbounded number of binds
//! - `INSERT` statements with a variable number of rows
//!     - The SQL varies based on the number of rows being inserted.
//! - `UPDATE` statements
//!     - Technically the number of possible statements is bounded (combinatorial
//!       in the number of optional values passed to `SET`), but that's still far
//!       too high to be worth caching, for the same reason as single row inserts
//! - `SqlLiteral` nodes
//!     - We have no way of knowing whether the SQL was generated dynamically or
//!       not, so we must assume that it's unbounded
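//!
//! The following sketch illustrates the three buckets. It assumes a typical
//! `users` table and a matching `User` struct set up via Diesel's usual
//! derives; the names are illustrative only:
//!
//! ```ignore
//! // Cached by type: the query's Rust type uniquely identifies its SQL
//! users::table.filter(users::id.eq(42)).first::<User>(conn)?;
//!
//! // Unsafe to cache on most backends: `eq_any` expands to `IN (...)` with one
//! // bind per value (on PostgreSQL it becomes `= ANY($1)` and stays cacheable)
//! users::table.filter(users::id.eq_any(ids)).load::<User>(conn)?;
//!
//! // Cached by SQL: boxing erases the static query id, so the generated SQL
//! // string plus the bind types become the cache key
//! users::table.into_boxed().filter(users::id.eq(42)).first::<User>(conn)?;
//! ```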
//!
//! For queries which are unsafe to cache, the statement cache will never store
//! them. They will be prepared and immediately released after use (or in the
//! case of PG they will use the unnamed prepared statement).
//!
//! For statements which are able to be cached, we then have to determine what
//! to use as the cache key. The standard method that virtually all ORMs or
//! database access layers use in the wild is to store the statements in a
//! hash map, using the SQL as the key.
//!
//! However, the majority of queries using Diesel that are safe to cache as
//! prepared statements will be uniquely identified by their type. For these
//! queries, we can bypass the query builder entirely. Since our AST is
//! generally optimized away by the compiler, for these queries the cost of
//! fetching a prepared statement from the cache is the cost of [`HashMap<u64,
//! _>::get`], where the key we're fetching by is a compile time constant. For
//! these types, the AST pass to gather the bind parameters will also be
//! optimized to accessing each parameter individually.
//!
//! Determining if a query can be cached by type is the responsibility of the
//! [`QueryId`] trait. This trait is quite similar to `Any`, but with a few
//! differences:
//!
//! - No `'static` bound
//!     - Something being a reference never changes the SQL that is generated,
//!       so `&T` has the same query id as `T`.
//! - `Option<TypeId>` instead of `TypeId`
//!     - We need to be able to constrain on this trait being implemented, but
//!       not all types will actually have a static query id. Hopefully once
//!       specialization is stable we can remove the `QueryId` bound and
//!       specialize on it instead (or provide a blanket impl for all `T`)
//! - Implementors may give a broader type than `Self`
//!     - This really only affects bind parameters. There are 6 different Rust
//!       types which can be used for a parameter of type `timestamp`. The same
//!       statement can be used regardless of the Rust type, so [`Bound<ST, T>`](crate::expression::bound::Bound)
//!       defines its [`QueryId`] as [`Bound<ST, ()>`](crate::expression::bound::Bound).
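//!
//! The shape of the trait looks roughly like this (a simplified sketch of the
//! real definition in `diesel::query_builder`, shown here for orientation):
//!
//! ```ignore
//! pub trait QueryId {
//!     /// A type that uniquely identifies the generated SQL; often broader than `Self`
//!     type QueryId: Any;
//!     /// `false` for e.g. boxed queries, whose SQL is only known at runtime
//!     const HAS_STATIC_QUERY_ID: bool;
//!
//!     fn query_id() -> Option<TypeId> {
//!         if Self::HAS_STATIC_QUERY_ID {
//!             Some(TypeId::of::<Self::QueryId>())
//!         } else {
//!             None
//!         }
//!     }
//! }
//! ```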
//!
//! Whether a type returns `Some(id)` or `None` for its query ID depends on
//! whether the SQL it generates can change without the type changing. At the
//! moment, the only type which is safe to cache as a prepared statement but
//! does not have a static query ID is something which has been boxed.
//!
//! One potential optimization that we don't perform is storing the queries
//! which are cached by type ID in a separate map. Since a type ID is a u64,
//! this would allow us to use a specialized map which knows that there will
//! never be hashing collisions (also known as a perfect hashing function),
//! which would mean lookups are always constant time. However, this would save
//! nanoseconds on an operation that will take microseconds or even
//! milliseconds.

use std::any::TypeId;
use std::borrow::Cow;
use std::collections::hash_map::Entry;
use std::hash::Hash;
use std::ops::{Deref, DerefMut};

use self::strategy::{
    LookupStatementResult, StatementCacheStrategy, WithCacheStrategy, WithoutCacheStrategy,
};

use crate::backend::Backend;
use crate::connection::InstrumentationEvent;
use crate::query_builder::*;
use crate::result::QueryResult;

use super::{CacheSize, Instrumentation};

/// Various interfaces and implementations to control connection statement caching.
#[allow(unreachable_pub)]
pub mod strategy;

/// A prepared statement cache
#[allow(missing_debug_implementations, unreachable_pub)]
#[cfg_attr(
    docsrs,
    doc(cfg(feature = "i-implement-a-third-party-backend-and-opt-into-breaking-changes"))
)]
pub struct StatementCache<DB: Backend, Statement> {
    cache: Box<dyn StatementCacheStrategy<DB, Statement>>,
    // incremented every time a query is cached;
    // some backends might use it to create unique prepared statement names
    cache_counter: u64,
}

/// A helper type that indicates if a certain query
/// is cached inside of the prepared statement cache or not
///
/// This information can be used by the connection implementation
/// to signal this fact to the database while actually
/// preparing the statement
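///
/// When a statement is going to be cached, a backend might derive a unique
/// server-side statement name from the provided counter. A hypothetical
/// backend could do something along these lines (an illustrative sketch, not
/// the code of any particular Diesel backend):
///
/// ```ignore
/// let name = match prepare_for_cache {
///     PrepareForCache::Yes { counter } => Cow::Owned(format!("my_stmt_{counter}")),
///     // e.g. PostgreSQL's unnamed prepared statement
///     PrepareForCache::No => Cow::Borrowed(""),
/// };
/// ```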
#[derive(Debug, Clone, Copy)]
#[cfg_attr(
    docsrs,
    doc(cfg(feature = "i-implement-a-third-party-backend-and-opt-into-breaking-changes"))
)]
#[allow(unreachable_pub)]
pub enum PrepareForCache {
    /// The statement will be cached
    Yes {
        /// Counter that might be used as a unique identifier for the prepared statement.
        #[allow(dead_code)]
        counter: u64,
    },
    /// The statement won't be cached
    No,
}

#[allow(clippy::new_without_default, unreachable_pub)]
impl<DB, Statement> StatementCache<DB, Statement>
where
    DB: Backend + 'static,
    Statement: Send + 'static,
    DB::TypeMetadata: Send + Clone,
    DB::QueryBuilder: Default,
    StatementCacheKey<DB>: Hash + Eq,
{
    /// Create a new prepared statement cache using [`CacheSize::Unbounded`] as caching strategy.
    #[allow(unreachable_pub)]
    pub fn new() -> Self {
        StatementCache {
            cache: Box::new(WithCacheStrategy::default()),
            cache_counter: 0,
        }
    }

    /// Set the caching strategy to one of the predefined implementations
    pub fn set_cache_size(&mut self, size: CacheSize) {
        if self.cache.cache_size() != size {
            self.cache = match size {
                CacheSize::Unbounded => Box::new(WithCacheStrategy::default()),
                CacheSize::Disabled => Box::new(WithoutCacheStrategy::default()),
            }
        }
    }

    /// Set a custom caching strategy. This is used in tests to verify the caching logic
    #[allow(dead_code)]
    pub(crate) fn set_strategy<Strategy>(&mut self, s: Strategy)
    where
        Strategy: StatementCacheStrategy<DB, Statement> + 'static,
    {
        self.cache = Box::new(s);
    }

    /// Prepare a query as a prepared statement
    ///
    /// This function returns a prepared statement corresponding to the
    /// query passed as `source` with the bind values passed as `bind_types`.
    /// If the query is already cached inside this prepared statement cache,
    /// the cached prepared statement will be returned; otherwise `prepare_fn`
    /// will be called to create a new prepared statement for this query source.
    /// The first parameter of the callback contains the query string, the second
    /// parameter indicates if the constructed prepared statement will be cached or not.
    /// See the [module](self) documentation for details
    /// about which statements are cached and which are not cached.
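    ///
    /// # Example (sketch)
    ///
    /// A synchronous connection implementation might call this roughly as
    /// follows; `MyBackend`, `MyStatement`, `raw` and `prepare_on_server` are
    /// illustrative placeholders rather than Diesel APIs:
    ///
    /// ```ignore
    /// let stmt = self.statement_cache.cached_statement(
    ///     &source,
    ///     &MyBackend,
    ///     &bind_types,
    ///     &mut self.raw,
    ///     |raw, sql, for_cache, _binds| -> QueryResult<MyStatement> {
    ///         raw.prepare_on_server(sql, matches!(for_cache, PrepareForCache::Yes { .. }))
    ///     },
    ///     &mut self.instrumentation,
    /// )?;
    /// ```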
    //
    // Notes:
    // This function explicitly takes a connection and a function pointer (and no generic callback)
    // as arguments to ensure that we don't leak generic query types into the prepare function
    #[allow(unreachable_pub)]
    #[cfg(any(
        feature = "i-implement-a-third-party-backend-and-opt-into-breaking-changes",
        feature = "sqlite",
        feature = "mysql"
    ))]
    pub fn cached_statement<'a, T, R, C>(
        &'a mut self,
        source: &T,
        backend: &DB,
        bind_types: &[DB::TypeMetadata],
        conn: C,
        prepare_fn: fn(C, &str, PrepareForCache, &[DB::TypeMetadata]) -> R,
        instrumentation: &mut dyn Instrumentation,
    ) -> R::Return<'a>
    where
        T: QueryFragment<DB> + QueryId,
        R: StatementCallbackReturnType<Statement, C> + 'a,
    {
        self.cached_statement_non_generic(
            T::query_id(),
            source,
            backend,
            bind_types,
            conn,
            prepare_fn,
            instrumentation,
        )
    }

    /// Prepare a query as a prepared statement
    ///
    /// This function closely mirrors `Self::cached_statement` but
    /// eliminates the generic query type in favour of a trait object
    ///
    /// This can be easier to use in situations where you already turned
    /// the query type into a concrete SQL string
    // Notes:
    // This function explicitly takes a connection and a function pointer (and no generic callback)
    // as arguments to ensure that we don't leak generic query types into the prepare function
    #[allow(unreachable_pub)]
    #[allow(clippy::too_many_arguments)] // we need all of them
    pub fn cached_statement_non_generic<'a, R, C>(
        &'a mut self,
        maybe_type_id: Option<TypeId>,
        source: &dyn QueryFragmentForCachedStatement<DB>,
        backend: &DB,
        bind_types: &[DB::TypeMetadata],
        conn: C,
        prepare_fn: fn(C, &str, PrepareForCache, &[DB::TypeMetadata]) -> R,
        instrumentation: &mut dyn Instrumentation,
    ) -> R::Return<'a>
    where
        R: StatementCallbackReturnType<Statement, C> + 'a,
    {
        Self::cached_statement_non_generic_impl(
            self.cache.as_mut(),
            maybe_type_id,
            source,
            backend,
            bind_types,
            conn,
            |conn, sql, is_cached| {
                if is_cached {
                    instrumentation.on_connection_event(InstrumentationEvent::CacheQuery { sql });
                    self.cache_counter += 1;
                    prepare_fn(
                        conn,
                        sql,
                        PrepareForCache::Yes {
                            counter: self.cache_counter,
                        },
                        bind_types,
                    )
                } else {
                    prepare_fn(conn, sql, PrepareForCache::No, bind_types)
                }
            },
        )
    }

    /// Reduce the amount of monomorphized code by factoring this via dynamic dispatch.
    /// There will be only one instance of `R` for diesel (and a different single instance for diesel-async).
    /// There will be only one instance per connection type `C` for each connection that
    /// uses this prepared statement impl; this closely correlates to the types `DB` and `Statement`
    /// for the overall statement cache impl
    fn cached_statement_non_generic_impl<'a, R, C>(
        cache: &'a mut dyn StatementCacheStrategy<DB, Statement>,
        maybe_type_id: Option<TypeId>,
        source: &dyn QueryFragmentForCachedStatement<DB>,
        backend: &DB,
        bind_types: &[DB::TypeMetadata],
        conn: C,
        prepare_fn: impl FnOnce(C, &str, bool) -> R,
    ) -> R::Return<'a>
    where
        R: StatementCallbackReturnType<Statement, C> + 'a,
    {
        // this function cannot use the `?` operator
        // as we want to abstract over returning `QueryResult<MaybeCached>` and
        // `impl Future<Output = QueryResult<MaybeCached>>` here
        // to share the prepared statement cache implementation between diesel and
        // diesel_async
        //
        // For this reason we need to match explicitly on each error and call `R::from_error()`
        // to construct the right error return variant
        let cache_key =
            match StatementCacheKey::for_source(maybe_type_id, source, bind_types, backend) {
                Ok(o) => o,
                Err(e) => return R::from_error(e),
            };
        let is_safe_to_cache_prepared = match source.is_safe_to_cache_prepared(backend) {
            Ok(o) => o,
            Err(e) => return R::from_error(e),
        };
        // early return if the statement cannot be cached
        if !is_safe_to_cache_prepared {
            let sql = match cache_key.sql(source, backend) {
                Ok(sql) => sql,
                Err(e) => return R::from_error(e),
            };
            return prepare_fn(conn, &sql, false).map_to_no_cache();
        }
        let entry = cache.lookup_statement(cache_key);
        match entry {
            // The statement is already cached
            LookupStatementResult::CacheEntry(Entry::Occupied(e)) => {
                R::map_to_cache(e.into_mut(), conn)
            }
            // The statement is not cached but there is capacity to cache it
            LookupStatementResult::CacheEntry(Entry::Vacant(e)) => {
                let sql = match e.key().sql(source, backend) {
                    Ok(sql) => sql,
                    Err(e) => return R::from_error(e),
                };
                let st = prepare_fn(conn, &sql, true);
                st.register_cache(|stmt| e.insert(stmt))
            }
            // The statement is not cached and there is no capacity to cache it
            LookupStatementResult::NoCache(cache_key) => {
                let sql = match cache_key.sql(source, backend) {
                    Ok(sql) => sql,
                    Err(e) => return R::from_error(e),
                };
                prepare_fn(conn, &sql, false).map_to_no_cache()
            }
        }
    }
}

/// Implemented for all `QueryFragment`s, dedicated to dynamic dispatch within the context of
/// `statement_cache`
///
/// We want the generated code to be as small as possible, so for each query passed to
/// [`StatementCache::cached_statement`] the generated assembly will just call a non-generic
/// version with dynamic dispatch pointing to the VTABLE of this minimal trait
///
/// This preserves the opportunity for the compiler to entirely optimize the `construct_sql`
/// function as a function that simply returns a constant `String`.
#[allow(unreachable_pub)]
#[cfg_attr(
    docsrs,
    doc(cfg(feature = "i-implement-a-third-party-backend-and-opt-into-breaking-changes"))
)]
pub trait QueryFragmentForCachedStatement<DB> {
    /// Convert the query fragment into a SQL string for the given backend
    fn construct_sql(&self, backend: &DB) -> QueryResult<String>;

    /// Check whether it's safe to cache the query
    fn is_safe_to_cache_prepared(&self, backend: &DB) -> QueryResult<bool>;
}

impl<T, DB> QueryFragmentForCachedStatement<DB> for T
where
    DB: Backend,
    DB::QueryBuilder: Default,
    T: QueryFragment<DB>,
{
    fn construct_sql(&self, backend: &DB) -> QueryResult<String> {
        let mut query_builder = DB::QueryBuilder::default();
        self.to_sql(&mut query_builder, backend)?;
        Ok(query_builder.finish())
    }

    fn is_safe_to_cache_prepared(&self, backend: &DB) -> QueryResult<bool> {
        <T as QueryFragment<DB>>::is_safe_to_cache_prepared(self, backend)
    }
}

/// Wraps a possibly cached prepared statement
///
/// Essentially a customized version of [`Cow`]
/// that does not depend on [`ToOwned`]
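///
/// Because both variants deref to the statement type, callers can use the
/// prepared statement uniformly regardless of whether it came out of the
/// cache. A usage sketch, where `MyStatement` and its `execute` method are
/// illustrative placeholders:
///
/// ```ignore
/// let mut stmt: MaybeCached<'_, MyStatement> = cache.cached_statement(/* ... */)?;
/// // `DerefMut` yields `&mut MyStatement` for both `Cached` and `CannotCache`
/// stmt.execute(&binds)?;
/// ```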
#[allow(missing_debug_implementations, unreachable_pub)]
#[cfg_attr(
    docsrs,
    doc(cfg(feature = "i-implement-a-third-party-backend-and-opt-into-breaking-changes"))
)]
#[non_exhaustive]
pub enum MaybeCached<'a, T: 'a> {
    /// Contains a prepared statement that is not cached
    CannotCache(T),
    /// Contains a reference to a cached prepared statement
    Cached(&'a mut T),
}

/// This trait abstracts over the type returned by the prepare statement function
///
/// The main use-case for this abstraction is to share the same statement cache implementation
/// between diesel and diesel-async.
#[cfg_attr(
    docsrs,
    doc(cfg(feature = "i-implement-a-third-party-backend-and-opt-into-breaking-changes"))
)]
#[allow(unreachable_pub)]
pub trait StatementCallbackReturnType<S: 'static, C> {
    /// The return type of `StatementCache::cached_statement`
    ///
    /// Either a `QueryResult<MaybeCached<S>>` or a future of that result type
    type Return<'a>;

    /// Create the return type from an error
    fn from_error<'a>(e: diesel::result::Error) -> Self::Return<'a>;

    /// Map the callback return type to the `MaybeCached::CannotCache` variant
    fn map_to_no_cache<'a>(self) -> Self::Return<'a>
    where
        Self: 'a;

    /// Map the cached statement to the `MaybeCached::Cached` variant
    fn map_to_cache(stmt: &mut S, conn: C) -> Self::Return<'_>;

    /// Insert the created statement into the cache via the provided callback
    /// and then turn the returned reference into `MaybeCached::Cached`
    fn register_cache<'a>(
        self,
        callback: impl FnOnce(S) -> &'a mut S + Send + 'a,
    ) -> Self::Return<'a>
    where
        Self: 'a;
}

impl<S, C> StatementCallbackReturnType<S, C> for QueryResult<S>
where
    S: 'static,
{
    type Return<'a> = QueryResult<MaybeCached<'a, S>>;

    fn from_error<'a>(e: diesel::result::Error) -> Self::Return<'a> {
        Err(e)
    }

    fn map_to_no_cache<'a>(self) -> Self::Return<'a> {
        self.map(MaybeCached::CannotCache)
    }

    fn map_to_cache(stmt: &mut S, _conn: C) -> Self::Return<'_> {
        Ok(MaybeCached::Cached(stmt))
    }

    fn register_cache<'a>(
        self,
        callback: impl FnOnce(S) -> &'a mut S + Send + 'a,
    ) -> Self::Return<'a>
    where
        Self: 'a,
    {
        Ok(MaybeCached::Cached(callback(self?)))
    }
}

impl<T> Deref for MaybeCached<'_, T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        match *self {
            MaybeCached::CannotCache(ref x) => x,
            MaybeCached::Cached(ref x) => x,
        }
    }
}

impl<T> DerefMut for MaybeCached<'_, T> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        match *self {
            MaybeCached::CannotCache(ref mut x) => x,
            MaybeCached::Cached(ref mut x) => x,
        }
    }
}

/// The lookup key used by [`StatementCache`] internally
///
/// This can contain either a type id known at compile time
/// (representing a statically known query) or a query string plus
/// bind parameter types calculated at runtime (for queries
/// that may change depending on their parameters)
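///
/// A rough illustration of when each variant is produced (an `ignore`d
/// sketch; `SomeQuery` and `boxed_query` are placeholder names):
///
/// ```ignore
/// // A query type with a static query id is cached by type id
/// let key = StatementCacheKey::for_source(
///     <SomeQuery as QueryId>::query_id(), &query, &[], &backend,
/// )?;
/// assert!(matches!(key, StatementCacheKey::Type(_)));
///
/// // A boxed query has no static id, so the generated SQL plus the bind
/// // types serve as the cache key
/// let key = StatementCacheKey::for_source(None, &boxed_query, &bind_types, &backend)?;
/// assert!(matches!(key, StatementCacheKey::Sql { .. }));
/// ```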
#[allow(missing_debug_implementations, unreachable_pub)]
#[derive(Hash, PartialEq, Eq)]
#[cfg_attr(
    docsrs,
    doc(cfg(feature = "i-implement-a-third-party-backend-and-opt-into-breaking-changes"))
)]
pub enum StatementCacheKey<DB: Backend> {
    /// Represents a query known at compile time
    ///
    /// Calculated via [`QueryId::QueryId`]
    Type(TypeId),
    /// Represents a dynamically constructed query
    ///
    /// This variant is used if [`QueryId::HAS_STATIC_QUERY_ID`]
    /// is `false` and [`AstPass::unsafe_to_cache_prepared`] is not
    /// called for a given query.
    Sql {
        /// Contains the SQL query string
        sql: String,
        /// Contains the types of any bind parameters passed to the query
        bind_types: Vec<DB::TypeMetadata>,
    },
}

impl<DB> StatementCacheKey<DB>
where
    DB: Backend,
    DB::QueryBuilder: Default,
    DB::TypeMetadata: Clone,
{
    /// Create a new statement cache key for the given query source
    // Note: Intentionally monomorphic over source.
    #[allow(unreachable_pub)]
    pub fn for_source(
        maybe_type_id: Option<TypeId>,
        source: &dyn QueryFragmentForCachedStatement<DB>,
        bind_types: &[DB::TypeMetadata],
        backend: &DB,
    ) -> QueryResult<Self> {
        match maybe_type_id {
            Some(id) => Ok(StatementCacheKey::Type(id)),
            None => {
                let sql = source.construct_sql(backend)?;
                Ok(StatementCacheKey::Sql {
                    sql,
                    bind_types: bind_types.into(),
                })
            }
        }
    }

    /// Get the SQL for a given query source
    ///
    /// This is an optimization that avoids constructing the query string a
    /// second time if it's already part of the current cache key
    // Note: Intentionally monomorphic over source.
    #[allow(unreachable_pub)]
    pub fn sql(
        &self,
        source: &dyn QueryFragmentForCachedStatement<DB>,
        backend: &DB,
    ) -> QueryResult<Cow<'_, str>> {
        match *self {
            StatementCacheKey::Type(_) => source.construct_sql(backend).map(Cow::Owned),
            StatementCacheKey::Sql { ref sql, .. } => Ok(Cow::Borrowed(sql)),
        }
    }
}