diesel/connection/statement_cache/mod.rs

//! Helper types for prepared statement caching
//!
//! A primer on prepared statement caching in Diesel
//! ------------------------------------------------
//!
//! Diesel uses prepared statements for virtually all queries. This is most
//! visible in our lack of any sort of "quoting" API. Values must always be
//! transmitted as bind parameters; we do not support direct interpolation. The
//! only method in the public API that doesn't require the use of prepared
//! statements is [`SimpleConnection::batch_execute`](super::SimpleConnection::batch_execute).
//!
//! In order to avoid the cost of re-parsing and planning subsequent queries,
//! by default Diesel caches the prepared statement whenever possible. This
//! can be customized by calling
//! [`Connection::set_cache_size`](super::Connection::set_cache_size).
//!
//! Queries will fall into one of three buckets:
//!
//! - Unsafe to cache
//! - Cached by SQL
//! - Cached by type
//!
//! A query is considered unsafe to cache if it represents a potentially
//! unbounded number of queries. This is communicated to the connection through
//! [`QueryFragment::is_safe_to_cache_prepared`]. While this is done as a full AST
//! pass, after monomorphisation and inlining this will usually be optimized to
//! a constant. Only boxed queries will need to do actual work to answer this
//! question.
//!
//! The majority of AST nodes are safe to cache if their components are safe to
//! cache. There are at least 4 cases where a query is unsafe to cache:
//!
//! - queries containing `IN` with bind parameters
//!     - This requires 1 bind parameter per value, and is therefore unbounded
//!     - `IN` with subselects is cached (assuming the subselect is safe to
//!        cache)
//!     - `IN` statements for PostgreSQL are cached as they use `= ANY($1)` instead,
//!        which does not cause an unbounded number of binds
//! - `INSERT` statements with a variable number of rows
//!     - The SQL varies based on the number of rows being inserted.
//! - `UPDATE` statements
//!     - Technically it's bounded by "number of optional values being passed to
//!       `SET` factorial", but that's still quite high, and not worth caching
//!       for the same reason as single row inserts
//! - `SqlLiteral` nodes
//!     - We have no way of knowing whether the SQL was generated dynamically or
//!       not, so we must assume that it's unbounded
//!
//! For queries which are unsafe to cache, the statement cache will never insert
//! them. They will be prepared and immediately released after use (or in the
//! case of PG they will use the unnamed prepared statement).
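//!
//! For illustration, a rough sketch of how this plays out with the query DSL
//! (assuming a `users` table with an `id` column; this snippet is not compiled
//! as part of these docs):
//!
//! ```ignore
//! // Fully known at compile time: safe to cache (and cached by type, see below)
//! users.filter(id.eq(42)).first::<User>(conn)?;
//!
//! // One bind parameter per element: the SQL shape depends on `ids.len()`,
//! // so this is unsafe to cache on backends that expand it to `IN ($1, $2, ...)`.
//! // On PostgreSQL it becomes `= ANY($1)` and stays cacheable.
//! users.filter(id.eq_any(ids)).load::<User>(conn)?;
//!
//! // Dynamic SQL: we cannot know how many distinct strings will be generated,
//! // so it is never cached.
//! diesel::sql_query(some_sql_string).execute(conn)?;
//! ```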
//!
//! For statements which are able to be cached, we then have to determine what
//! to use as the cache key. The standard method that virtually all ORMs or
//! database access layers use in the wild is to store the statements in a
//! hash map, using the SQL as the key.
//!
//! However, the majority of queries using Diesel that are safe to cache as
//! prepared statements will be uniquely identified by their type. For these
//! queries, we can bypass the query builder entirely. Since our AST is
//! generally optimized away by the compiler, for these queries the cost of
//! fetching a prepared statement from the cache is the cost of [`HashMap<u32,
//! _>::get`], where the key we're fetching by is a compile time constant. For
//! these types, the AST pass to gather the bind parameters will also be
//! optimized to accessing each parameter individually.
//!
//! Determining if a query can be cached by type is the responsibility of the
//! [`QueryId`] trait. This trait is quite similar to `Any`, but with a few
//! differences:
//!
//! - No `'static` bound
//!     - Something being a reference never changes the SQL that is generated,
//!       so `&T` has the same query id as `T`.
//! - `Option<TypeId>` instead of `TypeId`
//!     - We need to be able to constrain on this trait being implemented, but
//!       not all types will actually have a static query id. Hopefully once
//!       specialization is stable we can remove the `QueryId` bound and
//!       specialize on it instead (or provide a blanket impl for all `T`)
//! - Implementors give a broader type than `Self`
//!     - This really only affects bind parameters. There are 6 different Rust
//!       types which can be used for a parameter of type `timestamp`. The same
//!       statement can be used regardless of the Rust type, so [`Bound<ST, T>`](crate::expression::bound::Bound)
//!       defines its [`QueryId`] as [`Bound<ST, ()>`](crate::expression::bound::Bound).
//!
//! Whether a type returns `Some(id)` or `None` for its query ID depends on whether
//! the SQL it generates can change without the type changing. At the moment,
//! the only type which is safe to cache as a prepared statement but does not
//! have a static query ID is something which has been boxed.
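//!
//! Put together, the cache key for a statement is picked roughly like this
//! (a simplified sketch of what `StatementCacheKey::for_source` below does):
//!
//! ```ignore
//! let key = match T::query_id() {
//!     // the query has a static id: cache by type, no SQL string needed
//!     Some(type_id) => StatementCacheKey::Type(type_id),
//!     // no static id (e.g. a boxed query): fall back to caching by SQL string
//!     None => StatementCacheKey::Sql { sql, bind_types },
//! };
//! ```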
//!
//! One potential optimization that we don't perform is storing the queries
//! which are cached by type ID in a separate map. Since a type ID is a u64,
//! this would allow us to use a specialized map which knows that there will
//! never be hashing collisions (also known as a perfect hashing function),
//! which would mean lookups are always constant time. However, this would save
//! nanoseconds on an operation that will take microseconds or even
//! milliseconds.

use std::any::TypeId;
use std::borrow::Cow;
use std::collections::hash_map::Entry;
use std::hash::Hash;
use std::ops::{Deref, DerefMut};

use strategy::{
    LookupStatementResult, StatementCacheStrategy, WithCacheStrategy, WithoutCacheStrategy,
};

use crate::backend::Backend;
use crate::connection::InstrumentationEvent;
use crate::query_builder::*;
use crate::result::QueryResult;

use super::{CacheSize, Instrumentation};

/// Various interfaces and implementations to control connection statement caching.
#[allow(unreachable_pub)]
pub mod strategy;

/// A prepared statement cache
#[allow(missing_debug_implementations, unreachable_pub)]
#[cfg_attr(
    docsrs,
    doc(cfg(feature = "i-implement-a-third-party-backend-and-opt-into-breaking-changes"))
)]
pub struct StatementCache<DB: Backend, Statement> {
    cache: Box<dyn StatementCacheStrategy<DB, Statement>>,
    // increment every time a query is cached
    // some backends might use it to create unique prepared statement names
    cache_counter: u64,
}

/// A helper type that indicates whether a certain query
/// is cached inside of the prepared statement cache or not
///
/// This information can be used by the connection implementation
/// to signal this fact to the database while actually
/// preparing the statement
#[derive(Debug, Clone, Copy)]
#[cfg_attr(
    docsrs,
    doc(cfg(feature = "i-implement-a-third-party-backend-and-opt-into-breaking-changes"))
)]
#[allow(unreachable_pub)]
pub enum PrepareForCache {
    /// The statement will be cached
    Yes {
        /// The counter might be used as a unique identifier for the prepared statement.
        #[allow(dead_code)]
        counter: u64,
    },
    /// The statement won't be cached
    No,
}

#[allow(clippy::new_without_default, unreachable_pub)]
impl<DB, Statement> StatementCache<DB, Statement>
where
    DB: Backend + 'static,
    Statement: Send + 'static,
    DB::TypeMetadata: Send + Clone,
    DB::QueryBuilder: Default,
    StatementCacheKey<DB>: Hash + Eq,
{
    /// Create a new prepared statement cache using [`CacheSize::Unbounded`] as caching strategy.
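    ///
    /// A minimal usage sketch (assuming a concrete backend type such as `Pg`
    /// and a prepared statement type are in scope; not compiled here):
    ///
    /// ```ignore
    /// let mut cache = StatementCache::<Pg, Statement>::new();
    /// // opt out of statement caching entirely, e.g. in tests
    /// cache.set_cache_size(CacheSize::Disabled);
    /// ```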
    #[allow(unreachable_pub)]
    pub fn new() -> Self {
        StatementCache {
            cache: Box::new(WithCacheStrategy::default()),
            cache_counter: 0,
        }
    }

    /// Set the caching strategy to one of the predefined implementations
    pub fn set_cache_size(&mut self, size: CacheSize) {
        if self.cache.cache_size() != size {
            self.cache = match size {
                CacheSize::Unbounded => Box::new(WithCacheStrategy::default()),
                CacheSize::Disabled => Box::new(WithoutCacheStrategy::default()),
            }
        }
    }

    /// Set a custom caching strategy. This is used in tests to verify the caching logic
    #[allow(dead_code)]
    pub(crate) fn set_strategy<Strategy>(&mut self, s: Strategy)
    where
        Strategy: StatementCacheStrategy<DB, Statement> + 'static,
    {
        self.cache = Box::new(s);
    }

    /// Prepare a query as prepared statement
    ///
    /// This function returns a prepared statement corresponding to the
    /// query passed as `source` with the bind values passed as `bind_types`.
    /// If the query is already cached inside this prepared statement cache,
    /// the cached prepared statement will be returned; otherwise `prepare_fn`
    /// will be called to create a new prepared statement for this query source.
    /// The first parameter of the callback contains the query string, the second
    /// parameter indicates whether the constructed prepared statement will be cached or not.
    /// See the [module](self) documentation for details
    /// about which statements are cached and which are not cached.
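    ///
    /// A rough sketch of how a connection implementation might call this method
    /// (`Pg`, `raw_connection`, `Statement::prepare` and `instrumentation` are
    /// placeholders for whatever the concrete backend provides; not compiled here):
    ///
    /// ```ignore
    /// let statement = statement_cache.cached_statement(
    ///     &query,
    ///     &Pg,
    ///     &bind_types,
    ///     raw_connection,
    ///     |conn, sql, is_for_cache, _metadata| Statement::prepare(conn, sql, is_for_cache),
    ///     instrumentation,
    /// )?;
    /// ```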
    //
    // Notes:
    // This function explicitly takes a connection and a function pointer (and no generic callback)
    // as arguments to ensure that we don't leak generic query types into the prepare function
    #[allow(unreachable_pub)]
    pub fn cached_statement<'a, T, R, C>(
        &'a mut self,
        source: &T,
        backend: &DB,
        bind_types: &[DB::TypeMetadata],
        conn: C,
        prepare_fn: fn(C, &str, PrepareForCache, &[DB::TypeMetadata]) -> R,
        instrumentation: &mut dyn Instrumentation,
    ) -> R::Return<'a>
    where
        T: QueryFragment<DB> + QueryId,
        R: StatementCallbackReturnType<Statement, C> + 'a,
    {
        self.cached_statement_non_generic(
            T::query_id(),
            source,
            backend,
            bind_types,
            conn,
            prepare_fn,
            instrumentation,
        )
    }

    /// Prepare a query as prepared statement
    ///
    /// This function closely mirrors `Self::cached_statement` but
    /// eliminates the generic query type in favour of a trait object
    ///
    /// This can be easier to use in situations where you already turned
    /// the query type into a concrete SQL string
    // Notes:
    // This function explicitly takes a connection and a function pointer (and no generic callback)
    // as arguments to ensure that we don't leak generic query types into the prepare function
    #[allow(unreachable_pub)]
    #[allow(clippy::too_many_arguments)] // we need all of them
    pub fn cached_statement_non_generic<'a, R, C>(
        &'a mut self,
        maybe_type_id: Option<TypeId>,
        source: &dyn QueryFragmentForCachedStatement<DB>,
        backend: &DB,
        bind_types: &[DB::TypeMetadata],
        conn: C,
        prepare_fn: fn(C, &str, PrepareForCache, &[DB::TypeMetadata]) -> R,
        instrumentation: &mut dyn Instrumentation,
    ) -> R::Return<'a>
    where
        R: StatementCallbackReturnType<Statement, C> + 'a,
    {
        Self::cached_statement_non_generic_impl(
            self.cache.as_mut(),
            maybe_type_id,
            source,
            backend,
            bind_types,
            conn,
            |conn, sql, is_cached| {
                if is_cached {
                    instrumentation.on_connection_event(InstrumentationEvent::CacheQuery { sql });
                    self.cache_counter += 1;
                    prepare_fn(
                        conn,
                        sql,
                        PrepareForCache::Yes {
                            counter: self.cache_counter,
                        },
                        bind_types,
                    )
                } else {
                    prepare_fn(conn, sql, PrepareForCache::No, bind_types)
                }
            },
        )
    }

    /// Reduce the amount of monomorphized code by factoring this via dynamic dispatch.
    /// There will be only one instance of `R` for diesel (and a different single instance for diesel-async).
    /// There will be only one instance per connection type `C` for each connection that
    /// uses this prepared statement impl; this closely correlates to the types `DB` and `Statement`
    /// for the overall statement cache impl
    fn cached_statement_non_generic_impl<'a, R, C>(
        cache: &'a mut dyn StatementCacheStrategy<DB, Statement>,
        maybe_type_id: Option<TypeId>,
        source: &dyn QueryFragmentForCachedStatement<DB>,
        backend: &DB,
        bind_types: &[DB::TypeMetadata],
        conn: C,
        prepare_fn: impl FnOnce(C, &str, bool) -> R,
    ) -> R::Return<'a>
    where
        R: StatementCallbackReturnType<Statement, C> + 'a,
    {
        // this function cannot use the `?` operator
        // as we want to abstract over returning `QueryResult<MaybeCached>` and
        // `impl Future<Output = QueryResult<MaybeCached>>` here
        // to share the prepared statement cache implementation between diesel and
        // diesel_async
        //
        // For this reason we need to match explicitly on each error and call `R::from_error()`
        // to construct the right error return variant
        let cache_key =
            match StatementCacheKey::for_source(maybe_type_id, source, bind_types, backend) {
                Ok(o) => o,
                Err(e) => return R::from_error(e),
            };
        let is_safe_to_cache_prepared = match source.is_safe_to_cache_prepared(backend) {
            Ok(o) => o,
            Err(e) => return R::from_error(e),
        };
        // early return if the statement cannot be cached
        if !is_safe_to_cache_prepared {
            let sql = match cache_key.sql(source, backend) {
                Ok(sql) => sql,
                Err(e) => return R::from_error(e),
            };
            return prepare_fn(conn, &sql, false).map_to_no_cache();
        }
        let entry = cache.lookup_statement(cache_key);
        match entry {
            // The statement is already cached
            LookupStatementResult::CacheEntry(Entry::Occupied(e)) => {
                R::map_to_cache(e.into_mut(), conn)
            }
            // The statement is not cached but there is capacity to cache it
            LookupStatementResult::CacheEntry(Entry::Vacant(e)) => {
                let sql = match e.key().sql(source, backend) {
                    Ok(sql) => sql,
                    Err(e) => return R::from_error(e),
                };
                let st = prepare_fn(conn, &sql, true);
                st.register_cache(|stmt| e.insert(stmt))
            }
            // The statement is not cached and there is no capacity to cache it
            LookupStatementResult::NoCache(cache_key) => {
                let sql = match cache_key.sql(source, backend) {
                    Ok(sql) => sql,
                    Err(e) => return R::from_error(e),
                };
                prepare_fn(conn, &sql, false).map_to_no_cache()
            }
        }
    }
}

/// Implemented for all `QueryFragment`s, dedicated to dynamic dispatch within the context of
/// `statement_cache`
///
/// We want the generated code to be as small as possible, so for each query passed to
/// [`StatementCache::cached_statement`] the generated assembly will just call a non-generic
/// version with dynamic dispatch pointing to the vtable of this minimal trait
///
/// This preserves the opportunity for the compiler to entirely optimize the `construct_sql`
/// function as a function that simply returns a constant `String`.
#[allow(unreachable_pub)]
#[cfg_attr(
    docsrs,
    doc(cfg(feature = "i-implement-a-third-party-backend-and-opt-into-breaking-changes"))
)]
pub trait QueryFragmentForCachedStatement<DB> {
    /// Convert the query fragment into a SQL string for the given backend
    fn construct_sql(&self, backend: &DB) -> QueryResult<String>;

    /// Check whether it's safe to cache the query
    fn is_safe_to_cache_prepared(&self, backend: &DB) -> QueryResult<bool>;
}

impl<T, DB> QueryFragmentForCachedStatement<DB> for T
where
    DB: Backend,
    DB::QueryBuilder: Default,
    T: QueryFragment<DB>,
{
    fn construct_sql(&self, backend: &DB) -> QueryResult<String> {
        let mut query_builder = DB::QueryBuilder::default();
        self.to_sql(&mut query_builder, backend)?;
        Ok(query_builder.finish())
    }

    fn is_safe_to_cache_prepared(&self, backend: &DB) -> QueryResult<bool> {
        <T as QueryFragment<DB>>::is_safe_to_cache_prepared(self, backend)
    }
}

/// Wraps a possibly cached prepared statement
///
/// Essentially a customized version of [`Cow`]
/// that does not depend on [`ToOwned`]
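///
/// Thanks to the `Deref`/`DerefMut` impls below, callers can use a
/// `MaybeCached<'_, Statement>` like a `&mut Statement`, regardless of whether
/// it came from the cache or not, e.g. (sketch with a hypothetical `execute`
/// method on the statement type):
///
/// ```ignore
/// let mut stmt: MaybeCached<'_, Statement> = cache.cached_statement(/* ... */)?;
/// stmt.execute(&binds)?;
/// ```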
#[allow(missing_debug_implementations, unreachable_pub)]
#[cfg_attr(
    docsrs,
    doc(cfg(feature = "i-implement-a-third-party-backend-and-opt-into-breaking-changes"))
)]
#[non_exhaustive]
pub enum MaybeCached<'a, T: 'a> {
    /// Contains a prepared statement that is not cached
    CannotCache(T),
    /// Contains a reference to a cached prepared statement
    Cached(&'a mut T),
}

/// This trait abstracts over the type returned by the prepare statement function
///
/// The main use-case for this abstraction is to share the same statement cache implementation
/// between diesel and diesel-async.
#[cfg_attr(
    docsrs,
    doc(cfg(feature = "i-implement-a-third-party-backend-and-opt-into-breaking-changes"))
)]
#[allow(unreachable_pub)]
pub trait StatementCallbackReturnType<S: 'static, C> {
    /// The return type of `StatementCache::cached_statement`
    ///
    /// Either a `QueryResult<MaybeCached<S>>` or a future of that result type
    type Return<'a>;

    /// Create the return type from an error
    fn from_error<'a>(e: diesel::result::Error) -> Self::Return<'a>;

    /// Map the callback return type to the `MaybeCached::CannotCache` variant
    fn map_to_no_cache<'a>(self) -> Self::Return<'a>
    where
        Self: 'a;

    /// Map the cached statement to the `MaybeCached::Cached` variant
    fn map_to_cache(stmt: &mut S, conn: C) -> Self::Return<'_>;

    /// Insert the created statement into the cache via the provided callback
    /// and then turn the returned reference into `MaybeCached::Cached`
    fn register_cache<'a>(
        self,
        callback: impl FnOnce(S) -> &'a mut S + Send + 'a,
    ) -> Self::Return<'a>
    where
        Self: 'a;
}

impl<S, C> StatementCallbackReturnType<S, C> for QueryResult<S>
where
    S: 'static,
{
    type Return<'a> = QueryResult<MaybeCached<'a, S>>;

    fn from_error<'a>(e: diesel::result::Error) -> Self::Return<'a> {
        Err(e)
    }

    fn map_to_no_cache<'a>(self) -> Self::Return<'a> {
        self.map(MaybeCached::CannotCache)
    }

    fn map_to_cache(stmt: &mut S, _conn: C) -> Self::Return<'_> {
        Ok(MaybeCached::Cached(stmt))
    }

    fn register_cache<'a>(
        self,
        callback: impl FnOnce(S) -> &'a mut S + Send + 'a,
    ) -> Self::Return<'a>
    where
        Self: 'a,
    {
        Ok(MaybeCached::Cached(callback(self?)))
    }
}

impl<T> Deref for MaybeCached<'_, T> {
    type Target = T;

    fn deref(&self) -> &Self::Target {
        match *self {
            MaybeCached::CannotCache(ref x) => x,
            MaybeCached::Cached(ref x) => x,
        }
    }
}

impl<T> DerefMut for MaybeCached<'_, T> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        match *self {
            MaybeCached::CannotCache(ref mut x) => x,
            MaybeCached::Cached(ref mut x) => x,
        }
    }
}

/// The lookup key used by [`StatementCache`] internally
///
/// This can contain either a type id known at compile time
/// (representing a statically known query) or a query string
/// plus parameter types calculated at runtime (for queries
/// that may change depending on their parameters)
#[allow(missing_debug_implementations, unreachable_pub)]
#[derive(Hash, PartialEq, Eq)]
#[cfg_attr(
    docsrs,
    doc(cfg(feature = "i-implement-a-third-party-backend-and-opt-into-breaking-changes"))
)]
pub enum StatementCacheKey<DB: Backend> {
    /// Represents a query that is known at compile time
    ///
    /// Calculated via [`QueryId::QueryId`]
    Type(TypeId),
    /// Represents a dynamically constructed query
    ///
    /// This variant is used if [`QueryId::HAS_STATIC_QUERY_ID`]
    /// is `false` and [`AstPass::unsafe_to_cache_prepared`] is not
    /// called for a given query.
    Sql {
        /// contains the sql query string
        sql: String,
        /// contains the types of any bind parameter passed to the query
        bind_types: Vec<DB::TypeMetadata>,
    },
}

impl<DB> StatementCacheKey<DB>
where
    DB: Backend,
    DB::QueryBuilder: Default,
    DB::TypeMetadata: Clone,
{
    /// Create a new statement cache key for the given query source
    // Note: Intentionally monomorphic over source.
    #[allow(unreachable_pub)]
    pub fn for_source(
        maybe_type_id: Option<TypeId>,
        source: &dyn QueryFragmentForCachedStatement<DB>,
        bind_types: &[DB::TypeMetadata],
        backend: &DB,
    ) -> QueryResult<Self> {
        match maybe_type_id {
            Some(id) => Ok(StatementCacheKey::Type(id)),
            None => {
                let sql = source.construct_sql(backend)?;
                Ok(StatementCacheKey::Sql {
                    sql,
                    bind_types: bind_types.into(),
                })
            }
        }
    }

    /// Get the SQL for a given query source
    ///
    /// This is an optimization that may skip constructing the query string
    /// a second time if it's already part of the current cache key
    // Note: Intentionally monomorphic over source.
    #[allow(unreachable_pub)]
    pub fn sql(
        &self,
        source: &dyn QueryFragmentForCachedStatement<DB>,
        backend: &DB,
    ) -> QueryResult<Cow<'_, str>> {
        match *self {
            StatementCacheKey::Type(_) => source.construct_sql(backend).map(Cow::Owned),
            StatementCacheKey::Sql { ref sql, .. } => Ok(Cow::Borrowed(sql)),
        }
    }
}