Module diesel::connection::statement_cache

source ·
Available on crate feature i-implement-a-third-party-backend-and-opt-into-breaking-changes only.
Expand description

Helper types for prepared statement caching

§A primer on prepared statement caching in Diesel

Diesel uses prepared statements for virtually all queries. This is most visible in our lack of any sort of “quoting” API. Values must always be transmitted as bind parameters, we do not support direct interpolation. The only method in the public API that doesn’t require the use of prepared statements is SimpleConnection::batch_execute.

In order to avoid the cost of re-parsing and planning subsequent queries, Diesel caches the prepared statement whenever possible. Queries will fall into one of three buckets:

  • Unsafe to cache
  • Cached by SQL
  • Cached by type

A query is considered unsafe to cache if it represents a potentially unbounded number of queries. This is communicated to the connection through QueryFragment::is_safe_to_cache_prepared. While this is done as a full AST pass, after monomorphisation and inlining this will usually be optimized to a constant. Only boxed queries will need to do actual work to answer this question.

The majority of AST nodes are safe to cache if their components are safe to cache. There are at least 4 cases where a query is unsafe to cache:

  • queries containing IN with bind parameters
    • This requires 1 bind parameter per value, and is therefore unbounded
    • IN with subselects are cached (assuming the subselect is safe to cache)
    • IN statements for postgresql are cached as they use = ANY($1) instead which does not cause a unbound number of binds
  • INSERT statements with a variable number of rows
    • The SQL varies based on the number of rows being inserted.
  • UPDATE statements
    • Technically it’s bounded on “number of optional values being passed to SET factorial” but that’s still quite high, and not worth caching for the same reason as single row inserts
  • SqlLiteral nodes
    • We have no way of knowing whether the SQL was generated dynamically or not, so we must assume that it’s unbounded

For queries which are unsafe to cache, the statement cache will never insert them. They will be prepared and immediately released after use (or in the case of PG they will use the unnamed prepared statement).

For statements which are able to be cached, we then have to determine what to use as the cache key. The standard method that virtually all ORMs or database access layers use in the wild is to store the statements in a hash map, using the SQL as the key.

However, the majority of queries using Diesel that are safe to cache as prepared statements will be uniquely identified by their type. For these queries, we can bypass the query builder entirely. Since our AST is generally optimized away by the compiler, for these queries the cost of fetching a prepared statement from the cache is the cost of HashMap<u32, _>::get, where the key we’re fetching by is a compile time constant. For these types, the AST pass to gather the bind parameters will also be optimized to accessing each parameter individually.

Determining if a query can be cached by type is the responsibility of the QueryId trait. This trait is quite similar to Any, but with a few differences:

  • No 'static bound
    • Something being a reference never changes the SQL that is generated, so &T has the same query id as T.
  • Option<TypeId> instead of TypeId
    • We need to be able to constrain on this trait being implemented, but not all types will actually have a static query id. Hopefully once specialization is stable we can remove the QueryId bound and specialize on it instead (or provide a blanket impl for all T)
  • Implementors give a more broad type than Self
    • This really only affects bind parameters. There are 6 different Rust types which can be used for a parameter of type timestamp. The same statement can be used regardless of the Rust type, so Bound<ST, T> defines its QueryId as Bound<ST, ()>.

A type returning Some(id) or None for its query ID is based on whether the SQL it generates can change without the type changing. At the moment, the only type which is safe to cache as a prepared statement but does not have a static query ID is something which has been boxed.

One potential optimization that we don’t perform is storing the queries which are cached by type ID in a separate map. Since a type ID is a u64, this would allow us to use a specialized map which knows that there will never be hashing collisions (also known as a perfect hashing function), which would mean lookups are always constant time. However, this would save nanoseconds on an operation that will take microseconds or even milliseconds.