Module diesel::connection::statement_cache
source · i-implement-a-third-party-backend-and-opt-into-breaking-changes
only.Expand description
Helper types for prepared statement caching
A primer on prepared statement caching in Diesel
Diesel uses prepared statements for virtually all queries. This is most
visible in our lack of any sort of “quoting” API. Values must always be
transmitted as bind parameters, we do not support direct interpolation. The
only method in the public API that doesn’t require the use of prepared
statements is SimpleConnection::batch_execute
.
In order to avoid the cost of re-parsing and planning subsequent queries, Diesel caches the prepared statement whenever possible. Queries will fall into one of three buckets:
- Unsafe to cache
- Cached by SQL
- Cached by type
A query is considered unsafe to cache if it represents a potentially
unbounded number of queries. This is communicated to the connection through
QueryFragment::is_safe_to_cache_prepared
. While this is done as a full AST
pass, after monomorphisation and inlining this will usually be optimized to
a constant. Only boxed queries will need to do actual work to answer this
question.
The majority of AST nodes are safe to cache if their components are safe to cache. There are at least 4 cases where a query is unsafe to cache:
- queries containing
IN
with bind parameters- This requires 1 bind parameter per value, and is therefore unbounded
IN
with subselects are cached (assuming the subselect is safe to cache)IN
statements for postgresql are cached as they use= ANY($1)
instead which does not cause a unbound number of binds
INSERT
statements with a variable number of rows- The SQL varies based on the number of rows being inserted.
UPDATE
statements- Technically it’s bounded on “number of optional values being passed to
SET
factorial” but that’s still quite high, and not worth caching for the same reason as single row inserts
- Technically it’s bounded on “number of optional values being passed to
SqlLiteral
nodes- We have no way of knowing whether the SQL was generated dynamically or not, so we must assume that it’s unbounded
For queries which are unsafe to cache, the statement cache will never insert them. They will be prepared and immediately released after use (or in the case of PG they will use the unnamed prepared statement).
For statements which are able to be cached, we then have to determine what to use as the cache key. The standard method that virtually all ORMs or database access layers use in the wild is to store the statements in a hash map, using the SQL as the key.
However, the majority of queries using Diesel that are safe to cache as
prepared statements will be uniquely identified by their type. For these
queries, we can bypass the query builder entirely. Since our AST is
generally optimized away by the compiler, for these queries the cost of
fetching a prepared statement from the cache is the cost of HashMap<u32, _>::get
, where the key we’re fetching by is a compile time constant. For
these types, the AST pass to gather the bind parameters will also be
optimized to accessing each parameter individually.
Determining if a query can be cached by type is the responsibility of the
QueryId
trait. This trait is quite similar to Any
, but with a few
differences:
- No
'static
bound- Something being a reference never changes the SQL that is generated,
so
&T
has the same query id asT
.
- Something being a reference never changes the SQL that is generated,
so
Option<TypeId>
instead ofTypeId
- We need to be able to constrain on this trait being implemented, but
not all types will actually have a static query id. Hopefully once
specialization is stable we can remove the
QueryId
bound and specialize on it instead (or provide a blanket impl for allT
)
- We need to be able to constrain on this trait being implemented, but
not all types will actually have a static query id. Hopefully once
specialization is stable we can remove the
- Implementors give a more broad type than
Self
- This really only affects bind parameters. There are 6 different Rust
types which can be used for a parameter of type
timestamp
. The same statement can be used regardless of the Rust type, soBound<ST, T>
defines itsQueryId
asBound<ST, ()>
.
- This really only affects bind parameters. There are 6 different Rust
types which can be used for a parameter of type
A type returning Some(id)
or None
for its query ID is based on whether
the SQL it generates can change without the type changing. At the moment,
the only type which is safe to cache as a prepared statement but does not
have a static query ID is something which has been boxed.
One potential optimization that we don’t perform is storing the queries which are cached by type ID in a separate map. Since a type ID is a u64, this would allow us to use a specialized map which knows that there will never be hashing collisions (also known as a perfect hashing function), which would mean lookups are always constant time. However, this would save nanoseconds on an operation that will take microseconds or even milliseconds.
Structs
- A prepared statement cache
Enums
- Wraps a possibly cached prepared statement
- A helper type that indicates if a certain query is cached inside of the prepared statement cache or not
- The lookup key used by
StatementCache
internally