Appendix 8: CQL Best Practices
This is a brief discussion of every statement type and some general best practices for each. The statements are in mostly alphabetical order, except that related statements have been grouped together.
Refer also to Appendix 7: CQL Anti-patterns.
Data Definition Language (DDL)
ALTER TABLE ADD COLUMN
CREATE INDEX
CREATE PROC
CREATE TABLE
CREATE TRIGGER
CREATE VIEW
CREATE VIRTUAL TABLE
DROP INDEX
DROP TABLE
DROP TRIGGER
DROP VIEW
These statements almost never appear in normal procedures and generally should be avoided. The normal way of handling schema in CQL
is to have one or more files declare all the schema you need and then let CQL create a schema upgrader for you. This means you'll
never manually drop tables or indices etc. The create
declarations with their annotations will totally drive the schema.
Any ad hoc DDL is usually a very bad sign. Test code is an obvious exception to this as it often does setup and teardown of schema to set up things for the test.
Ad Hoc Migrations
@SCHEMA_AD_HOC_MIGRATION
This is a special upgrade step that should be taken at the version indicated in the statement. These can be quite complex and even super important but should not be used lightly. Any migration procedure has to be highly tolerant of a variety of incoming schema versions and previous partial successes. In any case this directive should not appear in normal code. It should be part of the schema DDL declarations.
Transactions
BEGIN TRANSACTION
COMMIT TRANSACTION
ROLLBACK TRANSACTION
Transactions do not nest and most procedures do not know the context in which they will be called, so the vast majority of procedures will not and should not actually start transactions. You should only do this if you know for certain that the procedure in question is a "top level" procedure. So generally, don't use these statements.
Savepoints
SAVEPOINT
ROLLBACK TO SAVEPOINT
RELEASE SAVEPOINT
PROC SAVEPOINT
COMMIT RETURN
ROLLBACK RETURN
Savepoints are the preferred tool for having interim state that can be rolled back if needed. You can use ad hoc savepoints: just give your savepoint a name, then use RELEASE SAVEPOINT to commit it, or else ROLLBACK TO SAVEPOINT followed by a RELEASE to abort it. Note that you always RELEASE savepoints in both the rollback and the commit case.
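For example, the ad hoc pattern looks like this (a minimal sketch; the savepoint name mine is arbitrary):

savepoint mine;

-- ... do some interim work ...

-- on success, commit the interim work
release savepoint mine;

-- on failure, abandon the interim work instead;
-- note the RELEASE in both paths
rollback transaction to savepoint mine;
release savepoint mine;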
Managing savepoints can be tricky, especially given the various error cases. They combine nicely with TRY CATCH to do this job. However, even that is a lot of boilerplate. The best way to use savepoints is with PROC SAVEPOINT BEGIN .. END;
When you use PROC SAVEPOINT, a savepoint is created for you with the name of your procedure. When the block exits the savepoint is released (committed). However, you also get an automatically generated try/catch block which will rollback the savepoint if anything inside the block were to invoke THROW. Also, you may not use a regular RETURN inside this block; you must use either ROLLBACK RETURN or COMMIT RETURN. Both of these directly indicate the fate of the automatically generated savepoint when they run. This gives you useful options to early-out (with no error) while keeping or abandoning any work in progress. Of course you can use THROW to return an error and abandon the work in progress.
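Here is a minimal sketch of the pattern; the accounts table and the transfer logic are hypothetical:

create proc transfer_funds(from_id integer not null, to_id integer not null, amount real not null)
begin
  proc savepoint
  begin
    update accounts set balance = balance - amount where id = from_id;

    -- early-out with no error, abandoning the partial update
    if (select balance from accounts where id = from_id) < 0 then
      rollback return;
    end if;

    update accounts set balance = balance + amount where id = to_id;
    commit return;  -- keep the work; any THROW above rolls it back instead
  end;
end;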
Compilation options
@ENFORCE_NORMAL
@ENFORCE_POP
@ENFORCE_PUSH
@ENFORCE_RESET
@ENFORCE_STRICT
CQL allows you to specify a number of useful options such as "do not allow Window Functions" or "all foreign keys must choose some update or delete strategy".
These additional enforcements are designed to prevent errors. Because of this they should be established once, somewhere central and they should be rarely
if ever overridden. For instance @ENFORCE_NORMAL WINDOW FUNCTION
would allow you to use window functions again, but this is probably a bad idea. If
strict mode is on, disallowing them, that probably means your project is expected to target versions of SQLite that do not have window functions. Overriding
that setting is likely to lead to runtime errors.
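For example, a central options file might contain something like this (a sketch; which enforcements to use is project-specific):

-- included at the start of every compilation
@enforce_strict foreign key on update;  -- every FK must declare an update strategy
@enforce_strict foreign key on delete;  -- likewise for delete
@enforce_strict window function;        -- target SQLite may predate window functions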
In general you don't want to see these options in most code.
Previous Schema
@PREVIOUS_SCHEMA
CQL can ensure that the current schema is compatible with the previous schema, meaning that an upgrade script could reasonably be generated to go from the previous to the current. This directive marks the start of the previous schema section when that validation happens. This directive is useful only for that schema validation, so it should never appear in normal procedures.
Schema Regions
@BEGIN_SCHEMA_REGION
@DECLARE_DEPLOYABLE_REGION
@DECLARE_SCHEMA_REGION
@END_SCHEMA_REGION
CQL allows you to declare arbitrary schema regions and limit what parts of the schema any given region may consume. This helps you to prevent schema from getting entangled. There is never a reason to use these directives inside normal procedures; they should appear only in your schema declaration files.
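A sketch of what this looks like in a schema declaration file (the region and table names are illustrative):

@declare_schema_region core;
@declare_schema_region features using core;  -- features may consume core's schema

@begin_schema_region core;
create table core_data(id integer primary key);
@end_schema_region;

@begin_schema_region features;
create table feature_data(
  id integer primary key references core_data(id)  -- ok: core is visible here
);
@end_schema_region;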
Schema Version
@SCHEMA_UPGRADE_SCRIPT
@SCHEMA_UPGRADE_VERSION
The @SCHEMA_UPGRADE_SCRIPT
directive is only used by CQL itself to declare that the incoming file is an autogenerated schema upgrade script.
These scripts have slightly different rules for schema declaration that are not useful outside of such scripts. So you should never use this.
@SCHEMA_UPGRADE_VERSION
on the other hand is used if you are creating a manual migration script. You need this script to run in the context
of the schema version that it affects. Use this directive at the start of the file to do so. Generally manual migration scripts are to be
avoided so hopefully this directive is rarely if ever used.
C Text Echo
@ECHO
This directive emits plain text directly into the compiler's output stream. It can be invaluable for adding new runtime features and for ensuring that
(e.g.) additional #include
or #define
directives are present in the output but you can really break things by over-using this feature. Most parts
of the CQL output are subject to change so any use of this should be super clean. The intended use was, as mentioned, to allow an extra #include
in your code
so that CQL could call into some library. Most uses of this combine with DECLARE FUNCTION
or DECLARE PROCEDURE
to declare an external entity.
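A minimal sketch of that intended use (the header and function are hypothetical):

-- emit an include into the generated C so the function below will link
@echo c, "#include \"my_library.h\"\n";

-- now declare the external entity that the include provides
declare function my_library_hash(t text not null) long not null;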
Enumerations
DECLARE ENUM
@EMIT_ENUMS
Avoid embedded constants whenever possible. Instead declare a suitable enumeration. Use @EMIT_ENUMS Some_Enum
to get the enumeration
constants into the generated .h file for C. But be sure to do this only from one compiland. You do not want the enumerations in every .h file.
Choose a single .sql file (not included by lots of other things) in which to place the @EMIT_ENUMS
directive. You can make a file specifically for this
purpose if nothing else is serviceable.
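For example (a sketch; the enum is illustrative):

-- shared declaration, safe to include from anywhere
declare enum task_state integer (
  pending = 0,
  running = 1,
  done = 2
);

-- in exactly one compiland:
@emit_enums task_state;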
Cursor Lifetime
CLOSE
OPEN
The OPEN statement is a no-op; SQLite has no such notion. It was included because it is present in MySQL and other variants and its inclusion can sometimes ease readability. But it does nothing. The CLOSE statement is normally not necessary because all cursors are closed at the end of the procedure they are declared in (unless they are boxed, see below). You only need CLOSE if you want to close a global cursor (which has no scope) or if you want to close a local cursor "sooner" because waiting until the end of the procedure might be a very long time. Using CLOSE more than once is safe; the second and later close operations do nothing.
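A sketch of the one case where an early CLOSE is useful (big_table is hypothetical):

create proc scan_then_do_other_work()
begin
  declare C cursor for select id from big_table;
  loop fetch C
  begin
    call printf("%d\n", C.id);
  end;
  close C;  -- free the statement now; the rest of the procedure may run a long time

  -- ... lots of unrelated work follows ...
end;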
Procedure Calls and Exceptions
CALL
THROW
TRY CATCH
Remember that if you call a procedure and it uses THROW, or some SQL inside it fails, the resulting error code will cause your code to THROW when the procedure returns. Normally that's exactly what you want: the error will ripple out, some top-level CATCH will cause a ROLLBACK, and the top-level caller sees the error. If you have your own rollback needs, be sure to install your own TRY/CATCH block or else use PROC SAVEPOINT as above to do it for you.
Inside of a CATCH
block you can use the special variable @RC
to see the most recent return code from SQLite.
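A minimal sketch of the pattern (might_fail is a hypothetical procedure):

declare proc might_fail() using transaction;

create proc caller()
begin
  begin try
    call might_fail();  -- a THROW or failed SQL inside ripples out to here
  end try;
  begin catch
    call printf("most recent SQLite return code: %d\n", @rc);
    throw;  -- keep the error rippling out to the top level
  end catch;
end;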
Control Flow with "Big Moves"β
CONTINUE
LEAVE
RETURN
These work as usual but beware: you can easily use any of these to accidentally leave a block with a savepoint or transaction, skipping over the ROLLBACK or COMMIT portions of the logic. Avoid this problem by using PROC SAVEPOINT.
Getting access to external code
DECLARE FUNCTION
DECLARE SELECT FUNCTION
DECLARE PROCEDURE
The best practice is to put any declarations into a shared header file which you can #include
in all the places it is needed.
This is especially important should you have to forward declare a procedure. CQL normally provides exports for all procedures
so you basically get an automatically generated and certain-to-be-correct #include
file. But, if the procedures are being compiled
together then an export file won't have been generated yet at the time you need it; to work around this, use the DECLARE PROCEDURE
form. However, procedure declarations are tricky; they include not just the type of the arguments but the types of any/all of the
columns in any result set the procedure might have. This must not be wrong or callers will get unpredictable failures.
The easiest way to ensure it is correct is to use the same trick as you would in C -- make sure that you #include the declaration in the translation unit with the definition. If they don't match there will be an error.
A very useful trick: the error will include the exact text of the correct declaration. So if you don't know it, or are too lazy to figure it out, simply put ANY declaration in the shared header file and then paste in the correct declaration from the error. Should the definition ever change you will get a compilation error which you can again harvest to get the correct declaration. In this way you can be sure the declarations are correct.
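A sketch of such a forward declaration (the name and shape are hypothetical; the exact text, including any trailing clauses, is best harvested from the compiler error as described above):

declare proc get_names(prefix_ text not null) (id integer not null, name text not null);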
Functions have no CQL equivalent, but they generally don't change very often. Use DECLARE FUNCTION
to allow access to some C code
that returns a result of some kind. Be sure to add the CREATE
option if the function returns a reference that the caller owns.
Use DECLARE SELECT FUNCTION
to tell CQL about any User Defined Functions you have added to SQLite so that it knows how to call them.
Note that CQL does not register those UDFs, it couldn't make that call lacking the essential C information required to do so. If you
find that you are getting errors when calling a UDF the most likely reason for the failure is that the UDF was declared but never
registered with SQLite at runtime. This happens in test code a lot -- product code tends to have some central place to register the
UDFs and it normally runs at startup, e.g. right after the schema is upgraded.
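Two minimal sketches (both functions are hypothetical):

-- native C function; CREATE means the caller owns the returned reference
declare function make_greeting(name text not null) create text not null;

-- a UDF: this tells CQL how to call it, but you must still register it
-- with SQLite at runtime or calls will fail
declare select function rot13(t text not null) text not null;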
Regular Data Manipulation Language (DML)
DELETE
INSERT
SELECT
UPDATE
UPSERT
These statements are the most essential and they'll appear in almost every procedure. There are a few general best practices we can go over.
- Try to do as much as you can in one batch rather than iterating; e.g.
  - don't write a loop with a DELETE statement that deletes one row if you can avoid it, write a delete statement that deletes all you need to delete
  - don't write a loop with a SELECT statement that fetches one row, try to fetch all the rows you need with one select
  - don't write a loop with an INSERT statement that inserts one row, insert everything you need in as few statements as possible
- Make sure UPSERT is supported on the SQLite system you are using, older versions do not support it
- Don't put unnecessary casts in your SELECT statements, they just add fat
- Don't use CASE/WHEN to compute a boolean, the boolean operations are more economical (e.g. use IS)
- Don't use COUNT if all you need to know is whether a row exists or not, use EXISTS
- Don't use GROUP BY, ORDER BY, or DISTINCT on large rowsets, the sort is expensive and it will make your SELECT statements write to disk rather than just read
- Always use the INSERT INTO FOO USING form of the INSERT statement, it's much easier to read than the standard form and compiles to the same thing (see the sketch after this list)
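A few of these guidelines in one sketch (the tasks table and task_state enum are hypothetical):

-- one statement, not a delete-per-row loop
delete from tasks where state = task_state.done;

-- EXISTS rather than COUNT to test for presence
if (select exists(select 1 from tasks where id = 42)) then
  call printf("already present\n");
end if;

-- the USING form pairs every value with its column name
insert into tasks using
  42 id,
  'new task' title,
  task_state.pending state;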
Variable and Cursor declarations
DECLARE OUT CALL
DECLARE
LET
SET
These are likely to appear all over as well. If you can avoid a variable declaration by using LET then do so; the code will be more concise and you'll get the exact variable type you need. This is the same as var x = foo(); in other languages. Once the variable is declared, use SET.
You can save yourself a lot of declarations of OUT
variables with DECLARE OUT CALL
. That declaration form automatically declares the OUT
variables used
in the call you are about to make, with the correct types. If the number of arguments changes you just have to add the args; you don't have to also add
new declarations.
The LIKE
construct can be used to let you declare things whose type is the same as another thing. Patterns like DECLARE ARGS CURSOR LIKE FOO ARGUMENTS
save you a lot of typing and also enhance correctness. There's a whole chapter dedicated to "shapes" defined by LIKE
.
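A sketch of these together (check_inventory is a hypothetical procedure):

declare proc check_inventory(
  item_id integer not null,
  out found_ bool not null,
  out count_ integer not null);

let item_id := 42;  -- LET infers INTEGER NOT NULL

-- found_ and count_ are declared automatically with the correct OUT types
declare out call check_inventory(item_id, found_, count_);

-- a cursor shaped exactly like check_inventory's arguments
declare args cursor like check_inventory arguments;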
Query Plans
EXPLAIN
Explain can be used in front of other queries to generate a plan. The way SQLite handles this is that you fetch the rows of the plan as usual. So basically
EXPLAIN
is kind of like SELECT QUERY PLAN OF
. This hardly ever comes up in normal coding. CQL has an output option where it will generate code that gives you the query plan for a procedure's queries rather than the normal body of the procedure.
Fetching Data from a Cursor or from Loose Data
FETCH
UPDATE CURSOR
The FETCH
statement has many variations, all are useful at some time or another. There are a few helpful guidelines.
- If fetching from loose values into a cursor, use the FETCH USING form (as you would with INSERT INTO USING) because it is less error prone
- FETCH INTO is generally a bad idea: you'll have to declare a lot of variables; instead rely on automatic storage in the cursor, e.g. fetch my_cursor rather than fetch my_cursor into a, b, c
- If you have data already in a cursor you can mutate some of the columns using UPDATE CURSOR; this can let you adjust values or apply defaults (see the sketch after this list)
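A sketch of the recommended forms (the tasks shape is hypothetical):

declare C cursor like tasks;  -- automatic storage, no loose variables to declare

-- loose values into the cursor, each paired with its column name
fetch C using
  1 id,
  'write the appendix' title,
  0 state;

-- mutate one column in place, e.g. to apply a default
update cursor C(state) from values(1);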
Control Flow
IF
LOOP
SWITCH
WHILE
These are your bread and butter and they will appear all over. One tip: Use the ALL VALUES
variant of switch whenever possible to ensure that you haven't missed any cases.
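For instance, with an enum-typed value ALL VALUES makes the compiler verify that every case is present (a sketch; the enum is illustrative):

declare enum task_state integer (pending = 0, running = 1, done = 2);

let my_state := task_state.pending;

switch my_state all values
when task_state.pending then
  call printf("pending\n");
when task_state.running then
  call printf("running\n");
when task_state.done then
  call printf("done\n");
end;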
Manual Control of Results
OUT
OUT UNION
- If you know you are producing exactly one row, OUT is more economical than SELECT
- If you need complete flexibility on what rows to produce (e.g. skip some, add extras, mutate some) then OUT UNION will give you that; use it only when needed, it's more expensive than just SELECT (see the sketches after this list)
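Minimal sketches of both (the tasks shape is hypothetical):

-- exactly one row: OUT is cheaper than producing a result set with SELECT
create proc get_task(id_ integer not null)
begin
  declare C cursor for select * from tasks where id = id_;
  fetch C;
  out C;
end;

-- full control: emit only the rows you choose
create proc get_open_tasks()
begin
  declare C cursor for select * from tasks;
  loop fetch C
  begin
    if C.state <> 2 then  -- 2 being a hypothetical "done" state
      out union C;
    end if;
  end;
end;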
CTEs and Shared Fragments
To understand what kinds of things you can reasonably do with fragments, really you just have to understand the things that you can do with common table expressions or CTEs. For those who don't know, CTEs are the things you declare in the WITH clause of a SELECT statement. They're kind of like local views. Well, actually, they are exactly like local views.
Query fragments help you to define useful CTEs so basically what you can do economically in a CTE directly determines what you can do economically in a fragment.
To demonstrate some things that happen with CTEs we're going to use these three boring tables.
create table A
(
id integer primary key,
this text not null
);
create table B
(
id integer primary key,
that text not null
);
create table C
(
id integer primary key,
other text not null
);
Let's start with a very simple example, the first few examples are like control cases.
explain query plan
select * from A
inner join B on B.id = A.id;
QUERY PLAN
|--SCAN TABLE A
\--SEARCH TABLE B USING INTEGER PRIMARY KEY (rowid=?)
OK as we can see A
is not constrained so it has to be scanned but B
isn't scanned,
we use its primary key for the join. This is the most common kind of join: a search
based on a key of the table you are joining to.
Let's make it a bit more realistic.
explain query plan
select * from A
inner join B on B.id = A.id
where A.id = 5;
QUERY PLAN
|--SEARCH TABLE B USING INTEGER PRIMARY KEY (rowid=?)
\--SEARCH TABLE A USING INTEGER PRIMARY KEY (rowid=?)
Now A
is constrained by the WHERE
clause so we can use its index and then use the B
index.
So we get a nice economical join from A
to B
and no scans at all.
Now suppose we try this with some CTE replacements for A
and B
. Does this make it worse?
explain query plan
with
AA(id, this) as (select * from A),
BB(id, that) as (select * from B)
select * from AA
left join BB on BB.id = AA.id
where AA.id = 5;
QUERY PLAN
|--SEARCH TABLE A USING INTEGER PRIMARY KEY (rowid=?)
\--SEARCH TABLE B USING INTEGER PRIMARY KEY (rowid=?)
The answer is a resounding no. The CTE AA
was not materialized; it was expanded in place,
as was the CTE BB
. We get exactly the same query plan. Now this means that the
inner expressions like select * from A
could have been fragments such as:
@attribute(cql:shared_fragment)
create proc A_()
begin
select * from A;
end;
@attribute(cql:shared_fragment)
create proc B_()
begin
select * from B;
end;
explain query plan
with
(call A_()), -- short for A_(*) AS (call A_())
(call B_()) -- short for B_(*) AS (call B_())
select * from A_
left join B_ on B_.id = A_.id
where A_.id = 5;
Note: I'll use the convention that A_
is the fragment proc that could have generated the CTE AA
,
likewise with B_
and so forth.
The above will expand into exactly what we had before and hence will have exactly the same good query plan. Of course this is totally goofy -- why make a fragment like that? It's just more typing. Well, now let's generalize the fragments just a bit.
@attribute(cql:shared_fragment)
create proc A_(experiment bool not null)
begin
-- data source might come from somewhere else due to an experiment
if not experiment then
select * from A;
else
select id, this from somewhere_else;
end if;
end;
@attribute(cql:shared_fragment)
create proc B_(b_filter text)
begin
-- we don't actually refer to "B" if the filter is null
if b_filter is not null then
-- applies b_filter if specified
select * from B where B.other like b_filter;
else
-- generates the correct shape but zero rows of it
select null as id, null as that where false;
end if;
end;
create proc getAB(
id_ integer not null,
experiment bool not null,
b_filter text)
begin
with
(call A_(experiment)),
(call B_(b_filter))
select * from A_
left join B_ on B_.id = A_.id
where A_.id = id_;
end;
The above now has 4 combos economically encoded and all of them have a good plan.
Importantly though, if b_filter
is not specified then we don't actually join to B
.
The B_
CTE will have no reference to B
, it just has zero rows.
Now let's look at some things you don't want to do.
Consider this form:
explain query plan
with
AA(id, this) as (select * from A),
BB(id, that) as (select A.id, B.that from A left join B on B.id = A.id)
select * from AA
left join BB on BB.id = AA.id
where AA.id = 5;
QUERY PLAN
|--SEARCH TABLE A USING INTEGER PRIMARY KEY (rowid=?)
|--SEARCH TABLE A USING INTEGER PRIMARY KEY (rowid=?)
\--SEARCH TABLE B USING INTEGER PRIMARY KEY (rowid=?)
Note that here we get 3 joins. Now a pretty cool thing happened here -- even though the expression
for BB
does not include a WHERE
clause SQLite has figured out the AA.id
being 5 forces A.id
to be 5
which in turn gives a constraint on BB
. Nice job SQLite. If it hadn't been able to figure that out
then the expansion of BB
would have resulted in a table scan.
Still, 3 joins is bad when we only need 2 joins to do the job. What happened? Well, when we did
the original fragments with extensions and stuff we saw this same pattern in fragment code.
Basically the fragment for BB
isn't just doing the B things; it's restarting from A and doing its own join to get B. This results in a wasted join. And it might result in a lot of work on the A
table
as well if the filtering was more complex and couldn't be perfectly inferred.
You might think, "oh, no problem, I can save this, I'll just refer to AA
instead of A
in the second query."
This does not help (but it's going in the right direction):
explain query plan
with
AA(id, this) as (select * from A),
BB(id, that) as (select AA.id, B.that from AA left join B on B.id = AA.id)
select * from AA
left join BB on BB.id = AA.id
where AA.id = 5;
QUERY PLAN
|--SEARCH TABLE A USING INTEGER PRIMARY KEY (rowid=?)
|--SEARCH TABLE A USING INTEGER PRIMARY KEY (rowid=?)
\--SEARCH TABLE B USING INTEGER PRIMARY KEY (rowid=?)
In terms of fragments the anti-pattern is this:
@attribute(cql:shared_fragment)
create proc B_()
begin
select B.* from A left join B on B.id = A.id;
end;
The above starts the query for B
again from the root. You can save this, the trick is to not
try to generate just the B
columns and then join them later. You can get a nice data
flow going with a chain of CTEs.
explain query plan
with
AA(id, this) as (select * from A),
AB(id, this, that) as (select AA.*, B.that from AA left join B on B.id = AA.id)
select * from AB
where AB.id = 5;
QUERY PLAN
|--SEARCH TABLE A USING INTEGER PRIMARY KEY (rowid=?)
\--SEARCH TABLE B USING INTEGER PRIMARY KEY (rowid=?)
And we're right back to the perfect plan. The good form creates a CTE chain where we only need the result of the final CTE. A straight line of CTEs, each depending on the previous one, results in an excellent data flow.
In terms of fragments this is now:
@attribute(cql:shared_fragment)
create proc A_()
begin
select * from A;
end;
@attribute(cql:shared_fragment)
create proc AB_()
begin
with
(call A_())
select A_.*, B.that from A_ left join B on B.id = A_.id;
end;
with (call AB_())
select * from AB_ where AB_.id = 5;
For brevity I didn't include the possibility of using IF and such. Here's another option that produces the same good query plan: we can generalize AB_ so that it doesn't know where the base data is coming from, which lets it be used in more cases.
@attribute(cql:shared_fragment)
create proc A_()
begin
select * from A;
end;
@attribute(cql:shared_fragment)
create proc AB_()
begin
with
source(*) like A -- you must provide some source that is the same shape as A
select source.*, B.that from source left join B on B.id = source.id;
end;
with
(call A_()),
(call AB_() using A_ as source)
select * from AB_ where AB_.id = 5;
Again this results in a nice straight chain of CTEs, and even though the WHERE clause comes last the A table is constrained properly.
It's important not to fork the chain... if you do that then whatever came before the fork must be materialized for use in both branches. That can be quite bad because the filtering might then come after the materialization. The following example is quite bad.
explain query plan
with
AA(id, this) as (select * from A),
BB(id, that) as (select AA.id, B.that from AA left join B on B.id = AA.id),
CC(id, other) as (select AA.id, C.other from AA left join C on C.id = AA.id)
select * from AA
left join BB on BB.id = AA.id
left join CC on CC.id = AA.id
where AA.id = 5;
QUERY PLAN
|--MATERIALIZE 2
| |--SCAN TABLE A
| \--SEARCH TABLE B USING INTEGER PRIMARY KEY (rowid=?)
|--MATERIALIZE 3
| |--SCAN TABLE A
| \--SEARCH TABLE C USING INTEGER PRIMARY KEY (rowid=?)
|--SEARCH TABLE A USING INTEGER PRIMARY KEY (rowid=?)
|--SCAN SUBQUERY 2
\--SEARCH SUBQUERY 3 USING AUTOMATIC COVERING INDEX (id=?)
Things have gone way off the rails here. As you can see, A is now scanned twice, and there are many more joins. We could make this a lot better by moving the A condition all the way up into the first CTE. With fragments that would just mean creating something like:
@attribute(cql:shared_fragment)
create proc A_(id_ integer not null)
begin
select * from A where A.id = id_;
end;
At least then if we have to materialize we'll get only one row. This could be a good thing to do universally, but it's especially important if you know that forking in the query shape is mandatory for some reason.
A better pattern might be this:
explain query plan
with
AA(id, this) as (select * from A),
AB(id, this, that) as (select AA.*, B.that from AA left join B on B.id = AA.id),
ABC(id, this, that, other) as (select AB.*, C.other from AB left join C on C.id = AB.id)
select * from ABC
where ABC.id = 5;
QUERY PLAN
|--SEARCH TABLE A USING INTEGER PRIMARY KEY (rowid=?)
|--SEARCH TABLE B USING INTEGER PRIMARY KEY (rowid=?)
\--SEARCH TABLE C USING INTEGER PRIMARY KEY (rowid=?)
Here we've just extended the chain. With shared fragments you could easily build an
AB_
proc as before and then build an ABC_
proc either by calling AB_
directly or by
having a table parameter that is LIKE AB_
.
Both cases will give you a great plan.
So the most important things are:
- Avoid forking the chain of CTEs/fragments, a straight chain works great.
- Avoid re-joining to tables, even unconstrained CTEs result in great plans if they don't have to be materialized.
- If you do need to fork in your CTE chain, because of your desired shape, be sure to move as many filters as you can further upstream so that by the time you materialize only a very small number of rows need to be materialized.
These few rules will go far in helping you to create shapes.
One last thing, without shared fragments, if you wanted to create a large 10 way join or something you had to type that join into your file and it would be very much in your face. Now that join might be hidden from you in a nice easy-to-use fragment. Which you might then decide you want to use 3 times... And now with a tiny amount of code you have 30 joins.
The thing is shared fragments make it easy to generate a lot of SQL. It's not bad that shared fragments make things easy, but with great power comes great responsibility, so give a care as to what it is you are assembling. Understanding your fragments, especially any big ones, will help you to create great code.