Appendix 8: CQL Best Practices

This is a brief discussion of every statement type along with some general best practices for each. The statements are in mostly alphabetical order, except that related statements have been grouped together.

Refer also to Appendix 7: CQL Anti-patterns.

Data Definition Language (DDL)

  • ALTER TABLE ADD COLUMN
  • CREATE INDEX
  • CREATE PROC
  • CREATE TABLE
  • CREATE TRIGGER
  • CREATE VIEW
  • CREATE VIRTUAL TABLE
  • DROP INDEX
  • DROP TABLE
  • DROP TRIGGER
  • DROP VIEW

These statements almost never appear in normal procedures and generally should be avoided. The normal way of handling schema in CQL is to have one or more files declare all the schema you need and then let CQL create a schema upgrader for you. This means you'll never manually drop tables, indices, and so on. The create declarations, together with their annotations, entirely drive the schema.
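As a hedged sketch of what that looks like (the tables here are hypothetical), the upgrader is generated entirely from declarations like these:

create table foo
(
  id integer primary key,
  name text
);

-- @create(2) records that this table was added in schema version 2;
-- the generated upgrader acts on the annotation, no manual DDL needed
create table bar
(
  id integer primary key,
  data blob
) @create(2);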

Any ad hoc DDL is usually a very bad sign. Test code is an obvious exception to this as it often does setup and teardown of schema to set up things for the test.

Ad Hoc Migrations

  • @SCHEMA_AD_HOC_MIGRATION

This is a special upgrade step that should be taken at the version indicated in the statement. These can be quite complex and even super important but should not be used lightly. Any migration procedure has to be highly tolerant of a variety of incoming schema versions and previous partial successes. In any case this directive should not appear in normal code. It should be part of the schema DDL declarations.

Transactions

  • BEGIN TRANSACTION
  • COMMIT TRANSACTION
  • ROLLBACK TRANSACTION

Transactions do not nest and most procedures do not know the context in which they will be called, so the vast majority of procedures will not and should not actually start transactions. You can only do this if you know for sure that the procedure in question is a "top level" procedure. So generally, don't use these statements.

Savepoints

  • SAVEPOINT
  • ROLLBACK TO SAVEPOINT
  • RELEASE SAVEPOINT
  • PROC SAVEPOINT
  • COMMIT RETURN
  • ROLLBACK RETURN

Savepoints are the preferred tool for having interim state that can be rolled back if needed. You can use ad hoc savepoints: just give your savepoint a name, then use RELEASE SAVEPOINT to commit it, or ROLLBACK TO SAVEPOINT followed by a RELEASE to abort it. Note that you always RELEASE savepoints in both the rollback and the commit case.

Managing savepoints can be tricky, especially given the various error cases. They combine nicely with TRY CATCH to do this job. However, even that is a lot of boilerplate. The best way to use savepoints is with PROC SAVEPOINT BEGIN .. END;

When you use PROC SAVEPOINT, a savepoint is created for you with the name of your procedure. When the block exits, the savepoint is released (committed). However, you also get an automatically generated try/catch block which will roll back the savepoint if anything inside the block invokes THROW. Also, you may not use a regular RETURN inside this block; you must use either ROLLBACK RETURN or COMMIT RETURN. Both of these directly indicate the fate of the automatically generated savepoint when they run. This gives you useful options to early-out (with no error) while keeping or abandoning any work in progress. And of course you can use THROW to return an error and abandon the work in progress.
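Here is a minimal sketch of the pattern; the procedure and the table it touches are hypothetical:

create proc update_stuff(id_ integer not null, done bool not null)
begin
  proc savepoint
  begin
    update stuff set state = 1 where id = id_;
    if done then
      commit return; -- early-out with no error, keeping the work
    end if;
    update stuff set extra = 1 where id = id_;
    -- falling off the end releases (commits) the savepoint;
    -- any THROW inside the block rolls it back instead
  end;
end;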

Compilation options

  • @ENFORCE_NORMAL
  • @ENFORCE_POP
  • @ENFORCE_PUSH
  • @ENFORCE_RESET
  • @ENFORCE_STRICT

CQL allows you to specify a number of useful options such as "do not allow Window Functions" or "all foreign keys must choose some update or delete strategy". These additional enforcements are designed to prevent errors. Because of this they should be established once, somewhere central, and they should be rarely if ever overridden. For instance @ENFORCE_NORMAL WINDOW FUNCTION would allow you to use window functions again, but this is probably a bad idea: if strict mode is on, disallowing them, that probably means your project is expected to target versions of SQLite that do not have window functions. Overriding that setting is likely to lead to runtime errors.
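As a sketch, a central settings file might establish something like the following; the exact set of enforcements is project specific:

-- established once, in one shared file, included everywhere
@enforce_strict window function;
@enforce_strict foreign key on update;
@enforce_strict foreign key on delete;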

In general you don't want to see these options in most code.

Previous Schema

  • @PREVIOUS_SCHEMA

CQL can ensure that the current schema is compatible with the previous schema, meaning that an upgrade script could reasonably be generated to go from the previous to the current. This directive marks the start of the previous schema section used in that validation. The directive has no purpose other than schema validation, so it should never appear in normal procedures.

Schema Regions

  • @BEGIN_SCHEMA_REGION
  • @DECLARE_DEPLOYABLE_REGION
  • @DECLARE_SCHEMA_REGION
  • @END_SCHEMA_REGION

CQL allows you to declare arbitrary schema regions and limit what parts of the schema any given region may consume. This helps you to prevent schema from getting entangled. There is never a reason to use these directives inside normal procedures; they should appear only in your schema declaration files.
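A brief sketch of what region declarations look like in a schema file (the region and table names are hypothetical):

@declare_schema_region core;
@declare_schema_region feature using core; -- feature may consume core's schema

@begin_schema_region core;
create table core_data
(
  id integer primary key
);
@end_schema_region;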

Schema Version

  • @SCHEMA_UPGRADE_SCRIPT
  • @SCHEMA_UPGRADE_VERSION

The @SCHEMA_UPGRADE_SCRIPT directive is only used by CQL itself to declare that the incoming file is an autogenerated schema upgrade script. These scripts have slightly different rules for schema declaration that are not useful outside of such scripts. So you should never use this.

@SCHEMA_UPGRADE_VERSION on the other hand is used if you are creating a manual migration script. You need this script to run in the context of the schema version that it affects. Use this directive at the start of the file to do so. Generally manual migration scripts are to be avoided so hopefully this directive is rarely if ever used.
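For instance, a manual migration file for (hypothetically) schema version 5 would begin like this:

-- pins this file to the rules of schema version 5
@schema_upgrade_version(5);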

C Text Echo

  • @ECHO

This directive emits plain text directly into the compiler's output stream. It can be invaluable for adding new runtime features and for ensuring that (e.g.) additional #include or #define directives are present in the output, but you can really break things by over-using this feature. Most parts of the CQL output are subject to change, so any use of this should be minimal and kept scrupulously clean. The intended use was, as mentioned, to allow an extra #include in your code so that CQL could call into some library. Most uses of this combine with DECLARE FUNCTION or DECLARE PROCEDURE to declare an external entity.
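The typical use looks something like this; the header and function here are hypothetical:

-- emit an extra #include into the generated C file
@echo c, "#include \"my_native_helpers.h\"\n";

-- now declare the function that the header provides
declare function native_hash(t text not null) integer not null;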

Enumerations

  • DECLARE ENUM
  • @EMIT_ENUMS

Avoid embedded constants whenever possible. Instead declare a suitable enumeration. Use @EMIT_ENUMS Some_Enum to get the enumeration constants into the generated .h file for C. But be sure to do this only from one compiland. You do not want the enumerations in every .h file. Choose a single .sql file (not included by lots of other things) to place the @EMIT_ENUMS directive. You can make a file specifically for this purpose if nothing else is serviceable.
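A short sketch, using a hypothetical enum:

declare enum task_state integer (
  pending = 0,
  running = 1,
  done = 2
);

-- in exactly one .sql file, so the constants land in just one .h file
@emit_enums task_state;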

Cursor Lifetime

  • CLOSE
  • OPEN

The OPEN statement is a no-op; SQLite has no such notion. It was included because it is present in MySQL and other variants, and its inclusion can sometimes ease readability. But it does nothing. The CLOSE statement is normally not necessary because all cursors are closed at the end of the procedure they are declared in (unless they are boxed, see below). You only need CLOSE if you want to close a global cursor (which has no scope) or if you want to close a local cursor "sooner", because waiting until the end of the procedure might be a very long time. Using CLOSE more than once is safe; the second and later close operations do nothing.
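For instance, a sketch of closing a local cursor early (the table is hypothetical):

create proc close_early()
begin
  declare C cursor for select * from stuff;
  fetch C;
  -- done with C; release its resources now rather than at procedure exit
  close C;
  -- ... a lot more work that doesn't need C ...
end;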

Procedure Calls and Exceptions

  • CALL
  • THROW
  • TRY CATCH

Remember that if you call a procedure and it uses THROW, or some SQL it ran failed, the resulting error code will cause your code to THROW when the procedure returns. Normally that's exactly what you want: the error will ripple out, some top-level CATCH will cause a ROLLBACK, and the top-level caller sees the error. If you have your own rollback needs, be sure to install your own TRY/CATCH block, or else use PROC SAVEPOINT as above to do it for you.

Inside of a CATCH block you can use the special variable @RC to see the most recent return code from SQLite.
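A minimal sketch of the pattern; might_fail is a hypothetical procedure:

declare proc might_fail();

create proc caller_with_cleanup()
begin
  begin try
    call might_fail(); -- any THROW or failed SQL inside propagates here
  end try;
  begin catch
    call printf("most recent SQLite return code: %d\n", @rc);
    -- do any local cleanup, then rethrow so callers see the error
    throw;
  end catch;
end;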

Control Flow with "Big Moves"

  • CONTINUE
  • LEAVE
  • RETURN

These work as usual, but beware: you can easily use any of these to accidentally leave a block with a savepoint or transaction, skipping over the ROLLBACK or COMMIT portions of the logic. Avoid this problem by using PROC SAVEPOINT.

Getting access to external code

  • DECLARE FUNCTION
  • DECLARE SELECT FUNCTION
  • DECLARE PROCEDURE

The best practice is to put any declarations into a shared header file which you can #include in all the places it is needed. This is especially important should you have to forward declare a procedure. CQL normally provides exports for all procedures, so you basically get an automatically generated and certain-to-be-correct #include file. But if the procedures are being compiled together, then an export file won't have been generated yet at the time you need it; to work around this, use the DECLARE PROCEDURE form. However, procedure declarations are tricky; they include not just the types of the arguments but the types of any/all of the columns in any result set the procedure might have. This must not be wrong or callers will get unpredictable failures.

The easiest way to ensure it is correct is to use the same trick as you would in C -- make sure that you #include the declaration in the translation unit with the definition. If they don't match there will be an error.

A very useful trick: the error will include the exact text of the correct declaration. So if you don't know it, or are too lazy to figure it out, simply put ANY declaration in the shared header file and then paste in the correct declaration from the error. Should the definition ever change, you will get a compilation error which you can again harvest to get the correct declaration.

In this way you can be sure the declarations are correct.
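As a sketch, a procedure declaration including its result shape looks like this (the names are hypothetical):

-- the argument types and the result shape must exactly match the definition
declare proc get_person(id_ integer not null)
  (id integer not null, name text not null, age integer);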

Functions cannot be defined in CQL itself, but they generally don't change very often. Use DECLARE FUNCTION to allow access to some C code that returns a result of some kind. Be sure to add the CREATE option if the function returns a reference that the caller owns.

Use DECLARE SELECT FUNCTION to tell CQL about any User Defined Functions you have added to SQLite so that it knows how to call them. Note that CQL does not register those UDFs; it couldn't make that call, lacking the essential C information required to do so. If you find that you are getting errors when calling a UDF, the most likely reason is that the UDF was declared but never registered with SQLite at runtime. This happens in test code a lot -- product code tends to have some central place to register the UDFs and it normally runs at startup, e.g. right after the schema is upgraded.
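A sketch of both declaration forms; the functions here are hypothetical:

-- external C function; CREATE marks that the caller owns the returned reference
declare function make_label(id integer not null) create text not null;

-- a SQLite UDF: declaring it lets CQL call it, but the UDF must still be
-- registered with SQLite at runtime or calls will fail
declare select function squish(t text) text;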

Regular Data Manipulation Language (DML)

  • DELETE
  • INSERT
  • SELECT
  • UPDATE
  • UPSERT

These statements are the most essential and they'll appear in almost every procedure. There are a few general best practices we can go over.

  • Try to do as much as you can in one batch rather than iterating; e.g.

    • don't write a loop around a DELETE statement that deletes one row if you can avoid it; write one DELETE statement that deletes everything you need to delete
    • don't write a loop around a SELECT statement that fetches one row; try to fetch all the rows you need with one SELECT
  • Make sure UPSERT is supported on the SQLite system you are using; older versions (before 3.24.0) do not support it

  • Don't put unnecessary casts in your SELECT statements; they just add fat

  • Don't use CASE/WHEN to compute a boolean; the boolean operations are more economical (e.g. use IS)

  • Don't use COUNT if all you need to know is whether a row exists or not; use EXISTS

  • Don't use GROUP BY, ORDER BY, or DISTINCT on large rowsets; the sort is expensive and it will make your SELECT statements write to disk rather than just read

  • Always use the INSERT INTO FOO USING form of the INSERT statement (see the sketch after this list); it's much easier to read than the standard form and compiles to the same thing
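Here is a brief sketch of two of these tips, assuming a hypothetical table foo(id integer not null, name text not null):

-- the USING form names every value, so columns can't be mismatched
insert into foo using
  1 id,
  'example' name;

-- EXISTS is cheaper than COUNT when you only need a yes/no answer
if (select exists(select 1 from foo where id = 1)) then
  call printf("row exists\n");
end if;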

Variable and Cursor declarations

  • DECLARE OUT CALL
  • DECLARE
  • LET
  • SET

These are likely to appear all over as well. If you can avoid a variable declaration by using LET, then do so; the code will be more concise and you'll get the exact variable type you need. This is the same as var x = foo(); in other languages. Once the variable is declared, use SET.

You can save yourself a lot of declarations of OUT variables with DECLARE OUT CALL. That declaration form automatically declares, with the correct types, the OUT variables used in the call you are about to make. If the number of arguments changes, you just add the args; you don't have to also add new declarations.

The LIKE construct can be used to let you declare things whose type is the same as another thing. Patterns like DECLARE ARGS CURSOR LIKE FOO ARGUMENTS save you a lot of typing and also enhance correctness. There's a whole chapter dedicated to "shapes" defined by LIKE.
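A combined sketch of these forms; get_stuff is a hypothetical procedure:

declare proc get_stuff(id_ integer not null, out result text not null);

create proc demo(id_ integer not null)
begin
  let x := id_ * 2; -- x is integer not null, type inferred
  set x := x + 1;   -- once declared, use SET

  -- declares "result" automatically with the correct type (text not null)
  declare out call get_stuff(x, result);

  -- a cursor shaped exactly like demo's own arguments, then captured
  declare args cursor like demo arguments;
  fetch args from arguments;
end;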

Query Plans

  • EXPLAIN

EXPLAIN can be used in front of other queries to generate a plan. The way SQLite handles this is that you fetch the rows of the plan as usual, so basically EXPLAIN is kind of like SELECT QUERY PLAN OF. This hardly ever comes up in normal coding. CQL has an output option where it will generate code that gives you the query plan for a procedure's queries rather than the normal body of the procedure.

Fetching Data from a Cursor or from Loose Data

  • FETCH
  • UPDATE CURSOR

The FETCH statement has many variations, all of which are useful at one time or another. There are a few helpful guidelines; a short sketch follows the list.

  • If fetching from loose values into a cursor use the FETCH USING form (as you would with INSERT INTO USING) because it is less error prone
  • FETCH INTO is generally a bad idea; you'll have to declare a lot of variables. Instead just rely on automatic storage in the cursor, e.g. fetch my_cursor rather than fetch my_cursor into a, b, c
  • If you have data already in a cursor you can mutate some of the columns using UPDATE CURSOR, this can let you adjust values or apply defaults
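A minimal sketch of the first and last tips, assuming a hypothetical table foo(id integer not null, name text not null) and that UPDATE CURSOR accepts the same USING form as FETCH:

create proc fetch_demo()
begin
  declare C cursor like foo; -- value cursor with automatic storage

  -- FETCH USING names each value, just like INSERT INTO ... USING
  fetch C using
    1 id,
    'example' name;

  -- adjust a single column, leaving the rest of the row alone
  update cursor C using
    'renamed' name;
end;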

Control Flow

  • IF
  • LOOP
  • SWITCH
  • WHILE

These are your bread and butter and they will appear all over. One tip: use the ALL VALUES variant of SWITCH whenever possible to ensure that you haven't missed any cases.
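A sketch of the ALL VALUES variant, using the same hypothetical task_state enum sketched in the Enumerations section:

declare enum task_state integer (
  pending = 0,
  running = 1,
  done = 2
);

create proc describe(s task_state not null)
begin
  -- ALL VALUES makes it a compile-time error to miss an enum case
  switch s all values
  when task_state.pending then call printf("pending\n");
  when task_state.running then call printf("running\n");
  when task_state.done then call printf("done\n");
  end;
end;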

Manual Control of Results

  • OUT

  • OUT UNION

  • If you know you are producing exactly one row, OUT is more economical than SELECT

  • If you need complete flexibility on what rows to produce (e.g. skip some, add extras, mutate some) then OUT UNION will give you that; use it only when needed, as it's more expensive than plain SELECT (a sketch of both follows)
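A sketch of both forms, assuming a hypothetical table foo(id integer not null, name text not null):

create proc one_row(id_ integer not null)
begin
  declare C cursor for select * from foo where id = id_;
  fetch C;
  out C; -- produces exactly the one row, no extra machinery
end;

create proc some_rows()
begin
  declare C cursor for select * from foo;
  loop fetch C
  begin
    if C.name like 'keep%' then
      out union C; -- emit only the rows we choose
    end if;
  end;
end;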

CTEs and Shared Fragments

To understand what kinds of things you can reasonably do with fragments, really you just have to understand the things that you can do with common table expressions or CTEs. For those who don't know, CTEs are the things you declare in the WITH clause of a SELECT statement. They're kind of like local views. Well, actually, they are exactly like local views.

Query fragments help you to define useful CTEs, so basically what you can do economically in a CTE directly determines what you can do economically in a fragment.

To demonstrate some things that happen with CTEs we're going to use these three boring tables.

create table A
(
  id integer primary key,
  this text not null
);

create table B
(
  id integer primary key,
  that text not null
);

create table C
(
  id integer primary key,
  other text not null
);

Let's start with a very simple example; the first few examples are like control cases.

explain query plan
select * from A
inner join B on B.id = A.id;

QUERY PLAN
|--SCAN TABLE A
\--SEARCH TABLE B USING INTEGER PRIMARY KEY (rowid=?)

OK, as we can see, A is not constrained so it has to be scanned, but B isn't scanned; we use its primary key for the join. This is the most common kind of join: a search based on a key of the table you are joining to.

Let's make it a bit more realistic.

explain query plan
select * from A
inner join B on B.id = A.id
where A.id = 5;

QUERY PLAN
|--SEARCH TABLE B USING INTEGER PRIMARY KEY (rowid=?)
\--SEARCH TABLE A USING INTEGER PRIMARY KEY (rowid=?)

Now A is constrained by the WHERE clause so we can use its index and then use the B index. So we get a nice economical join from A to B and no scans at all.

Now suppose we try this with some CTE replacements for A and B. Does this make it worse?

explain query plan
with
AA(id, this) as (select * from A),
BB(id, that) as (select * from B)
select * from AA
left join BB on BB.id = AA.id
where AA.id = 5;

QUERY PLAN
|--SEARCH TABLE A USING INTEGER PRIMARY KEY (rowid=?)
\--SEARCH TABLE B USING INTEGER PRIMARY KEY (rowid=?)

The answer is a resounding no. The CTE AA was not materialized; it was expanded in place, as was the CTE BB. We get exactly the same query plan. Now this means that the inner expressions like select * from A could have been fragments such as:

@attribute(cql:shared_fragment)
create proc A_()
begin
  select * from A;
end;

@attribute(cql:shared_fragment)
create proc B_()
begin
  select * from B;
end;

explain query plan
with
(call A_()), -- short for A_(*) AS (call A_())
(call B_()) -- short for B_(*) AS (call B_())
select * from A_
left join B_ on B_.id = A_.id
where A_.id = 5;

Note: I'll use the convention that A_ is the fragment proc that could have generated the CTE AA, likewise with B_ and so forth.

The above will expand into exactly what we had before and hence will have exactly the same good query plan. Of course this is totally goofy -- why make a fragment like that? It's just more typing. Well, now let's generalize the fragments just a bit.

@attribute(cql:shared_fragment)
create proc A_(experiment bool not null)
begin
  -- data source might come from somewhere else due to an experiment
  if not experiment then
    select * from A;
  else
    select id, this from somewhere_else;
  end if;
end;

@attribute(cql:shared_fragment)
create proc B_(b_filter text)
begin
  -- we don't actually refer to "B" if the filter is null
  if b_filter is not null then
    -- applies b_filter if specified
    select * from B where B.that like b_filter;
  else
    -- generates the correct shape but zero rows of it
    select null as id, null as that where false;
  end if;
end;

create proc getAB(
  id_ integer not null,
  experiment bool not null,
  b_filter text)
begin
  with
    (call A_(experiment)),
    (call B_(b_filter))
  select * from A_
  left join B_ on B_.id = A_.id
  where A_.id = id_;
end;

The above now has 4 combos economically encoded and all of them have a good plan. Importantly though, if b_filter is not specified then we don't actually join to B. The B_ CTE will have no reference to B; it just has zero rows.

Now let's look at some things you don't want to do.

Consider this form:

explain query plan
with
AA(id, this) as (select * from A),
BB(id, that) as (select A.id, B.that from A left join B on B.id = A.id)
select * from AA
left join BB on BB.id = AA.id
where AA.id = 5;

QUERY PLAN
|--SEARCH TABLE A USING INTEGER PRIMARY KEY (rowid=?)
|--SEARCH TABLE A USING INTEGER PRIMARY KEY (rowid=?)
\--SEARCH TABLE B USING INTEGER PRIMARY KEY (rowid=?)

Note that here we get 3 joins. Now a pretty cool thing happened here -- even though the expression for BB does not include a WHERE clause, SQLite has figured out that AA.id being 5 forces A.id to be 5, which in turn gives a constraint on BB. Nice job, SQLite. If it hadn't been able to figure that out, the expansion of BB would have resulted in a table scan.

Still, 3 joins is bad when we only need 2 joins to do the job. What happened? Well, when we did the original fragments with extensions we saw this same pattern in fragment code. Basically the fragment for BB isn't just doing the B things; it's restarting from A and doing its own join to get B. This results in a wasted join, and it might result in a lot of work on the A table as well if the filtering were more complex and couldn't be perfectly inferred.

You might think, "oh, no problem, I can save this, I'll just refer to AA instead of A in the second query."

This does not help (but it's going in the right direction):

explain query plan
with
AA(id, this) as (select * from A),
BB(id, that) as (select AA.id, B.that from AA left join B on B.id = AA.id)
select * from AA
left join BB on BB.id = AA.id
where AA.id = 5;

QUERY PLAN
|--SEARCH TABLE A USING INTEGER PRIMARY KEY (rowid=?)
|--SEARCH TABLE A USING INTEGER PRIMARY KEY (rowid=?)
\--SEARCH TABLE B USING INTEGER PRIMARY KEY (rowid=?)

In terms of fragments the anti-pattern is this:

@attribute(cql:shared_fragment)
create proc B_()
begin
  select B.* from A left join B on B.id = A.id;
end;

The above starts the query for B again from the root. You can fix this; the trick is to not try to generate just the B columns and then join them later. You can get a nice data flow going with a chain of CTEs.

explain query plan
with
AA(id, this) as (select * from A),
AB(id, this, that) as (select AA.*, B.that from AA left join B on B.id = AA.id)
select * from AB
where AB.id = 5;

QUERY PLAN
|--SEARCH TABLE A USING INTEGER PRIMARY KEY (rowid=?)
\--SEARCH TABLE B USING INTEGER PRIMARY KEY (rowid=?)

And we're right back to the perfect plan. The good form creates a CTE chain where we only need the result of the final CTE. A straight line of CTEs, each depending on the previous one, results in an excellent data flow.

In terms of fragments this is now:

@attribute(cql:shared_fragment)
create proc A_()
begin
  select * from A;
end;

@attribute(cql:shared_fragment)
create proc AB_()
begin
  with
    (call A_())
  select A_.*, B.that from A_ left join B on B.id = A_.id;
end;

with (call AB_())
select * from AB_ where AB_.id = 5;

For brevity I didn't include the possibility of using IF and such. Here's another option that makes the same good query plan: we can generalize AB_ so that it doesn't know where the base data is coming from and can be used in more cases.

@attribute(cql:shared_fragment)
create proc A_()
begin
  select * from A;
end;

@attribute(cql:shared_fragment)
create proc AB_()
begin
  with
    source(*) like A -- you must provide some source that is the same shape as A
  select source.*, B.that from source left join B on B.id = source.id;
end;

with
  (call A_()),
  (call AB_() using A_ as source)
select * from AB_ where AB_.id = 5;

Again this results in a nice straight chain of CTEs and, even though the WHERE clause comes last, the A table is constrained properly.

It's important not to fork the chain... if you do that, then whatever came before the fork must be materialized for use in both branches. That can be quite bad, because the filtering might then come after the materialization. Here is an example that is quite bad:

explain query plan
with
AA(id, this) as (select * from A),
BB(id, that) as (select AA.id, B.that from AA left join B on B.id = AA.id),
CC(id, other) as (select AA.id, C.other from AA left join C on C.id = AA.id)
select * from AA
left join BB on BB.id = AA.id
left join CC on CC.id = AA.id
where AA.id = 5;

QUERY PLAN
|--MATERIALIZE 2
| |--SCAN TABLE A
| \--SEARCH TABLE B USING INTEGER PRIMARY KEY (rowid=?)
|--MATERIALIZE 3
| |--SCAN TABLE A
| \--SEARCH TABLE C USING INTEGER PRIMARY KEY (rowid=?)
|--SEARCH TABLE A USING INTEGER PRIMARY KEY (rowid=?)
|--SCAN SUBQUERY 2
\--SEARCH SUBQUERY 3 USING AUTOMATIC COVERING INDEX (id=?)

Things have gone off the rails here. As you can see, A is now scanned twice, and there are many more joins. We could make this a lot better by moving the A condition all the way up into the first CTE. With fragments that would just mean creating something like this:

@attribute(cql:shared_fragment)
create proc A_(id_ integer not null)
begin
  select * from A where A.id = id_;
end;

At least then if we have to materialize we'll get only one row. This could be a good thing to do universally, but it's especially important if you know that forking in the query shape is mandatory for some reason.

A better pattern might be this:

explain query plan
with
AA(id, this) as (select * from A),
AB(id, this, that) as (select AA.*, B.that from AA left join B on B.id = AA.id),
ABC(id, this, that, other) as (select AB.*, C.other from AB left join C on C.id = AB.id)
select * from ABC
where ABC.id = 5;

QUERY PLAN
|--SEARCH TABLE A USING INTEGER PRIMARY KEY (rowid=?)
|--SEARCH TABLE B USING INTEGER PRIMARY KEY (rowid=?)
\--SEARCH TABLE C USING INTEGER PRIMARY KEY (rowid=?)

Here we've just extended the chain. With shared fragments you could easily build an AB_ proc as before and then build an ABC_ proc either by calling AB_ directly or by having a table parameter that is LIKE AB_.

Both cases will give you a great plan.
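As a sketch of the first approach, ABC_ simply calls AB_ and extends the chain one more step:

@attribute(cql:shared_fragment)
create proc ABC_()
begin
  with
    (call AB_())
  select AB_.*, C.other from AB_ left join C on C.id = AB_.id;
end;

with (call ABC_())
select * from ABC_ where ABC_.id = 5;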

So the most important things are:

  • Avoid forking the chain of CTEs/fragments, a straight chain works great.
  • Avoid re-joining to tables, even unconstrained CTEs result in great plans if they don't have to be materialized.
  • If you do need to fork in your CTE chain because of your desired shape, be sure to move as many filters as you can upstream, so that by the time you materialize, only a very small number of rows need to be materialized.

These few rules will go far in helping you to create good query shapes.

One last thing: without shared fragments, if you wanted to create a large 10-way join you had to type that join into your file, and it would be very much in your face. Now that join might be hidden from you in a nice easy-to-use fragment. Which you might then decide you want to use 3 times... and now, with a tiny amount of code, you have 30 joins.

The thing is, shared fragments make it easy to generate a lot of SQL. It's not bad that shared fragments make things easy, but with great power comes great responsibility, so give some thought to what it is you are assembling. Understanding your fragments, especially any big ones, will help you to create great code.