Appendix 8: CQL Best Practices

This is a brief discussion of every statement type and some general best practices for that statement. The statements are in mostly alphabetical order except related statements were moved up in the order to make logical groups.

Refer also to Appendix 7: CQL Anti-patterns.

Data Definition Language (DDL)

  • ALTER TABLE ADD COLUMN
  • CREATE INDEX
  • CREATE PROC
  • CREATE TABLE
  • CREATE TRIGGER
  • CREATE VIEW
  • CREATE VIRTUAL TABLE
  • DROP INDEX
  • DROP TABLE
  • DROP TRIGGER
  • DROP VIEW

These statements almost never appear in normal procedures and generally should be avoided. The normal way of handling schema in CQL is to have one or more files declare all the schema you need and then let CQL create a schema upgrader for you. This means you'll never manually drop tables or indices etc. The create declarations with their annotations will totally drive the schema.

Any ad hoc DDL is usually a very bad sign.

Ad Hoc Migrations

  • @SCHEMA_AD_HOC_MIGRATION

This is a special upgrade step that should be taken at the version indicated in the statement. These can be quite complex and even super important but should not be used lightly. Any migration procedure has to be highly tolerant of a variety of incoming schema versions and previous partial successes. In any case this directive should not appear in normal code. It should be part of the schema DDL declarations.

Transactions

  • BEGIN TRANSACTION
  • COMMIT TRANSACTION
  • ROLLBACK TRANSACTION

Transactions do not nest and most procedures do not know the context in which they will be called, so the vast majority of procedures will not and should not actaully start transactions. You can only do this if you know, somehow, for sure, that the procedure in question is somehow a "top level" procedure. So generally, don't use these statements.

Savepoints

  • SAVEPOINT
  • ROLLBACK TO SAVEPOINT
  • RELEASE SAVEPOINT
  • PROC SAVEPOINT
  • COMMIT RETURN
  • ROLLBACK RETURN

Savepoints are the preferred tool for having interim state that can be rolled back if needed. You can use ad hoc savepoints, just give your save point and name then use RELEASE SAVEPOINT to commit it, or else ROLLBACK TO SAVEPOINT followed by a RELEASE to abort it. Note that you always RELEASE savepoints in both the rollback and the commit case.

Managing savepoints can be tricky, especially given the various error cases. They combine nicely with TRY CATCH to do this job. However, even that is a lot of boilerplate. The best way to use savepoints is with PROC SAVEPOINT BEGIN .. END;

When you use PROC SAVEPOINT, a savepoint is created for you with the name of your procedure. When the block exits the savepoint is released (committed). However you also get an automatically generated try/catch block which will rollback the savepoint if anything inside the block were to invoke THROW. Also, you may not use a regular RETURN inside this block, you must use either ROLLBACK RETURN or COMMIT RETURN. Both of these directly indicate the fate of the automatically generated statement when they run. This gives you useful options to early-out (with no error) while keeping or abandoning any work in progress. Of course you can use THROW to return an error and abandon the work in progress.

Compilation options

  • @ENFORCE_NORMAL
  • @ENFORCE_POP
  • @ENFORCE_PUSH
  • @ENFORCE_RESET
  • @ENFORCE_STRICT

CQL allows you to specify a number of useful options such as "do not allow Window Functions" or "all foreign keys must choose some update or delete strategy". These additional enforcements are designed to prevent errors. Because of this they should be established once, somewhere central and they should be rarely if ever overrided. For instance @ENFORCE_NORMAL WINDOW FUNCTION would allow you to use window functions again, but this is probably a bad idea. If strict mode is on, disallowing them, that probably means your project is expected to target versions of SQLite that do not have window functions. Overriding that setting is likely to lead to runtime errors.

In general you don't want to see these options in most code.

Previous Schema

  • @PREVIOUS_SCHEMA

CQL can ensure that the current schema is compatible with the previous schema, meaning that an upgrade script could reasonably be generated to go from the previous to the current. This directive demarks the start of the previous schema section when that validatio happens. This direcive is useless except for creating that schema validation so it should never appear in normal procedures.

Schema Regions

  • @BEGIN_SCHEMA_REGION
  • @DECLARE_DEPLOYABLE_REGION
  • @DECLARE_SCHEMA_REGION
  • @END_SCHEMA_REGION

CQL allows you to declare arbitrary schema regions and limit what parts of the schema any given region may consume. This helps you to prevent schema from getting entangled. There is never a reason to use this directives inside normal procedures; They should appear only in your schema declaration files.

Schema Version

  • @SCHEMA_UPGRADE_SCRIPT
  • @SCHEMA_UPGRADE_VERSION

The @SCHEMA_UPGRADE_SCRIPT directive is only used by CQL itself to declare that the incoming file is an autogenerated schema upgrade script. These scripts have slightly different rules for schema declaration that are not useful outside of such scripts. So you should never use this.

@SCHEMA_UPGRADE_VERSION on the other hand is used if you are creating a manual migration script. You need this script to run in the context of the schema version that it affects. Use this directive at the start of the file to do so. Generally manual migration scripts are to be avoided so hopefully this directive is rarely if ever used.

C Text Echo

  • @ECHO

This directive emits plain text directly into the compiler's output stream. It can be invaluable for adding new runtime features and for ensuring that (e.g.) additional #include or #define directives are present in the output but you can really break things by over-using this feature. Most parts of the CQL output are subject to change so any use of this should be super clean. The indended use was, as mentioned, to allow an extra #include in your code so that CQL could call into some library. Most uses of this combine with DECLARE FUNCTION or DECLARE PROCEDURE to declare an external entity.

Enumerations

  • DECLARE ENUM
  • @EMIT_ENUMS

Avoid embedded constants whenever possible. Instead declare a suitable enumeration. Use @EMIT_ENUMS Some_Enum to get the enumeration constants into the generated .h file for C. But be sure to do this only from one compiland. You do not want the enumerations in every .h file. Choose a single .sql file (not included by lots of other things) to place the @EMIT_ENUMS directive. You can make a file specifically for this purpose if nothing else is serviceable.

Cursor Lifetime

  • CLOSE
  • OPEN

The OPEN statement is a no-op, SQLite has no such notion. It was included becasue it is present in MYSQL and other variants and its inclusion can easy readability sometimes. But it does nothing. The CLOSE statement is normally not necessary because all cursors are closed at the end of the procedure they are declared in (unless they are boxed, see below). You only need CLOSE if you want to close a global cursor (which has no scope) or if you want to close a local cursor "sooner" because waiting to the end of the procedure might be a very long time. Using close more than once is safe, the second and later close operations do nothing.

Procedure Calls and Exceptions

  • CALL
  • THROW
  • TRY CATCH

Remember that if you call a procedure and it uses THROW or else uses some SQL that failed, this return code will cause your code to THROW when the procedure returns. Normally that's exactly what you want, the error will ripple out and some top-level CATCH will cause a ROLLBACK and the top level callers sees the error. If you have your own rollback needs be sure to install your own TRY/CATCH block or else use PROC SAVEPOINT as above to do it for you.

Inside of a CATCH block you can use the special variable @RC to see the most recent return code from SQLite.

Control Flow with "Big Moves"

  • CONTINUE
  • LEAVE
  • RETURN

These work as usual but beware, you can easily use any of these to accidentally leave a block with a savepoint or transaction and you might skip over the ROLLBACK or COMMIT portions of the logic. Avoid this problem by using PROC SAVEPOINT.

Getting access to external code

  • DECLARE FUNCTION
  • DECLARE SELECT FUNCTION
  • DECLARE PROCEDURE

The best practice is to put any declarations into a shared header file which you can #include in all the places it is needed. This is especially important should you have to forward declare a procedure. CQL normally provides exports for all procedures so you basically get an automatically generated and certain-to-be-correct #include file. But, if the procedures are being compiled together then an export file won't have been generated yet at the time you need it; To work around this you use the DECLARE PROCEDURE form. However, procedure declarations are tricky; they include not just the type of the arguments but the types of any/all of the columns in any result set the procedure might have. This must not be wrong or callers will get unpredictable failures.

The easiest way to ensure it is correct is to use the same trick as you would in C -- make sure that you #include the declaration the in the translation unit with the definition. If they don't match there will be an error.

A very useful trick: the error will include the exact text of the correct declaration. So if you don't know it, or are too lazy to figure it out; simply put ANY declaration in the shared header file and then paste in the correct declaration from the error. should the definition ever change you will get a compilation error which you can again harvest to get the correct declaration.

In this way you can be sure the declarations are correct.

Functions have no CQL equivalent, but they generally don't change very often. Use DECLARE FUNCTION to allow access to some C code that returns a result of some kind. Be sure to add the CREATE option if the function returns a reference that the caller owns.

Use DECLARE SELECT FUNCTION to tell CQL about any User Defined Functions you have added to SQLite so that it knows how to call them. Note that CQL does not register those UDFs, it couldn't make that call lacking the essential C information required to do so. If you find that you are getting errors when calling a UDF the most likely reason for the failure is that the UDF was declared but never registered with SQLite at runtime. This happens in test code a lot -- product code tends to have some central place to register the UDFs and it normally runs at startup, e.g. right after the schema is upgraded.

Regular Data Manipulation Language (DML)

  • DELETE
  • INSERT
  • SELECT
  • UPDATE
  • UPSERT

These statements are the most essential and they'll appear in almost every procedure. There are a few general best practices we can go over.

  • Try to do as much as you can in one batch rather than iterating; e.g.

    • don't write a loop with a DELETE statement that deletes one row if you can avoid it, write a delete statement that deletes all you need to delete
    • don't write a loop with of SELECT statement that fetches one row, try to fetch all the rows you need with one select
  • Make sure UPSERT is supported on the SQLite system you are using, older versions do not support it

  • Don't put unnecessary casts in your SELECT statements, they just add fat

  • Don't use CASE/WHEN to compute a boolean, the boolean operations are more economical (e.g. use IS)

  • Don't use COUNT if all you need to know is whether a row exists or not, use EXISTS

  • Don't use GROUP BY, ORDER BY, or DISTINCT on large rowsets, the sort is expensive and it will make your SELECT statements write to disk rather than just read

  • Always use the INSERT INTO FOO USING form of the INSERT statement, it's much easier to read than the standard form and compiles to the same thing

Variable and Cursor declarations

  • DECLARE OUT CALL
  • DECLARE
  • LET
  • SET

These are likely to appear all over as well. If you can avoid a variable declaration by using LET then do so; The code will be more concise and you'll get the exact variable type you need. This is the same as var x = foo(); in other languages. Once the variable is declared use SET.

You can save yourself a lot of declarations of OUT variables with DECLARE OUT CALL. That declaration form automatically declares the OUT variables used in the call you are about to make with the correct type. If the number of arguments changes you just have to add the args you don't have to also add new declarations.

The LIKE construct can be used to let you declare things whose type is the same as another thing. Patterns like DECLARE ARGS CURSOR LIKE FOO ARGUMENTS save you a lot of typing and also enhance correctness. There's a whole chapter dedicated to "shapes" defined by LIKE.

Query Plans

  • EXPLAIN

Explain can be used in front of other queries to generate a plan. The way SQLite handles this is that you fetch the rows of the plan as usual. So basically EXPLAIN is kind of like SELECT QUERY PLAN OF. This hardly ever comes up in normal coding. CQL has an output option where it will generate code that gives you the query plan for a procedures queries rather than the normal body of the procedure.

Fetching Data from a Cursor or from Loose Data

  • FETCH
  • UPDATE CURSOR

The FETCH statement has many variations, all are useful at some time or another. There are a few helpful guidelines.

  • If fetching from loose values into a cursor use the FETCH USING form (as you would with INSERT INTO USING) because it is less error prone
  • FETCH INTO is generally a bad idea, you'll have to declare a lot of variables, instead just rely on automatic storage in the cursor e.g. fetch my_cursor rather than fetch my_cursor into a, b, c
  • If you have data already in a cursor you can mutate some of the columns using UPDATE CURSOR, this can let you adjust values or apply defaults

Control Flow

  • IF
  • LOOP
  • SWITCH
  • WHILE

These are your bread and butter and they will appear all over. One tip: Use the ALL VALUES variant of switch whenever possible to ensure that you haven't missed any cases.

Manual Control of Results

  • OUT

  • OUT UNION

  • If you know you are producing exactly one row OUT is more economical than SELECT

  • If you need complete flexibility on what rows to produce (e.g. skip some, add extras, mutate some) then OUT UNION will give you that, use it only when needed, it's more expensive than just SELECT

Last updated on by Winnie Quinn