Chapter 2: Using Data
The point of using CQL is to facilitate access to a SQLite database so we'll switch gears to a slightly more complicated setup. We'll still keep things fairly simple but let's start to use some database features. Note: it is not the intent of this tutorial to also be a primer for the SQLite programming language which is so ably documented on https://sqlite.org/. Please refer to that site for details on the meaning of the SQL statements used here if you are new to SQL.
A Sample Program
Suppose we have the following program:
That looks like an interesting little baby program and it appears as though it would once again print that most famous of salutations, "Hello, world".
Well, it doesn't. At least, not yet. Let's walk through the various things that are going to go wrong as this will teach us everything we need to know about activating CQL from some environment of your choice.
Providing a Suitable Database
CQL is just a compiler, it doesn't know how the code it creates will be provisioned any more than say clang does.
It creates functions with predictable signatures so that they can be called from C just as easily as the SQLite API
itself, and using the same currency. Our new version of hello
now requires a database handle because it performs
database operations. Also there are now opportunities for the database operations to fail, and so hello
now provides a
return code.
A new minimal main
program might look something like this:
If we re-run CQL and look in the hello.h
output file we'll see that the declaration of the hello
function is now:
indicating that the database is used and a SQLite return code is provided. We're nearly there. If you attempt to build the program as before there will be several link-time errors due to missing functions. Typically these are resolved by providing the SQLite library to the command line and also adding the CQL runtime. The new command line looks something like this:
The cql runtime can be anywhere you want it to be, and of course the usual C seperate compilation methods can be applied. More on that later.
But actually, that program doesn't quite work yet. If you run it, you'll get an error result code, not the message "Hello, world".
Let's talk about the final missing bit.
Declaring Schema
In CQL a loose piece of Data Definition Language (henceforth DDL) does not actually create or drop anything. In most CQL programs, the normal situation is that "something" has already created the database and put some data in it. You need to tell the CQL compiler about the schema so that it knows what the tables are and what to expect to find in those tables. This is because typically you're reconnecting to some sort of existing database. So, in CQL, loose DDL simply declares schema, it does not create it. To create schema you have to put the DDL into a procedure you can run. If you do that, then the DDL still serves a declaration, but also the schema will be created when the procedure is executed.
We need to change our program a tiny bit.
If we rebuild the program, it will now behave as expected.
Explaining The New Hello World
Let's go over every important line of the new program, starting from main.
This statement gives us an empty private in-memory only database to work with. This is the simplest case
and it's still very useful. The sqlite_open
and sqlite_open_v2
functions can be used to create a variety of
databases per the SQLite documentation.
We'll need such a database to use our procedure, and we use it in the call here:
This provides a valid db handle to our procedure. Note that the procedure doesn't know what database it is supposed to operate on, it expects to be handed a suitable database on a silver platter. In fact any given proc could be used with various databases at various times. Just like SQLite, CQL does not enforce any particular database setup; it does what you tell it to.
When hello
runs we begin with
This will create the my_data
table with a single column t
, of type text not null
. That will work because
we know we're going to be called with a fresh/empty database. More typically you might do create table if not exists ...
or otherwise have a general attach/create phase or something like that. We'll dispense with that here.
Next we'll run the insert statement:
This will add a single row to the table. Note that we have again used double quotes, meaning this is a C string literal. This is highly convenient given the escape sequences. Normally SQLite text has the newlines directly embedded in it; that practice isn't very compiler friendly, hence the alternative.
Next we declare a local variable to hold our data.
This is a simple string reference, it will be initialized to NULL
by default. That's actually important;
even though the variable is NOT NULL
there is no reasonable default value for it other than NULL
.
The NOT NULL
declaration will guard against NULL
assignments but it will not prevent reference types
from beginning with NULL
as their value. Junk would be worse and some random initialized value
would create unnecessary cost. This mirrors the choice the C language makes with its _Nonnull
extensions.
At this point we can read back our data.
This form of database reading has very limited usability but it does work for this case and it is illustrative.
The presence of (select ...)
indicates to the CQL compiler that parenthesized expression should be given to
SQLite for evaluation according to the SQLite rules. The expression is statically checked at compile time to
ensure that it has exactly one result column. In this case the *
is just column t
, and actually it would have
be clearer to use t
directly here but then there wouldn't have a reason to talk about *
and multiple columns.
At run time, the select
query must return exactly one row or an error code will be returned. It's not uncommmon
to see (select ... limit 1)
to force the issue. But that still leaves the possibility of zero rows, which would be an error. We'll talk about more flexible ways to read from the database later.
At this point it seems wise to bring up the unusual expression evaluation properties of CQL.
CQL is by necessity a two-headed beast. On the one side there is a rich expression evaluation language for
working with local variables. Those expressions are compiled into C logic that emulates the behavior of SQLite
on the data. It provides complex expression constructs such IN
and CASE
but it is ultimately evaluated by C
execution. Alternately, anything that is inside of a piece of SQL is necessarily evaluated by SQLite itself. To make this clearer let's change the example a little bit before we move on.
This is a somewhat silly example but it illustrates some important things:
- even though SQLite doesn't support double quotes that's no problem because CQL will convert the expression into single quotes with the correct escape values as a matter of course during compilation
- the
||
concatenation operator is evaluated by SQLite - you can mix and match both kinds of string literals, they will be all be the single quote variety by the time SQLite sees them
- the
||
operator has lots of complex formatting conversions (such as converting real values to strings) - in fact the conversions are so subtle as to be impossible to emulate in loose C code with any economy, so, like a few other operators,
||
is only supported in the SQLite context
Returning now to our code as written, we see something very familiar:
Note we've used the single quote syntax here for no good reason other than illustration. There are no escape
sequences here so either form would the job. Importantly, the string literal will not create a string object as before
but the text variable t
is of course a string reference. Before it can be used in a call to an un-declared function it
must be converted into a temporary C string. This might require allocation in general, that allocation is automatically
managed. Note that by default CQL assumes that calls to unknown C functions should be emitted as written. In this way you can use printf
even though CQL knows nothing about it.
Lastly we have:
This is not strictly necessary because the database is in memory anyway and the program is about to exit but there it is for illustration.
Now the Data Manipulation Langauge (i.e. insert and select here; and henceforth DML) and the DDL might fail for various reasons. If that happens the proc will goto
a cleanup handler and return the failed return code instead of running the rest of the code. Any temporary memory allocations will be freed and any pending
SQLite statements will be finalized. More on that later when we discuss error handling.
With that we have a much more complicated program that prints "Hello, world"
Introducing Cursors
In order to read data with reasonable flexibility, we need a more powerful construction. Let's change our example again and start using some database features.
Reviewing the essential parts of the above.
The table now includes a position column to give us some ordering. That is the primary key.
The insert statements provide both columns, not in the printed order. The insert form where the columns are not specified indicates that all the columns will be present, in order, this is more economical to type. CQL will generate errors at compile time if there are any missing columns or if any of the values are not type compatible with the indicated column.
The most important change is here:
We've created a non-scalar variable C
, a cursor over the indicated result set. The results will be ordered by pos
.
This loop will run until there are no results left (it might not run at all if there are zero rows, that is not an error). The FETCH
construct allows you to specify target variables, but if you do not do so, then a synethetic structure is
automatically created to capture the projection of the select
. In this case the columns are pos
and txt
.
The automatically created storage exactly matches the type of the columns in the select list which could itself be tricky to calculate if the select
is complex. In this case the select
is quite simple and the columns of the result directly match the schema for my_data
. An integer and a string reference. Both not null.
The storage for the cursor is given the same names as the columns of the projection of the select, in this case the columns were not renamed so pos
and txt
are the fields in the cursor.
Double quotes were used in the format string to get the newline in there easily.
The cursor is automatically released at the end of the procedure but in this case we'd like to release it before the
drop table
happens so there is an explicit close
. This is frequently elided in favor of the automatic cleanup.
There is an open
cursor statement as well but it doesn't do anything. It's there because many systems have that
construct and it does balance the close
.
If you compile and run this program you'll get this output.
So the data was inserted and then sorted.
Going Crazy
We've only scratched the surface of what SQLite can do and most DML constructs are supported by CQL. This includes common table expressions, and even recursive versions of the same. But remember, when it comes to DML, the CQL compiler only has to validate the types and figure out what the result shape will be -- SQLite always does all the heavy lifting of evaluation. All of this means with remarkably little additional code, the example below from the SQLite documentation can be turned into a CQL stored proc using the constructs we have defined above.
This code uses all kinds of SQLite features to produce this text:
Which probably doesn't come up very often but it does illustrate several things:
WITH RECURSIVE
actually provides a full lambda calculus so arbitrary computation is possible- You can use
WITH RECURSIVE
to create table expressions that are sequences of numbers easily, with no reference to any real data - You can announce SQLite builtin functions other than the most standard ones, or SQLite user defined functions with
DECLARE SELECT FUNCTION
-- this creates a function whose type is known and is only useable inside of SQL evaluation contexts (just like the||
operator)- in general CQL doesn't know what these declared functions do, so it cannot emulate them
- some would be very hard to emulate correctly under the best of circumstances