Chapter 5: Types of Cursors, OUT and OUT UNION, and FETCH flavors
In the previous chapters we have used cursor variables without fully discussing them. Most of the uses are fairly self-evident but a more exhaustive discussion is also useful.
First there are three types of cursors, as we will see below.
Statement Cursors
A statement cursor is based on a SQL SELECT
statement. A full example might look like this:
When compiled, this will result in creating a SQLite statement object (type sqlite_stmt*
)
and storing it in a variable called C
. This statement can then be used later in various ways.
Here's perhaps the simplest way to use a cursor:
This will have the effect of reading one row from the results of the query into
the local variables x
and y
.
These variables might then be used to create some output such as
Or any other use.
More generally, there may or may not be a fetched value. The cursor variable C
can be used by itself as a virtual boolean indicating the presence of a row.
So a more complete example might be
And even more generally
Here we read all the rows out and print them.
Now if the table xy_table
had instead had dozens of columns those declarations
would be very verbose and error prone. And frankly annoying, especially if
the table definition was changing over time.
To make this a little easier, there are also so-called 'automatic' cursors. These happen implicitly and include all the necessary storage to exactly match the rows in their statement. Using the automatic syntax for the above we might get
or the equivalent loop form:
All the necessary local state is automatically created hence "automatic" cursor. This pattern is generally preferred but the loose variables pattern is in some sense more general.
In all the cases if the number or type of variables do not match the select statement, semantic errors are produced.
Value Cursors
The purpose of value cursors is to make it possible for a stored procedure to work with structures as a unit rather than field by field. SQL doesn't have the notion of structure types but structures actually appear pretty directly in many places:
- the columns of a table are a structure
- the projection of a
SELECT
statement is a structure - other things directly derived from the above (like the columns of a statement cursor) are likewise structures
Let's first start by how you declare a value cursor. It is by analogy to one of the structure types above.
So:
Any of those forms define a valid set of columns. Note that the select
example in no way causes the query provided to run. Instead, the select statement is analyzed and the column names and types are computed. The cursor get the same field names and types. Nothing happens at run time.
The last example assumes that there is a stored procedure defined somewhere earlier in this translation unit and that procedure returns a result set. The cursor declaration makes a cursor that could receive the result of that procedure. We'll cover
that particular case in more detail below when we deal with the OUT
statement.
Now once we have declared the cursor we can load it with values using fetch
in the value form.
You can load up a cursor from values. The values must be type-compatible of course.
You can call a procedure that returns a single row:
You can fetch a cursor from another cursor:
In this case D must be an 'automatic' cursor but it could be coming from a statement. This lets you copy a row and save it for later. E.g. you could copy the current max-valued row into a value cursor and use it after the loop.
Value cursors are always 'automatic' -- they have their own storage.
Value cursors also may or may not be holding a row.
When you call a procedure you may or may not get a row as we'll see below.
OUT Statement
Value cursors were initially designed to create a convenient way for
a procedure to return a single row from a complex query
without having a crazy number of OUT
parameters. It's easiest
to illustrate this with an example.
The older verbose pattern looks like this:
Now you can imagine this gets very annoying if get_a_row
has to produce
a couple dozen column values. And of course you have to get the types
exactly right. And they might evolve over time. Joy.
On the receiving side you get to do something just as annoying:
Using the out
statement we get the equivalent functionality with a much simplified pattern. It looks like this:
To use it you simply do this:
In fact originally the above was the only way to load a value cursor, before the calculus was generalized. The original form still works, and does both things in one step:
The OUT
statement lets you return a single row economically and
lets you then test if there actually was a row and read the columns.
It infers all the various column names and types so it is resilient
to schema change and generally a lot less error prone than having a
large number of out
arguments to your procedure.
Once you have the result in a value cursor you can do the usual cursor operations to move it around or otherwise work with it.
The use of the LIKE
keyword to refer to the types of complex entities spread to other
places in CQL as a very useful construct, but it began here with the
need to describe a cursor shape economically, by reference.
OUT UNION Statement
The semantics of the out
statement are that it always produces one row
of output (a procedure can produce no row if an out
never actually ran but the procedure does use out
).
If an out
statement runs more than once the most recent row becomes the result. So the out
statement really does
mirror having one out
variable for each column. This was its intent and procedures
that return at most, or exactly, one row are very common. However, in general, one row results
do not suffice; you might want to produce a result set from various sources
with maybe arbitary compution in there as well. For that you need to be able to emit multiple
rows from a computed source. This is exactly what out union
provides.
Here's a (somewhat contrived) example of the kind of thing you can do with this form:
In foo
above, we make an entire result set out of thin air. It isn't very
interesting but of course any computation would have been possible.
This pattern is very flexibe as we see below in bar
where
we're going to merge two different data streams.
Just like foo
, in bar
, each time out union
runs a new row is accumulated. Now, if you build
a procedure that ends with a select
statement CQL automatically creates a fetcher function
that does exactly the same thing -- it loops over the SQLite statement for the select and fetches
each row, materializing a result. With out union
you take manual control of this process, allowing you
to build arbitrary result sets. Note that either of C
or D
above could have been modified, replaced, skipped,
normalized, etc. with any kind of computation. Even entirely synthetic rows can be computed
and inserted into the output as we saw in foo
.
Result Set Cursors
So OUT UNION
makes it possible to create arbitrary result sets using a mix of sources and filtering. Unfortunately this result type is not a simple row, nor is it a SQLite statement and so while it can produce ordinary result sets for CQL callers, CQL could not itself consume that result type.
To address this hole, and thereby make it a lot easier to test these result sets (which really is the most interesting use case for re-consuming a result set) we need an additional cursor type. The syntax is exactly the same as the statement cursor cases described above but, instead of holding a SQLite statement, the cursor holds a result set pointer and the current/max row numbers. Stepping through it simply increments the row number and fetches the next row out of the rowset instead of from SQLite.
Example:
If bar
had been created with a select union
and order by
to merge the results, the above would have worked with C
being a standard statement cursor, iterating over the union. Since foo
produces a result set, CQL transparently produces a suitable cursor implementation behind the scenes but otherwise the usage is the same.
Note this is a lousy way to iterate over rows; you have to materialize the entire result set so that you can just step over it. Re-consuming like this is not recommended at all for production code but it is ideal for testing result sets that were made with out union
which otherwise would require C/C++ to test. Testing CQL with CQL is generally a lot easier.
LIKE
forms
Reshaping Data, Cursor There are lots of cases where you have big rows with many columns and there are various manipulations you need to do. Some of these choices are emitting extra, related, rows, some of them are altering some of the columns before emitting the rows into the result set for use by some client.
What follows is a set of useful syntactic sugar constructs that simplify handling complex rows. The idea is that pretty much anywhere you can specify a list of columns you can instead use the LIKE x
construct to get the columns as the appear in object x
-- which is usually a cursor. It’s a lot easier to illustrate with examples, even though these are, again, a bit contrived.
First we need some table with lots of columns usually the column names are much bigger which makes it all the more important to not have to type them over and over.
We're going to emit two rows as the result of this proc. Easy enough...
Now let's briefly discuss what is above. The two essential parts are:
fetch result from cursor main_row(like result);
and
update cursor result(like alt_row) from cursor alt_row;
In the first case what we're saying is that we want to load the columns of result
from main_row
but we only want to take the columns that are actually present in result
. So this is a narrowing
of a wide row into a smaller row. In this case the smaller row, result
is what we want to emit.
We needed the other columns to compute alt_row
.
The second case, what we're saying is that we want update result
by replacing the columns
found in alt_row
with the values in alt_row
. So in this case we're writing a smaller cursor
into part of a wider cursor. Note that we used the update
form here becuase it preserves
all other columns. If we used fetch
we would be rewriting the entire row contents, using NULL
if necessary, not desired here.
Here is the rewritten version of the above procedure; this is what ultimately gets compiled into C.
Of course you could have typed all that before but when there’s 50 odd columns it gets old fast and it’s very error prone. The sugar form is going to be 100% correct and much less typing.
Finally, while I've shown both LIKE
forms seperately they can also be used together. For instance
The above would mean, "move the columns that are found in X
from cursor D
to cursor C
", presumably X
has columns common to both.
Fetch Statement Specifics
Many of the examples used the FETCH
statement in a sort of demonstrative way that is hopefully self-evident but the statement has many forms and so it's wroth going over them specifically. Below we'll use the letters C
and D
for the names of cursors. Usually C
;
For Statement or Result Set Cursors
A cursor declared in one of these forms:
declare C cursor for select * from foo;
declare C cursor for call foo();
(foo might end with aselect
or useout union
)
Is either a statement cursor or a result set cursor. In either case it moves through the results. You load the next row with
FETCH C
, orFETCH C into x, y, z;
In the first form C
is said to be automatic in that it automatically declares the storage needed to hold all its columns. As mentioned above automatic cursors have storage for their row.
Having done this fetch you can use C as a scalar variable to see if it holds a row, e.g.
You can easily iterate, e.g.
Automatic cursors are so much easier to use than explicit storage that explicit storage is rarely seen. Storing to out
parameters is a case where explicit is ok, the out
parameters have to be declared anyway.
For Value Cursors
A cursor declared in one of these forms:
declare C cursor fetch from call foo(args)
foo
must be a procedure that returns one row withOUT
`declare C cursor like select 1 id, "x" name;
declare C cursor like X;
- where X is the name of a table, a view, another cursor, or a procedure that returns a structured result
Is a value cursor. A value cursor is always automatic, it's purpose is to hold a row. It doesn't iterate over anything but it can be re-loaded in a loop.
fetch C
orfetch C into ...
is not valid on such a cursor, it doesn't have a source to step throughThe canonical ways to load such a cursor is:
fetch C from call foo(args);
foo
must be a procedure that returns one row withOUT
fetch C(a,b,c...) from values(x, y, z);
The first form is in some sense the origin of value cursor. Value cursors were added to the language initially to have a way to capture the single row out
statement results, much like result set cursors were added to capture procedure results from out union
. In the first form the cursor storage (a C struct) is provided by reference as a hidden out parameter to procedure and the procedure fills it in. The procedure may or may not use the out
statement in its control flow the the cursor might not hold a row. You can use if C then ...
as before to test for a row.
The second form is more interesting as it allows the cursor to be loaded from arbitary expressions subject to some rules:
- you should think of the cursor as a logical row, it's fully loaded or not, therefore you must specify enough columns in the column list to ensure that all
NOT NULL
columns will get a value - if not mentioned in the list, NULL will be loaded if possible
- if insufficient columns are named, an error is generated
- if the value types specified are not compatible with the column types mentioned, an error is generated
With this form, any possible valid cursor values could be set, but many forms of updates that are common would be awkward. So there are various forms of syntatic sugar that are automatically rewitten into the canonical form. Several standard rewrites happen.
fetch C from values(x, y, z)
- if no columns are specified this is the same as naming all the columns, in order
fetch C from arguments
- the arguments to the procedure in which this statement appears are used, in order, as the values
- in this case
C
is also rewritten intoC(a,b,c,..)
fetch C from arguments like C
- the arguments to the procedure in which this statement appears are used, by name, as the values
- the order in which the arguments appeared no longer matters, the names that match the columsn of C are used if present
- the formal parameter name may have a single trailing underscore (this is what
like C
would generate) - e.g. if
C
has columnsa
andb
then there must exist formals nameda
ora_
andb
orb_
in any position
fetch C(a,b) from cursor D(a,b)
- the named columns of D are used as the values
- in this case it becomes: `fetch C(a,b) from values(D.a, D.b);
That most recent form does seem like it saves much but recall the first rewrite:
fetch C from cursor D
- both cursors are expanded into all their columns, creating a copy from one to the other
fetch C from D
can be used if the cursors have the exact same column names and types; it also generates slightly better code and is a common case
It is very normal to want to use some of the columns of a cursor in a standard way, these
like
forms do that job.fetch C from cursor D(like C)
- here
D
is presumed to be "bigger" thanC
, in that it has all of theC
columns and maybe more. Thelike C
expands into the names of theC
columns soC
is loaded from theC
part ofD
- the expansion might be
fetch C(a, b, g) from values (D.a, D.b, D.g)
D
might have had fieldsc, d, e, f
which were not used because they are not inC
.
The symmetric operation, loading some of the columns of a wider cursor can be expressed neatly:
- here
fetch C(like D) from cursor D
- the
like D
expands into the columns ofD
causing the cursor to be loaded with what's inD
andNULL
(if needed) - this might look like
fetch C(d1, d2) from values(D.d1, D.d2)
- the
Like can be used in both places, for instance suppose E
is a cursor that has a subset of the rows of both C
and D
. Without ever loading this cursor, just by defining its type you can write a form like this:
fetch C(like E) from cursor D(like E)
- this means take the column names found in
E
and copy them from D to C. - the usual type checking is done of course but the types of the columns in
E
won't matter, only those inC
andD
As is mentioned above, the
fetch
form means to load an entire row into the cursor. This is important because "half loaded" cursors would be semantically problematic. However there are many cases where you might like to amend the values of an already loaded cursor. You can do this with theupdate
form.- this means take the column names found in
`update cursor C(a,b,..) from values(1,2,..);
- the update form is a no-op if the cursor is not already loaded with values (!!)
- the columns and values are type checked so a valid row is ensured (or no row)
- all the re-writes above are legal so
update cursor C(like D) from D
is possible, it is in fact the use-case for which this was designed.
Calling Procedures with Bulk Arguments
It's often desireable to treat bundles of arguments as a unit, or cursors as a unit, especially calling other procedures. The patterns above are very helpful for moving data between cursors, arguments, and the database. These can be rounded out with similar constructs for procedure calls as follows.
First we'll define some shapes to use in the examples. Note that we made U
using T
.
As we've seen, we can do this:
But the following is also possible. It isn't an especially fabulous example but of course
it generalizes. The arguments will be x_
, y_
, and z_
.
Now we might want to chain these things together. This next example uses a cursor to
call p1
.
The like
construct allows you to select some of the arguments, or
some of a cursor to use as arguments. This next procedure has more arguments
than just T
. The arguments will be x_
, y_
, z_
, a_
, b_
Or similarly. using a cursor.
Note that the from
argument forms do not have to be all the arguments. For instance
you can get columns from two cursors like so:
All the varieties can be combined but of course the procedure signature must match. And all these forms work in function expressions as well as procedure calls.
e.g.
Since these forms are simply syntatic sugar, they can also appear inside of functions in SQL statements. The variables mentioned will be expanded and become bound variables just like any other variable that appears in a SQL statement.
Note the form x IN (from arguments) is not supported at this time, though this is a realitively easy addition.
Missing Data Columns, Nulls and Dummy Data
What follows are the rules for columns that are missing. This are also done by rewriting the AST. There are several options, with the dummy data choices (see below) being really only interesting in test code. None of what follows applies to the update cursor
statement because its purpose is to do partial updates.
- When fetching a row all the columns must come from somewhere, if the column is mentioned or mentioned by rewrite then it must have a value mentioned, or mentioned by rewrite
- For columns that are not mentioned, a NULL value is used if it is legal
fetch C(a) from values(1)
might turn intofetch C(a,b,c,d) from values (1, NULL, NULL, NULL)
In addition to the automatic NULL you may add the annotation @dummy_seed([long integer expression])
, if present
- the expression is evaluated and stored in the hidden variable seed
- all integers, and long integers get seed as their value (possibly truncated)
- booleans get 1 if and only if seed is non-zero
- strings get the name of the string column an underscore and the value as text (e.g. "myText7" if _seed is 7)
- blobs are not currently supported for dummy data (CQL is missing blob conversions which are needed first)
This construct is hugely powerful in a loop to create many complete rows with very little effort, even if the schema change over time.
Now in this example we don't need to know anything about my_table
other than that it has a column named id
. As the example shows several things.
- we got the shape of the cursor from the table we were inserting into
- you can do your own computation for some of the columns (those named) and leave the unnamed values to be defaulted
- the rewrites mentioned above work for the
insert
statement as well asfetch
- in fact
insert into my_table(id) values(i+10000) @dummy_seed(i)
would have worked too with no cursor at all- bonus, dummy blob data does work in insert statements because SQLite can do the string conversion easily
- the dummy value for a blob is a blob that holds the text of the column name and the text of the seed just like a string column
The @dummy_seed
form can be modified with @dummy_nullables
, this indicates that rather than using NULL for any nullable value that is missing, CQL should use the seed value. This overrides the default behavior of using NULL where columns are needed. Note the NULL filling works a little differently on insert statements. Since SQLite will provide a NULL if one is legal the column doesn't have to be added to the list with a NULL value during rewriting, it can simply be omitted, making the statement smaller.
Finally for insert
statement only, SQLite will normally use the default value of a column if it has one, so there is no need to add missing columsn with default values to the insert statement. However if you specify @dummy_defaults
then columns with a default value will instead be rewritten and they will get _seed_
as their value.
Some examples. Suppose columns a, b, c are not null; m, n are nullable; and x, y have defaults.
The sugar features on fetch
, insert
, and update cursor
are as symmetric as possible, but again, dummy data is generally only interesting in test code. Dummy data will continue to give you valid test rows even if columns are added or removed from the tables in question.
Generalized Cursor Lifetimes aka Cursor "Boxing"
Generalized Cursor Lifetime refers to capturing a Statement Cursor in an object so that it can used more flexibly. Wrapping something in an object is often called "boxing". Since Generalized Cursor Lifetime is a mouthful we'll refer to it as "boxing" from here forward. The symmetric operation "unboxing" refers to converting the boxed object back into a cursor.
The normal cursor usage pattern is by far the most common, a cursor is created directly with something like these forms:
At this point the cursor can be used normally as follows:
Those are the usual patterns and they allow statement cursors to be consumed sort of "up" the call chain from where the cursor was created. But what if you want some worker procedures that consume a cursor? There is no way to pass your cursor down again with these normal patterns alone.
To generalize the patterns, allowing, for instance, a cursor to be returned as an out parameter or accepted as an in parameter you first need to declare an object variable that can hold the cursor and has a type indicating the shape of the cursor.
To make an object that can hold a cursor:
Where T
is the name of a shape. It can be a table name, or a view name, or it can be the name of the canonical procedure that returns the result.
T should be some kind of global name, something that could be accessed with #include
in various places.
Referring to the examples above, choices for T
might be shape_source
the table or proc_that_returns_a_shape
the procedure.
Note: it's always possible make a fake procedure that returns a result to sort of "typedef" a shape name. e.g.
The procedure here my_shape
doesn’t have to actually ever be created, in fact it’s better if it isn't. It won’t ever be called,
its hypothetical result is just being as a shape. This can be useful if you have several procedures like proc_that_returns_a_shape
that all return results with the columns of my_shape
.
To create the boxed cursor, first declare the object variable that will hold it and then set object from the cursor.
Note that in the following example the cursor C
must have the shape defined by my_shape
or an error is produced.
The type of the object is crucial because, as we'll see, during unboxing, during unboxing that type defines the shape
of the unboxed cursor.
The variable box_obj
can now be passed around as usual. It could be stored in a suitable out
variable
or it could be passed to a procedure as an in
parameter. Then, later, you can "unbox" box_obj
to get a
cursor back. Like so
These primitives will allow cursors to be passed around with general purpose lifetime.
Example:
Importantly, once you box a cursor the underlying SQLite statement’s lifetime is managed by the box object with normal retain/release semantics. The box and underlying statement can be released simply by setting all references to it to null as usual.
With this pattern it's possible to, for instance, create a cursor, box it, consume some of the rows in one procedure, do some other stuff, and then consume the rest of the rows in another different procedure.
Important Notes:
- the underlying SQLite statement is shared by all references to it. Unboxing does not reset the cursor's position. It is possible, even desirable, to have different procedures advancing the same cursor
- there is no operation for "peeking" at a cursor without advancing it; if your code requires that you inspect the row and then delegate it, you can do this simply by passing the cursor data as a value rather than the cursor statement. Boxing and unboxing are for cases where you need to stream data out of the cursor in helper procedures
- durably storing a boxed cursor (e.g. in a global) could lead to all manner of problems -- it is exactly like holding on to a
sqlite3_stmt *
for a long time with all the same problems because that is exactly is happening
Summarizing, the main reason for using the boxing patterns is to allow for standard helper procedures that can get a cursor from a variety of places and process it. Boxing isn’t the usual pattern at all and returning cursors in a box, while possible, should be avoided in favor of the simpler patterns, if only because then then lifetime management is very simple in all those cases.