We've long needed a way to see the most recent SQLite result code in the context
of, say, a catch block (most other times you can assume SQLITE_OK was the last
result code, otherwise control flow would have transferred elsewhere; sometimes SQLITE_ROW
or SQLITE_DONE might be the current result code).
Soon we'll provide a sample header that declares the most common error codes in an enum, but
for now you could do something like this:
-- pasted from the SQLite sources
#define SQLITE_BUSY 5 /* The database file is locked */
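With that in place, here's a sketch of how you might use the code in a catch block (this assumes the @rc builtin, which evaluates to the most recent result code, and an illustrative table foo):
begin try
  insert into foo(id) values(1); -- this might fail with SQLITE_BUSY, etc.
end try;
begin catch
  if @rc == SQLITE_BUSY then
    call printf("busy, maybe retry\n");
  else
    throw; -- rethrow anything we don't handle
  end if;
end catch;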
The nested select statement is frequently misused; in particular, if you get no rows back from the expression, that's an error. So for instance:
set x_ := (select x from foo.x where id = y);
This will throw (with a peculiar error, SQLITE_DONE) if there is no such row.
Sometimes people try to fix this problem with a null check:
set x_ := IFNULL((select x from foo.x where id = y), -1);
That doesn't help at all. It's not a null value situation, there's no row at all.
set x_ := (select IFNULL(x, -1) from foo.x where id = y);
Is likewise unhelpful. To help with this situation we add two forms:
-- useful if foo.x is already known to be not null
set x_ := (select x from foo.x where id = y IF NOTHING -1);
-- useful if foo.x might be null
set x_ := (select x from foo.x where id = y IF NOTHING OR NULL -1);
Both of these deal with the case where there is no row. The second lets you have a simple default for both
no row or null value. That form is equivalent to:
set x_ := (select IFNULL(x, -1) from foo.x where id = y IF NOTHING -1);
i.e. both problem cases are handled.
Of course the -1 here could be any valid expression, even a second (select...)
Previously, if you did an early return from, or fell through the end of, a procedure that is supposed to return a result set
but did not in fact provide one, you would get a fake SQLITE_ERROR. Now you get an empty result set for "free".
This interpretation seems much more natural and avoids a lot of really annoying stub selects to comply with the contract.
This also works for the out statement in the same fashion.
If you want to return an error, use throw. This is a lot more natural...
Further adding to the type calculus of the CQL language, we introduced the ability to encode the "kind" of primitive types. This
can be used in a number of ways -- like "units" for natural quantities and like a "type" for synthetic keys and other such things. It's
easiest to illustrate by example.
declare job_id type long<job_id>;
declare person_id type long<person_id>;
declare j job_id;
declare p person_id;
set p := j; -- this is an error
With the above in place, other expressions like p == j would also produce errors as these long values are no longer type compatible. This is
a great way to add enforcement to your schema and procedures. Likewise you can use these annotations to add "units" to your data types. e.g.
declare meters type real<meters>;
declare grams type real<grams>;
declare m meters;
declare g grams;
Variables of type grams (e.g. g) are not compatible with variables of type meters (e.g. m) even though both are real.
Likewise, attempting to insert grams into a column that is typed to meters will give errors. Of course SQLite doesn't know about any of this
so all the <> stuff is removed in the generated SQL. This is just about type enforcement at compile time.
Enumerations like:
declare enum surface integer (paper, canvas);
declare enum writer integer (pen, paper, brush);
enable this:
declare s surface; -- s is now of type integer<surface>
declare w writer; -- w is now of type integer<writer>
set s := surface.paper; -- ok
set s := writer.pen; -- error
set w := writer.brush; -- ok
case when s == w then 1 else 0 end; -- error (w/s not comparable)
set w := s; -- error again
additionally in DML/DDL:
create table draw_action(
w writer,
s surface
);
insert into draw_action values(w, s); -- ok
insert into draw_action values(s, w); -- error!
So the type kinds can be quite helpful when dealing with loose variables.
The notion of specific types was added to the language nearly two years ago to support the object type, because there was a great desire
to prevent object<dictionary> being assigned from object<list>. But this "type kind", whether it's units (e.g. "meters", "grams")
or a type name (e.g. "job_id"), adds a lot of high-value type checking.
The kind can be added, stripped, or changed with a cast operation and the type system allows a constant or variable with no kind (e.g. "1")
to mix and match with any kind so long as the base type is compatible as usual. So you get the most value by using the specific type
consistently but you won't go insane adding test cases that use constants for instance.
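For example, continuing with the meters/grams declarations above, a quick sketch of those rules:
set m := g;                        -- error: grams are not meters
set m := cast(g as real<meters>);  -- ok: the cast changes the kind explicitly
set m := 1.5;                      -- ok: the literal has no kind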
As of this writing the expression kinds are checked for compatibility everywhere plus or minus bugs. There are extensive tests.
A common source of errors in stored procedures is incorrect typing in arguments. For instance, a particular key
for an entity might need to be LONG or even always LONG NOT NULL or LONG NOT NULL @SENSITIVE and the only
way to do this in the past was maybe with some #define thing. Otherwise you have to diligently get the type right
in all the places, and should it ever change, again you have to visit all the places. To help with this situation,
and to make code a little more self-describing we add named types to the language. This is a lot like typedef in
the C language. They do not create different incompatible types but do let you name things well.
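A sketch of how that might look (the type name person_key here is illustrative):
declare person_key type long not null @sensitive;
-- one declaration used everywhere; changing the type later is a one-line edit
declare pk person_key;  -- same as: declare pk long not null @sensitive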
Language support for virtual tables has lagged since I always thought they were of little interest to
the language anyway. The CREATE TABLE forms in general are only declarations (except if you're doing
the schema installer/upgrader output) and so you could just declare say a temp table that corresponds to the
virtual table that you made in the same way that you might declare say sqlite_master if you wanted to use it.
And since you have to register the module anyway, you may as well create the virtual table at the same time.
So there was no point in adding language support for the thing.
Furthermore the CREATE VIRTUAL TABLE form includes no information about the schema of the table so you'd
need some kind of declaration anyway in order to tell the language what the columns are for the table you
just created. So again you may as well just declare it like a normal table and not include that table in your
schema upgrade file and be done with it.
And that was my thinking for the last 2 years. And then I learned something.
Virtual tables are durable.
Yeah, I always assumed that virtual tables were temp tables and they vanish and have to be redeclared every
session but that is not the case. They are part of the durable schema so while you must pre-register the
module associated with the virtual table, the virtual table is like other tables in that you only create it
once and from then on it's part of the schema every time the database loads until you DROP it.
This changes everything.
With virtual tables being durable they belong in the schema upgrade process. And if they go there they also
have to go into the JSON output. But we can't use the vanilla syntax that SQLite uses because that syntax is:
not parseable, because the module arguments can be literally anything (or nothing), even a letter to your grandma
the arguments do not necessarily say anything about the table's schema at all
So in the CQL language we changed the syntax a bit; the generalized form looks like this:
create virtual table virt_table using my_module [(module arguments)] as (
id integer not null,
name text
);
The part after the AS is used by CQL as a table declaration for the virtual table. The grammar for that
is exactly the same as a normal CREATE TABLE statement. However that part is not transmitted to
SQLite; when the table is created, SQLite sees only the part it cares about, the part before the AS.
Now this leaves the module arguments, which can be:
no arguments at all
a list of identifiers, constants, and parenthesized sublists just like in the @attribute form
create virtual table virt_table using my_module as (
id integer not null,
name text
);
becomes (to SQLite)
CREATE VIRTUAL TABLE virt_table USING my_module;
Note: empty argument lists (USING my_module()) are not allowed per the SQLite docs but do seem to work in SQLite.
We take the position that no args should be done with no parens, at least for now.
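For example, a sketch with module arguments (the argument values are purely illustrative):
create virtual table virt_table using my_module('foo', 1, (bar, 0.5)) as (
  id integer not null,
  name text
);
becomes (to SQLite)
CREATE VIRTUAL TABLE virt_table USING my_module('foo', 1, (bar, 0.5));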
Virtual tables go into their own section in the JSON and they include the module and moduleArgs entries, they are additionally
marked isVirtual in case you want to use the same processing code for virtual tables as normal tables. The JSON format is otherwise
the same, although some things can't happen in virtual tables (e.g. there is no TEMP option, so "isTemp" must be false in the JSON).
For purposes of schema processing, virtual tables are on the @recreate plan, just like indices, triggers, etc. This is the only option since
the alter table form is not allowed on a virtual table.
Semantic validation enforces "no alter statements on virtual tables" as well as other rules, like no indices and no triggers on virtual
tables, since SQLite does not support any of those things.
Finally, because virtual tables are on the @recreate plan, you may not have foreign keys that reference virtual tables. Such keys seem
like a bad idea in any case.
There are many cases where stored procedures require complex arguments using data shapes well known
to higher level languages or that come from the schema. There is already some affordance for this
sort of thing in the form of this kind of pattern:
(I'll continue to use this simple example as I discuss the generalization below)
create table Person (
id text primary key,
name text not null,
address text not null,
birthday real
);
Then maybe something like this:
create proc insert_person(like Person)
begin
insert into Person from arguments;
end;
The above expands into:
create proc insert_person(
id_ text not null,
name_ text not null,
address_ text not null,
birthday_ real)
begin
insert into Person(id, name, address, birthday)
values(id_, name_, address_, birthday_);
end;
And I think we can all agree the sugared version is a lot easier to reason about
and much less prone to errors as well.
Those features have been in the language for a long time and that's all fine and well
but it isn't general enough to handle the usual mix of situations. For instance what
if you need a procedure that works with two people? A hypothetical insert_two_people
procedure cannot be written with the old form. This is where argument bundles come in.
The idea here is to name the bundle, which provides a useful reference. To wit:
create proc insert_two_people(p1 like Person, p2 like Person)
begin
call insert_person(from p1);
call insert_person(from p2);
end;
or alternatively
create proc insert_two_people(p1 like Person, p2 like Person)
begin
insert into Person from p1;
insert into Person from p2;
end;
So what's going on here? Well, there are lots of reasons to keep the API to procedures simple
and adding general purpose structured types would be at odds with that. It would require lots
of knowledge about C structure layout and whatnot. And trying to call from Java would require
very complex JNI for any such procedure. So we avoid all that. We keep simple arguments.
The above expands into:
create proc insert_two_people(
p1_id text not null,
p1_name text not null,
p1_address text not null,
p1_birthday real,
p2_id text not null,
p2_name text not null,
p2_address text not null,
p2_birthday real)
begin
insert into Person(id, name, address, birthday)
values(p1_id, p1_name, p1_address, p1_birthday);
insert into Person(id, name, address, birthday)
values(p2_id, p2_name, p2_address, p2_birthday);
end;
Of course the types don't have to be the same; you can create and name shapes of your choice. The language allows
you to use an argument bundle in all the places that a cursor was previously a valid source. That includes insert,
fetch, update cursor, and procedure calls. You can refer to the arguments by their expanded name p1_address
or alternatively p1.address means the same thing.
Here's another example showing a silly but illustrative thing you could do:
create proc insert_lotsa_people(P like Person)
begin
declare C cursor like P;
fetch C from P;
declare i integer not null;
set i := 0;
while (i < 20)
begin
  update cursor C using
    printf("id_%d", i) id;
  insert into Person from C;
  set i := i + 1;
end;
end;
The above shows that you can use a bundle as the source of a shape elsewhere, and you can
use a bundle as a source of data to load a cursor. After which you can do all the usual value cursor things
like out statements and so forth.
In order to call procedures with argument bundles more readily from other languages, the JSON output now includes additional
information about where procedure arguments originated. The field with this information is creatively called "argOrigin"
and it has 3 forms:
"arg_name" -> the argument is not an expansion of anything
"T arg_name" -> the argument came from like T
there will be one arg for each member of T
the formal argument name for this arg will be arg_name
if T is procedure arguments like p1 arguments then you'll get "p1[arguments] arg_name"
"name T arg_name" -> the argument came from name like T (a named bundle)
there will be one arg for each member of T
the formal argument name for this arg will be name_arg_name
T could be procedure arguments as above
If the source of an argument was a cursor or argument bundle, you instead get that thing's shape source name;
this is always better because cursor names and bundle names are not globally unique.
If the cursor had an anonymous source (e.g. like select 1 x) then you get the useless shape name "select"
this is an indicator that you should make some ad hoc struct for this procedure because there is no useful name for the arg bundle's type
None of this matters unless you're trying to make wrappers for a CQL procedure for some other language
and you'd like to have your wrapper deal with structs rather than all loose arguments. The JSON
basically tells you the structs.
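For instance, for insert_two_people above the argument entries would carry origins like this sketch (other JSON fields elided):
{ "name" : "p1_id", "argOrigin" : "p1 Person id", ... }
{ "name" : "p2_name", "argOrigin" : "p2 Person name", ... }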
Interestingly, argument bundles resulted in a significant reduction of code in the compiler. The argument bundle
name has to be usable in the contexts where a cursor was previously usable. It is another source of shaped data.
Getting that to work proved to be super simple as the two forms look almost identical to the compiler -- no coincidence there.
So very little code was required to make from [cursor_name] work with from [any_shape_name] in the half dozen or so places
that this construct is allowed (e.g. procedure call arguments, insert statements, etc.). However, there was as
much code associated with from arguments as there was with from cursor_name. And the code was nearly identical.
When argument bundles were introduced the natural thing to do was to create an artificial bundle called "arguments" which
represents the bundle that is ALL the arguments. With that done, all the code for from arguments could be deleted
because arguments itself was a valid shape name. Hence insert into T from arguments "just works". And so half
the rewrites were deleted. The only cost was that the form from arguments like shape became the cursor form
from arguments(like shape) which only adds mandatory parens to a form that was largely unused anyway (there were two
cases in our entire codebase). The cursor form is more general as you can do from C(like A, like B) to get the
fields that match A then those that match B. Arguments get this for free as well (well, at the cost of parens).
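For example (a sketch, with A and B standing in for any two shapes whose fields C contains):
-- pass C's fields that match shape A, then those that match shape B
call some_proc(from C(like A, like B));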
So overall, this feature was added, and the compiler got smaller and cleaner. Only the test suite had to grow.
There is an unfortunate pattern of hard coding constants in SQL which I think comes from the
fact that there's not an especially good way to encode constants in SQL. Things are a little
better in CG/SQL's CQL language because it's normal to run things through the pre-processor first
so you can do things like:
#define BUSINESS_TYPE_RESTAURANT 1
#define BUSINESS_TYPE_LAUNDROMAT 2
Having done so, you could write:
insert into Business using
"Rico's Laundry" name,
BUSINESS_TYPE_LAUNDROMAT type;
-- by the time SQL sees this it becomes
insert into Business(name, type) values('Rico''s Laundry', 2);
And at least you don't have to see these loose '2' values all over. An especially unfortunate
form is the one below, in which the author is clearly crying out for a symbol to use:
insert into Business using
"Rico's Laundry" name,
2 type; /* laundromat */
But if we use #define the language knows nothing of the names and it can't help you manage them
or export them consistently or anything like that. I guess #define is pretty useful in several
languages (C and C++) so you could maybe #include the macros somehow but that doesn't seem
especially great. And if you need them in Java you're getting no help at all.
So to this world we add enumerated constants. This is a bit short of enumerated types as we'll
see later. You can now write something like this:
declare enum business_type integer (
restaurant,
laundromat,
corner_store = 11+3 /* math added for demo purposes only */
);
After this:
select business_type.corner_store;
is the same as
select 14;
And that is exactly what SQLite will see, the literal 14.
What's going on here? There's just a few rules:
the enumeration can be any numeric type (bool, integer, long integer, real)
the values of the enumeration start at 1 (i.e. if there is no = expression the first item will be 1, not 0)
if you don't specify a value, the next value is the previous value + 1
if you do specify a value it can be any constant expression and it will be cast to the type of the enumeration (even if that is lossy)
the enumeration can refer to previous values in itself with no qualification (big = 100.0, medium = big/2, small = medium/2)
the enumeration can refer to previously defined enumerations as usual (code = business_type.restaurant)
Once the enumeration is defined you refer to its members in a fully qualified fashion enum_name.member_name elsewhere
Why is this better than macros? Well for one thing the enum values can be checked at their declaration site, so if you
have errors you will hear about them in a more reasonable place. But additionally since the structure is known to the
compiler it can give you useful information in the outputs.
In the .h files you get:
enum business_type {
business_type__restaurant = 1,
business_type__laundromat = 2,
business_type__corner_store = 14
};
In the case of floating point values, such as:
declare enum floating real (
one = 1.0,
two = 2.0,
e = 2.71828,
pi = 3.14159
);
You get:
// enum floating (floating point values)
#define floating__one 1.000000e+00
#define floating__two 2.000000e+00
#define floating__e 2.718280e+00
#define floating__pi 3.141590e+00
Which is unfortunately the best you can do since C has no floating point enums.
But in both cases the enums section of the JSON has the name of the enums and their members and values ready to go.
With these values you can readily generate (with moustache or something) the language interfaces of your choice. This
is a real help if you're trying to make helpers to call your CQL from say Java or something.
To do all this we needed to add some constant folding and general evaluation to the compiler. It's not much,
just the normal numeric types and null. The supported operations include:
+, -, *, /, %, |, &, <<, >>, ~, and, or, not, ==, <=, >=, !=, <, >, the cast operator
and the case forms. These are enough to make a lot of very interesting expressions, all of which are evaluated at
compile time.
While the constant folding was added to allow for rich enum expressions, there is also the const() primitive in the
language now which can appear anywhere a literal could appear. This allows you do things that were previously not
allowed such as:
create table something(
x integer default const((1<<16)|0xf) /* again the math is just for illustration */
);
The const form is also very useful in macros:
#define SOMETHING const(12+3)
This form ensures that the constant will be evaluated at compile time. const can also nest, so you can build these
kinds of macros from other macros, or you can build enums this way. Anywhere you might need literals, you can use const.
Importantly, no enumerated data types were added to the language to do any of this. The values can help you to
achieve some correctness by avoiding transcription mistakes but there is no additional type-safety provided here.
Indeed given the rich mix between these types in SQLite, and with SQLite having no knowledge of enumerations at
all it would be tricky to do a complete job. Still, this might happen in the future.
But for now, declaring constants that are really an intimate part of your schema is now possible and the addition
of the constants to the .h files and the .json output should hopefully make these generally useful. At least
we might see less of the hard-coded constant business with good values baked right into the schema declarations.
Everyone knows the usual expression syntax x LIKE y to do a string match. But the CQL compiler also uses
LIKE in a different way that's powerful and important. CQL has the notion of data shapes and you use
LIKE to refer to them. The simplest source of a data shape, and maybe the most common, is a table.
Maybe something like this:
create table T(
  id integer not null,
  name text not null,
  age integer not null
);
Now suppose you want to write a procedure that can insert a row into that table. You could write this:
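create proc insert_into_T(id_ integer not null, name_ text not null, age_ integer not null)
begin
  insert into T(id, name, age) values(id_, name_, age_);
end;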
This is all fine and well but what if T had 50 columns? That gets old fast. And how can you
be sure that you inserted the columns into T in the right order? This second example also compiles
even though it's clearly wrong:
insert into T(id, name, age) values(age_, name_, id_);
And of course you can imagine things get only more complicated with more columns in T.
We started adding the LIKE form to ease these issues and to ensure some consistency in the APIs while preventing
simple transposition errors. So you can instead write:
create proc insert_into_T(like T)
begin
insert into T from arguments;
end;
so here the like T in the argument list simply means "make arguments that are the same as the columns of table T" -- well,
almost. It also adds an _ to the end of each name so you end up with exactly the same declaration as the long form above.
But you won't miss any arguments, and they'll be in the right order.
And notice that we used from arguments to indicate that we wanted the values to come from the arguments in order. Again
this saves you from a lot of typing and a lot of error checking. You can't get the arguments in the wrong order.
These are the most basic patterns. But there are quite a few more.
Let's suppose you want to write a procedure that returns the row with the highest age from the table above. Maybe you write
something like this:
create proc highest_age()
begin
  declare C cursor for select * from T;
  declare M cursor like C;
  loop fetch C
  begin
    if (not M or M.age < C.age) then
      fetch M from C;
    end if;
  end;
  out M;
end;
Here we made a cursor M that is the same as the cursor C and then we are going to generate a single row result
from the cursor. Note that if you use a cursor name like M in an expression it refers to the hidden boolean
that says if the cursor has a row in it or not. So M begins empty and we will load it if it's empty or if the age
is higher than what we've already got.
Let's show a few more of the forms. Suppose we don't want to return name, just the id and the age. We can
change things up a tiny bit.
create proc highest_age()
begin
  declare C cursor for select * from T;
  declare M cursor like select 1 id, 99 age;
  loop fetch C
  begin
    if (not M or M.age < C.age) then
      fetch M from cursor C(like M);
    end if;
  end;
  out M;
end;
So two things to notice. We used an ad hoc shape, making a fake select statement that returns the shape we want. This
select doesn't run but it does define types and columns easily. Two not null integers in this case. Now M is not the
same as C, so we can't use the simplest form fetch M from C; we have to use the more general form.
Fully expanded, what we wrote becomes:
FETCH M(id, age) FROM VALUES(C.id, C.age);
But as you can see, we didn't have to type all those column names. And that's kind of the point of the LIKE construct.
So we've covered a bunch of the shape sources already:
a table name
a cursor name
a select statement that gives the shape in an ad hoc fashion
There are three more
a view name
the return shape of a procedure that returns a result set
the arguments of a procedure
View names are pretty simple, and they work the same as table names so we don't need to discuss those. Let's look
at some of the other uses with procedures.
Suppose we have a procedure that can return a result set shape but we want to be able to mock its results so we
can fake whatever result we need for testing.
We'll complicate this a bit by adding a new table (keeping table names short in the sample to save typing):
create table U(
  id integer not null,
  email text not null
);
And here's a procedure:
create proc my_proc()
begin
  select T.*, U.email from T inner join U on T.id = U.id;
end;
Now we want to be able to make any fake result we want, so maybe we want a temp table. No problem:
create proc _init_fake_results()
begin
  create temp table if not exists fake_results(
    like my_proc
  );
end;

create proc add_fake_result(like fake_results)
begin
  insert into fake_results from arguments;
end;

create proc get_fake_results()
begin
  select * from fake_results;
end;
The above is very generic and will maintain well. You can see we made a temp table that will have
exactly the same shape as whatever my_proc returns. In this case it becomes:
CREATE PROC _init_fake_results ()
BEGIN
  CREATE TEMP TABLE IF NOT EXISTS fake_results(
    id INTEGER NOT NULL,
    name TEXT NOT NULL,
    age INTEGER NOT NULL,
    email TEXT NOT NULL
  );
END;
And the rest are patterns we've seen before.
The last source of shapes is procedure arguments. There are lots of good cases for those; I wrote an entry on them previously, but I'll give a simple example here too.
Suppose we have this weird procedure:
create proc delete_stuff(age_ integer, name_ text)
begin
  if age_ is not null then
    delete from T where T.age = age_;
  end if;
  if name_ is not null then
    delete from T where T.name = name_;
  end if;
end;
What if we wanted to log any errors that happen here? Maybe make a version that logs. We can do it with a wrapper, something like this (the wrapper name below is illustrative):
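create proc delete_and_log_stuff(like delete_stuff arguments)
begin
  call printf("deleting stuff\n"); -- or whatever
  call delete_stuff(from arguments);
end;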
The nice thing about this logging wrapper procedure is that if delete_stuff changes, the wrapper will change with it.
That covers all of the shape sources and as we saw we can use them to create things like cursors, tables, and argument lists.
We can use them to specify a subset of columns that might be of interest when fetching or updating cursors. And we can use
them in one last way -- to restrict arguments to a particular shape. Let's see how that works by making the previous logger
a little different. Here we add an argument which tells us if we should log. That might look like it would
spoil the from arguments part of the forwarding, but there is one final way to use LIKE:
call printf("deleting %d\n", name_);-- or whatever
endif;
call delete_stuff(from arguments like delete_stuff arguments);
end;
So this form lets you use some of your arguments, the ones that match a certain shape. And as we saw in
the previous article you can also use from C to pass arguments where C is a cursor and in that case
you can also specify that arguments be matched by name from C like shape. In both those cases the
formal parameter names of the called procedure are matched against the names of the shape and passed in
the order of the formals. So this is like "call by name", the fields of the cursor or the order of
arguments in the argument list might be different than the formals but you'll get the correct items
in the correct order regardless, because it matches by name.
These forms can save you a lot of typing... and are excellent at avoiding errors and improving maintainability.
Where they appear in SQL statements, everything is expanded before it goes to SQLite so SQLite will see
normal syntax forms. Which is good because obviously SQLite doesn't know anything about this enhanced
LIKE business.
In the examples above there were only one or two columns with very short names, but in real world code
there can easily be dozens of columns with very long names. In those cases, these forms really shine.
Following up on the last blog entry, I thought it would be useful to present a simple error tracing macro that you can use
to see what kind of error flow is going on when you're having trouble understanding why a procedure is returning
an error code. The idea is we want to create a macro that we can use like this:
BEGIN_VERBOSE_STDERR_TRACING;
-- Some procedure(s) that you want to trace
END_VERBOSE_STDERR_TRACING;
We can do that with something like the macros below. These particular ones cause the output to go to stderr via fprintf, but if that isn't what you need you can simply edit the macro. The macros look like this:
-- manually force tracing on by redefining the cql_error_trace macro
#define BEGIN_VERBOSE_STDERR_TRACING \
@echo c, "#undef cql_error_trace\n"; \
@echo c, "#define cql_error_trace() fprintf(stderr, \"CQL Trace at %s:%d in %s: %d %s\\n\", __FILE__, __LINE__, _PROC_, _rc_, sqlite3_errmsg(_db_))\n"
#define END_VERBOSE_STDERR_TRACING \
@echo c, "#undef cql_error_trace\n"; \
@echo c, "#define cql_error_trace()\n"
So basically it's telling CQL to emit a #define into its output stream. In this case:
#define cql_error_trace() fprintf(stderr, "CQL Trace at %s:%d in %s: %d %s\n", __FILE__, __LINE__, _PROC_, _rc_, sqlite3_errmsg(_db_))
You could change that to any function you like, you can have it dump the errors where you like, or you can make it some dummy function you add so that you can set a breakpoint on it.
Whatever you do, do not leave your code with this sort of tracing enabled -- it's far too expensive in terms of code size. But it's perfect if you have this one procedure that is failing and it's hard for you to see where.
Obviously if you're making a custom trace thingy you don't need the macro at all, you can just emit your own #define with @echo as needed.
Note: @echo is quite a sledgehammer so don't use it lightly and not in production code but it is quite helpful for this sort of thing. CQL tests often use it to help make things visible to the tests. If you use @echo in weird ways you might not get working code when the codegen changes in the future.
The relevant state that is available to you inside a macro like this is:
__FILE__ the current filename (comes from the C pre-processor, this is the .c file name not the .sql)
__LINE__ the current line number (comes from the C pre-processor, this is the .c line number)
_rc_ the current SQLite result code (always the current return code in every CQL procedure that uses SQLite)
_db_ the current SQLite database pointer (always the current database in every CQL procedure that uses SQLite)
_PROC_ the current procedure name (CQL has a #define for this for you)
Today we made a couple of minor changes in the code generation to take care of some lingering issues.
The first is that when you did a throw inside a catch to basically rethrow the error, you would lose
the error code if something had succeeded within the catch handler.
The old codegen looked something like this:
catch_start_1: {
printf("error\n");
cql_best_error(&_rc_);
goto cql_cleanup;
}
The problem being that while the printf above is fine and well, if you did any SQL operation then _rc_ would be
clobbered and you'd end up throwing an unrelated error code. cql_best_error would at least make sure it was
a failure code (SQLITE_ERROR) but the original error code was lost.
The new code looks like this:
catch_start_1: {
_rc_thrown_ = _rc_;
printf("error\n");
_rc_ = cql_best_error(_rc_thrown_);
goto cql_cleanup;
}
So now if there are db operations, the original return code is still preserved. Note: you still lose sqlite3_errmsg() because
SQLite doesn't know that cleanup logic is running.
This brings us to the second new thing: general purpose error traces.
Error checking of result codes happens very consistently in CQL output. The usual pattern looks something like this:
_rc_ = cql_exec(_db_,
"SAVEPOINT base_proc_savepoint");
if (_rc_ != SQLITE_OK) goto cql_cleanup;
or if it's inside a try block, it's a little different... very little actually:
// try
{
_rc_ = cql_exec(_db_,
"RELEASE SAVEPOINT base_proc_savepoint");
if (_rc_ != SQLITE_OK) goto catch_start_8;
// ... the rest of the try block
}
Basically if the local _rc_ doesn't match the necessary condition we goto the appropriate error label: either the relevant
catch block or else the procedure's cleanup code.
We generalize this a bit now so that it looks like this:
if (_rc_ != SQLITE_OK) { cql_error_trace(); goto cql_cleanup; }
-- or, in a catch...
if (_rc_ != SQLITE_OK) { cql_error_trace(); goto catch_start_8; }
Now the default implementation of cql_error_trace() is in cqlrt.h which you can and should customize. I'll be writing more
about that later but suffice to say you're supposed to replace cqlrt.h and cqlrt.c with suitable runtime helpers for your environment
while keeping cqlrt_common.h and cqlrt_common.c fixed.
So for instance, your cqlrt.h could look like this:
#ifndef CQL_TRACING_ENABLED
#define cql_error_trace()
#else
// whatever tracing you want, for example this might help in test code.
#define cql_error_trace() \
fprintf(stderr, "Error at %s:%d in %s: %d %s\n", __FILE__, __LINE__, _PROC_, _rc_, sqlite3_errmsg(_db_))
#endif
So then when you need to debug problems involving lots of error recovery you can watch the entire chain of events easily.
Note that there are some useful variables there:
In any procedure _db_ is the current database and _rc_ is the most recent return code from SQLite. __FILE__ and __LINE__
of course come from the C pre-processor, and _PROC_ (one underscore) is now generated by the compiler. Every procedure's
body now begins with:
#undef _PROC_
#define _PROC_ "the_current_procedure"
So by defining your own cql_error_trace macro you can cause whatever logging you need to happen. Note this can be
very expensive indeed because this happens a lot and even the string literals needed are a significant cost. So generally
this should be off for production builds and enabled as needed for debug builds.
The default implementation is just an empty block
#define cql_error_trace()
But the hook is enough to light up whatever logging you might need, and you can use sqlite3_errmsg() before that message is gone.
I was reviewing the update posting that just went out and I realized I'd forgotten to mention another big ticket item. So
consider this an appendix to the update.
In some cases we started seeing a need to "ship cursors around" a little bit more flexibly.
Note that shipping values around is already doable, so this new work is largely about being able to create a "statement cursor"
in one procedure and consume it safely elsewhere. The general pattern looks like this:
Declare a statement cursor as usual, maybe something like this:
declare C cursor for select * from shape_source;
-- or
declare C cursor for call proc_that_returns_a_shape();
Make an object that can hold a cursor:
declare obj object<T cursor>;
Where T is the name of a shape. It can be a table name, or a view name, or it can be the name of the canonical procedure that returns the result. You really want this to be some kind of global name though. Something you can get with a #include in various places. In this case choices for T might be shape_source the table or proc_that_returns_a_shape the procedure.
Remember you can always make a fake procedure that returns a result to sort of typedef a shape name. e.g.
declare proc my_shape() (id integer not null, name text);
The procedure here my_shape doesn’t have to actually ever be created, in fact it’s probably better if it doesn’t. You won’t call it, you’re just using its hypothetical result as a shape. This could be useful if you have several procedures like proc_that_returns_a_shape that all return my_shape.
At this point you could use the cursor, maybe something like:
loop fetch C
begin
  -- do stuff with C
end;
Those are the usual patterns; they let you consume a statement cursor sort of “up” from where it was created. But what if you want some worker procedures that consume a cursor? There was no good way to pass your cursor down. Well, there wasn't. Now there is. Let's go back to that box object creation and use it:
-- recap: declare the box that holds the cursor (T changed to my_shape for this example)
declare obj object<my_shape cursor>;
-- box the cursor into the object (the cursor shape must match the box shape)
set obj from cursor C;
The variable obj can now be passed around as usual. Then, later, you can "unbox" it to get a cursor back, like so:
-- unboxing a cursor from an object
declare D cursor for obj;
These primitives will allow cursors to be passed around with managed lifetime.
Example:
-- consumes a boxed cursor (this is the cursor_user called below)
create proc cursor_user(box object<my_shape cursor>)
begin
  declare C cursor for box; -- the cursor's shape will be my_shape, matching the box
  loop fetch C
  begin
    -- do something with C
  end;
end;
-- captures a cursor and passes it on
create proc cursor_boxer()
begin
  declare C cursor for select * from something_like_my_shape;
  declare box object<my_shape cursor>;
  set box from cursor C; -- produces an error if the shape doesn't match
  call cursor_user(box);
end;
Importantly, once you box a cursor the underlying SQLite statement’s lifetime is managed by the box object with normal
retain/release semantics so timely release becomes imperative.
With this pattern it's possible to, for instance, consume some of the rows in one procedure and the rest in another procedure.
Now, the main reason for doing this is if you have some standard helper methods that can get a cursor from a variety of places and process it.
But remember, boxing isn’t the usual pattern at all and returning cursors in a box, while possible, should be avoided in favor of the simpler
pattern of doing your select or call at the end to compute the result as we do now, if only because then the lifetime is very simple in all those cases.
Durably storing a boxed cursor could lead to all manner of problems -- it's just like holding on to a sqlite3_stmt * for a long time.
Actually "just like" is an understatement, it's exactly the same as holding on to a statement for a long time with all the same problems because that
is exactly what's going on here.
So, good generalization, but possibly less Pit of Success, especially with complex box patterns. So watch the sharp edges.
It's hard to believe it's been a month since the welcome message went up. We were
happy to see interest right away and even a bunch of forks but most of all
pull requests. A sweeping change to modernize the cql.y grammar was much
appreciated. That $1 stuff was very old school (I'm showing my age now).
@mingodad gave us an implementation of check and collate column attributes (the check attribute on tables should be easy to add from here)
the select function form can never return objects, only SQLite types; this is now enforced
@attribute(cql:suppress_result_set) was added to save code gen for procedures that don't need the C result set wrappers
cql_cursor_diff_col and cql_cursor_diff_val methods were added to report what's different about two cursors (highly useful in test code)
cql_cursor_format was added so you can quickly convert any cursor into columns and values as string for debug output (no matter the shape)
sqlite3_changes was added to the builtin list so you don't have to use declare select function to use it anymore
cql_get_blob_size was added so you can see how big your blobs are (useful for diagnostics)
trim, rtrim and ltrim were added to the builtin list so you can use them without declare select function
the builtin function attest_notnull was added so that nullables that have already been checked can be safely typecast to not null
the bug we saw in demo video number 2 where some foreign keys were not properly linked up in autotest code was fixed (yay videos)
time functions are now known to be not null for a bunch of simple cases such as 'now' arguments
you can use the cast(.. as ..) operator on numeric types outside of the SQL context
@mingodad replaced all the positional references by named references in cql.y (yes! thank you!)
several minor bug fixes
the railroad diagrams were updated
NOTE: I often refer to "sugar" in the below. This is short for syntactic sugar which, in case you're not familiar with the term, refers to a syntactically more pleasing way of writing a concept that is otherwise totally doable with normal syntax. Many languages have sugar for forms that are common -- for brevity, clarity, and/or correctness.
We often add new features to the language to facilitate the writing of tests. The tests have a lot of boilerplate often setting up
and calling the same procedures again and again with slightly different arguments. Long argument lists and long insert column
lists are especially problematic as these can be very error prone. Here good language constructs are very helpful.
We've found good test constructs are often invaluable in production code as well, though in our experience the
tests often have a lot more repetition that needs refactoring than production code. To that end we added some very useful things
in the last month:
Declare cursors in the shape of a procedure's arguments and use them
The most common way to create a cursor is from a select statement but you can also make a cursor that can hold values for you
by declaring it to be LIKE something else with a shape. A classic example is:
declare C cursor like some_table;
Now C has the same columns and types as some_table
Many procedures have a result type that is also a shape, for instance any procedure that ends with a select statement has a result
shape defined by the columns of the select statement. You could always do this sort of thing:
declare C cursor like some_proc;
Meaning make C a cursor whose shape is whatever some_proc returns, which is of course exactly the kind of cursor you need to capture
the result of some_proc.
Now we add:
declare C cursor like some_proc arguments;
The idea being that the arguments of some_proc are also a shape (unless it has none). With this done you want to use that cursor
to call the procedure -- that being sort of the whole point. So we add this:
call some_proc(from C);
How do we use this effectively? Hold on just a second -- for that answer we need one more big tool to really help the syntax.
Loading up a cursor is done with syntax that is very much like an insert statement. An example might be something like this:
fetch C(x, y, z) from values(1, 2, 3);
This is simple enough but it becomes more problematic if there are many values and especially if the values have complex names.
To make this a little less error prone, CQL now has this sugar form for fetch, insert, and soon update cursor (maybe even
before you read this blog). The more readable form is:
fetch C using
1 x,
2 y,
3 z;
This form has the values next to their names just like in a select statement; like all sugars, it is automatically rewritten to the normal form (the example above becomes exactly the fetch C(x, y, z) from values(1, 2, 3); shown earlier).
Let's suppose you have to write a test. You have a procedure test_subject that takes some arguments plus
you have another helper procedure test_setup that puts seed data in the right places for your subject.
But there are many variations and a lot of what you do between variations is the same. How can you write this
economically, making it clear what is different between variations, without a lot of fuss?
Well, you can do something like the sketch below: name only the values this particular variation cares about and let the seed fill in everything else (the column names below, and the use of @dummy_seed to fill the unnamed columns, are illustrative):
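declare seed integer not null;
set seed := 1;
declare args cursor like test_subject arguments;
-- use explicit values for the named columns...
fetch args(id) from values(1234)
  @dummy_seed(seed); -- ...and 'seed' fills everything else that isn't named
call test_setup(seed);        -- hypothetical setup helper
call test_subject(from args);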
Importantly, the cursor set operations are all by name so the order doesn't matter. Which means even if there are many arguments
you don't have to worry that you got them in the wrong order or that they are the wrong type. Effectively you have
a simple call by name strategy and you can easily read off the arguments. You could do something similarly brief with
helper functions to provide the default arguments but then you can't readily re-use those arguments in later calls or
for verification so this way seems a lot more useful in a test context.
When it comes time to validate, probably your test subject is returning a cursor from a select that you want to check.
A slightly different call will do the job there.
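For instance, a sketch using the cursor diff helper from the feature list above (the expected columns and values are illustrative):
declare actual cursor for call test_subject(from args);
fetch actual;
declare expected cursor like actual;
fetch expected using
  1234 id,
  'some name' name;
-- cql_cursor_diff_val reports what's different between the two cursors
call printf("%s\n", ifnull(cql_cursor_diff_val(actual, expected), "match"));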
It kills a lot of boilerplate resulting in tests that are much clearer.
And that's what's been going on for the last month in CG/SQL land.
If you got this far thanks for reading. If you didn't get this far,
you aren't reading this anyway so thanking you is moot =P
Stay safe.
Rico for CG/SQL
P.S. most of these fragments don't actually compile because of missing schema and maybe the odd typo. If there is interest I'll make a demo that
works soup to nuts.
Thank you for visiting the CG/SQL blog page. If you would like to read the very first blog post announcing the project, please go over to the Facebook Engineering post published in early October 2020.