
Introducing @RC builtin variable

CG/SQL Team

Maintainer of CG/SQL

We've long needed a way to see the most recent SQLite result code in the context of, say, a catch block. (Most other times you can assume SQLITE_OK was the last result code; otherwise control flow would have transferred elsewhere. Sometimes SQLITE_ROW or SQLITE_DONE might be the current result code.)

Soon we'll provide a sample header that declares the most common error codes in an enum but for now you could do something like this:

-- pasted from sqlite.c
#define SQLITE_BUSY 5 /* The database file is locked */

-- this is a contrived example
create proc get_first_foo(out can_retry bool not null)
begin
  -- can_retry is set to 0 automatically, language semantics guarantee this
  begin try
    select foo from bar limit 1;
  end try;
  begin catch
    set can_retry := (@rc == SQLITE_BUSY);
    throw; -- rethrow the original error
  end catch;
end;

Introducing Select .. If Nothing

CG/SQL Team

Maintainer of CG/SQL

The nested select statement is frequently misused; in particular, if you get no rows back from the expression, that's an error. So for instance:

set x_ := (select x from foo.x where id = y);

This will throw (with a peculiar error, SQLITE_DONE) if there is no such row.

Sometimes people try to fix this problem with a nullcheck:

set x_ := IFNULL((select x from foo.x where id = y), -1);

That doesn't help at all. It's not a null value situation, there's no row at all.

set x_ := (select IFNULL(x,-1) from foo.x where id = y);

is likewise unhelpful. To help with this situation we add two forms:

-- useful if foo.x is already known to be not null
set x_ := (select x from foo.x where id = y IF NOTHING -1);
-- useful if foo.x might be null
set x_ := (select x from foo.x where id = y IF NOTHING OR NULL -1);

Both of these deal with the case where there is no row. The second lets you have a simple default for both no row or null value. That form is equivalent to:

set x_ := (select IFNULL(x,-1) from foo.x where id = y IF NOTHING -1);

i.e. both problem cases are handled.

Of course the -1 here could be any valid expression, even a second (select...).

Change in No-Result Semantics

CG/SQL Team

Maintainer of CG/SQL

Important change in CQL semantics.

Previously, if you did an early return from, or fell through the end of, a procedure that is supposed to return a result set but did not in fact provide one, you would get a fake SQLITE_ERROR. Now you get an empty result set for "free".

This interpretation seems much more natural and avoids a lot of really annoying stub selects to comply with the contract.

This also works for the out statement in the same fashion.

If you want to return an error, use throw. This is a lot more natural...

examples:

-- this gives you an empty result set if x <= 0
create proc maybe_return(x integer)
begin
  if x > 0 then
    select * from foo where foo.y > x;
  end if;
end;

-- so does this
create proc maybe_return(x integer)
begin
  if x <= 0 then
    return;
  end if;
  select * from foo where foo.y > x;
end;

-- so does this
create proc maybe_out(x integer)
begin
  if x <= 0 then
    declare C cursor for select etc.
    out C;
  end if;
end;

Introducing Type "Kinds"

CG/SQL Team

Maintainer of CG/SQL

Further adding to the type calculus of the CQL language we introduced the ability to encode the "kind" of primitive types. This can be used in a number of ways -- like "units" for natural things and like a "type" for synthetic keys and other such. It's easier to illustrate by example.

declare job_id type long<job_id>;
declare person_id type long<person_id>;
declare j job_id;
declare p person_id;
set p := j; -- this is an error

With the above in place, other expressions like p == j would also produce errors as these long values are no longer type compatible. This is a great way to add enforcement to your schema and procedures. Likewise you can use these annotations to add "units" to your data types. e.g.

declare meters type real<meters>;
declare grams type real<grams>;
declare m meters;
declare g grams;

Variables of type grams (e.g. g) are not compatible with variables of type meters (e.g. m) even though both are real.

Likewise, attempting to insert grams into a column that is typed to meters will give errors. Of course SQLite doesn't know about any of this, so all the <> stuff is removed in the generated SQL. This is just about type enforcement at compile time.

Enumerations like:

declare enum surface integer (paper, canvas);
declare enum writer integer (pen, pencil, brush);

enable this:

declare s surface; -- s is now of type integer<surface>
declare w writer; -- w is now of type integer<writer>
set s := surface.paper; -- ok
set s := writer.pen; -- error
set w := writer.pencil; -- ok
case when s == w then 1 else 0 end; -- error (w/s not comparable)
set w := s; -- error again

additionally in DML/DDL:

create table draw_action(
  w writer,
  s surface
);

insert into draw_action values(w, s); -- ok
insert into draw_action values(s, w); -- error!

So the type kinds can be quite helpful when dealing with loose variables.

The notion of specific types was added to the language nearly two years ago to support the object type, because there was a great desire to prevent object<dictionary> from being assigned from object<list>. But this "type kind", whether it's units (e.g. "meters", "grams") or a type name (e.g. "job_id"), adds a lot of high-value type checking.

The kind can be added, stripped, or changed with a cast operation and the type system allows a constant or variable with no kind (e.g. "1") to mix and match with any kind so long as the base type is compatible as usual. So you get the most value by using the specific type consistently but you won't go insane adding test cases that use constants for instance.

As of this writing the expression kinds are checked for compatibility everywhere plus or minus bugs. There are extensive tests.

Introducing Named Types

CG/SQL Team

Maintainer of CG/SQL

A common source of errors in stored procedures is incorrect typing in arguments. For instance, a particular key for an entity might need to be LONG or even always LONG NOT NULL or LONG NOT NULL @SENSITIVE and the only way to do this in the past was maybe with some #define thing. Otherwise you have to diligently get the type right in all the places, and should it ever change, again you have to visit all the places. To help with this situation, and to make code a little more self-describing we add named types to the language. This is a lot like typedef in the C language. They do not create different incompatible types but do let you name things well.

You can now write these sorts of forms:

declare foo_id type long not null;

create table foo(
  id foo_id primary key autoincrement,
  name text
);

create proc inserter(name_ text, out id foo_id)
begin
  insert into foo(id, name) values(NULL, name_);
  set id := last_insert_rowid();
end;

Refer to the railroad diagram for the grammar details.

Additionally any enumerated type can be used as a type name. e.g.

declare enum thing integer (
  thing1,
  thing2
);

declare x thing;

Enumerations always get "not null" in addition to their base type.

This isn't a very complex feature but we hope that it will help create clearer code that is less likely to have type-related bugs.

Introducing Virtual Tables

CG/SQL Team

Maintainer of CG/SQL

Language support for virtual tables has lagged since I always thought they were of little interest to the language anyway. The CREATE TABLE forms in general are only declarations (except if you're doing the schema installer/upgrader output) and so you could just declare say a temp table that corresponds to the virtual table that you made in the same way that you might declare say sqlite_master if you wanted to use it. And since you have to register the module anyway, you may as well create the virtual table at the same time.

So there was no point in adding language support for the thing.

Furthermore the CREATE VIRTUAL TABLE form includes no information about the schema of the table so you'd need some kind of declaration anyway in order to tell the language what the columns are for the table you just created. So again you may as well just declare it like a normal table and not include that table in your schema upgrade file and be done with it.

And that was my thinking for the last 2 years. And then I learned something.

Virtual tables are durable.

Yeah, I always assumed that virtual tables were temp tables and they vanish and have to be redeclared every session but that is not the case. They are part of the durable schema so while you must pre-register the module associated with the virtual table, the virtual table is like other tables in that you only create it once and from then on it's part of the schema every time the database loads until you DROP it.

This changes everything.

With virtual tables being durable they belong in the schema upgrade process. And if they go there they also have to go into the JSON output. But we can't use the vanilla syntax that SQLite uses because that syntax is:

  • not parseable, because the module arguments can be literally anything (or nothing), even a letter to your grandma.
  • the arguments do not necessarily say anything about the table's schema at all

So in the CQL language I changed the syntax a bit; the generalized form looks like this:

create virtual table virt_table using my_module [(module arguments)] as (
  id integer not null,
  name text
);

The part after the AS is used by CQL as a table declaration for the virtual table. The grammar for that is exactly the same as a normal CREATE TABLE statement. However that part is not transmitted to SQLite; when the table is created, SQLite sees only the part it cares about, the part before the AS.

Now this leaves the module arguments; they can be one of three things:

  1. no arguments at all
  2. a list of identifiers, constants, and parenthesized sublists just like in the @attribute form
  3. the words arguments following

Case 1 Example

create virtual table virt_table using my_module as (
  id integer not null,
  name text
);

becomes (to SQLite)

CREATE VIRTUAL TABLE virt_table USING my_module;

Note: empty argument lists (USING my_module()) are not allowed per the SQLite docs but they do seem to work in SQLite. We take the position that no args should be done with no parens, at least for now.

Case 2 Example

create virtual table virt_table using my_module(foo, 'goo', (1.5, (bar, baz))) as (
  id integer not null,
  name text
);

becomes (to SQLite)

CREATE VIRTUAL TABLE virt_table USING my_module(foo, "goo", (1.5, (bar, baz)));

This form allows for very flexible arguments but not totally arbitrary arguments, so it can still be parsed and validated.

Case 3 Example

This case recognizes the popular choice that the arguments are often the actual schema declaration for the table in question. So

create virtual table virt_table using my_module(arguments following) as (
  id integer not null,
  name text
);

becomes

CREATE VIRTUAL TABLE virt_table USING my_module(
  id INTEGER NOT NULL,
  name TEXT
);

The normalized text (keywords capitalized, whitespace normalized) of the table declaration in the as clause is used as the arguments.

Other details

Virtual tables go into their own section in the JSON and they include the module and moduleArgs entries; they are additionally marked isVirtual in case you want to use the same processing code for virtual tables as normal tables. The JSON format is otherwise the same, although some things can't happen in virtual tables (e.g. there is no TEMP option, so "isTemp" must be false in the JSON).

For purposes of schema processing, virtual tables are on the @recreate plan, just like indices, triggers, etc. This is the only option since the alter table form is not allowed on a virtual table.

Semantic validation enforces "no alter statements on virtual tables" as well as other things like, no indices, and no triggers, since SQLite does not support any of those things.

Finally, because virtual tables are on the @recreate plan, you may not have foreign keys that reference virtual tables. Such keys seem like a bad idea in any case.

Introducing Argument Bundles

CG/SQL Team

Maintainer of CG/SQL

There are many cases where stored procedures require complex arguments using data shapes well known to higher level languages or that come from the schema. There is already some affordance for this sort of thing in the form of this kind of pattern:

(I'll continue to use this simple example as I discuss the generalization below)

create table Person (
  id text primary key,
  name text not null,
  address text not null,
  birthday real
);

Then maybe something like this

create proc insert_person(like Person)
begin
  insert into Person from arguments;
end;

The above expands into:

create proc insert_person(
  id_ text not null,
  name_ text not null,
  address_ text not null,
  birthday_ real)
begin
  insert into Person(id, name, address, birthday)
    values(id_, name_, address_, birthday_);
end;

And I think we can all agree the sugared version is a lot easier to reason about and much less prone to errors as well.

Those features have been in the language for a long time and that's all fine and well, but it isn't general enough to handle the usual mix of situations. For instance, what if you need a procedure that works with two people? A hypothetical insert_two_people procedure cannot be written with the old form. This is where argument bundles come in. The idea here is to name the bundle, which gives you a useful way to refer to its contents. To wit:

create proc insert_two_people(p1 like Person, p2 like Person)
begin
  call insert_person(from p1);
  call insert_person(from p2);
end;

or alternatively

create proc insert_two_people(p1 like Person, p2 like Person)
begin
  insert into Person from p1;
  insert into Person from p2;
end;

So what's going on here? Well, there are lots of reasons to keep the API to procedures simple, and adding general purpose structured types would be at odds with that. It would require lots of knowledge about C structure layout and whatnot. And trying to call from Java would require very complex JNI for any such procedure. So we avoid all that. We keep simple arguments. The above expands into:

create proc insert_two_people(
  p1_id text not null,
  p1_name text not null,
  p1_address text not null,
  p1_birthday real,
  p2_id text not null,
  p2_name text not null,
  p2_address text not null,
  p2_birthday real)
begin
  insert into Person(id, name, address, birthday)
    values(p1_id, p1_name, p1_address, p1_birthday);
  insert into Person(id, name, address, birthday)
    values(p2_id, p2_name, p2_address, p2_birthday);
end;

Of course the types don't have to be the same; you can create and name shapes of your choice. The language allows you to use an argument bundle in all the places that a cursor was previously a valid source. That includes insert, fetch, update cursor, and procedure calls. You can refer to the arguments by their expanded name p1_address, or alternatively p1.address means the same thing.

Here's another example showing a silly but illustrative thing you could do:

create proc insert_lotsa_people(P like Person)
begin
  declare C cursor like P;
  fetch C from P;
  declare i integer not null;
  set i := 0;
  while (i < 20)
  begin
    update cursor C using
      printf("id_%d", i) id;
    insert into Person from C;
    set i := i + 1; -- advance the loop
  end;
end;

The above shows that you can use a bundle as the source of a shape elsewhere, and you can use a bundle as a source of data to load a cursor. After which you can do all the usual value cursor things like out statements and so forth.

In order to call procedures with argument bundles more readily from other languages, the JSON output now includes additional information about where procedure arguments originated. The field with this information is creatively called "argOrigin" and it has 3 forms:

  • "arg_name" -> the argument is not an expansion of anything
  • "T arg_name" -> the argument came from like T
    • there will be one arg for each member of T
    • the formal argument name for this arg will be arg_name
    • if T is procedure arguments like p1 arguments then you'll get "p1[arguments] arg_name"
  • "name T arg_name" -> the argument came from name like T (a named bundle)
    • there will be one arg for each member of T
    • the formal argument name for this arg will be name_arg_name (e.g. p1_id)
    • T could be procedure arguments as above
  • If the source of an argument was a cursor or argument bundle name you get instead that thing's shape source name
    • this is always better because cursor names and bundle names are not globally unique.
  • If the cursor had an anonymous source (e.g. like select 1 x) then you get the useless shape name "select"
    • this is an indicator that you should make some ad hoc struct for this procedure because there is no useful name for the arg bundle's type

None of this matters unless you're trying to make wrappers for a CQL procedure for some other language and you'd like to have your wrapper deal with structs rather than all loose arguments. The JSON basically tells you the structs.

Interestingly, argument bundles resulted in a significant reduction of code in the compiler. The argument bundle name has to be usable in the contexts where a cursor was previously usable. It is another source of shaped data. Getting that to work proved to be super simple as the two forms look almost identical to the compiler -- no coincidence there. So very little code was required to make from [cursor_name] work with from [any_shape_name] in the half dozen or so places that this construct is allowed (e.g. procedure call arguments, insert statements, etc.). However, there was as much code associated with from arguments as there was with from cursor_name. And the code was nearly identical.

When argument bundles were introduced the natural thing to do was to create an artificial bundle called "arguments" which represents the bundle that is ALL the arguments. With that done, all the code for from arguments could be deleted because arguments itself was a valid shape name. Hence insert into T from arguments "just works". And so half the rewrites were deleted. The only cost was that the form from arguments like shape became the cursor form from arguments(like shape) which only adds mandatory parens to a form that was largely unused anyway (there were two cases in our entire codebase). The cursor form is more general as you can do from C(like A, like B) to get the fields that match A then those that match B. Arguments get this for free as well (well, at the cost of parens).

So overall, this feature was added, and the compiler got smaller and cleaner. Only the test suite had to grow.

Stay safe out there.

Introducing Declare Enum

CG/SQL Team

Maintainer of CG/SQL

There is an unfortunate pattern of hard coding constants in SQL which I think comes from the fact that there's not an especially good way to encode constants in SQL. Things are a little better in CG/SQL's CQL language because it's normal to run things through the pre-processor first, so you can do things like:

#define BUSINESS_TYPE_RESTAURANT 1
#define BUSINESS_TYPE_LAUNDROMAT 2

Having done so, you could write:

insert into Business using
  "Rico's Laundry" name,
  BUSINESS_TYPE_LAUNDROMAT type;

-- by the time SQL sees this it becomes
insert into Business(name, type) values('Rico''s Laundry', 2);

And at least you don't have to see these loose '2' values all over. An especially unfortunate form is the below, in which the author is clearly crying out for a symbol to use:

insert into Business using
  "Rico's Laundry" name,
  2 type; /* laundromat */

But if we use #define the language knows nothing of the names and it can't help you manage them or export them consistently or anything like that. I guess #define is pretty useful in several languages (C and C++) so you could maybe #include the macros somehow, but that doesn't seem especially great. And if you need them in Java you're getting no help at all.

So to this world we add enumerated constants. This is a bit short of enumerated types as we'll see later. You can now write something like this:

declare enum business_type integer (
  restaurant,
  laundromat,
  corner_store = 11+3 /* math added for demo purposes only */
);

After this:

select business_type.corner_store;

is the same as

select 14;

And that is exactly what SQLite will see, the literal 14.

What's going on here? There's just a few rules:

  • the enumeration can be any numeric type (bool, integer, long integer, real)
  • the values of the enumeration start at 1 (i.e. if there is no = expression the first item will be 1, not 0)
  • if you don't specify a value, the next value is the previous value + 1
  • if you do specify a value it can be any constant expression and it will be cast to the type of the enumeration (even if that is lossy)
  • the enumeration can refer to previous values in itself with no qualification (big = 100.0, medium = big/2, small = medium/2)
  • the enumeration can refer to previously defined enumerations as usual (code = business_type.restaurant)
  • Once the enumeration is defined you refer to its members in a fully qualified fashion enum_name.member_name elsewhere

Why is this better than macros? Well for one thing the enum values can be checked at their declaration site, so if you have errors you will hear about them in a more reasonable place. But additionally since the structure is known to the compiler it can give you useful information in the outputs.

In the .h files you get:

enum business_type {
  business_type__restaurant = 1,
  business_type__laundromat = 2,
  business_type__corner_store = 14
};

In case of floating point values such as:

declare enum floating real (
  one = 1.0,
  two = 2.0,
  e = 2.71828,
  pi = 3.14159
);

You get:

// enum floating (floating point values)
#define floating__one 1.000000e+00
#define floating__two 2.000000e+00
#define floating__e 2.718280e+00
#define floating__pi 3.141590e+00

Which is unfortunately the best you can do since C has no floating point enums.

But in both cases the enums section of the JSON has the name of the enums and their members and values ready to go. With these values you can readily generate (with moustache or something) the language interfaces of your choice. This is a real help if you're trying to make helpers to call your CQL from say Java or something.

To do all this we needed to add some constant folding and general evaluation to the compiler. It's not much, just the normal numeric types and null. The supported operations include:

+, -, *, /, %, |, &, <<, >>, ~, and, or, not, ==, <=, >=, !=, <, >, the cast operator, and the case forms. These are enough to make a lot of very interesting expressions, all of which are evaluated at compile time.

While the constant folding was added to allow for rich enum expressions, there is also the const() primitive in the language now which can appear anywhere a literal could appear. This allows you do things that were previously not allowed such as:

create table something(
  x integer default const((1<<16)|0xf) /* again the math is just for illustration */
);

The const form is also very useful in macros:

#define SOMETHING const(12+3)

This form ensures that the constant will be evaluated at compile time. Const can also nest, so you can build these kinds of macros from other macros, or you can build enums this way. Anywhere you might need literals, you can use const.

Importantly, no enumerated data types were added to the language to do any of this. The values can help you to achieve some correctness by avoiding transcription mistakes but there is no additional type-safety provided here. Indeed given the rich mix between these types in SQLite, and with SQLite having no knowledge of enumerations at all it would be tricky to do a complete job. Still, this might happen in the future.

But for now, declaring constants that are really an intimate part of your schema is now possible and the addition of the constants to the .h files and the .json output should hopefully make these generally useful. At least we might see less of the hard-coded constant business with good values baked right into the schema declarations.

Happy Holidays.

A quick tutorial on LIKE forms

CG/SQL Team

Maintainer of CG/SQL

Everyone knows the usual expression syntax x LIKE y to do a string match. But the CQL compiler also uses LIKE in a different way that's powerful and important. CQL has the notion of data shapes and you use LIKE to refer to them. The simplest source of a data shape, and maybe the most common, is a table. Maybe something like this:

create table T(
  id integer not null,
  name text not null,
  age integer not null
);

Now suppose you want to write a procedure that can insert a row into that table. You could write:

create proc insert_into_T(
  id_ integer not null,
  name_ text not null,
  age_ integer not null
)
begin
  insert into T(id, name, age) values(id_, name_, age_);
end;

This is all fine and well but what if T had 50 columns? That gets old fast. And how can you be sure that you inserted the columns into T in the right order? This second example also compiles even though it's clearly wrong:

insert into T(id, name, age) values(age_, name_, id_);

And of course you can imagine things get only more complicated with more columns in T.

We started adding the LIKE form to ease these issues and to ensure some consistency in the APIs while preventing simple transposition errors. So you can instead write:

create proc insert_into_T(like T)
begin
  insert into T from arguments;
end;

so here the like T in the argument list simply means "make arguments that are the same as the columns of table T" -- well, almost. It also adds an _ to the end of each name, so you end up with exactly the same declaration as the long form above. But you won't miss any arguments, and they'll be in the right order.

And notice that we used from arguments to indicate that we wanted the values to come from the arguments in order. Again this saves you from a lot of typing and a lot of error checking. You can't get the arguments in the wrong order.

These are the most basic patterns. But there are quite a few more.

Let's suppose you want to write a procedure that returns the row with the highest age in the table above. Maybe you write something like this:

create proc highest_age()
begin
  declare C cursor for select * from T;
  declare M cursor like C;
  loop fetch C
  begin
    if (not M or M.age < C.age) then
      fetch M from C;
    end if;
  end;
  out M;
end;

Here we made a cursor M that is the same as the cursor C and then we are going to generate a single row result from the cursor. Note that if you use a cursor name like M in an expression it refers to the hidden boolean that says if the cursor has a row in it or not. So M begins empty and we will load it if it's empty or if the age is higher than what we've already got.

Let's show a few more of the forms. Suppose we don't want to return name, just the id and the age. We can change things up a tiny bit.

create proc highest_age()
begin
  declare C cursor for select * from T;
  declare M cursor like select 1 id, 99 age;
  loop fetch C
  begin
    if (not M or M.age < C.age) then
      fetch M from cursor C(like M);
    end if;
  end;
  out M;
end;

So two things to notice. We used an ad hoc shape, making a fake select statement that returns the shape we want. This select doesn't run but it does define types and columns easily. Two not null integers in this case. Now M is not the same as C, so we can't use the simplest form fetch M from C; we have to use the more general form.

Fully expanded, what we wrote becomes:

FETCH M(id, age) FROM VALUES(C.id, C.age);

But as you can see, we didn't have to type all those column names. And that's kind of the point of the LIKE construct.

So we've covered a bunch of the shape sources already:

  • a table name
  • a cursor name
  • a select statement that gives the shape in an ad hoc fashion

There are three more

  • a view name
  • the return shape of a procedure that returns a result set
  • the arguments of a procedure

View names are pretty simple, and they work the same as table names so we don't need to discuss those. Let's look at some of the other uses with procedures.

Suppose we have a procedure that can return a result set shape but we want to be able to mock its results so we can fake whatever result we need for testing.

We'll complicate this a bit adding a new table (keeping short table names for the sample to save typing)

create table U(
  id integer not null,
  email text not null
);

And here's a procedure:

create proc my_proc()
begin
  select T.*, U.email from T inner join U on T.id = U.id;
end;

Now we want to be able to make any fake result we want, so maybe want a temp table. No problem:

create proc _init_fake_results()
begin
  create temp table if not exists fake_results(
    like my_proc
  );
end;

create proc add_fake_result(like fake_results)
begin
  insert into fake_results from arguments;
end;

create proc get_fake_results()
begin
  select * from fake_results;
end;

The above is very generic and will maintain well. You can see we made a temp table that will have exactly the same shape as whatever my_proc returns. In this case it becomes:

CREATE PROC _init_fake_results ()
BEGIN
  CREATE TEMP TABLE IF NOT EXISTS fake_results(
    id INTEGER NOT NULL,
    name TEXT NOT NULL,
    age INTEGER NOT NULL,
    email TEXT NOT NULL
  );
END;

And the rest are patterns we've seen before.

The last source of shapes are procedure arguments. There's lots of good cases for those, I wrote an entry on those previously but I'll give a simple example here too.

Suppose we have this weird procedure:

create proc delete_stuff(age_ integer, name_ text)
begin
  if age_ is not null then
    delete from T where T.age = age_;
  end if;
  if name_ is not null then
    delete from T where T.name = name_;
  end if;
end;

What if we wanted to log any errors that happen here? Maybe make a version that logs. We can do it like this:

create proc delete_and_log(like delete_stuff arguments)
begin
  begin try
    call delete_stuff(from arguments);
  end try;
  begin catch
    call printf("delete failed\n"); -- or whatever
    throw;
  end catch;
end;

The nice thing about this logging wrapper procedure is that if delete_stuff changes, the wrapper will change with it.

That covers all of the shape sources, and as we saw we can use them to create things like cursors, tables, and argument lists. We can use them to specify a subset of columns that might be of interest when fetching or updating cursors. And we can use them in one last way -- to restrict arguments to a particular shape. Let's see how that works by making the previous logger a little different. Here we add an argument which tells us whether we should log. That might look like it would spoil the from arguments part of the forwarding, but there is one final way to use LIKE.

create proc delete_and_log2(log bool not null, like delete_stuff arguments)
begin
  if log and age_ is not null then
    call printf("deleting %d\n", age_); -- or whatever
  end if;
  if log and name_ is not null then
    call printf("deleting %s\n", name_); -- or whatever
  end if;
  call delete_stuff(from arguments like delete_stuff arguments);
end;

So this form lets you use some of your arguments, the ones that match a certain shape. And as we saw in the previous article you can also use from C to pass arguments where C is a cursor and in that case you can also specify that arguments be matched by name from C like shape. In both those cases the formal parameter names of the called procedure are matched against the names of the shape and passed in the order of the formals. So this is like "call by name", the fields of the cursor or the order of arguments in the argument list might be different than the formals but you'll get the correct items in the correct order regardless, because it matches by name.

These forms can save you a lot of typing... and are excellent at avoiding errors and improving maintainability. Where they appear in SQL statements, everything is expanded before it goes to SQLite so SQLite will see normal syntax forms. Which is good because obviously SQLite doesn't know anything about this enhanced LIKE business.

In the examples above there were only one or two columns with very short names, but in real world code there can easily be dozens of columns with very long names. In those cases, these forms really shine.

Error Tracing Helper Macro

CG/SQL Team

Maintainer of CG/SQL

Following up on the last blog entry, I thought it would be useful to present a simple error tracing macro that you can use to see what kind of error flow is going on when you're having trouble understanding why a procedure is returning an error code. The idea is we want to create a macro that we can use like this:

BEGIN_VERBOSE_STDERR_TRACING;
-- Some procedure(s) that you want to trace
END_VERBOSE_STDERR_TRACING;

We can do that with something like the macros below. These particular ones send the output to stderr via fprintf, but if that isn't what you need you can simply edit the macro. The macros look like this:

-- manually force tracing on by redefining the cql_error_trace macro
#define BEGIN_VERBOSE_STDERR_TRACING \
@echo c, "#undef cql_error_trace\n"; \
@echo c, "#define cql_error_trace() fprintf(stderr, \"CQL Trace at %s:%d in %s: %d %s\\n\", __FILE__, __LINE__, _PROC_, _rc_, sqlite3_errmsg(_db_))\n"
#define END_VERBOSE_STDERR_TRACING \
@echo c, "#undef cql_error_trace\n"; \
@echo c, "#define cql_error_trace()\n"

So basically it's telling CQL to emit a #define into its output stream. In this case:

#define cql_error_trace() fprintf(stderr, "CQL Trace at %s:%d in %s: %d %s\n", __FILE__, __LINE__, _PROC_, _rc_, sqlite3_errmsg(_db_))

You could change that to any function you like: you can have it dump the errors wherever you like, or you can make it some dummy function you add just so that you can set a breakpoint on it.

Whatever you do, do not leave your code with this sort of tracing enabled -- it's far too expensive in terms of code size. But it's perfect if you have this one procedure that is failing and it's hard for you to see where.

Obviously if you're making a custom trace thingy you don't need the macro at all, you can just emit your own #define with @echo as needed.

Note: @echo is quite a sledgehammer so don't use it lightly and not in production code but it is quite helpful for this sort of thing. CQL tests often use it to help make things visible to the tests. If you use @echo in weird ways you might not get working code when the codegen changes in the future.

The relevant state that is available to you inside a macro like this is:

  • __FILE__ the current filename (comes from the C pre-processor, this is the .c file name not the .sql)
  • __LINE__ the current line number (comes from the C pre-processor, this is the .c line number)
  • _rc_ the current SQLite result code (always the current return code in every CQL procedure that uses SQLite)
  • _db_ the current SQLite database pointer (always the current database in every CQL procedure that uses SQLite)
  • _PROC_ the current procedure name (CQL has a #define for this for you)
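As a standalone illustration (outside CQL entirely), here's a minimal C sketch using hypothetical stand-ins for that state, showing how a function-like macro picks up the file, line, and procedure name at its expansion site:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the per-procedure name CQL #defines for you. */
#define _PROC_ "my_procedure"

static char trace_buf[256];

/* A trace macro in the spirit of cql_error_trace(): at its expansion
   site it sees the C file, line, procedure name, and result code. */
#define error_trace(rc) \
  snprintf(trace_buf, sizeof(trace_buf), \
           "CQL Trace at %s:%d in %s: %d", __FILE__, __LINE__, _PROC_, (rc))

/* A tiny "procedure body": trace a failing result code, return the text. */
static const char *trace_demo(int _rc_) {
  error_trace(_rc_);
  return trace_buf;
}
```

In real generated code the only difference is that _rc_ and _db_ are locals the compiler emits for you, and _PROC_ is redefined at the top of every procedure.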

Introducing General Purpose Error Tracing

CG/SQL Team

Maintainer of CG/SQL

Today we made a couple of minor changes in the code generation to take care of some lingering issues.

The first is that when you did a throw inside a catch to basically rethrow the error, you would lose the error code if something had succeeded within the catch handler.

The old codegen looked something like this:

catch_start_1: {
printf("error\n");
cql_best_error(&_rc_);
goto cql_cleanup;
}

The problem being that while the printf above is fine and well, if you did any SQL operation then _rc_ would be clobbered and you'd end up throwing an unrelated error code. cql_best_error would at least make sure it was a failure code (SQLITE_ERROR) but the original error code was lost.

The new code looks like this:

catch_start_1: {
_rc_thrown_ = _rc_;
printf("error\n");
_rc_ = cql_best_error(_rc_thrown_);
goto cql_cleanup;
}

So now if there are db operations, the original return code is still preserved. Note: you still lose sqlite3_errmsg() because SQLite doesn't know that cleanup logic is running.
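The save-then-restore dance is easy to see in plain C. This sketch (with a hypothetical best_error and result-code stand-ins, not the real cqlrt) shows why saving _rc_ first preserves the original failure:

```c
#include <assert.h>

/* Hypothetical result codes for this sketch. */
#define SQLITE_OK 0
#define SQLITE_ERROR 1
#define SQLITE_BUSY 5

/* In the spirit of cql_best_error: keep the original failure code if
   there was one, otherwise report a generic error. */
static int best_error(int rc_thrown) {
  return rc_thrown == SQLITE_OK ? SQLITE_ERROR : rc_thrown;
}

/* Simulates a catch block that does more (successful) work before
   rethrowing: the saved code survives the clobbering. */
static int catch_and_rethrow(int rc_at_entry) {
  int _rc_ = rc_at_entry;
  int _rc_thrown_ = _rc_;          /* save the original code first */
  _rc_ = SQLITE_OK;                /* db/logging work clobbers _rc_ */
  _rc_ = best_error(_rc_thrown_);  /* rethrow with the original code */
  return _rc_;
}
```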

This brings us to the second new thing: general purpose error traces.

Error checking of result codes happens very consistently in CQL output. The usual pattern looks something like this:

_rc_ = cql_exec(_db_,
"SAVEPOINT base_proc_savepoint");
if (_rc_ != SQLITE_OK) goto cql_cleanup;

or if it's inside a try block, it's a little different... very little actually

// try
{
_rc_ = cql_exec(_db_,
"RELEASE SAVEPOINT base_proc_savepoint");
if (_rc_ != SQLITE_OK) goto catch_start_8;
// ... the rest of the try block
}

Basically if the local _rc_ doesn't match the necessary condition we goto the appropriate error label... either the relevant catch block or else the procedure's cleanup code.

We generalize this a bit now so that it looks like this:

if (_rc_ != SQLITE_OK) { cql_error_trace(); goto cql_cleanup; }
-- or, in a catch...
if (_rc_ != SQLITE_OK) { cql_error_trace(); goto catch_start_8; }

Now the default implementation of cql_error_trace() is in cqlrt.h which you can and should customize. I'll be writing more about that later but suffice to say you're supposed to replace cqlrt.h and cqlrt.c with suitable runtime helpers for your environment while keeping cqlrt_common.h and cqlrt_common.c fixed.

So for instance, your cqlrt.h could look like this:

#ifndef CQL_TRACING_ENABLED
#define cql_error_trace()
#else
// whatever tracing you want, for example this might help in test code.
#define cql_error_trace() \
fprintf(stderr, "Error at %s:%d in %s: %d %s\n", __FILE__, __LINE__, _PROC_, _rc_, sqlite3_errmsg(_db_))
#endif

So then when you need to debug problems involving lots of error recovery you can watch the entire chain of events easily.
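To see the whole pattern end to end, here's a self-contained C sketch of the check-trace-goto shape the generated code uses. The statement runner and the recording trace macro are both hypothetical stand-ins:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SQLITE_OK 0

static char trace_log[256];

/* A recording version of cql_error_trace() for this sketch; like the
   real hook it reads state (here, "step") from its expansion site. */
#define cql_error_trace() \
  snprintf(trace_log + strlen(trace_log), \
           sizeof(trace_log) - strlen(trace_log), "fail@step%d;", step)

/* Fake statement runner: fails on the designated step. */
static int run_step(int step, int fail_step) {
  return step == fail_step ? 1 : SQLITE_OK;
}

/* The shape of generated code: every check traces, then jumps to the
   cleanup label, so a failing chain is easy to follow in the log. */
static int proc_like(int fail_step) {
  int _rc_;
  int step = 1;
  _rc_ = run_step(step, fail_step);
  if (_rc_ != SQLITE_OK) { cql_error_trace(); goto cleanup; }
  step = 2;
  _rc_ = run_step(step, fail_step);
  if (_rc_ != SQLITE_OK) { cql_error_trace(); goto cleanup; }
cleanup:
  return _rc_;
}
```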

Note that there are some useful variables there:

In any procedure _db_ is the current database and _rc_ is the most recent return code from SQLite. __FILE__ and __LINE__ of course come from the preprocessor. and _PROC_ (one underscore) is now generated by the compiler. Every procedure's body now begins with:

#undef _PROC_
#define _PROC_ "the_current_procedure"

So by defining your own cql_error_trace macro you can cause whatever logging you need to happen. Note this can be very expensive indeed because this happens a lot and even the string literals needed are a significant cost. So generally this should be off for production builds and enabled as needed for debug builds.

The default implementation is just an empty block:

#define cql_error_trace()

But the hook is enough to light up whatever logging you might need, and you can use sqlite3_errmsg() before that message is gone.

Good hunting.

More Flexible Cursor Patterns Using "Boxing"

CG/SQL Team

Maintainer of CG/SQL

I was reviewing the update posting that just went out and I realized I'd forgotten to mention another big ticket item. So consider this an appendix to the update.

In some cases we started seeing a need to "ship cursors around" a little bit more flexibly. Note shipping values around is already doable so this new work is largely about being able to create a "statement cursor" in one procedure and consume it safely elsewhere. The general pattern looks like this:

Declare a statement cursor as usual, maybe something like this:

declare C cursor for select * from shape_source;
-- or
declare C cursor for call proc_that_returns_a_shape();

Make an object that can hold a cursor:

declare obj object<T cursor>;

Where T is the name of a shape. It can be a table name, or a view name, or it can be the name of the canonical procedure that returns the result. You really want this to be some kind of global name though. Something you can get with a #include in various places. In this case choices for T might be shape_source the table or proc_that_returns_a_shape the procedure.

Remember you can always make a fake procedure that returns a result to sort of typedef a shape name. e.g.

declare proc my_shape() (id integer not null, name text);

The procedure here my_shape doesn't have to actually ever be created, in fact it's probably better if it doesn't. You won't call it, you're just using its hypothetical result as a shape. This could be useful if you have several procedures like proc_that_returns_a_shape that all return my_shape.

At this point you could use the cursor maybe something like:

loop fetch C
begin
-- do stuff with C
end;

Those are the usual patterns and they let you consume a statement cursor sort of "up" from where it was created. But if you want some worker procedures that consume a cursor, there was no good way to pass your cursor down. Well, there wasn't. Now there is. Let's go back to that box object creation and use it:

-- recap: declare the box that holds the cursor (T changed to my_shape for this example)
declare obj object<my_shape cursor>;
-- box the cursor into the object (the cursor shape must match the box shape)
set obj from cursor C;

The variable obj can now be passed around as usual. Then, later, you can "unbox" it to get a cursor back, like so:

-- unboxing a cursor from an object
declare D cursor for obj;

These primitives will allow cursors to be passed around with managed lifetime. Example:

-- consumes a cursor
create proc cursor_user(box object<my_shape cursor>)
begin
declare C cursor for box; -- the cursor's shape will be my_shape, matching box
loop fetch C
begin
-- do something with C
end;
end;
-- captures a cursor and passes it on
create proc cursor_boxer()
begin
declare C cursor for select * from something_like_my_shape;
declare box object<my_shape cursor>;
set box from cursor C; -- produces error if shape doesn't match
call cursor_user(box);
end;

Importantly, once you box a cursor the underlying SQLite statement's lifetime is managed by the box object with normal retain/release semantics, so timely release becomes imperative.
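Conceptually the box is just a reference-counted wrapper that finalizes the statement when the last reference is released. A hypothetical C sketch of the idea (not the actual cqlrt implementation):

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for sqlite3_stmt; finalize just flips a flag we can check. */
typedef struct { int finalized; } fake_stmt;

/* A reference-counted box around a statement. */
typedef struct {
  fake_stmt *stmt;
  int refs;
} cursor_box;

static cursor_box *box_create(fake_stmt *s) {
  cursor_box *b = (cursor_box *)malloc(sizeof(*b));
  b->stmt = s;
  b->refs = 1;
  return b;
}

static void box_retain(cursor_box *b) { b->refs++; }

/* The statement is finalized only when the last reference goes away,
   which is exactly why timely release matters. */
static void box_release(cursor_box *b) {
  if (--b->refs == 0) {
    b->stmt->finalized = 1; /* would be sqlite3_finalize(b->stmt) */
    free(b);
  }
}
```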

With this pattern it's possible to, for instance, consume some of the rows in one procedure and the rest in another procedure.

Now, the main reason for doing this is if you have some standard helper methods that can get a cursor from a variety of places and process it. But remember, boxing isn't the usual pattern at all, and returning cursors in a box, while possible, should be avoided in favor of the simpler pattern of doing your select or call at the end to compute the result as we do now, if only because then the lifetime is very simple in all those cases. Durably storing a boxed cursor could lead to all manner of problems -- it's just like holding on to a sqlite3_stmt * for a long time. Actually "just like" is an understatement: it's exactly the same as holding on to a statement for a long time, with all the same problems, because that is exactly what's going on here.

So, good generalization, but possibly less Pit of Success, especially with complex box patterns. So watch the sharp edges.

One Month Update

CG/SQL Team

Maintainer of CG/SQL

It's hard to believe it's been a month since the welcome message went up. We were happy to see interest right away and even a bunch of forks but most of all pull requests. A sweeping change to modernize the cql.y grammar was much appreciated. That $1 stuff was very old school (I'm showing my age now).

Here's a quick summary of what's been going on:

  • @mingodad gave us an implementation of check and collate column attributes (the check attribute on tables should be easy to add from here)
  • the select function form is now enforced to return only SQLite types, never objects
  • @attribute(cql:suppress_result_set) was added to save code gen for procedures that don't need the C result set wrappers
  • cql_cursor_diff_col and cql_cursor_diff_val methods were added to report what's different about two cursors (highly useful in test code)
  • cql_cursor_format was added so you can quickly convert any cursor into columns and values as string for debug output (no matter the shape)
  • sqlite3_changes was added to the builtin list so you don't have to use declare select function to use it anymore
  • cql_get_blob_size was added so you can see how big your blobs are (useful for diagnostics)
  • trim, rtrim and ltrim were added to the builtin list so you can use them without declare select function
  • the builtin function attest_notnull was added so that nullables that have already been checked can be safely typecast to not null
  • the bug we saw in demo video number 2 where some foreign keys were not properly linked up in autotest code was fixed (yay videos)
  • time functions are now known to be not null for a bunch of simple cases such as 'now' arguments
  • you can use the cast(.. as ..) operator on numeric types outside of the SQL context
  • @mingodad replaced all the positional references by named references in cql.y (yes! thank you!)
  • several minor bug fixes
  • the railroad diagrams were updated

NOTE: I often refer to "sugar" in the below. This is short for syntactic sugar which, in case you're not familiar with the term, refers to a syntactically more pleasing way of writing a concept that is otherwise totally doable with normal syntax. Many languages have sugar for forms that are common -- for brevity, clarity, and/or correctness.

And now a few notes on The Big Stuff

We often add new features to the language to facilitate the writing of tests. The tests have a lot of boilerplate, often setting up and calling the same procedures again and again with slightly different arguments. Long argument lists and long insert column lists are especially problematic as these can be very error prone, and here good language constructs are very helpful. We've found good test constructs are often invaluable in production code as well, though in our experience the tests often have a lot more repetition that needs refactoring than production code does. To that end we added some very useful things in the last month:

Declare cursors in the shape of a procedure's arguments and use them

The most common way to create a cursor is from a select statement but you can also make a cursor that can hold values for you by declaring it to be LIKE something else with a shape. A classic example is:

declare C cursor like some_table;

Now C has the same columns and types as some_table.

Many procedures have a result type that is also a shape, for instance any procedure that ends with a select statement has a result shape defined by the columns of the select statement. You could always do this sort of thing:

declare C cursor like some_proc;

Meaning make C a cursor whose shape is whatever some_proc returns, which is of course exactly the kind of cursor you need to capture the result of some_proc.

Now we add:

declare C cursor like some_proc arguments;

The idea being that the arguments of some_proc are also a shape (unless it has none). With this done you want to use that cursor to call the procedure -- that being sort of the whole point. So we add this:

call some_proc(from C);

How do we use this effectively? Hold on just a second -- for that answer we need one more big tool to really help the syntax.

Loading cursors and inserting columns

Loading up a cursor is done with syntax that is very much like an insert statement. An example might be something like this:

fetch C(x,y,z) from values(1,2,3);

This is simple enough but it becomes more problematic if there are many values and especially if the values have complex names. To make this a little less error prone CQL now has this sugar form for fetch, insert, and soon update cursor (like maybe before you see this blog). The more readable form is:

fetch C using
1 x,
2 y,
3 z;

This form has the values next to their names, just like in a select statement. Like all sugars, it is automatically rewritten to the normal form.

Likewise

insert into some_table using
1 id,
'fred' first_name,
'flintstone' last_name,
'bedrock' home_town,
'dino' favorite_pet,
'wilma' life_partner;

becomes

insert into some_table(id, first_name, last_name, home_town, favorite_pet, life_partner)
values(1, 'fred', 'flintstone', 'bedrock', 'dino', 'wilma');

except the sugar form is much less error prone. This form doesn't generalize to many values but the single row case is super common.

Since this form is automatically rewritten SQLite will never see the sugar syntax, it will get the normal syntax.

NOTE: the insert rewrite is coming later today, and will likely be live by the time you read this.

Putting these together

Let's suppose you have to write a test. You have a procedure test_subject that takes some arguments, plus you have another helper procedure test_setup that puts seed data in the right places for your subject. But there are many variations, and a lot of what you do between variations is the same. How can you write this economically, making it clear what is different between variations, without a lot of fuss? Well, you can do something like this:

-- use defaults for all the named values
-- use 'seed' for everything else that isn't named
create proc default_setup_args(seed integer not null)
begin
declare args cursor like test_setup arguments;
fetch args using
1334 primary_id,
98012 secondary_id,
'foo' useful_name,
'bar' other_useful_name,
1 fast_mode
@dummy_seed(seed);
out args;
end;

With the above you can easily see which values go to which arguments.

Your test setup can now look something like this:

declare setup_args cursor like test_setup arguments;
fetch setup_args from call default_setup_args(1999);
update cursor setup_args using
0 fast_mode; -- override fast mode for this test
call test_setup(from setup_args);

To call the test subject you probably need some of those setup arguments and maybe some more things.

create proc default_subject_args(like default_setup_args, other_thing bool not null)
begin
declare args cursor like test_subject arguments;
fetch args using
primary_id primary_id, -- this came from the default_setup_args result
secondary_id secondary_id, -- so did this
useful_name name, -- the field names don't have to match
fast_mode fast_mode,
other_thing other_thing;
out args;
end;

Then the test code

declare test_args cursor like test_subject arguments;
fetch test_args from call default_subject_args(0);
call test_subject(from test_args);

Importantly, the cursor set operations are all by name so the order doesn't matter. Which means even if there are many arguments you don't have to worry that you got them in the wrong order or that they are the wrong type. Effectively you have a simple call by name strategy and you can easily read off the arguments. You could do something similarly brief with helper functions to provide the default arguments but then you can't readily re-use those arguments in later calls or for verification so this way seems a lot more useful in a test context.

When it comes time to validate, probably your test subject is returning a cursor from a select that you want to check. A slightly different call will do the job there.

Cursor Differencing

With the setup above you can verify results very easily. Let's change it a little bit:

-- same as before, with a cursor
declare results cursor for call test_subject(from test_args);
-- get the first row
fetch results;
declare expected cursor like results;
fetch expected using
setup_args.primary_id primary_id,
setup_args.useful_name name,
test_args.other_thing other_thing
@dummy_seed(1999); -- dummy values for all other columns
-- make a macro like EXPECT_CURSOR_EQ(x,y) for this
-- if the cursors are different the result is a string with the first
-- different column name and the left and right values ready to print
call ExpectNull(cql_cursor_diff_val(expected, results));

ExpectNull could be

create proc ExpectNull(t text)
begin
if t is not null then
call printf('%s\n', t); -- or whatever
throw;
end if;
end;
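If you're curious what a diff helper like this does under the hood, here's a hypothetical C sketch in the spirit of cql_cursor_diff_val for one fixed two-column shape (the real helper works on any cursor shape):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A hypothetical two-column row standing in for a cursor's fields. */
typedef struct {
  int id;
  const char *name;
} row;

static char diff_buf[128];

/* NULL when the rows match; otherwise the first differing column
   with the left and right values, ready to print. */
static const char *row_diff(const row *left, const row *right) {
  if (left->id != right->id) {
    snprintf(diff_buf, sizeof(diff_buf), "id: %d vs %d",
             left->id, right->id);
    return diff_buf;
  }
  if (strcmp(left->name, right->name) != 0) {
    snprintf(diff_buf, sizeof(diff_buf), "name: %s vs %s",
             left->name, right->name);
    return diff_buf;
  }
  return NULL; /* no differences */
}
```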

All that testing support comes from:

  • cursors in the shape of arguments
  • cleaner fetch/insert syntax
  • cursors passed as arguments
  • cursor differences

It kills a lot of boilerplate, resulting in tests that are much clearer.

And that's what's been going on for the last month in CG/SQL land.

If you got this far thanks for reading. If you didn't get this far, you aren't reading this anyway so thanking you is moot =P

Stay safe.

Rico for CG/SQL

P.S. most of these fragments don't actually compile because of missing schema and maybe the odd typo. If there is interest I'll make a demo that works soup to nuts.

Welcome

CG/SQL Team

Maintainer of CG/SQL

Hello everyone!

Thank you for visiting the CG/SQL blog page. If you would like to read the very first blog announcing the project, please go over to the Facebook Engineering post published in early October 2020.

Looking forward to working with all of you!

Sincerely, CG/SQL Team