5 posts tagged with "errors"

Introducing Argument Bundles

December 8, 2020 · 6 min read

Maintainer of CG/SQL

There are many cases where stored procedures require complex arguments using data shapes well known to higher level languages or that come from the schema. There is already some affordance for this sort of thing in the form of this kind of pattern:

(I'll continue to use this simple example as I discuss the generalization below)

create table Person (
   id text primary key,
   name text not null,
   address text not null,
   birthday real
);

Then maybe something like this

create proc insert_person(like Person)
begin
    insert into Person from arguments;
end;

The above expands into:

create proc insert_person(
    id_ text not null,
    name_ text not null,
    address_ text not null,
    birthday_ real)
begin
    insert into Person(id, name, address, birthday)
        values(id_, name_, address_, birthday_);
end;

And I think we can all agree the sugared version is a lot easier to reason about and much less prone to errors as well.

Those features have been in the language for a long time and that's all fine and well but it isn't general enough to handle the usual mix of situations. For instance what if you need a procedure that works with two people? A hypothetical insert_two_people procedure cannot be written with the old form. This is where argument bundles come in. The idea here is to name the bundle which provides useful reference. To wit:

create proc insert_two_people(p1 like Person, p2 like Person)
begin
    call insert_person(from p1);
    call insert_person(from p2);
end;

or alternatively

create proc insert_two_people(p1 like Person, p2 like Person)
begin
    insert into Person from p1;
    insert into Person from p2;
end;

So what's going on here? Well, there are lots of reasons to keep the API to procedures simple and adding general purpose structured types would be at odds with that. It would require lots of knowledge about C structure layout and whatnot. And trying to call from java would require very complex JNI for any such procedure. So we avoid all that. We keep simple arguments. The above expands into:

create proc insert_person(
    p1_id text not null,
    p1_name text not null,
    p1_address text not null,
    p1_birthday real,
    p2_id text not null,
    p2_name text not null,
    p2_address text not null,
    p2_birthday real)
begin
    insert into Person(id, name, address, birthday)
        values(p1_id, p1_name, p1_address, p1_birthday);
    insert into Person(id, name, address, birthday)
        values(p2_id, p2_name, p2_address, p2_birthday);
end;

Or course the types don't have to be the same, you can create and name shapes of your choice. The language allow you to use an argument bundle in all the places that a cursor was previously a valid source. That includes insert, fetch, update cursor, and procedure calls. You can refer to the arguments by their expanded name p1_address or alternatively p1.address means the same thing.

Here's another example showing a silly but illustrative thing you could do:

create proc insert_lotsa_people(P like Person)
begin
    declare C cursor like P;
    fetch C from P;
    declare i integer not null;
    set i := 0;
    while (i < 20)
    begin
        update cursor C using
            printf("id_%d", i) id;
        insert into Person from C;
    end;
end;

The above shows that you can use a bundle as the source of a shape elsewhere, and you can use a bundle as a source of data to load a cursor. After which you can do all the usual value cursor things like out statements and so forth.

In order to call procedures with argument bundles more readily from other languages, the JSON output now includes additional information about where procedure arguments originated; The field with this information is creatively called "argOrigin:" and it has 3 forms.

"arg_name" -> the argument is not an expansion of anything
"T arg_name" -> the argument came from like T
- there will be one arg for each member of T
- the formal argument name for this arg will be argname
- if T is procedure arguments like p1 arguments then you'll get "p1[arguments] arg_name"
"name T arg_name" -> the argument came from name like T (a named bundle)
- there will be one arg for each member of T
- the formal argument name for this arg will be T_arg_name
- T could be procedure arguments as above
If the source of an argument was a cursor or argument bundle name you get instead that thing's shape source name
- this is always better because cursor names and bundle names are not globally unique.
If the cursor had an anonymous source (e.g. like select 1 x) then you get the useless shape name "select"
- this is an indicator that you should make some ad hoc struct for this procedure because there is no useful name for the arg bundle's type

None of this matters unless you're trying to make wrappers for a CQL procedure for some other language and you'd like to have your wrapper deal with structs rather than all loose arguments. the JSON basically tells you the structs.

Interestingly, argument bundles resulted in a significant reduction of code in the compiler. The argument bundle name has to be usable in the contexts where a cursor was previously usable. It is another source of shaped data. Getting that to work proved to be super simple as the two forms look almost identical to the compiler -- no coincidence there. So very little code was required to make from [cursor_name] work with from [any_shape_name] in the half dozen or so places that this construct is allowed (e.g. procedure call arguments, insert statements, etc.). However, there was as much code associated with from arguments as there was from cursor_name. And the code was nearly identical..

When argument bundles were introduced the natural thing to do was to create an artifical bundle called "arguments" which represents the bundle that is ALL the arguments. With that done, all the code for from arguments could be deleted because arguments itself was a valid shape name. Hence insert into T from arguments "just works". And so half the rewrites were deleted. The only cost was that the form from arguments like shape became the cursor form from arguments(like shape) which only adds mandatory parens to a form that was largely unused anyway (there were two cases in our entire codebase). The cursor form is more general as you can do from C(like A, like B) to get the fields that match A then those that match B. Arguments get this for free as well (well, at the cost of parens).

So overall, this feature was added, and the compiler got smaller and cleaner. Only the test suite had to grow.

Stay safe out there.

Introducing Declare Enum

December 3, 2020 · 5 min read

CG/SQL Team

Maintainer of CG/SQL

There is an unfortunate pattern of hard coding constants in SQL which I think comes from the fact that there's not an especially good way to encode constants in SQL. Things are a little better In CG/SQL's CQL language because it's normal to run things through the pre-processor first so you can do things like:

#define BUSINESS_TYPE_RESTAURANT 1
#define BUSINESS_TYPE_LAUNDROMAT 2

Having done so, you could write:

insert into Business using
   "Rico's Laundry"  name,
   BUSINESS_TYPE_LAUNDROMAT type;

-- by the time SQL sees this it becomes
insert into Business(name, type) values('Rico''s Laundry', 2);

And at least you don't have to see these loose '2' values all over. An especially unfortunate form is the below, in which the auther is clearly crying for a symbol to use:

insert into Business using
   "Rico's Laundry"  name,
   2 type; /* laundromat */

But if we use #define the language knows nothing of the names and it can't help you manage them or export them consistently or anything like that. I guess #define is pretty useful in several langauges (C and C++) so you could maybe #include the macros somehow but that doesn't seem especially great. And if you need them in Java you're getting no help at all.

So to this world we add enumerated constants. This is a bit short of enumerated types as we'll see later. You can now write something like this:

declare enum business_type integer (
  restuarant,
  laundromat,
  corner_store = 11+3  /* math added for demo purposes only */
);

After this:

select business_type.corner_store;

is the same as

select 14;

And that is exactly what SQLite will see, the literal 14.

What's going on here? There's just a few rules:

the enumeration can be any numeric type (bool, integer, long integer, real)
the values of the enumeration start at 1 (i.e. if there is no = expression the first item will be 1, not 0)
if you don't specify a value, the next value is the previous value + 1
if you do specify a value it can be any constant expression and it will be cast to the type of the enumeration (even if thatis lossy)
the enumeration can refer to previous values in itself with no qualification (big = 100.0, medium = big/2, small = medium/2)
the enumeration can refer to previously defined enumerations as usual (code = business_type.restaurant)
Once the enumeration is defined you refer to its members in a fully qualified fashion enum_name.member_name elsewhere

Why is this better than macros? Well for one thing the enum values can be checked at their declaration site, so if you have errors you will hear about them in a more reasonable place. But additionally since the structure is known to the compiler it can give you useful information in the outputs.

In the .h files you get:

enum business_type {
  business_type__restaurant = 1,
  business_type__laundromat = 2,
  business_type__corner_store = 14
};

In case of floating point values such as:

declare enum floating real (
  one = 1.0,
  two = 2.0,
  e = 2.71828,
  pi = 3.14159
);

You get:

// enum floating (floating point values)
#define floating__one 1.000000e+00
#define floating__two 2.000000e+00
#define floating__e 2.718280e+00
#define floating__pi 3.141590e+00

Which is unfortunately the best you can do since C has no floating point enums.

But in both cases the enums section of the JSON has the name of the enums and their members and values ready to go. With these values you can readily generate (with moustache or something) the language interfaces of your choice. This is a real help if you're trying to make helpers to call your CQL from say Java or something.

To do all this we needed to add some constant folding and general evaluation to the compiler. It's not much, just the normal numeric types and null. The supported operations include:

+, -, *, /, %, |, &, <<, >>, ~, and, or, not, ==, <=, >=, !=, <, >, the cast operator and the case forms. These are enough to make a lot of very interesting expressions, all of which are envaluated at compile time.

While the constant folding was added to allow for rich enum expressions, there is also the const() primitive in the language now which can appear anywhere a literal could appear. This allows you do things that were previously not allowed such as:

create table something(
  x integer default const((1<<16)|0xf) /*  again the math is just for illustration */
);

The const form is also very useful in macros:

#define SOMETHING const(12+3)

This form ensures that the constant will be evaluated at compile time. Const can also also nest so you can build these kinds of macros from other macros or you can build enums this way. Anywhere you might need literals, you can use const.

Importantly, no enumerated data types were added to the language to do any of this. The values can help you to achieve some correctness by avoiding transcription mistakes but there is no additional type-safety provided here. Indeed given the rich mix between these types in SQLite, and with SQLite having no knowledge of enumerations at all it would be tricky to do a complete job. Still, this might happen in the future.

But for now, declaring constants that are really an intimate part of your schema is now possible and the addition of the constants to the .h files and the .json output should hopefully make these generally useful. At least we might see less of the hard-coded constant business with good values baked right into the schema declarations.

Happy Holidays.

A quick tutorial on LIKE forms

November 20, 2020 · 8 min read

CG/SQL Team

Maintainer of CG/SQL

Everyone knows the usual expression syntax x LIKE y to do a string match. But the CQL compiler also uses LIKE in a different way that's powerful and important. CQL has the notion of data shapes and you use LIKE to refer to them. The simplest source of a data shape, and maybe the most common, is a table. Maybe something like this:

create table T(
 id integer not null,
 name text not null,
 age integer not null
);

Now suppose you want to write a procedure that can insert a row into that table, You could write

create proc insert_into_T(
  id_ integer not null, 
  name_ text not null, 
  age_ integer not null
)
begin
  insert into T(id, name, age)  values(id_, name_, age_);
end;

This is all fine and well but what if T had 50 columns? That gets old fast. And how can you be sure that you inserted the columns into T in the right order? This second example also compiles even though it's clearly wrong:

  insert into T(id, name, age) values(age_, name_, id_);

And of course you can imagine things get only more complicated with more columns in T.

We started adding the LIKE form to ease these issues and to ensure some consistency in the APIs while preventing simple transpostion errors. So you can instead write:

create proc insert_into_T(like T)
begin
  insert into T from arguments;
end;

so here the like T in the argument list simply means "make arguments that are the same as the columns of table T" -- well, almost. It also adds an _ to the end of each name so you end up exactly the same declaration as the long form above. But you won't miss any arguments, and they'll be in the right order.

And notice that we used from arguments to indicate that we wanted the values to come from the arguments in order. Again this saves you from a lot of typing and a lot of error checking. You can't get the arguments in the wrong order.

These are the most basic patterns. But there are quite a few more.

Let's suppose you want to write a procedure that returns in row with the highest age in the above. Maybe you write something like this:

create proc highest_age()
begin
  declare C cursor for select * from T;
  declare M cursor like C;
  loop fetch C
  begin
     if (not M or M.age < C.age) then
       fetch M from C;
     end if;
  end;
  out M;
end;

Here we made a cursor M that is the same as the cursor C and then we are going to generate a single row result from the cursor. Note that if you use a cursor name like M in an expression it refers to the hidden boolean that says if the cursor has a row in it or not. So M begins empty and we will load it if it's empty or if the age is higher than what we've already got.

Let's show a few more of the forms. Suppose we don't want to return name, just the id and the age. We can change things up a tiny bit.

create proc highest_age()
begin
  declare C cursor for select * from T;
  declare M cursor like select 1 id, 99 age;
  loop fetch C
  begin
     if (not M or M.age < C.age) then
       fetch M from cursor C(like M);
     end if;
  end;
  out M;
end;

So two things to notice. We used an ad hoc shape, making a fake select statement that returns the shape we want. This select doesn't run but it does define types and columns easily. Two not null integers in this case. Now M is not the same as C so we can't use the simplest form fetch M from C we have to use the more general form.

Fully expanded, what we wrote becomes:

  FETCH M(id, age) FROM VALUES(C.id, C.age);

But as you can see, we didn't have to type all those column names. And that's kind of the point of the LIKE construct.

So we've covered a bunch of the shape sources already:

a table name
a cursor name
a select statement that gives the shape in an ad hoc fashion

There are three more

a view name
the return shape of a procedure that returns a result set
the arguments of a procedure

View names are pretty simple, and they work the same as table names so we don't need to discuss those. Let's look at some of the other uses with procedures.

Suppose we have a procedure that can return a result set shape but we want to be able to mock its results so we can fake whatever result we need for testing.

We'll complicate this a bit adding a new table (keeping short table names for the sample to save typing)

create table U(
 id integer not null,
 email text not null
);

And here's a procedure:

create proc my_proc()
begin
   select T.*, U.email from T inner join U on T.id = U.id;
end;

Now we want to be able to make any fake result we want, so maybe want a temp table. No problem:

create proc _init_fake_results()
begin
  create temp table if not exists fake_results(
   like my_proc
  );
end;

create proc add_fake_result(like fake_results)
begin
  insert into fake_results from arguments;
end;

create proc get_fake_results()
begin
  select * from fake_results;
end;

The above is very generic and will maintain well. You can see we made a temp table that will have exactly the same shape as whatever my_proc returns. In this case it becomes:

CREATE PROC _init_fake_results ()
BEGIN
  CREATE TEMP TABLE IF NOT EXISTS fake_results(
    id INTEGER NOT NULL,
    name TEXT NOT NULL,
    age INTEGER NOT NULL,
    email TEXT NOT NULL
  );
END;

And the rest are patterns we've seem before.

The last source of shapes are procedure arguments. There's lots of good cases for those, I wrote an entry on those previously but I'll give a simple example here too.

Suppose we have this weird procedure:

create proc delete_stuff(age_ integer, name_ text)
begin
  if age_ is not null then
     delete from T where T.age = age_;
  end if;

  if name_ is not null then
     delete from T where T.name = name_;
  end if;
end;

What if we wanted to log any errors that happen here? Maybe make a verison that logs. We can do it like this:

create proc delete_and_log(like delete_stuff arguments)
begin
  begin try
    call delete_stuff(from arguments);
  end try;
  begin catch
    call printf("delete failed\n"); -- or whatever
    throw;
  end catch;
end;

The nice thing about this logging wrapper procedure is that if delete_stuff changes, the wrapper will change with it.

That covers all of the shape sources and as we saw we can use them to create things like cursors, tables, and argument lists. We can use them to specify a subset of columns that might be of interest when fetching or updating cursors. And we can use them in one last way -- to restrict arguments to a particular shape. Let's see how that works by making the previous logger a little different. Here we added an argument which tells if we should look. And that might look like it would spoil the from arguments part of the forwarding, but there is the final way to use LIKE.

create proc delete_and_log2(log bool not null, like delete_stuff arguments)
begin
  if log and age_ is not null then
    call printf("deleting %d\n", age_); -- or whatever
  end if;
  if log and name_ is not null then
    call printf("deleting %d\n", name_); -- or whatever
  end if;

  call delete_stuff(from arguments like delete_stuff arguments);
end;

So this form lets you use some of your arguments, the ones that match a certain shape. And as we saw in the previous article you can also use from C to pass arguments where C is a cursor and in that case you can also specify that arguments be matched by name from C like shape. In both those cases the formal parameter names of the called procedure are matched against the names of the shape and passed in the order of the formals. So this is like "call by name", the fields of the cursor or the order of arguments in the argument list might be different than the formals but you'll get the correct items in the correct order regardless, because it matches by name.

These forms can save you a lot of typing... and are excellent at avoiding errors and improving maintainability. Where they appear in SQL statements, everything is expanded before it goes to SQLite so SQLite will see normal syntax forms. Which is good because obviously SQLite doesn't know anything about this enhanced LIKE business.

In the examples above there were only one or two columns with very short names, but in real world code there can easily be dozens of columns with very long names. In those cases, these forms really shine.

Error Tracing Helper Macro

November 18, 2020 · 3 min read

CG/SQL Team

Maintainer of CG/SQL

Following up on the last blog entry, I thought it would be useful to present a simple error tracing macro that you can use to see what kind of error flow is going on when you're having trouble understanding why a procedure is returning an error code. The idea is we want to create a macro that we can use like this:

BEGIN_VERBOSE_STDERR_TRACING;

-- Some procedure(s) that you want to trace

END_VERBOSE_STDERR_TRACING;

We can do that with something like the below macros. These particular ones cause the output to go to stderr via fprintf but if that isn't what you need you can simply edit the macro. The macros looks like this:

-- manually force tracing on by redefining the cql_error_trace macro
#define BEGIN_VERBOSE_STDERR_TRACING \
    @echo c, "#undef cql_error_trace\n"; \
    @echo c, "#define cql_error_trace() fprintf(stderr, \"CQL Trace at %s:%d in %s: %d %s\\n\", __FILE__, __LINE__, _PROC_, _rc_, sqlite3_errmsg(_db_))\n"

#define END_VERBOSE_STDERR_TRACING \
    @echo c, "#undef cql_error_trace\n"; \
    @echo c, "#define cql_error_trace()\n"

So basically it's telling CQL to emit a #define into its output stream. In this case:

#define cql_error_trace() fprintf(stderr, "CQL Trace at %s:%d in %s: %d %s\n", __FILE__, __LINE__, _PROC_, _rc_, sqlite3_errmsg(_db_))

You could change that to any function you like, you can have it dump the errors where you like, or you can make it some dummy function you add so that you can set a breakpoint on it.

Whatever you do, do not leave your code with this sort of tracing enabled -- it's far too expensive in terms of code size. But it's perfect if you have this one procedure that is failing and it's hard for you to see where.

Obviously if you're making a custom trace thingy you don't need the macro at all, you can just emit your own #define with @echo as needed.

Note: @echo is quite a sledgehammer so don't use it lightly and not in production code but it is quite helpful for this sort of thing. CQL tests often use it to help make things visible to the tests. If you use @echo in weird ways you might not get working code when the codegen changes in the future.

The relevant state that is available to you inside a macro like this is:

__FILE__ the current filename (comes from the C pre-processor, this is the .c file name not the .sql)
__LINE__ the current line number (comes from the C pre-processor, this is the .c line number)
_rc_ the current SQLite result code (always the current return code in every CQL procedure that uses SQLite)
_db_ the current SQLite database pointer (always the current database in every CQL procedure that uses SQLite)
_PROC_ the current procedure name (CQL has a #define for this for you)

Introducing General Purpose Error Tracing

November 16, 2020 · 3 min read

CG/SQL Team

Maintainer of CG/SQL

Today we made a couple of minor changes in the code generation to take care of some lingering issues.

The first is that when you did a throw inside a catch to basically rethrow the error, you would lose the error code if something had succeeded within the catch handler.

The old codegen looked something like this:

  catch_start_1: {
    printf("error\n");
    cql_best_error(&_rc_)
    goto cql_cleanup;
  }

The problem being that while the printf above is fine and well, if you did any SQL operation then _rc_ would be clobbered and you'd end up throwing an unrelated error code. cql_best_error would at least make sure it was a failure code (SQLITE_ERROR) but the original error code was lost.

The new code looks like this:

  catch_start_1: {
    _rc_thrown_ = _rc_;
    printf("error\n");
    _rc_ = cql_best_error(_rc_thrown_);
    goto cql_cleanup;
  }

So now if there are db operations, the original return code is still preserved. Note: you still lose sqlite3_errmsg() because SQLite doesn't know that cleanup logic is running.

This brings us to the second new thing: general purpose error traces.

Error checking of result codes happens very consistently in CQL output. The usual pattern looks something like this:

  _rc_ = cql_exec(_db_,
    "SAVEPOINT base_proc_savepoint");
  if (_rc_ != SQLITE_OK) goto cql_cleanup;

or if it's inside a try block a little different... very little actually

  // try
  {
    _rc_ = cql_exec(_db_,
      "RELEASE SAVEPOINT base_proc_savepoint");
    if (_rc_ != SQLITE_OK) goto catch_start_8;
    // ... the rest of the try block
  }

Basically if the local _rc_ doersn't match the necessary condition we goto the appropriate error label... either the relevant catch block or else the procedure's cleanup code.

We generalize this a bit now so that it looks like this:

  if (_rc_ != SQLITE_OK) { cql_error_trace(); goto cql_cleanup; }

-- or, in a catch...

  if (_rc_ != SQLITE_OK) { cql_error_trace(); goto catch_start_8; }

Now the default implementation of cql_error_trace() is in cqlrt.h which you can and should customize. I'll be writing more about that later but suffice to say you're supposed to replace cqlrt.h and cqlrt.c with suitable runtime helpers for your environment while keeping cqlrt_common.h and cqlrt_common.c fixed.

So for instance, your cqlrt.h could look like this:

#ifndef CQL_TRACING_ENABLED
#define cql_error_trace()
#else
// whatever tracing you want, for example this might help in test code.
#define cql_error_trace() \
  fprintf(stderr, "Error at %s:%d in %s: %d %s\n", __FILE__, __LINE__, _PROC_, _rc_, sqlite3_errmsg(_db_))
#endif

So then when you need to debug problems involving lots of error recovery you can watch the entire chain of events easily.

Note that there are some useful variables there:

In any procedure _db_ is the current database and _rc_ is the most recent return code from SQLite. __FILE__ and __LINE__ of course come from the preprocessor. and _PROC_ (one underscore) is now generated by the compiler. Every procedure's body now begins with:

#undef _PROC_
#define _PROC_ "the_current_procedure"

So by defining your own cql_error_trace macro you can cause whatever logging you need to happen. Note this can be very expensive indeed because this happens a lot and even the string literals needed are a significant cost. So generally this should be off for production builds and enabled as needed for debug builds.

The default implementation is just an empty block

#define cql_error_trace()

But the hook is enough to light up whatever logging you might need, and you can use sqlite3_errmsg() before that message is gone.

Good hunting.