Chapter 3: Expressions, Literals, Nullability, Sensitivity

Until this point we've only discussed simple kinds of expressions and as well as variables and table columns marked with NOT NULL . These are indeed the easiest types for CQL to work with as they tend to correspond most directly to the types known to C. However SQL provides for many more types of expressions as well as nullable types and these require handling in any language that purports to be like SQL.

Expression Examples

The usual arithmetic operators apply in CQL:

Example expressions (these are all true)

(1 + 2) * 3 == 9
1 + 2 * 3 == 7
6 / 3 == 2
7 - 5 == 2
6 % 5 == 1
5 / 2.5 == 2
7 & 3 == 2 | 1
1 << 2 == 4

However, before going any further it's important to note that CQL is inherently a two-headed beast. Expressions are either evaluated by transpiling to C (like the predicate of an IF statement, or a variable assignment) or by sending them to SQLIte for evaluation (like expressions inside a SELECT statement or the WHERE part of a DELETE).

CQL evaluation rules are designed to be as similar as possible but some variance is inevitable because evaluation is done in two fundamentally different way.

Order of Evaluation

ASSIGNMENT: :=
LOGICAL_OR: OR
LOGICAL_AND: AND
EQUALITY: = == != <> IS, IS NOT, IN, NOT IN, LIKE, GLOB, MATCH, REGEXP
INEQUALITY: < <= > >=
LOGICAL_NOT: NOT
BINARY: << >> & |
BETWEEN: BETWEEN NOT BETWEEN
ADDITION: + -
MULIPLICATION: * / %
BINARY_NOT: ~
CONCAT: ||

NOTE: the above is NOT the C binding order (!!!) The Sqlite binding order is used in the language and parens are added in the C output as needed to force that order. CQL's rewriting emits minimal parens in all outputs. Different parens are often needed for SQL output.

Variables, Columns, Basic Types and Nullability

CQL needs type information for both variables in the code and columns in the database. Like SQL, CQL allows variables to hold a NULL value and just as in SQL the absence of NOT NULL implies that NULL is a legal value. Consider these examples:

-- real code should use better names than this :)
create table all_the_nullables(
i1 integer,
b1 bool,
l1 long,
r1 real,
t1 text,
bl1 blob
);
declare i2 integer;
declare b2 bool;
declare l2 long;
declare r2 real;
declare t2 text;
declare bl2 blob;

ALL of i1, i2, b1, b2, l1, l2, r1, r2, t1, t2, and bl1, bl2 are nullable. In some sense variables and columns declared nullable (by virtual of the missing NOT NULL) are the root sources of nullability in the SQL language. That and the NULL literal. Though there are other sources as we will see.

NOT NULL could be added to any of these, e.g.

-- real code should use better names than this :)
declare i_nn integer not null;

In the context of computing the types of expressions, CQL is statically typed and so it must make a decision about the type of any expression based on the type information at hand at compile time. As a result it handles the static type of an expression conservatively. If the result might be null then the expression is of a nullable type and the compiled code will include an affordance for the possibility of a null value at runtime.

The generated code for nullable types is considerably less efficient and so it should be avoided if that is reasonably possible.

CQL also has the special built-in variable @RC which refers to the most recent result code returned by a SQLite operation, e.g. 0 == SQLITE_OK, 1 == SQLITE_ERROR. @RC is of type integer not null. Note: there are many hidden SQLite operations such as statement finalization so @RC might change where there is no visible SQL operation. The most common use of @RC is to capture the error code in the first statement of a catch block.

Types of Literals

There are a number of literal objects that may be expressed in CQL. These are as follows:

String Literals

  • A double quoted string is a C style string literal
    • the usual simple C escape sequences are supported
    • the \xNN form for embedded hex characters is supported, however
    • the \0NNN octal form is not supported, and
    • embedded nulls in string literals (\0 or \0x00) are not supported (you must use blobs such cases)
  • a single quoted string is a SQL style string literal
    • No escape sequences are supported other than '' to indicate a single quote character (this is just like normal SQLite)
  • The sequence @FILE("some_string") is a special string literal
    • the value of this literal is the path of the current compiland starting at the letters in some_string, or
    • the entire path of the current compiland if some_string does not occur in the path
    • the purpose of the @FILE construct is to provide a partial path to a file for diagnostics that is consistent even if the file is built in various different root paths on different build machines

Blob Literals

  • SQLite Blob literals are supported in SQL contexts (i.e. where they will be processed by SQLite), CQL produces an error if you attempt to use a blob literal in a loose expression

Numeric Literals

  • All numeric literals are considered to be positive; negative numbers are actually a positive literal combined with unary minus (the negation operator)
  • Base 10 and hexadecimal literals are supported
  • Literals with a decimal point are of type REAL and stored as the C type double
  • Literals that can fit in a signed integer without loss, and do not end in the letter L are integer literals
  • Larger literals, or those ending with the letter L are long integer literals.
  • Literals that begin with 0x are interpreted as hex

Examples:

1.3 -- real
2L -- long
123456789123 -- long
123 -- integer
0x10 -- hex integer
0x10L -- hex long integer

The NULL literal

The use of NULL always gives a nullable result however this literal is special in that it has no storage class. NULL is not numeric, or string but rather mutates into whatever it is first combined with. For instance NULL + 1 results in a nullable integer. Because NULL has no primitive type in some cases where type knowledge is required you might have to use the CAST() function to cast the NULL to a specific type such as CAST(NULL as TEXT). This construct guarantees type consistence in cases like SELECT from different sources combined with UNION ALL

Note: constructs like CAST(NULL as TEXT) are always rewritten to just NULL before going to SQLite as the cast is uninteresting except for the type information which SQLite doesn't need/use anyway.

Other Considerations

There are no boolean literals other than the integers 0 and 1.

The C pre-processor is often combined with CQL in which case the _FILE_ and _LINE_ directives may be used to create literals; they will be preprocessed into normal literals.

The use of _FILE_ can give surprising results in the presence of build systems, hence the existence of @FILE(...).

Const and Enumerations

It's possible to use named constants in CQL with nothing more than the C pre-processor featuers that have already appeared, however use of #define in such a way is not entirely satisfactory. For one thing, CQL will not know these constants exist in any way as they will be replaced before it ever sees them. This means CQL can't provide their values for you in the JSON output for instance.

To help with this probvlem, CQL includes constants, note, this is not the same as enumerated types as we'll see later. You can now write something like this:

declare enum business_type integer (
restuarant,
laundromat,
corner_store = 11+3 /* math added for demo purposes only */
);

After this enum is declared, this:

select business_type.corner_store;

is the same as this:

select 14;

And that is exactly what SQLite will see, the literal 14.

You can also use the enum to define column types:

CREATE TABLE businesses (
name TEXT,
type business_type
);

CQL will then enforce you use the correct enum to access those columns. For example, this is valid:

SELECT * FROM businesses WHERE type = business_type.laundromat;

While this does not type check:

SELECT * FROM businesses WHERE type = business_corp_state.delaware;

Enumerations follow these rules:

  • the enumeration can be any numeric type (bool, integer, long integer, real)
  • the values of the enumeration start at 1 (i.e. if there is no = expression the first item will be 1, not 0)
  • if you don't specify a value, the next value is the previous value plus one
  • if you do specify a value it can be any constant expression and it will be cast to the type of the enumeration (even if that is lossy)
  • the enumeration can refer to previous values in itself with no qualification (big = 100.0, medium = big/2, small = medium/2)
  • the enumeration can refer to previously defined enumerations as usual (code = business_type.restaurant)
  • Once the enumeration is defined you refer to its members in a fully qualified fashion enum_name.member_name elsewhere

With these forms you get some additional useful output:

  • the JSON includes the enumerations and their values in their own section
  • you can use the @emit_enums directive to put declarations like this into the .h file that corresponds to the ccurrent compiland
enum business_type {
business_type__restaurant = 1,
business_type__laundromat = 2,
business_type__corner_store = 14
};

Note that C does not allow for floating point enumartions, so in case of floating point values such as:

declare enum floating real (
one = 1.0,
two = 2.0,
e = 2.71828,
pi = 3.14159
);

You get:

// enum floating (floating point values)
#define floating__one 1.000000e+00
#define floating__two 2.000000e+00
#define floating__e 2.718280e+00
#define floating__pi 3.141590e+00

In order to get useful expressions in enumeration values, constant folding and general evaluation was added to the compiler; these expressions work on any numeric type and the literal null. The supported operations include:

+, -, *, /, %, |, &, <<, >>, ~, and, or, not, ==, <=, >=, !=, <, >, the cast operator and the case forms (including the iif function). These are enough to make a lot of very interesting expressions, all of which are envaluated at compile time.

Constant folding was added to allow for rich enum expressions, but there is also the const() primitive in the language which can appear anywhere a literal could appear. This allows you do things like:

create table something(
x integer default const((1<<16)|0xf) /* again the math is just for illustration */
);

The const form is also very useful in macros:

#define SOMETHING const(12+3)

This form ensures that the constant will be evaluated at compile time. The const psuedo-function can also also nest so you can build these kinds of macros from other macros or you can build enum values this way. Anywhere you might need literals, you can use const.

Importantly, no enumerated data types were added to the language to do any of this. The following is an error:

declare enum my_enum integer (a, b);
declare x my_enum; /* my_enum is not a valid type: use integer, real, etc. */

Named Types

A common source of errors in stored procedures is incorrect typing in arguments. For instance, a particular key for an entity might need to be LONG or even always LONG NOT NULL or LONG NOT NULL @SENSITIVE and the only way to do this in the past was maybe with some #define thing. Otherwise you have to diligently get the type right in all the places, and should it ever change, again you have to visit all the places. To help with this situation, and to make code a little more self-describing we add named types to the language. This is a lot like typedef in the C language. They do not create different incompatible types but do let you name things well.

You can now write these sorts of forms:

declare foo_id type long not null;
create table foo(
id foo_id primary key autoincrement,
name text
);
create proc inserter(name_ text, out id foo_id)
begin
insert into foo(id, name) values(NULL, name_);
set id := last_insert_rowid();
end;
declare function func_return_foo_id() foo_id;
declare var foo_id;

Additionally any enumerated type can be used as a type name. e.g.

declare enum thing integer (
thing1,
thing2
);
declare thing_type type thing;

Enumerations always get "not null" in addition to their base type. Enumerations also have a unique "kind" associated, specfically the above enum has type integer<thing> not null. The rules for type kind are described below.

Type Kinds

Any CQL type can be tagged with a "kind" for instance real can become real<meters>, integer can become integer<job_id>. The idea here is that the additional tag, the "kind" can help prevent type mistakes in arguments and in columns and procedure calls. For instance:

create table things(
size real<meters>,
duration real<seconds>
);
create proc do_something(size_ real<meters>, duration_ real<seconds>)
begin
insert into things(size, duration) values(size_, duration_);
end;

In this situation you couldn't accidentally switch the columns in do_something even though both are real, and indeed SQLite will only see the type real for both. If you have your own variables typed real<size> and read<duration> you can't accidentally do:

call do_something(duration, size);

Even though both are real. The type kind won't match.

Importantly, an expression with no type kind is compatible with any type kind (or none). Hence all of the below are legal.

declare generic real;
set generic := size; -- no kind may accept <meters>
set generic := duration; -- no kind may accept <seconds>
set duration := generic; -- no kind may be stored in <seconds>

Only mixing types where both have a kind, and the kind is different generates errors. This choice allows you to write procedures that (for instance) log any integer or any real, or that return an integer out of a collection or some such.

These rules are applied to comparisons, assignments, column updates, everywhere types are checked for compatibility.

To get the most value out of these constructs, the authors recommend that type kinds be used universally except when the extra compatibility described above is needed (like low level helper functions).

Importantly type kind can be applied to object types as well, allowing object<dict> to be distinct from object<list>.

At run time the kind information is lost. But it does find its way into the JSON output so external tools also get to see the kinds.

Nullability Rules

General Rule

Except as noted in the exceptions below, the result of an operator is a nullable type if and only if any of its operands are of nullable type. This applies no matter the number of operands the operator requires.

Note: CQL does not do constant folding or other inferencing except in const(), it only uses the types of the values so even an expression such as NULL or 1 is of nullable type even though plainly it must evaluate to 1. This may change in time but presently no attempt is made to compute values to improve type information except in const() expressions.

Identifiers and Literals

Nullable variables and columns are always nullable. The NULL literal is nullable. Other literals are not nullable.

IS and IS NOT

These operators always return a non-null boolean.

IN and NOT IN

In an expression like needle IN (haystack) the result is always a boolean. The boolean is nullable if and only if needle is nullable. The presence of nulls in the the haystack is irrelevant.

The IN operator behaves like a series of equality tests (i.e. == tests not IS tests). NOT IN behaves symmetrically.

NOTE: SQLite has slightly different nullability rules for IN and NOT IN q.v. This is only place where CQL has different evaluation rules by design.

CASE ..WHEN ..THEN.. ELSE.. END

The following rules apply when considering nullability of a CASE expression.

  • if there is no ELSE clause, the result is nullable.
  • if any of the output values (i.e. any THEN or ELSE values) are nullable, the result is nullable
  • otherwise the result is not nullable

The SQL CASE construct is quite powerful and unlike the C switch statement it is actually an expression. So this operator is rather more like a highly generalized ternary a ? b : c operator rather than the C switch statement. There can be arbitrarily many conditions specified each with their own result and the conditions need not be constants, and typically are not.

The IFNULL and COALESCE Functions

These functions are nullable or non-nullable based on their arguments. If the IFNULL or COALESCE call has at least one non-nullable argument then the result is non-nullable.

The cql_attest_notnull function

Sometimes ifnull is not possible, e.g. if the operand is an object or blob type. The attest form gives you a type conversion to not null with a runtime check. It cannot be used in SQLite contexts because the function is not known to SQLite. But in loose expressions you can write this important pattern:

create proc foo(x object)
begin
if x is not null then
declare xo object not null;
set xo := cql_attest_notnull(x);
-- use xo where a non-null object is needed
end;
end;
#### LEFT and RIGHT OUTER JOINS
In most join operations the nullability of each column participating in the join is preserved. However in a
`LEFT OUTER` join the columns on the right side of the join are always considered nullable and in a
`RIGHT OUTER` join the columns on the left side of the join are considered nullable.
NOTE: CQL does not use constraints in the `WHERE`, `ON`, or `HAVING` clause to infer non-null status even where this would be possible. CQL does no data-flow at all. e.g. `select foo where foo is not null` does not change the type of the column. You can only do this with expression forms that strip the nullability (e.g. `IFNULL`).
### Expression Types
CQL supports a variety of expressions, nearly everything from the SQLite world. The following are the various supported operators, they are presented in order from the weakest binding strength to the strongest.
Note that the binding order is NOT the same as C, in some cases it is radically different (e.g. boolean math)
#### UNION and UNION ALL
These appear only in the context of `SELECT` statements. The arms of a compound select may include `FROM`, `WHERE`, `GROUP BY`, `HAVING`, and `WINDOW`. If `ORDER BY` or `LIMIT ... OFFSET` are present, these apply to the entire UNION.
example:
```sql
select A.x x from A inner join B using(z)
union all
select C.x x from C
where x = 1;

The WHERE applies only to the second select in the union. And each SELECT is evaluated before the the UNION ALL

select A.x x from A inner join B using(z)
where x = 3
union all
select C.x x from C
where x = 1
order by x;

The ORDER BY applies to the result of the union, so any results from the 2nd branch will sort before any results from the first branch (because x is constrained in both).

Assignment

Assignment only occurs in the UPDATE statement or in the SET statement. In both cases the left side is a simple target and the right side is a general expression. The expression is evaluated before the assignment.

example:

SET x := 1 + 3 AND 4; -- + before AND then :=

Logical OR

The logical OR operator does shortcut evaluation, much like the C || operator (not to be confused with SQL's concatenation operator with the same lexeme).

The truth table for logical OR is as follows:

ABA OR B
000
011
0NULLNULL
101
111
1NULL1
NULL0NULL
NULL11
NULLNULLNULL

Logical AND

The logical AND operator does shortcut evaluation, much like the C && operator, so if the left side is zero the result is 0 and the right side is not evaluated.

The truth table for logical AND is as follows:

ABA AND B
000
010
0NULL0
100
111
1NULLNULL
NULL00
NULL1NULL
NULLNULLNULL

BETWEEN and NOT BETWEEN

These are a ternary type operation. The general forms are:

expr1 BETWEEN expr2 AND expr3
expr1 NOT BETWEEN expr2 AND expr3

Note that there is an inherent ambiguity in the language because expr2 or expr3 could be logical expressions that include AND. CQL resolves this ambiguity by insisting that expr2 and expr3 be "math expressions" in the grammar.

"Math expressions" consist of:

  • bitwise & and |
  • bitwise left shift or right shift (<< and >>)
  • addition (+)
  • subtraction (-)
  • multiplication (*)
  • division (/)
  • modulus (%)
  • unary negation (-)
  • literals
  • any parenthesized expression

Hence:

-- oh hell no (syntax error)
a between 1 and 2 and 3;
-- all ok
a between (1 and 2) and 3;
a between 1 and (2 and 3);
a between 1 + 2 and 12 / 2;

These considerations force the grammar to contain special rules for expressions after BETWEEN.

Logical NOT

The one operand of logical not must be a numeric. NOT 'x' is illegal.

Non-ordering tests !=, <>, =, ==, LIKE, GLOB, MATCH, IN, NOT IN, IS, IS NOT

These operations do some non-ordered comparison of their two operands.

  • IS and IS NOT never return NULL, So for instance X IS NOT NULL gives the natural answer. x IS y is true if and only if: 1. both x and y are NULL or 2. if they are equal.
  • The other operators return NULL if either operand is NULL and otherwise perform their usual test to produce a boolean
  • != and <> are equivalent as are = and ==
  • strings and blobs compare equal based on their value, not their identity (i.e. not the string/blob pointer)
  • objects compare equal based on their address, not their content (i.e. reference equality)
  • MATCH and GLOB only valid in SQL contexts, LIKE can be used in any context (a helper method to do LIKE in C is provided by SQLite, but not the others)
NULL IS NULL -- this is true
(NULL == NULL) IS NULL -- this is also true because NULL == NULL is not 1, it's NULL.
(NULL != NULL) IS NULL -- this is also true because NULL != NULL is not 0, it's also NULL.

Ordering comparisons <, >, <=, >=

These operators do the usual order comparison of their two operands.

  • If either operand is NULL the result is NULL
  • Objects and Blobs may not be compared with these operands
  • Strings are compared based on their value (as with other comparisons) not their address
  • Numerics are compared as usual with the usual promotion rules

NOTE: CQL uses strcmp for string comparison. In SQL expressions the comparison happens in whatever way SQLite has been configured. Typically general purpose string comparison should be done with helper functions that deal with collation and other considerations. This is a very complex topic and CQL is largely silent on it.

Bitwise operators <<, >>, &, |

These are the bit-manipulation operations. Their binding strength is VERY different than C so beware. And notably the & operator has the same binding strength as the | operator so they bind left to right, this is utterly unlike most systems. Many parenthesis are likely to be needed to get the usual "or of ands" patterns codified correctly. Likewise the shift operators << and >> are the same strength as & and | which is very atypical. Consider:

x & 1 << 7; -- probably doesn't mean what you think (this is not ambigous, it's well defined, but unlike C)
(x & 1) << 7; -- means the same as the above
x & (1 << 7) -- probably what you intended

Note that these operators only work on integer and long integer data. If any operand is NULL the result is `NULL.

Addition and Subtraction +, -

These operators do the typical math. Note that there are no unsigned numerics so it's always signed math that is happening here.

  • operands are promoted to the "biggest" type involved as previously described (bool -> int -> long -> real)
  • only numeric operands are legal (no adding strings)
  • if any operand is NULL the result is NULL

Multiplication, Division, Modulus *, /, %

These operators do the typical math. Note that there are no unsigned numerics so it's always signed math that is happening here.

  • operands are promoted to the "biggest" type as previously described (bool -> int -> long -> real)
  • only numeric operands are legal (no multiplying strings)
  • if any operand is NULL the result is NULL

EXCEPTION: the % operator doesn't make sense on real values, so real values produces an error.

Unary operators -, ~

Unary negation (-) and bitwise invert (~) are the strongest binding operators.

  • The ~ operator only works on integer types (not text, not real)
  • the usual promotion rules otherwise apply
  • if the operand is NULL the result is NULL

CASE Expressions

The case expression has two major forms and provides a great deal of flexibility in an expression. You can kind of think of it as the C ?: operator on steroids.

set x := 'y';
select case x
when 'y' then 1
when 'z' then 2
else 3
end;

In this form the expression in the case x here is evaluated exactly once and then compared against each when clause, they must be type compatible with the expression. The then expression that corresponds is evaluated and becomes the result, or the else expression if present and no when matches. If there is no else and no match the result is null.

If that's not general enough, there is an alternate form:

set y := 'yy';
set z := 'z';
select case
when y = 'y' then 1
when z = 'z' then 2
else 3
end;

The second form, where there is no value before the first when keyword, each when expression is a separate independent boolean expression. The first one that evaluates to true causes the corresponding then to be evaluated and that becomes the result. If there are no matches the result is the else expression, or null if there is no else.

The result types must be compatible and the best type to hold the answer is selected with the usual promotion rules.

SELECT expressions

Single values can be extracted from SQLite using an inline select expression. For instance:

set x_ := (select x from somewhere where id = 1);

The select statement in question must extract exactly one column and the type of the expression becomes the type of the column. This form can appear anywhere an expression can be appear, though it is most commonly used in assignments. Something like this would also be valid:

if (select x from somewhere where id = 1) == 3 then
...
end if;

The select statement can of course be arbitrarily complex.

Note, if the select statement returns no rows this will result in the normal error flow. In that case, the error code will be SQLITE_DONE, which is treated like an error because in this context SQLITE_ROW is expected as a result of the select. This is not a typical error code can be quite surprising to callers. If you're seeing this failure mode it usually means the code had no affordance for the case where there were no rows and probably that situation should have been handled. This is an easy mistake to make, so to avoid it, CQL also supports these more tolerant forms:

set x_ := (select x from somewhere where id = 1 if nothing -1);

And even more generally if the schema allows for null vales and those are not desired:

set x_ := (select x from somewhere where id = 1 if nothing or null -1);

Both of these are much safer to use as only genuine errors (e.g. the table was dropped and no longer exists) will result in the error control flow.

Again note that:

set x_ := (select ifnull(x,-1) from somewhere where id = 1);

Would not avoid the SQLITE_DONE error code, because no rows returned is not at all the same as a null value returned.

The if nothing or null form above is equivalent to the following, but it is more economical, and probably clearer:

set x_ := (select ifnull(x,-1) from somewhere where id = 1 if nothing -1);

To compute the type of the overall expression, the rules are almost the same as normal binary operators. In particular:

  • if the default expression is present it must be type compatible with the select result
    • the result type is the smallest type that holds both the select value and the default expression (see normal promotion rules above)
  • object types are not allowed (SQLite cannot return an object)
  • in (select ...) the result type is not null if and only if the select result type is not null (see select statement, many cases)
  • in (select ... if nothing) the result type is not null if and only if both the select result and the default expression types are not null (normal binary rules)
  • in (select ... if nothing or null) the result type is not null if and only if the default expression type is not null

Finally, the form (select ... if nothing throw) is allowed; this form is exactly the same as normal (select ...) but makes the explicit that the error control flow will happen if there is no row. Consequently this form is allowed even if @enforce_strict select if nothing is in force.

Marking Data as Sensitive

CQL supports the notion of 'sensitive' data in a first class way. You can think of it as very much like nullability; It largely begins by tagging data columns with @sensitive

Rather than go through the whole calculus, it's easier to understand by a series of examples. So let's start with a table with some sensitive data.

create table with_sensitive(
id integer,
name text @sensitive,
sens integer @sensitive
);

The most obvious thing you might do at this point is create a stored proc that would read data out of that table. Maybe something like this:

create proc get_sensitive()
begin
select id as not_sensitive_1,
sens + 1 sensitive_1,
name as sensitive_2,
'x' as not_sensitive_2,
-sens as sensitive_3,
sens between 1 and 3 as sensitive_4
from with_sensitive;
end;

So looking at that procedure we can see that it's reading sensitive data, the result will have some sensitive columns in it.

  • the "id" is not sensitive (at least not in this example)
  • sens + 1 is sensitive, math on a sensitive field leaves it sensitive
  • name is sensitive, it began that way and is unchanged
  • 'x' is just a string literal, it's not sensitive
  • -sens is sensitive, that's more math
  • and the between expression is also sensitive

Generally sensitivity is "radioactive" anything it touches becomes sensitive. This is very important because even a simple looking expression like lgbtqia IS NOT NULL must lead to a sensitive result or the whole process would be largely useless. It has to be basically impossible to wash away sensitivity.

These rules apply to normal expressions as well as expressions in the context of SQL. Accordingly:

Sensitive variables can be declared:

declare sens integer @sensitive;

Simple operations on the variables are sensitive

-- this is sensitive (and the same would be true for any other math)
sens + 1;

The IN expression gives you sensitive results if anything about it is sensitive

-- all of these are sensitive
sens in (1, 2);
1 in (1, sens);
(select id in (select sens from with_sensitive));

Similarly sensitive constructs in CASE expressions result in a sensitive output

-- not sensitive
case 0 when 1 then 2 else 3 end;
-- all of these are sensitive
case sens when 1 then 2 else 3 end;
case 0 when sens then 2 else 3 end;
case 0 when 1 then sens else 3 end;
case 0 when 1 then 2 else sens end;

Cast operations preserve sensitivity

-- sensitive result
select cast(sens as INT);

Aggregate functions likewise preserve sensitivity

-- all of these are sensitive
select AVG(T1.sens) from with_sensitive T1;
select MIN(T1.sens) from with_sensitive T1;
select MAX(T1.sens) from with_sensitive T1;
select SUM(T1.sens) from with_sensitive T1;
select COUNT(T1.sens) from with_sensitive T1;

There are many operators that get similar treatment such as COALESCE, IFNULL, IS and IS NOT.

Things get more interesting when we come to the EXISTS operator:

-- sensitive if and only if any selected column is sensitive
exists(select * from with_sensitive)
-- sensitive because "info" is sensitive
exists(select info from with_sensitive)
-- not sensitive because "id" is not sensitive
exists(select id from with_sensitive)

If this is making you nervous, it probably should, we need a little more protection because of the way EXISTS is typically used. The predicates matter, consider the following:

-- id is now sensitive because the predicate of the where clause was sensitive
select id from with_sensitive where sens = 1;
-- this expression is now sensitive because id is sensitive in this context
exists(select id from with_sensitive where sens = 1)

In general: if the predicate of a WHERE or HAVING clause is sensitive then all columns in the result become sensitive.

Similarly when performing joins, if the column specified in the USING clause is sensitive or the predicate of the ON clause is sensitive then the result of the join is considered to be all sensitive columns (even if the columns were not sensitive in the schema).

Likewise a sensitive expression in LIMIT or OFFSET will result in 100% sensitive columns as these can be used in a WHERE-ish way. There is no reasonble defense against using LIMIT and testing for the presence or absence of a row as a way to wash away sensitivity so that is a weakness, but the rules that are present are likely to be very helpful.

-- join with ON
select T1.id from with_sensitive T1 inner join with_sensitive T2 on T1.sens = T2.sens
-- join with USING
select T1.id from with_sensitive T1 inner join with_sensitive T2 using(sens);

All of these expression and join propagations are designed to make it impossible to simply wash-away sensitivity with a little bit of math.

Now we come to enforcement, which boils down to what assignments or "assignment-like" operations we allow.

If we have these:

declare sens integer @sensitive;
declare not_sens integer;

We can use those as stand-ins for lots of expressions, but the essential calculus goes like this:

-- assigning a sensitive to a sensitive is ok
set sens := sens + 1;
-- assigning not sensitive data to a sensitive is ok
-- this is needed so you can (e.g.) initialize to zero
set sens := not_sens;
-- not ok
set not_sens := sens;

Now these "assignments" can happen in a variety of ways:

  • you can set an out parameter of your procedure
  • when calling a function or procedure, we require:
    • any IN parameters of the target be "assignable" from the value of the argument expression
    • any OUT parameters of the target be "assignable" from the procedures type to the argument variable
    • any IN/OUT parameters require both the above

Now it's possible to write a procedure that accepts sensitive things and returns non-sensitive things. This is fundamentally necessary because the proc must be able return (e.g.) a success code, or encrypted data, that is not sensitive. However, if you write the procedure in CQL it, too, will have to follow the assignment rules and so cheating will be quite hard. The idea here is to make it easy to do handle sensitive data well and make typical mistakes trigger errors.

With these rules it's possible to compute the the type of procedure result sets and also to enforce IN/OUT parameters. Since the signature of procedures is conveniently generated with --generate_exports good practices are fairly easy to follow and sensitivity checks flow well into your programs.

This is a brief summary of CQL semantics for reference types -- those types that are ref counted by the runtime.

The three reference types are:

  • TEXT
  • OBJECT
  • BLOB

Each of these has their own macro for retain and release though all three actually turn into the exact same code in all the cu rrent CQL runtime implementations. In all cases the object is expected to be promptly freed when the reference count falls to zero.

Reference Semantics

Stored Procedure Arguments

  • in and inout arguments are not retained on entry to a stored proc
  • out arguments are assumed to contain garbage and are nulled without retaining on entry
  • if your out argument doesn't have garbage in it, then it is up to you do release it before you make a call
  • When calling a proc with an out argument CQL will release the argument variable before the call site, obeying its own contract

Local Variables

  • assigning to a local variable retains the object, and then does a release on the previous object
  • this order is important, all assignments are done in this way in case of aliasing (release first might accidentally free too soon)
  • CQL calls release on all local variable when the method exits

Assigning to an out parameter or a global variable

  • out,inoutparameters, and global variables work just like local variables except that CQL does not callrelease` at the end of the procedure

Function Return Values

Stored procedures do not return values, they only have out arguments and those are well defined as above. Functions however are also supported and they can have either get or create semantics

Get Semantics

If you declare a function like so:

declare function Getter() object;

Then CQL assumes that the returned object should follow the normal rules above, retain/release will balance by the end of the procedure for locals and globals or out arguments could retain the object.

Create Semantics

If you declare a function like so:

declare function Getter() create text;

Then CQL assumes that the function created a new result which it is now responsible for releasing. In short the returned object is assumed to arrive with a retain count of 1 already on it. When CQL stores this return value it will:

  • release the object that was present at the storage location (if any)
  • copy the returned pointer without further retaining it this one time

As a result if you store the returned value in a local variable it will be released when the procedure exits (as usual) or if you instead store the result in a global or an out parameter the result will survive to be used later.

Comparison

CQL tries to adhere to normal SQL comparison rules but with a C twist.

OBJECT

The object type has no value based comparison, so there is no <, > and so forth.

The following table is useful. Let's suppose there are exactly two distinct objects 'X' and 'Y'

true expressions: X = X X <> Y Y = Y Y <> X X IN (X, Y) X NOT IN (Y)

false expressions: X = Y X <> X Y = X Y <> Y X NOT IN (X, Y)

null expressions: null = null X <> null x = null null <> null Y <> null y = null

null = null resulting in null is particular surprising but consistent with the usual SQL rules. And again, as in SQL, the IS operator returns true for X IS Y even if both are null.

TEXT

Text has value comparison semantics but normal string comparison is done only with strcmp which is of limited value. Typically you'll want to either delegate the comparison to Sqlite (with (select x < y)) or else use a helper function with a suitable comparison mechanism.

For text comparisons including equality:

true: if and only if both operands are not null and the comparison matches (using strcmp) false: if and only if both operands are not null and the comparison does not match (using strcmp) null: if and only if at least one operand is null

EXAMPLE: 'x' < 'y' is true because strcmp("x", "y") < 0

The IS and IS NOT operators behave similarly to equality and inequality, but never return null. If X is some value that doesn't happen to be null then we have the following:

true: null is null X is X X is not null null is not X false: null is not null X is not X X is null null is X

The IN and NOT IN operators also work for text using the same value comparisons as above. Additionally there are special text comparison operators such as LIKE, MATCH and GLOB. These comparisons are defined by SQLite.

BLOB

Blobs are compared by value (equivalent to memcmp) but have no well-defined ordering. The memcmp order is deemed not helpful as blobs usually have internal structure hence the valid comparisons are only equality and inequality. You can use user defined functions to do better comparisons of your particular blobs if needed.

The net comparison behavior is otherwise just like strings.

Sample Code

Out Argument Semantics

DECLARE FUNCTION foo() OBJECT;
CREATE PROC foo_user (OUT baz OBJECT)
BEGIN
SET baz := foo();
END;
void foo_user(cql_object_ref _Nullable *_Nonnull baz) {
*(void **)baz = NULL; // set out arg to non-garbage
cql_set_object_ref(baz, foo());
}

Function with Create Semantics

DECLARE FUNCTION foo() CREATE OBJECT;
CREATE PROCEDURE foo_user (INOUT baz OBJECT)
BEGIN
DECLARE x OBJECT;
SET x := foo();
SET baz := foo();
END;
void foo_user(cql_object_ref _Nullable *_Nonnull baz) {
cql_object_ref x = NULL;
cql_object_release(x);
x = foo();
cql_object_release(*baz);
*baz = foo();
cql_cleanup:
cql_object_release(x);
}

Function with Get Semantics

DECLARE FUNCTION foo() OBJECT;
CREATE PROCEDURE foo_user (INOUT baz OBJECT)
BEGIN
DECLARE x OBJECT;
SET x := foo();
SET baz := foo();
END;
void foo_user(cql_object_ref _Nullable *_Nonnull baz) {
cql_object_ref x = NULL;
cql_set_object_ref(&x, foo());
cql_set_object_ref(baz, foo());
cql_cleanup:
cql_object_release(x);
}
Last updated on by Rico Mariani