Chapter 3: Expressions, Literals, Nullability, Sensitivity
Until this point we've only discussed simple kinds of expressions as well as variables and table columns marked with NOT NULL
. These are indeed the easiest types for CQL to work with as they tend to correspond most directly to the types known to C. However,
SQL provides for many more types of expressions as well as nullable types and these require handling in any language that purports to be like SQL.
Expression Examplesβ
The usual arithmetic operators apply in CQL:
Example expressions (these are all true)
(1 + 2) * 3 == 9
1 + 2 * 3 == 7
6 / 3 == 2
7 - 5 == 2
6 % 5 == 1
5 / 2.5 == 2
7 & 3 == 2 | 1
1 << 2 == 4
However, before going any further it's important to note that CQL is inherently a two-headed beast. Expressions are either evaluated by transpiling to C (like the predicate of an IF statement, or a variable assignment) or by sending them to SQLIte for evaluation (like expressions inside a SELECT
statement or the WHERE
part of a DELETE
).
CQL evaluation rules are designed to be as similar as possible but some variance is inevitable because evaluation is done in two fundamentally different ways.
Operator Precedenceβ
The operator precedence rules in CQL are as follows; the top-most rule binds the most loosely and the bottom-most rule binds the most tightly:
ASSIGNMENT: :=
LOGICAL_OR: OR
LOGICAL_AND: AND
LOGICAL_NOT: NOT
EQUALITY: = == != <> IS [NOT], [NOT] IN, [NOT] LIKE,
[NOT] MATCH, [NOT] GLOB, [NOT] BETWEEN
INEQUALITY: < <= > >=
BINARY: << >> & |
ADDITION: + -
MULTIPLICATION: * / %
CONCAT: ||
COLLATE: COLLATE
UNARY: ~ -
The above rules are not the same as C's operator precedence rules! Instead, CQL follows SQLite's rules. Parentheses are emitted in the C output as needed to force that order.
NOTE: CQL emits minimal parentheses in all outputs. Different parentheses are often needed for SQL output as opposed to C output.
Order of Evaluationβ
In contrast to C, CQL guarantees a left-to-right order of evaluation for arguments. This applies both to arguments provided to the operators mentioned in the previous section as well as arguments provided to procedures.
Variables, Columns, Basic Types and Nullabilityβ
CQL needs type information for both variables in the code and columns in the database.
Like SQL, CQL allows variables to hold a NULL value and just as in SQL the absence
of NOT NULL
implies that NULL
is a legal value. Consider these examples:
-- real code should use better names than this :)
create table all_the_nullables(
i1 integer,
b1 bool,
l1 long,
r1 real,
t1 text,
bl1 blob
);
declare i2 integer;
declare b2 bool;
declare l2 long;
declare r2 real;
declare t2 text;
declare bl2 blob;
ALL of i1
, i2
, b1
, b2
, l1
, l2
, r1
, r2
, t1
, t2
, and bl1
, bl2
are nullable.
In some sense variables and columns declared nullable (by virtue of the missing NOT NULL
) are the
root sources of nullability in the SQL language. That and the NULL
literal. Though there are other
sources as we will see.
NOT NULL
could be added to any of these, e.g.
-- real code should use better names than this :)
declare i_nn integer not null;
In the context of computing the types of expressions, CQL is statically typed and so it must make a decision about the type of any expression based on the type information at hand at compile time. As a result it handles the static type of an expression conservatively. If the result might be null then the expression is of a nullable type and the compiled code will include an affordance for the possibility of a null value at runtime.
The generated code for nullable types is considerably less efficient and so it should be avoided if that is reasonably possible.
LET Statementβ
You can declare and initialize a variable in one step using the LET
form, e.g.
LET x := 1;
The named variable is declared to be the exact type of the expression on the right. More on expressions in the coming sections. The right side is often a constant in these cases but does not need to be.
LET i := 1; -- integer not null
LET l := 1L; -- long not null
LET t := "x"; -- text not null
LET b := x IS y; -- bool not null
LET b := x = y; -- bool (maybe not null depending on x/y)
The pseudo function "nullable" removes not null
from the type of its argument but otherwise does no computation.
This can be useful to initialize nullable types.
LET n_i := nullable(1); -- nullable integer variable initialized to 1
LET n_l := nullable(1L); -- nullable long variable initialized to 1
The pseudo function "sensitive" adds @sensitive
to the type of its argument but otherwise does no computation.
This also can be useful to initialize nullable types.
LET s_i := sensitive(1); -- sensitive nullable integer variable initialized to 1
LET s_l := sensitive(1L); -- sensitive nullable long variable initialized to 1
The @RC
special variableβ
CQL also has the special built-in variable @RC
which refers to the most recent error code returned by a SQLite operation, e.g. 0 == SQLITE_OK
, 1 == SQLITE_ERROR
. @RC
is of type integer not null
. Specifically:
- each catch block captures the error code when it is entered into its own local variable
- this variable is created lazily, so it only exists if it is used
- the variable is called
_rc_thrown_n
where n is the catch block number in the procedure
- the variable is called
- any reference to
@RC
refers to the above error variable of the innermost catch block the@RC
reference is in - if the
@RC
reference happens outside of any catch block its value isSQLITE_OK
(i.e. zero).
Types of Literalsβ
There are a number of literal objects that may be expressed in CQL. These are as follows:
String Literalsβ
- A double quoted string is a C style string literal
- the usual simple C escape sequences are supported
- the \xNN form for embedded hex characters is supported, however
- the \0NNN octal form is not supported, and
- embedded nulls in string literals (\0 or \0x00) are not supported (you must use blobs in such cases)
- A single quoted string is a SQL style string literal
- No escape sequences are supported other than
''
to indicate a single quote character (this is just like normal SQLite)
- No escape sequences are supported other than
- A sequence of single or double quoted strings separated by whitespace such as "xx" 'yy' "zz" which are concatenated to make one literal
- The sequence @FILE("some_string") is a special string literal
- the value of this literal is the path of the current compiland starting at the letters in
some_string
, or - the entire path of the current compiland if
some_string
does not occur in the path - the purpose of the
@FILE
construct is to provide a partial path to a file for diagnostics that is consistent even if the file is built in various different root paths on different build machines
- the value of this literal is the path of the current compiland starting at the letters in
Blob Literalsβ
- SQLite Blob literals are supported in SQL contexts (i.e. where they will be processed by SQLite), CQL produces an error if you attempt to use a blob literal in a loose expression
Numeric Literalsβ
- All numeric literals are considered to be positive; negative numbers are actually a positive literal combined with unary minus (the negation operator)
- Base 10 and hexadecimal literals are supported
- Literals with a decimal point are of type
REAL
and stored as the C typedouble
- Literals that can fit in a signed integer without loss, and do not end in the letter
L
are integer literals - Larger literals, or those ending with the letter
L
are long integer literals. - Literals that begin with 0x are interpreted as hex
Examples:
1.3 -- real
2L -- long
123456789123 -- long
123 -- integer
0x10 -- hex integer
0x10L -- hex long integer
The NULL literalβ
The use of NULL
always gives a nullable result however this literal is special in that it has no storage class. NULL
is neither numeric nor string itself but rather mutates into
whatever it is first combined with. For instance NULL + 1
results in a nullable integer. Because NULL
has no primitive type in some cases where type knowledge
is required you might have to use the CAST() function to cast the NULL to a specific type such as CAST(NULL as TEXT)
. This construct guarantees type consistence in cases like SELECT
from different sources combined with UNION ALL
Note: constructs like CAST(NULL as TEXT)
are always rewritten to just NULL
before going to SQLite as the cast is uninteresting except for the type information which SQLite doesn't need/use anyway.
Other Considerationsβ
There are no boolean literals other than the integers 0
and 1
.
The C pre-processor is often combined with CQL in which case the _FILE_
and _LINE_
directives may be used to create literals; they will be preprocessed into normal literals.
The use of _FILE_
can give surprising results in the presence of build systems, hence the existence of @FILE(...)
.
Const and Enumerationsβ
It's possible to use named constants in CQL with nothing more than the C pre-processor features that have already appeared, however use of #define in such a way is not entirely satisfactory. For one thing, CQL will not know these constants exist in any way as they will be replaced before it ever sees them. This means CQL can't provide their values for you in the JSON output for instance.
To help with this problem, CQL includes constants, note, this is not the same as enumerated types as we'll see later. You can now write something like this:
declare enum business_type integer (
restaurant,
laundromat,
corner_store = 11+3 /* math added for demo purposes only */
);
After this enum is declared, this:
select business_type.corner_store;
is the same as this:
select 14;
And that is exactly what SQLite will see, the literal 14
.
You can also use the enum to define column types:
CREATE TABLE businesses (
name TEXT,
type business_type
);
CQL will then enforce that you use the correct enum to access those columns. For example, this is valid:
SELECT * FROM businesses WHERE type = business_type.laundromat;
While this does not type check:
SELECT * FROM businesses WHERE type = business_corp_state.delaware;
Enumerations follow these rules:
- the enumeration can be any numeric type (bool, integer, long integer, real)
- the values of the enumeration start at 1 (i.e. if there is no
= expression
the first item will be1
, not0
) - if you don't specify a value, the next value is the previous value plus one
- if you do specify a value it can be any constant expression and it will be cast to the type of the enumeration (even if that is lossy)
- the enumeration can refer to previous values in itself with no qualification
(big = 100.0, medium = big/2, small = medium/2)
- the enumeration can refer to previously defined enumerations as usual
(code = business_type.restaurant)
- once the enumeration is defined you refer to its members in a fully qualified fashion
enum_name.member_name
elsewhere
With these forms you get some additional useful output:
- the JSON includes the enumerations and their values in their own section
- you can use the
@emit_enums
directive to put declarations like this into the.h
file that corresponds to the current compiland
enum business_type {
business_type__restaurant = 1,
business_type__laundromat = 2,
business_type__corner_store = 14
};
Note that C does not allow for floating point enumerations, so in case of floating point values such as:
declare enum floating real (
one = 1.0,
two = 2.0,
e = 2.71828,
pi = 3.14159
);
you get:
// enum floating (floating point values)
#define floating__one 1.000000e+00
#define floating__two 2.000000e+00
#define floating__e 2.718280e+00
#define floating__pi 3.141590e+00
In order to get useful expressions in enumeration values, constant folding and general evaluation was added to the compiler; these expressions work on any numeric type and the literal null. The supported operations include:
+
, -
, *
, /
, %
, |
, &
, <<
, >>
, ~
, and
, or
, not
, ==
, <=
, >=
, !=
, <
, >
, the cast
operator
and the case
forms (including the iif
function). These are enough to make a lot of very interesting expressions, all of
which are evaluated at compile time.
Constant folding was added to allow for rich enum
expressions, but there is also the const()
primitive in the
language which can appear anywhere a literal could appear. This allows you do things like:
create table something(
x integer default const((1<<16)|0xf) /* again the math is just for illustration */
);
The const
form is also very useful in macros:
#define SOMETHING const(12+3)
This form ensures that the constant will be evaluated at compile time. The const
pseudo-function can also nest
so you can build these kinds of macros from other macros or you can build enum values this way.
Anywhere you might need literals, you can use const
.
Named Typesβ
A common source of errors in stored procedures is incorrect typing in arguments. For instance, a particular key
for an entity might need to be LONG
or even always LONG NOT NULL
or LONG NOT NULL @SENSITIVE
and the only
way to do this in the past was maybe with some #define
thing. Otherwise you have to diligently get the type right
in all the places, and should it ever change, again you have to visit all the places. To help with this situation,
and to make the code a little more self-describing we added named types to the language. This is a lot like typedef
in
the C language. They do not create different incompatible types but they do let you name things well.
You can now write these sorts of forms:
declare foo_id type long not null;
create table foo(
id foo_id primary key autoincrement,
name text
);
create proc inserter(name_ text, out id foo_id)
begin
insert into foo(id, name) values(NULL, name_);
set id := last_insert_rowid();
end;
declare function func_return_foo_id() foo_id;
declare var foo_id;
Additionally any enumerated type can be used as a type name. e.g.
declare enum thing integer (
thing1,
thing2
);
declare thing_type type thing;
Enumerations always get "not null" in addition to their base type. Enumerations also have a unique "kind" associated,
specifically the above enum has type integer<thing> not null
. The rules for type kinds are described below.
Type Kindsβ
Any CQL type can be tagged with a "kind" for instance real
can become real<meters>
, integer
can become integer<job_id>
. The idea here is that the additional tag, the "kind" can help prevent type mistakes
in arguments, in columns and in procedure calls. For instance:
create table things(
size real<meters>,
duration real<seconds>
);
create proc do_something(size_ real<meters>, duration_ real<seconds>)
begin
insert into things(size, duration) values(size_, duration_);
end;
In this situation you couldn't accidentally switch the columns in do_something
even though both
are real
, and indeed SQLite will only see the type real
for both. If you have your own variables
typed real<size>
and real<duration>
you can't accidentally do:
call do_something(duration, size);
even though both are real. The type kind won't match.
Importantly, an expression with no type kind is compatible with any type kind (or none). Hence all of the below are legal.
declare generic real;
set generic := size; -- no kind may accept <meters>
set generic := duration; -- no kind may accept <seconds>
set duration := generic; -- no kind may be stored in <seconds>
Only mixing types where both have a kind, and the kind is different generates errors. This choice allows you to
write procedures that (for instance) log any integer
or any real
, or that return an integer
out of a collection.
These rules are applied to comparisons, assignments, column updates, anywhere and everywhere types are checked for compatibility.
To get the most value out of these constructs, the authors recommend that type kinds be used universally except when the extra compatibility described above is needed (like low level helper functions.)
Importantly, type kind can be applied to object types as well, allowing object<dict>
to be distinct from object<list>
.
At run time the kind information is lost. But it does find it's way into the JSON output so external tools also get to see the kinds.
Nullabilityβ
Nullability Rulesβ
Nullability is tracked via CQL's type system. To understand whether or not an expression will be assigned a nullable type, you can follow these rules; they will hopefully be intuitive if you are familiar with SQL:
The literal
NULL
is, of course, always assigned a nullable type. All other literals are nonnull.In general, the type of an expression involving an operator (e.g.,
+
,==
,!=
,~
,LIKE
, et cetera) is nullable if any of its arguments are nullable. For example,1 + NULL
is assigned the typeINTEGER
, implying nullability.1 + 2
, however, is assigned the typeINTEGER NOT NULL
.IN
andNOT IN
expressions are assigned a nullable type if and only if their left argument is nullable: The nullability of the right side is irrelevant. For example,"foo" IN (a, b)
will always have the typeBOOL NOT NULL
, whereassome_nullable IN (a, b)
will have the typeBOOL
.- NOTE: In CQL, the
IN
operator behaves like a series of equality tests (i.e.,==
tests, notIS
tests), andNOT IN
behaves symmetrically. SQLite has slightly different nullability rules forIN
andNOT IN
. This is the one place where CQL has different evaluation rules from SQLite, by design.
- NOTE: In CQL, the
The result of
IS
andIS NOT
is always of typeBOOL NOT NULL
, regardless of the nullability of either argument.For
CASE
expressions, the result is always of a nullable type if noELSE
clause is given. If anELSE
is given, the result is nullable if any of theTHEN
orELSE
expressions are nullable.- NOTE: The SQL
CASE
construct is quite powerful: Unlike the Cswitch
statement, it is actually an expression. In this sense, it is rather more like a highly generalized ternarya ? b : c
operator than a C switch statement. There can be arbitrarily many conditions specified, each with their own result, and the conditions need not be constants; typically, they are not.
- NOTE: The SQL
IFNULL
andCOALESCE
are assigned aNOT NULL
type if one or more of their arguments are of aNOT NULL
type.In most join operations, the nullability of each column participating in the join is preserved. However, in a
LEFT OUTER
join, the columns on the right side of the join are always considered nullable; in aRIGHT OUTER
join, the columns on the left side of the join are considered nullable.As in most other languages, CQL does not perform evaluation of value-level expressions during type checking. There is one exception to this rule: An expression within a
const
is evaluated at compilation time, and if its result is then known to be nonnull, it will be given aNOT NULL
type. For example,const(NULL or 1)
is given the typeBOOL NOT NULL
, whereas merelyNULL or 1
has the typeBOOL
.
Nullability Improvementsβ
CQL is able to "improve" the type of some expressions from a nullable type to a
NOT NULL
type via occurrence typing, also known as flow typing. There are
three kinds of improvements that are possible:
Positive improvements, i.e., improvements resulting from the knowledge that some condition containing one or more
AND-
linkedIS NOT NULL
checks must have been true:IF
statements:IF a IS NOT NULL AND c.x IS NOT NULL THEN
-- `a` and `c.x` are not null here
ELSE IF b IS NOT NULL THEN
-- `b` is not null here
END IF;CASE
expressions:CASE
WHEN a IS NOT NULL AND c.x IS NOT NULL THEN
-- `a` and `c.x` are not null here
WHEN b IS NOT NULL THEN
-- `b` is not null here
ELSE
...
END;IIF
expressions:IIF(a IS NOT NULL AND c.x IS NOT NULL,
..., -- `a` and `c.x` are not null here
...
)SELECT
expressions:SELECT
-- `t.x` and `t.y` are not null here
FROM t
WHERE x IS NOT NULL AND y IS NOT NULL
Negative improvements, i.e., improvements resulting from the knowledge that some condition containing one or more
OR
-linkedIS NULL
checks must have been false:IF
statements:IF a IS NULL THEN
...
ELSE IF c.x IS NULL THEN
-- `a` is not null here
ELSE
-- `a` and `c.x` are not null here
END IF;IF
statements, guard pattern:IF a IS NULL RETURN;
-- `a` is not null here
IF c.x IS NULL THEN
...
THROW;
END IF;
-- `a` and `c.x` are not null hereCASE
expressions:CASE
WHEN a IS NULL THEN
...
WHEN c.x IS NULL THEN
-- `a` is not null here
ELSE
-- `a` and `c.x` are not null here
END;IIF
expressions:IIF(a IS NULL OR c.x IS NULL,
...,
... -- `a` and `c.x` are not null here
)
Assignment improvements, i.e., improvements resulting from the knowledge that the right side of a statement (or a portion therein) cannot be
NULL
:SET
statements:SET a := 42;
-- `a` is not null here
NOTE: Assignment improvements from
FETCH
statements are not currently supported. This may change in a future version of CQL.
There are several ways in which improvements can cease to be in effect:
The scope of the improved variable or cursor field has ended:
IF a IS NOT NULL AND c.x IS NOT NULL THEN
-- `a` and `c.x` are not null here
END IF;
-- `a` and `c.x` are nullable hereAn improved variable was
SET
to a nullable value:IF a IS NOT NULL THEN
-- `a` is not null here
SET a := some_nullable;
-- `a` is nullable here
END IF;An improved variable was used as an
OUT
(orINOUT
) argument:IF a IS NOT NULL THEN
-- `a` is not null here
CALL some_procedure_that_requires_an_out_argument(a);
-- `a` is nullable here
END IF;An improved variable was used as a target for a
FETCH
statement:IF a IS NOT NULL THEN
-- `a` is not null here
FETCH c INTO a;
-- `a` is nullable here
END IF;An improved cursor field was re-fetched:
IF c.x IS NOT NULL THEN
-- `c.x` is not null here
FETCH c;
-- `c.x` is nullable here
END IF;A procedure call was made (which removes improvements from all globals because the procedure may have mutated any of them; locals are unaffected):
IF a IS NOT NULL AND some_global IS NOT NULL THEN
-- `a` and `some_global` are not null here
CALL some_procedure();
-- `a` is still not null here
-- `some_global` is nullable here
END IF;
CQL is generally smart enough to understand the control flow of your program and infer nullability appropriately; here are a handful of examples:
IF some_condition THEN
SET a := 42;
ELSE
THROW;
END IF;
-- `a` is not null here because it must have been set to 42
-- if we've made it this far
IF some_condition THEN
SET a := 42;
ELSE
SET a := 100;
END IF;
-- `a` is not null here because it was set to a value of a
-- `NOT NULL` type in all branches and the branches cover
-- all of the possible cases
IF a IS NOT NULL THEN
IF some_condition THEN
SET a := NULL;
ELSE
-- `a` is not null here despite the above `SET` because
-- CQL understands that, if we're here, the previous
-- branch must not have been taken
END IF;
END IF;
IF a IS NOT NULL THEN
WHILE some_condition
BEGIN
-- `x` is nullable here despite `a IS NOT NULL` because
-- `a` was set to `NULL` later in the loop and thus `x`
-- will be `NULL` when the loop repeats
LET x := a;
SET a := NULL;
...
END;
END IF;
Here are some additional details to note regarding conditions:
For positive improvements, the check must be exactly of the form
IS NOT NULL
; other checks that imply a variable or cursor field must not be null when true have no effect:IF a > 42 THEN
-- `a` is nullable here
END IF;NOTE: This may change in a future version of CQL.
For multiple positive improvements to be applied from a single condition, they must be linked by
AND
expressions along the outer spine of the condition; uses ofIS NOT NULL
checks that occur as subexpressions within anything other thanAND
have no effect:IF
(a IS NOT NULL AND b IS NOT NULL)
OR c IS NOT NULL
THEN
-- `a`, `b`, and `c` are all nullable here
END IF;For negative improvements, the check must be exactly of the form
IS NULL
; other checks that imply a variable or cursor field must not be null when false have no effect:DECLARE equal_to_null INT;
IF a IS equal_to_null THEN
...
ELSE
-- `a` is nullable here
END IF;For multiple negative improvements to be applied from a single condition, they must be linked by
OR
expressions along the outer spine of the condition; uses ofIS NULL
checks that occur as subexpressions within anything other thanOR
have no effect:IF
(a IS NULL OR b IS NULL)
AND c IS NULL
THEN
...
ELSE
-- `a`, `b`, and `c` are all nullable here
END IF;
Forcing Nonnull Typesβ
If possible, it is best to use the techniques described in "Nullability Improvements" to verify that the value of a nullable type is nonnull before using it as such.
Sometimes, however, you may know that a value with a nullable type cannot be
null and simply wish to use it as though it were nonnull. The ifnull_crash
and ifnull_throw
"attesting" functions convert the type of an expression to be
nonnull and ensure that the value is nonnull with a runtime check. They cannot
be used in SQLite contexts because the functions are not known to SQLite, but
they can be used in loose expressions. For example:
CREATE PROC square_if_odd(a INT NOT NULL, OUT result INT)
BEGIN
IF a % 2 = 0 THEN
SET result := NULL;
ELSE
SET result := a * a;
END IF;
END;
-- `x` has type `INT`, but we know it can't be `NULL`
let x := call square_if_odd(3);
-- `y` has type `INT NOT NULL`
let y := ifnull_crash(x);
Above, the ifnull_crash
attesting function is used to coerce the expression
x
to be of type INT NOT NULL
. If our assumptions were somehow wrong,
howeverβand x
were, in fact, NULL
βour program would crash.
As an alternative to crashing, you can use ifnull_throw
. The following two
pieces of code are equivalent:
CREATE PROC y_is_not_null(x INT)
BEGIN
let y := ifnull_throw(x);
END;
CREATE PROC y_is_not_null(x INT)
BEGIN
DECLARE y INT NOT NULL;
IF x IS NOT NULL THEN
SET y := x;
ELSE
THROW;
END IF;
END;
Expression Typesβ
CQL supports a variety of expressions, nearly everything from the SQLite world. The following are the various supported operators; they are presented in order from the weakest binding strength to the strongest. Note that the binding order is NOT the same as C, and in some cases it is radically different (e.g. boolean math)
UNION and UNION ALLβ
These appear only in the context of SELECT
statements. The arms of a compound select may include FROM
, WHERE
, GROUP BY
, HAVING
, and WINDOW
. If ORDER BY
or LIMIT ... OFFSET
are present, these apply to the entire UNION.
example:
select A.x x from A inner join B using(z)
union all
select C.x x from C
where x = 1;
The WHERE
applies only to the second select in the union. And each SELECT
is evaluated before the the UNION ALL
select A.x x from A inner join B using(z)
where x = 3
union all
select C.x x from C
where x = 1
order by x;
The ORDER BY
applies to the result of the union, so any results from the 2nd branch will sort before any results from the first branch (because x
is constrained in both).
Assignmentβ
Assignment only occurs in the UPDATE
statement or in the SET
statement. In both cases the left side
is a simple target and the right side is a general expression. The expression is evaluated before the assignment.
example:
SET x := 1 + 3 AND 4; -- + before AND then :=
Logical ORβ
The logical OR
operator does shortcut evaluation, much like the C ||
operator (not to be confused with SQL's concatenation operator with the same lexeme).
The truth table for logical OR
is as follows:
A | B | A OR B |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
0 | NULL | NULL |
1 | 0 | 1 |
1 | 1 | 1 |
1 | NULL | 1 |
NULL | 0 | NULL |
NULL | 1 | 1 |
NULL | NULL | NULL |
Logical ANDβ
The logical AND
operator does shortcut evaluation, much like the C &&
operator, so if the left side is zero the result is 0 and the right side is not evaluated.
The truth table for logical AND
is as follows:
A | B | A AND B |
---|---|---|
0 | 0 | 0 |
0 | 1 | 0 |
0 | NULL | 0 |
1 | 0 | 0 |
1 | 1 | 1 |
1 | NULL | NULL |
NULL | 0 | 0 |
NULL | 1 | NULL |
NULL | NULL | NULL |
BETWEEN and NOT BETWEENβ
These are ternary operators. The general forms are:
expr1 BETWEEN expr2 AND expr3
expr1 NOT BETWEEN expr2 AND expr3
Note that there is an inherent ambiguity in the language because expr2
or expr3
could be logical expressions that include AND
. CQL resolves this ambiguity by insisting that expr2
and expr3
be "math expressions" in the grammar. These expressions may not have ungrouped AND
or OR
operators.
Examples::
-- oh hell no (syntax error)
a between 1 and 2 and 3;
-- all ok
a between (1 and 2) and 3;
a between 1 and (2 and 3);
a between 1 and b between c and d; -- binds left to right
a between 1 + 2 and 12 / 2;
Logical NOTβ
The one operand of logical NOT
must be a numeric. NOT 'x'
is illegal.
Non-ordering tests !=
, <>
, =
, ==
, LIKE
, GLOB
, MATCH
, REGEXP
, IN
, IS
, IS NOT
β
These operations do some non-ordered comparison of their two operands.
IS
andIS NOT
never returnNULL
, So for instanceX IS NOT NULL
gives the natural answer.x IS y
is true if and only if: 1. bothx
andy
areNULL
or 2. if they are equal.- The other operators return
NULL
if either operand isNULL
and otherwise perform their usual test to produce a boolean !=
and<>
are equivalent as are=
and==
- strings and blobs compare equal based on their value, not their identity (i.e. not the string/blob pointer)
- objects compare equal based on their address, not their content (i.e. reference equality)
MATCH
,GLOB
, andREGEXP
are only valid in SQL contexts,LIKE
can be used in any context (a helper method to doLIKE
in C is provided by SQLite, but not the others)MATCH
,GLOB
,REGEXP
,LIKE
, andIN
may be prefixed withNOT
which reverses their value
NULL IS NULL -- this is true
(NULL == NULL) IS NULL -- this is also true because NULL == NULL is not 1, it's NULL.
(NULL != NULL) IS NULL -- this is also true because NULL != NULL is not 0, it's also NULL.
'xy' NOT LIKE 'z%'` -- this is true
Ordering comparisons <, >, <=, >=β
These operators do the usual order comparison of their two operands.
- If either operand is
NULL
the result isNULL
- Objects and Blobs may not be compared with these operands
- Strings are compared based on their value (as with other comparisons) not their address
- Numerics are compared as usual with the usual promotion rules
NOTE: CQL uses strcmp
for string comparison. In SQL expressions the comparison happens in whatever way SQLite has been configured. Typically general purpose string comparison should be done with helper functions that deal with collation and other considerations. This is a very complex topic and CQL is largely silent on it.
Bitwise operators <<, >>, &, |β
These are the bit-manipulation operations. Their binding strength is VERY different than C so beware.
And notably the &
operator has the same binding strength as the |
operator so they bind left to right,
which is utterly unlike most systems. Many parentheses are likely to be needed to get the usual "or of ands" patterns codified correctly. Likewise, the shift operators <<
and >>
are the same strength as &
and |
which is very atypical. Consider:
x & 1 << 7; -- probably doesn't mean what you think (this is not ambiguous, it's well defined, but unlike C)
(x & 1) << 7; -- means the same as the above
x & (1 << 7) -- probably what you intended
Note that these operators only work on integer and long integer data. If any operand is NULL
the result is `NULL.
Addition and Subtraction +, -β
These operators do the typical math. Note that there are no unsigned numerics so it's always signed math that is happening here.
- operands are promoted to the "biggest" type involved as previously described (bool -> int -> long -> real)
- only numeric operands are legal (no adding strings)
- if any operand is
NULL
the result isNULL
Multiplication, Division, Modulus *, /, %β
These operators do the typical math. Note that there are no unsigned numerics so it's always signed math that is happening here.
- operands are promoted to the "biggest" type as previously described (bool -> int -> long -> real)
- only numeric operands are legal (no multiplying strings)
- if any operand is
NULL
the result isNULL
EXCEPTION: the %
operator doesn't make sense on real values, so real values produce an error.
Unary operators -, ~β
Unary negation (-
) and bitwise invert (~
) are the strongest binding operators.
- The
~
operator only works on integer types (not text, not real) - the usual promotion rules otherwise apply
- if the operand is
NULL
the result isNULL
CASE Expressionsβ
The case
expression has two major forms and provides a great deal of flexibility in an expression. You can kind of think of it as the C ?:
operator on steroids.
set x := 'y';
select case x
when 'y' then 1
when 'z' then 2
else 3
end;
In this form the case
expression (x
here) is evaluated exactly once and then compared against each when
clause. Every when
clause must be type compatible with the case
expression. The then
expression that corresponds to the matching when
is evaluated and becomes the result. If no when
matches then the else
expression is used. If there is no else
and no matching when
then the result is null
.
If that's not general enough, there is an alternate form:
set y := 'yy';
set z := 'z';
select case
when y = 'y' then 1
when z = 'z' then 2
else 3
end;
The second form, where there is no value before the first when
keyword, each when
expression is a separate independent boolean expression. The first one that evaluates to true causes the corresponding then
to be evaluated and that becomes the result. As before, if there is no matching when
clause then the result is the else
expression if present, or null
if there is no else
.
The result types must be compatible and the best type to hold the answer is selected with the usual promotion rules.
SELECT expressionsβ
Single values can be extracted from SQLite using an inline select
expression. For instance:
set x_ := (select x from somewhere where id = 1);
The select statement in question must extract exactly one column and the type of the expression becomes the type of the column. This form can appear anywhere an expression can appear, though it is most commonly used in assignments. Something like this would also be valid:
if (select x from somewhere where id = 1) == 3 then
...
end if;
The select statement can of course be arbitrarily complex.
Note, if the select statement returns no rows this will result in the normal error flow. In that case, the error code will be SQLITE_DONE, which is treated like an error because in this context SQLITE_ROW is expected as a result of the select. This is not a typical error code and can be quite surprising to callers. If you're seeing this failure mode it usually means the code had no affordance for the case where there were no rows and probably that situation should have been handled. This is an easy mistake to make, so to avoid it, CQL also supports these more tolerant forms:
set x_ := (select x from somewhere where id = 1 if nothing -1);
And even more generally if the schema allows for null values and those are not desired:
set x_ := (select x from somewhere where id = 1 if nothing or null -1);
Both of these are much safer to use as only genuine errors (e.g. the table was dropped and no longer exists) will result in the error control flow.
Again note that:
set x_ := (select ifnull(x,-1) from somewhere where id = 1);
Would not avoid the SQLITE_DONE error code, because no rows returned is not at all the same as a null value returned.
The if nothing or null
form above is equivalent to the following, but it is more economical, and probably clearer:
set x_ := (select ifnull(x,-1) from somewhere where id = 1 if nothing -1);
To compute the type of the overall expression, the rules are almost the same as normal binary operators. In particular:
- if the default expression is present it must be type compatible with the select result
- the result type is the smallest type that holds both the select value and the default expression (see normal promotion rules above)
- object types are not allowed (SQLite cannot return an object)
- in
(select ...)
the result type is not null if and only if the select result type is not null (see select statement, many cases) - in
(select ... if nothing)
the result type is not null if and only if both the select result and the default expression types are not null (normal binary rules) - in
(select ... if nothing or null)
the result type is not null if and only if the default expression type is not null
Finally, the form (select ... if nothing throw)
is allowed; this form is exactly the same as normal
(select ...)
but makes the explicit that the error control flow will happen if there is no row. Consequently
this form is allowed even if @enforce_strict select if nothing
is in force.
Marking Data as Sensitiveβ
CQL supports the notion of 'sensitive' data in a first class way. You can think of it as very much like nullability; It largely begins by tagging data columns with @sensitive
Rather than go through the whole calculus, it's easier to understand by a series of examples. So let's start with a table with some sensitive data.
create table with_sensitive(
id integer,
name text @sensitive,
sens integer @sensitive
);
The most obvious thing you might do at this point is create a stored proc that would read data out of that table. Maybe something like this:
create proc get_sensitive()
begin
select id as not_sensitive_1,
sens + 1 sensitive_1,
name as sensitive_2,
'x' as not_sensitive_2,
-sens as sensitive_3,
sens between 1 and 3 as sensitive_4
from with_sensitive;
end;
So looking at that procedure we can see that it's reading sensitive data, so the result will have some sensitive columns in it.
- the "id" is not sensitive (at least not in this example)
- sens + 1 is sensitive, math on a sensitive field leaves it sensitive
- name is sensitive, it began that way and is unchanged
- 'x' is just a string literal, it's not sensitive
- -sens is sensitive, that's more math
- and the between expression is also sensitive
Generally sensitivity is "radioactive" - anything it touches becomes sensitive. This is very important because even a simple looking expression like sens IS NOT NULL
must lead to a sensitive result or the whole process would be largely useless. It has to be basically impossible to wash away sensitivity.
These rules apply to normal expressions as well as expressions in the context of SQL. Accordingly:
Sensitive variables can be declared:
declare sens integer @sensitive;
Simple operations on the variables are sensitive
-- this is sensitive (and the same would be true for any other math)
sens + 1;
The IN
expression gives you sensitive results if anything about it is sensitive
-- all of these are sensitive
sens in (1, 2);
1 in (1, sens);
(select id in (select sens from with_sensitive));
Similarly sensitive constructs in CASE
expressions result in a sensitive output
-- not sensitive
case 0 when 1 then 2 else 3 end;
-- all of these are sensitive
case sens when 1 then 2 else 3 end;
case 0 when sens then 2 else 3 end;
case 0 when 1 then sens else 3 end;
case 0 when 1 then 2 else sens end;
Cast operations preserve sensitivity
-- sensitive result
select cast(sens as INT);
Aggregate functions likewise preserve sensitivity
-- all of these are sensitive
select AVG(T1.sens) from with_sensitive T1;
select MIN(T1.sens) from with_sensitive T1;
select MAX(T1.sens) from with_sensitive T1;
select SUM(T1.sens) from with_sensitive T1;
select COUNT(T1.sens) from with_sensitive T1;
There are many operators that get similar treatment such as COALESCE
, IFNULL
, IS
and IS NOT
.
Things get more interesting when we come to the EXISTS
operator:
-- sensitive if and only if any selected column is sensitive
exists(select * from with_sensitive)
-- sensitive because "info" is sensitive
exists(select info from with_sensitive)
-- not sensitive because "id" is not sensitive
exists(select id from with_sensitive)
If this is making you nervous, it probably should, we need a little more protection because of the way EXISTS is typically used. The predicates matter, consider the following:
-- id is now sensitive because the predicate of the where clause was sensitive
select id from with_sensitive where sens = 1;
-- this expression is now sensitive because id is sensitive in this context
exists(select id from with_sensitive where sens = 1)
In general: if the predicate of a WHERE
or HAVING
clause is sensitive then all columns in the result become sensitive.
Similarly when performing joins, if the column specified in the USING
clause is sensitive or the predicate of the ON
clause is sensitive then the result of the join is considered to be all sensitive columns (even if the columns were not sensitive in the schema).
Likewise a sensitive expression in LIMIT
or OFFSET
will result in 100% sensitive columns as these can be used in a WHERE
-ish way. There is no reasonable defense against using LIMIT
and testing for the presence or absence of a row as a way to wash away sensitivity so that is a weakness, but the rules that are present are likely to be very helpful.
-- join with ON
select T1.id from with_sensitive T1 inner join with_sensitive T2 on T1.sens = T2.sens
-- join with USING
select T1.id from with_sensitive T1 inner join with_sensitive T2 using(sens);
All of these expression and join propagations are designed to make it impossible to simply wash-away sensitivity with a little bit of math.
Now we come to enforcement, which boils down to what assignments or "assignment-like" operations we allow.
If we have these:
declare sens integer @sensitive;
declare not_sens integer;
We can use those as stand-ins for lots of expressions, but the essential calculus goes like this:
-- assigning a sensitive to a sensitive is ok
set sens := sens + 1;
-- assigning not sensitive data to a sensitive is ok
-- this is needed so you can (e.g.) initialize to zero
set sens := not_sens;
-- not ok
set not_sens := sens;
Now these "assignments" can happen in a variety of ways:
- you can set an out parameter of your procedure
- when calling a function or procedure, we require:
- any IN parameters of the target be "assignable" from the value of the argument expression
- any OUT parameters of the target be "assignable" from the procedures type to the argument variable
- any IN/OUT parameters require both the above
Now it's possible to write a procedure that accepts sensitive things and returns non-sensitive things. This is fundamentally necessary because the proc must be able return (e.g.) a success code, or encrypted data, that is not sensitive. However, if you write the procedure in CQL it, too, will have to follow the assignment rules and so cheating will be quite hard. The idea here is to make it easy to handle sensitive data well and make typical mistakes trigger errors.
With these rules it's possible to compute the the type of procedure result sets and also to enforce IN/OUT parameters. Since the signature of procedures is conveniently generated with --generate_exports good practices are fairly easy to follow and sensitivity checks flow well into your programs.
This is a brief summary of CQL semantics for reference types -- those types that are ref counted by the runtime.
The three reference types are:
- TEXT
- OBJECT
- BLOB
Each of these has their own macro for retain
and release
though all three actually turn into the exact same code in all the current CQL runtime implementations. In all cases the object is expected to be promptly freed when the reference count falls to zero.
Reference Semanticsβ
Stored Procedure Argumentsβ
in
andinout
arguments are not retained on entry to a stored procout
arguments are assumed to contain garbage and are nulled without retaining on entry- if your
out
argument doesn't have garbage in it, then it is up to you dorelease
it before you make a call - When calling a proc with an
out
argument CQL willrelease
the argument variable before the call site, obeying its own contract
Local Variablesβ
- assigning to a local variable
retains
the object, and then does arelease
on the previous object - this order is important; all assignments are done in this way in case of aliasing (
release
first might accidentally free too soon) - CQL calls
release
on all local variable when the method exits
Assigning to an out
parameter or a global variableβ
out
,inout
parameters, and global variables work just like local variables except that CQL does not callrelease
at the end of the procedure
Function Return Valuesβ
Stored procedures do not return values, they only have out
arguments and those are well defined as above. Functions however are also supported and they can have either get
or create
semantics
Get Semanticsβ
If you declare a function like so:
declare function Getter() object;
Then CQL assumes that the returned object should follow the normal rules above, retain/release will balance by the end of the procedure for locals and globals or out
arguments could retain the object
Create Semanticsβ
If you declare a function like so:
declare function Getter() create text;
then CQL assumes that the function created a new result which it is now responsible for releasing. In short, the returned object is assumed to arrive with a retain count of 1 already on it. When CQL stores this return value it will:
- release the object that was present at the storage location (if any)
- copy the returned pointer without further retaining it this one time
As a result if you store the returned value in a local variable it will be released when the procedure exits (as usual) or if you instead store the result in a global or an out parameter the result will survive to be used later.
Comparisonβ
CQL tries to adhere to normal SQL comparison rules but with a C twist.
OBJECT
β
The object type has no value based comparison, so there is no <
, >
and so forth.
The following table is useful. Let's suppose there are exactly two distinct objects 'X' and 'Y'
true expressions: X = X
X <> Y
Y = Y
Y <> X
X IN (X, Y)
X NOT IN (Y)
false expressions: X = Y
X <> X
Y = X
Y <> Y
X NOT IN (X, Y)
null expressions: null = null
X <> null
x = null
null <> null
Y <> null
y = null
null = null
resulting in null
is particular surprising but consistent with the usual SQL rules. And again, as in SQL, the IS
operator returns true for X IS Y
even if both are null
.
TEXT
β
Text has value comparison semantics but normal string comparison is done only with strcmp
which is of limited value. Typically you'll want to either delegate the comparison to Sqlite (with (select x < y)
) or else use a helper function with a suitable comparison mechanism.
For text comparisons including equality:
true: if and only if both operands are not null and the comparison matches (using strcmp) false: if and only if both operands are not null and the comparison does not match (using strcmp) null: if and only if at least one operand is null
EXAMPLE: 'x' < 'y'
is true because strcmp("x", "y") < 0
The IS
and IS NOT
operators behave similarly to equality and inequality, but never return null
. If X
is some value that doesn't happen to be null
then we have the following:
true: null is null
X is X
X is not null
null is not X
false: null is not null
X is not X
X is null
null is X
The IN
and NOT IN
operators also work for text using the same value comparisons as above. Additionally there are special text comparison
operators such as LIKE
, MATCH
and GLOB
. These comparisons are defined by SQLite.
BLOB
β
Blobs are compared by value (equivalent to memcmp
) but have no well-defined ordering. The memcmp
order is deemed not helpful as blobs
usually have internal structure hence the valid comparisons are only equality and inequality.
You can use user defined functions to do better comparisons of your particular blobs if needed.
The net comparison behavior is otherwise just like strings.
Sample Codeβ
Out Argument Semanticsβ
DECLARE FUNCTION foo() OBJECT;
CREATE PROC foo_user (OUT baz OBJECT)
BEGIN
SET baz := foo();
END;
void foo_user(cql_object_ref _Nullable *_Nonnull baz) {
*(void **)baz = NULL; // set out arg to non-garbage
cql_set_object_ref(baz, foo());
}
Function with Create Semanticsβ
DECLARE FUNCTION foo() CREATE OBJECT;
CREATE PROCEDURE foo_user (INOUT baz OBJECT)
BEGIN
DECLARE x OBJECT;
SET x := foo();
SET baz := foo();
END;
void foo_user(cql_object_ref _Nullable *_Nonnull baz) {
cql_object_ref x = NULL;
cql_object_release(x);
x = foo();
cql_object_release(*baz);
*baz = foo();
cql_cleanup:
cql_object_release(x);
}
Function with Get Semanticsβ
DECLARE FUNCTION foo() OBJECT;
CREATE PROCEDURE foo_user (INOUT baz OBJECT)
BEGIN
DECLARE x OBJECT;
SET x := foo();
SET baz := foo();
END;
void foo_user(cql_object_ref _Nullable *_Nonnull baz) {
cql_object_ref x = NULL;
cql_set_object_ref(&x, foo());
cql_set_object_ref(baz, foo());
cql_cleanup:
cql_object_release(x);
}