Chapter 14: CQL Query Fragments
CQL base fragments, extension fragments, and assembly fragments are now deprecated and will be removed. Please use shared fragments instead.
CQL Query fragments are the most sophisticated rewrite CQL offers for productivity. The idea is that a very large query can be represented in "fragments" that add columns or add rows based on the original "core" query. The final query will be an assembled rewrite of all the fragments chained together. Specifically, the motivation for this is that you can have a "core" query that fetches the essential columns for some UI design and then you can add query extension fragments that add new/additional columns for some new set of features. The core and extended columns can be in their own fragment and they can be compiled independently. The result of this is that any errors are in much smaller and easier to understand fragments rather than in some monster "fetch everything" query; any given extension does not have to know all the details of all the other extensions and can take a limited dependency on even the core query.
It's easiest to illustrate this with an example so let's begin there.
Let's first start with this very simple schema:
create table my_table(
id integer primary key,
name text not null,
rate real not null
);
create table added_rows(
like my_table -- sugar to duplicate the columns of my_table
);
create table added_columns(
id integer references my_table(id),
data text
);
Typically there would be a lot more columns but where you see flag1
and flag2
appear in fragments you can imagine any number
of additional columns of any type. So we can keep the examples simple.
Base Query Fragmentsβ
The base fragment might look something like this:
@attribute(cql:base_fragment=base_frag)
create proc base_frag_template(id_ integer not null)
begin
with
base_frag(*) as (select * from my_table where my_table.id = id_)
select * from base_frag;
end;
Here are the essential aspects:
- the base fragment is given a name, it can be anything, probably something that describes the purpose of the fragments
- the procedure name can be anything at all
- the procedure must consist of exactly one
with...select
statement - the fragment name must be the one and only CTE in the select statement
- you must select all the columns from the CTE
Note the syntax helper base_frag(*)
is just shorthand to avoid retyping all the column names of my_table
.
The interesting part is (select * from my_table where my_table.id = id_)
which could have been any select statement
of your choice. Everything else in the procedure must follow the designated format, and the format is enforced due to
the presence of @attribute(cql:base_fragment=base_frag)
.
The point of putting everything on rails like this is that all base fragments will look the same and it will be clear how to transform any base fragment into the final query when it is assembled with its extensions.
Note: the base fragment produces no codegen at all. There is no base_frag_template
procedure in the output. This is just a template. Also, the name of the procedure cannot be base_frag
this name will be used by the assembly fragment later. Really any descriptive unique name will do since the name does not appear in the output at all.
Extension Query Fragmentsβ
Adding Columnsβ
The most common thing that an extension might want to do is add columns to the result. There can be any number of such extensions in the final assembly. Here's a simple example that adds one column.
@attribute(cql:extension_fragment=base_frag)
create proc adds_columns(id_ integer not null)
begin
with
base_frag(*) as (select 1 id, "name" name, 1.0 rate),
col_adder_frag(*) as (
select base_frag.*, added_columns.data
from base_frag
left outer join added_columns on base_frag.id = added_columns.id)
select * from col_adder_frag;
end;
Again there are some important features to this extension and they are largely completely constrained, i.e. you must follow the pattern.
- the attribute indicates
extension_fragment
and the name (herebase_frag
) must have been previously declared in abase_fragment
- the procedure name can be any unique name other than
base_frag
- it corresponds to this particular extension's purpose - the procedure arguments must be identical to those in the base fragment
- the first CTE must match the
base_fragment
attribute value,base_frag
in this case - you do not need to repeat the full select statement for
base_frag
, any surrogate with the same column names and types will do- the base fragment code might include a #define to make this easier
- e.g.
#define base_frags_core as base_frag(*) as (select 1 id, "name" name, 1.0 rate)
- e.g.
- doing so will make maintenance easier if new columns are added to the base fragment
- the base fragment code might include a #define to make this easier
- there must be exactly one additional CTE
- it may have any unique descriptive name you like
- it must begin with
select base_frags.*
with the appropriate CTE name matching the base fragment CTE - it must add at least one column (or it would be uninteresting)
- it may not have any clause other than the first
from
(e.g. nowhere
,having
,limit
etc.)- if any of these were allowed they would remove or re-order rows in the base query which is not allowed
- the
from
clause often includes nested selects which have no restrictions
- it must select from the base fragment name and left outer join to wherever it likes to get optional additional columns
- because of this the additional column(s) will certainly be a nullable type in the projection
- the final select must be of the form
select * from col_adder_frag
with the appropriate name - keeping all this in mind, the interesting bit happens here:
left outer join added_columns on base_frag.id = added_columns.id
- this is where you get the data for your additional column using values in the core columns
This fragment can be (and should be) compiled in its own compiland while using #include
to get the base fragment only. This will result in code gen for the accessor functions for a piece of the overall query -- the part this extension knows about. Importantly code that uses this extension's data does not need or want to know about any other extensions that may be present, thereby keeping
dependencies under control.
The C signatures generated would look like this:
extern cql_int32 adds_columns_get_id(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_string_ref _Nonnull adds_columns_get_name(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_double adds_columns_get_rate(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_string_ref _Nullable adds_columns_get_data(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_int32 adds_columns_result_count(
base_frag_result_set_ref _Nonnull result_set);
Even if there were dozens of other extensions, the functions for reading those columns would not be declared in the header for this extension. Any given extension "sees" only the core columns plus any columns it added.
Adding Rowsβ
Query extensions also frequently want to add additional rows to the main result set, based on the data that is already present.
The second form of extension allows for this; it is similarly locked in form. Here is an example:
@attribute(cql:extension_fragment=base_frag)
create proc adds_rows(id_ integer not null)
begin
with
base_frag(*) as (select 1 id, "name" name, 1.0 rate),
row_adder_frag(*) as (
select * from base_frag
union all
select * from added_rows)
select * from row_adder_frag;
end;
Let's review the features of this second template form:
- there is a surrogate for the core query
- there is a mandatory second CTE
- the second CTE is a compound query with any number of branches, all
union all
- the first branch must be
select * from base_frag
(the base fragment) to ensure that the original rows remain- this is also why all the branches must be
union all
- this is also why all the branches must be
- this form cannot add new columns
- the extension CTE may not include
order by
orlimit
because that might reorder or remove rows of the base - any extensions of this form must come before those of the
left outer join
form for a given base fragment- which ironically means
row_adder_frag
has to come beforecol_adder_frag
- which ironically means
- the usual restrictions are in place on compound selects (same type and number of columns) to ensure a consistent result
- the final select after the CTE section must be exactly in the form
select * from row_adder_frag
which is the name of the one and only additional CTE with no other clauses or options- in practice only the CTE will be used to create the final assembly, so even if you did change the final select to something else it would be moot
The signatures generated for this will look something like so:
extern cql_int32 adds_rows_get_id(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_string_ref _Nonnull adds_rows_get_name(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_double adds_rows_get_rate(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_int32 adds_rows_result_count(
base_frag_result_set_ref _Nonnull result_set);
which gives you access to the core columns. Again this fragment can and should be compiled standalone with only the declaration for the base fragment in the same translation unit to get the cleanest possible output. This is so that consumers of this extension do not "see" other extensions which may or may not be related and may or may not always be present.
Assembling the Fragmentsβ
With all the fragments independently declared they need to be unified to create one final query. This is where the
major rewriting happens. The assembly_fragment
looks something like this:
@attribute(cql:assembly_fragment=base_frag)
create proc base_frag(id_ integer not null)
begin
with
base_frag(*) as (select 1 id, "name" name, 1.0 rate)
select * from base_frag;
end;
It will always be as simple as this; all the complexity is in the fragments.
- the
assembly_fragment
name must match the core fragment name - the procedure arguments must be identical to the base fragment arguments
- the procedure must have the same name as the assembly fragment (
base_frag
in this case)- the code that was generated for the previous fragments anticipates this and makes reference to what will be generated here
- this is enforced
- the assembled query is what you run to get the result set, this has real code behind it
- the other fragments only produce result set readers that call into the helper methods to get columns
- there is a surrogate for the core fragment as usual
- all of CTE section will ultimately be replaced with the fragments chained together
- the final select should be of the form
select * from your_frags
but it can include ordering and/or filtering, this statement will be present in final codegen, the final order is usually defined here
When compiling the assembly fragment, you should include the base, and all the other fragments, and the assembly template. The presence of the assembly_fragment will cause codegen for the extension fragments to be suppressed. The assembly translation unit only contains the assembly query as formed from the fragments.
Now let's look at how the query is rewritten, the process is pretty methodical.
After rewriting the assembly looks like this:
CREATE PROC base_frag (id_ INTEGER NOT NULL)
BEGIN
WITH
base_frag (id, name, rate) AS (SELECT *
FROM my_table
WHERE my_table.id = id_),
row_adder_frag (id, name, rate) AS (SELECT *
FROM base_frag
UNION ALL
SELECT *
FROM added_rows),
col_adder_frag (id, name, rate, data) AS (SELECT row_adder_frag.*, added_columns.data
FROM row_adder_frag
LEFT OUTER JOIN added_columns ON row_adder_frag.id = added_columns.id)
SELECT *
FROM col_adder_frag;
END;
Let's dissect this part by part. Each CTE serves a purpose:
- the core CTE was replaced by the CTE in the base_fragment, and it appears directly
- next, the first extension was added as a CTE referring to the base fragment just as before
- recall that the first extension has to be
row_adder_frag
, as that type must come first - looking at the chain you can see why it would be hard to write a correct fragment if it came after columns were added
- recall that the first extension has to be
- next the second extension was added as a CTE
- all references to the base fragment were replaced with references to row_adder_frag
- the extra column names in the CTE were added such that all previous column names are introduced
- this process continues until all extensions are exhausted
- the final select statement reads all the columns from the last extension CTE and includes and ordering and so forth that was present in the assembly query
The result of all this is a single query that gets all the various columns that were requested in all the extensions
and all the union all
operations play out as written. The extensions are emitted in the order that they appear
in the translation unit with the assembly, which again must have the row adding extensions first.
This facility provides considerable ability to compose a large query, but each fragment can be independently checked for errors so that nobody ever has to debug the (possibly monstrous) overall result. Fragments can be removed simply by excluding them from the final assembly (with e.g. #ifdefs, or build rules)
With the rewrite of the assembly_fragment complete, the codegen for that procedure is the normal codegen for a procedure with a single select.
As always, Java and Objective C codegen on these pieces will produce suitable wrappers for the C.
The output code for the assembly fragment generates these reading functions:
extern cql_int32 base_frag_get_id(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_string_ref _Nonnull base_frag_get_name(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_double base_frag_get_rate(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
// used by adds_columns_get_data() to read its data
extern cql_string_ref _Nullable __PRIVATE__base_frag_get_data(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_int32 base_frag_result_count(
base_frag_result_set_ref _Nonnull result_set);
These are exactly what you would get for a normal query except that the pieces that came from extensions are marked PRIVATE
. Those methods should not be used directly but instead the methods generated for each extension proc should be used.
Additionally, to create the result set, as usual.
extern CQL_WARN_UNUSED cql_code base_frag_fetch_results(
sqlite3 *_Nonnull _db_,
base_frag_result_set_ref _Nullable *_Nonnull result_set,
cql_int32 id_);
With the combined set of methods you can create a variety of assembled queries from extensions in a fairly straightforward way.
Shared Fragmentsβ
Shared fragments do not have the various restrictions that the "extension" style fragments have. While extensions were created to allow a single query to be composed by authors that did not necessarily work with each other, and therefore they are full of restrictions on the shape, shared queries instead are designed to give you maximum flexibility in how the fragments are re-used. You can think of them as being somewhat like a parameterized view, but the parameters are both value parameters and type parameters. In Java or C#, a shared fragments might have had an invocation that looked something like this: `my_fragment(1,2)<table1, table2>. As with the other fragment types the common table expression (CTE) is the way that they plug in.
It's helpful to consider a real example:
split_text(tok) AS (
WITH RECURSIVE
splitter(tok,rest) AS (
SELECT
'' tok,
IFNULL( some_variable_ || ',', '') rest
UNION ALL
SELECT
substr(rest, 1, instr(rest, ',') - 1) tok,
substr(rest, instr(rest, ',') + 1) rest
FROM splitter
WHERE rest != ''
)
SELECT tok from splitter where tok != ''
)
This text might appear in dozens of places where a comma separated list needs to be split into pieces and there is no good way to share the code between these locations. CQL is frequently used in conjunction with the C-pre-processor so you could come up with something using the #define construct but this is problematic for several reasons:
- the compiler does not then know that the origin of the text really is the same
- thus it has no clue that sharing the text of the string might be a good idea
- any error messages happen in the context of the use of the macro not the definition
- bonus: a multi-line macro like the above gets folded into one line so any error messages are impenetrable
- if you try to compose such macros it only gets worse; it's more code duplication and harder error cases
- any IDE support for syntax coloring and so forth will be confused by the macro as it's not part of the language
None of this is any good but the desire to create helpers like this is real both for correctness and for performance.
To make these things possible, we introduce the notion of shared fragments. We need to give them parameters and the natural way to create a select statement that is bindable in CQL is the procedure. So the shape we choose looks like this:
@attribute(cql:shared_fragment)
CREATE PROC split_text(value TEXT)
BEGIN
WITH RECURSIVE
splitter(tok,rest) AS (
SELECT
'' tok,
IFNULL( value || ',', '') rest
UNION ALL
SELECT
substr(rest, 1, instr(rest, ',') - 1) tok,
substr(rest, instr(rest, ',') + 1) rest
FROM tokens
WHERE rest != ''
)
SELECT tok from splitter where tok != ''
END;
The introductory attribute @attribute(cql:shared_fragment)
indicates that the procedure is to produce
no code, but rather will be inlined as a CTE in other locations. To use it, we introduce the ability
to call a procedure as part of a CTE declaration. Like so:
WITH
result(v) as (call split_text('x,y,z'))
select * from result;
Once the fragment has been defined, the statement above could appear anywhere, and of course the
text 'x,y,z'
need not be constant. For instance:
CREATE PROC print_parts(value TEXT)
BEGIN
DECLARE C CURSOR FOR
WITH
result(v) as (CALL split_text('x,y,z'))
SELECT * from result;
LOOP FETCH C
BEGIN
CALL printf("%s\n", C.v);
END;
END;
Fragments are also composable, so for instance, we might also want some shared code that extracts comma separated numbers. We could do this:
@attribute(cql:shared_fragment)
CREATE PROC ids_from_string(value TEXT)
BEGIN
WITH
result(v) as (CALL split_text(value))
SELECT CAST(v as LONG) as id from result;
END;
Now we could write:
CREATE PROC print_ids(value TEXT)
BEGIN
DECLARE C CURSOR FOR
WITH
result(id) as (CALL ids_from_string('1,2,3'))
SELECT * from result;
LOOP FETCH C
BEGIN
CALL printf("%ld\n", C.id);
END;
END;
Of course these are very simple examples but in principle you can use the generated tables in whatever way is necessary. For instance, here's a silly but illustrative example:
/* This is a bit silly */
CREATE PROC print_common_ids(value TEXT)
BEGIN
DECLARE C CURSOR FOR
WITH
v1(id) as (CALL ids_from_string('1,2,3')),
v2(id) as (CALL ids_from_string('2,4,6'))
SELECT * from v1
INTERSECT
SELECT * from v2;
LOOP FETCH C
BEGIN
CALL printf("%ld\n", C.id);
END;
END;
With a small amount of dynamism in the generation of the SQL for the above, it's possible to share the body of v1 and v2. SQL will of course see the full expansion but your program only needs one copy no matter how many times you use the fragment anywhere in the code.
So far we have illustrated the "parameter" part of the flexibility. Now let's look at the "generics" part;
even though it's overkill for this example, it should still be illustrative. You could imagine that
the procedure we wrote above ids_from_string
might do something more complicated, maybe filtering out
negative ids, ids that are too big, or that don't match some pattern, whatever the case might be. You
might want these features in a variety of contexts, maybe not just starting from a string to split.
We can rewrite the fragment in a "generic" way like so:
@attribute(cql:shared_fragment)
CREATE PROC ids_from_string_table()
BEGIN
WITH
source(v) LIKE (select "x" v)
SELECT CAST(v as LONG) as id from source;
END;
Note the new construct for a CTE definition: inside a fragment we can use "LIKE" to define a plug-able CTE.
In this case we used a select
statement to describe the shape the fragment requires. We could also
have used a name source(*) LIKE shape_name
just like we use shape names when describing cursors. The
name can be any existing view, table, a procedure with a result, etc. Any name that describes a shape.
Now when the fragment is invoked, you provide the actual data source (some table, view, or CTE) and that parameter takes the role of "values". Here's a full example:
CREATE PROC print_ids(value TEXT)
BEGIN
DECLARE C CURSOR FOR
WITH
my_data(*) as (CALL split_text(value)),
my_numbers(id) as (CALL ids_from_string_table() USING my_data AS source)
SELECT id from my_numbers;
LOOP FETCH C
BEGIN
CALL printf("%ld\n", C.id);
END;
END;
We could actually rewrite the previous simple id fragment as follows:
@attribute(cql:shared_fragment)
CREATE PROC ids_from_string(value TEXT)
BEGIN
WITH
tokens(v) as (CALL split_text(value))
ids(id) as (CALL ids_from_string_table() USING tokens as source)
SELECT * from ids;
END;
And actually we have a convenient name we could use for the shape we need so
we could have used the shape syntax to define ids_from_string_table
.
@attribute(cql:shared_fragment)
CREATE PROC ids_from_string_table()
BEGIN
WITH
source(*) LIKE split_text
SELECT CAST(tok as LONG) as id from source;
END;
These examples have made very little use of the database but of course
normal data is readily available, so shared fragments can make a great
way to provide access to complex data with shareable, correct code.
For instance, you could write a fragment that provides the ids of all
open businesses matching a name from a combination of tables. This is
similar to what you could do with a VIEW
plus a WHERE
clause but:
- such a system can give you well controlled combinations known to work well
- there is no schema required, so your database load time can still be fast
- parameterization is not limited to filtering VIEWs after the fact
- "generic" patterns are available, allowing arbitrary data sources to be filtered, validated, augmented
- each fragment can be tested separately with its own suite rather than only in the context of some larger thing
- code generation can be more economical because the compiler is aware of what is being shared
In short, shared fragments can help with the composition of any complicated kinds of queries. If you're producing an SDK to access a data set, they are indispensible.
Creating and Using Valid Shared Fragmentsβ
When creating a fragment the following rules are enforced:
- the fragment many not have any out arguments
- it must consist of exactly one valid select statement (but see future forms below)
- it may use the LIKE construct in CTE definitions to create placeholder shapes
- this form is illegal outside of shared fragments (otherwise how would you bind it)
- the LIKE form may only appear in top level CTE expressions in the fragment
- the fragment is free to use other fragments, but it may not call itself
- calling itself would result in infinite inlining
Usage of a fragment is always introduced by a "call" to the fragment name in a CTE body. When using a fragment the following rules are enforced.
- the provided parameters must create a valid procedure call just like normal procedure calls
- i.e. the correct number and type of arguments
- the provided parameters may not use nested
(SELECT ...)
expressions- this could easily create fragment building within fragment building which seems not worth the complexity
- if database access is required in the parameters simply wrap it in a helper procedure
- the optional USING clause must specify each required table parameter exactly once and no other tables
- a fragment that requires table parameters be invoked without a USING clause
- every actual table provided must match the column names of the corresponding table parameter
- i.e. in
USING my_data AS values
the actual columns inmy_data
must be the same as in thevalues
parameter - the columns need not be in the same order
- i.e. in
- each column in any actual table must be "assignment compatible" with its corresponding column in the parameters
- i.e. the actual type could be converted to the formal type using the same rules as the := operator
- these are the same rules used for procedure calls, for instance, where the call is kind of like assigning the actual parameter values to the formal parameter variables
- the provided table values must not conflict with top level CTEs in the shared fragment
- exception: the top level CTEs that were parameters do not create conflicts
- e.g. it's common to do
values(*) as (CALL something() using source as source)
- here the caller's "source" takes the value of the fragment's "source", which is not a true conflict - however, the caller's source might itself have been a parameter in which case the value provided could create an inner conflict
- all these problems are easily avoided with a simple naming convention for parameters so that real arguments never look like parameter names and parameter forwarding is apparent
- e.g.
USING _source AS _source
makes it clear that a parameter is being forwarded and_source
is not likely to conflict with real table or view names
Note that when shared fragments are used, the generated SQL has the text split into parts, with each fragment and its surroundings separated, therefore the text of shared fragments is shared(!) between usages if normal linker optimizations for text folding are enabled (common in production code.)
Shared Fragments with Conditionalsβ
Shared fragments use dynamic assembly of the text to do the sharing but it is also possible to create alternative texts. There are many instances where it is desirable to not just replace parameters but use, for instance, an entirely different join sequence. Without shared fragments, the only way to accomplish this is to fork the desired query at the topmost level (because SQLite has no internal possibility of "IF" conditions.) This is expensive in terms of code size and also cognitive load because the entire alternative sequences have to be kept carefully in sync. Macros can help with this but then you get the usual macro maintenance problems, including poor diagnostics. And of course there is no possibility to share the common parts of the text of the code if it is forked.
However, conditional shared fragments allow forms like this:
@attribute(cql:shared_fragment)
CREATE PROC ids_from_string(val TEXT)
BEGIN
IF val IS NULL OR val IS '' THEN
SELECT 0 id WHERE 0; -- empty result
ELSE
WITH
tokens(v) as (CALL split_text(val))
ids(id) as (CALL ids_from_string_table() USING tokens as source)
SELECT * from ids;
END IF;
END;
Now we can do something like:
ids(*) AS (CALL ids_from_string(str))
In this case, if the string val
is empty then SQLite will not see the complex comma splitting code, and instead will see
the trivial case select 0 id where 0
. The code in a conditional fragment might be entirely different between the branches
removing unnecessary code, or swapping in a new experimental cache in your test environment, or anything like that.
The generalization is simply this:
- instead of just one select statement there is one top level "IF" statement
- each statement list of the IF must be exactly one select statement
- there must be an ELSE clause
- the select statements must be type compatible, just like in a normal procedure
- any table parameters with the same name in different branches must have the same type
- otherwise it would be impossible to provide a single actual table for those table parameters
With this additional flexibility a wide variety of SQL statements can be constructed economically and maintainability. Importantly, consumers of the fragments need not deal with all these various alternate possibilities but they can readily create their own useful combinations out of building blocks.
Ultimately, from SQLite's perspective, all of these shared fragment forms result in nothing more complicated than a chain of CTE expressions.
See Appendix 8 for an extensive section on best practices around fragments and common table expressions in general.
If you use CQL's query planner on shared fragments with conditionals, the query planner will only analyze the first branch by default. You need to use @attribute(cql:query_plan_branch=[integer])
to modify the behaviour. Read Query Plan Generation for details.
Shared Fragments as Expressionsβ
The shared fragment system also has the ability to create re-usable expression-style fragments giving you something like SQL inline functions. These do come with
some performance cost so they should be used for larger fragments. In many systems a simple shared fragment would not compete well with an equivalent #define
.
Expression fragments shine when:
- the fragment is quite large
- its used frequently (hence providing significant space savings)
- the arguments are complex, potentially used many times in the expression
From a raw performance perspective, the best you can hope for with any of the fragment approaches is a "tie" on speed compared do directly inlining equivalent SQL or using a macro to do the same. However, from a correctness and space perspective it is very possible to come out ahead. It's fair to say that expression fragments have the greatest overhead compared to the other types and so they are best used in cases where the size benefits are going to be important.
Syntaxβ
An expression fragment is basically a normal shared fragment with no top-level FROM
clause that generates a single column. A typical one might look like this:
-- this isn't very exciting because regular max would do the job
@attribute(cql:shared_fragment)
create proc max_func(x integer, y integer)
begin
select case when x >= y then x else y end;
end;
The above can be used in the context of a SQL statement like so:
select max_func(T1.column1, T1.column2) the_max from foo T1;
Discussionβ
The consequence of the above is that the body of max_func
is inlined into the generated SQL. However, like
the other shared fragments, this is done in such a way that the text can be shared between instances so
you only pay for the cost of the text of the SQL in your program one time, no matter how many time you use it.
In particular, for the above, the compiler will generate the following SQL:
select (
select case when x >= y then x else y end
from (select T1.column1 x, column2 y))
But each line will be its own string literal, so, more accurately, it will concatenate the following three strings:
"select (", // string1
" select case when x >= y then x else y end", // string2
" from (select T1.column1 x, column2 y))" // string3
Importantly, string2
is fixed for any given fragment. The only thing that changes is string3
, i.e., the arguments.
In any modern C compilation system, the linker will unify the string2
literal across all translation units so you only
pay for the cost of that text one time. It also means that the text of the arguments appears exactly one time,
no matter how complex they are. For these benefits, we pay the cost of the select wrapper on the arguments. If
the arguments are complex that "cost" is negative. Consider the following:
select max_func((select max(T.m) from T), (select max(U.m) from U))
A direct expansion of the above would result in something like this:
case when (select max(T.m) from T) >= (select max(U.m) from U)
then (select max(T.m) from T)
else (select max(U.m) from U)
end;
The above could be accomplished with a simple #define
style macro. However, the expression fragment
generates the following code:
select (
select case when x >= y then x else y end
from select (select max(T.m) from T) x, (select max(U.m) from U) y))
Expression fragments can nest, so you could write:
@attribute(cql:shared_fragment)
create proc max3_func(x integer, y integer, z integer)
begin
select max_func(x, max_func(y, z));
end;
Again, this particular example is a waste because regular max
would already do the job better.
To give another example, common mappings from one kind of code to another using case/when can be written and shared this way:
-- this sort of thing happens all the time
@attribute(cql:shared_fragment)
create proc remap(x integer not null)
begin
select case x
when 1 then 1001
when 2 then 1057
when 3 then 2010
when 4 then 2011
else 9999
end;
end;
In the following:
select remap(T1.c), remap(T2.d), remap(T3.e) from T1, T2, T3... etc.
The text for remap
will appear three times in the generated SQL query but only one time in your binary.
Restrictionsβ
- the fragment must consist of exactly one simple select statement
- no
FROM
,WHERE
,HAVING
, etc. -- the result is an expression
- no
- the select list must have exactly one value
- Note: the expression can be a nested
SELECT
which could then have all the usualSELECT
elements
- Note: the expression can be a nested
- the usual shared fragment rules apply, e.g. no out-parameters, exactly one statement, etc.
Additional Notesβ
A simpler syntax might have been possible but expression fragments are only interesting in SQL contexts where
(among other things) normal procedure and function calls are not available. So the select
keyword makes it
clear to the coder and the compiler that the expression will be evaluated by SQLite and the rules for what is
allowed to go in the expression are the SQLite rules.
The fragment has no FROM
clause because we're trying to produce an expression, not a table-value with one column.
If you want a table-value with one column, the original shared fragments solution already do exactly that.
Expression fragments give you a solution for sharing code in, say, the WHERE
clause of a larger select statement.
Commpared to something like
#define max_func(x,y) case when (x) >= (y) then x else y end;
The macro does give you a ton of flexibility, but it has many problems:
- if the macro has an error, you see the error in the call site with really bad diagnostic info
- the compiler doesn't know that the sharing is going on so it won't be able to share text between call sites
- the arguments can be evaluated many times each which could be expensive, bloaty, or wrong
- there is no type-checking of arguments to the macro so you may or may not get compilation errors after expansion
- you have to deal with all the usual pre-processor hazards
In general, macros can be used (as in C and C++) as an alternative to expression fragments, especially for small fragments. But, this gets to be a worse idea as such macros grow. For larger cases, C and C++ provide inline functions -- CQL provides expression fragments.