Chapter 14: CQL Query Fragments

CQL Query fragments are the most sophisticated rewrite CQL offers for productivity. The idea is that a very large query can be represented in "fragments" that add columns or add rows based on the original "core" query. The final query will be an assembled rewrite of all the fragments chained together. Specifically, the motivation for this is that you can have a "core" query that fetches the essential columns for some UI design and then you can add query extension fragments that add new/additional columns for some new set of features. The core and extended columns can be in their own fragment and they can be compiled independently. The result of this is that any errors are in much smaller and easier to understand fragments rather than in some monster "fetch everything" query; any given extension does not have to know all the details of all the other extensions and can take a limited dependency on even the core query.

It's easiest to illustrate this with an example so let's begin there.

Let's first start with this very simple schema.

create table my_table(
id integer primary key,
name text not null,
rate real not null
);
create table added_rows(
like my_table -- sugar to duplicate the columns of my_table
);
create table added_columns(
id integer references my_table(id),
data text
);

Typically there would be a lot more columns but where you see flag1 and flag2 appear in fragments you can imagine any number of additional columns of any type. So we can keep the examples simple.

Base Query Fragments

The base fragment might look something like this:

@attribute(cql:base_fragment=base_frag)
create proc base_frag_template(id_ integer not null)
begin
with
base_frag(*) as (select * from my_table where my_table.id = id_)
select * from base_frag;
end;

Here are the essential aspects:

  • the base fragment is given a name, it can be anything, probably something that describes the purpose of the fragments
  • the procedure name can be anything at all
  • the procedure must consist of exactly one with...select statement
  • the fragment name must be the one and only CTE in the select statement
  • you must select all the columns from the CTE

Note the syntax helper base_frag(*) is just shorthand to avoid retyping all the column names of my_table.

The interesting part is (select * from my_table where my_table.id = id_) which could have been any select statement of your choice. Everything else in the procedure must follow the designated format, and the format is enforced due to the presence of @attribute(cql:base_fragment=base_frag).

The point of putting everything on rails like this is that all base fragments will look the same and it will be clear how to transform any base fragment into the final query when it is assembled with its extensions.

Note: the base fragment produces no codegen at all. There is no base_frag_template procedure in the output. This is just a template. Also, the name of the procedure cannot be base_frag this name will be used by the assembly fragment later. Really any descriptive unique name will do since the name does not appear in the output at all.

Extension Query Fragments

Adding Columns

The most common thing that an extension might want to do is add columns to the result. There can be any number of such extensions in the final assembly. Here's a simple example that adds one column.

@attribute(cql:extension_fragment=base_frag)
create proc adds_columns(id_ integer not null)
begin
with
base_frag(*) as (select 1 id, "name" name, 1.0 rate),
col_adder_frag(*) as (
select base_frag.*, added_columns.data
from base_frag
left outer join added_columns on base_frag.id = added_columns.id)
select * from col_adder_frag;
end;

Again there are some important features to this extension and they are largely completely constrained, i.e. you must follow the pattern.

  • the attribute indicates extension_fragment and the name (here base_frag) must have been previously declared in a base_fragment
  • the procedure name can be any unique name other than base_frag, it corresponds to this particular extension's purpose
  • the procedure arguments must be identical to those in the base fragment
  • the first CTE must match the base_fragment attribute value, base_frag in this case
  • you do not need to repeat the full select statement for base_frag, any surrogate with the same column names and types will do
    • the base fragment code might include a #define to make this easier
      • e.g. #define base_frags_core as base_frag(*) as (select 1 id, "name" name, 1.0 rate)
    • doing so will make maintenance easier if new columns are added to the base fragment
  • there must be exactly one additional CTE
    • it may have any unique descriptive name you like
    • it must begin with select base_frags.* with the appropriate CTE name matching the base fragment CTE
    • it must add at least one column (or it would be uninteresting)
    • it may not have any clause other than the first from (e.g. no where, having, limit etc.)
      • if any of these were allowed they would remove or re-order rows in the base query which is not allowed
      • the from clause often includes nested selects which have no restrictions
    • it must select from the base fragment name and left outer join to wherever it likes to get optional additional columns
      • because of this the additional column(s) will certainly be a nullable type in the projection
  • the final select must be of the form select * from col_adder_frag with the appropriate name
  • keeping all this in mind, the interesting bit happens here: left outer join added_columns on base_frag.id = added_columns.id
    • this is where you get the data for your additional column using values in the core columns

This fragment can be (and should be) compiled in its own compiland while using #include to get the base fragment only. This will result in code gen for the accessor functions for a piece of the overall query -- the part this extension knows about. Importantly code that uses this extension's data does not need or want to know about any other extensions that may be present, thereby keeping dependencies under control.

The C signatures generated would look like this:

extern cql_int32 adds_columns_get_id(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_string_ref _Nonnull adds_columns_get_name(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_double adds_columns_get_rate(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_string_ref _Nullable adds_columns_get_data(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_int32 adds_columns_result_count(
base_frag_result_set_ref _Nonnull result_set);

Even if there were dozens of other extensions, the functions for reading those columns would not be declared in the header for this extension. Any given extension "sees" only the core columns plus any columns it added.

Adding Rows

Query extensions also frequently want to add additional rows to the main result set, based on the data that is already present.

The second form of extension allows for this, it is similarly locked in form. Here is an example:

@attribute(cql:extension_fragment=base_frag)
create proc adds_rows(id_ integer not null)
begin
with
base_frag(*) as (select 1 id, "name" name, 1.0 rate),
row_adder_frag(*) as (
select * from base_frag
union all
select * from added_rows)
select * from row_adder_frag;
end;

Let's review the features of this second template form:

  • there is a surrogate for the core query
  • there is a mandatory second CTE
  • the second CTE is a compound query with any number of branches, all union all
  • the first branch must be select * from base_frag (the base fragment) to ensure that the original rows remain
    • this is also why all the branches must be union all
  • this form cannot add new columns
  • the extension CTE may not include order by or limit because that might reorder or remove rows of the base
  • any extensions of this form must come before those of the left outer join form for a given base fragment
    • which ironically means row_adder_frag has to come before col_adder_frag
  • the usual restrictions on compound selects (same type and number of columns) ensure a consistent result
  • the final select after the CTE section must exactly in the form select * from row_adder_frag which is the name of the one and only additional CTE with no other clauses or options
    • in practice only the CTE will be used to create the final assembly so even if you did change the final select to something else it would be moot

The signatures generated for this will look something like so:

extern cql_int32 adds_rows_get_id(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_string_ref _Nonnull adds_rows_get_name(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_double adds_rows_get_rate(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_int32 adds_rows_result_count(
base_frag_result_set_ref _Nonnull result_set);

Which gives you access to the core columns. Again this fragment can and should be compiled standalone with only the declaration for the base fragment in the same translation unit to get the cleanest possible output. This is so that consumers of this extension do not "see" other extensions which may or may not be related and may or may not always be present.

Assembling the Fragments

With all the fragments independently declared they need to be unified to create one final query. This is where the major rewriting happens. The assembly_fragment looks something like this:

@attribute(cql:assembly_fragment=base_frag)
create proc base_frag(id_ integer not null)
begin
with
base_frag(*) as (select 1 id, "name" name, 1.0 rate)
select * from base_frag;
end;

It will always be as simple as this, all the complexity is in the fragments.

  • the assembly_fragment name must match the core fragment name
  • the procedure arguments must be identical to the base fragment arguments
  • the procedure must have the same name as the assembly fragment (base_frag in this case)
    • the code that was generated for the previous fragments anticipates this and makes reference to what will be generated here
    • this is enforced
  • the assembled query is what you run to get the result set, this has real code behind it
    • the other fragments only produce result set readers that call into the helper methods to get columns
  • there is a surrogate for the core fragment as usual
  • all of CTE section will ultimately be replaced with the fragments chained together
  • the final select should be of the form select * from your_frags but it can include ordering and/or filtering, this statement will be present in final codegen, the final order is usually defined here

When compiling the assembly fragment, you should include the base, and all the other fragments, and the assembly template. The presence of the assembly_fragment will cause codegen for the extension fragments to be suppressed. The assembly translation unit only contains the assembly query as formed from the fragments.

Now let's look at how the query is rewritten, the process is pretty methodical.

After rewriting the assembly looks like this:

CREATE PROC base_frag (id_ INTEGER NOT NULL)
BEGIN
WITH
base_frag (id, name, rate) AS (SELECT *
FROM my_table
WHERE my_table.id = id_),
row_adder_frag (id, name, rate) AS (SELECT *
FROM base_frag
UNION ALL
SELECT *
FROM added_rows),
col_adder_frag (id, name, rate, data) AS (SELECT row_adder_frag.*, added_columns.data
FROM row_adder_frag
LEFT OUTER JOIN added_columns ON row_adder_frag.id = added_columns.id)
SELECT *
FROM col_adder_frag;
END;

Let's dissect this part by part, each CTE serves a purpose.

  • the core CTE was replaced by the CTE in the base_fragment, it appears directly
  • next the first extension was added as a CTE referring to the base fragment just as before
    • recall that the first extension has to be row_adder_frag, as that type must come first
    • looking at the chain you can see why it would be hard to write a correct fragment if it came after columns were added
  • next the second extension was added as a CTE
    • all references to the base fragment were replaced with references to row_adder_frag
    • the extra column names in the CTE were added such that all previous column names are introduced
  • this process continues until all extensions are exhausted
  • the final select statement reads all the columns from the last extension CTE and includes and ordering and so forth that was present in the assembly query

The result of all this is a single query that gets all the various columns that were requested in all the extensions and all the union all operations play out as written. The extensions are emitted in the order that they appear in the translation unit with the assembly, which again must have the row adding extensions first.

This facility provides considerable ability to compose a large query, but each fragment can be independently checked for errors so that nobody ever has to debug the (possibly monstrous) overall result. Fragments can be removed simply by excluding them from the final assembly (with e.g. #ifdefs, or build rules)

With the rewrite of the assembly_fragment complete, the codegen for that procedure is the normal codegen for a procedure with a single select.

As always, Java and Objective C codegen on these pieces will produce suitable wrappers for the C.

The output code for the assembly fragment generates these reading functions:

extern cql_int32 base_frag_get_id(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_string_ref _Nonnull base_frag_get_name(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_double base_frag_get_rate(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
// used by adds_columns_get_data() to read its data
extern cql_string_ref _Nullable __PRIVATE__base_frag_get_data(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_int32 base_frag_result_count(
base_frag_result_set_ref _Nonnull result_set);

These are exactly what you would get for a normal query except that the pieces that came from extensions are marked PRIVATE. Those methods should not be used directly but instead the methods generated for each extension proc should be used.

Additionally, to create the result set, as usual.

extern CQL_WARN_UNUSED cql_code base_frag_fetch_results(
sqlite3 *_Nonnull _db_,
base_frag_result_set_ref _Nullable *_Nonnull result_set,
cql_int32 id_);

With the combined set of methods you can create a variety of assembled queries from extensions in a fairly straightforward way.

Shared Fragments

Shared fragments do not have the various restrictions that the "extension" style fragments have. While extensions were created to allow a single query to be composed by authors that did not necessarily work with each other, and therefore they are full of restrictions on the shape, shared queries instead are designed to give you maximum flexibility in how the fragments are re-used. You can think of them as being somewhat like a parameterized view, but the parameters are both value parameters and type parameters. In Java or C#, a shared fragments might have had an invocation that looked something like this: `my_fragment(1,2)<table1, table2>. As with the other fragment types the common table expression (CTE) is the way that they plug in.

It's helpful to consider a real example:

split_text(tok) AS (
WITH RECURSIVE
splitter(tok,rest) AS (
SELECT
'' tok,
IFNULL( some_variable_ || ',', '') rest
UNION ALL
SELECT
substr(rest, 1, instr(rest, ',') - 1) tok,
substr(rest, instr(rest, ',') + 1) rest
FROM splitter
WHERE rest != ''
)
SELECT tok from splitter where tok != ''
)

This text might appear in dozens of places where a comma seperated list needs to be split into pieces and there is no good way to share the code between these locations. CQL is frequently used in conjunction with the C-pre-processor so you could come up with something using the #define construct but this is problematic for several reasons:

  • the compiler does not then known that the origin of the text really is the same
    • thus it has no clue that sharing the text of the string might be a good idea
  • any error messages happen in the context of the use of the macro not the definition
  • bonus: a multi-line macro like the above gets folded into one line so any error messages are impenetrable
  • if you try to compose such macros it only gets, worse, it's more code duplication harder error cases
  • any IDE support for syntax coloring and so forth will be confused by the macro as it's not part of the language

None of this is any good but the desire to create helpers like this is real both for correctness and for performance.

To make these things possible, we introduce the notion of shared fragments. We need to give them parameters and the natural way to create a select statement that is bindable in CQL is the procedure so the shape we choose looks like this:

@attribute(cql:shared_fragment)
CREATE PROC split_text(value TEXT)
BEGIN
WITH RECURSIVE
splitter(tok,rest) AS (
SELECT
'' tok,
IFNULL( value || ',', '') rest
UNION ALL
SELECT
substr(rest, 1, instr(rest, ',') - 1) tok,
substr(rest, instr(rest, ',') + 1) rest
FROM tokens
WHERE rest != ''
)
SELECT tok from splitter where tok != ''
END;

The introductory attribute @attribute(cql:shared_fragment) indicates that the procedure is to produce no code, but rather it will be inlined as a CTE in other locations. To use it, we introduce the ability to call a procedure as part of a CTE declaration. Like so:

WITH
values(v) as (call split_text('x,y,z'))
select * from v;

Once the fragment has been defined, the statement above could appear anywhere, and of course the text 'x,y,z' need not be constant. For instance:

CREATE PROC print_parts(value TEXT)
BEGIN
DECLARE C CURSOR FOR
WITH
values(v) as (CALL split_text('x,y,z'))
SELECT * from v;
LOOP FETCH C
BEGIN
CALL printf("%s\n", C.v);
END;
END;

Fragments are also composable, so for instance, we might also want some shared code that extracts comma separated numbers. We could do this:

@attribute(cql:shared_fragment)
CREATE PROC ids_from_string(value TEXT)
BEGIN
WITH
values(v) as (CALL split_text(value))
SELECT CAST(v as LONG) as id from values;
END;

Now we could write:

CREATE PROC print_ids(value TEXT)
BEGIN
DECLARE C CURSOR FOR
WITH
values(id) as (CALL ids_from_string('1,2,3'))
SELECT * from v;
LOOP FETCH C
BEGIN
CALL printf("%ld\n", C.id);
END;
END;

Of course these are very simple examples but in principle you can use the generated tables in whatever way is necessary. For instance, here's a silly but illustrative example:

/* This is a bit silly */
CREATE PROC print_common_ids(value TEXT)
BEGIN
DECLARE C CURSOR FOR
WITH
v1(id) as (CALL ids_from_string('1,2,3')),
v2(id) as (CALL ids_from_string('2,4,6'))
SELECT * from v1
INTERSECT
SELECT * from v2;
LOOP FETCH C
BEGIN
CALL printf("%ld\n", C.id);
END;
END;

With a small amount of dynamism in the generation of the SQL for the above, it's possible to share the body of v1 and v2. SQL will of course see the fully expanded but your program only needs one copy no matter how many times you use the fragment anywhere in the code.

So far we have illustrated the "parameter" part of the flexibility. Now let's look at the "generics" part, even though it's overkill for this example it should still but illustratative. You could imagine that the procedure we wrote above ids_from_string might do something more complicated, maybe filtering out negative ids, ids that are too big, or that don't match some pattern. Whatever the case might be. You might want these features in a variety of contexts, maybe not just starting from a string to split.

We can rewrite the fragment in a "generic" way like so:

@attribute(cql:shared_fragment)
CREATE PROC ids_from_strings()
BEGIN
WITH
values(v) LIKE (select nullable("x") v)
SELECT CAST(v as LONG) as id from values;
END;

Note the new construct for CTE definition: inside a fragment we can use "LIKE" to define a pluggable CTE. In this case we used a select statement to describe the shape the fragment requires. We could also have used a name values(*) LIKE shape_name just like we use shape names when describing cursors. The name can be any existing view, table, a procedure with a result, etc. Any name that looks like a record.

Now when the fragment is invoked, you provide the actual data source (some table, view, or CTE) and that parameter takes the role of "values". Here's a full example:

CREATE PROC print_ids(value TEXT)
BEGIN
DECLARE C CURSOR FOR
WITH
my_data(*) as (CALL split_text(value)),
my_numbers(id) as (CALL ids_from_strings() USING my_data AS values)
SELECT id from my_numbers;
LOOP FETCH C
BEGIN
CALL printf("%ld\n", C.id);
END;
END;

We could actually rewrite the previous simple id fragment as follows:

@attribute(cql:shared_fragment)
CREATE PROC ids_from_string(value TEXT)
BEGIN
WITH
tokens(v) as (CALL split_text(value))
ids(id) as (CALL ids_from_strings() USING tokens as values)
SELECT * from ids;
END;

And actually we have a convenient name we could use for the shape we need so we could have used the shape syntax to define ids_from_strings (this still has to go before ids_from_string)

@attribute(cql:shared_fragment)
CREATE PROC ids_from_strings()
BEGIN
WITH
values(*) LIKE split_text
SELECT CAST(tok as LONG) as id from values;
END;

These examples have made very little use of the database but of course normal data is readily available so shared fragments can make a great way to provide access to complex data with shareable, correct, code. For instance you could write a fragment that provides the ids of all open businesses matching a name from a combination of tables. This is similar to what you could do with a VIEW plus a WHERE clause but:

  • such a system can give you well controlled combinations known to work well
  • there is no schema required, so your database open time can still be fast
  • parameterization is not limited to filtering VIEWs after the fact
  • "generic" patterns are available, allowing arbitary data sources to be filtered, validated, augmented
  • each fragment can be tested seperately with its own suite rather than only in the context of some larger thing
  • code generation can be more economical because the compiler is aware of what is being shared

In short, shared fragments can help with the composition of any complicated kinds of queries. If you're producing an SDK to access a data set, they are indispensible.

Creating and Using Valid Shared Fragments

When creating a fragment the following rules are enforced:

  • the fragment many not have any out arguments
  • it must consist of exactly one valid select statement (but see future forms below)
  • it may use the LIKE construct in CTE definitions to create placeholder shapes
    • this form is illegal outside of shared fragments (otherwise how would you bind it)
  • the LIKE form may only appear in top level CTE expressions in the fragment
  • the fragment is free to use other fragments, but it may not call itself
    • calling itself would result in infinite inlining

Usage of a fragment is always intruced by a "call" to the fragment name in a CTE body. When using a fragment the following rules are enforced.

  • the provided parameters must create a valid procedure call just like normal procedure calls
    • i.e. the correct number and type of arguments
  • the provided parameters may not use nested (SELECT ...) expressions
    • this could easily create fragment building within fragment building which seems not worth the complexity
    • if database access is required in the parameters simply wrap it in a helper procedure
  • the optional USING clause must specify each required table parameter exactly once
  • the USING clause may not add any extra tables
  • every actual table provided must match the column names of the corresponding table parameter
    • i.e. in USING my_data AS values the actual columns in my_data must be the same as in the values parameter
    • the columns need not be in the same order
  • each column in any actual table must be "assignment compatible" with its corresponding column in the parameters
    • i.e. the actual type could be converted to the formal type using the same rules as the := operator
    • these are the same rules used for procedure calls for instance where the call is kind of like assigning the actual parameter values to the formal parameter variables
  • the provided table values must not conflict with top level CTEs in the shared fragment
    • exception: the top level CTEs that were parameters do not create conflicts
    • e.g. it's common to do values(*) as (CALL something() using source as source) here the caller's "source" takes the value of the fragment's "source", this is not a true conflict
    • however, the caller's source might itself have been a parameter in which case the value provided could create an inner conflict
      • all these problems are easily avoided with a simple naming convention for parameters so that real arguments never look like parameter names and parameter forwarding is apparent
      • e.g. USING _source AS _source makes it clear that a parameter is being forwarded and _source is not likely to conflict with real table or view names

Shared Fragment Futures (not yet implemented)

It's a bit strange to include features not in the language in a language guide but these are coming very soon (weeks from this writing) and they are worth mentioning now, if only to be sure that the text will have some reasonable flow in the future. There are two things to discuss:

Firstly, while it's possible to share the text of fragments in the current formulation, the compiler does not yet do this. That will hopefully become a reality before end of 2021. The code is designed to do this but it doesn't yet. (expect additions to the Internals Chapter 3 on code generation).

Secondly, there is an important additional flexibility in fragments not mentioned above that will be coming once code sharing is in place. Once dynamic assembly of the text is possible then it will also be possible to create alternative texts. This is crucial because there are many instances where it is desirable to not just replace parameters but use an entirely different join sequence. Today, the only way to accomplish this is to fork the query at the topmost level (because SQLite has no internal possibly of "IF" conditions). This is expensive in terms of code size and also cognitive load because the entire alternative sequences have to be kept carefully in sync. Macros can help with this but then you get the usual macro problems. And of course there is no possibilty to share the common parts of the text of the code.

The idea is to allow this form:

@attribute(cql:shared_fragment)
CREATE PROC get_filtered_things(filter TEXT)
BEGIN
IF filter IS NOT NULL THEN
SELECT * FROM main_table T1
INNER JOIN secondary_table T2 USING(id)
WHERE T2.name like filter;
ELSE
SELECT * FROM main_table T1;
END IF;
END;

Now when do something like:

my_things(*) AS (CALL get_filtered_things(filter))

You might get the join, or you might not, depending on if the filter was provided. In fact the code might be entirely different between the branches removing unnecessary code, or swapping in a new experimental cache in your test environment, or anything like that. The generalization is simply this:

  • instead of just one select statement you get one "IF" statement
  • each statement list of the IF must be exactly one select statement
  • there must be an ELSE clause
  • the select statements must be type compatible, just like in a normal procedure

With this additional flexibility a wide variety of SQL statements can be constructed economically and maintainably. Importantly consumers of the fragments need not deal with all these various alternate possibilities but they can readily create their own useful combinations out of building blocks.

Ultimately, from SQLite's perspective, all of these shared fragment forms result in nothing more complicated than a chain of CTE expressions.

Last updated on by Winnie Quinn