Chapter 14: CQL Query Fragments

CQL Query fragments are the most sophisticated rewrite CQL offers for productivity. The idea is that a very large query can be represented in "fragments" that add columns or add rows based on the original "core" query. The final query will be an assembled rewrite of all the fragments chained together. Specifically, the motivation for this is that you can have a "core" query that fetches the essential columns for some UI design and then you can add query extension fragments that add new/additional columns for some new set of features. The core and extended columns can be in their own fragment and they can be compiled independently. The result of this is that any errors are in much smaller and easier to understand fragments rather than in some monster "fetch everything" query; any given extension does not have to know all the details of all the other extensions and can take a limited dependency on even the core query.

It's easiest to illustrate this with an example so let's begin there.

Let's first start with this very simple schema.

create table my_table(
id integer primary key,
name text not null,
rate real not null
);
create table added_rows(
like my_table -- sugar to duplicate the columns of my_table
);
create table added_columns(
id integer references my_table(id),
data text
);

Typically there would be a lot more columns but where you see flag1 and flag2 appear in fragments you can imagine any number of additional columns of any type. So we can keep the examples simple.

Base Query Fragments

The base fragment might look something like this:

@attribute(cql:base_fragment=base_frag)
create proc base_frag_template(id_ integer not null)
begin
with
base_frag(*) as (select * from my_table where my_table.id = id_)
select * from base_frag;
end;

Here are the essential aspects:

  • the base fragment is given a name, it can be anything, probably something that describes the purpose of the fragments
  • the procedure name can be anything at all
  • the procedure must consiste of exactly one with...select statement
  • the fragment name must be the one and only CTE in the select statement
  • you must select all the columns from the CTE

Note the syntax helper base_frag(*) is just shorthand to avoid retyping all the column names of my_table.

The interesting part is (select * from my_table where my_table.id = id_) which could have been any select statement of your choice. Everything else in the procedure must follow the designated format, and the format is enforced due to the presence of @attribute(cql:base_fragment=base_frag).

The point of putting everything on rails like this is that all base fragments will look the same and it will be clear how to transform any base fragment into the final query when it is assembled with its extensions.

Note: the base fragment produces no codegen at all. There is no base_frag_template procedure in the output. This is just a template. Also, the name of the procedure cannot be base_frag this name will be used by the assembly fragment later. Really any descriptive unique name will do since the name does not appear in the output at all.

Extension Query Fragments

Adding Columns

The most common thing that an extension might want to do is add columns to the result. There can be any number of such extensions in the final assembly. Here's a simple example that adds one column.

@attribute(cql:extension_fragment=base_frag)
create proc adds_columns(id_ integer not null)
begin
with
base_frag(*) as (select 1 id, "name" name, 1.0 rate),
col_adder_frag(*) as (
select base_frag.*, added_columns.data
from base_frag
left outer join added_columns on base_frag.id = added_columns.id)
select * from col_adder_frag;
end;

Again there are some important features to this extension and they are largely completely constrained, i.e. you must follow the pattern.

  • the attribute indicates extension_fragment and the name (here base_frag) must have been previously declared in a base_fragment
  • the procedure name can be any unique name other than base_frag, it corresponds to this particular extension's purpose
  • the procedure arguments must be identical to those in the base fragment
  • the first CTE must match the base_fragment attribute value, base_frag in this case
  • you do not need to repeat the full select statement for base_frag, any surrogate with the same column names and types will do
    • the base fragment code might include a #define to make this easier
      • e.g. #define base_frags_core as base_frag(*) as (select 1 id, "name" name, 1.0 rate)
    • doing so will make maintenance easier if new columns are added to the base fragment
  • there must be exactly one additional CTE
    • it may have any unique descriptive name you like
    • it must begin with select base_frags.* with the appropriate CTE name matching the base fragment CTE
    • it must add at least one column (or it would be uninteresting)
    • it may not have any clause other than the first from (e.g. no where, having, limit etc.)
      • if any of these were allowed they would remove or re-order rows in the base query which is not allowed
      • the from clause often includes nested selects which have no restrictions
    • it must select from the base fragment name and left outer join to whereever it likes to get optional additional columns
      • because of this the additional column(s) will certainly be a nullable type in the projection
  • the final select must be of the form select * from col_adder_frag with the appropriate name
  • keeping all this in mind, the interesting bit happens here: left outer join added_columns on base_frag.id = added_columns.id
    • this is where you get the data for your additional column using values in the core columns

This fragment can be (and should be) compiled in its own compiland while using #include to get the base fragment only. This will result in code gen for the accessor functions for a piece of the overall query -- the part this extension knows about. Importantly code that uses this extension's data does not need or want to know about any other extensions that may be present, thereby keeping dependencies under control.

The C signatures generated would look like this:

extern cql_int32 adds_columns_get_id(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_string_ref _Nonnull adds_columns_get_name(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_double adds_columns_get_rate(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_string_ref _Nullable adds_columns_get_data(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_int32 adds_columns_result_count(
base_frag_result_set_ref _Nonnull result_set);

Even if there were dozens of other extensions, the functions for reading those columns would not be declared in the header for this extension. Any given extension "sees" only the core columns plus any columns it added.

Adding Rows

Query extensions also frequently want to add additional rows to the main result set, based on the data that is already present.

The second form of extension allows for this, it is similarly locked in form. Here is an example:

@attribute(cql:extension_fragment=base_frag)
create proc adds_rows(id_ integer not null)
begin
with
base_frag(*) as (select 1 id, "name" name, 1.0 rate),
row_adder_frag(*) as (
select * from base_frag
union all
select * from added_rows)
select * from row_adder_frag;
end;

Let's review the features of this second template form:

  • there is a surrogate for the core query
  • there is a manatory second CTE
  • the second CTE is a compound query with any number of branches, all union all
  • the first branch must be select * from base_frag (the base fragment) to ensure that the original rows remain
    • this is also why all the branches must be union all
  • this form cannot add new columns
  • the extension CTE may not include order by or limit because that might reorder or remove rows of the base
  • any extensions of this form must come before those of the left outer join form for a given base fragment
    • which ironically means row_adder_frag has to come before col_adder_frag
  • the usual restrictions on compound selects (same type and number of columns) ensure a consistent result
  • the final select after the CTE section must exactly in the form select * from row_adder_frag which is the name of the one and only additional CTE with no other clauses or options
    • in practice only the CTE will be used to create the final assembly so even if you did change the final select to something else it would be moot

The signatures generated for this will look something like so:

extern cql_int32 adds_rows_get_id(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_string_ref _Nonnull adds_rows_get_name(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_double adds_rows_get_rate(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_int32 adds_rows_result_count(
base_frag_result_set_ref _Nonnull result_set);

Which gives you access to the core columns. Again this fragment can and should be compiled standalone with only the declaration for the base fragment in the same translation unit to get the cleanest possible output. This is so that consumers of this extension do not "see" other extensions which may or may not be related and may or may not always be present.

Assembling the Fragments

With all the fragments independently declared they need to be unified to create one final query. This is where the major rewriting happens. The assembly_fragment looks something like this:

@attribute(cql:assembly_fragment=base_frag)
create proc base_frag(id_ integer not null)
begin
with
base_frag(*) as (select 1 id, "name" name, 1.0 rate)
select * from base_frag;
end;

It will always be as simple as this, all the complexity is in the fragments.

  • the assembly_fragment name must match the core fragment name
  • the procedure arguments must be identical to the base fragment arguments
  • the procedure must have the same name as the assembly fragment (base_frag in this case)
    • the code that was generated for the previous fragments anticipates this and makes reference to what will be generated here
    • this is enforced
  • the assembled query is what you run to get the result set, this has real code behind it
    • the other fragments only produce result set readers that call into the helper meethods to get columns
  • there is a surrogate for the core fragment as usual
  • all of CTE section will ultimately be replaced with the fragments chained together
  • the final select should be of the form select * from your_frags but it can include ordering and/or filtering, this statement will be present in final codegen, the final order is usually defined here

When compiling the assembly fragment, you should include the base, and all the other fragments, and the assembly template. The presence of the assembly_fragment will cause codegen for the extension fragments to be suppressed. The assembly translation unit only contains the assembly query as formed from the fragments.

Now let's look at how the query is rewritten, the process is pretty methodical.

After rewriting the assembly looks like this:

CREATE PROC base_frag (id_ INTEGER NOT NULL)
BEGIN
WITH
base_frag (id, name, rate) AS (SELECT *
FROM my_table
WHERE my_table.id = id_),
row_adder_frag (id, name, rate) AS (SELECT *
FROM base_frag
UNION ALL
SELECT *
FROM added_rows),
col_adder_frag (id, name, rate, data) AS (SELECT row_adder_frag.*, added_columns.data
FROM row_adder_frag
LEFT OUTER JOIN added_columns ON row_adder_frag.id = added_columns.id)
SELECT *
FROM col_adder_frag;
END;

Let's dissect this part by part, each CTE serves a purpose.

  • the core CTE was replaced by the CTE in the base_fragment, it appears directly
  • next the first extension was added as a CTE referring to the base fragment just as before
    • recall that the first extension has to be row_adder_frag, as that type must come first
    • looking at the chain you can see why it would be hard to write a correct fragment if it came after columns were added
  • next the second extension was added as a CTE
    • all references to the base fragment were replaced with references to row_adder_frag
    • the extra column names in the CTE were added such that all previous column names are introduced
  • this process continues until all extensions are exhausted
  • the final select statement reads all the columns from the last extension CTE and includes and ordering and so forth that was present in the assembly query

The result of all this is a single query that gets all the various columns that were requested in all the extensions and all the union all operations play out as written. The extensions are emitted in the order that they appear in the translation unit with the assembly, which again must have the row adding extensions first.

This facility provides considerable ability to compose a large query, but each fragment can be independently checked for errors so that nobody ever has to debug the (possibly monstrous) overall result. Fragments can be removed simply by excluding them from the final assembly (with e.g. #ifdefs, or build rules)

With the rewrite of the assembly_fragment complete, the codegen for that procedure is the normal codegen for a procedure with a single select.

As always, Java and Objective C codegen on these pieces will produce suitable wrappers for the C.

The output code for the assembly fragment generates these reading functions:

extern cql_int32 base_frag_get_id(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_string_ref _Nonnull base_frag_get_name(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_double base_frag_get_rate(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
// used by adds_columns_get_data() to read its data
extern cql_string_ref _Nullable __PRIVATE__base_frag_get_data(
base_frag_result_set_ref _Nonnull result_set,
cql_int32 row);
extern cql_int32 base_frag_result_count(
base_frag_result_set_ref _Nonnull result_set);

These are exactly what you would get for a normal query except that the pieces that came from extensions are marked PRIVATE. Those methods should not be used directly but instead the methods generated for each extension proc should be used.

Additionally, to create the result set, as usual.

extern CQL_WARN_UNUSED cql_code base_frag_fetch_results(
sqlite3 *_Nonnull _db_,
base_frag_result_set_ref _Nullable *_Nonnull result_set,
cql_int32 id_);

With the combined set of methods you can create a variety of assembled queries from extensions in a fairly straightforward way.

Last updated on by Rico Mariani