Part 8: Test Helpers
Preface
Part 8 continues with a discussion of the Test Helper generation code. As in the previous sections, the goal here is not to go over every detail but rather to give a sense of how helpers are created in general -- the core strategies and implementation choices -- so that when reading the source you will have an idea of how it all hangs together.
Test Helpers
The testability features are described in Chapter 12 of the Guide, so we won't be discussing all the details of what can be created. Instead we're going to go over the theory of how the generator works. This generator is somewhat different from the others in that it only concerns itself with procedures, and only those that have been suitably annotated -- there are large parts of the tree that are of no interest to the test helper logic, including, importantly, the body of procedures. Only the signature matters. As we'll see, there is a fairly large family of generators that are like this.
We'll have one section for every kind of output, but really only the dummy_test helper is worthy of detailed discussion; the others, as we'll see, are very simple.
Initialization
The generator is wired up like the others with a suitable main; this one is pretty simple:
// Main entry point for test_helpers
cql_noexport void cg_test_helpers_main(ast_node *head) {
  Contract(options.file_names_count == 1);
  cql_exit_on_semantic_errors(head);
  exit_on_validating_schema();
  cg_test_helpers_reset_globals();

  CHARBUF_OPEN(output_buf);
  cg_th_output = &output_buf;

  bprintf(cg_th_output, "%s", rt->source_prefix);
  cg_test_helpers_stmt_list(head);
  cql_write_file(options.file_names[0], cg_th_output->ptr);

  CHARBUF_CLOSE(output_buf);
  cg_test_helpers_reset_globals();
}
The text output will ultimately be put into output_buf, defined here, and helper_flags will track which kinds of helpers we saw. This helps us to emit the right sections of output, as we'll see.

The code iterates the AST looking at the top level statement list only, and in particular looking for CREATE PROC statements.
// Iterate through statement list
static void cg_test_helpers_stmt_list(ast_node *head) {
  Contract(is_ast_stmt_list(head));
  init_all_trigger_per_table();
  init_all_indexes_per_table();
  CHARBUF_OPEN(procs_buf);
  CHARBUF_OPEN(decls_buf);
  cg_th_procs = &procs_buf;
  cg_th_decls = &decls_buf;

  for (ast_node *ast = head; ast; ast = ast->right) {
    EXTRACT_STMT_AND_MISC_ATTRS(stmt, misc_attrs, ast);

    if (is_ast_create_proc_stmt(stmt)) {
      EXTRACT_STRING(proc_name, stmt->left);
      cg_test_helpers_create_proc_stmt(stmt, misc_attrs);
    }
  }

  bprintf(cg_th_output, "%s", decls_buf.ptr);
  bprintf(cg_th_output, "\n");
  bprintf(cg_th_output, "%s", procs_buf.ptr);

  CHARBUF_CLOSE(decls_buf);
  CHARBUF_CLOSE(procs_buf);
  symtab_delete(all_tables_with_triggers);
  all_tables_with_triggers = NULL;
  symtab_delete(all_tables_with_indexes);
  all_tables_with_indexes = NULL;
}
There are some preliminaries:

- we make a symbol table that maps from table names to the list of triggers on that table by walking all the triggers
- we make a symbol table that maps from table names to the list of indices on that table by walking all the indices
- we'll need two buffers: one for declarations (that must go first) and one for procedure bodies
- each CREATE PROC statement potentially contributes to both sections

cg_test_helpers_create_proc_stmt checks for the helper attributes and sets up the dispatch to emit the test helpers.
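The two-buffer approach can be sketched in plain C. This is a minimal illustration with hypothetical names, using fixed-size buffers as stand-ins for CQL's growable charbuf: helpers may interleave writes to the declaration and procedure buffers, but the combined output always puts the declarations first.

```c
#include <assert.h>
#include <string.h>

#define BUF_MAX 1024

// miniature stand-in for the decls/procs charbufs
typedef struct { char text[BUF_MAX]; } buf;

static void append(buf *b, const char *s) {
  // append s, assuming it fits; the real charbuf grows dynamically
  strncat(b->text, s, BUF_MAX - strlen(b->text) - 1);
}

// a generator may interleave writes to both buffers...
static void emit_helper(buf *decls, buf *procs, const char *name) {
  append(decls, "DECLARE "); append(decls, name); append(decls, ";\n");
  append(procs, "BODY ");    append(procs, name); append(procs, ";\n");
}

// ...but the combined output always places all declarations first
static void combine(buf *out, const buf *decls, const buf *procs) {
  append(out, decls->text);
  append(out, procs->text);
}
```

No matter how many helpers write to both buffers, every declaration precedes every body in the final file, which is exactly what the decls_buf/procs_buf stitching accomplishes.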
To do this we have to walk any misc attributes on the procedure; we're looking for things of the form @attribute(cql:autotest=xxx).
static void cg_test_helpers_create_proc_stmt(ast_node *stmt, ast_node *misc_attrs) {
  Contract(is_ast_create_proc_stmt(stmt));

  if (misc_attrs) {
    helper_flags = 0;
    dummy_test_infos = symtab_new();

    find_misc_attrs(misc_attrs, test_helpers_find_ast_misc_attr_callback, stmt);

    symtab_delete(dummy_test_infos);
    dummy_test_infos = NULL;
  }
}
find_misc_attrs calls test_helpers_find_ast_misc_attr_callback. We're going to keep track of which kinds of helpers we have found to help us with the output. This is where helper_flags comes in. The flags are:
#define DUMMY_TABLE 1 // dummy_table attribute flag
#define DUMMY_INSERT 2 // dummy_insert attribute flag
#define DUMMY_SELECT 4 // dummy_select attribute flag
#define DUMMY_RESULT_SET 8 // dummy_result_set attribute flag
#define DUMMY_TEST 0x10 // dummy_test attribute flag
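As a quick illustration of how these bit flags combine, here is a hypothetical fragment mirroring the pattern: each emitted helper ORs in its bit, and later output stages test the bits they care about. Note that needs_proc_decl is an invented name for illustration, not a function in the real source.

```c
#include <assert.h>
#include <stdint.h>

#define DUMMY_TABLE      1     // dummy_table attribute flag
#define DUMMY_INSERT     2     // dummy_insert attribute flag
#define DUMMY_SELECT     4     // dummy_select attribute flag
#define DUMMY_RESULT_SET 8     // dummy_result_set attribute flag
#define DUMMY_TEST       0x10  // dummy_test attribute flag

// hypothetical: simulate seeing dummy_table and dummy_insert on one proc
static uint32_t flags_for_proc(void) {
  uint32_t helper_flags = 0;     // reset per procedure, as in the real code
  helper_flags |= DUMMY_TABLE;   // the dummy_table helper was emitted
  helper_flags |= DUMMY_INSERT;  // the dummy_insert helper was emitted
  return helper_flags;
}

// hypothetical query: did any result-set-based helper run for this proc?
static int needs_proc_decl(uint32_t flags) {
  return !!(flags & (DUMMY_TABLE | DUMMY_INSERT | DUMMY_SELECT | DUMMY_RESULT_SET));
}
```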
And now we're ready for actual dispatch:
// This is invoked for every misc attribute on every create proc statement
// in this translation unit. We're looking for attributes of the form cql:autotest=(...)
// and we ignore anything else.
static void test_helpers_find_ast_misc_attr_callback(
  CSTR _Nullable misc_attr_prefix,
  CSTR _Nonnull misc_attr_name,
  ast_node *_Nullable ast_misc_attr_value_list,
  void *_Nullable context)
{
  ast_node *stmt = (ast_node *)context;
  Contract(is_ast_create_proc_stmt(stmt));

  if (misc_attr_prefix &&
      misc_attr_name &&
      !Strcasecmp(misc_attr_prefix, "cql") &&
      !Strcasecmp(misc_attr_name, "autotest")) {
    ...
  }
}
The main dispatch looks like this:
// In principle, any option can be combined with any other but some only make sense for procs with
// a result.

EXTRACT_STRING(autotest_attr_name, misc_attr_value);
if (is_autotest_dummy_test(autotest_attr_name)) {
  cg_test_helpers_dummy_test(stmt);
}

// these options are only for procs that return a result set
if (has_result_set(stmt) || has_out_stmt_result(stmt) || has_out_union_stmt_result(stmt)) {
  if (is_autotest_dummy_table(autotest_attr_name)) {
    helper_flags |= DUMMY_TABLE;
    cg_test_helpers_dummy_table(proc_name);
  }
  else if (is_autotest_dummy_insert(autotest_attr_name)) {
    helper_flags |= DUMMY_INSERT;
    cg_test_helpers_dummy_insert(proc_name);
  }
  else if (is_autotest_dummy_select(autotest_attr_name)) {
    helper_flags |= DUMMY_SELECT;
    cg_test_helpers_dummy_select(proc_name);
  }
  else if (is_autotest_dummy_result_set(autotest_attr_name)) {
    helper_flags |= DUMMY_RESULT_SET;
    cg_test_helpers_dummy_result_set(proc_name);
  }
}
Most of these options are very simple indeed. cg_test_helpers_dummy_test is the trickiest by far and we'll save it for last; let's dispense with the easy stuff first.
Dummy Table, Dummy Insert, Dummy Select, Dummy Result Set
All of these follow a very simple template. The language includes just the right features to emit these procedures as nearly constant strings. The LIKE construct was literally designed to make these patterns super simple. You can see all the patterns in Chapter 12, but let's look at the code for the first one. This is "dummy table".
// Emit an open proc which creates a temp table in the form of the original proc
// Emit a close proc which drops the temp table
static void cg_test_helpers_dummy_table(CSTR name) {
  bprintf(cg_th_procs, "\n");
  bprintf(cg_th_procs, "CREATE PROC open_%s()\n", name);
  bprintf(cg_th_procs, "BEGIN\n");
  bprintf(cg_th_procs, "  CREATE TEMP TABLE test_%s(LIKE %s);\n", name, name);
  bprintf(cg_th_procs, "END;\n");

  bprintf(cg_th_procs, "\n");
  bprintf(cg_th_procs, "CREATE PROC close_%s()\n", name);
  bprintf(cg_th_procs, "BEGIN\n");
  bprintf(cg_th_procs, "  DROP TABLE test_%s;\n", name);
  bprintf(cg_th_procs, "END;\n");
}
The purpose of this is to create helper functions that can create a temporary table with the same columns in it as the procedure you are trying to mock. You can then select rows out of that table (with dummy_select) or insert rows into the table (with dummy_insert). Or you can make a single row result set (often enough) with dummy_result_set.
As we can see, we simply prepend open_ to the procedure name and use that to create a test helper that makes the temporary table. The table's columns are defined to be LIKE the result shape of the procedure under test. Recall this helper is only available to procedures that return a result set. The temporary table gets a test_ prefix. Assuming the procedure with the annotation is foo, then this code:

CREATE TEMP TABLE test_foo(LIKE foo);

is universal -- no matter the result shape of foo, you get a table with those columns.
For this to work we need to emit a declaration of foo before this code. However, since we have the full definition of foo handy, that is no problem. We remember that we'll need it by setting a flag in helper_flags.
The code for close_foo is even simpler, if that's possible. The great thing is that any need to know the columns of foo has been removed from the test helper. The CQL compiler handles this as a matter of course and it is generally useful. See Chapter 5 for more examples.
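Putting the template together: reading the bprintf calls in cg_test_helpers_dummy_table above, the generated text for a procedure named foo comes out looking like this (shown as a sketch; whitespace may differ slightly in the real output):

```sql
-- generated by dummy_table for a procedure named foo
CREATE PROC open_foo()
BEGIN
  CREATE TEMP TABLE test_foo(LIKE foo);
END;

CREATE PROC close_foo()
BEGIN
  DROP TABLE test_foo;
END;
```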
All the others are equally simple and use similar tricks. These were the first test helpers. They're actually not that popular because they are so easy to create yourself anyway.
Dummy Test
The dummy test code emitter is non-trivial. Let's quickly review the things it has to do and then we can go over how each of these is accomplished. Assuming we have a procedure your_proc that has been annotated like this:

@attribute(cql:autotest=(dummy_test))
create proc your_proc(..args...)
begin
  -- assorted references to tables and views
end;
Dummy test will produce the following:

- test_your_proc_create_tables - a procedure that creates all the tables and views that your_proc needs
- test_your_proc_drop_tables - a procedure that drops those same tables and views
- test_your_proc_create_indexes - a procedure that creates your indices; in a test you may or may not want to create the indices
- test_your_proc_drop_indexes - a procedure that drops those same indices
- test_your_proc_create_triggers - a procedure that creates your triggers; in a test you may or may not want to create the triggers
- test_your_proc_drop_triggers - a procedure that drops those same triggers
- test_your_proc_read_table1 (and so on) - for each table or view in the create_tables set, a procedure that selects all the data out of that object is created, in case you need it
- test_your_proc_populate_tables - a procedure that loads all the tables from create_tables with sample data; FK relationships are obeyed; user data may be specified in an attribute and that data will be used in preference to auto-generated data
These are more fully discussed in Chapter 12.
Building the Trigger and Index mappings
In order to know which indices and triggers we might need, we have to be able to map from the tables/views in your_proc to the indices and triggers on them. To set up for this, a general purpose reverse mapping is created. We'll look at the triggers version; the indices version is nearly identical.
// Walk through all triggers and create a dictionary of triggers per table.
static void init_all_trigger_per_table() {
  Contract(all_tables_with_triggers == NULL);
  all_tables_with_triggers = symtab_new();

  for (list_item *item = all_triggers_list; item; item = item->next) {
    EXTRACT_NOTNULL(create_trigger_stmt, item->ast);
    EXTRACT_NOTNULL(trigger_body_vers, create_trigger_stmt->right);
    EXTRACT_NOTNULL(trigger_def, trigger_body_vers->left);
    EXTRACT_NOTNULL(trigger_condition, trigger_def->right);
    EXTRACT_NOTNULL(trigger_op_target, trigger_condition->right);
    EXTRACT_NOTNULL(trigger_target_action, trigger_op_target->right);
    EXTRACT_ANY_NOTNULL(table_name_ast, trigger_target_action->left);
    EXTRACT_STRING(table_name, table_name_ast);

    if (create_trigger_stmt->sem->delete_version > 0) {
      // dummy_test should not emit deleted triggers
      continue;
    }

    symtab_append_bytes(all_tables_with_triggers, table_name, &create_trigger_stmt, sizeof(create_trigger_stmt));
  }
}
The steps are pretty simple:

- we make a symbol table that will map from the table name to an array of statements
- there is a convenient all_triggers list that has all the triggers
- from each trigger we EXTRACT the table or view name (named table_name even if it's a view)
- we append the trigger statement pointer to the end of such statements for the table
- any triggers marked with @delete are not included for obvious reasons
At the end of this looking up the table name will give you a list of trigger statement AST pointers. From there of course you can get everything you need.
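The shape of this reverse mapping can be sketched without the real symtab/bytebuf machinery. The essential idea is just name → growable array of AST pointers; this toy version uses fixed-size arrays and linear search purely for illustration, with a dummy struct standing in for ast_node.

```c
#include <assert.h>
#include <string.h>

#define MAX_TABLES   8
#define MAX_TRIGGERS 8

// stand-in for ast_node; only pointer identity matters here
typedef struct ast_node_t { int unused; } ast_node_t;

typedef struct {
  const char *table_name;
  ast_node_t *triggers[MAX_TRIGGERS];
  int count;
} map_entry;

static map_entry trigger_map[MAX_TABLES];
static int trigger_map_count = 0;

// append one trigger statement pointer under its table's name,
// creating the entry on first use (the symtab_append_bytes analogue)
static void add_trigger(const char *table_name, ast_node_t *trigger) {
  for (int i = 0; i < trigger_map_count; i++) {
    if (!strcmp(trigger_map[i].table_name, table_name)) {
      trigger_map[i].triggers[trigger_map[i].count++] = trigger;
      return;
    }
  }
  map_entry *e = &trigger_map[trigger_map_count++];
  e->table_name = table_name;
  e->triggers[0] = trigger;
  e->count = 1;
}

// look up the trigger list for a table: returns the count, sets *out
static int triggers_for(const char *table_name, ast_node_t ***out) {
  for (int i = 0; i < trigger_map_count; i++) {
    if (!strcmp(trigger_map[i].table_name, table_name)) {
      *out = trigger_map[i].triggers;
      return trigger_map[i].count;
    }
  }
  *out = NULL;
  return 0;
}

// sample trigger "AST nodes" for the usage example
static ast_node_t trig_a, trig_b, trig_c;
```

With this in place, looking up a table name yields exactly the array of trigger statement pointers the emitter needs to iterate.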
The index version is basically the same; the details of the EXTRACT ops to go from index to table name are different, and of course we start from the all_indices_list.
Computing The Dependencies of a Procedure
Sticking with our particular example, in order to determine the tables/views that your_proc might need, the generator has to walk its entire body looking for things that are tables. This is handled by the find_all_table_nodes function.
static void find_all_table_nodes(dummy_test_info *info, ast_node *node) {
  table_callbacks callbacks = {
    .callback_any_table = found_table_or_view,
    .callback_any_view = found_table_or_view,
    .callback_context = info,
    .notify_table_or_view_drops = true,
    .notify_fk = true,
    .notify_triggers = true,
  };

  info->callbacks = &callbacks;
  find_table_refs(&callbacks, node);

  // stitch the views to the tables to make one list, views first
  for (list_item *item = info->found_views; item; item = item->next) {
    if (!item->next) {
      item->next = info->found_tables;
      info->found_tables = info->found_views;
      break;
    }
  }

  // this shouldn't be used after it's been linked in
  info->found_views = NULL;
}
This code uses the general dependency walker in cg_common.c to visit all tables and views. It is a recursive walk and the general steps go something like this:

- starting from your_proc, the entire body of the procedure is visited
- references to tables or views in update, delete, insert, select etc. statements are identified
- each such table/view is added to the found tables list (at most once)
- for views, the recursion proceeds to the body of the view as though the body had been inline in the procedure
- for tables, the recursion proceeds to the body of the table to discover any FK relationships that need to be followed
- if any found item has triggers, the trigger body is walked; any tables/views mentioned there become additional found items
- any given table/view and hence trigger is only visited once
The net of all this, the "found items", is a list of all the tables and views that the procedure uses, directly
or indirectly. As discussed in Chapter 12
this walk does not include tables and views used by procedures that your_proc
calls.
To get the dependencies in the correct order, the tables have been walked following the foreign key chain and all views go after all tables. The views are stitched together. The business of diving into views/tables/triggers and maintenance of the found items is done by the callback function found_table_or_view. The actual source is more descriptive comments than code, but the code is included here as it is brief.
// comments elided for brevity, the why of all this is described in the code
static void found_table_or_view(
  CSTR _Nonnull table_or_view_name,
  ast_node *_Nonnull table_or_view,
  void *_Nullable context)
{
  Contract(table_or_view);

  dummy_test_info *info = (dummy_test_info *)context;

  bool deleted = table_or_view->sem->delete_version > 0;

  if (!deleted) {
    continue_find_table_node(info->callbacks, table_or_view);

    if (is_ast_create_view_stmt(table_or_view)) {
      add_item_to_list(&info->found_views, table_or_view);
    }
    else {
      add_item_to_list(&info->found_tables, table_or_view);
    }

    find_all_triggers_node(info, table_or_view_name);
  }
}
The general purpose walker notifies exactly once on each visited table/view and continue_find_table_node
is used to
dive into the bodies of views/tables that would otherwise not be searched. Likewise find_all_triggers_node
dives into the body of any triggers that are on the found item.
Emitting Indices and Triggers
With the "found tables" computed (creatively stored in a field called found_tables) it's very easy to loop over these and generate the necessary indices for each found table (keeping in mind the "found table" can be a view). The create index statement is emitted by the usual gen_statement_with_callbacks form that echoes the AST. The drop index can be trivially created by name.
// Emit create and drop index statement for all indexes on a table.
static void cg_emit_index_stmt(
  CSTR table_name,
  charbuf *gen_create_indexes,
  charbuf *gen_drop_indexes,
  gen_sql_callbacks *callback)
{
  symtab_entry *indexes_entry = symtab_find(all_tables_with_indexes, table_name);
  bytebuf *buf = indexes_entry ? (bytebuf *)indexes_entry->val : NULL;
  ast_node **indexes_ast = buf ? (ast_node **)buf->ptr : NULL;
  int32_t count = buf ? buf->used / sizeof(*indexes_ast) : 0;
  gen_set_output_buffer(gen_create_indexes);

  for (int32_t i = 0; i < count; i++) {
    ast_node *index_ast = indexes_ast[i];
    EXTRACT_NOTNULL(create_index_stmt, index_ast);
    EXTRACT_NOTNULL(create_index_on_list, create_index_stmt->left);
    EXTRACT_ANY_NOTNULL(index_name_ast, create_index_on_list->left);
    EXTRACT_STRING(index_name, index_name_ast);

    gen_statement_with_callbacks(index_ast, callback);
    bprintf(gen_create_indexes, ";\n");
    bprintf(gen_drop_indexes, "DROP INDEX IF EXISTS %s;\n", index_name);
  }
}
Triggers are done in exactly the same way except that instead of looping over found tables we can actually generate them as they are discovered inside of find_all_triggers_node. Recall that we had to visit the triggers when computing the found tables anyway. We did not have to visit the indices, hence the difference.
These walks allow us to produce test_your_proc_create_indexes, test_your_proc_drop_indexes, test_your_proc_create_triggers, and test_your_proc_drop_triggers.
Emitting Tables and Views
Starting from the found tables, again it is very easy to generate the code to create and drop the tables and views. The only trick here is that the tables depend on one another, so order is important. New found items are added to the head of the found tables list, and because it's a post-order walk, the deepest dependencies end up at the front of the list. This means the list is naturally in the order needed to delete the tables (parent tables at the end). So the algorithm goes like this:
- emit the drop tables/views in the found order
- reverse the list
- emit the create tables/views in the reversed order
- for each table/view, emit the reader test_your_proc_read_[item]
- for tables, emit an insertion fragment into test_your_proc_populate_tables using cg_dummy_test_populate (population is discussed in the following sections)
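The ordering argument is easy to demonstrate in miniature: if a post-order dependency walk prepends each newly found table to the head of the list, the finished list reads child-first, which is drop order, and reversing it gives create order. A sketch, with a single hard-coded FK parent per table standing in for the real FK walk (names are illustrative):

```c
#include <assert.h>
#include <string.h>

#define MAX_FOUND 8

// toy schema node: each table has at most one FK parent (NULL = none)
typedef struct table { const char *name; const struct table *parent; } table;

static const char *found[MAX_FOUND];
static int found_count = 0;

static int already_found(const char *name) {
  for (int i = 0; i < found_count; i++)
    if (!strcmp(found[i], name)) return 1;
  return 0;
}

// post-order: fully process the parent first, then prepend self,
// so children end up at the front -- which is exactly drop order
static void visit(const table *t) {
  if (already_found(t->name)) return;    // each table is found at most once
  if (t->parent) visit(t->parent);       // follow the FK chain first
  for (int i = found_count; i > 0; i--)  // prepend self at the head
    found[i] = found[i - 1];
  found[0] = t->name;
  found_count++;
}
```

Visiting a chain C references B references A leaves the list as C, B, A: safe to drop front-to-back, and safe to create after reversing.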
As in the other cases, gen_statement_with_callbacks is used to create the DDL statements: CREATE TABLE, CREATE VIEW, and CREATE VIRTUAL TABLE. The delete side is easily created with ad hoc DROP TABLE or DROP VIEW statements.
The reading procedure is always of the form SELECT * FROM foo
so that too is trivial to generate with a fixed template. The "echoing" system
once again is doing a lot of the heavy lifting.
These walks give us test_your_proc_create_tables, test_your_proc_drop_tables, and test_your_proc_read_[item], and they drive the population process.
Gathering Ad Hoc Data To Be Inserted
Before we get into the mechanics of the population code, we have to visit one more area. It's possible to include data in the dummy_test annotation itself. This is data that you want to have populated, and it will be included in the overall data populator. If there is enough of it (at least 2 rows per candidate table) then it might be all the data you get. Now, the data format here is not designed to be fully general; after all, it's not that hard to just write INSERT ... VALUES for all your tables anyway. The goal is to provide something that will help you not have to remember all the FK relationships, and maybe let you economically specify some leaf data you need and get the rest for free. It's also possible to manually create dummy data that just won't work; scrubbing all this is way beyond the ability of a simple test helper. When the code runs you'll get SQLite errors which can be readily addressed.

So keeping in mind this sort of "entry level data support" as the goal, we can take a look at how the system works -- it's all in the function collect_dummy_test_info, which includes this helpful comment on structure.
// the data attribute looks kind of like this:
// @attribute(cql:autotest = (
// .. other auto test attributes
// (dummy_test,
// (table_name1, (col1, col2), (col1_val1, col2_val1), (col1_val2, col2_val2) ),
// (table_name2, (col1, col2), (col1_val1, col2_val1), (col1_val2, col2_val2) ),
// ...
// )
// .. other auto test attributes
// ))
//
// we're concerned with the dummy_test entries here, they have a very specific format
// i.e. first the table then the column names, and then a list of matching columns and values
So we're going to walk a list of attributes; each one begins with a table name, then a list of columns, and then a list of values. All of the data goes in the symbol table dummy_test_infos, which is indexed by table name. For each table name we find, we ensure there is a symbol table at that slot. So dummy_test_infos is a symbol table of symbol tables. It's conceptually something like value_list = dummy_test_infos['table']['column']
// collect table name from dummy_test info
ast_node *table_list = dummy_attr->left;
EXTRACT_STRING(table_name, table_list->left);
symtab *col_syms = symtab_ensure_symtab(dummy_test_infos, table_name);
Next we're going to find the column names; they are the next entry in the list, so we go right to get the column_name_list.
// collect column names from dummy_test info
ast_node *column_name_list = table_list->right;
for (ast_node *list = column_name_list->left; list; list = list->right) {
  EXTRACT_STRING(column_name, list->left);
  sem_t col_type = find_column_type(table_name, column_name);

  bytebuf *column_values = symtab_ensure_bytebuf(col_syms, column_name);

  // store the column meta data, create space to hold values in databuf
  bytebuf_append_var(&col_data_buf, column_values);
  bytebuf_append_var(&col_type_buf, col_type);
  bytebuf_append_var(&col_name_buf, column_name);
}
The primary purpose of this part of the loop is then to add the column names to col_syms
so that they are linked to the dummy info for this table.
The line bytebuf *column_values = symtab_ensure_bytebuf(col_syms, column_name);
does this. And this also creates the byte buffer that will hold
the eventual values.
We also keep a side set of buffers that has the column name, type, and the values in the col_name
, col_type
, and col_data
buffers respectively.
These are used to handle the foreign key work shortly and they allow us to not have to look up all the names over and over.
// collect column value from dummy_test info. We can have multiple rows of column value
for (ast_node *values_ast = column_name_list->right;
     values_ast;
     values_ast = values_ast->right) {

  int32_t column_index = 0;

  // collect one row of column value
  for (ast_node *list = values_ast->left; list; list = list->right) {
    ast_node *misc_attr_value = list->left;
    Contract(col_data_buf.used);
    bytebuf *column_values = ((bytebuf **) col_data_buf.ptr)[column_index];
    sem_t column_type = ((sem_t *) col_type_buf.ptr)[column_index];
    CSTR column_name = ((CSTR *) col_name_buf.ptr)[column_index];

    bytebuf_append_var(column_values, misc_attr_value);
    column_index++;

    ...foreign key stuff goes here...
  }

  .. some cleanup
}
The most important part is bytebuf_append_var(column_values, misc_attr_value);
this is where the
attribute value is added to the list of values that are on the column.
Finally, the "foreign key stuff". What we need to do here is check the column name in the table to see if it's part of a foreign key, and if it is, we recursively add the current data value to the referenced column in the referenced table. That way, if you add an initializer to a leaf table you don't also have to add it to all the parent tables. If it wasn't for this feature the manual data wouldn't be very useful at all; hand written INSERT statements would be just as good.
// If a column value is added to dummy_test info for a foreign key column then
// we need to make sure that same column value is also added as a value in
// the referenced table's dummy_test info.
// e.g.
// create table A(id integer primary key);
// create table B(id integer primary key references A(id));
//
// If there is sample data provided for B.id then we must also ensure that
// the value provided for B.id is also added as a sample row in A with the same
// value for id.
if (is_foreign_key(column_type)) {
  add_value_to_referenced_table(table_name, column_name, column_type, misc_attr_value);
}
When this is all done, all of the initializers will have been added to the appropriate column of the appropriate table. Again, the overall structure is something like: value_list = dummy_test_infos['table']['column']
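The propagation rule can be sketched in miniature. This toy version hard-codes a single FK edge where B.id references A(id), matching the example in the comment above; all names are illustrative and the real code derives the edge from the semantic info. Recording a value on the FK side also records it on the referenced side.

```c
#include <assert.h>
#include <string.h>

#define MAX_VALS 8

// toy "dummy_test_infos": one value list per (table, column) pair
typedef struct {
  const char *table, *column;
  int values[MAX_VALS];
  int count;
} col_info;

static col_info infos[] = {
  { "A", "id", {0}, 0 },   // parent table
  { "B", "id", {0}, 0 },   // child table; B.id references A(id)
};

static col_info *find_info(const char *t, const char *c) {
  for (unsigned i = 0; i < sizeof(infos) / sizeof(infos[0]); i++)
    if (!strcmp(infos[i].table, t) && !strcmp(infos[i].column, c))
      return &infos[i];
  return 0;
}

// record a value; if (table, column) is an FK side, propagate the same
// value to the referenced table's column (add_value_to_referenced_table analogue)
static void add_value(const char *table, const char *column, int value) {
  col_info *ci = find_info(table, column);
  ci->values[ci->count++] = value;
  if (!strcmp(table, "B") && !strcmp(column, "id"))  // hard-coded FK edge
    add_value("A", "id", value);
}
```

After recording a value for B.id, the same value is present under A.id, so the generated parent insert will contain a matching row.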
Emitting the Table Population Fragments
With any custom initializers in the dummy_test_infos structure, we can emit the population fragment for any given table. The general algorithm here goes like this:
- the total number of rows we will generate will be the number of column values in the initializers or else DUMMY_TEST_INSERT_ROWS, whichever is larger
- the insert statement generated will include dummy_seed([value_seed]) where value_seed starts at 123 and goes up by 1 for every row generated
- dummy_seed will create values for any missing columns using the seed, so any combination of included columns is ok; we'll always get a complete insert
- foreign key columns use a provided initializer from the parent table if there is one, or else they use 1, 2, 3 etc.
- likewise, if a column is referenceable by some other table it uses the known sequence 1, 2, 3 etc. for its value rather than the varying seed
- in this way child tables can know that parent tables will have a value they can use, since both tables will have at least DUMMY_TEST_INSERT_ROWS rows and any rows that were not manually initialized will match
- note that foreign key columns always get this treatment, whether they were mentioned or not
- to mix things up, dummy_nullables and dummy_defaults are added on every other row, which makes missing values be NULL and/or the default value if one is present
This is enough to generate a set of insert statements for the table in question and since the fragments are generated in the table creation order the resulting insert statements will have the parent tables first so the foreign keys of later tables will be correct.
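The row-count and value rules above can be captured in a couple of tiny functions. This is a sketch, with an illustrative value of 2 for DUMMY_TEST_INSERT_ROWS (the real constant's value lives in the source): the row count takes the larger of the two, seed values start at 123 and increment per row, and FK-relevant columns use the predictable 1, 2, 3 sequence instead of the seed.

```c
#include <assert.h>

#define DUMMY_TEST_INSERT_ROWS 2  // illustrative value, not the real constant

// total rows: every manually provided row, but at least the minimum
static int row_count(int provided_rows) {
  return provided_rows > DUMMY_TEST_INSERT_ROWS
    ? provided_rows
    : DUMMY_TEST_INSERT_ROWS;
}

// seed for the nth generated row (0-based): 123, 124, 125, ...
static int seed_for_row(int row) {
  return 123 + row;
}

// value for an FK-relevant column with no initializer: the known
// 1, 2, 3 sequence, so child rows always find a matching parent row
static int fk_value_for_row(int row) {
  return row + 1;
}
```

Because both parent and child tables generate at least the minimum number of rows and FK columns draw from the same fixed sequence, the generated child inserts always find a matching parent key.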
This can go wrong if the manual initializations use keys that conflict with the default generation, or if the manual initializations have PK conflicts or other such things. No attempt is made to sort that out. The run time errors should be clear and these are, after all, only test helpers. It's very easy to avoid these hazards and you get a pretty clear error message if you don't, so that seems good enough.
These fragments are ultimately combined to make the body of the procedure test_your_proc_populate_tables
.
Recap
The test helpers in cg_test_helpers.c
are very simple nearly-constant templates with the exception of dummy_test
which includes:
- table and view creation
- index creation
- trigger creation
- data population
Topics covered included:
- how the candidate procedures are discovered
- how the attributes are scanned for test directives
- how each dummy test type is dispatched
- how dummy_test handles data initialization
- how dummy_test does its dependency analysis
As with the other parts, no attempt was made to cover every function in detail. That is best done by reading the source code. But there is overall structure here and an understanding of the basic principles is helpful before diving into the source code.