Some updates on the CQL schema upgrade system

Foreword

I was tempted to subtitle this article "How a great idea went horribly, horribly, wrong" but in the final analysis the outcome isn't actually at all horrible. But there are some good lessons here, and it's useful to capture the history while it is still fresh.

Introduction and Context

The CQL compiler can produce for you, starting from a set of table declarations and schema annotations, a schema upgrader that can upgrade your schema from any previous version to the current version, provided some simple rules are followed. Helpfully, the compiler enforces those rules with plain error messages so that you can reasonably expect your upgrader to work provided all is well with your database connection.

Broadly, the entities of the schema are on one of two plans, "create", and "recreate". These notions are all discussed in more detail in Chapter 10 of the guide.

The Create Plan

This plan applies strictly to tables, and is used for tables that have precious data that cannot reasonably be restored from say the cloud or some on-device backup. Typically the primary data is on this plan.

On this plan you are limited to these operations:

new tables can be created (including the so called baseline tables, those having no annotation at all)
columns can be added to the end of a table such that an ALTER TABLE ADD COLUMN statement could add them
columns can be deleted, making them illegal to use in queries but otherwise having no physical consequence
- in CQL "select from foo" will not include deleted columns hence "" is fully expanded
tables can be deleted, leaving a tombstone in the schema
- the tombstone provides the clue to the upgrader that the table should be dropped if it is found

The primary directives for this plan use @create annotations, hence the name.

The Recreate Plan

Triggers, Indicies, and Views are all on this plan and tables can be too if they are annotated with @recreate instead of @create. The idea with this plan is that if the entity changes at all you simply drop the old version and create the new version. This means any change is possible but it also means the upgrade is always destructive:

if the upgrader is going to do anything at all it drops all views and all triggers at the start and recreates them at the end
- this not destructive and takes a lot of weird failure modes off the table
- note steps in the upgrade logic therefore cannot rely on the existence of views or triggers
if any index or table changes at all it is dropped and recreated
- this is done by computing a 64 bit CRC of the entities schema and comparing it to the stored CRC
- if the CRC is changed the recreate happens

Probably the first thing you noticed once you create the notion of recreate for tables is that you really want to do the recreation in groups. There are constellations of schema that have related information and if one of them changes they all need to be updated. This lets you have complex foreign key relationships within this "recreate group".

You'll also notice that a recreate group can have foreign keys within itself and it can make foreign keys to things that are on the create plan but you run into trouble if you try to make foreign keys to some other recreate group. That group might vanish on you, or rather, it might try to vanish and discover that it cannot because of constraint violations. Originally recreate groups could not refer to other groups but recently this was generalized to track a directed acyclic graph of groups. This means that a core group recreating forces the recreation of any groups that refer to it. On this plan its common to end up with a snowflake type schema where the outer parts of the snowflake update often and the inner parts hardly at all.

Overall CRC

In addition to the CRCs for the recreate groups, and indices there was a one CRC for overall schema. The upgrader checks this before anything else. If the overall schema CRC matches the current schema then nothing needs to be done (the upgrader has already done its job). If it doesn't match then some steps have to be applied.

Immutable Schema Versions

Other than the cross-group dependencies things began in the form above. The recreate plan was CRC driven and the create plan was version driven. The original design simply generated the appropriate corrections at each schema version and tracked the current version. If the overall CRC had changed, whichever steps you needed were executed.

This turned out to be a disaster and it was changed within days.

The idea seems fine enough, but the first thing you run into is that two people might make a change to the schema creating say version 5. The problem is if one person adds table X and the other adds table Y they will each run their own and mark themselves as schema 5. When they merge their changes with some other developer, the version 5 upgrade will have already run and they will get nothing despite the fact that v5 includes more for both of them. This is crazytown for developers.

So, rather than simply tracking the current schema version, each schema version got its own mini-CRC. The upgrader would run the steps of each version if the CRC was absent or didn't match. With steps like CREATE TABLE IF NOT EXISTS and so forth a merge would result in you getting the other half of the changes for your version and work could accumulate at say schema v5 with no problems. Actual customers would never see this because they only saw completed schema versions.

This worked a lot better and lasted about a year.

The problem is that the system is based on these "mostly immutable" schema versions. You never restate the past you always give instructions on how to move forward. With the versions being nearly immutable, and the upgrade steps being idempotent, things seemed good. But it turns out neither of those two assumptions was really exactly true.

Mutating Schema Versions

The reality of the schema we created for our platform was that there was one large uber schema that had all the possible schema you might need for a variety of features and any give product could opt in to the features it wanted, thereby getting the necessary schema. The schema system had a good way to partition the schema using regions. The upgrader could work on a set of regions and provide the union of schema in those regions, omitting the rest.

Super. Here's where things get exciting. A schema consumer could reasonably decide at some time in the future that it wants new features and so it opts into additonal regions. That's fair enough, but the net of this is that of course new tables appear. Some of these are likely to be in the baseline schema (v0) and some might have appear later (e.g. v5, v10, v21). This is all fine, the CRCs for those versions change and the schema upgrader runs again. Those versions execute and add the correct schema. Perfect.

Actually no.

Zombie Tables

About two years into the history of CQL we started noticing that some attempts to delete tables were failing. The DROP commands claimed that there was a constraint problem -- but these should have been leaf tables. What constraint could possibly be the issue? This was the first time a major design flaw with this system was revealed. Previous bugs had been few and had all been silly logic errors or off by one checks in version numbers, that kind of thing, easily fixed. This was a puzzler. But the answer was fortunately available in the set of annotations.

Basically, imagine a table "zombie" had been created say in the baseline schema, and then later deleted; suppose it was deleted in version 20. All is well, the upgrade steps for version 20 include a table drop. However, now a team subscribes to more schema, causing the v0 schema to include a few more tables. Here's the problem, when the steps for v0 run again they notice that "zombie" is missing and helpfully create it, thinking this is the right thing to do. But this is a disaster... The "zombie" table is supposed to be deleted in v20 but that CRC is unchanged! So now a table exists that has no business existing. If "zombie" has an FK reference to some other table which we wnat to delete, then all attempts to drop that table will fail because "zombie" is there gumming up the works. Even if it's empty... which it will be in this case.

This problem was fixed by having all tables that need deleting be unconditionally deleted at the end of the upgrade and not in the steps for the version in which the delete happened. This meant that the next upgrade purged all the zombies and enabled the correct table drops to start running with no errors. The consequence of this was a 90% reduction in schema upgrade failures!

Unsubscription

Another reason for the "immutable" version history to (cough) mutate was a desire to opt out of tables. As described in this section we created an affordance to allow people to unsubscribe from some of the tables they had previously selected. This provided finer-grain control of the schema subscription and also made it possible to undo previous mistakes of over-subscription. However, it was clear from the beginning that you might want to undo an unsubscription at some time in the future. In keeping with schema directives that create a clear history the @unsub and @resub statements were added to the language with lots of rules for correctness. The upgrader did the following things:

upon finding an unsubscription at version X that version includes DDL to drop the unsubscribed table
changes to that table in an future versions were omitted
upon finding a resubscription at version Y that version included DDL to create the table as it exists at version Y
later changes to that table are once again emitted as usual

This was very nearly right except it had the same problem as the delete case above. A table created in say the baseline might come back as a zombie even though it was unsubscribed. However, now wise to this issue a small fix takes care of the problem.

always drop tables in the unsubscribed state at the end just like delete tables
no code is needed to do an unsubscribe at version x (the drop at the end will do the job)
a resubscribe at version X first drops the table and then recreates as it exists at version X

This gives us a consistent upgrade path again and importantly avoids the problem of a resubscription finding a zombie that prevents it from doing its job.

Performance Optimization 1

On July 1, 2022 we made a modest change that reduced the number of SQLite statements required to do a full upgrade. The opportuntity came from the many column existence checks we made before running ALTER TABLE ADD COLUMN. Rather than run a statement that looked like this (SELECT EXISTS(SELECT * FROM sqlite_master WHERE tbl_name = table_name AND sql GLOB column_declaration_pattern)) for each column we first selected all of the table schema out of the sqlite_master table and put it into a hash table keyed by the table name. Reading even a few hundred table names was much faster than running a single statement for each column that needed to be checked -- especially when recreating the schema from scratch. In the most relevant test case this was a 7% improvement.

Importantly, it motivated us to add hash tables into cqlrt_common.c and generalize the mechanism for object management so that the cqlrt allows creation of new objects without having to add special support for each one.

This new hash table meant that we could do a hash lookup and substring match instead of a sqlite query for each column.

Performance Optimization 2

On Oct 11, 2022 we stopped using CRCs for the version checks on the create plan entirely. This was in fact an optimization but it was motivated by a real, but rare, problem.

What was happening was something like maybe 1 in 10^5 databases was missing columns. The sequence of events that caused this was very hard to diagnose but the situation was very clear. The database was at say schema version 100. The columns had been added at say version 50. The CRCs indicated that the v50 upgrade had already run so it didn't run again. The columns would now never be added.

We had yet to come up with a set of steps that would adequately describe how this happened. I have to guess some combination of a resubscription ran because of one of those "the schema is not really immutable" changes and then "medium" version of the table say v25 was resubscribed but the columns added in say v50 never got readded because v50 thought it had already run.

This was getting to be a nightmare but there was a simple solution.

We already had created this dictionary that had all the tables and their schema from sqlite master, we were already using it to determine if we needed to add a particular column. The only reason we had version CRCs at all was to allow us to skip steps, but since we could already skip column adds super fast all we needed was to be able to skip table adds -- there is nothing else. Well the same hash table can obviously easily tell us if a table exists. Non-existent tables have no schema and hence are absent from the hash table which is loaded directly from sqlite_master.

So the new algorithm, goes something like this:

use the version numbers only for ordering
before adding a table, check if it exists in the table, this is faster htran running CREATE TABLE IF NOT EXISTS
check the columns as before
attempt each of these every time the overall schema changes, and trust that the fast checks are fast enough

On this plan we change the way @unsub and @resub are handled to something much simpler:

@unsub acts like an out of band @delete on the table or view to which it is applied
- the drop happens at the end like before
@resub resets that state so the table is considered not deleted if the last operation was @resub

To this we add one new rule:

the schema upgrader removes any reference to deleted tables entirely
- they are removed from baseline
- they are not included in any upgrade rules
- they are only dropped at the end if they still exist

This vastly simplifies unsub/resub and delete. An unsubscribed table will always get cleaned up at the end, just like deleted tables. No strange interim states happen in resub. If a table is resubcribed it just reappears in the schema and the various operations run as usual.

The only compromise to this is that we still have a single CRC for the overall baseline schema. However even that could be removed at the expense of more hash table lookups. There is a binary size win for fewer checks and since baseline by definition depends on nothing it seems like safe position to take.

This approach was about 13-15% faster in fact, the time saved examining and writing back schema CRCs more than paid for the extra hash table checks (which were ~100x faster than the db operations). And the hash table already existed! The reduction of the CRC checks and removal of vestigial upgrade logic for deleted tables also resulted in a 2.2% reduction of upgrader size for our most important case.

The most recent change and further simplications in unsub/resub logic

With all of this in place it's clear that the various rules for unsubscription and resubscription and the sort of historical playback that was used to try to create these immutable stages is moot. The only thing that matters is if we end in the unsubscribed state. Removing the unsubscribe upgrade steps from the upgrader entirely just simplifies everything. So no @resub is needed at all nor are @unsub version numbers. Presently set to land is set of changes that remove resubcription entirely, to resubscribe you simply remove the @unsub directive for your table/view.

This lets us eliminate a bunch of validations and test cases to get a simpler, clearer, and more easily verifiable upgrader. There's just much less to go wrong.

Even crazy cases like "an unsubscription happens in v5, the resubscription happens in v7, a user runs the upgrade and they might have a database that is v4, v5, v6, or v7 (having already been upgraded)". All of these had different potential flows before. Now they are all the same. All the cases will roll the table forward to v7 from whatever version they might be on with the usual rules and states particular to unsubscription or resubscription. The table is present or it isn't. It is missing columns or it isn't. Same as always.

A Versionless Future

More thinking is needed here but it's clear that now that we've arrived at this simpler place ALL the version numbers are moot. The only thing we really have to do with version numbers is run ad hoc migrations at the appropriate time, and only once. The rules for migrators would have to change such that they are responsible for finding the state of the schema, and maybe some context could be provided for this. But ad hoc data migrators are very uncommon and regular annotations are much more so.

Conclusion

The fundamental assumption about how schema changes would happen was wrong. Even so, it was close enough that upgrades were over 99.99% successful when the other parts of the system are working ok. This is probably about the best we can hope for given the state of affairs with flash drives on Android devices. The current system is actually pretty close code-wise to what we had planned -- just look at the Oct 11, 2022 diff to see what I mean. It's not that big of a difference in the end. The new system has been deployed for nearly a month now and it is immune basically all of the failure modes of the old. It will take some time before we know what its true reliability is given the low failure rate of both. But we do know the new system is significantly faster. Optimizations 1 and 2 together are over 20% for full installations.

I should note that someone who was obviously smarter than me told me that we would land on a solution like this and I didn't believe them. They were of course correct. You know who you are. Sorry I doubted you.

Foreword​

Introduction and Context​

The Create Plan​

The Recreate Plan​

Overall CRC​

Immutable Schema Versions​

Mutating Schema Versions​

Zombie Tables​

Unsubscription​

Performance Optimization 1​

Performance Optimization 2​

The most recent change and further simplications in unsub/resub logic​

A Versionless Future​

Conclusion​