Instant Database Reset and Easier Bug Fixing using Clones

The article demonstrates how the use of lightweight, containerized clones removes the problem of 'mopping up' after a database migration fails part way through, particularly in database systems like MySQL that don't support DDL transaction rollback. A quick reset to the previous saved revision provides 'instant rollback'. It also demos how they make it much easier and less disruptive to do retrospective bug fixing on older migrations.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

This article illustrates the benefits of using the cloning capability (rgclone) in Redgate Test Data Manager to save each version of a containerized Flyway database as a data container revision, both for instant recovery from current database migrations errors and for bug fixing older problems.

If a failed migration or test leaves your database in an ‘unknown state’, you’ll face the task of resetting it to its initial state. If your only means to do this is by undoing the sequence of changes, or restoring a backup, it will be time-consuming and you risk losing work. By contrast, when working with clones a developer can, with a single command (rgclone reset), instantly reset their working copy of database without any risk. In fact, with clones you are encouraged to consider the database, however large, as a volatile resource. This makes it much easier to develop and test tricky migrations. When you hit a problem, reset to the previous version, fix the problems, re-run the tests, and do this repeatedly until everything works.

Similarly, clones also make it much easier to tackle lingering bugs or design flaws in older migrations. For example, by using rgclone graduate, a developer can quickly create a ‘sub-branch’ of development from any saved revision to investigate and resolve these issues, without disrupting the current development stream.

The result should be thoroughly tested, more reliable migrations, fewer bugs, and lower-risk online database deployments.

Mopping up after a failed migration

You are applying a migration to a database, during development, perhaps making some sort of change in a branch. You want your database to be left in a state where either all the statements have been applied, creating the new version, or none of them have been applied, leaving the database at its original version. However, if you’ve run a script that consists of several ALTER or CREATE SQL statements and you get an error that terminates the batch, then your database is likely to end up in a mess, unless you have a means of removing the effects of those statements that succeeded before the error.

With many relational database systems, it is easy to mop up, because we can simply wrap the whole migration script into a transaction and roll it back if something goes wrong. Flyway does this for you automatically, in cases where the RDBMS allows it. But if you use an RDBMS such as MySQL, MariaDB or SQLite that doesn’t allow DDL transaction rollback, then you must mop up the mess yourself.

flyway : ERROR: Migration of schema 'dbo' to version "1.1.10 - AddAddressesPhonesEtc" failed! Please restore backups and roll back database and code!
At line:1 char:33
+ ... nt $Credentials -Raw | flyway migrate -target="1.1.10" -configFiles=-
+                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (ERROR: Migratio...abase and code!:String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError

As the error message, ‘Please restore backups and roll back database and code’ suggests, you’ll need to retreat to the initial version, either by restoring a backup or by undoing the sequence of changes that left the database in an indeterminate state. The latter is likely to be time-consuming and disrupt your development work, especially if the residual objects have inter-relationships. You’ll probably also need to repair the Flyway Schema History table.

Using containerized clones for development can’t protect you from a failed deployment; you’ll still need a tested rollback or fail forward strategy to deal with them. However, they can remove all the stress of a failed Flyway database migration during development, even for databases that can’t rollback DDL within a transaction, by saving each version of the database as a data container revision and instantly resetting in the event of an error.

Debugging a failed migration script

To get past the ‘what-could-possibly-go-wrong?’ approach to executing scripts, we tend to delve piecemeal into the creation of any sort of change script, working in much the same way as when building a brick wall. We establish a firm base then add one brick at a time.

From the starting database version (our firm base), we execute every statement in an IDE on a copy of the database, one at a time. If a step fails, we know where and probably why, and the preceding ‘bricks’ are all still there. If the whole pile looks wrong or fails to function as we expected, then we can revert to that firm base by undoing the sequence of changes.

Once every step works as it should, execute the whole lot in one go from Flyway to make sure it runs. To do so, either spin up another copy of the database at the starting version, or revert all the changes, using the ‘undo’ code you’ve developed, and then have Flyway run the migration script. If it works, you run your tests, reverting to the starting version each time you need to fix any problems. Finally, you need a reference copy of the current development version to create the merge migration script, and use Flyway to run the migration ensure that works too

What if Flyway still gives errors?

What if, despite your brick-by-brick approach, you still encounter unexpected migration errors in Flyway? Often, you’ll be able to identify the source of the error and fix it. Occasionally, Flyway doesn’t manage to provide the full details of the bug, especially with an RDBMS like MySQL. If so, you ‘ll have to revert to an IDE that shows the full error message, such as by using MySQL’s SHOW WARNINGS or SHOW ERRORS. By using the IDE, you can insert SELECT statements at several different points in the script to inspect the state of the data and variables at each step and verify your assumptions about changes in the data. If you are debugging DDL SQL statements, you can wrap your script in a DML transaction to test changes without permanently affecting the data but be aware of how locks and transaction states might affect your queries.

By now, you’ll appreciate that you’ll need several copies of the database to do this sort of work quickly and effectively. You have your working copy of the database (your branch copy) where you test your code at the statement level, using an IDE, a reference copy at the initial branch version to test the finished migration until you’re satisfied with it and can apply it to the reference copy of the latest development version:

Database required to support branch-based database development

Instant recovery from migration failures using containerized clones

If you’re developing on a clone, delivered as a data container, then you get a much simpler and faster version of the same process described above, because you can save your ‘firm base’ (the starting version of the branch) as a revision and then instantly revert to it. Your working copy of the database is now just a volatile resource, that you can spin up, reset and destroy at will, without fear of losing any valuable work. In fact, when working with clones, it’s a good idea to spin up a fresh clone often, because the benefits of the economy of resources lessen with the volume of alterations.

Demo setup: Auto-saving revision points

For details of installing rgclone client access to the Cloning server, see Getting Started with Cloning in Redgate Test Data Manager. I’ve also explained the basics of connecting to data containers. After that, the demo here uses a sample Flyway project for MySQL and relies on bits of setup and code that I’ve described elsewhere, as depicted below:

auto-saving flyway database as data container revisions

For steps 1-3 in the above diagram, you can use the code PowerShell script provided in Data Container Revisions, Resets and Graduations. It creates a data image and from that creates a data container, saving the connection details to a file in the user area. In this example, the image contains an empty copy of the database. However, it is more likely to include a recent version of the Flyway database, containing the test data, because you can use the graduate command to make an image from any version for which you have a revision point in your current container.

In step 2 it then connects to the containerized clone from Flyway so we can migrate it (step 3) to the required version (e.g., the starting version from our branch development work). Flyway keeps a complete record of the database on the database itself, not the server or filesystem, so it is very easy to use Flyway with containerized databases. If it is an empty database, it will create the schemas and the Flyway schema history table on the first migration.

Step 4 uses a Apply-RGCloneRevisionToContainer function, available from my Flyway Teamwork GitHub project that can automate the process of saving the new Flyway version as a data container revision. To save a new revision point for our data container, after each successful migration, we simply supply the name of the Clone Project (it assumes a Database-Project-RDBMS-Branch naming convention for the project, but you can change this):

The function applies the revision, which in raw terms is just doing this …

…and records in a JSON file the database version that corresponds with the new revision point, so that we can relate revision points with Flyway Versions and, if it ever becomes necessary, load the correct version. If you look in your Flyway directory, you’ll see a <MyContainerName>Record.json file that records the corresponding database version for each saved revision point. We only need this file if we want to load a previous version into the container, rather than just reset to the revision from which we started (more on this later).

To autosave a revision on every successful migration, we could use the function in a Flyway afterMigrate callback. It takes a single parameter, the name of the project, which can be referenced from an environment variable. There is a snag though. While using rgclone with Flyway is generally harmonious, I’ve found that using any function or script that executes…

…within a callback causes a non-terminating error in Flyway:

flyway : ERROR: Unable to restore autocommit to original value for connection

The callback has been completed successfully and I don’t quite know why Flyway would want to restore autocommit at this point because not much can happen after an afterMigrate callback. I still use the callback because I’m capable of forgetting to do the revision point manually. If you are doing the migration within a script, the script that makes the revision point can be executed after the Flyway migration without the splashes of red on the screen.

Instant rollback using rgclone reset

Now, when Flyway calmly announces the need to ‘restore backups and roll back database and code‘, our “roll back” is a simple rgclone reset command to reset the container back to the starting revision, which will return our clone to the result of the last successful migration.

We’ve used a reference to an environment variable that holds the name of our container because it is less typing and we’re just using one clone at a time. It is just done to keep things simple so that it is accessible to any other processes that need to work on the same container, database, and project.

Instant recovery from database migration errors

This reset is the computer equivalent of clicking your heels together three times and saying, ‘There’s no place like home‘, where ‘home’ in our case is the previous successful database version. If you were to run a flyway info command after resetting the data container, you would see that you’re back where you were before you ran that bad code. This suddenly makes light of a tricky migration. You can load that migration step, run your tests, revert to the previous version, fix the problems, and run the process repeatedly until you are satisfied.

This is fine if this is a new migration, or you are in an isolated branch (isolated in the sense that there is only one development process going on in the branch). If you start changing previous migrations using migration files that are accessible to other development activities, then Flyway will detect an attempt to rewrite history and object.

The ability to reset to the current version speeds up a development magically, whatever methodology you use. It means you can proceed cautiously using your favorite IDE, and only apply the migration when you are confident that the migration works without error and does what it is supposed to do, efficiently.

Fixing a past problem by going back in time

To revert to a revision other than the one you started from, you don’t use the rgclone reset command but rather the rgclone load command. Why would you want to use any of those previous versions of the database? The most usual reason is to chase the nature of one of the more subtle bugs. It helps to be able to run a test on a previous version to see if it shows the same behavior. It narrows down the possibilities, especially if we can see what changes were innocently made just before the bug manifested itself.

For example, while testing the code in this article I was adding versions to my MySQL edition of Pubs database. Everything seemed to work until I got to the import of the full test data in V1.1.12. The original code for PubsMySQL had been developed and tested on MariaDB. It turned out that there was a bug in migration V1.1.10, where the auto-increment of MySQL skipped to the point that there was a broken series of records, which subsequently caused the data import to fail.

I had to find a fix. The rgclone reset command would only get me back to the previous version (V1.1.11). I could use the load command to return to V1.1.8 so I could then re-apply version V1.1.10 to see how the MySQL version got the auto-increment value wrong. You can see from the JSON file shown previously that rev.6 corresponds to our Flyway database version 1.1.8:

However, what I really wanted was, in effect, to create a ‘sub-branch’ so I could work out what went wrong while still having the current line of development for comparison work. So, instead, I created (via ‘graduate’) a new image from V1.1.8.

Graduating a data container

Graduating a data container means that you create a data image from it. In effect, it allows you to use its current state as the baseline for future work

Successfully graduated data container 'Pubs-TestRevisioning-mysql-develop-container' 
at revision 'rev.6' to a new data-image
New data image 'Pubs-TestRevisioning-mysql-reference-image' (67) available!

Creating a new reference container for the sub-branch

So, now we create a new ‘reference’ data container from our image, also creating the Flyway config file that captures the connection and credentials for the new container.

VERBOSE: Data Container Pubs-TestRevisioning-mysql-reference-Container 
created from Pubs-TestRevisioning-mysql-reference-Image.
VERBOSE: written out flyway connection and authentication details to C:\Users\Phil\Pubs_TestRevisioning_mysql_reference.conf

So far so good. We now make sure the credentials work and we have indeed got version 1.1.8 as we hoped

Schema version: 1.1.8
 
+-----------+---------+------------------------------+--------+---------+----------+
| Category  | Version | Description                  | Type   | State   | Undoable |
+-----------+---------+------------------------------+--------+---------+----------+
|           |         | << Flyway Schema Creation >> | SCHEMA | Success |          |
| Versioned | 1.1.1   | Initial Build                | SQL    | Success | Yes      |
| Versioned | 1.1.2   | Pubs Original Data           | SQL    | Success | Yes      |
| Versioned | 1.1.3   | UseNVarcharetc               | SQL    | Success | Yes      |
| Versioned | 1.1.4   | RenameConstraintsAdd tables  | SQL    | Success | Yes      |
| Versioned | 1.1.5   | Add New Data                 | SQL    | Success | No       |
| Versioned | 1.1.6   | Add Tags                     | SQL    | Success | No       |
| Versioned | 1.1.7   | Add Indexes                  | SQL    | Success | No       |
| Versioned | 1.1.8   | AddEditions                  | SQL    | Success | No       |
| Versioned | 1.1.10  | AddAddressesPhonesEtc        | SQL    | Pending | No       |
| Versioned | 1.1.12  | AddTestData                  | SQL    | Pending | No       |
+-----------+---------+------------------------------+--------+---------+----------+

Yup. It all worked fine.

Fixing the bug

We are at version 1.1.8 (AddEditions) and so we can now work on the bug in this that causes the problems with 1.1.10 (AddAddressesPhonesEtc), trying out solutions quickly until we find something that works.

Are we, in our pretended role of designing a publisher’s database, going to retrospectively re-write 1.1.8? If this development branch was being used only by a single group of developers, then it wouldn’t matter. It would be like a single-user database in that you would eventually want to create a single ‘merge’ migration in the parent branch once everything was finished and tested, so that each migration wouldn’t exist outside the branch. Otherwise, you would add a ‘patch migration’ to the migration that did nothing other than correct the mistake.

Conclusion

We’ve illustrated how to use the various features of the rgclone CLI, in Redgate Test Data Manager, in development work. This allows the very useful facility to remove a failed migration, and to discard the incremental changes that a developer will use to create and test a change to a database. With Flyway, this is a revelation because it provides for the need to use an IDE for much of the work, removing the need for undo migrations, but using Flyway to manage the migrations, and provide the versioning. It does more, of course, and I’ve shown how one can create a working database at any previous version in just a few seconds.

Tools in this post

Redgate Test Data Manager

Reliable and secure test data provisioning

Find out more