Product articles Redgate Flyway Tracking development changes
Defusing Flyway Validation Errors using…

Defusing Flyway Validation Errors using Smarter Checksum Comparisons

This article provides a scripted SQL tokenizer script that quickly verifies whether a Flyway validation error is a real cause for concern, due to retrospective metadata changes, or just the result of a developer valiantly adding formatting and documentation to improve the code. If the changes are purely cosmetic, we can safely run Flyway repair to resume normal migrations.

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

Flyway validation checks: necessary but indiscriminate

Flyway runs validation checks to make sure that the files that have been successfully applied to the database haven’t subsequently been changed. It must do this to maintain version consistency: if two copies of the same database have the same schema version number then the schemas should match. Put another way, the set of migration file locations that you specify must always produce the same versions of the database.

If a versioned migration file that has been used on a database is later changed, Flyway will detect it and raise a validation error the next time you try to run a migration on that database.

It’s vital that Flyway can run the validation checks as quickly as possible, so it just calculates a generic file checksum. This is a necessary compromise, but it means the checks are indiscriminate. Flyway knows the file has changed but cannot determine the nature of the change. Even if you’ve just added helpful comments or formatted the SQL nicely, the file checksum will have changed, and you’ll see a validation error. It also gives false positives because it doesn’t just check changes in SQL, an extra space between keywords, for example, will cause Flyway to refuse to do a migration.

At this point, you’ll need to carefully review the changes. This can be a time-consuming process. If you’re confident that the changes were purely cosmetic and don’t compromise the ‘version integrity’ of the migration chain, you can run a Flyway repair to ‘re-sync’ the checksums, and continue migrations.

This article is about automating, as far as possible, the process of “verifying the validation error” so that if the changes are purely decorative, normal development work can continue as quickly as possible.

Defusing Flyway validation errors

Ideally, all code would be fully documented and beautifully formatted before it’s committed to the shared development branch in version control. But just in case it isn’t, it is worth making it as easy as possible to tidy up code retrospectively, while minimizing problems with Flyway migrations. Otherwise, it could easily inhibit developers from formatting code for better readability, or from adding better documentation with block comments and end-of-line comments. Nothing should discourage you or your co-workers from making these sorts of changes, however long it is delayed. After all, the source is the only easy way of understanding the design of the database.

The idea behind this article is to provide developers with a quick and automated way for developers to verify whether a Flyway validation error is a real problem, or a ‘false flag’ triggered by cosmetic rather than SQL changes, so they can respond accordingly.

I’ll show how to do this check using a PowerShell-based SQL tokenizer that performs a more discriminating, ‘metadata only’ checksum comparison. It parses the code into tokens, filters out those tokens that represent comments or whitespace changes, and then re-calculates the checksums for the original and modified migration files. If the ‘tokenized checksums’ now match, it’s likely the changes are purely cosmetic.

If a migration is already applied to production, then any subsequent validation errors on that migration should probably be investigated with a full drift check, even if the “discriminate checksum” indicates only cosmetic changes.

If a migration has been committed to the shared development branch in version control but has not yet been deployed, the approach can be different. If the ‘tokenized checksums’ still don’t match, then it means there are SQL changes, and the live database no longer represents the database built from the source. The database will need to be cleaned and then rebuilt from the source, followed by a fresh test cycle.

If the checksums match then there are no changes to the SQL, even accidental ones, and developers can quickly and safely run the Flyway repair command to resume migrations on the affected databases.

defusing flyway validation errors using smarter checksums

The tokenized checksum comparison

To be certain that a block of code has the same functionality as the one it replaces, we’d need to perform a full semantic code comparison, comparing the syntax trees of the language with an AST (Abstract Syntax Tree) comparison tool and consider the meaning of symbols.

We don’t need anything so sophisticated because we have much simpler aims. We just want to allow SQL code to be reformatted or documented without it being flagged as functionally different. I wouldn’t even want to allow SQL that is different but has, hopefully, the same effect. Even if it did, you’d need to use the discipline of a migration to make that sort of change, along with its associated tests.

We don’t need a parser to achieve this. We just need a tokenizer to split the SQL code into its individual tokens, and then eliminate those tokens that represent allowable cosmetic changes, such as comments. The tokenizer is already provided in the Flyway Teamwork framework as one of the resources called Tokenize-SQLString.ps1. It’s cross-RDBMS (based on SQL-92 standard) and I’ve used it previously to present color-coded migration files in a browser, and to help you check for external references in your deployment code.

The Get-SQLCodeHash function

The Tokenize-SQLString cmdlet is just a script and is rather slow, because of the complications of detecting three-part or four-part names. It runs speedily enough for normal DDL scripting, but it isn’t quick enough for SQL-based DML (data manipulation) involving megabytes of data inserts. For our purpose here, we use a slimmed-down version of it, removing everything that isn’t essential for calculating the checksum.

With the slimmed-down version, we use a rather elaborate Regex. Each token, such as ‘Select‘, ‘From‘ or ‘,‘, is matched, but with sufficient details to allow us to check whether the code is the same or different. If required, it can also tell you where in the code the change is:

Name   : Punctuation
Index  : 446
Length : 1
Value  : ;
Type   : 
line   : 9
column : 62

We use it to read the code and create a set of tokens and the Regex then removes any tokens that end-of-line comments, documentation, or formatting changes. This allows us to quickly build a string consisting just of the SQL code, and calculate its checksum:

This process ensures that the SHA256 checksum calculation is impervious to changes in case, comments, formatting or documentation within the original file. We can turn this into a function, Get-SQLCodeHash:

Comparing SQL expressions

Let’s try it out. The following two SQL Expressions in the following block of code are the same. The only differences are the formatting, the end-of-line comments, and block documentation:

The ‘tokenized checksums’ are identical:

9C32C29389678889788B486EC6F516F15076C962856D13394A507087CFCECA40
9C32C29389678889788B486EC6F516F15076C962856D13394A507087CFCECA40

Why not change an alias in one version but not the other, take out a column of one version maybe, and you’ll see that the checksum won’t be the same.

Checking for the nature of changes to Flyway migration scripts

So, we can now use this to check a file that has failed the validation process in the Flyway migrate command, because Flyway has detected that the file has changed.

You should never blindly run Flyway Repair. First, you need to verify that there were no metadata changes in the file, and if there were, you need to investigate what changed and why. As noted earlier, Flyway knows the file has changed but not what has changed, which is why the command is ‘repair’ rather than ‘fix’. All the repair command does is modify entries in the Flyway Schema History (FSH) table; in this case, it would refresh the checksum it has recorded for the file. You should only allow this if you’re certain the changes are just comments or formatting.

To make the check, we just take the previous version of the migration file from version control and compare its checksum output with that of the current file. To do this, we use Get-SQLCodeHash to calculate both the old and new checksums but this time getting the SQL from a file:

You can, of course, get these hashes from a file directory all at once. This script has a guard to prevent it from getting scripts that are too long, but you can adjust it to taste. Script-based tokenization is always going to be slow, but then, you are unlikely to want to add documentation to a huge wadge of DML data imports!

Conclusion

Flyway’s method of checking changes in migration files is fine, just as long as it is seen as a way of notifying the user that something needs investigating. Source control is fine for finding text changes but without a semantic extension or plugin, it cannot distinguish between code changes that change functionality and mere decoration.

If you are faced with a failed migration due to a change in the checksum of one of the successfully-applied migration files, then you are likely to be interested in a simple way of finding out whether someone has just done some useful work making the history of changes, annotations, and documentation more readable, or has just made the horrible mistake of trying to make retrospective changes. A scripted tokenizer can find out for you. It might be slow, but it is quicker than trying to scan through source control to spot the nature of the changes.

Tools in this post

Redgate Flyway

DevOps for the Database

Find out more