Detecting Database Drift during Flyway Database Development
How to detect database drift prior to running a database migration, so that you can be certain that a database hasn't been subject to any 'uncontrolled' changes that could affect the migration or result in untested changes being deployed to production.
Flyway will migrate a database between versions, tracking its version number in a schema history table, but it has no way to prevent other ‘external’ changes being made, subsequently. One of the big advantages of maintaining script directories for all the current versions of your database is the ability to detect this sort of database drift, and therefore verify that the database you wish to migrate really is at the version it claims to be.
In a previous article I demonstrated how to generate both the object-level source code and a build script for every version. This article will go into detail about how to use these script directories, in version control, to run drift checks, to detect when changes have been retrospectively applied to a deployed version, during database development. These drift checks use another tool in Flyway Enterprise, namely SQL Compare, to compare the metadata of the database you wish to migrate, with the object-level source, or build script, for that version.
Maintaining more than one method of building a database might seem like overkill, but once a version is no longer current, and so becomes of only historic interest, then the canonical source code in source control is sufficient. Even so, it is still useful in some circumstances to have the object-level source, because it enables you to maintain a source code repository that tracks changes at object level, so allowing you to see the changes in a specific table or other database object over time.
How database drift happens and how to detect it
I’ve described ways to detect database drift in several previous articles: How to cope with Database Drift during Deployment using SQL Compare, Checking for Database Drift using Extended Events and SQL Monitor and Database Deployment: The Bits – Database Version Drift. For more information on the different methods of drift detection, please read these. Don’t think that I particularly love the topic of database drift; I don’t. There is no need to back away and avoid catching my gaze if you see me at a cocktail party. It is, however, important to check for drift during a deployment because it can cause upsets, even under the increased control that you have with a Flyway migration.
We can get various types of ‘uncontrolled’ changes to a database version. They can happen through carelessness, a breakdown in development procedures, through a fault in the controls that a database development must have, through malice, and through criminal intent. As a development manager, I’ve experienced all these types of drift at one time or another, mainly because I inserted checks for them, as described in the above references
When working in finance and retail banking, but also in any healthcare database or any other enterprise where data access must be restricted, it is important to be able to prove that a deployed version hasn’t subsequently ‘mutated’. In other words, that there have been no retrospective changes or insertions into the code that could break a deployment, and that there are no inserted ‘backdoors’ that would allow illicit data access or manipulation. The whole permission system must be locked down, and it must be impossible to alter a deployment process once it has been signed off by UAT, security and legal experts. The code in source control must represent the live database, and it must be possible to track all changes, who made them, when and why. It must be up to the quality that can be used as evidence.
Flyway tracks deployed versions but can’t detect drift
With Flyway, you get a record of each migration that is applied, and when, in a Flyway schema history table. However, it is unavoidably editable. This table cannot have any special lock on it that is effective across all supported databases. To illustrate my point, here is a schema history table I’ve adulterated manually:
More importantly, it is possible at any point to apply a patch to the database without using Flyway. Because Flyway migrates a database rather than builds it, this patched version will remain permanently in the database undetected. Flyway will not object because it carries no information to be able to make the check. I’ve just proved it to myself, by running Flyway clean
, building to version 1.1.3, changing the names of the people who did it, migrating again to v1.1.6, adding a dodgy patch, manually, that opened a security backdoor, and then migrating to v1.1.8. It all worked. That patch would remain there permanently. So far, there is little risk, because the correct version of v1.1.8 can be created from the migration files in source control. If, however, you then consolidate many migration scripts into one, by scripting our v1.1.8 of the database as a single build script? Or you create a migration that hops several versions? At that point, the dodgy patch is there in all subsequent copies, and it has got into version control too. Flyway makes it very easy to ‘repair’ and rebuild databases, so there is little to prevent you ‘drifting’ the database.
You can get around this problem reasonably easily, for any specific database system, such as Oracle, but Flyway is a tool that must work across many database systems, and each database has a different way of comparing database schemas. What you need is a permanent and unchangeable representation of what the database should be, at every version, created fresh from source control. Then, for every migration we perform, we can first compare the copy of the database we’re working on with this ‘canonical source’, quickly and easily.
For SQL Server or Oracle, we can use SQL Compare, part of Flyway Enterprise, to compare an existing database, previously migrated with Flyway, to the script folders for that version. The great advantage is that you can then use the same tool subsequently to drill into the detail of the differences with the GUI and work out how they happened.
Creating the source folder for a version
My previous article, Creating Database Build Artifacts when Running Flyway Migrations, describes the ‘script block’ tasks to create an object-level source folder ($CreateScriptFoldersIfNecessary
) and a ($CreateBuildScriptIfNecessary
), once each, for each newly-migrated version.
The PowerShell script that drives the Flyway migration runs these tasks immediately after a migrate operation (as I’ll demonstrate shortly). First, the task checks whether the final database version has ever been scripted, by checking to see whether there is a Source folder (holding the object-level scripts) or Scripts folder (holding the build scripts) for that version. If these folders don’t exist, then it creates them, and persuades SQL Compare to script out the database at that version , creating subdirectories within the Source folder, one subdirectory for each type of object.
It also writes out an XML file giving the various database settings such as collation that are relevant to any subsequent comparison. All these necessary actions are contained within a PowerShell Script block. Each version is immutable, so there is no point in running these tasks more than once for any version. These script directories are subsequently saved in source control, so that any changes in the source can be detected easily.
Using the Source folder for drift detection
Once a source directory has been created for a particular version of a database, any subsequent migrations using this same project can be checked to see if the resulting database is the same or has changed. Any change must be investigated to work out why, because the migration process should, supposedly, be determinant. There is unlikely to be any good reason for a difference.
Drift detection can sometimes be complicated by the existence of legitimate differences between two deployed copies of the same version, such as when a database version has two or more ‘variants’ (a variant is a version of something with different features and is used when the product is altered in some way for a different target audience or usage).
So, when might you come across variants? Most often with databases, you have backward-compatible versions for clients with a large installed base of obsolescent versions of a database. You hit this too with banking applications that require databases that conform with different legislative frameworks. Sometimes, personal information must be handled in different, but contradictory, ways in different countries.
To illustrate the fact that there can be good reasons for differences, I’ve added some code in a migration script for v1.1.8 of the sample Pubs database, which will give a different metadata depending on whether the database supports string aggregation. Variants are an interesting topic but for the purposes of this article, we just need to note that for drift detection in such cases we have separated the Flyway folder of migration scripts from the project folder of reports and generated scripts, in a way that allows for these variants. However, I’ll deal with the topic of handling these variants in a separate article.
Running database drift and other checks during Flyway migrations
When working with Flyway, it is good to be able to do some fairly dramatic automation but mixed with interactive work. To ‘square the circle’, I’ve created a library of script blocks that each perform a discrete task for running Flyway with all the extra scripting as per the diagram. These are:
$FetchOrSaveDetailsOfParameterSet
: picks up the saved parameters from file if you just provide a name and a project, Passwords aren’t stored, and it is in the user area, so has NTFS user-access security$FetchAnyRequiredPasswords
: This will check to see if you have aUserID
to access the database. If so, it provides a password from a secure string in the user area.$FormatTheBasicFlywayParameters
: This provides in an array all the main parameters you need for running Flyway.$GetCurrentVersion
: checks the database and gets the current version number$CheckCodeInDatabase
: checks the code in the database for issues if this hasn’t been done yet$CheckCodeInMigrationFiles
: checks the code in the migration files for issues if this hasn’t been done yet$IsDatabaseIdenticalToSource
: if it can, checks to see if the database really is what you think$CreateBuildScriptIfNecessary
: writes out a build script if there isn’t one for this version$CreateScriptFoldersIfNecessary
: writes out a source folder with an object level script if absent$ExecuteTableSmellReport
: executes SQL that produces a report in JSON from the database that alerts you to tables that may have issues
These are all in a one, separate file, DatabaseBuildAndMigrateTasks.ps1, stored in the same directory as the PowerShell script that executes the tasks.
As I’ve discussed previously, I quickly became bored with filling in parameters when working with several combinations of projects and databases. Each database needs its own data. I’ve implemented a way to save the details for each database in a subdirectory on your user’s home directory. You can pick up the details by naming each file. The obvious name might be the name of the database, but it could be anything: after all, you might choose to have several servers, each with a database under the same name in which case you might want to use the server name.
1 2 3 |
$DatabaseDetails = @{ 'name' = 'MyDatabase'; 'Project' = $MyProject; } |
You can then fill in the $DatabaseDetails
array by executing
1 |
$FetchOrSaveDetailsOfParameterSet.invoke($DatabaseDetails) |
So here PowerShell script code to demonstrate the drift check. You need to do run the code once, providing all the database details in long form and giving the parameter set a name. For subsequent executions you can use the shorthand form shown above.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 |
$MyProject = 'pubs' #Must fill in the name of the project. This determines where #script artefacts are kept <# The tasks are in a separate script. It is placed in the same directory as this script. First, find out where we were executed from. Each environment has a different way of doing it. It all depends how you execute it. If you just past this in, you'll have to make this the working directory #> try { $executablepath = [System.IO.Path]::GetDirectoryName($myInvocation.MyCommand.Definition) } catch { $executablepath = '' } #didn't like that so remove it if ($executablepath -eq '') { $executablepath = "$(If ($psISE) # null if at the commandline { Split-Path -Path $psISE.CurrentFile.FullPath } Else { $global:PSScriptRoot })" } if ([string]::IsNullOrEmpty($ExecutablePath)) { $ExecutablePath = $pwd } .("$executablepath\DatabaseBuildAndMigrateTasks.ps1") <# # so we define our project-wide settings $DatabaseDetails = @{ 'name' ='TheNameToGiveThisDatabaseAndProject'; 'ProjectFolder' = 'MyPathToTheFlywayFolder\PubsFlywaySecondMigration'; 'ProjectDescription'='A sample project to demonstrate Flyway, using the old Pubs database'; 'pwd' = ''; #Always leave blank 'uid' = 'MyUserName'; #leave blank unless you use credentials 'Database' = 'MyDatabase'; # fill this in please 'server' = 'MyServer'; # We need to know the server! 'port' = $null; #Not normally needed with SQL Server. add if required #set to $null or leave it out if you want to let jdbc detect it 'Project' = $MyProject; # the name of the project-needed for saving files 'Version' = ''; # current version of database - # leave blank unless you know 'Checked' = $false; # has it been checked against a source directory? } #> $DatabaseDetails = @{ 'name' = 'MyDatabase'; 'Project' = $MyProject; } <# first grab all those boring details from disk for the project and database #> # we line out our tasks in an array @($FetchOrSaveDetailsOfParameterSet, #add escaped names, pick up details by name and project $FetchAnyRequiredPasswords, #get any passwords needed from secure storage $FormatTheBasicFlywayParameters, #so you can run flyway $CheckCodeInMigrationFiles # check that all migration files have had a report ) | foreach { #we execute these tasks in turn if ($DatabaseDetails.Problems.Count -eq 0) { $_.Invoke($DatabaseDetails) } } <# at this point we are ready to do all the standard tasks. For this demonstration, we'll clean our test database #> if ($DatabaseDetails.Problems.Count -eq 0) { Flyway clean $DatabaseDetails.FlyWayArgs } <# now we will upgrade one file at a time and test out all our tasks as we go. We should end up with a scripts directory with source and build scripts. The second time we run this, the drift check will kick in on every version because in each case we have a scripts directory. You can try stopping after every version and tampering with the database to see whether it picks up your changes #> # list out all the existing versions in order dir "$($DatabaseDetails.projectFolder)/scripts/V*.sql" | foreach{ [version]($_.Name -replace 'V(?<Version>[.\d]+).+', '${Version}') } | Sort-Object | foreach{ if ($DatabaseDetails.Problems.Count -eq 0) { $Invocations = @( $GetCurrentVersion, #checks the database and gets the current version number #$CheckCodeInDatabase, #checks the code in the database for issues if this hasn't been done yet $IsDatabaseIdenticalToSource, # if it can, checks to see if the database really is what you think ) $Invocations | foreach{ if ($DatabaseDetails.Problems.Count -eq 0) { Write-Verbose "Executing from line $($_.startposition.StartLine)"; $_.Invoke($DatabaseDetails) } } Flyway migrate "-target=$($_.ToString())" $DatabaseDetails.FlyWayArgs $Invocations = @( $GetCurrentVersion, #checks the database and gets the current version number $CreateBuildScriptIfNecessary, #writes out a build script if there isn't one for this version $CreateScriptFoldersIfNecessary #writes out a source folder with an object level script if absent ) $Invocations | foreach{ if ($DatabaseDetails.Problems.Count -eq 0) { Write-Verbose "Executing from line $($_.startposition.StartLine)"; $_.Invoke($DatabaseDetails) } } } } #list out every problem with where it happened if ($DatabaseDetails.Problems.Count -gt 0) #list out exert error and which task failed {$DatabaseDetails.Problems.GetEnumerator()|Foreach{"$($_.Key)---------";$_.Value}|foreach {$_} } # and list out every warning. if ($DatabaseDetails.Warnings.Count -gt 0) #list out our warnings {$DatabaseDetails.Warnings.GetEnumerator()|Foreach{"$($_.Key)---------";$_.Value}|foreach {$_} } |
Once you’re run the code once, the drift check ($IsDatabaseIdenticalToSource
) will start working because the source folders for each version will now exist. You can try migrating your pubs database to a version, altering the database manually (for example, just open SSMS and create a couple of objects) and then migrating to the next version to verify that the drift is detected.
Registering data sources Creating mappings Comparing Applying Command Line Items Checking for identical databases Summary Information ======================================================================================================================= DB1 = Source DB2 = MyServer.PubsOne Object type Name DB1 DB2 ----------------------------------------------------------------------------------------------------------------------- Table [dbo].[MyBaseTable] << Table [dbo].[MyOtherBaseTable] << ----------------------------------------------------------------------------------------------------------------------- Database Different to source
Conclusions
The larger a team, the more likely you are to see the need for a check on drift before a migration. Until I made a habit of checking for uncontrolled database changes, I never realised how often in happened until I made a habit of trying to detect it. Is it essential? Not if you are feeling lucky, but you’re more likely to get changes in the metadata, and in some cases, the migration approach will allow this drift to build up if you’re not on the lookout for it.
Of course, now that you have script directories, you can make comparisons between any two versions, using the GUI-version of SQL Compare. This will open up many more ways of drilling down into a problem. You can also eliminate a lot of small migration steps by generating a migration script using SQL Compare, and then adding in anything you need to preserve the data where SQL Compare hasn’t enough information to do it for you.