Creating Idempotent DDL Scripts for Database Migrations
How to write idempotent DDL scripts that Flyway can run several times on a database, such as after a hotfix has already been applied directly to the production database, without causing errors.
Database code that creates or alters tables and routines ought to be ‘idempotent’, so that it won’t cause problems if it is applied more than once to the same database. This is sometimes trickier than it might sound. There is more involved than simply checking the script hasn’t already been previously applied to the database or writing IF
object_id('Mytable')
IS
NOT
NULL
checks to see if a table already exists before trying to create it.
For example, if you want to alter a column to make it NOT
NULL
, your script should ensure that the ALTER
statement only runs if the table exists, the column exists and is nullable, and there are no existing null values. I’ll demonstrate how to achieve this, using either the information schema views or system functions to check the required conditions, and then an IF
NOT
EXISTS
statement to ensure the SQL script only tries to make the change if all the conditions are met.
In a similar vein, I’ll show other examples of idempotent SQL scripts, such as one that checks to see if an index already exists, before creating it, as well scripts to safely create a table, or alter a table to add a Primary Key. If you are a SQL Prompt user I even show how to turn these scripts into reusable code snippets. Finally, I provide an example of an idempotent SQL script that adds or updates extended properties for every table, which you can run as a callback script when running Flyway migrate.
The uncontrolled hotfix
There will inevitably come a time in the deployment of a Flyway-managed database, where things get in a muddle. Let’s say you deploy some database updates to production, the migration itself works fine but sometime after the release, problems arise. It could be a missing index, some incorrect logic in a procedure, or maybe a datatype that is wrong. Time is money and whatever you do to fix this, it must be done quickly. You are a professional, so you diagnose the problem and apply a quick hotfix. Everything starts working properly.
However, you have a nagging feeling that you’re not quite finished yet. You’ve just done an uncontrolled change, so you now need to create a new version, ‘backfilling’ these changes into version control. You add the ‘patch’ migration script to Flyway’s migrations folder, with a new ‘patch’ version. From now on all databases that are created will have the fixes in the ‘patch’ automatically applied. With a sense of relief, you sit back with a well-earned cup of coffee, but a little voice in your head is telling you that something still isn’t quite right.
Although your production server is now at the ‘patch’ version, Flyway still thinks it’s at the previous version. As far as Flyway is concerned, the patch hasn’t been applied so next time you do a release into production, it will try to reapply the new version that you’ve just created. The code could easily produce an error, for example if it tries to CREATE
an object that already exists, or tries to ALTER
a table, to add a new column with the same name as an existing one.
The workaround: Flyway cherry picking
What do you do to avoid a future failed deployment? You need persuade Flyway that the migration file with the patch has been applied so that it doesn’t reapply the migration when next you update production.
If you’ve reached this point, the way around it is to use a feature of Flyway Teams called cherry picking. Flyway needs to apply each migration in order, but if you use cherryPick, with the parameter skipExecutingMigrations=true
, then Flyway Teams will update the flyway_schema_history
table on the database without actually executing the script. This is a good solution that works no matter what database system you are using.
Nonetheless, it is a better practice, where the database system allows it, to avoid subverting Flyway’s simple migration system, by instead using idempotent scripts, which can be rerun safely, and don’t require these extra steps.
The solution: idempotent SQL migration scripts
A hotfix is fine, if it is followed immediately by writing and adding an idempotent ‘patch’ migration script that only makes changes if they haven’t yet been made. Idempotency is the property of certain operations in mathematics and computer science; it means they can be applied several times without changing the result beyond the first time they are run. It is really a fancy way of saying that your code must check first whether the database object has already had the change before you make the change. An idempotent script is what you need to turn a hotfix safely into a Flyway migration.
It is relatively simple to do this in SQL Server, Oracle and PostgreSQL. In its simplest form, the script merely checks whether it has already been applied and if so, makes no changes. However, it is also worth adding ‘guard clauses’ that check whether the object exists before you create or alter it, so that you prevent error from objects being altered if they don’t exist or being created when they already exist. I like to test idempotent scripts that only alter existing objects against an empty database until they execute, doing nothing, but without errors.
It pays to be cautious and take as little as possible for granted. The errors that come from a target database having ‘drifted’ from its stated version are often difficult to predict. The more complicated, truly idempotent scripts will review every change and make the change only if it is necessary and possible.
Avoiding dropping and recreating routines
Obviously, tables require the most care in migration-based code, if you wish to preserve the data. However, you can also go seriously wrong with techniques that drop a routine and then create it because you lose all the permissions and extended properties associated with it. Relational databases are all moving towards a syntax for creating or altering programmable objects (a.k.a. ‘routines’) that makes a lot of ‘guard clauses’ unnecessary for them. In SQL Server, from 2016 onwards, we have the ANSI SQL clauses DROP
IF
EXISTS
and CREATE
OR
ALTER
, which are already standard on other database systems.
DROP
IF
EXISTS
doesn’t help you in code that seeks to prevent you altering a routine that doesn’t yet exist. However, you can now use CREATE
OR
ALTER
for the purpose in SQL Server DDL code for most ‘routines’ such as views, stored procedures and functions (including natively-compiled varieties of both) and triggers, but not the deprecated RULE
and DEFAULT
. You can’t use it for ‘storage objects’ (tables, indexes and indexed views), schemas, CLR objects or assemblies.
Where you can’t use CREATE
OR
ALTER
, you can use the following syntax to preserve existing permissions and extended properties. The ALTER
syntax is always used, but it is preceded by a batch that checks to see if the object exists. If the object doesn’t exist, it is created with a dummy body; I chose something non-toxic, SET
NOCOUNT
ON
, but it has no significance, as the following ALTER immediately changes it for the real definition.
1 2 3 4 |
IF OBJECT_ID('MyProcedure','U') IS NULL EXEC('CREATE PROCEDURE MyProcedure AS SET NOCOUNT ON;') GO ALTER PROCEDURE MyProcedure--etc. |
Idempotency using Guard clauses
Each relational database system has its own way of checking whether it is safe and necessary to execute the DDL. The ANSI standard way is to use the Information schema, which is provided in Apache Hive, Microsoft SQL Server, MySQL, PostgreSQL, H2 Database, HSQLDB, InterSystems Cache, MariaDB, Presto and MemSQL.
Safely altering a column
There are cross-platform ways of making sure that a table and column exists and checking whether it needs fixing to make it NOT
NULL
. Here is an example of using information_schema
in a guard clause. A typical task, though not one that would affect a release, is correcting a hastily created table where the columns weren’t specified as NOT
NULL
. This can cause bad data to leak in.
The following code will make the Edition_id
column NOT
NULL
only if the table and column exists, it is currently nullable, and there are no NULL
values in the existing column.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
IF ( ( SELECT CASE --return 0 if not nullable 1 if nullable (--return null if it doesn't exist SELECT IS_NULLABLE FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'prices' AND COLUMN_NAME = 'Edition_id' ) WHEN 'NO' THEN 0 WHEN 'YES' THEN 1 ELSE -1 END ) = 1 ) BEGIN IF NOT EXISTS (SELECT 1 FROM prices WHERE Edition_id IS NULL) ALTER TABLE dbo.prices ALTER COLUMN Edition_id INT NOT NULL; END; |
The INFORMATION_SCHEMA.COLUMNS
table has plenty of useful information that can help avoid any errors that come from applying a script twice. There is ORDINAL_POSITION
, COLUMN_DEFAULT
, IS_NULLABLE
, DATA_TYPE
, CHARACTER_MAXIMUM_LENGTH
, CHARACTER_OCTET_LENGTH
, NUMERIC_PRECISION
, NUMERIC_PRECISION_RADIX
, NUMERIC_SCALE
, DATETIME_PRECISION
, CHARACTER_SET_CATALOG
, CHARACTER_SET_SCHEMA
, CHARACTER_SET_NAME
, COLLATION_CATALOG
, COLLATION_SCHEMA
, COLLATION_NAME
, DOMAIN_CATALOG
, DOMAIN_SCHEMA
, DOMAIN_NAME
. You’ll recognize some of these from ODBC calls, which uses the information to tell the application about the database metadata, which is why the INFORMATION_SCHEMA
is maintained.
In SQL Server we also have the system views and an undergrowth of system functions that provide shortcuts to get at the information. Unless you need a degree of cross-database compatibility, using them is a bit simpler and more versatile.
1 2 3 4 5 6 7 8 9 |
IF (COLUMNPROPERTY( OBJECT_ID('dbo.prices', 'U'), --the table 'Edition_id', --the column name 'AllowsNull' ) = 1 ) BEGIN IF NOT EXISTS (SELECT 1 FROM dbo.prices WHERE Edition_id IS NULL) ALTER TABLE dbo.prices ALTER COLUMN Edition_id INT NOT NULL; END; |
These system functions afford us the useful trick in that any metadata function that references an object will return NULL
if the object doesn’t exist. This guard clause, using the COLUMNPROPERTY() metadata function, will check to see whether there is a table in existence called dbo.prices
with a column called Edition_id
that allows nulls. If this returns NULL
then either the column or the table doesn’t exist. If 0 then the column exists and it is NOT
NULL
, 1 if it allows nulls.
If you tried to run the ALTER TABLE
statement without the table or column being there, you’d get an error. This ColumnProperty()
function will tell you a lot more. When you’re altering the properties of a column, you’d need to check on the property you want to alter. It could be any of these, though some are rarely altered in practice:
AllowsNull
, ColumnId
, FullTextTypeColumn
, GeneratedAlwaysType
, IsColumnSet
, IsComputed
, IsCursorType
, IsDeterministic
, IsFulltextlndexed
, IsHidden
, Isldentity
, IsldNotForRepl
, Islndexable
, IsOutParam
, IsPrecise
, IsRowGuidCol
, IsSparse
, IsSystemVerified
, isXmllndexable
, Precision
, Scale
, StatisticalSemantics
, SystemDataAccess
, UserDataAccess
, and UsesAnsiTrim
.
Creating indexes safely
It isn’t the only useful system function. Imagine that you need to add an index on the publication_id
column of dbo.editions
in your database. Instead of plunging straight in, and assuming the current state, your code must first check whether the table dbo.editions
actually exists and that there isn’t already an index of that name on the publication_id
column. If you just have the second IndexProperty
check, you could, when it returns NULL
, wrongly assume that table exists. With an idempotent script, it is best to assume nothing.
1 2 3 |
IF (Object_Id('dbo.editions','U') IS NOT NULL AND IndexProperty(Object_Id('dbo.editions','U'),'Publicationid_index','IndexID') IS NULL) CREATE INDEX Publicationid_index ON dbo.Editions(publication_id) |
This checks that a table called dbo.editions
exists, but that it hasn’t got an index called Publicationid_index
on the column publication_ID
. You can run and rerun this without an error. You can even run it on an empty database. It only makes the change once. The indexproperty
in the example is the index_id
of the index of that name in the table or indexed view, but we don’t actually need it. We are just using the function to check that an index of that name doesn’t exist yet.
With SQL Server, system functions like ColumnProperty()
and IndexProperty()
that come in handy for guard clauses but probably the most generally useful is ObjectPropertyEx
.
Safely creating and altering tables
So far, we’ve avoided guard clauses for creating tables because there are no clauses like CREATE
OR
ALTER
for them. If you want to create a table only if it doesn’t exist, you have two alternatives. The first is to use the NOEXEC
trick:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
--------Stores Table-------- IF Object_Id('dbo.stores','U') IS NOT NULL --does it exist? SET NOEXEC ON --if it does, then temporarily just parse, not execute ELSE PRINT N'Creating [dbo].[stores]' GO CREATE TABLE dbo.stores ( stor_id char (4) NOT NULL, stor_name nvarchar (80) NULL, stor_address nvarchar (80) NULL, city nvarchar (40) NULL, state char (2) NULL, zip char (5) NULL ) ON [PRIMARY] SET NOEXEC OFF --and switch on execution again (Important!) |
The second is to execute a batch conditionally as a string using Execute
or sp_ExecuteSQL
:
1 2 3 4 5 6 7 8 9 10 11 12 |
IF Object_Id('dbo.stores','U') IS NULL --does it exist? EXECUTE sp_ExecuteSQL @Stmt= N' PRINT N''Creating [dbo].[stores]'' CREATE TABLE dbo.stores ( stor_id char (4) NOT NULL, stor_name nvarchar (80) NULL, stor_address nvarchar (80) NULL, city nvarchar (40) NULL, state char (2) NULL, zip char (5) NULL ) ON [PRIMARY] |
Before altering a table, you can easily check whether it exists, and, at the same time, whether it needs to be altered. I would use this check clause:
1 2 3 4 5 6 7 8 |
IF ObjectPropertyEx(Object_Id('dbo.stores','U'),'TableHasPrimaryKey') = 0 /*if the stores table exists (it would return null otherwise) and it has no primary key yet*/ BEGIN PRINT N'Creating primary key UPK_storeid on dbo.stores' ALTER TABLE dbo.stores ADD CONSTRAINT UPK_storeid PRIMARY KEY CLUSTERED (stor_id) ON [PRIMARY] end GO |
You can, of course, nowadays declare indexes and constraints as part of the table definition in SQL Server, which makes an idempotent expression a lot easier. However, it doesn’t meet all possible conditions, such as the table already existing, so you’d have to define the primary key in both places.
We can combine conditional table creation with checking to see if it needs an index. Here is a table being created if necessary, with a non-clustered index as well as a primary key constraint:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 |
--------Discounts Table-------- IF Object_Id('dbo.discounts','U') IS NOT NULL SET NOEXEC ON ELSE PRINT N'Creating dbo.discounts' GO CREATE TABLE dbo.discounts ( discounttype nvarchar (80) NOT NULL, stor_id char (4) NULL, lowqty smallint NULL, highqty smallint NULL, discount decimal (4, 2) NOT NULL, Discount_id int NOT NULL IDENTITY(1, 1) ) ON [PRIMARY] SET NOEXEC OFF GO IF ObjectPropertyEx(Object_Id('dbo.discounts','U') ,'TableHasPrimaryKey') = 0--table exists but no PK BEGIN PRINT N'Creating primary key [DiscountDiscountID] on dbo.discounts' ALTER TABLE dbo.discounts ADD CONSTRAINT DiscountDiscountID PRIMARY KEY CLUSTERED (Discount_id) ON [PRIMARY] end IF (Object_Id('dbo.discounts','U') IS NOT NULL AND IndexProperty(Object_Id('dbo.discounts','U'), 'Storid_index','IndexID') iS NULL) BEGIN PRINT N'Creating index [Storid_index] on [dbo].[discounts]' CREATE NONCLUSTERED INDEX Storid_index ON dbo.discounts (stor_id) ON [PRIMARY] End |
Saving idempotent DDL scripts as Prompt snippets
I would never rely on my memory or artistry to write an idempotent migration script. I am a SQL Prompt user, so I turn them into snippets (see: using an SSMS template in a SQL Prompt snippet).
Here is an example of an idempotent table creation script saved for reuse as a Prompt snippet. You just select code for the body of the table and then click the name of the snippet in the action menu
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
IF OBJECT_ID('<dbo.table_name, sysname, dbo.MyTable>', 'U') IS NOT NULL BEGIN PRINT N'''<dbo.table_name, sysname, dbo.MyTable>'' already exists'; SET NOEXEC ON --if it does, then temporarily just parse, not execute end ELSE PRINT N'Creating '' <dbo.table_name, sysname, dbo.MyTable>'''; GO /* C code for <dbo.table_name, sysname, dbo.MyTable> created in $DBNAME$ on $DATE$ by $USER$ */ CREATE TABLE <dbo.table_name, sysname, dbo.MyTable> ( $SELECTEDTEXT$ ) SET NOEXEC OFF GO |
Running idempotent migrations in Flyway
As a final example, here is code for adding documentation to the tables of the pubs
database I’ve been using as an example project in the Flyway articles. This will work even if the tables don’t exist or if the tables already have comments. It is far easier to use than putting the documents in the code for each table. The code is placed in the file V1.1.7__AddIndexes.sql in the Pubs project and includes a number of idempotent index insertions as well.
You can imagine that it started out life as a patch of indexes to cover foreign keys as a performance enhancement in production (V1.1.6) which then were retrospectively added as a migration. It would have been runnable as a regular migration and could be applied to production without doing anything harmful.
I don’t regard the process of adding documentation as a metadata change. The use of extended properties for this is an excellent way of supporting this idea. With this script, you can retrospectively add documentation to previous versions of the database whenever you rebuild them. All you need to do is to add it as an AfterMigrate
callback. I’ve added an expanded version of this script to GitHub.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 |
/*write the documentation to all the tables specified in the list that exist in the database either writing to, or updating, what is there */ DECLARE @WhatToDocument TABLE (TheOrder INT IDENTITY, Tablename sysname, TheDescription NVARCHAR(4000)); INSERT INTO @WhatToDocument (Tablename, TheDescription) VALUES ('dbo.authors', 'The authors of the publications. a publication can have one or more author'), ('dbo.discounts', 'These are the discounts offered by the sales people for bulk orders'), ('dbo.editions', 'A publication can come out in several different editions, of maybe a different type'), ('dbo.employee', 'An employee of any of the publishers'), ('dbo.jobs', 'These are the job descriptions and min/max salary level' ), ('dbo.prices', 'these are the current prices of every edition of every publication'), ('dbo.Sales', 'these are the sales of every edition of every publication'), ('dbo.pub_info', 'this holds the special information about every publisher'), ('dbo.publications', 'This lists every publication marketed by the distributor'), ('dbo.publishers', 'this is a table of publishers who we distribute books for'), ('dbo.roysched', 'this is a table of the authors royalty scheduleable'), ('dbo.stores', 'these are all the stores who are our customers'), ('dbo.TagName', 'All the categories of publications'), ('dbo.TagTitle', 'This relates tags to publications so that publications can have more than one'), ('dbo.titleauthor', 'this is a table that relates authors to publications, and gives their order of listing and royalty%'); DECLARE @ii INT, @iiMax INT; --the iterators --the values fetched for each row DECLARE @TableName sysname, @Schemaname sysname, @TheDescription NVARCHAR(4000),@Object_id int; --initialise the iterator SELECT @ii = 1, @iiMax = Max(TheOrder) FROM @WhatToDocument; WHILE @ii <= @iiMax BEGIN SELECT @Schemaname = Object_Schema_Name(Object_Id(Tablename)), @TableName = Object_Name(Object_Id(Tablename)), @TheDescription = TheDescription, @Object_id=Object_Id(TableName) FROM @WhatToDocument WHERE @ii = TheOrder; IF (@Object_id IS NOT NULL) --if the table exists IF NOT EXISTS --does the extended property exist? ( SELECT 1 FROM sys.fn_listextendedproperty (N'MS_Description', N'SCHEMA',@Schemaname, N'TABLE',@TableName, null, null ) ) EXEC sys.sp_addextendedproperty @name = N'MS_Description', @value = @TheDescription, @level0type = N'SCHEMA', @level0name = @Schemaname, @level1type = N'TABLE', @level1name = @TableName;; ELSE EXEC sys.sp_updateextendedproperty @name = N'MS_Description', @value = @TheDescription, @level0type = N'SCHEMA', @level0name = @Schemaname, @level1type = N'TABLE', @level1name = @TableName;; SELECT @ii = @ii + 1; -- on to the next one END; |
Conclusion
Idempotent code is generally useful for running hotfixes to ensure that it just doesn’t matter if they are run more than once. It is also magical for some tedious tasks such as maintaining documentation.
I like idempotent code, but you need to be aware of the implications of what you’re doing. An example of doing it wrong is to check for the prior existence of a table before you CREATE
it, and then try to delete the previous version. The motivation is to prevent an error informing you that the table already exists. Even if you have no foreign keys or schema-bound objects, I’d have thought that this sort of error would need a rollback and quite a bit of investigation. A novice programmer did it to me once and wiped several tables in a production database (I managed restore it within some very embarrassing minutes). He had a build script like this and thought he was accessing his own Dev server. Usually, it is better to execute the code within a transaction and let the code produce an error severe enough to roll-back the entire script. However, the code you use to ensure idempotency can be adapted for this: To be completely certain, and you can be completely confident of executing within a transaction, you can use the guard clauses to check first and produce an error rather than attempting a more graceful avoidance.
So, this sort of coding is a useful addition to your armoury, but it is possible to overdo it!