21 April 2021

Guest post

This is a guest post from Phil Factor. Phil Factor (real name withheld to protect the guilty), aka Database Mole, has 30 years of experience with database-intensive applications.

Despite having once been shouted at by a furious Bill Gates at an exhibition in the early 1980s, he has remained resolutely anonymous throughout his career.

He is a regular contributor to Simple Talk and SQLServerCentral.

Phil Factor

21 April 2021

Creating Idempotent DDL Scripts for Database Migrations

How to write idempotent DDL scripts that Flyway can run several times on a database, such as after a hotfix has already been applied directly to the production database, without causing errors.

Guest post

Database code that creates or alters tables and routines ought to be ‘idempotent’, so that it won’t cause problems if it is applied more than once to the same database. This is sometimes trickier than it might sound. There is more involved than simply checking the script hasn’t already been previously applied to the database or writing IF object_id('Mytable') IS NOT NULL checks to see if a table already exists before trying to create it.

For example, if you want to alter a column to make it NOT NULL, your script should ensure that the ALTER statement only runs if the table exists, the column exists and is nullable, and there are no existing null values. I’ll demonstrate how to achieve this, using either the information schema views or system functions to check the required conditions, and then an IF NOT EXISTS statement to ensure the SQL script only tries to make the change if all the conditions are met.

In a similar vein, I’ll show other examples of idempotent SQL scripts, such as one that checks to see if an index already exists, before creating it, as well scripts to safely create a table, or alter a table to add a Primary Key. If you are a SQL Prompt user I even show how to turn these scripts into reusable code snippets. Finally, I provide an example of an idempotent SQL script that adds or updates extended properties for every table, which you can run as a callback script when running Flyway migrate.

The uncontrolled hotfix

There will inevitably come a time in the deployment of a Flyway-managed database, where things get in a muddle. Let’s say you deploy some database updates to production, the migration itself works fine but sometime after the release, problems arise. It could be a missing index, some incorrect logic in a procedure, or maybe a datatype that is wrong. Time is money and whatever you do to fix this, it must be done quickly. You are a professional, so you diagnose the problem and apply a quick hotfix. Everything starts working properly.

However, you have a nagging feeling that you’re not quite finished yet. You’ve just done an uncontrolled change, so you now need to create a new version, ‘backfilling’ these changes into version control. You add the ‘patch’ migration script to Flyway’s migrations folder, with a new ‘patch’ version. From now on all databases that are created will have the fixes in the ‘patch’ automatically applied. With a sense of relief, you sit back with a well-earned cup of coffee, but a little voice in your head is telling you that something still isn’t quite right.

Although your production server is now at the ‘patch’ version, Flyway still thinks it’s at the previous version. As far as Flyway is concerned, the patch hasn’t been applied so next time you do a release into production, it will try to reapply the new version that you’ve just created. The code could easily produce an error, for example if it tries to CREATE an object that already exists, or tries to ALTER a table, to add a new column with the same name as an existing one.

The workaround: Flyway cherry picking

What do you do to avoid a future failed deployment? You need persuade Flyway that the migration file with the patch has been applied so that it doesn’t reapply the migration when next you update production.

If you’ve reached this point, the way around it is to use a feature of Flyway Teams called cherry picking. Flyway needs to apply each migration in order, but if you use cherryPick, with the parameter skipExecutingMigrations=true, then Flyway Teams will update the flyway_schema_history table on the database without actually executing the script. This is a good solution that works no matter what database system you are using.

Nonetheless, it is a better practice, where the database system allows it, to avoid subverting Flyway’s simple migration system, by instead using idempotent scripts, which can be rerun safely, and don’t require these extra steps.

The solution: idempotent SQL migration scripts

A hotfix is fine, if it is followed immediately by writing and adding an idempotent ‘patch’ migration script that only makes changes if they haven’t yet been made. Idempotency is the property of certain operations in mathematics and computer science; it means they can be applied several times without changing the result beyond the first time they are run. It is really a fancy way of saying that your code must check first whether the database object has already had the change before you make the change. An idempotent script is what you need to turn a hotfix safely into a Flyway migration.

It is relatively simple to do this in SQL Server, Oracle and PostgreSQL. In its simplest form, the script merely checks whether it has already been applied and if so, makes no changes. However, it is also worth adding ‘guard clauses’ that check whether the object exists before you create or alter it, so that you prevent error from objects being altered if they don’t exist or being created when they already exist. I like to test idempotent scripts that only alter existing objects against an empty database until they execute, doing nothing, but without errors.

It pays to be cautious and take as little as possible for granted. The errors that come from a target database having ‘drifted’ from its stated version are often difficult to predict. The more complicated, truly idempotent scripts will review every change and make the change only if it is necessary and possible.

Avoiding dropping and recreating routines

Obviously, tables require the most care in migration-based code, if you wish to preserve the data. However, you can also go seriously wrong with techniques that drop a routine and then create it because you lose all the permissions and extended properties associated with it. Relational databases are all moving towards a syntax for creating or altering programmable objects (a.k.a. ‘routines’) that makes a lot of ‘guard clauses’ unnecessary for them. In SQL Server, from 2016 onwards, we have the ANSI SQL clauses DROP IF EXISTS and CREATE OR ALTER, which are already standard on other database systems.

DROP IF EXISTS doesn’t help you in code that seeks to prevent you altering a routine that doesn’t yet exist. However, you can now use CREATE OR ALTER for the purpose in SQL Server DDL code for most ‘routines’ such as views, stored procedures and functions (including natively-compiled varieties of both) and triggers, but not the deprecated RULE and DEFAULT. You can’t use it for ‘storage objects’ (tables, indexes and indexed views), schemas, CLR objects or assemblies.

Where you can’t use CREATE OR ALTER, you can use the following syntax to preserve existing permissions and extended properties. The ALTER syntax is always used, but it is preceded by a batch that checks to see if the object exists. If the object doesn’t exist, it is created with a dummy body; I chose something non-toxic, SET NOCOUNT ON, but it has no significance, as the following ALTER immediately changes it for the real definition.

IF OBJECT_ID('MyProcedure','U') IS NULL

EXEC('CREATE PROCEDURE MyProcedure AS SET NOCOUNT ON;')

ALTER PROCEDURE MyProcedure--etc.

Idempotency using Guard clauses

Each relational database system has its own way of checking whether it is safe and necessary to execute the DDL. The ANSI standard way is to use the Information schema, which is provided in Apache Hive, Microsoft SQL Server, MySQL, PostgreSQL, H2 Database, HSQLDB, InterSystems Cache, MariaDB, Presto and MemSQL.

Safely altering a column

There are cross-platform ways of making sure that a table and column exists and checking whether it needs fixing to make it NOT NULL. Here is an example of using information_schema in a guard clause. A typical task, though not one that would affect a release, is correcting a hastily created table where the columns weren’t specified as NOT NULL. This can cause bad data to leak in.

The following code will make the Edition_id column NOT NULL only if the table and column exists, it is currently nullable, and there are no NULL values in the existing column.

IF (

(

SELECT CASE --return 0 if not nullable 1 if nullable

(--return null if it doesn't exist

SELECT IS_NULLABLE

FROM INFORMATION_SCHEMA.COLUMNS

WHERE TABLE_SCHEMA = 'dbo'

AND TABLE_NAME = 'prices'

AND COLUMN_NAME = 'Edition_id'

) WHEN 'NO' THEN 0

WHEN 'YES' THEN 1 ELSE -1 END

) = 1

)

BEGIN

IF NOT EXISTS (SELECT 1 FROM prices WHERE Edition_id IS NULL)

ALTER TABLE dbo.prices ALTER COLUMN Edition_id INT NOT NULL;

END;

The INFORMATION_SCHEMA.COLUMNS table has plenty of useful information that can help avoid any errors that come from applying a script twice. There is ORDINAL_POSITION, COLUMN_DEFAULT, IS_NULLABLE, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, CHARACTER_OCTET_LENGTH, NUMERIC_PRECISION, NUMERIC_PRECISION_RADIX, NUMERIC_SCALE, DATETIME_PRECISION, CHARACTER_SET_CATALOG, CHARACTER_SET_SCHEMA, CHARACTER_SET_NAME, COLLATION_CATALOG, COLLATION_SCHEMA, COLLATION_NAME, DOMAIN_CATALOG, DOMAIN_SCHEMA, DOMAIN_NAME. You’ll recognize some of these from ODBC calls, which uses the information to tell the application about the database metadata, which is why the INFORMATION_SCHEMA is maintained.

In SQL Server we also have the system views and an undergrowth of system functions that provide shortcuts to get at the information. Unless you need a degree of cross-database compatibility, using them is a bit simpler and more versatile.

IF (COLUMNPROPERTY( OBJECT_ID('dbo.prices', 'U'), --the table

'Edition_id', --the column name

'AllowsNull'

) = 1

)

BEGIN

IF NOT EXISTS (SELECT 1 FROM dbo.prices WHERE Edition_id IS NULL)

ALTER TABLE dbo.prices ALTER COLUMN Edition_id INT NOT NULL;

END;

These system functions afford us the useful trick in that any metadata function that references an object will return NULL if the object doesn’t exist. This guard clause, using the COLUMNPROPERTY() metadata function, will check to see whether there is a table in existence called dbo.prices with a column called Edition_id that allows nulls. If this returns NULL then either the column or the table doesn’t exist. If 0 then the column exists and it is NOT NULL, 1 if it allows nulls.

If you tried to run the ALTER TABLE statement without the table or column being there, you’d get an error. This ColumnProperty() function will tell you a lot more. When you’re altering the properties of a column, you’d need to check on the property you want to alter. It could be any of these, though some are rarely altered in practice:

AllowsNull, ColumnId, FullTextTypeColumn, GeneratedAlwaysType, IsColumnSet, IsComputed, IsCursorType, IsDeterministic, IsFulltextlndexed, IsHidden, Isldentity, IsldNotForRepl, Islndexable, IsOutParam, IsPrecise, IsRowGuidCol, IsSparse, IsSystemVerified, isXmllndexable, Precision, Scale, StatisticalSemantics, SystemDataAccess, UserDataAccess, and UsesAnsiTrim.

Creating indexes safely

It isn’t the only useful system function. Imagine that you need to add an index on the publication_id column of dbo.editions in your database. Instead of plunging straight in, and assuming the current state, your code must first check whether the table dbo.editions actually exists and that there isn’t already an index of that name on the publication_id column. If you just have the second IndexProperty check, you could, when it returns NULL, wrongly assume that table exists. With an idempotent script, it is best to assume nothing.

IF (Object_Id('dbo.editions','U') IS NOT NULL

AND IndexProperty(Object_Id('dbo.editions','U'),'Publicationid_index','IndexID') IS NULL)

CREATE INDEX Publicationid_index ON dbo.Editions(publication_id)

This checks that a table called dbo.editions exists, but that it hasn’t got an index called Publicationid_index on the column publication_ID. You can run and rerun this without an error. You can even run it on an empty database. It only makes the change once. The indexproperty in the example is the index_id of the index of that name in the table or indexed view, but we don’t actually need it. We are just using the function to check that an index of that name doesn’t exist yet.

With SQL Server, system functions like ColumnProperty() and IndexProperty() that come in handy for guard clauses but probably the most generally useful is ObjectPropertyEx.

Safely creating and altering tables

So far, we’ve avoided guard clauses for creating tables because there are no clauses like CREATE OR ALTER for them. If you want to create a table only if it doesn’t exist, you have two alternatives. The first is to use the NOEXEC trick:

--------Stores Table--------

IF Object_Id('dbo.stores','U') IS NOT NULL --does it exist?

SET NOEXEC ON --if it does, then temporarily just parse, not execute

ELSE

PRINT N'Creating [dbo].[stores]'

CREATE TABLE dbo.stores

(

stor_id char (4) NOT NULL,

stor_name nvarchar (80) NULL,

stor_address nvarchar (80) NULL,

city nvarchar (40) NULL,

state char (2) NULL,

zip char (5) NULL

) ON [PRIMARY]

SET NOEXEC OFF --and switch on execution again (Important!)

The second is to execute a batch conditionally as a string using Execute or sp_ExecuteSQL:

IF Object_Id('dbo.stores','U') IS NULL --does it exist?

EXECUTE sp_ExecuteSQL @Stmt= N'

PRINT N''Creating [dbo].[stores]''

CREATE TABLE dbo.stores

(

stor_id char (4) NOT NULL,

stor_name nvarchar (80) NULL,

stor_address nvarchar (80) NULL,

city nvarchar (40) NULL,

state char (2) NULL,

zip char (5) NULL

) ON [PRIMARY]

Before altering a table, you can easily check whether it exists, and, at the same time, whether it needs to be altered. I would use this check clause:

IF ObjectPropertyEx(Object_Id('dbo.stores','U'),'TableHasPrimaryKey') = 0

/*if the stores table exists (it would return null otherwise) and it has

no primary key yet*/

BEGIN

PRINT N'Creating primary key UPK_storeid on dbo.stores'

ALTER TABLE dbo.stores ADD CONSTRAINT UPK_storeid PRIMARY KEY CLUSTERED (stor_id) ON [PRIMARY]

end

You can, of course, nowadays declare indexes and constraints as part of the table definition in SQL Server, which makes an idempotent expression a lot easier. However, it doesn’t meet all possible conditions, such as the table already existing, so you’d have to define the primary key in both places.

We can combine conditional table creation with checking to see if it needs an index. Here is a table being created if necessary, with a non-clustered index as well as a primary key constraint:

--------Discounts Table--------

IF Object_Id('dbo.discounts','U') IS NOT NULL

SET NOEXEC ON

ELSE

PRINT N'Creating dbo.discounts'

CREATE TABLE dbo.discounts

(

discounttype nvarchar (80) NOT NULL,

stor_id char (4) NULL,

lowqty smallint NULL,

highqty smallint NULL,

discount decimal (4, 2) NOT NULL,

Discount_id int NOT NULL IDENTITY(1, 1)

) ON [PRIMARY]

SET NOEXEC OFF

IF ObjectPropertyEx(Object_Id('dbo.discounts','U')

,'TableHasPrimaryKey') = 0--table exists but no PK

BEGIN

PRINT N'Creating primary key [DiscountDiscountID] on dbo.discounts'

ALTER TABLE dbo.discounts ADD CONSTRAINT DiscountDiscountID

PRIMARY KEY CLUSTERED (Discount_id) ON [PRIMARY]

end

IF (Object_Id('dbo.discounts','U') IS NOT NULL

AND IndexProperty(Object_Id('dbo.discounts','U'),

'Storid_index','IndexID') iS NULL)

BEGIN

PRINT N'Creating index [Storid_index] on [dbo].[discounts]'

CREATE NONCLUSTERED INDEX Storid_index

ON dbo.discounts (stor_id) ON [PRIMARY]

End

Saving idempotent DDL scripts as Prompt snippets

I would never rely on my memory or artistry to write an idempotent migration script. I am a SQL Prompt user, so I turn them into snippets (see: using an SSMS template in a SQL Prompt snippet).

Here is an example of an idempotent table creation script saved for reuse as a Prompt snippet. You just select code for the body of the table and then click the name of the snippet in the action menu

IF OBJECT_ID('<dbo.table_name, sysname, dbo.MyTable>', 'U') IS NOT NULL

BEGIN

PRINT N'''<dbo.table_name, sysname, dbo.MyTable>'' already exists';

SET NOEXEC ON --if it does, then temporarily just parse, not execute

end

ELSE

PRINT N'Creating '' <dbo.table_name, sysname, dbo.MyTable>''';

C code for <dbo.table_name, sysname, dbo.MyTable> created in $DBNAME$ on $DATE$ by $USER$

CREATE TABLE <dbo.table_name, sysname, dbo.MyTable>

(

$SELECTEDTEXT$

)

SET NOEXEC OFF

Running idempotent migrations in Flyway

As a final example, here is code for adding documentation to the tables of the pubs database I’ve been using as an example project in the Flyway articles. This will work even if the tables don’t exist or if the tables already have comments. It is far easier to use than putting the documents in the code for each table. The code is placed in the file V1.1.7__AddIndexes.sql in the Pubs project and includes a number of idempotent index insertions as well.

You can imagine that it started out life as a patch of indexes to cover foreign keys as a performance enhancement in production (V1.1.6) which then were retrospectively added as a migration. It would have been runnable as a regular migration and could be applied to production without doing anything harmful.

I don’t regard the process of adding documentation as a metadata change. The use of extended properties for this is an excellent way of supporting this idea. With this script, you can retrospectively add documentation to previous versions of the database whenever you rebuild them. All you need to do is to add it as an AfterMigrate callback. I’ve added an expanded version of this script to GitHub.

/*write the documentation to all the tables specified in the list that exist in the

database either writing to, or updating, what is there */

DECLARE @WhatToDocument TABLE (TheOrder INT IDENTITY, Tablename sysname, TheDescription NVARCHAR(4000));

INSERT INTO @WhatToDocument (Tablename, TheDescription)

VALUES

('dbo.authors', 'The authors of the publications. a publication can have one or more author'),

('dbo.discounts', 'These are the discounts offered by the sales people for bulk orders'),

('dbo.editions', 'A publication can come out in several different editions, of maybe a different type'),

('dbo.employee', 'An employee of any of the publishers'),

('dbo.jobs', 'These are the job descriptions and min/max salary level' ),

('dbo.prices', 'these are the current prices of every edition of every publication'),

('dbo.Sales', 'these are the sales of every edition of every publication'),

('dbo.pub_info', 'this holds the special information about every publisher'),

('dbo.publications', 'This lists every publication marketed by the distributor'),

('dbo.publishers', 'this is a table of publishers who we distribute books for'),

('dbo.roysched', 'this is a table of the authors royalty scheduleable'),

('dbo.stores', 'these are all the stores who are our customers'),

('dbo.TagName', 'All the categories of publications'),

('dbo.TagTitle', 'This relates tags to publications so that publications can have more than one'),

('dbo.titleauthor', 'this is a table that relates authors to publications, and gives their order of listing and royalty%');

DECLARE @ii INT, @iiMax INT; --the iterators

--the values fetched for each row

DECLARE @TableName sysname, @Schemaname sysname,

@TheDescription NVARCHAR(4000),@Object_id int;

--initialise the iterator

SELECT @ii = 1, @iiMax = Max(TheOrder) FROM @WhatToDocument;

WHILE @ii <= @iiMax

BEGIN

SELECT @Schemaname = Object_Schema_Name(Object_Id(Tablename)),

@TableName = Object_Name(Object_Id(Tablename)),

@TheDescription = TheDescription,

@Object_id=Object_Id(TableName)

FROM @WhatToDocument

WHERE @ii = TheOrder;

IF (@Object_id IS NOT NULL) --if the table exists

IF NOT EXISTS --does the extended property exist?

(

SELECT 1

FROM sys.fn_listextendedproperty

(N'MS_Description', N'SCHEMA',@Schemaname,

N'TABLE',@TableName, null, null

)

EXEC sys.sp_addextendedproperty @name = N'MS_Description',

@value = @TheDescription, @level0type = N'SCHEMA',

@level0name = @Schemaname, @level1type = N'TABLE',

@level1name = @TableName;;

ELSE

EXEC sys.sp_updateextendedproperty @name = N'MS_Description',

@value = @TheDescription, @level0type = N'SCHEMA',

@level0name = @Schemaname, @level1type = N'TABLE',

@level1name = @TableName;;

SELECT @ii = @ii + 1; -- on to the next one

END;

Conclusion

Idempotent code is generally useful for running hotfixes to ensure that it just doesn’t matter if they are run more than once. It is also magical for some tedious tasks such as maintaining documentation.

I like idempotent code, but you need to be aware of the implications of what you’re doing. An example of doing it wrong is to check for the prior existence of a table before you CREATE it, and then try to delete the previous version. The motivation is to prevent an error informing you that the table already exists. Even if you have no foreign keys or schema-bound objects, I’d have thought that this sort of error would need a rollback and quite a bit of investigation. A novice programmer did it to me once and wiped several tables in a production database (I managed restore it within some very embarrassing minutes). He had a build script like this and thought he was accessing his own Dev server. Usually, it is better to execute the code within a transaction and let the code produce an error severe enough to roll-back the entire script. However, the code you use to ensure idempotency can be adapted for this: To be completely certain, and you can be completely confident of executing within a transaction, you can use the guard clauses to check first and produce an error rather than attempting a more graceful avoidance.

So, this sort of coding is a useful addition to your armoury, but it is possible to overdo it!

Redgate Test Data Manager

Redgate Flyway

Redgate Monitor

Overview

Protect

Automate

Monitor

Product articles
Redgate Flyway
Deployment checks and safeguards
Creating Idempotent DDL Scripts for…

Guest post

Creating Idempotent DDL Scripts for Database Migrations

Guest post

The uncontrolled hotfix

The workaround: Flyway cherry picking

The solution: idempotent SQL migration scripts

Avoiding dropping and recreating routines

Idempotency using Guard clauses

Safely altering a column

Creating indexes safely

Safely creating and altering tables

Saving idempotent DDL scripts as Prompt snippets

Running idempotent migrations in Flyway

Conclusion

Tools in this post

Redgate Flyway

SQL Prompt

Redgate Flyway

Product Articles

University

Events

Forums

Community

Simple Talk

Redgate Test Data Manager

Redgate Flyway

Redgate Monitor

Overview

Protect

Automate

Monitor

Product articles Redgate Flyway Deployment checks and safeguards Creating Idempotent DDL Scripts for…

Guest post

Guest post

The uncontrolled hotfix

The workaround: Flyway cherry picking

The solution: idempotent SQL migration scripts

Avoiding dropping and recreating routines

Idempotency using Guard clauses

Safely altering a column

Creating indexes safely

Safely creating and altering tables

Saving idempotent DDL scripts as Prompt snippets

Running idempotent migrations in Flyway

Conclusion

Tools in this post

Redgate Flyway

SQL Prompt

Redgate Flyway

You may also like

Redgate Hub

Product Articles

University

Events

Forums

Community

Simple Talk

Product articles
Redgate Flyway
Deployment checks and safeguards
Creating Idempotent DDL Scripts for…