Robyn Page's SQL Server Data Validation Workbench

Checking your data on entry

Contents

Rules
Defaults
Triggers
Integrity Constraints
Questions

Introduction

If you are completely confident about the data that is being inserted or updated in a table, then you won’t need constraints. Even with a complete logical interface of stored procedures you’ll still require them for development work because the impossible always seems to happen.

In the real world, however, and particularly if the real world includes programmers using ‘Dynamic SQL’, or data feeds into tables and so on, then they are a vital defence for the integrity of your data.

There are several different ways of putting constraints and checks on data, and the purpose of this workshop is just to point out the odd quirk that might be of practical help, whilst running through the list.

Rules, defaults, indexes, and triggers are used for:

Requirement integrity – a column must have data in every row, no nulls
Check or Validity – Is the data valid? a valid Email address for example. Triggers or rules are generally used.
Uniqueness – no two table rows can have the same non-null values for the selected columns. Unique Indexes are used for this.
Referential integrity – data inserted into a table column must already have matching data in another table column or another column in the same table.

I have a certain difficulty with this workbench, as the SQL Server team have given advanced notice of its intention to replace Rules and Defaults, probably with ANSI Domains. However, until that time comes, they are the most effective way to implement certain business rules in the database.

One could argue that they shouldn’t be used because they are deprecated, but the fact is that there is no realistic substitute in SQL 2005 for the functionality, and safety mechanisms, they provide.

Would you refuse to wear a seatbelt, on hearing that a better design was due in four years time?

Rules

Rules are the best way of ensuring that business rules, especially those that are liable to change, are implemented in one place only.

As mentioned above, Rules are considered deprecated in SQL Server 2005. However there is nothing yet that replaces the functionality they provide until Domains are properly implemented. One should be aware of possible problems when upgrading. BOL gives the mysterious advice to use Check constraints instead, as if the two devices had the same functionality.

Rules are used to check data that is being inserted or updated in columns. If you need to perform a fairly complex check, and it has to be done in more than one table, then a rule is very handy. One rule can be referenced from any number of tables or user-defined data types in the database. You can bind a rule to to a user-defined data type, so it will then apply to any column where the datatype is used. This is even more convenient and safe.

The process is…

Create the rule using CREATE RULE.
Bind the rule to a column or user-defined datatype using sp_bindrule.
Test the bound rule by inserting or updating data.

You can create a rule easily in EM or SMSS but don’t even think about altering the rule if you have bound it to an object. The interface was not implemented with any enthusiasm.

Imagine that you have a database in which you wish to handle postcodes. Normally, you’d want to put the whole messy business of checking that the format is valid into a stored procedure, as well as requiring that the front-end applications do all the appropriate filtering. However, this isn’t always possible. Whatever happens, complex code like this must be kept in one place and heavily documented.

Postcode validation criteria are as follows (slightly simplified):

The entire length must be between 6 and 8 characters
A space must be included
The local (inward) code, to the right of the space, always 3 chars
The first character of the local code must be numeric
The second and third characters of the local code must be alpha
The Sorting Office (outward) code the left of the gap, can be between
2 and 4 characters
The first character of the Sorting Office (outward) code must be alpha

We put these criteria, which will define our RULE, in a test harness to develop it and test it as much as possible, as they are not the easiest of objects to change.

--start of the test harness

DECLARE @Postcode VARCHAR(10)

SELECT @Postcode='CO10 7SP'

SELECT CASE WHEN

--start of the validation rules

LEN (LTRIM(@Postcode)) BETWEEN 6 AND 8--between 6 and eight chars

AND

LEN(REPLACE(@Postcode,' ',''))=LEN(@postcode)-1--one space only!

--local (inward) code, to the right of the space, 3 characters

AND--The second and third characters of the local code must be alpha

SUBSTRING(@Postcode,CHARINDEX(' ',@Postcode)+1,3) LIKE '[0-9][a-Z][a-z]'

AND--first character of the Sorting Office (outward) code must be alpha

SUBSTRING(@Postcode,1,CHARINDEX(' ',@Postcode)-1) LIKE '[a-z][0-9a-z]%'

AND--Sorting Office (outward) code the left of the gap, between 2 and 4

CHARINDEX(' ',@Postcode) BETWEEN 3 AND 5

--one number followed by two letters.

--end of the validation rules

THEN 'good' ELSE 'bad' END

--end of the test harness

/* Now we create the RULE

You can use any name for the argument (we chose @Postcode), but the

first character must be "@." Make sure it is something meaningful

You can use built-in functions but anything that references other

database objects is forbidden.

IF EXISTS (SELECT * FROM dbo.sysobjects

WHERE id = OBJECT_ID(N'[dbo].[PostcodeValidation]')

AND OBJECTPROPERTY(id, N'IsRule') = 1)

DROP RULE [dbo].[PostcodeValidation]

CREATE RULE PostcodeValidation

LEN (LTRIM(@Postcode)) BETWEEN 6 AND 8--between 6 and eight chars

AND

LEN(REPLACE(@Postcode,' ',''))=LEN(@postcode)-1--one space only!

--local (inward) code, to the right of the space, 3 characters

AND--The second and third characters of the local code must be alpha

SUBSTRING(@Postcode,CHARINDEX(' ',@Postcode)+1,3) LIKE '[0-9][a-Z][a-z]'

AND--first character of the Sorting Office (outward) code must be alpha

SUBSTRING(@Postcode,1,CHARINDEX(' ',@Postcode)-1) LIKE '[a-z][0-9a-z]%'

AND--Sorting Office (outward) code the left of the gap, between 2 and 4

CHARINDEX(' ',@Postcode) BETWEEN 3 AND 5

--one number followed by two letters.

--then we apply the documentation

EXEC sp_addextendedproperty 'MS_Description',

'length between 6 and 8 characters

Containing one space only

3 characters to the right of the space

of which the first myst be numeric and the other two alpha

Before the space can be between 2 and 4 characters

of which the first must be alpha',

'user', dbo, 'rule', PostcodeValidation

With that done we can then bind the rule to a column of a table. You can bind a rule to as many columns as you wish within the database: it keeps everything neat without duplication of the implementation of business rules.

CREATE TABLE TestOfRule (MyID INT IDENTITY(1,1), Postcode VARCHAR(10))

You cannot bind a rule to a text, image, or timestamp datatype column.

You cannot bind rules to system tables. Still, who cares?*/

--First bind the rule to the table column

EXEC sp_bindrule PostcodeValidation, 'TestOfRule.Postcode'

--Now let's try out some inserts

INSERT INTO TestOfRule(postcode) SELECT 'CM20 3EQ'--OK

INSERT INTO TestOfRule(postcode) SELECT 'CM2 3EQ'--OK

INSERT INTO TestOfRule(postcode) SELECT 'CM 3EQ'--OK

INSERT INTO TestOfRule(postcode) SELECT 'CM2 30EQ'--not so good

INSERT INTO TestOfRule(postcode) SELECT 'CM2 3EQ'--not so good

INSERT INTO TestOfRule(postcode) SELECT 'SQUIFFY'--Not so good

/*---error---!

A column insert or update conflicts with a rule imposed by a previous

CREATE RULE statement. The statement was terminated. The conflict

occurred in database 'master', table 'TestOfRule', column 'Postcode'.

The statement has been terminated.

Now, could we use a function in a rule? This would make it so much easier to develop and we can do a bit more processing.

Sadly no. A missed opportunity.

CREATE FUNCTION [dbo].[ufiCleverStuff]

( @string VARCHAR(8000)) RETURNS INT AS BEGIN RETURN 1 END

ALTER TABLE dbo.TestOfRule ADD Address VARCHAR(255) NULL

CREATE RULE AddressValidation AS dbo.ufiCleverStuff(@address)=1

--This will cause an error!

/* However, we can bind the rule to a user-defined data type */

EXEC sp_addtype Postcode,'Varchar(10)',NONULL --NONULL=no nulls allowed

EXEC sp_bindrule PostcodeValidation, 'Postcode'

sp_help Postcode --just to chck that the rule is there (Rule_Name)

--And use it wherever we want

CREATE TABLE SecondTestOfRule (MyID INT IDENTITY(1,1), Postcode postcode)

--and try it out

SELECT * FROM SecondTestOfRule

INSERT INTO SecondTestOfRule(postcode) SELECT 'ME3 5EQ'

--OK

INSERT INTO SecondTestOfRule(postcode) SELECT 'AR3 567'

--didn't like the 567 bit

INSERT INTO SecondTestOfRule(postcode) SELECT 'ME2 30RP'

--not so good

INSERT INTO SecondTestOfRule(postcode) SELECT 'ME 3RP'

--OK

INSERT INTO SecondTestOfRule(postcode) SELECT 'DE52 3EQ'

--not so good

INSERT INTO SecondTestOfRule(postcode) SELECT 'SQUIFFY'

--Not so good

UPDATE SecondTestOfRule SET postcode='SE34 2DS' WHERE myid=5

UPDATE SecondTestOfRule SET postcode='squiffy' WHERE myid=5

--good, it chucked it out

Rules bound to columns take precedence over rules bound to user

datatypes.

--now what happens when we create a variable of the datatype

DECLARE @MyPostcode Postcode

SELECT @Mypostcode='&^bananas****'

Not what you’d expect; it accepts a silly postcode. There is, sadly, no checking of a rule on variable with a user-Defined datatype.

Defaults

Defaults and rules seem to have attracted the displeasure of the SQL Standards committee.

Defaults are deprecated by SQL Server 2005 but are actually rather useful because they can be ‘bound’ to a user-defined Data Type as well as any column in the entire database. There is no other way of doing this.

Take a common example. I believe that every table should have a column that gives the insertion date. You can, of course laboriously set up a default constraint on every table but why not do the following…

--just create a simple default of the current date and time

CREATE DEFAULT DateOfInsertion AS GETDATE()

--now document it

EXEC sp_addextendedproperty 'MS_Description',

'this is a default supplying the current date',

'user', dbo, 'default', DateOfInsertion

--add a new type called InsertionDate

EXEC sp_addtype InsertionDate,'DateTime',NONULL

--add the default to the type

sp_bindefault DateOfInsertion, "InsertionDate"

sp_help InsertionDate --check that the default is there (Default_name)

--now the whole process is a lot simpler

CREATE TABLE EventLog (MyID INT IDENTITY(1,1),

TheDescription VARCHAR(100),

insertionDate insertionDate)

INSERT INTO Eventlog(theDescription)

VALUES ('the martians are attacking')

INSERT INTO Eventlog(theDescription)

VALUES ('It is the treens')

INSERT INTO Eventlog(theDescription)

VALUES ('Call in Dan Dare')

INSERT INTO Eventlog(theDescription)

VALUES ('Dan Dare to the rescue')

SELECT * FROM EventLog--note that the dates have been filled in

/* but there is a feeling that there is something missing.. Why not

have a nice identity field User-Defined Data Type too? */

EXEC sp_addtype counter,INT,'IDENTITY' -- Sybase only. Sorry

--you'll get an error

Triggers

Triggers can be quite complex, as Pop Rivett explains elsewhere on this site with his excellent ‘Auditing’ trigger. We’ll only discuss triggers as data constraints.

Imagine we want to check addresses that are being placed in an address table. We want to ensure that we’ve done reasonable sanity checks to the address. For a start, we believe that an address should contain at least four words and alphanumeric characters with just a sprinkling of punctuation

Here is one of Phil Factor’s routines (thanks Phil).

CREATE FUNCTION [dbo].[ufiWordcountVarchar] --counts the number of

--words in a VARCHAR. We need something like this to check a valid

--address.

(

@string VARCHAR(8000),

@WordStartCharacters VARCHAR(255)='a-z',

@WordCharacters VARCHAR(255)='-a-z'''

)

RETURNS INT

BEGIN

DECLARE @Position INT, --index of current search

@WhereWeAre INT,--index into string so far

@wordcount INT,--the number of words found so far

@next INT, --where the next search starts

@size INT --the total size of the text

SELECT @WhereWeAre=1,@size=LEN(@string),@Position=1,@wordcount=0

WHILE @Position>0

BEGIN

SELECT @Position=PATINDEX('%['+@wordStartCharacters+']%',

SUBSTRING(@string,@whereWeAre,8000))

IF @Position>0

BEGIN

SELECT @next=@WhereWeAre+@Position,@wordcount=@Wordcount+1

SELECT @Position=PATINDEX('%[^'+@wordCharacters+']%',

SUBSTRING(@string,@next,8000)+' ')

SELECT @WhereWeAre=@next+@Position

END

RETURN @wordcount

END

/* so we create a test table and add a trigger to it, putting the two

checks in the trigger */

CREATE TABLE MyAddresses (MyID INT IDENTITY (1,1), Address VARCHAR(100))

CREATE TRIGGER tCheckAddress_MyAddress

ON MyAddresses

FOR INSERT, UPDATE

BEGIN

IF (SELECT MIN(dbo.ufiWordcountVarchar(Address,DEFAULT,DEFAULT))

FROM inserted)<4

EXISTS (SELECT 1 FROM inserted WHERE address LIKE '%[^-a-z0-9''",.()]%')

BEGIN

ROLLBACK TRANSACTION

RAISERROR ('Suspect address inserted into MyAddresses',16,1)

END

--not forgetting the documentation

EXEC sp_addextendedproperty 'MS_Description',

'This checks to see if there are at least four words in the address

and checks for characters you wouldn''t expect to see in an address ',

'user', dbo, 'table', MyAddresses,'trigger', tCheckAddress_MyAddress

INSERT INTO MyAddresses (address)

VALUES ('32, Acacia Avenue, Goldhay, Berkshire')

INSERT INTO MyAddresses (address)

VALUES ('12 the Firs, Clive West Way, Slough, Berks')

INSERT INTO MyAddresses (address)

VALUES ('46, the promenade, felixstowe essex fizzbang<>*')

INSERT INTO MyAddresses (address)

VALUES ('Buckingham palace')

UPDATE Myaddresses SET address='Aargh!' WHERE MyID=2

SELECT * FROM MyAddresses

Integrity Constraints

Integrity Constraints, like rules, limit or ‘constrain’ you in what you can put in a table or column. Unlike triggers, integrity constraints cannot cascade changes through related tables in the database, enforce complex restrictions by referencing other database objects, perform “what if” analyses or roll back the current transaction as a result of enforcing data integrity. (With triggers, you can either roll back or continue the transaction, depending on how you handle referential integrity.) Nevertheless they are the most commonly used constraint and simple to define and alter.

There are three types of Table-level constraint:

Unique and primary key constraints ( no two rows in a table have the same values in the specified columns – with a primary key constraint it can’t be null either).
Referential integrity constraints (enforces the rule that, for a specific column, there must already be matching data in the column it references).
Check constraints limit the values of data inserted into columns.

We’re most concerned with Check constraints in this article. Like a rule, the check has to be an expression that would fit in a WHERE or IF Clause, but can’t involve a subquery. You can apply several constraints on the one column.

Imagine we want to check that an email address is valid.

CREATE TABLE TestConstraint

( MyID INT IDENTITY(1,1),

MyEmailAddress VARCHAR(50) CONSTRAINT my_check_constraint

CHECK (MyEmailAddress NOT LIKE '%[^a-z-_.@]%'

AND LEN(REPLACE(MyEmailAddress,'@',''))=LEN(MyEmailAddress)-1)

)

EXEC sp_addextendedproperty 'MS_Description',

'Check to make sure that only valid characters are in the email address

and there is only one @ sign',

'user', dbo, 'table', testconstraint,'constraint', my_check_constraint

INSERT INTO TestConstraint (MyEmailAddress)

SELECT 'Robyn.Page@Simple-talk.com'

INSERT INTO TestConstraint (MyEmailAddress)

SELECT 'Silly@>@Simple-talk.com'

INSERT INTO TestConstraint (MyEmailAddress)

SELECT 'Bill.Gates@Microsoft.com'

UPDATE TestConstraint

SET MyEmailAddress='dodgy@@simple-talk.com' WHERE MyID=1

/* the second one and fourth will fail. Ideally, one should be able to

bind a user-defined error message to the constraint. Again, sorry -

Sybase only!

if we want the constraint to check other columns, then we need a table

-level constraint rather then a column constraint*/

CREATE TABLE SecondTestConstraint

( MyID INT IDENTITY(1,1),

MyName VARCHAR(50),

MyDomain VARCHAR(50),

CONSTRAINT my_Second_check_constraint

CHECK (MyName+MyDomain NOT LIKE '%[^a-z-_.@]%'

AND LEN(REPLACE(MyName+MyDomain,'@',''))

=LEN(MyName+MyDomain)-1)

)

INSERT INTO SecondTestConstraint (MyName,MyDomain)

SELECT 'Robyn.Page','@Simple-talk.com'

INSERT INTO SecondTestConstraint (MyName,MyDomain)

SELECT 'Mad <andBad>','@Simple-talk.com'

So we see that the table-level check has done a check based on both columns.

A unique index permits no two rows to have the same index value, including NULL. A unique index checks that all data in the column is unique when it is created and also when an insert or update is attempted.

You cannot create a unique index on a column that contains null values in more than one row – an error is, obviously, triggered.

You can use the unique keyword on composite indexes.

CREATE UNIQUE INDEX idxUniqueEmail ON TestConstraint(MyEmailAddress)

INSERT INTO TestConstraint(MyEmailAddress) SELECT 'Gengis@Kahn.com'

INSERT INTO TestConstraint(MyEmailAddress) SELECT 'Kubla@Kahn.com'

INSERT INTO TestConstraint(MyEmailAddress) SELECT 'Robyn@Kahn.com'

INSERT INTO TestConstraint(MyEmailAddress) SELECT 'Anyone@Kahn.com'

INSERT INTO TestConstraint(MyEmailAddress) SELECT 'Admin@Kahn.com'

INSERT INTO TestConstraint(MyEmailAddress) SELECT 'Gengis@Kahn.com'

--yes, they all go in except for the last one which triggers an error

The use of referential constraints is beyond the scope of this article, as it is difficult to talk about them without going on to write the entire chapter of a book, touching on the complexities of referential data models, cascading deletes and so on.

This workshop started off when I was staring at the tree view of Enterprise Manager, and suddenly realising I’d never used Rules and wondering what they were for! I hope you’ve learned something new too.

Just a few questions just to check …

When would you choose a Rule as the best way of checking data going into your database
What are the advantages of defaults over default constraints?
Why would you choose a table-level constraint over a column level constraint?
How do you ensure that the value entering a table is unique to the column?
When would you choose a trigger to check on data over either a rule or default constraint?
What is the point of putting a default on a user-defined datatype?
What happens if you create a Unique index on a column that has two null values in it? Why?
how would you check in SQL whether a default or rule is bound to a user-Defined datatype?
What is the easiest way of ensuring that a column containing foreign keys actually reference valid primary keys in another table.
What are Rules and Defaults likely to be replaced by in future?

Register for Simple Talk

Robyn Page’s SQL Server Data Validation Workbench

Checking your data on entry

Introduction

Rules

Defaults

Triggers

Integrity Constraints

Article tags

About the author

Robyn Page

Robyn Page's contributions

Articles

Books

Top topics

Robyn Page's latest contributions:

SQL Server Matrix Workbench

Database Design Workbench – Keys

NET Performance Cribsheet

Checking your data on entry

Introduction

Rules

Defaults

Triggers

Integrity Constraints

Article tags

Recommended

About the author

Robyn Page

Robyn Page's contributions

Articles

Books

Top topics

Robyn Page's latest contributions:

SQL Server Matrix Workbench

Database Design Workbench – Keys

NET Performance Cribsheet