Checking your data on entry
Contents
Introduction
If you are completely confident about the data that is being inserted or updated in a table, then you won’t need constraints. Even with a complete logical interface of stored procedures you’ll still require them for development work because the impossible always seems to happen.
In the real world, however, and particularly if the real world includes programmers using ‘Dynamic SQL’, or data feeds into tables and so on, then they are a vital defence for the integrity of your data.
There are several different ways of putting constraints and checks on data, and the purpose of this workshop is just to point out the odd quirk that might be of practical help, whilst running through the list.
Rules, defaults, indexes, and triggers are used for:
- Requirement integrity – a column must have data in every row, no nulls
- Check or Validity – Is the data valid? a valid Email address for example. Triggers or rules are generally used.
- Uniqueness – no two table rows can have the same non-null values for the selected columns. Unique Indexes are used for this.
- Referential integrity – data inserted into a table column must already have matching data in another table column or another column in the same table.
I have a certain difficulty with this workbench, as the SQL Server team have given advanced notice of its intention to replace Rules and Defaults, probably with ANSI Domains. However, until that time comes, they are the most effective way to implement certain business rules in the database.
One could argue that they shouldn’t be used because they are deprecated, but the fact is that there is no realistic substitute in SQL 2005 for the functionality, and safety mechanisms, they provide.
Would you refuse to wear a seatbelt, on hearing that a better design was due in four years time?
Rules
Rules are the best way of ensuring that business rules, especially those that are liable to change, are implemented in one place only.
As mentioned above, Rules are considered deprecated in SQL Server 2005. However there is nothing yet that replaces the functionality they provide until Domains are properly implemented. One should be aware of possible problems when upgrading. BOL gives the mysterious advice to use Check constraints instead, as if the two devices had the same functionality.
Rules are used to check data that is being inserted or updated in columns. If you need to perform a fairly complex check, and it has to be done in more than one table, then a rule is very handy. One rule can be referenced from any number of tables or user-defined data types in the database. You can bind a rule to to a user-defined data type, so it will then apply to any column where the datatype is used. This is even more convenient and safe.
The process is…
- Create the rule using
CREATE RULE
. - Bind the rule to a column or user-defined datatype using sp_bindrule.
- Test the bound rule by inserting or updating data.
You can create a rule easily in EM or SMSS but don’t even think about altering the rule if you have bound it to an object. The interface was not implemented with any enthusiasm.
Imagine that you have a database in which you wish to handle postcodes. Normally, you’d want to put the whole messy business of checking that the format is valid into a stored procedure, as well as requiring that the front-end applications do all the appropriate filtering. However, this isn’t always possible. Whatever happens, complex code like this must be kept in one place and heavily documented.
Postcode validation criteria are as follows (slightly simplified):
- The entire length must be between 6 and 8 characters
- A space must be included
- The local (inward) code, to the right of the space, always 3 chars
- The first character of the local code must be numeric
- The second and third characters of the local code must be alpha
- The Sorting Office (outward) code the left of the gap, can be between
- 2 and 4 characters
- The first character of the Sorting Office (outward) code must be alpha
We put these criteria, which will define our RULE
, in a test harness to develop it and test it as much as possible, as they are not the easiest of objects to change.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 |
--start of the test harness DECLARE @Postcode VARCHAR(10) SELECT @Postcode='CO10 7SP' SELECT CASE WHEN --start of the validation rules LEN (LTRIM(@Postcode)) BETWEEN 6 AND 8--between 6 and eight chars AND LEN(REPLACE(@Postcode,' ',''))=LEN(@postcode)-1--one space only! --local (inward) code, to the right of the space, 3 characters AND--The second and third characters of the local code must be alpha SUBSTRING(@Postcode,CHARINDEX(' ',@Postcode)+1,3) LIKE '[0-9][a-Z][a-z]' AND--first character of the Sorting Office (outward) code must be alpha SUBSTRING(@Postcode,1,CHARINDEX(' ',@Postcode)-1) LIKE '[a-z][0-9a-z]%' AND--Sorting Office (outward) code the left of the gap, between 2 and 4 CHARINDEX(' ',@Postcode) BETWEEN 3 AND 5 --one number followed by two letters. --end of the validation rules THEN 'good' ELSE 'bad' END --end of the test harness /* Now we create the RULE You can use any name for the argument (we chose @Postcode), but the first character must be "@." Make sure it is something meaningful You can use built-in functions but anything that references other database objects is forbidden. */ IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N'[dbo].[PostcodeValidation]') AND OBJECTPROPERTY(id, N'IsRule') = 1) DROP RULE [dbo].[PostcodeValidation] GO CREATE RULE PostcodeValidation AS LEN (LTRIM(@Postcode)) BETWEEN 6 AND 8--between 6 and eight chars AND LEN(REPLACE(@Postcode,' ',''))=LEN(@postcode)-1--one space only! --local (inward) code, to the right of the space, 3 characters AND--The second and third characters of the local code must be alpha SUBSTRING(@Postcode,CHARINDEX(' ',@Postcode)+1,3) LIKE '[0-9][a-Z][a-z]' AND--first character of the Sorting Office (outward) code must be alpha SUBSTRING(@Postcode,1,CHARINDEX(' ',@Postcode)-1) LIKE '[a-z][0-9a-z]%' AND--Sorting Office (outward) code the left of the gap, between 2 and 4 CHARINDEX(' ',@Postcode) BETWEEN 3 AND 5 --one number followed by two letters. GO --then we apply the documentation EXEC sp_addextendedproperty 'MS_Description', 'length between 6 and 8 characters Containing one space only 3 characters to the right of the space of which the first myst be numeric and the other two alpha Before the space can be between 2 and 4 characters of which the first must be alpha', 'user', dbo, 'rule', PostcodeValidation |
With that done we can then bind the rule to a column of a table. You can bind a rule to as many columns as you wish within the database: it keeps everything neat without duplication of the implementation of business rules.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
CREATE TABLE TestOfRule (MyID INT IDENTITY(1,1), Postcode VARCHAR(10)) /* You cannot bind a rule to a text, image, or timestamp datatype column. You cannot bind rules to system tables. Still, who cares?*/ --First bind the rule to the table column EXEC sp_bindrule PostcodeValidation, 'TestOfRule.Postcode' --Now let's try out some inserts INSERT INTO TestOfRule(postcode) SELECT 'CM20 3EQ'--OK INSERT INTO TestOfRule(postcode) SELECT 'CM2 3EQ'--OK INSERT INTO TestOfRule(postcode) SELECT 'CM 3EQ'--OK INSERT INTO TestOfRule(postcode) SELECT 'CM2 30EQ'--not so good INSERT INTO TestOfRule(postcode) SELECT 'CM2 3EQ'--not so good INSERT INTO TestOfRule(postcode) SELECT 'SQUIFFY'--Not so good /*---error---! A column insert or update conflicts with a rule imposed by a previous CREATE RULE statement. The statement was terminated. The conflict occurred in database 'master', table 'TestOfRule', column 'Postcode'. The statement has been terminated. |
Now, could we use a function in a rule? This would make it so much easier to develop and we can do a bit more processing.
Sadly no. A missed opportunity.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 |
CREATE FUNCTION [dbo].[ufiCleverStuff] ( @string VARCHAR(8000)) RETURNS INT AS BEGIN RETURN 1 END ALTER TABLE dbo.TestOfRule ADD Address VARCHAR(255) NULL CREATE RULE AddressValidation AS dbo.ufiCleverStuff(@address)=1 --This will cause an error! /* However, we can bind the rule to a user-defined data type */ EXEC sp_addtype Postcode,'Varchar(10)',NONULL --NONULL=no nulls allowed EXEC sp_bindrule PostcodeValidation, 'Postcode' sp_help Postcode --just to chck that the rule is there (Rule_Name) --And use it wherever we want CREATE TABLE SecondTestOfRule (MyID INT IDENTITY(1,1), Postcode postcode) --and try it out SELECT * FROM SecondTestOfRule INSERT INTO SecondTestOfRule(postcode) SELECT 'ME3 5EQ' --OK INSERT INTO SecondTestOfRule(postcode) SELECT 'AR3 567' --didn't like the 567 bit INSERT INTO SecondTestOfRule(postcode) SELECT 'ME2 30RP' --not so good INSERT INTO SecondTestOfRule(postcode) SELECT 'ME 3RP' --OK INSERT INTO SecondTestOfRule(postcode) SELECT 'DE52 3EQ' --not so good INSERT INTO SecondTestOfRule(postcode) SELECT 'SQUIFFY' --Not so good UPDATE SecondTestOfRule SET postcode='SE34 2DS' WHERE myid=5 UPDATE SecondTestOfRule SET postcode='squiffy' WHERE myid=5 --good, it chucked it out /* Rules bound to columns take precedence over rules bound to user datatypes. */ --now what happens when we create a variable of the datatype DECLARE @MyPostcode Postcode SELECT @Mypostcode='&^bananas****' /* |
Not what you’d expect; it accepts a silly postcode. There is, sadly, no checking of a rule on variable with a user-Defined datatype.
Defaults
Defaults and rules seem to have attracted the displeasure of the SQL Standards committee.
Defaults are deprecated by SQL Server 2005 but are actually rather useful because they can be ‘bound’ to a user-defined Data Type as well as any column in the entire database. There is no other way of doing this.
Take a common example. I believe that every table should have a column that gives the insertion date. You can, of course laboriously set up a default constraint on every table but why not do the following…
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |
--just create a simple default of the current date and time CREATE DEFAULT DateOfInsertion AS GETDATE() --now document it EXEC sp_addextendedproperty 'MS_Description', 'this is a default supplying the current date', 'user', dbo, 'default', DateOfInsertion --add a new type called InsertionDate EXEC sp_addtype InsertionDate,'DateTime',NONULL --add the default to the type sp_bindefault DateOfInsertion, "InsertionDate" sp_help InsertionDate --check that the default is there (Default_name) --now the whole process is a lot simpler CREATE TABLE EventLog (MyID INT IDENTITY(1,1), TheDescription VARCHAR(100), insertionDate insertionDate) INSERT INTO Eventlog(theDescription) VALUES ('the martians are attacking') INSERT INTO Eventlog(theDescription) VALUES ('It is the treens') INSERT INTO Eventlog(theDescription) VALUES ('Call in Dan Dare') INSERT INTO Eventlog(theDescription) VALUES ('Dan Dare to the rescue') SELECT * FROM EventLog--note that the dates have been filled in /* but there is a feeling that there is something missing.. Why not have a nice identity field User-Defined Data Type too? */ EXEC sp_addtype counter,INT,'IDENTITY' -- Sybase only. Sorry --you'll get an error |
Triggers
Triggers can be quite complex, as Pop Rivett explains elsewhere on this site with his excellent ‘Auditing’ trigger. We’ll only discuss triggers as data constraints.
Imagine we want to check addresses that are being placed in an address table. We want to ensure that we’ve done reasonable sanity checks to the address. For a start, we believe that an address should contain at least four words and alphanumeric characters with just a sprinkling of punctuation
Here is one of Phil Factor’s routines (thanks Phil).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 |
CREATE FUNCTION [dbo].[ufiWordcountVarchar] --counts the number of --words in a VARCHAR. We need something like this to check a valid --address. ( @string VARCHAR(8000), @WordStartCharacters VARCHAR(255)='a-z', @WordCharacters VARCHAR(255)='-a-z''' ) RETURNS INT AS BEGIN DECLARE @Position INT, --index of current search @WhereWeAre INT,--index into string so far @wordcount INT,--the number of words found so far @next INT, --where the next search starts @size INT --the total size of the text SELECT @WhereWeAre=1,@size=LEN(@string),@Position=1,@wordcount=0 WHILE @Position>0 BEGIN SELECT @Position=PATINDEX('%['+@wordStartCharacters+']%', SUBSTRING(@string,@whereWeAre,8000)) IF @Position>0 BEGIN SELECT @next=@WhereWeAre+@Position,@wordcount=@Wordcount+1 SELECT @Position=PATINDEX('%[^'+@wordCharacters+']%', SUBSTRING(@string,@next,8000)+' ') SELECT @WhereWeAre=@next+@Position END END RETURN @wordcount END GO /* so we create a test table and add a trigger to it, putting the two checks in the trigger */ CREATE TABLE MyAddresses (MyID INT IDENTITY (1,1), Address VARCHAR(100)) GO CREATE TRIGGER tCheckAddress_MyAddress ON MyAddresses FOR INSERT, UPDATE AS BEGIN IF (SELECT MIN(dbo.ufiWordcountVarchar(Address,DEFAULT,DEFAULT)) FROM inserted)<4 OR EXISTS (SELECT 1 FROM inserted WHERE address LIKE '%[^-a-z0-9''",.()]%') BEGIN ROLLBACK TRANSACTION RAISERROR ('Suspect address inserted into MyAddresses',16,1) END END --not forgetting the documentation EXEC sp_addextendedproperty 'MS_Description', 'This checks to see if there are at least four words in the address and checks for characters you wouldn''t expect to see in an address ', 'user', dbo, 'table', MyAddresses,'trigger', tCheckAddress_MyAddress INSERT INTO MyAddresses (address) VALUES ('32, Acacia Avenue, Goldhay, Berkshire') INSERT INTO MyAddresses (address) VALUES ('12 the Firs, Clive West Way, Slough, Berks') INSERT INTO MyAddresses (address) VALUES ('46, the promenade, felixstowe essex fizzbang<>*') INSERT INTO MyAddresses (address) VALUES ('Buckingham palace') UPDATE Myaddresses SET address='Aargh!' WHERE MyID=2 SELECT * FROM MyAddresses |
Integrity Constraints
Integrity Constraints, like rules, limit or ‘constrain’ you in what you can put in a table or column. Unlike triggers, integrity constraints cannot cascade changes through related tables in the database, enforce complex restrictions by referencing other database objects, perform “what if” analyses or roll back the current transaction as a result of enforcing data integrity. (With triggers, you can either roll back or continue the transaction, depending on how you handle referential integrity.) Nevertheless they are the most commonly used constraint and simple to define and alter.
There are three types of Table-level constraint:
- Unique and primary key constraints ( no two rows in a table have the same values in the specified columns – with a primary key constraint it can’t be null either).
- Referential integrity constraints (enforces the rule that, for a specific column, there must already be matching data in the column it references).
- Check constraints limit the values of data inserted into columns.
We’re most concerned with Check constraints in this article. Like a rule, the check has to be an expression that would fit in a WHERE
or IF
Clause, but can’t involve a subquery. You can apply several constraints on the one column.
Imagine we want to check that an email address is valid.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 |
CREATE TABLE TestConstraint ( MyID INT IDENTITY(1,1), MyEmailAddress VARCHAR(50) CONSTRAINT my_check_constraint CHECK (MyEmailAddress NOT LIKE '%[^a-z-_.@]%' AND LEN(REPLACE(MyEmailAddress,'@',''))=LEN(MyEmailAddress)-1) ) EXEC sp_addextendedproperty 'MS_Description', 'Check to make sure that only valid characters are in the email address and there is only one @ sign', 'user', dbo, 'table', testconstraint,'constraint', my_check_constraint INSERT INTO TestConstraint (MyEmailAddress) SELECT 'Robyn.Page@Simple-talk.com' INSERT INTO TestConstraint (MyEmailAddress) SELECT 'Silly@>@Simple-talk.com' INSERT INTO TestConstraint (MyEmailAddress) SELECT 'Bill.Gates@Microsoft.com' UPDATE TestConstraint SET MyEmailAddress='dodgy@@simple-talk.com' WHERE MyID=1 /* the second one and fourth will fail. Ideally, one should be able to bind a user-defined error message to the constraint. Again, sorry - Sybase only! if we want the constraint to check other columns, then we need a table -level constraint rather then a column constraint*/ CREATE TABLE SecondTestConstraint ( MyID INT IDENTITY(1,1), MyName VARCHAR(50), MyDomain VARCHAR(50), CONSTRAINT my_Second_check_constraint CHECK (MyName+MyDomain NOT LIKE '%[^a-z-_.@]%' AND LEN(REPLACE(MyName+MyDomain,'@','')) =LEN(MyName+MyDomain)-1) ) INSERT INTO SecondTestConstraint (MyName,MyDomain) SELECT 'Robyn.Page','@Simple-talk.com' INSERT INTO SecondTestConstraint (MyName,MyDomain) SELECT 'Mad <andBad>','@Simple-talk.com' |
So we see that the table-level check has done a check based on both columns.
A unique index permits no two rows to have the same index value, including NULL
. A unique index checks that all data in the column is unique when it is created and also when an insert or update is attempted.
You cannot create a unique index on a column that contains null values in more than one row – an error is, obviously, triggered.
You can use the unique
keyword on composite indexes.
1 2 3 4 5 6 7 8 |
CREATE UNIQUE INDEX idxUniqueEmail ON TestConstraint(MyEmailAddress) INSERT INTO TestConstraint(MyEmailAddress) SELECT 'Gengis@Kahn.com' INSERT INTO TestConstraint(MyEmailAddress) SELECT 'Kubla@Kahn.com' INSERT INTO TestConstraint(MyEmailAddress) SELECT 'Robyn@Kahn.com' INSERT INTO TestConstraint(MyEmailAddress) SELECT 'Anyone@Kahn.com' INSERT INTO TestConstraint(MyEmailAddress) SELECT 'Admin@Kahn.com' INSERT INTO TestConstraint(MyEmailAddress) SELECT 'Gengis@Kahn.com' --yes, they all go in except for the last one which triggers an error |
The use of referential constraints is beyond the scope of this article, as it is difficult to talk about them without going on to write the entire chapter of a book, touching on the complexities of referential data models, cascading deletes and so on.
This workshop started off when I was staring at the tree view of Enterprise Manager, and suddenly realising I’d never used Rules and wondering what they were for! I hope you’ve learned something new too.
Just a few questions just to check …
- When would you choose a Rule as the best way of checking data going into your database
- What are the advantages of defaults over default constraints?
- Why would you choose a table-level constraint over a column level constraint?
- How do you ensure that the value entering a table is unique to the column?
- When would you choose a trigger to check on data over either a rule or default constraint?
- What is the point of putting a default on a user-defined datatype?
- What happens if you create a Unique index on a column that has two null values in it? Why?
- how would you check in SQL whether a default or rule is bound to a user-Defined datatype?
- What is the easiest way of ensuring that a column containing foreign keys actually reference valid primary keys in another table.
- What are Rules and Defaults likely to be replaced by in future?
Load comments