Often, we have code that works perfectly well for a particular purpose, and then find that we need to implement some very similar functionality in another database. It is all too tempting to just copy the code, adapt it to meet the new requirements, and then deploy this slightly modified version of the same code. However, every time we copy and paste code in this manner, we are exposed to the following risk: the requirements change, and we fail to change the code in both places.
Whenever evidence of repetition is found in the code base, a defensive programmer should seek to refactor, so that the code to tackle a given problem or enforce a given rule is implemented in one place only. In other words, common logic should be refactored into a single reusable code unit, in the form of a constraint, stored procedure, trigger, user-defined function (UDF), or index. Whichever approach we use in each particular case, this proper form of code reuse reduces the possibility of bugs and is a vitally important part of defensive programming.
Unfortunately, many developers find it difficult to choose the correct implementation for the given requirement; in this article I will offer some useful pointers as to the approach that will lead to the most defensive, and therefore robust, code. We must, as always, benchmark our solutions because the one that most conveniently promotes reuse is not necessarily the one that will perform the best.
Specifically, this article will cover:
- why copy-and-paste will get you into trouble
- how proper code reuse will help
- using views to encapsulate simple queries
- using UDFs to encapsulate parameterized queries; and why UDFs may sometimes be preferable to stored procedures for this requirement
- potential performance issues with UDFs
- using constraints, triggers and filtered indexes to implement business logic in one place.
The Dangers of Copy-and-Paste
The biggest problem with copy-and-paste as a means of solving a set of similar problems is that, of course, it leads to code duplication. In turn, this means that we need to maintain multiples copies of essentially the same code, but with each copy subtly modified to suit a particular need. The real danger arises when requirements change, and we need to make sure that this is reflected, not just in the original code, but in all the subsequent copies. We can easily demonstrate this risk with an example. Listing 1 creates the Sales
table and loads it with some test data.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
CREATE TABLE dbo.Sales ( SalesID INT NOT NULL IDENTITY PRIMARY KEY , StateCode CHAR(2) NOT NULL , SaleDateTime DATETIME NOT NULL , Amount DECIMAL(10, 2) NOT NULL ) ; GO SET NOCOUNT ON ; DECLARE @d DATETIME , @i INT ; SET @d = '20091002' ; SET @i = 0 ; WHILE @i < 40 BEGIN ; INSERT INTO dbo.Sales ( StateCode , SaleDateTime , Amount ) SELECT 'CA' , @d , case WHEN @d <'20091001' THEN 5000000 ELSE 5000 END UNION ALL SELECT 'OR' , @d , case WHEN @d <'20091001' THEN 1000000 ELSE 1000 END ; SELECT @d = DATEADD(day, -1, @d) , @i = @i + 1 ; END ; |
Listing 2 shows the stored procedure, SelectTotalSalesPerStateForMonth
, which returns the total sales per state for a given month.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
CREATE PROCEDURE dbo.SelectTotalSalesPerStateForMonth @AsOfDate DATETIME AS SELECT SUM(Amount) AS SalesPerState , StateCode FROM dbo.Sales -- month begins on the first calendar day of the month WHERE SaleDateTime >= DATEADD(month, DATEDIFF(month, '19900101', @AsOfDate), '19900101') AND SaleDateTime <= @AsOfDate GROUP BY StateCode ; |
At the time we developed this code, our understanding of a report “for a given month” is one that covers the period of time from the first calendar day of the month until the day we run the report. For this purpose, our stored procedure serves the customers’ needs well, and we soon receive a request for a similar report, returning the average sales per state, for a given month. Note that our new report is required to use the same definition of “for a given month.”
It is very tempting to just copy the existing SelectTotalSalesPerStateForMonth
procedure, and replace sum
with avg
to meet the new requirements, as shown in Listing 3.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
CREATE PROCEDURE dbo.SelectAverageSalesPerStateForMonth @AsOfDate DATETIME AS SELECT AVG(Amount) AS SalesPerState , StateCode FROM dbo.Sales -- month begins on the first calendar day of the month WHERE SaleDateTime >= DATEADD(month, DATEDIFF(month, '19900101', @AsOfDate), '19900101') AND SaleDateTime <= @AsOfDate GROUP BY StateCode ; |
In this way, we have completed the task in just a few seconds and, in the short term at least, it will do the job.
Suppose, however, that at some later time the users request to change the definition of “for a given month” to “thirty consecutive calendar days, ending on the day we run the report.” Unfortunately, the definition of “for a given month” is implemented twice, both in SelectTotalSalesPerStateForMonth
and in SelectAverageSalesPerStateForMonth
. Even if one and the same person developed them both, it is possible to forget it by the time we need to implement the change. Even if it is clearly documented that both procedures should use one and the same definition, it is still possible that the developer implementing the change has failed to modify both stored procedures in a consistent way.
Suppose, for example, that only the SelectAverageSalesPerStateForMonth
stored procedure was modified to meet this new requirement. Listing 4 shows how it was changed.
1 2 3 4 5 6 7 8 9 10 |
ALTER PROCEDURE dbo.SelectAverageSalesPerStateForMonth @AsOfDate DATETIME AS SELECT AVG(Amount) AS SalesPerState , StateCode FROM dbo.Sales -- month means 30 calendar days WHERE SaleDateTime >= DATEADD(day, -29, @AsOfDate) AND SaleDateTime <= @AsOfDate GROUP BY StateCode ; |
When we make such changes, it is very easy to forget that we have implemented the definition of “for a given month” in two places. If we update the definition in one place and not the other, we will get inconsistent results, as demonstrated by Listing 5.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
PRINT 'Total Sales Per State For Month:' ; EXEC dbo.SelectTotalSalesPerStateForMonth @AsOfDate = '20091005' ; PRINT 'Average Sales Per State For Month:' ; EXEC dbo.SelectAverageSalesPerStateForMonth @AsOfDate = '20091005' ; Total Sales Per State For Month: SalesPerState StateCode --------------------------------------- --------- 10000.00 CA 2000.00 OR (2 row(s) affected) Average Sales Per State For Month: SalesPerState StateCode --------------------------------------- --------- 4630000.000000 CA 926000.000000 OR (2 row(s) affected) |
Clearly the average sales size for the state of California (4,630,000) is many times greater than the total sales for the same state (10,000), which makes no sense at all. In this example, I have deliberately used test data that makes the discrepancy obvious. In general, however, such discrepancies may be more subtle and difficult to spot, so they can lurk around for a long time.
As this example clearly demonstrates, when we cut and paste code, we expose our code to the possibility of bugs if our requirements change, and we fail to change each of the multiple implementations of one and the same logic in exactly the same way. In short, copy-and-paste coding is a direct violation of the DRY (Don’t Repeat Yourself) principle, which is so fundamental in software engineering.
The DRY principle…
…was originally stated by Andy Hunt see http://en.wikipedia.org/wiki/Andy_Hunt_(author) and Dave Thomas see http://en.wikipedia.org/wiki/Dave_Thomas_(author) in their book The Pragmatic Programmer. For details, go to http://en.wikipedia.org/wiki/The_Pragmatic_Programmer. I encourage you to read this book; it is very relevant to every programmer.
The code to implement a given logic should be implemented once, and once only, and reused by all applications that need it. However, of course, due care must be taken when reusing SQL code. Careless reuse of code can lead to maintenance and performance issues, especially when this reuse takes the form of scalar UDFs. We cannot reuse code without first verifying that it runs fast enough.
How Reusing Code Improves its Robustness
Rather than repeat the same logic in multiple places, we need to refactor the common functionality out of our two stored procedures. We can implement the definition of “sales for a given month” in an inline UDF, as shown in Listing 6.
1 2 3 4 5 6 7 8 9 10 11 12 |
CREATE FUNCTION dbo.SalesForMonth (@AsOfDate DATETIME) RETURNS TABLE AS RETURN ( SELECT SalesID , StateCode , SaleDateTime , Amount FROM dbo.Sales WHERE SaleDateTime >= DATEADD(day, -29, @AsOfDate) AND SaleDateTime <= @AsOfDate ) ; |
This new inline UDF can then be used in both stored procedures.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
ALTER PROCEDURE dbo.SelectTotalSalesPerStateForMonth @AsOfDate DATETIME AS BEGIN SELECT SUM(Amount) AS SalesPerState , StateCode FROM dbo.SalesForMonth(@AsOfDate) GROUP BY StateCode ; END ; GO ALTER PROCEDURE dbo.SelectAverageSalesPerStateForMonth @AsOfDate DATETIME AS BEGIN SELECT AVG(Amount) AS SalesPerState , StateCode FROM dbo.SalesForMonth(@AsOfDate) GROUP BY StateCode ; END ; |
After this refactoring, our two stored procedures are guaranteed to have the same definition of “for a given month.” We can rerun Listing 5 and try them out. If, at some later date, we change the definition of the reporting period again, we will have to modify only one module, SalesForMonth
.
Can we reuse the definition of the reporting period in other queries against other tables? We can at least try to go one step further and have one module define our reporting period and do nothing else. As usual, we should verify that the performance is still acceptable. The code in Listing 8 shows how to implement the definition of reporting period as an inline UDF.
1 2 3 4 5 6 7 |
CREATE FUNCTION dbo.MonthReportingPeriodStart (@AsOfDate DATETIME ) RETURNS TABLE AS RETURN ( SELECT DATEADD(day, -29, @AsOfDate) AS PeriodStart ) ; |
We can utilize this inline UDF when we implement the “sales for a given month” functionality.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
ALTER FUNCTION dbo.SalesForMonth ( @AsOfDate DATETIME ) RETURNS TABLE AS RETURN ( SELECT SalesID , StateCode , SaleDateTime , Amount FROM dbo.Sales AS s CROSS APPLY dbo.MonthReportingPeriodStart(@AsOfDate) AS ps WHERE SaleDateTime >= ps.PeriodStart AND SaleDateTime <= @AsOfDate ) ; |
You can rerun Listing 5 one more time to verify that both our stored procedures still work correctly.
Alternatively, we can use a scalar UDF to implement the definition of reporting period, as shown in Listing 10.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
-- being defensive, we must drop the old implementation -- so that reporting periods are implemented -- only in one place DROP FUNCTION dbo.MonthReportingPeriodStart ; GO CREATE FUNCTION dbo.MonthReportingPeriodStart ( @AsOfDate DATETIME ) RETURNS DATETIME AS BEGIN ; DECLARE @ret DATETIME ; SET @ret = DATEADD(day, -29, @AsOfDate) ; RETURN @ret ; END ; |
We also have to change our SalesForMonth
function, so that it utilizes our new scalar UDF, as shown in Listing 11.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
ALTER FUNCTION dbo.SalesForMonth ( @AsOfDate DATETIME ) RETURNS TABLE AS RETURN ( SELECT SalesID , StateCode , SaleDateTime , Amount FROM dbo.Sales AS s WHERE SaleDateTime >= dbo.MonthReportingPeriodStart(@AsOfDate) AND SaleDateTime <= @AsOfDate ) ; |
Note that the new implementation of SalesForMonth
is simpler than the previous one (Listing 9). Instead of using the CROSS
APPLY
clause to utilize the inline UDF, we can just invoke the scalar UDF directly in the WHERE
clause.
In fact, however, the CROSS APPLY
version will perform better in many cases. As always when we reuse code, we need to benchmark the performance of each of the possible approaches before making a choice. In some cases, chaining functions can lead to bad performance so, depending on the results of our benchmarking, we might even have to abandon the SalesForMonth
and MonthReportingPeriodStart
UDFs and return to the simpler function from Listing 6.
The basic fact remains, however, that implementing the same logic in multiple places increases the possibility of bugs when our requirements change. Instead, we should aim for sensible code reuse wherever possible, and UDFs are just one of the means to achieve this. Over the coming sections, we’ll discuss other ways in which we can reuse T-SQL code, as dictated by the given circumstances. Overall, reusing code is a very important component of defensive programming, and I cannot emphasize strongly enough how much it can improve the robustness of our code.
Wrapping SELECTs in Views
In some cases, it makes sense to wrap a frequently-used query in a view, as shown in Listing 12.
1 2 3 4 5 |
CREATE VIEW dbo.TotalSalesByState AS SELECT SUM(Amount) AS TotalSales, StateCode FROM dbo.Sales GROUP BY StateCode ; |
You can SELECT
from views in exactly the same way as you can SELECT
from tables, so views are very convenient and useful. However, views do not offer the ability to provide parameters to the SELECT
statements that we are reusing. When this requirement arises, we reuse SELECT
statements by wrapping them either in stored procedures or in user-defined functions.
As usual, we need to consider performance whenever we choose to use views. Typically views do not cause any performance degradation at all. However, we need to use them in moderation: having to deal with too many layers of nested views may overwhelm the optimizer and cause it to choose a suboptimal plan.
Reusing Parameterized Queries: Stored Procedures versus Inline UDFs
If we want to reuse parameterized queries, it is usually preferable to wrap them in user-defined functions. It is typically less convenient to reuse parameterized queries that are wrapped in stored procedures, as the following examples will demonstrate.
Let’s say we have a stored procedure that returns all sales for the month, across all states, as shown in Listing 13.
1 2 3 4 5 6 7 8 9 10 |
CREATE PROCEDURE dbo.SelectSalesForMonth @AsOfDate DATETIME AS BEGIN ; SELECT Amount , StateCode FROM dbo.Sales WHERE SaleDateTime >= DATEADD(day, -29, @AsOfDate) AND SaleDateTime <= @AsOfDate ; END ; GO |
Hopefully, you spotted the missed opportunity for code reuse in this listing. We should have reused our MonthReportingPeriodStart
in the WHERE
clause; I leave this as an exercise for the reader.
We now need to develop a stored procedure that retrieves the total sales per state for a given month, and we want to reuse the SelectSalesForMonth
stored procedure, Although it’s possible to do this, we will need to create a table variable or a temporary table with a structure that matches the structure of the result set returned by stored procedure, as shown in Listing 14.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
CREATE PROCEDURE dbo.SelectSalesPerStateForMonth @AsOfDate DATETIME AS BEGIN ; DECLARE @SalesForMonth TABLE ( StateCode CHAR(2) , Amount DECIMAL(10, 2) ) ; INSERT INTO @SalesForMonth ( Amount , StateCode ) EXEC dbo.SelectSalesForMonth @AsOfDate ; SELECT SUM(Amount) AS TotalSalesForMonth , StateCode FROM @SalesForMonth GROUP BY StateCode ORDER BY StateCode ; END ; GO |
We can run a smoke test to verify that our two stored procedures work.
1 2 |
EXEC dbo.SelectSalesForMonth @AsOfDate = '20091002' ; EXEC dbo.SelectSalesPerStateForMonth @AsOfDate = '20091002' ; |
So far so good; we have reused the code wrapped in SelectSalesForMonth
procedure and it works. However, now suppose we want to select the state with the highest total sales for a given month. It looks as if we can simply reuse the SelectSalesPerStateForMonth
procedure, again with a slight modification to create a table variable or a temporary table, as shown in Listing 16.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
CREATE PROCEDURE dbo.SelectStateWithBestSalesForMonth @AsOfDate DATETIME AS BEGIN ; DECLARE @SalesForMonth TABLE ( TotalSales DECIMAL(10, 2) , StateCode CHAR(2) ) ; INSERT INTO @SalesForMonth ( TotalSales , StateCode ) EXEC dbo.SelectSalesPerStateForMonth @AsOfDate ; SELECT TOP (1) TotalSales , StateCode FROM @SalesForMonth ORDER BY TotalSales DESC ; END ; |
Unfortunately, although the procedure creates, it does not work.
1 2 3 4 5 |
EXEC dbo.SelectStateWithBestSalesForMonth @AsOfDate = '20091002' ; Msg 8164, Level 16, State 1, Procedure SelectSalesPerStateForMonth, Line 10 An INSERT EXEC statement cannot be nested. |
Unfortunately, the INSERT...EXEC
approach that we used in SelectSalesPerStateForMonth
procedure cannot be nested. This is a very serious limitation.
The two inline UDFs shown in Listing 18 implement the same requirements. Note that the TotalSalesPerStateForMonth
function implements the same functionality as our previous SelectTotalSalesPerStateForMonth
stored procedure. As per our rules of code reuse, we would only ever implement one or the other, not both, in our solutions.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
CREATE FUNCTION dbo.TotalSalesPerStateForMonth ( @AsOfDate DATETIME ) RETURNS TABLE AS RETURN ( SELECT StateCode , SUM(Amount) AS TotalSales FROM dbo.SalesPerStateForMonth(@AsOfDate) GROUP BY StateCode ) ; GO CREATE FUNCTION dbo.StateWithBestSalesForMonth ( @AsOfDate DATETIME ) RETURNS TABLE AS RETURN ( SELECT TOP (1) StateCode , TotalSales FROM dbo.TotalSalesPerStateForMonth(@AsOfDate) ORDER BY TotalSales DESC ) ; |
In contrast to what we saw in Listing 17, our attempt to reuse result sets returned from nested inline UDFs works just fine.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
SELECT * FROM dbo.TotalSalesPerStateForMonth ( '20091002' ) ; SELECT * FROM dbo.StateWithBestSalesForMonth ( '20091002' ) ; StateCode TotalSales --------- --------------------------------------- CA 140010000.00 OR 28002000.00 (2 row(s) affected) StateCode TotalSales --------- --------------------------------------- CA 140010000.00 (1 row(s) affected) |
It is often easier to reuse code when it is wrapped in inline UDFs than when it is wrapped in stored procedures. I should emphasize that I refer only to inline UDFs, not to all three varieties of UDF. Whenever we are deciding whether to use stored procedures or UDFs, we also need to consider the following:
INSERT EXEC
requires you to create a table variable or temporary table before doing the call; stored procedures can have multiple and/or varying result sets, depending on code path, causing all kinds of problems withINSERT EXEC
- certain functionality, such as data modifications and
TRY...CATCH
blocks, is not allowed in UDFs - the inline UDF, like a view, is expanded in the execution plan, giving the optimizer the choice to take shortcuts, or even remove joined tables if their columns are not used.
Let’s discuss performance considerations and see why it might not be a good idea to use scalar UDFs.
Scalar UDFs and Performance
Hopefully, the examples so far have demonstrated that laying out code in simple reusable modules can simplify maintenance, and reduce the chance of bugs when requirements change.
Although the emphasis of this book is on writing correct and robust code, we must, in this article, discuss performance. The reason is simple: careless code reuse can seriously hurt performance. For example, in some cases scalar UDFs may perform very poorly, and I will provide an example that demonstrates this, for SQL Server 2005 and 2008. Of course, in future versions of SQL Server the relative performance of the different flavors of UDFs may change, so it’s essential that you always benchmark the performance impact of code refactoring, and rerun these benchmarks when you upgrade to a new SQL Server version.
For this example, we’ll need to create a test table with a reasonable number of rows, so let’s first set up a 128K-row helper table, Numbers
, as shown in Listing 20, which we can use to populate the test table. These helper tables are a must-have in database development. If you already have your own version that suits the same purpose, then feel free to use that instead.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
CREATE TABLE dbo.Numbers ( n INT NOT NULL , CONSTRAINT PK_Numbers PRIMARY KEY ( n ) ) ; GO DECLARE @i INT ; SET @i = 1 ; INSERT INTO dbo.Numbers ( n ) VALUES ( 1 ) ; WHILE @i < 100000 BEGIN ; INSERT INTO dbo.Numbers ( n ) SELECT @i + n FROM dbo.Numbers ; SET @i = @i * 2 ; END ; |
Next, in Listing 21, we create the sample Packages
table and populate it using our Numbers
helper table.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
CREATE TABLE dbo.Packages ( PackageID INT NOT NULL , WeightInPounds DECIMAL(5, 2) NOT NULL , CONSTRAINT PK_Packages PRIMARY KEY ( PackageID ) ) ; GO INSERT INTO dbo.Packages ( PackageID , WeightInPounds ) SELECT n , 1.0 + ( n % 900 ) / 10 FROM dbo.Numbers ; |
The cost of shipping for a package is $1 if it weighs less than 5 pounds and $2 if it weighs 5 pounds or more. Listing 22 shows how to implement this simple algorithm, both as a scalar and as an inline UDF.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
CREATE FUNCTION dbo.GetShippingCost ( @WeightInPounds DECIMAL(5, 2) ) RETURNS DECIMAL(5, 2) AS BEGIN DECLARE @ret DECIMAL(5, 2) ; SET @ret = CASE WHEN @WeightInPounds < 5 THEN 1.00 ELSE 2.00 END ; RETURN @ret ; END ; GO CREATE FUNCTION dbo.GetShippingCost_Inline ( @WeightInPounds DECIMAL(5, 2) ) RETURNS TABLE AS RETURN ( SELECT CAST(CASE WHEN @WeightInPounds < 5 THEN 1.00 ELSE 2.00 END AS DECIMAL(5, 2)) AS ShippingCost ) ; |
Now, we are ready to examine the comparative performance of each function, using the simple benchmark shown in Listing 23.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
SET STATISTICS TIME ON ; SET NOCOUNT ON ; PRINT 'Using a scalar UDF' ; SELECT SUM(dbo.GetShippingCost(WeightInPounds)) AS TotalShippingCost FROM dbo.Packages ; PRINT 'Using an inline UDF' ; SELECT SUM(s.ShippingCost) AS TotalShippingCost FROM dbo.Packages AS p CROSS APPLY dbo.GetShippingCost_Inline(p.WeightInPounds) AS s ; PRINT 'Not using any funtions at all' ; SELECT SUM(CASE WHEN p.WeightInPounds < 5 THEN 1.00 ELSE 2.00 END) AS TotalShippingCost FROM dbo.Packages AS p ; SET STATISTICS TIME OFF ; |
Although both functions implement exactly the same algorithm, the performance is dramatically different. When we run this benchmark on SQL Server 2012 or 2014, the query that uses our scalar UDF runs dramatically slower. Also, in this particular case, the query which uses the inline UDF performs very well, although not as fast as the query that does not use any UDFs at all, as shown in Listing 24. Of course, when you run these benchmarks on your system, you may get different results.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
Using a scalar UDF ...<snip>... SQL Server Execution Times: CPU time = 1531 ms, elapsed time = 1552 ms. Using an inline UDF ...<snip>... SQL Server Execution Times: CPU time = 109 ms, elapsed time = 82 ms. Not using any functions at all ...<snip>... SQL Server Execution Times: CPU time = 32 ms, elapsed time = 52 ms. |
I am not saying that using inline UDFs never incurs any performance penalties; blanket statements do not belong in database programming, and we always need to consider the performance of each particular case separately. However, in many cases, inline UDFs perform very well.
Multi-statement Table-valued UDFs
Besides scalar and inline UDFs, there are multi-statement table-valued UDFs. I will not discuss or benchmark them here, because I feel I’ve already proved the point that we need to consider performance when we refactor code. However, it’s worth noting that, in general, while inline UDFs tend to be “performance neutral,” scalar and multi-statement ones tend to hurt performance if not used carefully, and should be rigorously tested and benchmarked. Be especially wary of using a multi-statement table-valued UDF in an APPLY
, since that may force the optimizer to re-execute the UDF for each row in the table the UDF is applied against.
If you are interested in learning about different flavors of UDF, I encourage you to read Books Online and Itzik Ben Gan’s T-SQL Querying book.
Reusing Business Logic: Stored Procedure, Trigger, Constraint or Index?
There are several ways in which we can choose to implement our business logic. For example, we could use:
- stored procedures
- constraints
- triggers
- unique filtered indexes.
Over the coming sections we’ll discuss the sort of situations where each approach may, or may not, be appropriate.
Use constraints where possible
In many cases, constraints are the easiest and simplest to use. To demonstrate this point, consider the Teams
table shown in Listing 25, with a primary key constraint on the TeamID
column.
1 2 3 4 5 6 |
CREATE TABLE dbo.Teams ( TeamID INT NOT NULL , Name VARCHAR(50) NOT NULL , CONSTRAINT PK_Teams PRIMARY KEY ( TeamID ) ) ; |
Since we wish to forbid access to the base tables, teams will be inserted into the table, one at a time, by calling a stored procedure. Our business rule is simple: team names must be unique. So, we need to decide where to implement this business rule. One choice is to enforce it in the stored procedure, as shown in Listing 27.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
CREATE PROCEDURE dbo.InsertTeam @TeamID INT , @Name VARCHAR(50) AS BEGIN ; -- This is not a fully-functional stored -- procedure. Error handling is skipped to keep -- the example short. -- Also potential race conditions -- are not considered in this simple module INSERT INTO dbo.Teams ( TeamID , Name ) SELECT @TeamID , @Name WHERE NOT EXISTS ( SELECT * FROM dbo.Teams WHERE Name = @Name ) ; -- we also need to raise an error if we -- already have a team with such a name END ; |
So, we have a stored procedure that enforces our rule, at least in the absence of high concurrency. However, what happens when we need another stored procedure that modifies a single row in the Teams
table, or one that merges a batch of new rows into that table? We’ll need to re-implement this same logic for every stored procedure that modifies this table. This is a form of copy-and-paste and is both time consuming and error prone.
Besides, unless you can guarantee that no applications can run modifications directly against the Teams
table, it’s likely that your business rule will be bypassed at some point, and inconsistent data will be introduced.
It is much easier and safer to just create the business rule once, in one place, as a UNIQUE
constraint, as shown in Listing 27.
1 2 |
ALTER TABLE dbo.Teams ADD CONSTRAINT UNQ_Teams_Name UNIQUE(Name) ; |
We can now let the database engine make sure that this business rule is always enforced, regardless of the module or command that modifies the table.
Turn to triggers when constraints are not practical
As we have seen, constraints are extremely useful in many simple cases. However, our business rules are often more complex, and it is sometimes not possible or not practical to use constraints. To demonstrate this point, let’s add one more table, TeamMembers
, which references the Teams
table through the TeamID
column, as shown in Listing 28.
1 2 3 4 5 6 7 8 9 10 11 12 |
CREATE TABLE dbo.TeamMembers ( TeamMemberID INT NOT NULL , TeamID INT NOT NULL , Name VARCHAR(50) NOT NULL , IsTeamLead CHAR(1) NOT NULL , CONSTRAINT PK_TeamMembers PRIMARY KEY ( TeamMemberID ) , CONSTRAINT FK_TeamMembers_Teams FOREIGN KEY ( TeamID ) REFERENCES dbo.Teams ( TeamID ) , CONSTRAINT CHK_TeamMembers_IsTeamLead CHECK ( IsTeamLead IN ( 'Y', 'N' ) ) ) ; |
Suppose that we need to implement the following business rule: no team can have more than two members. Implementing this business rule in a trigger is quite straightforward, as shown in Listing 29, and you only have to do it once. It is possible, but much more complex, to implement this rule via constraints.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
CREATE TRIGGER dbo.TeamMembers_TeamSizeLimitTrigger ON dbo.TeamMembers FOR INSERT, UPDATE AS IF EXISTS ( SELECT * FROM ( SELECT TeamID , TeamMemberID FROM inserted UNION SELECT TeamID , TeamMemberID FROM dbo.TeamMembers WHERE TeamID IN ( SELECT TeamID FROM inserted ) ) AS t GROUP BY TeamID HAVING COUNT(*) > 2 ) BEGIN ; RAISERROR('Team size exceeded limit',16, 10) ; ROLLBACK TRAN ; END ; |
With our business rule implemented in only one place, we can comprehensively test just one object. In order to test this trigger, we need some test data in our parent table, as shown in Listing 30.
1 2 3 4 5 6 7 8 9 10 11 12 |
INSERT INTO dbo.Teams ( TeamID , Name ) SELECT 1 , 'Red Bulls' UNION ALL SELECT 2 , 'Blue Tigers' UNION ALL SELECT 3 , 'Pink Panthers' ; |
The script shown next, in Listing 31, verifies that we can successfully add new team members, as long as the teams’ sizes do not exceed the limit imposed by our trigger.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |
-- adding team members to new teams INSERT INTO dbo.TeamMembers ( TeamMemberID , TeamID , Name , IsTeamLead ) SELECT 1 , 1 , 'Jill Hansen' , 'N' UNION ALL SELECT 2 , 1 , 'Sydney Hobart' , 'N' UNION ALL SELECT 3 , 2 , 'Hobart Sydney' , 'N' ; -- add more team members to existing teams BEGIN TRANSACTION ; INSERT INTO dbo.TeamMembers ( TeamMemberID , TeamID , Name , IsTeamLead ) SELECT 4 , 2 , 'Lou Larry' , 'N' ; ROLLBACK TRANSACTION ; |
The script shown next, in Listing 32, verifies that we can successfully transfer team members between teams, as long as the teams’ sizes do not exceed the limit.
1 2 3 4 5 6 7 8 9 |
BEGIN TRANSACTION ; UPDATE dbo.TeamMembers SET TeamID = TeamID + 1 ; ROLLBACK ; BEGIN TRANSACTION ; UPDATE dbo.TeamMembers SET TeamID = 3 - TeamID ; ROLLBACK ; |
So, we’ve proved that our trigger allows modifications that do not violate our business rules. Now we need to make sure that it does not allow modifications that do violate our business rules; there are quite a few cases, and we need to verify them all. First of all, Listing 33 verifies that we cannot add new team members if the resulting teams’ sizes are too big. All the statements in the script must, and do, fail.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 |
-- attempt to add too many team members -- to a team which already has members INSERT INTO dbo.TeamMembers ( TeamMemberID , TeamID , Name , IsTeamLead ) SELECT 4 , 2 , 'Calvin Lee' , 'N' UNION ALL SELECT 5 , 2 , 'Jim Lee' , 'N' ; GO -- attempt to add too many team members to an empty team INSERT INTO dbo.TeamMembers ( TeamMemberID , TeamID , Name , IsTeamLead ) SELECT 4 , 3 , 'Calvin Lee' , 'N' UNION ALL SELECT 5 , 3 , 'Jim Lee' , 'N' UNION ALL SELECT 6 , 3 , 'Jake Lee' , 'N' ; |
Also, we need to make sure that we cannot transfer team members if the resulting teams’ sizes are too big, as shown in Listing 34. Again, all the following statements fail as expected.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
-- attempt to transfer members from other teams -- to a team which is full to capacity UPDATE dbo.TeamMembers SET TeamID = 1 WHERE TeamMemberID = 3 ; GO -- attempt to transfer too many team members -- to a team that is not full yet UPDATE dbo.TeamMembers SET TeamID = 2 WHERE TeamMemberID IN ( 1, 2 ) ; GO -- attempt to transfer too many team members -- to an empty team UPDATE dbo.TeamMembers SET TeamID = 3 ; |
The amount of testing needed to ensure that a trigger works as expected can be quite substantial. However, this is the easiest alternative; if we were to re-implement this business rule in several stored procedures, then the same amount of testing required for the single trigger would be required for each of these procedures, in order to ensure that every one of them implements our business rule correctly.
Unique filtered indexes
Last, but not least, in some cases filtered indexes also allow us to implement business rules. For example, suppose that we need to make sure that each team has at most one team lead. If you are using SQL Server 2008 and upwards, then a filtered index can easily implement this business rule, as shown in Listing 35. I encourage you to try out this index and see for yourself that it works.
1 2 3 |
CREATE UNIQUE NONCLUSTERED INDEX TeamLeads ON dbo.TeamMembers(TeamID) WHERE IsTeamLead='Y' ; |
Summary
I hope that this article has proved to you that a copy-and-paste approach to code reuse will lead to multiple, inconsistent versions of the same logic being scattered throughout your code base, and a maintenance nightmare.
It has also demonstrated how common logic can be refactored into a single reusable code unit, in the form of a constraint, stored procedure, trigger, UDF or index. This careful reuse of code will reduce the possibility of bugs and greatly improve the robustness of our code.
Unfortunately, performance considerations may prevent us from reusing our code to the fullest. Yet, with careful benchmarking, we can usually find a nice compromise and develop code that is easy to maintain but still performs well enough.
Specifically, I hope the article has taught you the following lessons in defensive programming:
- views are useful for simple reuse of non-parameterized queries
- for reuse of parameterized queries, inline UDFs are often preferable to stored procedures
- be wary of performance issues with scalar and multi-statement table-valued UDFs
- if possible, enforce reusable business logic in a simple constraint, or possibly a filtered index in SQL 2008 and upwards
- for more complex logic, triggers often offer the most convenient means of promoting reuse, but they require extensive testing.
Load comments