The Questions
- “Under what circumstances is character data truncated without triggering an error?”
- “SQL Server’s implicit conversions sometimes lead to unexpected query results, such as a value being returned as a data type different from what I expect. How do implicit conversions work?”
- “I’ve created a query that uses the ISNUMERIC function to verify whether a value is numeric and, if so, convert it to an integer. In some cases, however, the database engine generates an error because it’s trying to convert a value that’s not really numeric. What’s going on?”
- “I’m told I should use the BIT data type where possible to save space. Is it true that it saves space, or is this just syntactic sugar?”
- “When I use the DATALENGTH and LEN functions to calculate the size of a string, I often receive different results, even though I’d expect them to be the same. What’s the difference between the two functions?”
- “I’m troubleshooting a stored procedure that uses a COALESCE expression to return the first non-NULL value from a set of columns. However, the procedure often generates a conversion error, even when the value returned comes from the first column listed. Any idea why this might be occurring?”
- “I’m developing a query that contains a BETWEEN operator in the WHERE clause. The compared data includes a range of consecutive values that contain both letters and numbers, as in XYZ60 through XYZ700. Even when I specify the entire range, values are unexpectedly omitted from my query results. Any idea why this is occurring?”
- “I’ve seen some T-SQL code that contains ‘national character varying(max)’ listed as the data type, rather than one of the normal types, yet everything seems to work fine. Is this something new?”
- “When is it best to use the CHAR data type rather than VARCHAR?”
- “I’m building a query that joins a FLOAT column to a REAL column, but when I run the statement, it returns no results, even though I can see the matching values in the two columns. Why is this happening?”
- “I have a table that includes a VARCHAR column and a stored procedure that accesses data from the table, based on values in that column. The procedure includes an NVARCHAR parameter used to qualify the WHERE clause, which references the VARCHAR column. Does it matter that the data types are different between the parameter and the column?”
- “My query needs to calculate past and future dates based on an existing date as well as calculate the time difference between two dates. I’ve tried using arithmetic operators to perform the calculations, but the results are never right. How do I go about working with dates?”
- “I’m told never to use the SQL_VARIANT data type inappropriately. What is an appropriate use for this data type?”
“Under what circumstances is character data truncated without triggering an error?”
Never, one might hope. But SQL Server is a sneaky devil, preventing truncation some of the time, but not all the time. Sure, if you try to insert too large a value into a column, the database engine baulks, as it does with the following table:
IF OBJECT_ID('OurStuff', 'U') IS NOT NULL
DROP TABLE OurStuff;
GO
CREATE TABLE OurStuff
(
  StuffID INT NOT NULL IDENTITY PRIMARY KEY,
  StuffName VARCHAR(10) NOT NULL
);
GO
The StuffID column, the primary key, is configured with the INT data type, and the StuffName column is configured as VARCHAR(10). All very basic. And just as basic is the following INSERT statement:
INSERT INTO OurStuff(StuffName)
VALUES('Really Big Thing');
What we’re trying to do here is insert a value into the StuffName column that exceeds the data type’s specified length. Not surprisingly, SQL Server kicks out an error:
Msg 8152, Level 16, State 14, Line 1
String or binary data would be truncated.
The statement has been terminated.
Everything just as we’d expect. The value being inserted is too big, so the database engine jumps to the rescue, preventing any values from being truncated and, consequently, ensuring the integrity of the data.
Unfortunately, SQL Server is not quite so protective with its variables and parameters. Consider the following T-SQL script, which declares the @stuff variable and sets its value to a string that exceeds the data type’s specified length:
DECLARE @stuff VARCHAR(10) = 'Really Big Thing';
PRINT @stuff;
The database engine doesn’t blink. It simply assigns the string’s first 10 characters (Really Big) and goes about its business, pretending that all is well, when in fact our variable must limp along with a truncated value. The PRINT statement merely confirms our suspicions and returns the really wrong Really Big.
It gets worse. Imagine you’re using an ISNULL function to compare two variables, as in the following example:
DECLARE @stuff1 VARCHAR(5) = NULL,
  @stuff2 VARCHAR(10) = 'Big Thing';
PRINT ISNULL(@stuff1, @stuff2);
The first variable is defined as VARCHAR(5) and assigned a value of NULL. The second is defined as VARCHAR(10) and assigned the string value Big Thing. So far so good. Now we get to ISNULL. The thing with this function is that the first expression (in this case, @stuff1) determines the outputted data type, even if that expression returns a NULL. That means the @stuff2 value is confined by the @stuff1 data type constraints. As a result, our PRINT statement, rather than returning Big Thing, gives us Big T, which sounds more like a mob boss than a credible data item.
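If the shortened type of that first argument is the culprit, one way to sidestep this particular truncation, where it fits your logic, is to use COALESCE instead, because COALESCE returns the data type with the highest precedence among all its arguments rather than the type of the first one (more on that behavior later in this article). A minimal sketch:

DECLARE @stuff1 VARCHAR(5) = NULL,
  @stuff2 VARCHAR(10) = 'Big Thing';

-- ISNULL adopts the type of the first argument, VARCHAR(5),
-- so the result is silently truncated to Big T.
PRINT ISNULL(@stuff1, @stuff2);

-- COALESCE adopts the highest-precedence type, VARCHAR(10),
-- so the full string Big Thing survives.
PRINT COALESCE(@stuff1, @stuff2);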
Parameters too can fall victim to truncation without warning. Suppose we create the following stored procedure:
IF OBJECT_ID('AddStuff', 'P') IS NOT NULL
DROP PROCEDURE AddStuff;
GO
CREATE PROCEDURE AddStuff
  @stuff VARCHAR(10)
AS
INSERT INTO OurStuff(StuffName)
VALUES(@stuff);
GO
A simple example, of course, but it demonstrates an important concept. Notice that the @stuff parameter is defined as VARCHAR(10). Now suppose we pass in a value longer than 10 characters:
EXEC AddStuff 'Really Big Thing';
The database engine runs the procedure without a hitch and returns a message saying that one row is affected (the row inserted into the OurStuff table). We can verify this by running a simple SELECT statement:
SELECT * FROM OurStuff;
The results, shown in the following table, indicate that the value has indeed been truncated, although not once did SQL Server raise an exception.
| StuffID | StuffName |
| ------- | ---------- |
| 2 | Really Big |
So be aware of what can happen when working with variables and parameters. You’ll likely want to include in your T-SQL code the logic necessary to catch values that might be faulty. Obviously, you can’t rely on the database engine to do it for you.
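One possible pattern, shown here as a sketch only (the AddStuffSafe procedure name and the 10-character limit are illustrative), is to declare the parameter wider than the target column so the value arrives intact, and then reject anything too long before the INSERT runs:

IF OBJECT_ID('AddStuffSafe', 'P') IS NOT NULL
DROP PROCEDURE AddStuffSafe;
GO
CREATE PROCEDURE AddStuffSafe
  @stuff VARCHAR(100)  -- wider than the column so nothing is lost in transit
AS
IF LEN(@stuff) > 10
BEGIN
  RAISERROR('StuffName value ''%s'' exceeds 10 characters.', 16, 1, @stuff);
  RETURN;
END;
INSERT INTO OurStuff(StuffName)
VALUES(@stuff);
GO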
“SQL Server’s implicit conversions sometimes lead to unexpected query results, such as a value being returned as a data type different from what I expect. How do implicit conversions work?”
SQL Server is great at surreptitiously converting values from one type to another: CHAR to VARCHAR, FLOAT to NCHAR, BIGINT to MONEY. You get the picture. As long as the values are compatible with the target type, the database engine does all the work, and you get to kick back and reap the rewards.
Sort of.
Imagine the following scenario, in which we add a CHAR variable to an INT variable:
DECLARE @stuff1 CHAR(2) = '22',
  @stuff2 INT = 33;
SELECT @stuff1 + @stuff2 AS AllStuff,
  SQL_VARIANT_PROPERTY(@stuff1 + @stuff2, 'basetype') AS BaseType;
We declare the variables, assign values, and add them together. We also pull the base type of the returned value, just so we know what’s going on. The following table shows us the results of our SELECT statement:
| AllStuff | BaseType |
| -------- | -------- |
| 55 | int |
All looks fine on the surface. We add our INT value to our CHAR value and come up with the total (55), returned as an INT value. The database engine has implicitly converted the CHAR value to an INT value. The INT data type is used because INT takes precedence over CHAR. SQL Server is quite explicit when it comes to data type precedence. INT wins out over CHAR and VARCHAR every time. If a DATETIME value is tossed into the mix, it will beat out INT and just about all other types. (You can find details about precedence rules in the MSDN topic “Data Type Precedence.”)
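You can see precedence at work by mixing DATETIME and INT in a single expression. In the following sketch, the integer is implicitly converted (adding an integer to a DATETIME advances it by that many days), and SQL_VARIANT_PROPERTY confirms that datetime wins:

DECLARE @day DATETIME = '20140707',
  @num INT = 2;
SELECT @day + @num AS TwoDaysLater,
  SQL_VARIANT_PROPERTY(@day + @num, 'basetype') AS BaseType;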
Returning to our example, SQL Server converts the CHAR value to INT and then proceeds to add the two integers together, giving us a result of 55. But what if we didn’t want to add the values together but concatenate them instead? In that case, we’d have to explicitly convert the INT value:
DECLARE @stuff1 CHAR(2) = '22',
  @stuff2 INT = 33;
SELECT @stuff1 + CAST(@stuff2 AS CHAR(2)) AS AllStuff,
  SQL_VARIANT_PROPERTY(@stuff1 + CAST(@stuff2 AS CHAR(2)),
    'basetype') AS BaseType;
Now our results look quite different. Not only are the two values concatenated, but the value is also returned as a CHAR type:
| AllStuff | BaseType |
| -------- | -------- |
| 2233 | char |
You should also be aware of how SQL Server handles implicit conversions for numerical data. For example, suppose this time around, we create a DECIMAL variable and an INT variable, assign a value to the DECIMAL variable, and then set the INT variable to equal the DECIMAL variable:
DECLARE @stuff1 DECIMAL(8,4) = 1234.9999,
  @stuff2 INT;
SET @stuff2 = @stuff1;
SELECT @stuff1 AS Stuff1,
  @stuff2 AS Stuff2;
The SELECT statement now returns these results:
| Stuff1 | Stuff2 |
| ------ | ------ |
| 1234.9999 | 1234 |
Notice what the database engine has done. Rather than rounding the value up, as might be expected, it simply truncates the value, giving us our integer, without too great a concern for the original value. And the problem isn’t limited to DECIMAL-to-INT conversions. Check out what happens when we go from INT to REAL and back again:
DECLARE @stuff1 INT = 1080000000,
  @stuff2 INT = 1080000016,
  @stuff3 REAL;
SET @stuff3 = @stuff2;
SELECT @stuff3 AS Stuff3,
  CAST(@stuff3 AS INT) AS Stuff3int,
  CASE
    WHEN @stuff3 = @stuff1 THEN 'yes'
    ELSE 'no'
  END AS IsEqual;
We declare the INT variables and set their values, and then declare a REAL variable and set its value to equal the second INT variable. In our SELECT statement, we then retrieve the REAL variable as it is stored, convert it to an INT, and then check whether the first and third variables are equal. The following table shows our results:
| Stuff3 | Stuff3int | IsEqual |
| ------ | --------- | ------- |
| 1.08E+09 | 1080000000 | yes |
Notice that the Stuff3 value appears in scientific notation because the @stuff3 variable had insufficient precision to hold the original integer. In addition, when the variable is converted to an integer, it returns a value different from what it was assigned. Instead, it now equals the value stored in @stuff1.
In many cases, you shouldn’t leave it up to the database engine to do your conversions, no matter how tempting that might be. In fact, some developers insist that all conversions should be explicit. At the very least, whenever the integrity of the data could be brought into question, err on the side of converting the data explicitly.
“I’ve created a query that uses the ISNUMERIC function to verify whether a value is numeric and, if so, convert it to an integer. In some cases, however, the database engine generates an error because it’s trying to convert a value that’s not really numeric. What’s going on?”
The ISNUMERIC function can be a slippery one. On the surface, its purpose appears only to determine whether the submitted value is indeed a valid numeric type. But looks can be deceiving. Take the following T-SQL script. It creates a table with a VARCHAR column and then adds a mix of values to that column:
IF OBJECT_ID('OurStuff', 'U') IS NOT NULL
DROP TABLE OurStuff;
GO
CREATE TABLE OurStuff
(
  StuffID INT NOT NULL IDENTITY PRIMARY KEY,
  StuffName VARCHAR(10) NOT NULL
);
GO
INSERT INTO OurStuff(StuffName)
VALUES('abc'), ('def123'), ('456'), ('$789'),
  ('1d1'), (','), ('.'), ('$.,');
Nothing extraordinary, just your run-of-the-mill alphanumeric hodgepodge. But now let’s look at what happens when we apply the ISNUMERIC function to the StuffName column:
SELECT StuffID, StuffName,
  ISNUMERIC(StuffName) AS IsNumber
FROM OurStuff;
All we’re doing is pulling the table’s two columns, as well as returning a third column to test each value’s numeric status. The ISNUMERIC function returns a 1 if the tested value appears to be a numeric data type; otherwise, it returns a 0. Here’s what we get:
| StuffID | StuffName | IsNumber |
| ------- | --------- | -------- |
| 1 | abc | 0 |
| 2 | def123 | 0 |
| 3 | 456 | 1 |
| 4 | $789 | 1 |
| 5 | 1d1 | 1 |
| 6 | , | 1 |
| 7 | . | 1 |
| 8 | $., | 1 |
It should come as no surprise that the first two values are considered non-numeric and the third value is considered numeric. However, the function also considers $789 to be numeric, despite the dollar sign (or more precisely, because of the dollar sign), and 1d1 to be numeric because of its resemblance to scientific notation, whether or not that was the intent.
The sky might not fall because of these interpretations, but they could cause problems if you want to perform other operations based on the output of the ISNUMERIC function. For example, suppose we want to explicitly convert each value to an INT based on whether we think that value is numeric:
SELECT StuffID, StuffName,
  CASE
    WHEN ISNUMERIC(StuffName) = 1
      THEN CAST(StuffName AS INT)
    ELSE NULL
  END AS CaseResult
FROM OurStuff
WHERE StuffID BETWEEN 1 AND 3;
In this case, we’re retrieving only the first three rows from our table. If the ISNUMERIC function returns a 1, we convert the value to the INT type; otherwise, we return a NULL value. As to be expected, the first two rows each return a NULL and the third row returns the value 456 as an INT, as shown in the following results:
| StuffID | StuffName | CaseResult |
| ------- | --------- | ---------- |
| 1 | abc | NULL |
| 2 | def123 | NULL |
| 3 | 456 | 456 |
Now let’s retrieve the fourth row from the table:
SELECT StuffID, StuffName,
  CASE
    WHEN ISNUMERIC(StuffName) = 1
      THEN CAST(StuffName AS INT)
    ELSE NULL
  END AS CaseResult
FROM OurStuff
WHERE StuffID = 4;
This time around, we receive the following error message, indicating that we cannot convert that value to an INT:
Msg 245, Level 16, State 1, Line 1
Conversion failed when converting the varchar value '$789' to data type int.
This shouldn’t come as too big of a surprise. The INT data type has never been fond of dollar signs. The same goes for scientific notation. If we were to try to run the query against the last row in our table, we would receive a similar error.
And it’s not just dollar signs and possible scientific notation that can cause problems. The ISNUMERIC function recognizes an assortment of characters as being numeric (in addition to actual numbers), such as plus signs, minus signs, periods, tabs, and commas. Take a look at the following example, in which we assign a tab to the @tab variable:
DECLARE @tab VARCHAR(10) = '	';
PRINT ISNUMERIC(@tab);
As odd as this might seem, the ISNUMERIC function returns a 1, as it will in the next example, in which we pass in a plus sign:
DECLARE @plus NVARCHAR(10) = '+';
PRINT ISNUMERIC(@plus);
If you’re working with a version of SQL Server prior to 2012, you’ll have to come up with some sort of workaround to check for those circumstances in which an apparently numeric value would generate an error. You might, for example, create a function or add a CASE expression to set up more precise test conditions.
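One common workaround, shown here as a rough sketch, is to skip ISNUMERIC entirely and test for digits-only strings with a LIKE pattern. The pattern below accepts only unsigned whole numbers and assumes the values fit within the INT range:

SELECT StuffID, StuffName,
  CASE
    WHEN StuffName NOT LIKE '%[^0-9]%'
      AND StuffName <> ''
      THEN CAST(StuffName AS INT)
    ELSE NULL
  END AS CaseResult
FROM OurStuff;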
For those of you working with SQL Server 2012 or later, there’s a better alternative: the TRY_CONVERT function:
SELECT StuffID, StuffName,
  CASE
    WHEN ISNUMERIC(StuffName) = 1
      THEN TRY_CONVERT(INT, StuffName)
    ELSE NULL
  END AS CaseResult
FROM OurStuff;
The function first tests whether the requested conversion will work. If so, it converts the data to the target data type. If not, it returns a NULL:
| StuffID | StuffName | CaseResult |
| ------- | --------- | ---------- |
| 1 | abc | NULL |
| 2 | def123 | NULL |
| 3 | 456 | 456 |
| 4 | $789 | NULL |
| 5 | 1d1 | NULL |
| 6 | , | NULL |
| 7 | . | NULL |
| 8 | $., | NULL |
As you can see, the only value that can be converted is 456. The rest are there only to complicate the lives of overworked DBAs with nothing better to do than to search for wayward numbers.
“I’m told I should use the BIT data type where possible to save space. Is it true that it saves space, or is this just syntactic sugar?”
Yes, the BIT data type can save space under certain circumstances, and it undoubtedly has some advantages over an approach such as using CHAR(1), which requires a CHECK constraint to limit the values in the same way BIT already does. With the BIT data type, you can insert only the values 1 and 0, as well as NULL values, assuming the object is nullable.
That said, a BIT data type still uses a byte of data, just like CHAR(1) or TINYINT. Some might have assumed that the BIT type translates into only one bit of data, given the type’s tie-in to the 1 and 0 value limits. But that’s hardly the case. BIT is actually an integer data type and consequently requires more legroom. Case in point:
DECLARE @stuffed BIT = 0;
SELECT @stuffed AS Stuffed,
  DATALENGTH(@stuffed) AS BitLength;
As you can see in the following results, the value does indeed use one byte:
| Stuffed | BitLength |
| ------- | --------- |
| 0 | 1 |
However, if you create a table with multiple BIT columns, the database engine is clever enough to store the column data in a single byte, up to eight columns. If your table contains between 9 and 16 BIT columns, SQL Server stores the data as two bytes. Between 17 and 24 BIT columns? Three bytes, and so on. You get the picture.
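If you want to verify the packing behavior yourself, one rough way (a sketch only; the Bits8 and Bits9 table names are illustrative) is to compare the physical record sizes of a table with eight BIT columns against a table with nine, using the sys.dm_db_index_physical_stats function in DETAILED mode. The nine-column row should come out exactly one byte larger:

CREATE TABLE Bits8 (b1 BIT, b2 BIT, b3 BIT, b4 BIT, b5 BIT, b6 BIT, b7 BIT, b8 BIT);
CREATE TABLE Bits9 (b1 BIT, b2 BIT, b3 BIT, b4 BIT, b5 BIT, b6 BIT, b7 BIT, b8 BIT, b9 BIT);
GO
INSERT INTO Bits8 VALUES(1, 0, 1, 0, 1, 0, 1, 0);
INSERT INTO Bits9 VALUES(1, 0, 1, 0, 1, 0, 1, 0, 1);

-- Record sizes include fixed row overhead, so compare the two values
-- rather than reading either one as pure data size.
SELECT OBJECT_NAME(object_id) AS TableName,
  min_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('Bits8'), NULL, NULL, 'DETAILED')
UNION ALL
SELECT OBJECT_NAME(object_id),
  min_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('Bits9'), NULL, NULL, 'DETAILED');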
That said, you should still be wary about implementing BIT. It can have unexpected results when used in an expression, and it’s not necessarily well suited to masquerading as a bit flag or bit mask. For an interesting take on the whole issue, check out Joe Celko’s excellent article “BIT of a Problem.”
“When I use the DATALENGTH and LEN functions to calculate the size of a string, I often receive different results, even though I’d expect them to be the same. What’s the difference between the two functions?”
The DATALENGTH function returns the number of bytes used by an expression, and the LEN function returns the number of characters contained in an expression. The LEN function is also limited to string expressions, whereas the DATALENGTH function supports expressions of any type.
To better understand how they both work, let’s take a look at them in action. The following T-SQL script creates a table with four columns, each configured with a character data type, and then inserts data into those columns:
IF OBJECT_ID('TestLength', 'U') IS NOT NULL
DROP TABLE TestLength;
GO
CREATE TABLE TestLength
(
  char1 CHAR(10),
  nchar2 NCHAR(10),
  varchar3 VARCHAR(10),
  nvarchar4 NVARCHAR(10)
);
GO
INSERT INTO TestLength
VALUES('stuff', 'stuff', 'stuff', 'stuff');
INSERT INTO TestLength
VALUES('stuff   ', 'stuff   ', 'stuff   ', 'stuff   ');
Notice that, in the final INSERT statement, I’ve added several spaces after each value. You’ll see why in a moment. But first, let’s retrieve data from each of the columns, applying both the LEN and DATALENGTH functions:
SELECT
  LEN(char1) AS len1,
  DATALENGTH(char1) AS datalength1,
  LEN(nchar2) AS len2,
  DATALENGTH(nchar2) AS datalength2,
  LEN(varchar3) AS len3,
  DATALENGTH(varchar3) AS datalength3,
  LEN(nvarchar4) AS len4,
  DATALENGTH(nvarchar4) AS datalength4
FROM TestLength;
The following table shows the statement’s output:
| len1 | datalength1 | len2 | datalength2 | len3 | datalength3 | len4 | datalength4 |
| ---- | ----------- | ---- | ----------- | ---- | ----------- | ---- | ----------- |
| 5 | 10 | 5 | 20 | 5 | 5 | 5 | 10 |
| 5 | 10 | 5 | 20 | 5 | 8 | 5 | 16 |
The len1 and datalength1 columns in the resultset refer to the char1 column in the TestLength table. The column is configured as CHAR(10). The LEN function provides the actual number of characters being used (5), not including the trailing spaces. The DATALENGTH function indicates that 10 bytes are being used, regardless of the actual number of characters. This is because the column is configured at a fixed width of 10. As a result, the DATALENGTH function can provide us with a more accurate view of our storage requirements.
The len2 and datalength2 columns in the resultset refer to the nchar2 column and work much the same way, except that it is a Unicode column configured as NCHAR(10). The LEN function still shows that five characters are being used (not including spaces), but the DATALENGTH function indicates that the value actually requires 20 bytes, just like we’d expect.
When we get to the len3 and datalength3 columns, the results get even more interesting. This data is based on the varchar3 column, which is configured as VARCHAR(10). Even so, the LEN function still indicates that only five characters are being used. However, because we’re working with a VARCHAR column, the DATALENGTH function indicates that five bytes are being used by the value in the first row and eight bytes by the value in the second row (to accommodate the trailing spaces). The len4 and datalength4 columns work in the same way, only the number of bytes is doubled in each case to account for the Unicode type.
As mentioned earlier, although the LEN function is limited to string data, you can use the DATALENGTH function on an expression of any data type. For example, the following T-SQL declares an INT variable, assigns a value, and then checks the number of bytes the variable uses:
DECLARE @int INT = 1234567;
SELECT @int AS IntNew,
  DATALENGTH(@int) AS IntLength;
The SELECT statement returns the results shown in the following table, which indicate that the value uses four bytes, the expected length for the INT data type:
| IntNew | IntLength |
| ------ | --------- |
| 1234567 | 4 |
Even if our integer were made up of only two digits, the results would be the same. The same goes for a DATETIME value, in terms of consistency:
DECLARE @now DATETIME = GETDATE();
SELECT @now AS DateNow,
  DATALENGTH(@now) AS DateLength;
In this case, the DATALENGTH function returns eight bytes:
| DateNow | DateLength |
| ------- | ---------- |
| 2014-07-05 21:32:23.507 | 8 |
We would expect eight bytes because DATETIME values are stored as two integers, one for the date and one for the time.
“I’m troubleshooting a stored procedure that uses a COALESCE expression to return the first non-NULL value from a set of columns. However, the procedure often generates a conversion error, even when the value returned comes from the first column listed. Any idea why this might be occurring?”
A COALESCE expression can be a bit tricky to get right at first. It is essentially shorthand for a type of CASE expression that evaluates a series of expressions, often columns or variables, and returns the first one that does not evaluate to NULL. Let’s look at how it works. The following T-SQL declares several variables and assigns values to them, based on their respective types:
DECLARE @stuffNull VARCHAR(5) = NULL,
  @stuffID INT = 1001,
  @stuffName VARCHAR(10) = 'Big Thing';
SELECT COALESCE(@StuffNull, @StuffID, @StuffName) AS NotNullStuff,
  SQL_VARIANT_PROPERTY(COALESCE(@StuffNull, @StuffID, @StuffName),
    'basetype') AS BaseType;
The COALESCE expression takes the three variables as its arguments. Because the @StuffNull variable contains a NULL value, it is skipped and we move to the next variable, @StuffID. This one contains a value, so the SELECT statement returns that value in its results:
| NotNullStuff | BaseType |
| ------------ | -------- |
| 1001 | int |
Everything seems in order here. The @StuffID variable does not contain a NULL, so that value is returned, and it is indeed an integer, just as we would expect. But what if we change the order of our variables?
DECLARE @stuffNull VARCHAR(5) = NULL,
  @stuffID INT = 1001,
  @stuffName VARCHAR(10) = 'Big Thing';
SELECT COALESCE(@StuffNull, @StuffName, @StuffID) AS NotNullStuff,
  SQL_VARIANT_PROPERTY(COALESCE(@StuffNull, @StuffName, @StuffID),
    'basetype') AS BaseType;
The @StuffName variable does not contain a NULL value, so that variable’s value should be returned. However, the statement instead generates the following error:
Msg 245, Level 16, State 1, Line 5
Conversion failed when converting the varchar value 'Big Thing' to data type int.
The problem is that a COALESCE expression returns the data type with the highest precedence, no matter what order the arguments are presented in. Because INT has a higher precedence than VARCHAR, the returned type is an INT, which means the database engine is trying to convert the Big Thing string to an integer. It just doesn’t work.
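The fix, if you want the VARCHAR value back, is to take INT out of the precedence contest by explicitly converting it yourself; a minimal sketch of the corrected call:

DECLARE @stuffNull VARCHAR(5) = NULL,
  @stuffID INT = 1001,
  @stuffName VARCHAR(10) = 'Big Thing';

-- With every argument now a character type, COALESCE returns 'Big Thing'.
SELECT COALESCE(@StuffNull, @StuffName,
  CAST(@StuffID AS VARCHAR(10))) AS NotNullStuff;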
Data type precedence also applies to a type’s length:
DECLARE @stuffNull VARCHAR(5) = NULL,
  @stuffName1 VARCHAR(10) = 'Big Thing',
  @stuffName2 VARCHAR(15) = 'Next Big Thing';
SELECT COALESCE(@StuffNull, @StuffName1, @StuffName2) AS NotNullStuff,
  SQL_VARIANT_PROPERTY(COALESCE(@StuffNull, @StuffName1, @StuffName2),
    'basetype') AS BaseType,
  SQL_VARIANT_PROPERTY(COALESCE(@StuffNull, @StuffName1, @StuffName2),
    'maxlength') AS BaseLength;
Because @StuffName1 is the first variable to contain a non-null value, its value is returned. However, because @StuffName2 has a greater length (15 as opposed to 10), that length determines the data type of the returned value, as the following table shows:
| NotNullStuff | BaseType | BaseLength |
| ------------ | -------- | ---------- |
| Big Thing | varchar | 15 |
Although both variables are VARCHAR, the 15 length has precedence over the 10 length, just as INT has precedence over VARCHAR.
“I’m developing a query that contains a BETWEEN operator in the WHERE clause. The compared data includes a range of consecutive values that contain both letters and numbers, as in XYZ60 through XYZ700. Even when I specify the entire range, values are unexpectedly omitted from my query results. Any idea why this is occurring?”
When you use the BETWEEN operator for non-numerical data, you have to keep in mind how SQL Server sorts and searches for data. The example data you provide is considered character data (non-numerical), so the returned range of values follows the rules of the database’s configured collation, which specifies how character data is sorted and compared, based on language and usage norms.
The best way to understand this is to see it in action. The following T-SQL creates the OurStuff table, which includes a VARCHAR column and a DATETIME column, and inserts a number of rows into those columns:
IF OBJECT_ID('OurStuff', 'U') IS NOT NULL
DROP TABLE OurStuff;
GO
CREATE TABLE OurStuff
(
  StuffID INT NOT NULL IDENTITY PRIMARY KEY,
  StuffProdID VARCHAR(10) NOT NULL,
  StuffDate DATETIME NOT NULL
);
GO
INSERT INTO OurStuff(StuffProdID, StuffDate) VALUES
  ('ab38', '2014-07-06 12:46:48.240'),
  ('ab39', '2014-07-08 10:46:48.240'),
  ('ab48', '2014-07-09 08:46:48.240'),
  ('ab49', '2014-07-10 06:46:48.240'),
  ('ab58', '2014-07-11 04:46:48.240'),
  ('ab59', '2014-07-12 02:46:48.240'),
  ('ab398', '2014-07-13 22:46:48.240'),
  ('ab399', '2014-07-14 20:46:48.240'),
  ('ab400', '2014-07-15 18:46:48.240'),
  ('cd38', '2014-07-16 16:46:48.240'),
  ('cd39', '2014-07-17 14:46:48.240'),
  ('cd48', '2014-07-18 12:46:48.240'),
  ('cd49', '2014-07-19 10:46:48.240'),
  ('cd58', '2014-07-20 11:46:48.240'),
  ('cd59', '2014-07-21 12:46:48.240'),
  ('cd398', '2014-07-22 13:46:48.240'),
  ('cd399', '2014-07-23 14:46:48.240'),
  ('cd400', '2014-07-24 15:46:48.240');
Now suppose we issue a simple query that retrieves the rows of data based on StuffID values 6 through 16:
SELECT * FROM OurStuff
WHERE StuffID BETWEEN 6 AND 16;
Not surprisingly, the statement returns the results shown in the following table:
| StuffID | StuffProdID | StuffDate |
| ------- | ----------- | --------- |
| 6 | ab59 | 2014-07-12 02:46:48.240 |
| 7 | ab398 | 2014-07-13 22:46:48.240 |
| 8 | ab399 | 2014-07-14 20:46:48.240 |
| 9 | ab400 | 2014-07-15 18:46:48.240 |
| 10 | cd38 | 2014-07-16 16:46:48.240 |
| 11 | cd39 | 2014-07-17 14:46:48.240 |
| 12 | cd48 | 2014-07-18 12:46:48.240 |
| 13 | cd49 | 2014-07-19 10:46:48.240 |
| 14 | cd58 | 2014-07-20 11:46:48.240 |
| 15 | cd59 | 2014-07-21 12:46:48.240 |
| 16 | cd398 | 2014-07-22 13:46:48.240 |
The BETWEEN operator does exactly what we expect. It includes rows 6 through 16 and omits the rest. However, now suppose we modify the BETWEEN clause to retrieve data based on the StuffProdID column:
SELECT * FROM OurStuff
WHERE StuffProdID BETWEEN 'a' AND 'c';
Now our results include only the first nine rows:
| StuffID | StuffProdID | StuffDate |
| ------- | ----------- | --------- |
| 1 | ab38 | 2014-07-06 12:46:48.240 |
| 2 | ab39 | 2014-07-08 10:46:48.240 |
| 3 | ab48 | 2014-07-09 08:46:48.240 |
| 4 | ab49 | 2014-07-10 06:46:48.240 |
| 5 | ab58 | 2014-07-11 04:46:48.240 |
| 6 | ab59 | 2014-07-12 02:46:48.240 |
| 7 | ab398 | 2014-07-13 22:46:48.240 |
| 8 | ab399 | 2014-07-14 20:46:48.240 |
| 9 | ab400 | 2014-07-15 18:46:48.240 |
Although we’ve specified the letter c as the end of the range, no rows whose StuffProdID value starts with that letter are included. They are not part of the acceptable range of values, based on the sort order defined by the collation. In other words, all values that start with c include other characters, so they’re considered out of range.
To help clarify this, let’s sort our table based on the StuffProdID column:
SELECT * FROM OurStuff
ORDER BY StuffProdID;
The data in the StuffProdID column is sorted based on the rules of the database collation:
| StuffID | StuffProdID | StuffDate |
| ------- | ----------- | --------- |
| 1 | ab38 | 2014-07-06 12:46:48.240 |
| 2 | ab39 | 2014-07-08 10:46:48.240 |
| 7 | ab398 | 2014-07-13 22:46:48.240 |
| 8 | ab399 | 2014-07-14 20:46:48.240 |
| 9 | ab400 | 2014-07-15 18:46:48.240 |
| 3 | ab48 | 2014-07-09 08:46:48.240 |
| 4 | ab49 | 2014-07-10 06:46:48.240 |
| 5 | ab58 | 2014-07-11 04:46:48.240 |
| 6 | ab59 | 2014-07-12 02:46:48.240 |
| 10 | cd38 | 2014-07-16 16:46:48.240 |
| 11 | cd39 | 2014-07-17 14:46:48.240 |
| 16 | cd398 | 2014-07-22 13:46:48.240 |
| 17 | cd399 | 2014-07-23 14:46:48.240 |
| 18 | cd400 | 2014-07-24 15:46:48.240 |
| 12 | cd48 | 2014-07-18 12:46:48.240 |
| 13 | cd49 | 2014-07-19 10:46:48.240 |
| 14 | cd58 | 2014-07-20 11:46:48.240 |
| 15 | cd59 | 2014-07-21 12:46:48.240 |
The data is sorted first by the first character, then by the second character within the first-character grouping, next by the third character within the second-character grouping, and so on. If we reconsider our BETWEEN clause (BETWEEN 'a' AND 'c'), we’ll see that the range includes all the values that start with a, but none that start with c. If we had a c-only value, that would be included, but in this case, all of our values that start with c include other characters as well and consequently are outside the range specified in the BETWEEN clause. Now let’s specify a more inclusive range:
SELECT * FROM OurStuff
WHERE StuffProdID BETWEEN 'ab3' AND 'cd4';
Because our range includes all rows up to cd4, our resultset includes several more rows than it did when we specified only c as the upper end of the range:
| StuffID | StuffProdID | StuffDate |
| ------- | ----------- | --------- |
| 1 | ab38 | 2014-07-06 12:46:48.240 |
| 2 | ab39 | 2014-07-08 10:46:48.240 |
| 3 | ab48 | 2014-07-09 08:46:48.240 |
| 4 | ab49 | 2014-07-10 06:46:48.240 |
| 5 | ab58 | 2014-07-11 04:46:48.240 |
| 6 | ab59 | 2014-07-12 02:46:48.240 |
| 7 | ab398 | 2014-07-13 22:46:48.240 |
| 8 | ab399 | 2014-07-14 20:46:48.240 |
| 9 | ab400 | 2014-07-15 18:46:48.240 |
| 10 | cd38 | 2014-07-16 16:46:48.240 |
| 11 | cd39 | 2014-07-17 14:46:48.240 |
| 16 | cd398 | 2014-07-22 13:46:48.240 |
| 17 | cd399 | 2014-07-23 14:46:48.240 |
Note, however, that we’re still missing the rows that start with cd4 or cd5 because they fall outside the specified range. If you refer back to the fully sorted resultset, you’ll see that the values cd400 through cd59 are at the end of the result set, after cd4.
The following example demonstrates this concept further:
SELECT * FROM OurStuff
WHERE StuffProdID BETWEEN 'ab38' AND 'ab400';
Now our results include only five rows, those that fall within the specified range. As you can see in the following results, ab400 is now included because we specified the entire value, not just the first part of it:
| StuffID | StuffProdID | StuffDate |
| ------- | ----------- | --------- |
| 1 | ab38 | 2014-07-06 12:46:48.240 |
| 2 | ab39 | 2014-07-08 10:46:48.240 |
| 7 | ab398 | 2014-07-13 22:46:48.240 |
| 8 | ab399 | 2014-07-14 20:46:48.240 |
| 9 | ab400 | 2014-07-15 18:46:48.240 |
If you’re working with DATETIME values, you also need to be aware of how the data is sorted and compared. The following example specifies a date range three days apart:
SELECT * FROM OurStuff
WHERE StuffDate BETWEEN '20140710' AND '20140713';
Because the range specified in a BETWEEN clause is inclusive, we might expect four rows to be returned, but instead we get only three:
| StuffID | StuffProdID | StuffDate |
| ------- | ----------- | --------- |
| 4 | ab49 | 2014-07-10 06:46:48.240 |
| 5 | ab58 | 2014-07-11 04:46:48.240 |
| 6 | ab59 | 2014-07-12 02:46:48.240 |
When working with DATETIME values, keep in mind that the value is always made up of both the date and the time, down to a thousandth of a second. The time part is factored into the range calculations, so you must take that part into account when you’re specifying your range. If you don’t specify a specific time, as is the case in our SELECT statement, SQL Server assumes the time is midnight: all zeroes.
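You can confirm the midnight assumption with a quick test; converting a date-only string to DATETIME yields a time of all zeroes:

SELECT CAST('20140713' AS DATETIME) AS MidnightStuff;
-- Returns 2014-07-13 00:00:00.000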
One approach to dealing with this issue is to specify the entire DATETIME values:
SELECT * FROM OurStuff
WHERE StuffDate BETWEEN '2014-07-10 06:46:48.240'
  AND '2014-07-13 22:46:48.240';
This way, our results will include all rows whose DATETIME values fall within this very specific range:
| StuffID | StuffProdID | StuffDate |
| ------- | ----------- | --------- |
| 4 | ab49 | 2014-07-10 06:46:48.240 |
| 5 | ab58 | 2014-07-11 04:46:48.240 |
| 6 | ab59 | 2014-07-12 02:46:48.240 |
| 7 | ab398 | 2014-07-13 22:46:48.240 |
The trouble with this approach, however, is that it can get fairly cumbersome. Instead of using the BETWEEN operator, you might consider the greater than or equal to operator (>=) along with the less than operator (<):
SELECT * FROM OurStuff
WHERE StuffDate >= '20140710'
  AND StuffDate < '20140714';
For the beginning of the range, we specify that the date be greater than or equal to the specified date, which puts us right at midnight. We then use the less than operator to specify one day past our intended end date. Because that boundary is also midnight, the query picks up everything through the end of the intended final day without including any of the following day. The following table shows the results now returned by the SELECT statement:
| StuffID | StuffProdID | StuffDate |
| ------- | ----------- | --------- |
| 4 | ab49 | 2014-07-10 06:46:48.240 |
| 5 | ab58 | 2014-07-11 04:46:48.240 |
| 6 | ab59 | 2014-07-12 02:46:48.240 |
| 7 | ab398 | 2014-07-13 22:46:48.240 |
“I’ve seen some T-SQL code that contains ‘national character varying(max)’ listed as the data type, rather than one of the normal types, yet everything seems to work fine. Is this something new?”
There’s nothing new about national character data types such as NATIONAL CHARACTER VARYING. SQL Server includes them as synonyms to appease the ISO gods, in order to conform to the international standards set by that organization. In fact, many of the SQL Server data types include ISO synonyms. (The TechNet topic “Data Type Synonyms” lists all the ISO synonyms.)
You can use the synonyms in place of the names of the base data types in your data definition language (DDL) statements. However, once you create the object, the synonyms are no longer used. For example, the following T-SQL script creates a table that includes columns configured with the NVARCHAR and NATIONAL CHARACTER VARYING data types:
IF OBJECT_ID('OurStuff', 'U') IS NOT NULL
DROP TABLE OurStuff;
GO
CREATE TABLE OurStuff
(
  StuffID INT NOT NULL IDENTITY PRIMARY KEY,
  StuffName1 NVARCHAR(MAX) NOT NULL,
  Stuffname2 NATIONAL CHARACTER VARYING(MAX) NOT NULL
);
GO
INSERT INTO OurStuff(StuffName1, Stuffname2)
VALUES('Really Big Thing', 'Amazingly Big Thing');
We can specify the national character type just like we do any other type. Plus, we can insert and manipulate data just like any other type. But now let’s look at the column metadata:
SELECT c.name AS ColumnName,
  t.name AS TypeName
FROM sys.columns c
  INNER JOIN sys.types t
    ON c.user_type_id = t.user_type_id
WHERE object_id = OBJECT_ID('OurStuff')
ORDER BY c.column_id;
All we’re doing is pulling the column names and their data types, after we created our table. The following results tell all:
| ColumnName | TypeName |
| ---------- | -------- |
| StuffID | int |
| StuffName1 | nvarchar |
| Stuffname2 | nvarchar |
As you can see, both character columns have been created with the NVARCHAR data type. The NATIONAL CHARACTER VARYING alias has disappeared altogether.
“When is it best to use the CHAR data type rather than VARCHAR?”
The key to knowing when to use one over the other is to first understand the differences between them. The CHAR and NCHAR data types store data at a fixed length. If an inserted value is shorter than that length, it is padded with trailing spaces to ensure that every value is of equal length. The VARCHAR and NVARCHAR data types are variable length, which means the size depends on the inserted value, plus an additional couple of bytes for overhead.
The following example helps illustrate the differences between the data types:
DECLARE @stuff1 CHAR(7) = 'thing',
  @stuff2 NCHAR(7) = 'thing',
  @stuff3 VARCHAR(10) = 'thing',
  @stuff4 NVARCHAR(10) = 'thing';
SELECT
  DATALENGTH(@stuff1) AS char1,
  DATALENGTH(@stuff2) AS nchar2,
  DATALENGTH(@stuff3) AS varchar3,
  DATALENGTH(@stuff4) AS nvarchar4;
All we’re doing here is using the DATALENGTH function to determine the size of each variable, as shown in the following results:
| char1 | nchar2 | varchar3 | nvarchar4 |
| ----- | ------ | -------- | --------- |
| 7 | 14 | 5 | 10 |
Although the value is only five characters long, the length of the CHAR value is seven bytes, because that was how the variable was defined. The NCHAR value works the same way, except that it’s doubled to account for the Unicode characters, so the DATALENGTH function returns 14. On the other hand, the VARCHAR value requires only the bytes needed for the value (not including the overhead bytes). The NVARCHAR value merely doubles that.
The choice, then, of whether to use CHAR or VARCHAR depends on how consistent and how small the values are. Microsoft recommends that you use CHAR only when value lengths are consistent, and VARCHAR when the lengths vary a lot. Microsoft is noncommittal about the gray areas in between.
Small columns (fewer than five or six characters) often work well as CHAR columns when you take into account the overhead added to a VARCHAR column. Even so, some developers would suggest that you use CHAR only if all values are exactly the same length. If values are shorter than the designated length, they get padded with spaces, which can be annoying to deal with and could potentially waste a lot of space if your values are mostly one-to-two characters but you’ve created a CHAR(5) column to accommodate a relatively few five-character values.
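A quick sketch makes the padding visible; bracketing the values shows the spaces that CHAR carries around and that RTRIM must strip off:

DECLARE @charStuff CHAR(10) = 'ab';
SELECT '[' + @charStuff + ']' AS Padded,        -- [ab        ]
  '[' + RTRIM(@charStuff) + ']' AS Trimmed;     -- [ab]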
That said, if all your values are a comparable length, you don’t incur the storage penalty for overhead that comes with VARCHAR, although storage has grown so cheap that some of the concerns about the extra space have been mitigated. You might see some performance gains if you go with CHAR, but the significance of those gains can vary, and they are perhaps not worth the trade-off of having to deal with the trailing spaces. Even so, CHAR is a useful option when your data is consistent and small.
“I’m building a query that joins a FLOAT column to a REAL column, but when I run the statement, it returns no results, even though I can see the matching values in the two columns. Why is this happening?”
Welcome to the shifty world of approximate-number data types, in which floating-point data is merely an approximation and the numbers you see might not be the real numbers at all. Let me demonstrate. The following T-SQL declares a FLOAT variable, sets its value, and retrieves the value both as it’s represented in SQL Server Management Studio (SSMS) and as it actually exists in the database:
DECLARE @StuffAvg FLOAT = 33.33;
SELECT @StuffAvg AS StuffAvg,
  STR(@StuffAvg, 30, 15) AS RealStuff;
The STR function converts the numerical data to character data and, in the process, gives us a glimpse of what’s really out there:
| StuffAvg | RealStuff |
| -------- | --------- |
| 33.33 | 33.329999999999998 |
So here’s what’s happened. We assigned the value 33.33 to the variable, but the database engine actually stored it as 33.329999999999998, thus the approximation component. When SSMS retrieves that value, however, it sees fit to once again make it look nice and neat (a fact that displeases more than a few in the SQL Server user community).
On the surface, all this might not seem too big a deal (or perhaps it does). In any case, let’s take a closer look. The following T-SQL creates a table that contains a FLOAT column and a REAL column and then populates those columns with a few choice values:
IF OBJECT_ID('OurStuff', 'U') IS NOT NULL
DROP TABLE OurStuff;
GO
CREATE TABLE OurStuff
(
  StuffID INT NOT NULL IDENTITY PRIMARY KEY,
  StuffAvg1 FLOAT NOT NULL,
  StuffAvg2 REAL NOT NULL
);
GO
INSERT INTO OurStuff(StuffAvg1, StuffAvg2)
VALUES(33.33, 33.33), (66.66, 66.66), (99.99, 99.99);
Now let’s do what we did with our variable and retrieve the SSMS-enhanced values and the actual values:
SELECT StuffID,
  StuffAvg1,
  STR(StuffAvg1, 30, 15) AS RealStuff1,
  StuffAvg2,
  STR(StuffAvg2, 30, 15) AS RealStuff2
FROM OurStuff;
Once again, our trusty STR function comes to the rescue and shows us what’s what:
| StuffID | StuffAvg1 | RealStuff1 | StuffAvg2 | RealStuff2 |
| ------- | --------- | ---------- | --------- | ---------- |
| 1 | 33.33 | 33.329999999999998 | 33.33 | 33.330001831054688 |
| 2 | 66.66 | 66.659999999999997 | 66.66 | 66.660003662109375 |
| 3 | 99.99 | 99.989999999999995 | 99.99 | 99.989997863769531 |
As you can see, not only do the FLOAT and REAL data types like to approximate their values, they also like to do it differently. Yet according to the SSMS perspective, each pair of values is exactly the same. What makes matters worse is that our references to the values also get somewhat fuzzy. Check it out:
SELECT * FROM OurStuff
WHERE StuffAvg1 = 33.33;
When we specify 33.33 in our WHERE clause, the applicable row is returned:
| StuffID | StuffAvg1 | StuffAvg2 |
| ------- | --------- | --------- |
| 1 | 33.33 | 33.33 |
Now let’s specify 33.329999999999998 in our WHERE clause:
SELECT * FROM OurStuff
WHERE StuffAvg1 = 33.329999999999998;
The statement returns the same results as the previous one. And it works the same way for our REAL column. Either value will return the same row.
By all appearances, then, it would seem that we could specify either value or compare values and everything would work out fine. Guess again. This time around, we’ll compare the values as they appear in SSMS, without the benefit of STR:
DECLARE @StuffAvg1 FLOAT = 33.33;
DECLARE @StuffAvg2 REAL = 33.33;
PRINT
  CASE
    WHEN @StuffAvg1 = @StuffAvg2 THEN 'equal'
    ELSE 'not equal'
  END;
It turns out that the statement returns a value of not equal, suggesting that, despite appearances, not all instances of FLOAT and REAL are anything close to equal. So what happens if we try to join a FLOAT column to a REAL column? To find out, let’s create a couple of tables and add some data:
IF OBJECT_ID('OurStuff', 'U') IS NOT NULL
DROP TABLE OurStuff;
GO
IF OBJECT_ID('OurOtherStuff', 'U') IS NOT NULL
DROP TABLE OurOtherStuff;
GO
CREATE TABLE OurStuff
(
  StuffID INT NOT NULL IDENTITY PRIMARY KEY,
  StuffAvg1 FLOAT NOT NULL
);
GO
CREATE TABLE OurOtherStuff
(
  StuffID INT NOT NULL IDENTITY PRIMARY KEY,
  StuffAvg2 REAL NOT NULL
);
GO
INSERT INTO OurStuff(StuffAvg1)
VALUES(33.33), (66.66), (99.99);
INSERT INTO OurOtherStuff(StuffAvg2)
VALUES(33.33), (66.66), (99.99);
Now let’s retrieve the data in the OurStuff table:
SELECT * FROM OurStuff;
Not surprisingly, our results look like this:
| StuffID | StuffAvg1 |
| ------- | --------- |
| 1 | 33.33 |
| 2 | 66.66 |
| 3 | 99.99 |
Next, let’s retrieve the data in the OurOtherStuff table:
SELECT * FROM OurOtherStuff;
Once again, no surprises:
| StuffID | StuffAvg2 |
| ------- | --------- |
| 1 | 33.33 |
| 2 | 66.66 |
| 3 | 99.99 |
So with everything in place, let’s join these critters:
SELECT a.StuffID, a.StuffAvg1, b.StuffAvg2
FROM OurStuff a
  INNER JOIN OurOtherStuff b
    ON a.StuffAvg1 = b.StuffAvg2;
As it turns out, the statement returns no rows, which is why Microsoft generally recommends that you avoid using FLOAT or REAL in equality comparisons in places such as joins or WHERE conditions. To make this work, you would need to do some rounding or converting, or create calculated columns, or do something equally clever. Just don’t rely on FLOAT and REAL for any sort of precision. That’s not what they were designed to do. They offer a great deal of flexibility, but at a price.
“I have a table that includes a VARCHAR column and a stored procedure that accesses data from the table, based on values in that column. The procedure includes an NVARCHAR parameter used to qualify the WHERE clause, which references the VARCHAR column. Does it matter that the data types are different between the parameter and the column?”
Yes, it matters, but let’s make sure we’re talking about the same thing. The following T-SQL script creates a table, inserts data into that table, creates a procedure that retrieves data from that table, and then executes the procedure, passing in the parameter value Really Big Thing:
IF OBJECT_ID('OurStuff', 'U') IS NOT NULL
DROP TABLE OurStuff;
GO
CREATE TABLE OurStuff
(
  StuffID INT NOT NULL IDENTITY PRIMARY KEY,
  StuffName VARCHAR(25) NOT NULL,
  StuffDate DATETIME NOT NULL DEFAULT GETDATE()
);
GO
INSERT INTO OurStuff(StuffName)
VALUES('Thing'), ('Big Thing'),
  ('Really Big Thing'), ('Amazingly Big Thing');
IF OBJECT_ID('FindStuff', 'P') IS NOT NULL
DROP PROCEDURE FindStuff;
GO
CREATE PROCEDURE FindStuff
  @stuff NVARCHAR(25)
AS
SELECT StuffID, StuffDate
FROM OurStuff
WHERE StuffName = @stuff;
GO
EXEC FindStuff 'Really Big Thing';
When you run the procedure, it returns the StuffID and StuffDate values, as the following results show:
| StuffID | StuffDate |
| ------- | --------- |
| 3 | 2014-07-06 17:16:29.457 |
You might have noticed that the table’s StuffName column is configured as VARCHAR and the procedure’s @stuff parameter is configured as NVARCHAR. The database engine implicitly converts the value with no apparent problem. In the background, however, we might find a different story. Although the database engine can perform an implicit conversion without a hiccup, the operation can result in a performance penalty, in part because it can prevent index seeks. As a result, when you start adding millions (or billions) of rows to the equation, a minor data type mismatch can have a significant impact. For more information about all this, check out my Simple Talk article “Data Conversion in SQL Server.”
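The simplest remedy, sketched below, is to declare the parameter with the same type as the column it qualifies so no conversion is needed at all:

IF OBJECT_ID('FindStuff', 'P') IS NOT NULL
DROP PROCEDURE FindStuff;
GO
CREATE PROCEDURE FindStuff
  @stuff VARCHAR(25)  -- matches the StuffName column exactly
AS
SELECT StuffID, StuffDate
FROM OurStuff
WHERE StuffName = @stuff;
GO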
“My query needs to calculate past and future dates based on an existing date as well as calculate the time difference between two dates. I’ve tried using arithmetic operators to perform the calculations, but the results are never right. How do I go about working with dates?”
The two best tools for achieving what you’re trying to do are the DATEADD and DATEDIFF functions. DATEADD adds or subtracts a specified interval of time to or from a date value, which can be any date or time data type. DATEDIFF returns the interval of time between two specified date values. Again, the values can be any of the date or time data types.
Let’s first look at how the DATEADD function works. The following T-SQL declares a DATETIME variable, uses the GETDATE function to assign the current date and time to the variable, and then retrieves the dates 10 years forward and 10 years back:
DECLARE @StuffDate DATETIME = GETDATE();
SELECT @StuffDate AS StuffDate,
  DATEADD(yy, 10, @StuffDate) AS Plus10,
  DATEADD(yy, -10, @StuffDate) AS Less10;
The DATEADD function takes three arguments. The first argument is the date part, such as day, week, month, or year. In this case, we’ve specified yy for year. The second argument is the number of years forward or backward we want to go. Finally, we specify our base date. The SELECT statement returns the following results:
| StuffDate | Plus10 | Less10 |
| --------- | ------ | ------ |
| 2014-07-07 19:03:50.183 | 2024-07-07 19:03:50.183 | 2004-07-07 19:03:50.183 |
If we want to extract a specific date or time element, we can use the DATEPART function, as in the following example:
DECLARE @StuffDate DATETIME = GETDATE();
SELECT DATEPART(yy, @StuffDate) AS StuffDate,
  DATEPART(yy, DATEADD(yy, 10, @StuffDate)) AS Plus10,
  DATEPART(yy, DATEADD(yy, -10, @StuffDate)) AS Less10;
All we’ve done is specify that only the year should be extracted from the date, as shown in the following results:
| StuffDate | Plus10 | Less10 |
| --------- | ------ | ------ |
| 2014 | 2024 | 2004 |
Now let’s use the DATEDIFF function to calculate the number of months between two dates:
DECLARE @StuffDate1 DATETIME = GETDATE();
DECLARE @StuffDate2 DATETIME = DATEADD(yy, 10, @StuffDate1);
SELECT DATEPART(yy, @StuffDate1) AS StuffDate1,
  DATEPART(yy, @StuffDate2) AS StuffDate2,
  DATEDIFF(mm, @StuffDate1, @StuffDate2) AS MonthDiff;
The DATEDIFF function also requires three arguments. As with DATEADD, the first argument is the date part. This time around we’re using mm for month. The next two arguments are the dates we want to compare, in this case, our two DATETIME variables. As the following results show, there are 120 months between the two dates:
| StuffDate1 | StuffDate2 | MonthDiff |
| ---------- | ---------- | --------- |
| 2014 | 2024 | 120 |
The functions available in SQL Server for working with date and time values are definitely ones worth getting to know. Date and time values are treated differently from other types of data and require special consideration. You can find more about these functions in the MSDN topic “Date and Time Data Types and Functions.”
“I’m told never to use the SQL_VARIANT data type inappropriately. What is an appropriate use for this data type?”
SQL_VARIANT is included in SQL Server mainly for system use, and it appears in a number of system procedures and views. The type comes with several important restrictions, it should be used only with great caution, and there is usually a better way to store the data in a relational table. That said, some appropriate, specialised uses do exist.
An appropriate use is one in which you have no other reasonable option, or in which any reasonable option requires a workaround that causes more trouble than using SQL_VARIANT. In fact, under the right circumstances, the data type could prove a valuable tool. It supports most T-SQL base types, all within a single structure, such as a column or parameter. For example, if you define a column with SQL_VARIANT, you can insert character data, numerical data, and date/time data. Here’s proof. The following T-SQL script creates a table that includes a SQL_VARIANT column and then inserts into it an assortment of values:
IF OBJECT_ID('OurStuff', 'U') IS NOT NULL
DROP TABLE OurStuff;
GO
CREATE TABLE OurStuff
(
  StuffID INT NOT NULL IDENTITY PRIMARY KEY,
  StuffName SQL_VARIANT NOT NULL
);
GO
INSERT INTO OurStuff(StuffName) VALUES('Same Old Thing');
INSERT INTO OurStuff(StuffName) VALUES(479);
INSERT INTO OurStuff(StuffName) VALUES(GETDATE());
INSERT INTO OurStuff(StuffName) VALUES(1e9);
INSERT INTO OurStuff(StuffName) VALUES(453.66903438);
Sure, we could insert the same data into a VARCHAR column, but the values would all come out as VARCHAR data. The SQL_VARIANT data type, on the other hand, preserves the original type, which we can verify by using the SQL_VARIANT_PROPERTY function:
SELECT StuffName,
  SQL_VARIANT_PROPERTY(StuffName, 'basetype') AS BaseType,
  SQL_VARIANT_PROPERTY(StuffName, 'maxlength') AS BaseLength,
  SQL_VARIANT_PROPERTY(StuffName, 'precision') AS BasePrecision
FROM OurStuff;
And here are the results:
| StuffName | BaseType | BaseLength | BasePrecision |
| --------- | -------- | ---------- | ------------- |
| Same Old Thing | varchar | 8000 | 0 |
| 479 | int | 4 | 10 |
| 2014-07-08 09:06:34.440 | datetime | 8 | 23 |
| 1000000000 | float | 8 | 53 |
| 453.66903438 | numeric | 9 | 11 |
Notice that each type has been preserved. SQL_VARIANT was even able to pick up on the scientific notation. So you can see why, in certain circumstances, such a data type could be useful. For example, you might consider SQL_VARIANT when defining a parameter in a stored procedure or function. This approach helps to maintain some of the precision you might otherwise lose when passing numeric or date/time values through a VARCHAR parameter. Another example of how SQL_VARIANT might be useful is if you’re supporting an application that allows users to define their own tables and fields; SQL_VARIANT can help you avoid having to set up extra structures to support each data type.
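As a rough sketch of that last scenario (the UserAttributes table and its columns are hypothetical), a single SQL_VARIANT column can hold user-defined values of different types while preserving each value’s base type:

IF OBJECT_ID('UserAttributes', 'U') IS NOT NULL
DROP TABLE UserAttributes;
GO
CREATE TABLE UserAttributes
(
  AttributeID INT NOT NULL IDENTITY PRIMARY KEY,
  AttributeName VARCHAR(50) NOT NULL,
  AttributeValue SQL_VARIANT NULL
);
GO
INSERT INTO UserAttributes(AttributeName, AttributeValue)
VALUES('Color', 'Blue'),
  ('Weight', 12.75),
  ('InStock', CAST(1 AS BIT));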
You’ll have to decide on a case-by-case basis whether to use SQL_VARIANT, but know that the data type comes with a number of limitations. For instance, SQL Server converts SQL_VARIANT values to NVARCHAR(4000) to support older versions of the OLE DB and ODBC providers. The same goes for many other external tools that don’t recognize SQL_VARIANT. In addition, using linked servers in conjunction with SQL_VARIANT can lead to potential memory leaks. Plus, you can’t use the data type in statement elements such as LIKE expressions or computed columns. So before you implement the SQL_VARIANT data type, do your homework. A good place to start is the TechNet topic “Using sql_variant Data.”