10 things in SQL Server (which don't work as expected)
10. Unique NULL
When declaring UNIQUE constraints on columns, SQL Server treats all NULL values as unique. This means you can't have more than one NULL value in a unique column.
This behavior violates the standard and is specific to SQL Server.
To work around this, implement the constraint by declaring a UNIQUE index which would exclude NULL values:
CREATE UNIQUE INDEX
ux_mytable_mycolumn
ON mytable (mycolumn)
WHERE mycolumn IS NOT NULL
Note that this will prevent using the index for searching NULL values.
Fortunately, to search for a NULL value (even in a variable) you have to use a special predicate, IS NULL, as NULL = NULL does not evaluate to TRUE and hence won't satisfy a WHERE condition.
Thus, you can create an additional (non-unique) index:
CREATE INDEX
ix_mytable_mycolumn_null
ON mytable (mycolumn)
WHERE mycolumn IS NULL
which will be used for IS NULL searches.
9. NULLS FIRST
When using ORDER BY on a column containing NULL values, these values come out first.
This behavior is not specified by the standard, and is observed in SQL Server and MySQL.
Oracle and PostgreSQL sort NULL values last by default. Also, those two RDBMS support an extension to ORDER BY clause: each expression accepts possible modifiers NULLS FIRST and NULLS LAST, instructing the engine how to sort the NULL values returned by this expression. These modifiers can also be used in index definitions, thus defining record order not only in queries, but also in indexes.
To work around this, use an additional ordering expression in queries:
SELECT *
FROM mytable
ORDER BY
CASE WHEN mycolumn IS NULL THEN 1 END,
mycolumn
Unfortunately, SQL Server, unlike Oracle and PostgreSQL, does not allow creating indexes over expressions. So in order to use it in an index, you would have to add a computed column to the table:
ALTER TABLE
mytable
ADD mycolumn_nulls_last AS CASE WHEN mycolumn IS NULL THEN 1 END
GO
CREATE INDEX
ix_mytable_mycolumn_nulls_last
ON mytable (mycolumn_nulls_last, mycolumn)
GO
For this index to be usable in the queries, you should always use
mycolumn_nulls_last in the ORDER BY as well:SELECT *
FROM mytable
ORDER BY
mycolumn_nulls_last,
mycolumn
8. Multiple columns in IN
You can't use multiple columns in the IN predicates. This query:
SELECT *
FROM mytable
WHERE (col1, col2) IN
(
SELECT col1, col2
FROM othertable
)
won't work.
This behavior violates the standard and is specific to SQL Server.
To work around this, replace IN with EXISTS:
SELECT *
FROM mytable m
WHERE EXISTS
(
SELECT NULL
FROM othertable o
WHERE o.col1 = m.col1
AND o.col2 = m.col2
)
Note that this only applies to IN, as opposed to NOT IN.
NOT IN is a little bit different from NOT EXISTS in the way it handles NULL values.
To emulate this NOT IN query:
SELECT *
FROM mytable m
WHERE (col1, col2) NOT IN
(
SELECT col1, col2
FROM othertable
)
we would have to use this:
SELECT *
FROM mytable m
WHERE NOT EXISTS
(
SELECT NULL
FROM othertable o
WHERE o.col1 = m.col1
AND o.col2 = m.col2
)
AND NOT EXISTS
(
SELECT NULL
FROM othertable o
WHERE o.col1 IS NULL
OR
o.col2 IS NULL
)
The second predicate makes sure that
othertable does not have NULL values in eithercol1 or col2. Any such value would make the original query to return no records at all, despite what the other values are.7. No USING clause
SQL Server does not support USING clause in joins.
This behavior violates the standard and is specific to SQL Server.
USING clause allows you to write joins like this:
SELECT *
FROM mytable
JOIN othertable
USING (id)
instead of
SELECT *
FROM mytable m
JOIN othertable o
USING o.id = m.id
Note that those two queries are not identical: the first one will only return id once.
This is usually not much of an issue, however there's a case when this clause really helps: a FULL JOIN.
Imagine a PostgreSQL query like this:
SELECT *
FROM table1
FULL JOIN
table2
USING (id)
FULL JOIN
table3
USING (id)
ORDER BY
id
LIMIT 10
This is pretty self-explanatory: it takes first 10 records in
id order from any of the three tables, filling potentially missing records in other two tables with NULL values if needed.
Let's try to make the same in SQL Server:
SELECT TOP 10
*
FROM table1 t1
FULL JOIN
table2 t2
ON t2.id = t1.id
FULL JOIN
table3 t3
ON t3.id = COALESCE(t1.id, t2.id)
ORDER BY
COALESCE(t1.id, t2.id, t3.id)
As you can see, we need to use an ugly COALESCE for joining and ordering, as with the FULL join either table can yield a NULL in id. This also will prevent the optimizer from using a MERGE JOIN which is the fastest method provided that all id columns are indexed.
To benefit from MERGE JOIN, we need to rewrite the query like this:
SELECT *
FROM (
SELECT TOP 10
id
FROM (
SELECT id
FROM table1
UNION ALL
SELECT id
FROM table2
UNION
SELECT id
FROM table3
) q
ORDER BY
id
) q
LEFT JOIN
table1 t1
ON t1.id = q.id
LEFT JOIN
table2 t2
ON t2.id = q.id
LEFT JOIN
table3 t3
ON t3.id = q.id
SQL Server will be able to use MERGE UNION instead of MERGE JOIN, which is just as good for our purposes.
6. String concatenation
Contatenating strings is performed using the same operator
+ which is used for adding numerics. This may lead to confusion when the numbers and strings are mixed.
This behavior is not defined by the standard and is specific to SQL Server.
When attempting to concatenate a numeric and a string, bear in mind that it's the string which will be cast to a numeric and added, rather than the numeric cast to a string and concatenated.
Operators are evaluated left to right.
Here are some queries:
SELECT '1' + 1
--
2
A string plus numeric yields a numeric and the string is cast to a numeric.
SELECT 1 + '1'
--
2
A numeric plus string yields a numeric and the string is cast to a numeric.
SELECT '3' + '2'
--
'32'
A string plus string yields a string, and the strings are concatenated.
SELECT '1' + '2' + 3
--
15
A string plus string plus numeric yields a numeric. First, two strings are concatenated (yielding '12'), then the resulting string is cast into a numeric and added to 3 (yielding 15)
SELECT '1' + 'a' + 3
--
Conversion failed when converting the varchar value '1a' to data type int.: SELECT '1' + 'a' + 3
First, the two strings are concatenated, yielding '1a'. Then, an attempt to cast the resulting string to a numeric is made, which fails and yields the error above.
If you want to concatenate a numeric and a string, explicitly cast the former to the latter:
SELECT 'Value' + CAST(123 AS CHAR)
--
'Value123'
5. MAX(BIT)
SQL Server supports a special datatype for storing booleans, called BIT. However, you cannot do aggregate AND or aggregate OR on those fields (usually those are done with MAX and MIN).
This behavior is not defined by the standard and is specific to SQL Server.
BIT datatype is designed for storing booleans. Up to 8 bit values in a record are stored in a single byte, 9 to 16 values are stored in two bytes etc, making this type very storage-efficient.
However if we tried to aggregate those values, we would encounter an error:
CREATE TABLE
bitvalues
(
id INT NOT NULL PRIMARY KEY,
v1 BIT,
v2 BIT,
v3 BIT,
)
CREATE INDEX
ix_bitvalues_v1
ON bitvalues (v1)
CREATE INDEX
ix_bitvalues_v2
ON bitvalues (v2)
CREATE INDEX
ix_bitvalues_v3
ON bitvalues (v3)
INSERT
INTO bitvalues
VALUES (1, 0, 0, 1),
(2, 0, 1, 1)
SELECT MAX(v1), MAX(v2), MAX(v3),
MIN(v1), MIN(v2), MIN(v3)
FROM bitvalues
--
Operand data type bit is invalid for max operator.: SELECT MAX(v1), MAX(v2), MAX(v3), MIN(v1), MIN(v2), MIN(v3) FROM bitvalues
A simple way would be casting them to integers:
SELECT MAX(CAST(v1 AS INT)),
MAX(CAST(v2 AS INT)),
MAX(CAST(v3 AS INT)),
MIN(CAST(v1 AS INT)),
MIN(CAST(v2 AS INT)),
MIN(CAST(v3 AS INT))
FROM bitvalues
but this would prevent indexes from being used.
A more efficient way would be using TOP 1 / ORDER BY instead of MAX and MIN. Ordering, unlike MAX and MIN, does work on bits:
SELECT (
SELECT TOP 1
v1
FROM bitvalues
ORDER BY
v1 DESC
),
(
SELECT TOP 1
v2
FROM bitvalues
ORDER BY
v2 DESC
),
(
SELECT TOP 1
v3
FROM bitvalues
ORDER BY
v1 DESC
),
(
SELECT TOP 1
v1
FROM bitvalues
ORDER BY
v1
),
(
SELECT TOP 1
v2
FROM bitvalues
ORDER BY
v2
),
(
SELECT TOP 1
v3
FROM bitvalues
ORDER BY
v3
)
4. TRIM
SQL Server does not implement TRIM function which strips whitespace characters from both ends of a string.
This behavior violates the standard and is specific to SQL Server.
Just use a combination of LTRIM and RTRIM:
SELECT LTRIM(RTRIM(' my string with spaces '))
--
'my string with spaces'
3. LEAST and GREATEST
SQL Server does not support LEAST and GREATEST: the functions which return the least or the greatest value from a list of arguments, appropriately.
This is not specified by the standard and is specific to SQL Server.
To emulate those functions, one can use nested CASE statements:
SELECT CASE WHEN x > y THEN x ELSE y END
FROM mytable
but this is not really readable and prone to errors even with three columns (to say nothing of more).
A better way would be using a MAX or MIN in a subquery, composed of union'ed queries returning the appropriate fields:
SELECT (
SELECT MIN(v)
FROM (
SELECT x
UNION ALL
SELECT y
UNION ALL
SELECT z
UNION ALL
SELECT t
) q(v)
)
FROM mytable
2. ORDER BY in nested queries
SQL Server does not support ORDER BY in nested queries. This query won't work:
SELECT *
FROM (
SELECT *
FROM mytable
WHERE col1 = 42
ORDER BY
id
) q
This behavior is not defined by the standard and is specific to SQL Server.
In nested queries and inline (single-query) table valued functions, ORDER BY is not supported unless TOP is used with the SELECT.
SQL does not guarantee order of records unless you explicitly define it with ORDER BY, that's why ORDER BY really makes little sense in a nested query. If you want specific order, you should specify it in the outermost query.
This, however, may break queries ported from other engines. Normally, ORDER BY is allowed in nested queries and either propagates to the outermost query (like in Oracle) or no assumptions are made about the order (i. e. ORDER BY is just ignored silently).
There's no workaround, as you have to modify the nested query anyway by either removing ORDER BY:
SELECT *
FROM (
SELECT *
FROM mytable
WHERE col1 = 42
) q
or adding TOP 100 PERCENT:
SELECT *
FROM (
SELECT TOP 100 PERCENT
*
FROM mytable
WHERE col1 = 42
ORDER BY
id
) q
Both solutions make the optimizer engine ignore the ORDER BY.
1. Readers and writers block each other
SQL Server's locking system makes readers block writers and vice versa, sometimes even if they don't access the same record.
This behavior is not defined by the standard and is observed in SQL Server and MySQL (with MyISAM).
Any decent database system should exhibit ACID properties, that is make sure that its transactions are Atomic, Consistent, Isolated and Durable.
This, among other things, means that no query should change a database state partially (as visible from the outside). If a query updates 100 rows, no other session should ever see 30 rows updated and 70 not: it's always all or nothing.
Different systems achieve this with different means, but usually this boils down to one of the two options: create a copy of the record on update, or lock the record so that it's only accessible by one session.
SQL Server goes the second way. If a session updates a record, it creates a special object called lock, which prevents a concurrent access to this record. Even if a session reads a record, it places a lock on it too (though in a way which makes concurrent reads possible).
Locks are stored in memory and are quite expensive to maintain. So when their number grows high, SQL Server may decide to escalate them: make the locks protect not just the affected records, but a whole database page in which the records reside or even the whole table. This frees some memory but make the locks less selective, so that records which have never been touched by the query may get locked and become inaccessible to other sessions.
To work around this, SQL Server provides a special tranaction isolation level called SNAPSHOT. It does not rely on locking to achieve consistency, but rather makes a copy of each affected record in the tempdb (a special storage area where temporary data go). This allows writers not to block readers and vice versa, but increases storage requirements and I/O load on the system, and thus may affect performance.
No comments:
Post a Comment