10 things in SQL Server (which don't work as expected)

10. Unique NULL

http://goo.gl/XHiAY

When declaring UNIQUE constraints on columns, SQL Server treats all NULL values as unique. This means you can't have more than one NULL value in a unique column.

This behavior violates the standard and is specific to SQL Server.

To work around this, implement the constraint by declaring a UNIQUE index which would exclude NULL values:

CREATE UNIQUE INDEX
        ux_mytable_mycolumn
ON      mytable (mycolumn)
WHERE   mycolumn IS NOT NULL

Note that this will prevent using the index for searching NULL values.

Fortunately, to search for a NULL value (even in a variable) you have to use a special predicate, IS NULL, as NULL = NULL does not evaluate to TRUE and hence won't satisfy a WHERE condition.

Thus, you can create an additional (non-unique) index:

CREATE INDEX
        ix_mytable_mycolumn_null
ON      mytable (mycolumn)
WHERE   mycolumn IS NULL

which will be used for IS NULL searches.

9. NULLS FIRST

http://goo.gl/62ExV

When using ORDER BY on a column containing NULL values, these values come out first.

This behavior is not specified by the standard, and is observed in SQL Server and MySQL.

Oracle and PostgreSQL sort NULL values last by default. Also, those two RDBMS support an extension to ORDER BY clause: each expression accepts possible modifiers NULLS FIRST and NULLS LAST, instructing the engine how to sort the NULL values returned by this expression. These modifiers can also be used in index definitions, thus defining record order not only in queries, but also in indexes.

To work around this, use an additional ordering expression in queries:

SELECT  *
FROM    mytable
ORDER BY
        CASE WHEN mycolumn IS NULL THEN 1 END,
        mycolumn

Unfortunately, SQL Server, unlike Oracle and PostgreSQL, does not allow creating indexes over expressions. So in order to use it in an index, you would have to add a computed column to the table:

ALTER TABLE
        mytable
ADD     mycolumn_nulls_last AS CASE WHEN mycolumn IS NULL THEN 1 END
GO

CREATE INDEX
        ix_mytable_mycolumn_nulls_last
ON      mytable (mycolumn_nulls_last, mycolumn)
GO

For this index to be usable in the queries, you should always use mycolumn_nulls_last in the ORDER BY as well:

SELECT  *
FROM    mytable
ORDER BY
        mycolumn_nulls_last,
        mycolumn

8. Multiple columns in IN

http://goo.gl/0yCqD

You can't use multiple columns in the IN predicates. This query:

SELECT  *
FROM    mytable
WHERE   (col1, col2) IN
        (
        SELECT  col1, col2
        FROM    othertable
        )

won't work.

This behavior violates the standard and is specific to SQL Server.

To work around this, replace IN with EXISTS:

SELECT  *
FROM    mytable m
WHERE   EXISTS
        (
        SELECT  NULL
        FROM    othertable o
        WHERE   o.col1 = m.col1
                AND o.col2 = m.col2
        )

Note that this only applies to IN, as opposed to NOT IN.

NOT IN is a little bit different from NOT EXISTS in the way it handles NULL values.

To emulate this NOT IN query:

SELECT  *
FROM    mytable m
WHERE   (col1, col2) NOT IN
        (
        SELECT  col1, col2
        FROM    othertable
        )

we would have to use this:

SELECT  *
FROM    mytable m
WHERE   NOT EXISTS
        (
        SELECT  NULL
        FROM    othertable o
        WHERE   o.col1 = m.col1
                AND o.col2 = m.col2
        )
        AND NOT EXISTS
        (
        SELECT  NULL
        FROM    othertable o
        WHERE   o.col1 IS NULL
                OR
                o.col2 IS NULL
        )

The second predicate makes sure that othertable does not have NULL values in eithercol1 or col2. Any such value would make the original query to return no records at all, despite what the other values are.

7. No USING clause

http://goo.gl/Fx0je

SQL Server does not support USING clause in joins.

This behavior violates the standard and is specific to SQL Server.

USING clause allows you to write joins like this:

SELECT  *
FROM    mytable
JOIN    othertable
USING   (id)

instead of

SELECT  *
FROM    mytable m
JOIN    othertable o
USING   o.id = m.id

Note that those two queries are not identical: the first one will only return id once.

This is usually not much of an issue, however there's a case when this clause really helps: a FULL JOIN.

Imagine a PostgreSQL query like this:

SELECT  *
FROM    table1
FULL JOIN
        table2
USING   (id)
FULL JOIN
        table3
USING   (id)
ORDER BY
        id
LIMIT   10

This is pretty self-explanatory: it takes first 10 records in id order from any of the three tables, filling potentially missing records in other two tables with NULL values if needed.

Let's try to make the same in SQL Server:

SELECT  TOP 10
        *
FROM    table1 t1
FULL JOIN
        table2 t2
ON      t2.id = t1.id
FULL JOIN
        table3 t3
ON      t3.id = COALESCE(t1.id, t2.id)
ORDER BY
        COALESCE(t1.id, t2.id, t3.id)

As you can see, we need to use an ugly COALESCE for joining and ordering, as with the FULL join either table can yield a NULL in id. This also will prevent the optimizer from using a MERGE JOIN which is the fastest method provided that all id columns are indexed.

To benefit from MERGE JOIN, we need to rewrite the query like this:

SELECT  *
FROM    (
        SELECT  TOP 10
                id
        FROM    (
                SELECT  id
                FROM    table1
                UNION ALL
                SELECT  id
                FROM    table2
                UNION
                SELECT  id
                FROM    table3
                ) q
        ORDER BY
                id
        ) q
LEFT JOIN
        table1 t1
ON      t1.id = q.id
LEFT JOIN
        table2 t2
ON      t2.id = q.id
LEFT JOIN
        table3 t3
ON      t3.id = q.id

SQL Server will be able to use MERGE UNION instead of MERGE JOIN, which is just as good for our purposes.

6. String concatenation

http://goo.gl/BTaVx

Contatenating strings is performed using the same operator + which is used for adding numerics. This may lead to confusion when the numbers and strings are mixed.

This behavior is not defined by the standard and is specific to SQL Server.

When attempting to concatenate a numeric and a string, bear in mind that it's the string which will be cast to a numeric and added, rather than the numeric cast to a string and concatenated.

Operators are evaluated left to right.

Here are some queries:

SELECT  '1' + 1
--
2

A string plus numeric yields a numeric and the string is cast to a numeric.

SELECT  1 + '1'
--
2

A numeric plus string yields a numeric and the string is cast to a numeric.

SELECT  '3' + '2'
--
'32'

A string plus string yields a string, and the strings are concatenated.

SELECT  '1' + '2' + 3
--
15

A string plus string plus numeric yields a numeric. First, two strings are concatenated (yielding '12'), then the resulting string is cast into a numeric and added to 3 (yielding 15)

SELECT  '1' + 'a' + 3
--
Conversion failed when converting the varchar value '1a' to data type int.: SELECT '1' + 'a' + 3

First, the two strings are concatenated, yielding '1a'. Then, an attempt to cast the resulting string to a numeric is made, which fails and yields the error above.

If you want to concatenate a numeric and a string, explicitly cast the former to the latter:

SELECT  'Value' + CAST(123 AS CHAR)
--
'Value123'

5. MAX(BIT)

http://goo.gl/jLl4S

SQL Server supports a special datatype for storing booleans, called BIT. However, you cannot do aggregate AND or aggregate OR on those fields (usually those are done with MAX and MIN).

This behavior is not defined by the standard and is specific to SQL Server.

BIT datatype is designed for storing booleans. Up to 8 bit values in a record are stored in a single byte, 9 to 16 values are stored in two bytes etc, making this type very storage-efficient.

However if we tried to aggregate those values, we would encounter an error:

CREATE TABLE
        bitvalues
        (
        id INT NOT NULL PRIMARY KEY,
        v1 BIT,
        v2 BIT,
        v3 BIT,
        )

CREATE INDEX
        ix_bitvalues_v1
ON      bitvalues (v1)

CREATE INDEX
        ix_bitvalues_v2
ON      bitvalues (v2)

CREATE INDEX
        ix_bitvalues_v3
ON      bitvalues (v3)

INSERT
INTO    bitvalues
VALUES  (1, 0, 0, 1),
        (2, 0, 1, 1)

SELECT  MAX(v1), MAX(v2), MAX(v3),
        MIN(v1), MIN(v2), MIN(v3)
FROM    bitvalues
--
Operand data type bit is invalid for max operator.: SELECT MAX(v1), MAX(v2), MAX(v3), MIN(v1), MIN(v2), MIN(v3) FROM bitvalues

A simple way would be casting them to integers:

SELECT  MAX(CAST(v1 AS INT)),
        MAX(CAST(v2 AS INT)),
        MAX(CAST(v3 AS INT)),
        MIN(CAST(v1 AS INT)),
        MIN(CAST(v2 AS INT)),
        MIN(CAST(v3 AS INT))
FROM    bitvalues

but this would prevent indexes from being used.

A more efficient way would be using TOP 1 / ORDER BY instead of MAX and MIN. Ordering, unlike MAX and MIN, does work on bits:

SELECT  (
        SELECT  TOP 1
                v1
        FROM    bitvalues
        ORDER BY
                v1 DESC
        ),
        (
        SELECT  TOP 1
                v2
        FROM    bitvalues
        ORDER BY
                v2 DESC
        ),
        (
        SELECT  TOP 1
                v3
        FROM    bitvalues
        ORDER BY
                v1 DESC
        ),
        (
        SELECT  TOP 1
                v1
        FROM    bitvalues
        ORDER BY
                v1
        ),
        (
        SELECT  TOP 1
                v2
        FROM    bitvalues
        ORDER BY
                v2
        ),
        (
        SELECT  TOP 1
                v3
        FROM    bitvalues
        ORDER BY
                v3
        )

4. TRIM

http://goo.gl/AjTtZ

SQL Server does not implement TRIM function which strips whitespace characters from both ends of a string.

This behavior violates the standard and is specific to SQL Server.

Just use a combination of LTRIM and RTRIM:

SELECT  LTRIM(RTRIM('  my string with spaces  '))
--
'my string with spaces'

3. LEAST and GREATEST

http://goo.gl/K7YDV

SQL Server does not support LEAST and GREATEST: the functions which return the least or the greatest value from a list of arguments, appropriately.

This is not specified by the standard and is specific to SQL Server.

To emulate those functions, one can use nested CASE statements:

SELECT  CASE WHEN x > y THEN x ELSE y END
FROM    mytable

but this is not really readable and prone to errors even with three columns (to say nothing of more).

A better way would be using a MAX or MIN in a subquery, composed of union'ed queries returning the appropriate fields:

SELECT  (
        SELECT  MIN(v)
        FROM    (
                SELECT  x
                UNION ALL
                SELECT  y
                UNION ALL
                SELECT  z
                UNION ALL
                SELECT  t
                ) q(v)
        )
FROM    mytable

2. ORDER BY in nested queries

http://goo.gl/sTXFc

SQL Server does not support ORDER BY in nested queries. This query won't work:

SELECT  *
FROM    (
        SELECT  *
        FROM    mytable
        WHERE   col1 = 42
        ORDER BY
                id
        ) q

This behavior is not defined by the standard and is specific to SQL Server.

In nested queries and inline (single-query) table valued functions, ORDER BY is not supported unless TOP is used with the SELECT.

SQL does not guarantee order of records unless you explicitly define it with ORDER BY, that's why ORDER BY really makes little sense in a nested query. If you want specific order, you should specify it in the outermost query.

This, however, may break queries ported from other engines. Normally, ORDER BY is allowed in nested queries and either propagates to the outermost query (like in Oracle) or no assumptions are made about the order (i. e. ORDER BY is just ignored silently).

There's no workaround, as you have to modify the nested query anyway by either removing ORDER BY:

SELECT  *
FROM    (
        SELECT  *
        FROM    mytable
        WHERE   col1 = 42
        ) q

or adding TOP 100 PERCENT:

SELECT  *
FROM    (
        SELECT  TOP 100 PERCENT
                *
        FROM    mytable
        WHERE   col1 = 42
        ORDER BY
                id
        ) q

Both solutions make the optimizer engine ignore the ORDER BY.

1. Readers and writers block each other

http://goo.gl/GZ2jM

SQL Server's locking system makes readers block writers and vice versa, sometimes even if they don't access the same record.

This behavior is not defined by the standard and is observed in SQL Server and MySQL (with MyISAM).

Any decent database system should exhibit ACID properties, that is make sure that its transactions are Atomic, Consistent, Isolated and Durable.

This, among other things, means that no query should change a database state partially (as visible from the outside). If a query updates 100 rows, no other session should ever see 30 rows updated and 70 not: it's always all or nothing.

Different systems achieve this with different means, but usually this boils down to one of the two options: create a copy of the record on update, or lock the record so that it's only accessible by one session.

SQL Server goes the second way. If a session updates a record, it creates a special object called lock, which prevents a concurrent access to this record. Even if a session reads a record, it places a lock on it too (though in a way which makes concurrent reads possible).

Locks are stored in memory and are quite expensive to maintain. So when their number grows high, SQL Server may decide to escalate them: make the locks protect not just the affected records, but a whole database page in which the records reside or even the whole table. This frees some memory but make the locks less selective, so that records which have never been touched by the query may get locked and become inaccessible to other sessions.

To work around this, SQL Server provides a special tranaction isolation level called SNAPSHOT. It does not rely on locking to achieve consistency, but rather makes a copy of each affected record in the tempdb (a special storage area where temporary data go). This allows writers not to block readers and vice versa, but increases storage requirements and I/O load on the system, and thus may affect performance.

SQLSERVER

Wednesday, December 24, 2014

10 things in SQL Server (which don't work as expected)

10. Unique NULL

9. NULLS FIRST

8. Multiple columns in IN

7. No USING clause

6. String concatenation

5. MAX(BIT)

4. TRIM

3. LEAST and GREATEST

2. ORDER BY in nested queries

1. Readers and writers block each other