“NOT” doesn’t negate a boolean expression

During a Slowly Changing Dimension 2-load I stumbled upon a problem, that putting NOT in front of a boolean expression did not negate the result. I first created a condition to find the rows, which had not changed, and then simply put a NOT in front of this condition, expecting that I now get the rows which had been changed. Unfortunately this did not work like as described. Are you curious? Then read on.

First things first. Let’s start with building up a demo environment.

Create a sample table

Create a sample table in TEMPDB and fill it with rows:

--prepare sample table
CREATE TABLE tempdb.dbo.MyDates (
    DestinationDate datetime2,
    SourceDate        datetime2
    )
INSERT INTO tempdb.dbo.MyDates
VALUES
({ts'2016-03-01 12:45:31'}, {ts'2016-03-01 12:45:31'}),
({ts'2016-03-01 12:45:31'}, {ts'2016-04-02 13:15:26'}),
(null,                      {ts'2016-04-02 13:15:26'}),
({ts'2016-03-01 12:45:31'}, null),
(null,                      null)
;

SELECT
    *
FROM
    tempdb.dbo.myDates;


DestinationDate             SourceDate
--------------------------- ---------------------------
2016-03-01 12:45:31.0000000 2016-03-01 12:45:31.0000000
2016-03-01 12:45:31.0000000 2016-04-02 13:15:26.0000000
NULL                        2016-04-02 13:15:26.0000000
2016-03-01 12:45:31.0000000 NULL
NULL                        NULL

Things start out good

Here is, how I started out. First query all rows where DestinationDate and SourceDate matches:

--List rows with identical Date
SELECT
    *
FROM
    tempdb.dbo.myDates
WHERE
    DestinationDate = SourceDate;

DestinationDate             SourceDate
--------------------------- ---------------------------
2016-03-01 12:45:31.0000000 2016-03-01 12:45:31.0000000

As you can see, according to ANSI standard NULL treatment, the row in which both of the columns are NULL are not listed. So we have to add extra-treatment for those rows:

--List rows with identical Date respecting NULL value
SELECT
    *
FROM
    tempdb.dbo.myDates
WHERE
    DestinationDate = SourceDate or
    (DestinationDate is null and SourceDate is null);

DestinationDate             SourceDate
--------------------------- ---------------------------
2016-03-01 12:45:31.0000000 2016-03-01 12:45:31.0000000
NULL                        NULL

Here the trouble begins

Now I have all the rows, which did not change, according to the business logic. I want to get all the rows, which have changed, according to the business logic. So I simply put a NOT in front of my WHERE condition to get “the other” rows:

SELECT
    *
FROM
    tempdb.dbo.myDates
WHERE
    NOT (
    DestinationDate = SourceDate or
    (DestinationDate is null and SourceDate is null)
    );

DestinationDate             SourceDate
--------------------------- ---------------------------
2016-03-01 12:45:31.0000000 2016-04-02 13:15:26.0000000

Wait, a second – there two are rows missing! Putting a simple NOT in front of the previous WHERE condition did not negate the boolean expression. But what is the reason for that? Why doesn’t NOT negate a TRUE to a FALSE and a FALSE to a TRUE?

How to visualize a boolean expression

So I build a query to show me both of the date columns and the result of a boolean expression which compares equality and inequality of both columns.

SELECT
    DestinationDate, 
    SourceDate,
    case
        when     (DestinationDate = SourceDate) then '1'
        else                                         '0'
        end [(DestinationDate = SourceDate)=TRUE?],
    case
        when          (DestinationDate is null) then '1'
        else                                         '0'
        end [(DestinationDate is null)=TRUE?],
    case
        when               (SourceDate is null) then '1'
        else                                         '0'
        end [(SourceDate is null)=TRUE?]

FROM
    tempdb.dbo.myDates;
DestinationDate             SourceDate                  (DestinationDate = SourceDate)=TRUE? (DestinationDate is null)=TRUE? (SourceDate is null)=TRUE?
--------------------------- --------------------------- ------------------------------------ ------------------------------- --------------------------
2016-03-01 12:45:31.0000000 2016-03-01 12:45:31.0000000 1                                    0                               0
2016-03-01 12:45:31.0000000 2016-04-02 13:15:26.0000000 0                                    0                               0
NULL                        2016-04-02 13:15:26.0000000 0                                    1                               0
2016-03-01 12:45:31.0000000 NULL                        0                                    0                               1
NULL                        NULL                        0                                    1                               1

 

And here is the result of the combined condition:

SELECT
    DestinationDate, 
    SourceDate,
    case
        when     (DestinationDate = SourceDate) or (DestinationDate is null and SourceDate is null) then '1'
        else                                                                                             '0'
        end [((DestinationDate = SourceDate) or (DestinationDate is null and SourceDate is null))=TRUE?]

FROM
    tempdb.dbo.myDates;

DestinationDate             SourceDate                  ((DestinationDate = SourceDate) or (DestinationDate is null and SourceDate is null))=TRUE?
--------------------------- --------------------------- ------------------------------------------------------------------------------------------
2016-03-01 12:45:31.0000000 2016-03-01 12:45:31.0000000 1
2016-03-01 12:45:31.0000000 2016-04-02 13:15:26.0000000 0
NULL                        2016-04-02 13:15:26.0000000 0
2016-03-01 12:45:31.0000000 NULL                        0
NULL                        NULL                        1

Everything seems to be fine.

Let’s extend the query by adding columns, which put a NOT in front of those expressions:

SELECT
    DestinationDate, 
    SourceDate,
    case
        when NOT (DestinationDate = SourceDate) then '1'
        else                                         '0'
        end [NOT (DestinationDate = SourceDate)=TRUE?],
    case
        when NOT      (DestinationDate is null) then '1'
        else                                         '0'
        end [NOT (DestinationDate is null)=TRUE?],
    case
        when NOT           (SourceDate is null) then '1'
        else                                         '0'
        end [NOT (SourceDate is null)=TRUE?]

FROM
    tempdb.dbo.myDates;

DestinationDate             SourceDate                  NOT (DestinationDate = SourceDate)=TRUE? NOT (DestinationDate is null)=TRUE? NOT (SourceDate is null)=TRUE?
--------------------------- --------------------------- ---------------------------------------- ----------------------------------- ------------------------------
2016-03-01 12:45:31.0000000 2016-03-01 12:45:31.0000000 0                                        1                                   1
2016-03-01 12:45:31.0000000 2016-04-02 13:15:26.0000000 1                                        1                                   1
NULL                        2016-04-02 13:15:26.0000000 0                                        0                                   1
2016-03-01 12:45:31.0000000 NULL                        0                                        1                                   0
NULL                        NULL                        0                                        0                                   0

Here is the problem again. NOT did not negate/invert the outcome of the condition! “NOT (DestinationDate = SourceDate)=TRUE?” does’nt show the inverted result from “(DestinationDate = SourceDate)=TRUE?”; the last three rows are still not TRUE. This leads to the described unexpected result, which you can see here again:

SELECT
    DestinationDate, 
    SourceDate,
    case
        when NOT (DestinationDate = SourceDate) or (DestinationDate is null and SourceDate is null) then '1'
        else                                                                                             '0'
        end [NOT ((DestinationDate = SourceDate) or (DestinationDate is null and SourceDate is null))=TRUE?]

FROM
    tempdb.dbo.myDates;

DestinationDate             SourceDate                  NOT ((DestinationDate = SourceDate) or (DestinationDate is null and SourceDate is null))=TRUE?
--------------------------- --------------------------- ----------------------------------------------------------------------------------------------
2016-03-01 12:45:31.0000000 2016-03-01 12:45:31.0000000 0
2016-03-01 12:45:31.0000000 2016-04-02 13:15:26.0000000 1
NULL                        2016-04-02 13:15:26.0000000 0
2016-03-01 12:45:31.0000000 NULL                        0
NULL                        NULL                        1

The root cause of trouble

Before the problem is driving you crazy I will show you the root cause of the problem. The root cause is the simple fact, that boolean logic in SQL Server is not implemented as a two-value logic, but as a three-value logic.

Two-value logic would be a simple TRUE and FALSE.

In case of comparisions that means:
‘A’ = ‘A’ –> TRUE
‘A’ = ‘B’ –> FALSE

Negation works fine:
NOT   TRUE –> FALSE
NOT   FALSE –> TRUE

Three-value logic means, that there is, beside TRUE and FALSE, an additional NULL for unknown cases (i will call it UNKNOWN in the following paragraphs).

In case of comparisions that means:
‘A’ = ‘A’ –> TRUE
‘A’ = ‘B’ –> FALSE
‘A’ = null –> UNKNOWN

UNKNOWN means, that the outcome of the condition is neither TRUE nor FALSE. A condition resulting in UNKNOWN in a WHERE-clause will not show the rows. Furthermore, putting a NOT in front of an expression which results as UNKNOWN, stays UNKNOWN, and will not show those rows neither.

NOT   TRUE –> FALSE
NOT   FALSE –> TRUE
NOT   UNKNOWN –> UNKNOWN

You can see the consequences, when I expand the query to show boolean values correctly, respecting the three-value logic:

SELECT
    DestinationDate, 
    SourceDate,

    case
        when     (DestinationDate = SourceDate) then '1'
        when NOT (DestinationDate = SourceDate) then '0'
        else                                         'unknown'
        end [(DestinationDate = SourceDate)=TRUE?],
    case
        when          (DestinationDate is null) then '1'
        when NOT      (DestinationDate is null) then '0'
        else                                         'unknown'
        end [(DestinationDate is null)=TRUE?],
    case
        when               (SourceDate is null) then '1'
        when NOT           (SourceDate is null) then '0'
        else                                         'unknown'
        end [(SourceDate is null)=TRUE?],


     case
        when NOT (DestinationDate = SourceDate) then '1'
        when     (DestinationDate = SourceDate) then '0'
        else                                         'unknown'
        end [NOT (DestinationDate = SourceDate)=TRUE?],

    case
        when NOT (DestinationDate <> SourceDate) then '1'
        when     (DestinationDate <> SourceDate) then '0'
        else                                          'unknown'
        end [NOT (DestinationDate <> SourceDate)=TRUE?]

FROM
    tempdb.dbo.myDates;

DestinationDate             SourceDate                  (DestinationDate = SourceDate)=TRUE? (DestinationDate is null)=TRUE? (SourceDate is null)=TRUE? NOT (DestinationDate = SourceDate)=TRUE? NOT (DestinationDate <> SourceDate)=TRUE?
--------------------------- --------------------------- ------------------------------------ ------------------------------- -------------------------- ---------------------------------------- -----------------------------------------
2016-03-01 12:45:31.0000000 2016-03-01 12:45:31.0000000 1                                    0                               0                          0                                        1
2016-03-01 12:45:31.0000000 2016-04-02 13:15:26.0000000 0                                    0                               0                          1                                        0
NULL                        2016-04-02 13:15:26.0000000 unknown                              1                               0                          unknown                                  unknown
2016-03-01 12:45:31.0000000 NULL                        unknown                              0                               1                          unknown                                  unknown
NULL                        NULL                        unknown                              1                               1                          unknown                                  unknown

 

Now let me sound smart :-):

Not to be TRUE does not mean that something is FALSE. And not to be FALSE does not mean that something is TRUE. As long as there is the UNKNOWN.

Those UNKNOWN values have an impact on boolean AND and OR as well (read NULL as UNKNOWN in the following table):

AND     TRUE    FALSE    NULL
TRUE    TRUE    FALSE    NULL
FALSE   FALSE   FALSE    FALSE
NULL    NULL    FALSE    NULL

When one part of the condition is FALSE, the outcome of the whole condition is FALSE, when combined with AND. Is one part TRUE and the other UNKNOWN, the result becomes UNKNOWN.

OR      TRUE    FALSE    NULL
TRUE    TRUE    TRUE     TRUE
FALSE   TRUE    FALSE    NULL
NULL    TRUE    NULL     NULL

When one part of the condition is TRUE, the outcome of the whole condition is TRUE, when combined with OR. Is one part FALSE and the other UNKNOWN, the result becomes UNKNOWN.


So the NULL values in either DestinationDate or SourceDate lead to a condition resulting as UNKNOWN. Which is OK in the very first example (when either of the two columns is NULL the condition is not TRUE, and the rows will not show up). The problem comes only, when I put a NOT in front of this condition. The condition for those rows (with either of the two date columns being NULL) will stay UNKNOWN and the rows will not be displayed.

Wrap up

Keep always in mind that there is more then TRUE and FALSE for your conditions. Due to a three-value logic of booleans in SQL Server (and most other relational databases) there is an addional UNKNOWN value, which stays UNKNOWN even if you try to negate it with a NOT. Here is the correct solution to find all rows, which have different values for DestinationDate and SourceDate.

SELECT
    *
FROM
    tempdb.dbo.myDates
WHERE
    DestinationDate <> SourceDate or
    (DestinationDate is not null and SourceDate is     null) or
    (DestinationDate is     null and SourceDate is not null);

DestinationDate             SourceDate
--------------------------- ---------------------------
2016-03-01 12:45:31.0000000 2016-04-02 13:15:26.0000000
NULL                        2016-04-02 13:15:26.0000000
2016-03-01 12:45:31.0000000 NULL

Now we get a list of all rows, which have different values for DestinationDate and SourceDate.

And don’t forget to clean up your TEMPDB:

drop table tempdb.dbo.MyDates;

I hope this post helped you a little in understanding the consequences of a three-value boolean logic and how to implement a Slowly Changing Dimension 2-load in case of having to handle NULL values.

Yours sincerly,
Markus Ehrenmueller

Advertisements
“NOT” doesn’t negate a boolean expression

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s