I’m speaking about Machine Learning at the #GlobalAzureBootcamp 2017 in Linz

I am delighted to be part of an international event without having to travel far: the Global Azure Bootcamp takes place worldwide on the same day (this year on April 22, 2017) in more than 240 locations – including Linz.

Rainer Stropek has been organizing the Linz event for several years now, and the growing number of attendees shows that the cloud in general, and Microsoft’s Azure offering in particular, is attracting more and more interest. New this year is a parallel “Junior Bootcamp” – the opportunity to spend the conference as a parent-child day. 🙂

I will kick off right in the morning (at 9:00) with a “call to action”: “Überlasst das Lernen den Maschinen” (“Leave the Learning to the Machines”). Besides an introduction to the “how” and “why” of machine learning, we will walk through a concrete use case with Azure Machine Learning together. Later in the afternoon, Rina Ahmed will continue on the same topic and show how you can include machine learning in your own applications.

See you in Linz!


I’m speaking at SQL Saturday Lisbon #sqlsatlisbon

I am looking forward to traveling to Lisbon the upcoming weekend. This will be my very first time in Portugal, so I am very excited about the trip.

SQL Saturday Lisbon has invited me to speak about two topics (like roughly all of the accepted speakers – so my first impression of the Portuguese is: they really put their speakers to work :-)).

  • A Game of Hierarchies – From GROUP BY to Recursive CTEs
    This talk basically covers two topics. First, how to apply different GROUP BY clauses to get detail rows and summary rows in the very same query. Second, how to leverage Common Table Expressions (CTEs) to visualize so-called parent-child hierarchies. Instead of the “usual” AdventureWorks demos I invested time to build my own sample database, based on George R. R. Martin’s epic “A Game of Thrones”. You should be able to enjoy this session whether you are interested in SQL solutions for querying hierarchies or in the books and the HBO TV show. 🙂
  • My Favorite Pie (Chart): Simple Rules for Clear Visualizations
    In this session I pick up the discussion of whether pie charts are good or bad on the one hand, and come up with a handful of guidelines that will help you check & improve your reports, charts & dashboards on the other. In this 60-minute version of the talk I will not do any classic live demos, but the slides will be filled with good and bad visualizations which I am glad to discuss with you – be prepared for an interactive session. Let’s make data visualizations great again! 🙂

I hope to see you in Lisbon!

Sincerely yours,
Markus Ehrenmüller-Jensen


SQL Nexus 2017 in Copenhagen

I am very proud to announce that one of my submissions made it to SQL Nexus 2017 in Copenhagen. (As my wife is Danish, I always feel partly at “home” while in Denmark – and I can apply my Danish language skills. And the best part is that I can use my wife’s Dankort to pay for everything. 🙂 )

I will talk about the power of data visualization with the help of R. As visual perception is the key to understanding data, being able to plot data is a crucial skill. I will give a broad overview of what R offers on the one hand, and dive deep into common visualizations on the other, so that you end up with scripts you can apply immediately.

Register soon for the conference to secure yourself a seat at an affordable price.


Speaking in the 2nd half of 2016

I am glad that a bunch of upcoming conferences have accepted my submissions. Therefore I have the pleasure of speaking at the following events in the upcoming days, weeks & months:

I have also submitted to the SQL Saturdays in Prague (which will be the very first in the Czech Republic – so be sure not to miss this event) and in Slovenia, but have not heard yet whether they have accepted me as a speaker.

And we are already in the middle of organizing the upcoming SQL Saturday in Vienna, which will be held on Friday, January 20th, 2017 (with a pre-con on Thursday, January 19th, 2017). Watch out for details!


24HOP: Summit Preview 2016


I have the pleasure and honor not only to talk about one of my favorite topics at the PASS Summit 2016 in Seattle, but was also invited to give a sneak preview of this talk at the “24 Hours of PASS – Preview Edition“. As you can guess from the name of the event: it features 24 consecutive one-hour webinars from upcoming PASS Summit 2016 sessions.


The title of my talk is “My Favorite Pie (Chart) – Simple Rules for Clear Visualizations“. As I will make clear very early in my talk: I have plenty of favorite pies – but do not really like any sort of pie chart. And as it is not enough to just criticize things you do not like and point out the disadvantages of – in this case – pie charts, I came up with easy-to-remember and easy-to-follow rules to generally improve visualizations. Those rules will help you find out in which (rare) cases a pie chart would be the visualization of choice and in which cases another type of visualization (and which one) would be the better choice:

1. Use the proper chart type

2. Display as little information as possible, as much information as necessary

3. Encode accurately

4. Highlight important things

5. Calculate measures

Use the proper chart type

Many BI tools on the market offer plenty of chart types. Unfortunately, most of the tools are not very good at recommending which chart to use for the data one is analyzing. On the other hand, in my experience the “gut feeling” of many users will not help much in choosing the best visualization either.

Therefore we will discuss the most common chart types – including “ordinary” tables, which sometimes have a bad reputation despite their usefulness in many cases.

Display as little information as possible, as much information as necessary

The idea of visualizing data in the form of tables or charts is to give insights to the report users. For this reason we should only show information on the screen which is necessary to achieve this goal. Unfortunately, many available tools do not come up with useful defaults when creating a table or a chart.

Therefore we will discuss why we should reduce the “ink-factor” and watch out for proper scaling.

Encode accurately

Again: the idea of visualizing data is to give insights to the report users. These insights should come intuitively – which can’t happen if the most important things are not shown in the most prominent way on the screen.

Therefore we will discuss sorting and the proper use of colors.

Highlight important things

We can help the report users tremendously when we not only show data, but also give a hint about how good or bad a number actually is for them or for the organization.

Therefore we will discuss how we can highlight those numbers where the user should be alarmed and set actions.

Calculate measures

I can’t remember how often I have seen people in offices, sitting in front of their high-end PCs and laptops, grabbing an ordinary pocket calculator to compute a sum, a difference, or a more or less complicated performance indicator from the numbers on their screen. But I do remember that in every single case I was stunned and watched with my mouth open.

Therefore we will take some time to discuss how important it is to add calculated values to your visualizations.

Call to Action

The talk will be online and seats are limited – so save the date (September 7th or 8th, depending on your time zone), be sure to register for the “24 Hours of PASS – Preview Edition” as soon as possible, and watch out for the Twitter tags #pass24hop and #sqlpass.

I am looking forward to having you in my talk!

 

Sincerely,
Markus Ehrenmüller-Jensen



“NOT” doesn’t negate a boolean expression

During a Slowly Changing Dimension type 2 load I stumbled upon a problem: putting NOT in front of a boolean expression did not negate the result. I had first created a condition to find the rows which had not changed, and then simply put a NOT in front of this condition, expecting that I would now get the rows which had changed. Unfortunately, it did not work as expected. Are you curious? Then read on.

First things first. Let’s start by building up a demo environment.

Create a sample table

Create a sample table in TEMPDB and fill it with rows:

--prepare sample table
CREATE TABLE tempdb.dbo.MyDates (
    DestinationDate datetime2,
    SourceDate      datetime2
);
INSERT INTO tempdb.dbo.MyDates
VALUES
({ts'2016-03-01 12:45:31'}, {ts'2016-03-01 12:45:31'}),
({ts'2016-03-01 12:45:31'}, {ts'2016-04-02 13:15:26'}),
(null,                      {ts'2016-04-02 13:15:26'}),
({ts'2016-03-01 12:45:31'}, null),
(null,                      null)
;

SELECT
    *
FROM
    tempdb.dbo.myDates;


DestinationDate             SourceDate
--------------------------- ---------------------------
2016-03-01 12:45:31.0000000 2016-03-01 12:45:31.0000000
2016-03-01 12:45:31.0000000 2016-04-02 13:15:26.0000000
NULL                        2016-04-02 13:15:26.0000000
2016-03-01 12:45:31.0000000 NULL
NULL                        NULL

Things start out good

Here is how I started out. First, query all rows where DestinationDate and SourceDate match:

--List rows with identical Date
SELECT
    *
FROM
    tempdb.dbo.myDates
WHERE
    DestinationDate = SourceDate;

DestinationDate             SourceDate
--------------------------- ---------------------------
2016-03-01 12:45:31.0000000 2016-03-01 12:45:31.0000000

As you can see, due to the ANSI standard NULL treatment, the row in which both of the columns are NULL is not listed. So we have to add extra treatment for those rows:

--List rows with identical Date respecting NULL value
SELECT
    *
FROM
    tempdb.dbo.myDates
WHERE
    DestinationDate = SourceDate or
    (DestinationDate is null and SourceDate is null);

DestinationDate             SourceDate
--------------------------- ---------------------------
2016-03-01 12:45:31.0000000 2016-03-01 12:45:31.0000000
NULL                        NULL

Here the trouble begins

Now I have all the rows which did not change according to the business logic. Next I want all the rows which have changed according to the business logic. So I simply put a NOT in front of my WHERE condition to get “the other” rows:

SELECT
    *
FROM
    tempdb.dbo.myDates
WHERE
    NOT (
    DestinationDate = SourceDate or
    (DestinationDate is null and SourceDate is null)
    );

DestinationDate             SourceDate
--------------------------- ---------------------------
2016-03-01 12:45:31.0000000 2016-04-02 13:15:26.0000000

Wait a second – two rows are missing! Putting a simple NOT in front of the previous WHERE condition did not negate the boolean expression. But what is the reason for that? Why doesn’t NOT negate a TRUE to a FALSE and a FALSE to a TRUE?

How to visualize a boolean expression

So I built a query to show both of the date columns and the results of boolean expressions which test the equality and the NULL-ness of both columns.

SELECT
    DestinationDate, 
    SourceDate,
    case
        when     (DestinationDate = SourceDate) then '1'
        else                                         '0'
        end [(DestinationDate = SourceDate)=TRUE?],
    case
        when          (DestinationDate is null) then '1'
        else                                         '0'
        end [(DestinationDate is null)=TRUE?],
    case
        when               (SourceDate is null) then '1'
        else                                         '0'
        end [(SourceDate is null)=TRUE?]

FROM
    tempdb.dbo.myDates;

DestinationDate             SourceDate                  (DestinationDate = SourceDate)=TRUE? (DestinationDate is null)=TRUE? (SourceDate is null)=TRUE?
--------------------------- --------------------------- ------------------------------------ ------------------------------- --------------------------
2016-03-01 12:45:31.0000000 2016-03-01 12:45:31.0000000 1                                    0                               0
2016-03-01 12:45:31.0000000 2016-04-02 13:15:26.0000000 0                                    0                               0
NULL                        2016-04-02 13:15:26.0000000 0                                    1                               0
2016-03-01 12:45:31.0000000 NULL                        0                                    0                               1
NULL                        NULL                        0                                    1                               1

 

And here is the result of the combined condition:

SELECT
    DestinationDate, 
    SourceDate,
    case
        when     (DestinationDate = SourceDate) or (DestinationDate is null and SourceDate is null) then '1'
        else                                                                                             '0'
        end [((DestinationDate = SourceDate) or (DestinationDate is null and SourceDate is null))=TRUE?]

FROM
    tempdb.dbo.myDates;

DestinationDate             SourceDate                  ((DestinationDate = SourceDate) or (DestinationDate is null and SourceDate is null))=TRUE?
--------------------------- --------------------------- ------------------------------------------------------------------------------------------
2016-03-01 12:45:31.0000000 2016-03-01 12:45:31.0000000 1
2016-03-01 12:45:31.0000000 2016-04-02 13:15:26.0000000 0
NULL                        2016-04-02 13:15:26.0000000 0
2016-03-01 12:45:31.0000000 NULL                        0
NULL                        NULL                        1

Everything seems to be fine.

Let’s extend the query by adding columns, which put a NOT in front of those expressions:

SELECT
    DestinationDate, 
    SourceDate,
    case
        when NOT (DestinationDate = SourceDate) then '1'
        else                                         '0'
        end [NOT (DestinationDate = SourceDate)=TRUE?],
    case
        when NOT      (DestinationDate is null) then '1'
        else                                         '0'
        end [NOT (DestinationDate is null)=TRUE?],
    case
        when NOT           (SourceDate is null) then '1'
        else                                         '0'
        end [NOT (SourceDate is null)=TRUE?]

FROM
    tempdb.dbo.myDates;

DestinationDate             SourceDate                  NOT (DestinationDate = SourceDate)=TRUE? NOT (DestinationDate is null)=TRUE? NOT (SourceDate is null)=TRUE?
--------------------------- --------------------------- ---------------------------------------- ----------------------------------- ------------------------------
2016-03-01 12:45:31.0000000 2016-03-01 12:45:31.0000000 0                                        1                                   1
2016-03-01 12:45:31.0000000 2016-04-02 13:15:26.0000000 1                                        1                                   1
NULL                        2016-04-02 13:15:26.0000000 0                                        0                                   1
2016-03-01 12:45:31.0000000 NULL                        0                                        1                                   0
NULL                        NULL                        0                                        0                                   0

Here is the problem again. NOT did not negate/invert the outcome of the condition! “NOT (DestinationDate = SourceDate)=TRUE?” doesn’t show the inverted result of “(DestinationDate = SourceDate)=TRUE?”; the last three rows are still not TRUE. This leads to the unexpected result described above, which you can see here again:

SELECT
    DestinationDate, 
    SourceDate,
    case
        when NOT (DestinationDate = SourceDate) or (DestinationDate is null and SourceDate is null) then '1'
        else                                                                                             '0'
        end [NOT ((DestinationDate = SourceDate) or (DestinationDate is null and SourceDate is null))=TRUE?]

FROM
    tempdb.dbo.myDates;

DestinationDate             SourceDate                  NOT ((DestinationDate = SourceDate) or (DestinationDate is null and SourceDate is null))=TRUE?
--------------------------- --------------------------- ----------------------------------------------------------------------------------------------
2016-03-01 12:45:31.0000000 2016-03-01 12:45:31.0000000 0
2016-03-01 12:45:31.0000000 2016-04-02 13:15:26.0000000 1
NULL                        2016-04-02 13:15:26.0000000 0
2016-03-01 12:45:31.0000000 NULL                        0
NULL                        NULL                        1

The root cause of trouble

Before the problem drives you crazy, let me show you its root cause. The root cause is the simple fact that boolean logic in SQL Server is not implemented as a two-valued logic, but as a three-valued logic.

Two-valued logic means a simple TRUE and FALSE.

In case of comparisons that means:
‘A’ = ‘A’ –> TRUE
‘A’ = ‘B’ –> FALSE

Negation works fine:
NOT   TRUE –> FALSE
NOT   FALSE –> TRUE

Three-valued logic means that, besides TRUE and FALSE, there is an additional NULL for unknown cases (I will call it UNKNOWN in the following paragraphs).

In case of comparisons that means:
‘A’ = ‘A’ –> TRUE
‘A’ = ‘B’ –> FALSE
‘A’ = null –> UNKNOWN

UNKNOWN means that the outcome of the condition is neither TRUE nor FALSE. A condition evaluating to UNKNOWN in a WHERE clause will not return the rows. Furthermore, an expression which evaluates to UNKNOWN stays UNKNOWN even with a NOT in front of it, and will not return those rows either.

NOT   TRUE –> FALSE
NOT   FALSE –> TRUE
NOT   UNKNOWN –> UNKNOWN
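This three-valued NOT can be sketched in a few lines of Python, with None standing in for UNKNOWN (just an illustration of the logic, not how SQL Server implements it internally):

```python
def not3(x):
    """Three-valued NOT: TRUE and FALSE flip, UNKNOWN (None) stays UNKNOWN."""
    if x is None:
        return None  # the None guard matters: plain `not None` would yield True
    return not x

# The three cases from the table above:
print(not3(True))   # False
print(not3(False))  # True
print(not3(None))   # None
```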

You can see the consequences when I expand the query to show the boolean values correctly, respecting the three-valued logic:

SELECT
    DestinationDate, 
    SourceDate,

    case
        when     (DestinationDate = SourceDate) then '1'
        when NOT (DestinationDate = SourceDate) then '0'
        else                                         'unknown'
        end [(DestinationDate = SourceDate)=TRUE?],
    case
        when          (DestinationDate is null) then '1'
        when NOT      (DestinationDate is null) then '0'
        else                                         'unknown'
        end [(DestinationDate is null)=TRUE?],
    case
        when               (SourceDate is null) then '1'
        when NOT           (SourceDate is null) then '0'
        else                                         'unknown'
        end [(SourceDate is null)=TRUE?],


     case
        when NOT (DestinationDate = SourceDate) then '1'
        when     (DestinationDate = SourceDate) then '0'
        else                                         'unknown'
        end [NOT (DestinationDate = SourceDate)=TRUE?],

    case
        when NOT (DestinationDate <> SourceDate) then '1'
        when     (DestinationDate <> SourceDate) then '0'
        else                                          'unknown'
        end [NOT (DestinationDate <> SourceDate)=TRUE?]

FROM
    tempdb.dbo.myDates;

DestinationDate             SourceDate                  (DestinationDate = SourceDate)=TRUE? (DestinationDate is null)=TRUE? (SourceDate is null)=TRUE? NOT (DestinationDate = SourceDate)=TRUE? NOT (DestinationDate <> SourceDate)=TRUE?
--------------------------- --------------------------- ------------------------------------ ------------------------------- -------------------------- ---------------------------------------- -----------------------------------------
2016-03-01 12:45:31.0000000 2016-03-01 12:45:31.0000000 1                                    0                               0                          0                                        1
2016-03-01 12:45:31.0000000 2016-04-02 13:15:26.0000000 0                                    0                               0                          1                                        0
NULL                        2016-04-02 13:15:26.0000000 unknown                              1                               0                          unknown                                  unknown
2016-03-01 12:45:31.0000000 NULL                        unknown                              0                               1                          unknown                                  unknown
NULL                        NULL                        unknown                              1                               1                          unknown                                  unknown

 

Now let me sound smart 🙂:

Not being TRUE does not mean that something is FALSE. And not being FALSE does not mean that something is TRUE. As long as there is UNKNOWN.

Those UNKNOWN values have an impact on boolean AND and OR as well (read NULL as UNKNOWN in the following tables):

AND     TRUE    FALSE    NULL
TRUE    TRUE    FALSE    NULL
FALSE   FALSE   FALSE    FALSE
NULL    NULL    FALSE    NULL

When one part of the condition is FALSE, the outcome of the whole condition is FALSE when combined with AND. If one part is TRUE and the other UNKNOWN, the result becomes UNKNOWN.

OR      TRUE    FALSE    NULL
TRUE    TRUE    TRUE     TRUE
FALSE   TRUE    FALSE    NULL
NULL    TRUE    NULL     NULL

When one part of the condition is TRUE, the outcome of the whole condition is TRUE when combined with OR. If one part is FALSE and the other UNKNOWN, the result becomes UNKNOWN.
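Both truth tables can be reproduced with a small Python sketch (again with None standing in for UNKNOWN; the function names and3/or3 are my own):

```python
def and3(a, b):
    # FALSE dominates AND: if either side is FALSE, the result is FALSE ...
    if a is False or b is False:
        return False
    # ... otherwise any UNKNOWN side makes the whole result UNKNOWN
    if a is None or b is None:
        return None
    return True

def or3(a, b):
    # TRUE dominates OR: if either side is TRUE, the result is TRUE ...
    if a is True or b is True:
        return True
    # ... otherwise any UNKNOWN side makes the whole result UNKNOWN
    if a is None or b is None:
        return None
    return False

# Spot checks against the tables above:
print(and3(True, None))   # None  (TRUE AND UNKNOWN -> UNKNOWN)
print(or3(False, None))   # None  (FALSE OR UNKNOWN -> UNKNOWN)
print(or3(True, None))    # True  (TRUE OR anything -> TRUE)
```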


So the NULL values in either DestinationDate or SourceDate lead to a condition evaluating to UNKNOWN. That is fine in the very first example (when either of the two columns is NULL, the condition is not TRUE, and the rows do not show up). The problem only arises when I put a NOT in front of this condition: the condition for those rows (with either of the two date columns being NULL) stays UNKNOWN, and the rows will not be displayed.

Wrap up

Always keep in mind that there is more than TRUE and FALSE for your conditions. Due to the three-valued boolean logic in SQL Server (and most other relational databases) there is an additional UNKNOWN value, which stays UNKNOWN even if you try to negate it with a NOT. Here is the correct solution to find all rows which have different values for DestinationDate and SourceDate:

SELECT
    *
FROM
    tempdb.dbo.myDates
WHERE
    DestinationDate <> SourceDate or
    (DestinationDate is not null and SourceDate is     null) or
    (DestinationDate is     null and SourceDate is not null);

DestinationDate             SourceDate
--------------------------- ---------------------------
2016-03-01 12:45:31.0000000 2016-04-02 13:15:26.0000000
NULL                        2016-04-02 13:15:26.0000000
2016-03-01 12:45:31.0000000 NULL

Now we get a list of all rows which have different values for DestinationDate and SourceDate.
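The same change-detection predicate, with the NULL handling spelled out explicitly, can also be expressed outside of SQL. Here is a sketch in Python (the function name has_changed and the tuple representation are my own; None again plays the role of NULL):

```python
def has_changed(destination, source):
    """SCD type 2 change detection with explicit NULL (None) handling."""
    if destination is None and source is None:
        return False              # both NULL: counts as unchanged
    if destination is None or source is None:
        return True               # exactly one NULL: counts as changed
    return destination != source  # neither NULL: ordinary comparison

# The five sample rows from the demo table:
rows = [
    ("2016-03-01 12:45:31", "2016-03-01 12:45:31"),
    ("2016-03-01 12:45:31", "2016-04-02 13:15:26"),
    (None, "2016-04-02 13:15:26"),
    ("2016-03-01 12:45:31", None),
    (None, None),
]
changed = [r for r in rows if has_changed(*r)]
print(len(changed))  # 3, matching the three rows returned by the query above
```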

And don’t forget to clean up your TEMPDB:

drop table tempdb.dbo.MyDates;

I hope this post helped you a little in understanding the consequences of three-valued boolean logic and how to implement a Slowly Changing Dimension type 2 load when you have to handle NULL values.

Yours sincerely,
Markus Ehrenmueller


Boosting DWH Performance with SQL Server 2016 at SQL Saturday Sofia 2015

I was very glad to return to Bulgaria within a couple of months to attend SQL Saturday Sofia 2015, being accepted for two talks. One of them covered how to use the improvements of the ColumnStore technology in SQL Server 2016 to boost the performance of your DWH.

I was very excited when the xVelocity technology was introduced into both the relational database engine of SQL Server (as the Non-Clustered ColumnStore Index) and Analysis Services (as the Tabular model). As I was already stunned by the features available with the Power Pivot add-in (available since Excel 2010), this was a huge step, integrating a whole different way of storing data directly into the heart of the database. (As opposed to other vendors, who came up with vertical storage as well, but as separate tools, not integrated into their core products.)

SQL Server 2012 unfortunately had a lot of restrictions for the ColumnStore. The biggest of them: the index is available as a Non-Clustered Index only, and the table becomes read-only.
SQL Server 2014 came with fewer restrictions and introduced a Clustered ColumnStore Index, which is updateable, but left tables with a Non-Clustered ColumnStore Index still read-only.

For SQL Server 2016 we expect a whole bunch of improvements:

  • Fewer restrictions and performance improvements for Non-Clustered & Clustered ColumnStore Indexes
  • Update-able Non-Clustered ColumnStore Index
  • Creating Non-Clustered B-tree Indexes on tables with Clustered ColumnStore Index
  • Creating ColumnStore Indexes on In-Memory OLTP tables

You can find my slide deck here.

Sincerely,
Markus Ehrenmüller-Jensen
@MEhrenmueller & @SQLederhose
markus.ehrenmueller@gmail.com


ColumnStore Index Best Practice at SQLdays

Last week I gave two talks at SQLdays in Erding, Germany. One of them was about “ColumnStore Index Best Practice”. I basically set the frame by using a three-part agenda:

  • ColumnStore
  • Non-Clustered ColumnStore Index
  • Clustered ColumnStore Index

It is important to understand the basic concepts of a ColumnStore, as opposed to the way data in a relational database was stored over the last decades. The latter is now called RowStore; it didn’t have a special name until ColumnStore was introduced, because it was the one and only way to store data.

After getting behind the “magic” of a ColumnStore it is much easier to evaluate the use cases where it is appropriate, and what to consider when maintaining this special kind of “index”. (Even the definition of the word “index” gets a new point of view, as rows in a ColumnStore are not sorted – in contrast to a b-tree index.)

The Non-Clustered ColumnStore Index was introduced in SQL Server 2012 and came with a whole list of limitations. One of the most discussed limitations was that the table for which the Non-Clustered ColumnStore Index is created becomes read-only.

The Clustered ColumnStore Index was introduced in SQL Server 2014 and came with better performance and a shorter list of limitations. One of the limitations we got rid of: a table with a Clustered ColumnStore Index is now updateable. (But in SQL Server 2014, tables with a Non-Clustered ColumnStore Index are still read-only.)

You can find my slide deck here.

Sincerely,
Markus Ehrenmüller-Jensen
@MEhrenmueller & @SQLederhose
markus.ehrenmueller@gmail.com


Microsoft’s Business Intelligence Roadmap

Everybody expected that Microsoft would make great announcements during PASS Summit 2015 concerning their data platform. Even though there had already been a lot of announcements, previews, and updates for the Business Intelligence suite in the weeks and months prior to the summit, the content of the keynote and Microsoft’s sessions during the conference exceeded my expectations in a very positive way.

The Box is Back

“Cloud first” changed Microsoft’s release policy to develop and deploy new features first to the cloud and deliver them for on-prem (“box”) in later releases. This led to a time gap and annoyed customers who would not use the cloud for one reason or another. With the features shown in various sessions at PASS Summit 2015, with the features available in CTP3, and with the announcements for SQL Server 2016 RTM you can clearly see: the box is back! New features are delivered every month – new Power BI visualizations (see below) will even be created once a week.

Roadmap

When I think back to the sort-of-roadmap that was announced in 2009, I don’t get any positive feelings. The promise back then was to consolidate the tools down to only three: Excel & SharePoint as the front end and SQL Server as the backbone. But instead of consolidating the Business Intelligence stack, which already consisted of a bunch of tools (Microsoft Office Excel, SharePoint Excel Services, SQL Server PowerPivot for Excel, SQL Server PowerPivot for SharePoint, SQL Server Reporting Services in native mode, SQL Server Reporting Services in SharePoint mode, ProClarity *sigh*, SharePoint PerformancePoint Services, SharePoint Power View), new tools were introduced and/or acquired: new add-ins for Microsoft Office Excel (Power View, Power Map, Power Query), two new cloud experiences (Office 365 and PowerBI.com), Power BI Desktop, and DataZen. That’s not what I call “consolidating”.

Things changed for me at PASS Summit 2015, for the better. After years we now have a very clear roadmap for the Business Intelligence tools. Each of the existing tools gets its own dedicated place, complementing the others:

  • Power BI Desktop for interactive reports
  • Excel for spreadsheets
  • Report Builder for pixel-perfect paginated reports
  • DataZen for mobile reports

SQL Server Reporting Services is the world’s most successful tool for building paginated and operational reports. It will be fully integrated with Power BI by the end of 2015 in a hybrid way. Reporting Services will duplicate the features of off-prem Power BI for on-prem use. Excel still has its important place within the BI stack and will fully integrate with Power BI/Reporting Services. And DataZen will be integrated into Power BI and Reporting Services to enhance the mobile experience.

The following features are not available in CTP3, but announced for “very soon” (which might mean by end of 2015):

  • Pin entire PowerBI Reports on a dashboard (including filters)
  • Upload & embed Excel workbooks as a report onto Power BI and pin parts of the workbook to a dashboard
  • Publish PowerBI Desktop Reports to an on-prem Reporting Server
  • Publish DataZen Reports to an on-prem Reporting Server

End-user BI

Here it is, the new buzzword! After enabling Corporate BI and making Self-Service BI possible, we now speak of “End-user BI”. This means making data available directly to the end user. Microsoft’s tool for this is, of course, Power BI.

By the end of October 2015, already 90,000 organizations in over 180 countries had subscribed to PowerBI.com, which is an impressive success story.

Get Insights

As nobody digs through data for its own sake, but wants to gain insights (see my lightning talk on “My Favorite Candy Bar (Chart)” at PASS Summit 2015), a new feature in Power BI is very welcome: “Get Insights”. It will analyze a data set to detect correlations, outliers, and low and high values you might not have discovered on your own.

Bring Your Own Device (BYOD)

Through the tools acquired with DataZen, Business Intelligence goes mobile on a wide range of devices. Windows (Phone) is no longer the only platform supported, but one among several: Mac OS, iOS & Android.

Power BI Enterprise Gateway

The Power BI Enterprise Gateway enables Power BI, which is a cloud service, to connect to your on-prem data. Combined with a live data source you can analyze your on-prem data without moving it out to the cloud – which helps to get up-to-date analyses or even real-time reports.

Real-time Cubes

Being able to analyze data immediately after it was generated is a nice idea. Unfortunately, typical DWH scenarios are far away from this: data from different sources is first extracted into a staging area, transformed, and then loaded into a relational data warehouse. This process usually runs overnight or on weekends. Because relational tables can be inconvenient to query for the average end user (for performance reasons, for the usability of table and column names, or both), many companies build a cube on top of that with SQL Server Analysis Services, which adds another step (and leads to even more time lag).

Fortunately, two different features help out here: using ColumnStore indexes on relational tables usually speeds up queries on those tables, and SQL Server Analysis Services is capable of both real-time online analytical processing (ROLAP in multidimensional models) and DirectQuery (in tabular models). The performance of those queries will be improved in SQL Server 2016; in previous versions it was not very useful (due to inefficiently generated SQL statements and a lack of caching).

DAX improvements

The editor will have IntelliSense and syntax highlighting, and will allow editing across multiple lines, indenting text, and comments. The language will be enhanced with 50 new functions and will allow the use of variables. New types of relationships will be allowed (e.g. 1:m and m:1 relationships), including bidirectional filtering through multiple tables.

Microsoft goes Open Source with Power BI Visualisations

Microsoft announced in October that it would open up the visualization stack for “custom visualizations” and actually ran a competition to submit new visualizations for Power BI. Those new types of visualizations integrate perfectly with the existing ones and with each other, so you can e.g. cross-filter as you would expect. All visualizations are available in both Power BI Desktop and the Power BI web site (visit visuals.powerbi.com).

Reporting Services

The last big investment in Reporting Services came with SQL Server 2008 R2, which is a couple of years ago. With SQL Server 2016, Reporting Services is back again. Report Manager will be completely reworked as a responsive web page based on HTML5. The first looks I got at PASS Summit 2015 were very promising.

 

Yours sincerly,
Markus Ehrenmueller


SQLCAT: SQL Server 2016 ColumnStore Customer Scenarios and Best Practices (SQL PASS Summit 2015)

Improve loading a ColumnStore in SQL Server 2016 CTP3 by following these guidelines:

  • Load into stage tables per partition using WITH (TABLOCK) and switch those tables in afterwards
  • Separate your INSERT and UPDATE statements
  • For the initial load, create an empty table with a Clustered ColumnStore Index on it (instead of loading a heap and creating the index afterwards)
  • Include the partitioning column in the unique index/primary key (as you would with partitioned heaps)
  • Create statistics immediately after the initial data load (auto-stats improvements are on their way, but not implemented yet)
  • Utilize the new possibility to combine Clustered ColumnStore Indexes with b-tree Non-Clustered Indexes where appropriate
  • A Clustered ColumnStore Index with a b-tree index on it slows down the load (therefore create the non-clustered b-tree index after the load; CTP3 allows for parallel non-clustered index creation)
  • A REBUILD of a ColumnStore is not an online operation, therefore apply it on the partition level only
  • Try trace flag 9481 to force the old cardinality estimator for performance outliers
  • SQL Server Integration Services can lead to trimmed row groups
  • Set AutoAdjustBufferSize on in SQL Server Integration Services to avoid small uncompressed delta stores
  • Use compatibility level 130 when testing ColumnStore