ColumnStore Index Best Practice at SQLdays

Last week I did two talks at SQLdays, Erding, Germany. One of them was about “ColumnStore Index Best Practice”. I basically set the frame by using a three-folded agenda:

  • ColumnStore
  • Non-Clustered ColumnStore Index
  • Clustered ColumnStore Index

It is important to understand the basic concepts of a ColumnStore, as opposed to the way data in a relational database was stored over the last decades. The latter is now called RowStore and didn’t have a special name until ColumnStore was introduced, because it was the one and only way how to store data.

After getting behind the “magic” of a ColumnStore it is much more easier to evaluate the use cases, where it is appropriate, and what to consider when maintaining this special kind of “index”. (Even the definition of the word “index” gets a new point of view, as rows in a ColumnStore are not sorted – which is conversed to a b-tree index.)

Non-Clustered ColumnStore Index was introduced in SQL Server 2012 and came with a whole list of limitations. One of the most discussed limitation was, that the table for which the Non-Clustered ColumnStore Index was created, becomes read-only.

Clustered ColumnStore Index was introduced in SQL Server 2014 and came with a better performance and a shorter list of limitations. One of the limitations we got rid of was, that the table for which a Clustered ColumnStore Index was created, was still updateable. (But in SQL 2014 tables with a Non-Clustered ColumnStore Index is still read-only).

You can find my slide deck here.

Sincerly,
Markus Ehrenmüller-Jensen
@MEhrenmueller & @SQLederhose
markus.ehrenmueller@gmail.com

ColumnStore Index Best Practice at SQLdays

Microsoft’s Business Intelligence Roadmap

Everybody expects that Microsoft will do great announcements during PASS Summit 2015 concerning their data platform. Even when there already have been a lot of announcements, previews and updates for the Business Intelligence suite in the weeks and months prior to the summit, the content of the keynote and Microsoft’s sessions during the conference exceeded my expectations in a very positive manner.

The Box is Back

“Cloud first” changed Microsoft’s release policy to develop and deploy new features first to the cloud and deliver them for on-prem (“box”) in later releases. This lead to a time gap and annoyed customers who would not use the cloud for one reason or another. With the features shown in various sessions at PASS Summit 2015, with the available features in CTP3 and with the announcements for SQL Server 2016 RTM you can clearly see: The box is back! New features are delivered every month – new Power BI visualisations (s. below) will be created even once a week.

Roadmap

When I think back to the sort-of-roadmap which was announced 2009 I dont’t get any positive feelings. The promise in this year was, to conoslidate the tools to only three: Excel & SharePoint as the front-end and SQL Server as the back-bone. But instead of consolidating the tools of the Business Intelligence stack which already consisted of a bunch of tools (Microsoft Office Excel, SharePoint Excel Services, SQL Server PowerPivot for Excel, SQL Server PowerPivot for SharePoint, SQL Server Reporting Services in native mode, SQL Server Reporting Services in SharePoint mode, ProClarity *sigh*, SharePoint PerformancePoint Services, SharePoint Power View) new tools have been introduced and/or acquired: new add-ins for Microsoft Office Excel: Power View, Power Map, Power Query two new cloud-experiences Office365 and PowerBI.com, Power BI desktop and DataZen. That’s not what I call “consolidating”.

Things changed for my at the PASS Summit 2015 for a better. After years we have now a very clear roadmap for the Business Intelligence tools. Every of the existing tools gets its own dedicated place, compagnioning each other:

  • Power BI Desktop for interactive reports
  • Excel for spreadsheets
  • Report Builder for pixel-perfect paginated reports
  • DataZen for mobile reports

SQL Service Reporting Services is the worlds succesful tool for building paginated reports and operational reports. It will be fully integrated with PowerBI by end of 2015 in a hybrid way. Reporting Services will duplicate the features of off-prem PowerBI for on-prem uses. Excel still has its important place within the BI stack and will fully integrate in PowerBI/Reporting Services. And DataZen will be integrated into PowerBI and Reporting Services to enhance the mobile experience.

The following features are not available in CTP3, but announced for “very soon” (which might mean by end of 2015):

  • Pin entire PowerBI Reports on a dashboard (including filters)
  • Upload & embedd Excel workbooks as a report onto Power BI and pin parts of the workbook to a dashboard
  • Publish PowerBI Desktop Reports to an on-prem Reporting Server
  • Publish DataZen Reports to an on-prem Reporting Server

End-user BI

Here is it, the new buzz word! After enabling Corporate BI and making Self-Service BI possible, we now speak of “End-user BI”. This means, that we make data available directly for the end-user. Microsofts tool for this is, of course, Power BI.

By end of october 2015 already 90.000 organizations in over 180 countries subscribed to PowerBI.com, which is an impressive success story.

Get Insights

As nobody of us digs data, but wants to gain insights (s. my lightning talk on “My Favorite Candy Bar (Chart)” at PASS Summit 2015), a new feature in PowerBI is very welcomed: “Get Insights”. It will analyze a data set to detect correlations, outliers, low and high values you might not have discovered on your own.

Bring Your Own Device (BYOD)

Through the tools acquired with DataZen, Business Intelligence goes mobile on widespread types of devices. Windows (Phone) is now not the only platform supported, but one among different ones: Mac OS, iOS & Android.

Power BI Enterprise Gateway

The Power BI Enterprise Gateway enables Power BI, which is a cloud service, to connect to your on-prem data. Combined with a live data source you can analyze your on-prem data without moving the data out to the cloud – which will help to get up-to-date analysis or even real-time reports.

Real-time Cubes

To have the possibility to analyse data immediately after it was generated, is a nice idea. Unfortunately, typical DWH scenarios are far away from this: Data from different sources is first extracted into a staging area, transformed and then loaded into a relational data warehouse. The process is usually done through night or on weekends. Because relational tables can be inconvenient to query for the average end-user (for either performance reasons or usability of table- and column-names, or both), many companies build up a cube through SQL Server Analysis Services on top of that, which adds up another step (and leads to even more time lag).

Fortunately two different features are helping out here: Using ColumnStore indexes for relational tables usually is speeding up queries on those tables and SQL Server Analysis Services is capable of both real-time online analytic processing (ROLAP in muldi-dimensional models) and DirectQuery (in tabular models). The performance of those queries will be improved in SQL Server 2016, while it did not have a very useful performance in previous versions (due to inefficient SQL statements generated and lack of any caching).

DAX improvements

The editor will have intelli-sense, syntax highlighting, allow to edit in multiple lines and intend text, and will allow comments. The language will be enhanced with 50 new functions and allow the use of variables. New type of relationships will be allowed (eg. 1:m to m:1 relationship) including bidirectional filtering through multiple tables.

Microsoft goes Open Source with Power BI Visualisations

Microsoft announced in October to open up the visualisation stack for “custom visualisations” and actually ran a competition to submit new visualisations for Power BI. Those new types of visualisations integrate perfectly with the existing ones and with each other, so you can eg. cross filter like you would expect. All the visualisations are available in both, Power BI Desktop and the PowerBI web site (visit visuals.powerbi.com).

Reporting Services

The last big investements into Reporting Services was with SQL Server 2008 R2, which is a couple of years ago. With SQL 2016 Reporting Services is back again. Report Manager will be totally overworked as a responsive webpage and will be based in HTML5. The first looks I had at PASS Summit 2015 have been very promising.

 

Yours sincerly,
Markus Ehrenmueller

Microsoft’s Business Intelligence Roadmap

SQLCAT: SQL Server 2016 CollumnStore Customer Scenarios and Best Practices (SQL PASS Summit 2015)

Improve loading a ColumnStore in SQL Server 2016 CTP3 by using the following guidelines:

  • Load into stage-tables per partition using WITH (TABLOCK) and switch those tables in afterwards
  • Separate your INSERT and UPDATE statements
  • For initial load create an empty table with Clustered ColumnStore on it (instead of loading a heap and create the index afterwards)
  • Include the parititioning column in the unique index/primary key (as you would with partitioned heaps)
  • Create statistics immmediately after initial data load (auto stats improvements are on their way, but not implemented yet)
  • Utilize the new possibility to combine Clustered ColumnStore Indexes with b-tree Non-Clustered Indexes where apropriate
  • Clustered ColumnStore Index with a b-tree index on, slows down the load (therefore create the non-clustered b-tree after load; CTP3 allows for parallel non-clustered index creation)
  • REBUILD of ColumnStore is not an online operation, therefore apply it on partition level only
  • Try trace flag 9481 to force the old cardinality estimator for performance-outliers
  • SQL Server Integration Services can lead to trimmed row groups
  • Set AutoAdjustBufferSize on in SQL Server Integration Services to avoid small uncompressed delta stores
  • Use compatibility level 130 when testing ColumnStore
SQLCAT: SQL Server 2016 CollumnStore Customer Scenarios and Best Practices (SQL PASS Summit 2015)

SQL PASS Summit 2015 Keynote #1 (Wednesday)

In 2015 the PASS Summit is the 16th annual meeting of data platform professionals, gathering 5500 total registrations.

PASS President Tom LaRock remembered everybody to welcome #sqlfamily members with a #sqlhug.

Keynote speaker was Joseph Sirosh, Corporte Vice Precident, Data Group.

Analog data will disappear, and digital data got the majority. In short cloud/internet connected data will be the majority. We moved from an age of hardware to an age of software and heading towards an age of data (online recommendations, customer experience, …). Microsofts tool for this is “Cortana Analytics Suite”.

“We are all big data” through 2GB of genomic data.

Eric Fleischman, Chief Architect andVP Platform Engineering, DocuSign, explains how they can cope with the increasing growth of the company, and the data coming with that. Their decision was against open source, as the want to “use” a database system and not “write” a database system.

As averagely two documents are signed every second their system generating 180 Mio events a day.

Engines of data: mission cricital OLTP, high-performance DW, end-to-end mobile BI, with advanced analytics on top of it.

Most of the vendors built their systems (in slow cycles) and ship it afterwards to the cloud. Microsoft is the only company, who builds everything for the cloud and ship it later on-prem. Gartner rated Microsoft as a leader in completeness of vision and completeness of execution.

Shawn Bice, General Manager, Database Systems Group, tells that companies who are able to embrace the data are far more succesful. Big bats:

  • everything is built-in (no add-ins or so)
  • Mission critical OLTP
  • Most secure database: least vulnerable database the past 6 years in a row
  • highest performing data warehouse: won against the other vendors
  • Ent-to-end MobileBI on any device: a fraction of the cost comp (USD 120 against Tableau (USD 480) or Oracle (USD 2230), self-service BI per user)
  • In-database Advanced Analytics: R + in-memory direct to the platofrm
  • in-memory across all workloads
  • consistent experience across cloud and on-prem

Learnings from the experiences with Azure went into SQL 2016 on-prem.

Polybase removes the complexity of big data by enabling T-SQL over Hadoop by providing “external tables” within SQL Server. JSON support will also help a lot of projects.

Real-time is learn and adjust as things are happening. ColumnStore Indexes on top of in-memory (Hekaton) tables will enable this. Combined with embedded R Services the data is accessable to data scientist, without moving the data, but analysing it where it already is.

Rohan Kumar, Partner Director, Engineering shows a showcase with customer “p:cubed”. on a machine with impressive 480 logical processors. Monitors a huge amount of transaction, calculating customer rewards, both in real-time.

Non-Clustered Index will be updateable with SQL 2016. On top of an im-memory table this index will not sacrifice the performance of the oltp-table.

Advanced Analytics enables to include R-scripts within T-SQL code without moving data outside of the data platform and therefore enableing real-time analyitcs. As oltp-data lies in-memory, the data does not even touch the disk from the time it is tracked until it is analyzed.

“Always Encrypted” guarantees, that the encryption key and the deciphered text is available on the client only. The content is never stored in a decrypted way on the server (neither on disk, nor in the buffer pool memory).

Stretch Database allows to combine hot and cold data in one logical table, but actually moves the cold data out to cheaper disks, instead leaving it on the expensive storage, designed for hot data. It will still be queryable in the common way, as the technology is completely transparent to the client. And it works together with “Always Encrypted”.

The stretched part will show up as a “Remote Query” in the execution plan.

Joseph Sirosh: “Our industry does not respect tradition, but does respect innovation.”

Cloud helps to transform hardware to software, software to services and data to intelligence.

Sincerly yours,
Markus Ehrenmüller-Jensen

SQL PASS Summit 2015 Keynote #1 (Wednesday)

Business Intelligence Power Hour at BASTA

After not haven spoken at BASTA! in 2014 I was again at BASTA! 2015, speaking about PowerBI.

I did talk about the following topics, explaining each of them by just one slide and using most of the time in demos:

  • Power Query
  • Power Pivot
  • Power View
  • Power Map
  • PowerBI.com
  • Power Q&A
  • Power BI App
  • Power BI Desktop

You can download my slide deck here.

Sincerly,
Markus Ehrenmüller-Jensen
@MEhrenmueller & @SQLederhose
markus.ehrenmueller@gmail.com

Business Intelligence Power Hour at BASTA

My Favorite Candy Bar (Chart) at PASS Summit

I am really proud that my abstract has been accepted by the abstract review team for PASS Summit 2015 in Seattle. Therefore I will speak for the 3rd time at the PASS Summit. I do appreciate, that I am not only attending THE conference dedicated to SQL Server worldwide, but also being able to be part of a conference I am a huge fan of, since I attended the first time in 2011.

Like last year, I will have again “10 minutes of fame” 🙂 during a Lightning Talk session (during a 75 minutes General Session slot together with 5 other speakers which is dedicated to datavisualisation). This year I will talk about “My Favorite Candy Bar (Chart)“, which will be a sequel zu last years session with title “My Favorite Pie (Chart)”.

Last year I took a sample Pie Chart which did show values for 8 different products. Even when I told the audience, that 7 values where identically, but one had a bigger value, nobody could identify the product with the biggest value (which was 10% (!) bigger than the others). I think that showed very clearly, that in this case a Pie Chart was the wrong type of visualisation.
I ended up converting the Pie Chart into a column chart which made the difference very clear.

I wont reveal all details to this year’s session, but I can tell you so much:
This time I will show a Bar Chart, which will improve hugely by converting it into Pie Chart – no, I am not kidding.
If you are curious about how I could change my mind since last year, just come into my session! 🙂

Sincerly,
Markus Ehrenmüller-Jensen
@MEhrenmueller & @SQLederhose

My Favorite Candy Bar (Chart) at PASS Summit

Introduction to SQL Azure Databases at SQLdays

This week I did two talks at SQLdays, Erding, Germany. One of them was an “Introduction to SQL Azure Databases”. As it was an introductionary level 200 talk, I had lot of slides to explain the concepts and the use cases pro & contra cloud computing, but also prepared some quick demos to show how to

  • subscribe to Azure,
  • create a server,
  • create a database, and last-but-not-least,
  • manage a database

The aim was, that the attendees could get an impression, how putting a database in Microsoft’s cloud offering would “feel” alike.

You can finde my slidedeck here.

Unfortunately through my demos I had to discover that I could not find the link to “Azure Online SQL Database Management”, which I was sure was there the other day, when I rehearsed my demos.

Doing a proper research I found the solution:
When I upgraded my server from V11 to V12 (to get the new (preview) features like Auditing, Dynamic data masking, Transparent data encryption, …) I unintentionally disabled the SQL Database Management, which is a deprecated feature and not available for databases in V12.

Personally I feel really bad about the fact, that SQL Database Mangement is abandoned in the new version. On the one hand I can understand, that resources even in a big company like Microsoft are not unlimited, but on the other hand being able to manage your database objects through the browser without any need to install tools locally, was really helpful. From V12 on we have to stick to tools installed on-premises on your client to maintain tables, views, indexes, and the like.

Sincerly,
Markus Ehrenmüller-Jensen
@MEhrenmueller & @SQLederhose

Introduction to SQL Azure Databases at SQLdays