In this post I want to take a look at how batch mode and column store indexes perform in two scenarios:
- Fact table to dimension table join
Sequential scans demonstrate some interesting nuances depending on the type of hardware setup being used, for this very reason, I have two distinct types of storage:
For the all the tests performed an artificially inflated FactInternetSales table is used:
Lets now look at how a straight sequential scan performs with my slow and fast storage using the pre SQL Server 2012 forms of compression:
Elapsed time increases as the compression ratio goes up for the slower storage, however for the faster storage, the reverse is true, the theory being that we are feeding the servers CPU cores at such a rate that the system is now CPU bound. The cost of decompressing a page compressed heap during a sequential scan versus that of an uncompressed heap is quantifiable by looking at the call stack, as obtained through xperf:
These sequential scan findings are best illustrated with a see-saw analogy as there is a tipping point between the balance of a server’s CPU capacity and its IO sub system bandwidth which determines whether or not compression is beneficial for a sequential scan.
Conversely, if your IO sub system can keep up with or swamp your available CPU resources, compression yields benefits for sequential scans.
When joins are introduced into the equation, prior to column store indexes, this is the best that is achievable for a simple fact table to dimension table join:
- The use of column store indexes and batch mode results in better scalability, up to a degree of parallelism of around 10, as opposed to 8 without column store indexes.
- Column store indexes and batch mode reduce elapsed time by an order of magnitude compared to rows stores and row mode.
- Here is the really interesting thing, the best sequential scan for the row store and the flash storage takes place when there is no compression is used. This begs the question as to what sort of performance could be obtained if SQL Server allowed for column store indexes without any run length compression. It is my belief that for storage that can match / best a servers aggregate CPU core consumption rate, this would result in the fastest possible sequential scans and the best all round query performance.
In the next post I will cover some of the fundamentals of modern CPU architectures which are the drivers behind why the batch mode execution engine exists.