We are at the greatest inflection point in storage since the inception of the beast pictured below: the world's first commercial hard drive, the IBM RAMAC. I am of course speaking of flash storage, and it turns a lot of conventional DBA storage wisdom and lore on its head:
#1 Compress Your Data To Boost Performance
The raw IOPS horsepower of flash storage (striped gen 1 ioDrives in this example) is enough to make scans CPU bound, to the extent that the overhead of decompression has the net effect of sending performance backwards:
The graph provides the perfect illustration of the problem that row and page level compression is designed to solve: the performance disparity between spinning disks and modern CPUs.
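For anyone who wants to reproduce the trade-off, the two compression levels under discussion are applied along these lines; a minimal sketch in which the table name MyBigTable is my own, and sp_estimate_data_compression_savings is worth running first to see what you stand to gain:

```sql
-- Hypothetical table name; illustrates the two compression levels discussed above.
ALTER TABLE dbo.MyBigTable REBUILD WITH (DATA_COMPRESSION = ROW);

-- Page compression layers prefix and dictionary compression on top of
-- row compression, at a higher CPU cost per page touched:
ALTER TABLE dbo.MyBigTable REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Estimate the space saving before committing to either:
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',
    @object_name      = 'MyBigTable',
    @index_id         = NULL,
    @partition_number = NULL,
    @data_compression = 'PAGE';
```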
I repeated this test with my much faster ioDrive 2 Duo, rated at 3GB/s for sequential reads. The following SELECT ... INTO statement is used to populate a heap hash partitioned across 24 partitions:
;WITH generator AS
(
    SELECT TOP 50000 id = ROW_NUMBER() OVER (ORDER BY a)
    FROM   (SELECT a = 1 FROM master.dbo.syscolumns) c1
    CROSS JOIN master.dbo.syscolumns c2
)
SELECT  g1.id % 24 AS Col1
       ,g1.id + 1  AS Col2
       ,g1.id + 2  AS Col3
       ,g1.id + 3  AS Col4
       ,g1.id + 4  AS Col5
       ,CASE g1.id % 2 WHEN 1 THEN AddressLine1 ELSE AddressLine2 END AS Col6
       ,CASE g1.id % 7 WHEN 1 THEN AddressLine1 ELSE AddressLine2 END AS Col7
       ,FirstName            AS Col8
       ,MiddleName           AS Col9
       ,LastName             AS Col10
       ,CustomerAlternateKey AS Col11
INTO    BigData
FROM    generator g1
CROSS JOIN DimCustomer
I then scanned this using two different statements:
SELECT  COUNT([Col1])
       ,COUNT([Col2])
       ,COUNT([Col3])
       ,COUNT([Col4])
       ,COUNT([Col5])
       ,COUNT([Col6])
       ,COUNT([Col7])
       ,COUNT([Col8])
       ,COUNT([Col9])
       ,COUNT([Col10])
       ,COUNT([Col11])
FROM    [dbo].[BigData] WITH (NOLOCK)

SELECT  SUM([Col2])
       ,SUM([Col3])
       ,SUM([Col4])
       ,SUM([Col5])
FROM    [dbo].[BigData] WITH (NOLOCK)
If we start off with a basic heap and then go all the way to a column store with archive compression, this is what we see for elapsed times:
The SELECT COUNT query runs in under a second against the column store; I suspect this is because the database engine is picking the count information directly out of the global dictionary. Below are the scan rates and CPU consumption per query/compression type:
I have omitted the scan rates and CPU consumption for the SELECT COUNT query against the column store because it runs so fast. The main takeaway from this data is that as we head towards page compression for the heap, the scan rate for the SELECT SUM query goes down and CPU consumption goes up (turbo boost allows for CPU consumption above 100%), the net effect being an increase in elapsed time. Column stores do not suffer from the CPU being swamped by the SELECT SUM query; I suspect this is largely because the huge CPU penalty of having to work out where columns begin and end within each page is not incurred for column stores:
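For reference, the column store variants in the test above can be built along these lines; a sketch against the BigData heap created earlier, assuming SQL Server 2014 or later (the index name is my own):

```sql
-- Clustered column store index over the BigData heap:
CREATE CLUSTERED COLUMNSTORE INDEX ccsi_BigData ON dbo.BigData;

-- Rebuild with archive compression, which trades extra CPU on scans
-- for a smaller on-disk footprint:
ALTER INDEX ccsi_BigData ON dbo.BigData
    REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);
```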
#2 512K Reads and Aggressive Read Ahead Behavior Are Required For Maximum Scan Throughput
Wrong again: the fact that IO requests can be parallelized when talking to flash via the PCIe bus means that the maximum IO throughput of a flash device can be achieved with 64K reads. My ioDrive 2 Duo, rated at 3GB/s for sequential reads, can deliver 3GB/s scan rates with 64K reads:
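One way to sanity-check the read sizes SQL Server is actually issuing is via sys.dm_io_virtual_file_stats; a rough sketch (the counters are cumulative since instance startup, so sample before and after a scan and diff the numbers to isolate a single test):

```sql
-- Average read size per database file since instance startup (cumulative,
-- so snapshot before and after a scan to isolate one workload).
SELECT  DB_NAME(vfs.database_id) AS database_name,
        mf.physical_name,
        vfs.num_of_reads,
        vfs.num_of_bytes_read / NULLIF(vfs.num_of_reads, 0) AS avg_read_size_bytes
FROM    sys.dm_io_virtual_file_stats(NULL, NULL) vfs
JOIN    sys.master_files mf
        ON  mf.database_id = vfs.database_id
        AND mf.file_id     = vfs.file_id;
```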
#3 Put TEMPDB, Transaction Logs etc. On Different Spindles
This table is borrowed from my blog post on singleton inserts. After experimenting with different key types and the various ‘tweaks’ documented below, I managed to insert 727,272 rows per second. And how was the storage configured for this? Everything was striped across a Fusion IO drive and two conventional SanDisk solid state drives . . . no separation of tempdb, transaction logs etc. whatsoever.
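A quick way to see how an instance's files are actually laid out, using nothing beyond the standard catalog views; on the setup above, every path resolves to the same striped volume:

```sql
-- List every data and log file (tempdb included) with its physical location.
SELECT  DB_NAME(database_id) AS database_name,
        type_desc,           -- ROWS or LOG
        name,
        physical_name
FROM    sys.master_files
ORDER   BY database_id, type_desc;
```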
#4 Deep Queue Depths Are Required For High IO Throughput
This excerpt comes from this Microsoft article:
HBA Queue Depth Settings
When configuring HBAs on the host, ensure that the Queue Depth is set to an optimal value. SQL Server applications are generally I/O-intensive, with many concurrent outstanding I/O requests. As a result, the default values for Queue Depth on HBAs are usually not high enough to support optimal performance. Currently the default value for the major HBA vendors is in the range of 8 to 32.
In our SQL Server testing, we have seen substantial gains in I/O performance when increasing this to 64 or even higher. It is worth noting that in these tests SQL Server was usually the only application using the storage array. It is important to discuss with your storage administrator the appropriate values for this setting, as the setting may affect other applications in sharing the same storage environment. When Queue Depth is set too low, a common symptom is increasing latency and less-than-expected throughput given the bandwidth between host/storage and the number of spindles in a particular configuration.
Flash PCIe cards do not require high queue depths in order to deliver near-peak IO throughput. The screenshot below comes from a parallel full table scan of a heap on my ioDrive 2 Duo; the card is rated at 3GB/s for sequential reads, and 2845MB/s is being achieved with a queue depth of just 1.
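Outstanding IO at the SQL Server level can be observed while a scan is running via sys.dm_io_pending_io_requests; a sketch of the standard join onto the virtual file stats DMV (io_handle matches file_handle), with io_pending distinguishing requests still queued in Windows from those SQL Server has yet to pick up:

```sql
-- Snapshot of IOs currently in flight, per file; run while a scan is executing.
SELECT  mf.physical_name,
        COUNT(*) AS pending_ios,
        SUM(CASE WHEN pio.io_pending = 1 THEN 1 ELSE 0 END) AS pending_in_os
FROM    sys.dm_io_pending_io_requests pio
JOIN    sys.dm_io_virtual_file_stats(NULL, NULL) vfs
        ON vfs.file_handle = pio.io_handle
JOIN    sys.master_files mf
        ON  mf.database_id = vfs.database_id
        AND mf.file_id     = vfs.file_id
GROUP   BY mf.physical_name;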