A word of warning to readers of this blog post: it is ‘conceptual’ in nature, because in my tests the containers I spun up subsequently fell over with core dumps. Nonetheless, I will put forward my thesis behind why you might want to do this.
Bottlenecks In The Legacy Engine
For what seemed like the first half of 2017 I blogged and presented on the SQL Server community circuit about the pressure points in the legacy database engine. My work came to two main conclusions:
- There is an inherent spinlock bottleneck in the legacy engine around the XDESMGR spinlock (used to serialise the distribution of transaction IDs) and the LOGCACHE_ACCESS spinlock (used to serialise the copying of log records into the log buffer).
- Spinlock bottlenecks are exacerbated by the nuances of the cache architecture of modern CPUs. This can be illustrated by running a single-threaded singleton insert workload first on the same CPU socket as the log writer and then on a different socket: the test running on the same socket as the log writer completes in the shortest amount of time. To understand why, you need to drill down into the way that spinlocks work. When a session's thread wishes to acquire a spinlock, the cache line associated with that spinlock needs to travel to the CPU core hosting the session's thread; in the case of the LOGCACHE_ACCESS spinlock, it travels from the CPU core hosting the log writer thread. Once the cache line has completed this journey, it is subjected to a compare and swap (CAS) operation involving the SPID of the session. When the spinlock is released, its cache line has to make the same journey in the opposite direction. The further apart the two CPU cores in question are, the more CPU cycles this process burns. Ergo, the closer the singleton insert workload runs to the log writer thread, the less time it takes to complete. A minimal sketch of these acquire / release mechanics follows this list.
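To make the acquire and release mechanics concrete, here is a minimal, single-threaded sketch of a CAS-based spinlock written against .NET's Interlocked API from PowerShell. The variable names and the SPID value are purely illustrative and this is not SQL Server's actual implementation; it simply shows the compare and swap at the heart of the process:

$lockOwner = 0      # 0 = unowned; otherwise holds the SPID of the owning session
$spid      = 53     # the session attempting to acquire the spinlock

# Spin until the compare and swap succeeds: write our SPID into the lock word,
# but only if its current value is 0. Every attempt drags the cache line that
# holds $lockOwner over to the acquiring core.
while ([System.Threading.Interlocked]::CompareExchange([ref] $lockOwner, $spid, 0) -ne 0) {
    # keep spinning - no waits or latches, just CPU cycles being burned
}

# ... work performed under the protection of the spinlock ...

# Release: the next acquirer will pull the cache line back to its own core.
[System.Threading.Interlocked]::Exchange([ref] $lockOwner, 0) | Out-Null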
The following image appeared in virtually all of my blog posts on this subject:
A Recap On Memory Management 101
The unit of transfer between main memory and the CPU is the cache line, which is 64 bytes in size. All memory access is CPU intensive: whilst data and / or instructions are being accessed, CPU cycles are burned regardless of whether they reside in main memory or in the CPU cache hierarchy; there are no waits or locks involved in pure memory access. The final point is that the number of CPU cycles burned when accessing data or instructions is proportional to the distance the cache line has to travel. The best case scenario is that the required data or instruction already resides in a CPU core's register, in which case we are talking about fewer than four CPU cycles, depending on the CPU instruction being executed. The worst case scenario is main memory access, which can cost 160 cycles or more if there is also a translation lookaside buffer (TLB) miss.
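As a quick sanity check, the 64-byte cache line size can be confirmed on any Linux box straight from the shell (or from PowerShell for Linux, which simply passes the command through):

getconf LEVEL1_DCACHE_LINESIZE    # typically prints 64 on current x86-64 CPUs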
Sharding Workloads Across Instances
An approach I mused about in order to get the best possible utilisation from a multi-socket server was to ‘affinitize’ a SQL Server instance to each socket and then shard the workload across databases on each instance (a sketch of the affinitization follows the list below). The benefits of doing this would be two-fold:
- Two instances = 2 x the number of available XDESMGR and LOGCACHE_ACCESS spinlocks.
- The worst case scenario for moving the cache lines associated with the spinlocks would be the latency of the last level (L3) cache, which is lower than the latency of going from CPU socket to socket via the QuickPath Interconnect, as illustrated by the graph below:
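As for tying each instance to its own socket, here is a minimal sketch of how that might be done, assuming a two-socket box whose logical CPUs 0 to 23 sit on socket 0 and 24 to 47 on socket 1, with the two instances listening on ports 1433 and 1434; all of these details are illustrative rather than a description of my actual server:

# Affinitize the first instance to socket 0 and the second to socket 1
# (ALTER SERVER CONFIGURATION is standard T-SQL; the ports, password and CPU
# ranges are assumptions made for the purpose of this sketch)
sqlcmd -S localhost,1433 -U SA -P 'P@ssw0rd!' `
    -Q "ALTER SERVER CONFIGURATION SET PROCESS AFFINITY CPU = 0 TO 23;"
sqlcmd -S localhost,1434 -U SA -P 'P@ssw0rd!' `
    -Q "ALTER SERVER CONFIGURATION SET PROCESS AFFINITY CPU = 24 TO 47;"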
Containers To The Rescue !!!
What if each singleton insert thread had its own database residing on a dedicated instance? If each instance could be tied down to adjacent CPU cores, this would create the best case scenario in terms of minimising the CPU cycles burned when passing spinlock cache lines around. It just so happens that Docker allows the logical CPUs a container runs on to be specified at start-up via the --cpuset-cpus argument. Because SQL Server on Linux “is a thing”, I decided to rebuild my trusty home server using:
- Ubuntu 16.04 LTS
- PowerShell for Linux
- Docker CE 17.12.0-ce, build c97c6d6
- The SQL Server 2017 Docker image, the latest version available as of 29/12/2017
The end game of all of this was to achieve linear scalability across all available CPU cores in the server.
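In isolation, the pinning at the heart of the idea looks like this; --cpuset-cpus accepts either a range ("0-3") or a comma-separated list ("0,3") of logical CPUs, and the container name, port and password below are illustrative:

# Spin up a single SQL Server container restricted to four adjacent logical CPUs
docker run --cpuset-cpus="0-3" `
    -e "ACCEPT_EULA=Y" -e "MSSQL_SA_PASSWORD=P@ssw0rd!" `
    -p 15560:1433 --name SqlLinuxPinned -d microsoft/mssql-server-linux:2017-latest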
I came up with this PowerShell script for spinning up the containers, creating a SingletonInsert database in each one and then executing one single-threaded singleton insert workload per container:
Clear-Host

# Remove any containers left over from previous runs
docker ps -a -q | ForEach-Object { docker rm -f $_ }

$MaxContainers = 12
$CpuSets       = @( "0,3", "4,7", "8,11", "12,15", "16,19", "20,23",
                    "24,27", "28,31", "32,35", "36,39", "40,43", "44,47" )

# Spin up one SQL Server container per cpuset
for ($i = 0; $i -lt $MaxContainers; $i++) {
    $CpuSet    = $CpuSets[$i]
    $DockerCmd = "docker run --cpuset-cpus=$CpuSet -e `"ACCEPT_EULA=Y`" -e `"MSSQL_SA_PASSWORD=P@ssw0rd!`" " + `
                 "-p " + [string](15560 + $i) + ":1433 --name SqlLinux$i -d microsoft/mssql-server-linux:2017-latest"
    Invoke-Expression $DockerCmd
}

Start-Sleep -s 20

# Create the SingletonInsert database, table and stored procedure in each container
for ($i = 0; $i -lt $MaxContainers; $i++) {
    $DockerCmd = "docker exec -it SqlLinux$i /opt/mssql-tools/bin/sqlcmd " + `
                 "-S localhost -U SA -P `"P@ssw0rd!`" " + `
                 "-Q `"CREATE DATABASE SingletonInsert;" + `
                 "ALTER DATABASE SingletonInsert SET DELAYED_DURABILITY=FORCED`""
    $DockerCmd
    Invoke-Expression $DockerCmd

    $DockerCmd = "docker exec -it SqlLinux$i /opt/mssql-tools/bin/sqlcmd " + `
                 "-S localhost -d SingletonInsert -U SA -P `"P@ssw0rd!`" " + `
                 "-Q `"CREATE TABLE [dbo].[si] ( " + `
                 "c1 UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID() PRIMARY KEY );`""
    Invoke-Expression $DockerCmd

    $DockerCmd = "docker exec -it SqlLinux$i /opt/mssql-tools/bin/sqlcmd " + `
                 "-S localhost -d SingletonInsert -U SA -P `"P@ssw0rd!`" " + `
                 "-Q `"CREATE PROCEDURE [dbo].[usp_Insert] AS " + `
                 "BEGIN " + `
                 "    SET NOCOUNT ON " + `
                 "    DECLARE @i INTEGER = 0; " + `
                 "    WHILE @i < 1000000000 " + `
                 "    BEGIN " + `
                 "        INSERT INTO si DEFAULT VALUES; " + `
                 "        SET @i += 1; " + `
                 "    END; " + `
                 "END;`""
    Invoke-Expression $DockerCmd
}

# Script block that runs the singleton insert workload against one container
$RunWorkload = {
    param ($LoopIndex)
    $DockerCmd = "docker exec SqlLinux$LoopIndex /opt/mssql-tools/bin/sqlcmd " + `
                 "-S localhost -U sa -P 'P@ssw0rd!' -d SingletonInsert " + `
                 "-Q 'EXEC usp_Insert'"
    $DockerCmd
    Invoke-Expression $DockerCmd
}

$InsertRates = @()

# Run the workload against 1, 2, ... $MaxContainers containers and record the insert rate
for ($i = 0; $i -lt $MaxContainers; $i++) {
    $StartMs = Get-Date
    for ($j = 0; $j -lt ($i + 1); $j++) {
        # Pass the script block and loop index explicitly; background jobs do not
        # inherit variables from the calling scope
        Start-Job -ScriptBlock $RunWorkload -ArgumentList $j
    }
    while (Get-Job -State Running) {
        Start-Sleep -Milliseconds 10
    }
    $EndMs = Get-Date
    Write-Host "Duration = " ($EndMs - $StartMs).TotalMilliseconds
    $InsertRate   = (($i + 1) * 1000000000 * 1000) / (($EndMs - $StartMs).TotalMilliseconds)
    $InsertRates += $InsertRate
}

Clear-Host

for ($i = 0; $i -lt $MaxContainers; $i++) {
    Write-Host "Inserts/s using " ($i + 1) " containers = " $InsertRates[$i]
}
After ironing out some kinks in the script, I found that not all twenty containers started up (note that this output was captured some time after the last invocation of the script):
cadkin@ubuntu:~$ docker ps -a
[sudo] password for cadkin:
CONTAINER ID   IMAGE                                      COMMAND                  CREATED        STATUS                    PORTS                     NAMES
e9370509b67c   microsoft/mssql-server-linux:2017-latest   "/bin/sh -c /opt/m..."   17 hours ago   Exited (1) 17 hours ago                             SqlLinux19
c8fcd5d346b0   microsoft/mssql-server-linux:2017-latest   "/bin/sh -c /opt/m..."   17 hours ago   Exited (1) 17 hours ago                             SqlLinux18
feed2a44559e   microsoft/mssql-server-linux:2017-latest   "/bin/sh -c /opt/m..."   17 hours ago   Exited (1) 17 hours ago                             SqlLinux17
c39325f2a1cf   microsoft/mssql-server-linux:2017-latest   "/bin/sh -c /opt/m..."   17 hours ago   Exited (1) 17 hours ago                             SqlLinux16
af9373fdd1c1   microsoft/mssql-server-linux:2017-latest   "/bin/sh -c /opt/m..."   17 hours ago   Exited (1) 17 hours ago                             SqlLinux15
77a92099448a   microsoft/mssql-server-linux:2017-latest   "/bin/sh -c /opt/m..."   17 hours ago   Exited (1) 17 hours ago                             SqlLinux14
41443910d1ee   microsoft/mssql-server-linux:2017-latest   "/bin/sh -c /opt/m..."   17 hours ago   Exited (1) 17 hours ago                             SqlLinux13
a261eea040ca   microsoft/mssql-server-linux:2017-latest   "/bin/sh -c /opt/m..."   17 hours ago   Exited (1) 17 hours ago                             SqlLinux12
148627655d39   microsoft/mssql-server-linux:2017-latest   "/bin/sh -c /opt/m..."   17 hours ago   Exited (1) 17 hours ago                             SqlLinux11
f7bf0775312a   microsoft/mssql-server-linux:2017-latest   "/bin/sh -c /opt/m..."   17 hours ago   Exited (1) 17 hours ago                             SqlLinux10
39f4cb71b9d3   microsoft/mssql-server-linux:2017-latest   "/bin/sh -c /opt/m..."   17 hours ago   Exited (1) 17 hours ago                             SqlLinux9
b7ef74de5127   microsoft/mssql-server-linux:2017-latest   "/bin/sh -c /opt/m..."   17 hours ago   Up 17 hours               0.0.0.0:15568->1433/tcp   SqlLinux8
db91dc2f0c24   microsoft/mssql-server-linux:2017-latest   "/bin/sh -c /opt/m..."   17 hours ago   Up 17 hours               0.0.0.0:15567->1433/tcp   SqlLinux7
5ef07d08a682   microsoft/mssql-server-linux:2017-latest   "/bin/sh -c /opt/m..."   17 hours ago   Up 17 hours               0.0.0.0:15566->1433/tcp   SqlLinux6
53aea9fd7260   microsoft/mssql-server-linux:2017-latest   "/bin/sh -c /opt/m..."   17 hours ago   Up 17 hours               0.0.0.0:15565->1433/tcp   SqlLinux5
ff9428020511   microsoft/mssql-server-linux:2017-latest   "/bin/sh -c /opt/m..."   17 hours ago   Up 17 hours               0.0.0.0:15564->1433/tcp   SqlLinux4
76139408c21b   microsoft/mssql-server-linux:2017-latest   "/bin/sh -c /opt/m..."   17 hours ago   Up 17 hours               0.0.0.0:15563->1433/tcp   SqlLinux3
b5ba59af259d   microsoft/mssql-server-linux:2017-latest   "/bin/sh -c /opt/m..."   17 hours ago   Up 17 hours               0.0.0.0:15562->1433/tcp   SqlLinux2
d4960fc52ae8   microsoft/mssql-server-linux:2017-latest   "/bin/sh -c /opt/m..."   17 hours ago   Up 17 hours               0.0.0.0:15561->1433/tcp   SqlLinux1
fbe948d0bdc9   microsoft/mssql-server-linux:2017-latest   "/bin/sh -c /opt/m..."   17 hours ago   Up 17 hours               0.0.0.0:15560->1433/tcp   SqlLinux0
Running docker logs against some of the containers that had exited revealed something I was not expecting to see:
cadkin@ubuntu:~$ docker logs SqlLinux9
This program has encountered a fatal error and cannot continue running. The following diagnostic information is available:
Reason: 0x00000003
Message: result == 0
Stacktrace: 0000565228a5c928 0000565228a5c4d0
Process: 9 - sqlservr
Thread: 13 (application thread 0x1000)
Instance Id: 9dd6e659-44eb-4204-a3d8-574e0655c184
Crash Id: c2d78278-bdcc-43dc-aac8-a5124cfbd18a
Build stamp: 440ff4daa7285d760aa0f372321cea94bd5f87ac313be8b29efa1ced69448c93
Capturing core dump and information...
No journal files were found.
No journal files were found.
Attempting to capture a dump with paldumper
WARNING: Capture attempt failure detected
Attempting to capture a filtered dump with paldumper
WARNING: Attempt to capture dump failed. Reference /var/opt/mssql/log/core.sqlservr.9.temp/log/paldumper-debug.log for details
Attempting to capture a dump with gdb
WARNING: Unable to capture crash dump with GDB. You may need to allow ptrace debugging, enable the CAP_SYS_PTRACE capability, or run as root.
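The final warning hints at one avenue for at least capturing a usable dump: granting the container the SYS_PTRACE capability so that gdb and paldumper can attach. Here is a sketch of how one of the failing containers could be re-created with that capability (untested on my part; the --cap-add flag is standard Docker, everything else mirrors the script above):

# Re-create the SqlLinux9 container with ptrace debugging allowed inside it
docker rm -f SqlLinux9
docker run --cap-add=SYS_PTRACE --cpuset-cpus="36,39" `
    -e "ACCEPT_EULA=Y" -e "MSSQL_SA_PASSWORD=P@ssw0rd!" `
    -p 15569:1433 --name SqlLinux9 -d microsoft/mssql-server-linux:2017-latest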
Although I was hoping to present some nice graphs that would prove or disprove my theory, this was not to be. However, I will endeavour to come up with a test environment that works; whether the problem is resolved by moving to a newer Ubuntu release once the next long-term-support version arrives, a later version of the SQL Server Docker image or a later version of the Docker engine, I am unsure at present. This is most definitely something I intend to revisit, therefore watch this space !!!
Addendum 04/01/2018
It appears that my Ubuntu issues were (in part) due to the cgroup memory accounting setting in the grub configuration file on my Ubuntu server, as detailed in this link. Using this in conjunction with specifying a maximum memory size when starting up the containers resolved the core dump issue. However, some of my containers were still falling over; my suspicion is that my swap partition is not configured correctly. The good news is that I managed to get my test working on a Windows Server 2016 machine using Docker CE, the results of which will be presented in part 2 of this blog post.
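For reference, here is a sketch of the two changes described above, assuming the standard Ubuntu fix for enabling cgroup memory and swap accounting; the 4g memory cap is an illustrative figure rather than the value I settled on:

# 1. Enable cgroup memory and swap accounting in /etc/default/grub, then rebuild
#    the grub configuration and reboot:
#        GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"
#        sudo update-grub
# 2. Cap each container's memory when it is created, e.g. by adding --memory to
#    the docker run line in the script further up this post:
docker run --cpuset-cpus="0,3" --memory=4g `
    -e "ACCEPT_EULA=Y" -e "MSSQL_SA_PASSWORD=P@ssw0rd!" `
    -p 15560:1433 --name SqlLinux0 -d microsoft/mssql-server-linux:2017-latest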
The second update is that my PowerShell script (inline further up this blog post) has been updated in order to remove some bugs and make the results easier to extract from the script's output. I will enhance this script further as part of the effort that will go into writing part 2 of this blog post and put it on GitHub.