In this blog post I’m going to outline a very simple technique for working out whether spinlock activity is the cause of excessive CPU consumption.

What Is A Spinlock ?
A spinlock synchronizes access to internal data structures within the database engine; any thread wishing to acquire a spinlock will spin until the point it can be acquired. The key thing to note is that the wait is largely a “busy wait”, meaning that whilst the thread is waiting to acquire the spinlock it burns CPU cycles. A spinlock is a much more lightweight mechanism than a latch and it is mainly intended for use in scenarios where the ‘expected’ wait for the lock is short. This simple pseudo code illustrates the logic behind a spinlock acquire and release:
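In the spirit of that pseudo code, here is a minimal sketch in C using C11 atomics. This is illustrative only and not SQL Server’s actual implementation:

```c
#include <stdatomic.h>

/* Minimal test-and-set spinlock: illustrative only. */
typedef struct { atomic_flag locked; } spinlock_t;

void spin_acquire(spinlock_t *s)
{
    /* Busy wait: every failed test-and-set burns CPU cycles instead of
       putting the thread to sleep. */
    while (atomic_flag_test_and_set_explicit(&s->locked, memory_order_acquire))
        ;  /* spin */
}

void spin_release(spinlock_t *s)
{
    atomic_flag_clear_explicit(&s->locked, memory_order_release);
}
```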
After a certain amount of time the thread will yield; this is referred to as a backoff:
SQL Server 2008 R2 introduces the concept of “exponential back-off”, whereby the amount of time that elapses between a thread attempting to acquire the spinlock and the thread sleeping decreases exponentially over time. This call stack excerpt illustrates a sleep associated with a spinlock acquire:
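A hedged sketch of what spin-then-back-off can look like. The loop bound, the sleep intervals and the doubling policy here are invented for illustration; SQL Server’s actual constants and back-off schedule are internal to the engine:

```c
#include <stdatomic.h>
#include <sched.h>    /* sched_yield */
#include <unistd.h>   /* usleep */

/* Spin for a bounded number of iterations, then yield the rest of the
   scheduler quantum and sleep before trying again. All constants are
   illustrative, not SQL Server's actual values. */
void spin_acquire_backoff(atomic_flag *lock)
{
    unsigned sleep_us = 1;                 /* grows on every back-off */
    for (;;) {
        for (int i = 0; i < 1000; i++)     /* bounded busy wait */
            if (!atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
                return;                    /* lock acquired */
        sched_yield();                     /* back off: give up the core */
        usleep(sleep_us);                  /* sleep before spinning again */
        if (sleep_us < 1024)
            sleep_us *= 2;                 /* exponential change, capped */
    }
}
```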
If you have genuine cause to look into spinlocks in anger, it is highly likely that your server has at least two CPU sockets and hardware NUMA enabled. Spinlocks travel between NUMA nodes as memory cache lines:
Memory transfers between NUMA nodes on Intel Xeon CPUs take place via the QuickPath Interconnect (QPI). The latency of passing cache lines between NUMA nodes is considerably higher than that of passing cache entries between CPU cores on the same socket or NUMA node:
It is therefore important that the QuickPath Interconnect is working as efficiently as possible; this usually involves making sure that the server firmware is up to date. Coreinfo from Sysinternals can be used to check the latency between NUMA nodes.
Why Spin ?
This simple graph represents the latency in CPU cycles for accessing main memory and different parts of the CPU cache hierarchy:
Let me add something to this, which will dwarf everything on my original chart:
The context switch is the great white shark of performance killers, but what is it? When a CPU core has to switch between threads, this is known as a “context switch”. The ‘switching’ part should be relatively self-explanatory; the ‘context’ part refers to saving the execution context of the thread being switched out and loading the execution context of the thread being switched in. Execution context includes:
- CPU register contents
- Call stack of the thread
- Virtual memory mapping information for the thread’s working set: the memory pages it is currently using.
Because of this, if the wait is brief, fewer CPU cycles are expended performing a busy wait than switching the thread out.
When Should I Suspect Spinlock Activity Might Be An Issue ?
The usual suspects to look out for are:
- High CPU utilization
- High concurrency; this typically affects OLTP systems
- As the number of threads performing transactions goes up, transactions per second remains static but CPU utilization increases.
Determining whether spin locking is the issue can be a bit more subtle than this, hence this blog post ! ;-). However, some simple sums can help us get to the bottom of it. Let me provide a simple example in the form of a test I ran:
- The server has two CPU sockets; each CPU is clocked at 2.4 GHz with six cores and hyper-threading turned on, resulting in SQL Server seeing 24 logical processors.
- ostress is used to create 24 threads performing concurrent inserts.
- The test lasts for 84 seconds.
- When the test is in full flight, this is the CPU utilization:
- Here are the spinlock statistics from sys.dm_os_spinlock_stats:
Most of the check boxes for spinlock activity being a potential issue have been checked:
- High CPU utilization: 100%, and because the CPUs are running at 2.49 GHz, turbo boost has kicked in on the CPU clock.
- Spinlocks with lots of spins and back-offs: the spin count for the most active spinlock runs to eleven digits.
We have everything except the nice graph depicting transactions per second plateauing whilst CPU consumption increases as the number of threads goes up.
Where Does The Math Bit Come In ?
It comes in here: what I’m going to do is turn the number of spins into a percentage of the CPU cycles expended:
- First we calculate the number of CPU cycles (roughly) that have taken place during the test. This is: Time (s) × CPU cycles per second × physical cores used = 84 × 2,490,000,000 × 12 = 2,509,920,000,000
- The second step is to compare our spins to the total number of CPU cycles expended; I will do this for the XDESMGR spinlock and for the total number of spins:
XDESMGR spins as % of CPU cycles: 100 − ( ( ( 2,509,920,000,000 − 12,123,870,745 ) / 2,509,920,000,000 ) × 100 ) = 0.48%

Total spins as % of CPU cycles: 100 − ( ( ( 2,509,920,000,000 − 14,581,540,862 ) / 2,509,920,000,000 ) × 100 ) = 0.58%
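As a sanity check, this arithmetic can be reproduced in a few lines of C; note that 100 − (((total − spins) / total) × 100) simplifies to spins / total × 100:

```c
/* Spins as a percentage of the (rough) total CPU cycles available during
   the test: seconds x clock rate x physical cores. */
double spin_pct(double spins, double seconds, double hz, double physical_cores)
{
    double total_cycles = seconds * hz * physical_cores;
    /* 100 - (((total - spins) / total) * 100) reduces to spins/total * 100 */
    return spins / total_cycles * 100.0;
}

/* spin_pct(12123870745.0, 84.0, 2.49e9, 12.0)  ->  ~0.48  (XDESMGR)
   spin_pct(14581540862.0, 84.0, 2.49e9, 12.0)  ->  ~0.58  (all spinlocks) */
```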
Can We Do Better ?

Yes we can. If we take our most active spinlock, the Windows Performance Toolkit can be used to quantify the CPU time spent spinning on it. The top of the stack provides the overall CPU time SQL Server has used:
Let’s do the math again, but this time using CPU time:
100 – ( ( ( 1,478,876.859525 – 36,016.346882 ) / 1,478,876.859525 ) * 100 )
= 2.43 %
This is a more accurate calculation in that it relates directly to CPU time; however, it still spells out the fact that spins on XDESMGR are not responsible for maxing out the server’s CPUs.
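The CPU-time version of the calculation reduces in exactly the same way; a small helper (with the figures from the trace above) makes that explicit:

```c
/* Spin CPU time as a percentage of total SQL Server CPU time; the
   100 - (((total - spin) / total) * 100) form reduces to spin/total * 100. */
double spin_time_pct(double spin_cpu_time, double total_cpu_time)
{
    return spin_cpu_time / total_cpu_time * 100.0;
}

/* spin_time_pct(36016.346882, 1478876.859525)  ->  ~2.4 */
```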
Using the computation graph in the Windows Performance Analyzer analysis view, the CPU time spent spinning on the XDESMGR spinlock can be visualized against that of the database engine as a whole. The green line at the bottom, highlighted by the red box, is the CPU time consumed by the XDESMGR spinlock:
There you have it: two simple ways of determining whether spinlock activity is causing excessive CPU consumption “by doing the math”.
I must give thanks to Thomas Kejser for letting me borrow the first three images in this blog post from his “Master Tuning” slide deck.