Diagnosing Spinlock Problems By Doing The Math

In this blog post I’m going to outline a very simple technique for working out whether spin lock activity is the cause of excessive CPU consumption. What is A Spinlock ?

A spinlock synchronizes access to internal data structure within the database engine, any thread wishing to acquire a spinlock will spin up until the point it can be acquired. The key thing to note is that the wait is largely a “Bust wait”, meaning that whilst the thread is waiting to acquire the spinlock it burns CPU cycles. A spinlock is a much more light weight mechanism than a latch and it is mainly intended for use in scenarios where the ‘Expected’ wait for the lock is short. This simple pseudo code illustrates the logic behind a spinlock acquire and release:

psuedo code

After a certain amount of time the thread will yield, this is referred to as a backoff:

backoff

SQL Server 2008 R2 introduces the concept of “Exponential back off” whereby the amount of time that elapses between a thread attempting to acquire the spinlock and the thread sleeping decreases exponentially over time. This call stack excerpt illustrates a Sleep associated with a spinlock acquire:

spinlock sleep

If you have genuine cause to look into spinlocks in anger, its highly likely that your server has at least two CPUs and hardware NUMA enabled. Spinlocks travel between NUMA nodes as memory cache lines:

spinlock_cacheline

NUMA node memory transfers between i Series Xeon’s CPUs is via the Quick Path Interconnect (QPI). The latency is passing cache lines between NUMA nodes is considerably more than that when passing cache entries between CPU cores on the same socket or NUMA node:

latency

It is therefore important that the quick path interconnect is working as efficiently as possible, this usually involves making sure that the server firmware is uptodate, coreinfo from sys internals can be used to check the latency between NUMA nodes.

Why Spin ?

This simple graph represents the latency in CPU cycles for accessing main memory and different parts of the CPU cache hierarchy:

CPU Cache Access in CPU Cycles

Let me add something to this, which will dwarf everything on my original chart:

context switch

The context switch is the great white shark of performance killers, but what is this ?. When a CPU core has to switch between threads, this is known as a “Context switch”. The switching bit should be relatively self explanatory, the context element refers to the saving of the execution context of the thread being switched out and the loading of the execution context being switched in. Execution context includes:

  • CPU register contents
  • Call stack of the thread
  • Virtual memory mapping information of the threads working set:memory pages it is currently using.

Because of this; fewer CPU cycles are expended if a busy wait is performed compared to switching a thread out if the wait is brief.

When Should I Suspect Spinlock Activity Might Be An Issue ?

The usual suspects to look out for are:

  • High CPU utilization
  • High concurrency, this typically affects OLTP systems
  • As the number of threads performing transactions goes up, transactions per second remains static but CPU utilization increases.

Determining that spin locking is an issue can be a bit more subtle than this, hence this blog post ! ;-), however, some simple sums can help us get to the bottom of this. Let me provide a simple example in the form of a test I ran:

  • Server has two CPU sockets, each CPU is clocked at 2.4Ghz with six cores and hyper threading turned on, this results in SQL Server seeing 24 logical processors.
  • osstress is used to fire create 24 threads performing concurrent inserts.
  • The test lasts for 84 seconds.
  • When the test is in full flight, this is the CPU utilization: CPU utilsation
  • Here are the spinlock statistics from sys.dm_os_spinlock_stats:

spins

Most of the check boxes for spinlock activity being a potential issue have been checked:

  • High CPU utilization, 100%, plus because the CPU is running at 2.49Ghz, turbo boost has kicked in on the CPU clock.
  • Concurrency
  • Latches with lots of spins and back offs, the most active spin lock has eleven digits.

We have everything except the nice graph depicting that transactions per second plateaus off whilst CPU consumption increases as the number of threads is increased.

Where Does The Math Bit Come In ?

It comes in here, what I’m going to do is to turn the number of spins into a percentage of the CPU cycles expended:

  • First we calculate the number of CPU cycles (roughly) that have taken place during the test, this is:Time (s) x CPU cycles per second x physical cores used 84 x 2,490,000,000 x 12 = 2,509,920,000,000
  • The second step is to compare our spins to the total number of CPU cycles expended, I will do this for the XDESMGR spin lock and the total number of spins:
    XDESMGR spins as % of CPU cycles 100 – ( ( ( 2,509,920,000,000 – 12,123,870,745 ) / 2,509,920,000,000 ) * 100 ) = 0.48% Total spins as % of CPU cycles 100 – ( ( ( 2,509,920,000,000 – 14,581,540,862 ) / 2,509,920,000,000 ) * 100 ) = 0.58%

Can We Do Better ? Yes we can, if we take our most active spin lock, windows performance tool kit can be used to quantify the CPU time spent spinning on it. The top of the stack provides the overall CPU time SQL Server has used: topofstack xdesmgr

Lets do the math again, but this time using CPU time:

100 – ( ( ( 1,478,876.859525 – 36,016.346882 ) / 1,478,876.859525 ) * 100 )

= 2.43 %

This is a more accurate calculation in terms of providing something that is directly tangible to CPU time, however it still spells out the fact that spins on XDESMGR are not responsible for maxing out the server’s CPUs.

Using the computation graph in the windows performance analyzer analysis view the XDESMGR spinlock spin CPU time can be visualized compared to that for the database engine as a whole. The green line at the bottom highlighted by the red box is the CPU consumed by the XDESMGR spinlock:

xdesmgr_graph

There you have it, a two simple ways of determining if spinlock activity is causing excessive CPU consumption “By doing the math”.

I must give thanks to Thomas Kejser for letting me borrow the first three images in this blog post from his “Master Tuning” slide deck.

9 thoughts on “Diagnosing Spinlock Problems By Doing The Math

  1. Great post as usual, Chris, with a nice sober angle. I sometimes wonder if spinlock contention might be the new gluten intolerance…

    • In honesty, based on my testing I suspect I need at least eight physical cores per socket under load to perform test in which spin lock activity is a cause for concern. As part of the work that went into this blog and the one of singleton inserts I’ve got enough material for another two blogs which cover the ramifications of spinlock activity from different angles, these should be posted in the next couple of weeks or so.

  2. Chris,
    I have looked at Thomas Kejsers blog on spinlocks too. I am trying to track them now. Your formula is a great start point on when to be concerned. I was wondering at what level should we worry about spinlocks if you don’t see 100% CPU spikes? Do you have a major percentage point when this should be a concern? Your example showed the XDESMGR spinlock just being 2.43%. If it had been say over 40% then might you consider this a being a major reason for the high CPU?

    Just trying to get a clear idea of when to be worried about spinlocks.

    Thanks

    Chris

    • As a general rule of thumb you need to be maxing out your CPU capacity and be seeing spins of the order of billions (US billion with 9 zeros). However above and beyond this it depends on which spin lock you are seeing a high number of spins and back offs on. For example, the LOGCACHE_ACCESS spin lock governs access to the log buffer, its not unheard of 20% of a servers CPU utilization to be spent on spins on this one spinlock under a heavy OLTP load. Apart from affinitizing the rest of the data engine away from the core the log writer is running on and using CPUs with the fastest clock speed you can buy, so that threads can obtain and then release this spinlock as fast as possible your options are limited. A full answer to this warrants a blog post in itself which I will endeavor to write.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s