Using The Right Tool For Right Job: XPERF Flags and Diagnosing Difficult IO Issues

At the end of the session I presented at the recent SQL Saturday in Portugal I had a chat with one of the attendees who mentioned that good xperf documentation is sparse and the question was posed: “Which flags” should I use ?. Luckily, the defrag tools team on Channel 9 have some excellent shows that cover windows performance toolkit, to get started with the tool take a look at their introduction to windows performance toolkit.

The Right Tool For The Right Job

Driving the database engine to its limits and seeing how its behaves is my main area of interest and it requires insights into the database engine which the standard SQL Sever tools cannot provide and you will run into points where the SQL Server internals book cannot help either. Understanding how the database engine behaves at scale, to coin a term used in skiing, requires you to go off piste:

water line

Response time = service time + wait time, in the context of SQL Server the service time is what is happening inside the SQL OS scheduler, wait stats provide information on when the scheduler is yielded by no definitive answers as to where the schedulers are burning CPU cycles.

The Debugger

The debugger gives you insights into what the engine is doing, however there are several critical things it does not give you:

  • CPU utilization statistics.
  • Statistics with which to perform wait analysis at a thread level.
  • Interrupt service request and deferred procedure call statistics.
  • IO performance statistics.
  • Network performance statistics.
  • The ability to explore stack traces in a “Slide and dice the data” manner.

When To Use XPERF ?

For a given workload on a given server, driving the database engine to its limits will involve maxing out the CPU capacity on the server or the IO bandwidth of your IO subsystem, there are several maxims that apply here:

  • Its rare that you will saturate your CPU capacity and IO bandwidth at the same time, you will hit a wall with one or the other first, not both at the same time.
  • If you cannot max out your CPU capacity or IO bandwidth, this is likely to be due to some form of contention or throttling taking place.

Throttling refers to mechanisms that comes into life to dampen throughput somewhere because part of the database engine cannot cope with a certain level of throughput from one end to another, two examples of this spring to mind:

Without further ado I will outline some scenarios in which windows performance toolkit is useful. I will go into some depth about IO because this is something I have not covered in much detail to date.

Investigating CPU Saturation

The real power of windows performance toolkit comes with the ability it gives you to not only capture stack traces but to navigate them, however the most critical thing you will require are public symbols for the version of SQL Server you have captured an event trace for. I should add that the public symbols are version specific right down to CU level. For playing around with windows performance toolkit at home kernel groups ( groups of xperf flags) are useful:


However, if you a trouble shooting a production system you are better off avoiding kernel groups and using as few flags as possible. Note that many of the kernel groups contain DISK_IO, for investigating CPU saturation you do not need this at all. Use:

 xperf -on PROC_THREAD+LOADER+PROFILE -StackWalk Profile

Ordinarily SQL Server should not incur any context switches in that the schedulers that SQL OS provides should shield you from this unless you are encountering PREEMPTIVE_OS_ events. One use case where you might want to investigate context switching is when using integration services and you have more that one execution path per package, more simply put you will get context switches as buffers are passed between threads, in this case you would want to use an xperf command line which looks like this:

 xperf -on PROC_THREAD+LOADER+PROFILE -StackWalk Profile+CSwitch

The defrag tools show of interest for this area are:

Investigating IO

xperf -on PROC_THREAD+LOADER+DISK_IO+DISK_IO_INIT -stackwalk DiskReadInit+DiskWriteInit+DiskFlushInit

IO Initialization and Call Stacks

Most peoples initial approach to investigating IO issues would be to look at using more performance counters, wait stats and DMVs such as sys.dm_io_virtual_file_stats. However, windows performance toolkit can expose information that these standard tools cannot. For example, the DiskReadInit, DiskWriteInit and DiskFlushInit that follow -stackwalk allow the call stacks at read operation initialization time, write initialization time, disk flush initialization time to be captured. Consider this screen shot of windows performance viewer:

IO init stack

The trace for this was taken from my single disk laptop, hence why AdventureWorksDW2012 data file is located under program files. “IO Time” reflects the total time spent servicing the IO operation from making the system call, travelling right down through the driver stack to the the physical device itself. “Disk service time” is the time that the device spent servicing the IO request (times are in micro seconds). The term ‘Disk’ should be used loosely in that windows has little concept of how the IO takes place at the other side of the Raid controller or host bus adapter, IO requests could be serviced by:

  • A SAN or DAS array cache
  • Anywhere within a storage infrastructure that is presented as a unified pool of storage via a storage virtualization technology, IBM SAN Volume Controller is but one example of a storage virtualization technology.
  • Any tier in a SAN or DAS array that uses automatic tier-ing, HP Adaptive Optimization on a 3 Par SAN for example.

The important point to note is that windows performance toolkit allows you to see what the call stack looked like when a read/write/flush was initiated.

How Does This Help Me With SQL Server ?

Consider SQL Server error 833 as documented in this Microsoft article, its symptom is:

When I/O requests take longer than 15 seconds, Microsoft SQL Server reports “Msg 833” to the event log, and the following text is logged in the ERRORLOG file:

SQL Server has encountered <n> occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [<full path of file>] in database [<database name.>] (<dbid>). The OS file handle is <file handle>. The offset of the latest long I/O is: <offset in hexadecimal>

the article goes on to state that the cause is:

This problem is caused by some issue in the operating system, by device drivers, by firmware, or by hardware that is used to support the devices to store SQL Server database and/or transaction log files. SQL Server records the time that it initiated an I/O request and records the time that the I/O was completed. If that difference is 15 seconds or more, this condition is detected. This also means that SQL Server is not the cause of a delayed I/O condition that this message describes and reports.

Windows performance allows ISR and DPC information to be captured for trouble shooting bad device drivers (covered later), putting the two together provides a powerful means with which to crack the SQL Server 833 error nut.

Is Something Holding On to Or Re-Ordering My IO Requests ?

But before I move off the subject of IO, there is one more powerful feature windows performance toolkit provides, the ability see how deep the queue was at IO initialization and completion time. Before delving into this lets go into the back ground of why re-ordering IO might be beneficial. Disk performance is killed by excessive head movement which is why the technique of “Short stroking exists”:

full vs short stroked disk

This graph illustrates the affect short stroking can have:

effect of short stroking

At the expense of losing some disk capacity short stroking helps increase throughput and reduce latency. Some RAID controllers such as the HP Smart Array controller have the ability to “Elevator sort” IO requests in order to minimize head movement, this is great at increasing IO throughput, but this comes at the expense of increasing latency:

elevator sorting

We want high and consistent sequential scan throughput for OLAP/MI and reporting style applications, for OLTP application any IO re-ordering ‘Intelligence’ that the storage introduces can be devastating for things which require low latency IO, such as writes to transaction logs. Fortunately the windows performance toolkit allows us to collect and view metrics on:

  • QD/I – Queue Depth at Init Time
  • QD/C – Queue Depth Completion Time

A disparity between the two might indicate something is coalescing or re-ordering IO requests. The screen shot below is from my laptop, it uses the legacy Advanced Host Controller Interface protocol to talk to the disk across a SATA connector, this performs IO re-ordering as illustrated by the difference between the queue depths at IO initialization and completion time:

queue depth

Refer to:

Bad Device Drivers

Under normal circumstances interrupts produced by device drivers and the deferred procedure calls which service these should account for a 1% CPU utilization at most. Below is an overview of how interrupts, interrupt service requests and deferred procedure calls are handled:


To capture the relevant information for trouble shooting a bad device driver use this xperf command line:


Refer to the defrag tool show:

Collecting IO and ISR/DPC Statistics Together

Getting to the bottom of error 833 will require that IO and ISR/DPC statistics are collected together, this is the xperf command line to achieve this:

xperf -on PROC_THREAD+LOADER+DISK_IO+DISK_IO_INIT+INTERRUPT+DPC -stackwalk DiskReadInit+DiskWriteInit+DiskFlushInit

Caution . . . Event Trace for Windows Files Can Grow To Be Huge !

If you have an intermittent issue that is not easily reproducible, you probably want to run xperf continuously, the problem being that this can result in huge ETW files, you might therefore want to consider adding the switches to the end of the xperf command line highlighted in bold:

xperf -on PROC_THREAD+LOADER+DISK_IO+DISK_IO_INIT+INTERRUPT+DPC -stackwalk DiskReadInit+DiskWriteInit+DiskFlushInit -BufferSize 1024 -MinBuffers 256 -MaxBuffers 256 -MaxFile 256 -FileMode Circular

The maxfile is the maximum size of the ETW file produced in MB, 256MB in this example.

Troubleshooting Networking Issues

Long waits on ASYNC_NETWORK_IO is the most commonly perceived SQL Server related networking performance issue, this is down to the client not being able to process packets fast enough and acknowledge receipt of them to SQL Server as illustrated below:


Short of re-writing the client code to run more efficiently or configuring it to process rows in batches, there is little that can be done about this.

The use of the TCP-IP chimney feature, documented here, can also cause issues that fall into the general category of “Networking problems”, here is a basic xperf command line for investigating networking issues:


I would like to thank Thomas Kejser for giving me full access to reuse images from his master tuning slide deck.

6 thoughts on “Using The Right Tool For Right Job: XPERF Flags and Diagnosing Difficult IO Issues

  1. Hi Chris, I’m glad you’re writing about this. Are you considering running any classes on these topics?

    • James the short answer, is I would like to deliver some training, I need to work out what demand there might be for this. I’m cognizant of the fact that Pass has gone into overdrive with laying on events across Europe which might sap demand for anything I might put on. However putting some training on and doing a precon is most definitely on my radar. Also Thomas Kejser has asked me to take on the task of presenting his Super Scaling SQL Server course which I will have full access to all of his material for.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s