Analyzing The Database Engine With Windows Performance Toolkit’s Advanced Features

In this blog post I’m going to touch on some of the more advanced features of the Windows Performance Toolkit which I have yet to blog on. To date I have demonstrated how stack traces obtained with xperf can be used to quantify the CPU time cost of:

  • Individual execution plan iterators
  • Compression
  • The database engine having to inspect column values whilst performing a heap scan
  • etc etc . . .

In short it opens up a world of insight into where the database engine is expending CPU time that no other free tool can. Without further a do, here are some things you can do with windows performance analyzer that I have teased about:

  • IO sizes per file histogram
    The base kernel flag enables IO statistics collection (among other things), drag the storage graph on to the windows performance analyzer GUI analysis canvas, change the view to “Counts by process, IO type” and by playing around with the columns displayed you can get the following:
  • 7_1 io sizes SanDiskQuantifying the costs of the different layers of the database engine.

    From SQL Server 2012 onwards the database engine comprises of a number of dll’s:1. sqldk and sqlos: SQLOS
    2. sqltses : SQL expression service
    3. sqllang : language processing
    4. qds : Query data store, being surfaced in SQL Server vnext ( SQL 2016 ? ).
    5. sqlmin : Encapsulates the run time and storage engine

    By dragging the computation graph onto the analysis canvas and enabling the ‘Module’ column to be displayed in the data table, we can see how much sampled CPU time is expended by each layer of the engine:

    weight by module

  • Obtaining the CPU consumption graphs of different parts of the call stack.

    The yellow line that appears in the windows performance analyzer data table view controls how data is aggregated;everything to the right of the line is rolled up: counts, sums etc. If we elect to aggregate by the stack itself, we can view a graph of the CPU time over time for any part of the call stack itself, simply right click on any part of the stack to the left of the graph and select Enable -> Selection:

    stack graph

  • Comparative Analysis.

    Imaging carrying out tuning, making a change and wanting to observe how this impacts the CPU consumption of a specific part of the engine, windows performance analyzer enables this to be done via “Comparative analysis”. Simply open two event trace for windows files in WPA, select “New Comparative Analysis View” from the Window menu and off you go, the weightings are now the difference between the ‘Baseline’ and the second etw file chosen. Note the negative value at the bottom of the screen shot, this is where the weight of part of a call stack is less than that of the same place in the stack from the base line:


The windows performance toolkit comes free with the Windows Assessment and Deployment kit, all you need is a SQL Server instance, the public symbols for this and you are good to go, enjoy !.

2 thoughts on “Analyzing The Database Engine With Windows Performance Toolkit’s Advanced Features

  1. I once used PerfView to debug a CPU intensive reporting query. I found that case-insensitive string comparisons were about half of the runtime. They are surprisingly expensive. I couldn’t have found out any other way.

    • Thanks for reading my blog, indeed, for CPU intensive tasks finding out where your CPU time is going to **** definitively **** windows performance analyzer is the only way to go, all the other more widely used tools only provide hints and clues as to where you CPU time is going. Plus not all spin locks are documented.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s