In this blog post I’m going to touch on some of the more advanced features of the Windows Performance Toolkit which I have yet to blog on. To date I have demonstrated how stack traces obtained with xperf can be used to quantify the CPU time cost of:
- Individual execution plan iterators
- Compression
- The database engine having to inspect column values whilst performing a heap scan
- etc etc . . .
In short it opens up a world of insight into where the database engine is expending CPU time that no other free tool can. Without further a do, here are some things you can do with windows performance analyzer that I have teased about:
- IO sizes per file histogram
The base kernel flag enables IO statistics collection (among other things), drag the storage graph on to the windows performance analyzer GUI analysis canvas, change the view to “Counts by process, IO type” and by playing around with the columns displayed you can get the following:
Quantifying the costs of the different layers of the database engine.
From SQL Server 2012 onwards the database engine comprises of a number of dll’s:1. sqldk and sqlos: SQLOS
2. sqltses : SQL expression service
3. sqllang : language processing
4. qds : Query data store, being surfaced in SQL Server vnext ( SQL 2016 ? ).
5. sqlmin : Encapsulates the run time and storage engineBy dragging the computation graph onto the analysis canvas and enabling the ‘Module’ column to be displayed in the data table, we can see how much sampled CPU time is expended by each layer of the engine:
-
Obtaining the CPU consumption graphs of different parts of the call stack.
The yellow line that appears in the windows performance analyzer data table view controls how data is aggregated;everything to the right of the line is rolled up: counts, sums etc. If we elect to aggregate by the stack itself, we can view a graph of the CPU time over time for any part of the call stack itself, simply right click on any part of the stack to the left of the graph and select Enable -> Selection:
-
Comparative Analysis.
Imaging carrying out tuning, making a change and wanting to observe how this impacts the CPU consumption of a specific part of the engine, windows performance analyzer enables this to be done via “Comparative analysis”. Simply open two event trace for windows files in WPA, select “New Comparative Analysis View” from the Window menu and off you go, the weightings are now the difference between the ‘Baseline’ and the second etw file chosen. Note the negative value at the bottom of the screen shot, this is where the weight of part of a call stack is less than that of the same place in the stack from the base line:
The windows performance toolkit comes free with the Windows Assessment and Deployment kit, all you need is a SQL Server instance, the public symbols for this and you are good to go, enjoy !.
I once used PerfView to debug a CPU intensive reporting query. I found that case-insensitive string comparisons were about half of the runtime. They are surprisingly expensive. I couldn’t have found out any other way.
Thanks for reading my blog, indeed, for CPU intensive tasks finding out where your CPU time is going to **** definitively **** windows performance analyzer is the only way to go, all the other more widely used tools only provide hints and clues as to where you CPU time is going. Plus not all spin locks are documented.