Sunday, October 16, 2011

Adding Up the Logical Reads in the Output Window

nixie

Probably the first thing I learned about query tuning is that "clock-on-the-wall" time is definitely not the thing you want to measure. The elapsed time a query takes can be influenced by countless factors (other processes running, disk speed, number of processors, amount of RAM, phase of the moon, a few hundred database settings) that it's too rough a metric for serious performance tuning.

What I've been taught to measure, instead, are logical reads, which are the number of disk I/O reads needed to execute the query. If the query doesn't change, and the underlying data doesn't change, it's been my experience (and it makes sense theoretically), that you can run a given query all day long, under any load conditions you can create, and the number of logical reads will always be the same. This immunity to outside "noise" makes it ideal for tuning: you can be pretty sure (given the same schema, of course - indexes leap to mind - and at least similar data) that the logical read counts on your machine will be the same on another machine (such as the Production server, or your boss's laptop).

Ok, so how do we capture logical reads? Easy - simply enable the STATISTICS IO option, like so:

SET STATISTICS IO ON

Once turned on for a given session, this emits the number logical reads made for every table (including temp tables) for every statement executed, until you turn it off (by saying SET STATISTICS IO OFF). Let's look at a single line that would be emitted:

Table 'Invoices'. Scan count 1, logical reads 332, physical reads 10, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

You'll notice a lot of other values displayed on this line, but for now, ignore them. One that you should especially ignore (for now) is physical reads, because its count is included in the logical reads value. That is, in our one-line example, we can see there are going to be 332 pages read from somewhere. That "somewhere" will be either in RAM (the buffer pool), or from disk (your "C:" drive, for example), but both of these "somewheres" are counted by logical reads. Physical reads only counts what was read from disk this time, which is the kind of unpredictable "clock-on-the-wall" information we know to be unreliable for query tuning (for now). Why are they unpredictable? Because who knows why the page wasn't in buffer pool in memory, and so had to read from disk, at the precise moment we needed it? It's unknowable, and even if it weren't - if you had perfect knowledge of the innards of the computer at that moment - it's still unpredictable.

Looking at our one-line example above, everything seems pretty rosy. SET STATISTICS IO ON spits out the logical reads we need, so all we have to do is total them up. But this is where it gets ugly, because the information we're interested in is interspersed with our own PRINT messages, RAISERROR output, warning and informational messages from the system, and data output from SELECT statements (if we're using "Output to Text" mode, which I often do, as it's the only way to get the dang output window to scroll). This can all be quite a mess to read:

Starting at 12:25:30 PM...
Table 'Customers'. Scan count 6, logical reads 10213968, physical reads 125391, read-ahead reads 254294, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
CustomerNum   InvoiceNum   Amt
----------- ------------ --------------------------------------
2344   1331111   333.23
4492   4837227 44222.84
3434   222444   91811.33
98474   54422  22.09
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Updating the total prices...
(1 row(s) affected)
Table 'Invoices'. Scan count 1, logical reads 332, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
12:30:38: Added row to the Invoices table for 2011/10/12
Warning: Aggregate calculation excluded NULL values.
Table 'Worktable'. Scan count 3, logical reads 120, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Now, try to total up the logical reads in that output. If there's only a few lines of output, it's manageable, but imagine trying to decipher hundreds of lines of stuff like this. Over time, I developed a rigamarole system:

  • copy the entire contents of the Output window to the clipboard
  • fire up Excel, and paste the output into it
  • from Excel's "Data" tab, click the "Sort A to Z" button
  • select only the rows that start with "Table..." and copy them to the clipboard
  • paste the rows into a new window in Management Studio
  • use a regular expression search-and-replace to strip out all the text on each line before the logical read numbers
  • use a regular expression search-and-replace to strip out all the text on each line after the logical read numbers
  • copy the column of logical read numbers back into Excel
  • from Excel's "Formulas" tab, use the AutoSum button to, finally, calculate the total number of logical reads

"Crude but effective" doesn't quite do this process justice. There must be a better way, I thought.

My first idea was an AddOn for Management Studio, but the more I looked into it, I realized I just couldn't face the tedium of application programming (yes, I'm now permanently spoiled by a platform - SQL Server - that actually works without swearing at it). And while SQL Server isn't exactly famous for its text-manipulation ability, what I was was doing didn't have to be fast or elegant. And I'd rather hack something together in T-SQL in a few hours than beat my head against the C# wall for a few days. (Don't believe me? Ask someone who's written a truly great Visual Studio add-on.) As a bonus, the people using the code would be able to understand it, and maybe even build upon and improve it.

So, after an embarrassingly long delay caused by forgetting, again, that REPLACE replaces all occurrences, not just the first one, here it is. It works correctly as far as I can tell, but don't use it as an example of how to write good T-SQL code. (Please tell me about any bugs you find. Use at your own risk. Some settling of contents may occur during shipping.)

SET NOCOUNT ON
SET QUOTED_IDENTIFIER OFF

DECLARE @Text NVARCHAR(MAX) =
"-- INSERT YOUR TEXT BETWEEN THESE TWO LINES -----------------------------------
Starting at 12:25:30 PM...
Table 'Customers'. Scan count 6, logical reads 10213968, physical reads 125391, read-ahead reads 254294, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
CustomerNum   InvoiceNum   Amt
----------- ------------ --------------------------------------
2344   1331111   333.23
4492   4837227 44222.84
3434   222444   91811.33
98474   54422  22.09
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Updating the total prices...
(1 row(s) affected)
Table 'Invoices'. Scan count 1, logical reads 332, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
12:30:38: Added row to the Invoices table for 2011/10/12
Warning: Aggregate calculation excluded NULL values.
Table 'Worktable'. Scan count 3, logical reads 120, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- INSERT YOUR TEXT BETWEEN THESE TWO LINES -----------------------------------"

DECLARE @Lines TABLE (Txt NVARCHAR(MAX), Num INT)
DECLARE @nLF   INT

WHILE 1=1
BEGIN
    SET @nLF = CHARINDEX(NCHAR(10), @Text)
    IF @nLF = 0  BREAK
    
    IF @nLF < 3
    BEGIN
        SET @Text = STUFF(@Text, 1, 1, '')  
        CONTINUE
    END
    
    INSERT @Lines (Txt)
    VALUES (LEFT(@Text, @nLF - LEN(NCHAR(13) + NCHAR(10))))

    SET @nLF = PATINDEX('%' + NCHAR(13) + NCHAR(10) + '%', @Text)
    SET @Text = STUFF(@Text, 1, @nLF, '')
END

DELETE @Lines WHERE Txt NOT LIKE '%, logical reads %'
UPDATE @Lines SET Num = PATINDEX('%, logical reads %', Txt)
UPDATE @Lines SET Txt = STUFF(Txt, 1, Num + LEN(', logical reads '), '')
UPDATE @Lines SET Num = CHARINDEX(',', Txt)
UPDATE @Lines SET Txt = LEFT(Txt, Num - 1)
UPDATE @Lines SET Num = CAST(Txt AS INT)
DELETE @Lines WHERE Num = 0

SELECT Num      AS 'Logical Reads'       FROM @Lines
SELECT SUM(Num) AS 'Total Logical Reads' FROM @Lines

SET QUOTED_IDENTIFIER ON

To use this script, just copy-and-paste the entire contents of the Output window between the two lines in red in the @Text variable's definition, and hit F5 (Ctrl-E, whatever) to execute the script. You should get output like this:

The first set of results represent the SET STATISTICS IO rows that have non-zero logical reads, useful mostly for debugging the script. You can suppress this output by commenting out the first SELECT statement in the script (third line of code from the bottom). The second set of results is what we're looking for: the total number of logical reads that appear in the text of the Output window.

1 comments:

Piotr Rodak said...

that's pretty cool.
We have a .net app that can sup up the reads of a query, but it makes sense to have such ability straight in ssms. thanks for the code.

Post a Comment