Sunday, October 16, 2011

Adding Up the Logical Reads in the Output Window

nixie

Probably the first thing I learned about query tuning is that "clock-on-the-wall" time is definitely not the thing you want to measure. The elapsed time a query takes can be influenced by countless factors (other processes running, disk speed, number of processors, amount of RAM, phase of the moon, a few hundred database settings) that it's too rough a metric for serious performance tuning.

What I've been taught to measure, instead, are logical reads, which are the number of disk I/O reads needed to execute the query. If the query doesn't change, and the underlying data doesn't change, it's been my experience (and it makes sense theoretically), that you can run a given query all day long, under any load conditions you can create, and the number of logical reads will always be the same. This immunity to outside "noise" makes it ideal for tuning: you can be pretty sure (given the same schema, of course - indexes leap to mind - and at least similar data) that the logical read counts on your machine will be the same on another machine (such as the Production server, or your boss's laptop).

Ok, so how do we capture logical reads? Easy - simply enable the STATISTICS IO option, like so:

SET STATISTICS IO ON

Once turned on for a given session, this emits the number logical reads made for every table (including temp tables) for every statement executed, until you turn it off (by saying SET STATISTICS IO OFF). Let's look at a single line that would be emitted:

Table 'Invoices'. Scan count 1, logical reads 332, physical reads 10, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

You'll notice a lot of other values displayed on this line, but for now, ignore them. One that you should especially ignore (for now) is physical reads, because its count is included in the logical reads value. That is, in our one-line example, we can see there are going to be 332 pages read from somewhere. That "somewhere" will be either in RAM (the buffer pool), or from disk (your "C:" drive, for example), but both of these "somewheres" are counted by logical reads. Physical reads only counts what was read from disk this time, which is the kind of unpredictable "clock-on-the-wall" information we know to be unreliable for query tuning (for now). Why are they unpredictable? Because who knows why the page wasn't in buffer pool in memory, and so had to read from disk, at the precise moment we needed it? It's unknowable, and even if it weren't - if you had perfect knowledge of the innards of the computer at that moment - it's still unpredictable.

Looking at our one-line example above, everything seems pretty rosy. SET STATISTICS IO ON spits out the logical reads we need, so all we have to do is total them up. But this is where it gets ugly, because the information we're interested in is interspersed with our own PRINT messages, RAISERROR output, warning and informational messages from the system, and data output from SELECT statements (if we're using "Output to Text" mode, which I often do, as it's the only way to get the dang output window to scroll). This can all be quite a mess to read:

Starting at 12:25:30 PM...
Table 'Customers'. Scan count 6, logical reads 10213968, physical reads 125391, read-ahead reads 254294, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
CustomerNum   InvoiceNum   Amt
----------- ------------ --------------------------------------
2344   1331111   333.23
4492   4837227 44222.84
3434   222444   91811.33
98474   54422  22.09
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Updating the total prices...
(1 row(s) affected)
Table 'Invoices'. Scan count 1, logical reads 332, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
12:30:38: Added row to the Invoices table for 2011/10/12
Warning: Aggregate calculation excluded NULL values.
Table 'Worktable'. Scan count 3, logical reads 120, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Now, try to total up the logical reads in that output. If there's only a few lines of output, it's manageable, but imagine trying to decipher hundreds of lines of stuff like this. Over time, I developed a rigamarole system:

  • copy the entire contents of the Output window to the clipboard
  • fire up Excel, and paste the output into it
  • from Excel's "Data" tab, click the "Sort A to Z" button
  • select only the rows that start with "Table..." and copy them to the clipboard
  • paste the rows into a new window in Management Studio
  • use a regular expression search-and-replace to strip out all the text on each line before the logical read numbers
  • use a regular expression search-and-replace to strip out all the text on each line after the logical read numbers
  • copy the column of logical read numbers back into Excel
  • from Excel's "Formulas" tab, use the AutoSum button to, finally, calculate the total number of logical reads

"Crude but effective" doesn't quite do this process justice. There must be a better way, I thought.

My first idea was an AddOn for Management Studio, but the more I looked into it, I realized I just couldn't face the tedium of application programming (yes, I'm now permanently spoiled by a platform - SQL Server - that actually works without swearing at it). And while SQL Server isn't exactly famous for its text-manipulation ability, what I was was doing didn't have to be fast or elegant. And I'd rather hack something together in T-SQL in a few hours than beat my head against the C# wall for a few days. (Don't believe me? Ask someone who's written a truly great Visual Studio add-on.) As a bonus, the people using the code would be able to understand it, and maybe even build upon and improve it.

So, after an embarrassingly long delay caused by forgetting, again, that REPLACE replaces all occurrences, not just the first one, here it is. It works correctly as far as I can tell, but don't use it as an example of how to write good T-SQL code. (Please tell me about any bugs you find. Use at your own risk. Some settling of contents may occur during shipping.)

SET NOCOUNT ON
SET QUOTED_IDENTIFIER OFF

DECLARE @Text NVARCHAR(MAX) =
"-- INSERT YOUR TEXT BETWEEN THESE TWO LINES -----------------------------------
Starting at 12:25:30 PM...
Table 'Customers'. Scan count 6, logical reads 10213968, physical reads 125391, read-ahead reads 254294, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
CustomerNum   InvoiceNum   Amt
----------- ------------ --------------------------------------
2344   1331111   333.23
4492   4837227 44222.84
3434   222444   91811.33
98474   54422  22.09
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Updating the total prices...
(1 row(s) affected)
Table 'Invoices'. Scan count 1, logical reads 332, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
12:30:38: Added row to the Invoices table for 2011/10/12
Warning: Aggregate calculation excluded NULL values.
Table 'Worktable'. Scan count 3, logical reads 120, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- INSERT YOUR TEXT BETWEEN THESE TWO LINES -----------------------------------"

DECLARE @Lines TABLE (Txt NVARCHAR(MAX), Num INT)
DECLARE @nLF   INT

WHILE 1=1
BEGIN
    SET @nLF = CHARINDEX(NCHAR(10), @Text)
    IF @nLF = 0  BREAK
    
    IF @nLF < 3
    BEGIN
        SET @Text = STUFF(@Text, 1, 1, '')  
        CONTINUE
    END
    
    INSERT @Lines (Txt)
    VALUES (LEFT(@Text, @nLF - LEN(NCHAR(13) + NCHAR(10))))

    SET @nLF = PATINDEX('%' + NCHAR(13) + NCHAR(10) + '%', @Text)
    SET @Text = STUFF(@Text, 1, @nLF, '')
END

DELETE @Lines WHERE Txt NOT LIKE '%, logical reads %'
UPDATE @Lines SET Num = PATINDEX('%, logical reads %', Txt)
UPDATE @Lines SET Txt = STUFF(Txt, 1, Num + LEN(', logical reads '), '')
UPDATE @Lines SET Num = CHARINDEX(',', Txt)
UPDATE @Lines SET Txt = LEFT(Txt, Num - 1)
UPDATE @Lines SET Num = CAST(Txt AS INT)
DELETE @Lines WHERE Num = 0

SELECT Num      AS 'Logical Reads'       FROM @Lines
SELECT SUM(Num) AS 'Total Logical Reads' FROM @Lines

SET QUOTED_IDENTIFIER ON

To use this script, just copy-and-paste the entire contents of the Output window between the two lines in red in the @Text variable's definition, and hit F5 (Ctrl-E, whatever) to execute the script. You should get output like this:

The first set of results represent the SET STATISTICS IO rows that have non-zero logical reads, useful mostly for debugging the script. You can suppress this output by commenting out the first SELECT statement in the script (third line of code from the bottom). The second set of results is what we're looking for: the total number of logical reads that appear in the text of the Output window.

Sunday, October 9, 2011

SSIS: Word-wrapping Annotations (Using Only Notepad)

Kaukauna 41 Junk Yard HDRRemember that Twilight Zone episode, where a man sells his soul to the Devil for immortality, only to be sentenced to life in prison the next day?

I'm learning SSIS (2005, for now), and have discovered, like all those before me, that Microsoft implemented Annotations, but... they don't word-wrap! Your Annotation must fit on one really long one line! Bwa-ha-ha-ha-ha-ha!

Of course, I'm no longer surprised by anything SSIS: I even have a bookmark for "SSIS" Google searches now. And that's how I found Paul Blackwell's excellent Hidden SSIS Features: Word Wrapping Your Annotations And More which covers the subject extremely well. I highly recommend the entire article, especially the section titled Transparency & Word Wrapping.

Another very interesting read is [SSIS] Modifier l'apparence des packages via l'API [FAIL] by François Jehl. (We non-Francophones can use Google's translate-to-English, or the page itself has a Microsoft translation button.) After some research, his conclusion is that Annotations are a feature from a product created before SSIS ("DaVinci Design Surface", hence "MSDDS"), which might explain why Annotations are hacked implemented in SSIS package files as XML nested inside the document's XML. (Hmmm... that would explain all the &lt;s and &gt;s floating around!) For what François was attempting, that was a fatal roadblock. Fortunately, we're not trying anything nearly so ambitious or useful as he was - certainly nothing Notepad can't handle!

Anyway, using what they've uncovered, here's how to make your Annotations word-wrap. (This is how I do it - use at your own risk, and make a backup of your package file first). Close Visual Studio, and open the package file in Notepad. Search for controlprogid="MSDDS.Text", and you'll be treated to nested XML that looks something like this:

&lt;ddscontrol controlprogid="MSDDS.Text" left="4118" top="-655" logicalid="18" controlid="9" masterid="0" hint1="0" hint2="0" width="6153" height="994" noresize="0" nomove="0" nodefaultattachpoints="1" autodrag="0" usedefaultiddshape="1" selectable="1" showselectionhandles="1" allownudging="1" isannotation="0" dontautolayout="0" groupcollapsed="0" tabstop="1" visible="1" snaptogrid="0"&gt;
    &lt;control&gt;
      &lt;ddsxmlobjectstreaminitwrapper binary="0002000009180000e20300000300640000000500008008000000000000002a000054006800650001000000900144420100065461686f0074006f002000506d615800540068006500200063006f006e006e0065006300740069006f006e0020006d0061006e00610067006500054006800650007200200066006f0074006f0020005000720020007400680069007300200d61582005400610073006b0020006600610069006c0073002000730070006f007200610064000540068006500074006f00200050006900630061006c006c007900200075006e006c00650073007300200069007400270073002000730065007400200d6158005400680065002006100730073006900760065002000540068006500004d006f00640065002e0000000000" /&gt;
    &lt;/control&gt;
    &lt;layoutobject&gt;
      &lt;ddsxmlobj /&gt;
    &lt;/layoutobject&gt;
    &lt;shape groupshapeid="0" groupnode="0" /&gt;
  &lt;/ddscontrol&gt;

See the red "2" at position 61 of the ddsxmlobjectstreaminitwrapper item? To enable word-wrap, simply OR in a value of 1: in this case, we change the "10" (decimal 2) to "11" (decimal 3). Once that's done, save the package file (you made a backup first, right?), exit Notepad, open the package with Visual Studio, and - viola! - your Annotation is word-wrapped! Now do the same thing for all your other Annotations. (Hey, it's a hack, not magic.)

So, if it's just a bit-flip, why didn't Microsoft simply put a checkbox on the Properties page for Annotations? Because... Annotations don't have Property pages! Bwa-ha-ha-ha-ha-ha!

Tuesday, October 4, 2011

SSIS FTP Task: Directory is not specified in the file connection manager

Are you getting this error message, been Googling for an hour, and nothing helps?

Error: 0xC0029186 at {Your FTP Task}, {Your FTP Task Description}:
Directory is not specified in the file connection manager {Your Variable Name}.
rusty....

If you entered the value for the source and/or target filename variable and enclosed it with double-quotes, try removing them. Worked for me.

(Before coming over to the SQL Server side, I was a full-time Windows application developer, working in C, C++, and C#, ever since Microsoft QuickC for Windows was released, way back in 1990. If you've ever wondered what it's like being a Windows application developer, imagine that everything in SQL Server was designed and implemented just like SSIS. If you've ever spent much time writing application code for Windows, you'll know exactly what I mean. If not, count your blessings!)