Sunday, February 28, 2010

Do your Clustered Indexes on IDENTITYs Have the Correct FILLFACTOR?

The great thing about blogging is that it forces you to be as sure as you can that you know what you're talking about. That alone makes it worthwhile...

I started this FILLFACTOR post a few days ago with a completely different goal in mind. After I did my due diligence, however - perusing Books OnLine, seeing what's in my "dead trees" library, and reading my favorite gurus' blogs - I realized that the "tip" I was preparing to demonstrate was completely wrongheaded.

In the process, I learned a lot about setting FILLFACTOR values. One thing I learned was that clustered indexes on IDENTITY columns should almost always have their FILLFACTOR set to 100% (completely full - no room for expansion). The reasoning has to do with page splits, which are expensive, especially on a clustered index.

Since IDENTITY columns are monotonically increasing, INSERTs will always be applied to the end of the index, so an INSERT will never cause a page split. UPDATE-ing a record can cause page splits, but only if a variable-length field (like, an NVARCHAR) is updated with more data than it had before. Since the page is already 100% full, there's nowhere to put the extra data: that causes a page split.

Can we avoid page splits like this by setting the FILLFACTOR to something less than 100% - that is, leave some empty space on each page? Sure, but, as always, there's a downside. Database "reads" typically outnumber "writes" by a factor of n to 1, right? Whether n is 5 or 50 depends on who you believe, and the unique characteristics of each database, and certainly changes over time; so it's a little vague, but in any case, the typical OLTP database does a lot more reads than writes. That's important, because setting the FILLFACTOR to less than 100% can make reads much slower: that's the downside.

To see why this is, imagine you have a table with 1000 rows, and 50 rows will fit on an 8 KB page. With a FILLFACTOR of 100%, every page is full (by definition), this will require 100 / 50 = 20 pages. So, a scan of this table will require, at most, 20 reads.

Now imagine you've set the FILLFACTOR to 80%, leaving each page 20% empty after you REBUILD it. This leaves lots of room for growth, so we'll reduce the number of page splits. And that's a good thing.

But now, only 40 rows (80% of 50 rows) will fit on each page, so the table now requires 100 / 40 = 25 pages. A scan of the table will now require, at most, 25 reads. You've just decreased read performance by, in the worst case, 1 - (20 / 25) = 20%. Oops.

So, to be sure all the tables that are clustered on an IDENTITY column have a FILLFACTOR setting of 100%, I've written a script, Index_ClusteredOnIdentity_SetFillFactorTo100_All.sql . The script first finds the tables to be rebuilt, using a cursor on this statement:

SELECT OBJECT_SCHEMA_NAME(i.object_id)   AS 'Schema Name',
       OBJECT_NAME(i.object_id)          AS 'Table Name',
       i.name                            AS 'Index Name'
  FROM sys.indexes                  i
  JOIN sys.index_columns            ic
    ON ic.object_id = i.object_id
   AND ic.index_id  = i.index_id 
  JOIN sys.columns                  c
    ON c.object_id  = ic.object_id
   AND c.column_id  = ic.column_id
  JOIN sysindexes                   si
    ON si.id        = i.object_id
   AND si.indid     = i.index_id
 WHERE i.object_id                     > 100
   AND OBJECT_SCHEMA_NAME(i.object_id) != 'sys'
   AND i.is_disabled                   = 0 
   AND i.type_desc                     = 'CLUSTERED'
   AND c.is_identity                   = 1
   AND si.origfillfactor               NOT IN (0, 100)

This should be pretty self-explanatory. The system tables are joined in the usual way to get at the index's column(s). We restrict the result set to exclude system tables and tables in the "sys" that can be returned in certain cases. We don't REBUILD disabled indexes, because that also enables them, a side-effect you may not expect. We of course only want clustered indexes on IDENTITY columns. Finally, there's no point in rebuilding the index unnecessarily.

As we iterate through the resultset, we build up a statement like:

ALTER INDEX PK_Product_ProductID
   ON Production.Product
   REBUILD WITH (ONLINE = OFF, PAD_INDEX = ON, FILLFACTOR = 100)

Rebuilding a clustered index can be a time-intensive process, but at least, as BooksOnLine says, "Rebuilding a clustered index does not rebuild associated nonclustered indexes unless the keyword ALL is specified" - and we're not specifying it.

Tuesday, February 16, 2010

Find and Fix NOT NULL Columns Without DEFAULT Constraints

The latest and greatest version of my FindAllNotNullColumnsWithNoDefaultConstraint.sql script (formerly, embarrassingly, FindAllNonNullableColumnsWithNoDefaultConstraint.sql) is now available. It returns the DML code to add DEFAULT constraints to all NOT NULL columns that don't already have them. Useful on legacy databases to head off problems before they happen.

This is a work in progress, so the usual caveats apply: do not run this on any database you want to keep, for example, a production database, or any database that isn't backed up. Comments welcome!

Thursday, February 11, 2010

Script Produces DML to Fix Index Problems

(Getting the hang of this blogging thing... Rather than post an entire SQL script as a blog post, I've created a page for all my scripts. I'll add a blog post whenever a script is added or changed.)

The latest and greatest version of my IndexDemon.sql script (formerly, Index_Analysis.sql) is now available. It examines all the indexes in the current database and emits DML code to correct any problems it finds. It makes no changes to your database itself - you have to copy and paste the emitted SQL - seemed safer.

This is a work in progress, so the usual caveats apply: do not run this on any database you want to keep, for example, a production database, or any database that isn't backed up. Comments welcome!

How Do You Begin a SQL Script?

Have you ever noticed that, regardless of where you work, there's always some kind of "standard header" for SQL scripts? Sometimes, it's a formal, documented set of T-SQL commands, with clear goals and a well thought out implementation.

More often, though, it's like that Star Trek episode where everyone's forgotten the meaning of the Preamble to the United States Constitution, but can still kinda grunt the words? In the same way, a few lines of T-SQL, copied from file to file to file, can devolve into, essentially, a religious ritual. (I don't know what all this does, but our other scripts do it, so I better too!)

After getting bitten by a misuse of SET QUOTED_IDENTIFIER, I sat down to research the issue. Before discussing the code, here's what I came up with; this is the code I use, with occasional modifications, to start most of my scripts:

-------------------------------------------------------------------------------
-- MUST be set as shown to support indexes on computed columns, indexed views. 
SET ANSI_NULLS ON                                 -- Deprecated: leave set ON. 
SET ANSI_PADDING ON                               -- Deprecated: leave set ON. 
SET ANSI_WARNINGS ON                              -- No trailing blanks saved. 
SET ARITHABORT ON                                 -- Math failure not ignored. 
SET CONCAT_NULL_YIELDS_NULL ON                    -- NULL plus string is NULL. 
SET NUMERIC_ROUNDABORT OFF                        -- Allows loss of precision. 
SET QUOTED_IDENTIFIER ON                          -- Allows reserved keywords. 

-- These aren't, strictly speaking, required, but are generally good practice. 
SET NOCOUNT ON                                    -- Minimize network traffic. 
SET ROWCOUNT 0                                    -- Reset in case it got set. 
SET XACT_ABORT ON                                 -- Make transactions behave. 
-------------------------------------------------------------------------------

IF DB_NAME() IN ('master', 'tempdb', 'model', 'msdb')                  RAISERROR('   $$$ YOU ARE ATTACHED TO A SYSTEM DB $$$',   20, 1) WITH NOWAIT, LOG
IF 10 > CAST(CAST(SERVERPROPERTY('ProductVersion') AS CHAR(2)) AS INT) RAISERROR('   $$$ REQUIRES SQL SERVER 2008 OR LATER $$$', 20, 1) WITH NOWAIT, LOG
IF @@TRANCOUNT <> 0                                                    RAISERROR('   $$$ OPEN TRANSACTION EXISTS $$$',           20, 1) WITH NOWAIT, LOG

Let's divide this code into three sections and examine them one at a time.

Standard SET commands

This section was easy to develop, because they are all required if you want to have indexed views (or indexed calculated columns) in your database. I came to that conclusion when I researched each command independently; only when I was finished did I notice this in BOL 2008:

ANSI_NULLS is one of seven SET options that must be set to required values when dealing with indexes on computed columns or indexed views. The options ANSI_PADDING, ANSI_WARNINGS, ARITHABORT, QUOTED_IDENTIFIER, and CONCAT_NULL_YIELDS_NULL must also be set to ON, and NUMERIC_ROUNDABORT must be set to OFF.

Optional SET commands

These three commands are "defensive" (read, "paranoid") programming; ideally, you'd never have to do things like this, but better safe than unemployed.

Critical error checks

I like to do these to prevent run-time errors; note that I'm raising error level 20, which severs the session's connection, because if these errors occurs, I want execution to stop now.

If you've ever accidentally run your "create a bunch of tables" script against the master database, you'll appreciate the first line. It just checks the current database against the well-known names of the system databases. (Curiously, I couldn't find an "is_system" column in the system views for databases.)

The second critical error check is a simple SQL Server version check - to me there's nothing worse than downloading a cool script and having it crash because it requires version n + 1.

The final check is just to be sure there's not a transaction left often. This is the line of code I often end up commenting out in production, but during development it's nice to have, as it prevents those annoying "deadlocks" in SSMS when you forget to close a transaction.

So, these are what I use - did I miss a favorite of yours?

Friday, February 5, 2010

Use QUOTENAME( )... [or else!]

Lately, I seem to be reading a lot of "bracket-slapping" code:

SET @Sql = 'ALTER INDEX [' 
         + @IndexName  + '] ON [' 
         + @DBName     + '].[' 
         + @SchemaName + '].['
         + @ObjectName + '] REORGANIZE '

This is much easier to read (and easier to type!) than using the QUOTENAME() function:

SET @Sql = 'ALTER INDEX ' 
         + QUOTENAME(@IndexName)  + ' ON ' 
         + QUOTENAME(@DBName)     + '.' 
         + QUOTENAME(@SchemaName) + '.'
         + QUOTENAME(@ObjectName) + ' REORGANIZE '

BOL says that QUOTENAME() "[r]eturns a Unicode string with the delimiters added to make the input string a valid Microsoft SQL Server delimited identifier." So these two code snippets are pretty much interchangable, right?

But QUOTENAME() would be pretty lame if all it did was slap brackets around a string, wouldn't it? Fortunately (or unfortunately, if you've got a lot of "bracket-slapping" code laying around) it does more: it "escapes" embedded bracket characters. For example, let's take the table name 'Sales YTD [2009]', and try to create it.

-- Demonstrate that QUOTENAME() is safer than bracket-slapping.
DECLARE @Sql      NVARCHAR(4000) = ''
DECLARE @TableOne sysname        = 'Sales YTD [2009]'

-- Expected to fail: table name has spaces, brackets.
SET @Sql = 'CREATE TABLE '  + @TableOne            + ' (x INT)'
PRINT '(1)  ' + @Sql
EXEC sp_executesql @Sql
PRINT ' '

-- Expected to fail: bracket-slapping doesn't handle brackets.
SET @Sql = 'CREATE TABLE [' + @TableOne            + '] (x INT)'
PRINT '(2)  ' + @Sql
EXEC sp_executesql @Sql
PRINT ' '

-- Expected to succeed.
SET @Sql = 'CREATE TABLE '  + QUOTENAME(@TableOne) + ' (x INT) '
         + CHAR(10) + '     '
         + 'DROP   TABLE '  + QUOTENAME(@TableOne)
PRINT '(3)  ' + @Sql
EXEC sp_executesql @Sql
PRINT ' '

The first attempt uses the name with no "decoration"; it's not a legal identifier name, and so will fail. The second attempt uses "bracket-slapping"; the brackets in the table's name will cause this to fail. The third, and correct, method uses QUOTENAME(). Running the code above results in this output:

(1)  CREATE TABLE Sales YTD [2009] (x INT)
Msg 102, Level 15, State 1, Line 1
Incorrect syntax near 'YTD'.
 
(2)  CREATE TABLE [Sales YTD [2009]] (x INT)
Msg 105, Level 15, State 1, Line 1
Unclosed quotation mark after the character string 'Sales YTD [2009] (x INT)'.
Msg 102, Level 15, State 1, Line 1
Incorrect syntax near 'Sales YTD [2009] (x INT)'.
 
(3)  CREATE TABLE [Sales YTD [2009]]] (x INT) 
     DROP   TABLE [Sales YTD [2009]]]

Moral of the story: always use QUOTENAME() - unless you enjoy bug hunts!