Monday, April 25, 2011

How to Hide Metadata Information Using Extended Properties

Ok, a more honest title for this post might be, 'How to Clean Up the Mess After You've Hosed the Extended Properties on Primary Key Constraints', but that wouldn't fit.

What I was trying to accomplish is a topic for another post. But while grumbling loudly while writing the code to fix what I had inadvertently done, I realized it could be used to "hide" information about tables. I say "hide", in quotes, because anyone with access to Books Online could find it easily. This is really just a way of storing information about tables, and not having it appear in SQL Server Management Studio (SSMS).

What I ran into involves the difference between primary key constraints and primary key indexes. The most important distinction between them is "existence"; there's no such thing as a primary key index. If you're confused, keep reading. If you've heard this story already, skip ahead a bit. Simply put, when someone is talking about "primary key indexes", they're using verbal shorthand for, "the index that the primary key constraint uses to enforce uniqueness." (As far as I know, this kind of index has no specific name, which is good, because if it did, we'd all confuse it with "primary key constraint", so we'd be in the same boat we are now, but without the ability to abruptly clarify things by saying, "but there's no such thing as a whatever-it-is." We should count our blessings.) To sum up, when it comes to primary keys, we have a "primary key constraint", and a unique index of some sort to help it. (We won't be getting into the whole clustered vs. non-clustered thing here.)

Now we're ready to talk about setting extended properties on primary key constraints. First we create a little test table:

CREATE TABLE dbo.MyTable
(
    MyPkColumn INT NOT NULL,

    CONSTRAINT PK_MyTable PRIMARY KEY CLUSTERED (MyPkColumn)
)

And now we add an extended property to the PK_MyTable constraint, like so:

EXEC sys.sp_addextendedproperty 
    @name       = N'Extended property for PK_MyTable constraint'
  , @value      = N'This is the PK for MyTable'
  , @level0type = N'SCHEMA'
  , @level0name = N'dbo'
  , @level1type = N'TABLE'
  , @level1name = N'MyTable'
  , @level2type = N'CONSTRAINT'
  , @level2name = N'PK_MyTable'

Once added, this extended property will appear in SSMS, as expected. Oddly, it appears on the index-the-primary-key-constraint-uses, not on the constraint itself. (And no, I'm not suggesting you create tables in the master database.) So far, so good. But given that the "slot" for the index is already taken, and knowing what we do about the non-existence of "primary key indexes", what happens if we do this?
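You can confirm where the property actually landed with sys.fn_listextendedproperty, which lists the properties attached to a given object. Querying at the CONSTRAINT level shows the property we just added:

```sql
-- List extended properties attached to the PK_MyTable constraint.
SELECT objtype
     , objname
     , name
     , value
  FROM sys.fn_listextendedproperty(
           NULL, N'SCHEMA', N'dbo', N'TABLE', N'MyTable', N'CONSTRAINT', N'PK_MyTable')
```

Swap the last two arguments to N'INDEX' and N'PK_MyTable' to look at the index "slot" instead.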

EXEC sys.sp_addextendedproperty 
    @name       = N'Secret extended property for PK_MyTable index'
  , @value      = N'This is the PK index for MyTable'
  , @level0type = N'SCHEMA'
  , @level0name = N'dbo'
  , @level1type = N'TABLE'
  , @level1name = N'MyTable'
  , @level2type = N'INDEX'
  , @level2name = N'PK_MyTable'

Ideally (I think anyway) this should generate an error: we're adding an extended property to a CONSTRAINT, but setting @level2type to INDEX. What it actually does is create the extended property as if there really were such a thing as a "primary key index". Interestingly, but perhaps not surprisingly, this new extended property doesn't show up (at least not anywhere I can find) in SSMS. So how do we know it's there? We use the system views, of course!

-- Emit code to drop all extended properties on "primary key indexes". 

SELECT ep.major_id
     , ep.minor_id
     ,             AS 'Schema Name'
     ,             AS 'Table Name'
     ,             AS 'Index Name'
     ,            AS 'EP Name'
     , ep.value          AS 'EP Value'
     , 'USE ' + DB_NAME() + '; ' +
       'EXEC sys.sp_dropextendedproperty '
            + '@name = '                                 + '''' + + ''', '
            + '@level0type = ''SCHEMA'', @level0name = ' + QUOTENAME( + ', '
            + '@level1type = ''TABLE'',  @level1name = ' + QUOTENAME( + ', '
            + '@level2type = ''INDEX'',  @level2name = ' + QUOTENAME(   AS 'T-SQL'

  FROM sys.extended_properties    ep
  JOIN sys.indexes                i
    ON ep.major_id = i.object_id
   AND ep.minor_id = i.index_id
  JOIN sys.tables                 t
    ON i.object_id = t.object_id
  JOIN sys.schemas                s
    ON t.schema_id = s.schema_id
 WHERE ep.class_desc    = 'INDEX'
   AND i.is_primary_key = 1

The above displays the offending extended properties in the current database; the last column contains the T-SQL necessary to drop each one. For example:

EXEC sys.sp_dropextendedproperty 
    @name       = 'Secret extended property for PK_MyTable index', 
    @level0type = 'SCHEMA', 
    @level0name = [dbo], 
    @level1type = 'TABLE',  
    @level1name = [MyTable], 
    @level2type = 'INDEX',  
    @level2name = [PK_MyTable]


Thursday, April 7, 2011

Find and Kill All System-Named Statistics

I hate system-named objects.

Some of this is to be expected from someone like me, who, depending on the decade, could be called "persnickety", "perfectionist", "OCD", or "The Anal Retentive Chef." But there's also a very practical reason to name all your database objects.

If you're using a third-party tool to synch up database schemas, you may find that every time you copy a system-named object from one side of the synch to the other, an entirely new name is generated. For example, you might have a statistic object named dbo.Customers._WA_Sys_00000001_45544755 on the left-hand side of the synch, but when you tell the tool to copy it to the right-hand side, you get a statistic named dbo.Customers._WA_Sys_00000001_436BFEE3.

Well, that won't do.

Below is a stored procedure that locates and drops all system-named statistics. After you run this, just create a new statistic object with a name that follows your SQL nomenclature guidelines. (Whaddaya mean you don't have any? Get some!)

-- Drops all system-named statistics.                                          

CREATE PROCEDURE dbo.usp_CleanupStatisticsWithSysNames
AS
BEGIN
    SET NOCOUNT ON

    DECLARE @Cmd        NVARCHAR(MAX) = ''
    DECLARE @SchemaName sysname       = ''
    DECLARE @TableName  sysname       = ''
    DECLARE @StatName   sysname       = ''

    DECLARE cur CURSOR LOCAL FAST_FORWARD FOR
        SELECT   AS SchemaName
             ,   AS TableName
             ,   AS StatName
          FROM sys.stats           s
          JOIN sys.stats_columns   sc
            ON s.object_id = sc.object_id
           AND s.stats_id  = sc.stats_id
          JOIN sys.tables          t
            ON s.object_id = t.object_id
          JOIN sys.schemas         m
            ON t.schema_id = m.schema_id
         WHERE ( LIKE '\_WA\_Sys\_%' ESCAPE '\'  OR  s.auto_created = 1)
           AND s.object_id > 100
      ORDER BY SchemaName
             , TableName
             , StatName

    OPEN cur
    FETCH NEXT FROM cur INTO @SchemaName, @TableName, @StatName

    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @Cmd = 'DROP STATISTICS ' + QUOTENAME(@SchemaName) + '.' + QUOTENAME(@TableName) + '.' + QUOTENAME(@StatName)
        RAISERROR(@Cmd, 10, 1) WITH NOWAIT
        EXEC sp_executesql @Cmd
        FETCH NEXT FROM cur INTO @SchemaName, @TableName, @StatName
    END

    CLOSE cur
    DEALLOCATE cur
END

Rumor has it that the "WA" in the system-generated names stands for "Washington", as in "Redmond".

Tuesday, April 5, 2011

What About "Under-Used" Indexes?

We've all spent time looking for unused indexes (right?), but today my boss asked about under-used indexes. Now, I wrote a script a while ago to find completely unused indexes, but finding indexes that were only being used a little was just different enough to require a new script.

The script below displays all indexes, with the least-used (including unused) ones first, and totals for user and system reads and updates. I figured having the ratio between reads and updates might be useful, since if it takes more updates to maintain the index than it gets used, you might want to investigate its cost-benefit.

Some caveats:

  • If your server just rebooted, these numbers probably won't mean a lot; wait for at least a full day to allow time for the indexes to be read or written to.
  • Also, don't forget about periodic events like end-of-month processing. You don't want a phone call at 12:15 am on the first of the month when it turns out that index was needed after all.
  • Just because it costs more to maintain an index doesn't mean it's not still worth having. The cost of maintaining the index is probably spread out over a long period of time; the benefit to a waiting user of a query that takes two seconds instead of five minutes to run might outweigh that cost. It depends on your particular environment.
  • Note that indexes that support primary key constraints are excluded, as I'm pretty sure I'm not going to ever want to drop them!

Finally, consider disabling any under-used indexes you find, rather than just dropping them. If it turns out the index was important, you'll still have to rebuild the index, but at least the DDL code will be there already.
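Disabling and re-enabling is just a pair of ALTER INDEX statements. A minimal sketch, with hypothetical index and table names:

```sql
-- Disabling keeps the index definition in the catalog but drops its
-- data pages and stops all maintenance on it.
ALTER INDEX IX_Orders_ShipDate ON dbo.Orders DISABLE

-- If the 12:15 am phone call comes and the index was needed after all,
-- REBUILD brings it back without having to re-run the original DDL.
ALTER INDEX IX_Orders_ShipDate ON dbo.Orders REBUILD
```

Note that disabling a nonclustered index this way frees its space; disabling a clustered index makes the whole table inaccessible, so this trick is really for the nonclustered indexes this script is aimed at.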

-- Display unused and under-used indexes. 

DECLARE @dbid INT = DB_ID()

; WITH cteUser AS
(
    SELECT object_id,
           index_id,
           user_seeks + user_scans + user_lookups   AS 'User Reads',
           user_updates                             AS 'User Updates'
      FROM sys.dm_db_index_usage_stats
     WHERE database_id = @dbid
),
cteSystem AS
(
    SELECT object_id,
           index_id,
           system_seeks + system_scans + system_lookups   AS 'System Reads',
           system_updates                                 AS 'System Updates'
      FROM sys.dm_db_index_usage_stats
     WHERE database_id = @dbid
),
cteTotal AS
(
    SELECT u.object_id,
           u.index_id,
           [User Reads]   + [System Reads]     AS 'Total Reads',
           [User Updates] + [System Updates]   AS 'Total Updates'
      FROM cteUser     u
      JOIN cteSystem   s
        ON u.object_id = s.object_id
       AND u.index_id  = s.index_id
),
cteReadToUpdateRatio AS
(
    SELECT object_id,
           index_id,
           CONVERT(MONEY,
                   ISNULL(
                          CAST([Total Reads] AS REAL) / NULLIF([Total Updates], 0.0)
                        , [Total Reads]
                         )
                 , 1)   AS 'Read-to-Update Ratio'
      FROM cteTotal
)
SELECT OBJECT_NAME(i.object_id)               AS 'Table Name',
                                        AS 'Index Name',
       u.[User Reads],
       u.[User Updates],
       s.[System Reads],
       s.[System Updates],
       t.[Total Reads],
       t.[Total Updates],
       r.[Read-to-Update Ratio],
       t.[Total Reads] + t.[Total Updates]    AS 'TOTAL READS + UPDATES'
  FROM cteUser                        u
  JOIN cteSystem                      s
    ON u.object_id = s.object_id
   AND u.index_id  = s.index_id
  JOIN cteTotal                       t
    ON u.object_id = t.object_id
   AND u.index_id  = t.index_id
  JOIN cteReadToUpdateRatio           r
    ON u.object_id = r.object_id
   AND u.index_id  = r.index_id
  JOIN sys.indexes                    i
    ON u.object_id = i.object_id
   AND u.index_id  = i.index_id
  JOIN sys.objects                    o
    ON u.object_id = o.object_id
 WHERE OBJECTPROPERTY(o.object_id, 'IsUserTable') = 1
   AND i.is_primary_key = 0
   --AND (t.[Total Reads] = 0                                  -- These definitely need to be dropped. 
   -- OR CAST(r.[Read-to-Update Ratio] AS REAL) < 1.0)         -- These may need to be dropped. 
 ORDER BY CAST(r.[Read-to-Update Ratio] AS REAL)

Friday, April 1, 2011

ISNULL or COALESCE? Sometimes it Really Does Matter

After seeing some poor performance in a query that I couldn't figure out, I ended up reading a lot of discussions about the "ISNULL / COALESCE" debate. Some people feel one is faster than the other, and some people like COALESCE because it's ANSI standard. (And they hold these opinions very strongly.)

I always thought the two were identical (when just two values are involved, of course), but it turns out that they have a difference that can affect performance. The value that ISNULL returns is typed to be the same as the type of the first argument. The value that COALESCE returns is typed to be the same as the argument with the highest data type precedence.
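You can see the difference for yourself with SQL_VARIANT_PROPERTY, which reports the type of an expression:

```sql
-- ISNULL takes its type from the first argument (VARCHAR(10) here, so
-- the 0 is converted to '0'); COALESCE takes the highest-precedence
-- type among all its arguments (INT here, so the column wins... er,
-- loses).
DECLARE @v VARCHAR(10) = NULL

SELECT SQL_VARIANT_PROPERTY(ISNULL(@v, 0),   'BaseType')   AS 'ISNULL Type'    -- varchar
     , SQL_VARIANT_PROPERTY(COALESCE(@v, 0), 'BaseType')   AS 'COALESCE Type'  -- int
```

Same two arguments, two different result types.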

Adam Machanic says: "What does this have to do with query performance? Sometimes, when using ISNULL or COALESCE as part of a predicate, a user may end up with a data type mismatch that is not implicitly convertable and which therefore causes a table scan or other less-than-ideal access method to be used."

Data type mismatches can cause table scans? I had totally forgotten about that. Makes sense though: if there's no implicit conversion, you'll have to use a CAST or CONVERT, which means each row will have to be evaluated.
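Here's a sketch of the kind of predicate that bit me, with hypothetical table and column names. The column is VARCHAR, the parameter is NVARCHAR, and because NVARCHAR has higher precedence, the column side of the comparison gets converted on every row:

```sql
-- Hypothetical: dbo.Orders.CustomerCode is VARCHAR(10) and indexed.
DECLARE @code NVARCHAR(10) = N'ABC123'

-- Risky: COALESCE returns VARCHAR here, but comparing it to an
-- NVARCHAR parameter converts the column side row by row, which can
-- turn an index seek into a scan.
SELECT * FROM dbo.Orders WHERE COALESCE(CustomerCode, '') = @code

-- Friendlier: match the parameter's type to the column and keep the
-- column bare so the index can be used.
DECLARE @code2 VARCHAR(10) = 'ABC123'
SELECT * FROM dbo.Orders WHERE CustomerCode = @code2
```

The exact behavior depends on collation and the plan the optimizer picks, so treat this as an illustration of the pattern, not a guarantee of a scan.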

That's what was causing the poor performance I was seeing. I wouldn't advocate a global search-and-replace, but I will be looking more closely at the COALESCE statements I come across.