Tuesday, August 24, 2010

Remove All Text Between Single Quotes

There's a zillion posts and articles about how to escape single quotes in T-SQL, but what I wanted to do was strip out the characters that were inside the single quotes. After a ridiculous amount of Googling turned up nothing, I spent an even more ridiculous amount of time figuring it out.

The code below is a complete, working example of removing the characters from between two single quotes for every row of a table. It also removes the single quotes; modifying the code to leave the single quotes in is left as an exercise for the reader. (Ok, just add 1 to PosBeg and subtract 1 from PosEnd.)

--------------------------------------------
-- How to convert the rows' values to this: 
--                              
--      The cow says .          
--      The duck says .         
--      The clock says .        
--                              
-- In other words, remove the single quotes
-- and the characters *between* them. 
--------------------------------------------

SET NOCOUNT ON
SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON

CREATE TABLE t (c NVARCHAR(4000) NOT NULL)

INSERT INTO t 
          SELECT 'The cow says ''moo''.'
    UNION SELECT 'The duck says ''quack''.'
    UNION SELECT 'The clock says ''eleven-thirty''.'
    
SELECT c FROM t ORDER BY c
 
UPDATE t SET c = STUFF(c,
                       CHARINDEX('''', c),
                       CHARINDEX('''', c, CHARINDEX('''', c) + 1) - CHARINDEX('''', c) + 1,
                       ''
                      )

SELECT c FROM t ORDER BY c
DROP TABLE t

If you insert the code below into the code above just before the UPDATE statement, it may make what's going on clearer:

SELECT CHARINDEX('''', c)                              AS PosBeg FROM t

SELECT CHARINDEX('''', c, CHARINDEX('''', c) + 1)      AS PosEnd FROM t
 
SELECT CHARINDEX('''', c, CHARINDEX('''', c) + 1) -
       CHARINDEX('''', c)                              AS Length FROM t 

SELECT SUBSTRING(c,
                 CHARINDEX('''', c),
                 CHARINDEX('''', c, CHARINDEX('''', c) + 1) - CHARINDEX('''', c) + 1
                )
  FROM t

The really important thing I learned from this is that (a) when people say T-SQL was not intended for string manipulation, they're not kidding, and (b) the way to do anything remotely fancy with strings is one tiny step at a time. As an old C/C++/C# developer, I underestimated the difficulty of this problem: playing with strings ceased to be interesting as soon as CString came out. I finally got it through my head that the only hope of implementing this in an intelligible way was to start from the inside, and work my way out. That's where the intermediate variables PosBeg, PosEnd, and Length came from - they were the baby steps I took along the way.

I haven't programmed using intermediate variables in a long, long time. I don't write C# like that, and I sure don't write other T-SQL that way, but string manipulation in T-SQL drove me back to it. I wonder if it's T-SQL's lack of powerful string operators, or just the familiarity of old habits, that are to blame?

1 comments:

Anonymous said...

Very helpful code, thank you!

Can I suggest possibly wrapping your whole STUFF function in an ISNULL with the original string to handle records which don't have any of the delimiter characters you are trying to remove (e.g. the single quotes in your example). When I tried to implement this for my code I found that the records which didn't contain any delimiters were returning NULL values (not what I wanted).

Post a Comment