Friday, August 28, 2015

The “Case” of the Failing SSIS Lookup

I recently ran across an issue that reminded me of an often overlooked “feature” of the SSIS lookup task.  Before jumping to the punch line, let me set the stage…

I was working on a very simple SSIS package.  Basically, just pulling some data from an operational system, landing that data into a staging table, then using a lookup task to determine whether to insert or update specific records in a final destination table.  Lots of ways to do that, but in this case, I was using a lookup task to bounce the records off the destination table.  If a given record had a lookup match, it became an update; otherwise, it became an insert.  After testing the packages several times, it was deployed to production and ran successfully.  Most of the time.  Well, about half of the time.  The other half of the time, the package would fail and the error messages indicated that inserts were failing due to duplicate key violations.  The destination table had a unique constraint (index) in place on its set of business keys – and the package was failing due to unique constraint violations when attempting to insert the same business key…

Of course, at first glance, this didn’t make much sense.  You see, the lookup task was using the same set of business keys, so any record flowing through the package and through the lookup *should* have been found if the given set of business key values existed in the destination table.  And any records with a successful lookup match should have been routed to the update part of the data flow (which was actually just landing those records into another staging table that was used after the data flow to do a set-based update).  So, how was it that sometimes records would fail to be found by the lookup task and flow to the insert part of the data flow but then fail to be inserted into the table based on unique constraint violations?  I’ll admit that at first, I was stumped.  I removed the mappings in the lookup task and reset them, thinking maybe that would help.  Nope.  I deleted the lookup task and recreated it, thinking maybe that would help.  Nope.  Okay, time for some serious troubleshooting here!  So, I created another (temporary) staging table and reconfigured the insert part of the data flow to insert the records into that new staging table instead of the real destination table.  And ran the package a few times.  After each execution, I examined both the new staging table as well as the other “updates” staging table to see where the lookup had routed all of the incoming rows.  Since the package was pulling from a live operational system, it was hard to get the same set of records each time (which made the troubleshooting a bit harder), but I was able to see that indeed, sometimes the lookup would route “existing” records to the new staging table that represented those rows that had failed the lookup.  When I manually joined that new staging table to the destination table based on the business keys, the join did indeed verify that some of the records existed in the destination table even though the lookup thought they did not.

So, what was going on here?  After scratching my head a few times, I reached out to a good friend of mine, Tom Huneke from Key2 Consulting.  Tom and I work together as consulting resources for the Department of Veterans Affairs.  After going through the package with Tom and looking at the results captured in the two staging tables, Tom immediately saw the problem – the records that the lookup task failed to find (and therefore routed to the new staging table that represented inserts) had the same business key values as existing records in the destination table – except in the “case” of the data.  For example, a record in the destination table had a value of “TBICUBE” in one of the business key columns while the record that had “failed” the lookup and was flagged as a new record had the value of “TBICube”.  I had forgotten that although your source and destination systems (in this case, both SQL Server) might be configured to be case-insensitive, SSIS and its various tasks are NOT case-insensitive.  While I was able to successfully join the records with “TBICUBE” and “TBICube” as values using T-SQL (and those values were considered equivalent), SSIS and its lookup task didn’t see it that way.

The quick fix was to change the lookup task to use a query that forced the business key values to lower case and add a derived column task before the lookup to create lower-case versions of the incoming data columns.  Configuring the lookup task to ensure it was comparing on lower-case only values solved the problem.  As to why the package was succeeding sometimes and failing sometimes, a bit more research of the source system revealed that sometimes it recorded values (in the example above, the “TBICube” value) using all upper case, sometimes using mixed case, and sometimes using all lower case.  Don’t get me started on why it might have been doing that – it is what it is in this case and will likely never change.  As a result, sometimes the recorded values in the source records would match (as the lookup task was using case-sensitive matching) destination table records, and sometimes they wouldn’t.

In the end, SSIS and its lookup task were operating exactly as designed – and exactly as documented.  Sometimes even those of us who have been creating SSIS solutions for more years than we care to admit will forget about something as simple as this – and spend way to much time troubleshooting the problem and overlooking the obvious!

No comments: