Tuesday, April 5, 2016

Introduction to BI Office from Pyramid Analytics

BI Office, from Pyramid Analytics, is a business intelligence and analytics tool that includes four main components, all provided within a web-based application:

  • Data Discovery allows a user to connect to an SSAS data model (tabular or multidimensional) to create reports using grids, charts, advanced visualizations, gauges (based on KPIs), and maps (using the Bing mapping service). Reports get saved into a book (a term familiar to anyone who has used or heard of ProClarity), which is managed within the content management system (folders maintained within a BI Office repository).
  • Story Board provides a dashboarding tool that allows a user to create dashboards (or story boards) based on a “slide” paradigm. Each slide can contain report content from books created using the Data Discovery component plus other assets like images, logos, text, buttons of various types, etc. Dashboards can be saved and managed within the content management system as well.
  • Publication gives a user the ability to do two things. First, users can create print-ready reports using content from Data Discovery books along with document-like constructs including headers, footers, paragraphs of text (and even dynamic text based on SSAS data model queries), etc. Second, users can create alerts that provide contextual notifications to users based on data model queries (for example, notifying a user that a value in a cube is rising or falling). Publication documents and alerts can be scheduled for execution and distributed via email or saved back into the content management system.
  • Data Model allows a user to create a mash-up of data via a wizard interface that walks the user through creating an SSAS tabular model.  Users can select from various data sources, combine tables/data sets from those sources, include calculations, etc. as they define their model.  Once models are created, they can be uploaded to an SSAS tabular server and scheduled for recurring refreshes.  These data models can then be used as sources for Data Discovery reports (which can then be used for analysis, included in dashboards, added to publications, etc.)

As mentioned above, the BI Office components exist within a web-based application that today uses Silverlight technology; thankfully, Pyramid Analytics has indicated that a future version will be based on HTML5 instead.  That said, the existing web-based application is extremely functional and works fine for most Microsoft-centric organizations – which is where BI Office fits best given its dependence on SQL Server technologies.  The BI Office system uses SQL Server to house its repository (as a typical SQL Server database), and the Data Discovery component can only connect to SSAS data models, whether multidimensional or tabular models built in Visual Studio and deployed to an SSAS server or tabular models created by users with the Data Model component.

Beyond the major components, BI Office includes several compelling features that make it a very capable and interesting BI application:

  • Business logic in the form of custom measures, custom members, custom sets, and even KPIs can be defined and managed within the BI Office content management system. These business logic items are associated with the SSAS data model they relate to so that users can share definitions created using the BI Office client. This allows users to extend the SSAS data models beyond the original design created by the SSAS developers.
  • Beyond the standard charts typical of just about any BI and data analytics tool, BI Office includes several advanced visualizations including color/heat maps, sunburst diagrams, waterfall and Pareto charting, word clouds, cluster-packing diagrams, and more. Any of these advanced visualizations can be used in reports, dashboards, or publications.
  • BI Office provides social connectivity that allows users to communicate and collaborate via comments and annotations to any of the grids, charts, or other types of reporting content – whether made available via a book of reports or added to an interactive dashboard. Note that social connectivity can be integrated into BI Office using Yammer, Jive, Chatter, or the built-in BI Office collaboration engine.
  • Mobile BI capabilities are provided via the BI Office Mobile application available for mobile devices running iOS, Android, WinRT, or Windows 8.1. The mobile application allows users to not only view and analyze reports and dashboards, but also act on the information available using the same type of interactive actions that the web-based application provides (drill-down, expand/collapse, dice, filter, sort, etc.)
  • R has become a de facto standard for extending data analytics and BI tools, and BI Office is no different.  BI Office makes it extremely easy for users to take advantage of standard forecasting and clustering algorithms via simple menu selections in the Data Discovery component. Beyond that, users can also customize the R scripts behind those menu selections in order to take advantage of more complex packages and algorithms available in R.

In future posts, I’ll go into more details about each of the major components as well as the additional features listed above.  For now, if you are interested in learning more about BI Office and/or downloading a trial copy, feel free to hit the Pyramid Analytics website at https://www.pyramidanalytics.com.

Monday, March 28, 2016

Starting a New Pyramid Analytics Series

For those who don’t necessarily know, I am a consulting resource to the Department of Veterans Affairs. Over the years, I’ve helped architect the data warehouse environment within the VA, designed the ETL infrastructure and a data distribution application, and generally assisted in various technology arenas as one of the chief architects within the BI organization there. For the last 12 months (or so), I’ve led the implementation of BI Office (from Pyramid Analytics) within the VA, providing an on-premise, self-service analytics reporting and dashboarding environment. At present, we have over 28,000 users, over 115,000 reports, and over 21,000 pages (or slides) within over 2,000 dashboards. Granted, some of those reports and dashboards were created by individual users for individual use (given the self-service nature of BI Office), but a lot of them were created for organizational use.

Given my work with BI Office over the last year, I thought it might be interesting to blog about some of our experiences with it internally at the VA. I won’t necessarily be able to discuss specific report or dashboard designs or talk about specific data used within the VA, but I’ll instead focus on what BI Office has to offer, what its strengths (and weaknesses) are, and how others might be able to take advantage of it if they are looking for a good on-premise self-service analytics tool…

One final note: I won’t necessarily be comparing BI Office to other solutions like QlikView, Tableau, or even Power BI. I don’t have the experience necessary to craft any comparisons to those other tools. If anyone reading this feels the need to comment about how those tools do this or that better, feel free – but I likely won’t take the time to acknowledge or comment otherwise.

With that said, look for my first post about BI Office within the next few days :-)

Friday, August 28, 2015

The “Case” of the Failing SSIS Lookup

I recently ran across an issue that reminded me of an often overlooked “feature” of the SSIS lookup task.  Before jumping to the punch line, let me set the stage…

I was working on a very simple SSIS package.  Basically, just pulling some data from an operational system, landing that data into a staging table, then using a lookup task to determine whether to insert or update specific records in a final destination table.  Lots of ways to do that, but in this case, I was using a lookup task to bounce the records off the destination table.  If a given record had a lookup match, it became an update; otherwise, it became an insert.  After testing the package several times, I deployed it to production, and it ran successfully.  Most of the time.  Well, about half of the time.  The other half of the time, the package would fail with error messages indicating that inserts were failing due to duplicate key violations.  The destination table had a unique constraint (index) in place on its set of business keys – and the package was failing due to unique constraint violations when attempting to insert the same business key…

Of course, at first glance, this didn’t make much sense.  You see, the lookup task was using the same set of business keys, so any record flowing through the package and through the lookup *should* have been found if the given set of business key values existed in the destination table.  And any records with a successful lookup match should have been routed to the update part of the data flow (which was actually just landing those records into another staging table that was used after the data flow to do a set-based update).  So, how was it that sometimes records would fail to be found by the lookup task and flow to the insert part of the data flow but then fail to be inserted into the table based on unique constraint violations?  I’ll admit that at first, I was stumped.  I removed the mappings in the lookup task and reset them, thinking maybe that would help.  Nope.  I deleted the lookup task and recreated it, thinking maybe that would help.  Nope.  Okay, time for some serious troubleshooting here!  So, I created another (temporary) staging table and reconfigured the insert part of the data flow to insert the records into that new staging table instead of the real destination table.  And ran the package a few times.  After each execution, I examined both the new staging table as well as the other “updates” staging table to see where the lookup had routed all of the incoming rows.  Since the package was pulling from a live operational system, it was hard to get the same set of records each time (which made the troubleshooting a bit harder), but I was able to see that indeed, sometimes the lookup would route “existing” records to the new staging table that represented those rows that had failed the lookup.  When I manually joined that new staging table to the destination table based on the business keys, the join did indeed verify that some of the records existed in the destination table even though the lookup thought they did not.
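That manual check boiled down to a join like the following; the table and column names here are hypothetical, since the actual schema isn't part of this post.

    -- stg.LookupMisses (hypothetical) holds the rows the lookup flagged as "new";
    -- dbo.DestinationTable (hypothetical) is the real target with the unique
    -- constraint on its business keys.
    SELECT s.BusinessKey1, s.BusinessKey2
    FROM   stg.LookupMisses AS s
           INNER JOIN dbo.DestinationTable AS d
               ON  s.BusinessKey1 = d.BusinessKey1
               AND s.BusinessKey2 = d.BusinessKey2;
    -- Any rows returned are rows the SSIS lookup failed to match even though the
    -- same business keys already exist in the destination table.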

So, what was going on here?  After scratching my head a few times, I reached out to a good friend of mine, Tom Huneke from Key2 Consulting.  Tom and I work together as consulting resources for the Department of Veterans Affairs.  After going through the package with Tom and looking at the results captured in the two staging tables, Tom immediately saw the problem – the records that the lookup task failed to find (and therefore routed to the new staging table that represented inserts) had the same business key values as existing records in the destination table – except in the “case” of the data.  For example, a record in the destination table had a value of “TBICUBE” in one of the business key columns while the record that had “failed” the lookup and was flagged as a new record had the value of “TBICube”.  I had forgotten that although your source and destination systems (in this case, both SQL Server) might be configured with case-insensitive collations, SSIS and its various tasks are NOT case-insensitive.  While I was able to successfully join the records with “TBICUBE” and “TBICube” as values using T-SQL (and those values were considered equivalent), SSIS and its lookup task didn’t see it that way.
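Here's a quick T-SQL illustration of the difference (the collation names are just common defaults): under a case-insensitive collation the two values compare as equal, while a case-sensitive collation, which approximates how the lookup task compares strings, treats them as different.

    -- Under a case-insensitive collation, the two values are considered equal...
    SELECT CASE WHEN 'TBICUBE' = 'TBICube' COLLATE SQL_Latin1_General_CP1_CI_AS
                THEN 'Match' ELSE 'No match' END AS CaseInsensitiveCompare;

    -- ...but under a case-sensitive collation (roughly what the SSIS lookup's
    -- in-memory comparison does), they are not.
    SELECT CASE WHEN 'TBICUBE' = 'TBICube' COLLATE SQL_Latin1_General_CP1_CS_AS
                THEN 'Match' ELSE 'No match' END AS CaseSensitiveCompare;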

The quick fix was to change the lookup task to use a query that forced the business key values to lower case and add a derived column task before the lookup to create lower-case versions of the incoming data columns.  Configuring the lookup task to ensure it was comparing only lower-case values solved the problem.  As to why the package was succeeding sometimes and failing sometimes, a bit more research into the source system revealed that sometimes it recorded values (in the example above, the “TBICube” value) using all upper case, sometimes using mixed case, and sometimes using all lower case.  Don’t get me started on why it might have been doing that – it is what it is in this case and will likely never change.  As a result, given the lookup task’s case-sensitive matching, sometimes the recorded values in the source records would match destination table records and sometimes they wouldn’t.
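For reference, the lookup side of that fix amounted to a reference query along these lines (again with hypothetical names); the incoming pipeline columns were lower-cased the same way with a derived column expression such as LOWER(BusinessKey1) before being mapped in the lookup.

    -- Hypothetical sketch of the lookup's reference query: force the business keys
    -- to a single case so the lookup's case-sensitive comparison behaves like the
    -- case-insensitive T-SQL join.
    SELECT LOWER(BusinessKey1) AS BusinessKey1,
           LOWER(BusinessKey2) AS BusinessKey2,
           DestinationKey
    FROM   dbo.DestinationTable;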

In the end, SSIS and its lookup task were operating exactly as designed – and exactly as documented.  Sometimes even those of us who have been creating SSIS solutions for more years than we care to admit will forget about something as simple as this – and spend way too much time troubleshooting the problem and overlooking the obvious!

Thursday, August 27, 2015

Why We Speak and Volunteer for SQL Saturday

Last weekend, I had the pleasure of both speaking and volunteering for SQL Saturday #403 in Louisville.  What a blast!  From feedback that we’ve received, it seems to have been our most successful event to date in Louisville.  We maxed out our registration limit of 250 (a great crowd for our city), had great feedback from our sponsors, and generally seemed to exceed expectations of the attendees in terms of the sessions we offered, the speakers we had, the food we brought in, etc.  Just a great event all around!!

After the event, I got an awesome email from my good friend Chris Yates (blog | twitter | linkedin), fellow volunteer for the event and all around great guy.  Here’s the text of that email (posted with his permission):

A BI developer came up to me as I was leaving yesterday. I quote verbatim:

“Mr. Yates I owe you an apology”

I ask why?

“I’ve been doing things wrong for the last year and I didn’t realize it nor the trouble I was causing you. I’m going to work toward fixing my processes”

I ask what brought this on?

“I sat in a guy’s class named Dave Fackler at SQL Saturday, and I understand now why things have not been working the way they should”

BOOOOM ~ and you tell me what we do doesn’t pay off! Good job homie, and thanks for making my life easier here :-)

Talk about a great feeling!  Yes, my head did swell a bit and my chest did stick out a bit farther for the rest of the day!  But the point here is really the last sentence in that email from Chris – that we do what we do (in this case, volunteering to speak and volunteering to make events like this happen) in order to make a difference.  Maybe we make a difference simply because someone learns something, or maybe we’re able to make a difference in how someone does their job.  And maybe either of those will make a bigger difference to the company they work for – either now or down the road.  Sure, it’s fun to be a speaker and to have a room full of people hanging on your every word (while hoping beyond hope that your demos go well and that you don’t say something stupid!).  But the reason most of us speak is to try to make a difference…

And events like this (along with other PASS events, local PASS chapter meetings, etc.) just wouldn’t happen without the volunteers who spend countless hours making it all work out.  Most people don’t understand what goes into making a great event happen until they volunteer – at least I didn’t.  And sometimes being a volunteer can be aggravating – but in the end, it is almost always worth it.  To see the excitement of the attendees on the day of the event, to see the smiles of the speakers who appreciate the work that has gone into the event, and to hear the buzz as sponsors engage with people throughout the day – just WOW.  It is all worth it in the end…

So, what have you volunteered for lately?  If you’re passionate about SQL Server (and I assume the few people who might actually read this are), then there’s no better way to help your community of fellow SQL Server professionals (affectionately referred to on Twitter as #sqlfamily) than to volunteer – get involved with your local PASS chapter, help out with the next SQL Saturday event in your city, find out how to get engaged as a volunteer at the next PASS Summit.  In the end, you’ll love it and you’ll think “Why didn’t I do this sooner??”

Monday, August 10, 2015

Table Versioning for Schema Changes

During my session CSI:DW – Anatomy of a VLDW at SQL Saturday #402 in Indianapolis (and I’ll be repeating that session at SQL Saturday #403 in Louisville in less than two weeks), I talked about managing schema changes for very large data warehouse tables…  So I thought I’d blog about the solution we came up with at the VA for handling schema changes.

Imagine that you have a large data warehouse environment.  Also imagine that you have a fact table with 2B+ rows in it.  And that you have several hundred (if not thousands) of users who query that fact table in various ways every day.  As luck would have it, you need to change the schema of the table.  Maybe you need to add a column (which is the most common scenario in our case at the VA).  Or maybe you need to change a column’s data type (hopefully that wouldn’t happen, but…)  How do you handle that for a table that large?  Perhaps you could manage to add a column by using an ALTER TABLE statement, but then you’d have to deal with updating all 2B+ rows to get that column filled in.  And perhaps you could handle a data type change by adding a new column with the new data type (albeit also with a new column name).  In most cases, anything you do to the table will likely impact users in one way or another.  And we all know that users like to complain when they can’t get their data as expected when expected!

After dealing with this issue a few times, the DW team at the VA came up with table versioning as a standard methodology for handling any and all schema change scenarios.  There are a few key concepts involved in this solution:

  • Every table name has a version number appended to the table name.  Thus, we have physical table names like [Dim].[Location_v025] and [Appt].[Appointment_v062].
  • All user access to data warehouse tables is managed via views and the view names do NOT reflect version numbers.  So, we have view names like [Dim].[Location] and [Appt].[Appointment] and users are only able to select data from the views.
  • Surrogate keys are assigned to each table based on a separate DW surrogate ID table paired with it.  These tables all exist in a [DWSID] database schema.  For example, for the [Dim].[Location_v025] table, we have a table named [DWSID].[Location].  This table contains a surrogate key column, implemented as an IDENTITY() column, along with the business key(s) for that table.  The versioned table also includes the surrogate key column and business keys, but the surrogate key column is NOT marked as an IDENTITY() column in the versioned table (see the DDL sketch after this list).
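
To make those concepts concrete, here is a minimal DDL sketch of the location example; the column names and data types (beyond the object names mentioned above) are hypothetical.

    -- Surrogate ID table: owns the IDENTITY() surrogate key plus the business key(s).
    CREATE TABLE [DWSID].[Location]
    (
        LocationSID         int IDENTITY(1,1) NOT NULL PRIMARY KEY,
        LocationBusinessKey varchar(50)       NOT NULL UNIQUE
    );

    -- Versioned physical table: same surrogate key column, but NOT an IDENTITY()
    -- column here, plus the business key and whatever attribute columns this
    -- version calls for.
    CREATE TABLE [Dim].[Location_v025]
    (
        LocationSID         int          NOT NULL PRIMARY KEY,
        LocationBusinessKey varchar(50)  NOT NULL,
        LocationName        varchar(100) NULL    -- hypothetical attribute column
    );
    GO

    -- Users only ever query the unversioned view name.
    CREATE VIEW [Dim].[Location]
    AS
    SELECT LocationSID, LocationBusinessKey, LocationName
    FROM   [Dim].[Location_v025];
    GO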

With these concepts in place, we can now create a new version of a table independent of the existing version.  Assuming the location dimension has to be changed in some way, our data modeling team will model the new table and assign it a new version number (which is actually handled automatically by their modeling tool, ER/Studio).  For example, they might end up creating [Dim].[Location_v059] as a result of the new modeling effort.  The new table can be created alongside the existing versioned table, with no ill effects on the existing version or on user access to it (since the [Dim].[Location] view still gets data from [Dim].[Location_v025]).  The ETL developers will create a new load process (we use SSIS packages for all of the ETL work at the VA) for loading the new version of the table and execute it (typically via scheduled job) to get the new table loaded.  Meanwhile, the existing load process for the existing version of the table can continue to run.  Both processes (SSIS packages) can and do insert records into the [DWSID].[Location] table as needed when one or the other encounters a new business key (i.e., a new source record) in the source system(s).  Each process can then correctly add that same record to each version of the table – and can assign the right (and the same) surrogate key based on a lookup against the [DWSID].[Location] table using the business key(s).
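In practice the loads are SSIS packages, but here is a rough set-based T-SQL sketch of that surrogate key handling, using the hypothetical column names from the DDL sketch above and a hypothetical staging table; each versioned load runs the same pattern and therefore assigns the same surrogate key to the same business key.

    -- Register any business keys we haven't seen before; the IDENTITY() column in
    -- [DWSID].[Location] hands out each surrogate key exactly once.
    INSERT INTO [DWSID].[Location] (LocationBusinessKey)
    SELECT DISTINCT s.LocationBusinessKey
    FROM   [Stage].[Location] AS s        -- hypothetical staging table
    WHERE  NOT EXISTS (SELECT 1
                       FROM   [DWSID].[Location] AS d
                       WHERE  d.LocationBusinessKey = s.LocationBusinessKey);

    -- Load the new versioned table, picking up the surrogate key by business key
    -- so every version of the table gets the same key for the same source record.
    INSERT INTO [Dim].[Location_v059] (LocationSID, LocationBusinessKey, LocationName)
    SELECT d.LocationSID, s.LocationBusinessKey, s.LocationName
    FROM   [Stage].[Location] AS s
           INNER JOIN [DWSID].[Location] AS d
               ON d.LocationBusinessKey = s.LocationBusinessKey;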

With both tables populated, the ETL developers (and others) can then take a look at the new table to verify its new schema, its data, etc.  In fact, both tables can remain “in production” for several days or even weeks as needed to ensure the ETL process and the data are right.  Once the DW team is ready to switch users to the new version, ALTER VIEW statements can be used to change the view to reflect the new table version and its schema change.  In the example above, the [Dim].[Location] view can be changed so that it gets its data from the new [Dim].[Location_v059] table instead of the old [Dim].[Location_v025] table.  When complete, users will see the schema changes and the old table can be deprecated and dropped from the database.
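The switch itself is just that one DDL statement, which is why it can be enacted on an exact date; a minimal version based on the sketch above (the column list is hypothetical) looks like this.

    -- Repoint the user-facing view at the new versioned table; users see the
    -- schema change the moment this runs, and the old table can then be dropped.
    ALTER VIEW [Dim].[Location]
    AS
    SELECT LocationSID, LocationBusinessKey, LocationName   -- plus any new columns
    FROM   [Dim].[Location_v059];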

Of course, nothing comes for free.  Using table versioning does imply that we have enough disk space available to create a full clone of a table – including additional space for any new columns, any larger data types used, any additional indexes planned, etc.  Typically this isn’t a problem but there have been times when we’ve had to add additional space before generating a new version of some of our largest tables.  The good news is that space can be reclaimed (if needed) once the old version of a table is deprecated.

Although table versioning may sound complicated, it has become part of our overall DW architecture at the VA and has worked out well over the years (we’ve been following this approach for almost 5 years now).  Most DW users are not aware of the technical details, but do appreciate the fact that we can announce schema changes and then (typically) enact those changes on an exact date as announced (since the only change required to “publish” a new version of a table once it is ready is the DDL statement used to alter the view)…

I’d be interested to hear what other data warehouse professionals think of this idea, so leave a comment here if you want or drop me a note at davef@rollinghillsky.com.