Friday, November 28, 2008

Optimizing Dimension Designs (Series Prelude)

As I indicated in my last post, I presented at the PASS Summit 2008 conference on optimizing dimension designs.  This is a prelude to a series of posts I'll be doing over the next several weeks related to that presentation.

First things first.  If you missed my session, you missed a special appearance by Beaker from The Muppets.  As I mentioned during my talk, I used to work at Intellinet, a great consulting firm located in Atlanta.  While there (from 1997 to 2005), I grew and managed the original BI practice, trying to manage the likes of Douglas McDowell and Erik Veerman (yes, I'm partially responsible for both of them!)  At any rate, at a Christmas party one year, they presented me with a special award -- my very own Beaker statue.  I certainly don't see the resemblance, but everyone got a kick out of it (and I still have that statue!)  So, to lighten up my session, I started things off right by including this Beaker video -- and explained to everyone that I felt a lot like Beaker during the conference, since I had to completely rebuild my VPC and demos after an external USB drive crashed on me the day before the conference started -- yikes!

At any rate, enjoy the Beaker video if you didn't catch my session.  And if you managed to catch my session, let me know if you thought Beaker stole the show!!

PS.  Congratulations to Douglas McDowell for becoming a new PASS Board member!

Friday, November 21, 2008

PASS Summit 2008 Session on Optimizing Dimension Designs

Well, it's been quite some time since I last blogged.  The summer and fall were a pretty busy time, but I guess everyone is busy, so that's no real excuse.  Although I didn't post about it prior to the PASS Summit 2008 conference, I finished up my presentation at the conference today.  As in prior years, I spoke about dimension designs in SSAS.  This year I included information regarding good dimension designs and a few interesting design scenarios that call for "outside the box" thinking.

Over the next several weeks, I'm planning a series of posts containing information from my presentation.  And I'll hopefully include some details that I had to gloss over given the limited time available during my session (75 minutes goes by pretty quickly!)  If you happened to catch my session, let me know what you thought -- hopefully it was entertaining if nothing else!  And stay tuned over the coming weeks for additional posts related to optimizing dimension designs.

PS.  If I get permission from PASS, I'll post my slide deck here (waiting to hear if that breaks any speaker rules or not...)

Wednesday, June 25, 2008

Handling Multiple Calendars with a M2M Scenario (Part 2)

In my last post, I started describing a way to handle multiple calendars using a many-to-many scenario.  As I explained, a recent client engagement called for a solution that would allow them to add calendars (customer calendars in particular) without requiring changes to the relational or SSAS models.  In most classic date dimensions, multiple calendars are handled by adding columns for the calendar-specific parts of each date.  These additional columns are then added to SSAS as additional attributes in the date dimension, and those attributes are typically used to build user hierarchies specific to each calendar (a Gregorian calendar hierarchy, a fiscal calendar hierarchy, etc.)

The previous post laid the groundwork for creating a multi-calendar solution using a many-to-many scenario.  In that post, I described a new DimCalendar dimension table that would contain a row for each date within each of the calendars needed.  The table contained date-part columns (for things like year, quarter, month, etc.), but these were not specific to a given calendar.  Instead, they contained the appropriate date parts for a given date within a given calendar.  One thing to note about this approach is that the DimCalendar dimension table will hold N times the number of dates needed for all the years represented in your solution.  Thus, if you need 10 years' worth of dates, a standard date dimension might contain approximately 3,650 rows; with the DimCalendar table, if you need 4 calendars represented for those same 10 years, the table will contain approximately 14,600 rows.  Given this is still a relatively low row count for a dimension table, it shouldn't pose any performance problems for the solution.

With the DimCalendar table available, we can create a new [Calendar] dimension within SSAS using standard dimension techniques.  Using the DimCalendar table design from the last post, the [Calendar] dimension within SSAS might look like this:

 Calendar Dimension Design

A few notes about the design:

  • The [Calendar] attribute contains the calendar names based on the values found in the CalendarName column from the DimCalendar table; the DefaultMember property of the [Calendar] attribute is set to [Calendar].[Calendar].&[Gregorian] so that the default calendar is the Gregorian calendar; this attribute also has its IsAggregatable property set to False.
  • The other attributes are built in a typical fashion, using standard KeyColumns and NameColumn values based on columns available from the DimCalendar table; in addition, standard attribute relationships exist between attributes as would be expected in a date dimension (with the additional step of relating the [Year] attribute to [Calendar]).
  • The [Year-Quarter-Month] and [Year-Week] hierarchies are built in a typical fashion.
  • The [Date Calculations] attribute is based on a named calculation column in the DSV (a standard approach for "hanging" date calculations on an attribute within a date dimension) with a default value of "Current Calendar Date".

With the new [Calendar] dimension in place, we can update the cube to take advantage of it.  Note that in the discussion that follows, I did not remove the existing [Date] dimension from the Adventure Works cube.  Ideally, however, the new [Calendar] dimension would replace it (and could be used in a role-playing fashion just like the [Date] dimension is used already).

The first step to updating the cube is to add a new measure group based on the new FactDateCalendar table.  Remember that this table contains the linkage between the existing DimTime table (which is related to all the other fact tables) and the new DimCalendar table.  The measure group need only contain a single, hidden measure that uses the Count aggregate function to aggregate the row count of the FactDateCalendar table based on its related dimensions.  In the screen shot below, note the diagram showing the relationships between the new tables and the new measure group named "Fact Date Calendar".

New Measure Group

With the new measure group in place, the Dimension Usage tab can be used to relate the [Calendar] and [Date] dimensions to it and create the many-to-many relationship between the new [Calendar] dimension and the other related measure groups.  The following shows a few of the relationships defined on the Dimension Usage tab.

New Many-to-Many Relationships

Note that the many-to-many relationships use the new "Fact Date Calendar" measure group as the intermediate measure group.  Also note that many-to-many relationships exist between the [Calendar] dimension and the other measure groups (I just chose not to show them all in the screen shot).

With the new measure group in place along with the new relationships, the solution can be deployed, processed, and then queried (using the BIDS browser, the SSMS browser, or a front-end tool like Excel).  The following shows a simple Excel 2007 pivot-table that uses the [Year-Quarter-Month] hierarchy from the new [Calendar] dimension along with measures from the "Internet Sales" measure group.

Simple Pivot-Table with Calendar Dimension

Note the different calendars showing the same grand total, but with different subtotals below that, since days aggregate to fiscal years differently than they do to calendar years (at least for Adventure Works).  If users want to focus on a particular calendar, they can filter the hierarchy or include the [Calendar] attribute as a page-level filter to get the same effect.  Also note that if a query doesn't include the new [Calendar] hierarchy, everything still behaves as expected, since the Gregorian calendar is set as the default calendar and aggregates and totals roll up accordingly.
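If you'd rather see this in MDX than in a pivot table, a quick query along these lines shows the same behavior (a sketch only; the hierarchy and member names follow the design described above, and the cube name assumes the standard Adventure Works sample):

SELECT
    { [Measures].[Internet Sales Amount] } ON COLUMNS,
    NON EMPTY [Calendar].[Year-Quarter-Month].[Year].Members ON ROWS
FROM [Adventure Works]
WHERE ( [Calendar].[Calendar].&[Fiscal] );

Drop the WHERE clause (or swap in &[Gregorian]) and the same query rolls everything up under the default Gregorian calendar instead.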

Finally, to round out the solution, we can add date calculations to the new [Calendar] dimension and make them work as expected.  Earlier, I mentioned that the [Date Calculations] attribute was added to the new [Calendar] dimension in order to provide a place to "hang" date calculations.  This is a fairly standard approach that has been described (with its various pros and cons discussed) by many different SSAS practitioners over the last several years; thus, I won't go into a lot of discussion about the approach here.  Suffice it to say, the following is the MDX I added to the cube's calculation script to create three example date calculations (each applied to a different set of calendars, purely for example purposes).

// Date calculations for the [Calendar] dimension

Create Member CurrentCube.[Calendar].[Date Calculations].[Year to Date] As Null;

Create Member CurrentCube.[Calendar].[Date Calculations].[Year Over Year Growth %] As Null;

Create Member CurrentCube.[Calendar].[Date Calculations].[Year Over Year Growth] As Null;

Create Set [Date Calculation Target Measures] As
{
    [Measures].[Internet Sales Amount]
    ,[Measures].[Internet Order Quantity]
    ,[Measures].[Internet Order Count]
    ,[Measures].[Reseller Sales Amount]
    ,[Measures].[Reseller Order Quantity]
    ,[Measures].[Reseller Order Count]
};

// [Year to Date]
Scope
(
    [Date Calculation Target Measures]
    ,[Calendar].[Date Calculations].[Year to Date]
);

    // [Year-Quarter-Month].[Year] level
    ( [Calendar].[Year-Quarter-Month].[Year].Members ) =
        Aggregate
        (
            {[Calendar].[Date Calculations].DefaultMember} *
            PeriodsToDate
            (
                [Calendar].[Year-Quarter-Month].[Year]
                ,[Calendar].[Year-Quarter-Month].CurrentMember
            )
        );

    // [Year-Quarter-Month].[Quarter] level
    ( [Calendar].[Year-Quarter-Month].[Quarter].Members ) =
        Aggregate
        (
            {[Calendar].[Date Calculations].DefaultMember} *
            PeriodsToDate
            (
                [Calendar].[Year-Quarter-Month].[Year]
                ,[Calendar].[Year-Quarter-Month].CurrentMember
            )
        );

    // [Year-Quarter-Month].[Month] level
    ( [Calendar].[Year-Quarter-Month].[Month].Members ) =
        Aggregate
        (
            {[Calendar].[Date Calculations].DefaultMember} *
            PeriodsToDate
            (
                [Calendar].[Year-Quarter-Month].[Year]
                ,[Calendar].[Year-Quarter-Month].CurrentMember
            )
        );

    // [Year-Week].[Year] level
    ( [Calendar].[Year-Week].[Year].Members ) =
        Aggregate
        (
            {[Calendar].[Date Calculations].DefaultMember} *
            PeriodsToDate
            (
                [Calendar].[Year-Week].[Year]
                ,[Calendar].[Year-Week].CurrentMember
            )
        );

    // [Year-Week].[Week] level
    ( [Calendar].[Year-Week].[Week].Members ) =
        Aggregate
        (
            {[Calendar].[Date Calculations].DefaultMember} *
            PeriodsToDate
            (
                [Calendar].[Year-Week].[Year]
                ,[Calendar].[Year-Week].CurrentMember
            )
        );

End Scope;

// [Year Over Year Growth %]
// Applied only to the [Gregorian] calendar
Scope
(
    [Date Calculation Target Measures]
    ,[Calendar].[Date Calculations].[Year Over Year Growth %]
    ,[Calendar].[Calendar].[Gregorian]
);

    // [Year] attribute
    ( [Calendar].[Year].[Year].Members(1) : Null ) =
        IIf
        (
            ([Calendar].[Date Calculations].DefaultMember, [Calendar].[Year].CurrentMember.Lag(1)) = 0
            ,Null
            ,(
                ([Calendar].[Date Calculations].DefaultMember) -
                ([Calendar].[Date Calculations].DefaultMember, [Calendar].[Year].CurrentMember.Lag(1))
             )
             /
             ([Calendar].[Date Calculations].DefaultMember, [Calendar].[Year].CurrentMember.Lag(1))
        );

    Format_String
    (
        ([Calendar].[Date Calculations].[Year Over Year Growth %])
    ) = "Percent";

End Scope;

// [Year Over Year Growth]
// Applied only to the [Fiscal] calendar
Scope
(
    [Date Calculation Target Measures]
    ,[Calendar].[Date Calculations].[Year Over Year Growth]
    ,[Calendar].[Calendar].[Fiscal]
);

    // [Year] attribute
    ( [Calendar].[Year].[Year].Members(1) : Null ) =
        ([Calendar].[Date Calculations].DefaultMember) -
        ([Calendar].[Date Calculations].DefaultMember, [Calendar].[Year].CurrentMember.Lag(1));

End Scope;



With this script in place, the Excel 2007 pivot table shown above can be updated to include the new date calculations (showing just one of the measures included in the set of measures the date calculations apply to).



Excel Pivot-Table with Calendar Dim and Date Calculations



Note that in the screen shot below, a "Reporting" calendar shows up along with the "Fiscal" and "Gregorian" calendars.  The nice thing about this solution is that I didn't have to change the design at all in order to add this new calendar (that's the whole point of the design).  In fact, I had completed the design work described above, deployed and processed the solution, and was browsing it with just the "Fiscal" and "Gregorian" calendars included.  I then used a T-SQL script to add the new "Reporting" calendar by adding rows to the new DimCalendar and FactDateCalendar tables in the relational database.  I then reprocessed the [Calendar] dimension and the "Fact Date Calendar" measure group (without reprocessing any other dimensions or measure groups in the solution) and voila!  The new "Reporting" calendar was established and showed up in my pivot-table.
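For reference, that targeted reprocessing boils down to an XMLA batch along these lines (a sketch only: the object IDs are assumptions based on the names used in this series, and the choice of process types is mine):

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Process>
    <Object>
      <DatabaseID>Adventure Works DW - M2M Calendar</DatabaseID>
      <DimensionID>Calendar</DimensionID>
    </Object>
    <Type>ProcessUpdate</Type>
  </Process>
  <Process>
    <Object>
      <DatabaseID>Adventure Works DW - M2M Calendar</DatabaseID>
      <CubeID>Adventure Works</CubeID>
      <MeasureGroupID>Fact Date Calendar</MeasureGroupID>
    </Object>
    <Type>ProcessFull</Type>
  </Process>
</Batch>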



Hopefully, others out there who may be faced with supporting multiple calendars (beyond the standard "Fiscal" and "Gregorian" varieties) will find this approach interesting and useful.  Let me know what you think...



PS.  If you want to download a copy of the solution, I've made it available as a zip file here:





The SQL database should be restored as "AdventureWorksDW-M2MCalendar" and the SSAS solution should be deployed as "Adventure Works DW - M2M Calendar".  I typically create copies of the base Adventure Works sample databases for creating and testing new ideas and solutions.  Enjoy!

Saturday, May 10, 2008

Handling Multiple Calendars with a M2M Scenario

Most dimensional models and BI solutions implement a date dimension of one type or another and many of them include multiple calendars -- the "wall" (or Gregorian) calendar, a fiscal calendar, a manufacturing calendar, etc.  Handling multiple calendars is fairly simple and most people approach the problem by adding additional columns to a date dimension table that holds a row for each date they need.  Thus, many date dimension tables look like the one in Adventure Works:

DimTime Dimension Table

However, what if you need to support more calendars now (maybe more than 3 or 4) and you need a solution that will support N calendars into the future?  Sounds a little crazy, but a while back I was working with a client that had such a need.  As it turned out, they wanted to have their major customer fiscal calendars included in their date dimension so that sales and marketing managers could analyze data according to the fiscal calendar of the customer.  Maybe not so crazy...

The client wanted a solution that would not require schema changes (relationally or in SSAS) whenever a new calendar needed to be added (or when a calendar needed to be retired).  Thus, the solution couldn't really rely on things like adding new columns to a date dimension.  After thinking about the problem quite a bit, we decided that we could solve it using an M2M scenario.  In this blog entry, I'll lay the groundwork for the solution by providing the details on the relational side.  In a follow-on entry, I'll finish it up by showing the details on the SSAS side.

On the relational side, we needed to add a table to hold the calendar information as well as a "bridge" table that would relate dates to a given calendar.  I'll show how this might work in the AdventureWorksDW database.  To begin with, we'll continue to use the DimTime table as it exists today so that we can compare our new solution and its results to the existing mechanism for showing the Gregorian and fiscal calendars for Adventure Works.  A "final" version of this solution might trim down the DimTime table so that it only includes TimeKey and FullDateAlternateKey.

Now, we need a new dimension table to represent our new solution for calendars and dates.  Each row in the new table will represent a date within a given calendar; thus, we will end up with 1/1/2008 in the table for as many calendars as we need to support.  And every new calendar that gets added (in the future) will add a new row for 1/1/2008.  The grain of the dimension table is therefore no longer just the date -- it's the combination of a date and the calendar it belongs to.  So, we create a new DimCalendar table that looks something like this:

DimCalendar Dimension Table

Notice that the table includes most of the normal attributes you'd think of for a date -- year attributes, quarter attributes, etc.  If a given calendar needs more attributes (for different "levels"), you can add columns for them (semesters, for example).  If a given calendar doesn't need that particular "level" or attribute, then you can simply leave that column NULL for each of the dates in that calendar.

You may be wondering why the table design includes the date itself (DayName and DayNumber attributes) since we're going to end up relating this new table to the existing DimTime table, which already includes date.  By including date here, we're able to include the day level in hierarchies that we build based on this new dimension (for example, Year-Quarter-Month-Day and Year-Week-Day hierarchies).

When populating this table, we decided to ensure that the DayNumber column provided a unique value for each day within each calendar.  Thus, for each row, we decided to assign a prefix (10 for the Gregorian calendar, 20 for the first fiscal calendar, etc.) to each day number.  So, the DayNumber column for 1/1/2008 in the Gregorian calendar became 1020080101 while the DayNumber column for 1/1/2008 in the first fiscal calendar became 2020080101.
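In T-SQL terms, the DayNumber for a given date in a given calendar can be built something like this (a quick sketch; the variable names are just for illustration):

declare @CalendarPrefix bigint
declare @Date datetime

set @CalendarPrefix = 20   -- 10 = Gregorian, 20 = first fiscal calendar, and so on
set @Date = '20080101'

-- prefix * 100000000 + yyyymmdd, e.g. 2020080101 for 1/1/2008 in the first fiscal calendar
select @CalendarPrefix * 100000000 + convert(bigint, convert(char(8), @Date, 112)) as DayNumber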

With this table in hand, you can populate each calendar that you need by simply adding rows to the table for each day (across as many years as you want) for each calendar.  And in the future, you can add another calendar by simply adding new rows to the table for each day in the new calendar.  New calendars no longer need schema changes.  Here's what some of the data in the DimCalendar table might look like:

DimCalendar Table Data

Finally, to make the M2M scenario work for us, we need a bridge table that relates DimTime to DimCalendar.  This new bridge table will relate a date from DimTime (like 1/1/2008) to multiple rows in the DimCalendar table (like our 1/1/2008 row for the Gregorian calendar and our 1/1/2008 row in the first fiscal calendar).  We named this new bridge table FactDateCalendar and it looks like this:

FactDateCalendar Bridge Table
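A rough T-SQL sketch of the bridge table and one way to populate it follows (the CalendarKey surrogate key name is my assumption; TimeKey and FullDateAlternateKey come from the existing DimTime table):

create table dbo.FactDateCalendar
(
    TimeKey      int not null   -- references dbo.DimTime
    ,CalendarKey int not null   -- references dbo.DimCalendar (assumed surrogate key name)
)

-- Relate each DimTime date to its matching row in every calendar
-- (strips the calendar prefix off DayNumber to get back to yyyymmdd)
insert into dbo.FactDateCalendar (TimeKey, CalendarKey)
select
    t.TimeKey
    ,c.CalendarKey
from
    dbo.DimTime as t
    join dbo.DimCalendar as c
        on c.DayNumber % 100000000 = convert(bigint, convert(char(8), t.FullDateAlternateKey, 112))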

The overall model of using these new tables (shown along with a couple of the existing fact tables in the AdventureWorksDW database) then looks like this:

Updated AdventureWorksDW Model

With these tables populated (using whatever means necessary), we're ready to modify the SSAS solution for Adventure Works to use our new M2M scenario.  I'll follow up in the next week or so with those details.

Sunday, April 13, 2008

Intelligencia Query for BIDS and SSRS

Very interesting announcement from Chris Webb (working with Andrew Wiles) about a new tool to ease the pain of using SSRS to create SSAS reports:

http://cwebbbi.spaces.live.com/blog/cns!7B84B0F2C239489A!1797.entry

I definitely plan to try this out.  I, like Chris, have had to manage client expectations on many different projects when the client wanted to use SSRS to create reports based on SSAS solutions.  If this works half as well as I suspect it will (with Chris and Andrew behind it), I'll likely be ready to actually start recommending SSRS and SSAS together!

Exciting stuff!

Wednesday, March 12, 2008

Dimension Security Issues with Excel 2007 and SSAS 2005 Parent-Child Hierarchies

I've seen and read a lot of different posts and blog entries related to Excel 2007 and SSAS 2005 issues. Mostly they cover things that don't work as planned with regard to displaying and using calculated members, incorrect query results given Excel's subcube-based filtering and its interaction with default members, and so on. But here's one that I don't think I've run across yet that actually causes an error to occur while browsing a cube...

Suppose you have dimension security set up on a parent-child dimension. This is a common scenario in a sales cube or any other cube that contains a parent-child dimension related to people within an organization. In most cases, individuals should be restricted to just seeing themselves plus their subordinates within the parent-child hierarchy. Typically, I've seen this done by including AD logins as an attribute in the dimension and then using an MDX expression to restrict the allowed set of members within the parent-child hierarchy to just the user currently logged in (which then includes their subordinates by default in a parent-child hierarchy).

Here's a quick example using Adventure Works. I've created a new role in the Adventure Works SSAS solution named "Employees Security.role". This role specifies dimension data security based on the [Employee] dimension within the [Adventure Works] cube. Security is enabled on the [Employees] attribute within the dimension using the following MDX expression for the Allowed Member Set property:



Filter(
[Employee].[Employees].Members,
[Employee].[Employees].CurrentMember.Properties("Login ID") = "adventure-works\amy0"
)


Note that in this example, I'm referring directly to the AD login for the employee named Amy E. Alberts; normally, the MDX expression would use the UserName function to return the AD login of the current user. Doing it this way allows me to easily test the security role. Also note that I have Visual Totals enabled for the security role. Here's what the security role looks like:

Dimension Security Settings for Employees Security Role
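For reference, a production version of the allowed set would typically swap the hard-coded login for the UserName function, so the restriction follows whoever is connected (a sketch, assuming the Login ID attribute values match the Windows logins that UserName returns):

Filter(
  [Employee].[Employees].Members,
  [Employee].[Employees].CurrentMember.Properties("Login ID") = UserName()
)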

With this in place, I can browse the [Adventure Works] cube. When I browse the cube without using the security role, I can see all of the employees in the [Employees] parent-child dimension as shown below:

All Employees Visible with No Security Role

However, when I connect to the cube through the security role (using the Change User button on the browser toolbar in either the BIDS cube designer or in the SSMS cube browser window), I'm relegated to seeing just Amy (along with her superiors -- required so that I can drill down to Amy) and her subordinates as shown below:

Only Secured Employees Visible with Security Role

Works great. Notice that Visual Totals has kicked in with the role so that I cannot see any numbers beyond those of Amy. This even works well within Excel 2003. If I connect and create a pivot table using the standard features in Excel 2003, I can see all of the employees when not connected through the security role:

Excel 2003 PivotTable with No Dimension Security

If I update the .oqy file created by Microsoft Query within Excel 2003, I can add "Roles=Employees Security" to the end of the connection string to get Excel 2003 to connect through the security role. After doing this, when I create the same pivot table as seen above, dimension security kicks in as I would expect it to:

Excel 2003 PivotTable with Dimension Security

Notice that Visual Totals is working correctly and I'm limited to seeing Amy and her subordinates (and her superiors again in order to provide a drill-down path to get to her). Fabulous!
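For reference, the tweaked connection string ends up looking something like this (the server and catalog names here are just placeholders; the important part is the Roles property at the end):

Provider=MSOLAP.3;Data Source=localhost;Initial Catalog=Adventure Works DW;Roles=Employees Security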

However, when I try this same thing using Excel 2007, it just doesn't work. First, I'll create an Excel 2007 pivot table without any dimension security in place. This works fine using the parent-child hierarchy:

Excel 2007 PivotTable with No Dimension Security

Okay so far. Now, if I update the connection used by the pivot table (highlight the Data option on the Excel ribbon, click on the Properties toolbar button in the Connections group, then add "Roles=Employees Security" to the end of the Connection String on the Definition tab of the Connection Properties dialog box), I can attempt to recreate the pivot table. When I add the [Employees] parent-child hierarchy along with the [Reseller Sales Amount] measure (as done above), things look promising (and you can see that Visual Totals has kicked in):

Excel 2007 PivotTable with Dimension Security

However, as soon as I try to drill-down to get to Amy, boom!!

Error in Excel 2007 PivotTable with Dimension Security

Ugh. And to date, I've not been able to find a work-around. Anyone out there who has ideas on how to handle this, please let me know. I've tried defining the MDX expression in the security role in a couple of different ways and I've tried to tweak the pivot table settings in Excel 2007 in various ways. All to no avail. Uh, Microsoft, can we get this fixed (along with all the other ugly issues that exist when using Excel 2007 as a client for SSAS)?

SSIS Package to Back Up SSAS Databases

There are a couple of different posts and blog entries out there that discuss backing up SSAS databases using various techniques such as AMO, SSIS, etc. I recently created an SSIS package to do the same thing, and I think it's a little bit simpler than some of the other solutions out there.

The basic design of the package is to use a ForEach Loop container to loop over the list of databases on a given SSAS server (versus using AMO or some other method of getting a list of databases) and then using an Analysis Services DDL task with an XMLA script that calls for a backup of the current database.

The figure below shows the overall control flow in the package:

Control Flow for Backup SSAS Databases Package

The ForEach Loop container uses the ADO.NET Schema Rowset Enumerator along with an ADO.NET connection to the SSAS server. Thus, the package contains a Connection Manager using the .Net Provider for OLEDB along with the OLE DB Provider for Analysis Services 9.0. The Connection Manager looks like this:

Connection Manager for ADO.NET Connection to SSAS

Notice the Provider selected, the Server name entered, and the fact that no Initial Catalog is selected (since we're going to loop over all of the catalogs with the ForEach loop container). In this case, the Connection Manager just connects to the local SSAS server, but that could be dynamically handled with a package variable and a property expression on the Connection Manager.

The important settings for the ForEach Loop Container look like this:

Collection Settings for ForEach Loop Container

Variable Mappings Settings for ForEach Loop Container

Note that the Collections settings specify the ADO.NET Schema Rowset Enumerator, the defined Connection Manager, and the Catalogs schema. Also note that the Variable Mappings tab specifies that the catalog name returned by the enumerator be stored in a package variable named DatabaseName.

Within the ForEach Loop container, you'll see a Script Task as well as an Analysis Services Execute DDL Task. The Script Task is used to generate the XMLA script needed to back up the current database and the Analysis Services Execute DDL Task then executes that XMLA script. The Script Task is configured as follows:

Script Settings for Script Task

Note that the DatabaseName variable is passed in as a read-only variable while another package variable named BackupScript is passed in as a read-write variable. The script within the task looks like this:

Public Sub Main()

    ' Build an XMLA Backup command for the database named in the DatabaseName variable
    ' and store it in the BackupScript variable for the Execute DDL task to run.
    Dts.Variables.Item("BackupScript").Value = _
        "<Backup xmlns=""http://schemas.microsoft.com/analysisservices/2003/engine"">" + _
        "  <Object>" + _
        "    <DatabaseID>" + Dts.Variables.Item("DatabaseName").Value.ToString + "</DatabaseID>" + _
        "  </Object>" + _
        "  <File>" + Dts.Variables.Item("DatabaseName").Value.ToString + ".abf</File>" + _
        "  <AllowOverwrite>true</AllowOverwrite>" + _
        "</Backup>"

    Dts.TaskResult = Dts.Results.Success

End Sub


Sorry for the additional line wrapping in the code above, but the important point is that the BackupScript variable will contain a well-formed XMLA script for backing up the database named in the DatabaseName variable. The script is fairly simple in nature as it simply uses a backup filename of the database name with .abf appended and it assumes the backup file can be overwritten. Other scenarios could be handled (time-stamping the backup filenames, for example) via additional code.
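As an example, for a database named "Adventure Works DW" (and assuming its database ID matches its name, which is typically the case), the generated script comes out like this:

<Backup xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>Adventure Works DW</DatabaseID>
  </Object>
  <File>Adventure Works DW.abf</File>
  <AllowOverwrite>true</AllowOverwrite>
</Backup>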

Once the Script Task executes, the BackupScript variable is then ready to use to back up the current database (within the context of the ForEach Loop container). The Analysis Services Execute DDL Task is configured to execute the backup script as follows:

DDL Settings for AS Execute DDL Task

Note that the Analysis Services Execute DDL Task uses a native OLE DB provider to connect to SSAS, not an ADO.NET provider. Thus, a second Connection Manager is needed within the package to handle this connection. It is configured as follows:

Connection Manager for OLE DB Connection to SSAS

Note that it is configured the same as the ADO.NET-based Connection Manager above, but uses a Native OLE DB provider. Also note that the server name here could be made dynamic via a variable and a Property Expression as well.

And that's it. If you run the package, it will loop through all of the databases on the SSAS server and back up each one. If this needs to be done on a scheduled basis, the package can be easily deployed to one or more SSAS servers and scheduled using SQL Server Agent.

I've pushed a copy of the package up to my SkyDrive in case you want to take a look or even use it (as is of course!) yourself. Here's a link to the zip file there:


Let me know what you think...

Wednesday, March 5, 2008

Current Date Parts for SSRS Reports using SSAS Cubes

Recently, while working with a client on the development and deployment of a set of SSRS reports using SSAS as a data source, we came up with a nifty method of defaulting date-part report parameters to the current date.

When creating SSRS reports based on SSAS, a common scenario is to want date-part parameters that map to various attribute hierarchies within a date dimension. For example, in most SSAS solutions, a date dimension has attribute and user hierarchies that contain parts of a date. If these are added to a report as parameters, you might want the various date parts to default to parts of the current date. So, a report parameter based on a year attribute might need to default to the current year; a report parameter based on a month attribute might need to default to the current month.

Most solutions that I've seen to this problem go about it via MDX (by adding new "current date" calculated members to the date dimension) or via SSRS expressions (using various functions to create the right default values). The solution we came up with, however, relies on the fact that the date dimension in SSAS is typically processed from an underlying date table in the relational database used as the data source for SSAS. Let's look at an example of how this might work (using Adventure Works).

First, let's create a new report for Adventure Works using SSAS as the data source. In an SSRS project, you can use the Report Wizard to create a new report. Let's assume you need to create a report showing reseller sales, allowing users to filter the report by year and month. Using the Query Builder made available in the Report Wizard, we can create the following query for the report:

SSAS Query

Notice the parameters that are included (for the [Date].[Calendar Year] and [Date].[Month of Year] attributes) along with the default values ([CY 2001] and [July] respectively) that I selected. After finishing the Report Wizard and then applying a bit of formatting to the report, here's what it might look like:

Initial Reseller Sales Report

Notice the report parameters at the top, along with the default values that got filled in (based on the selections I made when the parameters were created in the Query Builder). If you inspect the report parameters, you can see what the format of each of the default values looks like:

Year Parameter Default Value Format

Month Parameter Default Value Format[4]

Notice that the default for the year parameter is the string "[Date].[Calendar Year].&[2001]" and the default for the month parameter is the string "[Date].[Month of Year].&[7]". If you look at the designs of these attributes in the SSAS solution, these map to values in the key columns DimTime.CalendarYear and DimTime.MonthNumberOfYear.

Now, suppose users of this report want the year parameter to default to the current year and the month parameter to default to the current month. Within the AdventureWorksDW relational database, the following query returns the key columns listed above for each date in the DimTime table:

select
    CalendarYear
    ,MonthNumberOfYear
from
    dbo.DimTime

If we apply some formatting to the columns returned, we can get values back that look like the default parameter strings noted above:

select
    '[Date].[Calendar Year].&['
        + convert(varchar, CalendarYear)
        + ']' as CalendarYearValue
    ,'[Date].[Month of Year].&['
        + convert(varchar, MonthNumberOfYear)
        + ']' as MonthofYearValue
from
    dbo.DimTime

And if we apply a where clause to the query, then we can return the formatted values for a single date -- in particular, the current date (likely using the getdate() function). Now, in the AdventureWorksDW database, the DimTime table doesn't have dates up through the current date; thus, we'll simply ask for the maximum date in the table. And we'll wrap this query into a stored procedure named dbo.GetDefaultDateParts:

create procedure dbo.GetDefaultDateParts
as

select
    '[Date].[Calendar Year].&['
        + convert(varchar, CalendarYear)
        + ']' as CalendarYearValue
    ,'[Date].[Month of Year].&['
        + convert(varchar, MonthNumberOfYear)
        + ']' as MonthofYearValue
from
    dbo.DimTime
where
    -- FullDateAlternateKey = convert(varchar, getdate(), 101)
    FullDateAlternateKey =
    (
        select
            max(FullDateAlternateKey)
        from
            dbo.DimTime
    )

With this stored procedure in place, we can now update the report to use these new default values. First, we need to add a new data set to the report, based on the AdventureWorksDW relational database, calling this new stored procedure. Here's what the properties for the new data set might look like:

Data Set for GetDefaultDateParts Stored Procedure

And here's what a query of the data set would return:

GetDefaultDateParts Data Set Results

So, with this new data set available, we can simply update the report parameters such that each parameter's default value comes from the data set (instead of being based on the values that were originally selected when the report parameters were created). Here are the updated report parameters:

Updated Calendar Year Report Parameter

Updated Month of Year Report Parameter

Note that each parameter has been set to be single-valued (although this doesn't necessarily have to be the case) and that each parameter now gets its default value from the appropriate field of the new data set.

Finally, previewing the report will show that the parameters get filled in with our new default values (2004 for the year and August for the month):

Report with New Default Values for Parameters

Now, in this particular case, it looks like Adventure Works didn't have any reseller sales for August 2004. So, the default values we created via the new stored procedure might not be optimal. But, now that we have a stored procedure for returning the default values, it could be updated to return the date parts for the last date with data, the last date of the last "closed" financial month, or just about anything else that makes sense for our users and our business scenario. The stored procedure could even be updated to accept parameter values to control it in various ways, with different reports calling for different dates for example.
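For example, here's one possible variation (just a sketch) that defaults to the last date that actually has reseller sales, using the OrderDateKey column in FactResellerSales:

select
    '[Date].[Calendar Year].&[' + convert(varchar, t.CalendarYear) + ']' as CalendarYearValue
    ,'[Date].[Month of Year].&[' + convert(varchar, t.MonthNumberOfYear) + ']' as MonthofYearValue
from
    dbo.DimTime as t
where
    t.TimeKey = (select max(OrderDateKey) from dbo.FactResellerSales)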

There are certainly some cases in which this may not work. For example, if you are using an SSAS-based date dimension (with no underlying date table in the relational database) or if you have a very slim date dimension table and are creating additional columns within your DSV. So, your mileage may vary. But, for quite a number of scenarios and situations, this seems to work out well and avoids some of the complexity I've seen people create just to do this type of thing.

I've uploaded a copy of the Reseller Sales report sample displayed above along with a T-SQL script for creating the stored procedure. Both are available here:

Give it a try and let me know if this seems like a useful technique...

Sunday, February 24, 2008

Incorrect Truncation Warning in SSIS

I was working on an SSIS package a few weeks back and was puzzled by a truncation warning that two of the Data Flow tasks kept throwing out.  In the Data Flow, a string-based column of length 10 in the pipeline was being used as part of the input for an OLE DB Destination, which was storing the data into a SQL Server table.  The column being populated by this specific input column had a data type of varchar(20).  No problem, right?

Well, the Data Flow task was generating a warning stating that the column of length 20 might be truncated when used to populate the column of length 10.  Huh?  Exactly what I thought.  Why was the Data Flow task complaining about that when clearly the column of length 20 was the destination and the column of length 10 was the source?

I reviewed every property I could think of with regards to the source column, the destination column, the destination table, the OLE DB Destination component, the Data Flow task, etc.  Nothing I could find (at first glance) seemed out of place.  And even though the package ran fine (it was, after all, just a warning), it continued to bug me that something about the package was causing the SSIS engine to generate the warning.

After scratching my head for quite some time, I finally found the answer.  It seemed that someone working on the package prior to my development work on it had set the ValidateExternalMetadata property on the OLE DB Destination to False.  This is typically done (in my experience) to keep a Data Flow task from complaining about the metadata of a destination table that doesn't exist yet (maybe it will be created in a step before the Data Flow task executes, for example).  Well, believe it or not, setting this property to False seemed to cause the SSIS engine to misinterpret the metadata information about the source and destination, in this case, swapping the column metadata.

Here's a quick example to see this in action.  Create a Data Flow task in a package, add an OLE DB Source component, a Derived Column component, and an OLE DB Destination component.  For the OLE DB Source component, select a table from the AdventureWorks database (like the Production.Product table) and pick a character-based column.  Connect the OLE DB Source component to the Derived Column component.  In the Derived Column component, create a new column (with a new name) and use the SUBSTRING() function to make a shorter version of the string column.  Send the output of the Derived Column component to the OLE DB Destination, mapping the source to the original table, using the shorter column you created as the source for the column you originally picked from the table.  So, you end up with something like this:

Simple Data Flow

In the OLE DB Source, I selected the Production.Product table and selected the Name field, which has a data type of nvarchar(50).

OLE DB Source Component

In the Derived Column component, I created a new column named ShorterName as follows:

Derived Column Component
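In case the screen shot is hard to make out, the derived column expression is simply a truncating SUBSTRING; the length of 40 is what shows up later in the warning text:

SUBSTRING(Name,1,40)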

Finally, for the OLE DB Destination, I mapped the new ShorterName column back to the Name column:

OLE DB Destination

Now, there's no need to really run the package to see the warning.  Simply change the ValidateExternalMetadata property for the OLE DB Destination component from the default of True to False.

OLE DB Destination Properties

As soon as you do this, you'll see a warning icon on the OLE DB Destination component.  If you look at the warning in the Error List, you'll see something like the following:

Validation warning. Data Flow Task: OLE DB Destination [140]: Truncation may occur due to inserting data from data flow column "Name" with a length of 50 to database column "ShorterName" with a length of 40.

Note that the truncation warning has inverted the metadata involved, stating that (in this case) truncation may occur inserting data from the Name column to the database column ShorterName.  But, as is clearly the case, the database column (the destination) is the Name column while the ShorterName column is actually the source.  If you set the ValidateExternalMetadata property back to True, the truncation warning goes away.

Strange behavior.  Again, just a warning that doesn't keep a given data flow from working properly.  But strange nonetheless.  I've posted this as a bug on Connect; if you come across it, feel free to comment or validate it.

http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=327577

Friday, February 22, 2008

Using Connection Managers in SSIS

Whenever I create SSIS packages, I typically create two different Connection Managers for use against SQL Server databases. The first uses a standard OLE DB provider, like this one to the AdventureWorks database (which I typically name something like AdventureWorks.OLEDB):

AdventureWorks.OLEDB

The second one, to the same database, uses an ADO.NET provider, like this one to the AdventureWorks database (which I typically name something like AdventureWorks.ADONET):

AdventureWorks.ADONET

Now, here's why I set up these two different Connection Managers:

The OLE DB provider seems to be faster than the ADO.NET provider for getting data into and out of a SQL Server database. Thus, in Data Flow tasks within the package, I use the first Connection Manager for OLE DB Source and OLE DB Destination components when hitting the database. Thus:

Data Flow Task

However, as anyone who has used the Execute SQL task along with an OLE DB-based Connection Manager will tell you, dealing with stored procedures and their parameters isn't the most elegant. If the Execute SQL task uses an OLE DB-based Connection Manager, then calls to stored procedures look like this:

Stored Procedure with Positional Parameters
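In other words, the SQLStatement property ends up holding something like the following (the procedure name here is hypothetical), with a ? marker for each parameter:

EXEC dbo.usp_LoadDimCustomer ?, ?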

In the Parameter Mappings tab of the task, the parameters then have to be specified using positional numbers, starting with 0. So, for the example above, you'd have to set up mappings to parameters as follows:

Positional Parameter Names

If, on the other hand, you use an ADO.NET-based Connection Manager with the Execute SQL Task, things are a bit more elegant. In this case, you can simply specify the name of the stored procedure without any positional parameter markers:

Stored Procedure Name Only

Note the property called IsQueryStoredProcedure below the name of the stored procedure. When an OLE DB-based Connection Manager is used (as in the first example above), this property is set to False and is grayed out. Now, however, with an ADO.NET-based Connection Manager, the property can be set to True whenever a stored procedure name is provided. As for the parameters, they can now be mapped using the actual parameter names within the stored procedure:

Actual Parameter Names

To me, this is much more intuitive and elegant. And, when you build lots of SSIS packages (for an ETL process, for example), you almost always have more than a few stored procedure calls that need to be made before Data Flow tasks, after them, in Event Handlers, etc. Thus, although it seems somewhat silly to have two Connection Managers to the same database, using one for each of the different providers for SQL Server allows you to have the best of both worlds -- good performance for getting data into and out of a database, and good (or at least better) stored procedure handling for parameter names and mapping.

Thursday, February 21, 2008

Using Schemas to Organize Data Warehouse Objects

Over the last several years, I've adopted the practice of using schemas to organize data warehouse objects in the SQL Server database created as part of a BI solution. I've found this is a good way to differentiate the objects based on their purpose. And it's freed me from having to use suffixes or prefixes as I'd been doing for several years.

For example, in my early days of designing BI solutions, I'd often use table names like DimDate, DimProduct, and the always funny DimCustomer and DimEmployee for dimension tables. And things like FactSales, FactInventory, and FactCalls for fact tables. This organized things fairly well, but always felt a bit artificial.

With the introduction of schemas to SQL Server databases, however, I've standardized on a set of schemas for every data warehouse database I design. The tables above would then be named Dim.Date, Dim.Product, Dim.Customer, and Dim.Employee (as tables built within the Dim schema) along with Fact.Sales, Fact.Inventory, and Fact.Calls (as tables built within the Fact schema). Likewise, if I have views, stored procedures, functions, etc. that are related to the dimensions, they get created in the Dim schema while those related to the fact tables get created in the Fact schema.

Overall, I generally create the following schemas right out of the gate:

  • Dim -- used for all objects related to the dimensions in the model
  • Fact -- used for all objects related to the facts in the model
  • Audit -- used to hold objects related to auditing, generally a feature of the ETL process that loads the dimensions and facts
  • ETL -- used to hold objects directly related to the ETL process (configuration values, processing directives, etc.)
  • Stage -- used to hold all objects needed for staging data during the ETL process

In some cases, I've used other schemas to hold objects specific to a given solution or implementation. For example, I've been known to create the following:

  • History -- used to hold objects related to storing historical copies of data (for example, when you update a dimension record and need the pre-update version of the record just in case)
  • Report -- used to hold objects specifically used by reporting applications (such as views and stored procedures used directly by SSRS)
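As a quick sketch of the pattern (the table definitions here are purely illustrative):

create schema Dim
go
create schema Fact
go
-- (Audit, ETL, Stage, and any solution-specific schemas follow the same pattern)

create table Dim.Product
(
    ProductKey int not null primary key
    ,ProductName nvarchar(50) not null
)
go

create table Fact.Sales
(
    ProductKey int not null references Dim.Product (ProductKey)
    ,SalesAmount money not null
)
go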

And while I know that schemas are generally used for security purposes, the ability to quickly glance at a list of tables, views, and stored procedures and know what they're there for is great. And since SSMS shows lists of objects alphabetically by schema, the list of tables and such is organized by schema. Pretty handy!

New Blog Up and Running

For those of you who know me (and I realize that's a much smaller audience than I'd like to think sometimes), I'm finally getting around to doing the blog thing.  A while back, I attempted to start a blog up while with another consulting firm (which has all but vanished I think).  But I didn't really do much with that blog, so I'm starting this new one with high hopes!

I've been working in the Microsoft BI space for quite a while now (having started with the alpha bits of SQL Server 7.0 -- trying valiantly to figure out the programming model for DTS before BOL existed -- on a project in Atlanta).  Since that time, I've come to have a pretty decent understanding of Microsoft's BI stack -- including just enough to be dangerous with MOSS.  But my focus continues to be the core components of SSIS and SSAS.  So, more than likely, that's what you'll see the most of in this blog.

Over the course of the next few weeks (and months), I hope to get a number of things posted with regard to the current versions of the tools (i.e., SQL Server 2005).  As I get more opportunities to start using the new release, I'll start to post about that.  Hopefully, you'll find a good nugget or three to take away and use on your next BI implementation...