Thursday, February 21, 2008

Using Schemas to Organize Data Warehouse Objects

Over the last several years, I've adopted the practice of using schemas to organize data warehouse objects in the SQL Server database created as part of a BI solution. I've found this is a good way to differentiate the objects based on their purpose. And its freed me from having to use suffixes or prefixes as I'd been doing for several years.

For example, in my early days of designing BI solutions, I'd often use table names like DimDate, DimProduct, and the always funny DimCustomer and DimEmployee for dimension tables. And things like FactSales, FactInventory, and FactCalls for fact tables. This organized things fairly well, but always felt a bit artificial.

With the introduction of schemas to SQL Server databases, however, I've standardized on a set of schemas for every data warehouse database I design. The tables above would then be named Dim.Date, Dim.Product, Dim.Customer, and Dim.Employee (as tables built within the Dim schema) along with Fact.Sales, Fact.Inventory, and Fact.Calls (as tables built within the Fact schema). Likewise, if I have views, stored procedures, functions, etc. that are related to the dimensions, they get created in the Dim schema while those related to the fact tables get created in the Fact schema.

Overall, I generally create the following schemas right out of the gate:

  • Dim -- used for all objects related to the dimensions in the model
  • Fact -- used for all objects related to the facts in the model
  • Audit -- used to hold objects related to auditing, generally a feature of the ETL process that loads the dimensions and facts
  • ETL -- used to hold objects directly related to the ETL process (configuration values, processing directives, etc.)
  • Stage -- used to hold all objects needed for staging data during the ETL process

In some cases, I've used other schemas to hold objects specific to a given solution or implementation. For example, I've been known to create the following:

  • History -- used to hold objects related to storing historical copies of data (for example, when you update a dimension record and need the pre-update version of the record just in case)
  • Report -- used to hold objects specifically used by reporting applications (such as views and stored procedures used directly by SSRS)

And while I know that schemas are generally used for security purposes, the ability to quickly glance at a list of tables, views, and stored procedures and know what they're there for is great. And since SSMS shows lists of objects alphabetically by schema, the list of tables and such are organized by schema. Pretty handy!

1 comment:

jtankers said...

I was tempted to do the same (results do look elegant), but I am using schemas to indicate the source database.