Tuesday, March 20, 2012

DIMENSIONAL MODELING - for beginners


Dimensional modeling is the name of a logical design technique often used for data-warehouses. Dimensional modeling provides the best results for both ease of use and high performance.
Data Warehouse is loaded from the source database. So we design Dimensional model from the existing OLTP/ Normalized source DB model.
Before going in to the terms and theories, first let us see how an OLTP to Dimensional model will look like:


In Dimensional model, tables are divided in to either fact tables or dimensional tables.
FACT table
Fact table contains columns which are numeric data (measures). In the ORDER FACT, other than PKs and FKs, rest of the columns are measures like QUANTITY, QUOTED_PRICE, UNIT_PRICE, etc.

DIMENSION table
Dimension tables on the other hand have only textual data or non-text data that is used for textual purposes.  This data is used for descriptive purposes only.  In the above example there are 4 Dimension tables with each table containing only descriptive data.

STAR schema / Snowflake: Dimensional Models are mostly star schema since it looks like star with many dimensions surrounding a Fact. When one or more dimension tables are normalized, then the model will be referred to as Snowflake (Note: Fact table won’t be normalized).

Steps for ER to Dimensional Models
  • 1Separate each entity into the business process that it represents.
  • 2Create fact tables by selecting numeric and additive non-key facts.  Fact tables may be a detail level or at an aggregated level depending on business needs.
  • 3Create Dimensions by de-normalizing all the remaining tables into flat tables with atomic keys that connect directly to the fact tables.


Data Mart: According to the different business process, there will different Facts like Product, Order, Sales, etc and a data mart is essentially a coordinated set of fact tables, all with similar structures. Reporting tools and Business Intelligence tools normally works on top of data marts/OLAPs.

Why we need a separate Data Warehouse when all data is present in the source system?
  • When we use source system for reporting purpose, then transactions like Insert, delete, update will take more time to complete which may affect thousands of users.
  • 2There might be different type of databases (Oracle, SQL Server, CSV, XML, etc) used in source system and to extract reports from all these sources can be done only after bringing those to a common platform.


Why we need Data Warehouse to be in Dimensional Model? Just another copy of source system in one platform isn’t enough?
Suppose you want to find total no. of orders done through a particular Sales representative,
In ER model, query will be like following:
select SUM(od.QUANTITY) from SALESREP s, CUSTOMER c, ORDERS o, ORDER-DETAIL od, PARTS p
where s.SLSREP_NR=c.SLSREP_NR and c.CUSTOMER_NR=o.CUSTOMER_NR and o.ORDER_NR=od.ORDER_NR and od.PART_NR=p.PART_NR
and p.PART_NR=3321223 and s.SLSREP_NR=8769     

In Dimensional Model, below query will fetch the same output as produced by above query:
select SUM(QUANTITY) from ORDER_FACT
where PART_NR=3321223 and SLSREP_NR=8769

Did you gain performance? If yes, then where?
Yes. You saved 4 table joins in Dimensional model! (Join operation is considered to be one of the hectic processes for Database software).

Isn’t Dimensional model worth or not?
There are lot more concepts in Data warehouse model which convince the necessity of having Dimensional model for data Warehouses. This model was founded by Ralph Kimball.
This paper is just an introduction for beginners and as I said before, there are lot of concepts still to be discussed.

No comments:

Post a Comment