Dimensional modeling is the name of a logical design
technique often used for data-warehouses. Dimensional modeling provides the
best results for both ease of use and high performance.
Data Warehouse is loaded from the source database. So we
design Dimensional model from the existing OLTP/ Normalized source DB model.
Before going in to the terms and theories, first let us see how
an OLTP to Dimensional model will look like:
In Dimensional model,
tables are divided in to either fact tables or dimensional tables.
FACT table
Fact table contains columns which are numeric data
(measures). In the ORDER FACT, other than PKs and FKs, rest of the columns are
measures like QUANTITY, QUOTED_PRICE, UNIT_PRICE, etc.
DIMENSION table
Dimension tables on the other hand have only textual data or
non-text data that is used for textual purposes. This data is used for descriptive purposes
only. In the above example there are 4
Dimension tables with each table containing only descriptive data.
STAR schema /
Snowflake: Dimensional Models are mostly star schema since it looks like
star with many dimensions surrounding a Fact. When one or more dimension tables
are normalized, then the model will be referred to as Snowflake (Note: Fact
table won’t be normalized).
Steps for ER to
Dimensional Models
- 1Separate each entity into the business process that it represents.
- 2Create fact tables by selecting numeric and additive non-key facts. Fact tables may be a detail level or at an aggregated level depending on business needs.
- 3Create Dimensions by de-normalizing all the remaining tables into flat tables with atomic keys that connect directly to the fact tables.
Data Mart: According
to the different business process, there will different Facts like Product,
Order, Sales, etc and a data mart is essentially a coordinated set of fact
tables, all with similar structures. Reporting tools and Business Intelligence
tools normally works on top of data marts/OLAPs.
Why we need a
separate Data Warehouse when all data is present in the source system?
- When we use source system for reporting purpose, then transactions like Insert, delete, update will take more time to complete which may affect thousands of users.
- 2There might be different type of databases (Oracle, SQL Server, CSV, XML, etc) used in source system and to extract reports from all these sources can be done only after bringing those to a common platform.
Why we need Data
Warehouse to be in Dimensional Model? Just another copy of source system in one
platform isn’t enough?
Suppose you want to find total no. of orders done through a
particular Sales representative,
In ER model, query will be like following:
select SUM(od.QUANTITY) from SALESREP s,
CUSTOMER c, ORDERS o, ORDER-DETAIL od, PARTS p
where s.SLSREP_NR=c.SLSREP_NR and
c.CUSTOMER_NR=o.CUSTOMER_NR and o.ORDER_NR=od.ORDER_NR and od.PART_NR=p.PART_NR
and p.PART_NR=3321223 and s.SLSREP_NR=8769
In Dimensional Model, below query will fetch the same output
as produced by above query:
select SUM(QUANTITY) from ORDER_FACT
where PART_NR=3321223
and SLSREP_NR=8769
Did you gain
performance? If yes, then where?
Yes. You saved 4 table
joins in Dimensional model! (Join operation is considered to be one of the
hectic processes for Database software).
Isn’t Dimensional
model worth or not?
There are lot more concepts in Data warehouse model which
convince the necessity of having Dimensional model for data Warehouses. This
model was founded by Ralph Kimball.
This paper is just an introduction for beginners and as I
said before, there are lot of concepts still to be discussed.
No comments:
Post a Comment