Skip to main content

Slowly Changing Dimension

In most multi-dimensional OLAP scenarios, lookup table might change unpredictably, rather than according to a regular schedule. For example product category of one specific product might get changed in product table, or segmentation of some customers might be changed in customer table. As product category or customer segmentation are modeled as dimensions in a cube, they are so called Slowly Changing Dimension, or SCD in short. Detailed introduction reference wikipedia

Dealing with this issue involves SCD management methodologies referred to as Type 0 through 6. But the most commonly seen are Type 1 and Type 2:

  • Type 1: overwrite. This methodology overwrites old with new data, and therefore does not track historical data. This is also called "latest status".

  • Type 2: add new row. This method tracks historical data by creating multiple records for a given natural key in the dimensional tables with separate surrogate keys and/or different version numbers. Unlimited history is preserved for each insert. This is also called "historical truth".

For SCD Type 2(subsequently referred to as "SCD2"), Only supports the model based on the History Table, Below screen-shot illustrates the basics:

SCD2 Model

History Table

The History Table stores the basic information of the record and the life cycle of each record. Changes to the record will add a new row and modify the life cycle of the historical record. Through the life cycle of the record, you can query historical records, and you can also query the latest records.

For example, in the SCD2_SALES table below, the time interval of the salesperson in the corresponding business area (SALES_DPT) is [START_DATE,END_DATE).

SALES_PKSALES_IDSALES_NAMESALES_DPTSTART_DATEEND_DATE
11Zhang SanSourth area1992/1/11993/1/1
22Li SiNorth area1992/1/11994/1/2
33Wang WuEast area1992/1/11995/1/3
41Zhang SanNorth area1993/1/11994/1/1
52Li SiEast area1994/1/29999/1/1
63Wang WuSourth area1995/1/39999/1/1
71Zhang SanWest area1994/1/19999/1/1

It can be seen from the table that Zhang San:

  • Worked in sourth area from 1992/1/1 to 1993/1/1

  • Worked in the north area from 1993/1/1 to 1994/1/1

  • And he has been working in the West area since 1994/1/1

Every time Zhang San change his work location, the History Table adds a new line of records and modifies the END_DATE of the previous record.

Join condition based on History Table

In order to be able to query the historical information of the History Table, the fact table is often used to filter the start and end dates of the History Table records, likeLO_ORDERDATE>=START_DATE AND LO_ORDERDATE<END_DATE As shown below:

model_historical_dimension_table_scd2_join

In order to use the History Table to meet the demand for slow dimensions, you can click Setting -> Advanced Settings -> Support History Table to turn on the function of supporting History Table. As shown below:

historical_dimension_table_switch

  • When it is turned on, you can use non-equal join conditions (≥,<) could be used for modeling, building and queries.

  • When it is turned off, the old SCD2 model will be offline

The current join conditions based on the History Table have the following restrictions:

  • Can’t define multiple join conditions for the same columns
  • Join condition ≥ and < must be used in pairs, and same column must be joint in both conditions
  • Join condition for columns should include at least one equal-join condition (=)
  • Two tables could only be joined by the same condition for one time
  • Currently, recommendations are not supported for the History Table model
  • By default, even if you use LEFT JOIN, you need to exactly match the model before you can use the model that contains the History Table to answer queries.

SCD2 model based on History Table

For SCD2 model, the purpose of historical traceability can be achieved through the join conditions based on the History Table. As shown in the figure below, in order to query the total sales revenue of the seller at each work location, the order date is associated with the working time interval. historical_dimension_table_scd2 historical_dimension_table_scd2 join

For seller Zhang San, the total sales revenue of orders in different area from 1992 to the present can be queried, as shown in the following table:

D_YEARSALES_NAMESALES_DPTTOTAL_REVENUE
1992Zhang SanSourth area3711706590
1993Zhang SanNorth area3882401031
1994Zhang SanWest area3626302199
1995Zhang SanWest area3733096229
1996Zhang SanWest area3487903587
1997Zhang SanWest area3725031606
1998Zhang SanWest area2101112606