How to handle Slowly Changing Dimensions (SCDs) in data model design?
2012-06-25 14:05
There are multiple methods for handling slowly changing dimensions. Which technique to use depends on your business requirements. Because the three methods described here behave differently, choosing among them is a business decision rather than a purely technical design decision.
Type One: Overwrite the old data with new data
Using this method, you do not store the history. For example, say each customer can have one salesrep at any given point in time. When the salesrep of ABC Inc. changes from Sandy to Laura, the fact that Sandy was a salesrep of ABC Inc. is not kept anywhere. Any report by salesrep will assume that Laura has always been the salesrep of ABC Inc. and will count all the sales made by Sandy as Laura's.
The above example may not seem to make business sense. However, if you only report the sales of the current period, and the salesrep does not change during the period, this method is acceptable.
Many OLTP tables do not need to track the history of changes, so the source application may use this method. However, if you want to report on historical data, the data warehouse can still use other methods to track the history even if your OLTP system does not.
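The Type One behavior can be sketched in a few lines of Python. This is only an illustration under assumed names: the `customer_dim` and `sales_facts` structures, the sale amounts, and the helper functions are hypothetical and stand in for real dimension and fact tables.

```python
# Type One sketch: the dimension keeps only the current value (names are hypothetical).
customer_dim = {"ABC Inc.": {"salesrep": "Sandy"}}

# Sales facts reference the customer by its natural key only.
sales_facts = [
    {"customer": "ABC Inc.", "amount": 100},  # sold while Sandy was the rep
    {"customer": "ABC Inc.", "amount": 250},  # sold after Laura took over
]


def change_salesrep(dim, customer, new_rep):
    """Overwrite the attribute in place; the old value is lost."""
    dim[customer]["salesrep"] = new_rep


def sales_by_rep(dim, facts):
    """Total sales per salesrep, using whatever rep the dimension holds now."""
    totals = {}
    for fact in facts:
        rep = dim[fact["customer"]]["salesrep"]
        totals[rep] = totals.get(rep, 0) + fact["amount"]
    return totals


change_salesrep(customer_dim, "ABC Inc.", "Laura")
report = sales_by_rep(customer_dim, sales_facts)
print(report)  # {'Laura': 350}
```

After the overwrite, Sandy's sale is attributed to Laura, which is exactly the reporting side effect described above.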
Type Two: Add a new record at the time of the change
Using this method, all prior history is saved. There are three alternative methods to model the key of this table.
Method A – No surrogate key – Use timestamp
When a change happens, a new record is added to the table. All the attributes are copied from the previous record except the changed values. The natural key is copied as well, so the timestamp is used to differentiate the records.
When a fact table is joined with the dimension and you are interested in the historical data, the timestamp is used as part of the join condition. To ease the join, the record typically uses two date columns – the effective start date and the effective end date.
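A minimal Python sketch of Method A follows. The column names (`effective_start`, `effective_end`), the far-future end date for the open row, the half-open date ranges, and the sample dates and amounts are all assumptions made for illustration, not part of the original description.

```python
from datetime import date

# Type Two, Method A sketch: same natural key, rows told apart by effective dates.
# Half-open ranges [start, end) are an assumption made here to avoid overlaps.
FAR_FUTURE = date(9999, 12, 31)

customer_dim = [
    {"customer": "ABC Inc.", "salesrep": "Sandy",
     "effective_start": date(2006, 1, 1), "effective_end": FAR_FUTURE},
]


def apply_change(dim, customer, changes, change_date):
    """Close the open row and append a copy carrying the changed values."""
    current = next(r for r in dim
                   if r["customer"] == customer and r["effective_end"] == FAR_FUTURE)
    current["effective_end"] = change_date
    new_row = {**current, **changes,
               "effective_start": change_date, "effective_end": FAR_FUTURE}
    dim.append(new_row)


def lookup(dim, customer, as_of):
    """Join condition: natural key matches and as_of falls inside the range."""
    for row in dim:
        if (row["customer"] == customer
                and row["effective_start"] <= as_of < row["effective_end"]):
            return row
    return None


apply_change(customer_dim, "ABC Inc.", {"salesrep": "Laura"}, date(2007, 1, 1))

sales_facts = [
    {"customer": "ABC Inc.", "sale_date": date(2006, 6, 15), "amount": 100},
    {"customer": "ABC Inc.", "sale_date": date(2007, 3, 1), "amount": 250},
]

totals = {}
for fact in sales_facts:
    rep = lookup(customer_dim, fact["customer"], fact["sale_date"])["salesrep"]
    totals[rep] = totals.get(rep, 0) + fact["amount"]
print(totals)  # {'Sandy': 100, 'Laura': 250}
```

Each sale is matched to the dimension row that was in effect on the sale date, so Sandy keeps credit for the earlier sale.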
Method B – No surrogate key – Use version number
Instead of using date columns, a version number is used to differentiate the versions of a record.
This technique requires the fact table to store both the natural key and the version number in order to retrieve a given version of the dimension data.
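As a rough Python sketch of Method B, the dimension below is keyed by the pair (natural key, version). The dictionary layout, the `add_version` helper, and the sample amounts are hypothetical choices for illustration.

```python
# Type Two, Method B sketch: rows are identified by (natural key, version number).
customer_dim = {("ABC Inc.", 1): {"salesrep": "Sandy"}}


def add_version(dim, customer, changes):
    """Copy the latest version, apply the changes, and store it as version + 1."""
    latest = max(version for (name, version) in dim if name == customer)
    dim[(customer, latest + 1)] = {**dim[(customer, latest)], **changes}
    return latest + 1


# Each fact row stores the natural key AND the version that was current at load time.
sales_facts = [{"customer": "ABC Inc.", "version": 1, "amount": 100}]

v2 = add_version(customer_dim, "ABC Inc.", {"salesrep": "Laura"})
sales_facts.append({"customer": "ABC Inc.", "version": v2, "amount": 250})

totals = {}
for fact in sales_facts:
    rep = customer_dim[(fact["customer"], fact["version"])]["salesrep"]
    totals[rep] = totals.get(rep, 0) + fact["amount"]
print(totals)  # {'Sandy': 100, 'Laura': 250}
```

The join needs no date comparison, but every fact row has to carry two columns to reach the right dimension row.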
Method C – Use a surrogate key
When an attribute changes, a new record is added with a sequence-generated key, and the fact table uses this key column as its foreign key.
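Method C might look like the following Python sketch, where `itertools.count` stands in for a database sequence. The `add_row` helper, the `customer_key` column name, and the sample amounts are assumptions for illustration only.

```python
from itertools import count

# Type Two, Method C sketch: every dimension row gets its own sequence-generated key.
_next_key = count(start=1)  # stand-in for a database sequence
customer_dim = {}


def add_row(dim, natural_key, attributes):
    """Insert a new dimension row and return its surrogate key."""
    key = next(_next_key)
    dim[key] = {"customer": natural_key, **attributes}
    return key


# The fact table stores only the surrogate key as its foreign key.
sandy_key = add_row(customer_dim, "ABC Inc.", {"salesrep": "Sandy"})
sales_facts = [{"customer_key": sandy_key, "amount": 100}]

# The salesrep changes: add a new row instead of updating the old one.
laura_key = add_row(customer_dim, "ABC Inc.", {"salesrep": "Laura"})
sales_facts.append({"customer_key": laura_key, "amount": 250})

totals = {}
for fact in sales_facts:
    rep = customer_dim[fact["customer_key"]]["salesrep"]
    totals[rep] = totals.get(rep, 0) + fact["amount"]
print(totals)  # {'Sandy': 100, 'Laura': 250}
```

Both rows share the natural key "ABC Inc.", and the single-column surrogate key is enough to join each fact to the version that applied when it was loaded.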
Type Three: Track changes using a separate column
Using this method, you use a separate column of the dimension table to store the previous year's value, in addition to the current year's data.
This method does not track all the history, just one prior version.
If the data changes, the old value needs to be moved from the current value column to the prior column, and the new value overwrites the current column.
This method is used when the changes are not random but occur at a predefined interval, such as annually.
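The column shuffle for Type Three can be sketched in Python as below. The column names `current_salesrep` and `prior_salesrep` are assumptions, and the third salesrep, Kim, is a hypothetical addition used only to show that just one prior version survives.

```python
# Type Three sketch: one column for the current value, one for the prior value.
customer_dim = {
    "ABC Inc.": {"current_salesrep": "Sandy", "prior_salesrep": None},
}


def change_salesrep(dim, customer, new_rep):
    """Move the current value to the prior column, then overwrite the current one."""
    row = dim[customer]
    row["prior_salesrep"] = row["current_salesrep"]
    row["current_salesrep"] = new_rep


change_salesrep(customer_dim, "ABC Inc.", "Laura")
change_salesrep(customer_dim, "ABC Inc.", "Kim")  # Kim is a hypothetical third rep

print(customer_dim["ABC Inc."])
# {'current_salesrep': 'Kim', 'prior_salesrep': 'Laura'}
```

After the second change, Sandy no longer appears anywhere in the row, since only the most recent prior value is retained.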
Source: http://dylanwan.wordpress.com/2007/01/13/how-to-handle-slowly-changing-dimensions-scds-in-data-model-design/