开发者

What is exactly a measure in star schema in data warehouse design?

开发者 https://www.devze.com 2023-04-09 11:02 出处:网络
Star schema consists of dimension and fact tables. Fact tables contain foreign keys for each dimension and in additio开发者_如何学JAVAn to that, it contains \"measures\". What exactly comprises of th

Star schema consists of dimension and fact tables.

Fact tables contain foreign keys for each dimension and in additio开发者_如何学JAVAn to that, it contains "measures". What exactly comprises of this measure ?

Is it some aggregate function's answer that is stored ?


Basically yes.

If you had a simple grid

Salary     Januari   Februari   March   April   May    June
           Q1                         Q2

Me         1100      1100      1100    1100    1500    1500
Collegue1   2000      2000      2000    0       0      0

Time is a hierarchical dimension with two levels (shown). The other dimension shown is 'EmployeeID'. The other dimension (not shown) could be in the PointOfView (e.g. Budget/Actual).

The Amount (1100, e.g.) is the Measure and it constitutes your facts (the non-identifying parts of the facts). The dimensions define consolidation functions for each measure on the various levels (E.g. Amount(Q1) == SUM(Amount(January...March))). Note that the consolidation will behave differently depending on the measure (e.g. the income tax % will not be summed, but somehow consolidated: how exactly is the art of OLAP Cube design).

(trivia: you can have calculated measures, that use MDX to query e.g. the deviation of Amount in comparison the the preceding Quarter, the Average salary acoss the whole quarter etc.; it will be pretty clear that again, the consolidation formulas require thought).

At this point you will start to see that designing the consolidation rules depends on the order in which the rules are calculated (if the formula for 'salary deviation %' is is evaluated FIRST and then consolidated, you need to average it; however if the raw SALARY measure is consolidated (summed) to the Q1,Q2 level first, then the derived Measure can be calculated like it was at the lowest level.

Now things become more fun when deciding how to store the cube. Basically two ways exist:

  • precalculate all cells (including all consolidations in all scenarios)
  • calculate on the fly

It won't surprise anyone that most OLAP engines have converged on hybrid methods (HOLAP), where significant parts of frequently accessed consolidation levels are pre-calculated and stored, and other parts are calculated on the fly.

Some will store the underlying data in a standard RDBMS (ROLAP) other won't (OLAP). The engines focused on high performance tend to keep all data in precalculated cubes (only resorting to 'many small sub-cubes' for very sparse dimensions).

Well, anywyas, this was a bit of a rant. I liked rambling off what I once learned when doing datawarehousing and OLAP


Fact and measure are synonyms afaik. Facts are data: sales, production, deliveries, etc. Dimensions are information tied to the fact (time, location, department).


Measures are one of two kinds of things.

  1. Measures. Measurements. Numbers with units. Dollars, weights, volumes, sizes, etc. Measurements.

  2. Aggregates. Sums (or sometimes averages) of data. It might be data in the warehouse: pre-computed aggregates for performance reasons. Or it might be data that can't be acquired (or isn't needed) because it's too detailed. Too high volume or something.

The most important thing about a fact table is that the non-key measures are actual measurements with units.


If it would be an adjacent tree model it would be the title-field or any other field that contains the data.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号