The basic strucure of a cube
Interested in exciting thought leadership and revenue insights?
Sign up for the Fluence newsletter.
This week we’ll take a look at the basic structure of a cube from an end user perspective, as opposed to the architectural underpinnings. This is intended as a high level overview, and for brevity contains some generalizations, and focuses on Microsoft Analysis Services cubes.
An OLAP cube consists of several key elements, the most fundamental of which are dimensions and measures
Dimensions are the business elements by which the data can be queried. They can be thought of as the ‘by’ part of reporting. For example “I want to see sales by region, by product by time”. In this case region, product and time would be three dimensions within the cube, and sales would be a measure, below. A cube based environment allows the user to easily navigate and choose elements or combinations of elements within the dimensional structure.
Measures are the units of numerical interest, the values being reported on. Typical examples would be unit sales, sales value and cost.
Note that there are modelling techniques which develop cubes with only one pseudo measure, typically called ‘value’ or similar, and implement what the user would think of as the measures through a dimension. There are performance and navigational reasons which can make this a good approach, but not one we’ll cover here in our introduction.
The diagram below shows a very simple cube which we’ll use for discussion.
This particular cube is for an exports business, and consists of 3 dimensions, Source, Route, and Time. The two measures are Packages, being the number of packages shipped, and Last, being the last shipped date.
Very few real world cubes will have just three dimensions, but I’ve yet to learn how to draw a 12 dimensional cube! The diagram above is enough to illustrate the fundamental principle, that at every intersection of the different dimensions, are stored the value for each of the measures. In larger real world cubes the principle is the same, just the numbers of intersections is larger.
The diagram highlights a few additional features of dimensions which need to be understood.
A dimension can contain one or more hierarchies. Hierarchies are really navigation or drill paths through the dimension. They are structured like a family tree, and use some of the same naming conventions (children / parent / descendant). Hierarchies are what brings much of the power to OLAP reporting, because they allow the user to easily select data at different granularity (day / month / year), and to drill down through data to additional levels of detail.
- Hierarchies consist of different levels. For example a time dimension would typically have a year, a month and a day level. A customer hierarchy may consist of Country, State, City, and Name levels.
- The levels are either implied in the case of dates, or exist as ‘attributes’ in the source data. So for example customer number 12324, John Brown, would have additional information recorded such as his address, broken into house number & street, city, state, country. Each of these is an attribute.
- Hierarchies are really ordered navigation paths through the attributes
- In Analysis Services 2005, the user’s view of a dimension will typically consist of both the defined Hierarchies, and also the Attributes. Attributes are ‘flat’, i.e. contain no ordered drill path.
A member is any single element within a hierarchy. For example in a standard Time hierarchy, 1st January 2008 would be a member, as would 20th February 2008. However January 2008, or 2008 itself could also be members. The latter two would be aggregations of the days which belong to them. Members can be physical or calculated. Calculated members mean that common business calculations and metrics can be encapsulated into the cube, and are available for easy selection by the user, for example in the simplest case Profit = Sales – Cost
Aggregation is a key part of the speed of cube based reporting. The reason why a cube can be very fast when for example selecting data for an entire year, is because it has already calculated the answer. Whereas a typical relational database would potentially sum millions of day level records on the fly to get an annual total, Analysis Services cubes calculate these aggregations during the cube build and hence a well designed cube can return the answer quickly.
Sum is the most common aggregation method, but it’s also possible to use average, max etc. For example, if storing dates as measures it makes no sense to sum them.
The cube introduces a number of dimensions, hierarchies and measures, modeling the business of interest, and all of which are available to the end user to quickly and easily select, drill, and slice and dice. With a well designed cube the user benefits from a reporting environment which is highly flexible, contains the pre-calculated business metrics they regularly use, and is fast in terms of data retrieval.