Multidimensional Data Model


What is an information cube?

An information cube (also called a data cube) allows data to be modeled and viewed in multiple dimensions. It is defined by dimensions and facts. In general terms, dimensions are the perspectives or entities with respect to which an organization wants to keep records.

Each dimension may have a table associated with it, called a dimension table, which further describes the dimension. Facts are numerical measures. The fact table contains the names of the facts or measures, as well as keys to each of the related dimension tables.

For example, in a 2-D representation, the sales for Vancouver are shown with respect to the time dimension (organized in quarters) and the item dimension (organized according to the types of items sold). The fact or measure displayed is dollars sold.
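The 2-D view described above can be sketched in a few lines of Python. This is only an illustration: the fact rows, the dollar figures, and the helper name `two_d_view` are all made up for the example and do not come from the original tables.

```python
from collections import defaultdict

# Hypothetical fact rows: (quarter, item_type, location, dollars_sold).
# Every value here is invented for illustration.
sales_facts = [
    ("Q1", "home entertainment", "Vancouver", 605),
    ("Q1", "computer", "Vancouver", 825),
    ("Q2", "home entertainment", "Vancouver", 680),
    ("Q2", "computer", "Vancouver", 952),
    ("Q1", "computer", "Toronto", 968),
]


def two_d_view(facts, city):
    """Summarize dollars sold by (quarter, item type) for one city."""
    view = defaultdict(int)
    for quarter, item, location, dollars in facts:
        if location == city:
            view[(quarter, item)] += dollars
    return dict(view)


vancouver = two_d_view(sales_facts, "Vancouver")
for (quarter, item), dollars in sorted(vancouver.items()):
    print(quarter, item, dollars)
```

Fixing the location to one city leaves two free dimensions, time and item, and the measure in each cell is dollars sold.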

Now, suppose we would like to see the sales data with a third dimension. For instance, we might want to see the data according to time, item, and location. Views such as these show the data at different levels of summarization. In the data warehousing research literature, each such view is referred to as a cuboid.

Given a set of dimensions, we can build a lattice of cuboids, each showing the data at a different level of summarization, or group by (i.e., summarized by a different subset of the dimensions). The lattice of cuboids is then referred to as an information cube. The following figure shows a lattice of cuboids forming an information cube for the dimensions time, item, location, and provider.

The cuboid which holds the lowest level of summarization is called the base cuboid. The 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. The apex cuboid is typically denoted by “all”.
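As a rough sketch, the lattice for the four dimensions named above can be enumerated by listing every subset of the dimensions. The function name `cuboid_lattice` is hypothetical, and the sketch assumes each dimension has a single level (no concept hierarchies), so n dimensions give 2^n cuboids.

```python
from itertools import combinations

dimensions = ("time", "item", "location", "provider")


def cuboid_lattice(dims):
    """Map k to the list of k-D cuboids (subsets of dims of size k)."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}


lattice = cuboid_lattice(dimensions)
total = sum(len(level) for level in lattice.values())

apex = lattice[0][0]                 # the 0-D cuboid, "all"
base = lattice[len(dimensions)][0]   # the 4-D cuboid over every dimension

for k, level in lattice.items():
    print(f"{k}-D cuboids:", level)
```

Each tuple corresponds to one group-by: the empty tuple is the apex cuboid, and the tuple containing all four dimensions is the base cuboid.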

STARS, SNOWFLAKES, AND FACT CONSTELLATIONS: SCHEMA FOR MULTIDIMENSIONAL DATABASES

The entity-relationship data model is commonly used in the design of relational databases, where a database schema consists of a set of entities or objects, and the relationships between them. Such a data model is appropriate for online transaction processing. Data warehouses, however, require a concise, subject-oriented schema which facilitates online data analysis. The most popular data model for data warehouses is a multidimensional model. This model can be in the form of a star schema, a snowflake schema, or a fact constellation schema.

Star schema:

The star schema is a modeling paradigm in which the data warehouse contains (1) a large central table (fact table), and (2) a set of smaller attendant tables (dimension tables), one for each dimension. The schema graph resembles a starburst, with the dimension tables displayed in a radial form around the central fact table.

In the star schema, each dimension is represented by only one table, and each table contains a set of attributes. For example, the location dimension table contains the attribute set {location_key, city, province, state}. This constraint may introduce some redundancy.

Example: Chennai and Madurai are both cities in the state of Tamil Nadu in India, so the location rows for both cities repeat the same state value.
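A minimal star schema can be sketched with Python's built-in `sqlite3` module. The table names, the reduced column sets, and the sales figures below are assumptions chosen for illustration; only the Chennai and Madurai example comes from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one table per dimension (simplified column sets).
CREATE TABLE location_dim (
    location_key INTEGER PRIMARY KEY,
    city  TEXT,
    state TEXT
);
CREATE TABLE item_dim (
    item_key  INTEGER PRIMARY KEY,
    item_type TEXT
);
-- Central fact table: keys to each dimension plus the measure.
CREATE TABLE sales_fact (
    item_key     INTEGER REFERENCES item_dim(item_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL
);
INSERT INTO location_dim VALUES (1, 'Chennai', 'Tamil Nadu'),
                                (2, 'Madurai', 'Tamil Nadu');
INSERT INTO item_dim VALUES (10, 'computer'), (11, 'phone');
INSERT INTO sales_fact VALUES (10, 1, 500.0), (11, 1, 200.0), (10, 2, 300.0);
""")

# One join from the fact table to the dimension answers a per-state query.
star_rows = conn.execute("""
    SELECT l.state, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN location_dim AS l ON f.location_key = l.location_key
    GROUP BY l.state
""").fetchall()

# The redundancy: the state name is stored once per city row.
state_copies = conn.execute(
    "SELECT COUNT(*) FROM location_dim WHERE state = 'Tamil Nadu'"
).fetchone()[0]
print(star_rows, state_copies)
```

The query needs only a single join, but the string 'Tamil Nadu' is stored in every city row of the dimension table.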

Snowflake schema:

The snowflake schema is a variation of the star schema model, where some dimension tables are normalized, thereby further dividing the information into additional tables. The resulting schema graph forms a shape similar to a snowflake.

Snowflake schema of a data warehouse for sales

The major difference between the snowflake and star schema models is that the dimension tables of the snowflake model may be kept in normalized form to reduce redundancies. Such a table is easy to maintain and also saves storage space.

Drawback: The snowflake schema needs more joins to execute a query, which can reduce query performance, so it is not as popular as the star schema in data warehouse design. A compromise between the star schema and the snowflake schema is to adopt a mixed schema where only the very large dimension tables are normalized.
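The trade-off can be illustrated with a small `sqlite3` sketch in which a location dimension is normalized into two tables. The table names (`state_dim`, `location_dim`, `sales_fact`) and all values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The state attribute is moved into its own table (normalization).
CREATE TABLE state_dim (
    state_key INTEGER PRIMARY KEY,
    state     TEXT
);
CREATE TABLE location_dim (
    location_key INTEGER PRIMARY KEY,
    city      TEXT,
    state_key INTEGER REFERENCES state_dim(state_key)
);
CREATE TABLE sales_fact (
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL
);
INSERT INTO state_dim VALUES (1, 'Tamil Nadu');
INSERT INTO location_dim VALUES (1, 'Chennai', 1), (2, 'Madurai', 1);
INSERT INTO sales_fact VALUES (1, 700.0), (2, 300.0);
""")

# A per-state total now needs two joins instead of one.
snowflake_rows = conn.execute("""
    SELECT s.state, SUM(f.dollars_sold)
    FROM sales_fact AS f
    JOIN location_dim AS l ON f.location_key = l.location_key
    JOIN state_dim    AS s ON l.state_key = s.state_key
    GROUP BY s.state
""").fetchall()

# The state name is stored exactly once.
state_name_count = conn.execute("SELECT COUNT(*) FROM state_dim").fetchone()[0]
print(snowflake_rows, state_name_count)
```

The state name is stored once, which removes the redundancy, while the same per-state query pays for an additional join.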

Fact constellation:

Sophisticated applications may require multiple fact tables to share dimension tables. This type of schema can be viewed as a collection of stars, and hence is called a galaxy schema or a fact constellation.

Fact constellation schema of a data warehouse for sales and shipping

This schema specifies two fact tables, sales and shipping. The sales table definition is identical to that of the star schema. A fact constellation schema allows dimension tables to be shared between fact tables.

In data warehousing, there is a distinction between a data warehouse and a data mart. A data warehouse collects information about subjects that span the entire organization, such as customers, items, sales, assets, and personnel, and therefore its scope is enterprise-wide. For data warehouses, the fact constellation schema is commonly used since it can model multiple, interconnected subjects.

A data mart, on the other hand, is a departmental subset of the data warehouse that focuses on selected subjects, and therefore its scope is department-wide. For data marts, the star or snowflake schemas are popular since each is geared towards modeling individual subjects.

Star, snowflake, and fact constellation schemas can each be specified in DMQL (Data Mining Query Language).
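The idea of two fact tables sharing a dimension table can be sketched as follows. This uses Python's `sqlite3` rather than DMQL, and the column names and figures are assumptions made for the example, not the schema from the original figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One dimension table shared by both fact tables.
CREATE TABLE item_dim (
    item_key  INTEGER PRIMARY KEY,
    item_name TEXT
);
CREATE TABLE sales_fact (
    item_key     INTEGER REFERENCES item_dim(item_key),
    dollars_sold REAL
);
CREATE TABLE shipping_fact (
    item_key     INTEGER REFERENCES item_dim(item_key),
    dollars_cost REAL
);
INSERT INTO item_dim VALUES (1, 'laptop'), (2, 'printer');
INSERT INTO sales_fact VALUES (1, 900.0), (1, 600.0), (2, 250.0);
INSERT INTO shipping_fact VALUES (1, 40.0), (2, 15.0), (2, 10.0);
""")

# Aggregate each fact table on its own, then line the totals up
# through the shared item dimension.
constellation_rows = conn.execute("""
    SELECT i.item_name, s.total_sold, sh.total_cost
    FROM item_dim AS i
    JOIN (SELECT item_key, SUM(dollars_sold) AS total_sold
          FROM sales_fact GROUP BY item_key) AS s
      ON s.item_key = i.item_key
    JOIN (SELECT item_key, SUM(dollars_cost) AS total_cost
          FROM shipping_fact GROUP BY item_key) AS sh
      ON sh.item_key = i.item_key
    ORDER BY i.item_name
""").fetchall()
print(constellation_rows)
```

Each fact table is aggregated separately before the join; joining the two fact tables row by row would multiply rows and inflate the totals.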

Snowflake and Fact Constellation Schemas:

Measures: Their Categorization and Computation

A measure value is computed for a given point by aggregating the information corresponding to the several dimension-value pairs specifying the given point. Measures can be organized into three categories, based on the type of aggregate function used:

  1. Distributive Measure
  2. Algebraic Measure
  3. Holistic Measure

  • Distributive Measure

An aggregate function is distributive if it can be computed in a distributed manner as follows: Suppose the data is partitioned into n sets. Applying the function to each partition yields one aggregate value. If the result derived by applying the function to the n aggregate values is the same as that derived by applying the function to all the data without partitioning, the function can be computed in a distributed manner.

For example, count() can be computed for a data cube by first partitioning the cube into a set of subcubes, calculating count() for each subcube, and then summing up the counts obtained for each subcube. Hence, count() is a distributive aggregate function. For the same reason, sum(), min(), and max() are distributive aggregate functions. A measure is distributive if it is obtained by using a distributive aggregate function.
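The definition can be checked on a small example. The sketch below is illustrative only: the helper names `partition` and `matches_unpartitioned` and the sample values are hypothetical. Note that for count(), the partial counts are combined by summing them, as described above.

```python
def partition(values, n):
    """Split values into n non-overlapping parts."""
    return [values[i::n] for i in range(n)]


def matches_unpartitioned(values, n, aggregate, combine):
    """Check that combining per-partition results equals the direct result."""
    partials = [aggregate(part) for part in partition(values, n)]
    return combine(partials) == aggregate(values)


# Integer sample data, so equality comparisons are exact.
data = [4, 8, 15, 16, 23, 42, 7]

checks = {
    "count": matches_unpartitioned(data, 3, len, sum),  # sum the partial counts
    "sum": matches_unpartitioned(data, 3, sum, sum),
    "min": matches_unpartitioned(data, 3, min, min),
    "max": matches_unpartitioned(data, 3, max, max),
}
print(checks)
```

All four checks hold for this data, which matches the claim that count(), sum(), min(), and max() can be computed partition by partition.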

Cite this page

Multidimensional Data Model. (2017, Aug 19). Retrieved from

https://graduateway.com/multidimensional-data-model-essay-3170-essay/
