Fact & Dimension, Star & Snowflake Schema

(Part II of the study notes for the Data Warehousing Concepts course on DataCamp (link))

There are two types of tables in a data warehouse: fact table and dimension table.

  • A fact table contains the measurements, metrics and facts of a business process. Consider the fact table of a car sales business. Each row would contain information such as the date of transaction, what car model is sold, the sales price, details about the buyer, etc. I think of it as a table that records and updates all transactions (or whatever form of business records), and it serves as a "source of truth" for this business.

  • A dimension table can be considered as companions to the fact table. It contains descriptive attributes and can be joined to the fact table for querying. For the car sales business mentioned above, a dimension table may contain car specifications for all possible car models sold by this business, and this dimension table can be joined to the car sales fact table on the key car_model, for example. I think of them as "mapping tables" - they are not directly describing the business process, but provide ancillary information related to it.

Suppose we want to organize fact and dimension tables in a data warehouse, there are two classical ways of to do so, i.e., two possible schemas (not to be confused with the schema of a single table in a database).

  • Star Schema: As the name suggests, all dimension tables surround the fact table in the shape of a star. Each dimension table can be joined directly with the fact table for querying. It is very simple structure and allows for efficient data querying. However, the fact table may get rather wide with more dimension tables added which could increase data redundancy, and there isn't support for many-to-many relationships.

  • Snowflake Schema: The fact table still sits in the centre and is surrounded by dimension tables. In addition, there are more dimension tables linked to other dimension tables, but not directly to the fact table, thus forming the shape of a snowflake. The structure is more complex, thus allowing for more in-depth analyses, but querying may be less efficient due to possibly multiple joins.


You'll only receive email when they publish something new.

More from Spark Tseung
All posts