Fact & Dimension, Star & Snowflake Schema

January 6, 2025•373 words

(Part II of the study notes for the Data Warehousing Concepts course on DataCamp (link))

There are two types of tables in a data warehouse: fact table and dimension table.

A fact table contains the measurements, metrics and facts of a business process. Consider the fact table of a car sales business. Each row would contain information such as the date of transaction, what car model is sold, the sales price, details about the buyer, etc. I think of it as a table that records and updates all transactions (or whatever form of business records), and it serves as a "source of truth" for this business.
A dimension table can be considered as companions to the fact table. It contains descriptive attributes and can be joined to the fact table for querying. For the car sales business mentioned above, a dimension table may contain car specifications for all possible car models sold by this business, and this dimension table can be joined to the car sales fact table on the key car_model, for example. I think of them as "mapping tables" - they are not directly describing the business process, but provide ancillary information related to it.

Suppose we want to organize fact and dimension tables in a data warehouse, there are two classical ways of to do so, i.e., two possible schemas (not to be confused with the schema of a single table in a database).

Star Schema: As the name suggests, all dimension tables surround the fact table in the shape of a star. Each dimension table can be joined directly with the fact table for querying. It is very simple structure and allows for efficient data querying. However, the fact table may get rather wide with more dimension tables added which could increase data redundancy, and there isn't support for many-to-many relationships.
Snowflake Schema: The fact table still sits in the centre and is surrounded by dimension tables. In addition, there are more dimension tables linked to other dimension tables, but not directly to the fact table, thus forming the shape of a snowflake. The structure is more complex, thus allowing for more in-depth analyses, but querying may be less efficient due to possibly multiple joins.

👍❤️🫶👏👌🤯🤔😂😍😭😢😡😮

Subscribe to the author

You'll only receive email when they publish something new.

More from Spark Tseung
All posts

Data Warehouse vs Mart vs Lake

January 3, 2025•145 words

(Part I of the study notes for the Data Warehousing Concepts course on DataCamp (link)) The following table summarizes some key differences between Data Warehouse, Data Mart and Data Lake. I've also found this useful resource by AWS on the same comparison but it gives a lot more details. Feature Data Warehouse Data Mart Data Lake Data Structure Structured Structured Structured & Unstructured Complexity to Change Complex Complex Less Complex Purpose of Data Known Known May not be k...

Read post

Database Normalization vs Data Cleansing

January 9, 2025•264 words

(Part III of the study notes for the Data Warehousing Concepts course on DataCamp (link)) Database Normalization is the process of structuring a relational database according to a series of so-called "Normal Forms". The goal is to reduce data redundancy and improve data integrity. The set of normal forms are listed on Wikipedia with examples, which I don't think is necessary to repeat here. A very closely related concept is Data Cleansing, which involves identifying and correcting/removing cor...

Read post

Fact & Dimension, Star & Snowflake Schema

More from Spark TseungAll posts

Data Warehouse vs Mart vs Lake

Database Normalization vs Data Cleansing

More from Spark Tseung
All posts