Normalization vs. Denormalization

Best practices for normalization vs. denormalization in Power Pivot / Tabular data modeling are typically not disputed.

First, let’s quickly define in human terms what we mean by normalization vs. denormalization. Normalization reduces data duplication by splitting dimensional data and attributes into their own dimension tables.  Typically, dimensions/attributes in a hierarchy are modeled in a snowflake structure (e.g. Fact Table –> Item Table –> ProductCode Table –> Category Table).  Denormalization attempts to optimize performance by pushing data from higher-level hierarchy tables down into the lower dimension tables, regardless of duplication.  This reduces the number of joins required to report on the attributes from the higher-level tables.
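To make the flattening concrete, here is a minimal sketch in Python using entirely hypothetical table data (the keys, product names, and categories are invented for illustration). It collapses the snowflaked Item –> ProductCode –> Category hierarchy into a single denormalized Item dimension, which is the transformation a star schema asks for.

```python
# Hypothetical snowflaked tables (all data invented for illustration).
categories = {1: "Bikes", 2: "Accessories"}            # CategoryKey -> name
product_codes = {10: ("BK", 1), 20: ("AC", 2)}         # CodeKey -> (code, CategoryKey)
items = {100: ("Road-150", 10), 200: ("Helmet", 20)}   # ItemKey -> (name, CodeKey)

# Denormalize: pull the product code and category name down into the
# item dimension, so a fact table needs only one join (to Item)
# instead of three joins up the hierarchy.
item_dim = {}
for item_key, (name, code_key) in items.items():
    code, cat_key = product_codes[code_key]
    item_dim[item_key] = (name, code, categories[cat_key])

print(item_dim[100])  # ('Road-150', 'BK', 'Bikes')
```

The category name "Bikes" now appears on every bike item row, which is exactly the duplication the denormalized model accepts in exchange for fewer joins.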

I will first highlight what I have read and what has resonated with me throughout my Power Pivot / Tabular experience regarding normalization vs. denormalization.

Normalization vs. Denormalization top principles

More rows are preferred over more columns.

More columns are preferred over more dimensions/tables, especially if the attributes have a lower cardinality.

Everyone tends to agree (and data backs it up) that a snowflake data model should be denormalized as much as possible for performance optimization for Power Pivot / Tabular data models.

This principle of normalization vs. denormalization goes against many a typical data warehouse architect’s mantra.  It is one of the many in-memory concepts that is hard for them to swallow.  Normalizing data reduces data warehouse disk space by reducing data duplication and dimension cardinality.  The goal for Power Pivot / Tabular data modeling is two-fold, and sometimes one goal interferes with the other: we want to reduce cardinality AND reduce the memory footprint of our data model.  Typically, the more important goal is to increase performance.  By denormalizing the data model into a simple star schema, bringing attributes from a snowflaked dimension into the lowest dimension in the hierarchy, you do not reduce cardinality.  However, you do reduce the number of joins required to report on the higher-level attributes.  I say you do not reduce cardinality because the same number of distinct elements for each attribute still exists in your lower dimension.  The values are indeed duplicated in this lower dimension; however, the cardinality has not changed!

It is important to note that you typically do not want a single, flat table structure.  Performance testing has shown that Power Pivot / Tabular performs better when utilizing a dimension rather than an attribute in a flat fact table.  As well, you will increase memory consumption if you duplicate data in the fact table, due to the increase in pointers when compression occurs.  Power Pivot and Tabular can compress data better if it is ordered (more on this later).  Obviously, you are not able to order all attributes of a table.  The more data that is not ordered optimally, the more pointers, and the more memory consumption.
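The ordering point can be illustrated with a toy run-length encoder in Python. This is only a sketch of the general idea, not the actual compression the engine performs: when equal values sit next to each other, far fewer (value, run length) pairs are needed to represent the column.

```python
# Toy run-length encoding to illustrate why ordered columns compress
# better: adjacent equal values collapse into a single run.
from itertools import groupby

def rle(values):
    """Return the column as a list of (value, run_length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]

unordered = ["A", "B", "A", "B", "A", "B"]
ordered = sorted(unordered)

print(rle(unordered))  # 6 runs: every value starts a new run
print(rle(ordered))    # 2 runs: [('A', 3), ('B', 3)]
```

Same data, same cardinality, but the ordered version encodes in 2 runs instead of 6, which is the intuition behind sorting data before load when you can.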

As well, third normal form is not preferred.  However, third normal form and/or single, flat tables can be utilized for prototyping.  There is no harm and it may even help profile the data for designing the optimal star schema structure.

When designing a data model, one must take great care in picturing the end result.  This is the structure that your end users will see when creating reports or performing ad-hoc analysis.  If it is not organized in a friendly manner (i.e. dimensions split from facts), it may prove to be difficult to navigate (more on this in a later post as well).  

In Power Pivot, if you want to display attributes in a separate folder, you will be forced to normalize.  If the attribute cardinality is low, it is typically safe to normalize.  A relationship between two large tables will degrade performance.  So, it would be optimal to keep an attribute with very high cardinality in the fact table rather than normalizing.  If you are only retrieving a single attribute from a dimension, it would also be optimal to denormalize. 

Test, test, test.  Power Pivot / Tabular surprises the best of us and provides unexpected results sometimes.  Always keep the end user perspective into account and then follow these basic concepts to optimize.  Normalization vs. denormalization will always have exceptions to the rule.

Data Model Best Practices

We will start off discussing the data model itself.  

The data model that is created in Power Pivot or a Tabular solution is the foundation of any Microsoft business intelligence solution.

The data model consists of the data structure (including tables and their relationships, columns in the tables, and all measures created), the data itself (actual content in the table rows), and the metadata (the table/column/measure descriptions, formatting, security, etc.).

Starting the solution with a solid foundation (the data model) is critical to the success of the solution.  An optimal data model lends itself to the speed and performance of the solution as well as the usability and user acceptance or adoption of the solution.
We will discuss the following topics related to Power Pivot and Tabular data modeling best practices:

We will modify this posting as we add new content.  We will add links to the new postings and very possibly add new topics as we go.  We have so much information already that we could at least put an outline around the majority of it.  We look forward to sharing this information with you and helping grow the Power Pivot and tabular community.  Check back often or sign up for a subscription so you don’t miss new information!

Best Practices of Microsoft BI

We have seen several dispersed articles on Power Pivot, SSAS Tabular, and Data Analysis Expressions (DAX) best practices.


Over many articles, we will attempt to consolidate these into a single location, as well as add new material to the list based on our experiences and any missing information we could not find elsewhere.  

We would love your feedback as we move through the list of best practices.  With newer technologies, there are new things learned every day.  If you feel like there is something that you would like to be added or you would like to request a posting on a specific topic, please let us know.  We are highly receptive to feedback and requests for topics.  As well, please comment and feel free to subscribe to ensure you are notified when we post new articles.

We have a ton of material coming and are excited to share it with the community!

The listing below will be linked to the underlying article category header as we create articles under each category.  Each category will have multiple articles created under it.  We will add new categories as needed, and content will be created in no particular order.  Like we stated previously, we have a huge backlog of information and articles to share.


Power BI Preview

I am so excited for the future of Microsoft BI.  On January 27th, Microsoft unveiled a new Power BI Preview (link) offering that I believe will radically change the BI atmosphere.
Since Power BI’s inception, its biggest drawback has been its pricing.  Power BI pricing can now compare to Tableau, IBM, QlikView, and Spotfire.  Other drawbacks or deficiencies?
  1. The pricing model discussed above priced Power BI out of the market.  This is now addressed with the new pricing model.
  2. No mobile component for on-premises or for the iPad?  Microsoft has introduced both the iPad application (Android coming soon) and the Analysis Services Connection Preview.  This is exciting!  A connector to your on-premises tabular data models that gains the new features and functionality of Power BI in the cloud as soon as they are released, plus mobile applications!
  3. Forcing users to utilize Excel?  Virtually everyone utilizes Excel, but OK, Power BI decoupled from Excel is now an option!
  4. Don’t want to be forced into utilizing SharePoint just for the Power BI features?  OK!
  5. Your business does not have a strong enough supporting infrastructure and/or personnel to administer the hardware required for Power BI on premises?  No problem!  Let Microsoft handle the hardware and infrastructure….you just model the data and develop reporting.
  6. The Power BI learning curve?  So many people are familiar with Excel and Pivot Table functionality that Power Pivot, DAX, and Power View are a somewhat natural progression for power users.
  7. Do you have limited SQL skills?  Power Query helps to bridge that gap.  It is essentially a very advanced macro for ETL and data transformation.
  8. Don’t trust your data in the cloud?  A new SSAS on premises connector alleviates this concern.
  9. Dislike the reliance on Silverlight?  Microsoft has moved to HTML5 on the new preview.
  10. Want a consolidated view of the most important measurements/KPI(s) across completely unrelated, multiple data sources?  Done with the new dashboard functionality!
  11. Don’t want or need the full Office 365 Suite of tools?  OK!  Standalone Power BI is here in the preview and is priced accordingly.
  12. Power View lacks the ability for customization?  You cannot export the underlying data?  You cannot configure subscriptions or data alerts?  OK, yeah, these are still lacking.  However, there are a ton of ways to work around this.  Some things still cannot be done (e.g. variable segmentation, variable titles, custom colors, exporting data, etc.), but this is a first-generation product….it’s coming.
Microsoft is quickly answering the concerns of the masses and has really impressed me with their offerings.
Continue Reading