Extreme DAX - Michiel Rozema - E-Book

Description

This book helps business analysts generate powerful and sophisticated analyses from their data using DAX and get the most out of Microsoft Business Intelligence tools.

Extreme DAX will first teach you the principles of business intelligence, good model design, and how DAX fits into it all. Then, you’ll launch into detailed examples of DAX in real-world business scenarios such as inventory calculations, forecasting, intercompany business, and data security. At each step, senior DAX experts will walk you through the subtleties involved in working with Power BI models and common mistakes to look out for as you build advanced data aggregations.

You’ll deepen your understanding of DAX functions, filters, and measures, and how and when they can be used to derive effective insights. You’ll also be provided with PBIX files for each chapter, so that you can follow along and explore in your own time.

You can read the e-book in Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Page count: 529

Year of publication: 2022




Extreme DAX

Take your Power BI and Microsoft data analytics skills to the next level

Michiel Rozema

Henk Vlootman

BIRMINGHAM—MUMBAI

Extreme DAX

Copyright © 2022 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Producer: Suman Sen

Acquisition Editor – Peer Reviews: Saby Dsilva

Project Editor: Rianna Rodrigues

Content Development Editor: Lucy Wan

Copy Editor: Safis Editing

Technical Editor: Aditya Sawant

Proofreader: Safis Editing

Indexer: Manju Arasan

Presentation Designer: Pranit Padwal

First published: January 2022

Production reference: 1070122

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80107-851-1

www.packt.com

Contributors

About the authors

Michiel Rozema is one of the world's top Power BI experts and lives in the Netherlands. He holds a master's degree in mathematics and has worked in the IT industry for over 25 years as a consultant and manager. Michiel was the data insight lead at Microsoft Netherlands for 8 years, during which time he launched Power BI in the country. He is the author of two Dutch books on Power Pivot and Power BI. Michiel is one of the founders of the Dutch Power BI user group, organizer of the Power BI Summer School, and has been a speaker at many conferences on Power BI. He has been awarded the Microsoft MVP award since 2019 and, together with fellow MVP Henk Vlootman, he runs the consultancy company Quanto, which specializes in Power BI.

Henk Vlootman is a senior global Power Platform, Power BI, and Excel business consultant. Every year since 2013, Henk has received the Microsoft MVP award for his outstanding expertise and community leadership. Henk is one of the founders of the Dutch Power BI user group, organizer of the Power BI Summer School, and has been a speaker at many conferences on Power BI all over the world. He is also the author of two Excel and two Power Pivot/Power BI books. He started his career in 1992 with his own company, then as an Excel consultant. Nowadays he runs the consultancy company Quanto, which specializes in Power BI, together with fellow MVP Michiel Rozema.

About the reviewer

Greg Deckler is a Microsoft MVP for Data Platform and an active member of the Columbus Ohio IT community, having founded the Columbus Azure ML and Power BI User Group (CAMLPUG) and presented at many conferences and events throughout the country. An active blogger and community member interested in helping new users of Power BI, Greg actively participates in the Power BI community, having authored over 180 Power BI Quick Measures Gallery submissions and over 5,000 solutions to community questions. Greg is Vice President of Cloud Services at Fusion Alliance, a regional consulting firm, and assists customers in gaining competitive advantage from the cloud and cloud-first technologies like Power BI. Greg has authored three books on Power BI: Learn Power BI, DAX Cookbook, and Power BI Cookbook, Second Edition. Finally, Greg has also built an external tool for Power BI Desktop called Microsoft Hates Greg's Quick Measures, and he posts Power BI videos on YouTube.

I would like to thank my son, family, and the entire Power BI community for all of their support.

Contents

Preface

Who this book is for

What this book covers

To get the most out of this book

Get in touch

Part I: Introduction

1.1 DAX in Business Intelligence

The Five-Layer model for business intelligence

Enterprise BI and end-user BI

Where DAX fits in, and where to find it

Excel

Power BI

SQL Server Analysis Services

Azure Analysis Services

Tools to develop models and DAX

Powered by DAX: visual, interactive reports

How to approach solution development

Using Power BI models to accelerate BI solution development

We do not know exactly what we need

Our data is not correct

The digital transformation cycle

Summary

1.2 Model Design

Columnar data storage

Relational databases

Columnar databases

Data types and encoding

Relationships

Data in Excel

Data in relational databases

Power BI's relational model

Relationship properties

Active and inactive relationships

Cross filter direction

Cardinality

Effective model design

Star schemas and snowflakes

The issue with star schemas

RDBMS principles to avoid in Power BI models

Interdependent dimensions

One fact table only

Data warehouse as the single source of truth

Using many-to-many relationships

Memory and performance considerations

Summary

1.3 Using DAX

Calculated columns

Calculated tables

Measures

DAX security filters

DAX queries

Date tables

Creating a date table

Best practices in DAX

Think in terms of DAX measures primarily

Build explicit measures

Use base measures as building blocks

Hide model elements

Do not mix data and measures – use measure tables instead

Table types

Summary

1.4 Context and Filtering

The Power BI model

Introduction to DAX context

Row context

Query context

Filter context

Detecting filters

Comparing query and filter context to row context

DAX filtering: Using CALCULATE

Step 1: Setting up a filter context

Step 2: Removing existing filters

Step 3: Applying new filters

Step 4: Evaluating the expression to calculate

Removing filters with ALL functions

Time intelligence

Changing relationship behavior

Table functions in DAX

Table aggregations

Using virtual tables

Context in table functions

Performance considerations using table functions

Filtering with table functions

Using CALCULATETABLE

Filters and tables

Using TREATAS

DAX variables

Summary

Part II: Business cases

2.1 Security with DAX

Introduction to row-level security (RLS)

Security roles

DAX security filters

Dynamic row-level security

Modeling considerations for RLS

Testing security roles

Testing live connection reports

Impersonation model

Adding the pImpersonation table to the model

Adding a test security role

Making it all work

Securing hierarchies using PATH functions

Hierarchical tables

Introducing PATH functions

PATH

PATHCONTAINS

PATHLENGTH

PATHITEM

PATHITEMREVERSE

Using PATH functions in RLS

Advanced hierarchy navigation in RLS

Securing attributes

The case for secured attributes

Object-level security and its restrictions

Dynamically securing attributes: introducing value-level security

Value-level security: modeling

Value-level security: security filters

Value-level security: advanced scenarios

How to develop in models with value-level security

Securing aggregation levels

Measures cannot be secured, fact tables can

Restricting fact table granularity

Securing aggregation levels with composite models

Combining aggregation security with value-level security

Securing an aggregation level as an attribute

Summary

2.2 Dynamically Changing Visualizations

The business case

Dynamic measures

The basic KPI measures

Creating a helper table

Creating a dynamic DAX measure

Selecting both the calculation and date columns dynamically

Dynamic labels

Solution overview

Creating a helper table

Creating a DAX measure using dynamic labels

Combining dynamic labels and dynamic calculations

Summary

2.3 Alternative Calendars

Week-based and Gregorian calendars

What is a week-based calendar?

Week numbers

Periods

Quarters

Years

Creating a week-based calendar table

Setting up dates

Finding the correct start date

Finding the correct end date

Creating additional columns

Time Intelligence calculations for week-based calendars

The Power BI model

Calculating year-to-date results

Calculating sales growth

Moving average by week within an accounting year

Keeping your report current

The Date Selection table

Creating selection options

Using Date selection in measures

Summary

2.4 Working with AutoExist

The Power BI model

How Power BI visualizes the output of a model

Visual filters and context

How using measures changes the behavior of visuals

Understanding a visual's DAX query

What AutoExist is, and what it does

Using multiple filters in a visual

How AutoExist optimizes DAX evaluation

Example: The case of the missing workdays

The business case

Model structure

Order intake analysis

Extending the Calendar table

Workday analysis

Where's my workday gone?

How to solve the missing workdays problem

The root of the problem

Changing model structure to get around AutoExist

Always consider the context!

Fixing the workday calculation

Optimizing report performance with AutoExist

Granularity in fact tables

Filtering on multiple fact tables

Optimizing model structure

Optimizing the visual

Summary

2.5 Intercompany Business

Modeling the QuantoBikes sales process

The sales process

Model structure

Business between subsidiaries

Subsidiary view versus consolidated view

Matching internal sales and purchases

Visualizing intercompany business

Future sales

Sales on one-off sales orders

Sales on long-term sales orders

Dealing with old sales orders

Dealing with current sales orders

Optimizing the current sales order calculation

Further optimization

Testing complex calculations

Summary

2.6 Exploring the Future: Forecasting and Future Values

Financial calculations

Present Value and Net Present Value

Internal Rate of Return

Financial DAX functions

The business case and model

Creating adjustable rates and indexes

Calculating Future Value (FV)

Initial investment and residual value

Irregular cash flows

Recurring cash flows

Positive and negative cash flows

Calculating Net Present Value (NPV)

Calculating the Internal Rate of Return (IRR)

Calculating cost-covering rent

Determining cost-covering rent by approximation

Optimizing the approximation

Summary

2.7 Inventory Analysis

Data modeling for status-oriented data

Inventory granularity

Basic inventory calculations

Inventory targets

Inventory forecasting

Two types of forecast

Using a sales forecast to predict inventory changes

Using extrapolation to predict inventory changes

Calculating long-lasting inventory

Working with forecast-based inventory targets

Using linear regression for extrapolating inventory

Summary

2.8 Personnel Planning

The Power BI model

Calculating sales

Optimizing the sales calculation

Calculating FTEs needed

Considering totals

Optimizing the FTE calculation

Optimizing the Power BI model

Considering aggregation levels

Summary

Other Books You May Enjoy

Index


Share Your Thoughts

Once you've read Extreme DAX, we'd love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.

Part I

Introduction

1.1

DAX in Business Intelligence

Without a doubt, information is nowadays one of the most valuable assets of any organization. We all know this as consumers: companies are lining up to get our personal data. Not because any of us are so interesting individually (though we're sure you are a very interesting person!), but because the combination of data from many consumers enables companies to get valuable insights to drive their business forward.

This is not only true for commercial companies. Public institutions, hospitals, and universities also benefit from information to better run their core processes. In any case, information is the foundation of progress and innovation.

But getting from data to information to insights can be a tedious process. It involves combining data from different sources, discovering hidden structures and correlations, and considering the context of data. This is why the field of business intelligence, or data analytics, has traditionally been the exclusive domain of IT professionals. This is not optimal, as business intelligence is really about business insights.

This book is all about one of the hottest tools for data analytics in existence today: DAX (Data Analysis Expressions). As a reader, we assume that you have some experience with DAX and that you are looking to improve your skills. As the foundation for this, it is important to understand what DAX is for and what it is not for. Additionally, you must learn to avoid doing things elsewhere that can be done better with DAX.

This chapter covers some general concepts that will help you lay this foundation. We cover the following topics:

The Five-Layer model for business intelligence

Enterprise BI and end-user BI

Where DAX fits in, and where to find it

Tools to develop models and DAX

Powered by DAX: visual, interactive reports

How to approach solution development

The digital transformation cycle

The Five-Layer model for business intelligence

To be able to discuss analytics in a structured and comprehensive way, we have developed a simple framework that describes the main components and responsibilities in an analytics solution. We have given it an equally simple name: the Five-Layer model.

Figure 1.1: The Five-Layer model

The first and lowest layer, Connect, is the starting point of analytics: if you want to analyze data, those data must come from somewhere. They could reside in Excel sheets, text files, large business databases, or somewhere on the Internet.

These raw data are typically not in the right format for analysis, especially when they come from different sources. Therefore, you need to Prepare your data. Preparation can take many forms, like imposing certain data types, transforming data, building data history, or matching data based on key attributes.

Creating nice and clean datasets in the Connect and Prepare layers can take a lot of effort. A lot. Building a data warehouse, a typical IT component in the Prepare layer, often leads to development projects spanning multiple years. The sad thing about this is that by the time the data warehouse is complete, the world has changed and the data warehouse no longer meets the organization's needs.

When data has been put in a workable format, you can start with proper analysis of the data in the Analyze layer. This is really the heart of the analytics solution. Building analytical models allows you to slice and filter data, compute aggregations of all sorts, and add calculations to provide specific insights.

The Visualize layer is all about creating reports and dashboards visualizing the results of analytical models. We have called this layer Visualize and not simply Output as, to really provide insights, helping a user focus on the important results is as important as providing results at all. Classic reports consisting of pages and pages of details are not very insightful, and force the user to export the whole report to Excel where they can aggregate the data. Really providing insight usually involves effective visualization of the most important outcomes of the analysis, while giving the user the option to answer the questions arising from the insight by viewing additional visuals on a lower detail level.

The top layer, Share, consists of platforms and processes aimed at providing access to reports and dashboards to the right people, while shielding them from those who shouldn't have access.

Whether you build your analytics solution in Excel, use Power BI, develop a corporate business intelligence system, or use no automated system at all, you will have to cover the five layers in some way. In a good analytics solution, the layers are clearly separated, and the processing of data is done in the right layer. Doing this has a lot of advantages, like avoiding implementing the same logic multiple times. Proper implementation of the Five-Layer model makes it relatively easy to cope with changes in, for example, source systems.

As you will see later in this chapter, DAX lives in the Analyze layer, with strong ties to both the Prepare and Visualize layers. We will talk about visualizations in a separate section, but first, let us discuss the question of who does what in BI.

Enterprise BI and end-user BI

Organizations are becoming ever more data-driven. An organization that assesses performance on key performance indicators, or KPIs, uses dashboards reflecting the status for each KPI. These dashboards are typically highly standardized: the organization has reflected on the business strategy, business processes, how to measure the KPIs, and how to report on them. An automated dashboard with KPIs is relatively stable and will not change much, and is typically built and maintained by a central IT department or BI competence center.

The next level of being a data-driven organization is that every decision taken can be based on insights from relevant data. This means that the need for a more dynamic form of analytics arises, one that can answer ad hoc questions. This form of analytics is marketed as self-service BI and promises that everyone can build a BI solution without the help of a central IT department. From the Five-Layer model, it is clear why this is not realistic: few end users are able or have the time to solve the complexities of all the layers. The ideal situation is when the central IT department and end users complement each other, each focusing on their specific strengths.

We use the terms enterprise BI and end-user BI to distinguish between the two forms of analytics. Enterprise BI is the form of analytics built and maintained by IT. It uses large-scale server or cloud platforms and serves many users at the same time. Solutions are built in professionally managed projects that focus on data quality, availability of the platform, and well-defined processes.

End-user BI is done by business users. They have a direct need for insights in their daily job and look for fast and accessible ways to get these insights. Many of them start building their analytics solutions in Excel, but the complexity of the Excel documents leads to an ever-increasing time investment for keeping the data up to date and maintaining and extending the solution. Excel is not capable of coping with growing data volumes. To solve this, Microsoft has extended the features of Excel to turn it into a real end-user BI platform, with Power Query and Power Pivot as powerful new tools to prepare and analyze data. These tools have evolved into what is now Power BI.

The beauty of the Microsoft platform for analytics is that the tools for enterprise BI and end-user BI work seamlessly together. The technology underlying Power BI is a driving force behind this. In fact, it is now possible to mix the two formerly opposed sides of BI into one architecture with different levels of self-service capabilities:

Figure 1.2: The integration of enterprise and end-user BI

The Five-Layer model is a useful framework for positioning these self-service capabilities. Users who express a need for self-service BI can have that on different layers in the model. They may create their own visualizations on top of existing (corporate) analytical models; they may build their own analytical models using centrally prepared datasets; or they may gather their own data, combine that with corporate data, and create most of the solution by themselves. They may even use artifacts created by other end users to build upon.

It should be clear that when moving down in the five layers, the complexity of the work done tends to increase. Therefore, this not only requires more advanced expertise, but it also adds to the responsibility of complying with corporate guidelines and standards. Central IT or BI needs to facilitate selected end users to create their own solutions. When implemented correctly, this leads to a situation where all people in the organization benefit from insights optimally. We call this vision collective analytics.

Power BI offers several features that enable the implementation of collective analytics. One of these features is DAX, which is instrumental in empowering business users not only to create their own analytic solutions, but also be strongly involved in developing enterprise solutions. We will talk about the latter in the How to approach solution development section later in this chapter, but let us first take a look at what DAX is and where it is found in a BI solution.

Where DAX fits in, and where to find it

In an analytics solution based on the Microsoft platform, DAX is used in the Analyze layer. DAX lives inside analytical models as the formula language to define calculations and other logic. In fact, models and DAX are really two sides of the same coin: the design of the model impacts the complexity of the DAX code, and your skills in DAX determine your model designs (we will elaborate on the core concepts of data models in Chapter 1.2, Model Design).

The power of DAX is in its strong data aggregation capabilities. The DAX language contains many functions and constructs for defining a variety of aggregations to generate results that lead to the insights needed. In the past, many types of aggregations could not be created directly but had to be implemented by preparing the data in specific ways. For instance, computing a year-to-date sales total can be done in DAX with a single function, while in Excel or traditional reporting tools, separate indicators are needed to denote which sales transactions belong to the year-to-date period. That end result, while more complex to implement, is still less dynamic than the DAX solution, which provides historical year-to-date figures as well.
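To make the year-to-date example concrete, below is a minimal sketch of such a measure using the standard TOTALYTD time intelligence function. The table and column names (Sales[Amount], 'Date'[Date]) are hypothetical and would need to match your own model:

```dax
-- Year-to-date sales, assuming a Sales fact table and a related date table
Sales YTD =
TOTALYTD (
    SUM ( Sales[Amount] ),  -- the expression to aggregate
    'Date'[Date]            -- the dates column driving the year-to-date logic
)
```

Because the measure is evaluated on the fly in the current filter context, the same single definition yields year-to-date figures for any year, month, or day shown in a report, with no precomputed indicator columns.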

This means that, with DAX, the effort that goes toward data preparation is much lower than in traditional BI solutions. As the DAX syntax and many core concepts are similar to those in Excel, the DAX language is relatively easy to learn for Excel users. This doesn't mean, though, that DAX is easy to master: when working with DAX, you will find that the complexity of the questions to be answered with DAX calculations steadily increases. And with that, you will need to create ever more sophisticated DAX code. This book will give you many examples of advanced applications of DAX, which will hopefully help you solve your DAX problems.

All core Microsoft data products now offer the capability to build an analytical model including DAX. It is confusing, however, that in each product the naming of the model is different. Below, we give an overview of models and DAX in different Microsoft products.

Excel

Since version 2010, Microsoft Excel offers analytical data modeling capabilities in the form of Power Pivot. Power Pivot, also called a Data Model in Excel, is the analytical model based on DAX.

Power BI

The shining star in the Microsoft data platform is Power BI. It was introduced first as an add-on to Office 365 but was turned into a separate service in 2015. The analytical model in Power BI is called a Power BI dataset, or a dataset for short, and is the home for DAX.

Power BI datasets and other Power BI artifacts are run within the Power BI cloud service accessible through the Power BI website, powerbi.com. Some other access points to the Power BI service are available, like the Power BI mobile app, Microsoft Teams, and even custom applications through a special version of the Power BI service called Power BI Embedded. Alternatively, Power BI Report Server is server software designed for customers that cannot or do not want to use the cloud service.

SQL Server Analysis Services

Microsoft's data server platform, SQL Server, contains an analytics component called Analysis Services (SSAS). Starting as OLAP Services around the millennium, SSAS has been a classical OLAP (for Online Analytical Processing) server for years and is now called Multidimensional. With SQL Server 2012, a second flavor of analytical capabilities was introduced, called Tabular, which is the analytical model based on DAX.

You may wonder about the differences between the two technologies in SSAS. This is not the book to go into all the details, but the most fundamental thing is that SSAS Multidimensional is based on "classical" relational database technology. This technology is not designed to aggregate and perform calculations on large amounts of data, and a Multidimensional model, or cube, simply performs all these calculations beforehand during model processing. The effect of this is that a Multidimensional model is far less flexible and dynamic than a Tabular model, simply because all aggregation levels and calculation results are computed during model processing. In contrast, the power of DAX is its ability to perform calculations on the fly, meaning a much more dynamic analytics experience, as a report designer is not restricted by aggregation levels implemented by the model designer.

Azure Analysis Services

Azure Analysis Services (AAS) is a fully managed analytics cloud service based on the same Tabular engine as SSAS. The difference, obviously, is that AAS runs in the cloud. The result is that your organization doesn't have to worry about the maintenance of the hardware and databases. It is also a flexible solution, since the resources can be scaled to meet the needs of the moment.

The analytical model with DAX clearly exists under many different names: Power Pivot, Data Model, dataset, or Tabular model. This poses a challenge for this book when we need to refer to the analytical model. As our primary focus is on Power BI, we will use the term Power BI model throughout this book, or simply model when there is no risk of confusion. The term "analytical model" is used only for the conceptual model in the Five-Layer model.

Tools to develop models and DAX

You have multiple options to choose from as your tool to work with DAX, depending on the target platform for the model:

For a Power Pivot model, you use Excel with the Power Pivot add-on.

For a Power BI dataset, you use Power BI Desktop.

It is interesting to note that there are in fact three versions of Power BI Desktop. One can be downloaded from the Power BI website, powerbi.com. The second is installed from the Windows Store and is automatically updated just like any other app from the store. When you realize that Power BI Desktop gets a new release nearly every month, these automatic updates are a real plus, although it can sometimes be annoying when a new release changes things you do not expect. If you want to, you can install both versions on the same computer. The third version of Power BI Desktop, which can be downloaded from the Power BI website as well, is a special edition for use with Power BI Report Server.

For a Tabular model in SSAS or AAS, you use Visual Studio, which offers many features for professional development, like integration with version control systems, scripting, and compatibility.

For a Power BI dataset in Power BI Premium, you can choose between Power BI Desktop and Visual Studio. This is possible through the XMLA endpoint, a technique implemented in Power BI Premium that gives a Power BI dataset the exact same outside appearance as a Tabular model.

In addition, there are several community-based tools available, like Tabular Editor and DAX Studio. These can even be integrated into Power BI Desktop.
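As an illustration of what a tool like DAX Studio is typically used for, here is a minimal sketch of a DAX query one might run directly against a model. The 'Product' and Sales table and column names are hypothetical placeholders:

```dax
// A DAX query starts with EVALUATE and returns a table
EVALUATE
SUMMARIZECOLUMNS (
    'Product'[Category],                  // group rows by product category
    "Total Sales", SUM ( Sales[Amount] )  // add an aggregated column per group
)
ORDER BY 'Product'[Category]
```

Running queries like this outside of a report is a convenient way to inspect and debug the results a measure produces before wiring it into a visual.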

For this book, we choose to use "plain" Power BI Desktop, as this is a free app that you probably already have. Every reader of this book can easily download Power BI Desktop and use the example files provided in the book's GitHub repository at https://github.com/PacktPublishing/Extreme-DAX.

Powered by DAX: visual, interactive reports

In discussing the Five-Layer model, we already briefly touched upon the importance of visual reporting. But effective visual reporting is only possible through the power of DAX-based models.

Creating visual output is paramount to generating real insights versus a lot of information only. Just take the example below:

Figure 1.3: Some sales data in a table visual

Can you spot the problem or opportunity this company has? If so, you are very good at numbers! Most people, though, are much more visually inclined. The picture below provides the same insight in a visual way:

Figure 1.4: The same sales data presented more visually

It is obvious now that one of the SKUs is doing significantly better than the others. That is an interesting and valuable insight: if we can make the other SKUs reach the same performance, the overall results of the company will dramatically improve. This insight also leads to something else: you want to know what it is about this SKU that makes it so great. Is there a single large customer that ordered this SKU? Are there specific geographies where the SKU is sold? And what about the margin on this SKU; perhaps it sells well because we priced it too low?

This is a fundamental cause-and-effect that happens all the time when you create deep insights through visual reports. Insights lead to new questions. The answers to these questions are new insights, which, in turn, lead to other questions.

How can this effect be addressed? The traditional way has been to provide as much information as possible in a single report. The reason for this was that it takes time to render a report. Power BI approaches this in a fundamentally different way, made possible by the power of DAX, by adding interaction to reports:

Figure 1.5: The Visualize-Interact cycle

Interaction allows the consumer of a report to dig into the initial insights and find the answers to their follow-up questions, making the move from straightforward reporting to analyzing in an organic way. For this to work, the report needs to be able to provide new visualizations in the blink of an eye. For a Power BI visual report, which retrieves all content from a Power BI model, this means that the model must be equally fast in providing results to the report. The performance of the model is a result of its structure and the DAX code that you implement. So, there is a direct link between your DAX code and the user experience of your reports!

How to approach solution development

Power BI models and DAX enable a different way to develop BI solutions with much deeper involvement from businesspeople. This leads to solutions that deliver insights that optimally add value to the business.

The traditional, IT-centric way to create BI solutions is to start with connecting to data sources and preparing data. At first glance, this sounds like the right thing to do; after all, we need good data to have good and valuable insights.

Figure 1.6: The traditional BI solution development approach

This approach is typically materialized in the development of an enterprise data warehouse. The idea behind having a data warehouse is to store all of the organization's data in a single place, and to use that as the foundation for all reporting.

It should be clear that this can be a major undertaking, as organizations have many different systems, with lots of different data in them. The data warehouse is traditionally implemented as a relational database system, which means that all of the enterprise data must fit in the database structure, or schema, of the data warehouse.

The variety of data causes the schema of a data warehouse to be highly complex. Moreover, whenever a source system changes, or a new system is introduced, the data in the new system must match the schema of the data warehouse, or the data warehouse must be changed to accommodate the new data. As a consequence, data warehouse projects are notorious for their duration and high costs. Many careers are built on data warehouses and, unfortunately, many careers have been broken on them as well.

There is a more fundamental flaw in the traditional approach. By starting with source systems and working upward through the Five-Layer model, you miss the invaluable business context about the insights that are actually needed. Even though most data warehouse projects aim to include business needs and context in the process, in reality, the business fades into the background in many, perhaps most, projects approached this way. The technical complexity is simply too great to keep the business deeply involved. Often, the result is that when the data warehouse is finally finished (or rather, taken into production for the first time), it is already behind the real business needs of that moment.

While waiting for the corporate reports to be available, the organization has already deployed a "shadow IT" tactic to retrieve the insights needed. People take any tool at hand (usually Excel) and any data that they can put their hands on, and build analytics solutions themselves. While this is a solution to a problem – the urgent need for insights – it doesn't help the organization in the long run. The quality of the data, and therefore the validity of the insights generated, remains questionable. The risk of making incorrect decisions based on these insights is high.

Using Power BI models to accelerate BI solution development

Instead of the approach described above, Power BI enables working both upward and downward through the Five-Layer model.

Figure 1.7: The solution development approach Power BI enables

With this approach, we use Power BI not only as the target platform for the solution, but also as a tool to streamline the project itself. By doing this, you leverage the specific capabilities of Power BI: fast creation of reports providing tangible insights, based on wherever the data is. The nature of Power BI models and DAX is foundational for these capabilities.

The approach is based on two basic principles:

We do not know exactly what we need

Our data is not correct

The consequence of these principles is that it is impossible to get it right the first time. Rather, you should deploy an iterative way of working to fail fast and improve fast.

We do not know exactly what we need

This principle means that you do not expect the business owner, or yourself for that matter, to be able to provide correct specifications for the reports. If you have ever undertaken a BI project, you will recognize this. Even those who claim they know exactly how things should be will overlook some details.

Even if they do not, there will be a misunderstanding in how to implement the specifications in the solution, if only because the person developing the solution does not have the same business context.

As a consequence, it is not effective to spend a lot of time gathering requirements or writing down specifications and getting them approved. You just know that the first report will be wrong, so the better way is to do it wrong as fast as possible. As it turns out, it is much easier to point out the errors and flaws in something that actually exists than to write down all the details in an abstract way.

You can formalize this approach by working in multiple iterations with a joint session at the end to show prototypes and gather feedback. (We like to call these business design sessions to highlight the nature of them: they are not feedback sessions or demos, but joint efforts to achieve the right results.) Depending on your skills in Power BI and DAX, you can take two or more days to work on a prototype. The results of a business design session are taken as input for the next iteration.

Figure 1.8: Iterative requirements capturing

The result of this is a report that is truly jointly developed and optimally aligned with business needs. Moreover, the report and underlying analysis model is an exact specification of the data that the lower layers in the Five-Layer model should provide.

Our data is not correct

This second principle is most probably not new to you. Of course our data is not correct! The reason for this is that real-world business processes are much messier and more complex than anyone could model.

And, when you consider that IT systems are designed with an image in mind of what a business process should look like, it is clear that a system to automate a business process faces a fundamental dilemma. Either the system implements a strict data quality policy, in the sense that one can only enter data that conforms to the designed processes (and therefore fails to capture each and every case in that process), or the system provides flexibility to capture all instances of the business process, and necessarily allows for data that does not fit the designed ideal flow.

The latter option is what is chosen most of the time. This means that a typical business system allows for exceptions, custom fields that users leverage to enter custom information, and different kinds of bypasses. As a consequence, data from business systems does not always conform to your expectations. And things become worse when your business data is in spreadsheets or other files!

In traditional BI solutions, messy data is hard to detect and solve. The reason is that, typically, the BI system only contains aggregated data or the technology does not support the exploration of data on a detailed level by business users. This is where Power BI comes in: the technology of Power BI models is so powerful that in many situations, data can be loaded into the model without being aggregated. The visual and interactive reports provide insights through sophisticated (DAX) aggregations, while allowing you to zoom in to the deepest levels of detail.

In an iterative approach to BI solution development, results after the first few iterations are typically full of errors. Knowledgeable businesspeople are generally capable of discovering these errors quite easily. At first, flaws in the aggregations implemented are discovered this way; but in later iterations, the quality of the data comes to light. Being able to see detailed data in a Power BI report is a huge help in driving the adoption of, and trust in, new BI solutions.

The digital transformation cycle

So far, we have focused on what is needed to go from "raw" data to insights, and the role of DAX-powered Power BI models in this process. We have seen that business value does not come from merely connecting and preparing data, but that it is the insights from visual, interactive reports that provide value.

However, no organization will get better from just staring at nice-looking reports. What really matters, of course, is what you do with the insights. In other words: you need to take action to reap the benefits of a BI solution. You will also want to measure the effects of these actions, either in an automated way or by letting users enter feedback into some kind of system. The result of that is, again, data.

This leads to a cycle of digital transformation, or data-driven business improvement:

Figure 1.9: The digital transformation cycle

While Power BI shines in the process of going from data to insights, it is not designed to cover the other half of the transformation cycle. That part requires different capabilities, like facilities to enter or update data, and technology to connect systems and people.

It is for this reason that Microsoft made Power BI a part of a broader platform called the Power Platform. The Power Platform addresses the full digital transformation cycle with the same basic design principles: a central role for the business user, easy entry, and sheer power to cover demanding business requirements.

The Power Platform consists of three main components next to Power BI:

Power Apps provides an environment to develop business applications in a low-code manner, for use on smartphones or through a web browser. These can be used to edit or add data.

Power Automate allows for automating process flows between a variety of systems, services, and user-oriented applications. A flow could, for instance, be triggered by an incoming email, ask for confirmation by a business owner, and automatically update data and trigger the refresh of a Power BI model and connected reports.

Power Virtual Agents offers a platform to create user-friendly interaction with the Power Platform through AI-powered chatbots. With these, a user can enter data or kick off actions through a conversation-style interaction instead of having to learn a specific app interface.

Although these are all separate components of the Power Platform, tight integration exists between them. For example, you can create a Power BI report with an embedded Power Apps app to allow users to change data in a context-aware manner, right from where the insights are. In the same way, Power Automate flows can be embedded in Power BI reports to allow for taking action based on insights.

Summary

In this chapter, we discussed the field of business intelligence and the central role of analytical models in modern BI solutions. Power BI models are ideally suited for use as such models, not least because of the power of DAX.

You have learned about two capabilities of DAX that have a profound impact on the way BI solutions can be designed and developed:

DAX enables sophisticated calculations for a variety of aggregations of data; aggregations that, in the past, needed a lot of preparation of customized data. Therefore, DAX allows for shifting the focus from data (with all the tedious work involved) to the logic to generate business insights.

DAX as a language is set up in a way that allows businesspeople who are mostly familiar with Excel to work on the BI solution themselves, to varying degrees. This means that much more alignment with business priorities can be achieved.

As Power BI models and DAX are two sides of the same coin, it is important to know how to balance the two to achieve optimal results. Simply put, you should use DAX for whatever DAX is particularly good at, rather than solving those things in the data; and vice versa, you should not use DAX to prepare or generate data.

The next few chapters elaborate on these topics. In Chapter 1.2, Model Design, we discuss the dos and don'ts of designing Power BI models. Chapter 1.3, Using DAX, focuses on how to use DAX for best results. Chapter 1.4, Context and Filtering, continues on this theme, exploring the most important concepts to understand when writing DAX calculations. Part 2 of this book contains many examples, mostly from real-life customer projects, that demonstrate the power of DAX and the balance between DAX and Power BI models.

1.2

Model Design

Being effective with DAX starts with designing a good analysis model. In this chapter, we address a number of modeling-related topics that are important to understand for strong model designs.

The topics in this chapter include:

The way the Power BI engine stores data

Choosing the right data types

Relationships

What structure to strive for in your models

To achieve good models, you will need to adopt the appropriate way of thinking. This change is needed both when you come to Power BI from an Excel background and when you come from relational databases. If you are used to working in Excel, the concept of relationships in an analysis model in particular takes time to comprehend; but even with a database background, many things work differently. One of the difficulties in designing for Power BI is that the concepts seem familiar to database professionals, when in reality, they are fundamentally different. We therefore discuss many of these differences in this chapter.

Columnar data storage

The power of the Power BI model comes primarily from a smart data storage mechanism. Power BI models are, in fact, databases in the sense that they organize and store data. But the internals are very different from other database technologies you may be familiar with.

Relational databases

The traditional, corporate way of working with data is with a relational database management system (RDBMS) like Microsoft SQL Server. In an RDBMS, tables of data are defined with a fixed number of data columns per table. Each column must have a data type, like integer, text, or decimal number, and from this information, the RDBMS can derive how much space is needed to store a single row of data, or record, and how many rows can be stored in a disk-based data file. This concept makes an RDBMS an effective choice for applications that process transactions, like sales transactions from a web shop or transactions in a company's financial ledger.

The concept of an index in an RDBMS makes finding a specific record fast and efficient, which means that processing transactions on existing records can also be implemented effectively with an RDBMS. Tables can be linked through relationships (hence the name RDBMS), which makes sure that data in these tables are consistent; for example, an RDBMS will block the insertion of a sales transaction involving an unknown customer.

The RDBMS is a mature concept and many optimizations for it have been invented and implemented. As a result, most traditional analytics platforms rely on RDBMS technology as well. However, the needs of an analytical solution are completely different from those of a transactional system. In analytics, you are typically not interested in retrieving all data from a single row, but in retrieving data from many rows at the same time, and only from one or a few columns. To retrieve information from a single column, the RDBMS still needs to read complete rows from storage. Also, aggregating data from many rows is something an RDBMS is not designed to do and so will be relatively slow.

Figure 1.2.1 visualizes this; storing data by row (identified by the numbers) leaves no possibility of efficiently retrieving all the required column's values:

Figure 1.2.1: Retrieving column values from row-based storage is inefficient

Columnar databases

The Power BI model transposes the concept of an RDBMS by storing data not on a per-row basis, but per column. The rationale behind this is that in an analytical solution, often only a couple of columns have to be read from storage, but for all available rows. This is most efficient when all data in the same column are stored near each other.

Another reason is that in real-life situations, many values in a single column are the same; for example, millions of sales transactions are for only thousands or tens of thousands of products. A columnar database can therefore highly compress data by only storing a specific value once and keeping track of which rows it belongs to.

The high compression rate that columnar databases achieve opens up the possibility of keeping the whole database in memory, meaning that all data resides in the internal memory of the computer or server the database runs on, instead of being stored in files on disk. Keeping data in memory accelerates data retrieval even more.

The columnar model means data aggregation is done very efficiently. Instead of, for instance, summing all separate values in a column, the columnar database engine can simply take each distinct value and multiply that by the number of rows the value appears in. In short, the Power BI model's database engine is designed from scratch to support the typical workloads that come with data analytics: handling large amounts of data with specific characteristics and performing aggregations and calculations across these.
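The value-times-frequency idea can be sketched in a few lines of Python. This is a conceptual illustration only; the actual encoding inside the Power BI engine is far more sophisticated, and the function names here are invented for the sketch:

```python
from collections import Counter

def encode_column(values):
    """Dictionary-encode a column: each distinct value is stored once,
    with a per-row pointer into that dictionary."""
    dictionary = sorted(set(values))
    index = {v: i for i, v in enumerate(dictionary)}
    row_ids = [index[v] for v in values]  # one pointer per row
    return dictionary, row_ids

def column_sum(dictionary, row_ids):
    """Aggregate without scanning raw values: take each distinct value
    and multiply it by the number of rows it appears in."""
    counts = Counter(row_ids)
    return sum(dictionary[i] * n for i, n in counts.items())

quantities = [10, 20, 10, 10, 20]       # five rows, only two distinct values
dictionary, row_ids = encode_column(quantities)
print(column_sum(dictionary, row_ids))  # 70, same as sum(quantities)
```

Note how the compression benefit grows with repetition: a column with millions of rows but only thousands of distinct values stores each value once, and additive aggregations only need to touch the distinct values and their frequencies.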

There is, however, one caveat here: in the end, you still need to know which values in different columns go together in one row. It is not enough that product 103 has been sold; you need to know how much it was sold for, to which customer, and on what day. To enable this, the model must keep lists of pointers that keep track of which value goes in which row, for any column. The overhead from this obviously grows when more columns are added to a table. "Narrow" tables are therefore more efficient than "broad" tables in a Power BI model.

Data types and encoding

The Power BI model works with a limited number of data types. Choosing the right data type for your data is important, as it determines the way your data is stored, or encoded, and how efficiently the model can process your data. Below is a list of all the data types recognized by Power BI:

Text: The most generic data type is Text. Virtually all data can be stored as Text. When loading data through Power Query, the generic Power Query data type Any is converted to Text in the Power BI model. This causes numerical columns to be stored as Text when you forget to explicitly make the type conversion in Power Query. (You can, of course, change the data type in the model, which will automatically add a type change step in Power Query.)

Whole Number: The data type Whole Number is used, as you would guess, to store whole numbers. Because of the way the Power BI model stores and compresses data, this is one of the most efficient data types available.

Decimal Number: This data type is the most versatile of the number data types. It can store almost anything, from very small to very large numbers, and with fractional values. You can store numbers with up to 15 digits.

Fixed Decimal Number: The Fixed Decimal Number type, sometimes called Currency, is a data type for storing fractional number values with four decimals. You can store numbers with up to 19 digits, including the four decimals. This means that this data type has a smaller reach than the Decimal Number type. The Fixed Decimal Number type is typically used to store currency amounts, but can be used for any value that doesn't need many decimals.

Date/Time, Date, Time: The Power BI model stores date and time values with a similar structure as Excel. This means that values are decimal numbers, with the whole number part representing the date, and the decimals representing the time.

The difference compared to Excel is in the base reference date: in a Power BI model, the number 1 corresponds to December 31, 1899, while in Excel, the number 1 corresponds to January 1, 1900 (both at midnight). The decimal part adds the time as a fraction of a 24-hour day; for example, in a Power BI model, the value 2.5 represents January 1, 1900, at noon.

You have three choices for storing your date/time data. The Date/Time data type stores both dates and times. The Date data type stores only dates, meaning that this data type is equivalent to whole numbers. The Time data type only stores the time part, or decimals.
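The serial-number scheme described above can be mimicked in Python. This is a sketch of the numbering, not of the model's internal storage: since the number 1 corresponds to December 31, 1899, the implied zero point is December 30, 1899.

```python
from datetime import datetime, timedelta

# Zero point implied by "1 = December 31, 1899" in the Power BI model
BASE = datetime(1899, 12, 30)

def serial_to_datetime(serial):
    """Whole part selects the date, fractional part the time of day."""
    return BASE + timedelta(days=serial)

print(serial_to_datetime(1))    # 1899-12-31 00:00:00
print(serial_to_datetime(2.5))  # 1900-01-01 12:00:00
```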

True/False: The True/False or Boolean data type can only store two values: True and False. Though of limited use, data is stored very efficiently using this type.

Binary: The Binary data type is used to store data that cannot be represented as text, like image data or documents. It is not possible to perform aggregations or calculations with this data type, but it may be used for storing images for use in reports.
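One way to picture the Fixed Decimal Number type is as an integer scaled by 10,000: exactly four decimal places are kept, and everything else is whole-number arithmetic. The following Python sketch illustrates that idea under this assumption; the model's actual storage is internal to the engine, and these helper names are invented for the example:

```python
from decimal import Decimal

SCALE = 10_000  # four fixed decimal places

def to_fixed(value):
    """Store a currency-like amount as a scaled integer."""
    return round(Decimal(str(value)) * SCALE)

def from_fixed(raw):
    """Recover the amount from its scaled-integer form."""
    return Decimal(raw) / SCALE

raw = to_fixed(19.995)
print(raw)              # 199950
print(from_fixed(raw))  # 19.995
```

This also shows why the type trades range for exactness: the four decimals consume part of the 19 available digits, but amounts within range round-trip without the binary floating-point noise a Decimal Number can pick up.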