Microsoft Power BI Data Analyst Certification Guide - Orrin Edenfield - E-Book

Microsoft Power BI Data Analyst Certification Guide E-Book

Orrin Edenfield

0,0
33,59 €

-100%
Sammeln Sie Punkte in unserem Gutscheinprogramm und kaufen Sie E-Books und Hörbücher mit bis zu 100% Rabatt.

Mehr erfahren.
Beschreibung

Microsoft Power BI enables organizations to create a data-driven culture with business intelligence for all. This guide to achieving the Microsoft Power BI Data Analyst Associate certification will help you take control of your organization's data and pass the exam with confidence.
From getting started with Power BI to connecting to data sources, including files, databases, cloud services, and SaaS providers, to using Power BI’s built-in tools to build data models and produce visualizations, this book will walk you through everything from setup to preparing for the certification exam. Throughout the chapters, you'll get detailed explanations and learn how to analyze your data, prepare it for consumption by business users, and maintain an enterprise environment in a secure and efficient way.
By the end of this book, you'll be able to create and maintain robust reports and dashboards, enabling you to manage a data-driven enterprise, and be ready to take the PL-300 exam with confidence.

Das E-Book können Sie in Legimi-Apps oder einer beliebigen App lesen, die das folgende Format unterstützen:

EPUB
MOBI

Seitenzahl: 378

Veröffentlichungsjahr: 2022

Bewertungen
0,0
0
0
0
0
0
Mehr Informationen
Mehr Informationen
Legimi prüft nicht, ob Rezensionen von Nutzern stammen, die den betreffenden Titel tatsächlich gekauft oder gelesen/gehört haben. Wir entfernen aber gefälschte Rezensionen.



Microsoft Power BI Data Analyst Certification Guide

A comprehensive guide to becoming a confident and certified Power BI professional

Orrin Edenfield

Edward Corcoran

BIRMINGHAM—MUMBAI

Microsoft Power BI Data Analyst Certification Guide

Copyright © 2022 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Publishing Product Manager: Heramb Bhavsar

Senior Editor: David Sugarman

Content Development Editor: Joseph Sunil

Technical Editor: Devanshi Ayare

Copy Editor: Safis Editing

Project Coordinator: Aparna Ravikumar Nair

Proofreader: Safis Editing

Indexer: Pratik Shirodkar

Production Designer: Sinhayna Bais

Marketing Coordinator: Priyanka Mhatre

First published: June 2022

Production reference: 4120822

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80323-856-2

www.packt.com

Contributors

About the authors

Orrin Edenfield has 20 years of experience in designing and implementing data and analytics solutions. Orrin contributes to Microsoft certification exams as well as architecture best practices and blueprints, helping to solve BI problems using Power BI and more. As a cloud solution architect, Orrin helps customers solve technical obstacles and helps develop Azure platforms.

Edward Corcoran has spent 20 years working in data warehousing and business intelligence. As a cloud solution architect, Ed helps customers use Microsoft data and analytics services to solve technical obstacles and helps develop customer implementation projects to ensure easy adoption. Ed also collaborates with product engineering to aid with the development of Microsoft products and Azure platforms.

About the reviewers

Udit Kumar Chatterjee is an experienced data analytics and data science professional with a demonstrated history of working in statistical data analysis, insights mining, and business analytics. He has worked across various sectors, including manufacturing, market research, retail, and accounting, and with reputed and big-4 firms. He is an MCT (Microsoft Certified Trainer) and a 13-times Microsoft Certified Professional, including holding the PL-300 certification and the MCSE (Microsoft Certified Solutions Expert) certification. He likes to empower and educate people in the field of data by actively contributing to various community platforms and LinkedIn.

I wish to thank my beloved parents for inspiring and encouraging me to achieve more on both the personal and professional fronts.

Joseph Gnanaprakasam is a data architect, husband, and father living in Virginia. He has over a decade of experience in building data engineering and business intelligence solutions. Recently, he has started sharing his musings on data at joegnan.com. He is an avid photographer and enjoys traveling.

Table of Contents

Preface

Part 1 – Preparing the Data

Chapter 1: Overview of Power BI and the PL-300 Exam

A brief overview of Power BI

Power BI for business intelligence

Power BI as a solution

Why get certified?

PL-300 Analyzing Data with Microsoft Power BI

Microsoft tests

Timelines

Strategies to get a passing grade

Summary

Questions

Chapter 2: Connecting to Data Sources

Technical requirements

Identifying data sources

Local data sources, files, and databases

Cloud and SaaS data sources

Connecting to data sources

On-premises data gateway

Exploring query types

Power BI datasets

Power BI dataflows

Query performance tuning

Reducing the data size

DirectQuery optimization

Composite model optimization

Advanced options (what-if parameters, Power Query parameters, PBIDS files, and XMLA endpoints)

What-if parameters

Power Query parameters

PBIDS files

XMLA endpoints

Summary

Questions

Chapter 3: Profiling the Data

Technical requirements

Identifying data anomalies

Interrogating column properties

Examining data structures

Interrogating data statistics

Column distribution

Column profile

Summary

Questions

Chapter 4: Cleansing, Transforming, and Shaping Data

Technical requirements

Accessing Power Query in Power BI

Sorting and filtering

Managing columns

Using column transformations

Transforming any data type columns

Transforming text columns

Transforming number columns

Transforming date and time columns

Adding columns

Using row transformations

Combining data

Using merge queries

Append queries

Combine files

Enriching data with AI

Language detection

Key phrase extraction

Sentiment analysis

Image tagging

Azure ML

Using advanced operations of Power Query

Using the Advanced Editor

Using the Query Dependencies tool

R and Python scripts

Summary

Questions

Part 2 – Modeling the Data

Chapter 5: Designing a Data Model

Technical requirements

Define the tables

Flatten out a parent-child hierarchy

Star schema

Defining relationships

Cardinality

Cross-filter direction

Relationship test tips

Define role-playing dimensions

Date table as a role-playing dimension

Configure table and column properties

The General section

The Formatting section

The Advanced section

Define quick measures

Resolve many-to-many relationships

Create a common date table

Power BI date hierarchy tables

Using your own date table

Date math

Model size

Role-playing with our date table

Define the appropriate level of data granularity

Design the data model to meet performance requirements

Summary

Questions

Chapter 6: Using Data Model Advanced Features

Technical requirements

Using sensitivity labels

Implementing row-level security

Setting up row-level security

Managing row-level security

Applying natural-language Q&A capability

Using Q&A in reports and dashboards

Q&A linguistic models

Optimizing Q&A in data models

Summary

Questions

Chapter 7: Creating Measures Using DAX

Technical requirements

Building complex measures with DAX

Quick measures

Creating your own measure

Measures versus calculated columns

Default summarization

Context is everything!

Using CALCULATE to manipulate filters

Simple filtering

The FILTER function

The ALL function

Implementing time intelligence using DAX

Date tables

Role-playing dimensions

Replacing numeric calculated columns with measures

The X functions

When to use calculated columns

When to use measures

Using basic statistical functions to enhance data

Changing the default summarization

Binning and grouping histograms

Implementing top N analysis

Ranking function

Top N functions

Creating semi-additive measures

Additive measures

Non-additive measures

Semi-additive measures

Summary

Questions

Chapter 8: Optimizing Model Performance

Technical requirements

Optimizing data in the model

Removing unnecessary rows and columns

Splitting numeric and text column data

Optimizing measures, relationships, and visuals

Optimizing relationships

Optimizing visuals

Optimizing with aggregations

Query diagnostics

Session diagnostics

Step diagnostics

Understanding query diagnostics

Summary

Questions

Part 3 – Visualizing the Data

Chapter 9: Creating Reports

Technical requirements

Understanding the capabilities of Power BI

Adding visualization items to reports

Choosing an appropriate visualization type

Table and matrix visualizations

Bar and column charts

Line and area charts

Pie chart, donut chart, and treemaps

Combination charts

Card visualization

Funnel visualization

Gauge chart

Waterfall chart

Scatter chart

Map visuals

Q&A visualization

Formatting and configuring visualizations

Formatting options for a visualization

Importing a custom visual

Configuring conditional formatting

Configuring small multiples

Applying slicing and filtering

Adding an R or Python visual

Adding a smart narrative visual

Configuring the report page

Designing and configuring for accessibility

Report accessibility checklist

Configuring automatic page refresh

Creating a paginated report

Using Power BI datasets in Excel PivotTables

Summary

Questions

Chapter 10: Creating Dashboards

Technical requirements

Introducing Power BI dashboards

Creating a dashboard

Setting a dashboard theme

Using a dashboard

Pinning tiles

Optimizing dashboards

Configuring views of a dashboard

Optimizing the performance of a dashboard

Summary

Questions

Chapter 11: Enhancing Reports

Technical requirements

Using bookmarks

Using the selection pane

Creating custom tooltips

Interactions between visuals

Configuring navigation for a report

Applying sorting

Sync slicers

Using drillthrough and cross-filter

Drilling down into data using interactive visuals

Exporting report data

Designing reports for mobile devices

Summary

Questions

Part 4 – Analyzing the Data

Chapter 12: Exposing Insights from Data

Technical requirements

Exploring slicers and filters

The Analytics pane

Summary

Questions

Chapter 13: Performing Advanced Analysis

Technical requirements

Identifying outliers

Using anomaly detection

Conducting time series analysis

Grouping and binning

Grouping

Binning

Key influencers

Decomposition tree visual

Applying AI insights

Summary

Questions

Part 5 – Deploying and Maintaining Deliverables

Chapter 14: Managing Workspaces

Technical requirements

Using workspaces

Using workspace roles

Workspace licensing

Distributing reports and dashboards

Creating a Power BI app

Using deployment pipelines

Creating a deployment pipeline

Unassigning a workspace to a deployment pipeline stage

Automating deployment pipelines

Monitoring workspace usage

Using usage reports

Summary

Questions

Chapter 15: Managing Datasets

Technical requirements

Configuring a dataset scheduled refresh

Identifying when a gateway is required

Configuring row-level security group membership

Providing access to datasets

Summary

Questions

Part 6 – Practice Exams

Chapter 16: Practice Exams

Practice test 1

Practice Test 2

Answer keys

Practice Test 1

Practice Test 2

Appendix: Practice Question Answers

Chapter 1, Overview of Power BI and the PL-300 Exam

Chapter 2, Connecting to Data Sources

Chapter 3, Profiling the Data

Chapter 4, Cleansing, Transforming, and Shaping Data

Chapter 5, Designing a Data Model

Chapter 6, Using Data Model Advanced Features

Chapter 7, Creating Measures Using DAX

Chapter 8, Optimizing Model Performance

Chapter 9, Creating Reports

Chapter 10, Creating Dashboards

Chapter 11, Enhancing Reports

Chapter 12, Exposing Insights from Data

Chapter 13, Performing Advanced Analytics

Chapter 14, Managing Workspaces

Chapter 15, Managing Datasets

Other Books You May Enjoy

Part 1 – Preparing the Data

This section covers how you can connect to various data sources, view data, and modify, cleanse, and prepare data for use in a data visualization, report, or dashboard.

This section comprises the following chapters:

Chapter 1, Overview of Power BI and the PL-300 ExamChapter 2, Connecting to Data SourcesChapter 3, Profiling the DataChapter 4, Cleaning, Transforming, and Shaping Data

Chapter 1: Overview of Power BI and the PL-300 Exam

First off, welcome to this study guide.

There are many reasons why you may have gotten this book. You may be required by work to get certified in Power BI. You may be looking for a new job or career advancement and realize that technical certifications provide a great means to demonstrate technical mastery. You may just want to learn more about Power BI and realize that a study guide would, by necessity, provide an overview of the entire landscape of Power BI.

In this chapter, we'll be covering the following topics:

A brief overview of Power BIWhy get certified?PL-300 Analyzing Data with Power BI

A brief overview of Power BI

The BI in Power BI stands for business intelligence. Business intelligence is a field of technology that concerns itself with everything from reporting to using math to predict the future. You may have heard it referred to by some other names, such as data mining or analytics.

Whatever name it is called in your organization, the goal of business intelligence is to distill the massive volume of data gathered and generated by modern businesses into actionable intelligence.

Basing your plans on data-driven decision making will allow you a deeper understanding of not only what you are doing but why. Data-guessing decision making or, worse, we've always done it this way decision making will become anathema to your data culture.

The reason for this is that your competitors will start driving their business decisions based on data analytics. Their businesses will become more intelligent, see market opportunities and trends before you do, and respond to customer needs, wants, and desires faster than you.

Microsoft defines business intelligence this way (https://powerbi.microsoft.com/en-us/what-is-business-intelligence/):

Business intelligence (BI) helps organizations analyze historical and current data, so they can quickly uncover actionable insights for making strategic decisions. Business intelligence tools make this possible by processing large data sets across multiple sources and presenting findings in visual formats that are easy to understand and share.

The key to this is that business intelligence must provide a business with "actionable insights." Which customer segment should we spend our marketing dollars on? What trucks will be off the road next month for scheduled maintenance? How many hours has that pump been running since we put it into production? Who has signed up to bring cupcakes to the bake sale? These questions and many more are answered every hour of every day by businesses and governments around the world.

Power BI for business intelligence

Power BI is Microsoft's premier enterprise data visualization tool for modern businesses. Power BI is also Microsoft's reporting tool for "citizen developers." Power BI is easy enough to use that anyone with a familiarity with Microsoft Excel should be able to understand and use it. Power BI is also powerful enough that it is the primary reporting tool for some of the largest companies in the world.

Power BI allows users to create interactive reports that lead to actionable intelligence for business decision making. Although Power BI is usually thought of as a reporting tool, it's also a complete business intelligence solution. It can, and often is, the entry point for businesses that need to start making data-driven decisions. For some businesses, Power BI provides all the business intelligence they will ever need.

As you will find in this book, Power BI is a collection of services, applications, connectors, and software. These things all work together to turn your data into actionable insights by turning that data into interactive, immersive reports and dashboards. To do these things, Power BI requires data.

Data is at the heart of Power BI. But there is a huge problem in modern businesses…

Reporting challenges

Data is everywhere. One of the biggest challenges in modern businesses is trying to get an end-to-end view of what is happening now or what has happened in the past. Many businesses have data in disparate locations. Data is spread out, some of it on-premises and some in the cloud. Companies try to keep important data in large relational databases but many times, crucial information is contained in Excel spreadsheets or in a SharePoint document library.

It is often very difficult for modern business users to see a complete picture of what is happening across the entire enterprise.

Power BI provides an overall, holistic view of all data within your business, providing that single pane of glass that shows what is happening everywhere within the business. With Power BI, you can see dashboards and reports that display rich, interactive visualizations and KPIs from data that can be residing both on-premises and in the cloud.

It has been said that data is the new gold. Data is the new oil. Businesses value their data estates as much as their manufacturing equipment or supply chains. Your data is a valuable asset.

As businesses learn to use their data, they also learn the importance of having that data. But data is only useful if it can be turned into actions.

For example, Cerner is a global healthcare technology company. They track more than 80 million patient visits every year. Cerner uses Power BI to help streamline the healthcare process, providing valuable insights in seconds instead of the weeks it used to take.

It's not just used in healthcare. The world of retail is being transformed by access to real-time information. T-Mobile uses Power BI to grant front-line workers access to analytical data so they can do their jobs better. Managers and associates can see activations, scorecards, and traffic numbers as they are generated. This allows managers to immediately allocate resources where needed and associates to see whether they are meeting their goals.

Financial companies are usually at the forefront of modernization. It's not just the giants of finance that are adopting Power BI; Members 1st Credit Union is a small credit union located in rural Pennsylvania. Operating 56 small- to mid-sized branches meant that monthly reports took hundreds of employee hours to collate and analyze. By taking advantage of some of Power BI's data features, such as automated refreshes and drillthrough analysis, Members 1st was able to save more than 10,000 hours a year, which is huge for a small, rural financial institution. (Full disclosure, one of the authors banks at Members 1st.)

As you will see in upcoming chapters, not only can Power BI connect to many different data sources, but Power BI can also combine data sources. Power BI is designed from the ground up to allow a user to easily bring data from multiple sources together in one location. These connections allow you to see data from your Enterprise Resource Planning (ERP) system mixed with data in an Excel spreadsheet and GitHub data.

You can bring together data from hundreds of sources and mix them together to discover new facts, new correlations, and new data points about your business.

Power BI as a solution

Power BI has two main versions: Power BI Desktop and the Power BI service.

Power BI Desktop

Power BI Desktop is a visual data exploration and interactive reporting tool, providing a free-form canvas for drag-and-drop exploration of your data, an extensive library of interactive visualizations, and an authoring experience for ease of report creation for the Power BI service. It produces interactive reports and data models.

Power BI Desktop is a free, downloadable Windows desktop application optimized for the Power BI service. Although it is a Microsoft Office application, sharing much of its user interface with products such as Excel and Word, it does not require or depend on Microsoft Office.

With Power BI Desktop, you get an application that specializes in delivering interactive visualizations for data analysis. With it, you can manipulate and consolidate multiple data sources into one report, allowing you to see data from disparate sources on one pane of glass.

The Power BI service, sometimes referred to as app.powerbi.com, allows you to create beautiful visualizations to tell compelling data stories. It's optimized to build rich, live dashboards that turn data into business insights.

With the Power BI service, you can securely share reports, dashboards, and Power BI apps with other people in your company, or even with trusted vendors and partners. This secure sharing is one of the biggest reasons for the popularity of Power BI.

The Power BI service also allows you to see your data on the go. With the Power BI mobile app, you can securely see your reports and dashboards from anywhere in the world.

So, with all of the benefits of Power BI in mind, let's consider the certification.

Why get certified?

Probably one of the biggest reasons to get certified in Power BI, or any technology really, is to demonstrate to your employer, or a future employer, that you understand and know what it does and how it works. Employers are often looking for people with knowledge and experience. Certification can let you easily demonstrate your knowledge.

Even if you are not looking for a new job, your PL-300 certification will demonstrate your Power BI knowledge to your boss, co-workers, and everyone who sees your LinkedIn profile.

Beyond just demonstrating technical knowledge, another key benefit is that you keep current with all the changes in Power BI. The process of gaining and keeping your PL-300 certification necessitates that you learn about Power BI.

Yes, that previous sentence said keeping your certification. Microsoft announced that starting December 15, 2020, you will have to renew your PL-300 certification every 2 years. We'll cover this more in the next section.

PL-300 Analyzing Data with Microsoft Power BI

The PL-300 Analyzing Data with Microsoft Power BI exam is a Microsoft role-based exam. Starting in 2019, Microsoft announced they were moving to a role-based model of exams and certifications.

Historically, Microsoft exams were centered around a single product, such as Windows Server or Exchange. Microsoft realigned their entire learning and certification process around the idea of roles. Currently, Microsoft has organized exams into 12 different roles. PL-300 is a test for the Data Analyst Associate track.

Because Microsoft wants to keep their certifications relevant and valuable, they target the questions of the exam to the level of the test. The PL-300 exam is targeted at an intermediate level. This means, as you will see in the Knowledge needed to pass section, this test covers a lot of ground.

Microsoft tests

Microsoft has been testing and certifying people for decades. Over that time, the tests have evolved and become much more complex. Microsoft exams are not just multiple-choice questions. During your exam, you will be presented with different types of questions, depending on what Microsoft considers the best way to make certain you know the answer.

Microsoft does not deduct points for wrong answers. Make certain to answer every question, even if you are uncertain.

PL-300 is about 55 questions long and you are given 3 hours to take the exam. Plan on it taking 3½ hours, as there will be some prep before the test. There is also an optional survey at the end.

You will have to score 700 out of a possible 1,000 points to pass the exam.

Types of questions

Some of the types of questions you might face include the following:

Multiple choice – These types will more than likely make up much of the PL-300 exam.Best answer – Perhaps my least favorite is this type of question. Much like multiple choice, but with the possibility that more than one answer will work and only one of the answers is "best."Build list – These questions present you with a scenario, then a list of steps that can be used to solve the scenario. You drag the steps from one side of the screen to the other and place them in the correct order. Usually, not all of the presented steps are part of the solution. Do not worry if you have steps left over.Drag and drop – Much like the build list type, with drag and drop you will be presented with a scenario and a series of processes or technologies. You match the process or technology to an answer by dragging it from the list to where it matches. As with the build list, Microsoft often provides more answers than needed. You may see this in the DAX section, where you will have to drag DAX commands into the appropriate place in the script.Active screen – This type of question will present you with a scenario part of a user interface. You must complete actions to achieve the desired outcome as specified by the scenario. You may see this type of question during the administration part of the exam. You may be asked to make a selection in the Power BI admin user interface.

Knowledge needed to pass

Microsoft publishes a list of topics covered by the PL-300 exam, and what percentage of the overall grade each topic will be worth. As you are going through the list, you may notice that it tracks with the layout of this book. This was intentional. If Microsoft is going to provide us with an outline, we are more than pleased to use it.

Here is the current list of exam topics and what percentage of your overall grade they are worth:

Prepare the Data (20-25%)Model the Data (25-30%)Visualize the Data (20-25%)Analyze the Data (10-15%)Deploy and Maintain Deliverables (10-15%)

You may notice from the table of contents that this book is broken down into sections that follow these topics in this order. That was intentional.

Timelines

Microsoft has decided to make most of their role-based and specialty certifications valid for 2 years from the date of achievement. Six months before your certification expires, Microsoft will provide a link on your Certification Dashboard (https://aka.ms/CertDashboard) for you to take an assessment. Passing the assessment on Microsoft Learn is the only way to renew a certification. Renewing your certification will then become an annual requirement.

You will also receive an email telling you it's time to renew.

This process ensures that you, and everyone else with the PL-300 certification, is up to date with all the changes in Power BI. As Power BI can change radically within a year, this is a good thing. The fact that renewal is free is amazing.

The best part is that you can retake the renewal test as many times as you need. You must pass it before your certification expires, but you have 6 months, unlimited retakes, and access to the internet while taking the renewal.

If you do not pass your renewal assessment, then you must pay for and pass the PL-300 exam again.

Strategies to get a passing grade

So, now we know what's on the test, let's talk about how to prepare to take the test. If you search the internet, you will find many slightly different strategies for studying for and passing Microsoft certification exams:

Set a date – The most important step is the first one. If you set a date, it allows you to work backward from that date to plan your studying. Some people even go as far as to purchase their test voucher for a date in the future, thus committing them to that data. Buy a nice study guide – Might I recommend this one? Get hands-on – Most of the test is centered around Power BI Desktop, a free-to-use application; no license is required. You can download Power BI Desktop from https://powerbi.com or, as I prefer to do, install it from the Microsoft Store. The Microsoft Store version will get the monthly updates automatically.Learn the technology – This study guide is great, but nothing is better than getting hands-on with the technology. Create reports for your work, for yourself, and for your friends. Learn the vocabulary – Knowing what things are called will help immensely with the test. Many times, you can eliminate one or two answers from a multiple-choice question just by knowing the terminology. Know what to expect (again, a nice study guide is a good idea) – Look at the table of contents for this book. Review the previous Knowledge needed to pass section. Make sure you have at least a passing familiarity with each subheading.

Do not stress out! Power BI is intuitive and fun. Plus, Microsoft does not put a record of failed attempts on your transcript.

Summary

In this chapter, we covered the basics of why Power BI is a great reporting tool. We also went over Microsoft certifications, what they are, why they're great, and what's on the one you will be studying for. We also reviewed the types of questions asked and the topics that will be covered in this exam.

In the next chapter, we will dive right into what will be on the exam. As with most reporting things, we start with data. Power BI is great at aggregating data from disparate sources. We will cover the connection to those sources and how to organize your data for better reporting.

Questions

What does the "BI" in Power BI stand for?Business informationBidirectional informationBusiness intelligenceBig industryThe Power BI service allows you to:Download software allowing you to explore data and create reports.Create spreadsheets that calculate values based on formulas.Author presentations using the slideshow concept.Share reports, dashboards, and apps with other users.How often must Microsoft certifications be renewed?Never, they last forever.Every 6 months.Every 2 years.Every 4 years.

Chapter 2: Connecting to Data Sources

In most organizations, data tends to be stored in various data stores, such as filesystems, proprietary and open source databases, or even distributed filesystems for high-performance compute platforms. Often the data has meaning and is useful while being stored in the source systems, such as a transactional database that keeps track of sales from a group of point-of-sale systems. In this example, data is stored in a relational database that is tuned to keep track of each sale. For analytics purposes, we will likely want to use this data in concert with data from a separate system that tracks the inventory of items we have for sale. The inventory will likely be a different relational database, possibly from another technology vendor. To better understand whether we are stocking too many items (or not enough) for sale, we need to create a view of the data from both sales and inventory databases.

Over the past few decades, this has been the goal of data warehousing – to allow data from all over an organization to be combined to help answer meaningful business questions with the goal of helping organizations to run more efficiently, and to help them enable further transformations to help in the competing marketplace. The data warehousing industry exists to help make this easier, but this is always an increasing challenge as new technologies for data storage and processing arise.

The end goal of data warehousing is to allow businesses to have a 360-degree view of their business, and to do that not only do data warehouses need to connect to disparate data sources but so do BI and other reporting tools. Many times, data is stored outside the enterprise data warehouse that also needs to be incorporated into the analysis. The rise of more popular Software-as-a-Service (SaaS) providers also means that data can live in the cloud, and it also needs to be incorporated into BI and reporting analysis in order to provide the whole picture to the business decision makers.

Microsoft Power BI provides some of the most comprehensive connectivity capabilities on the BI reporting and analytics tools on the market. This means Power BI can connect to disparate data sources for use in creating reports and dashboards.

In this chapter, we're going to cover the following topics:

Identifying data sourcesConnecting to data sourcesPower BI datasetsPower BI dataflowsQuery performance tuningAdvanced options (what-if parameters, DAX parameters, PBIDS files, and XMLA endpoints)

Technical requirements

The following are the prerequisites in order to complete the work in this chapter:

Microsoft Power BI Desktop installed on a Microsoft Windows PC.Access to some data to use. We're also providing synthetic data that can be used. This is available in the GitHub repository for this book here: https://github.com/PacktPublishing/Microsoft-Power-BI-Data-Analyst-Certification-Guide/tree/main/example-data.

Identifying data sources

In this section, we will review the various data source options that Power BI provides.

Local data sources, files, and databases

Most BI developers will work from a local Windows PC. Usually, that PC is not also running an enterprise database or functioning as a corporate file server. Power BI provides the ability to connect local data on your PC just the same as if the data is stored on a corporate file server (using network connectivity or Windows file share) or if you're running a development server on your local machine. For ad hoc and testing purposes, many users will also want to import data files, such as CSV or Excel, from their local PC as well. Power BI supports various formats and makes it easy to import local files/data.

You can also import a folder of files or a Microsoft SharePoint folder of files.

Power BI also supports connecting to databases in the most popular databases. Most enterprise organizations run their business using data stored in databases, so sourcing data from a database is a common occurrence using Power BI. Some of the most popular databases include Microsoft SQL Server, Oracle Database, and SAP.

To see a complete list of supported files and databases, be sure to review the official documentation at https://docs.microsoft.com/power-bi/connect-data/power-bi-data-sources.

Cloud and SaaS data sources

In recent years, as more organizations have seen the value and adopted the cloud to help digitally transform their business, support for common cloud databases and SaaS providers has also been very important. Power BI supports a wide range of Microsoft Azure cloud services, including Azure Synapse Analytics and Amazon Redshift.

Outside of these dedicated connectors to databases and cloud services, Power BI also includes capabilities that enable connectivity using open or standards-based solutions, such as ODBC, REST API, and OData.

To see a complete list of supported cloud and SaaS data sources, check the official documentation at https://docs.microsoft.com/power-bi/connect-data/power-bi-data-sources.

It's important for BI tools to connect to the data sources where data is stored. If an organization uses a data store that the BI tool does not support, then the data needs to be moved to a supported data store. For the exam, it is important to know some of the common data sources supported by Power BI. The wide connectivity to data has helped Power BI become one of the leading BI tools on the market today.

Connecting to data sources

You may want to include data from a variety of different sources, each of which has its own methods of connecting. We'll look at a couple of types of data sources in this section.

On-premises data gateway

Since Power BI lives as both a desktop tool and an online SaaS that provides the hosting of reports and dashboards, it is important to connect to non-online-hosted data sources as well. This is accomplished with the on-premises data gateway (referred to as the data gateway). The data gateway is a free download from the Microsoft Download Center that works with the online Power BI service to enable this connectivity on local resources from the cloud.

It is important to understand how the on-premises data gateway is different from other gateway software that may have been used in the past. The data gateway software runs on a local Windows computer, and it needs to have an internet connection to connect to an intermediary Azure Service Bus. This encrypted connection to Azure Service Bus (provided by the Power BI service) allows the data gateway to work with the Power BI service, without needing any firewall ports to be opened. This allows a cloud service to connect to the local network. Most enterprise network security teams want to maintain or consistently improve the security posture of their organization and using the on-premises data gateway architecture will help those teams meet security goals as well.

Please note that the data gateway supports different data sources in different ways. Some sources are supported and some are not; it's important to make note of the sources you intend to use and check for the needed support, either directly from the Power BI service or using the on-premises data gateway.

Figure 2.1 – Architecture of the on-premises data gateway

A data gateway can function in two modes: personal mode and standard mode. Standard mode is typically used in organizations on server hardware and provides DirectQuery and live connection to Analysis Services support. Standard mode, however, must run as a service on the computer and requires administrator privileges, while personal mode does not. When running in standard mode, this data gateway also works with Azure Analysis Services, Azure Logic Apps, Power Apps, Power Automate, and Power BI dataflows.

The data gateway is available for download from https://powerbi.microsoft.com/gateway/.

Anytime Power BI connects to a data source (directly or with the on-premises data gateway), it uses one of two types of queries. Next, we'll look at query types.

Exploring query types

When Power BI connects to any data source, it makes one of two types of connections or queries.

DirectQuery

The first type is called DirectQuery, or a live connection to the data. This means that Power BI is storing credentials and a connection string to the data source. For example, this could be the relational database storing inventory data used in a warehouse. This DirectQuery connection allows Power BI to query the inventory database and retrieve information such as the tables and views available in the source database, the schema of those tables and views, as well as records contained in the tables and views. Once a connection has been made, the data retrieved is used and displayed by Power BI. DirectQuery is useful when the underlying data has the potential to be changed quickly and the most up-to-date information is needed for use in Power BI reports and dashboards. For example, if a real-time inventory system has been implemented in the warehouse and the inventory system itself keeps an up-to-the-second record of goods for sale, then it's possible the purchasing team may also need up-to-the-second reports that allow them to make the most informed decisions when it comes to refilling stock in the warehouse.

Import Query

The second type of connection is called Import Query. Import will connect to a data source and store the same credentials and connection string, but it will also import the data and store that in memory