Extending Power BI with Python and R - Luca Zavarella - E-Book

Description

Python and R allow you to extend Power BI capabilities to simplify ingestion and transformation activities, enhance dashboards, and highlight insights. With this book, you'll be able to make your artifacts far more interesting and rich in insights using analytical languages.
You'll start by learning how to configure your Power BI environment to use your Python and R scripts. The book then explores data ingestion and data transformation extensions, and advances to focus on data augmentation and data visualization. You'll understand how to import data from external sources and transform them using complex algorithms. The book helps you implement personal data de-identification methods such as pseudonymization, anonymization, and masking in Power BI. You'll be able to call external APIs to enrich your data much more quickly using Python programming and R programming. Later, you'll learn advanced Python and R techniques to perform in-depth analysis and extract valuable information using statistics and machine learning. You'll also understand the main statistical features of datasets by plotting multiple visual graphs in the process of creating a machine learning model.
By the end of this book, you’ll be able to enrich your Power BI data models and visualizations using complex algorithms in Python and R.

You can read this e-book in the Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Number of pages: 574

Year of publication: 2021




Extending Power BI with Python and R

Ingest, transform, enrich, and visualize data using the power of analytical languages

Luca Zavarella

BIRMINGHAM—MUMBAI

Copyright © 2021 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Publishing Product Manager: Sunith Shetty

Senior Editor: David Sugarman

Content Development Editor: Joseph Sunil

Technical Editor: Rahul Limbachiya

Copy Editor: Safis Editing

Project Coordinator: Aparna Nair

Proofreader: Safis Editing

Indexer: Tejal Soni

Production Designer: Nilesh Mohite

First published: November 2021

Production reference: 1221021

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80107-820-7

www.packt.com

To my wife, who was patient in all the time writing the book took away from us. To my group of friends, "5-5-5 tables," who always had the right words to motivate and support me in the most difficult moments during this adventure. To my mentors, who have been able to instill in me the curiosity and tenacity useful to learn skills with which to overcome the most insidious technical difficulties. To everyone I have met so far in my professional journey who has allowed me to delve into the topics I am now an expert in.

- Luca Zavarella

Foreword

It has been my pleasure to know Luca Zavarella and his work since 2018. Luca and I met for the first time at a machine learning and data science conference in London, and since then we have shared a genuine passion and extensive knowledge about the data science community and the latest technologies that are empowering the community.

Luca is one of the most brilliant professionals I know in the advanced analytics industry: his mission has always been to simplify the complexity of BI tools and programming languages for data science professionals.

Most people think that data science is all about building machine learning models that make accurate predictions. Building these models is a subset of the end-to-end data science workflow, and BI technologies, such as Power BI, play a pivotal role in this workflow that supports successful data science projects.

Such a versatile and powerful tool, combined with open source languages such as Python and R, is essential for anyone who needs to do data ingestion and transformation to build business-ready dashboards and reports, and to prepare their data to feed machine learning algorithms.

Learning data science is a process that is continuous in nature. In order to succeed in this process, you need to commit to the process of understanding how to apply the latest technologies, such as Python and R, to your data, enrich your analyses, and accelerate your career.

Become a data analyst superhero and build actionable insights and data science applications with Power BI, R, and Python!

Francesca Lazzeri, PhD

Principal Data Scientist Manager, Microsoft

Adjunct Professor, Columbia University

Contributors

About the author

Luca Zavarella is certified as an Azure Data Scientist Associate and is also a Microsoft MVP for artificial intelligence. He graduated in computer engineering at the Faculty of Engineering of L'Aquila University, and he has more than 10 years of experience working on the Microsoft Data Platform. He started his journey as a T-SQL developer on SQL Server 2000 and 2005. He then focused on all the Microsoft Business Intelligence stack (SSIS, SSAS, SSRS), deepening data warehousing techniques. Recently, he has been dedicating himself to the world of advanced analytics and data science. He also graduated with honors in classical piano at the Conservatory "Alfredo Casella" in L'Aquila.

About the reviewers

Back in 2010, Riccardo started working in information technology. After a brief experience as an ERP consultant, he moved into the data realm. He spent these years exploring the data world from different points of view, as both a DBA and a business intelligence engineer. He's a Microsoft Certified Professional (MCP), Microsoft Data Analyst Associate, and a Microsoft Certified Solution Associate (MCSA) in the database, BI, and Azure areas, as well as a Microsoft MVP in the data platform category. He's one of the leaders of the Power BI User Group in Italy and he's been a speaker at the Power Platform World Tour and Power Platform Bootcamp stops in Italy. When he can, he travels Europe to speak at and attend events such as Azure Saturdays, SQL Saturdays, and Data Saturdays.

Art Tennick is the author of 20 computer books and over 1,000 magazine and LinkedIn articles, including one of the first books on DAX, written in 2009. He is an independent, freelance consultant in Power BI (since 2009), Analysis Services (since 1998), and SQL Server (since 1993). His main interests are in Analysis Services (on-premises, Azure, and in the Power BI service), Power BI, Power BI Paginated, and the integration of Python and R data science with Power BI, Analysis Services, and SQL Server. You can find him on LinkedIn.

I would like to thank Rita Mendoza, Emma, and Lorna for their love and support over the years.

Table of Contents

Preface

Section 1: Best Practices for Using R and Python in Power BI

Chapter 1: Where and How to Use R and Python Scripts in Power BI

Technical requirements

Injecting R or Python scripts into Power BI

Data loading

Data transformation

Data visualization

Using R and Python to interact with your data

R and Python limitations on Power BI products

Summary

Chapter 2: Configuring R with Power BI

Technical requirements

The available R engines

The CRAN R distribution

The Microsoft R Open distribution and MRAN

Microsoft R Client

Phasing out of Microsoft R Open

Choosing an R engine to install

The R engines used by Power BI

Installing the suggested R engines

Installing an IDE for R development

Installing RStudio

Configuring Power BI Desktop to work with R

Configuring the Power BI service to work with R

Installing the on-premises data gateway in personal mode

Sharing reports that use R scripts in the Power BI service

R visuals limitations

Summary

Chapter 3: Configuring Python with Power BI

Technical requirements

The available Python engines

Choosing a Python engine to install

The Python engines used by Power BI

Installing the suggested Python engines

Installing an IDE for Python development

Configuring Python with RStudio

Configuring Python with Visual Studio Code

Configuring Power BI Desktop to work with Python

Configuring the Power BI service to work with Python

Sharing reports that use Python scripts in the Power BI service

Limitations of Python visuals

Summary

Section 2: Data Ingestion and Transformation with R and Python in Power BI

Chapter 4: Importing Unhandled Data Objects

Technical requirements

Importing RDS files in R

A brief introduction to Tidyverse

Creating a serialized R object

Using an RDS file in Power BI

Importing PKL files in Python

A very short introduction to the PyData world

Creating a serialized Python object

Using a PKL file in Power BI

Summary

References

Chapter 5: Using Regular Expressions in Power BI

Technical requirements

A brief introduction to regexes

The basics of regexes

Checking the validity of email addresses

Checking the validity of dates

Validating data using regex in Power BI

Using regex in Power BI to validate emails with Python

Using regex in Power BI to validate emails with R

Using regex in Power BI to validate dates with Python

Using regex in Power BI to validate dates with R

Loading complex log files using regex in Power BI

Apache access logs

Importing Apache access logs in Power BI with Python

Importing Apache access logs in Power BI with R

Extracting values from text using regex in Power BI

One regex to rule them all

Using regex in Power BI to extract values with Python

Using regex in Power BI to extract values with R

Summary

References

Chapter 6: Anonymizing and Pseudonymizing Your Data in Power BI

Technical requirements

De-identifying data

De-identification techniques

Understanding pseudonymization

What is anonymization?

Anonymizing data in Power BI

Anonymizing data using Python

Anonymizing data using R

Pseudonymizing data in Power BI

Pseudonymizing data using Python

Pseudonymizing data using R

Summary

References

Chapter 7: Logging Data from Power BI to External Sources

Technical requirements

Logging to CSV files

Logging to CSV files with Python

Logging to CSV files with R

Logging to Excel files

Logging to Excel files with Python

Logging to Excel files with R

Logging to an Azure SQL server

Installing SQL Server Express

Creating an Azure SQL database

Logging to an Azure SQL server with Python

Logging to an Azure SQL server with R

Summary

References

Chapter 8: Loading Large Datasets beyond the Available RAM in Power BI

Technical requirements

A typical analytic scenario using large datasets

Importing large datasets with Python

Installing Dask on your laptop

Creating a Dask DataFrame

Extracting information from a Dask DataFrame

Importing a large dataset in Power BI with Python

Importing large datasets with R

Installing disk.frame on your laptop

Creating a disk.frame instance

Extracting information from disk.frame

Importing a large dataset in Power BI with R

Summary

References

Section 3: Data Enrichment with R and Python in Power BI

Chapter 9: Calling External APIs to Enrich Your Data

Technical requirements

What a web service is

Registering for Bing Maps Web Services

Geocoding addresses using Python

Using an explicit GET request

Using an explicit GET request in parallel

Using the Geocoder library in parallel

Geocoding addresses using R

Using an explicit GET request

Using an explicit GET request in parallel

Using the tidygeocoder package in parallel

Accessing web services using Power BI

Geocoding addresses in Power BI with Python

Geocoding addresses in Power BI with R

Summary

References

Chapter 10: Calculating Columns Using Complex Algorithms

Technical requirements

The distance between two geographic locations

Spherical trigonometry

The law of cosines distance

The law of Haversines distance

Vincenty's distance

What kind of distance to use and when

Implementing distances using Python

Calculating distances with Python

Calculating distances in Power BI with Python

Implementing distances using R

Calculating distances with R

Calculating distances in Power BI with R

The basics of linear programming

Linear equations and inequalities

Formulating a linear optimization problem

Definition of the LP problem to solve

Formulating the LP problem

Handling optimization problems with Python

Solving the LP problem in Python

Solving the LP problem in Power BI with Python

Solving LP problems with R

Solving the LP problem in R

Solving the LP problem in Power BI with R

Summary

References

Chapter 11: Adding Statistics Insights: Associations

Technical requirements

Exploring associations between variables

Correlation between numeric variables

Karl Pearson's correlation coefficient

Charles Spearman's correlation coefficient

Maurice Kendall's correlation coefficient

Description of a real case

Implementing correlation coefficients in Python

Implementing correlation coefficients in R

Implementing correlation coefficients in Power BI with Python and R

Correlation between categorical and numeric variables

Considering both variables categorical

Considering a numeric variable and a categorical one

Implementing correlation coefficients in Python

Implementing correlation coefficients in R

Implementing correlation coefficients in Power BI with Python and R

Summary

References

Chapter 12: Adding Statistics Insights: Outliers and Missing Values

Technical requirements

What outliers are and how to deal with them

The causes of outliers

Dealing with outliers

Identifying outliers

Univariate outliers

Multivariate outliers

Implementing outlier detection algorithms

Implementing outlier detection in Python

Implementing outlier detection in R

Implementing outlier detection in Power BI

What missing values are and how to deal with them

The causes of missing values

Handling missing values

Diagnosing missing values in R and Python

Implementing missing value imputation algorithms

Removing missing values

Imputing tabular data

Imputing time-series data

Imputing missing values in Power BI

Summary

References

Chapter 13: Using Machine Learning without Premium or Embedded Capacity

Technical requirements

Interacting with ML in Power BI with data flows

Using AutoML solutions

PyCaret

Azure AutoML

RemixAutoML for R

Embedding training code in Power Query

Training and using ML models with PyCaret

Using PyCaret in Power BI

Using trained models in Power Query

Scoring observations in Power Query using a trained PyCaret model

Using trained models in script visuals

Scoring observations in a script visual using a trained PyCaret model

Calling web services in Power Query

Using Azure AutoML models in Power Query

Using Cognitive Services in Power Query

Summary

References

Section 4: Data Visualization with R in Power BI

Chapter 14: Exploratory Data Analysis

Technical requirements

What is the goal of EDA?

Understanding your data

Cleaning your data

Discovering associations between variables

EDA with Python and R

EDA in Power BI

Dataset summary page

Missing values exploration

Univariate exploration

Multivariate exploration

Variable associations

Summary

References

Chapter 15: Advanced Visualizations

Technical requirements

Choosing a circular barplot

Implementing a circular barplot in R

Implementing a circular barplot in Power BI

Summary

References

Chapter 16: Interactive R Custom Visuals

Technical requirements

Why interactive R custom visuals?

Adding a dash of interactivity with Plotly

Exploiting the interactivity provided by HTML widgets

Packaging it all into a Power BI Custom Visual

Installing the pbiviz package

Developing your first R HTML custom visual

Importing the custom visual package into Power BI

Summary

References

Other Books You May Enjoy

Preface

Python and R allow you to extend Power BI capabilities to simplify ingestion and transformation activities, enhance dashboards, and highlight insights. With this book, you'll gain the ability to make your artifacts far more interesting and rich in insights using analytical languages.

You'll start by learning how to configure your Power BI environment to use your Python and R scripts. Next, the book explores data ingestion and data transformation extensions and focuses on data augmentation and data visualization. You'll understand how to import data from unhandled objects and transform them with regular expressions and complex algorithms. The book helps you implement personal data de-identification methods such as pseudonymization and anonymization in Power BI. You'll be able to call external APIs to enrich your data much more quickly using Python and R. Later, you'll learn advanced Python or R techniques to perform in-depth analysis and extract valuable information using statistics and machine learning. You'll also understand the main statistical features of datasets by plotting multiple visual graphs of them in the process of understanding relationships between their variables.

By the end of this book, you'll be able to enrich your Power BI data model and visualizations using complex algorithms in Python and R.

Who this book is for

This book is for business analysts, business intelligence professionals, and data scientists who already use Microsoft Power BI and want to add more value to their analysis using Python and R. Working knowledge of Power BI is required to make the most of this book. Basic knowledge of Python and R will also be helpful.

What this book covers

Chapter 1, Where and How to Use R and Python Scripts in Power BI, gives an introduction to Power BI and dives into the integration of Python and R with Power BI, and how you can interact with your data while using them. It also covers the limitations that Power BI imposes on both Python and R.

Chapter 2, Configuring R with Power BI, looks at how to get the analytical language engines introduced in the previous chapter up and running and gives you some general guidelines on how to pick the most appropriate one for your needs. After that, we'll look at how to make these engines interface with both Power BI Desktop and the Power BI service. Finally, we will give some important tips on how to overcome some stringent limitations of R visuals in the Power BI service.

Chapter 3, Configuring Python with Power BI, shows how to install the Python engines on your machine. You'll also see how to configure some IDEs so that you can develop and test Python code comfortably before using it in Power BI.

Chapter 4, Importing Unhandled Data Objects, explores the use of R and Python to import complex serialized objects into Power BI, with the aim of using them to enrich your dashboards with new insights.

Chapter 5, Using Regular Expressions in Power BI, explores the use of regular expressions to validate low-quality data, to import semi-structured log files, and to extract structured information from free text.

Chapter 6, Anonymizing and Pseudonymizing Your Data in Power BI, introduces de-identification techniques using Python or R scripts that can help the Power BI developer prevent a person's identity from being linked to the information shown on the report.

Chapter 7, Logging Data from Power BI to External Sources, shows how to use Power Query to log data to various external files or systems.

Chapter 8, Loading Large Datasets beyond the Available RAM in Power BI, shows how you can take advantage of the flexibility provided by specific packages that implement distributed computing systems in both Python and R without having to resort to Apache Spark-based backends.

Chapter 9, Calling External APIs to Enrich Your Data, explores how you can use Python and R code to read data from external web services for your dashboards. It also shows you how to reduce waiting time when retrieving data using parallelization techniques.

Chapter 10, Calculating Columns Using Complex Algorithms, shows how to analyze data using various algorithms and math techniques in order to get hidden insights from your data.

Chapter 11, Adding Statistics Insights: Associations, explains the basic concepts of some statistical procedures that aim to extract relevant insights regarding the associations between variables from your data.

Chapter 12, Adding Statistics Insights: Outliers and Missing Values, explores some methodologies for detecting univariate and multivariate outliers in your dataset. In addition, advanced methodologies for imputing missing values in datasets and time series are presented.

Chapter 13, Using Machine Learning without Premium or Embedded Capacity, shows how to use machine learning in Power BI even if you only have the Pro license, using AutoML solutions and Cognitive Services.

Chapter 14, Exploratory Data Analysis, focuses on implementing a report that will help analysts in understanding the shape of a dataset and the relationship between its variables thanks to custom data visualizations developed in R.

Chapter 15, Advanced Visualizations, explores how you can create a very advanced and attractive custom chart using R and Power BI.

Chapter 16, Interactive R Custom Visuals, teaches you how to introduce interactivity into custom graphics created in R by using HTML widgets directly.

To get the most out of this book

You will need a working PC and an internet connection, along with the Power BI Desktop application.

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book's GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Extending-Power-BI-with-Python-and-R. If there's an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/9781801078207_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in the text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Enter this command: install.packages('ggdist'). It is designed to create visualizations of categorical data, in particular, mosaic plots."

A block of code is set as follows:

tbl <- src_tbl %>%
  mutate(
    across(categorical_vars, as.factor),
    across(integer_vars, as.integer)
  ) %>%
  select( -all_of(vars_to_drop) )

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

aws athena get-query-execution --query-execution-id <QueryExecutionId>

Any command-line input or output is written as follows:

$ mkdir css

$ cd css

Tips or Important Notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you've read Extending Power BI with Python and R, we'd love to hear your thoughts! Please go to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.

Section 1: Best Practices for Using R and Python in Power BI

The first thing to know when approaching analytic languages in Power BI is where you can use them within Power BI, and then what limitations on their use are imposed by Power BI itself. At that point, knowing which analytical language engines and integrated development environments to install, how to properly configure them with Power BI products, and their limitations is the next step in starting to use them properly. This section guides you through all of this in a simple and efficient manner.

This section comprises the following chapters:

Chapter 1, Where and How to Use R and Python Scripts in Power BI

Chapter 2, Configuring R with Power BI

Chapter 3, Configuring Python with Power BI

Chapter 1: Where and How to Use R and Python Scripts in Power BI

Power BI is Microsoft's flagship self-service business intelligence product. It consists of a set of on-premises applications and cloud-based services that help organizations integrate, transform, and analyze data from a wide variety of source systems through a user-friendly interface.

The platform is not limited to data visualization. Power BI is much more than this when you consider that its analytics engine (VertiPaq) is the same one used by SQL Server Analysis Services (SSAS) and Azure Analysis Services. It also uses Power Query as its data extraction and transformation engine, which is found in both Analysis Services and Excel as well. The engine comes with a very powerful and versatile formula language (M) and a GUI, thanks to which you can "grind" and shape any type of data into any form.

Moreover, Power BI supports DAX as a data analytic formula language, which can be used for advanced calculations and queries on data that has already been loaded into tabular data models.

Such a versatile and powerful tool is a godsend for anyone who needs to do data ingestion and transformation in order to build dashboards and reports to summarize a company's business.

Recently, the availability of huge amounts of data, along with the ability to scale the computational power of machines, has made the area of advanced analytics more appealing. New mathematical and statistical tools have therefore become necessary in order to provide rich insights. This is the reason for the integration of analytical languages such as Python and R within Power BI.

R or Python scripts can be used only within specific Power BI features. Knowing which Power BI tools accept R or Python scripts is key to understanding whether the problem you want to address can be solved with these analytical languages.

This chapter will cover the following topics:

Injecting R or Python scripts into Power BI

Using R and Python to interact with your data

R and Python limitations on Power BI products

Technical requirements

This chapter requires you to have Power BI Desktop already installed on your machine (you can download it from here: https://aka.ms/pbiSingleInstaller).

Injecting R or Python scripts into Power BI

In this first section, Power BI Desktop tools that allow you to use Python or R scripts will be presented and described in detail. Specifically, you will see how to add your own code during the data loading, data transforming, and data viewing phases.

Data loading

One of the first steps required to work with data in Power BI Desktop is to import it from external sources.

There are many connectors that allow you to do this, depending on the data source, but you can also do it via scripts in Python and R. In fact, if you click on the Get data icon in the ribbon, you see the most commonly used connectors, and you can select others from a more complete list by clicking on More...:

Figure 1.1 – Browse more connectors to load your data

In the new Get Data window that pops up, simply start typing the string script into the search text box, and immediately the two options for importing data via Python or R appear:

Figure 1.2 – Showing R script and Python script in the Get Data window

When you read the contents of the tooltip, obtained by hovering the mouse over the Python script option, two things should immediately jump out at you:

a) A local installation of Python is required.

b) What can be imported through Python is a data frame.

The same two observations also apply when selecting R script. The only difference is that it is possible to import a pandas DataFrame when using Python (a DataFrame is a data structure provided by the pandas package), whereas R employs the two-dimensional array-like data structure called an R data frame, which is provided by default by the language.

After clicking on the Python script option, a new window will be shown containing a text box for writing the Python code:

Figure 1.3 – Window showing the Python script editor

As you can see, it's definitely a very skimpy editor, but in Chapter 3, Configuring Python with Power BI, you'll see how you can use your favorite IDE to develop your own scripts.

In the warning message, Power BI reminds you that no Python engine has been detected, so one must be installed. Clicking on the How to install Python link will cause a Microsoft Docs web page to open, explaining the steps to install Python.

Microsoft suggests installing the base Python distribution, but in order to follow some best practices on environments, we will install the Miniconda distribution. The details of how to do this and why will be covered in Chapter 3.

If you had clicked on R script instead, a window for entering code in R, similar to the one shown in Figure 1.4, would have appeared:

Figure 1.4 – Window showing the R script editor

As with Python, in order to run code in R, you need to install the R engine on your machine. Clicking on the How to install R link will open a Docs page where Microsoft suggests installing either Microsoft R Open or the classic CRAN R. Chapter 2, Configuring R With Power BI, will show you which engine to choose and how to configure your favorite IDE to write code in R.

In order to import data using Python or R, you need to write code in the editors shown in Figure 1.3 and Figure 1.4 that assigns a pandas DataFrame or an R data frame to a variable, respectively. You will see concrete examples throughout this book.
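As a minimal sketch of what such a script might look like in Python, the following code builds a small pandas DataFrame and assigns it to a variable. The variable name sales, the column names, and the values are all hypothetical and chosen only for illustration; in a real script the data would more likely be read from a file, a database, or a web service.

```python
import pandas as pd

# Hypothetical sample data, used here only so the sketch is self-contained.
sales = pd.DataFrame(
    {
        "product": ["A", "B", "C"],
        "units": [10, 25, 7],
        "unit_price": [2.5, 1.2, 9.9],
    }
)

# A derived column computed in Python before the data is loaded.
sales["revenue"] = sales["units"] * sales["unit_price"]
```

When a script like this is run from the Python script option, the sales variable is what Power BI would offer as a table to load, because it holds a pandas DataFrame.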

Next, let's look at transforming data.

Data transformation

It is possible to apply a transformation to data already imported or being imported, using scripts in R or Python. Should you want to test this on the fly, you can import the following CSV file directly from the web: http://bit.ly/iriscsv. Follow these steps:

Simply click on Get data and then Web to import data directly from a web page:

Figure 1.5 – Select the Web connector to import data from a web page

You can now enter the previously mentioned URL in the window that pops up:

Figure 1.6 – Import the Iris data from the web

Right after clicking OK, a window will pop up with a preview of the data to be imported.

In this case, instead of importing the data as-is, click on Transform Data in order to access the Power Query data transformation window:

Figure 1.7 – Imported data preview

It is at this point that you can add a transformation step using a Python or R script by selecting the Transform tab in Power Query Editor:

Figure 1.8 – R and Python script tools in Power Query Editor

By clicking on Run Python script, you'll cause a window similar to the one you've already seen in the data import phase to pop up:

Figure 1.9 – The Run Python script editor

If you carefully read the comment in the text box, you will see that the dataset variable is already initialized and contains the data present at that moment in Power Query Editor, including any transformations already applied. At this point, you can insert your Python code in the text box to transform the data into the desired form.

A similar window will open if you click on Run R script:

Figure 1.10 – The Run R script editor

Also, in this case, the dataset variable is already initialized and contains the data present at that moment in Power Query Editor. You can then add your own R code and reference the dataset variable to transform your data in the most appropriate way.
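For the Python case, a transformation step could be sketched as follows. Inside Power Query Editor the dataset variable already exists, so the stand-in definition at the top is only there to make the example runnable on its own. The column names and the derived sepal_ratio column are assumptions, not necessarily the columns of the CSV file mentioned earlier.

```python
import pandas as pd

# Stand-in for the variable that Power Query Editor initializes for you.
# Remove this definition when pasting the code into the Run Python script window.
dataset = pd.DataFrame(
    {
        "sepal_length": [5.0, 6.4],
        "sepal_width": [2.5, 3.2],
        "species": ["setosa", "versicolor"],
    }
)

# Work on a copy so the incoming data is left untouched.
transformed = dataset.copy()

# Hypothetical transformation: add a derived column.
transformed["sepal_ratio"] = transformed["sepal_length"] / transformed["sepal_width"]

print(transformed)
```

The result is assigned to a new variable (transformed) rather than overwriting dataset, which keeps the input and the output of the step easy to tell apart.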

Next, let's look at visualizing data.

Data visualization

Finally, your own Python or R scripts can be added to Power BI to create new visualizations, in addition to those already present in the tool out of the box:

Assuming we resume the data import activity begun in the previous section, once the Iris dataset is loaded, simply click Cancel in the Run R script window, and then click Close & Apply in the Home tab of Power Query Editor:

Figure 1.11 – Click Close & Apply to import the Iris data

After the data import is complete, you can select either the R script visual or Python script visual option in the Visualizations pane of Power BI:

Figure 1.12 – The R and Python script visuals

If you click on Python script visual, a window pops up asking for permission to enable script code execution, as there may be security or privacy risks:

Figure 1.13 – Enable the script code execution

After enabling code execution, in Power BI Desktop you can see a placeholder for the Python visual image on the report canvas and a Python script editor at the bottom:

Figure 1.14 – The Python visual layout

You can now write your own custom code in the Python editor and run it via the Run script icon highlighted in Figure 1.14 to generate a Python visualization.

An almost identical layout appears when you select R script visual.
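To give an idea of the kind of code that goes into the Python editor, here is a minimal sketch of a scatter plot drawn with matplotlib. In a Python script visual, the fields you add to the visual are exposed through a dataset DataFrame; the stand-in definition and the headless backend line below are only needed to run the example outside Power BI. The column names are assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for a standalone run; not needed in the visual
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the DataFrame that the Python visual provides.
dataset = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 7.0, 6.4],
        "sepal_width": [3.5, 3.0, 3.2, 3.2],
        "species": ["setosa", "setosa", "versicolor", "versicolor"],
    }
)

fig, ax = plt.subplots()

# Draw one scatter series per species so each gets its own color and legend entry.
for species, group in dataset.groupby("species"):
    ax.scatter(group["sepal_length"], group["sepal_width"], label=species)

ax.set_xlabel("sepal_length")
ax.set_ylabel("sepal_width")
ax.legend()

# Rendering the figure is what produces the image shown on the report canvas.
plt.show()
```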

Using R and Python to interact with your data

In the previous section, you saw all the ways you can interact with your data in Power BI via R or Python scripts. Beyond knowing how and where to inject your code into Power BI, it is very important to know how your code will interact with that data. It's here that we see a big difference between the effect of scripts injected via Power Query Editor and scripts used in visuals:

Scripts via Power Query Editor: This type of script will transform the data and persist transformations in the model. This means that it will always be possible to retrieve the transformed data from any object within Power BI. Also, once the scripts have been executed and have taken effect, they will not be re-executed unless the data is refreshed. Therefore, it is recommended to inject code in R or Python via Power Query Editor when you intend to use the resulting insights in other visuals, or in the data model.

Scripts in visuals: The scripts used within the R and Python script visuals extract particular insights from the data and only make them evident to the user through visualization. Like all the other visuals on a report page, the R and Python script visuals are also interconnected with the other visuals. This means that the script visuals are subject to cross-filtering and therefore they are refreshed every time you interact with other visuals in the report. That said, it is not possible to persist the results obtained from the visual scripts in the data model.
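The difference can be illustrated with a small simulation in plain pandas. This is not how Power BI invokes scripts internally; the names model_table and visual_script are hypothetical, and the code only mimics the behavior described above: the table in the model stays as it was persisted, while the visual script is run again on whatever subset of rows the current cross-filter leaves.

```python
import pandas as pd

# Hypothetical table as persisted in the data model after the Power Query steps ran.
model_table = pd.DataFrame(
    {
        "species": ["setosa", "setosa", "virginica", "virginica"],
        "sepal_length": [5.0, 5.2, 6.8, 7.0],
    }
)


def visual_script(dataset):
    """Stand-in for a script visual: summarizes whatever rows it receives."""
    return dataset["sepal_length"].mean()


# No filter applied: the visual script sees every row.
overall = visual_script(model_table)

# Simulated cross-filter from another visual: only the setosa rows are passed,
# and the visual script is executed again on that subset.
setosa_only = visual_script(model_table[model_table["species"] == "setosa"])

print(overall, setosa_only)

# The persisted table itself is unchanged by either run.
print(len(model_table))
```

Both results exist only for display; nothing computed inside visual_script is written back to model_table.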

Tip

Because cross-filtering makes R and Python script visuals interactive, you can inject code that extracts real-time insights from your data and also from external sources (you'll see how in Chapter 9, Calling External APIs to Enrich Your Data). Keep in mind that, as previously stated, you can then only visualize such information or, at most, write it to external repositories (as you will see in Chapter 7, Logging Data from Power BI to External Sources).

In the final section of this chapter, let's look at the limitations of using R and Python when it comes to various Power BI products.

R and Python limitations on Power BI products

The first question once you are clear on where to inject R and Python scripts in Power BI could be: "Is the use of R and Python code allowed in all Power BI products?" In order to have a brief recap of the various Power BI products and their usage in general, here is a concise list:

Power BI Service: This is sometimes called Power BI Online, and it's the Software as a Service (SaaS) version of Power BI. It was created to facilitate the sharing of visual analysis between users through Dashboards and Reports.

Power BI Report Server: This is the on-premises version of Power BI and it extends the capabilities of SQL Server Reporting Services, enabling the sharing of reports created in Power BI Desktop (for Report Server).

Power BI Embedded: A Microsoft Azure service that allows dashboards and reports to be embedded in an application for users who do not have a Power BI account.

Power BI Desktop: A free desktop application for Windows that allows you to use almost all of the features that Power BI exposes. It is not the right tool for sharing results between users, but it allows you to share them on Power BI Service and Power BI Report Server. The desktop versions that allow publishing on the two mentioned services are distinct.

Power BI Mobile: A mobile application, available on Windows, Android, and iOS, that allows secure access to Power BI Service and Power BI Report Server, and that allows you to browse and share dashboards and reports, but not edit them.

Leaving licensing aside, which we will not go into here, the following figure summarizes the relationships between the previously mentioned products:

Figure 1.15 – Interactions between Power BI products

Unfortunately, of all these products, only Power BI Service, Power BI Embedded, and Power BI Desktop allow you to enrich data via code in R and Python:

Figure 1.16 – Power BI products compatibility with R and Python

Important note

From here on out, when we talk about Power BI Service in terms of compatibility with analytical languages, what we say will also apply to Power BI Embedded.

So, if you need to develop reports using advanced analytics through R and Python, make sure the target platform supports them.

Summary

This chapter has given a detailed overview of all the ways by which you can use R and Python scripts in Power BI Desktop. During the data ingestion and data transformation phases, Power Query Editor allows you to add steps containing R or Python code. You can also make use of these analytical languages during the data visualization phase thanks to the R and Python script visuals provided by Power BI Desktop.

It is also very important to know how the R and Python code will interact with the data already loaded or being loaded in Power BI. If you use Power Query Editor, both when loading and transforming data, the result of script processing will be persisted in the data model. Also, if you want to run the same scripts again, you have to refresh the data. On the other hand, if you use the R and Python script visuals, the code results can only be displayed and are not persisted in the data model. In this case, script execution occurs whenever cross-filtering is triggered via the other visuals in the report.

Unfortunately, at the time of writing, not every Power BI product can run R and Python scripts. The only ones that provide for running analytics scripts are Power BI Desktop and Power BI Service (and, by extension, Power BI Embedded).

In the next chapter, we will see how best to configure the R engine and RStudio to integrate with Power BI Desktop.