35,99 €
Unleash the full potential of Power BI with the integration of AI and machine learning techniques using OpenAI
Purchase of the print or Kindle book includes a free PDF eBook
Key Features
Take flight with Power BI machine learning and OpenAI using hands-on examples from the FAA airline data
Unlock the full potential of Power BI for advanced analytics using OpenAI
Design stunning data presentations, seamless integration of machine learning tools and technologies with OpenAI
Book Description
Microsoft Power BI is the ultimate solution for businesses looking to make data-driven decisions and unlock the full potential of their data. Unleashing Your Data with Power BI Machine Learning and OpenAI is designed for data scientists and BI professionals seeking to improve their existing solutions and workloads using AI. The book explains the intricacies of the subject by using a workshop-style data story for data ingestion, data modeling, analytics, and predictive analytics with Power BI machine learning. Along the way, you’ll learn about AI features, AI visuals, R/Python integration, and OpenAI integration. The workshop-style content allows you to practice all your learnings in real-life challenges and gain hands-on experience. Additionally, you’ll gain an understanding of AI/ML, step by step, with replicable examples and references. From enhancing data visualizations to building SaaS Power BI ML models, and integrating Azure OpenAI, this book will help you unlock new capabilities in Power BI.
By the end of this book, you’ll be well-equipped to build ML models in Power BI, plan projects for both BI and ML, understand R/Python visuals with Power BI, and introduce OpenAI to enhance your analytics solutions.
What you will learn
Discover best practices for implementing AI and ML capabilities in Power BI along with integration of OpenAI into the solution
Understand how to integrate OpenAI and cognitive services into Power BI
Explore how to build a SaaS auto ML model within Power BI
Gain an understanding of R/Python integration with Power BI
Enhance data visualizations for ML feature discovery
Discover how to improve existing solutions and workloads using AI and ML capabilities in Power BI with OpenAI
Acquire tips and tricks for successfully using AI and ML capabilities in Power BI along with integration of OpenAI into the solution
Who this book is for
This book is for data science and BI professionals looking to expand their skill sets into Power BI machine learning and OpenAI. This book is also useful for data scientists, data analysts, and IT professionals who want to learn how to incorporate OpenAI into Power BI for advanced experience.
Das E-Book können Sie in Legimi-Apps oder einer beliebigen App lesen, die das folgende Format unterstützen:
Seitenzahl: 306
Veröffentlichungsjahr: 2023
Explore data through business intelligence, predictive analytics, and text generation
Greg Beaumont
BIRMINGHAM—MUMBAI
Copyright © 2023 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Publishing Product Manager: Ali Abidi
Senior Editor: David Sugarman
Technical Editor: Sweety Pagaria
Copy Editor: Safis Editing
Project Coordinator: Farheen Fathima
Proofreader: Safis Editing
Indexer: Sejal Dsilva
Production Designer: Jyoti Chauhan
Marketing Coordinator: Shifa Ansari
First published: June 2023
Production reference:1290523
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-83763-615-0
www.packtpub.com
I would like to thank my wife and family for their continuous support, especially for their patience over the last seven months while I worked weekends to complete this book. I would also like to thank the mentors and colleagues I’ve worked with over the years, both at Microsoft and in the consulting profession, who went out of their way to be teachers and teammates. The US Federal Aviation Administration (FAA) deserves praise for their efforts in curating and providing the Wildlife Strike database used in this book as open data available to the public. Finally, I am grateful to the many customers I have worked with over the years who have challenged and inspired me to work harder, learn more, and re-think my perspectives about how their businesses intersect with technology.
Greg Beaumont is a data architect at Microsoft, where he is tasked with solving complex problems and creating value for his customers. Focusing on the healthcare industry, Greg works closely with customers to plan enterprise analytics strategies, evaluate new tools and products, conduct training sessions and hackathons, and architect solutions that improve the quality of care and reduce costs. With years of experience in data architecture and a passion for innovation, Greg enjoys identifying and solving complex challenges. He strives to be a trusted advisor to his customers and is always seeking new ways to drive progress and help organizations thrive. He is a veteran of the Microsoft data speaker network and has worked with hundreds of customers on their data management and analytics strategies.
Peter ter Braake started working as a developer in 1996 after studying physics in Utrecht, the Netherlands. Databases and business intelligence piqued his interest the most, leading to him specializing in SQL Server and its business intelligence components. He has worked with Power BI from the tool’s very beginning. Peter started working as an independent contractor in 2008. This has enabled him to divide his time between teaching data-related classes, consulting with customers, and writing articles and books. Peter has also authored Data Modeling for Azure Data Services, Packt Publishing.
Inder Rana is a principal cloud architect for the Microsoft Azure cloud, specializing in data platforms including big data analytics and AI. With expertise in Azure services and a deep understanding of cloud architecture principles, Inder helps organizations leverage the power of Microsoft Azure to drive innovation and achieve their business goals. He is passionate about technology and strives to explain complex concepts in a simpler manner to help his clients understand and utilize technology in the right place for the right reasons.
Your journey starts with the ingestion and preparation of data using Power BI. After discussing data modeling for business intelligence and machine learning, you will learn how to connect to data for the use case, clean it up and check for errors, explore the data to ensure referential integrity, and then create a relational data model.
This part has the following chapters:
Chapter 1, Requirements, Data Modeling, and PlanningChapter 2, Preparing and Ingesting Data with Power QueryChapter 3, Exploring Data Using Power BI and Creating a Semantic ModelChapter 4, Model Data for Machine Learning in Power BIYou begin your journey by assessing the requirements and data for your project. The use case will be a fictional scenario, but everything will be built using real data from the Federal Aviation Administration’s (FAA) Wildlife Strike Database. The data is real, the topic can be understood by anyone, and the findings within the data are interesting and fun. According to the FAA’s website, about 47 animal strikes are reported daily by aircraft. These incidents can damage airplanes, potentially endanger passengers, and negatively impact wild animal (especially bird) populations.
For the use case, you have been assigned to provide your leadership with tools to do an interactive analysis of the FAA Wildlife Strike data, find insights about factors that influence the incidents, and also make predictions about future wildlife strike incidents and the associated costs. The primary goal of your project, predicting the future impact of FAA Wildlife Strikes, will require building some Power BI machine learning models.
Before uploading data to Power BI’s machine learning (ML) tools, you’ll need to create tables of data that will train the ML models. There is an old saying about data and analytics: “Garbage in, garbage out.”Software as a Service (SaaS) machine learning tools are easy to use, but you still need to feed them good-quality curated data. Identifying the right training data and getting it into the right format are crucial steps in an ML project.
This project will encompass data exploration, data transformation, data analysis, and additional downstream data transformations before you begin working with Power BI ML tools. You are already an experienced business intelligence (BI) professional and Power BI user, and now you are ready to take your skills to the next level with ML in Power BI!
Power BI supports connections to source data in many different formats, ranging from relational databases to unstructured sources to big flat tables of raw data. Countless books have been written about the best ways to structure and model data for different use cases. Rather than dive into the specifics of data modeling, for this book, we will begin with two simple assumptions:
Most of the time, a star schema design will provide the most efficient storage and query performance for business intelligence data modelsBasic ML models, such as the ones you will build in this book, are usually created with a flattened tableJust to be clear, not every solution will follow these assumptions. Rather, these assumptions are generalizations that can provide you with a starting point as you approach the design of a new data model. Quite often, there will not be a perfect answer, and the optimal design will be dictated by the types of queries and business logic that are generated by the end consumers of the data model.
If you’ve never heard the terms star schema and flattened data before, don’t worry! The book will progress at a pace that is intended to help you learn and will also stay at a level that makes sense when you review the FAA data. Let’s browse the FAA Wildlife Strike data and decide upon the best data modeling strategy for your new project!
In this chapter, we will take the following steps so that you can understand the data, think through how it will be used, and then formulate a preliminary plan for the data model:
Reviewing the source dataReviewing the requirements for the solutionDesigning a preliminary data modelConsiderations for MLFor this chapter, you will need the following:
Power BI Desktop April 2023 or later (no licenses required)FAA Wildlife Strike data files from either the FAA website or the Packt GitHub repo: https://github.com/PacktPublishing/Unleashing-Your-Data-with-Power-BI-Machine-Learning-and-OpenAI/tree/main/Chapter-01You begin your journey by digging into the source data that you will be using for your project. Let’s get started!
The source data that you will be using for this book is real data from the United States FAA. The data contains reports filed when aircraft struck wildlife. There is a website providing details, documentation, updates, and access instructions at this URL: https://wildlife.faa.gov/home. The URL (and all URLs) will also be linked from the affiliated GitHub site at https://github.com/PacktPublishing/Unleashing-Your-Data-with-Power-BI-Machine-Learning-and-OpenAI in case changes are made after this book has been published.
If you’d prefer to follow along using the finished version of the content from this chapter rather than building it all step by step, you can download the PBIT version of the file at the Packt GitHub site folder for Chapter 1: https://github.com/PacktPublishing/Unleashing-Your-Data-with-Power-BI-Machine-Learning-and-OpenAI/tree/main/Chapter-01.
Within the FAA Wildlife Strike Database website, you can navigate to this link and run basic queries against the data and familiarize yourself with the content: https://wildlife.faa.gov/search. There are also two files linked from this page that you can reference while reviewing the source data.
At the time of this book’s writing, the second heading on the web page is titled Download the FAA Wildlife Strike Database, and it has a link titled Download that allows you to download the entire historical database along with a reference file. You can download the files from the FAA site for the purposes of this book. There will also be a Power BI PBIT file containing the results of the efforts of this chapter at the GitHub repository. A PBIT file is a Power BI template that can be populated with the files that you download from the Packt GitHub site. If the FAA data ever becomes unavailable, you can still proceed with the contents from the GitHub site to recreate the contents of every chapter.
The files you will be using from the FAA are as follows:
wildlife.accdb: This contains all of the historical FAA Wildlife Strike reports. You can also download a copy of the file that is identical to the book from the Packt GitHub site: https://github.com/PacktPublishing/Unleashing-Your-Data-with-Power-BI-Machine-Learning-and-OpenAI/tree/main/Chapter-01.read_me.xls: This contains descriptive information about the data in the wildlife.accdb database file. An .xlsx version of the file is available on the Packt GitHub site, too.The wildlife.accdb file is in an Access file format that can be opened with many different tools including Microsoft Access, Microsoft Excel, Power BI, and many more. For the purpose of this book, you will open it using Power BI Desktop. Power BI Desktop is available as a free download at this link: https://powerbi.microsoft.com/en-us/downloads/.
First, open up Power BI Desktop. Once it is open on your desktop, select the Get data drop-down menu from the ribbon and click on More… as shown in the following screenshot:Figure 1.1 – Connecting to data with Power BI Desktop
Next, within the Get data window, select Access database and click Connect:Figure 1.2 – Access database connector in Power BI
Select the Access database file that was downloaded and unzipped from the FAA Wildlife Strike Database, named wildlife:Figure 1.3 – The wildlife file shows up in Power BI
Select the STRIKE_REPORTS table and click Transform Data:Figure 1.4 – Preview of the data before making transformations
The Power Query window will open in Power BI Desktop with a preview of the FAA Wildlife Strike data. On the ribbon, select the View header for Data Preview, and then check the boxes for Column quality, Column distribution, and Column profile. These features will provide some insights for the data preview, that helps you explore and understand the data:Figure 1.5 – Data Preview features in Power Query
In Figure 1.5, notice that the first column, INDEX_NR, is highlighted. You can see that none of the values are empty, none have errors, and in Column statistics at the bottom of Figure 1.6, every value is a unique integer. The name INDEX_NR gives it away, but this column is the unique identifier for each row of data.
Let’s review another column in Power Query. Go ahead and highlight TIME_OF_DAY. As you can see in Figure 1.6, there are four distinct values and about 12% are blank. Blank values are an important consideration for this solution. Non-empty values include terms such as Day, Dawn, Dusk, and Night. What does an empty value mean? Was the field left blank by the person filing the report? Was it not entered properly into the system? You’ll revisit this topic later in the book.
Figure 1.6 – Column statistics help with understanding data
Since there are over 100 columns in the FAA Wildlife Strike reports’ data, we won’t discuss all of them in this chapter. That being said, reviewing each and every column would be a great way to review the data for errors, empty fields, distribution of values, and more. For the purposes of this chapter, go ahead and open up the read_me.xls file that was included with the ZIP file from the FAA. The first sheet is Column Name and contains the names and descriptive data about the columns in the wildlife.accdb file. Most of the columns fall into one of the following categories:
Date and time fields detailing the dates, times, and years for different events related to each reportDescriptive information about the event such as height of contact, latitudes and longitudes, originating airports, and flight numbersDescriptive information about the aircraft such as ownership, aircraft type and manufacturer, number of engines, location of engines, and so onEstimates of the damage due to the strike such as costs, costs adjusted for inflation, damage location on the aircraft, and moreInformation about the wildlife struck by the aircraft including species, size, quantities hit, and so onOnce you’ve finished browsing the report data, close the read_me.xlsx document on your desktop, and then connect to it from Power BI per the following steps. The document version used in this book can be downloaded from the Packt GitHub site here: https://github.com/PacktPublishing/Unleashing-Your-Data-with-Power-BI-Machine-Learning-and-OpenAI/tree/main/Chapter-01.
Click on Excel Workbook in the left-hand panel:Figure 1.7 – Excel Workbook is a new source of data
Select the read_me file from the browser and click Open:Figure 1.8 – Excel file ready to open in Power Query
Tick the Aircraft Type, Engine Codes, and Engine Position boxes. Then, click OK.Figure 1.9 – Sheets in Power Query can be individually selected
After clicking OK and importing the three sheets, notice that Aircraft Type, Engine Codes, and Engine Position are now available in Power Query as three separate tables of data:
Figure 1.10 – Three new tables are previewed in Power Query
The three tables contain descriptive information about values that exist in the FAA Wildlife Strike reports’ data:
Aircraft Type: A table that maps the Aircraft Code to a description such as Airplane, Helicopter, or GliderEngine Codes: Information about engine manufacturer and model numbersEngine Position: Details about the location of an engine on the aircraftFor all three of these tables, you’ll notice that there are some unnecessary rows and blank values. You will address these later in the book, so there is no need to make any modifications in Power Query at this time.
Once you’ve browsed the different columns from all the different tables in Power Query, click Close & Apply to import the data into Power BI and save it, per the following diagram:
Figure 1.11 – The Close & Apply button will import data into Power BI
Once the data is imported, you can save your Power BI Desktop file as a .pbix file. A copy of the PBIT file named Chapter 1 Template.pbit, which can be populated with the data and then saved as a PBIX, can be found at this GitHub link: https://github.com/PacktPublishing/Unleashing-Your-Data-with-Power-BI-Machine-Learning-and-OpenAI/tree/main/Chapter-01.
Now that you’ve
