E-Book
29,99 €

The Definitive Guide to Power Query (M) E-Book

Gregory Deckler

0,0

29,99 €

Sammeln Sie Punkte in unserem Gutscheinprogramm und kaufen Sie E-Books und Hörbücher mit bis zu 100% Rabatt.

Mehr erfahren.

Herausgeber: Packt Publishing
Kategorie: Lebensstil
Sprache: Englisch

Beschreibung

Data transformation is a critical step in building data models and business intelligence reports. Power Query is an invaluable tool for anyone who wants to master data transformation, and this book will equip you with the knowledge and skills to make the most of it.
The Definitive Guide to Power Query (M) will help you build a solid foundation in the Power Query M language. As you progress through the chapters, you'll learn how to use that knowledge to implement advanced concepts and data transformations. This will set the stage for an uncompromisingly thorough exploration of the Power Query M Language.
You'll also get to grips with optimizing performance, handling errors, and implementing efficient data processing techniques. As this is a hands-on guide, the practical examples in the chapters will help you gain the skills to apply Power Query to real-world problems and improve your data analysis capabilities.
By the end of this book, you will be able to leverage all of Power Query's remarkable capabilities for data transformation.

Details

Das E-Book können Sie in Legimi-Apps oder einer beliebigen App lesen, die das folgende Format unterstützen:

EPUB

MOBI

Seitenzahl: 1095

Veröffentlichungsjahr: 2024

Bewertungen

0,0

Rezensionen(0 Rezensionen)

Leseprobe

The Definitive Guide to Power Query (M)

Mastering complex data transformation with Power Query

Gregory Deckler

Rick de Groot

Melissa de Korte

The Definitive Guide to Power Query (M)

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Senior Publishing Product Manager: Gebin George

Acquisition Editor – Peer Reviews: Tejas Mhasvekar

Project Editor: Rianna Rodrigues

Content Development Editors: Shruti Menon and Shazeen Iqbal

Copy Editor: Safis Editing

Technical Editors: Aneri Patel and Kushal Sharma

Proofreader: Safis Editing

Indexer: Tejal Daruwale Soni

Presentation Designer: Pranit Padwal

Developer Relations Marketing Executive: Vignesh Raju

First published: March 2024

Production reference: 1260324

Published by Packt Publishing Ltd.

Grosvenor House

11 St Paul’s Square

Birmingham

B3 1RB, UK.

ISBN 978-1-83508-972-9

www.packt.com

Foreword

Occasionally, the stars align in just the right way, and the result is something that is unexpectedly great. That’s the story behind the book you’re holding now. But I’m getting ahead of myself. Let’s turn the clock back to the start of 2022, which is where the story of this book begins.

At that point, I had been a data professional and Excel user for over 30 years and was doing a lot of Power BI development while conducting online training as well. I had long used Power Query to clean and transform my data prior to performing analysis and building reports. But almost all of that prep was done through the Power Query user interface. As well designed as the Power Query UI is, it’s also quite limiting, providing access to only a fraction of the 700+ available M functions. So, at the start of 2022, I decided that I was going to stop dabbling in M and make learning it thoroughly my top-priority training goal for the year. This was not easy in 2022, since while there were fantastic books on Power Query, they each had only a small percentage of their content dedicated to M. There were many great M resources online, but they offered deep insights into a grab bag of very specialized topics and applications, which also did not make for an ideal learning experience.

Despite these hurdles, by the end of 2022, I found that my deep dive into M had been transformative. My Power BI reports ran much faster, the DAX used in them was dramatically simpler, I was able to clean data faster than ever, and I now could easily build reusable custom functions to automate repetitive data tasks. I also found that learning how to visualize and manipulate complex data structures in M substantially improved my coding ability in other languages. However, I also realized that my experience was entirely unreplicable due to the fact that three of my friends (not coincidentally the authors of this book) happened to be among the most knowledgeable Power Query/M experts in the world, and I relied heavily on their expertise during my learning. I actually used to joke with both Melissa and Rick online about each of them being the “Chosen One” destined to write the M counterpart to The Definitive Guide to DAX, but I think that seemed too much of a monumental task to both of them.

During this time, I was the primary tester on a suite of apps that Gregory Deckler was developing. We would regularly talk about Power Query and M when we would discuss the potential future capabilities of those apps. One day, Greg mentioned to me that he was working on a book proposal outline for that same comprehensive guide to M that I had been joking about (but also genuinely wishing existed). He asked me who I would recommend as co-authors for that book, and I am beyond thrilled to say that my response can be found on the cover of this book.

Mastering M is neither a quick nor easy task. However, I am completely confident that whether you are a data analyst, Power BI developer, data scientist, business analyst, or even an Excel end user looking to level up your skills, having the same three clear, patient, and brilliant instructors I did will make learning M an enjoyable and game-changing experience for you too.

Brian Julius

Power BI Expert and Instructor

Contributors

About the authors

Gregory Deckler is a seven-time Microsoft MVP for Data Platform and an active blogger and Power BI community member, having authored over 6,000 solutions to community questions. Greg has authored numerous books on Power BI, including the first and second editions of Learn Power BI, the second editions of Power BI Cookbook and Mastering Power BI, and DAX Cookbook. Greg has also created several external tools for Power BI and regularly posts video content to his YouTube channels – Microsoft Hates Greg and DAX For Humans.

I would like to acknowledge my wonderful co-authors for this book; I learned so much from all of them. I’m also grateful to my son Rocket for supporting all of my endeavors. A special thanks to Brian Julius who inspired this book and without whom this book would not exist

Rick de Groot is a Power BI consultant, blogger, YouTuber, and author who hails from the Netherlands. With a background in finance, he has devoted 14+ years to building his expertise in data analysis, particularly in the Power Query M language. He is an independent consultant who also provides training on Power BI, Power Query, and DAX. He regularly publishes articles on his blogs, Power Query How and BI Gorilla, which have emerged as important sources for M articles and tutorials. He contributes extensively to his YouTube channel BI Gorilla as well. Rick’s commitment to sharing his expertise has earned him the Microsoft Data Platform MVP award for two consecutive years.

My deepest appreciation goes to Brian Julius . His belief in me nudged me to share more about Power Query and helped me grow as a writer for the M language. Without his support, I don’t think my name would have been on the cover of this book. He also played an important part in our discussions and helped shape the contents of this book. I am profoundly thankful for your support, Brian.

To my family – your patience and support while writing this book have meant the world to me.

Also, a huge shout-out to all of you out there. All the discussions we’ve had have been incredibly valuable.

Melissa de Korte is a passionate problem-solver known for simplifying complex problems with Power Query. With an impressive track record as an enterprise DNA expert and super user, she has become an asset to the community, actively engaging, supporting, and inspiring others. Behind her professional persona lies a genuine dedication to empowering others through sharing knowledge. Her portfolio includes blogs, tutorials, courses, and webinars to make Power Query and M more accessible.

To Marcel and Sam for filling every moment we share(d) with joy.

My gratitude goes to my co-authors, as well as Brian Julius, for their roles in this book’s creation. To my family for their endless patience and support. And to all who generously share their knowledge, spark curiosity and promote a supportive and inclusive community. Your contributions are invaluable!

About the reviewers

Erin Ostrowsky is a digital nomad who endeavors to build healthy, happy relationships wherever she goes. She’s passionate about STEAM – don’t forget the Arts, folks! – and hopes to inspire others to join in the fun problem-solving this world needs. Erin loves learning, creating, making friends, traveling, and writing about life’s many curiosities. Her current focus is running Fit 4 Duty Data, a data and reporting consultancy and training firm that supports the global mining and energy sectors.

I’d like to thank my Creator, my friends, and my family – you all make life beautiful, especially during the hard times.

Vahid Doustimajd is a recognized Microsoft MVP and certified trainer with over 15 years of experience. He is a technophile who is passionate about project management and data analytics. Currently, he works in Australia as a Project Controls and Analytics Manager. He is an active participant in the Microsoft Fabric and Power BI communities, enjoys sharing his expertise through blogs and videos, and organizes events for the Persian Power BI User Group.

I would like to extend my heartfelt gratitude to my wife for her endless support and patience. Her encouragement has been my guiding light.

Ahmed Oyelowo, a Managing Partner, lead trainer, and consultant at Foresight BI & Analytics Global Solutions, is a Microsoft Certified Power BI Data Analyst Associate and Azure Enterprise Data Analyst. He is also a Microsoft Certified Trainer, with a best-selling Power BI course on Udemy that has been taken by 90,000+ students. A four-time Microsoft MVP in the Data Platform category, he has developed cutting-edge solutions for various corporate clients.

My sincere appreciation to my Creator for the gift of all required to complete this review. I must also extend my gratitude to my dearest wife, Kafayat, and my lovely children, Yusraa and Yaasir, for their support and understanding while I was spending some time away from them to complete this task. Finally, I’d like to thank the authors of this book for this wonderful resource!

Learn more on Discord

Join our community’s Discord space for discussions with the author and other readers:

https://discord.gg/vCSG5GBbyS

Preface

Who this book is for

What this book covers

To get the most out of this book

Get in touch

Introducing M

The history of M

Who should learn M?

Where and how is M used?

Experiences

Products and services

Why learn M?

M language basics

The let expression

The characteristics of M

Formal classification

Informal characteristics of M

Summary

Working with Power Query/M

Technical requirements

Touring the Power Query Desktop experience

A brief tour

Header

Formula bar

Ribbon

Queries pane

Query Settings pane

Preview pane

Status Bar

Your first query

Options and data source settings

Options

Data source settings

Editing experience-generated code

Creating custom columns

Adding an index column

Adding columns with examples

Math operations

Adding custom m code columns

Using the Advanced Editor

Summary

Accessing and Combining Data

Technical requirements

Accessing files and folders

File.Contents

Text/CSV

Excel

Folder

PDF

XML

Xml.Tables

Xml.Document

Azure Storage

Additional file formats

Retrieving web content

Investigating binary functions

Lines functions

Accessing databases and cubes

Cube functions

Working with standard data protocols

Addressing additional connectors

Popular software systems

Identity functions

Combining and joining data

Table.Combine

Table.NestedJoin and Table.Join

Table.FuzzyNestedJoin and Table.FuzzyJoin

Summary

Understanding Values and Expressions

Introducing the types of values

Binary values

Structure

Related functions

Special considerations

The Date/Time family of values

Date values

Time values

DateTime values

DateTimeZone values

Duration values

Logical values

Structure

Related Functions

Special considerations

Null values

Structure

Related functions

Special considerations

Number values

Structure and examples

Related functions

Special considerations

Text values

Structure

Related functions

Special considerations

List values

Structure

Related functions

Special considerations

Record values

Structure

Related functions

Special considerations

Table values

Structure

Related functions

Special considerations

Function values

Structure

Related functions

Special considerations

Type values

Operators

Expressions

Nesting let expressions

Coding best practices for expressions

Control structures

Enumerations

Summary

Understanding Data Types

What are data types?

The type system

Columns with mixed types

Column data type versus value type

Importance of types

Clarity and consistency

Data validation

Other reasons

Data types available in M

Primitive types

Abstract primitive types

Nullable primitive types

Using primitive types for data filtering

Custom types

List type

Record type

Table type

Function Type

Type detection

Retrieving data types from a data source

Automatically detecting types

Data type conversion

Converting value types

Converting column types

Avoiding data loss during conversion

Effect of locale/culture

Facets

Type Claims

Available Type Claims

Converting values using type claims

Inspecting Type Claims

Ascribing types

What is ascription?

Functions that support ascribing types

Ascribing types when creating records

Ascribing types when creating tables

Ascribing types when modifying tables

Ascribing types to any value

Errors when ascribing types

The base type is incompatible with the value

The Type Claim does not conform with the value

Ascribing incompatible types to structured values

Type equivalence, conformance, and assertion

Type equality

Type conformance

Type assertion

Summary

Structured Values

Introducing structured values

Lists

Introduction to lists

List operators

Equal

Not equal

Concatenate

Coalesce

Methods to create a list

Creating lists using the list initializer

Creating lists using functions

Referencing a table column

Using the a..b form

Accessing items in a list

Accessing list values by index

Handling non-existent index positions

Common operations with lists

Assigning data types to a list

Records

Introduction to records

Record operators

Equal

Not equal

Concatenation

Coalesce

Methods to create a record

Creating records using the record initializer

Creating records using functions

Retrieving a record by referencing a table row

Accessing fields in a record

Field selection

Record projection

Common operations with records

Structure for variables

Referencing the current row

Providing options for functions

Keeping track of intermediary results

Assigning a data type to records

Tables

Introduction to tables

Table operators

Equal

Not equal

Concatenation

Coalesce

Methods to create a table

Retrieve data from a source

Manually input data into functions

Re-use existing tables/queries

Using the Enter data functionality

Accessing elements in a table

Item access

Field access

Common operations with tables

Assigning a data type to tables

Summary

Conceptualizing M

Technical requirements

Understanding scope

Examining the global environment

Studying sections

Creating your own global environment

Understanding closures

Query folding

Managing metadata

Summary

Working with Nested Structures

Transitioning to coding

Getting started

Understanding Drill Down

The trick to getting more out of the UI

Methods for multistep value transformation

Transforming values in tables

Table.AddColumn

Table.TransformColumns

Table.ReplaceValue

Working with lists

Transforming a list

List.Transform

List.Zip

Extracting an item

Resizing a list

List.Range

List.Alternate

Filtering a list

List.FindText

List.Select

To-list conversions

Column or field names

A single column

All columns

All rows

Other operations

Expanding multiple list columns simultaneously

Flattening inconsistent multi-level nested lists

Working with records

Transforming records

Extracting a field value

Resizing records

Filtering records

To-record conversions

Table row to record

Record from table

Record from list

Conditional lookup or value replacement

Working with tables

Transforming tables

Extracting a cell value

Resizing a table in length

Resizing a table in width

Filtering tables

Approximate match

To-table conversions

Record-to-table conversion

Creating tables from columns, rows, or records

Table information

Working with mixed structures

Lists of tables, lists, or records

Tables with lists, records, or tables

Mixed structures

Flatten all

Unpacking all record fields from lists

Extracting data through lookup

Summary

Parameters and Custom Functions

Parameters

Understanding parameters

Creating parameters

Using parameters in your queries

Putting it all together

Parameterizing connection information

Dynamic file paths

Filtering a date range

Custom functions

What are custom functions?

Transforming queries into a function

What is the “create function” functionality?

Simplifying troubleshooting and making changes

Invoking custom functions

Manually in the advanced editor or formula bar

Using the UI

The each expression

Common usecases

Refining function definitions

Specifying data types

Making parameters optional

Referencing column and field names

Debugging custom functions

Function scope

Top-level expression

In line within a query

Putting it all together

Turning all columns into text

Merging tables based on date ranges

Summary

Dealing with Dates, Times, and Durations

Technical requirements

Dates

M calendar table

Other date formats

Julian days

Alternate date formats

Additional custom date functions

Working days

Moving average

Time

Creating a time table

Shift classification

Dates and times

Time zones

Correcting data refresh times

Duration

Working duration

Summary

Comparers, Replacers, Combiners, and Splitters

Technical requirements

Key concepts

Function invocation

Some common errors

Closures

Higher-order functions

Anonymous functions

Ordering values

Comparers

Comparer.Equals

Comparer.Ordinal

Comparer.OrdinalIgnoreCase

Comparer.FromCulture

Comparison criteria

Numeric value

Computing a sort key

List with key and order

Custom comparer with conditional logic

Custom comparer with Value.Compare

Equation criteria

Default comparers

Custom comparer

Key selectors

Combining key selectors and comparers

Replacers

Replacer.ReplaceText

Replacer.ReplaceValue

Custom replacers

Combiners

Combiner.CombineTextByDelimiter

Functionality

Example

Combiner.CombineTextByEachDelimiter

Functionality

Example

Combiner.CombineTextByLengths

Functionality

Example

Combiner.CombineTextByPositions

Functionality

Example

Combiner.CombineTextByRanges

Functionality

Example

Splitters

Splitter.SplitByNothing

Functionality

Example

Splitter.SplitTextByAnyDelimiter

Functionality

Example

Splitter.SplitTextByCharacterTransition

Functionality

Example

Splitter.SplitTextByDelimiter

Functionality

Example

Splitter.SplitTextByEachDelimiter

Functionality

Example

Splitter.SplitTextByLengths

Functionality

Example

Splitter.SplitTextByPositions

Functionality

Example

Splitter.SplitTextByRanges

Functionality

Example

Splitter.SplitTextByRepeatedLengths

Functionality

Example

Splitter.SplitTextByWhitespace

Functionality

Example

Practical examples

Removing control characters and excess spaces

Goals

A cleanTrim function

Extract email addresses from a string

Goals

Developing a getEmail function

Split combined cell values into rows

Goal

Transforming the table

Replacing multiple values

Goal

Accumulating a result

Combining rows conditionally

Goal

Group By’s comparer to the rescue

Summary

Handling Errors and Debugging

Technical requirements

What is an error?

Error containment

Error detection

Raising errors

The error expression

The … (ellipsis) operator

Error handling

Strategies for debugging

Common errors

Syntax errors

Dealing with errors – a top priority

DataSource.Error, could not find the source

An unknown or missing identifier

An unknown function

An unknown column reference

An unknown field reference

Not enough elements in the enumeration

Formula.Firewall error

Expression.Error: The key didn’t match any rows in the table

Expression.Error: The key matched more than one row in the table

Expression.Error: Evaluation resulted in a stack overflow and cannot continue

Putting it all together

Column selection

Building a custom solution

Reporting cell-level errors

Building a custom solution

Summary

Iteration and Recursion

Introduction to iteration

List.Transform

Extracting items from a list by position

Allocating a yearly budget to months

List.Accumulate

Function anatomy

Replacing multiple values

List.Generate

Advantages of List.Generate

Function anatomy

Handling variables using records

List.Generate alternatives

What are useful List.Generate scenarios

Looping through API data using List.Generate

Creating an efficient running total

Recursion

Why is recursion important?

Recursion versus iteration: a brief comparison

Recursive functions

What is the @ scoping operator?

Inclusive-identifier-reference

Using recursive functions

How to use the @ operator

Step 1: Write your initial code

Step 2: Identify the recursive call

Step 3: Add the @ operator

Step 4: Test your function

Removing consecutive spaces

Performance considerations using recursion

Summary

Troublesome Data Patterns

Pattern matching

Basics of pattern matching

Case sensitivity

Contains versus exact match

Allowed characters

Handling one or more elements

Wildcards

Extracting fixed patterns

Example 1, prefixed

Example 2, pattern

Example 3, splitters

Example 4, substitution

Example 5, regex

Combining data

Basics for combining data

Extract, transform, and combine

Get and inspect data

Location parameter

Connect to data

Filter files

Overall strategy

Choose a sample file

File parameter

Transformation pattern

Query, create function

Set up monitoring

Finetuning

Combine files

Summary

Optimizing Performance

Understanding memory usage when evaluating queries

Memory limit variations and adjustments

Query folding

Query folding in action

Query evaluation

Folding, not folding, and partial folding

Tools to determine foldability

View data source query

Query folding indicators

Query plan

Operations and their impact on folding

Foldable operations

Non-foldable operations

Data source privacy levels

Native database queries

Functions designed to prevent query folding

Strategies for maintaining query folding

Rearranging steps

Working with native queries

Rewriting code

Using cross-database folding

The formula firewall

What is the formula firewall?

Understanding partitions

The fundamental principle of the formula firewall

Firewall error: Referencing other partitions

Connecting to a URL using native parameters

Connecting to a URL using an Excel parameter

Resolving the firewall error

Firewall error: Accessing compatible data sources

Understanding privacy levels

Setting privacy levels

Resolving the firewall error

Optimizing query performance

Prioritize filtering rows and removing columns

Buffering versus streaming operations

Buffering operations

Streaming operations

Using the query plan

Using buffer functions

The impact of buffering and running totals

The setup

Method 1: Running total from regular table

Method 2: Running total from buffered table

Method 3: Running total from buffered column

Method 4: Running total from buffered column using List.Generate

Data source considerations

Data sources and speed

Using dataflows

Performance tips

Summary

Enabling Extensions

Technical requirements

What are Power Query extensions?

What can you do with extensions?

Preparing your environment

Getting Visual Studio Code

Getting the Power Query SDK

Setting up Internet Information Services

Setting up Discord

Discord client

Discord server

Discord app

Discord OAuth configuration

Creating a custom connector

Creating an extension project

TDGTPQM_Discord.pq

Configuring authentication

Adding client ID and client secret files

Adding Configuration Settings

Creating OAuth functions

Modifying the data source record definition

Adding a credential

Testing the connection

Configuring navigation and content

Adding API call functions for data retrieval

Adding navigation functions

Modifying the contents function

Installing and using a custom connector

Summary

Other Books You May Enjoy

Index

Landmarks

Cover

Index

Preface

Over the last decade, the popularity of the Power Query’s M language has continuously grown, such that it has become nearly ubiquitous within the Microsoft ecosystem, including Power BI, Excel, Power Platform, Dynamics 365, SQL Server, and Data Factory. Today, M and Power Query are indispensable tools for modern data professionals, such as business analysts, data scientists, and data enthusiasts.

This book seeks to make you a master of the M language. While M can be written using the Power Query Editor’s graphical user interface (GUI), this approach severely limits you to only a small fraction of the 700+ core M functions. We estimate that the GUI for authoring M queries allows you to solve only about 50% of challenges related to data transformation. However, mastery of M allows you to bring that figure close to 100%. Starting with simple concepts and code, this book progressively moves you along a path of increasing complexity. Complex concepts are explained in clear and concise language with practical examples that demonstrate the concepts in action. By the end of this book, there will be few, if any, data transformation challenges that you won’t be able to tackle head-on.

Get ready for an exciting and rewarding journey toward your mastery of the M language!

Who this book is for

If you’re new to Power Query, then this book might not be the best choice for you. There are books out there that introduce the basics through the user interface and simple button-based actions, offering a straightforward way to get started.

On the other hand, if you are serious about fully understanding the M language, then this book is for you. If this is your first experience with M, you may find certain concepts challenging. Reading through the pages the first time will provide you with lots of information, but some topics may be too complex. As you become more familiar with M, reading the book again will likely make these complex topics easier to understand.

Overall, our goal was to write a book that blends theoretical knowledge with practical examples. The first few chapters are more theoretical and provide the M language fundamentals. As the book progresses and your understanding deepens, we include projects and exercise files for you to work along with.

The Power Query M language is useful for a variety of users. Power BI users may use M to prepare their data model, Excel users may transform data and output it on their spreadsheets, while data factory users could use M to query an API and transfer the result into a database. In this book, we aim to cater to all these different audiences.

Although some subjects, such as creating custom connectors or optimizing performance, may appeal more to those with advanced knowledge due to their technical nature, we believe it’s beneficial for all users to understand the different ways the M language can be used.

Starting with the M language can be daunting, and mastering it took us several years. It’s a challenging journey that requires focus and a lot of experimentation. However, if you persevere, our book offers unprecedented depth of all aspects of M, allowing you to become a true M expert.

What this book covers

Chapter 1, Introducing M, introduces M language basics such as the let expression and also covers the history of M and the formal and informal characteristics of M. It also discusses who should learn M, why you should learn M, and where and how to use M.

Chapter 2, Working with Power Query/M, introduces the Power Query Editor, the main application for coding M. In addition, it allows you to transition from only using the Power Query Editor’s GUI to create M code to writing the code yourself via custom columns and the Advanced Editor.

Chapter 3, Accessing and Combining Data, explores the multitude of different data connectors available for M, including file and folder connectors, database and cube connectors, working with binary data, and finally, how to combine and merge data between queries.

Chapter 4, Values and Expressions in M, introduces the various kinds of values in the M language, as well as expressions, operators, control structures, and enumerations.

Chapter 5, Understanding Data Types, explores the importance of data types in the M language. It teaches you about their structure and their application in real-world scenarios. The chapter offers techniques to automatically detect data types and discusses why type conversion is important. Additionally, it presents the concept of facets and type ascription, which often lead to errors in your queries.

Chapter 6, Structured Values, covers some of the most important values in the M language,such as lists, records and tables. These values can store multiple values within them. You will learn the techniques for creating them, their typical uses, and the operators that work with structured values. Furthermore, the chapter explains how to access the individual values they contain.

Chapter 7, Conceptualizing M, explores some of the more abstract concepts of M that are critical to truly understanding how the M language works. The topics covered include the global environment, creating your own global environment, sections, closure, query folding, and metadata.

Chapter 8, Working with Nested Structures, presents several techniques to transform and manipulate structured values, such as lists, records, and tables, helping you move past the limitations of the GUI.

Chapter 9, Parameters and Custom Functions, begins by discussing the role of parameters in making queries flexible and dynamic. It then delves into how to create custom functions, covering their syntax, the importance of data types, and debugging techniques. Finally, you learn that the each expression is syntax sugar for a function.

Chapter 10, Dealing with Dates, Times, and Durations, explains how temporal analysis is key to unlocking many data insights and why you must be proficient in dealing with dates, times, and durations to perform such analyses. This chapter explores these subjects and more with numerous practical examples.

Chapter 11, Comparers, Replacers, Combiners, and Splitters, explores techniques for manipulating data. It demonstrates how to customize the way values are compared, ordered, replaced, combined, or split. These methods are essential for a wide range of common data transformation tasks.

Chapter 12, Handling Errors and Debugging, focuses on what errors are and provides guidance on how to handle them in the M language and debug your code successfully, enabling you to build more robust queries. In addition, it offers techniques to report errors.

Chapter 13, Iteration and Recursion, explains recursion through the use of the @ operator. The chapter then shifts focus to iteration techniques, using List.Transform, List.Accumulate, and List.Generate. You’ll learn about memory considerations and what aspects to consider for the best performance.

Chapter 14, Troublesome Data Patterns, illustrates the versatility of the M language, covering various common text extraction techniques as well as providing a comprehensive approach to building a manageable custom solution for dealing with files in bulk.

Chapter 15, Optimizing Performance, examines factors that influence query performance. It introduces mashup containers and the importance of memory management. We’ll delve into query folding, explore the formula firewall’s mechanics, and present various methods to improve query performance.

Chapter 16, Enabling Extensions, demonstrates how to extend the M language by creating your own, reusable library of M functions, including a detailed example of creating a custom data connector.

To get the most out of this book

We expect you to have a basic knowledge of Power Query and some experience with analyzing data. If you have experience with the M language, that is helpful to understand concepts more quickly. However, knowing M is not a requirement.

Throughout the book, there are references to SQL and Data Analysis Expressions (DAX) code. However, you don’t need to know these languages because the comparisons are simply meant to reflect on the different approaches between the languages. There’s no need to worry if you don’t understand a particular code snippet; that means the comparison is not as applicable to you.

In the more advanced sections of the book, we cover query folding, custom connectors, and memory usage. Some of you may not be familiar with these topics and that’s okay. However, this information is an important element of what the M language is used for and we think it’s good for everyone to read.

Download the example code files

The code bundle for the book is hosted on GitHub at https://github.com/PacktPublishing/The-Definitive-Guide-to-Power-Query-M-/. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://packt.link/gbp/9781835089729.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. For example: “Navigate to the /ClientApp/src/app/cities folder.”

A block of code is set as follows:

#date( year as number, month as number, day as number, ) asdate

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are highlighted:

#date( year as number, month as number, day as number, ) asdate

Bold: Indicates a new term, an important word, or words that you see on the screen. For instance, words in menus or dialog boxes appear in the text like this. For example: “Navigate to the Home tab of the ribbon, click on the dropdown below the Transform data button, and select Edit parameters.”

Warnings or important notes appear like this.

Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email [email protected] and mention the book’s title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you reported this to us. Please visit http://www.packtpub.com/submit-errata, click Submit Errata, and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit http://authors.packtpub.com.

Share your thoughts

Once you’ve read The Definitive Guide to Power Query (M), we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere?

Is your eBook purchase not compatible with the device of your choice?

Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

The perks don’t stop there, you can get exclusive access to discounts, newsletters, and great free content in your inbox daily

Follow these simple steps to get the benefits:

Scan the QR code or visit the link below

https://packt.link/free-ebook/9781835089729

Submit your proof of purchaseThat’s it! We’ll send your free PDF and other benefits to your email directly

1 Introducing M

M is a powerful and versatile formula language specifically designed for data manipulation and transformation. The term M is an informal designation. M’s official name is the Power Query Formula Language. For an explanation of this designation, see the History of M section later in this chapter. M is the language at the heart of Power Query, which is used in numerous applications like Microsoft Excel, Power BI, Power Platform, and Microsoft Fabric for data transformation and preparation.

The popularity of the M language has continuously grown over the last decade, and the language has been integrated into an impressive array of Microsoft tools and platforms. Today, M and Power Query are indispensable tools for modern data professionals such as business analysts, data scientists, and data enthusiasts.

This chapter is the beginning of your exciting journey that culminates in the mastery of the M language. We start with a brief history of M and then cover the basics of who, where, why, and how. Next, we introduce the absolute basics of the M language and finish with the formal and informal characteristics of M (in effect, what is M?). Overall, this chapter provides a firm foundation for the more in-depth exploration of the M language found throughout the rest of this book. Specifically, this chapter covers the following topics:

The history of MWho should learn M?Where and how Is M used?Why learn M?M language basicsThe characteristics of M

The history of M

The process of extracting, transforming, and loading data is a challenge as old as information technology itself. Both business users and IT professionals have historically struggled with the challenge, and numerous software tools have been developed over the years to help deal with the challenge such as SQL Server Integration Services (SSIS) and Alteryx.

However, many of these tools were complex and not easily portable. The M language and Power Query were created to help solve these issues.

While there might be some more speculative history regarding the origins of M, we can at least definitively trace M back to a project originally code-named Data Explorer. Data Explorer was an Azure SQL Labs project circa 2011 that aimed to simplify the process of accessing, cleaning, and preparing data from various sources. The query language was thought of as a mashup language (hence the M for mashup).

In 2013, Microsoft released Power Query as an add-in for Excel. Power Query introduced a user-friendly interface, allowing business users to perform data transformations via a visual editor. Behind the scenes, Power Query utilized the M language as the underlying formula language to drive the data transformations, and as such, these data transformations became repeatable. Instead of, for example, business users continually performing the same manual data transformations on source data received as comma-delimited files, that process could now be effectively automated.

Following the success of Power Query in Excel, Microsoft included Power Query as part of its new product, Power BI Designer, which eventually became Power BI Desktop. As Power Query gained popularity, there was a need to standardize the underlying formula language. In 2016, Microsoft submitted the Power Query Formula Language specification to the European Computer Manufacturers Association (ECMA), an international standards organization. This effort established a formal specification for the language, ensuring compatibility and interoperability between different implementations.

While the language was formally referred to as the Power Query Formula Language, it became commonly known as M among the user community. The informal name M gained widespread acceptance and is now widely used to refer to the language.

Microsoft continues to enhance and refine the M language as part of its ongoing investment in data integration and transformation technologies. New functions, features, and improvements are periodically introduced to provide users with more powerful and efficient ways to manipulate and prepare their data. In addition, Microsoft continues to introduce M within additional software tools and platforms, such as data integration within Microsoft Power Platform and dataflows within Power BI and Fabric.

Today, the M language is a key component of Microsoft’s data transformation and integration toolset. The proliferation of M, as well as its versatility and extensibility, make it an invaluable language for today’s modern data professionals.

Let’s now turn our attention to who should learn M.

Who should learn M?

M is a powerful tool for data professionals and individuals who work with data on a regular basis. The versatility and capabilities of M make it a valuable language to learn for various roles, including the following:

Data analysts: Data analysts who deal with data extraction, transformation, and preparation tasks can greatly benefit from learning M. It provides a comprehensive set of functions and operators that enable data analysts to efficiently shape and manipulate data from diverse sources. By mastering M, data analysts can automate repetitive tasks, handle complex data transformations, and ensure data quality, leading to more accurate and reliable data analysis.Business intelligence professionals: Professionals in the business intelligence (BI) space can greatly enhance their skills by learning M. It is a core component of Power BI. By understanding M, BI professionals gain the ability to connect to various data sources, perform complex data transformations, and create reusable data preparation workflows, enabling them to provide actionable insights and drive informed decision-making.Data engineers: Data engineers involved in the design and implementation of data pipelines and data integration processes can really benefit from learning M. It allows data engineers to efficiently extract, transform, and load (ETL) data from different sources into data warehouses or data lakes, specifically within Power BI and Microsoft Fabric. M also provides the flexibility and power to handle complex data formats, define custom transformations, and create efficient data processing workflows. By mastering M, data engineers can streamline data integration processes and ensure data consistency and quality.Data scientists: Data scientists who perform exploratory data analysis, model development, and advanced analytics can leverage the capabilities of M to efficiently prepare their data. M provides a robust set of functions for cleaning, shaping, and aggregating data, allowing data scientists to focus on the analytical aspects of their work. By incorporating M into their data preparation workflows, data scientists can streamline the pipeline of turning raw data into insights, spending less time on data cleansing and preparation and more time on data modeling and analysis.Power users: Power users in Excel and Microsoft’s Power Platform who work extensively with data and perform complex data manipulations can benefit from learning M. It is integrated into Excel through Power Query, empowering users to perform advanced data transformations within the familiar Excel interface. In addition, M is integrated into the Microsoft Power Platform via data integration, allowing data to be transformed and mapped between systems. By mastering M, Power users can expand their data manipulation capabilities, automate repetitive tasks, and enhance the accuracy and reliability of their analyses.Individuals in data-driven roles: Beyond the specific roles mentioned above, individuals in various data-driven roles, such as project managers, consultants, researchers, and domain experts, can benefit from learning M. Mastery of M provides the ability for individuals in these roles to independently handle data-related tasks, extract meaningful insights, and make informed decisions based on reliable data.

M is a valuable language for a wide range of data professionals and other individuals who work with data. Whether you are a data analyst, BI professional, data engineer, data scientist, power user, or someone in a data-driven role, learning M empowers you to efficiently ingest, transform, and prepare data for analysis.

Now that we understand what types of individuals would want to learn M, let’s next explore where those individuals can leverage their mastery of M.

Where and how is M used?

M is a versatile language included in various tools and platforms where data transformation and manipulation are essential. Its integration within the Power Query ecosystem enables users to leverage M’s capabilities in different environments. In this section, we explore some key areas where M is widely used.

Experiences

Before discussing specific products where M is used, it is important to understand the different experiences available for authoring M. There are two experiences available for authoring M, one intended for on-premises use and the other for cloud-based applications. These experiences are the following:

Power Query Desktop: Power Query Desktop is the experience for Power Query found in desktop applications such as Power BI Desktop and Microsoft Excel. While the experiences are similar, there are differences. For example, the artificial intelligence (AI) and machine learning (ML) integrations as well as the integrations with R and Python present in Power BI Desktop are not present in Microsoft Excel. Conversely, the Structured Column options available in Excel are not available in Power BI Desktop.Power Query Online: Power Query Online, a cloud-based service, allows users to create and manage data transformations within a web browser. M is used extensively in Power Query Online to define data transformations, connect to data sources, and perform complex data manipulations. Users can access and edit M queries directly within the browser interface, making it convenient to collaborate and work on data transformation tasks from anywhere with an internet connection. Power Query Online is integrated into a variety of Microsoft products, including the Power BI service, Power Apps, Power Automate, etc.

It is important to note that while two different experiences for authoring M exist, both provide nearly the exact same user experience. Even better, both provide the ability to edit the underlying M code, which is the primary focus of this book. Thus, the skills learned here apply equally to either experience used within any product or service.

Products and services

M is ubiquitous within the Microsoft ecosystem, including the following software and services:

Dataflows: Dataflows are product-agnostic, cloud-based M queries that can be reused across multiple different products. Dataflows enable users to build and manage reusable data preparation and transformation processes. Dataflows leverage the Power Query Online experience.Power BI Desktop: M is a fundamental component of Power BI Desktop, a leading BI tool. M allows users to connect to different data sources, perform data transformations, and create interactive visualizations and reports.

M enables users to extract, clean, and shape data from diverse sources, such as databases, Excel files, web services, and more. With M, users can define data transformation steps and create reusable queries that refresh and update data automatically when the underlying source changes.

Within Power BI Desktop, M is used within the Power Query editor, a sub-program launched from within Power BI Desktop. The Power Query editor provides a powerful graphical user interface (GUI) for working with the M formula language, as shown in the following screenshot:

Figure 1.1: Power Query editor in Power BI Desktop

The Power Query editor is covered in greater detail in Chapter 2, Working With Power Query/M.

Power BI Desktop also supports the use of dataflows.

Power BI/Fabric service: The Power BI/Fabric service (powerbi.com) is the cloud-based component of Power BI that enables you to share reports, dashboards, and other content. The service supports the use of M code via the creation of dataflows, using the Power Query Online experience.

To create a dataflow in the Power BI service, navigate to any workspace other than My Workspace and choose New and then Dataflow, as shown in Figure 1.2:

Figure 1.2: Create a dataflow in the Power BI service

Power BI Report Server: Power BI Report Server (PBRS) supports the Power Query Desktop experience, allowing users to create rich data transformations via M.Excel (Windows and Macintosh): M is seamlessly integrated into Excel, empowering users to perform advanced data transformations within the familiar Excel interface. Power Query, the engine behind Excel’s data transformation capabilities, is powered by M. Users can access the Power Query editor in Excel to apply M transformations, filter and sort data, remove duplicates, merge and append tables, and perform other data preparation tasks. M allows users to clean, reshape, and enrich data in Excel, enhancing the accuracy and reliability of their analyses.

In Excel, the Power Query editor interface can be accessed by using the Data tab of the ribbon and choosing Get Data:

Figure 1.3: Get Data in Microsoft Excel

Once the data source is chosen, the Power Query Editor interface can be accessed by choosing the Transform Data button:

Figure 1.4: Transform Data option in Microsoft Excel

Both the Windows and Macintosh versions of Excel also support accessing and using dataflows.

Power Apps: Power Apps is Microsoft’s low-code platform for creating applications. Both the Power Query Online experience as well as the use of dataflows are supported. A common use case is to leverage M either via the Power Query Online experience or dataflows, allowing users to seamlessly bring their data into the Dataverse (formally Common Data Service).Power Automate: Power Automate is Microsoft’s low code platform for automating workflows. Power Automate allows users to automate repetitive workflows and processes that may involve data manipulation and integration tasks. M can be employed within Power Automate to perform data transformations and handle complex data scenarios as part of the automated workflows, via the Power Query Online experience. By incorporating M into Power Automate, users can build sophisticated data integration and automation solutions that streamline their business processes. In addition, dataflows can be leveraged in Power Automate via Power Query Dataflows connector. This allows actions to occur once a dataflow completes and also provides the ability for a dataflow to be initiated as an action within a Power Automate flow.Data Factory: Data Factory is a managed cloud service specifically built for complex extract-transform-load (ETL) and extract-load-transform (ELT) integration projects. Data Factory allows the creation and orchestration of data-driven workflows, data movement, and transformation at scale. Both Azure Data Factory and Data Factory in Microsoft Fabric support M code, via both the Power Query Online experience as well as dataflows.SQL Server: SSIS supports the core M engine while SQL Server Analysis Services (SSAS) supports the Power Query Desktop experience.Dynamics 365 Customer Insights: Customer Insights within Dynamics 365 is Microsoft’s customer data platform (CDP) that provides a holistic view of customers, enabling personalized customer experiences. Customer Insights supports both dataflows as well as the Power Query Online experience.Visual Studio: Visual Studio allows M to be integrated as a language. This is done via the Power Query Language Service for Visual Studio Code and is available in the Visual Studio Code Marketplace. This language service provides fuzzy autocomplete, hover, function hints, and other functionality for writing M code within Visual Studio.

There is also the Visual Studio Power Query Software Development Kit (SDK). This SDK consists of a set of tools designed to help create custom Power Query data source connectors. The Visual Studio Power Query SDK is covered in greater detail in Chapter 16, Enabling Extensions.

Other data integration scenarios: M is not limited to the aforementioned software and services. M can also be leveraged in custom applications and programming environments that utilize Power Query libraries.

As you can see, M is widely used in different tools and platforms within the Microsoft ecosystem, such as Power BI Desktop, Excel, the Power BI and Fabric service, Power Platform, SQL Server, and Dynamics. M enables users to connect to various data sources, perform advanced data transformations, and automate data integration workflows. The skills learned in this book deal with the M language itself and, thus, transcend both the experience as well as the specific product or service. Thus, by mastering M, users gain the ability to create reusable data transformation processes and enhance their data manipulation capabilities across a wide range of data-related scenarios, as well as across any experience, product, or service that uses M as its underlying data transformation layer.

Let’s now turn our attention to why data professionals and other individuals might want to add M to their repertoire of language.

Why learn M?

In today’s data-driven world, efficiently and effectively transforming and analyzing data is a valuable skill. Power Query, a powerful data transformation and preparation tool, gained immense popularity due to its seamless integration with many popular software systems as well as its ease of use. At the heart of Power Query lies M, the Power Query Formula Language. But you may be asking yourself, why should you invest the time in order to learn M?

Here are seven reasons why we believe data professionals and other individuals should learn M:

Tapping into the full power of Power Query: In Gil Raviv’s book, Collect, Combine, and Transform Data Using Power Query in Excel and Power BI, Mr. Raviv estimates that the GUI for authoring M queries (see Where and how Is M used? in this chapter) allows you to solve only 40% of challenges related to data transformation, but mastery of M allows you to bring that figure closer to 99.99%. Later chapters demonstrate specific examples of solving data transformation challenges that cannot be done in the GUI. Since M serves as the backbone of Power Query’s data transformation capabilities, by mastering M, you gain full control over the data transformation process, allowing you to extract, clean, transform, and reshape data from diverse sources.Automation of repetitive tasks: One of the primary reasons you should learn M is to automate repetitive data transformation tasks. Business and IT professionals are often tasked with receiving data on a recurring basis and then generating reports based on this data. Instead of manually transforming this data each time (often in Excel) to prepare it for reporting purposes, leveraging M for this data transformation allows the data transformation logic to be implemented once and then automatically run each time new data is received.Flexibility and customization: While Power Query provides a user-friendly interface for data transformation tasks, it does have its limitations. By learning M, you can extend the capabilities of Power Query and overcome these limitations. M allows you to write custom functions, perform advanced transformations, and apply complex logic that goes beyond the built-in capabilities of the Power Query interface. This flexibility empowers you to tailor your data transformations precisely to meet the unique requirements of your data sources and analysis.Efficiency and performance optimization: M is a highly efficient and optimized language for data transformations. The Power Query engine intelligently processes M expressions, optimizing performance by reducing unnecessary data loads and transformations. When working with large datasets or complex transformations, knowing M enables you to write efficient code that significantly speeds up your data processing, as demonstrated in Chapter 15, Optimizing Performance. By understanding the underlying principles of M and its performance considerations, you can optimize your data workflows and save valuable time. Finally, leveraging M can greatly reduce and simplify the formula and Data Analysis Expressions (DAX) code in downstream applications like Excel and Power BI Desktop.Advanced data cleaning and transformation: M provides a comprehensive set of data cleaning and transformation functions that go far beyond the basic operations available in traditional spreadsheet applications. With M, you can easily handle data quality issues, such as removing duplicates, handling missing values, splitting columns, merging sets of data, and performing advanced calculations. Learning M enables you to tackle complex data cleaning and transformation tasks efficiently, leading to accurate and reliable data analysis.Integration with other programming languages: M is not only a standalone language but also integrates well with other programming languages such as SQL, R, and Python. This integration allows you to leverage the capabilities of these languages within your Power Query workflows. You can combine M code with native SQL queries, call R or Python scripts, and seamlessly incorporate external libraries and functions into your data transformation process. By expanding your knowledge to include M, you unlock the potential to leverage the best features of various programming languages for data manipulation.Career advancement: Proficiency in M and Power Query has become a sought-after skill in the data industry. As organizations increasingly rely on data for decision-making, individuals who possess the ability to efficiently transform, clean, and analyze data are in high demand. By investing time and effort in learning M, you position yourself as a valuable asset to organizations that rely on data-driven insights. The knowledge of M can open up new career opportunities, enhance your job prospects, and enable you to take on challenging data-related projects.

In summary, learning M enables you to efficiently ingest, transform, and analyze data from diverse sources. It provides flexibility, customization, and performance optimization capabilities that extend the functionality of Power Query itself. By mastering M, you gain a competitive edge in the data industry and open doors to new career possibilities.

We hope that you are now excited about learning M! Let’s turn our attention to the basics of the M language.

M language basics

As previously noted, M is a powerful language designed for data ingest and transformation within a variety of Microsoft software and services. Understanding the basics of the M language is essential for effectively leveraging its capabilities.

Here are some important fundamentals regarding the M language:

Expressions and functions: In M, expressions form the building blocks of data transformations. An expression represents a computation or operation that evaluates to a value. M provides a wide range of built-in functions that can be used to perform operations on data. Functions in M are called using a syntax where the function name is followed by arguments within parentheses. For example, the function Text.Start("Hello, World!", 5) returns the substring Hello from the input text. More about expressions and functions are covered in Chapter 4, Understanding Values and Expressions, as well as Chapter 9, Parameters and Custom Functions.Data types: M supports various data types, including text, numbers, dates, times, lists, tables, and records. Understanding the data types in M is crucial for performing accurate transformations. M provides functions to convert between different data types and manipulate data, based on their inherent characteristics. For example, the Text.From function converts a value to text, while the Date.Year function extracts the year component from a date or datetime value. Data types are covered in Chapter 5, Understanding Data Types.Variables and constants: M allows you to define variables and constants to store and reuse values during data transformations. Variables are created within a let expression, followed by a comma-separated list of variable assignments. Constants, on the other hand, are fixed values that remain constant throughout the execution. Variables and constants help improve code readability, enable reuse, and make complex transformations more manageable. More about variables and constants can be found throughout this book.Operators: M supports a variety of operators to perform mathematical calculations, logical comparisons, and text manipulations. Arithmetic operators (+, -, *, /, and so on) are used for numeric calculations, while comparison operators (>, <, , and so on) evaluate logical conditions. The combination operator ‘&' is used for concatenating text values, appending lists and tables, or merging records. Operators are covered in Chapter 4, Understanding Values and Expressions.Step-by-step transformation process: M follows a step-by-step transformation process where each step defines a data transformation operation. The Power Query editor provides a visual interface to define these steps and generates the corresponding M code. Steps can include operations such as filtering rows, removing duplicates, splitting columns, merging tables, and aggregating data. Chapter 2, Working with Power Query/M, covers this topic in more detail.Query folding: Query folding is an optimization technique in Power Query that pushes data transformations to the data source whenever possible. When using M, it is important to be aware of query folding to ensure efficient data processing. Query folding can improve performance by reducing data transfer between the data source and Power Query. However, not all transformations can be folded, so it is essential to understand which operations can be folded and which cannot. For example, when using Direct Query or Dual storage mode for tables, all M queries must fold, which can limit certain transformation operations. Query folding is discussed in Chapter 7, Conceptualizing M, and in Chapter 15, Optimizing Performance.Error handling and debugging: M provides error handling mechanisms to catch and handle exceptions during data transformations. By using functions like try, otherwise, and error

Tausende von E-Books und Hörbücher

Ihre Zahl wächst ständig und Sie haben eine Fixpreisgarantie.

Sie haben über uns geschrieben: