Learn Microsoft Fabric - Arshad Ali - E-Book

Learn Microsoft Fabric E-Book

Arshad Ali

Description

Discover the capabilities of Microsoft Fabric, the premier unified solution designed for the AI era, seamlessly combining data integration, OneLake, transformation, visualization, universal security, and a unified business model. This book provides an overview of Microsoft Fabric, its components, and the wider analytics landscape.
In this book, you'll explore workloads such as Data Factory, Synapse Data Engineering, data science, data warehouse, real-time analytics, and Power BI. You’ll learn how to build end-to-end lakehouse and data warehouse solutions using the medallion architecture, unlock real-time analytics, and implement machine learning and AI models. As you progress, you’ll build expertise in monitoring workloads and administering Fabric across tenants, capacities, and workspaces. The book also guides you step by step through enhancing security and governance practices in Microsoft Fabric and implementing CI/CD workflows with Azure DevOps or GitHub. Finally, you’ll discover the power of Copilot, an AI-driven assistant that accelerates your analytics journey.
By the end of this book, you’ll have unlocked the full potential of AI-driven data analytics, gaining a comprehensive understanding of the analytics landscape and mastery over the essential concepts and principles of Microsoft Fabric.

Page count: 317

Publication year: 2024


Learn Microsoft Fabric

A practical guide to performing data analytics in the era of artificial intelligence

Arshad Ali

Bradley Schacht

Learn Microsoft Fabric

Copyright © 2024 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Kaustubh Manglurkar

Publishing Product Manager: Deepesh Patel

Book Project Manager: Hemangi Lotlikar

Senior Editor: Rohit Singh

Technical Editor: Kavyashree K S

Copy Editor: Safis Editing

Proofreader: Safis Editing

Indexer: Subalakshmi Govindhan

Production Designer: Vijay Kamble

Developer Relations Marketing Executive: Nivedita Singh

First published: February 2024

Production reference: 1270223

Published by Packt Publishing Ltd.

Grosvenor House

11 St Paul’s Square

Birmingham

B3 1RB, UK.

ISBN 978-1-83508-228-7

www.packtpub.com

This book is dedicated to my late parents, Mrs. and Mr. Md Azal Hussain, who gave me life and shaped me into who I am today. I wish they could witness this achievement, but I know they are proud of me wherever they are, and I am grateful for their love. I also dedicate this book to my wife, Shazia Arshad Ali, who has been my constant support and inspiration in everything I do, and who encouraged me to take on the challenge of writing this book; my daughter, Aidha, who is only eight years old but has been my accountability partner (keeping me in check to ensure that I finished chapters on time) and is eager to read this book; and my son, Sameed, whose smile and laughter have energized me throughout this journey.

– Arshad Ali

To my wife, Nichole, for loving me so well for the last 2,167 days. There is no one I would rather do life with than you.

– Bradley Schacht

Contributors

About the authors

Arshad Ali is a principal product manager at Microsoft, working on the Microsoft Fabric product team in Redmond, WA. He focuses on Spark Runtime, which empowers both data engineering and data science experiences. In his previous role, he helped strategic customers and partners adopt Azure Synapse and Microsoft Fabric.

Arshad has more than 20 years of industry experience and has been with Microsoft for over 16 years. He is the co-author of the book Big Data Analytics with Azure HDInsight and the author of over 200 technical articles and blogs on data and analytics. Arshad holds an MBA from the Foster School of Business at the University of Washington and an MCA from India.

This book would not have been possible without the support of our colleagues at Microsoft and our friends, who have been an inspiration and have supported us in writing this book, directly or indirectly. We are truly grateful to you all for your support and the opportunity you have given us to learn and grow. Thank you, Brad, for being an amazing co-author! We would also like to thank the entire Packt team, especially Deepesh, Rohit, Hemangi, and others, for turning our proposal from a dream into a reality, and Kay Sauter for reading the entire draft of the book and providing very helpful feedback and suggestions on it.

Bradley Schacht is a principal program manager on the Microsoft Fabric product team based in Saint Augustine, Florida. Bradley is a former consultant and trainer and has co-authored five books on SQL Server and Power BI. As a member of the Microsoft Fabric product team, Bradley works directly with customers to solve some of their most complex data problems and helps shape the future of Microsoft Fabric. Bradley gives back to the community by speaking at events, such as the PASS Summit, SQL Saturday, Code Camp, and user groups across the country, including locally at the Jacksonville SQL Server User Group (JSSUG). He is a contributor on SQLServerCentral.com and blogs on his personal site, BradleySchacht.com.

I give thanks to God, for all the blessings He has given me and who I would be lost without. Thanks to my amazing, beautiful wife, Nichole, and our crazy boys, Oliver and Levi, for their unending love and support. It means more than you could possibly imagine. To Arshad, my co-author, who worked hard to make this book happen. Lastly, to Han and Chewie – you have inspired me and given me the confidence to one day build the LEGO Millennium Falcon.

About the reviewer

Kay Sauter is a senior data engineer. He has over 10 years of experience working with SQL Server and holds a Certificate of Advanced Studies in Applied Data Analytics, Machine Learning, and Customer Intelligence from HWZ University of Applied Sciences in Business Administration, Zurich, Switzerland. He has also been a Microsoft MVP for the data platform since 2022. Kay shares his knowledge and insights on his blog and his newsletter on LinkedIn. He runs the Azure Data user group Data TGIF, and he is a co-organizer of the online live conference DATA BASH and the user group DEI Virtual Group. He is married and is based near Zurich, Switzerland.

Table of Contents

Preface

Part 1: An Introduction to Microsoft Fabric

1

Overview of Microsoft Fabric and Understanding Its Different Concepts

Introduction to Microsoft Fabric

Reviewing the core capabilities of Microsoft Fabric

Complete analytics platform

Lake-centric and open

Empower every business user

AI powered

Unified business model with universal compute capacity

Summary

2

Understanding Different Workloads and Getting Started with Microsoft Fabric

Getting started with Microsoft Fabric

Enabling Microsoft Fabric

Checking your access to Microsoft Fabric

Creating your first Fabric-enabled workspace

Data Factory

Pipelines

Activities

Connections

Dataflow Gen2

Loading data

Data engineering

Lakehouse

Spark Job Definition

Data Warehouse

Simplifying the Data Warehouse experience

Open and lake-centric

Combining the lakehouse and data warehouse

Loading data

Querying the warehouse

Data Science

SynapseML

MLflow integration

FLAML integration for automated ML (AutoML)

Data Wrangler

Semantic Link

Real-Time Analytics

Eventstreams

KQL databases

KQL queryset

Power BI

Reports

Datasets

Direct Lake

Summary

Part 2: Building End-to-End Analytics Systems

3

Building an End-to-End Analytics System – Lakehouse

Technical requirements

Understanding end-to-end scenarios

Understanding the end-to-end architecture

Understanding sample data and data models

Understanding data and transformation flow

Storage

Ingestion

Transformation

Importing notebooks

Creating a shortcut (for Files): Silver zone

Opening notebook and executing commands (loading to the Silver zone)

Incremental data load

Creating a shortcut (for Tables): Gold zone

Creating business aggregates for the Gold zone

Analyze

Power BI

SQL endpoint

Orchestrate data ingestion and transformation flow and schedule notebooks and pipelines

Data meshes in Fabric – a primer

Summary

4

Building an End-to-End Analytics System – Data Warehouse

Understanding the end-to-end scenario

Data and transformation flow

Creating a data warehouse

Creating tables in a data warehouse

Loading data

Loading data using the copy activity in Data Factory

Loading data using T-SQL

Data transformation using T-SQL

Orchestrating ETL operations with Data Factory pipelines

Analyzing data with Power BI

Summary

5

Building an End-to-End Analytics System – Real-Time Analytics

Understanding the end-to-end scenario

Creating a Kusto Query Language (KQL) database

Capturing and delivering data using eventstreams

Analyzing data with KQL

Reporting with Power BI

Creating a new Power BI report

Adding visualizations to the Power BI report

Configure page refresh on the Power BI report

Summary

6

Building an End-to-End Analytics System – Data Science

Technical requirements

End-to-end data science scenario

Data and storage – creating a lakehouse and ingesting data using Apache Spark

Importing notebooks

Problem formulation/ideation (business understanding)

Semantic Link

Data acquisition, discovery, and preprocessing

Data acquisition

Data discovery

Data preprocessing

Data Wrangler

Experimenting and modeling

Training – version 1

Training – version 2

AutoML with FLAML

Enriching and operationalizing

Analyzing and getting insights

Summary

Part 3: Administration and Monitoring

7

Monitoring Overview and Monitoring Different Workloads

Technical requirements

Overview of monitoring capabilities in Fabric

Monitoring Data Factory pipelines and dataflows

Monitoring Spark jobs (data engineering and data science)

Monitoring data warehouse activity

Monitoring Real-Time Analytics activity

Monitoring eventstreams

Monitoring KQL databases

Monitoring capacity usage with the Microsoft Fabric Capacity Metrics app

Summary

8

Administering Fabric

Enabling Microsoft Fabric in your tenant

What are capacities?

Managing Fabric capacities

Managing Spark job configurations

Starter pools

Custom Spark pools

Spark runtime

High concurrency

Automatically tracking machine learning experiments and models

Spark properties/configuration

Library management

Auto-tune

Spark utility (MSSparkUtils)

Summary

Part 4: Security and Developer Experience

9

Security and Governance Overview

Securing the Microsoft Fabric platform

Guest users

Conditional access

Securing Microsoft Fabric workspaces and items

Workspace-level permissions

Item-level permissions

Understanding governance and compliance in Microsoft Fabric

Domains

Microsoft Purview

Summary

10

Continuous Integration and Continuous Deployment (CI/CD)

Technical requirements

Understanding the end-to-end flow

Connecting to a Git repo with Azure DevOps

Working on a new feature or release

Creating and executing a deployment pipeline

Managing database code for a Fabric data warehouse

Managing database code with the SQL Database Projects extension

Summary

Part 5: AI Assistance with Copilot Integration

11

Overview of AI Assistance and Copilot Integration

Technical requirements

What is Copilot in Fabric?

Copilot in data engineering and data science

Copilot in Data Factory

Copilot in Power BI

Creating reports with the Power BI Copilot

Creating a narrative using Copilot

Generating synonyms with Copilot

Summary

Index

Other Books You May Enjoy

Part 1: An Introduction to Microsoft Fabric

This part of the book introduces the overall analytics landscape of Microsoft Fabric, explaining its different concepts and how Fabric has leaped ahead of other products/platforms available on the market with its unique differentiators. Further, it talks about the different types of real use cases that can be implemented quickly and easily and gets you started on your journey.

This part contains the following chapters:

Chapter 1, Overview of Microsoft Fabric and Understanding Its Different Concepts
Chapter 2, Understanding Different Workloads and Getting Started with Microsoft Fabric

1

Overview of Microsoft Fabric and Understanding Its Different Concepts

As data volume and complexity grow, organizations across every industry have opportunities to harness data to digitally transform themselves by exploiting its power and gaining competitive advantages. However, these organizations have to manage and stitch together different specialized and disconnected products to build their end-to-end analytics system. As a result, they end up incurring high integration costs when ensuring these products function together as one analytics system. This often results in delays in obtaining insights to the extent that the information is no longer relevant.

This chapter will introduce you to Microsoft Fabric, its core capabilities, and how it addresses the challenges of modern data analytics.

Here is what will be covered in this chapter:

Introduction to Microsoft Fabric
Reviewing the core capabilities of Microsoft Fabric
An understanding of Microsoft Fabric as a complete platform for different types of workloads that are natively integrated for different, real use cases
How the platform empowers everyone in the organization to become part of the data-driven culture and how its Copilot integration increases productivity
An understanding of Microsoft Fabric as a unified business model with universal compute capacity

By the end of this chapter, you will have a high-level understanding of Microsoft Fabric, its core capabilities, and how it solves the long-standing challenges faced by data analytics.

Introduction to Microsoft Fabric

Microsoft Fabric is an end-to-end, all-in-one unified analytics platform that brings together all the data and analytics tools that organizations need. As a single, unified platform for data management, data lakes, data integration, data engineering, data warehousing, data science, real-time analytics, and business intelligence, it has been designed from the ground up to help organizations simplify their analytics workloads, reduce costs, and reduce the time taken to obtain insights in this era of AI. Microsoft Fabric is built on Azure, and it leverages the power of Azure’s computing, storage, reliability, security and governance, scale, performance, and networking services.

Reviewing the core capabilities of Microsoft Fabric

Microsoft Fabric is designed for the age of AI and is delivered as a single Software-as-a-Service (SaaS) product that provides auto-integration, auto-optimization, common architecture, central security and governance, a unified business model, and Office-like experiences across all workloads.

There are four core pillars of Microsoft Fabric:

Figure 1.1 – Microsoft Fabric’s core capability pillars

Let’s review each pillar in detail in the next subsections:

Complete Analytics Platform
Lake-centric and open
Empower Every Business User
AI Powered

Complete analytics platform

While there is a standard pattern (for data warehouses or lakehouses) for typical analytics systems with well-defined components (such as ingestion, processing, and consumption), each of the components might need a different array of capabilities and might be well-served by a different class of products. Often, these products come from multiple vendors, and even if they come from a single vendor, it is too complex, expensive, time-consuming, and fragile to integrate them together because of the lack of native integration among them.

Microsoft Fabric is a single unified product that takes care of everything an organization (from departmental stores to large enterprises) needs to build an analytics system: ingesting data from different types of data sources, transforming it at the correct scale using familiar tools (SQL and Apache Spark), and serving it to business users with industry-leading Power BI, with shared security and governance that works across all the components in a cohesive manner.

Best-of-breed engines and capabilities

With Microsoft Fabric, you can use a single product with best-of-breed engines and capabilities, as shown in Figure 1.2. It offers a unified experience and architecture that provides all the capabilities required for an architect and developer to integrate data from different types of sources (on-premises or in the cloud), apply any necessary transformations using the tools and languages of their choice, derive insights, and present this to the business users. Moreover, by delivering the experience with SaaS as the foundation, everything is automatically integrated and optimized. This means that users can sign up for Microsoft Fabric within seconds and start getting real business value within minutes.

Microsoft Fabric empowers every team in the analytics process with the role-specific experiences they need so that data engineers, data warehousing professionals, data scientists, data analysts, and business users feel right at home as they work on the same single copy of the data and leverage the work of their colleagues.

Figure 1.2 – A complete analytics platform

As a complete platform, as shown in Figure 1.2, when it comes to building end-to-end analytics systems, Microsoft Fabric provides seven core workloads, which are natively integrated so that you can focus on generating business value for your organization rather than spending time on integrating different pieces together:

Data Factory: You can use Data Factory in Microsoft Fabric to build the data integration component of your analytics system. It combines the scale, power, and control of Azure Data Factory with Power Query, giving you an intuitive UI-based experience for building integration flows and pipelines easily and quickly. You can use 150+ native connectors to connect to data sources on-premises and in the cloud, transform data with drag-and-drop experiences, and orchestrate data pipelines with ingestion and transformation tasks. You will learn more about Data Factory in Chapter 2, Understanding Different Workloads and Getting Started with Microsoft Fabric, Chapter 3, Building an End-to-End Analytics System – Lakehouse, and Chapter 4, Building an End-to-End Analytics System – Data Warehouse.

Synapse Data Engineering: Data Engineering provides a world-class Apache Spark platform with a great authoring experience that empowers data engineers to work and collaborate in transforming data at scale and democratizing data through lakehouses. With an instant starter pool, your Spark session is created within seconds instead of you waiting for Spark to set up the nodes, which helps you do more with data and obtain insights as quickly as possible. Spark’s integration with Data Factory in Fabric enables notebooks and Spark jobs to be scheduled and orchestrated as part of the overall data pipelines. You will learn more about Data Engineering in Chapter 2, Understanding Different Workloads and Getting Started with Microsoft Fabric, and Chapter 3, Building an End-to-End Analytics System – Lakehouse.

Synapse Data Science: Data Science enables you to build, train, deploy, and operationalize machine learning and AI models directly within the Fabric experience. It natively integrates with Azure Machine Learning to provide users with built-in experiment tracking, execution tracking, model registry, and so on. Data scientists can work and collaborate to enrich organizational data with predictions, and business analysts can incorporate these predictions into Power BI reports, easily shifting insights from descriptive analytics (based on historical data) to predictive analytics (predicting patterns for future outcomes). You will learn more about Data Science in Chapters 2–6.

Synapse Data Warehousing: Data Warehousing brings together the best of lakehouse and data warehouse experiences with the industry-leading SQL performance and scale needed for your data warehouse, based on a relational Massively Parallel Processing (MPP) engine. It fully separates compute from storage, both of which can be scaled independently, and unlike traditional on-premises relational database platforms, it natively stores data in the open source Delta Lake (https://delta.io/) format for greater interoperability. You will learn more about Data Warehousing in Chapter 2, Understanding Different Workloads and Getting Started with Microsoft Fabric, and Chapter 4, Building an End-to-End Analytics System – Data Warehouse.

Synapse Real-Time Analytics: Real-Time Analytics enables developers to work with data streams coming in from Internet of Things (IoT) devices, telemetry, logs, and more, and it helps to analyze massive volumes of semi-structured data (for example, JSON and text). Streaming data often arrives at high volumes and with shifting schemas, and it requires high performance and low latency for processing and utilization. You will learn more about Real-Time Analytics in Chapter 2, Understanding Different Workloads and Getting Started with Microsoft Fabric, and Chapter 5, Building an End-to-End Analytics System – Real-Time Analytics.

Power BI: Power BI is an industry-leading data reporting and visualization product. The native integration of Power BI with other Fabric capabilities under a single unified platform provides a Business Intelligence (BI) platform for data reporting and visualization, which enables business analysts and business users to gain insights from data quickly and intuitively to make better decisions. Power BI is also natively integrated into Microsoft 365, which opens up the possibility of providing relevant insights to business users in familiar tools such as Microsoft Excel and Microsoft Teams. You will learn more about Power BI in Chapter 2, Understanding Different Workloads and Getting Started with Microsoft Fabric, Chapter 3, Building an End-to-End Analytics System – Lakehouse, and Chapter 4, Building an End-to-End Analytics System – Data Warehouse.

Data Activator: Data Activator provides real-time automated detection and monitoring of data (all the way from relatively slow-moving data in warehouses to real-time streaming data in lakehouses or from messaging queues) and can trigger notifications and required actions when it finds specified patterns in data, all within a no-code experience.

Important note

Delta Lake is an open-format storage layer that brings Atomicity, Consistency, Isolation, and Durability (ACID) transactions to Apache Spark and big data workloads. You can learn more about Delta Lake here: https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-what-is-delta-lake.

SaaS

The architecture of Microsoft Fabric is based on a SaaS foundation, as shown in Figure 1.3, instead of a traditional Platform as a Service (PaaS) to take simplicity and integration to the next level. This doesn’t mean that you have any less functionality; you will still have complete control over your data and experience, as is the case with all Azure PaaS services.

Figure 1.3 – SaaS-based intelligent data foundation

However, having this common SaaS foundation across all the previously discussed Fabric workloads means that some things change for the better, including the following:

Frictionless Onboarding: This works by default and offers a smoother experience by reducing things such as configuration overhead.
Simple Onboarding and Trials: This gets you started in seconds if you just want to kick the tires:
    You sign in once (single sign-on), and that sign-in works across all the workloads seamlessly.
    Provisioning is fast and scaling is automatic. For example, a data warehouse takes about 10–20 seconds to spin up rather than the 10+ minutes that it takes today. Likewise, Spark pools come online in less than 15 seconds rather than the 3+ minutes they take today in Azure Synapse.
Performance by Default: It lets you focus on creating business value:
    You have fewer knobs to tune because the best practices are implemented automatically. For example, for SQL, things such as stats are always kept up to date, and for Spark, things such as Spark session-level configuration are auto-configured as your job progresses.
    Fabric workloads are auto-integrated and work seamlessly when you switch contexts. For example, you can use Apache Spark as part of the Data Engineering workload to create a table. Next, you can reference the same table in the data warehouse with your SQL queries or reference the same table in Power BI for reporting without moving data.
    All assets are easily discovered and reused by all developers across all the workloads. For example, you will be able to browse, work, and collaborate on the same coding artifacts as your colleagues.
    A unified data lake allows customers to keep the data where it is while using any analytics tools of their choice based on their experience and preference. For example, SQL developers can continue to use SQL for data warehouses, and Spark developers can use Spark-supported languages, all while working together on a single copy of the underlying data.
Centralized Administration: Offers a simplified experience:
    You have Fabric tenant-wide governance and control across all workloads.
    With the OneSecurity feature, you can define a security policy once, in one place, and it will be honored by all the workloads across all the engines in Fabric.
    You can centrally monitor and manage every aspect of all jobs submitted from all the workloads or engines.
    You can centrally monitor capacity metrics to understand the resources consumed by each of the workloads and jobs submitted. This also helps with charge-back scenarios.
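The charge-back scenario mentioned above comes down to simple proportional arithmetic once consumption per workspace is known. The following Python sketch is only an illustration under stated assumptions: the workspace names, usage figures, and period cost are made up, and `allocate_chargeback` is a hypothetical helper, not something provided by Fabric or its capacity metrics tooling.

```python
from typing import Dict


def allocate_chargeback(usage_by_workspace: Dict[str, float],
                        total_cost: float) -> Dict[str, float]:
    """Split total_cost across workspaces in proportion to their usage."""
    total_usage = sum(usage_by_workspace.values())
    if total_usage <= 0:
        raise ValueError("total usage must be positive")
    return {
        name: round(total_cost * usage / total_usage, 2)
        for name, usage in usage_by_workspace.items()
    }


# Hypothetical consumption figures (arbitrary units) for one billing period.
usage = {"sales": 600.0, "finance": 300.0, "marketing": 100.0}
shares = allocate_chargeback(usage, total_cost=5000.0)
print(shares)
```

With these made-up inputs, the sales workspace accounts for 60% of consumption and is therefore charged 60% of the period cost.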

Persona-optimized experiences

Each Fabric workload targets a specific persona as part of the persona-optimized experience, as shown in Figure 1.4 and listed here:

Data Engineers
Data Scientists
Data Warehouse Developers
Real-Time Analytics Developers
Power BI Developers
Data Integration (or ETL) Developers

Figure 1.4 – Persona-optimized experience

This experience reduces the noise and quickly surfaces the information most relevant to what you need to get done. Selecting an experience, such as Synapse Data Warehouse, brings you to a screen that shows the common tasks and resources that a user working on a data warehouse would find useful: options to create a new warehouse and to create a new data pipeline, as well as a direct link to the getting-started documentation for Data Warehouse. Likewise, if you switch to Data Engineering, you will find options to create a lakehouse, create a notebook to write your Spark application, import existing notebooks, and create a Spark Job Definition, as well as a direct link to the “Getting started with Data Engineering” documentation.

Lake-centric and open

Today, in most analytics systems (if not all), data resides in silos across different systems and storage (for example, some data exists in a lakehouse, whereas other data exists in a data warehouse in its own proprietary format). This not only means that data is duplicated at multiple layers, but it also adds considerable complexity (in time, cost, and resources) to keeping that data up to date over time.

Fabric solves this problem with OneLake, as shown in Figure 1.5. When you create a Fabric tenant, OneLake is automatically provisioned and preconfigured. OneLake provides a single, unified, logical, multi-cloud data lake for the whole organization. As an analogy, you can think of OneLake as OneDrive (which comes with Microsoft 365) but for your data.

Figure 1.5 – OneLake is shared by all the engines

OneLake and OneCopy

OneLake is based on a SaaS foundation, and underneath, it is built on top of Azure Data Lake Storage (ADLS) Gen2, supporting any type of file whether structured, semi-structured, or unstructured. OneLake supports the same ADLS Gen2 Application Programming Interfaces (APIs) and Software Development Kits (SDKs) that are compatible with existing ADLS Gen2 applications, including Azure Databricks. This means that you can continue to leverage all your existing investments and integrations with other services that you’ve spent years building. You can think of all your data stored in OneLake as being stored underneath in a big storage account for the Fabric tenant. For every workspace that you create, a container appears in that storage account. Furthermore, any data items you store are stored in a folder hierarchy within those containers.
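The workspace, item, and folder hierarchy described above maps naturally onto ADLS Gen2-style paths. The following Python sketch shows one way such a path could be composed. The URI pattern and the endpoint host are not given in this chapter; they are assumptions based on Microsoft's public OneLake documentation, and the workspace, lakehouse, and file names, as well as the `onelake_uri` helper, are hypothetical.

```python
from posixpath import join as path_join

# Assumed OneLake endpoint host (not stated in this chapter).
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_uri(workspace: str, item: str, item_type: str, *parts: str) -> str:
    """Build an ADLS Gen2-style URI for a path inside a Fabric item.

    The workspace plays the role of the container, and the item
    (for example, a lakehouse) is the top-level folder inside it.
    """
    item_folder = f"{item}.{item_type}"
    relative = path_join(item_folder, *parts)
    return f"abfss://{workspace}@{ONELAKE_HOST}/{relative}"


# Hypothetical workspace, lakehouse, and file names.
uri = onelake_uri("SalesWorkspace", "SalesLakehouse", "Lakehouse",
                  "Files", "raw", "orders.csv")
print(uri)
```

Because the result looks like any other ADLS Gen2 path, a tool that already understands that scheme can in principle be pointed at it. Names containing spaces or special characters would need extra handling that this sketch omits.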

OneLake gives customers the ability to store one copy of their data for use with multiple analytics engines, which means all the analytics engines can access each other’s data. All the architects out there are probably thinking, “Does this mean I don’t have to do all that data copying to give my SQL people access to the data in my lake? And does that mean I don’t have to copy the data from my warehouse to a dataset in Power BI to be able to report on it?” Yes, that is correct. Along with OneLake comes the concept of OneCopy, which means every Fabric computing engine can see and interact with all the data.

OneLake eliminates today’s pervasive and chaotic data silos, which arise when different developers provision and configure their own isolated storage accounts. Instead, OneLake provides a single, unified storage system for all developers, where the discovery and sharing of data is trivial, and compliance with policy and security settings is enforced centrally and uniformly across all the engines.

Additionally, with the OneSecurity feature, you can define a security policy once, in one place, and it will be honored by all the workloads across all the engines in Fabric.

Open format

In addition to OneLake, one other key change in Microsoft Fabric enables all of its functionality: standardization on the open source Delta Lake format. Every computing engine in Fabric now reads and writes data in the Delta Lake format by default. Standardizing on a single, popular open format removes the need to copy data from one computing engine to another, for example, from a lakehouse into a warehouse or vice versa. All tabular data stored in the Tables section of a lakehouse is kept in OneLake in the open source Delta Lake format.
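To make the idea of a shared open table format more concrete: a Delta table is a folder of data files plus a `_delta_log` folder of ordered JSON commit files, and any engine can reconstruct the current table state by replaying that log. The Python sketch below is a heavily simplified toy, not the Delta Lake library. It models only `add` and `remove` actions and ignores checkpoints, schema, statistics, and the other fields real commits carry; the table and file names are made up.

```python
import json
import tempfile
from pathlib import Path


def write_commit(table: Path, version: int, actions: list) -> None:
    """Write one commit as newline-delimited JSON, named by zero-padded version."""
    log_dir = table / "_delta_log"
    log_dir.mkdir(parents=True, exist_ok=True)
    lines = [json.dumps(action) for action in actions]
    (log_dir / f"{version:020d}.json").write_text("\n".join(lines))


def live_files(table: Path) -> set:
    """Replay commits in version order to find the data files in the current snapshot."""
    files = set()
    for commit in sorted((table / "_delta_log").glob("*.json")):
        for line in commit.read_text().splitlines():
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files


with tempfile.TemporaryDirectory() as tmp:
    table = Path(tmp) / "sales"
    # Version 0: one engine writes two data files.
    write_commit(table, 0, [{"add": {"path": "part-0000.parquet"}},
                            {"add": {"path": "part-0001.parquet"}}])
    # Version 1: another engine replaces one file (for example, after an update).
    write_commit(table, 1, [{"remove": {"path": "part-0001.parquet"}},
                            {"add": {"path": "part-0002.parquet"}}])
    snapshot = live_files(table)

print(sorted(snapshot))
```

Because the log, not any single engine, defines which files make up the table, every reader that replays it arrives at the same snapshot without needing its own copy of the data.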

Important note

In the world of big data, Delta Lake is one of the preferred formats for storing data. With its vibrant and active community, new releases come out frequently, bringing better read and write performance and new features that support wider adoption.

Shortcut

A shortcut in OneLake is a reference to data stored in another file location; it makes data sharing as simple as sharing files in OneDrive and removes the need for data duplication. As shown in Figure 1.6, shortcuts also allow for the instant linking of data that already exists in Azure and in other clouds, without any data copying or movement beforehand, making OneLake the first multi-cloud data lake. The target locations can be in the same workspace or in different workspaces, either internal to OneLake (for example, any lakehouse within the current OneLake) or external to it (storage accounts in ADLS or Amazon S3). No matter where the location is, the reference makes it appear as though the files and folders are stored locally. In this way, it creates a data abstraction and data virtualization layer for all the data in your organization.

Figure 1.6 – OneLake shortcut
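As a conceptual model only, the shortcut described above can be thought of as a named reference that resolves a logical path to a target location elsewhere, with no data copied. The Python toy below illustrates that idea. The `Shortcut` class, the `resolve` function, and every path in it are hypothetical, and the sketch says nothing about how Fabric implements shortcuts internally.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Shortcut:
    """A named reference to data that lives somewhere else."""
    target: str  # for example, another lakehouse path or an external bucket


def resolve(path: str, shortcuts: Dict[str, Shortcut]) -> str:
    """Return the physical location for a logical path, following one shortcut."""
    # Check longer prefixes first so the most specific shortcut wins.
    for prefix in sorted(shortcuts, key=len, reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            return shortcuts[prefix].target + path[len(prefix):]
    return path  # no shortcut applies: the data is stored locally


# Hypothetical shortcuts defined inside one lakehouse.
shortcuts = {
    "Files/external_sales": Shortcut("s3://example-bucket/sales"),
    "Tables/customers": Shortcut("OtherWorkspace/CrmLakehouse/Tables/customers"),
}

external = resolve("Files/external_sales/2024/01.parquet", shortcuts)
local = resolve("Files/local/notes.txt", shortcuts)
print(external)
print(local)
```

The caller always works with the logical path; only the resolver knows whether the bytes live locally, in another workspace, or in another cloud, which is the essence of the virtualization layer.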