Microsoft's Azure Data Fundamentals (DP-900) certification exam validates your expertise in core data concepts and Azure’s powerful data services capabilities. This comprehensive guide written by Steve Miles—a Microsoft Azure MVP and certified trainer with over 25 years of experience in cloud data services and 30+ certifications across major platforms—serves as your gateway to a future shaped by data and AI, regardless of your technical background.
With the help of examples, you'll learn fundamental data concepts, including data representation, data storage options, and common workloads and gain clarity on the roles and responsibilities of key data professionals such as data administrators, engineers, and analysts. This guide covers all crucial exam domains, from data services capabilities of the Azure cloud platform to considerations for relational, non-relational, and analytics workloads, encompassing both Microsoft and open-source technologies.
To supplement your exam prep, this book gives you access to a suite of online resources designed to boost your confidence, including mock tests, interactive flashcards, and invaluable exam tips.
By the end of this book, you’ll be fully prepared not only to pass the DP-900 exam but also to confidently tackle data solutions in Azure, setting a strong foundation for your data-driven career.
Page count: 212
Year of publication: 2024
Microsoft Certified Azure Data Fundamentals (DP-900) Exam Guide
Build a solid foundation in Azure data services and pass the DP-900 exam on your first try
Steve Miles
Copyright © 2024 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Author: Steve Miles
Reviewer: Priyanka Agrawal
Publishing Product Manager: Anindya Sil
Senior-Development Editor: Ketan Giri
Development Editor: Kalyani S.
Digital Editor: M Keerthi Nair
Presentation Designer: Shantanu Zagade
Editorial Board: Vijin Boricha, Megan Carlisle, Simon Cox, Ketan Giri, Saurabh Kadave, Alex Mazonowicz, Gandhali Raut, and Ankita Thakur
First Published: September 2024
Production Reference: 1260924
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB
ISBN: 978-1-83620-815-0
www.packtpub.com
Steve Miles works in a senior technology role for the cloud practice of a multi-billion turnover European IT distributor. He is a Microsoft Most Valuable Professional (MVP), Microsoft Certified Trainer (MCT), and an Alibaba Cloud MVP with 25+ years of technology experience in hosted datacenter services, hybrid, and multi-cloud platforms, and a previous military career in engineering, signals, and communications. Steve is the author of many books on Microsoft technologies with a focus on Azure, AI and data, as well as security, which can be found on his author profile on Amazon at https://www.amazon.com/stores/Steve-Miles/author/B09NDJ1RC8.
Steve is a petrolhead and can be found tinkering with cars when he is not writing.
You can connect with him on LinkedIn at https://www.linkedin.com/in/stevemiles70/
Priyanka Agrawal is a Technical Trainer at Microsoft, USA, with over 15 years of experience as a Microsoft Certified Trainer. She is passionate about learning and sharing knowledge in all capabilities and excels in delivering training, proctoring, and upskilling Microsoft Partners and Customers. Priyanka has significantly contributed to AI and Data-related courseware, exams, and high-profile events such as Microsoft Ignite, Microsoft Learn Live Shows, MCT Community AI Readiness, and Women in Cloud Skills Ready.
Beyond her professional achievements, Priyanka is a passionate advocate for environmental protection and actively supports related initiatives. In her personal time, she enjoys traveling and spending quality time with her family.
Thanks for purchasing this book!
Do you like to read on the go but are unable to carry your print books everywhere?
Is your eBook purchase not compatible with the device of your choice?
Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.
Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.
The perks don’t stop there: you can get exclusive access to discounts, newsletters, and great free content in your inbox daily.
Follow these simple steps to get the benefits:
Scan the QR code or visit the link below: https://packt.link/free-ebook/9781836208150
Submit your proof of purchase.
That’s it! We’ll send your free PDF and other benefits to your email directly.
In this first chapter of this DP-900 Microsoft Azure Data Fundamentals exam guide, you will delve into the core data concepts that underpin these transformative trends. You will explore the fundamental principles that govern the structure, processing, analysis, utilization, and management of data. In the second part, you will move on to understanding the evolving job roles and responsibilities required within modern data-driven organizations.
This chapter primarily focuses on the Describe Core Data Concepts module from the Skills Measured section of the DP-900 Microsoft Azure Data Fundamentals exam.
This chapter’s content requires no prior understanding of data and does not necessarily require you to be in an existing data or technical role to learn the concepts and pick up the skills from this chapter.
This book and its accompanying online resources are designed to be a complete preparation tool for your DP-900 Exam.
The book is written in a way that you can apply everything you’ve learned here even after your certification. The online practice resources that come with this book (Figure 1.1) are designed to improve your test-taking skills. They are loaded with timed mock exams, interactive flashcards, and exam tips to help you work on your exam readiness from now till your test day.
Before You Proceed
To learn how to access these resources, head over to Chapter 9, Accessing the Online Practice Resources, at the end of the book.
Figure 1.1: Dashboard interface of the online practice resources
Here are some tips on how to make the most out of this book so that you can clear your certification and retain your knowledge beyond your exam:
Read each section thoroughly.
Make ample notes: You can use your favorite online note-taking tool or use a physical notebook. The free online resources also give you access to an online version of this book. Click the BACK TO THE BOOK link from the Dashboard to access the book in Packt Reader. You can highlight specific sections of the book there.
Chapter Review Questions: At the end of this chapter, you’ll find a link to review questions for this chapter. These are designed to test your knowledge of the chapter. Aim to score at least 75% before moving on to the next chapter. You’ll find detailed instructions on how to make the most of these questions at the end of this chapter in the Exam Readiness Drill - Chapter Review Questions section. That way, you’re improving your exam-taking skills after each chapter, rather than at the end.
Flashcards: After you’ve gone through the book and scored 75% or more in each of the chapter review questions, start reviewing the online flashcards. They will help you memorize key concepts.
Mock Exams: Solve the mock exams that come with the book till your exam day. If you get some answers wrong, go back to the book and revisit the concepts you’re weak in.
Exam Tips: Review these from time to time to improve your exam readiness even further.
By the end of this chapter, you will be able to answer exam questions on the following topics with confidence:
Features of structured, semi-structured, and unstructured data
Common data file formats
Types of databases
Transactional and analytical workloads
Responsibilities for the roles of database administrators, data engineers, and data analysts
Your knowledge will be tested at the end of this chapter, and some questions will be asked to review your understanding of the topic.
In addition, this chapter’s goal is to take your knowledge beyond the exam content so you are prepared for a real-world, day-to-day Azure data-focused role.
But before you continue with the content of this chapter, take a moment to pause and consider what it means to live in an era in which rapidly evolving technologies such as artificial intelligence (AI) are changing the world around you in real time. How are organizations going about solving their biggest challenges using data? What trends are transforming the world of work and changing the professional roles, tasks, and jobs you do every day?
This book’s first content section introduces you to data concepts and terminology. We will look at what data is, its definition, and how and why it is structured differently. Data structures are specialized formats for organizing, processing, retrieving, and storing data. They are essential for efficient data management and manipulation in computing.
Data is essentially information. When we are talking about modern IT systems, this data is stored as values and can be anything from a person’s age stored in a database to a two-hour film stored in the servers of an on-demand video-streaming service.
All data starts as “raw data,” collected from different sources without order and classification. Generally, this data needs to be filtered, sorted, and analyzed before it can be used. Raw data can be likened to crude oil: its value cannot be realized until it is transformed into a form that can be consumed in a more meaningful way. It takes a lot of resources and processing to turn something raw into something usable; otherwise, its latent value remains locked and unutilized. Gasoline is a good example of this transformation. Crude oil does not become gasoline instantly; the refining process is the critical step. The same is true of raw data.
“Processed” data is data that has been transformed in some way: filtered, cleansed, organized, formatted, and analyzed to extract valuable information. Sales figures recorded daily would be raw data; in contrast, a report summarizing the sales trends over a month would be processed data. The processed data can then be stored in databases for further consumption and to derive further value.
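As a minimal sketch of the raw-versus-processed distinction, the following Python snippet turns a handful of daily sales figures into monthly totals. The sample records and the `summarize_by_month` helper are hypothetical and exist only for illustration; a real pipeline would typically also filter and cleanse the data first.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw data: one (day, amount) record per daily sales figure.
raw_daily_sales = [
    (date(2024, 1, 5), 120.0),
    (date(2024, 1, 20), 80.0),
    (date(2024, 2, 3), 200.0),
    (date(2024, 2, 14), 50.0),
]


def summarize_by_month(records):
    """Aggregate raw daily figures into processed monthly totals."""
    totals = defaultdict(float)
    for day, amount in records:
        totals[(day.year, day.month)] += amount
    # Sort by (year, month) so the summary reads chronologically.
    return dict(sorted(totals.items()))


monthly_sales = summarize_by_month(raw_daily_sales)
print(monthly_sales)  # {(2024, 1): 200.0, (2024, 2): 250.0}
```

The daily records carry little meaning individually, while the monthly summary is something a report or dashboard could consume directly.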
Data analysis refers to the steps of looking at numerical and other information, preparing and transforming it, and then modeling it to identify useful information, reach conclusions, and make decisions. Through this process, organizations may identify patterns, correlations, and trends from data that can inform strategic planning and operations. For example, by examining customer data, an organization can identify buying patterns and preferences that inform marketing efforts, the focus of their attention, and how they can best serve their customers.
Data values are typically organized into data entities and attributes.
These entities and attributes are central concepts in organizing data. A data entity represents a distinct object or concept about which data is stored. In a customer relationship management (CRM) system, for example, Customer is a data entity consisting of all the information about this particular customer, such as the customer ID, name, and email.
Each entity comprises many attributes (also known as properties or fields). An attribute is a specific property or fact about an entity; for example, Customer might have CustomerID, Name, Mail, and PurchaseHistory attributes.
In another example of a university scenario, entities might have the names Students, Courses, and Professors. Each of these entities has attributes that describe the characteristics of the entity they refer to. The Students entity may have the StudentID, Name, Major, and EnrollmentDate attributes. These attributes describe a student completely and uniquely, making entities of this kind distinguishable from one another.
Entities and attributes provide a structured way of organizing and making sense of data, facilitating its collection, analysis, and application.
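One way to picture the university example is as a small Python data class, where the class plays the role of the entity and its fields play the role of attributes. This is only an illustrative sketch: the `Student` class, its snake_case field names, and the sample values are assumptions made here, not something prescribed by the exam or by Azure.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class Student:
    """Entity: a student. Each field below is one attribute."""
    student_id: int          # StudentID
    name: str                # Name
    major: str               # Major
    enrollment_date: date    # EnrollmentDate


# One instance of the entity, with made-up attribute values.
alice = Student(1001, "Alice Example", "Data Science", date(2024, 9, 1))

# List the attribute names that define the entity.
attribute_names = [f.name for f in fields(Student)]
print(attribute_names)
print(alice.major)
```

Every `Student` instance carries the same set of attributes, and the `student_id` value is what distinguishes one student from another.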
When talking about data, you should consider that it has the following three V dimensions:
Volume: This is the amount of data generated.
Velocity: This is the speed at which data is generated.
Variety: This is the diversity of data generated.
Data volume refers to the amount of data that is produced and captured. As AI technologies improve, data volume goes up since more data means better chances of training and developing more accurate models.
For instance, a self-driving car must gather thousands of examples from sensor data and camera feeds to figure out how to drive safely on the road. The more data AI has access to, the more effectively it can learn a new task and apply that learning to a new environment.
Data velocity refers to the speed of data flow. High rates of data flow are required for real-time applications involving analytics done by AI. For instance, in financial trading, AI algorithms track data streams in real time and make millisecond decisions to trade accordingly.
Data variety refers to the different types of data being generated, collected, and/or stored. Machines that employ AI capabilities can typically work with different types of data: structured data (such as the individual data fields that make up a database), unstructured data (such as narrative text and images), and semi-structured data (such as JavaScript Object Notation (JSON) files and similar formats that lack rigid constraints on data fields). The range of data types an AI machine can process and learn from has grown along with its capabilities.
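The three kinds of data named above can be contrasted in a short Python sketch. All of the sample values here (the customers, the phone number, the support note) are invented for illustration, and CSV is used only as a convenient stand-in for tabular, structured data.

```python
import csv
import io
import json

# Structured: every row has exactly the same fields (a fixed set of columns).
structured_csv = (
    "CustomerID,Name,Email\n"
    "1,Ana,ana@example.com\n"
    "2,Ben,ben@example.com\n"
)
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: JSON records are self-describing, and fields may vary.
semi_structured_json = (
    '[{"CustomerID": 1, "Name": "Ana", "Phone": "555-0100"},'
    ' {"CustomerID": 2, "Name": "Ben", "Tags": ["vip"]}]'
)
docs = json.loads(semi_structured_json)

# Unstructured: free text with no fields at all.
unstructured_text = "Ana called to say her order arrived late."

row_fields = [set(row.keys()) for row in rows]
doc_fields = [set(doc.keys()) for doc in docs]

print(row_fields[0] == row_fields[1])  # True: identical fields in every row
print(doc_fields[0] == doc_fields[1])  # False: fields differ between records
```

The unstructured text has no field names to compare at all, which is why it usually needs techniques such as text analysis before it can be queried like the other two.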
For example, natural language processing (NLP) capabilities have been extended through our ability to train machines on human-sounding text, and computer vision capabilities have been enhanced by ingesting and learning from images. Within a single organization, machine learning (ML) systems that deal with customer service messages can now digest text from chat logs, extract voice data from voice calls, and even sense and analyze the sentiment of those messages.
Each of these three data dimensions constantly expands with the rise in AI technologies and every innovation and new business requirement.
There is a reciprocal relationship between data and AI, as advances in the technologies involved with AI dictate how much more data is needed, in terms of volume, velocity, and variety. AI systems are data hungry; the algorithms powering them train and learn from very large sets of data and require new, constantly updated data to enhance their image, speech, and text recognition and make useful predictions.
But much of this growth in data is also driven by business innovation; enterprises constantly seek new uses for AI to enhance their competitive positioning, whether to personalize and optimize marketing efforts based on consumer behavior analysis or to improve and automate a manufacturing process with the help of predictive maintenance via AI. This requires richer and higher-quality data.
This rate of business innovation and the widespread and rapid adoption of AI technologies has boosted the increase in data volume, velocity, and variety, where AI systems rely on large, fast-moving, diverse datasets. The more companies innovate with AI solutions and scale their use, the more data these systems produce (and demand), in turn expanding the existing data environment. The intimate connection between AI and data means that strategies to handle the growing volume, velocity, and variety of data will be crucial to meeting the needs of AI applications.
Now that you understand data and its significance and relationships with AI and innovation, you can read about data categories in the next section.
Data types and categories are important for understanding data in this data-driven world. In your professional life, you deal with information in the form of data in systems and applications. You also deal with information in your daily life – from real-time social media platforms and streaming music and movies to playing games and online banking.
All this data makes a lot of noise, so whether you are a business professional, a student, or a curious individual, knowing how data is categorized can help you better navigate the digital landscape.
Data can be categorized as follows:
Structured data
Semi-structured data
Unstructured data
In the following sections, each data categorization will be broken down with simple examples and analogies to everyday use cases you may have encountered, making it easy for those at a base level of understanding to grasp these fundamental concepts.
Structured data is organized according to some fixed format or predefined structure and stored in a way that can be easily accessed, interpreted, and used.
When data has rows and columns and is arranged in a predefined format with a fixed schema, the data is said to be structured.
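To make the idea of a fixed schema concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `customers` table, its columns, and the sample rows are hypothetical; SQLite is used only because it needs no setup, not because it is one of the Azure services covered in this book.

```python
import sqlite3

# An in-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The schema is declared up front: three named columns with fixed types.
conn.execute(
    "CREATE TABLE customers ("
    " customer_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " email TEXT NOT NULL)"
)

# Rows that match the schema are accepted.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ana", "ana@example.com"), (2, "Ben", "ben@example.com")],
)

# A row with an extra value does not fit the schema and is rejected.
try:
    conn.execute(
        "INSERT INTO customers VALUES (?, ?, ?, ?)",
        (3, "Cy", "cy@example.com", "unexpected"),
    )
    schema_enforced = False
except sqlite3.Error:
    schema_enforced = True

stored_rows = conn.execute(
    "SELECT customer_id, name, email FROM customers ORDER BY customer_id"
).fetchall()
print(schema_enforced)
print(stored_rows)
conn.close()
```

Because every stored row has the same columns, the data can be queried predictably, which is the practical benefit of structure.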
Structured data is easiest to describe with an example. A database is one example, but for something more relatable away from your professional life, consider a box of recipe cards.