With AI and machine learning (ML) models gaining popularity and integrating into more and more applications, it is more important than ever to ensure that models perform accurately and are not vulnerable to cyberattacks. However, attacks can target your data or environment as well. This book will help you identify security risks and apply the best practices to protect your assets on multiple levels, from data and models to applications and infrastructure.
This book begins by introducing what some common ML attacks are, how to identify your risks, and the industry standards and responsible AI principles you need to follow to gain an understanding of what you need to protect. Next, you will learn about the best practices to secure your assets. Starting with data protection and governance and then moving on to protect your infrastructure, you will gain insights into managing and securing your Azure ML workspace. This book introduces DevOps practices to automate your tasks securely and explains how to recover from ML attacks. Finally, you will learn how to set a security benchmark for your scenario and best practices to maintain and monitor your security posture.
By the end of this book, you’ll be able to implement best practices to assess and secure your ML assets throughout the Azure Machine Learning life cycle.
Year of publication: 2023
Machine Learning Security with Azure
Best practices for assessing, securing, and monitoring Azure Machine Learning workloads
Georgia Kalyva
BIRMINGHAM—MUMBAI
Copyright © 2023 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Niranjan Naikwadi
Publishing Product Manager: Tejashwini R
Book Project Manager: Kirti Pisat
Senior Editor: Tiksha Abhimanyu Lad
Copy Editor: Safis Editing
Project Coordinator: Shambhavi Mishra
Proofreader: Safis Editing
Indexer: Hemangini Bari
Production Designer: Ponraj Dhandapani
DevRel Marketing Coordinator: Vinishka Kalra
First published: December 2023
Production reference: 1291123
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK.
ISBN 978-1-80512-048-3
www.packtpub.com
To my mother, Maria, and my father, Michael, for their sacrifices and their unwavering love and care. To my partner, George, for being my inspiration and support through every challenge.
– Georgia Kalyva
I have known and collaborated with Georgia Kalyva for 15 years, starting from her involvement in technology communities as a university student. Georgia showcased an amazing set of skills in her work as a developer, combined with a truly magnificent way of delivering technical presentations at a great number of events. She led many initiatives and proved herself by winning the Microsoft Imagine Cup national title as a college freshman.
Her growth, not only in her technical skills but also in her presentations and public relations, made her a person of reference, initially in the Greek IT market and later in the global community. During her professional years, she quickly reached a managerial position, and her involvement with new technologies made her the go-to person. She had no problem jumping into issues and projects outside her comfort zone; thus, she gained experience in infrastructure, architecture, and cybersecurity. Her passion for artificial intelligence motivated her to create an amazing blog to share her knowledge, to speak about artificial intelligence at numerous tech events, and thus to become a Microsoft MVP in Artificial Intelligence. She went on to become a Microsoft Technical Trainer and started teaching the technologies she is so passionate about.
By reading Machine Learning Security with Azure, you will learn best practices for implementing security in your machine learning projects, and you will gain experience with and knowledge of the tools that can be used, which Georgia has not only taught in her training sessions but also implemented in many production customer scenarios. After you complete Georgia’s book, you will have a very good understanding of the core features, frameworks, and solutions you can use to secure your machine learning environment. Georgia has managed to point out all the features and options that Azure provides, not only from a developer’s perspective but also from a security engineer’s side. Machine Learning Security with Azure will be your own security baseline walkthrough that you can follow to build secure machine learning solutions.
– George Kavvalakis
CEO and founder of Blacktrack Consulting, Azure Solutions and Security Architect, Top 100 Leaders in Global Healthcare awardee, Outstanding Leadership awardee x2, and Visionaries awardee.
Georgia Kalyva is a technical trainer at Microsoft. She is recognized as a Microsoft AI MVP, is a Microsoft Certified Trainer, and is an international speaker with more than 10 years of experience in Microsoft Cloud, AI, and developer technologies. Her career covers several areas, ranging from designing and implementing solutions to business and digital transformation. She holds a bachelor’s degree in informatics from the University of Piraeus, a master’s degree in business administration from the University of Derby, and multiple Microsoft certifications. Georgia’s honors include several awards from international technology and business competitions and her journey to excellence stems from a growth mindset and a passion for technology.
I would like to express my sincere gratitude to everyone who has played a role in making this project a reality, especially George and my parents. Your support, encouragement, and contributions have been invaluable, and I couldn’t have done it without you. Thank you!
Amreth Chandrasehar is an engineering leader in cloud, AI/ML engineering, observability, and SRE. Over the last few years, Amreth has played a key role in cloud migration, generative AI, AIOps, observability, and ML adoption at various organizations. Amreth is also the co-creator of the Conducktor platform, serving T-Mobile's 100+ million customers, and a tech/customer advisory board member at various companies on observability. Amreth has also co-created and open sourced Kardio.io, a service health dashboard tool. Amreth has been invited to speak at several key conferences and has won several awards.
I would like to thank my wife, Ashwinya Mani, and my son, Athvik A, for their patience and support provided during my review of this book.
This part is all about creating a plan to secure your resources. Security is organization-specific, so you will get an overview of the Zero Trust security approach, which is designed to secure any implementation of IT systems. You will learn to leverage the MITRE ATLAS knowledge base to understand ML attacks. Finally, you will also learn how to develop AI systems ethically and responsibly and how to use Azure services to ensure regulatory compliance.
This part has the following chapters:
Chapter 1, Assessing the Vulnerability of Your Algorithms, Models, and AI Environments
Chapter 2, Understanding the Most Common Machine Learning Attacks
Chapter 3, Planning for Regulatory Compliance

Welcome to your machine learning security journey with Azure! Together, we will explore all the methods and techniques to secure our AI projects and set a security baseline for our services. Let us start with a quick introduction to the machine learning (ML) life cycle and the Azure Machine Learning components and processes that go into working with ML in Azure. We will cover the essential knowledge you need to follow the concepts and implementations outlined in the rest of the book.
The next step will be to go through an example scenario, which we will reference throughout this book as the basis for applying the concepts of securing your data, models, workspace, and applications that use the deployed models from Azure Machine Learning. You can follow the instructions to re-create this scenario in your Azure Machine Learning environment to familiarize yourself with the Azure Machine Learning components.
We will use the Zero Trust model to develop an implementation and assessment strategy. This model is a security strategy based on the principle of Never trust, always verify. This model applies to all levels of implementation, from identities, infrastructure, and networks, to apps, endpoints, and data. This strategy is the best approach when working with multiple services and environments because we can easily adapt it to the complexity of modern cloud and hybrid environments. Since developing a strategy heavily depends on the individual scenario and use case for each organization, in this book, we will explore multiple options and demonstrate implementations of several Zero Trust aspects. Specifically, we will learn about attacks in Chapter 2, data governance and protection in Part 2 of the book, how to manage and secure access to the workspace and associated resources in Part 3, and in Chapter 10, we will gather all those best practices in an ML security baseline.
In this chapter, we are going to cover the following main topics:
Reviewing the Azure Machine Learning life cycle
Introducing an ML project
Exploring the Zero Trust model
Assessing the vulnerability of ML assets and apps

By the end of this chapter, you will be familiar with the basic principles and defense areas of the Zero Trust strategy. You can use this strategy to create a high-level vulnerability assessment of artificial intelligence (AI)/ML project components, applications, and related services hosted in Azure.
Throughout this book, we will need a few things to apply the learnings and implementations. Each chapter will outline more details if needed, but the minimal resources we need are an Azure subscription and an Azure Machine Learning resource with its related services.
We will reference the scenario presented in this section, along with other services and implementations in Azure, throughout this book. You will need an active Azure subscription and an Azure Machine Learning workspace to follow along or replicate the results.
If you don’t have an Azure subscription, you can activate a free trial by following this link: https://azure.microsoft.com/en-us/pricing/offers/ms-azr-0044p/.
If you run the project suggested in this chapter from end to end, it should not cost more than $150–$200, as long as you delete all associated resources afterward and use the lowest pricing tier of each service. However, this estimate can vary depending on the region you choose, the features you implement, the size of the dataset, and how long you plan to keep the resources deployed. The trial will provide you with a sufficient balance to try it out; however, I strongly recommend using the Azure pricing calculator and working with cost management in your subscription to keep costs to a minimum, and deleting or stopping resources when you no longer use or need them.
Note
The Azure free trial provides you with credits to spend on Azure services. After they are used up, you can keep the account and use free Azure services. You need to add your credit card, but the service won’t charge your card unless you explicitly change your subscription type. Azure uses a pay-as-you-go pricing model. To make sure you don’t run out of credits faster than intended, visit the Azure pricing calculator at https://azure.microsoft.com/en-us/pricing/calculator/.
To use Azure Machine Learning, you need to create an Azure Machine Learning resource. The following screenshot shows the basic options for creating one:
Figure 1.1 – Azure Machine Learning resource creation form
No matter what technology or framework we choose to work with to develop our ML project, there are four phases we go through. Each stage has one or more steps, depending on the individual scenario. The ML life cycle is significant because it clearly outlines every project step, making it easy to break the project into tasks and assign each to the responsible person, because more than one role is usually involved in an ML project.
Let us review all the stages before we connect them to the components of Azure Machine Learning.
In ML, we identify four stages: business understanding, data operations, model training, and model deployment. As shown in the following figure, these stages are part of an iterative process:
Figure 1.2 – ML life cycle
Let us go through each step of this iterative process and what it entails, starting with the business understanding stage and the gathering of the initial requirements.
Every project starts with a problem that needs to be solved. Business understanding (or problem understanding) is the first step in creating a plan that outlines what needs to be done. Ideally, we would like to have the requirements clearly detailed for us, but this is rarely the case. The first thing is to understand the project’s goal and where it can bring real value to the business. Then, evaluate which processes can be automated using ML by narrowing down the problem to actionable tasks.
For example, let us examine the following scenario. Our client is a hospital administration looking to minimize costs by increasing doctor productivity and automating as much of their workload as possible. In an analysis of doctors’ daily tasks, they found that they spend a lot of time looking through patient histories and analyzing their blood tests. By decreasing that time by 5%, the doctor could see more patients without working overtime. We can solve this problem by using supervised learning techniques, where an ML model could be trained to suggest illnesses by combining the patient’s symptoms and blood test results. The doctor would still have to verify the results. However, shortening the analysis time would increase the doctor’s productivity.
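The hospital scenario above can be sketched locally with scikit-learn. The feature names and data below are fabricated purely for illustration; the real project would use actual patient histories and blood-test results.

```python
# Hypothetical sketch of the hospital scenario: a supervised model that
# suggests a diagnosis from symptoms and blood-test values.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 500
# Fabricated features: blood glucose, white blood cell count, fatigue (0/1)
X = np.column_stack([
    rng.normal(100, 25, n),
    rng.normal(7, 2, n),
    rng.integers(0, 2, n).astype(float),
])
# Fabricated label: "diabetic" roughly when glucose is high, plus noise
y = (X[:, 0] + rng.normal(0, 10, n) > 115).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
# The model only *suggests* a diagnosis; the doctor still verifies it.
```

The point of the sketch is the workflow, not the numbers: the model surfaces a suggestion quickly, and the doctor's verification step stays in the loop.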
After narrowing down the requirements and clarifying the problem, the next step is to examine the data.
ML is based on data. During this stage, we work on everything that has to do with data operations, from data collection to data processing. Data gathering, or data collection, is the part where the goal is to collect data relevant to the problem at hand. This data could come from various sources, such as files, databases, APIs, or sensors. It is one of the most critical steps in the project because we identify the different data sources and collect and integrate the data. The quality and quantity of the data we gather will determine the efficiency and accuracy of the model’s output.
Collected data can be messy and is often not ready to be used by ML algorithms. Issues with the data include irrelevant data, noise, outliers, and missing values. This is where data preparation, or data wrangling, comes in. Any data irrelevant to our model should be filtered out. When outliers are recognized, we usually eliminate them from the dataset. With missing data, the process is a little more complex: once identified, missing values should be evaluated and either the affected rows removed or the gaps filled with default or calculated values. Finally, data might need to be encoded differently to be used by ML algorithms.
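The wrangling steps just described — removing outliers, filling missing values, and encoding categories — look roughly like this in pandas. The column names and values are invented for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, 31, np.nan, 47, 52, 300],      # 300 is an obvious outlier
    "smoker": ["yes", "no", "no", "yes", np.nan, "no"],
})

# Remove outliers: drop rows whose age falls outside a plausible range
df = df[df["age"].isna() | df["age"].between(0, 120)].copy()

# Fill missing values with a calculated default (median for numbers,
# most frequent value for categories)
df["age"] = df["age"].fillna(df["age"].median())
df["smoker"] = df["smoker"].fillna(df["smoker"].mode()[0])

# Encode categorical data so ML algorithms can consume it
df["smoker"] = df["smoker"].map({"no": 0, "yes": 1})
```

After these three passes, the frame contains only numeric, fully populated columns, which is the shape most ML algorithms expect.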
At this stage, an appropriate ML algorithm is selected and trained on the prepared data. The model is developed over multiple training iterations, and the result is evaluated at the end of each iteration until a satisfactory performance level is achieved. During this stage, we might need to go back and work with the data again to ensure that it is relevant and that no unintended correlation affects the results.
Once the results are satisfactory, the final step is deploying the model so that software engineers can integrate it into their applications and make predictions. Although it might seem like the end, this is far from it. Deployed models need to be monitored to ensure proper performance. Models can degrade over time, which would impact the accuracy of their predictions. In this case, we can retrain the model with an updated dataset to ensure this does not happen and the cycle starts over again. Changing the requirements or introducing a new business need might also cause us to retrain our model.
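The monitor-and-retrain loop described above can be sketched with scikit-learn on synthetic data. The drift simulation and the 0.8 accuracy threshold are arbitrary example values, not recommendations.

```python
# Minimal sketch of detecting model degradation and retraining.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_batch(shift=0.0, n=400):
    """Fabricated data; `shift` simulates drift in the live distribution."""
    X = rng.normal(shift, 1.0, size=(n, 3))
    y = (X.sum(axis=1) > shift * 3).astype(int)
    return X, y

X0, y0 = make_batch()
model = LogisticRegression().fit(X0, y0)

# "Production" data has drifted away from the training distribution
X_live, y_live = make_batch(shift=2.0)
live_accuracy = model.score(X_live, y_live)

retrained = False
if live_accuracy < 0.8:                               # degradation detected
    model = LogisticRegression().fit(X_live, y_live)  # retrain on fresh data
    retrained = True
new_accuracy = model.score(X_live, y_live)
```

In production, the check would run on labeled samples collected after deployment, and retraining would feed back into the life cycle's data and training stages rather than a one-line `fit` call.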
Now that we have a good understanding of the ML process, we can move on to the Azure Machine Learning service components that are part of each stage of the ML life cycle. Everything we need to develop Azure Machine Learning projects is part of, or in some way related to, the Azure Machine Learning Studio or workspace.
Azure Machine Learning is a cloud service for accelerating the ML project life cycle. It leverages the Azure infrastructure to connect to data and train, deploy, and monitor models. The service includes everything from connecting data from multiple data sources to working on developing the code and training, evaluating, and publishing models ready to be used by web applications.
The service is a complete environment for developing end-to-end ML projects. It allows collaboration for multiple roles, from data scientists to developers, security engineers, or IT (information technology) administrators. In this section, we will review how each component maps to each part of the ML project life cycle and the service capabilities.
Creating an Azure Machine Learning workspace also creates several related Azure services that are required to use the service properly. First, you need an active Azure subscription. When you create the Azure Machine Learning resource, the following services are created alongside it: an Azure Storage account, an Application Insights resource, and Azure Container Registry.
An Azure Storage account serves as the filesystem. All imported files, notebooks, and so on are saved here. The Application Insights resource can be used to monitor a deployed model and provide logs and insights when the deployed model does not work as expected. Azure Container Registry is optional when creating the workspace, but you will probably need one if you plan to publish your models to the Azure Container Instances (ACI) service. The models can be published to different compute targets, including containers, and are created as APIs so any application can easily use the models and make predictions. Everything else is handled from the Azure Machine Learning Studio. Here, you can work with your data, create compute resources, train and deploy your models, manage user access, and much more.
The following screenshot shows the Azure Machine Learning Studio home page. On the left, you can find the Azure Machine Learning Authoring, Assets, Compute, and other resource management options:
Figure 1.3 – Azure Machine Learning Studio
Most of the work will happen in the workspace, whether that is data preparation and import, code development, or model training and inferencing. Let us see what features and assets we have available to work with.
There are a few ways of working with data in Azure Machine Learning Studio. The first is working with data stored elsewhere by creating a datastore. A datastore is a reference to existing storage on Azure. That could be Azure Blob Storage or Azure Data Lake. If your data is not in Azure, you can still work with it in Azure Machine Learning. You can always upload your files as data assets into the workspace or add a reference to an external database. Then, you can use them in your project, share them with your colleagues, and keep the versioning to track changes and updates.
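For readers who prefer code over the studio UI, a data asset can also be registered from a YAML specification with the Azure Machine Learning CLI v2. The following is a minimal sketch; the name, description, and path are placeholder values, not ones used elsewhere in this book:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: patient-data
description: Example data asset registered from a local file
type: uri_file
path: ./data/patients.csv
```

You would register it with something like az ml data create --file data-asset.yml --workspace-name <workspace> --resource-group <group>, assuming the ml CLI extension is installed. The result is the same versioned data asset you would get from the studio wizard.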
In the following screenshot, you can see some of the types of data assets you can import:
Figure 1.4 – Create data asset in Azure Machine Learning Studio
You can use Python and R to develop your Azure Machine Learning project, but the service also provides a visual designer. You can use the designer for training and inference (production models that make predictions).
Refer to the following screenshot:
Figure 1.5 – Azure Machine Learning Designer
The designer has many out-of-the-box modules for multiple operations that you can drag and drop onto the canvas to create the training pipeline. That includes importing data, splitting datasets, SQL operations, algorithms, and model evaluation modules. If you need more, you can always use the custom script modules for Python or R and add your code as part of the pipeline. The benefit is that you can quickly go to an inference pipeline and convert it to a web service with just a few clicks.
The Automated Machine Learning (Automated ML) capability of Azure Machine Learning is a set of tools and techniques that automate the process of building ML models. It helps both beginners and experienced data scientists by automatically selecting the best algorithm and hyperparameters for a given dataset and then training and validating the model. This process is done by applying various algorithms, such as decision trees, random forests, and deep neural networks, to the data and selecting the best-performing model. It also covers steps such as data preprocessing, feature engineering, and model selection. It allows users to upload data into the Azure Machine Learning workspace and let the platform handle the rest of the ML process with minimal configuration. When the best model is selected from the list, it can be deployed as a web service ready to be consumed, as is the case with all models in the Azure Machine Learning workspace. Have a look at the following screenshot:
Figure 1.6 – Automated ML supported algorithms
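A toy version of what Automated ML does under the hood can be written in a few lines of scikit-learn: try several algorithm families on the same data and keep the best-performing model. This is a local illustration of the idea, not the Azure implementation.

```python
# Compare candidate algorithms by cross-validated accuracy, keep the winner.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=8, random_state=1)

candidates = {
    "decision_tree": DecisionTreeClassifier(random_state=1),
    "random_forest": RandomForestClassifier(random_state=1),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)
best_model = candidates[best_name].fit(X, y)  # retrain the winner on all data
```

Automated ML adds hyperparameter search, featurization, and guardrails on top of this basic select-the-best loop.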
When training or deploying models, computing power is required. Azure Machine Learning provides a scalable compute infrastructure for training and deployment based on Azure infrastructure. To train your experiments, you can use compute targets or compute clusters. A compute target is a dedicated virtual machine running your training jobs. If you need more power, you can create a cluster with multiple nodes to run workloads in parallel. You can also attach compute from virtual machines you are not using or from other ML services such as Azure Databricks. You can use ACI or Azure Kubernetes Service (AKS) clusters to deploy models.
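As a sketch of what such a cluster looks like when defined as code rather than through the studio, here is a minimal Azure Machine Learning CLI v2 compute specification; the name, VM size, and instance counts are example values, not recommendations:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json
name: cpu-cluster
type: amlcompute
size: Standard_DS3_v2
min_instances: 0
max_instances: 2
idle_time_before_scale_down: 120
```

Setting min_instances to 0 lets the cluster scale down to zero nodes when idle, which keeps costs low for intermittent training workloads.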
Besides the visual and automation tools, Azure Machine Learning supports Jupyter Notebooks for code development and collaboration. You can use the embedded notebook editor, Visual Studio Code with the Azure Machine Learning extension, or the Jupyter Notebook editor that you can launch from the running compute target during training. Refer to the following screenshot for this:
Figure 1.7 – Azure Machine Learning Studio notebook editor
These are the most fundamental components of Azure Machine Learning. The workspace comes complete with multiple features and tools that can facilitate all stages of the ML life cycle and take an ML project from start to finish. In the following section, we will use the features and tools outlined here to develop a sample ML project from development to production.
If you want to follow along with the implementation examples in this book, here is an example project to get you started. If you are already an expert in Azure Machine Learning, feel free to skip this introduction. This section will help beginners with the service, or those in other roles, to understand the ML life cycle in action. We will create a sample project that demonstrates how to import a dataset into Azure Machine Learning, how to use the Automated ML feature to train multiple models with multiple parameters, and how to deploy the resulting model as an endpoint to be used for predictions. The Automated ML feature was chosen because it does not require extensive data science expertise.
Log in to the Azure portal (https://portal.azure.com/) and look for your Azure Machine Learning resource. From the Overview page, click the Studio web URL link or the Launch Studio button to access your workspace, as seen in the following figure:
Figure 1.8 – Accessing your Azure Machine Learning workspace
Now you will find yourself on the home page of the workspace. From here on, you will find all the options in the left-hand menu outlined in the following sections.
ML starts with data, so you will need a dataset to train a model to make predictions. There are many open source datasets available for ML that you can download for free – for example, from university repositories for learning or research purposes. Just make sure the source is reputable so that your download does not include malware or anything similar. For this example, we will use the regression task with Automated ML. If you want to follow along with the steps, you can use any dataset; just make sure you tailor the Automated ML options to match the dataset you choose. If you don’t have a lot of experience and want something close to the dataset used here, pay attention to the task and the data when looking for a dataset: any dataset that can be used for regression and contains a numerical column for your model to predict will do. Automated ML does not yet support all types of ML tasks, so this is a good way to get started.
I am using a sample dataset that contains patient symptoms and a class column that notes whether the patient was diagnosed as diabetic. We will use it to train a model that predicts the probability that a patient will become diabetic based on their symptoms.
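If you want a quick local feel for this kind of task before moving to Azure, scikit-learn ships a built-in diabetes dataset with a numerical disease-progression target, making it a reasonable practice dataset for regression. It is a stand-in, not the exact dataset used in this chapter.

```python
# Explore a diabetes-style regression task locally with scikit-learn.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

data = load_diabetes(as_frame=True)
X, y = data.data, data.target   # 10 numeric features, numeric target

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)  # coefficient of determination on held-out data
```

Automated ML runs this kind of train-and-score loop for you across many algorithms and settings; the local version just shows what a single regression fit looks like.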
If you have your dataset ready, all we have to do is go to the Data menu under Assets and create a new data asset. Here are the steps for it:
First, we provide details for Name and Description, and set Data type as Tabular, as shown here:
Figure 1.9 – Create data asset
In the next step in the wizard, choose the From local files option:
Figure 1.10 – Choosing the data source
Leave the default option in the Storage type screen and move on to upload the file in the File or folder selection screen:
Figure 1.11 – Uploading the file
Under Settings, choose the options as illustrated until Data preview shows the correct columns and data: