Amazon QuickSight is an exciting new visualization tool that rivals Power BI and Tableau, bringing several exciting features to the table. Sadly, there aren't many resources out there that can help you learn the ropes. This book seeks to remedy that with the help of an AWS-certified expert who will help you leverage its full capabilities.
After learning QuickSight’s fundamental concepts and how to configure data sources, you’ll be introduced to the main analysis-building functionality of QuickSight to develop visuals and dashboards, and explore how to develop and share interactive dashboards with parameters and on-screen controls. You’ll dive into advanced filtering options with URL actions before learning how to set up alerts and scheduled reports.
Next, you’ll familiarize yourself with the types of insights before getting to grips with adding ML insights such as forecasting capabilities, analyzing time series data, adding narratives, and outlier detection to your dashboards. You’ll also explore patterns to automate operations and look closer into the API actions that allow us to control settings. Finally, you’ll learn advanced topics such as embedded dashboards and multitenancy.
By the end of this book, you’ll be well-versed with QuickSight’s BI and analytics functionalities that will help you create BI apps with ML capabilities.
Page count: 217
Year of publication: 2022
Develop stunning data visualizations and machine learning-driven insights with Amazon QuickSight
Manos Samatas
BIRMINGHAM—MUMBAI
Copyright © 2021 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Publishing Product Manager: Sunith Shetty
Senior Editor: David Sugarman
Content Development Editor: Joseph Sunil
Technical Editor: Rahul Limbachiya
Copy Editor: Safis Editing
Project Coordinator: Aparna Nair
Proofreader: Safis Editing
Indexer: Sejal Dsilva
Production Designer: Roshan Kawale
First published: December 2021
Production reference: 1091121
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80107-929-7
www.packt.com
To Gabriela
For her unconditional support and encouragement.
Manos Samatas is a solutions architect specializing in big data and analytics. He has several years of experience developing and designing big data applications for various industries, including telecommunications, cybersecurity, healthcare, and the public sector. He is an accredited AWS Subject Matter Expert (SME) in analytics and he possesses the AWS Data Analytics Specialty and the AWS Solutions Architect Professional certifications. Manos lives in London with his fiancée Gabriela. In his free time, he enjoys traveling, socializing with friends, and taking care of his plants.
Madhavan Sriram is a manager of data science at Amazon and focuses on building data-intensive products for Amazon's transportation business. His team processes over 200 TB of data annually to build intelligent products that enable automated warehouse operations by leveraging machine learning, big data, and visualisation techniques. His research focuses on big data and machine learning in an applied context. In the past, he worked for several large enterprises including IBM, Toyota, and Royal Philips on furthering their science and data landscape. He and his wife reside in Luxembourg and enjoy spending time outdoors with their furry golden retriever "Richard Parker".
Raquel Marasigan is an information security engineer II lead at CBSPI, where she analyses the effectiveness of security strategies. Raquel earned her Bachelor of Arts from Arellano University, majoring in political science. Her interest in technology and security motivated her to continue her education in data analytics. She is currently enrolled at the University of Asia and the Pacific, pursuing a master's degree in Applied Business Analytics and completing her capstone project for the 2022 school year. Raquel is fortunate to work alongside the author, Jason Dunn, Mia Heard, and other experts in the AWS Community Builders and AWS Users Group where they share their insights. Many thanks to Raquel's two sons for their encouragement throughout the process of reviewing this book.
The adoption of cloud-native business intelligence (BI) tools, such as Amazon QuickSight, enables organizations to gather insights from data at scale. This book is a practical guide to performing simple-to-advanced tasks with Amazon QuickSight.
You'll begin by learning QuickSight's fundamental concepts and how to configure data sources. Next, you'll be introduced to the main analysis-building functionality of QuickSight to develop visuals and dashboards. The book will also demonstrate how to develop and share interactive dashboards with parameters and onscreen controls. Advanced filtering options with URL actions will then be covered, before learning how to set up alerts and scheduled reports. Later, you'll explore the insight visual type in QuickSight using both existing insights and by building custom insights. Further chapters will show you how to add machine learning insights such as forecasting capabilities, analyzing time series data, adding narratives, and outlier detection to your dashboards. You'll also explore patterns to automate operations and look closer into the API actions that allow us to control settings. Finally, you'll learn about advanced topics such as embedded dashboards and multitenancy.
By the end of this book, you'll be well versed in QuickSight's BI and analytics functionalities that will help you create BI apps with ML capabilities.
This book is for BI developers and data analysts who are looking to create interactive dashboards using data from Lake House on AWS with Amazon QuickSight. This book will also be useful for anyone who wants to learn Amazon QuickSight in depth using practical examples. You will need to be familiar with general data visualization concepts; however, no prior experience with Amazon QuickSight is required.
Chapter 1, Introducing the AWS Analytics Ecosystem, starts by introducing the AWS analytics ecosystem. Then we will discuss how Amazon QuickSight fits within the wider ecosystem. We will look closer at the Lake House architecture and its benefits and different components. Finally, we will provide a step-by-step guide for the reader to set up this architecture in their development environment and add demo data that we will use with Amazon QuickSight to create visualizations later in the book.
Chapter 2, Introduction to Amazon QuickSight, introduces Amazon QuickSight and its main benefits as a cloud-native BI tool. We will explain the various options at the account creation stage, including the user authorization options. Finally, we will provide a step-by-step guide for the reader to set up a QuickSight account and configure the required permissions to connect to Amazon Redshift.
Chapter 3, Preparing Data with Amazon QuickSight, focuses on how to create data sources with Amazon QuickSight and use the dataset editor. We will provide a step-by-step guide to help readers set up data sources on their environment. Finally, we will look at more advanced operations such as joins, row-level security controls, and calculated fields.
Chapter 4, Developing Visuals and Dashboards, introduces the main analysis-building functionality of Amazon QuickSight. We will start by exploring the author UI and explaining the different visual types. After adding certain visual types and explaining their functionality, we will introduce the concepts of dashboards and stories and explain how we can share these dashboards with other users. Finally, we will look at how to style a dashboard and create custom themes.
Chapter 5, Building Interactive Dashboards, explores how to develop interactive dashboards with Amazon QuickSight. The reader will learn to add custom controls to their dashboards and add interactivity to their BI application using parameters. We will also look at advanced filtering options with point-and-click actions and URL actions. Finally, we will explore the reader user experience via the web and mobile QuickSight app, and we will explain how to set up alerts and scheduled reports.
Chapter 6, Working with ML Capabilities and Insights, explores the insight visual type in Amazon QuickSight. We will use both the QuickSight-suggested insights and build our own custom insights. We will add forecasting capabilities by analyzing time-series data, and we will add narratives and outlier detection. Finally, we will look more closely at how to integrate Amazon QuickSight with models deployed with Amazon SageMaker.
Chapter 7, Understanding Embedded Analytics, dives deeper into embedded dashboards. We will describe the architecture and the business drivers behind embedding, and we will explain the permission models. We will have a step-by-step guide to set up embedded analytics for authenticated or unauthenticated users. Finally, we will look briefly at how to embed the QuickSight console for QuickSight authors.
Chapter 8, Understanding the QuickSight API, explores patterns to automate certain operations using the QuickSight API. We will see how to create dashboards and reuse analyses using the Template API. We will also explore patterns to automate monitoring of dataset operations and finally, we will look more closely into the API actions that allow us to control settings.
Chapter 9, Managing QuickSight Permissions and Usage, focuses on data permissions and managing Amazon QuickSight operations. We will explain how it integrates with Lake Formation, Redshift, and Redshift Spectrum tables from a data authorization point of view. We will look at incident reporting using AWS CloudTrail and will examine the use of common operations to manage QuickSight usage.
Chapter 10, Multitenancy in Amazon QuickSight, talks about multitenancy in Amazon QuickSight. To understand it better, we will look at a simple hands-on example. Finally, we will look at an architecture that combines the two concepts of embedded analytics and multitenancy and explain its practical use cases.
You will need to be familiar with general data visualization concepts, but won't need any previous experience with Amazon QuickSight. Also, we expect you to have a basic understanding of the AWS cloud.
If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book's GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
Please ensure that you terminate all running AWS resources when they are not needed, to reduce costs.
You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Actionable-Insights-with-Amazon-QuickSight. If there's an update to the code, it will be updated in the GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/9781801079297_ColorImages.pdf.
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "For example, in QuickSight, the DeleteDataSet action deletes a dataset."
A block of code is set as follows:
$ aws quicksight update-user --user-name author-iam --role AUTHOR --custom-permissions-name custom-author --email <your-email> --aws-account-id <account-id> --namespace default --region us-east-1
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
{
"Status": 200,
"EmbedUrl": "https://us-east-1.quicksight.aws.amazon.com/... ?code=...&identityprovider=quicksight&isauthcode=true",
"RequestId": "21d2ad96-3c2b-42a4-ae10-8eb28b20892c"
}
Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: "With the Manage Users option selected, click on Manage Permissions as shown."
Tips or important notes
Appear like this.
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Once you've read Actionable Insights with Amazon QuickSight, we'd love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.
This section is an introduction to Amazon QuickSight and Lake House architecture. After completing this part, the reader will understand how to set up Amazon QuickSight, manage data sources, and build and share basic dashboards.
This section consists of the following chapters:
Chapter 1, Introducing the AWS Analytics Ecosystem
Chapter 2, Introduction to Amazon QuickSight
Chapter 3, Preparing Data with Amazon QuickSight
Chapter 4, Developing Visuals and Dashboards
As data increases in both volume and variety, organizations from all verticals are adopting cloud analytics services for their data analytics. AWS offers a number of analytics services covering data lakes, data warehousing, big data processing, extract, transform, load (ETL), and data visualization. In this chapter, we will introduce the AWS analytics ecosystem. Some of the services we discuss here will be mentioned again later in the book.
First, we will map the AWS services into categories. Then, we will discuss how Amazon QuickSight fits into the wider AWS analytics ecosystem. We will look more closely at a modern Lake House architecture and we will discuss its benefits and its components. Finally, we will provide a step-by-step guide to set up a data Lake House architecture on AWS and load and query a demo data sample. Some of this information may already be familiar to you, but let's go back over the basics.
In this chapter, we will cover the following topics:
Discovering the AWS analytics ecosystem
Exploring the data Lake House architecture on AWS
Creating a basic Lake House architecture
To follow along with this chapter, you will need the following prerequisites:
An AWS account with console access
AWS CLI access
The code sample for this chapter can be accessed on the GitHub repository for this book at https://github.com/PacktPublishing/Actionable-Insights-with-Amazon-QuickSight/tree/main/chapter_1.
AWS provides a large number of analytics services. In addition to that, AWS has a number of partners who specialize in data analytics and offer analytics solutions that run on the AWS infrastructure. Partner solutions are not in the scope of this section, however. This section focuses on the AWS fully managed analytics services. In order to list the services, we will first define the specific categories related to analytics functions. Machine learning and predictive analytics are also out of the scope of this chapter. For every service category, we will then list the AWS services available, and for each service, we will provide a high-level description. Figure 1.1 depicts the commonly used AWS analytics services.
Figure 1.1 – AWS analytics services
More and more organizations aspire to be data-driven and use data to drive their strategic decisions. Business intelligence (BI) tools help organizations to transform data into actionable insights. With the use of BI tools, users can analyze data and then present their findings in reports or dashboards. These reports or dashboards can then be consumed by business users who are interested in getting a picture of the state of the business.
In 2015, AWS launched Amazon QuickSight, a cloud-native BI tool. Since then, AWS has added new features to QuickSight, enriching the standard dashboard functionality with machine learning capabilities and offering embedded dashboard functionality. Amazon QuickSight is the main technology we will be covering in this book. Over the next few chapters, we will start with the basic functionality of Amazon QuickSight, and then we will explore more advanced features. Where possible, we will use practical examples that can be repeated in your own development environment, to give you hands-on experience with Amazon QuickSight.
Data warehouses are repositories of data; they are important components of the BI process. Data stored in data warehouses is typically structured. Traditionally, data is ingested and centralized into data warehouses from different operational data stores. Data warehouses are optimized to run analytical queries over large amounts of data. The results of analytical queries are usually calculated after an aggregation over multiple rows from one or more tables. BI applications use analytical queries to aggregate data and visualize it. It is a common architectural approach to use a data warehouse to serve data to a BI application.
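To make the idea of an analytical query concrete, here is a minimal sketch that aggregates several detail rows into one summary row per group. It uses Python's built-in sqlite3 module as a stand-in for a data warehouse; the sales table, its columns, and the sample values are all hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical detail rows: (sale_date, region, amount). Illustrative only.
rows = [
    ("2021-01-05", "EMEA", 120.0),
    ("2021-01-17", "EMEA", 80.0),
    ("2021-01-09", "APAC", 200.0),
    ("2021-02-02", "APAC", 50.0),
]


def revenue_by_region(sales_rows):
    """Aggregate many detail rows into one summary row per region."""
    conn = sqlite3.connect(":memory:")  # in-memory database, nothing is persisted
    try:
        conn.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", sales_rows)
        # The analytical query: a GROUP BY aggregation over multiple rows.
        cursor = conn.execute(
            "SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue "
            "FROM sales GROUP BY region ORDER BY region"
        )
        return cursor.fetchall()
    finally:
        conn.close()


summary = revenue_by_region(rows)
print(summary)
```

A BI tool would typically issue a query of this shape and render the result as a chart, for example one bar per region.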
Back in 2012, AWS launched Amazon Redshift, a cloud-native, fully managed data warehouse service. Today, Redshift is one of the most popular cloud data warehouses with thousands of organizations from different verticals using it to analyze their data. Other popular cloud data warehouses include Snowflake and Google BigQuery. Amazon Redshift integrates with most BI tools and it integrates natively with Amazon QuickSight. We will discuss this topic in more detail in Chapter 3, Preparing Data with Amazon QuickSight, when we look more closely into Amazon QuickSight-supported data sources.
A data lake is a repository of data where organizations can easily centralize all of their data and apply it in different use cases such as reporting, visualization, big data analytics, and predictive analytics. Data stored in data lakes can be structured or semi-structured. Usually, data is ingested into the data lake in its raw format, and is then transformed and stored back into the data lake for further processing and analysis. A cloud data lake typically uses a cloud object store to store data. AWS introduced Amazon Simple Storage Service (S3) in March 2006, offering developers a highly scalable, reliable, and low-latency data storage infrastructure at very low cost. Amazon S3 can store an unlimited amount of data, a particularly useful feature for data lakes. Organizations have one less thing to worry about because they don't need to think about scaling their storage as the amount of data stored grows.
While scaling data lake storage is something that organizations and CIOs don't need to worry about much anymore, data lake governance needs to be considered carefully. Data lakes do not enforce data schemas or data formats and, without any governance, data lakes can degrade into unusable data repositories, often referred to as data swamps. AWS offers a number of services for data governance.
The AWS Glue Catalog is part of the AWS Glue service. It is a fully managed Apache Hive metastore-compatible data catalog. Big data applications (for example, Apache Spark, Apache Hive, Presto, and so on) use the metadata in the catalog to locate and parse data. The AWS Glue Catalog is a technical metadata repository and can catalog data in Amazon S3, and a number of relational or non-relational data stores including Redshift, Aurora, and DynamoDB, among others.
AWS Lake Formation runs on top of AWS Glue and Amazon S3 and provides a governance and access layer for data lakes on Amazon S3. It also provides a set of reusable ETL jobs, called blueprints, that can be used to perform common ETL tasks (for example, loading data from a relational data store into an S3 data lake). Lake Formation allows users to manage access permissions using a familiar GRANT and REVOKE syntax that you might have seen in relational database management systems (RDBMSes).
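As a rough illustration of what a table-level grant might look like programmatically, the sketch below builds the request for the Lake Formation grant_permissions API as exposed by the boto3 SDK. The role ARN, database name, and table name are placeholders rather than values from this book, and the helper functions are hypothetical. The call to AWS is isolated in a separate function so the request can be inspected without credentials.

```python
def build_table_grant(principal_arn, database, table, permissions):
    """Build keyword arguments for a Lake Formation table-level grant."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": list(permissions),
    }


def apply_grant(request, region_name="us-east-1"):
    """Send the grant to Lake Formation (needs boto3 and AWS credentials)."""
    import boto3  # third-party AWS SDK, imported lazily so the sketch runs without it

    client = boto3.client("lakeformation", region_name=region_name)
    return client.grant_permissions(**request)


# Placeholder identifiers, assumed for illustration only.
request = build_table_grant(
    "arn:aws:iam::111122223333:role/analyst",
    "demo_db",
    "sales",
    ["SELECT"],
)
print(request)
```

The same request shape can be passed to revoke_permissions to remove the access again, which mirrors the GRANT and REVOKE pairing found in an RDBMS.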
Amazon Macie is an AWS service for data protection. It provides an inventory of Amazon S3 buckets and it uses machine learning to identify and alert its users about sensitive data, such as personally identifiable information (PII).
Finally, and perhaps most importantly, AWS Identity and Access Management (IAM) is a fundamental AWS service that allows users to assign permissions to principals (for example, users, groups, or roles) and explicitly allow or deny access to AWS resources including data lake locations or tables in the data catalog.
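The following sketch shows one way an IAM policy document with an explicit allow and an explicit deny could be assembled. The bucket name, prefix, and statement IDs are hypothetical, and the policy is an illustration of the JSON policy structure rather than a recommended configuration for a data lake.

```python
import json


def read_only_prefix_policy(bucket, prefix):
    """Return an IAM policy document as a dict.

    Allows reading objects under one prefix and explicitly denies
    deleting objects anywhere in the bucket.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowReadCuratedData",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            },
            {
                "Sid": "DenyDeletes",  # hypothetical statement ID
                "Effect": "Deny",
                "Action": ["s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


# Placeholder bucket and prefix, assumed for illustration only.
policy = read_only_prefix_policy("example-data-lake", "curated")
print(json.dumps(policy, indent=2))
```

In IAM policy evaluation, an explicit deny takes precedence over an allow, so the second statement would block deletes even if another policy permitted them.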
Ad hoc analytics refers to getting answers from the data on an as-needed basis. In contrast to scheduled reports, ad hoc querying is initiated by a user when they need specific answers from their data. The user typically uses SQL via a workbench type of application, or other analytics frameworks (for instance, Apache Spark) through notebook environments or BI applications. AWS has a number of analytics services that can be used for ad hoc analytics.
Amazon Redshift can be used for ad hoc analysis of data. For ad hoc querying, users will typically connect to Amazon Redshift using a query editor application with the Redshift JDBC/ODBC drivers. Notebook integrations or BI tool integrations are also possible for ad hoc analysis. AWS offers a number of managed notebook environments such as EMR notebooks and SageMaker notebooks. Amazon Redshift also allows its users to query data that is stored outside the data warehouse. Amazon Redshift Spectrum allows Redshift users to query data stored in Amazon S3, eliminating the need to load the data first before querying. Redshift's federated querying capability allows users to query live data in operational data stores such as PostgreSQL and MySQL.
For big data and data lakes, Presto is a popular choice for ad hoc analysis. Presto provides a high-performance parallel SQL query engine. Amazon Athena lets users run Presto queries in a scalable serverless environment. Amazon QuickSight natively supports Amazon Athena. We will talk more about this native integration in Chapter 3, Preparing Data with Amazon QuickSight. Amazon EMR is a fully managed Hadoop cluster service, and it comes with a range of applications from the open source big data ecosystem. Presto has two community projects, PrestoSQL and PrestoDB, both of which are available on Amazon EMR. Other options included with EMR are Hive on EMR and Spark on EMR.
ETL is a term used to describe a set of processes to extract, transform, and load data, usually for analytical purposes. Organizations gather data from different data sources and centralize it in a central data repository. Data from different sources typically has different schemas and follows different conventions and standards, and therefore it can be challenging to combine it to get the required answers. For that reason, data needs to be transformed so that it can work together. For example, cleaning the data, applying certain data quality thresholds, and standardizing to a specific standard (for instance, the date and time formats used) are all important tasks to ensure the data is usable. A visual representation of the ETL process is shown in the following figure.
Figure 1.2 – The ETL process
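As a small illustration of the transform step, the sketch below standardizes dates that arrive in different formats into a single ISO 8601 format and sets aside records that fail this basic quality check. The record layout, field names, and list of accepted formats are assumptions made for this example. In particular, a value such as 04/03/2021 is read here as day/month/year, a choice a real pipeline would need to confirm per source.

```python
from datetime import datetime

# Formats we assume the hypothetical sources use, tried in order.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def standardize_date(raw):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None


def transform(records):
    """Split records into cleaned rows and rows rejected by the date check."""
    clean, rejected = [], []
    for record in records:
        iso_date = standardize_date(record.get("order_date", ""))
        if iso_date is None:
            rejected.append(record)
        else:
            clean.append({**record, "order_date": iso_date})
    return clean, rejected


# Hypothetical extracted records from three sources with different conventions.
records = [
    {"order_id": 1, "order_date": "2021-03-04"},
    {"order_id": 2, "order_date": "04/03/2021"},
    {"order_id": 3, "order_date": "20210304"},
    {"order_id": 4, "order_date": "not a date"},
]

clean, rejected = transform(records)
print(clean)
print(rejected)
```

Keeping rejected records separate, rather than silently dropping them, makes it easier to measure data quality before the load step.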
AWS Glue is a fully managed ETL service offered by AWS. When it was first introduced in 2017, Glue ETL offered an Apache Spark environment optimized for ETL. Now, Glue ETL offers a wider range of options:
PySpark – Apache Spark using Python
Spark with Scala – Apache Spark with Scala
Python shell – For smaller ETL jobs that don't need a Spark cluster
Glue Studio and Glue DataBrew – Visual approach to ETL without the need to write code