Managing geospatial data and building location-based applications in the cloud can be a daunting task. This comprehensive guide helps you overcome that challenge by presenting the concepts of working with geospatial data in the cloud in an easy-to-understand way and by teaching you how to design and build a data lake architecture for geospatial data on AWS.
You’ll begin by exploring the use of AWS databases like Redshift and Aurora PostgreSQL for storing and analyzing geospatial data. Next, you’ll leverage services such as DynamoDB and Athena, which offer powerful built-in geospatial functions for indexing and querying geospatial data. The book is filled with practical examples to illustrate the benefits of managing geospatial data in the cloud. As you advance, you’ll discover how to analyze and visualize data using Python and R, and utilize QuickSight to share derived insights. The concluding chapters explore the integration of commonly used platforms like Open Data on AWS, OpenStreetMap, and ArcGIS with AWS to enable you to optimize efficiency and provide a supportive community for continuous learning.
By the end of this book, you’ll have the necessary tools and expertise to build and manage your own geospatial data lake on AWS, along with the knowledge needed to tackle geospatial data management challenges and make the most of AWS services.
Discover how to manage and analyze geospatial data in the cloud
Scott Bateman
Janahan Gnanachandran
Jeff DeMuth
BIRMINGHAM—MUMBAI
Copyright © 2023 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Publishing Product Manager: Reshma Raman
Book Project Manager: Kirti Pisat
Content Development Editor: Joseph Sunil
Technical Editor: Sweety Pagaria
Copy Editor: Safis Editing
Proofreader: Safis Editing
Indexer: Subalakshmi Govindhan
Production Designer: Ponraj Dhandapani
DevRel Marketing Coordinator: Nivedita Singh
First published: June 2023
Production reference: 1280623
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80461-382-5
www.packtpub.com
This book is dedicated to my father, Orren, for instilling my passion for both computers and maps. To the memory of my mother, Jolene, for teaching me the persistence and patience needed to get my thoughts on the page. To my wife, Angel, and wonderful children Jackson and Emily for the support and encouragement to bring this book to life.
Scott Bateman
This book is dedicated to my parents, Gnanachandran and Rajaletchumy, for guiding me through life’s challenges and celebrating my successes. To my amazing wife, Dilakshana and lovely children, Jashwin, and Dhruvish, for your unwavering support and belief in me. Finally, to the team at Amazon Web Services (AWS), whose relentless pursuit of excellence has transformed the way we harness technology. Thank you for inspiring this work and for your commitment to empowering businesses worldwide.
Janahan (Jana) Gnanachandran
Scott Bateman is a Principal Solutions Architect at AWS focused on customers in the energy industry. Prior to joining the AWS Houston office in 2019, he was the Director of Business Applications at bpx energy and has spent over a quarter century innovating with technology to solve the toughest energy business problems. As part of the Geospatial Technical Field Community (TFC) group within AWS, Scott speaks with customers about common challenges in gathering geospatial data, tracking assets, optimizing driving routes, and better understanding facilities and property through remote sensing. When not working or writing, Scott enjoys snowboarding, flying drones, traveling to unknown destinations, and learning something new every day.
Janahan (Jana) Gnanachandran is a Principal Solutions Architect at AWS. He partners with customers across industries to increase their speed and agility and to drive innovation using the cloud for digital transformation. Jana holds a BE in Electronics and Communication Engineering from Anna University, India, and an MS in Computer Engineering from the University of Louisiana at Lafayette. Alongside designing scalable, secure, and cost-effective cloud solutions, Jana conducts workshops, training sessions, and public speaking engagements on cloud and architectural best practices and on data and AI/ML strategies. When not immersed in cloud computing, he enjoys playing tennis or golf, photography, and traveling with his wife and their two kids.
Jeff DeMuth is a solutions architect who joined Amazon Web Services (AWS) in 2016. He focuses on the geospatial community and is passionate about geographic information systems (GIS) and technology. Outside of work, Jeff enjoys travelling, building Internet of Things (IoT) applications, and tinkering with the latest gadgets.
Faizan Tayyab is a GIS professional with over 16 years of experience in the oil and gas industry. He holds a master's degree and several certifications covering various technologies, including web and cloud technologies. He is also a software trainer, teaching courses to a worldwide audience through Udemy, a popular online training platform. He is well versed in a variety of technologies and actively contributes to the open source geospatial community by developing and distributing software components that are freely available to other geographers and developers.
Angela Orthmeyer is currently the Lead Geospatial Data Analyst at RapidSOS, the world’s leading intelligent safety platform. Angela is a creative problem solver with experience in GIS, data science and project management. Prior to her position at RapidSOS, she was a Data Scientist at CKM Analytix, a Natural Resources Social Scientist at the National Oceanic and Atmospheric Administration, and a Peace Corps Volunteer in Panama. Her education spans the social and natural sciences. Angela received a Master of Environmental Management from Yale University, a B.S. in Biology from the University of Richmond, and a certificate in Data Analytics from Principal Analytics Prep.
Rohit Mendadhala is a Geospatial Data Scientist and an FAA-certified drone pilot with over 7 years of professional experience in developing, managing, and implementing geospatial solutions at scale for organizations across a wide range of industries, such as government, environmental, transportation, software, telecom, and real estate research. His core areas of expertise include geospatial data analytics, data visualization, spatial analysis and mapping, spatial data science, GIS development, web and enterprise GIS, and market research and analysis using ArcGIS and open source geospatial platforms. He enjoys discovering underlying patterns in large datasets with a spatial context and curating answers to critical, thought-provoking questions.
Shital Dhakal is a seasoned GIS professional with over seven years’ experience in the field of GIS and remote sensing. He has acquired industry and research experience in North America, Europe, and Asia. Currently, he works at a San Francisco Bay Area-based start-up and helps local government to implement enterprise GIS strategies. He is a certified GIS Professional (GISP) and has an MSc from Boise State University, Idaho. When he is not playing with spatial data, writing blogs, or making maps, he can be found hiking in the Sierra Nevada and, occasionally, in the Himalayas.
In this part, we will learn how to work with geospatial data in the cloud and understand the economics of storing and analyzing that data in the cloud.
This part has the following chapters:
Chapter 1, Introduction to Geospatial Data in the Cloud, shows us how to work with geospatial data in the cloud and the economics of storing and analyzing that data in the cloud.

Chapter 2, Quality and Temporal Geospatial Concepts, explores the different quality characteristics of geospatial data. Additionally, concepts will be presented that show how the time-specific (temporal) aspects of data can be captured and designated in the data structure.

This book is divided into four parts that will walk you through key concepts, tools, and techniques for dealing with geospatial data. Part 1 sets the foundation for the entire book, establishing key ideas that provide synergy with subsequent parts. Each chapter is further subdivided into topics that dive deep into a specific subject. This introductory chapter of Part 1 will cover the following topics:
Introduction to cloud computing and AWS
Storing geospatial data in the cloud
Building your geospatial data strategy
Geospatial data management best practices
Cost management in the cloud

You are most likely familiar with the benefits that geospatial analysis can provide. Governmental entities, corporations, and other organizations routinely solve complex, location-based problems with the help of geospatial computing. While paper maps are still around, most use cases for geospatial data have evolved to live in the digital world. We can now create maps faster and draw more geographical insights from data than at any point in history. This phenomenon has been made possible by blending the expertise of geospatial practitioners with the power of Geographical Information Systems (GIS). Critical thinking and higher-order analysis can be done by humans while computers handle the monotonous data processing and rendering tasks. As the geospatial community continues to refine the balance of which jobs require manual effort and which can be handled by computers, we are collectively improving our ability to understand our world.
Geospatial computing has been around for decades, but the last 10 years have seen a dramatic shift in the capabilities and computing power available to practitioners. The emergence of the cloud as a fundamental building block of technical systems has offered needle-moving opportunities in compute, storage, and analytical capabilities. In addition to a revolution in the infrastructure behind GIS systems, the cloud has expanded the optionality in every layer of the technical stack. Common problems such as running out of disk space, long durations of geospatial processing jobs, limited data availability, and difficult collaboration across teams can be things of the past. AWS provides solutions to these problems and more, and in this book, we will describe, dissect, and provide examples of how you can do this for your organization.
Cloud computing provides the ability to rapidly experiment with new tools and processing techniques that would never be possible using a fixed set of compute resources. Not only are new capabilities available and continually improving, but your team will also have more time to learn and use these new technologies with the time saved in creating, configuring, and maintaining the environment. The undifferentiated heavy lifting of managing geospatial storage devices, application servers, geodatabases, and data flows can be replaced with time spent analyzing, understanding, and visualizing the data. Traditional this-or-that technical trade-off decisions are no longer binary propositions. Your organization can use the right tool for each job and blend as many tools and features into your environment as is appropriate for your requirements. By paying for the precise amount of resources you use in AWS, it is possible to break free from restrictive, punitive, and time-limiting licensing situations. In some cases, the amount of an AWS compute resource you use is measured and charged down to the millisecond, so you literally don't pay for a second of unused time. If a team infrequently needs a capability, such as a monthly data processing job, eliminating idle virtual machines and supporting technical resources can result in substantial cost savings. If cost savings are not your top concern, the same budget can instead be dedicated to more capable hardware that delivers dramatically reduced processing times compared to limited compute environments.
The global infrastructure of AWS allows you to position data in the best location to minimize latency, providing the best possible performance. Powerful replication and caching technologies can be used to minimize wait time and allow robust cataloging and characterization of your geospatial assets. The global flexibility of your GIS environment is further enabled with the use of innovative end user compute options. Virtual desktop services in AWS allow organizations to keep the geospatial processing close to the data for maximum performance, even if the user is geographically distanced from both. AWS and the cloud have continued to evolve and provide never-before-seen capabilities in geospatial power and flexibility. Over the course of this book, we will examine what these concepts are, how they work, and how you can put them to work in your environment.
Now that we have learned the story of cloud computing on AWS, let's look at how we can store and manage geospatial data there.
As you learn about the possibilities for storing geospatial data in the cloud, it may seem daunting due to the number of options available. Many AWS customers experiment with Amazon Simple Storage Service (S3) for geospatial data storage as their first project. Relational databases, NoSQL databases, and caching options commonly follow in the evolution of geospatial technical architectures. General GIS data storage best practices still apply to the cloud, so much of the knowledge that practitioners have gained over the years directly applies to geospatial data management on AWS. Familiar GIS file formats that work well in S3 include the following:
Shapefiles (.shp, .shx, .dbf, .prj, and others)
File geodatabases (.gdb)
Keyhole Markup Language (.kml)
Comma-Separated Values (.csv)
Geospatial JavaScript Object Notation (.geojson)
Georeferenced Tagged Image File Format, or GeoTIFF (.tif, .tiff)

The physical location of data is still important for latency-sensitive workloads. Formats and organization of data can usually remain unchanged when moving to S3 to limit the impact of migrations. Spatial indexes and use-based access patterns will dramatically improve the performance and ability of your system to deliver the desired capabilities to your users.
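As a minimal illustration of this kind of storage, the following Python sketch uses boto3 (the AWS SDK for Python) to upload a shapefile's component files and a GeoJSON file to S3 and then list them. The bucket name, prefixes, and filenames are hypothetical.

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-geospatial-data-lake"  # hypothetical bucket name

# A shapefile is really a set of sidecar files that must travel together.
shapefile_parts = ["parcels.shp", "parcels.shx", "parcels.dbf", "parcels.prj"]
for part in shapefile_parts:
    s3.upload_file(part, bucket, f"vector/parcels/{part}")

# Single-file formats such as GeoJSON upload the same way.
s3.upload_file("trailheads.geojson", bucket, "vector/trailheads.geojson")

# List what landed under the vector/ prefix to confirm the upload.
response = s3.list_objects_v2(Bucket=bucket, Prefix="vector/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```

Keeping related files under a shared prefix, as shown here, preserves the familiar folder-style organization that desktop GIS tools expect while leaving room to add spatial indexes or catalogs later.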
Relational databases have long been the cornerstone of most enterprise GIS environments. This is especially true for vector datasets. AWS offers the most comprehensive set of relational database options with flexible sizing and architecture to meet your specific requirements. For customers looking to migrate geodatabases to the cloud with the least amount of environmental change, Amazon Elastic Compute Cloud (EC2) virtual machine instances provide a similar capability to what is commonly used in on-premises data centers. Each database server can be instantiated on the specific operating system that is used by the source server. Using EC2 with Amazon Elastic Block Store (EBS) network-attached storage provides the highest level of control and flexibility. Each server is created by specifying the amount of CPU, memory, and network throughput desired. Relational database management system (RDBMS) software can be manually installed on the EC2 instance, or an Amazon Machine Image (AMI) for the particular use case can be selected from the AWS catalog to remove manual steps from the process. While this option provides the highest degree of flexibility, it also requires the most database configuration and administration knowledge.
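A minimal boto3 sketch of this self-managed approach is shown below. The AMI ID, key pair name, instance type, and EBS volume size are placeholders that you would replace with values matching your source geodatabase server.

```python
import boto3

ec2 = boto3.client("ec2")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI matching the source OS
    InstanceType="r5.xlarge",          # memory-optimized size for a database server
    KeyName="geodb-keypair",           # hypothetical key pair
    MinCount=1,
    MaxCount=1,
    BlockDeviceMappings=[
        {
            "DeviceName": "/dev/sdf",
            "Ebs": {"VolumeSize": 500, "VolumeType": "gp3"},  # data volume in GiB
        }
    ],
    TagSpecifications=[
        {
            "ResourceType": "instance",
            "Tags": [{"Key": "Name", "Value": "geodatabase-server"}],
        }
    ],
)
print(response["Instances"][0]["InstanceId"])
```

The RDBMS software would then be installed and configured on the instance, which is exactly the administrative overhead the managed options in the next paragraphs remove.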
Many customers find it useful to leverage Amazon Relational Database Service (RDS) to establish database clusters and instances for their GIS environments. RDS can be used to create full-featured Microsoft SQL Server, Oracle, PostgreSQL, MySQL, or MariaDB database clusters. AWS allows the selection of specific instance types to focus on memory or compute optimization in a variety of configurations. Multiple Availability Zone (AZ)-enabled databases can be created to establish fault tolerance or improve performance. Using RDS dramatically simplifies database administration and decreases the time required to select, provision, and configure your geospatial database using the specific technical parameters to meet the business requirements.
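For illustration, the following boto3 sketch provisions a Multi-AZ PostgreSQL instance through RDS. The identifier, instance class, storage size, and credentials are hypothetical; in practice, the password would come from a secrets store rather than source code.

```python
import boto3

rds = boto3.client("rds")

rds.create_db_instance(
    DBInstanceIdentifier="gis-postgres",       # hypothetical instance name
    Engine="postgres",
    DBInstanceClass="db.r5.large",             # memory-optimized instance class
    AllocatedStorage=200,                      # GiB
    MasterUsername="gisadmin",
    MasterUserPassword="replace-with-a-secret",  # use AWS Secrets Manager in practice
    MultiAZ=True,                              # standby replica in a second AZ
)

# Block until the instance is ready before attempting to connect.
waiter = rds.get_waiter("db_instance_available")
waiter.wait(DBInstanceIdentifier="gis-postgres")
```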
Amazon Aurora provides an open source path to highly capable and performant relational databases. PostgreSQL or MySQL environments can be created with specific settings for the desired capabilities. Although this may mean converting data from a source format, such as Microsoft SQL Server or Oracle, the overall cost savings and simplified management make this an attractive option to modernize and right-size any geospatial database.
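As a sketch of what this looks like in practice, the following Python snippet (using the psycopg2 driver) connects to a hypothetical Aurora PostgreSQL cluster endpoint and enables the PostGIS extension, which Aurora PostgreSQL supports; the endpoint, database name, and credentials are placeholders.

```python
import psycopg2

# Hypothetical cluster endpoint and credentials for an Aurora PostgreSQL cluster.
conn = psycopg2.connect(
    host="gis-aurora.cluster-abc123.us-east-1.rds.amazonaws.com",
    dbname="gis",
    user="gisadmin",
    password="replace-with-a-secret",
)
conn.autocommit = True

with conn.cursor() as cur:
    # PostGIS ships with Aurora PostgreSQL and only needs to be enabled per database.
    cur.execute("CREATE EXTENSION IF NOT EXISTS postgis;")
    cur.execute("SELECT PostGIS_Version();")
    print(cur.fetchone())

conn.close()
```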
In addition to standard relational database options, AWS provides other services to manage and use geospatial data. Amazon Redshift is the fastest and most widely used cloud data warehouse and supports geospatial data through the geometry data type. Users can query spatial data with Redshift's built-in SQL functions to find the distance between two points, interrogate polygon relationships, and derive other location insights from their data. Amazon DynamoDB is a fully managed, key-value NoSQL database with an SLA of up to 99.999% availability. For organizations leveraging MongoDB, Amazon DocumentDB provides a fully managed option for simplified instantiation and management. Finally, AWS offers the Amazon OpenSearch Service for petabyte-scale data storage, search, and visualization.
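To make the Redshift portion concrete, here is a sketch that runs a spatial query through the Redshift Data API with boto3. The cluster identifier, database, user, and table names are hypothetical; ST_DistanceSphere is one of Redshift's built-in spatial functions.

```python
import boto3

redshift_data = boto3.client("redshift-data")

# Hypothetical tables holding point geometries for stores and customers.
sql = """
    SELECT s.store_id,
           ST_DistanceSphere(s.location, c.location) AS meters_apart
    FROM stores s, customers c
    WHERE c.customer_id = 42
    ORDER BY meters_apart
    LIMIT 5;
"""

response = redshift_data.execute_statement(
    ClusterIdentifier="gis-warehouse",   # hypothetical cluster
    Database="analytics",
    DbUser="gisadmin",
    Sql=sql,
)
# The call is asynchronous; poll describe_statement / get_statement_result
# with this ID to retrieve the rows.
print(response["Id"])
```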
The best part is that you don't have to choose a single option for your geospatial environment. Often, companies find that different workloads benefit from having the ability to choose the most appropriate data landscape. Combining Infrastructure as a Service (IaaS) workloads with fully managed and purpose-built databases is not only possible but a signature of a well-architected geospatial environment. Transactional systems may benefit from relational geodatabases, while mobile applications may be more aligned with NoSQL data stores. When you operate in a world of consumption-based resources, there is no downside to using the most appropriate data store for each workload. Having familiarity with the cloud options for storing geospatial data is crucial for strategic planning, which we will cover in the next topic.
One of the most important concepts to consider in your geospatial data strategy is the amount of change you are willing to accept in your technical infrastructure. This does not apply to new systems, but most organizations will have a treasure trove of geospatial data already. While lifting and shifting on-premises workloads to the cloud is advantageous, adapting your architecture to the cloud will amplify benefits in agility, resiliency, and cost optimization. For example, 95% of AWS customers elect to use open source geospatial databases as part of their cloud migration. This data conversion process, from vendor relational databases such as Oracle and Microsoft SQL Server to open source options such as PostgreSQL, enjoys a high degree of compatibility. This is an example of a simple change that can be made to eliminate significant license usage costs when migrating to the cloud. Simple changes such as these provide immediate and tangible benefits to geospatial practitioners in cloud architectures. Often, the same capabilities can be provided in AWS for a significantly reduced cost profile when comparing the cloud to on-premises GIS architectures.
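One common way to carry out this kind of conversion is AWS Database Migration Service (DMS). The boto3 sketch below assumes a replication instance and source (for example, Oracle) and target (PostgreSQL) endpoints have already been created; every ARN and schema name shown is a placeholder.

```python
import json
import boto3

dms = boto3.client("dms")

task = dms.create_replication_task(
    ReplicationTaskIdentifier="oracle-to-postgres-gis",
    SourceEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:SOURCE",
    TargetEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:TARGET",
    ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:INSTANCE",
    MigrationType="full-load",             # or "full-load-and-cdc" for ongoing replication
    TableMappings=json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-gis-schema",
            "object-locator": {"schema-name": "GIS", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
)
print(task["ReplicationTask"]["ReplicationTaskArn"])
```

Schema-level differences between the engines are typically handled beforehand with the AWS Schema Conversion Tool; the task above moves the data itself.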
All the same concepts and technologies you and your team are used to when operating an on-premises environment exist on AWS. Stemming from the consumption-based pricing model and broad set of EC2 instances available, AWS can offer a much more flexible model for the configuration and consumption of compute resources. Application servers used in geospatial environments can be migrated directly by selecting the platform, operating system, version, and dependencies appropriate for the given workload. Additional consideration should be given in this space to containerization where feasible. Leveraging containers in your server architecture can speed up environment migrations and provide additional scaling options.
A key part of building your geospatial data strategy is determining the structure and security of your data. AWS Identity and Access Management (IAM) serves as the foundation for defining authorization and authentication mechanisms in your environment. Single Sign-On (SSO) is commonly used to integrate with existing directories to leverage pre-existing hierarchies and permission methodologies. The flexibility of AWS allows you to bring the existing security constructs while expanding the ability to monitor, audit, and rectify security concerns in your GIS environment. It is highly recommended to encrypt most data; however, the value of encrypting unaltered public data can be debated. Keys should be regularly rotated and securely handled in accordance with any existing policies or guidelines from your organization.
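A minimal sketch of two of these practices, assuming a hypothetical bucket and KMS key, is shown below: setting default server-side encryption on an S3 bucket and turning on automatic key rotation in AWS KMS.

```python
import boto3

s3 = boto3.client("s3")
kms = boto3.client("kms")

bucket = "my-geospatial-data-lake"                 # hypothetical bucket
key_id = "1234abcd-12ab-34cd-56ef-1234567890ab"    # hypothetical customer managed key

# Encrypt new objects in the bucket by default with the customer managed KMS key.
s3.put_bucket_encryption(
    Bucket=bucket,
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": key_id,
                }
            }
        ]
    },
)

# Have AWS KMS rotate the key material automatically on a yearly schedule.
kms.enable_key_rotation(KeyId=key_id)
```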
As changes take place within your architecture, alerts and notifications provide critical insight to stewards of the environment. Amazon Simple Notification Service (SNS) can be integrated with any AWS service to send emails or text messages to the appropriate teams or individuals for optimized performance and security. Budgets and cost management alerts are native to AWS, making it easy to manage multiple accounts and environments based on your organization’s key performance indicators. Part of developing a cloud geospatial data strategy should be to internally ask where data issues are going unnoticed or not being addressed. By creating business rules, thresholds, and alerts, these data anomalies can notify administrators when specific areas within your data environment need attention.
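The following boto3 sketch illustrates the alerting pattern with SNS: a topic for data alerts, an email subscription, and a publish call that a business rule or threshold check might trigger. The topic name, address, and message text are hypothetical.

```python
import boto3

sns = boto3.client("sns")

# Create (or look up) a topic for geospatial data and cost alerts.
topic = sns.create_topic(Name="geospatial-data-alerts")
topic_arn = topic["TopicArn"]

# Email subscribers must confirm the subscription before they receive messages.
sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="gis-team@example.com")

# Any AWS service or custom business rule can publish to the same topic.
sns.publish(
    TopicArn=topic_arn,
    Subject="Data anomaly detected",
    Message="Null geometries exceeded the configured threshold in the parcels layer.",
)
```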
Some commonly overlooked aspects of a geospatial data management strategy are the desktop end user tools that are necessary to manage and use the environment. Many GIS environments are dependent on high-powered desktop machines used by specialists. The graphics requirements for rendering spatial data into a consumable image can be high, and the data throughput must support fluid panning and zooming through the data. Complications can arise when the user has a high-latency connection to the data. Many companies learned this the hard way when remote workers during COVID tried to continue business as usual from home. Traditional geospatial landscapes were designed for the power users to be in the office. Gigabit connectivity was a baseline requirement, and network outages meant that highly paid specialists were unable to do their work.
Virtual desktops have evolved, and continue to evolve, to provide best-in-class experiences for power users who are not co-located with their data. Part of a well-architected geospatial data management strategy is to store data once and use it many times. This principle takes a back seat when performance at the point of use is unacceptable. A short-term fix is to cache the data locally, but that brings a host of other cost and concurrency problems. Virtual desktops, or Desktop-as-a-Service (DaaS), address this problem by keeping the compute close to the data. The user can be thousands of miles away and still enjoy a fluid graphical experience. Amazon WorkSpaces and Amazon AppStream provide this capability in the cloud. WorkSpaces provides a complete desktop environment for Windows or Linux that can be configured exactly as your specialists have today. AppStream streams application visuals to shortcuts on a specialist's local desktop, so streamed applications look and feel like natively installed ones. Having access to the native geospatial data management tools as part of a cloud-based architecture results in a more robust and cohesive overall strategy.
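As an illustration of how these desktops can be provisioned programmatically, the boto3 sketch below requests a WorkSpace for a single user. The directory, user, and bundle IDs are placeholders; the bundle chosen determines the hardware profile (for example, a graphics-oriented bundle for GIS power users).

```python
import boto3

workspaces = boto3.client("workspaces")

response = workspaces.create_workspaces(
    Workspaces=[
        {
            "DirectoryId": "d-1234567890",    # hypothetical directory registration
            "UserName": "gis.analyst",        # hypothetical directory user
            "BundleId": "wsb-abcdefghi",      # hypothetical bundle (hardware profile)
            "WorkspaceProperties": {"RunningMode": "AUTO_STOP"},  # stop when idle to save cost
        }
    ]
)
# Requests that could not be fulfilled appear in FailedRequests;
# accepted ones appear in PendingRequests while they are built.
print(response["FailedRequests"], response["PendingRequests"])
```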
AWS provides corporations and organizational customers
