Augmented Reality brings with it a set of challenges unfamiliar to traditional web and mobile developers. This book is your gateway to Augmented Reality development—not a theoretical showpiece for your bookshelf, but a handbook you will keep by your desk while coding and architecting your first AR app and for years to come.
The book opens with an introduction to Augmented Reality, including markets, technologies, and development tools. You will begin by setting up your development machine for Android, iOS, and Windows development, learning the basics of using Unity and the Vuforia AR platform as well as the open source ARToolKit and Microsoft Mixed Reality Toolkit. You will also receive an introduction to Apple's ARKit and Google's ARCore! You will then focus on building AR applications, exploring a variety of recognition targeting methods. You will go through multiple complete projects illustrating key market sectors including business marketing, education, industrial training, and gaming.
By the end of the book, you will have gained the necessary knowledge to make quality content appropriate for a range of AR devices, platforms, and intended uses.
You can read this e-book in Legimi apps or in any other app that supports the following format:
Page count: 552
Year of publication: 2017
BIRMINGHAM - MUMBAI
Copyright © 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, nor its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: October 2017
Production reference: 1051017
ISBN: 978-1-78728-643-6
www.packtpub.com
Authors
Jonathan Linowes
Krystian Babilinski
Copy Editor
Safis Editing
Reviewers
Micheal Lanham
Project Coordinator
Ulhas Kambali
Commissioning Editor
Amarabha Banerjee
Proofreader
Safis Editing
Acquisition Editor
Reshma Raman
Indexer
Rekha Nair
Content Development Editor
Anurag Ghogre
Graphics
Abhinash Sahu
Technical Editor
Jash Bavishi
Production Coordinator
Melwyn Dsa
Jonathan Linowes is principal at Parkerhill Reality Labs, an immersive media Indie studio. He is a veritable 3D graphics enthusiast, Unity developer, successful entrepreneur, and teacher. He has a fine arts degree from Syracuse University and a master’s degree from the MIT Media Lab. He has founded several successful startups and held technical leadership positions at major corporations, including Autodesk Inc. He is the author of other books and videos by Packt, including Unity Virtual Reality Projects (2015) and Cardboard VR Projects for Android (2016).
Krystian Babilinski is an experienced Unity developer with extensive knowledge in 3D design. He has been developing professional AR/VR applications since 2015. He led Babilin Applications, a Unity design group that promotes open source development and engages with the Unity community. Krystian now leads the development at Parkerhill Reality Labs, which recently published Power Solitaire VR, a multiplatform VR game.
Micheal Lanham is a solutions architect with petroWEB and currently resides in Calgary, Alberta, Canada. In his current role, he develops integrated GIS applications with advanced ML and spatial search capabilities. He has worked as both a professional and amateur game developer, building desktop and mobile games for over 15 years. In 2007, Micheal was introduced to Unity 3D and has been an avid developer, consultant, and manager of multiple Unity games and graphic projects ever since.
Micheal previously wrote Augmented Reality Game Development and Game Audio Development with Unity 5.x, both also published by Packt in 2017.
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www.packtpub.com/mapt
Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1787286436.
If you'd like to join our team of regular reviewers, you can email us at [email protected]. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
Augment Your World
What is augmented reality?
Augmented reality versus virtual reality
How AR works
Handheld mobile AR
Optical eyewear AR
Target-based AR
3D spatial mapping
Developing AR with spatial mapping
Input for wearable AR
Other AR display techniques
Types of AR targets
Marker
Coded markers
Images
Multi-targets
Text recognition
Simple shapes
Object recognition
Spatial maps
Geolocation
Technical issues in relation to augmented reality
Field of view
Visual perception
Focus
Resolution and refresh rate
Ergonomics
Applications of augmented reality
Business marketing
Education
Industrial training
Retail
Gaming
Others
The focus of this book
Summary
Setting Up Your System
Installing Unity
Requirements
Download and install
Introduction to Unity
Exploring the Unity Editor
Objects and hierarchy
Scene editing
Adding a cube
Adding a plane
Adding a material
Saving the scene
Changing the Scene view
Game development
Material textures, lighting, and shaders
Animation
Physics
Additional features
Using Cameras in AR
Getting and using Vuforia
Installing Vuforia
Downloading the Vuforia Unity package
Importing the Vuforia Assets package
VuforiaConfiguration setup
License key
Webcam
Building a quick demo with Vuforia
Adding AR Camera prefab to the Scene
Adding a target image
Adding a cube
Getting and using ARToolkit
Installing ARToolkit
Importing the ARToolkit Assets package
ARToolkit Scene setup
Adding the AR Controller
Adding the AR Root origin
Adding an AR camera
Saving the Scene
Building a quick demo with ARToolkit
Identifying the AR Marker
Adding an AR Tracked Object
Adding a cube
Summary
Building Your App
Identifying your platform and toolkits
Building and running from Unity
Targeting Android
Installing Java Development Kit (JDK)
About your JDK location
Installing an Android SDK
Installing via Android Studio
Installing via command-line tools
About your Android SDK root path location
Installing USB device debugging and connection
Configuring Unity's external tools
Configuring a Unity platform and player for Android
Building and running
Troubleshooting
Android SDK path error
Plugins colliding error
Using Google ARCore for Unity
Targeting iOS
Having an Apple ID
Installing Xcode
Configuring the Unity player settings for iOS
ARToolkit player settings
Building and running
Troubleshooting
Plugins colliding error
Recommended project settings warning
Requires development team error
Linker failed error
No video feed on the iOS device
Using Apple ARKit for Unity
Targeting Microsoft HoloLens
Having a Microsoft developer account
Enabling Windows 10 Hyper-V
Installing Visual Studio
Installing the HoloLens emulator
Setting up and pairing the HoloLens device for development
Configuring Unity's external tools
Configuring the Unity platform and player for the UWP holographic
Build settings
Quality settings
Player settings - capabilities
Player settings - other settings
Vuforia settings for HoloLens
Enabling extended tracking
Adding HoloLensCamera to the Scene
Binding the HoloLens Camera
Building and running
Holographic emulation within Unity
MixedRealityToolkit for Unity
Summary
Augmented Business Cards
Planning your AR development
Project objective
AR targets
Graphic assets
Obtaining 3D models
Simplifying high poly models
Target device and development tools
Setting up the project (Vuforia)
Adding the image target
Adding ImageTarget prefab to the scene
Creating the target database
Importing database into Unity
Activating and running
Enable extended tracking or not?
What makes a good image target?
Adding objects
Building and running
Understanding scale
Real-life scale
Virtual scale and Unity
Target scale and object scale
Animating the drone
How do the blades spin?
Adding an Idle animation
Adding a fly animation
Connecting the clips in the Animator Controller
Playing, building and running
Building for iOS devices
Setting up the project
Adding the image target
Adding objects
Build settings
Building and running
Building and running using Apple ARKit
Building for HoloLens
Setting up the project
Adding the image target
Adding objects
Build settings
Building and running
Building with ARToolkit
Setting up the project
Preparing the image target
Adding the image target
Adding objects
Building and running
Summary
AR Solar System
The project plan
User experience
AR targets
Graphic assets
Target device and development tools
Setting up the project
Creating our initial project
Setting up the scene and folders
Using a marker target
Creating a SolarSystem container
Building the earth
Creating an earth
Rotating the earth
Adding audio
Lighting the scene
Adding sunlight
Night texture
Building an earth-moon system
Creating the container object
Creating the moon
Positioning the moon
A quick introduction to Unity C# programming
Animating the moon orbit
Adding the moon orbit
Adding a global timescale
Orbiting the sun
Making the sun the center, not the earth
Creating the sun
The earth orbiting around the sun
Tilt the earth's axis
Adding the other planets
Creating the planets with textures
Adding rings to Saturn
Switching views
Using VuMark targets (Vuforia)
Associating markers with planets
Adding a master speed rate UI
Creating a UI canvas and button
Gametime event handlers
Trigger input events
Building and running
Exporting the SolarSystem package
Building for Android devices – Vuforia
Building for iOS devices – Vuforia
Building for HoloLens – Vuforia
Building and running ARToolkit
ARToolkit markers
Building the project for ARToolkit
Using 2D bar code targets (ARToolkit)
Markerless building and running
Building and running iOS with ARKit
Setting up a generic ARKit scene
Adding SolarSystem
Placing SolarSystem in the real world
UI for animation speed
Building and running HoloLens with MixedRealityToolkit
Creating the scene
Adding user selection of scale and time
Summary
How to Change a Flat Tire
The project plan
Project objective
User experience
Basic mobile version
AR mobile version
Markerless version
AR targets
Graphic assets and data
Software design patterns
Setting up the project
Creating the UI (view)
Creating an Instruction Canvas
Creating a Nav Panel
Creating a Content panel
Adding a title text element
Adding a body text element
Creating an Instructions Controller
Wiring up the controller with the UI
Creating an instruction data model
InstructionStep class
InstructionModel class
Connecting the model with the controller and UI
Loading data from a CSV file
Abstracting UI elements
Adding InstructionEvent to the controller
Refactoring InstructionsController
Defining InstructionElement
Linking up the UI elements in Unity
Adding image content
Adding an image to the instruction Content panel
Adding image data to the InstructionStep model
Importing the image files into your project
Adding video content
Adding video to the instruction content panel
Adding video player and render texture
Adding video data to the InstructionStep model
Adding a scroll view
Summary
Augmenting the Instruction Manual
Setting up the project for AR with Vuforia
Switching between AR Mode
Using user-defined targets
Adding a user-defined target builder
Adding an image target
Adding a capture button
Wire capture button to UDT capture event
Adding visual helpers to the AR Prompt
Adding a cursor
Adding a registration target
Removing the AR prompt during tracking
Preventing poor tracking
Integrating augmented content
Reading the AR graphic instructions
Creating AR UI elements
Displaying the augmented graphic
Making the augmented graphics
Including the instructions panel in AR
Using ARKit for spatial anchoring
Setting up the project for ARKit
Preparing the scene
Modifying the InstructionsController
Adding the AR mode button
Adding the anchor mode button
Adding the AR prompt
Adding AR graphic content
A Holographic instruction manual
Setting up the project for HoloLens
World space content canvas
Enabling the next and previous buttons
Adding an AR prompt
Placement of the hologram
Adding AR graphics content
Summary
Room Decoration with AR
The project plan
User experience
Graphic assets
Photos
Frames
User interface elements
Icon buttons
Setting up the project and scene
Create a new Unity project
Developing for HoloLens
Creating default picture
About Mixed Reality Toolkit Input Manager
Gaze Manager
Input Manager
Mixed Reality Toolkit input events
Creating a toolbar framework
Create a toolbar
PictureController component
PictureAction component
Wire up the actions
Move tool, with spatial mapping
Add the Move button and script
Use Spatial Mapping for positioning
Understanding surface planes
Scale tool with Gesture Recognizer
Adding the scale button and script
Scaling the picture
Supporting Cancel
Abstract selection menu UI
Adding the frame menu
SetFrame in PictureController
The FrameMenu object and component
Frame options objects
Activating the frame menu
Support for Cancel in PictureController
Adding the Image menu
SetImage in PictureController
The ImageMenu object and component
Image options objects
Activating the Image menu
Adjusting for Image aspect ratio
Adding and deleting framed pictures
Add and Delete in the Toolbar
GameController
Add and Delete Commands in PictureController
Handling empty scenes
UI feedback
Click audio feedback
Click animation feedback
Building for iOS with ARKit
Set up project and scene for ARKit
Use touch events instead of hand gestures
PictureAction
ClickableObjects
ScaleTool
MoveTool
Building for mobile AR with Vuforia
Set up project and scene for Vuforia
Set the image target
Add DefaultPicture to the scene
GameController
Use touch events instead of hand gestures
Summary
Poke the Ball Game
The game plan
User experience
Game components
Setting up the project
Creating an initial project
Set up the scene and folders
Importing the BallGameArt package
Setting the image target
Boxball game graphics
Ball game court
Scale adjustments
Bouncy balls
Bounce sound effect
Throwing the ball
Ready ball
Holding the ball
Throwing the ball
Detecting goals
Goal collider
CollisionBehavior component
Goal! feedback
Cheers for goals
BallGame component
Keeping score
Current score UI
Game controller
Tracking high score
Augmenting real-world objects
About Vuforia Smart Terrain
User experience and app states
Screen space canvas
Using Smart Terrain
Handling tracking events
App state
App state manager
Wiring up the state manager
Playing alternative games
Setting up the scene with ball games
Activating and deactivating games
Controlling which game to play
Other toolkits
Summary
Augmented Reality has been said to be the next major computing platform. This book shows you how to build exciting AR applications with Unity 3D and the leading AR toolkits for a spectrum of mobile and wearable devices.

The book opens with an introduction to augmented reality, including the markets, technologies, and development tools. You will begin by setting up your development machine for Android, iOS, and/or Windows development, and learn the basics of using Unity and the Vuforia AR platform as well as the open source ARToolKit, the Microsoft Mixed Reality Toolkit, Google ARCore, and Apple's ARKit! You will then focus on building AR applications, exploring a variety of recognition targeting methods. You will go through full projects illustrating key business sectors, including marketing, education, industrial training, and gaming. Throughout the book, we introduce major concepts in AR development, best practices in user experience, and important software design patterns that every professional and aspiring software developer should use.

It was quite a challenge to construct the book in a way that (hopefully) retains its usefulness and relevancy for years to come. There is an ever-increasing number of platforms, toolkits, and AR-capable devices emerging each year. There are solid general-purpose toolkits, such as Vuforia and the open source ARToolKit, that support both Android and iOS devices. There is the beta Microsoft HoloLens and its Mixed Reality Toolkit for Unity. We had nearly completed writing this book when Apple announced its debut into the market with ARKit and Google announced ARCore, so we took the time to integrate ARKit and ARCore into our chapter projects too.

By the end of this book, you will have gained the necessary knowledge to make quality content appropriate for a range of AR devices, platforms, and intended uses.
Chapter 1, Augment Your World, will introduce you to augmented reality and how it works, including a range of best practices, devices, and practical applications.
Chapter 2, Setting Up Your System, walks you through installing Unity, Vuforia, ARToolkit, and other software needed to develop AR projects on Windows or Mac development machines. It also includes a brief tutorial on how to use Unity.
Chapter 3, Building Your App, continues from Chapter 2, Setting Up Your System, to ensure that your system is set up to build and run AR on your preferred target devices, including Android, iOS, and Windows Mixed Reality (HoloLens).
Chapter 4, Augmented Business Cards, takes you through the building of an app that augments your business card. Using a drone photography company as the example, we make its business card come to life with a flying drone in AR.
Chapter 5, AR Solar System, demonstrates the application of AR for science and education. We build an animated model of the solar system using actual NASA scale, orbits, and texture data.
Chapter 6, How to Change a Flat Tire, dives into Unity user interface (UI) development and also explores software design patterns while building a how-to instruction manual. The result is a regular mobile app using text, image, and video media. This is part 1 of the project.
Chapter 7, Augmenting the Instruction Manual, takes the mobile app developed in the previous chapter and augments it, adding 3D AR graphics as a new media type. This project demonstrates how AR need not be the central feature of an app but simply another kind of media.
Chapter 8, Room Decoration with AR, demonstrates the application of AR for design, architecture, and retail visualization. In this project, you can decorate your walls with framed photos, with a world-space toolbar to add, remove, resize, position, and change the pictures and frames.
Chapter 9, Poke the Ball Game, demonstrates the development of a fun ball game that you can play on your real-world coffee table or desk using a virtual ball and game court. You shoot the ball at the goal, aim to win, and keep score.
Each project can be built using a selection of AR toolkits and hardware devices, including Vuforia or the open source ARToolkit for Android or iOS. We also show how to build the same projects to target iOS with Apple ARKit, Google ARCore, and HoloLens with the Microsoft Mixed Reality Toolkit.
Requirements will depend on what you are using for a development machine, your preferred AR toolkit, and your target device. We assume you are developing on a Windows 10 PC or on macOS. You will need a device to run your AR apps, whether that be an Android smartphone or tablet, an iOS iPhone or iPad, or a Microsoft HoloLens.
All the software required for this book is described and explained in Chapter 2, Setting Up Your System, and Chapter 3, Building Your App, which include web links to download what you may need. Please refer to Chapter 3, Building Your App, to understand the specific combinations of development OS, AR toolkit SDK, and target devices supported.
The ideal target audience for this book is developers who have some experience in mobile development, either Android or iOS. Some broad web development experience would also be beneficial.
In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "We can include other contexts through the use of the include directive."
A block of code is set as follows:
[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)
Any command-line input or output is written as follows:
# cp /usr/src/asterisk-addons/configs/cdr_mysql.conf.sample /etc/asterisk/cdr_mysql.conf
New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "Clicking the Next button moves you to the next screen."
Warnings or important notes appear in a box like this.
Tips and tricks appear like this.
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail [email protected], and mention the book's title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
The completed projects are available on GitHub in an account dedicated to this book: https://github.com/arunitybook. We encourage our readers to submit improvements, issues, and pull requests via GitHub. As AR toolkits and platforms change frequently, we aim to keep the repositories up to date with the help of the community.
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files emailed directly to you. You can download the code files by following these steps:
1. Log in or register to our website using your email address and password.
2. Hover the mouse pointer on the SUPPORT tab at the top.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box.
5. Select the book for which you're looking to download the code files.
6. Choose from the drop-down menu where you purchased this book from.
7. Click on Code Download.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Augmented-Reality-for-Developers. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/AugmentedRealityforDevelopers_ColorImages.pdf.
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books (maybe a mistake in the text or the code), we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title. To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy of copyrighted material on the internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the internet, please provide us with the location address or website name immediately so that we can pursue a remedy. Please contact us at [email protected] with a link to the suspected pirated material. We appreciate your help in protecting our authors and our ability to bring you valuable content.
If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.
We're at the dawn of a whole new computing platform, preceded by personal computers, the internet, and mobile device revolutions. Augmented reality (AR) is the future, today!
Let's help invent this future where your daily world is augmented by digital information, assistants, communication, and entertainment. As it emerges, there is a booming need for developers and other skilled makers to design and build these applications.
This book aims to educate you about the underlying AR technologies, best practices, and steps for making AR apps, using some of the most powerful and popular 3D development tools available, including Unity with Vuforia, Apple ARKit, Google ARCore, Microsoft HoloLens, and the open source ARToolkit. We will guide you through the making of quality content appropriate to a variety of AR devices and platforms and their intended uses.
In this first chapter, we introduce you to AR and talk about how it works and how it can be used. We will explore some of the key concepts and technical achievements that define the state of the art today. We then show examples of effective AR applications, and introduce the devices, platforms, and development tools that will be covered throughout this book.
Welcome to the future!
We will cover the following topics in this chapter:
Augmented reality versus virtual reality
How AR works
Types of markers
Technical issues with augmented reality
Applications of augmented reality
The focus of this book
Simply put, AR is the combination of digital data and real-world human sensory input in real time, where the digital content appears attached (registered) to the physical space.
AR is most often associated with visual augmentation, where computer graphics are combined with actual world imagery. Using a mobile device, such as a smartphone or tablet, AR combines graphics with video. We refer to this as handheld video see-through. The following is an image of the Pokémon Go game that brought AR to the general public in 2016:
AR is not really new; it has been explored in research labs, the military, and other industries since the 1990s. Software toolkits for desktop PCs have been available as both open source and proprietary platforms since the late 1990s. The proliferation of smartphones and tablets has accelerated industrial and consumer interest in AR. And certainly, opportunities for handheld AR have not yet reached their full potential, with Apple only recently entering the fray with its release of ARKit for iOS in June 2017 and Google's release of the ARCore SDK for Android in August 2017.
Much of today's interest and excitement for AR is moving toward wearable eyewear AR with optical see-through tracking. These sophisticated devices, such as Microsoft HoloLens and Metavision's Meta headsets, and yet-to-be-revealed (as of this writing) devices from Magic Leap and others use depth sensors to scan and model your environment and then register computer graphics to the real-world space. The following is a depiction of a HoloLens device used in a classroom:
However, AR doesn't necessarily need to be visual. Consider a blind person using computer-generated auditory feedback to help guide them through natural obstacles. Even for a sighted person, a system that augments your perception of real-world surroundings with auditory assistance is very useful. Conversely, consider a deaf person using an AR device that listens to the sounds and words around them and displays them visually.
Also, consider tactile displays as augmented reality for touch. A simple example is the Apple Watch with a mapping app that taps you on the wrist with haptic vibrations to remind you that it's time to turn at the next intersection. Bionics is another example: it's not hard to see the current advances in prosthetics for amputees as AR for the body, augmenting the kinesthetic perception of body position and movement.
Then, there's the idea of augmenting spatial cognition and wayfinding. In 2004, researcher Udo Wachter built and wore a belt on his waist, lined with haptic vibrators (buzzers) attached every few inches. The buzzer facing north at any given moment would vibrate, letting him constantly know which direction he was facing. Udo's sense of direction improved dramatically over a period of weeks (https://www.wired.com/2007/04/esp/):
Can AR apply to smell or taste? I don't really know, but researchers have been exploring these possibilities as well.
OK, this may be getting weird and very science fictiony. (Have you read Ready Player One and Snow Crash?) But let's play along a little bit more before we get into the crux of this specific book.
According to the Merriam-Webster dictionary (https://www.merriam-webster.com), the word augment is defined as "to make greater, more numerous, larger, or more intense," and reality is defined as "the quality or state of being real." Take a moment to reflect on this, and you will realize that augmented reality, at its core, is about taking what is real and making it greater, more intense, and more useful.
Apart from this literal definition, augmented reality is a technology and, more importantly, a new medium whose purpose is to improve human experiences, whether they be directed tasks, learning, communication, or entertainment. We use the word real a lot when talking about AR: real-world, real-time, realism, really cool!
As human flesh and blood, we experience the real world through our senses: eyes, ears, nose, tongue, and skin. Through the miracle of life and consciousness, our brains integrate these different types of input, giving us vivid living experiences. Using human ingenuity and invention, we have built increasingly powerful and intelligent machines (computers) that can also sense the real world, however humbly. These computers crunch data much faster and more reliably than us. AR is the technology where we allow machines to present to us a data-processed representation of the world to enhance our knowledge and understanding.
In this way, AR draws on a lot of artificial intelligence (AI) technology. One way AR crosses with AI is in the area of computer vision, which is considered a part of AI because it utilizes techniques for pattern recognition and machine learning. AR uses computer vision to recognize targets in your field of view, whether through specific coded markers, natural feature tracking (NFT), or other techniques for recognizing objects or text. Once your app recognizes a target and establishes its location and orientation in the real world, it can generate computer graphics that align with those real-world transforms, overlaid on top of the real-world imagery.
However, augmented reality is not just the combining of computer data with human senses. There's more to it than that. In his acclaimed 1997 research report, A Survey of Augmented Reality (http://www.cs.unc.edu/~azuma/ARpresence.pdf), Ronald Azuma proposed that AR must meet the following three characteristics:
Combines real and virtual
Interactive in real time
Registered in 3D
AR is experienced in real time, not pre-recorded. Cinematic special effects, for example, that combine real action with computer graphics do not count as AR.
Also, the computer-generated display must be registered to the real 3D world. 2D overlays do not count as AR; by this definition, various head-up displays, such as those in Iron Man or even Google Glass, are not AR. In AR, the app is aware of its 3D surroundings and graphics are registered to that space. From the user's point of view, AR graphics appear to be real objects physically sharing the space around them.
Throughout this book, we will emphasize these three characteristics of AR. Later in this chapter, we will explore the technologies that enable this fantastic combination of real and virtual, real-time interactions, and registration in 3D.
As wonderful as this AR future may seem, before moving on, it would be remiss not to highlight the alternative, dystopian future augmented reality could bring! If you haven't seen it yet, we strongly recommend watching the Hyper-Reality video produced by artist Keiichi Matsuda (https://vimeo.com/166807261). This depiction of a frightening yet very possible future, as the artist explains, "presents a provocative and kaleidoscopic new vision of the future, where physical and virtual realities have merged, and the city is saturated in media." But let's not worry about that right now. A screenshot of the video is as follows:
Virtual reality (VR) is a sister technology of AR. As described, AR augments your current experience in the real world by adding digital data to it. In contrast, VR magically, yet convincingly, transports you to a different (computer-generated) world. VR is intended to be a totally immersive experience in which you are no longer in the current environment. The sense of presence and immersion are critical for VR's success.
AR does not carry that burden of creating an entire world. For AR, it is sufficient for computer-generated graphics to be added to your existing world space. Although, as we'll see, that is not an easy accomplishment either and in some ways is much more difficult than VR. They have much in common, but AR and VR have contrasting technical challenges, market opportunities, and useful applications.
Since VR is so immersive, its applications are inherently limited. As a user, the decision to put on a VR headset and enter into a VR experience is, well, a commitment. Seriously! You are deciding to move yourself from where you are now and to a different place.
AR, however, brings virtual stuff to you. You physically stay where you are and augment that reality. This is a safer, less engaging, and more subtle transaction. It carries a lower barrier for market adoption and user acceptance.
VR headsets visually block off the real world. This is very intentional. No external light should seep into the view. In VR, everything you see is designed and produced by the application developer to create the VR experience. The technology design and development implications of this requirement are immense. A fundamental problem with VR is motion-to-photon latency. When you move your head, the VR image must update quickly, within 11 milliseconds for 90 frames per second, or you risk experiencing motion sickness. There are multiple theories why this happens (see https://en.wikipedia.org/wiki/Virtual_reality_sickness).
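To put these frame budgets in concrete numbers, the time available per frame is simply the reciprocal of the refresh rate. A quick back-of-the-envelope sketch (plain Python, independent of any engine):

```python
# Time available to render one frame at a given display refresh rate.
def frame_budget_ms(refresh_hz: float) -> float:
    """Milliseconds available per frame."""
    return 1000.0 / refresh_hz

print(round(frame_budget_ms(90), 1))   # 90 Hz VR display -> 11.1 ms
print(round(frame_budget_ms(60), 1))   # 60 Hz phone screen -> 16.7 ms
```

Miss that roughly 11 ms window in VR and the displayed image lags behind head motion, which is what risks triggering discomfort.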
In AR, latency is much less of a problem because most of the visual field is the real world, seen either as video or through an optical see-through display. You're less likely to experience vertigo when most of what you see is real. Generally, there are far fewer graphics to render and less physics to calculate in each AR frame.
VR also imposes huge demands on your device's CPU and GPU processors to generate the 3D view for both left and right eyes. VR generates graphics for the entire scene as well as physics, animations, audio, and other processing requirements. Not as much rendering power is required by AR.
On the other hand, AR has an extra burden not borne by VR: it must register its graphics with the real world, which can be computationally quite complicated. When based on video processing, AR must engage image-processing pattern recognition in real time to find and follow the target markers. As we'll see, there are a number of ways AR applications manage this complexity, from simple target shapes to clever image recognition and matching algorithms with predefined natural images. More advanced devices use custom depth-sensing hardware and semiconductors, together with geolocation sensors, to build and track a 3D mesh of the user's environment in real time (Simultaneous Localization and Mapping, or SLAM). This, in turn, is used to register the position and orientation of computer graphics superimposed on the real-world visuals.
VR headsets ordinarily include headphones that, like the visual display, preferably block outside sounds so you can be fully immersed in the virtual world using spatial audio. In contrast, AR headsets provide open-back headphones or small near-ear speakers that allow real-world sounds to mix with the spatial audio coming from the virtual scene.
Because of these inherent differences between AR and VR, the applications of these technologies can be quite different. In our opinion, a lot of applications presently being explored for VR will eventually find their home in AR instead. Even in cases where it's ambiguous whether the application could either augment the real world versus transport the user to a virtual space, the advantage of AR not isolating you from the real world will be key to the acceptance of these applications. Gaming will be prevalent with both AR and VR, albeit the games will be different. Cinematic storytelling and experiences that require immersive presence will continue to thrive in VR. But all other applications of 3D computer simulations may find their home in the AR market.
For developers, a key difference between VR and AR, especially when considering head-mounted wearable devices, is that VR is presently available in the form of consumer devices, such as Oculus Rift, HTC Vive, PlayStation VR, and Google Daydream, with millions of units already in consumers' hands. Wearable AR devices, by contrast, are still in beta release and quite expensive. That makes VR business opportunities more realistic and measurable. As a result, AR is largely confined to handheld (phone or tablet-based) apps for consumers; if you delve into wearables, it's as an internal corporate project, an experimental project, or speculative product R&D.
We've discussed what augmented reality is, but how does it work? As we said earlier, AR requires that we combine the real environment with a computer-generated virtual environment. The graphics are registered to the real 3D world. And, this must be done in real time.
There are a number of ways to accomplish this. In this book, we will consider just two. The first is the most common and accessible method: using a handheld mobile device such as a smartphone or tablet. Its camera captures the environment, and the computer graphics are rendered on the device's screen.
A second technique, using wearable AR smartglasses, is just emerging in commercial devices, such as Microsoft HoloLens and Metavision's Meta 2. This is an optical see-through of the real world, with computer graphics shown on a wearable near-eye display.
Using a handheld mobile device, such as a smartphone or tablet, augmented reality uses the device's camera to capture the video of the real world and combine it with virtual objects.
As illustrated in the following image, running an AR app on a mobile device, you simply point its camera to a target in the real world and the app will recognize the target and render a 3D computer graphic registered to the target's position and orientation. This is handheld mobile video see-through augmented reality:
We use the words handheld and mobile because we're using a handheld mobile device. We use video see-through because we're using the device's camera to capture reality, which will be combined with computer graphics. The AR video image is displayed on the device's flat screen.
Mobile devices have features important for AR, including the following:
Untethered and battery-powered
Flat panel graphic display with touchscreen input
Rear-facing camera
CPU (main processor), GPU (graphics processor), and memory
Motion sensors, namely accelerometer for detecting linear motion and gyroscope for rotational motion
GPS and/or other position sensors for geolocation
Wireless and/or Wi-Fi data connection to the internet
Let's chat about each of these. First of all, mobile devices are... mobile. Yeah, I know you get that. No wires. But what this really means is that, like you, mobile devices are free to roam the real world. They are not tethered to a PC or other console. This is natural for AR because AR experiences take place in the real world while you move around in it.
Mobile devices sport a flat panel color graphic display with excellent resolution and pixel density sufficient for handheld viewing distances. And, of course, the killer feature that helped catapult the iPhone revolution is the multitouch input sensor on the display that is used for interacting with the displayed images with your fingers.
A rear-facing camera is used to capture video from the real world and display it in real time on the screen. This video data is digital, so your AR app can modify it and combine virtual graphics in real time as well. This is a monocular image, captured from a single camera and thus a single viewpoint. Correspondingly, the computer graphics use a single viewpoint to render the virtual objects that go with it.
Today's mobile devices are quite powerful computers, including a CPU (main processor) and a GPU (graphics processor), both of which are critical for AR: recognizing targets in the video, processing sensor and user input, and rendering the combined video on the screen. These requirements continue to grow, pushing hardware manufacturers to deliver ever higher performance.
Built-in sensors that measure motion, orientation, and other conditions are also key to the success of mobile AR. An accelerometer is used for detecting linear motion along three axes and a gyroscope for detecting rotational motion around the three axes. Using real-time data from the sensors, the software can estimate the device's position and orientation in real 3D space at any given time. This data is used to determine the specific view the device's camera is capturing and uses this 3D transformation to register the computer-generated graphics in 3D space as well.
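One common way to combine these two sensors is a complementary filter: the gyroscope gives smooth, fast updates but drifts over time, while the accelerometer gives an absolute (if noisy) tilt reference from gravity. Here is a minimal one-axis sketch in Python; the function name, sample values, and the 0.98 blend factor are illustrative, not any platform's actual API:

```python
def fuse_orientation(prev_angle, gyro_rate, accel_angle, dt, alpha=0.98):
    """One-axis complementary filter.

    prev_angle  -- last fused tilt estimate (degrees)
    gyro_rate   -- angular velocity from the gyroscope (degrees/second)
    accel_angle -- tilt implied by the accelerometer's gravity vector (degrees)
    dt          -- time since the last update (seconds)
    """
    gyro_angle = prev_angle + gyro_rate * dt          # integrate the gyro
    # Trust the gyro short-term, the accelerometer long-term.
    return alpha * gyro_angle + (1 - alpha) * accel_angle

# One 10 ms update: previous tilt 10.0 deg, gyro reads 5 deg/s,
# accelerometer implies a tilt of 10.2 deg.
angle = fuse_orientation(10.0, 5.0, 10.2, 0.01)
```

Real devices fuse all three axes (and often use more sophisticated filters, such as a Kalman filter), but the principle is the same.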
In addition, a GPS sensor can be used for applications that need to know where they are on the globe, for example, using AR to annotate a street view or a mountain range, or to find a rogue Pokémon.
Last but not least, mobile devices are enabled with wireless communication and/or Wi-Fi connections to the internet. Many AR apps require an internet connection, especially when a database of recognition targets or metadata needs to be accessed online.
In contrast to handheld mobiles, AR devices worn like eyeglasses or futuristic visors, such as Microsoft HoloLens and Metavision Meta, may be referred to as optical see-through eyewear augmented reality devices, or simply, smartglasses. As illustrated in the following image, they do not use video to capture and render the real world. Instead, you look directly through the visor and the computer graphics are optically merged with the scene:
The display technologies used to implement optical see-through AR vary from vendor to vendor, but the principles are similar. The glass that you look through while wearing the device is not a basic lens material that might be prescribed by your optometrist. It uses a combiner lens much like a beam splitter, with an angled surface that redirects a projected image coming from the side toward your eye.
An optical see-through display will mix the light from the real world with the virtual objects. Thus, brighter graphics are more visible and effective; darker areas may get lost. Black pixels are transparent. For similar reasons, these devices do not work great in brightly lit environments. You don't need a very dark room but dim lighting is more effective.
We can refer to these displays as binocular. You look through the visor with both eyes. Like VR headsets, there will be two separate views generated, one for each eye to account for parallax and enhance the perception of 3D. In real life, each eye sees a slightly different view in front, offset by the inter-pupillary distance between your eyes. The augmented computer graphics must also be drawn separately for each eye with similar offset viewpoints.
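The offset itself is simple: each eye's virtual camera is displaced half the inter-pupillary distance along the head's right vector. A small sketch (the 0.064 m IPD is an assumed typical value, and the vector math is illustrative):

```python
IPD = 0.064  # inter-pupillary distance in meters (assumed typical value)

def eye_positions(head_pos, right_vec, ipd=IPD):
    """Left and right virtual camera positions for a given head pose."""
    half = ipd / 2.0
    left  = [h - half * r for h, r in zip(head_pos, right_vec)]
    right = [h + half * r for h, r in zip(head_pos, right_vec)]
    return left, right

# Head at eye height, facing down -z, so the right vector is +x:
left_eye, right_eye = eye_positions([0.0, 1.6, 0.0], [1.0, 0.0, 0.0])
# left_eye is offset -0.032 m and right_eye +0.032 m along x
```

The scene is then rendered twice, once from each of these two positions, producing the parallax the brain expects.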
Microsoft HoloLens, for example, is a standalone mobile unit, while Metavision's Meta 2 is tethered to a PC and uses its processing resources. Wearable AR headsets are packed with hardware, yet they must come in a form factor lightweight and ergonomic enough to be comfortably worn as you move around. The headsets typically include the following:
Lens optics, with a specific field of view
Forward-facing camera
Depth sensors for positional tracking and hand recognition
Accelerometer and gyroscope for linear and rotational motion detection
Near-ear audio speakers
Microphone
Furthermore, as a standalone device, you could say that HoloLens is like wearing a laptop wrapped around your head--hopefully, not for the weight but for the processing capacity! It runs Windows 10 and must handle all the spatial and graphics processing itself. To assist, Microsoft developed a custom chip called the holographic processing unit (HPU) to complement the CPU and GPU.
Instead of headphones, wearable AR headsets often include near-ear speakers that don't block out environmental sounds. While handheld AR could also emit audio, it would come from the phone's speaker or the headphones you may have inserted into your ears. In either case, the audio would not be registered with the graphics. With wearable near-eye visual augmentation, it's safe to assume that your ears are close to your eyes. This enables the use of spatial audio for more convincing and immersive AR experiences.
The following image illustrates a more traditional target-based AR. The device camera captures a frame of video. The software analyzes the frame looking for a familiar target, such as a pre-programmed marker, using a technique called photogrammetry. As part of target detection, its deformation (for example, size and skew) is analyzed to determine its distance, position, and orientation relative to the camera in a three-dimensional space.
From that, the camera pose (position and orientation) in 3D space is determined. These values are then used in the computer graphics calculations to render virtual objects. Finally, the rendered graphics are merged with the video frame and displayed to the user:
iOS and Android phones typically have a refresh rate of 60 Hz. This means the image on your screen is updated 60 times a second, or once every 16.7 milliseconds. A lot of work must fit into that quick update. Much effort has been invested in optimizing the software to minimize wasted calculations, eliminate redundancy, and apply other tricks that improve performance without negatively impacting user experience. For example, once a target has been recognized, the software will simply track and follow it as it appears to move from one frame to the next, rather than re-recognizing the target from scratch each time.
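Returning to target detection: the distance part of the pose estimate follows from the pinhole camera model, where the farther the marker, the smaller it appears on the sensor. A hypothetical example (the focal length and marker size are made-up values for illustration):

```python
# Pinhole camera model: a marker of known physical width that appears
# w pixels wide in the image is at distance f * real_width / w,
# where f is the camera's focal length expressed in pixels.
def marker_distance_m(real_width_m, apparent_width_px, focal_px):
    return focal_px * real_width_m / apparent_width_px

# A 10 cm marker imaged 200 px wide by a camera with an 800 px focal length:
d = marker_distance_m(0.10, 200, 800)   # -> 0.4 m away
```

Skew and perspective distortion of the marker's corners are analyzed in a similar spirit to recover its orientation, not just its distance.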
To interact with virtual objects on your mobile screen, the input processing required is a lot like any mobile app or game. As illustrated in the following image, the app detects a touch event on the screen. Then, it determines which object you intended to tap by mathematically casting a ray from the screen's XY position into 3D space, using the current camera pose. If the ray intersects a detectable object, the app may respond to the tap (for example, move or modify the geometry). The next time the frame is updated, these changes will be rendered on the screen:
A distinguishing characteristic of handheld mobile AR is that you experience it from an arm's length viewpoint. Holding the device out in front of you, you look through its screen like a portal to the augmented real world. The field of view is defined by the size of the device screen and how close you're holding it to your face. And it's not entirely a hands-free experience because unless you're using a tripod or something to hold the device, you're using one or two hands to hold the device at all times.
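The ray casting step described above can be sketched in a few lines. This is an illustrative example, not any SDK's API: it casts a ray from a tapped screen point (in normalized device coordinates, for a camera looking down the -z axis) and intersects it with a flat ground plane; a real AR app would instead test against tracked targets or scene geometry:

```python
def screen_ray(ndc_x, ndc_y, focal=1.0):
    """Direction of the ray through a tapped point, for a camera
    looking down -z; tap coordinates are in [-1, 1]."""
    d = [ndc_x, ndc_y, -focal]
    length = sum(c * c for c in d) ** 0.5
    return [c / length for c in d]

def intersect_ground(cam_pos, ray_dir):
    """Point where the ray meets the plane y = 0, or None if it never does."""
    if ray_dir[1] >= 0:
        return None  # ray points level or upward; no ground hit
    t = -cam_pos[1] / ray_dir[1]
    return [cam_pos[i] + t * ray_dir[i] for i in range(3)]

# Camera held 1.5 m up; the user taps slightly below screen center:
hit = intersect_ground([0.0, 1.5, 0.0], screen_ray(0.0, -0.5))
# hit lands on the ground about 3 m in front of the camera
```

In a game engine, this whole operation is typically a single call (for example, a screen-point-to-ray helper plus a physics raycast), but the underlying math is as above.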
Snapchat's popular augmented reality selfies go even further. Using the phone's front-facing camera, the app analyzes your face using complex AI pattern-matching algorithms to identify significant points, or nodes, that correspond to the features of your face--eyes, nose, lips, chin, and so on. It then constructs a 3D mesh, like a mask of your face. Using that, it can apply alternative graphics that match up with your facial features and even morph and distort your actual face for play and entertainment. See this Vox video for a detailed explanation of Snapchat's engineering: https://www.youtube.com/watch?v=Pc2aJxnmzh0. The ability to do all of this in real time is remarkably fun and serious business:
Perhaps, by the time you are reading this book, there will be mobile devices with built-in depth sensors, including Google Project Tango and Intel RealSense technologies, capable of scanning the environment and building a 3D spatial map mesh that could be used for more advanced tracking and interactions. We will explain these capabilities in the next topic and explore them in this book in the context of wearable AR headsets, but they may apply to new mobile devices too.
Handheld mobile AR, described in the previous topic, is mostly about augmenting 2D video based on the phone camera's location in 3D space. Optical wearable AR devices are completely about 3D data. Yes, like mobile AR, wearable AR devices can do target-based tracking using their built-in cameras. But wait, there's more, much more!
These devices include depth sensors that scan your environment and construct a spatial map (3D mesh) of your environment. With this, you can register objects to specific surfaces without the need for special markers or a database of target images for tracking.
A depth sensor measures the distance of solid surfaces from you using an infrared (IR) camera and projector. The projector casts IR dots into the environment (not visible to the naked eye) in a pattern that is then read by the IR camera and analyzed by the software (and/or hardware). On nearer objects, the spread of the dot pattern differs from that on farther ones; depth is calculated from this displacement. Analysis is performed not on just a single snapshot but across multiple frames over time to provide more accuracy, so the spatial model can be continuously refined and updated.
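The geometry behind this is triangulation: the depth of a surface is inversely proportional to how far its dot pattern shifts (the disparity) between the projector's expected position and where the camera actually sees it. A hypothetical sketch; the focal length and baseline are made-up sensor parameters:

```python
# Structured-light depth: depth = focal_length_px * baseline_m / disparity_px
def depth_from_disparity_m(focal_px, baseline_m, disparity_px):
    return focal_px * baseline_m / disparity_px

# Hypothetical sensor: 580 px focal length, 7.5 cm projector-camera baseline.
near = depth_from_disparity_m(580, 0.075, 60)   # large shift -> 0.725 m away
far  = depth_from_disparity_m(580, 0.075, 15)   # small shift -> 2.9 m away
```

Computing this for every dot in the pattern, frame after frame, is what yields the continuously refined depth image.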
A visible light camera may also be used in conjunction with the depth sensor data to further improve the spatial map. Using photogrammetry techniques, visible features in the scene are identified as a set of points (nodes) and tracked across multiple video frames. The 3D position of each node is calculated using triangulation.
From this, we get a good 3D mesh representation of the space, including the ability to discern separate objects that may occlude (be in front of) other objects. Other sensors locate the user's actual head in the real world, providing the user's own position and view of the scene. This technique is called SLAM. It was originally developed for robotics applications; a seminal 2002 paper on the topic by Andrew Davison of the University of Oxford can be found at https://www.doc.ic.ac.uk/~ajd/Publications/davison_cml2002.pdf.
A cool thing about present-day implementations of SLAM is how the data is continuously updated in response to real-time sensor readings from your device.
The following illustration shows what occurs during each update frame. The device uses current readings from its sensors to maintain the spatial map and calculate the virtual camera pose. This camera transformation is then used to render views of the virtual objects registered to the mesh. The scene is rendered twice, for the left and right eye views. The computer graphics are displayed on the head-mounted visor glass and will be visible to the user as if it were really there--virtual objects sharing space with real world physical objects:
That said, spatial mapping is not limited to devices with depth sensing cameras. Using clever photogrammetry techniques, much can be accomplished in software alone. The Apple iOS ARKit, for example, uses just the video camera of the mobile device, processing each frame together with its various positional and motion sensors to fuse the data into a 3D point cloud representation of the environment. Google ARCore works similarly. The Vuforia SDK has a similar tool, albeit more limited, called Smart Terrain.
Spatial mapping is the representation of all of the information the app has from its sensors about the real world. It is used to render virtual AR world objects. Specifically, spatial mapping is used to do the following:
Help virtual objects or characters navigate around the room
Have virtual objects occlude a real object or be occluded by a real object
Let virtual objects interact with real surfaces, such as bouncing off the floor
Place a virtual object onto a real object
Show the user a visualization of the room they are in
In video game development, a level designer's job is to create the fantasy world stage, including terrains, buildings, passageways, obstacles, and so on. The Unity game development platform has great tools to constrain the navigation of objects and characters within the physical constraints of the level. Game developers, for example, add simplified geometry, or navmesh, derived from a detailed level design; it is used to constrain the movement of characters within a scene. In many ways, the AR spatial map acts like a navmesh for your virtual AR objects.
A spatial map, while just a mesh, is 3D and represents the surfaces of solid objects--not just walls and floors but furniture too. When a virtual object moves behind a real object, the map can be used to occlude the virtual object with the real one when the scene is rendered on the display. Normally, occlusion is not possible without a spatial map.
When a spatial map has collider properties, it can be used to interact with virtual objects, letting them bump into or bounce off real-world surfaces.
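The bounce itself is basic vector math: reflect the object's velocity about the surface normal reported by the spatial map's collider. An engine-agnostic sketch (a physics engine such as Unity's would normally handle this for you):

```python
def reflect(velocity, normal):
    """Reflect a velocity vector about a unit surface normal:
    v' = v - 2 (v . n) n."""
    d = sum(v * n for v, n in zip(velocity, normal))
    return [v - 2 * d * n for v, n in zip(velocity, normal)]

# A ball falling onto a level floor (normal pointing straight up)
# keeps its forward motion but reverses its vertical motion:
bounced = reflect([0.0, -3.0, 1.0], [0.0, 1.0, 0.0])   # -> [0.0, 3.0, 1.0]
```

A restitution factor would normally scale the result so the bounce loses energy, but the reflection is the core of the interaction.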
Lastly, a spatial map could be used to transform physical objects directly. For example, since we know where the walls are, we can paint them a different color in AR.
This can get pretty complicated. A spatial map is just a triangular mesh; how can your application code identify physical objects from that? It's difficult but not an unsolvable problem. In fact, the HoloLens toolkit includes a SpatialUnderstanding module that analyzes the spatial map and performs higher-level identification, such as recognizing the floor, ceiling, and walls, using techniques such as ray casting, topology queries, and shape queries.
Spatial mapping can encompass a whole lot of data that could overwhelm the processing resources of your device and deliver an underwhelming user experience. HoloLens, for example, mitigates this by letting you subdivide your physical space into what they call spatial surface observers, which in turn contain a set of spatial surfaces. An observer is a bounding volume that defines a region of space with mapping data as one or more surfaces. A surface is a triangle 3D mesh in real-world 3D space. Organizing and partitioning space reduces the dataset needed to be tracked, analyzed, and rendered for a given interaction.
Ordinarily, AR eyewear devices use neither a game controller or clicker nor positionally tracked hand controllers. Instead, you use your hands. Hand gesture recognition is another challenging AI problem for computer vision and image processing.
In conjunction with tracking where the user is looking (their gaze), gestures are used to trigger events such as select, grab, and move. Assuming the device does not support eye tracking (moving your eyes without moving your head), the gaze reticle is normally at the center of your view; you must move your head to point at the object you want to interact with: