Professional Augmented Reality Browsers for Smartphones - Lester Madden - E-Book

Lester Madden

Description

Create amazing mobile augmented reality apps with junaio, Layar, and Wikitude! Professional Augmented Reality Browsers for Smartphones guides you through creating your own augmented reality apps for the iPhone, Android, Symbian, and bada platforms, featuring fully workable and downloadable source code. You will learn important techniques through hands-on applications, and you will build on those skills as the book progresses. Professional Augmented Reality Browsers for Smartphones: * Describes how to use the latitude/longitude coordinate system to build location-aware solutions and tells where to get POIs for your own augmented reality applications * Details the leading augmented reality platforms and highlights the best applications * Covers development for the leading augmented reality browser platforms: Wikitude, Layar, and junaio * Shows how to build cross-platform location-aware content (Android, iPhone, Symbian, and bada) to display POIs directly in camera view * Includes tutorials for building 2D and 3D content, storing content in databases, and triggering actions when users reach specific locations wrox.com Programmer Forums Join our Programmer to Programmer forums to ask and answer programming questions about this book, join discussions on the hottest topics in the industry, and connect with fellow programmers from around the world. Code Downloads Take advantage of free code samples from this book, as well as code samples from hundreds of other books, all ready to use. Read More Find articles, ebooks, sample chapters, and tables of contents for hundreds of books, and more reference resources on programming topics that matter to you. Wrox Professional guides are planned and written by working programmers to meet the real-world needs of programmers, developers, and IT professionals. Focused and relevant, they address the issues technology professionals face every day. 
They provide examples, practical solutions, and expert education in new technologies, all designed to help programmers do a better job.


Page count: 393

Publication year: 2011




CONTENTS

Part I: Introduction

Chapter 1: Introducing Augmented Reality (AR)

My Augmented Reality Journey

Why Is AR Useful?

Summary

Chapter 2: Natural-Feature Tracking and Visual Search

Introducing Natural-Feature Tracking

Introducing Visual Search

Marketing AR-Enabled Apps

Summary

Chapter 3: Introduction to AR Browsers

AR Browser Basics

Wikitude World Browser

Layar Reality Browser

junaio

Browser Accuracy

Summary

Chapter 4: Latitude, Longitude, and Where to get POIs

Summary

Part II: Wikitude

Chapter 5: Building Worlds with KML

Using the Wikitude Dashboard

Developing with KML

Summary

Chapter 6: Building Worlds with ARML

Understanding Augmented Reality Markup Language (ARML)

Summary

Part III: Layar

Chapter 7: Building Layar Layers

Creating Your Layar Account

Creating a Layer

Preparing the Database

Customizing Your Layer

Adding Layar Actions

Summary

Chapter 8: Creating Filters and 2D Objects

Using Filters

Experimenting with 2D Objects

Summary

Chapter 9: Using Layar Tools

Launching Layers

The Layar Shortcut Tool

Hoppala

BuildAR

Skaloop

Summary

Part IV: junaio

Chapter 10: Creating junaio Channels

Understanding the Requirements

Setting up the Apache Server

Creating Your First Channel

Adding Images, Sound, and Video

Creating 3D Content

Using Animation

Using OBJ files

Summary

Chapter 11: Natural-feature Tracking and Visual Search with junaio

Natural-feature Tracking for Non-Developers

Natural-feature Tracking for Developers

Using Visual Search

Overlaying Videos (Movie Textures)

Image Requirements for Natural-feature Tracking

Summary

Part V: The Next Steps

Chapter 12: Adding Advanced Functionality

Working with Dedicated XML Files

Creating Advanced Interactions

Using LLA Markers

Retrieving Data from a Database

Summary

Chapter 13: Taking Your Application to Market

Marketing Your Content

Summary

Chapter 14: The Future of AR

Using AR in Marketing

Using AR for Translation Services

Using AR for Interactive TV

Using AR in Diminishing Reality

Using AR in Advertising

Using AR in Books and Print

Using AR in Gaming

Using AR in Hardware

Summary

Appendix A: Wikitude Support and ARML Parameters

Support

ARML Parameters

Appendix B: Layar Support and Parameters

Support

Request Parameters

Appendix C: junaio Support and Parameters

Support Channels

Junaio Certification Program

Junaio Parameters

Troubleshooting Guide

Introduction

PART I

Introduction

CHAPTER 1: Introducing Augmented Reality (AR)
CHAPTER 2: Natural-Feature Tracking and Visual Search
CHAPTER 3: Introduction to AR Browsers
CHAPTER 4: Latitude, Longitude, and Where to get POIs

Chapter 1

Introducing Augmented Reality (AR)

WHAT’S IN THIS CHAPTER?

What is augmented reality?
Types of augmented reality
Why augmented reality is useful

This chapter introduces you to the various augmented reality technologies and provides details of some applications that you may want to try.

MY AUGMENTED REALITY JOURNEY

I first stumbled upon augmented reality (AR) in February 2009 when I was working for Symbian, the company that developed the mobile OS for S60, UIQ, and MOAP smartphones. It was a Friday afternoon and I was in need of some distractions after a particularly long meeting about the annual Smartphone Show we were planning.

Like many companies, we had a forum where employees could post interesting things they had found on the web; one of those interesting things happened to be an AR video. The video had just been posted to YouTube by Microsoft Research and showed a researcher walking through the campus with his laptop open and a webcam filming the hallway in front of him. Displayed on the laptop screen, superimposed on the video from the webcam, was a trail of bubbles emanating from an object hidden somewhere in the building. As the researcher got closer to the object, the bubbles became thicker and thicker until the object was eventually located.

I remember being amazed at how computer graphics and the live video feed had been combined. This was far beyond simple multimedia, and I knew it was going to be the next big thing for smartphones. After much frantic searching on the web, I eventually discovered a handful of AR applications (mostly available for Nokia smartphones), and I decided to create AugmentedPlanet.com as a place to document the rise of AR across mobile, web, and the desktop.

What is AR?

I like to think of AR as being the opposite of virtual reality. Virtual reality immerses the user in a computer-generated world whereas AR combines the real world with computer graphics. In effect, AR brings the computer world to us. Unlike virtual reality, which requires specialist equipment to be experienced, AR requires only a way to capture the world around you and the means to experience the computer world (typically by overlaying computer graphics in the camera window). Because the requirements are minimal, many of today’s smartphones are ideal AR devices.

Before we get started on AR and its technologies, it is worth pointing out that this book takes the popular view of what AR is rather than the view of the purist. Over the past year or so, many new solutions calling themselves AR applications have been released to the various mobile application stores. In addition, companies like Google and metaio are combining technologies like visual search (discussed in the next chapter) with their AR browsers, further blurring the line that defines AR. Ultimately, for developers, the “How do I build that?” question is far more entertaining than a debate over the pros and cons of what is and isn’t AR. With that in mind, I am going to take the all-encompassing view and define AR as a technology that:

Combines the real world with computer graphics
Provides interaction with objects in real-time
Tracks objects in real-time
Provides recognition of images or objects
Provides real-time context or data

This broader definition allows us to include technologies that are often included under the AR umbrella but are not strictly AR in the purist sense. AR, as you will learn, means different things to different people. Through blogging about AR solutions, I have learned that perceptions of what AR is and what AR isn’t vary widely. Wikipedia (http://en.wikipedia.org/wiki/Augmented_reality) defines AR as a term for a live direct or indirect view of a physical real-world environment whose elements are augmented by virtual computer-generated sensory input such as sound or graphics.

Simply put, AR is the combining of computer graphics with a live video feed. With such a simple description, however, you can see why it’s difficult to agree on what is and what isn’t AR. For example, if you are watching a live TV sporting event and the players’ stats are shown during the game, is that AR? If you have a digital camera, there is a good chance that it has a small screen that provides you with some additional information. That information might be the battery level, the number of photos you have taken, or perhaps even information about the lighting environment. Are these examples of AR? Well, they both augment your reality by providing additional contextual information. To many, they are valid examples of AR, but others will argue that true AR must also track objects in real-time.

As I suggested, let’s not get too focused on what AR is and what AR isn’t. Instead, let’s look at what is generally accepted as being AR technology.

Gravimetric AR

Gravimetric AR is the latest trend in AR applications for mobile devices. These applications are typically called browsers and will be the focus of the applications that we build in this book. Browsers use the phone’s gravimeter to determine the user’s position and the device’s orientation. The browser genre was invented and made popular by two applications that originally appeared for Android. These applications overlay computer data about objects that appear in the camera window. If the device is pointed at a tourist attraction (such as the Statue of Liberty shown in Figure 1-1), relevant information about the object is overlaid for the user.

FIGURE 1-1: AR Browser example

AR browsers take advantage of a smartphone’s hardware, so when the user pans the device around, new information is displayed to provide even more contextual information. Since no tracking of objects or image recognition is used, the application can be used indoors (even where a wall obstructs the view of the target object). In fact, the camera lens of a smartphone can be covered completely because the application neither knows nor cares about what the camera sees. This has led some to argue that AR browsers are not true examples of AR because the use of the camera is largely superficial, with no real-time tracking taking place.
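The geometry behind this is straightforward: given the user’s position, a POI’s position, and the compass heading, the browser only has to decide whether the POI’s bearing falls inside the camera’s field of view. The sketch below is a simplified illustration of that idea, not the code of any actual browser; the coordinates and the 60-degree field of view are made-up assumptions.

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees (0 = north)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(x, y)) % 360

def in_camera_view(heading, poi_bearing, fov=60):
    """True if the POI's bearing lies within the camera's horizontal field of view."""
    diff = (poi_bearing - heading + 180) % 360 - 180  # signed difference in [-180, 180)
    return abs(diff) <= fov / 2

# A user near Battery Park pointing the phone toward the Statue of Liberty.
user = (40.7033, -74.0170)
statue = (40.6892, -74.0445)
b = bearing_deg(*user, *statue)
print(round(b))                 # roughly south-west
print(in_camera_view(235, b))   # compass heading of 235 degrees: POI is on screen
print(in_camera_view(45, b))    # facing north-east: POI is behind the user
```

Note that nothing here touches the camera image at all, which is exactly the point made above: the overlay is driven purely by position and orientation sensors.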

You will learn more about AR browsers in Chapter 3, which covers terminology in more detail and provides insight into the main browsers for which you will be building content.

Fiduciary Markers

Fiduciary markers are the truest form of AR because they are used to track objects in the real world. As shown in Figure 1-2, black and white squares are used as a point of reference or to provide scale and orientation to the application.

FIGURE 1-2: A fiduciary marker

When the marker is recognized by the software, an action takes place. Typically that action is the overlaying of a 3D object. The software is able to track the orientation of the marker, and as the orientation changes, either because the user moves the device or moves the marker, the view of the 3D object can be changed accordingly. Ultimately, this enables the user to move around the object and view it from 360 degrees. Similarly, the application may know about scale, and as the marker is moved nearer to or farther from the camera, the size of the 3D object can be adjusted accordingly. In Figure 1-3, the ARGirl AR application created for the iPhone by APetrus shows a 3D image drawn on a marker.

FIGURE 1-3: The ARGirl iPhone app

This fun application enables the user to either print or draw a marker to be recognized and tracked. When the iPhone application recognizes the marker, a 3D image of a dancer is displayed. As a nice touch, the 3D model reacts to music. Despite marker-based applications being popular with Flash developers, they haven’t yet set the world on fire on mobile devices. In the case of the iPhone, Apple added the API necessary for developers to access the raw camera feed and analyze what the camera is viewing only in iOS 4.0. The ability to detect markers is still fairly new for iPhone developers; Android and Symbian developers have had the functionality for some time. The number of marker-based applications, however, remains relatively small.
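The scale part of marker tracking can be illustrated with simple pinhole-camera arithmetic: the farther the marker is from the camera, the smaller it appears, and the 3D model is scaled to match. This is a hedged sketch of the idea only, not the algorithm used by ARGirl or any specific toolkit; the focal length and marker size are made-up values.

```python
def marker_distance_mm(focal_px, marker_mm, apparent_px):
    """Pinhole-camera estimate: distance = focal length (px) * real size / apparent size."""
    return focal_px * marker_mm / apparent_px

def scale_factor(ref_apparent_px, apparent_px):
    """Scale to apply to the 3D model so it grows as the marker nears the camera."""
    return apparent_px / ref_apparent_px

# An 80 mm printed marker seen 200 px wide by a camera with an 800 px focal length.
print(marker_distance_mm(800, 80, 200))  # 320.0 mm from the camera
print(scale_factor(100, 200))            # 2.0 -- draw the model at twice reference size
```

Real toolkits recover the full position and orientation of the marker (a pose matrix), but the distance/scale relationship shown here is the intuition behind why the 3D object shrinks and grows convincingly.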

Barcodes

Browsers and markers represent the high end of AR technology, with markers being the earlier of the two. Some people (myself included) say that markers are a natural evolution of barcodes. If the process of recognizing a marker is considered AR, then the process of recognizing other objects (be it facial recognition or object recognition) must also be considered AR. In my opinion, the humble barcode (see Figure 1-4) is a form of AR.

FIGURE 1-4: A standard barcode

While the barcode is not a very sophisticated image, a computer is used to recognize the code and retrieve related information when it is scanned. In this instance, no 3D rendering takes place; only recognition and an action occur. Developers have taken advantage of this technology to build barcode shopping applications that enable users to scan hundreds of thousands of product barcodes to find the cheapest price online. Typically these barcode applications are not categorized as AR applications in the stores. Whether or not you consider them AR is essentially up to you.

There are many applications that enable you to use a phone’s camera to scan a barcode and obtain a price online. To experiment, try Red Laser, a free app for both Android and the iPhone. Once you have installed the application, try using it on various products around the house to see what information you can retrieve. Red Laser will also retrieve nutritional information for food products, so be sure to scan a variety of barcodes to see what you can discover.

Quick Response (QR) Codes

Quick response (QR) codes are two-dimensional codes that consist of many small squares arranged in a square pattern. Typically these are black and white, but you might discover two-color QR codes from time to time. QR codes were invented in Japan in 1994 and were used to track the various parts in vehicle manufacturing. Today they are used as quick links to websites, quick-dial links for phone numbers, or even to send an SMS message.

While the pattern of a QR code looks random, it actually contains an embedded message that can be read by a QR code reader. Once the code is read by the camera, the encoded action is carried out automatically. Figure 1-5 shows a QR code for AugmentedPlanet.com. If you install a free QR code reader on your device and point your camera at the code, you will automatically be taken to the Augmented Planet home page without the need to type the URL.

FIGURE 1-5: QR code for Augmented Planet

QR codes are huge in Asia, where they are frequently used in advertising and magazines. Outside Asia, however, they haven’t gained mass adoption. Google is pushing the format and in 2009 distributed over 190,000 QR codes to businesses throughout the US. These codes were window stickers that enabled anyone with a QR code reader to scan the code and call up details about the business (for example, the phone number, reviews, an option to save the address to favorite places, and so on). In addition, the businesses were able to offer discount vouchers to users. If you didn’t see that particular campaign, you might have seen numerous web sites (primarily those that have Android applications) using QR codes to link directly to the Android Market download page. A user who scans the QR code with her Android device can be taken directly to the download location for the relevant application in seconds.

QR codes are not like markers, which are recognized only by applications programmed to look for a specific marker pattern. Where each marker is specific to an application, a QR code carries its data encoded in the pattern itself. That is to say, if you create a QR code that links to a web site, the generated pattern follows an ISO standard with the URL encoded, so it can be read by any QR code application. Since the information is embedded in the pattern, it cannot be changed; you must generate a new code to handle any changes (for example, pointing to a different URL).
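Conceptually, a reader decodes the pattern to a text payload and then dispatches on its prefix to decide what action to perform. The sketch below is an illustrative model of that dispatch step; the payload prefixes follow common reader conventions rather than any specific reader’s implementation.

```python
def qr_action(payload):
    """Map a decoded QR payload onto the action a reader would typically take."""
    if payload.startswith(("http://", "https://")):
        return ("open_url", payload)          # quick link to a website
    if payload.startswith("tel:"):
        return ("dial", payload[4:])          # quick-dial a phone number
    if payload.upper().startswith("SMSTO:"):
        number, _, body = payload[6:].partition(":")
        return ("send_sms", number, body)     # pre-filled SMS message
    return ("show_text", payload)             # fall back to displaying the text

print(qr_action("http://www.augmentedplanet.com"))   # ('open_url', ...)
print(qr_action("tel:+442012345678"))                # ('dial', '+442012345678')
print(qr_action("SMSTO:+44777000111:Hello"))         # ('send_sms', '+44777000111', 'Hello')
```

The key point mirrored here is that the payload lives entirely in the printed pattern: changing the destination means printing a new code.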

There are many free web sites that enable you to create your own QR codes, so take 10 minutes to experiment with creating the various types of content. Try http://qrcode.kaywa.com/.

Microsoft Tags

Like QR codes, Microsoft tags are a quick way to enable users to access content. Unlike QR codes, however, which have all the information embedded in the image, tags enable you to update the destination dynamically as often as you wish. With a tag, you can change the destination or action to reflect a new URL or phone number just by changing the destination on the tag website. This is a clear benefit because you don’t need to update any printed material. You can also password-protect the resource so it can be used only by approved users.
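The difference between the two formats comes down to indirection. This toy sketch is purely illustrative (Microsoft’s actual service does this server-side at tag.microsoft.com): it models a tag as an ID that is resolved against a server-side table, so the destination can change after printing and can be password-protected.

```python
class TagRegistry:
    """Toy model of tag indirection: the printed image encodes only a tag ID;
    the destination lives on the server and can be changed at any time."""
    def __init__(self):
        self._tags = {}

    def create(self, tag_id, destination, password=None):
        self._tags[tag_id] = {"destination": destination, "password": password}

    def update(self, tag_id, destination):
        self._tags[tag_id]["destination"] = destination  # no reprint needed

    def resolve(self, tag_id, password=None):
        tag = self._tags[tag_id]
        if tag["password"] is not None and password != tag["password"]:
            return None  # protected resource: wrong or missing password
        return tag["destination"]

registry = TagRegistry()
registry.create("tag42", "http://www.augmentedplanet.com")
print(registry.resolve("tag42"))
registry.update("tag42", "http://www.augmentedplanet.com/new")
print(registry.resolve("tag42"))  # same printed tag, new destination
```

Contrast this with the QR code case above, where the destination is baked into the printed pattern itself.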

You might choose to add a tag to your business card and have that tag represent your home phone number. In that situation, you could give your business card to anyone, but only those who know the password would be able to access your home number. Figure 1-6 shows a tag that provides a shortcut to the Augmented Planet web site.

FIGURE 1-6: Microsoft tag for Augmented Planet

Tags can be either color or black and white and the default image also includes details of where users can obtain the tag reader software. Of course, this information is entirely optional, but it’s useful to promote exactly what the tag is and how it can be read.

Tags also provide rich tracking functionality, so you can see how often users use your tag. In addition, you can specify the duration that your tag is alive, setting both its start and expiration dates. Microsoft tags are still relatively new, but Microsoft is pushing the format to many advertisers, promoting the use of tags in their campaigns as a quick way to take a user from a printed advertisement to a website or even to place a call. Unlike QR codes, tags are, to some extent, customizable and can be combined with images.

In Figure 1-7, the image on the left is the raw tag; the image on the right is a tag that has been combined with an image of my cat. Both perform the same action. A skillful artist would be able to create a tag that utilizes the colors as part of the design to create a truly blended tag.

FIGURE 1-7: Customizable tags

Branding your tags with an image of your company logo is a good way to promote your software or services while also allowing users quick access to the information you’re promoting.

You should download a copy of the Microsoft tag reader from your phone’s application store. Once you install it, you can create tags at http://tag.microsoft.com/ and then test them.

Barcodes, QR codes, and tags are the simplest examples of AR. They are often referred to as hyperlinking rather than AR.

Markerless AR

As shown in Figure 1-2, markers are black and white squares that enable an application to detect and track orientation and adjust the position of the 3D object accordingly. Markerless AR means using AR either without tracking at all or with tracking that does not rely on a special marker. Gravimetric AR browsers are an example of markerless AR: points of interest (POIs) and other data are displayed in the camera window, but the data is not attached to or tracked against any particular visual object. Many of the mobile games that claim to be AR simply turn on the camera as a backdrop. Other than the camera providing a local setting for the user, its use is superficial. So, again, some will claim that this is not real AR.

Figure 1-8 shows Soulbit7’s entertaining AR Invaders iPhone game; this game adds an AR twist to the classic Space Invaders game by putting users in their own surroundings (courtesy of the smartphone camera’s ability to provide the game’s backdrop).

FIGURE 1-8: Soulbit7’s AR Invaders iPhone game

Markerless tracking is where AR is used to track objects in the real world without using special markers. Face recognition is an excellent example. You might be surprised to see just how advanced facial recognition systems are for mobile devices. Polar Rose, a Swedish company, showcased a facial recognition prototype for Android devices in 2009. The prototype, named Augmented ID, allowed a user to point his Android phone at a person’s face, and the application would compare the face to faces in a database in an effort to find a match. If a match was found, the application overlaid the subject’s social media profile (Twitter, Facebook, LinkedIn, LastFM, and so on). The technology has since been purchased by Apple, which gives an indication of the types of markerless AR applications we’ll see in the future.

Since the acquisition, the free face recognition SDK is no longer available to download. However, at the Mobile World Congress trade show in Barcelona in February 2011, a new company named Viewdle (www.viewdle.com) showcased their free face recognition SDK for Android developers. As Figure 1-9 shows, face recognition presents interesting use-cases for mobile applications.

FIGURE 1-9: Viewdle’s face recognition SDK

You will learn more about markerless tracking in Chapter 2, “Natural-Feature Tracking and Visual Search.”

Watch this facial recognition demo to see how advanced markerless AR is: http://tinyurl.com/3ynlfop.

WHY IS AR USEFUL?

It’s easy to see how useful hyper-linking applications are. Imagine reading an article on the bus to work and seeing a link to a web site to obtain more information or download an application. You could try to remember the URL for when you get to the office or you could type the URL into your phone’s browser. With AR, however, you can just snap a picture of the QR code/tag and have the information instantly. Considering the usefulness of QR codes, it is surprising that they are not more widely used.

The real strength of AR browsers is their discoverability. Today, browsers have most of the attention and it’s amazing how many people have yet to experience a browser for themselves. Browsers are incredibly useful ways to discover information about places and objects around you. Browsers have helped me discover information about my neighborhood that I never would have discovered otherwise.

One application for the iPhone (Get London Reading) shows all the books that were set around my immediate location; surprisingly, there are a lot of books based on my neighborhood. Another application (Museum of London) superimposes old pictures from the early 1900s so you can see how your neighborhood looked in the past. Of course, these applications I describe are specific to London; in later chapters, you will get to experiment with AR browsers and discover content for your own neighborhood.

SUMMARY

AR is defined as a means of overlaying computer graphics on live video captured by a camera. An AR application can be as simple as adding a graphic to a video feed. As AR becomes more popular, so, too, does the debate over what AR is and what AR isn’t. If you have experienced AR by holding a marker up to your webcam and interacting with 3D, you’re not likely to consider anything less a true form of AR.

AR is also used in barcode recognition, which is considered to be among the earliest examples of AR applications. These types of applications are considered hyperlinking applications because they do not render anything over the video feed.

The most popular type of AR application for mobile devices available today is the browser that overlays contextual data about objects or locations for your surroundings. These browsers are what you will focus on in later chapters.

Chapter 2

Natural-Feature Tracking and Visual Search

WHAT’S IN THIS CHAPTER?

An introduction to natural-feature tracking
An introduction to visual search

In this chapter, you will learn about natural-feature tracking, which expands on the concept of the marker and enables 3D objects to be overlaid on top of real images (such as book covers, CD covers, or even your own holiday snapshots). You will also learn about visual search, which is similar to natural-feature tracking except a different action takes place when an image has been recognized. Before you begin, you’ll need an Android or an iPhone.

INTRODUCING NATURAL-FEATURE TRACKING

Unlike AR solutions that use markers as their basis for recognition, natural-feature tracking solutions can be applied to almost any image as long as the image is complex enough. An example of a natural-feature tracking application is a mobile application that can recognize a movie poster. With natural-feature tracking, the application can analyze the poster and identify it by comparing the poster image to similar images. In contrast, a marker-based solution requires a special identifier to be included on the poster; it would be the marker that provides the identification rather than the poster image.

Markerless solutions are better for a couple of reasons. First, markers and QR codes are not user-friendly, and you would have a difficult task persuading the marketing people behind the latest Hollywood blockbuster to include such content in their marketing materials. Second, any such markers would need to be included on the material at the time of printing; marketing campaigns that use marker technology cannot be created retroactively.

In the movie poster example, it’s impossible to add markers for Jurassic Park because all the packaging and marketing materials have already been produced. With an application capable of natural-feature tracking, there is no longer a requirement for a marker. Anything in the real world (including a face, an object, or an image) can be tracked and identified.

You’ll notice that in the description of natural-feature tracking, I have gone to great lengths to avoid simply stating that the technology is image recognition. This is largely due to the action that takes place when the image is recognized. Typically, when 3D is overlaid on top of the image, the image needs to be tracked so the 3D object can respond to changes in orientation and scale. Because no marker is present, the application uses the natural features of the image to provide alignment and orientation — hence the term natural-feature tracking. Image recognition is still a valid term, but it’s more akin to visual search solutions that simply identify an image and carry out a simple action. You will learn more about visual search later in this chapter.

How Natural-Feature Tracking Works

To recognize an object, a reference image is required. In keeping with our movie theme, let’s imagine we want to recognize the Jurassic Park movie poster and draw a 3D dinosaur on the poster that can be viewed with a smartphone. Since we only want to draw the dinosaur on the correct movie poster, we obviously need to have the Jurassic Park movie poster image so it can be compared for matches. Let’s call this original Jurassic Park movie poster our reference image. When users view the movie poster with their smartphones, the poster they are viewing in their smartphone camera windows must be compared to the reference image to see if the posters match. If they match, the 3D dinosaur is displayed; if not, no action takes place. That, of course, is a huge simplification of the process. In reality, it is a lot more complex.

The ability of humans to spot patterns is an amazing skill. Just by looking at a movie poster, we can identify it against a likely match. Unlike the human eye, which can instantly tell the difference between the Jurassic Park movie poster and a chocolate wrapper, a computer needs more assistance in understanding what it is seeing. Therefore, the reference image is not simply a JPG or other image file residing on a server. Instead, the reference image of the movie poster has to be converted into an image map of the light and dark areas that make up the poster. The reference image that is created is no longer something that can be viewed by humans. Instead, it is a file that contains encoded information representing the light and dark areas of the poster. Don’t worry too much about this file; the format is proprietary. Just accept that it’s an encoded file that you host on your web server for comparison.

The application that will recognize the Jurassic Park poster will need to convert whatever is in the smartphone camera window into a similar map of light and dark areas. Let’s call this the comparison image. The comparison image can be created in one of two ways:

The user takes a picture of the movie poster and the application converts the picture into the comparison image. In this case, the comparison image is sent to the server where the reference image is located and then analyzed to determine whether the two match.
Or, as you’ll see with modern smartphones that have greater processing power, the application analyzes the video feed directly from the camera window (rather than requiring a photo) and constantly generates comparison images. As soon as a match is found, the desired action takes place.
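As a toy illustration of the pipeline described above (the real file format is proprietary, so the representation here is invented purely for clarity), the following sketch reduces tiny grayscale images to light/dark maps and scores how closely two maps agree:

```python
def to_light_dark_map(pixels, threshold=128):
    """Reduce a grayscale image (rows of 0-255 values) to a map of light/dark areas."""
    return [[1 if p >= threshold else 0 for p in row] for row in pixels]

def similarity(map_a, map_b):
    """Fraction of cells on which the two maps agree."""
    cells = [(a, b) for ra, rb in zip(map_a, map_b) for a, b in zip(ra, rb)]
    return sum(a == b for a, b in cells) / len(cells)

reference = to_light_dark_map([[250, 10], [12, 240]])   # the stored reference image
candidate = to_light_dark_map([[230, 40], [5, 255]])    # same poster, different lighting
other     = to_light_dark_map([[10, 10], [10, 10]])     # something else entirely
print(similarity(reference, candidate))  # 1.0 -- a match, trigger the action
print(similarity(reference, other))      # 0.5 -- no match, do nothing
```

Notice that the thresholding step makes the comparison tolerant of lighting changes and discards color, which is why color is largely irrelevant to the real systems as well.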

Reference images that contain lots of sharp edges and high contrasts work the best. A polar bear sitting in a snow bank would not have enough contrast to produce a suitable reference image. Color is generally not an issue because, in most cases, the colors contained in the image are lost when the image is converted into the reference image.
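A crude way to see why the polar bear fails is to measure how many neighboring pixels differ sharply. This is an illustrative heuristic only, not the feature detector used by any real SDK:

```python
def contrast_score(pixels, edge_threshold=50):
    """Fraction of neighboring pixel pairs whose brightness differs sharply.
    Images with few sharp edges make poor reference images."""
    pairs = edges = 0
    for row in pixels:                          # horizontal neighbors
        for a, b in zip(row, row[1:]):
            pairs += 1
            edges += abs(a - b) >= edge_threshold
    for row_a, row_b in zip(pixels, pixels[1:]):  # vertical neighbors
        for a, b in zip(row_a, row_b):
            pairs += 1
            edges += abs(a - b) >= edge_threshold
    return edges / pairs

checkerboard = [[0, 255, 0], [255, 0, 255], [0, 255, 0]]       # high contrast
polar_bear   = [[240, 245, 250], [235, 240, 245], [245, 250, 240]]  # white on white
print(contrast_score(checkerboard))  # 1.0 -- plenty of sharp edges to track
print(contrast_score(polar_bear))    # 0.0 -- too uniform to be a usable reference
```

The toolkits you will meet later run much more sophisticated feature detectors, but the underlying requirement is the same: without sharp, distinctive structure there is nothing to lock onto.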

Scenarios for Natural-Feature Tracking

Solutions that use natural-feature tracking typically overlay a 3D object over the recognized image. Like its marker-based counterpart, the image in the camera window is tracked and, as it is rotated, the 3D object is updated in real-time, enabling the user to view the 3D object from multiple angles.

As processing power for mobile phones increases, we are beginning to see more of these types of applications. They represent the future of AR, and the days of the marker are numbered. Whereas hyperlinking solutions enable the user to quickly go to a web page, natural-feature tracking enables the user to interact with the content. Imagine seeing an advertisement for the latest trendy sports car in your newspaper. Hyperlinking would enable you to quickly visit the home page for the advertisement, while natural-feature tracking would enable you to point your smartphone at the advertisement and see the car appear in your camera window. By moving your smartphone around the advertisement, you can view the car from any angle. Perhaps by interacting with the keys on your phone, you will even be able to customize the car to build your perfect ride.

As you will see in later chapters, these types of solutions are possible today. Figure 2-1 shows a user interacting with 3D content provided by a natural-feature tracking solution. In the picture, you can see the user holding a smartphone up to an image (hidden) while 3D graphics are overlaid.

FIGURE 2-1: An example of junaio GLUE’s natural-feature tracking

Gaming

Games that use natural-feature tracking are also becoming popular. One of the first popular natural-feature tracking games for the Android platform was a game called Space InvadAR. In this game, users play a clone of the popular Space Invaders game by using a picture of planet Earth. A player simply points his camera at the picture; once the image has been recognized, alien ships are launched, and the player is tasked with defending the planet. All this takes place over 3D images of Earth, providing a realistic background for the game.

If you are interested in seeing Zenitum’s Space InvadAR game, you can find the YouTube video at http://tinyurl.com/39nyxjy.

Not all feature tracking has to rely on recognizing a predetermined image. Some applications merely detect that something in the camera window has changed: something new has been added to the video feed, or something has been removed. As an example that illustrates this concept, Upsies! is an iPhone game based on the popular challenge of keeping a soccer ball in the air for as long as possible using only your feet. The game draws a soccer ball, which is visible in the phone’s display. Point your camera at your feet, and then use your feet to keep the ball in the air and prevent it from touching the ground.

The game works by comparing the images taken in each video frame from the iPhone’s video feed. Think of the app as creating image maps on the fly and comparing them for matches. As it compares images, it is able to detect that a foot (or any other object) has been introduced and it detects when the foot is near the 3D ball. If they connect, it counts as a valid move.
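The frame-by-frame comparison described above can be sketched as simple frame differencing: subtract consecutive grayscale frames and count the pixels whose brightness changed beyond a threshold. This is an assumed illustration of the general technique (the actual engine behind Upsies! is not public), with invented frame data and function names.

```python
# Sketch of frame differencing: detect that something new has entered
# (or left) the camera view by comparing consecutive grayscale frames.

def frame_diff(prev, curr, threshold=40):
    """Count pixels whose brightness changed by more than `threshold`
    between two grayscale frames of the same size."""
    changed = 0
    for row_prev, row_curr in zip(prev, curr):
        for a, b in zip(row_prev, row_curr):
            if abs(a - b) > threshold:
                changed += 1
    return changed

def motion_detected(prev, curr, min_changed_pixels=3):
    """Treat a frame as containing motion if enough pixels changed."""
    return frame_diff(prev, curr) >= min_changed_pixels

frame1 = [[100] * 4 for _ in range(4)]   # static background
frame2 = [row[:] for row in frame1]
for x in range(2):                       # a "foot" enters the lower-left corner
    frame2[2][x] = 220
    frame2[3][x] = 220

print(motion_detected(frame1, frame2))  # True: 4 pixels changed
```

A real game would run this per frame against the live video feed and additionally check whether the changed region overlaps the ball’s on-screen position to count a kick.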

You can see an example of Upsies!, developed by Blind Mice Studios, at http://tinyurl.com/4hpwa7l.

INTRODUCING VISUAL SEARCH

Whether visual search actually qualifies as AR is hotly debated. As previously mentioned, we’ll take the popular view and include it here as an example of AR, but feel free to decide for yourself. Personally, I have no problem considering visual search an AR technology. If QR codes are considered part of AR’s history, then it’s difficult to deny visual search the same courtesy.

Visual search applications work in a way that is similar to a natural-feature tracking application. In fact, each works the same as the other until the point at which the object/image is recognized. Natural-feature tracking applications are likely to overlay 3D; visual search applications perform a search based on what has been recognized.

Typical examples of how visual search is used include:

Shopping
Translating languages
Identifying objects

Shopping

There are numerous mobile applications that enable the user to shop for a product (such as a book, a DVD, or a game) just by taking a picture of it. This is useful if you happen to be in a store and see a product you want to buy. Rather than making a note of the product’s name, you can take a picture of the cover to obtain a price comparison from Amazon or Google advertisers. Once the photo has been taken and processed, it is sent to the server for comparison to a reference image. If the photo matches the reference image, the product is identified and the price comparison results are returned to the user.

SnapTell is a free application available for both the Android and iPhone and is worth installing to get a taste of how natural-feature tracking can be used for shopping. Try installing the application and taking pictures of books you have around the house.

Translating Languages

I have long thought that AR and translation go hand in hand. Imagine traveling to a foreign country and being able to get live translations of any text you point your smartphone at: everything from street names to menus and instructions, all at the point of a lens. Numerous applications have attempted this, but the process of recognizing the text has been a painful one. The user first needed to take a picture, then highlight the text in the picture, and finally submit it to a server for processing.

Word Lens from Quest Visual is the first real natural-feature tracking AR application I have seen. The application translates the text and even overwrites the original language. Figure 2-2 shows the original English text on the left; when the language is converted to Spanish (shown on the right), it is displayed in place of the original language.

FIGURE 2-2: An example of a Word Lens AR translation

Applications like Word Lens rely on an optical character recognition (OCR) engine to convert the text found in the camera view so it can be submitted to a translation engine and processed. Applications that merely use OCR are not considered AR applications; what sets Word Lens apart is that the object containing the text is tracked and the results are overlaid.

Identifying Objects

One of my favorite apps to demo at an event is Google Goggles, which is available for both Android and the iPhone. Like the shopping example that searches the image for a product match, Goggles identifies wine labels, works of art, and even architecture.

Imagine being on vacation and taking a photo of a famous building. Goggles will identify it for you! Images are processed in a manner similar to other examples in this chapter, only with Goggles, the images are compared to the millions of images that Google holds.

Goggles is a must-try application for anyone interested in AR. Install it from your phone’s app store by searching for Google Goggles on Android or Google Mobile App on the iPhone.

Once you have installed the application, you can point the camera at different classes of objects to perform a visual search. The application even distinguishes between objects and images. Take a photo of the image shown in Figure 2-3 and use Google Goggles to identify the object for you.

FIGURE 2-3: Using Google Goggles to identify a famous bridge in London

MARKETING AR-ENABLED APPS

In Japan and the Far East, users are already accustomed to QR codes and recognize that they have special meaning. In Europe and the Americas, QR codes are generally known only by those who are technically savvy. Markerless applications present their own challenges. Imagine if you designed an interactive CD cover that displayed a video of one of the songs from the album when viewed through a smartphone. Or imagine an interactive movie poster that played a preview of the movie. How will you tell users that the image has special AR functionality?

Total Immersion, one of the giants in desktop and web AR, is trying to educate the world about AR-enabled apps by providing a unique AR+ icon (see Figure 2-4) that images and products can display to denote that they are AR enabled. Although the effort is still in its infancy, Total Immersion has been working with third parties (including clients, partners, and journalists) to promote the adoption of the AR+ icon.

FIGURE 2-4: The AR+ icon

Content providers and developers can customize the icon to indicate what special content is included. For more information about the program, visit http://arplus.t-immersion.com/

This move is intended to accomplish several objectives simultaneously: educate consumers, ease application development for AR providers and their partners, allow for seamless integration with existing solutions, control quality levels across the board, and promote industry growth.

Total Immersion is sensing a real demand in the market for this kind of clarity backed by an organized, collective effort. AR is moving well beyond digital marketing. Coming up fast are AR solutions in e-commerce/retail, experiential education and entertainment, medicine and science, embedded AR in durable consumer products, and public safety and transportation, among others.

SUMMARY

In this chapter, you have learned about the various types of natural-feature tracking and visual search applications that are available for the Android and iPhone. You have learned that AR can be used to simplify users’ lives, whether it’s by providing helpful product information or translating foreign languages.

Hopefully you’ve installed and experimented with some of the standalone applications that are available for your mobile device. If not, you should. As you learn to build AR applications for browsers later in this book, it’s helpful to know what applications are available and what other developers are building.

Chapter 3

Introduction to AR Browsers

WHAT’S IN THIS CHAPTER?

An overview of the clients you will be developing content for
Examples of AR content created by other developers
A first look at the functionality provided by AR browsers
Why browsers are not accurate

Over the course of this book, we will focus on building content for the three leading AR browser platforms. Taking this approach presents some benefits and, of course, some inevitable drawbacks. Before you begin, you’ll need an iPhone or an Android phone and the latest versions of junaio, Layar, and Wikitude.

Benefits of using the three leading AR browser platforms include:

It’s easier to build content, with little or no programming knowledge required. In many cases, you can access a database or provide XML to reference your content.
Your content appears to an audience already interested in AR and actively seeking your content.
It’s in the platform providers’ interests to actively promote their applications to users, so that means less marketing on your part.
You can take advantage of richer features as the platform provider makes the functionality available.
You don’t have to worry about camera input, screen layout, or GPS APIs.
The same content will work on Android, iPhone, and Symbian/Windows Phone/bada OS/MeeGo when the clients are released.
It’s becoming increasingly common for AR browsers to be preinstalled on mobile devices. This provides even more discoverability opportunities for your content.

Drawbacks include:

Users must launch another application before finding your content.
Your content can lie undiscovered in the platform provider’s client.
You don’t have full control over the user interface.

Mobile AR is still very much in its infancy, so unless you have a specialist need and want to provide functionality beyond some of the benefits highlighted previously, your best option — for now, at least — is to build content for one of the leading platforms. These companies are investing millions of dollars into making their platforms a success; they have a marketing budget we can only dream of. It’s better to ride their wave than try to fight the tide ourselves.

Another reason not to go it alone is the fact that on the Android and the iPhone, there are almost 100 AR browsers all fighting for user attention. Many are simple; they offer nothing more than the ability to view POIs in the camera window. A handful offer developers the ability to build their own content, but only three are known all over the world and used by millions of users. With that in mind, we will focus on building content for Wikitude, Layar, and junaio.

Each browser offers unique functionality. So before we get down and dirty, we’ll take a look at each browser in detail to learn more about each platform and what kind of content developers are building.

AR BROWSER BASICS

While I said in Chapter 1 that you need a camera and a screen to experience AR, it may surprise you to know that the minimum requirement for an AR browser is simply a smartphone that has a GPS. Why no camera? Well, it is less common, but some developers have taken the approach of building applications that provide an AR experience by augmenting your hearing rather than using a visual method. Toozla is an application from a Russian company that detects the user’s location via GPS. When users come into range of a POI, an audio file begins to play, giving users an overview of their current surroundings. If users remain at the POI to listen, the application provides more detailed commentary. Like most AR applications, Toozla uses the Wikipedia API to provide some of its content, but Toozla also uses professionally recorded material to give users a richer tour guide experience. We are not going to cover building content for Toozla, but they do have an API if you are interested in producing audio content. You will find them at www.toozla.com.

AR browsers work by using GPS to detect the user’s current location while the compass detects the direction the device is facing. With the user’s location and direction known, nearby POIs can be displayed in the camera window. As the user moves the mobile device around, the accelerometer detects the elevation while the compass continues to detect the direction the user is facing. This combination enables the application to use the map data to build the AR view. In Figure 3-1, you can see POIs from the Qype world for the Wikitude browser shown on a map, with the map indicating their actual locations. In Figure 3-2, the same POIs are drawn in the AR browser window, providing a visual indication of their locations.

FIGURE 3-1: POIs shown on a map

FIGURE 3-2: POIs shown in the AR browser
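The placement calculation described above can be sketched in a few lines: compute the compass bearing from the user’s GPS position to each POI, subtract the device’s heading, and draw anything that falls within the camera’s horizontal field of view. This is a simplified, assumed illustration (a real browser also handles distance, elevation, and sensor smoothing); the 60-degree field of view and function names are invented for the example.

```python
import math

def bearing_to(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing in degrees from point 1 to point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(y, x)) % 360

def in_camera_view(user_lat, user_lon, poi_lat, poi_lon,
                   heading, fov_degrees=60):
    """True if the POI falls within the camera's horizontal field of view."""
    bearing = bearing_to(user_lat, user_lon, poi_lat, poi_lon)
    offset = (bearing - heading + 180) % 360 - 180  # signed angle, -180..180
    return abs(offset) <= fov_degrees / 2

# User in central London facing due east (heading 90 degrees);
# a POI directly to the east should appear in the camera window.
print(in_camera_view(51.5074, -0.1278, 51.5074, -0.1000, heading=90.0))  # True
```

The signed offset is also what decides where on the screen the info bubble is drawn: an offset of zero lands in the center of the camera window, and offsets toward the edges of the field of view land toward the edges of the screen.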

Many of today’s mobile devices are equipped with accelerometers and magnetometers, both of which are put to good use in providing an AR experience. The accelerometer is used to sense the orientation of a device (for example, when the screen view is rotated from landscape to portrait) as well as to detect acceleration, or how quickly a user moves the device. The accelerometer is also used to detect the angle of the device, enabling an application to know whether the camera is being pointed at the sky or at the ground. In addition, the magnetometer (or digital compass, if you prefer) can track changes in the Earth’s magnetic field and determine with considerable accuracy which direction the user is facing: north, south, east, or west. Hidden Sky for the iPhone is an AR application that enables users to track the location of satellites, stars, and planets in the sky. The user simply points his iPhone at the sky; the application uses the accelerometer to determine the angle at which the device is being pointed and the magnetometer to determine the direction. GPS determines the user’s latitude and longitude, and the application then presents him with the relevant POIs.

The Growth of AR Browsers

Since the first AR browser appeared in 2008, the genre has grown to around 300 applications available for the iPhone alone. At last count, an average of 20 new iPhone applications exhibiting AR browser behavior were being added to the iPhone App Store each month. Many of these applications combine geo-location and the camera with freely available APIs from Wikipedia, Twitter, Flickr, Google Search, or an untold number of other sources to enable users to visualize the locations of POIs around them. There are applications that, for example, help you see who is tweeting around you, help you find your nearest train station, or help you find a place to eat.

Since this book is about building for the popular platforms rather than building applications from scratch, you can relax a little because a lot of the plumbing is taken care of. You won’t need to worry about the APIs necessary to detect the user’s location, what design is best for building a user interface, or even how to track the movement of the user’s device. In most cases, you need only supply the XML that contains the coordinates of the POI along with relevant icons and text. The difficult choice is determining which platform to build for.
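To give you a feel for what supplying that XML involves, the fragment below sketches a generic POI feed. The element names and structure here are hypothetical, invented purely for illustration; each platform (Wikitude, Layar, and junaio) defines its own schema, which later chapters cover in detail.

```xml
<!-- Hypothetical, generic POI feed; real platforms use their own schemas. -->
<pois>
  <poi id="1">
    <title>Tower Bridge</title>
    <description>Victorian bascule bridge over the Thames.</description>
    <latitude>51.5055</latitude>
    <longitude>-0.0754</longitude>
    <icon>http://example.com/icons/bridge.png</icon>
  </poi>
</pois>
```

Whatever the exact schema, the essentials are always the same: a latitude/longitude pair, a title and description for the information bar, and an icon for the info bubble.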

Anatomy of a Browser

Before you delve into the various platforms, however, you should learn what all the platforms have in common so you have a better understanding of what is available to you. All platforms offer similar functionality but they all name their elements differently. So for the time being, we will refer to the components of the browser as follows (see Figure 3-3):

FIGURE 3-3: Anatomy of a browser

Radar
Info bubble
Information bar
Range
Map
App Store

Radar

The radar provides a visual indication of the direction in which the POIs are located and essentially tells the user which way the device should be pointed.
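The radar view can be sketched as plotting each POI on a circle by its bearing relative to the device’s heading, with distance scaled to the radar’s radius. A hypothetical sketch, with invented function names and a 50-pixel radar:

```python
import math

def radar_position(bearing, heading, distance, max_range, radius=50):
    """Map a POI to (x, y) on a radar of the given pixel radius.
    (0, 0) is the radar's center; "up" is the direction the device faces."""
    angle = math.radians(bearing - heading)      # angle relative to heading
    r = min(distance / max_range, 1.0) * radius  # clamp far POIs to the edge
    x = r * math.sin(angle)                      # right of heading is +x
    y = r * math.cos(angle)                      # ahead of heading is +y
    return x, y

# A POI 500 m due north while the user faces north: straight ahead,
# halfway out on a radar covering 1,000 m.
x, y = radar_position(bearing=0, heading=0, distance=500, max_range=1000)
print(round(x, 1), round(y, 1))  # 0.0 25.0
```

Because the angle is computed relative to the heading, the dots sweep around the radar as the user turns, which is exactly the cue that tells the user which way to point the device.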

Info Bubbles

POIs are displayed as info bubbles: icons that visually indicate what they represent (for example, a knife and fork for food or a house for a structure).

Information Bar

Once a user selects an info bubble, a short description of that POI is displayed in the information bar. If the user selects the text in the information bar, they are taken to a longer description of the POI. Depending on the browser, this can be a web page, a file, or text.

Range