Speech technology has been around for some time now. However, it has only recently captured the imagination of the general public, with the advent of personal assistants on mobile devices that you can talk to in your own language. Voice apps have huge potential as a novel and natural way to use mobile devices.
Voice Application Development for Android is a practical, hands-on guide that provides you with a series of clear, step-by-step examples which will help you to build on the basic technologies and create more advanced and more engaging applications. With this book, you will learn how to create useful voice apps that you can deploy on your own Android device in no time at all.
This book introduces you to the technologies behind voice application development in a clear and intuitive way. You will learn how to use open source software to develop apps that talk and that recognize your speech. Building on this, you will progress to developing more complex apps that can perform useful tasks, and you will learn how to develop a simple voice-based personal assistant that you can customize to suit your own needs.
For more interesting information about the book, visit http://lsi.ugr.es/zoraida/androidspeechbook
Copyright © 2013 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing and its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: November 2013
Production Reference: 2041213
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78328-529-7
www.packtpub.com
Cover Image by Aniket Sawant (<[email protected]>)
Authors
Michael F. McTear
Zoraida Callejas
Reviewers
Deborah A. Dahl
Greg Milette
Acquisition Editor
Rebecca Youe
Commissioning Editor
Amit Ghodake
Technical Editors
Aparna Chand
Nadeem N. Bagban
Project Coordinator
Michelle Quadros
Proofreader
Hardip Sidhu
Indexer
Mehreen Deshmukh
Graphics
Ronak Dhruv
Production Coordinator
Aparna Bhagat
Cover Work
Aparna Bhagat
There are many reasons why users need to speak and listen to mobile devices. We spend the first couple of years of our lives learning how to speak and listen to other people, so it is natural that we should be able to speak and listen to our mobile devices. As mobiles become smaller, the space available for physical keypads shrinks, making them more difficult to use. Wearable devices such as Google Glass and smart watches don't have physical keypads. Speaking and listening are becoming a major means of interaction with mobile devices.
Eventually, computers with microphones and speakers will be embedded into our home environment, eliminating the need for remote controls and handheld devices. Speaking and listening will become the major form of communication with home appliances such as TVs, environmental controls, home security, coffee makers, ovens, and refrigerators.
When we perform tasks that require the use of our eyes and hands, we need speech technologies. Speech is the only practical way of interacting with an Android device while driving a car or operating complex machinery. Holding and using a mobile device while driving is illegal in some places.
Siri and other intelligent agents enable mobile users to speak a search query. While these systems require sophisticated artificial intelligence and natural language techniques that are complex and time-consuming to implement, they demonstrate how speech technologies enable users to search for information.
Guides for "self-help" tasks requiring both hands and eyes present big opportunities for Android applications. Soon we will have electronic guides that speak and listen to help us assemble, troubleshoot, repair, fine-tune, and use equipment of all kinds. What's causing the strange sound in my car's engine? Why won't my television turn on? How do I adjust the air conditioner to cool the house? How do I fix a paper jam in my printer? Printed instructions, user guides, and manuals may be difficult to locate and difficult to read while your eyes are examining and your hands are manipulating the equipment.
Let a speech-enabled application talk you through the process, step by step. Such self-help applications could replace the user documentation for almost any product.
Rather than hunting for the appropriate paperwork, just download the latest instructions simply by scanning the QR code on the product. After completing a step, simply say "next" to listen to the next instruction or "repeat" to hear the current instruction again. The self-help application can also display device schematics, illustrations, and even animations and video clips illustrating how to perform a task.
Voice messages and sounds are two of the best ways to catch a person's attention. Important alerts, notifications, and messages should be presented to the user vocally, in addition to displaying them on a screen where the user might not notice them.
These are a few of the many reasons to develop applications that speak and listen to users. This book will introduce you to building speech applications. Its examples, at different levels of complexity, are a good starting point for experimenting with this technology. For more ideas of interesting applications to implement, see the Afterword at the end of the book.
James A. Larson
Vice President and Founder of Larson Technical Services
Michael F. McTear is Emeritus Professor of Knowledge Engineering at the University of Ulster with a special research interest in spoken language technologies. He graduated in German Language and Literature from Queen's University Belfast in 1965, was awarded an MA in Linguistics by the University of Essex in 1975, and a PhD by the University of Ulster in 1981. He has been Visiting Professor at the University of Hawaii (1986-87), the University of Koblenz, Germany (1994-95), and the University of Granada, Spain (2006-2010). He has been researching in the field of spoken dialogue systems for more than 15 years and is the author of the widely used textbook Spoken Dialogue Technology: Toward the Conversational User Interface (Springer Verlag, 2004). He is also a co-author of the book Spoken Dialogue Systems (Morgan and Claypool, 2010).
Michael has delivered keynote addresses at many conferences and workshops, including the EU-funded DUMAS Workshop, Geneva, 2004, the SIGDial workshop, Lisbon, 2005, and the Spanish Conference on Natural Language Processing (SEPLN), Granada, 2005, and has delivered invited tutorials at the IEEE/ACL Conference on Spoken Language Technologies, Aruba, 2006, and at ACL 2007, Prague. He has presented on several occasions at SpeechTEK, a conference for speech technology professionals, in New York and London. He is a certified VoiceXML developer and has taught VoiceXML in training courses for professionals from companies including Genesys, Oracle, Orange, 3, Fujitsu, and Santander. He was the main developer of the VoiceXML-based home monitoring system for patients with type-2 diabetes, currently in use at the Ulster Hospital, Northern Ireland.
Zoraida Callejas is Assistant Professor at the University of Granada, Spain, where she has been teaching several subjects related to Oral and Multimodal Interfaces, Object-Oriented Programming, and Software Engineering for the last eight years. She graduated in Computer Science in 2005 and was awarded a PhD in 2008 by the University of Granada. She has been Visiting Professor at the Technical University of Liberec, Czech Republic (2007-13), the University of Trento, Italy (2008), the University of Ulster, Northern Ireland (2009), the Technical University of Berlin, Germany (2010), the University of Ulm, Germany (2012), and Telecom ParisTech, France (2013).
Zoraida focuses her research on speech technology and, in particular, on spoken and multimodal dialogue systems. Zoraida has made presentations at the main conferences in the area of dialogue systems, and has published her research in several international journals and books. She has also coordinated training courses in the development of interactive speech processing systems, and has regularly taught object-oriented software development in Java in different graduate courses for nine years. Currently, she leads a local project for the development of Android speech applications for intellectually disabled users.
We would like to acknowledge the advice and help provided by Amit Ghodake, our Commissioning Editor at Packt Publishing, as well as the support of Michelle Quadros, our Project Coordinator, who ensured that we kept to schedule. A special thanks to our technical reviewers, Deborah A. Dahl and Greg Milette, whose comments and careful reading of the first draft of the book enabled us to make numerous changes in the final version that have greatly improved the quality of the book.
Finally, we would like to acknowledge our partners Sandra McTear and David Griol for putting up with our absences while we devoted so much of our time to writing, and for sharing the stress of our tight schedule.
Dr. Deborah A. Dahl has been working in the areas of speech and natural language processing technologies for over 30 years. She received a Ph.D. in linguistics from the University of Minnesota in 1983, followed by a post-doctoral fellowship in Cognitive Science at the University of Pennsylvania. At Unisys Corporation, she performed research on natural language understanding and spoken dialog systems, and led teams which used these technologies in government and commercial applications. Dr. Dahl founded her company, Conversational Technologies, in 2002. Conversational Technologies provides expertise in the state of the art of speech, natural language, and multimodal technologies through reports, analysis, training, and design services that enable its clients to apply these technologies in creating compelling mobile, desktop, and cloud solutions. Dr. Dahl has published over 50 technical papers, and is the editor of the book Practical Spoken Dialog Systems. She is also a frequent speaker at speech industry conferences. In addition to her technical work, Dr. Dahl is active in the World Wide Web Consortium, working on standards development for speech and multimodal interaction as chair of the Multimodal Interaction Working Group. She received the 2012 Speech Luminary Award from Speech Technology Magazine. This is an annual award honoring individuals who push the boundaries of the speech technology industry, and, in doing so, influence others in a significant way.
Greg Milette is a programmer, author, entrepreneur, musician, and father of two who loves implementing great ideas. He has been developing Android apps since 2009 when he released a voice controlled recipe app called Digital Recipe Sidekick. In between yapping to his Android device in the kitchen, Greg co-authored a comprehensive book on sensors and speech recognition called Professional Android Sensor Programming, published by Wiley in 2012 and founded a mobile app consulting company called Gradison Technologies, Inc. He acknowledges the contributions to his work from the Android community, and his family who tirelessly review and test his material and constantly refresh his office with happiness.
You might want to visit www.PacktPub.com for support files and downloads related to your book.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
http://PacktLib.PacktPub.com
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read and search across Packt's entire library of books.
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.
The idea of being able to talk with a computer has fascinated many people for a long time. However, until recently, this has seemed to be the stuff of science fiction. Now things have changed so that people who own a smartphone or tablet can perform many tasks on their device using voice—you can send a text message, update your calendar, set an alarm, and ask the sorts of queries that you would previously have typed into your search box. Often voice input is more convenient, especially on small devices where physical limitations make typing and tapping more difficult.
This book provides a practical guide to the development of voice apps for Android devices, using the Google Speech APIs for text-to-speech (TTS) and automated speech recognition (ASR) as well as other open source software. Although there are many books that cover Android programming in general, there is no single source that deals comprehensively with the development of voice-based applications for Android.
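To give a flavor of the two Google Speech APIs just mentioned, here is a minimal sketch (not code from the book's chapters) of an activity that speaks a prompt with the Android TextToSpeech engine and launches the stock speech recognizer through a RecognizerIntent. The Android classes, constants, and method calls are the standard platform APIs; the activity name SketchActivity and the request code are placeholders introduced only for this example.

import java.util.ArrayList;
import java.util.Locale;

import android.app.Activity;
import android.content.Intent;
import android.os.Bundle;
import android.speech.RecognizerIntent;
import android.speech.tts.TextToSpeech;

public class SketchActivity extends Activity implements TextToSpeech.OnInitListener {

    private static final int ASR_REQUEST = 1;   // arbitrary request code
    private TextToSpeech tts;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        tts = new TextToSpeech(this, this);     // engine becomes usable in onInit()
    }

    @Override
    public void onInit(int status) {
        if (status == TextToSpeech.SUCCESS) {
            tts.setLanguage(Locale.US);
            tts.speak("Say a command after the beep", TextToSpeech.QUEUE_FLUSH, null);
        }
    }

    // In a real app this would typically be wired to a button's onClick handler
    private void listen() {
        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        intent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 3);
        startActivityForResult(intent, ASR_REQUEST);
    }

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        if (requestCode == ASR_REQUEST && resultCode == RESULT_OK) {
            ArrayList<String> results =
                    data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
            // results.get(0) holds the recognizer's best hypothesis
        }
        super.onActivityResult(requestCode, resultCode, data);
    }

    @Override
    protected void onDestroy() {
        if (tts != null) {
            tts.shutdown();                     // release the TTS engine
        }
        super.onDestroy();
    }
}

The chapters that follow build this pattern up step by step, so treat the sketch only as an indication of the style of API the book works with.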
Developing for a voice user interface shares many of the characteristics of developing for more traditional interfaces, but there are also ways in which voice application development has its own specific requirements and it is important that developers coming to this area are aware of common pitfalls and difficulties. This book provides some introductory material to cover those aspects that may not be familiar to professionals from a mainstream computing background. It then goes on to show in detail how to put together complete apps, beginning with simple programs and progressing to more sophisticated applications. By building on the examples in the book and experimenting with the techniques described, you will be able to bring the power of voice to your Android apps, making them smarter and more intuitive, and boosting your users' mobile experience.
Chapter 1, Speech on Android Devices, discusses how speech can be used on Android devices and outlines the technologies involved.
Chapter 2, Text-to-Speech Synthesis, covers the technology of text-to-speech synthesis and how to use the Google TTS engine.
Chapter 3, Speech Recognition, provides an overview of the technology of speech recognition and how to use the Google Speech to Text engine.
Chapter 4, Simple Voice Interactions, shows how to build simple interactions in which the user and app can talk to each other to retrieve some information or perform an action.
Chapter 5, Form-filling Dialogs, illustrates how to create voice-enabled dialogs that are similar to form-filling in a traditional web application.
Chapter 6, Grammars for Dialog, introduces the use of grammars to interpret inputs from the user that go beyond single words and phrases.
Chapter 7, Multilingual and Multimodal Dialogs, looks at how to build apps that use different languages and modalities.
Chapter 8, Dialogs with Virtual Personal Assistants, shows how to build a speech-enabled personal assistant.
Chapter 9, Taking it Further, shows how to develop a more advanced Virtual Personal Assistant.
To run the code examples and develop your own apps, you will need to install the Android SDK and platform tools. A complete bundle that includes the essential Android SDK components and a version of the Eclipse IDE with built-in ADT (Android Developer Tools), along with tutorials, is available for download at http://developer.android.com/sdk/.
You will also need an Android device to test the examples, as Android ASR (speech recognition) does not work on virtual devices (emulators).
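Because recognition is only available when the device provides a speech recognition service, a common precaution is to check for a recognizer before enabling voice input. The following is a minimal sketch under that assumption; the helper class SpeechChecker is a hypothetical name introduced here, while PackageManager, Intent, and RecognizerIntent are the standard Android APIs.

import java.util.List;

import android.content.Context;
import android.content.Intent;
import android.content.pm.PackageManager;
import android.content.pm.ResolveInfo;
import android.speech.RecognizerIntent;

public final class SpeechChecker {

    // Returns true if at least one activity on the device can handle speech recognition
    public static boolean isAsrAvailable(Context context) {
        PackageManager pm = context.getPackageManager();
        List<ResolveInfo> activities = pm.queryIntentActivities(
                new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH), 0);
        return !activities.isEmpty();
    }
}

A check like this lets an app disable or hide its microphone button gracefully on devices (or emulators) without speech services.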
This book is intended for all those who are interested in speech application development, including students of speech technology and mobile computing. We assume some background of programming in general, particularly in Java. We also assume some familiarity with Android programming.
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.
To send us general feedback, simply send an e-mail to <[email protected]>, and mention the book title via the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
There is a web page for the book at http://lsi.ugr.es/zoraida/androidspeechbook, with additional resources, including ideas for exercises and projects, suggestions for further reading, and links to useful web pages.
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.
Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated material.
