Speech technology has been around for some time now. However, it has only recently captured the imagination of the general public, with the advent of personal assistants on mobile devices that you can talk to in your own language. Voice apps have huge potential as a novel and natural way to use mobile devices.
Voice Application Development for Android is a practical, hands-on guide that provides you with a series of clear, step-by-step examples which will help you to build on the basic technologies and create more advanced and more engaging applications. With this book, you will learn how to create useful voice apps that you can deploy on your own Android device in no time at all.
This book introduces you to the technologies behind voice application development in a clear and intuitive way. You will learn how to use open source software to develop apps that talk and that recognize your speech. Building on this, you will progress to developing more complex apps that can perform useful tasks, and you will learn how to develop a simple voice-based personal assistant that you can customize to suit your own needs.
For more interesting information about the book, visit http://lsi.ugr.es/zoraida/androidspeechbook
Copyright © 2013 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing and its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: November 2013
Production Reference: 2041213
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78328-529-7
www.packtpub.com
Cover Image by Aniket Sawant (<[email protected]>)
Authors
Michael F. McTear
Zoraida Callejas
Reviewers
Deborah A. Dahl
Greg Milette
Acquisition Editor
Rebecca Youe
Commissioning Editor
Amit Ghodake
Technical Editors
Aparna Chand
Nadeem N. Bagban
Project Coordinator
Michelle Quadros
Proofreader
Hardip Sidhu
Indexer
Mehreen Deshmukh
Graphics
Ronak Dhruv
Production Coordinator
Aparna Bhagat
Cover Work
Aparna Bhagat
There are many reasons why users need to speak and listen to mobile devices. We spend the first couple of years of our lives learning how to speak and listen to other people, so it is natural that we should be able to speak and listen to our mobile devices. As mobiles become smaller, the space available for physical keypads shrinks, making them more difficult to use. Wearable devices such as Google Glass and smart watches don't have physical keypads. Speaking and listening are becoming a major means of interaction with mobile devices.
Eventually, computers with microphones and speakers will be embedded into our home environment, eliminating the need for remote controls and handheld devices. Speaking and listening will become the major form of communication with home appliances such as TVs, environmental controls, home security, coffee makers, ovens, and refrigerators.
When we perform tasks that require the use of our eyes and hands, we need speech technologies. Speech is the only practical way of interacting with an Android device while driving a car or operating complex machinery. Holding and using a mobile device while driving is illegal in some places.
Siri and other intelligent agents enable mobile users to speak a search query. While these systems require sophisticated artificial intelligence and natural language techniques that are complex and time-consuming to implement, they demonstrate how speech technologies enable users to search for information.
Guides for "self-help" tasks requiring both hands and eyes present big opportunities for Android applications. Soon we will have electronic guides that speak and listen to help us assemble, troubleshoot, repair, fine-tune, and use equipment of all kinds. What's causing the strange sound in my car's engine? Why won't my television turn on? How do I adjust the air conditioner to cool the house? How do I fix a paper jam in my printer? Printed instructions, user guides, and manuals may be difficult to locate and difficult to read while your eyes are examining and your hands are manipulating the equipment.
Let a speech-enabled application talk you through the process, step by step. Such self-help applications could replace the user documentation for almost any product.
Rather than hunting for the appropriate paperwork, just download the latest instructions simply by scanning the QR code on the product. After completing a step, simply say "next" to listen to the next instruction or "repeat" to hear the current instruction again. The self-help application can also display device schematics, illustrations, and even animations and video clips illustrating how to perform a task.
Voice messages and sounds are two of the best ways to catch a person's attention. Important alerts, notifications, and messages should be presented to the user vocally, in addition to displaying them on a screen where the user might not notice them.
These are a few of the many reasons to develop applications that speak and listen to users. This book will introduce you to building speech applications. Its examples, at different levels of complexity, are a good starting point for experimenting with this technology. For more ideas of interesting applications to implement, see the Afterword at the end of the book.
James A. Larson
Vice President and Founder of Larson Technical Services
Michael F. McTear is Emeritus Professor of Knowledge Engineering at the University of Ulster with a special research interest in spoken language technologies. He graduated in German Language and Literature from Queen's University Belfast in 1965, was awarded an MA in Linguistics by the University of Essex in 1975, and a PhD by the University of Ulster in 1981. He has been Visiting Professor at the University of Hawaii (1986-87), the University of Koblenz, Germany (1994-95), and the University of Granada, Spain (2006-2010). He has been researching in the field of spoken dialogue systems for more than 15 years and is the author of the widely used textbook Spoken Dialogue Technology: Toward the Conversational User Interface (Springer Verlag, 2004). He is also a co-author of the book Spoken Dialogue Systems (Morgan and Claypool, 2010).
Michael has delivered keynote addresses at many conferences and workshops, including the EU-funded DUMAS Workshop, Geneva, 2004, the SIGDial workshop, Lisbon, 2005, and the Spanish Conference on Natural Language Processing (SEPLN), Granada, 2005, and has delivered invited tutorials at the IEEE/ACL Conference on Spoken Language Technologies, Aruba, 2006, and at ACL 2007, Prague. He has presented on several occasions at SpeechTEK, a conference for speech technology professionals, in New York and London. He is a certified VoiceXML developer and has taught VoiceXML in training courses for professionals from companies including Genesys, Oracle, Orange, 3, Fujitsu, and Santander. He was the main developer of the VoiceXML-based home monitoring system for patients with type-2 diabetes, currently in use at the Ulster Hospital, Northern Ireland.
Zoraida Callejas is Assistant Professor at the University of Granada, Spain, where she has been teaching several subjects related to Oral and Multimodal Interfaces, Object-Oriented Programming, and Software Engineering for the last eight years. She graduated in Computer Science in 2005 and was awarded a PhD in 2008 by the University of Granada. She has been Visiting Professor at the Technical University of Liberec, Czech Republic (2007-13), the University of Trento, Italy (2008), the University of Ulster, Northern Ireland (2009), the Technical University of Berlin, Germany (2010), the University of Ulm, Germany (2012), and Telecom ParisTech, France (2013).
Zoraida focuses her research on speech technology and, in particular, on spoken and multimodal dialogue systems. Zoraida has made presentations at the main conferences in the area of dialogue systems, and has published her research in several international journals and books. She has also coordinated training courses in the development of interactive speech processing systems, and has regularly taught object-oriented software development in Java in different graduate courses for nine years. Currently, she leads a local project for the development of Android speech applications for intellectually disabled users.
We would like to acknowledge the advice and help provided by Amit Ghodake, our Commissioning Editor at Packt Publishing, as well as the support of Michelle Quadros, our Project Coordinator, who ensured that we kept to schedule. A special thanks to our technical reviewers, Deborah A. Dahl and Greg Milette, whose comments and careful reading of the first draft of the book enabled us to make numerous changes in the final version that have greatly improved the quality of the book.
Finally, we would like to acknowledge our partners Sandra McTear and David Griol for putting up with our absences while we devoted so much of our time to writing, and for sharing the stress of our tight schedule.
Dr. Deborah A. Dahl has been working in the areas of speech and natural language processing technologies for over 30 years. She received a Ph.D. in linguistics from the University of Minnesota in 1983, followed by a post-doctoral fellowship in Cognitive Science at the University of Pennsylvania. At Unisys Corporation, she performed research on natural language understanding and spoken dialog systems, and led teams which used these technologies in government and commercial applications. Dr. Dahl founded her company, Conversational Technologies, in 2002. Conversational Technologies provides expertise in the state of the art of speech, natural language, and multimodal technologies through reports, analysis, training, and design services that enable its clients to apply these technologies in creating compelling mobile, desktop, and cloud solutions. Dr. Dahl has published over 50 technical papers, and is the editor of the book Practical Spoken Dialog Systems. She is also a frequent speaker at speech industry conferences. In addition to her technical work, Dr. Dahl is active in the World Wide Web Consortium, working on standards development for speech and multimodal interaction as chair of the Multimodal Interaction Working Group. She received the 2012 Speech Luminary Award from Speech Technology Magazine. This is an annual award honoring individuals who push the boundaries of the speech technology industry, and, in doing so, influence others in a significant way.
Greg Milette is a programmer, author, entrepreneur, musician, and father of two who loves implementing great ideas. He has been developing Android apps since 2009 when he released a voice controlled recipe app called Digital Recipe Sidekick. In between yapping to his Android device in the kitchen, Greg co-authored a comprehensive book on sensors and speech recognition called Professional Android Sensor Programming, published by Wiley in 2012 and founded a mobile app consulting company called Gradison Technologies, Inc. He acknowledges the contributions to his work from the Android community, and his family who tirelessly review and test his material and constantly refresh his office with happiness.
You might want to visit www.PacktPub.com for support files and downloads related to your book.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
http://PacktLib.PacktPub.com
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read and search across Packt's entire library of books.
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.
The idea of being able to talk with a computer has fascinated many people for a long time. However, until recently, this has seemed to be the stuff of science fiction. Now things have changed so that people who own a smartphone or tablet can perform many tasks on their device using voice—you can send a text message, update your calendar, set an alarm, and ask the sorts of queries that you would previously have typed into your search box. Often voice input is more convenient, especially on small devices where physical limitations make typing and tapping more difficult.
This book provides a practical guide to the development of voice apps for Android devices, using the Google Speech APIs for text-to-speech (TTS) and automated speech recognition (ASR) as well as other open source software. Although there are many books that cover Android programming in general, there is no single source that deals comprehensively with the development of voice-based applications for Android.
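To give a flavor of the two Google Speech APIs just mentioned, here is a minimal sketch (not code from the book's chapters) of an activity that speaks a prompt with the Android TextToSpeech engine and launches the stock speech recognizer through a RecognizerIntent. The Android classes, constants, and method calls are the standard platform APIs; the activity name SketchActivity and the request code are placeholders introduced only for this example.

import java.util.ArrayList;
import java.util.Locale;

import android.app.Activity;
import android.content.Intent;
import android.os.Bundle;
import android.speech.RecognizerIntent;
import android.speech.tts.TextToSpeech;

public class SketchActivity extends Activity implements TextToSpeech.OnInitListener {

    private static final int ASR_REQUEST = 1;   // arbitrary request code
    private TextToSpeech tts;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        tts = new TextToSpeech(this, this);     // engine becomes usable in onInit()
    }

    @Override
    public void onInit(int status) {
        if (status == TextToSpeech.SUCCESS) {
            tts.setLanguage(Locale.US);
            tts.speak("Say a command after the beep", TextToSpeech.QUEUE_FLUSH, null);
        }
    }

    // In a real app this would typically be wired to a button's onClick handler
    private void listen() {
        Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
        intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        intent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 3);
        startActivityForResult(intent, ASR_REQUEST);
    }

    @Override
    protected void onActivityResult(int requestCode, int resultCode, Intent data) {
        if (requestCode == ASR_REQUEST && resultCode == RESULT_OK) {
            ArrayList<String> results =
                    data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
            // results.get(0) holds the recognizer's best hypothesis
        }
        super.onActivityResult(requestCode, resultCode, data);
    }

    @Override
    protected void onDestroy() {
        if (tts != null) {
            tts.shutdown();                     // release the TTS engine
        }
        super.onDestroy();
    }
}

The chapters that follow build this pattern up step by step, so treat the sketch only as an indication of the style of API the book works with.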
Developing for a voice user interface shares many of the characteristics of developing for more traditional interfaces, but there are also ways in which voice application development has its own specific requirements and it is important that developers coming to this area are aware of common pitfalls and difficulties. This book provides some introductory material to cover those aspects that may not be familiar to professionals from a mainstream computing background. It then goes on to show in detail how to put together complete apps, beginning with simple programs and progressing to more sophisticated applications. By building on the examples in the book and experimenting with the techniques described, you will be able to bring the power of voice to your Android apps, making them smarter and more intuitive, and boosting your users' mobile experience.
Chapter 1, Speech on Android Devices, discusses how speech can be used on Android devices and outlines the technologies involved.
Chapter 2, Text-to-Speech Synthesis, covers the technology of text-to-speech synthesis and how to use the Google TTS engine.
Chapter 3, Speech Recognition, provides an overview of the technology of speech recognition and how to use the Google Speech to Text engine.
Chapter 4, Simple Voice Interactions, shows how to build simple interactions in which the user and app can talk to each other to retrieve some information or perform an action.
Chapter 5, Form-filling Dialogs, illustrates how to create voice-enabled dialogs that are similar to form-filling in a traditional web application.
Chapter 6, Grammars for Dialog, introduces the use of grammars to interpret inputs from the user that go beyond single words and phrases.
Chapter 7, Multilingual and Multimodal Dialogs, looks at how to build apps that use different languages and modalities.
Chapter 8, Dialogs with Virtual Personal Assistants, shows how to build a speech-enabled personal assistant.
Chapter 9, Taking it Further, shows how to develop a more advanced Virtual Personal Assistant.
To run the code examples and develop your own apps, you will need to install the Android SDK and platform tools. A complete bundle that includes the essential Android SDK components and a version of the Eclipse IDE with built-in ADT (Android Developer Tools), along with tutorials, is available for download at http://developer.android.com/sdk/.
You will also need an Android device to test the examples, as Android ASR (speech recognition) does not work on virtual devices (emulators).
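Because recognition is only available when the device provides a speech recognition service, a common precaution is to check for a recognizer before enabling voice input. The following is a minimal sketch under that assumption; the helper class SpeechChecker is a hypothetical name introduced here, while PackageManager, Intent, and RecognizerIntent are the standard Android APIs.

import java.util.List;

import android.content.Context;
import android.content.Intent;
import android.content.pm.PackageManager;
import android.content.pm.ResolveInfo;
import android.speech.RecognizerIntent;

public final class SpeechChecker {

    // Returns true if at least one activity on the device can handle speech recognition
    public static boolean isAsrAvailable(Context context) {
        PackageManager pm = context.getPackageManager();
        List<ResolveInfo> activities = pm.queryIntentActivities(
                new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH), 0);
        return !activities.isEmpty();
    }
}

A check like this lets an app disable or hide its microphone button gracefully on devices (or emulators) without speech services.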
This book is intended for all those who are interested in speech application development, including students of speech technology and mobile computing. We assume some background of programming in general, particularly in Java. We also assume some familiarity with Android programming.
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.
To send us general feedback, simply send an e-mail to <[email protected]>, and mention the book title via the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
There is a web page for the book at http://lsi.ugr.es/zoraida/androidspeechbook, with additional resources, including ideas for exercises and projects, suggestions for further reading, and links to useful web pages.
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.
Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated material.
