Unlock the potential of your robots by enhancing their perception with cutting-edge artificial intelligence and machine learning techniques. From neural networks to computer vision, this second edition of the book equips you with the latest tools, new and expanded topics such as object recognition and creating artificial personality, and practical use cases to create truly smart robots.
Starting with robotics basics, robot architecture, control systems, and decision-making theory, this book presents systems-engineering methods to design problem-solving robots with single-board computers. You'll explore object recognition using YOLO and genetic algorithms to teach your robot to identify and pick up objects, leverage natural language processing to give your robot a voice, and master neural networks to classify and separate objects and navigate autonomously, before advancing to guiding your robot arms using reinforcement learning and genetic algorithms. The book also covers path planning and goal-oriented programming to prioritize your robot's tasks, showing you how to connect all software using Python and ROS 2 for a seamless experience.
By the end of this book, you'll have learned how to transform your robot into a helpful assistant with NLP and give it an artificial personality, ready to tackle real-world tasks and even crack jokes.
Page count: 551
Year of publication: 2024
Artificial Intelligence for Robotics
Build intelligent robots using ROS 2, Python, OpenCV, and AI/ML techniques for real-world tasks
Francis X. Govers III
Copyright © 2024 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Preet Ahuja
Publishing Product Manager: Surbhi Suman
Book Project Manager: Ashwini Gowda
Senior Editors: Shruti Menon and Adrija Mitra
Technical Editor: Rajat Sharma
Copy Editor: Safis Editing
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Production Designer: Vijay Kamble
DevRel Marketing Coordinator: Rohan Dobhal
Senior DevRel Marketing Coordinator: Linda Pearlson
First published: August 2018
Second edition: March 2024
Production reference: 1290224
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK.
ISBN 978-1-80512-959-2
www.packtpub.com
In loving memory of my grandmother, Sally (Mabel) Govers, who nurtured my love of science and learning from my earliest days.
Thanks to my long-time robotics mentor, Dr. Bob Finkelstein. Thanks, Bob, for all your faith and encouragement.
To my wife, Carol – thanks for your patience. To my children, Jessica and Corbin, their spouses, Peter and Amy, and my grandchildren, William, Oliver, Amelia, Henry, and Rowan. This book is dedicated to you all; thanks for the inspiration.
It is indeed a pleasure and an honor to prepare this foreword. Francis X. Govers III has four decades of experience in designing autonomous vehicles – self-driving cars, airships, unmanned aircraft, and robots. The content in this book reflects his expertise, knowledge, and authority on the subject of artificial intelligence (AI) for robotics. Francis has carefully crafted the chapters keeping in mind a reader who is eager to learn the practical aspects of the subject, taking them to a higher level where they feel they learned the topic to the extent that they can continue learning on their own. Francis showed his wizardry in teaching the art of AI by organizing the book into three learning modules and breaking down each module into three chapters. The final two chapters capture the summary and assessment of the knowledge gained.
This book is for professionals or hobbyists who have theoretical knowledge and are interested in applying it by learning how to program a robot to perform intelligent tasks autonomously. Each chapter in this book begins with a concept within the realm of AI, a programmable task, and a means to implement the task on a robot with illustration. While each chapter is sufficiently independent, they are designed to build from the previous ones with increasing depth and complexity. This is the magic thread that runs through the book consistently. Each chapter makes the learner accomplished, giving the satisfaction of learning. Learners can go through this book at their own comfortable pace, one chapter at a time, making progress gradually.
This takes the learners through the world of autonomy gradually. Francis guides the learners and makes them confident in each topic, yet curious to learn more. From setting up a robot for the first time to making the robot express emotions, Francis makes learning a fun activity! The subject of autonomy is complex, yet this book makes it easier to understand through illustrated examples and sample code segments. I am confident that this book will serve as a hands-on guide on intelligent robotics for college freshmen and professionals alike to fulfill the intent of the author.
Dr. Kamesh Namuduri
Professor, Department of Electrical Engineering
University of North Texas
Francis X. Govers III is an Associate Technical Fellow for Autonomy at Bell Textron, and chairman of the Textron Autonomy Council. He is the designer of over 30 unmanned vehicles and robots for land, sea, air, and space, including RAMSEE, the autonomous security guard robot. Francis helped lead the design of the International Space Station, the F-35 JSF Fighter, the US Army Future Combat Systems, and telemetry systems for NASCAR and IndyCar. He is an engineer, pilot, author, musician, artist, and maker. He received five outstanding achievement awards from NASA and recognition from Scientific American for World Changing Ideas. He has a Master of Science degree from Brandeis University and is a veteran of the US Air Force.
Thanks to my teammates at Bell Textron, especially Jason Hurst, Keith Stanney, Lee Anderson, Grant Bristow, Matt Holvey, Srushti Desai, Jeff Holcomb, and Rohn Olsen, and all the support from across the Textron organization. I also thank my academic partners, Dr. Kamesh Namuduri at the University of North Texas, and Dr. Kamesh Subbarao at the University of Texas at Arlington.
Jugesh Sundram is a robotics engineer working in the automotive and mobility sector on industrial problems that require computer vision-based AI solutions. He obtained his Master’s in mechanical engineering from Florida Institute of Technology, USA. He has worked across the globe, spanning the US, India, Singapore, and now Belgium. He focuses on researching, designing, developing, and applying simultaneous localization and mapping algorithms on mobile robots. He has contributed to research papers in various top robotics conferences and holds patents. A father of two, he enjoys his time off spending time with family, producing music, traveling, and exploring vegetarian cuisine across the world.
I attribute my joy in reviewing to the curiosity and knowledge imparted to me during my academic journey. For that, I thank all my teachers, professors, and colleagues who have mentored me along the way and led me to where I am today. I am grateful to my wonderful wife, kids, and parents, who gave meaning to the word “life” in work-life balance.
Karthikeyan Yuvaraj has over 10 years of experience in developing and deploying complex robotics and perception systems. His experience lies in developing software solutions for vision-based robotic manipulation. He has also worked with major organizations such as Alphabet, Honeywell, and DARPA. Currently, he is a senior camera software engineer at HP Inc., where he develops camera systems for state-of-the-art video conferencing technologies. He is also a fellow of the Institution of Engineering and Technology and a senior member of the Institute of Electrical and Electronics Engineers.
The first section of this book begins with the foundations of robotics and Artificial Intelligence (AI), covering what AI is and how it is used. Then we start to define our robot systems and talk about control. In the second chapter, we look at robot anatomy, the parts of a robot, and discuss autonomy principles and the Subsumption Architecture concept. You'll get a primer on the Robot Operating System (ROS) and our single-board supercomputer. Finally, we show a systematic process for robot design using systems engineering principles and storyboards.
This part has the following chapters:
Chapter 1, The Foundation of Robotics and Artificial Intelligence
Chapter 2, Setting Up Your Robot
Chapter 3, Conceptualizing the Practical Robot Design Process
In this book, I invite you to go on a journey with me to discover how to add Artificial Intelligence (AI) to a mobile robot. The basic difference between what I will call an AI robot and a more regular robot is the ability of the robot and its software to make decisions and to learn and adapt to its environment based on data from its sensors. To be a bit more specific, we are leaving the world of pre-coded robot design behind. Instead of programming all of the robot’s behaviors in advance, the robot, or more correctly, the robot software, will learn from examples we provide, or from interacting with the outside world. The robot software will not control its behavior as much as the data that we use to train the AI system will.
The AI robot will use its learning process to make predictions about the environment and how to achieve goals, and then use those predictions to create behavior. We will be trying out several forms of AI on our journey, including supervised and unsupervised learning, reinforcement learning, neural networks, and genetic algorithms. We will create a digital robot assistant that can talk and understand commands (and tell jokes), and we will create an Artificial Personality (AP) for our robot. We will learn how to teach our robot to navigate without a map, grasp objects by trial and error, and see in three dimensions.
In this chapter, we will cover the following key topics:
The basic principles of robotics and AI
What is AI and autonomy (and what is it not)?
Are recent developments in AI anything new?
What is a robot?
Introducing our sample problem
When do you need AI for your robot?
Introducing the robot and our development environment
The technical requirements for completing the tasks in this chapter are described in the Preface at the beginning of this book.
All of the code for this book is available in the GitHub repository at https://github.com/PacktPublishing/Artificial-Intelligence-for-Robotics-2e/.
AI applied to robotics development requires a different set of skills from you, the robot designer or developer. You may have made robots before. You probably have a quadcopter or a 3D printer (which is, in fact, a robot). The familiar world of Proportional-Integral-Derivative (PID) controllers, sensor loops, and state machines is augmented by Artificial Neural Networks (ANNs), expert systems, genetic algorithms, and searching path planners. We want a robot that does not just react to its environment as a reflex action but has goals and intent – and can learn and adapt to the environment and is taught or trained rather than programmed. Some of the problems we can solve this way would be difficult, intractable, or impossible otherwise.
What we are going to do in this book is introduce a problem – picking up toys in a playroom – that we will use as our example throughout the book as we learn a series of techniques for applying AI to our robot. It is important to understand that, in this book, the journey is far more important than the destination. At the end of the book, you should have gained some important skills with broad applicability, not just learned how to pick up toys.
What we are going to do is first provide some tools and background to match the infrastructure that was used to develop the examples in the book. This is both to provide an even playing field and to not assume any practical knowledge on your part. To execute some of the advanced neural networks that we are going to build, we will use the GPUs in the Jetson.
In the rest of this chapter, we will discuss some basics about robotics and AI, and then proceed to develop two important tools that we will use in all of the examples in the rest of the book. We will introduce the concept of soft real-time control, and then provide a framework, or model, for creating autonomy for our robot called the Observe-Orient-Decide-Act (OODA) loop.
What would be the definition of AI? In general, it means a machine that exhibits some characteristics of intelligence – thinking, reasoning, planning, learning, and adapting. It can also mean a software program that can simulate thinking or reasoning. Let’s try some examples: a robot that avoids obstacles by simple rules (if the obstacle is to the right, go left) is not AI. A program that learns, by example, to recognize a cat in a video is AI. A robot arm that is operated by a joystick does not use AI, but a robot arm that adapts to different objects in order to pick them up is an application of AI.
There are two defining characteristics of AI robots that you must be aware of. First of all, AI robots are primarily trained to perform tasks, by providing examples, rather than being programmed step by step. For example, we will teach the robot’s software to recognize toys – things we want it to pick up – by training a neural network with examples of what toys look like. We will provide a training set of pictures with the toys in the images. We will specifically annotate what parts of the images are toys, and the robot will learn from that. Then we will test the robot to see that it learned what we wanted it to, somewhat like a teacher would test a student. The second characteristic is emergent behavior, in which the robot exhibits evolving actions that were not explicitly programmed into it. We provide the robot with controlling software that is inherently non-linear and self-organizing. The robot may suddenly exhibit some bizarre or unusual reaction to an event or situation that might appear to be odd, quirky, or even emotional. I worked with a self-driving car that we swore had delicate sensibilities and moved very daintily, earning it the nickname Ferdinand, after the sensitive, flower-loving bull from a cartoon, which was strange in a nine-ton truck that appeared to like plants. These behaviors are just caused by interactions of the various software components and control algorithms and do not represent anything more than that.
One concept you will hear in AI circles is the Turing test. The Turing test was proposed by Alan Turing in 1950, in a paper entitled Computing Machinery and Intelligence. He postulated that a human interrogator would question a hidden, unseen AI system, along with another human. If the human posing the questions was unable to tell which respondent was the computer and which was the human, then that AI computer would pass the test. This test supposes that the AI would be fully capable of listening to a conversation, understanding the content, and giving the same sort of answers a person would. Current AI chatbots can easily pass the Turing test and you may have interacted several times this week with AI on the phone without realizing it.
One group from the Association for the Advancement of Artificial Intelligence (AAAI) proposed that a more suitable test for AI might be the assembly of flatpack furniture – using the supplied instructions. However, to date, no robot has passed this test.
Our objective in this book is not to pass the Turing test, but rather to take some novel approaches to solving problems using techniques in machine learning, planning, goal seeking, pattern recognition, grouping, and clustering. Many of these problems would be very difficult to solve any other way. AI software that could pass the Turing test would be an example of general AI, or a full, working intelligent artificial brain, and, just like you, general AI does not need to be specifically trained to solve any particular problem. To date, general AI has not been created, but what we do have is narrow AI or software that simulates thinking in a very narrow application, such as recognizing objects, or picking good stocks to buy.
Since we are not building general AI in this book, we are not going to be worried about our creations developing a mind of their own or getting out of control. That comes from the realm of science fiction and bad movies, rather than the reality of computers today. I am firmly of the mind that anyone preaching about the evils of AI or predicting that robots will take over the world has likely not seen the dismal state of AI research in terms of solving general problems or creating something resembling actual intelligence.
What has been is what will be, and what has been done is what will be done, and there is nothing new under the sun – Ecclesiastes 1:9
The modern practice of AI is not new. Most of these techniques were developed in the 1960s and 1970s and fell out of favor because the computing machinery of the day was insufficient for the complexity of software or the number of calculations required. They were only waiting for computers to become more powerful and for another very significant event – the invention of the internet. In previous decades, if you needed 10,000 digitized pictures of cats to compile a database to train a neural network, the task would be almost impossible – your only options were to take a lot of cat pictures yourself or scan images from books. Today, a Google search for cat pictures returns 126,000,000 results in 0.44 seconds. Finding cat pictures, or anything else, is just a search away, and you have your training set for your neural network – unless you need to train on a very specific set of objects that don’t happen to be on the internet, as we will see in this book, in which case we will once again be taking a lot of pictures with another modern aid not found in the sixties, a digital camera. The happy combination of very fast computers, cheap, plentiful storage, and access to almost unlimited data of every sort has produced a renaissance in AI.
Another modern development has occurred on the other end of the computer spectrum. While anyone can now have what we would have called a supercomputer back in 2000 on their desk at home, the development of the smartphone has driven a whole series of innovations that are just being felt in technology. Your wonder of a smartphone has accelerometers and gyroscopes made of tiny silicon chips called Micro-Electromechanical Systems (MEMS). It also has a high-resolution but very small digital camera and a multi-core computer processor that takes very little power to run. It also contains (probably) three radios – a Wi-Fi wireless network, a cellular phone, and a Bluetooth transceiver. As good as these parts are at making your iPhone fun to use, they have also found their way into parts available for robots. That is fun for us because what used to be only available for research labs and universities is now for sale to individual users. If you happen to have a university or research lab or work for a technology company with multi-million-dollar development budgets, you will also learn something from this book, and find tools and ideas that hopefully will inspire your robotics creations or power new products with exciting capabilities.
Now that you’re familiar with the concept of AI for robotics, let’s look at what a robot actually is.
The word robot entered the modern language from the play R.U.R. by the Czech author Karel Čapek, which was published back in 1920. Robota is a Czech word meaning forced labor or servitude. In the play, an industrialist learns how to build artificial people – not mechanical, metal men, but made of flesh and blood, and coming from a factory fully grown. The English translation of the name R.U.R. as Rossum’s Universal Robots introduced the word robot to the world.
For the purposes of this book, a robot is a machine that is capable of sensing and reacting to its environment, and that has some human- or animal-like function. We generally think of a robot as an automated, self-directing mobile machine that can interact with the environment. That is to say, a robot has a physical form and exhibits some form of autonomy, or the ability to make decisions for itself based on observation of the external environment.
Next, let’s discuss the problem we will be trying to solve in this book.
In the course of this book, we will be using a single problem set that I feel most people can relate to easily, while still representing a real challenge for the most seasoned roboticist. We will be using AI and robotics techniques to pick up toys in my house after my grandchildren have visited. That sound you just heard was the gasp from the professional robotics engineers and researchers in the audience – this is a tough problem. Why is this a tough problem, and why is it ideal for this book?
Let’s discuss the problem and break it down a bit. Later, in Chapter 3, we will do a full task analysis, learn how to write use cases, and create storyboards to develop our approach, but we can start here with some general observations.
Robotics designers first start with the environment – where does the robot work? We divide environments into two categories: structured and unstructured. A structured environment, like the playing field for a FIRST robotics competition (a contest for robots built by high school students in the US, where all of the playing field is known in advance), an assembly line, or a lab bench, has everything in an organized space. You might have heard the saying “A place for everything and everything in its place” – that is a structured environment. Another way to think about it is that we know in advance where everything is or is going to be. We know what color things are, where they are placed in space, and what shape they are. A name for this type of information is a priori knowledge – things we know in advance. Having advance knowledge of the environment in robotics is sometimes absolutely essential. Assembly line robots expect parts to arrive in an exact position and orientation to be grasped and placed into position. In other words, we have arranged the world to suit the robot.
In the world of my house, this is simply not an option. If I could get my grandchildren to put their toys in exactly the same spot each time, then we would not need a robot for this task. We have a set of objects that are fairly fixed – we only have so many toys for them to play with. We occasionally add things or lose toys, or something falls down the stairs, but the toys are elements of a set of fixed objects. What they are not is positioned or oriented in any particular manner – they are just where they were left when the kids finished playing with them and went home. We also have a fixed set of furniture, but some parts move – the footstool or chairs can be moved around. This is an unstructured environment, where the robot and the software have to adapt, not the toys or furniture.
The problem is to have the robot drive around the room and pick up toys. Here are some objectives for this task:
We want the user to interact with the robot by talking to it. We want the robot to understand what we want it to do, which is to say, what our intent is for the commands we are giving it.
Once commanded to start, the robot will have to identify an object as being a toy or not being a toy. We only want to pick up toys.
The robot must avoid hazards, the most important being the stairs going down to the first floor. Robots have a particular problem with negative obstacles (dropoffs, curbs, cliffs, stairs, etc.), and that is exactly what we have here.
Once the robot finds a toy, it has to determine how to pick the toy up with its robot arm. Can it grasp the object directly, or must it scoop the item up, or push it along? We expect that the robot will try different ways to pick up toys and may need several trial-and-error attempts.
Once the toy is picked up by the robot arm, the robot needs to carry the toy to a toy box. The robot must recognize the toy box in the room, remember where it is for repeat trips, and then position itself to place the toy in the box. Again, more than one attempt may be required.
After the toy is dropped off, the robot returns to patrolling the room looking for more toys. At some point, hopefully, all of the toys will be retrieved. It may have to ask us, the human, whether the room is acceptable, or whether it needs to continue cleaning.
What will we learn from this problem? We will be using this backdrop to examine a variety of AI techniques and tools. The purpose of the book is to teach you how to develop AI solutions with robots. It is the process and the approach that is the critical information here, not the problem and not the robot I developed for the book. We will be demonstrating techniques for making a moving machine that can learn and adapt to its environment.
I would expect that you will pick and choose which chapters to read and in which order, according to your interests and your needs, and as such, each of the chapters will be standalone lessons.
The first three chapters are foundation material that supports the rest of the book by setting up the problem and providing a firm framework to attach the rest of the material.
Not all of the chapters or topics in this book are considered classical AI approaches, but they do represent different ways of approaching machine learning and decision-making problems. We will be exploring together the following topics:
Control theory and timing: We will build a firm foundation for robot control by understanding control theory and timing. We will be using a soft real-time control scheme with what I call a frame-based control loop. This technique has a fancy technical name – rate monotonic scheduling – but I think you will find the concept intuitive and easy to understand.
OODA loop: At the most basic level, AI is a way for the robot to make decisions about its actions. We will introduce a model for decision-making that comes from the US Air Force, called the OODA loop. This describes how a robot (or a person) makes decisions. Our robot will have two of these loops, an inner loop or introspective loop, and an outward-looking environment sensor loop. The faster, inner loop takes priority over the slower, outer loop, just as the autonomic parts of your body (such as the heartbeat, breathing, and eating) take precedence over your task functions (such as going to work, paying bills, and mowing the yard). This makes our system a type of subsumption architecture, a biologically inspired control paradigm named by Rodney Brooks of MIT, one of the founders of iRobot and Rethink Robotics, and the designer of a robot named Baxter.
Figure 1.1 – My version of the OODA loop
Note
The OODA loop was invented by Col. John Boyd, a man also called The Father of the F-16. Col. Boyd’s ideas are still widely quoted today, and his OODA loop is used to describe robot AI, military planning, or marketing strategies with equal utility. OODA provides a model for how a thinking machine that interacts with its environment might work.
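As a rough illustration of how a frame-based control loop and the OODA steps might fit together, here is a minimal Python sketch. It is not the book's implementation: the 10 Hz frame rate, the function names, and the fake range reading are all assumptions made for the example. Each frame runs observe, orient, decide, and act once, then idles until the next frame boundary. Because this is soft real time, a late frame is counted rather than treated as a failure.

```python
import time

FRAME_RATE_HZ = 10                    # assumed rate, chosen only for illustration
FRAME_PERIOD = 1.0 / FRAME_RATE_HZ


def observe(state):
    # Placeholder: a real robot would read its sensors here.
    state["range_m"] = 1.5            # fake range reading in meters
    return state


def orient(state):
    # Placeholder: interpret the raw reading in context.
    state["obstacle_near"] = state["range_m"] < 0.5
    return state


def decide(state):
    # Placeholder: choose an action from the oriented picture.
    state["action"] = "stop" if state["obstacle_near"] else "forward"
    return state


def act(state):
    # Placeholder: a real robot would send a motor command here.
    return state


def run_frames(num_frames, period=FRAME_PERIOD):
    """Run the four OODA steps once per frame and count frame overruns."""
    state = {}
    overruns = 0
    next_deadline = time.monotonic() + period
    for _ in range(num_frames):
        for step in (observe, orient, decide, act):
            state = step(state)
        remaining = next_deadline - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)     # idle until the frame boundary
        else:
            overruns += 1             # soft real time: note it and carry on
        next_deadline += period       # fixed schedule, so timing does not drift
    return state, overruns


if __name__ == "__main__":
    final_state, late_frames = run_frames(5)
    print(final_state["action"], "late frames:", late_frames)
```

Advancing the deadline by a fixed period, instead of sleeping for a fixed time after each frame, keeps the loop on schedule even when individual frames take different amounts of time.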
Our robot works not by simply following commands or instructions step by step but by setting goals and then working to achieve those goals. The robot is free to set its own path or determine how to get to its goal. We will tell the robot “pick up that toy” and the robot will decide which toy, how to get in range, and how to pick up the toy. If we, the human robot owner, instead tried to treat the robot as a teleoperated hand, we would have to give the robot many individual instructions, such as “move forward”, “move right”, “extend arm”, and “open hand”, each individually, and without giving the robot any idea why we were making those motions. In a goal-oriented structure, the robot will be aware of which objects are toys and which are not and it will know how to find the toy box and how to put toys in the box. This is the difference between an autonomous robot and a radio-controlled teleoperated device.
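The contrast between a goal and a string of teleoperated commands can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the detection dictionaries, the function names `choose_toy` and `plan_pickup`, and the step strings are hypothetical, and a real robot would plan with far richer information. The point is that the human supplies one goal and the software chooses the target and the steps.

```python
def choose_toy(robot_xy, detections):
    """Pick the nearest detected object that is labeled as a toy."""
    toys = [d for d in detections if d["is_toy"]]
    if not toys:
        return None

    def squared_distance(d):
        dx = d["xy"][0] - robot_xy[0]
        dy = d["xy"][1] - robot_xy[1]
        return dx * dx + dy * dy

    return min(toys, key=squared_distance)


def plan_pickup(robot_xy, detections):
    """Expand the goal 'pick up that toy' into steps the robot selects itself."""
    toy = choose_toy(robot_xy, detections)
    if toy is None:
        return ["report: no toys found"]
    return [
        f"drive to {toy['name']}",
        f"grasp {toy['name']}",
        "drive to toy box",
        "release",
    ]


if __name__ == "__main__":
    # Hypothetical detections: positions in meters, with a toy / not-toy label.
    seen = [
        {"name": "block", "xy": (3.0, 0.0), "is_toy": True},
        {"name": "shoe", "xy": (1.0, 0.0), "is_toy": False},
        {"name": "car", "xy": (2.0, 0.0), "is_toy": True},
    ]
    for step in plan_pickup((0.0, 0.0), seen):
        print(step)
```

The shoe is closest to the robot but is ignored because it is not a toy, so the plan targets the car.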
Before designing the specifics of our robot and its software, we have to match its capabilities to the environment and the problem it must solve. The book will introduce some tools for designing the robot and managing the development of the software. We will use two tools from the discipline of systems engineering to accomplish this – use cases and storyboards. I will make this process as streamlined as possible. More advanced types of systems engineering are used by NASA, aerospace, and automobile companies to design rockets, cars, and aircraft – this gives you a taste of those types of structured processes.
The following sections will each detail step-by-step examples of applying AI techniques to a robotics problem:
We start with object recognition. We need our robot to recognize objects, and then classify them as either toys to be picked up or not toys to be left alone. We will use a trained ANN to recognize objects from a video camera from various angles and lighting conditions. We will be using the process of transfer learning to extend an existing object recognition system, YOLOv8, to recognize our toys quickly and reliably.
The next task, once a toy is identified, is to pick it up. Writing a general-purpose pick up anything program for a robot arm is a difficult task involving a lot of higher mathematics (use the internet to look up inverse kinematics to see what I mean). What if we let the robot sort this out for itself? We use genetic algorithms that permit the robot to invent its own behaviors and learn to use its arm on its own. Then we will employ deep reinforcement learning (DRL) to let the robot teach itself how to grasp various objects using an end effector (robot speak for a hand).
Our robot needs to understand commands and instructions from its owner (us). We use natural language processing (NLP) to not just recognize speech but to understand our intent for the robot to create goals consistent with what we want it to do. We use a neat technique that I call the fill in the blank method to allow the robot to reason from the context of a command. This process is useful for a lot of robot planning tasks.
The robot’s next problem is navigating rooms while avoiding the stairs and other hazards. We will use a combination of a unique, mapless navigation technique with 3D vision provided by a special stereo camera to see and avoid obstacles.
The robot will need to be able to find the toy box to put items away, as well as have a general framework for planning moves in the future. We will use decision trees for path planning, as well as discussing pruning or quickly rejecting bad plans. If you imagine what a computer chess program algorithm must do, looking several moves ahead and scoring good moves versus bad moves before selecting a strategy, that will give you an idea of the power of this technique. This type of decision tree has many uses and can handle many dimensions of strategies. We’ll be using it as one of two ways to find a path to our toy box to put toys away.
Our final task brings a different set of tools not normally used in robotics, or at least not the way we are going to employ them.
I have five wonderful, talented, and delightful grandchildren who love to come and visit. You’ll be hearing a lot about them throughout the book. The oldest grandson is 10 years old, and autistic, as is my granddaughter, the third child, who is 8, as well as the youngest boy, who is 6 as I write this. I introduced my eldest grandson, William, to the robot – and he immediately wanted to have a conversation with it. He asked, “What’s your name?” and “What do you do?” He was disappointed when the robot made no reply. So for the grandkids, we will be developing an engine for the robot to carry out a short conversation – we will be creating a robot personality to interact with children. William had one more request for this robot – he wants it to tell and respond to knock, knock jokes, so we will use that as a prototype of special dialog.
While developing a robot with actual feelings is far beyond the state of the art in robotics or AI today, we can simulate having a personality with a finite state machine and some Monte Carlo modeling. We will also give the robot a model for human interaction so that the robot will take into account the child’s mood as well. I like to call this type of software an artificial personality (AP) to distinguish it from our AI. AI builds a model of thinking, and an AP builds a model of emotion for our robot.
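To make the idea of a finite state machine driven by Monte Carlo draws a little more concrete, here is a minimal sketch. The state names, events, and probabilities are invented for illustration only; they are not the personality model developed later in the book.

```python
import random

# Hypothetical moods and transition odds, made up for this sketch.
# TRANSITIONS[state][event] is a list of (next_state, probability) pairs.
TRANSITIONS = {
    "calm": {
        "greeting": [("happy", 0.7), ("calm", 0.3)],
        "ignored": [("bored", 0.6), ("calm", 0.4)],
    },
    "happy": {
        "greeting": [("happy", 0.9), ("calm", 0.1)],
        "ignored": [("calm", 0.8), ("bored", 0.2)],
    },
    "bored": {
        "greeting": [("calm", 0.5), ("happy", 0.5)],
        "ignored": [("bored", 1.0)],
    },
}


def next_state(state, event, rng):
    """Pick the next mood with a weighted random draw (the Monte Carlo step)."""
    options = TRANSITIONS[state][event]
    states = [name for name, _ in options]
    weights = [weight for _, weight in options]
    return rng.choices(states, weights=weights, k=1)[0]


def run(events, start="calm", seed=None):
    """Feed a sequence of events through the state machine and record each mood."""
    rng = random.Random(seed)
    state = start
    history = [state]
    for event in events:
        state = next_state(state, event, rng)
        history.append(state)
    return history


print(run(["greeting", "greeting", "ignored", "ignored"], seed=1))
```

Because the next mood is drawn at random, two runs with the same events can differ, which is what keeps a simulated personality from feeling scripted. Passing a fixed seed makes a run repeatable for testing.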
Now that you’re familiar with the problem we will be addressing in this book, let’s briefly discuss when and why you might need AI for your robot.
We generally describe AI as a technique for modeling or simulating processes that emulate how our brains make decisions. Let’s discuss how AI can be used in robotics to provide capabilities that may be difficult for traditional programming techniques to achieve. One of those is identifying objects in images or pictures. If you connect a camera to a computer, the computer receives not an image, but an array of numbers that represent pixels (picture elements). If we are trying to determine whether a certain object, say a toy, is located in the image, then this can be quite tricky. You can find shapes, such as circles or squares, but a teddy bear? Moreover, what if the teddy bear is upside down, or lying flat on a surface? This is the sort of problem that an AI program can solve when nothing else can.
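As a small illustration of what "an array of numbers" means, the sketch below uses a made-up 5x5 grayscale image and a naive brightness threshold. Finding a bright blob like this is easy with ordinary code; recognizing a teddy bear from the same kind of numbers is not.

```python
# A tiny 5x5 grayscale "image": each number is one pixel's brightness (0-255).
# The values are invented for this example.
image = [
    [0, 0, 0, 0, 0],
    [0, 200, 220, 0, 0],
    [0, 210, 230, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]


def bright_bounding_box(pixels, threshold=128):
    """Return (top, left, bottom, right) of pixels brighter than threshold, or None."""
    hits = [
        (r, c)
        for r, row in enumerate(pixels)
        for c, value in enumerate(row)
        if value > threshold
    ]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return min(rows), min(cols), max(rows), max(cols)


print(bright_bounding_box(image))  # (1, 1, 2, 2)
```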
Our traditional approach for creating robot behaviors is to figure out what function we want and to write code to make that happen. When we have a simple function, such as driving around an obstacle, then this approach works well, and we can get results with a little tuning.
Some examples of AI and ML for robotics include:
NLP: Using AI/ML to allow the robot to understand and respond to natural human speech and commands. This makes interacting with the robot much more intuitive.
Computer vision: Using AI to let the robot see and recognize objects or people’s faces, read text, and so on. This helps the robot operate in real-world environments.
Motion planning: AI can help the robot plan optimal paths and motions to navigate around obstacles and people. This makes the robot’s movements more efficient and human-like.
Reinforcement learning: The robot can learn how to do, and improve at doing, tasks through trial and error using AI reinforcement learning algorithms. This means less explicit programming is needed.
The main rule of thumb is to use AI/ML whenever you want the robot to perform robustly in a complex, dynamic real-world environment. The AI gives it more perceptual and decision-making capabilities.
Now let’s look at one function we need for this robot – recognizing that an object is either a toy (and needs to be picked up) or is not. Creating a standard function for this via programming is quite difficult. Regular computer vision processes separate an image into shapes, colors, or areas. Our problem is the toys don’t have predictable shapes (circles, squares, or triangles), they don’t have consistent colors, and they are not all the same size. What we would rather do is to teach the robot what is a toy and what is not. That is what we would do with a person. We just need a process for teaching the robot how to use a camera to recognize a particular object. Fortunately, this is an area of AI that has been deeply studied, and there are already techniques to accomplish this, which we will use in Chapter 4. We will use a convolutional neural network (CNN) to recognize toys from camera images. This is a type of supervised learning, where we use examples to show the software what type of object we want to recognize, and then create a customized function that predicts the class (or type) of object based on the pixels that represent it in an image. One of the principles of AI that we will be applying is gradual learning using gradient descent. This means that instead of trying to make the computer learn a skill all in one go, we will train it a little bit at a time, gently training a function to output what we want by looking at errors (or loss) and making small changes. We use the principle of gradient descent – looking at the slope of the change in errors – to determine which way to adjust the training.
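The gradient descent idea can be shown on a much smaller problem than image recognition. The sketch below fits a straight line to five made-up data points by repeatedly measuring the error, computing its slope, and nudging two parameters a little in the downhill direction. It is a toy example under those assumptions, not the CNN training used later in the book.

```python
def train_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Slope of the mean squared error with respect to each parameter.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Take a small step downhill.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Made-up training examples that follow y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = train_line(xs, ys)
print(round(w, 3), round(b, 3))  # approaches w = 2, b = 1
```

The learning rate controls the size of each adjustment: too large and the updates overshoot, too small and training takes many more steps.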
You may be thinking at this point, “If that works for learning to classify pictures, then maybe it can be used to classify other things,” and you would be right. We’ll use a similar approach, with somewhat different neural networks, to teach the robot to answer to its name by recognizing the sound.
So, in general, when do we need to use AI in a robot? When we need to emulate some sort of decision-making process that would be difficult or impossible to create with procedural steps (i.e., programming). It’s easy to see that neural networks are emulations of animal thought processes since they are a (greatly) simplified model of how neurons interact. Other AI techniques can be more difficult to understand.
One common theme is that AI uses programming by example: it replaces hand-written code with a common framework, and variables with data. Instead of programming by process, we are programming by showing the software what result we want and having the software come up with how to get to that result. So for object recognition using pictures, we provide pictures of objects and the answer to what kind of object is represented by each picture. We repeat this over and over and train the software by modifying its parameters.
Another thing we can create with AI is behaviors. Many tasks can be thought of as games, and it is easy to imagine how this works. Let’s say you want your children to pick up the toys in their room. You could command them to do it, which may or may not work. Or, you could make it a game by awarding points for each toy picked up, and giving a reward (such as a dollar) based on the number of points scored. What did we add by doing this? We added a metric, or measurement tool, to let the children know how well they are doing: a point system. And, more critically, we added a reward for specific behaviors. We can use the same process to modify or create behaviors in a robot. This is formally called reinforcement learning. While we can’t give a robot an emotional reward (as robots don’t have wants or needs), we can program the robot to seek to maximize a reward function. Then we can use the same process of making a small adjustment in the parameters that change the reward, see whether that improves the score, and then either keep that change (when learning results in more reward, our reinforcement) or discard it if the score goes down. This type of process works well for robot motion, and for controlling robot arms.
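The adjust, score, and keep-or-discard loop described above can be sketched in a few lines. This is plain random hill climbing rather than a full reinforcement learning algorithm, and the reward function and target values are hypothetical stand-ins for something like "how close did the arm get to the toy."

```python
import random

# Hypothetical target joint settings, invented for this sketch.
TARGET = [0.5, -0.2, 0.8]


def reward(params):
    """Higher (closer to zero) when the parameters are nearer the target."""
    return -sum((p - t) ** 2 for p, t in zip(params, TARGET))


def hill_climb(params, steps=2000, step_size=0.05, seed=0):
    """Make small random changes and keep only those that raise the reward."""
    rng = random.Random(seed)
    best = list(params)
    best_score = reward(best)
    for _ in range(steps):
        candidate = [p + rng.uniform(-step_size, step_size) for p in best]
        score = reward(candidate)
        if score > best_score:
            # The change earned more reward, so keep it.
            best, best_score = candidate, score
        # Otherwise the candidate is discarded and we try again.
    return best, best_score


start = [0.0, 0.0, 0.0]
final_params, final_score = hill_climb(start)
print("reward before:", reward(start))
print("reward after: ", final_score)
```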
I must tell you that the task set out in this book – to pick up toys in an unstructured environment – is nearly impossible to perform without AI techniques. It could be done by modifying the environment – say, by putting RFID tags in the toys – but not otherwise. That, then, is the purpose of this book – to show how certain tasks, which are difficult or impossible to solve otherwise, can be completed using the combination of AI and robotics.
Next, let’s discuss our robot and the development environment that we’ll be using in this book.
This is a book about robots and AI, so we really need to have a robot to use for all of our practical examples. As we will discuss in Chapter 2 at some length, I have selected robot hardware and software that will be accessible to the average reader. The particular brand and type are not important, and I’ve upgraded Albert considerably since the first edition was published some five years ago. In the interest of keeping things up to date, we are putting all of the hardware details in the GitHub repository for this book.
As shown in the following photographs taken from two different perspectives, my robot has new omnidirectional wheels, a mechanical six-degree-of-freedom arm, and a computer brain:
Figure 1.2 – Albert the robot has wheels and a mechanical arm
I’ll call it Albert, since it needs some sort of name, and I like the reference to Prince Albert, consort of Queen Victoria, who was famous for taking marvelous care of their nine children. All nine of his children survived to adulthood, which was a rarity in the Victorian age, and he had 42 grandchildren. He went by his middle name; his actual first name was Francis.
Our tasks in this book center around picking up toys in an interior space, so our robot has a solid base with four motors and omni wheels for driving over carpet. Our steering method is the tank type, or differential drive, where we steer by sending different commands to the wheel motors. If we want to go straight ahead, we set all four motors to the same forward speed. If we want to travel backward, we reverse all four motors by the same amount. Turns are accomplished by moving one side forward and the other backward (which makes the robot turn in place) or by giving one side more forward drive than the other. We can make any sort of turn this way. The omni wheels allow us to do some other tricks as well: we can turn the wheels toward each other and translate directly sideways, and even turn in a circle while pointing at the same spot on the ground. We will mostly drive like a truck or car but will use the y-axis motion occasionally to line things up. Speaking of axes, I’ll use the x axis for motion straight ahead, the y axis for horizontal movement from side to side, and the z axis for up and down, which we need for the robot’s arm.
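The tank-style steering described above boils down to a small mixing function. The sketch below is one common way to write it; the function names, the -1.0 to 1.0 command range, and the convention that a positive turn value means turning right are assumptions made for this example, not the code from the book's repository.

```python
def tank_mix(forward, turn):
    """Convert forward and turn commands (-1.0 to 1.0) into left/right side speeds.

    Assumed convention: positive turn speeds up the left side and slows the
    right side, which turns the robot to the right.
    """
    left = forward + turn
    right = forward - turn
    # Scale both sides down together so neither exceeds full speed.
    scale = max(1.0, abs(left), abs(right))
    return left / scale, right / scale


def four_motor_commands(forward, turn):
    """Both motors on a side get the same command in differential drive."""
    left, right = tank_mix(forward, turn)
    return {
        "front_left": left,
        "rear_left": left,
        "front_right": right,
        "rear_right": right,
    }


print(four_motor_commands(1.0, 0.0))   # straight ahead: all four equal
print(four_motor_commands(-0.5, 0.0))  # backward: all four reversed equally
print(four_motor_commands(0.0, 1.0))   # turn in place: sides oppose each other
```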
In order to pick up toys, we need some sort of manipulator, so I’ve included a six-axis robot arm that imitates a shoulder–elbow–wrist–hand combination that is quite dexterous and, since it is made out of standard digital servos, quite easy to wire and program.
The main control of the Albert robot is the Nvidia Jetson Nano single-board computer (SBC), which talks to the operator via a USB Wi-Fi dongle. The Nano talks to an Arduino Mega 2560 microcontroller and motor controller that we will use to control the motors via Pulse Width Modulation (PWM) pulses. The following figure depicts the internal components of the robot:
Figure 1.3 – Block diagram of the robot
We will be primarily concerned with the Nvidia Nano SBC, which is the brains of our robot. We will set up the rest of the components once and not change them for the entire book.
The Nvidia Nano acts as the main interface between our control station, which is a PC running Windows, and the robot itself via a Wi-Fi network. Just about any low-power, Linux-based SBC can perform this job, such as a BeagleBone Black, Odroid XU4, or an Intel Edison. One of the advantages of the Nano is that it can use its Graphics Processing Units (GPUs) to speed up the processing of neural networks.
Connected to the SBC is an Arduino with a motor controller. The Nano talks to it through a USB port addressed as a serial port. We also need a 5V regulator to provide the proper power from the 11.1V rechargeable lithium battery pack to the robot. My power pack is a rechargeable 3S1P (three cells in series and one in parallel) 2700 mAh battery (normally used for quadcopter drones) and came with the appropriate charger. As with any lithium battery, follow all of the directions that come with the battery pack and recharge it in a metal box or container in case of fire.
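The 11.1V figure follows from the 3S1P layout: three cells in series at a nominal 3.7V each. The short sketch below works through that arithmetic, treating the pack capacity as 2700 mAh (the usual unit for packs of this size). The 2 A average current draw in the runtime estimate is a hypothetical number chosen only to show the calculation, not a measurement of this robot.

```python
NOMINAL_CELL_VOLTS = 3.7  # typical nominal voltage of one lithium polymer cell


def pack_specs(series_cells, parallel_cells, cell_capacity_mah):
    """Return (volts, amp_hours, watt_hours) for an NsMp battery pack."""
    volts = series_cells * NOMINAL_CELL_VOLTS                 # series cells add voltage
    capacity_ah = parallel_cells * cell_capacity_mah / 1000.0  # parallel cells add capacity
    return volts, capacity_ah, volts * capacity_ah


def runtime_hours(capacity_ah, average_current_a):
    """Idealized runtime; real packs should not be drained completely."""
    return capacity_ah / average_current_a


volts, amp_hours, watt_hours = pack_specs(3, 1, 2700)
print(round(volts, 1), "V,", amp_hours, "Ah,", round(watt_hours, 2), "Wh")

# Hypothetical average draw of 2 A, for illustration only.
print(round(runtime_hours(amp_hours, 2.0), 2), "hours")
```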
I am going to direct you once again to the Git repository to see all of the software that runs the robot, but I’ll cover the basics here to remind you. The base operating system for the robot is Linux running on an Nvidia Nano SBC, as we said. We are using the Robot Operating System 2 (ROS 2) to connect all of our various software components together, and it also does a wonderful job of taking care of all of the finicky networking tasks such as setting up sockets and establishing connections. It also comes with a great library of already prepared functions that we can just take advantage of, such as a joystick interface. ROS 2 is not a true operating system that controls the whole computer like Linux or Windows does, but rather is a backbone of communications and interface standards and utilities that make putting together a robot a lot simpler. The name I like to use for this type of system is Modular Open System Architecture (MOSA). ROS 2 uses a publish/subscribe technique to move data from one place to another that truly decouples the programs that produce data (such as sensors and cameras) from those programs that use data, such as controls and displays. We’ll be making a lot of our own stuff and only using a few ROS functions. Packt has several great books for learning ROS; my favorite is Effective Robotics Programming with ROS.
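To show what publish/subscribe decoupling means, here is a toy, in-process message bus in plain Python. It is not the ROS 2 API, and it does none of the networking that ROS 2 handles; the class name and the topic name are invented for this sketch. The point is only that the publisher and the subscribers never refer to each other, just to a shared topic name.

```python
from collections import defaultdict


class MessageBus:
    """A minimal in-process publish/subscribe hub (illustration only, not ROS 2)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a function to be called for every message on a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver a message to everyone subscribed to the topic."""
        for callback in self._subscribers.get(topic, []):
            callback(message)


bus = MessageBus()

# Two independent consumers of the same data; neither knows about the other.
display_log = []
bus.subscribe("range_sensor", display_log.append)
bus.subscribe("range_sensor", lambda m: print("control got", m))

# The producer only knows the topic name (a hypothetical one here).
bus.publish("range_sensor", {"distance_m": 0.42})
print(display_log)
```

Adding another consumer, such as a data logger, means one more subscribe call and no change to the code that produces the data.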
The programming language we will use throughout this book, with a couple of minor exceptions, will be Python. Python is a great language for this purpose for two great reasons: it is widely used in the robotics community in conjunction with ROS, and it is also widely accepted in the machine learning and AI community. This double whammy makes using Python irresistible. Python is an interpreted language, which has three amazing advantages for us:
Portability: Python is very portable between Windows, Mac, and Linux. Usually, you can get by with just a line or two of changes if you use a function out of the operating system, such as opening a file. Python has access to a huge collection of C/C++ libraries that also add to its utility.
No compilation: As an interpreted language, Python does not require a compile step. Some of the programs we are developing in this book are pretty involved, and if we wrote them in C or C++, it would take 10 or 20 minutes of build time each time we made a change. You can do a lot with that much time, which you can spend getting your program to run and not waiting for the make process to finish.
Isolation