Description

Genetic algorithms are a family of search, optimization, and learning algorithms inspired by the principles of natural evolution. By imitating the evolutionary process, genetic algorithms can overcome hurdles encountered in traditional search algorithms and provide high-quality solutions for a variety of problems. This book will help you get to grips with a powerful yet simple approach to applying genetic algorithms to a wide range of tasks using Python, covering the latest developments in artificial intelligence.
After introducing you to genetic algorithms and their principles of operation, you'll understand how they differ from traditional algorithms and what types of problems they can solve. You'll then discover how they can be applied to search and optimization problems, such as planning, scheduling, gaming, and analytics. As you advance, you'll also learn how to use genetic algorithms to improve your machine learning and deep learning models, solve reinforcement learning tasks, and perform image reconstruction. Finally, you'll cover several related technologies that can open up new possibilities for future applications.
By the end of this book, you'll have hands-on experience of applying genetic algorithms in artificial intelligence as well as in numerous other domains.





Hands-On Genetic Algorithms with Python
Applying genetic algorithms to solve real-world deep learning and artificial intelligence problems
Eyal Wirsansky
BIRMINGHAM - MUMBAI

Hands-On Genetic Algorithms with Python

Copyright © 2020 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Sunith Shetty
Acquisition Editor: Porous Godhaa
Content Development Editor: Pratik Andrade
Senior Editor: Ayaan Hoda
Technical Editor: Mohd Riyan Khan
Copy Editor: Safis Editing
Project Coordinator: Anish Daniel
Proofreader: Safis Editing
Indexer: Priyanka Dhadke
Production Designer: Deepika Naik

First published: January 2020

Production reference: 1300120

Published by Packt Publishing Ltd., Livery Place, 35 Livery Street, Birmingham B3 2PB, UK.

ISBN 978-1-83855-774-4

www.packt.com

To my wife, Jackie, for her love, patience, and support. To my children, Danielle and Liam, whose creativity and artistic talents inspired me in writing this book.
– Eyal Wirsansky

Packt.com

Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

Improve your learning with Skill Plans built especially for you

Get a free eBook or video every month

Fully searchable for easy access to vital information

Copy and paste, print, and bookmark content

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Contributors

About the author

Eyal Wirsansky is a senior software engineer, a technology community leader, and an artificial intelligence enthusiast and researcher. Eyal started his software engineering career as a pioneer in the field of voice over IP, and he now has over 20 years' experience of creating a variety of high-performing enterprise solutions. While in graduate school, he focused his research on genetic algorithms and neural networks. One outcome of his research is a novel supervised machine learning algorithm that combines the two.

Eyal leads the Jacksonville (FL) Java user group, hosts the Artificial Intelligence for Enterprise virtual user group, and writes the developer-oriented artificial intelligence blog, ai4java.

I would like to thank my family and close friends for their patience, support, and encouragement throughout the lengthy process of writing this book. Special thanks go to the Jacksonville Python Users Group (PyJax) for their feedback and support.

About the reviewer

Lisa Bang did her BS in marine biology at UC Santa Cruz, and an MS in bioinformatics at Soongsil University in Seoul under the tutelage of Dr. Kwang-Hwi Cho. Her master's thesis was on a method for making QSARs reproducible using Jupyter Notebook, and contained a genetic algorithm component to reduce search space. This is now being developed into DEAP-VS to be compatible with Python 3. She also worked at Geisinger Health System as part of the Biomedical and Translational Informatics Institute, using next-generation sequencing and electronic health record data to analyze outcomes in cancer and other diseases. She now works at Ultragenyx Pharmaceutical, focusing on preclinical research using bioinformatics and chemoinformatics on rare genetic diseases.

Thank you to my family, my teachers, and my mentors.

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Table of Contents

Title Page

Copyright and Credits

Hands-On Genetic Algorithms with Python

Dedication

About Packt

Why subscribe?

Contributors

About the author

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Code in Action

Conventions used

Get in touch

Reviews

Section 1: The Basics of Genetic Algorithms

An Introduction to Genetic Algorithms

What are genetic algorithms?

Darwinian evolution

The genetic algorithms analogy

Genotype

Population

Fitness function

Selection

Crossover

Mutation

The theory behind genetic algorithms

The schema theorem

Differences from traditional algorithms

Population-based

Genetic representation

Fitness function

Probabilistic behavior

Advantages of genetic algorithms

Global optimization

Handling complex problems

Handling a lack of mathematical representation

Resilience to noise

Parallelism

Continuous learning

Limitations of genetic algorithms

Special definitions

Hyperparameter tuning

Computationally-intensive

Premature convergence

No guaranteed solution

Use cases of genetic algorithms

Summary

Further reading

Understanding the Key Components of Genetic Algorithms

Basic flow of a genetic algorithm

Creating the initial population

Calculating the fitness

Applying selection, crossover, and mutation

Checking the stopping conditions

Selection methods

Roulette wheel selection

Stochastic universal sampling

Rank-based selection

Fitness scaling

Tournament selection

Crossover methods

Single-point crossover

Two-point and k-point crossover

Uniform crossover

Crossover for ordered lists

Ordered crossover

Mutation methods

Flip bit mutation

Swap mutation

Inversion mutation

Scramble mutation

Real-coded genetic algorithms

Blend crossover

Simulated binary crossover

Real mutation

Understanding elitism

Niching and sharing

Serial niching versus parallel niching

The art of solving problems using genetic algorithms

Summary

Further reading

Section 2: Solving Problems with Genetic Algorithms

Using the DEAP Framework

Technical requirements

Introduction to DEAP

Using the creator module

Creating the Fitness class

Defining the fitness strategy

Storing the fitness values

Creating the Individual class

Using the Toolbox class

Creating genetic operators

Creating the population

Calculating the fitness

The OneMax problem

Solving the OneMax problem with DEAP

Choosing the chromosome

Calculating the fitness

Choosing the genetic operators

Setting the stopping condition

Implementing with DEAP

Setting up

Evolving the solution

Running the program

Using built-in algorithms

The Statistics object

The algorithm

The logbook

Running the program

Adding the hall of fame

Experimenting with the algorithm's settings

Population size and number of generations

Crossover operator

Mutation operator

Selection operator

Tournament size and relation to mutation probability

Roulette wheel selection

Summary

Further reading

Combinatorial Optimization

Technical requirements

Search problems and combinatorial optimization

Solving the knapsack problem

The Rosetta Code knapsack 0-1 problem

Solution representation

Python problem representation

Genetic algorithms solution

Solving the TSP

TSPLIB benchmark files

Solution representation

Python problem representation

Genetic algorithms solution

Improving the results with enhanced exploration and elitism

Solving the VRP

Solution representation

Python problem representation

Genetic algorithms solution

Summary

Further reading

Constraint Satisfaction

Technical requirements

Constraint satisfaction in search problems

Solving the N-Queens problem

Solution representation

Python problem representation

Genetic algorithms solution

Solving the nurse scheduling problem

Solution representation

Hard constraints versus soft constraints

Python problem representation

Genetic algorithms solution

Solving the graph coloring problem

Solution representation

Using hard and soft constraints for the graph coloring problem

Python problem representation

Genetic algorithms solution

Summary

Further reading

Optimizing Continuous Functions

Technical requirements

Chromosomes and genetic operators for real numbers

Using DEAP with continuous functions

Optimizing the Eggholder function

Optimizing the Eggholder function with genetic algorithms

Improving the speed with an increased mutation rate

Optimizing Himmelblau's function

Optimizing Himmelblau's function with genetic algorithms

Using niching and sharing to find multiple solutions

Simionescu's function and constrained optimization

Constrained optimization with genetic algorithms

Optimizing Simionescu's function using genetic algorithms

Using constraints to find multiple solutions

Summary

Further reading

Section 3: Artificial Intelligence Applications of Genetic Algorithms

Enhancing Machine Learning Models Using Feature Selection

Technical requirements

Supervised machine learning

Classification

Regression

Supervised learning algorithms

Feature selection in supervised learning

Selecting the features for the Friedman-1 regression problem

Solution representation

Python problem representation

Genetic algorithms solution

Selecting the features for the classification Zoo dataset

Python problem representation

Genetic algorithms solution

Summary

Further reading

Hyperparameter Tuning of Machine Learning Models

Technical requirements

Hyperparameters in machine learning

Hyperparameter tuning

The Wine dataset

The adaptive boosting classifier

Tuning the hyperparameters using a genetic grid search

Testing the classifier's default performance

Running the conventional grid search

Running the genetic algorithm-driven grid search

Tuning the hyperparameters using a direct genetic approach

Hyperparameter representation

Evaluating the classifier accuracy

Tuning the hyperparameters using genetic algorithms

Summary

Further reading

Architecture Optimization of Deep Learning Networks

Technical requirements

Artificial neural networks and deep learning

Multilayer Perceptron

Deep learning and convolutional neural networks

Optimizing the architecture of a deep learning classifier

The Iris flower dataset

Representing the hidden layer configuration

Evaluating the classifier's accuracy

Optimizing the MLP architecture using genetic algorithms

Combining architecture optimization with hyperparameter tuning

Solution representation

Evaluating the classifier's accuracy

Optimizing the MLP's combined configuration using genetic algorithms

Summary

Further reading

Reinforcement Learning with Genetic Algorithms

Technical requirements

Reinforcement learning

Genetic algorithms and reinforcement learning

OpenAI Gym

The env interface

Solving the MountainCar environment

Solution representation

Evaluating the solution

Python problem representation

Genetic algorithms solution

Solving the CartPole environment

Controlling the CartPole with a neural network

Solution representation and evaluation

Python problem representation

Genetic algorithms solution

Summary

Further reading

Section 4: Related Technologies

Genetic Image Reconstruction

Technical requirements

Reconstructing images with polygons

Image processing in Python

Python image processing libraries

The Pillow library

The scikit-image library

The opencv-python library

Drawing images with polygons

Measuring the difference between images

Pixel-based Mean Squared Error

Structural Similarity (SSIM)

Using genetic algorithms to reconstruct images

Solution representation and evaluation

Python problem representation

Genetic algorithm implementation

Adding a callback to the genetic run

Image reconstruction results

Using pixel-based Mean Squared Error

Using the SSIM index

Other experiments

Summary

Further reading

Other Evolutionary and Bio-Inspired Computation Techniques

Technical requirements

Evolutionary computation and bio-inspired computing

Genetic programming

Genetic programming example – even parity check

Genetic programming implementation

Simplifying the solution

Particle swarm optimization

PSO example – function optimization

Particle swarm optimization implementation

Other related techniques

Evolution strategies

Differential evolution

Ant colony optimization

Artificial immune systems

Artificial life

Summary

Further reading

Other Books You May Enjoy

Leave a review - let other readers know what you think

Preface

Drawing inspiration from Charles Darwin's theory of natural evolution, genetic algorithms are among the most fascinating techniques for solving search, optimization, and learning problems. They can often prove successful where traditional algorithms fail to provide adequate results within a reasonable timeframe.

This book will take you on a journey to mastering this extremely powerful, yet simple, approach, and applying it to a wide variety of tasks, culminating in AI applications.

Using this book, you will gain an understanding of genetic algorithms, how they work, and when to use them. In addition, the book will provide you with hands-on experience of applying genetic algorithms to various domains using the popular Python programming language.

Who this book is for

This book was written to help software developers, data scientists, and AI enthusiasts interested in harnessing genetic algorithms to carry out tasks involving learning, searching, and optimization in their applications, as well as enhancing the performance and accuracy of their existing intelligent applications.

This book is also intended for anyone who is tasked with real-life, hard-to-solve problems where traditional algorithms are not useful, or fail to provide adequate results within a practical amount of time. The book demonstrates how genetic algorithms can be used as a powerful, yet simple, approach to solving a variety of complex problems.

What this book covers

Chapter 1, An Introduction to Genetic Algorithms, introduces genetic algorithms, their underlying theory, and their basic principles of operation. You will then explore the differences between genetic algorithms and traditional methods, and learn about the best use cases for genetic algorithms.

Chapter 2, Understanding the Key Components of Genetic Algorithms, dives deeper into the key components and the implementation details of genetic algorithms. After outlining the basic genetic flow, you will learn about their different components and the various implementations for each component.

Chapter 3, Using the DEAP Framework, introduces DEAP—a powerful and flexible evolutionary computation framework capable of solving real-life problems using genetic algorithms. You will discover how to use this framework by writing a Python program that solves the OneMax problem—the 'Hello World' of genetic algorithms.

Chapter 4, Combinatorial Optimization, covers combinatorial optimization problems, such as the knapsack problem, the traveling salesman problem, and the vehicle routing problem, and how to write Python programs that solve them using genetic algorithms and the DEAP framework.

Chapter 5, Constraint Satisfaction, introduces constraint satisfaction problems, such as the N-Queens problem, the nurse scheduling problem, and the graph coloring problem, and explains how to write Python programs that solve them using genetic algorithms and the DEAP framework.

Chapter 6, Optimizing Continuous Functions, covers continuous optimization problems, and how they can be solved by means of genetic algorithms. The examples you will use include the optimization of the Eggholder function, Himmelblau's function, and Simionescu's function. Along the way, you will explore the concepts of niching, sharing, and constraint handling.

Chapter 7, Enhancing Machine Learning Models Using Feature Selection, talks about supervised machine learning models, and explains how genetic algorithms can be used to improve the performance of these models by selecting the best subset of features from the input data provided.

Chapter 8, Hyperparameter Tuning of Machine Learning Models, explains how genetic algorithms can be used to improve the performance of supervised machine learning models by tuning the hyperparameters of the models, either by applying a genetic algorithm-driven grid search, or by using a direct genetic search.

Chapter 9, Architecture Optimization of Deep Learning Networks, focuses on artificial neural networks, and discovers how genetic algorithms can be used to improve the performance of neural-based models by optimizing their network architecture. You will then learn how to combine network architecture optimization with hyperparameter tuning.

Chapter 10, Reinforcement Learning with Genetic Algorithms, covers reinforcement learning, and explains how genetic algorithms can be applied to reinforcement learning tasks while solving two benchmark environments—MountainCar and CartPole—from the OpenAI Gym toolkit.

Chapter 11, Genetic Image Reconstruction, experiments with the reconstruction of a well-known image using a set of semi-transparent polygons, orchestrated by genetic algorithms. Along the way, you will gain useful experience in image processing and the relevant Python libraries.

Chapter 12, Other Evolutionary and Bio-Inspired Computation Techniques, broadens your horizons and gets you acquainted with several other biologically inspired problem-solving techniques. Two of these methods—genetic programming and particle swarm optimization—will be demonstrated using DEAP-based Python programs.

To get the most out of this book

To get the most out of this book, you should have a working knowledge of the Python programming language, and basic knowledge of mathematics and computer science. An understanding of fundamental machine learning concepts will be beneficial, but not mandatory, as the book covers the necessary concepts in a nutshell.

To run the programming examples accompanying this book, you will need Python release 3.7 or newer, as well as several Python packages described throughout the book. A Python IDE (Integrated Development Environment), such as PyCharm or Visual Studio Code, is recommended but not required.

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

1. Log in or register at www.packt.com.
2. Select the Support tab.
3. Click on Code Downloads.
4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows

Zipeg/iZip/UnRarX for Mac

7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-Genetic-Algorithms-with-Python. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781838557744_ColorImages.pdf.

Code in Action

Visit the following link to check out videos of the code being run: http://bit.ly/3azd7Sp

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in, and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.

Section 1: The Basics of Genetic Algorithms

In this section, you will be introduced to the key concepts of genetic algorithms and how they can be used.

This section comprises the following chapters:

Chapter 1, An Introduction to Genetic Algorithms

Chapter 2, Understanding the Key Components of Genetic Algorithms

An Introduction to Genetic Algorithms

Drawing its inspiration from Charles Darwin's theory of natural evolution, the algorithm family suitably named evolutionary computation offers some of the most fascinating techniques for problem-solving. Within this family, the most prominent and widely used branch is known as genetic algorithms. This chapter is the beginning of your journey to mastering this extremely powerful, yet extremely simple, technique.

In this chapter, we will introduce genetic algorithms and their analogy to Darwinian evolution, and dive into their basic principles of operation as well as their underlying theory. We will then go over the differences between genetic algorithms and traditional ones and cover the advantages and limitations of genetic algorithms and their uses. We will conclude by reviewing the cases where the use of a genetic algorithm may prove beneficial.

In this introductory chapter, we will cover the following topics:

What are genetic algorithms?

The theory behind genetic algorithms

Differences between genetic algorithms and traditional algorithms

Advantages and limitations of genetic algorithms

When to use genetic algorithms

What are genetic algorithms?

Genetic algorithms are a family of search algorithms inspired by the principles of evolution in nature. By imitating the process of natural selection and reproduction, genetic algorithms can produce high-quality solutions for various problems involving search, optimization, and learning. At the same time, their analogy to natural evolution allows genetic algorithms to overcome some of the hurdles that are encountered by traditional search and optimization algorithms, especially for problems with a large number of parameters and complex mathematical representations.

In the rest of this section, we will review the basic ideas of genetic algorithms, as well as their analogy to the evolutionary processes transpiring in nature.

Darwinian evolution

Genetic algorithms implement a simplified version of the Darwinian evolution that takes place in nature. The Darwinian evolution theory can be summarized by the following three principles:

The principle of variation: The traits (attributes) of individual specimens belonging to a population may vary. As a result, the specimens differ from each other to some degree; for example, in their behavior or appearance.

The principle of inheritance: Some traits are consistently passed on from specimens to their offspring. As a result, offspring resemble their parents more than they resemble unrelated specimens.

The principle of selection: Populations typically struggle for resources within their given environment. The specimens possessing traits that are better adapted to the environment will be more successful at surviving, and will also contribute more offspring to the next generation.

In other words, evolution maintains a population of individual specimens that vary from each other. Those who are better adapted to their environment have a greater chance of surviving, breeding, and passing their traits to the next generation. This way, as generations go by, species become more adapted to their environment and to the challenges presented to them.

An important enabler of evolution is crossover or recombination – where offspring are created with a mix of their parents' traits. Crossover helps in maintaining the diversity of the population and in bringing together the better traits over time. In addition, mutations – random variations in traits – can play a role in evolution by introducing changes that can result in a leap forward every once in a while.

The genetic algorithms analogy

Genetic algorithms seek to find the optimal solution for a given problem. Whereas Darwinian evolution maintains a population of individual specimens, genetic algorithms maintain a population of candidate solutions, called individuals, for that given problem. These candidate solutions are iteratively evaluated and used to create a new generation of solutions. Those who are better at solving this problem have a greater chance of being selected and passing their qualities to the next generation of candidate solutions. This way, as generations go by, candidate solutions get better at solving the problem at hand.

In the following sections, we will describe the various components of genetic algorithms that enable this analogy for Darwinian evolution.

Genotype

In nature, breeding, reproduction, and mutation are facilitated via the genotype – a collection of genes that are grouped into chromosomes. If two specimens breed to create offspring, each chromosome of the offspring will carry a mix of genes from both parents.

Mimicking this concept, in the case of genetic algorithms, each individual is represented by a chromosome representing a collection of genes. For example, a chromosome can be expressed as a binary string, where each bit represents a single gene:

Simple binary-coded chromosome

The preceding image shows an example of one such binary-coded chromosome, representing one particular individual.
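As an illustration, a binary-coded chromosome like this one can be represented in plain Python as a list of bits. This is a minimal sketch (the chromosome length of 10 is an arbitrary choice); later chapters use the chromosome types provided by the DEAP framework instead:

import random

# A chromosome with 10 genes, each gene being a single bit
chromosome = [random.randint(0, 1) for _ in range(10)]
print(chromosome)   # for example: [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]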

Population

At any point in time, genetic algorithms maintain a population of individuals– a collection of candidate solutions for the problem at hand. Since each individual is represented by some chromosome, this population of individuals can be seen as a collection of such chromosomes:

A population of individuals represented by binary-coded chromosomes

The population continually represents the current generation and evolves over time when the current generation is replaced by a new one.
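Continuing the same minimal sketch, with illustrative sizes, a population is simply a list of such chromosomes:

import random

POPULATION_SIZE = 6      # illustrative value
CHROMOSOME_LENGTH = 10   # illustrative value

# Each individual is a random binary-coded chromosome
population = [[random.randint(0, 1) for _ in range(CHROMOSOME_LENGTH)]
              for _ in range(POPULATION_SIZE)]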

Fitness function

At each iteration of the algorithm, the individuals are evaluated using a fitness function (also called the target function). This is the function we seek to optimize or the problem we attempt to solve.

Individuals who achieve a better fitness score represent better solutions and are more likely to be chosen to reproduce and be represented in the next generation. Over time, the quality of the solutions improves, the fitness values increase, and the process can stop once a solution is found with a satisfactory fitness value.
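As a minimal example, consider a fitness function that counts the 1 bits in a binary-coded chromosome (this is the OneMax problem, which we will solve with DEAP in Chapter 3):

def fitness(chromosome):
    # The more 1 bits the chromosome contains, the higher its fitness
    return sum(chromosome)

print(fitness([1, 0, 1, 1]))   # 3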

Selection

After calculating the fitness of every individual in the population, a selection process is used to determine which of the individuals in the population will get to reproduce and create the offspring that will form the next generation.

This selection process is based on the fitness score of the individuals. Those with higher score values are more likely to be chosen and pass their genetic material to the next generation.

Individuals with low fitness values can still be chosen, but with lower probability. This way, their genetic material is not completely excluded.
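One common way to implement this kind of fitness-proportionate selection is roulette wheel selection, covered in detail in Chapter 2. Here is a minimal sketch that draws parents with probability proportional to their fitness, assuming the illustrative bit-counting fitness used above:

import random

def select_parents(population, num_parents):
    # Weight each individual by its fitness (here, the count of 1 bits);
    # fitter individuals are more likely to be chosen, but individuals
    # with low fitness still have a chance
    weights = [sum(individual) for individual in population]
    return random.choices(population, weights=weights, k=num_parents)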

Crossover

To create a pair of new individuals, two parents are usually chosen from the current generation, and parts of their chromosomes are interchanged (crossed over) to create two new chromosomes representing the offspring. This operation is called crossover, or recombination:

Crossover operation between two binary-coded chromosomes. Source: https://commons.wikimedia.org/wiki/File:Computational.science.Genetic.algorithm.Crossover.One.Point.svg. Image by Yearofthedragon. Licensed under Creative Commons CC BY-SA 3.0: https://creativecommons.org/licenses/by-sa/3.0/deed.en

The preceding image illustrates a simple crossover operation of creating two offspring from two parents.
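A single-point crossover like the one illustrated can be sketched as follows; a random cut point is chosen and the tails of the two parents are swapped to create the two offspring:

import random

def single_point_crossover(parent1, parent2):
    # Choose a random cut point and exchange the tails of the parents
    point = random.randint(1, len(parent1) - 1)
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2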

Mutation

The purpose of the mutation operator is to periodically and randomly refresh the population, introduce new patterns into the chromosomes, and encourage search in uncharted areas of the solution space.

A mutation may manifest itself as a random change in a gene. Mutations are implemented as random changes to one or more of the chromosome values; for example, flipping a bit in a binary string:

Mutation operator applied to a binary-coded chromosome

The preceding image shows an example of the mutation operation.
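The flip bit mutation shown above can be sketched like this; each gene is flipped independently with a small probability (the value of 0.05 is illustrative):

import random

def flip_bit_mutation(chromosome, probability=0.05):
    # Flip each bit independently with the given (small) probability
    return [1 - gene if random.random() < probability else gene
            for gene in chromosome]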

Now, let's look at the theory behind genetic algorithms.

The theory behind genetic algorithms

The building-block hypothesis underlying genetic algorithms is that the optimal solution to the problem at hand is assembled from small building blocks, and as we bring more of these building blocks together, we get closer to this optimal solution.

Individuals in the population who contain some of the desired building blocks are identified by their superior scores. The repeated operations of selection and crossover result in the better individuals conveying these building blocks to the next generations, while possibly combining them with other successful building blocks. This creates genetic pressure, thus guiding the population toward having more and more individuals with the building blocks that form the optimal solution.

As a result, each generation is better than the previous one and contains more individuals that are closer to the optimal solution.

For example, if we have a population of four-digit binary strings and we want to find the string that has the largest possible sum of digits, the digit 1 appearing at any of the four string positions will be a good building block. As the algorithm progresses, it will identify solutions that have these building blocks and bring them together. Each generation will have more individuals with 1 values in various positions, ultimately resulting in the string 1111, which combines all the desired building blocks. This is illustrated in the following image:

Demonstration of a crossover operation bringing the building blocks of the optimal solution together

The preceding image demonstrates how two individuals that are good solutions for this problem (each has three 1 values) create an offspring that is the best possible solution (four 1 bits, that is, the offspring on the right-hand side) when the crossover operation brings the desired building blocks of both parents together.
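To make this concrete, here is the same scenario in a few lines of Python; the parents and the cut point are chosen for illustration:

parent1 = [1, 1, 1, 0]   # sum of digits: 3
parent2 = [0, 1, 1, 1]   # sum of digits: 3
point = 1                # cut after the first gene, chosen for illustration
child = parent1[:point] + parent2[point:]
print(child)             # [1, 1, 1, 1], the optimal solution with a sum of 4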

The schema theorem

A more formal expression of the building-block hypothesis is Holland's schema theorem, also called the fundamental theorem of genetic algorithms.

This theorem refers to schemata (plural of schema), which are patterns (or templates) that can be found within the chromosomes. Each schema represents a subset of chromosomes that have a certain similarity among them.

For example, if the set of chromosomes is represented by binary strings of length four, the schema 1*01 represents all those chromosomes that have a 1 in the leftmost position, 01 in the rightmost two positions, and either a 1 or a 0 in the second from left position, since the * represents a wildcard value.

For each schema, we can assign two measurements:

Order: The number of digits that are fixed (not wildcards)

Defining length: The distance between the two furthermost fixed digits

The following table provides several examples of four-digit binary schemata and their measurements:

Schema    Order    Defining Length
1101      4        3
1*01      3        3
*101      3        2
*1*1      2        2
**01      2        1
1***      1        0
****      0        0

Each chromosome in the population corresponds to multiple schemata in the same way that a given string matches regular expressions. The chromosome 1101, for example, corresponds to each and every one of the schemata that appear in this table since it matches each of the patterns they represent. If this chromosome has a higher score, it is more likely to survive the selection operation, along with all the schemata it represents. As this chromosome gets crossed over with another, or as it gets mutated, some of the schemata will survive and others will disappear. The schemata of low order and short defining length are the ones more likely to survive.
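These two measurements, and the notion of matching, are easy to express in code. The following sketch treats schemata and chromosomes as strings, which is an illustrative choice:

def order(schema):
    # Number of fixed (non-wildcard) positions
    return sum(1 for symbol in schema if symbol != '*')

def defining_length(schema):
    # Distance between the outermost fixed positions
    fixed = [i for i, symbol in enumerate(schema) if symbol != '*']
    return fixed[-1] - fixed[0] if fixed else 0

def matches(chromosome, schema):
    # A chromosome matches a schema if it agrees on every fixed position
    return all(s == '*' or c == s for c, s in zip(chromosome, schema))

print(order('1*01'), defining_length('1*01'))   # 3 3
print(matches('1101', '1*01'))                  # True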

Consequently, the schema theorem states that the frequency of schemata of low order, short defining length, and above-average fitness increases exponentially in successive generations. In other words, the smaller, simpler building blocks that represent the attributes that make a solution better will become increasingly present in the population as the genetic algorithm progresses. We will look at the difference between genetic and traditional algorithms in the next section.

Differences from traditional algorithms

There are several important differences between genetic algorithms and traditional search and optimization algorithms, such as gradient-based algorithms.

The key characteristics of genetic algorithms distinguishing them from traditional algorithms are:

Maintaining a population of solutions

Using a genetic representation of the solutions

Utilizing the outcome of a fitness function

Exhibiting a probabilistic behavior

In the upcoming sections, we will describe these factors in greater detail.

Population-based

The genetic search is conducted over a population of candidate solutions (individuals) rather than a single candidate. At any point in the search, the algorithm retains a set of individuals that form the current generation. Each iteration of the genetic algorithm creates the next generation of individuals.

In contrast, most other search algorithms maintain a single solution and iteratively modify it in search of the best solution. The gradient descent algorithm, for example, iteratively moves the current solution in the direction of steepest descent, which is defined by the negative of the given function's gradient.
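For contrast, here is a minimal sketch of gradient descent minimizing the illustrative function f(x) = x**2; note that only a single candidate solution is maintained and updated:

def gradient(x):
    # Derivative of the illustrative function f(x) = x**2
    return 2 * x

x = 5.0                   # the single candidate solution
learning_rate = 0.1       # illustrative value
for _ in range(100):
    x -= learning_rate * gradient(x)   # step in the direction of steepest descent
print(x)                  # converges toward the minimum at x = 0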

Genetic representation

Instead of operating directly on candidate solutions, genetic algorithms operate on their representations (or coding), often referred to as chromosomes. An example of a simple chromosome is a fixed-length binary string.

The chromosomes allow us to facilitate the genetic operations of crossover and mutation. Crossover is implemented by interchanging chromosome parts between two parents, while mutation is implemented by modifying parts of the chromosome.

A side effect of the use of genetic representation is decoupling the search from the original problem domain. Genetic algorithms are not aware of what the chromosomes represent and do not attempt to interpret them.

Fitness function

The fitness function represents the problem we would like to solve. The objective of genetic algorithms is to find the individuals that yield the highest score when this function is calculated for them.

Unlike many of the traditional search algorithms, genetic algorithms only consider the value that's obtained by the fitness function and do not rely on derivatives or any other information. This makes them suitable to handle functions that are hard or impossible to mathematically differentiate.

Probabilistic behavior

While many of the traditional algorithms are deterministic in nature, the rules that are used by genetic algorithms to advance from one generation to the next are probabilistic.

For example, when selecting the individuals that will be used to create the next generation, the probability of selecting a given individual increases with the individual's fitness, but there is still a random element in making that choice. Individuals with low score values can still be chosen as well, although with a lower probability.

Mutation is probability-driven as well, usually occurs with low likelihood, and makes changes at random location(s) in the chromosome.

The crossover operator can have a probabilistic element as well. In some variations of genetic algorithms, the crossover will only occur at a certain probability. If no crossover takes place, both parents are duplicated into the next generation without change.

Despite the probabilistic nature of this process, the genetic algorithm-based search is not random; instead, it uses the random aspect to direct the search toward areas in the search space where there is a better chance to improve the results. Now, let's look at the advantages of genetic algorithms.

Advantages of genetic algorithms

The unique characteristics of genetic algorithms that we discussed in the previous sections provide several advantages over traditional search algorithms.

The main advantages of genetic algorithms are as follows:

Global optimization capability

Handling problems with a complex mathematical representation

Handling problems that lack mathematical representation

Resilience to noise

Support for parallelism and distributed processing

Suitability for continuous learning

We will cover each of these in the upcoming sections.

Global optimization

In many cases, optimization problems have local maxima and minima points; these represent solutions that are better than those around them, but not the best overall.

The following image illustrates the differences between global and local maxima and minima points:

The global and local maxima and minima of a function. Source: https://commons.wikimedia.org/wiki/File:Computational.science.Genetic.algorithm.Crossover.One.Point.svg. Image by KSmrq. Licensed under Creative Commons CC BY-SA 3.0: https://creativecommons.org/licenses/by-sa/3.0/

Most traditional search and optimization algorithms, and particularly those that are gradient-based, are prone to getting stuck in a local maximum rather than finding the global one. This is because, in the vicinity of a local maximum, any small change will degrade the score.

Genetic algorithms, on the other hand, are less sensitive to this phenomenon and are more likely to find the global maximum. This is due to the use of a population of candidate solutions rather than a single one, and the crossover and mutation operations that will, in many cases, result in candidate solutions that are distant from the previous ones. This holds true as long as we manage to maintain the diversity of the population and avoid premature convergence, as we will mention in the next section.

Handling complex problems

Since genetic algorithms only require the outcome of the fitness function for each individual and are not concerned with other aspects of the fitness function such as derivatives, they can be used for problems with complex mathematical representations or functions that are hard or impossible to differentiate.

Other complex cases where genetic algorithms excel include problems with a large number of parameters and problems with a mix of parameter types; for example, a combination of continuous and discrete parameters.

Handling a lack of mathematical representation

Genetic algorithms can be used for problems that lack mathematical representation altogether. One such case of particular interest is when the fitness score is based on human opinion. Imagine, for example, that we want to find the most attractive color palette to be used on a website. We can try different color combinations and ask users to rate the attractiveness of the site. We can apply genetic algorithms to search for the best scoring combination while using this opinion-based score as the fitness function outcome. The genetic algorithm will operate as usual, even though the fitness function lacks any mathematical representation and there is no way to calculate the score directly from a given color combination.

As we will see in the next chapter, genetic algorithms can even deal with cases where the score of each individual cannot be obtained, as long as we have a way to compare two individuals and determine which of them is better. An example of this is a machine learning algorithm that drives a car in a simulated race. A genetic algorithm-based search can optimize and tune the machine learning algorithm by having different versions of it compete against each other to determine which version is better.

Resilience to noise

Some problems present noisy behavior. This means that, even for similar input parameter values, the output value may be somewhat different every time it's measured. This can happen, for example, when the data that's being used is being read from sensor outputs, or in cases where the score is based on human opinion, as was discussed in the previous section.

While this kind of behavior can throw off many traditional search algorithms, genetic algorithms are generally resilient to it, thanks to the repetitive operation of reassembling and reevaluating the individuals.

Parallelism

Genetic algorithms lend themselves well to parallelization and distributed processing. Fitness is independently calculated for each individual, which means all the individuals in the population can be evaluated concurrently.

In addition, the operations of selection, crossover, and mutation can each be performed concurrently on individuals and pairs of individuals in the population.

This makes the approach of genetic algorithms a natural candidate for distributed as well as cloud-based implementation.
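As an illustrative sketch, Python's standard multiprocessing module can be used to evaluate the fitness of all individuals concurrently; the population sizes and the bit-counting fitness function are the same illustrative choices used earlier in this chapter:

import multiprocessing
import random

def fitness(chromosome):
    # Illustrative fitness: count the 1 bits in the chromosome
    return sum(chromosome)

if __name__ == '__main__':
    population = [[random.randint(0, 1) for _ in range(10)]
                  for _ in range(50)]
    # Evaluate all individuals concurrently across worker processes
    with multiprocessing.Pool() as pool:
        fitness_values = pool.map(fitness, population)
    print(max(fitness_values))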

Continuous learning

In nature, evolution never stops. As the environmental conditions change, the population will adapt to them. Similarly, genetic algorithms can operate continuously in an ever-changing environment, and at any point in time, the best current solution can be fetched and used.

For this to be effective, the changes in the environment need to be slow in relation to the generation turnaround rate of the genetic algorithm-based search. Now that we have covered the advantages, let's look at the limitations of genetic algorithms.

Limitations of genetic algorithms

To get the most out of genetic algorithms, we need to be aware of their limitations and potential pitfalls.

The limitations of genetic algorithms are as follows:

The need for special definitions

The need for hyperparameter tuning

Computationally-intensive operations

The risk of premature convergence

No guaranteed solution

We will cover each of these in the upcoming sections.

Special definitions

When applying genetic algorithms to a given problem, we need to create a suitable representation for it – define the fitness function and the chromosome structure, as well as the selection, crossover, and mutation operators that will work for this problem. This can often prove to be challenging and time-consuming.

Luckily, genetic algorithms have already been applied to countless different types of problems, and many of these definitions have been standardized. This book covers numerous types of real-life problems and the way they can be solved using genetic algorithms. Use this as guidance whenever you are challenged by a new problem.

Hyperparameter tuning

The behavior of genetic algorithms is controlled by a set of hyperparameters, such as the population size and mutation rate. When applying genetic algorithms to the problem at hand, there are no exact rules for making these choices.

However, this is the case for virtually all search and optimization algorithms. After going over the examples in this book and doing some experimentation of your own, you will be able to make sensible choices for these values.

Computationally-intensive

Operating on (potentially large) populations, combined with the repetitive nature of genetic algorithms, can make them computationally intensive, and it may take a long time before a good result is reached.

These issues can be alleviated with a good choice of hyperparameters, the use of parallel processing, and, in some cases, by caching intermediate results.

Premature convergence

If the fitness of one individual is much higher than the rest of the population, it may be duplicated enough that it takes over the entire population. This can lead to the genetic algorithm getting prematurely stuck in a local maximum, instead of finding the global one.

To prevent this from occurring, it is important to maintain the diversity of the population. Various ways to maintain diversity will be discussed in the next chapter.

No guaranteed solution

The use of genetic algorithms does not guarantee that the global maximum for the problem at hand will be found.

However, the same is true of almost any other search and optimization algorithm, unless it provides an analytical solution for a particular type of problem.

Generally, genetic algorithms, when used appropriately, are known to provide good solutions within a reasonable amount of time. Now, let's look at a few use cases of genetic algorithms.

Use cases of genetic algorithms

Based on the material we covered in the previous sections, genetic algorithms are best suited for the following types of problems:

Problems with complex mathematical representation: Since genetic algorithms only require the outcome of the fitness function, they can be used for problems with target functions that are hard or impossible to differentiate, problems with a large number of parameters, and problems with a mix of parameter types.

Problems with no mathematical representation: Genetic algorithms don't require a mathematical representation of the problem as long as a score value can be obtained or a method is available to compare two solutions.

Problems involving a noisy environment: Genetic algorithms are resilient to problems where data may not be consistent, such as data originating from sensor output or from human-based scoring.

Problems involving an environment that changes over time: Genetic algorithms can respond to slow changes in the environment by continuously creating new generations that will adapt to the changes that occur.

On the other hand, when a problem has a known and specialized way of being solved, using an existing traditional or analytic method is likely to be a more efficient choice.

Summary

In this chapter, we started by introducing genetic algorithms, their analogy to Darwinian evolution, and their basic principles of operation, including the use of population, genotype, the fitness function, and the genetic operators of selection, crossover, and mutation.

Then, we covered the theory underlying genetic algorithms by going over the building-block hypothesis