15 Math Concepts Every Data Scientist Should Know - David Hoyle - E-Book


David Hoyle

Description

Data science combines the power of data with the rigor of scientific methodology, with mathematics providing the tools and frameworks for analysis, algorithm development, and deriving insights. As machine learning algorithms become increasingly complex, a solid grounding in math is crucial for data scientists. David Hoyle, with over 30 years of experience in statistical and mathematical modeling, brings unparalleled industrial expertise to this book, drawing from his work in building predictive models for the world's largest retailers.
Encompassing 15 crucial concepts, this book covers a spectrum of mathematical techniques to help you understand a vast range of data science algorithms and applications. Starting with essential foundational concepts, such as random variables and probability distributions, you’ll learn why data varies, and explore matrices and linear algebra to transform that data. Building upon this foundation, the book spans general intermediate concepts, such as model complexity and network analysis, as well as advanced concepts such as kernel-based learning and information theory. Each concept is illustrated with Python code snippets demonstrating its practical application to solving problems.
By the end of the book, you’ll have the confidence to apply key mathematical concepts to your data science challenges.

You can read the e-book in Legimi apps or in any app that supports the following format:

EPUB

Number of pages: 861

Year of publication: 2024




15 Math Concepts Every Data Scientist Should Know

Understand and learn how to apply the math behind data science algorithms

David Hoyle

15 Math Concepts Every Data Scientist Should Know

Copyright © 2024 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Niranjan Naikwadi

Publishing Product Manager: Yasir Ali Khan

Content Development Editor: Joseph Sunil

Technical Editor: Seemanjay Ameriya

Copy Editor: Safis Editing

Project Coordinator: Urvi Sharma

Proofreader: Safis Editing

Indexer: Hemangini Bari

Production Designer: Joshua Misquitta

Marketing Coordinator: Vinishka Kalra

First published: July 2024

Production reference: 2221024

Published by Packt Publishing Ltd.

Grosvenor House

11 St Paul’s Square

Birmingham

B3 1RB, UK

ISBN 978-1-83763-418-7

www.packtpub.com

To my wife Clare for her unwavering love, support, and inspiration throughout our life together.

– David Hoyle

Contributors

About the author

David Hoyle has over 30 years’ experience in machine learning, statistics, and mathematical modeling. He gained a BSc. degree in mathematics and physics and a Ph.D. in theoretical physics, both from the University of Bristol, UK. He then embarked on an academic career that included research at the University of Cambridge and leading his own research groups as an Associate Professor at the University of Exeter and the University of Manchester in the UK. For the last 13 years, he has worked in the commercial sector, including for Lloyds Banking Group – one of the UK’s largest retail banks, and as joint Head of Data Science for AutoTrader UK. He now works for the global customer data science company dunnhumby, building statistical and machine learning models for the world’s largest retailers, including Tesco UK and Walmart. He lives and works in Manchester, UK.

This has been a long endeavor. I would like to thank my wife and children for their encouragement, and the team at Packt for their patience and support throughout the process.

About the reviewer

Emmanuel Nyatefe is a data analyst with over 5 years of experience in data analytics, AI, and ML. He holds a Masters of Science in Business Analytics from the W. P. Carey School of Business at Arizona State University and a Bachelors of Science in Business Information Technology from Kwame Nkrumah University of Science and Technology. He has led various AI and ML projects, including developing models for detecting crop diseases and applying Generative AI to innovate business solutions and optimize operations. His expertise in data engineering, modeling, and visualization, alongside his proficiency in LLMs and advanced analytics, highlights his significant contributions to data science. His dedication to data-driven innovation is evident in his book review.

Part 1: Essential Concepts

In this part, we will introduce the math concepts that you will encounter again and again as a data scientist. It is vital to gain a good understanding of these concepts. After a recap of basic math notation, we look at the concepts related to how data is produced, then move on to concepts related to how to transform data, and finally build up to our end goal of how to model data. These concepts are essential because you will use and combine them simultaneously in your work. By the end of Part 1, you will be comfortable with the math concepts that underpin almost all data science models and algorithms.

This section contains the following chapters:

Chapter 1, Recap of Mathematical Notation and Terminology
Chapter 2, Random Variables and Probability Distributions
Chapter 3, Matrices and Linear Algebra
Chapter 4, Loss Functions and Optimization
Chapter 5, Probabilistic Modeling

1

Recap of Mathematical Notation and Terminology

Our tour of math concepts will start properly in Chapter 2. Before we begin that tour, we’ll start by recapping some mathematical notation and terminology. Mathematics is a language, and mathematical symbols and notation are its alphabet. Therefore, we must be comfortable with and understand the basics of this alphabet.

In this chapter, we will recap the most common core notation and terminology that we are likely to use repeatedly throughout the book. We have grouped the recap into six main math areas or topics. Those topics are as follows:

Number systems: In this section, we introduce notation for real and complex numbers
Linear algebra: In this section, we introduce notation for describing vectors and matrices
Sums, products, and logarithms: In this section, we introduce notation for succinctly representing sums and products, and we introduce rules for logarithms
Differential and integral calculus: In this section, we introduce basic notation for differentiation and integration
Analysis: In this section, we introduce notation for describing limits, and order notation
Combinatorics: In this section, we introduce notation for binomial coefficients

Some of this notation you may already be familiar with. For example, complex numbers, matrices, logarithms, and basic differential calculus you will have seen either in high school or in the first year of an undergraduate degree in a numerate subject. Other topics, such as order notation, you may have encountered as part of a university degree course on mathematical analysis or algorithm complexity, or it may be new to you. For the most part, the notation we recap in this chapter you will have seen before. You can skip this chapter if you want to and if you are already familiar and comfortable with the symbols and notation recapped here. You can easily come back later or read those sections that contain notation that is new to you.

We should emphasize that this chapter is a recap. It is brief. It is not meant to be an exhaustive and comprehensive review. We focus on presenting a few main facts, but also on trying to give a feel for why the notation may be useful and how it is likely to be used.

Finally, we will encounter new notation, terminology, and symbols as we progress through the book when we are discussing specific topics. We will introduce this new notation and terminology as and when we need it.

Technical requirements

As this chapter solely recaps some of the mathematical notation we will use in later chapters, there are no code examples given and hence no technical requirements for this particular chapter.

For later chapters, you will be able to find code examples at the GitHub repository: https://github.com/PacktPublishing/15-Math-Concepts-Every-Data-Scientist-Should-Know

Number systems

In this section, we introduce notation for describing sets of numbers. We will focus on the real numbers and the complex numbers.

Notation for numbers and fields

As this is a book about data science, we will be dealing with numbers. So, it will be worthwhile recapping the notation we use to refer to the most common sets of numbers.

Most of the numbers we will deal with in this book will be real numbers, such as 4.6, 1, or -2.3. We can think of them as “living” on the real number line shown in Figure 1.1. The real number line is a one-dimensional continuous structure. There are an infinite number of real numbers. We denote the set of all real numbers by the symbol ℝ.

Figure 1.1: The real number line

Obviously, there will be situations where we want to restrict our datasets to, say, just integer-valued numbers. This would be the case if we were analyzing count data, such as the number of items of a particular product on an e-commerce site sold on a particular day. The integer numbers, …, -2, -1, 0, 1, 2, …, are a subset of the real numbers, and we denote them by the symbol ℤ. Despite them being a subset of the real numbers, there are still an infinite number of integers.

For the e-commerce count data that we mentioned earlier, the integer value would always be positive. If we restrict ourselves to strictly positive integers, 1, 2, 3, …, and so on, then we have the natural or counting numbers. These we denote by the symbol $\mathbb{Z}^{+}$, clearly meaning positive integers. The fact that these strictly positive integers are the natural numbers means we also denote them using the symbol $\mathbb{N}$.

As well as real numbers, we will occasionally deal with complex numbers. As the name suggests, complex numbers have more structure to them than real numbers. The complex numbers don’t live on the real number line and so are not a subset of the real numbers, but instead, they have a two-dimensional structure, which we’ll explain in a moment. We denote the set of complex numbers by the symbol $\mathbb{C}$.

Sometimes, there are very specific occasions when we may want to refer to other subsets of the real numbers. Other common symbols you may encounter are $\mathbb{Q}$, for the set of rational numbers, and a symbol such as $\mathbb{Z}_2$ or $\mathbb{B}$ for the two-element set $\{0, 1\}$. The latter you may encounter when we talk about modeling binary discrete target variables or working with binary features.

Numbers such as 4.6 are specific instances of a real number. When we are talking about algorithms or code, we will want to talk about variables, in which case we use a symbol such as $x$ to represent a number, which could take on a range of different values depending on what we do with it. But what could that range be? When we are documenting an algorithm, we may want to tell the reader that $x$ will always be a real number. We do that by writing $x \in \mathbb{R}$, which is mathematical language for “$x$ is in the set of real numbers,” or more succinctly, “$x$ is real.”

Likewise, if we wanted to say $x$ was always a positive integer, then we would write $x \in \mathbb{Z}^{+}$. Or, if we wanted to say $x$ was a complex number, we would write $x \in \mathbb{C}$.

When we have several variables that all have similar properties or that may be related in some way – for example, they represent different features of a data point in a training set – then we use subscripts to denote the different variables. For example, we would use $x_1, x_2, x_3$ to represent three features of a dataset. Just as with the single variable $x$, if we want to say that those three features will always contain real numbers, then we would write $x_1, x_2, x_3 \in \mathbb{R}$.

Complex numbers

If the real numbers live on the one-dimensional structure that is the real number line, this raises the question of whether we can have numbers that live in a two-dimensional space. Complex numbers are such numbers. A complex number, $z$, has two components or parts. These are a real part, $x$, and an imaginary part, $y$, with both being real numbers. The real and imaginary parts are combined, and we write the complex number as follows:

$$z = x + iy$$

Eq. 1

The symbol $i$ has a special meaning. It is in fact the square root of -1, so that $i^2 = -1$. We can think of the pair of numbers, $(x, y)$, as picking out a point in a 2D plane. That plane is the complex plane, sometimes also called the Argand plane. Figure 1.2 shows the point $z$ in the complex plane:

Figure 1.2: The complex number plane

The position of $z$ along the x-axis is given by the real part of $z$, while the position of $z$ along the y-axis is given by the imaginary part of $z$. We also use Re to denote the real part of $z$, and Im to denote the imaginary part of $z$, so that we have the following:

$$\mathrm{Re}\, z = x, \qquad \mathrm{Im}\, z = y$$

Eq. 2

Consequently, we have used Re and Im to label the axes of the complex plane in Figure 1.2.

A number that has $y = 0$ sits entirely on the x-axis and is a purely real number. Likewise, a complex number that has $x = 0$ sits entirely on the y-axis and is a purely imaginary number.

Just as with other 2D planes, we can represent a point in the complex plane not just with Cartesian coordinates but with polar coordinates as well. This is also illustrated in Figure 1.2. A quick bit of high-school trigonometry gives us the following:

$$x = |z| \cos\theta, \qquad y = |z| \sin\theta$$

Eq. 3

The symbol $|z|$ denotes the modulus of $z$ and is the same as the distance of the point $z$ from the origin in Figure 1.2. Looking at Figure 1.2 and using Pythagoras’ theorem, we can calculate $|z|$ using the following:

$$|z| = \sqrt{x^2 + y^2}$$

Eq. 4

The angle $\theta$ is conventionally measured in a counterclockwise direction and in radians, so that a point on the positive y-axis would have $\theta = \pi/2$ (remember that $\pi$ radians is 180°). Euler’s formula is as follows:

$$e^{i\theta} = \cos\theta + i \sin\theta$$

Eq. 5

This means we can also write $z$ in the following form:

$$z = |z| e^{i\theta}$$

Eq. 6

This last form for writing a complex number will be useful when we introduce Fourier transforms, which are used to represent functions as a sum of sine and cosine waves. In fact, this is our main reason for introducing complex numbers.

One important concept relating to the complex number $z$ is that of its complex conjugate. The complex conjugate of $z$ we will denote by $z^{*}$. Sometimes, the symbol $\bar{z}$ is used instead. The complex conjugate is related to $z$ by flipping the sign of the imaginary part of $z$. So, if $z = x + iy$, then $z^{*} = x - iy$. In Figure 1.3, this is shown by simply reflecting $z$ in the x-axis. A useful relation that follows is the following:

$$z z^{*} = x^2 + y^2 = |z|^2$$

Eq. 7

Figure 1.3: The complex conjugate

The integers, real numbers, and complex numbers represent the overwhelming majority of the numbers we will meet throughout this book, so this is a good place to end our recap of number systems.
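As a minimal sketch, the relationships between the Cartesian form, the polar form, and the complex conjugate can be checked with Python's built-in complex type and the standard cmath module. The value 3 + 4i is an arbitrary choice made purely for illustration.

```python
import cmath
import math

# Hypothetical example value: z = x + iy with x = 3 and y = 4
z = complex(3.0, 4.0)

x, y = z.real, z.imag          # real and imaginary parts
modulus = abs(z)               # distance from the origin, sqrt(x^2 + y^2)
theta = cmath.phase(z)         # angle in radians, measured counterclockwise

# Polar form: modulus * exp(i * theta) should reproduce z
z_polar = modulus * cmath.exp(1j * theta)

# Complex conjugate: flip the sign of the imaginary part
z_conj = z.conjugate()

print(modulus)                                        # 5.0
print(z_conj)                                         # (3-4j)
print(cmath.isclose(z, z_polar))                      # True
print(math.isclose((z * z_conj).real, modulus ** 2))  # True
```

Python writes the imaginary unit as j rather than i, which is only a difference in notation.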

Let’s summarize what we learned.

What we learned

In this section, we have learned the following:

The notation $\mathbb{R}$, for describing the real numbers
The notation $\mathbb{Z}$, for describing the integer numbers
The notations $\mathbb{Z}^{+}$ and $\mathbb{N}$, for describing the strictly positive integers, also known as the natural numbers
A symbol for describing the binary set $\{0, 1\}$
The notation $\mathbb{C}$, for describing the complex numbers
How complex numbers have a real and an imaginary part
How a complex number can also be described in terms of a modulus, $|z|$, and a phase, $\theta$
How to calculate the complex conjugate of a complex number

In the next section, having learned how to describe both real and complex numbers, we move on to how to describe collections of numbers (vectors) and how to describe mathematical objects (matrices) that transform those vectors.

Linear algebra

In this section, we introduce notation to describe vectors and matrices, which are key mathematical objects that we will encounter again and again throughout this book.

Vectors

In many circumstances, we will want to represent a set of numbers together. For example, the numbers 7.3 and 1.2 might represent the values of two features that correspond to a data point in a training set. We often group these numbers together in brackets and write them as (7.3, 1.2) or [7.3, 1.2]. Because of the similarity to the way we write spatial coordinates, we tend to call a collection of numbers that are held together a vector. A vector can be two-dimensional, as in the example just given, or d-dimensional, meaning it contains d components, and so might look like $(x_1, x_2, \ldots, x_d)$.

We can write a vector in two ways. We can write it as a row vector, going across the page, such as the following vector:

$$(x_1, x_2, \ldots, x_d)$$

Eq. 8

Alternatively, we can write it as a column vector going down the page, such as the following vector:

$$\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{pmatrix}$$

Eq. 9

We can convert between a row vector and a column vector (and vice versa) using the transpose operator, denoted by a $\top$ superscript. So, the transpose of a row vector is a column vector. See the following example:

$$(x_1, x_2, \ldots, x_d)^{\top} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{pmatrix}$$

Eq. 10

And vice versa in the following example:

$$\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{pmatrix}^{\top} = (x_1, x_2, \ldots, x_d)$$

Eq. 11

Symbolically, we often write a vector using a boldface font – for example, $\mathbf{x}$ would mean a vector. Sometimes, we use an underline to denote a vector, so you may also see $\underline{x}$. Throughout this book, I will use an underline to denote a vector. This will make it clear when I am talking about a vector.
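The row and column forms, and the transpose that converts between them, can be sketched with NumPy. This is an illustrative sketch only; it assumes NumPy is installed and represents each vector as a two-dimensional array so that the row and column shapes are explicit.

```python
import numpy as np

# The two-feature data point used in the text: (7.3, 1.2)
row = np.array([[7.3, 1.2]])        # shape (1, 2): a row vector
col = row.T                         # shape (2, 1): its transpose, a column vector

print(row.shape, col.shape)         # (1, 2) (2, 1)
print(np.array_equal(col.T, row))   # True: transposing twice returns the row vector
```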

Matrices

Usually, we will want to transform a vector more than just transposing it. Linear transformations of vectors can be done with matrices. We will cover such transformations in Chapter 3, but for now, we will just show how we write a matrix. A matrix is a two-dimensional array. For example, the following array is a matrix:

Eq. 12

We have used a double underline to denote the matrix . Note that a matrix has a double underline because it is a two-dimensional structure, while we use a single underline for a vector, which is a one-dimensional structure.

Because a matrix is a two-dimensional structure, we use two numbers to describe its size: the number of rows and the number of columns. If a matrix has R rows and C columns, we describe it as an R x C matrix. The matrix in Eq. 12 is a 3 x 4 matrix.

We pick out individual parts of a matrix by referring to a matrix element. The symbol or refers to the number that is in the position of the ith row and jth column. So, for the matrix in Eq. 12, .

The matrix elements in the previous example are all integers. This need not be the case. A matrix element could be any real number. It can also be a complex number. If all the matrix elements are real, we say it is a real matrix, while if any of the matrix elements are complex, then we say the matrix is complex.
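To make the size and element notation concrete, here is a small NumPy sketch. The matrix M below is a hypothetical 3 x 4 integer matrix chosen for illustration; it is not the matrix from Eq. 12.

```python
import numpy as np

# Hypothetical 3 x 4 integer matrix (an assumption, not the matrix in Eq. 12)
M = np.array([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
])

n_rows, n_cols = M.shape            # number of rows R and number of columns C

# Math notation counts rows and columns from 1, while NumPy counts from 0,
# so the element in row i = 2 and column j = 3 is M[1, 2].
i, j = 2, 3
element = M[i - 1, j - 1]

print(n_rows, n_cols)               # 3 4
print(element)                      # 7
```

The off-by-one shift between mathematical indices and array indices is a common source of bugs when translating formulas into code.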

That short recap on notation for vectors and matrices is enough for now. We will meet vectors and matrices again in Chapter 3, but for now, let’s summarize what we have learned about them.

What we learned

In this section, we have learned about the following:

How to represent a vector as a collection of multiple components (numbers)
Row vectors and column vectors and how they are related to each other via the transpose operator
How a matrix is a two-dimensional collection of components (numbers) and how row and column indices are used to pick out individual components or matrix elements

In the next section, now we have learned about various notations for individual numbers and collections of them, we move on to notation for performing operations on them. We start with the simplest operations – adding numbers together, multiplying numbers together, and taking logarithms.

Sums, products, and logarithms

In this section, we introduce notation for doing the most basic operations we can do with numbers, namely adding them together or multiplying them together. We’ll then introduce notation for working with logarithms.

Sums and the 𝚺 notation

When we want to add several numbers together, we can use the summation, or Σ, notation. For example, if we want to represent the addition of the numbers $x_1, x_2, x_3, x_4, x_5$, we use the Σ notation to write this as follows:

$$\sum_{i=1}^{5} x_i$$

Eq. 13

This notation is shorthand for writing $x_1 + x_2 + x_3 + x_4 + x_5$. This essentially defines what the Σ notation represents – that is, the following:

$$\sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5$$

Eq. 14

In the left-hand side (LHS) of Eq. 14, the integer indexing variable, $i$, takes the values between 1 (indicated beneath the Σ symbol) and 5 (indicated above the Σ symbol) and we interpret the LHS as “take all the numbers $x_i$ for the values of $i$ indicated by the Σ symbol and add them together.”

You may wonder whether the shorthand notation on the LHS of Eq. 14 is of any use. After all, the right-hand side (RHS) isn’t very long. However, when we want to represent the adding up of lots of numbers, then the notation really comes into its own. For example, if we want to add up the numbers up to , then we use the notation to write this compactly, as follows:

Eq. 15

Sometimes, we will use the Σ notation to add together a set of numbers where the size of the set (the number of numbers being added together) is variable. For example, see the following notation:

$$\sum_{i=1}^{i=N} x_i$$

Eq. 16

This means “add together the N numbers, $x_1, x_2, \ldots, x_N$.” Clearly, we would get a different result for different choices of $N$. This means the expression given in Eq. 16 is a function of $N$.

Sometimes, you may see variants of the expression in the previous equation. Sometimes, a person may omit the upper value of $i$ or both the lower and upper values in the Σ notation because it is taken as understood what the values should naturally be. For example, you may see the following:

$$\sum_{i} x_i$$

Eq. 17

This usually means “add up all values of $x_i$ in the problem we are analyzing.” Similarly, a sum written with only the lower limit, such as $\sum_{i=1} x_i$, or with no limits at all, such as $\sum x_i$, means the same thing.

Finally, note that when writing sums using the Σ notation, we haven’t said where the values of $x_i$ come from. We could in fact use the Σ notation to add up the values we get after we have applied a function $f$ to the values $x_i$. In this case, we would write the following:

$$\sum_{i=1}^{N} f(x_i) = f(x_1) + f(x_2) + \cdots + f(x_N)$$

Eq. 18

The LHS of Eq. 18 is the Σ notation way of writing the RHS. The example in Eq. 19 makes this clearer. If we set $N = 5$ so we had five numbers, $x_1, x_2, x_3, x_4, x_5$, and we want to apply the sine function to these five numbers and add them up, then we would write the following:

$$\sum_{i=1}^{5} \sin(x_i) = \sin(x_1) + \sin(x_2) + \sin(x_3) + \sin(x_4) + \sin(x_5)$$

Eq. 19

It is also worth pointing out that we can use the Σ notation to add numbers that are simple functions of the index variable $i$. For example, using the Σ notation, we can write the sum of the first 100 squares as follows:

$$\sum_{i=1}^{100} i^2$$

Eq. 20

This is obviously shorthand notation for $1^2 + 2^2 + 3^2 + \cdots + 100^2$.
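The Σ notation maps directly onto Python's built-in sum function. The sketch below uses five hypothetical values for the numbers being added; only the sum of the first 100 squares comes straight from the text.

```python
import math

# Hypothetical values for the five numbers x_1, ..., x_5
x = [0.5, 1.0, 1.5, 2.0, 2.5]

total = sum(x)                                        # add up x_i for i = 1..5
sum_of_sines = sum(math.sin(x_i) for x_i in x)        # apply sine first, then add
sum_of_squares = sum(i ** 2 for i in range(1, 101))   # 1^2 + 2^2 + ... + 100^2

print(total)            # 7.5
print(sum_of_squares)   # 338350
```

Note that range(1, 101) stops at 100, because the upper bound of a Python range is exclusive, whereas the upper limit written above Σ is inclusive.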

Products and the Π notation

Having introduced the Σ notation and explained it at length, we can now introduce the complementary idea of a concise, shorthand notation for multiplying lots of numbers together. We do this with the Π or product notation. If we want to multiply $x_1, x_2, x_3, x_4, x_5$ together, we can write this as follows:

$$\prod_{i=1}^{5} x_i$$

Eq. 21

As with the Σ notation, we can use the Π notation more generally. For example, we can write $\prod_{i=1}^{N} x_i$ as shorthand for $x_1 \times x_2 \times \cdots \times x_N$. Again, we can use the product notation as shorthand for multiplying function values together, as follows:

$$\prod_{i=1}^{N} f(x_i) = f(x_1) \times f(x_2) \times \cdots \times f(x_N)$$

Eq. 22
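In Python, the counterpart of the Π notation is math.prod from the standard library (available from Python 3.8). The values and the choice of the exponential as the applied function are assumptions made only for this sketch.

```python
import math

# Hypothetical values for the five numbers x_1, ..., x_5
x = [1.0, 2.0, 3.0, 4.0, 5.0]

product = math.prod(x)                                  # multiply the x_i together
product_of_f = math.prod(math.exp(x_i) for x_i in x)    # multiply the values exp(x_i)

print(product)                                          # 120.0
# Multiplying exponentials adds their exponents, so this equals exp(1+2+3+4+5)
print(math.isclose(product_of_f, math.exp(sum(x))))     # True
```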

Logarithms

Logarithms are extremely useful for describing how quickly a quantity or function grows. In particular, the logarithm tells us the exponent that describes the rate of growth of a quantity or function. Let’s make that more explicit. The logarithm to base $b$ of the number $b^x$ is $x$. Mathematically, we write this as follows:

$$\log_b\left(b^x\right) = x$$

Eq. 23

The symbol $\log_b$ is shorthand for taking the logarithm to base $b$. This shorthand is so common that even in the text, I will use the word log when I mean logarithm. It is also not uncommon to omit the brackets in the previous equation and write $\log_b b^x$. The most common bases we use for taking logarithms are base $e$, base 10, and base 2. Of these, base $e$ is so commonly used that we use a different symbol, $\ln$, when taking the log. So, in effect, this means $\ln \equiv \log_e$. This symbol means the natural logarithm or natural log, to denote the fact that taking the log to base $e$ is the most natural or common thing to do. Because taking the natural log is so common or natural, most mathematicians don’t really consider taking the log to any other base, and so by default, we use the symbol $\log$ to mean $\log_e$. Watch out for this. If you see the symbol $\log$ without a base specified, then it either means the base is not important – for example, the proof of the mathematical statement does not depend upon the base – or base $e$ is implicitly meant. This is also the case in most computer programming languages. Applying the log operator will return the natural logarithm. For example, in Python, if we use the numpy.log(y) NumPy function, we will get the natural logarithm of $y$ returned.

We can see from Eq. 23 that the logarithm does in fact tell us the exponent (in base $b$) of the number we are taking the log of. So, if $y = b^x$, then $\log_b(y) = x$. Because taking the log effectively gives us an exponent value, the logarithm of a number is typically much smaller than the number itself. More importantly, it also means that the logarithm function is monotonic, so that, for a base greater than 1, $\log_b(y)$ increases as $y$ increases. The word “monotonic” means “of one tone” or “of one direction,” and so it means either only going up (monotonically increasing) or only going down (monotonically decreasing). This is shown in Figure 1.4, which shows the natural logarithm function from which we can see the value of $\ln(y)$ increasing as $y$ gets bigger:

Figure 1.4: Graph of the natural logarithm function
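The behavior of numpy.log mentioned above can be checked with a short sketch. It assumes NumPy is installed; the value 8 and the grid of test values are arbitrary choices for illustration.

```python
import numpy as np

y = 8.0                          # hypothetical value

natural_log = np.log(y)          # natural log (base e), which is what numpy.log returns
log_base_2 = np.log2(y)          # log to base 2
log_base_10 = np.log10(y)        # log to base 10

# Raising each base to the corresponding log recovers the original number
print(np.isclose(np.e ** natural_log, y))    # True
print(np.isclose(2.0 ** log_base_2, y))      # True
print(np.isclose(10.0 ** log_base_10, y))    # True

# The log is monotonically increasing: larger inputs give larger logs
values = np.array([0.5, 1.0, 2.0, 10.0, 100.0])
print(np.all(np.diff(np.log(values)) > 0))   # True
```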

An important consequence of the monotonically increasing nature of the logarithm function is that if we have a positive function $f(x)$ and we want to find the value of $x$ where $f(x)$ has its highest (or maximum) value, then that maximizing value of $x$, let’s call it $x_{\max}$, is also the point where $\log f(x)$ has its maximal value. In mathematical notation, we can write this fact as follows:

$$x_{\max} = \arg\max_{x} f(x) = \arg\max_{x} \log f(x)$$

Eq. 24

We will refer to this again in a moment.

There are well-known rules for taking logarithms of reciprocals, products, and ratios. These are (for any base):

$$\log\left(\frac{1}{x}\right) = -\log x$$

Eq. 25

And the following:

$$\log(xy) = \log x + \log y$$

Eq. 26

Combining these two rules, we get the rule for taking the log of a ratio:

$$\log\left(\frac{x}{y}\right) = \log x - \log y$$

Eq. 27

The rule for taking the log of a product is particularly useful when we have a product formed from many numbers. Using the Π and Σ notations we introduced earlier, we can write the following:

$$\log\left(\prod_{i=1}^{N} x_i\right) = \sum_{i=1}^{N} \log x_i$$

Eq. 28

This, in conjunction with the fact that taking the log is a monotonic transformation, will be very useful to us when we start to use the concept of maximum likelihood to build probabilistic models in Chapter 5.
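Both facts, that the log of a product is a sum of logs and that taking the log does not move the location of a maximum, can be checked numerically. The numbers and the bell-shaped function below are hypothetical choices for this sketch, not examples from the book.

```python
import math

# Hypothetical positive numbers x_1, ..., x_4
x = [0.2, 0.5, 0.9, 0.4]

log_of_product = math.log(math.prod(x))
sum_of_logs = sum(math.log(x_i) for x_i in x)
print(math.isclose(log_of_product, sum_of_logs))    # True

# Hypothetical positive function f(x) = exp(-(x - 2)^2), evaluated on a grid
grid = [k / 100 for k in range(0, 401)]
f_values = [math.exp(-(g - 2.0) ** 2) for g in grid]
log_f_values = [math.log(v) for v in f_values]

# Index of the largest value of f, and of log f
argmax_f = max(range(len(grid)), key=lambda k: f_values[k])
argmax_log_f = max(range(len(grid)), key=lambda k: log_f_values[k])

print(grid[argmax_f], grid[argmax_log_f])           # 2.0 2.0
```

Working with the sum of logs rather than the product is also numerically safer, because a product of many numbers smaller than 1 can underflow to zero in floating-point arithmetic.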

We will make lots of use of sums, products, and logarithms throughout this book, but we have all the notation we need to work with them, so let’s summarize what we have learned about that notation.

What we learned

In this section, we have learned about the following:

The Σ notation for adding lots of numbers together
The Π notation for multiplying lots of numbers together
How we can also use the Σ and Π notations when we have a function, $f$, applied to our numbers, $x_i$
How a logarithm function transforms a number into an exponent
How a logarithm function is defined with respect to a specific base
How base $e$ is the most commonly used base for taking logarithms, and the corresponding logarithm function is called the natural logarithm and is often denoted as $\ln$
How to take logarithms of products of numbers and ratios of numbers

In the next section, we will stick with functions applied to numbers, but we’ll learn how to describe how fast a function changes as we change $x$, using the notation of differential calculus. We will also learn how to describe the area under a function using the notation of integral calculus.

Differential and integral calculus

In this section, we won’t go into the fundamentals of differential calculus, but instead just recap some basic results and notation. Therefore, we are assuming you already have some basic familiarity with differentiation and integration.

Differentiation

Let’s start with what the derivative of a function or curve intuitively represents. An example curve, $f(x)$, is shown in Figure 1.5. The derivative of this function is denoted by the following symbol:

$$\frac{df}{dx}$$

Eq. 29

The derivative of $f(x)$ is itself a function of $x$. The numerical value of the derivative evaluated at a particular value of $x$, let’s say at $x = x_0$, is the gradient (or slope) of the tangent to the curve at $x_0$. As such, we can think of the derivative as defining the local gradient value of the curve. This is illustrated in Figure 1.5 as well:

Figure 1.5: The derivative as the gradient of the tangent to the curve

Sometimes, when we want to be explicit about a particular point on the curve where we are evaluating the derivative, then we will use the notation $\left.\frac{df}{dx}\right|_{x = x_0}$, or more simply, $\left.\frac{df}{dx}\right|_{x_0}$. This means the derivative function is evaluated at $x_0$, while the notation $\frac{df}{dx}$ tends to get used to mean the derivative function in general.

Clearly, if $\frac{df}{dx}$ is a function of $x$, then we can calculate its derivative just like we could calculate the derivative of the function $f(x)$. Using the derivative notation, this second derivative is written as follows:

$$\frac{d^2 f}{dx^2} = \frac{d}{dx}\left(\frac{df}{dx}\right)$$

Eq. 30

What does this second derivative represent? Well, it is the gradient of the curve of $\frac{df}{dx}$ – that is, the rate of change of the rate of change of $f(x)$. Similarly, we can calculate, if we wish, higher derivatives of $f(x)$ – for example, the third derivative denoted by $\frac{d^3 f}{dx^3}$, or the fourth derivative denoted by $\frac{d^4 f}{dx^4}$.

Due to the history of how differential calculus was developed, we have a second, commonly used notation for the first and second derivatives etc. of a function. In this second notation form, the first derivative of the function $f(x)$ is written as $f'(x)$, or simply $f'$. So, $f'(x) = \frac{df}{dx}$. As you might have guessed, the second and third derivatives are written as $f''(x)$ and $f'''(x)$ in this new notation. Obviously, when we get to very high order derivatives – for example, fifth, sixth, and so on – writing all those apostrophes in the superscript becomes a bit clunky, so we also use the notation $f^{(n)}(x)$ to denote the $n$th derivative, with $f^{(0)}(x)$ meaning the function itself. Consequently, the notations $\frac{df}{dx}$, $f'(x)$, and $f^{(1)}(x)$ all mean the same thing – the first derivative – and we often will use them interchangeably in the same document, proof, or explanation.

Now, let’s recap how to calculate the derivative of some common functions we’re likely to encounter. The derivative of a linear function is straightforward and intuitive – it is the gradient of that linear function. So, more explicitly, the derivative of a constant is zero and, for constants $a$ and $b$, we have the following:

$$\frac{d}{dx}\left(ax + b\right) = a$$

Eq. 31

The derivative of a power is calculated as follows:

$$\frac{d}{dx} x^n = n x^{n-1}$$

Eq. 32

The derivative of the exponential function is itself, as follows:

$$\frac{d}{dx} e^x = e^x$$

Eq. 33

The derivative of the natural logarithm function $\ln x$ is calculated as follows:

$$\frac{d}{dx} \ln x = \frac{1}{x}$$

Eq. 34
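These standard results can be sanity-checked numerically by comparing each formula with the slope of a very short chord through the curve. The helper central_difference, the step size, and the evaluation point are all assumptions made for this sketch; it is a check of the formulas, not a method taken from the book.

```python
import math

def central_difference(f, x0, h=1e-6):
    """Approximate the derivative of f at x0 by the slope of a short chord around x0."""
    return (f(x0 + h) - f(x0 - h)) / (2.0 * h)

x0 = 1.5   # hypothetical evaluation point

# Power rule with n = 3: the derivative of x^3 is 3 x^2
power_ok = math.isclose(central_difference(lambda x: x ** 3, x0), 3 * x0 ** 2, rel_tol=1e-6)

# The exponential function is its own derivative
exp_ok = math.isclose(central_difference(math.exp, x0), math.exp(x0), rel_tol=1e-6)

# The derivative of the natural log is 1 / x
log_ok = math.isclose(central_difference(math.log, x0), 1.0 / x0, rel_tol=1e-6)

print(power_ok, exp_ok, log_ok)   # True True True
```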

For calculating the derivative of more complicated functions, we typically make use of the “chain rule,” which is a rule for how to calculate the derivative of a composite function. If we have a composite function $f(g(x))$, then the chain rule says the following:

$$\frac{d}{dx} f(g(x)) = \frac{df}{dg} \, \frac{dg}{dx}$$

Eq. 35

Let’s illustrate that with an explicit example. Imagine I have the function . Using the chain rule and the results for the derivatives of the natural logarithm function and for a power of , we have the following:

Eq. 36

Similarly, if we have the function , then the chain rule and the derivatives of exponential and linear functions combine to give us the following:

Eq. 37
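To see the chain rule at work in code, here is a small sketch. The composite function $\ln(1 + x^2)$ is an assumption chosen for illustration and is not necessarily the example used in the text; the names `inner`, `outer_prime`, and `chain_rule_derivative` are likewise hypothetical. The chain rule multiplies the derivative of the outer function, evaluated at the inner function, by the derivative of the inner function, and the result is compared with a finite-difference estimate.

```python
import math


def numerical_derivative(f, x, h=1e-6):
    """Central-difference estimate of the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Hypothetical composite function: outer function ln(u), inner function 1 + x**2.
def inner(x):
    return 1.0 + x**2


def inner_prime(x):
    return 2.0 * x  # power rule applied to the inner function


def outer_prime(u):
    return 1.0 / u  # derivative of ln(u)


def composite(x):
    return math.log(inner(x))


def chain_rule_derivative(x):
    # Outer derivative evaluated at the inner function, times the inner derivative.
    return outer_prime(inner(x)) * inner_prime(x)


x0 = 0.7
numeric = numerical_derivative(composite, x0)
analytic = chain_rule_derivative(x0)
print(f"finite difference: {numeric:.6f}, chain rule: {analytic:.6f}")
```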

Finally, we come to the situation where our function may depend upon more than just one variable. For example, what if our function is a function of $x$ and $y$ so that $f = f(x, y)$? What does the derivative mean or represent now? One way to look at it is to ask, what if we just kept $y$ constant and calculated the derivative with respect to $x$? To do this, we just apply the simple rules we have outlined previously. However, that calculation gives us the gradient of the function in only the $x$ direction. It gives us only partial information about how the function is changing, so we call this the partial derivative of $f$ with respect to $x$. We use a slightly different symbol to denote a partial derivative, namely $\frac{\partial f}{\partial x}$. Just as we can calculate a partial derivative of $f$ with respect to $x$, we can calculate a partial derivative of $f$ with respect to $y$ by holding $x$ constant and applying the rules of one-dimensional differential calculus. As you may have guessed, we use the symbol $\frac{\partial f}{\partial y}$ for this partial derivative with respect to $y$.
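A partial derivative can be estimated numerically by nudging one variable while holding the other fixed. The sketch below assumes a hypothetical two-variable function $x^2 y + e^y$, chosen only for illustration; the helper names `partial_x` and `partial_y` are not from the text.

```python
import math


def f(x, y):
    # Hypothetical two-variable function used only for illustration.
    return x**2 * y + math.exp(y)


def partial_x(func, x, y, h=1e-6):
    """Estimate the partial derivative with respect to x, holding y constant."""
    return (func(x + h, y) - func(x - h, y)) / (2.0 * h)


def partial_y(func, x, y, h=1e-6):
    """Estimate the partial derivative with respect to y, holding x constant."""
    return (func(x, y + h) - func(x, y - h)) / (2.0 * h)


x0, y0 = 1.2, 0.5
df_dx = partial_x(f, x0, y0)  # one-dimensional rules give 2*x*y
df_dy = partial_y(f, x0, y0)  # one-dimensional rules give x**2 + exp(y)

print(f"df/dx: numerical = {df_dx:.6f}, analytic = {2 * x0 * y0:.6f}")
print(f"df/dy: numerical = {df_dy:.6f}, analytic = {x0**2 + math.exp(y0):.6f}")
```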

Finding maxima and minima

A common task we will want to carry out is to find the maximum value or the minimum value of a function. For example, we may want to find the model parameter values that have the smallest (minimum) error on a training dataset. In these cases, even if we can’t find the absolute best (global minimum error) parameters, a local minimum may still be useful to us, as this would represent the best model parameters in a limited but relevant region of the model parameter space.

Finding the maximum value of a function and finding the minimum value of a function are closely related tasks because if we find the maximum value of the function $f(x)$, then we have found the minimum value of the function $-f(x)$. So, from now on, we will largely discuss only how to find the maximum value of a function.

To help find the maximum value of a function, we make use of the differential calculus we have just recapped. If we look at Figure 1.6, we can see that at the maximum point of the function shown, the gradient is zero:

Figure 1.6: A function with a single maximum

Now, since the gradient is given by the function $f'(x)$, we just have to find the point where $f'(x) = 0$. That is, we solve the equation $f'(x) = 0$ for values of $x$. Clearly, from Figure 1.6, in this example, there is only one solution to this equation in the region shown, and it occurs at the peak of the curve. For other functions, we may have multiple solutions to the equation $f'(x) = 0$. Solutions to this equation are called stationary points. Why? Well, the gradient is zero, so the function isn’t changing much in the region around the stationary point, so the function is effectively “stationary.”
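When the equation for a stationary point cannot be solved by hand, it can be solved numerically. The following is a minimal sketch using bisection on the derivative. The function $x e^{-x}$ is a hypothetical example with a single maximum and is not the function plotted in Figure 1.6; the helper `bisect_root` and the search interval are also assumptions.

```python
import math


def f_prime(x):
    # Derivative of the hypothetical function f(x) = x * exp(-x).
    return (1.0 - x) * math.exp(-x)


def bisect_root(func, lo, hi, tol=1e-10):
    """Find a root of func in [lo, hi] by repeatedly halving the interval."""
    if func(lo) * func(hi) > 0:
        raise ValueError("func must change sign between lo and hi")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if func(lo) * func(mid) <= 0:
            hi = mid  # the sign change is in the left half
        else:
            lo = mid  # the sign change is in the right half
    return 0.5 * (lo + hi)


# The gradient is positive at x = 0 and negative at x = 3, so a stationary
# point must lie somewhere in between.
stationary_x = bisect_root(f_prime, 0.0, 3.0)
print(f"stationary point at x = {stationary_x:.6f}")
```

For this particular function the gradient vanishes where $1 - x = 0$, so the numerical answer can be checked against $x = 1$.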

The maximum value of a function is not always a stationary point. Look at Figure 1.7:

Figure 1.7: A function with a minimum and a maximum

There is a stationary point where the curve peaks. It is clearly not the highest value of the function. It is a maximum of the function, but only in the small region around that point, so we refer to it as being a local maximum. It is not the global maximum of the function. However, this local maximum may still be useful to us.

Notice in Figure 1.7 that we have another stationary point, at the bottom of the dip in the curve. It is a local minimum, but not the global minimum. So, if we are interested in finding maxima of a function, even if they are just local maxima, how can we do this if finding solutions to $f'(x) = 0$ can’t distinguish between a maximum and a minimum? Well, take a closer look at Figure 1.7. To the left of the (local) maximum, the gradient is positive, while to the right of the maximum, the gradient is negative. So, the gradient is decreasing in the region around this stationary point. That is, the second derivative is negative at the local maximum. Conversely, if we look at the gradient in the region around the local minimum, the gradient is increasing, and the second derivative is positive there. This means the second derivative gives us a means of distinguishing maxima from minima. Putting this together, we have the following:

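$$f'(x) = 0 \ \text{and} \ f''(x) < 0 \implies \text{local maximum at } x$$

Eq. 38

$$f'(x) = 0 \ \text{and} \ f''(x) > 0 \implies \text{local minimum at } x$$

Eq. 39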
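The second-derivative test translates directly into code. The sketch below assumes a hypothetical cubic, $x^3 - 3x$, whose stationary points at $x = -1$ and $x = 1$ can be found by hand; it is not the function plotted in Figure 1.7, and the function name `classify_stationary_point` is an illustrative choice.

```python
def f_prime(x):
    # First derivative of the hypothetical function f(x) = x**3 - 3*x.
    return 3.0 * x**2 - 3.0


def f_double_prime(x):
    # Second derivative of the same function.
    return 6.0 * x


def classify_stationary_point(x, tol=1e-9):
    """Classify x using the sign of the second derivative."""
    if abs(f_prime(x)) > tol:
        return "not a stationary point"
    curvature = f_double_prime(x)
    if curvature < 0:
        return "local maximum"
    if curvature > 0:
        return "local minimum"
    return "inconclusive (second derivative is zero)"


# The stationary points solve 3*x**2 - 3 = 0, that is, x = -1 and x = 1.
results = {x: classify_stationary_point(x) for x in (-1.0, 1.0)}
for x, label in results.items():
    print(f"x = {x}: {label}")
```

Note the final branch: when the second derivative is also zero, this test alone cannot decide, and the function has to be examined more closely around that point.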

Finally, let’s return to our actual goal – to find the maximal (highest) value of a function. Let’s say we are interested only in the region of $x$ shown in Figure 1.7. Just from a visual inspection of the figure, we can see the global maximum is at the boundary of that region and is not even a stationary point. This emphasizes that while a stationary point with $f''(x) < 0$ is at least a local maximum, it may not be a global maximum – and a global maximum may not be a stationary point.
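In practice, this means that a search for the highest value over a bounded region should compare the stationary points with the boundary values. Here is a minimal sketch, again using the hypothetical cubic $x^3 - 3x$ and an assumed region from $-2$ to $3$; neither is taken from Figure 1.7.

```python
def f(x):
    # Hypothetical function with a local maximum at x = -1 and a local minimum at x = 1.
    return x**3 - 3.0 * x


# Assumed region of interest, chosen only for illustration.
lower, upper = -2.0, 3.0

# Candidates for the global maximum: both boundaries plus any stationary
# points that fall inside the region.
stationary_points = [-1.0, 1.0]
candidates = [lower, upper] + [p for p in stationary_points if lower <= p <= upper]

best_x = max(candidates, key=f)
for x in sorted(candidates):
    print(f"x = {x:5.1f}, f(x) = {f(x):6.1f}")
print(f"global maximum on [{lower}, {upper}] is at x = {best_x}")
```

Here the local maximum at $x = -1$ has value 2, but the boundary at $x = 3$ has value 18, so the global maximum sits on the boundary, where the gradient is not zero.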

Integration

Integral calculus is the counterpart to differential calculus, in that if we know that a function $f(x)$ is the derivative of another function, say, $g(x)$, then we can easily calculate the integral of $f(x)$.

But what does the integral of $f(x)$ represent? We write the integral as follows:

$$\int_a^b f(x)\,dx$$

Eq. 40

The similarity in shape between the integral symbol ∫ and the ∑ symbol used to denote summation is no coincidence. Integration is derived as a limit of a summation of lots of small contributions. We won’t go into that derivation here, other than to say the integral in the previous equation represents a summation of lots of values of the function $f(x)$ evaluated at different points between $a$ and $b$. The integral gives the area between $a$ and $b$ under the curve given by $f(x)$, as shown in Figure 1.8:

Figure 1.8: The integral as the area under the curve
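The idea of an integral as a sum of many small contributions can be sketched in a few lines. The example below is an illustration under assumptions, not the derivation skipped in the text: it uses the midpoint rule, the hypothetical integrand $x^2$, and the limits 0 and 1, for which the exact area is $1/3$.

```python
def riemann_sum(f, lower, upper, n=1000):
    """Approximate the area under f between lower and upper using n thin rectangles."""
    width = (upper - lower) / n
    # Evaluate f at the midpoint of each strip and add up height * width.
    return sum(f(lower + (i + 0.5) * width) for i in range(n)) * width


# Hypothetical integrand and limits, chosen because the exact area is known.
approx_area = riemann_sum(lambda x: x**2, 0.0, 1.0)
print(f"approximate area = {approx_area:.6f}, exact area = {1.0 / 3.0:.6f}")
```

Increasing `n` makes the strips thinner and the approximation closer to the exact area, which mirrors the limiting process that defines the integral.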

There are numerous situations where we want to calculate such a quantity. For example, we may need to calculate the total probability of a particular event happening over a range of values of a predictive feature. In this case, we would have to add up lots of small probability contributions that change as the value of the predictive feature changes.

Unfortunately, calculating integrals is less of a routine process than differentiating a function. Calculating integrals can be more of an art. Where we can spot that the function being integrated is the derivative of another function, we can easily calculate the integral. So, for example, if we know that $f(x) = \frac{dg}{dx}$, then,

$$\int_a^b f(x)\,dx = g(b) - g(a)$$

Eq. 41

This equation is called the fundamental theorem of calculus because it links integral calculus to differential calculus.

Frequently, you will see an integral written without explicit limits, such as $\int f(x)\,dx$. The result is a function, which is defined up to a constant. For the previous example, where $f(x) = \frac{dg}{dx}$, the integral without explicit limits would be written as follows:

$$\int f(x)\,dx = g(x) + c$$

Eq. 42

Once we have the function on the RHS, we can plug in explicit limits to get the result of the integral with limits. In this case, we would get the following:

$$\int_a^b f(x)\,dx = \left(g(b) + c\right) - \left(g(a) + c\right) = g(b) - g(a)$$

Eq. 43

Again, notice that we subtract the value at the lower limit. The constants cancel each other out and we recover the original definition of the integral with limits, as in Eq. 41.
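The link between the two forms of the integral can be checked numerically. This sketch assumes a hypothetical pair of functions, $\cos x$ and its antiderivative $\sin x$, and arbitrary limits; the constant of integration is passed in as `c` to show that it cancels when the limits are plugged in.

```python
import math


def f(x):
    # Hypothetical integrand.
    return math.cos(x)


def g(x, c=0.0):
    # An antiderivative of f, defined up to the constant c.
    return math.sin(x) + c


lower, upper = 0.2, 1.5  # assumed limits, chosen only for illustration

# Value at the upper limit minus value at the lower limit.
via_antiderivative = g(upper) - g(lower)

# The same calculation with a different constant: the constant cancels.
via_shifted = g(upper, c=7.0) - g(lower, c=7.0)

# Direct summation of many thin strips (midpoint rule) for comparison.
n = 10_000
width = (upper - lower) / n
via_summation = sum(f(lower + (i + 0.5) * width) for i in range(n)) * width

print(f"g(b) - g(a)          = {via_antiderivative:.8f}")
print(f"with constant c = 7  = {via_shifted:.8f}")
print(f"summation of strips  = {via_summation:.8f}")
```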

Often, we are not so lucky, and we won’t know that $f(x)$ is an exact derivative of some other function. So, calculating an integral can rely more upon having seen the integral before or looking up the integral in a big book of integrals – see point 1 in the Notes and further reading section at the end of the chapter. Consequently, we are not going to dwell upon techniques for evaluating integrals, and throughout the book, we will simply give the result of an integral when it is needed and only explain briefly, where possible, how it was calculated.

However, there are some integrals that we will encounter again and again throughout this book, so we’ll recap them here.

From our recap of differentiation, we know that . By making use of the fundamental theorem of calculus, we can work out the following:

Eq. 44

The other integral we will make use of a lot is related to the normal or Gaussian probability distribution, and we will encounter it a lot when we start building probabilistic predictive models. The integral result we refer to is as follows:

Eq. 45

This integral result