Mathematics of Machine Learning provides a rigorous yet accessible introduction to the mathematical underpinnings of machine learning, designed for engineers, developers, and data scientists ready to elevate their technical expertise. With this book, you’ll explore the core disciplines of linear algebra, calculus, and probability theory essential for mastering advanced machine learning concepts.
PhD mathematician turned ML engineer Tivadar Danka—known for an intuitive teaching style that has attracted 100k+ followers—guides you through complex concepts with clarity, providing the structure you need to deepen your theoretical knowledge and sharpen your ability to solve machine learning problems. Balancing theory with application, this book offers clear explanations of mathematical constructs and their direct relevance to machine learning tasks. Through practical Python examples, you’ll learn to implement and use these ideas in real-world scenarios, such as training machine learning models with gradient descent or working with vectors, matrices, and tensors.
By the end of this book, you’ll have gained the confidence to engage with advanced machine learning literature and tailor algorithms to meet specific project requirements.
You can read this e-book in Legimi apps or in any other app that supports the following format:
Page count: 817
Year of publication: 2025
Mathematics of Machine Learning
Master linear algebra, calculus, and probability for machine learning
Tivadar Danka
Mathematics of Machine Learning
Copyright © 2025 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Portfolio Director: Sunith Shetty
Relationship Lead: Tushar Gupta
Project Manager: Amit Ramadas
Content Engineer: Deepayan Bhattacharjee
Technical Editor: Kushal Sharma
Copy Editor: Safis Editing
Indexer: Hemangini Bari
Proofreader: Deepayan Bhattacharjee
Production Designer: Ganesh Bhadwalkar
Growth Leads: Merlyn M Shelley & Bhavesh Amin
Marketing Owner: Ankur Mulasi
First published: May 2025
Production reference: 1210525
Published by Packt Publishing Ltd., Grosvenor House, 11 St Paul’s Square, Birmingham, B3 1RB, UK.
ISBN 978-1-83702-787-3
www.packtpub.com
This book is dedicated to my mother, whom I lost while making this book.
Thanks, Mom! You are inside every line I write.
– Tivadar Danka
I met Tivadar during Covid. We were all stuck at home, unsure what to do with all the extra time, so we started talking about building something together.
I wanted to teach people Machine Learning. I had this idea about building a website that would ask random questions for people to answer. I wanted the site to do a hundred different things, but one thing was non-negotiable: I wanted people to leave feeling they had learned something different.
Tivadar was the answer to that.
Machine Learning is tough, and unfortunately, most educational content you find online suffers from chronic handwaving syndrome: overused buzzwords, skipped intuition, and explanations that leave you more confused than when you started.
At the time, Tivadar was already writing online about math. He wasn’t the only one, but he was different. He was taking seemingly mundane topics and telling stories around them that were surprisingly effective.
There wasn’t any handwaving or burying people under a mountain of theoretical ideas. The writing was different, sharp, and fresh.
I had never been excited about math before. I read every single one of Tivadar’s posts. I wasn’t just learning the rules, I was learning how to think. And, shockingly, I was entertained.
I had never seen that combination before.
I asked Tivadar to help me with the site, and he did – for a while – until he decided to move on to start writing this book. I remember telling him I understood, but I was secretly sad – really sad.
Today, I’m thrilled this happened the way it did.
Mathematics of Machine Learning is the inevitable consequence of those short posts that excited me about math for the first time. It’s not just the best book I’ve read on the subject, it’s the one I wish had existed when I started.
This book does something rare: it teaches you the math behind machine learning without boring you with vague concepts—or making you forget why you showed up in the first place.
The book is laser-focused on what you need and says nothing about what you don’t. The explanations are vintage Tivadar: sharp, detailed, and entertaining. You can’t just read or memorize them; you’ll understand them.
I’ve been reading this book since it was an idea and a bunch of notes and sketches. I’ve watched it grow from online posts to something polished and powerful. And I’ve learned a lot – not just about math, but about how to explain math.
I’ll leave you to it. You’re in for a treat. Enjoy the journey – I know I did.
Santiago Valdarrama, Founder of ml.school
Tivadar Danka is an independent thinker who believes that the truth value of any proposition is independent of the titles, awards, qualifications, and affiliations of the one asserting it. If you are looking for confirmation that you made a good purchase with this book, start at Chapter 1.
Yes, that’s really a reference to the first chapter in the author bio; that’s where the important part begins.
Matthew Kehoe earned a PhD in Computational Mathematics from the University of Illinois at Chicago, where he specialized in numerical partial differential equations and inverse electromagnetic scattering theory. Following two graduate internships with the National Science Foundation and several years in software development and technical consulting, he now serves as a senior researcher leading projects in radar signal processing, scientific machine learning, and natural language processing. In addition to his applied research, he maintains a strong interest in analytic number theory and has begun writing a book on computing zeros of the Riemann zeta function.
Shravan Patankar, PhD, is a researcher and software engineer with deep expertise in artificial intelligence, machine learning, and data science. He earned his PhD in mathematics from the University of Illinois at Chicago, where his research led to three peer-reviewed publications, including two in top mathematical journals and one in Scientific Reports (Nature) on COVID-19 death estimates. With a strong foundation in mathematical reasoning and statistical modeling, Shravan has applied these principles across both academic and industry contexts.
Professionally, he is a Software Engineer in AI/ML at KPIT Technologies, where he works on software-defined vehicles and contributes to the safety of autonomous driving systems. Shravan has also served as an instructor and teaching assistant at UIC, teaching subjects from introductory calculus and statistics to applied Python programming. His collaborative and interdisciplinary work has been showcased at national conferences and international seminars.

I would like to thank my mentors and peers for shaping my journey. I am especially grateful to my family and friends for their unwavering encouragement and support during the review process.
Read this book alongside other users, deep learning experts, and the author himself. Ask questions, provide solutions to other readers, chat with the author via Ask Me Anything sessions, and much more. Scan the QR code or visit the link to join the community: https://packt.link/math
Introduction
What is this book about?
How to read this book
Conventions used
What this book covers
To get the most out of this book
Part 1: Linear Algebra
1 Vectors and Vector Spaces
1.1 What is a vector space?
1.1.1 Examples of vector spaces
1.2 The basis
1.2.1 Linear combinations and independence
1.2.2 Spans of vector sets
1.2.3 Bases, the minimal generating sets
1.2.4 Finite dimensional vector spaces
1.2.5 Why are bases so important?
1.2.6 The existence of bases
1.2.7 Subspaces
1.3 Vectors in practice
1.3.1 Tuples
1.3.2 Lists
1.3.3 NumPy arrays
1.3.4 NumPy arrays as vectors
1.3.5 Is NumPy really faster than Python?
1.4 Summary
1.5 Problems
2 The Geometric Structure of Vector Spaces
2.1 Norms and distances
2.1.1 Defining distances from norms
2.2 Inner products, angles, and lots of reasons to care about them
2.2.1 The generated norm
2.2.2 Orthogonality
2.2.3 The geometric interpretation of inner products
2.2.4 Orthogonal and orthonormal bases
2.2.5 The Gram-Schmidt orthogonalization process
2.2.6 The orthogonal complement
2.3 Summary
2.4 Problems
3 Linear Algebra in Practice
3.1 Vectors in NumPy
3.1.1 Norms, distances, and dot products
3.1.2 The Gram-Schmidt orthogonalization process
3.2 Matrices, the workhorses of linear algebra
3.2.1 Manipulating matrices
3.2.2 Matrices as arrays
3.2.3 Matrices in NumPy
3.2.4 Matrix multiplication, revisited
3.2.5 Matrices and data
3.3 Summary
3.4 Problems
4 Linear Transformations
4.1 What is a linear transformation?
4.1.1 Linear transformations and matrices
4.1.2 Matrix operations revisited
4.1.3 Inverting linear transformations
4.1.4 The kernel and the image
4.2 Change of basis
4.2.1 The transformation matrix
4.3 Linear transformations in the Euclidean plane
4.3.1 Stretching
4.3.2 Rotations
4.3.3 Shearing
4.3.4 Reflection
4.3.5 Orthogonal projection
4.4 Determinants, or how linear transformations affect volume
4.4.1 How linear transformations scale the area
4.4.2 The multi-linearity of determinants
4.4.3 Fundamental properties of the determinants
4.5 Summary
4.6 Problems
5 Matrices and Equations
5.1 Linear equations
5.1.1 Gaussian elimination
5.1.2 Gaussian elimination by hand
5.1.3 When can we perform Gaussian elimination?
5.1.4 The time complexity of Gaussian elimination
5.1.5 When can a system of linear equations be solved?
5.1.6 Inverting matrices
5.2 The LU decomposition
5.2.1 Implementing the LU decomposition
5.2.2 Inverting a matrix, for real
5.2.3 How to actually invert matrices
5.3 Determinants in practice
5.3.1 The lesser of two evils
5.3.2 The recursive way
5.3.3 How to actually compute determinants
5.4 Summary
5.5 Problems
6 Eigenvalues and Eigenvectors
6.1 Eigenvalues of matrices
6.2 Finding eigenvalue-eigenvector pairs
6.2.1 The characteristic polynomial
6.2.2 Finding eigenvectors
6.3 Eigenvectors, eigenspaces, and their bases
6.4 Summary
6.5 Problems
7 Matrix Factorizations
7.1 Special transformations
7.1.1 The adjoint transformation
7.1.2 Orthogonal transformations
7.2 Self-adjoint transformations and the spectral decomposition theorem
7.3 The singular value decomposition
7.4 Orthogonal projections
7.4.1 Properties of orthogonal projections
7.4.2 Orthogonal projections are the optimal projections
7.5 Computing eigenvalues
7.5.1 Power iteration for calculating the eigenvectors of real symmetric matrices
7.5.2 Power iteration in practice
7.5.3 Power iteration for the rest of the eigenvectors
7.6 The QR algorithm
7.6.1 The QR decomposition
7.6.2 Iterating the QR decomposition
7.7 Summary
7.8 Problems
8 Matrices and Graphs
8.1 The directed graph of a nonnegative matrix
8.2 Benefits of the graph representation
8.2.1 The connectivity of graphs
8.3 The Frobenius normal form
8.3.1 Permutation matrices
8.3.2 Directed graphs and their strongly connected components
8.3.3 Putting graphs and permutation matrices together
8.4 Summary
8.5 Problems
References
Part 2: Calculus
9 Functions
9.1 Functions in theory
9.1.1 The mathematical definition of a function
9.1.2 Domain and image
9.1.3 Operations with functions
9.1.4 Mental models of functions
9.2 Functions in practice
9.2.1 Operations on functions
9.2.2 Functions as callable objects
9.2.3 Function base class
9.2.4 Composition in the object-oriented way
9.3 Summary
9.4 Problems
10 Numbers, Sequences, and Series
10.1 Numbers
10.1.1 Natural numbers and integers
10.1.2 Rational numbers
10.1.3 Real numbers
10.2 Sequences
10.2.1 Convergence
10.2.2 Properties of convergence
10.2.3 Famous convergent sequences
10.2.4 The role of convergence in machine learning
10.2.5 Divergent sequences
10.2.6 The big and small O notation
10.2.7 Real numbers are sequences
10.3 Series
10.3.1 Convergent and divergent series
10.3.2 Properties of series
10.3.3 Conditional and absolute convergence
10.3.4 Revisiting rearrangements
10.3.5 Convergence tests for series
10.3.6 The Cauchy product of series
10.4 Summary
10.5 Problems
11 Topology, Limits, and Continuity
11.1 Topology
11.1.1 Open and closed sets
11.1.2 Distance and topology
11.1.3 Sets and sequences
11.1.4 Bounded sets
11.1.5 Compact sets
11.2 Limits
11.2.1 Equivalent definitions of limits
11.3 Continuity
11.3.1 Properties of continuous functions
11.4 Summary
11.5 Problems
12 Differentiation
12.1 Differentiation in theory
12.1.1 Equivalent forms of differentiation
12.1.2 Differentiation and continuity
12.2 Differentiation in practice
12.2.1 Rules of differentiation
12.2.2 Derivatives of elementary functions
12.2.3 Higher-order derivatives
12.2.4 Extending the Function base class
12.2.5 The derivative of compositions
12.2.6 Numerical differentiation
12.3 Summary
12.4 Problems
13 Optimization
13.1 Minima, maxima, and derivatives
13.1.1 Local minima and maxima
13.1.2 Characterization of optima with higher order derivatives
13.1.3 Mean value theorems
13.2 The basics of gradient descent
13.2.1 Derivatives, revisited
13.2.2 The gradient descent algorithm
13.2.3 Implementing gradient descent
13.2.4 Drawbacks and caveats
13.3 Why does gradient descent work?
13.3.1 Differential equations 101
13.3.2 The (slightly more) general form of ODEs
13.3.3 A geometric interpretation of differential equations
13.3.4 A continuous version of gradient ascent
13.3.5 Gradient ascent as a discretized differential equation
13.3.6 Gradient ascent in action
13.4 Summary
13.5 Problems
14 Integration
14.1 Integration in theory
14.1.1 Partitions and their refinements
14.1.2 The Riemann integral
14.1.3 Integration as the inverse of differentiation
14.2 Integration in practice
14.2.1 Integrals and operations
14.2.2 Integration by parts
14.2.3 Integration by substitution
14.2.4 Numerical integration
14.2.5 Implementing the trapezoidal rule
14.3 Summary
14.4 Problems
Join our community on Discord
References
Part 3: Multivariable Calculus
15 Multivariable Functions
15.1 What is a multivariable function?
15.2 Linear functions in multiple variables
15.3 The curse of dimensionality
15.4 Summary
16 Derivatives and Gradients
16.1 Partial and total derivatives
16.1.1 The gradient
16.1.2 Higher order partial derivatives
16.1.3 The total derivative
16.1.4 Directional derivatives
16.1.5 Properties of the gradient
16.2 Derivatives of vector-valued functions
16.2.1 The derivatives of curves
16.2.2 The Jacobian and Hessian matrices
16.2.3 The total derivative for vector-vector functions
16.2.4 Derivatives and function operations
16.3 Summary
16.4 Problems
17 Optimization in Multiple Variables
17.1 Multivariable functions in code
17.2 Minima and maxima, revisited
17.3 Gradient descent in its full form
17.4 Summary
17.5 Problems
References
Part 4: Probability Theory
18 What is Probability?
18.1 The language of thinking
18.1.1 Thinking in absolutes
18.1.2 Thinking in probabilities
18.2 The axioms of probability
18.2.1 Event spaces and σ-algebras
18.2.2 Describing σ-algebras
18.2.3 σ-algebras over real numbers
18.2.4 Probability measures
18.2.5 Fundamental properties of probability
18.2.6 Probability spaces on ℝⁿ
18.2.7 How to interpret probability
18.3 Conditional probability
18.3.1 Independence
18.3.2 The law of total probability revisited
18.3.3 The Bayes theorem
18.3.4 The Bayesian interpretation of probability
18.3.5 The probabilistic inference process
18.3.6 The Monty Hall paradox
18.4 Summary
18.5 Problems
19 Random Variables and Distributions
19.1 Random variables
19.1.1 Discrete random variables
19.1.2 Real-valued random variables
19.1.3 Random variables in general
19.1.4 Behind the definition of random variables
19.1.5 Independence of random variables
19.2 Discrete distributions
19.2.1 The Bernoulli distribution
19.2.2 The binomial distribution
19.2.3 The geometric distribution
19.2.4 The uniform distribution
19.2.5 The single-point distribution
19.2.6 Law of total probability, revisited once more
19.2.7 Sums of discrete random variables
19.3 Real-valued distributions
19.3.1 The cumulative distribution function
19.3.2 Properties of the distribution function
19.3.3 Cumulative distribution functions for discrete random variables
19.3.4 The uniform distribution
19.3.5 The exponential distribution
19.3.6 The normal distribution
19.4 Density functions
19.4.1 Density functions in practice
19.4.2 Classification of real-valued random variables
19.5 Summary
19.6 Problems
20 The Expected Value
20.1 Discrete random variables
20.1.1 The expected value in poker
20.2 Continuous random variables
20.3 Properties of the expected value
20.4 Variance
20.4.1 Covariance and correlation
20.5 The law of large numbers
20.5.1 Tossing coins…
20.5.2 …rolling dice…
20.5.3 …and all the rest
20.5.4 The weak law of large numbers
20.5.5 The strong law of large numbers
20.6 Information theory
20.6.1 Guess the number
20.6.2 Guess the number 2: Electric Boogaloo
20.6.3 Information and entropy
20.6.4 Differential entropy
20.7 The Maximum Likelihood Estimation
20.7.1 Probabilistic modeling 101
20.7.2 Modeling heights
20.7.3 The general method
20.7.4 The German tank problem
20.8 Summary
20.9 Problems
References
Part 5: Appendix
Appendix A It’s Just Logic
A.1 Mathematical logic 101
A.2 Logical connectives
A.3 The propositional calculus
A.4 Variables and predicates
A.5 Existential and universal quantification
A.6 Problems
Appendix B The Structure of Mathematics
B.1 What is a definition?
B.2 What is a theorem?
B.3 What is a proof?
B.4 Equivalences
B.5 Proof techniques
B.5.1 Proof by induction
B.5.2 Proof by contradiction
B.5.3 Contraposition
Appendix C Basics of Set Theory
C.1 What is a set?
C.2 Operations on sets
C.2.1 Union, intersection, difference
C.2.2 De Morgan’s laws
C.3 The Cartesian product
C.4 The cardinality of sets
C.5 The Russell paradox (optional)
Appendix D Complex Numbers
D.1 The definition of complex numbers
D.2 The geometric representation
D.3 The fundamental theorem of algebra
D.4 Why are complex numbers important?
Other Books You May Enjoy
Index
This part comprises the following chapters:
Chapter 1, Vectors and Vector Spaces
Chapter 2, The Geometric Structure of Vector Spaces
Chapter 3, Linear Algebra in Practice
Chapter 4, Linear Transformations
Chapter 5, Matrices and Equations
Chapter 6, Eigenvalues and Eigenvectors
Chapter 7, Matrix Factorizations
Chapter 8, Matrices and Graphs