State-of-the-art algorithmic deep learning and tensoring techniques for financial institutions
The computational demand of risk calculations in financial institutions has ballooned and shows no sign of stopping. It is no longer viable to simply add more computing power to deal with this increased demand. The solution? Algorithmic methods based on deep learning and Chebyshev tensors offer a practical way to reduce costs while simultaneously increasing risk calculation capabilities. Machine Learning for Risk Calculations: A Practitioner’s View provides an in-depth review of a number of these algorithmic solutions and demonstrates how they can be used to overcome the massive computational burden of risk calculations in financial institutions.
This book will get you started by reviewing fundamental techniques, including deep learning and Chebyshev tensors. You’ll then discover algorithmic tools that, in combination with the fundamentals, deliver actual solutions to the real problems financial institutions encounter on a regular basis. Numerical tests and examples demonstrate how these solutions can be applied to practical problems, including XVA and Counterparty Credit Risk, IMM capital, PFE, VaR, FRTB, Dynamic Initial Margin, pricing function calibration, volatility surface parametrisation, portfolio optimisation and others. Finally, you’ll uncover the benefits these techniques provide, the practicalities of implementing them, and the software which can be used.
Quants, IT professionals, and financial risk managers will benefit from this practitioner-oriented approach to state-of-the-art risk calculation.
Page count: 885
Year of publication: 2021
Cover
Title Page
Copyright
Dedication
Acknowledgements
Foreword
Motivation and aim of this book
BOOK OUTLINE
NOTE
PART One: Fundamental Approximation Methods
Chapter 1: Machine Learning
1.1 INTRODUCTION TO MACHINE LEARNING
1.2 THE LINEAR MODEL
1.3 TRAINING AND PREDICTING
1.4 MODEL COMPLEXITY
NOTES
Chapter 2: Deep Neural Nets
2.1 A BRIEF HISTORY OF DEEP NEURAL NETS
2.2 THE BASIC DEEP NEURAL NET MODEL
2.3 UNIVERSAL APPROXIMATION THEOREMS
2.4 TRAINING OF DEEP NEURAL NETS
2.5 MORE SOPHISTICATED DNNs
2.6 SUMMARY OF CHAPTER
NOTES
Chapter 3: Chebyshev Tensors
3.1 APPROXIMATING FUNCTIONS WITH POLYNOMIALS
3.2 CHEBYSHEV SERIES
3.3 CHEBYSHEV TENSORS AND INTERPOLANTS
3.4 EX ANTE ERROR ESTIMATION
3.5 WHAT MAKES CHEBYSHEV POINTS UNIQUE
3.6 EVALUATION OF CHEBYSHEV INTERPOLANTS
3.7 DERIVATIVE APPROXIMATION
3.8 CHEBYSHEV SPLINES
3.9 ALGEBRAIC OPERATIONS WITH CHEBYSHEV TENSORS
3.10 CHEBYSHEV TENSORS AND MACHINE LEARNING
3.11 SUMMARY OF CHAPTER
NOTES
PART Two: The toolkit — plugging in approximation methods
Chapter 4: Introduction: why is a toolkit needed
4.1 THE PRICING PROBLEM
4.2 RISK CALCULATION WITH PROXY PRICING
4.3 THE CURSE OF DIMENSIONALITY
4.4 THE TECHNIQUES IN THE TOOLKIT
Chapter 5: Composition techniques
5.1 LEVERAGING FROM EXISTING PARAMETRISATIONS
5.2 CREATING A PARAMETRISATION
5.3 SUMMARY OF CHAPTER
Chapter 6: Tensors in TT format and Tensor Extension Algorithms
6.1 TENSORS IN TT FORMAT
6.2 TENSOR EXTENSION ALGORITHMS
6.3 STEP 1 — OPTIMISING OVER TENSORS OF FIXED RANK
6.4 STEP 2 — OPTIMISING OVER TENSORS OF VARYING RANK
6.5 STEP 3 — ADAPTING THE SAMPLING SET
6.6 SUMMARY OF CHAPTER
NOTES
Chapter 7: Sliding Technique
7.1 SLIDE
7.2 SLIDER
7.3 EVALUATING A SLIDER
7.4 SUMMARY OF CHAPTER
Chapter 8: The Jacobian projection technique
8.1 SETTING THE BACKGROUND
8.2 WHAT WE CAN RECOVER
8.3 PARTIAL DERIVATIVES VIA PROJECTIONS ONTO THE JACOBIAN
NOTES
PART Three: Hybrid solutions — approximation methods and the toolkit
Chapter 9: Introduction
9.1 THE DIMENSIONALITY PROBLEM REVISITED
9.2 EXPLOITING THE COMPOSITION TECHNIQUE
Chapter 10: The Toolkit and Deep Neural Nets
10.1 BUILDING ON P USING THE IMAGE OF G
10.2 BUILDING ON f
Chapter 11: The Toolkit and Chebyshev Tensors
11.1 FULL CHEBYSHEV TENSOR
11.2 TT-FORMAT CHEBYSHEV TENSOR
11.3 CHEBYSHEV SLIDER
11.4 A FINAL NOTE
Chapter 12: Hybrid Deep Neural Nets and Chebyshev Tensors Frameworks
12.1 THE FUNDAMENTAL IDEA
12.2 DNN+CT WITH STATIC TRAINING SET
12.3 DNN+CT WITH DYNAMIC TRAINING SET
12.4 NUMERICAL TESTS
12.5 ENHANCED DNN+CT ARCHITECTURES AND FURTHER RESEARCH
NOTES
PART Four: Applications
Chapter 13: The aim
13.1 SUITABILITY OF THE APPROXIMATION METHODS
13.2 UNDERSTANDING THE VARIABLES AT PLAY
NOTE
Chapter 14: When to use Chebyshev Tensors and when to use Deep Neural Nets
14.1 SPEED AND CONVERGENCE
14.2 THE QUESTION OF DIMENSION
14.3 PARTIAL DERIVATIVES AND EX ANTE ERROR ESTIMATION
14.4 SUMMARY OF CHAPTER
NOTES
Chapter 15: Counterparty credit risk
15.1 MONTE CARLO SIMULATIONS FOR CCR
15.2 SOLUTION
15.3 TESTS
15.4 RESULTS ANALYSIS AND CONCLUSIONS
15.5 SUMMARY OF CHAPTER
NOTES
Chapter 16: Market Risk
16.1 VAR-LIKE CALCULATIONS
16.2 ENHANCED REVALUATION GRIDS
16.3 FUNDAMENTAL REVIEW OF THE TRADING BOOK
16.4 PROOF OF CONCEPT
16.5 STABILITY OF TECHNIQUE
16.6 RESULTS BEYOND VANILLA PORTFOLIOS — FURTHER RESEARCH
16.7 SUMMARY OF CHAPTER
NOTES
Chapter 17: Dynamic sensitivities
17.1 SIMULATING SENSITIVITIES
17.2 THE SOLUTION
17.3 AN IMPORTANT USE OF DYNAMIC SENSITIVITIES
17.4 NUMERICAL TESTS
17.5 DISCUSSION OF RESULTS
17.6 ALTERNATIVE METHODS
17.7 SUMMARY OF CHAPTER
NOTES
Chapter 18: Pricing model calibration
18.1 INTRODUCTION
18.2 SOLUTION
18.3 TEST DESCRIPTION
18.4 RESULTS WITH CHEBYSHEV TENSORS
18.5 RESULTS WITH DEEP NEURAL NETS
18.6 COMPARISON OF RESULTS VIA CT AND DNN
18.7 SUMMARY OF CHAPTER
NOTES
Chapter 19: Approximation of the implied volatility function
19.1 THE COMPUTATION OF IMPLIED VOLATILITY
19.2 SOLUTION
19.3 RESULTS
19.4 SUMMARY OF CHAPTER
NOTES
Chapter 20: Optimisation Problems
20.1 BALANCE SHEET OPTIMISATION
20.2 MINIMISATION OF MARGIN FUNDING COST
20.3 GENERALISATION — CURRENTLY “IMPOSSIBLE” CALCULATIONS
20.4 SUMMARY OF CHAPTER
NOTES
Chapter 21: Pricing Cloning
21.1 PRICING FUNCTION CLONING
21.2 SUMMARY OF CHAPTER
NOTES
Chapter 22: XVA sensitivities
22.1 FINITE DIFFERENCES AND PROXY PRICERS
22.2 PROXY PRICERS AND AAD
NOTES
Chapter 23: Sensitivities of exotic derivatives
23.1 BENCHMARK SENSITIVITIES COMPUTATION
23.2 SENSITIVITIES VIA CHEBYSHEV TENSORS
NOTES
Chapter 24: Software libraries relevant to the book
24.1 RELEVANT SOFTWARE LIBRARIES
24.2 THE MCX SUITE
Appendix A: Families of orthogonal polynomials
NOTE
Appendix B: Exponential convergence of Chebyshev Tensors
Appendix C: Chebyshev Splines on functions with no singularity points
NOTE
Appendix D: Computational savings details for CCR
D.1 BARRIER OPTION
D.2 CROSS-CURRENCY SWAP
D.3 BERMUDAN SWAPTION
D.4 AMERICAN OPTION
NOTES
Appendix E: Computational savings details for dynamic sensitivities
E.1 FX SWAP
E.2 EUROPEAN SPREAD OPTION
NOTE
Appendix F: Dynamic sensitivities on the market space
F.1 THE PARAMETRISATION
F.2 NUMERICAL TESTS
F.3 FUTURE WORK... WHEN k > 1
NOTES
Appendix G: Dynamic sensitivities and IM via Jacobian Projection technique
NOTES
Appendix H: MVA optimisation — further computational enhancement
Bibliography
Index
End User License Agreement
Chapter 1
TABLE 1.1 Table showing an example of data to train on.
TABLE 1.2 Underfitting and overfitting accuracy, measured using mean squared er...
TABLE 1.3 Table showing train and validation errors for different values of reg...
Chapter 12
TABLE 12.1 Maximum error of each hNN architecture.
Chapter 14
TABLE 14.1 This table shows how the number of Chebyshev points increases with d...
TABLE 14.2 This table shows how the number of Chebyshev point evaluations neede...
Chapter 15
TABLE 15.1 Computational gain of running the CCR calculation for an IR swap wit...
TABLE 15.2 Mean and maximum relative errors for PV profiles at expectation and
TABLE 15.3 Computational gain of running the CCR calculation for a barrier opti...
TABLE 15.4 Mean and maximum relative errors for PV profiles at expectation and
TABLE 15.5 Computational costs and savings obtained by using TT-format CTs to c...
TABLE 15.6 Mean and maximum relative errors for PV profiles at expectation and
TABLE 15.7 Computational gain of running the CCR calculation for a Bermudan Swa...
TABLE 15.8 Mean and maximum relative errors for PV profiles at expectation and
TABLE 15.9 Computational gain of running the CCR calculation for a Bermudan Swa...
TABLE 15.10 Mean and maximum relative errors for PV profiles at expectation and
TABLE 15.11 Computational gain of running the CCR calculation for a Bermudan Sw...
TABLE 15.12 Mean and maximum relative errors for PV profiles at expectation and
TABLE 15.13 Computational gain of running the CCR calculation with a CT in TT f...
TABLE 15.14 Mean and maximum relative errors for PV profiles at expectation and
TABLE 15.15 Computational gain of running the CCR calculation with a DNN on a p...
TABLE 15.16 Mean and maximum relative errors for PV profiles at expectation and
TABLE 15.17 Comparison of computational gains and CVA errors of approximation a...
Chapter 16
TABLE 16.1 Portfolio of swaps, slider configuration.
TABLE 16.2 Portfolio of swaptions, slider configuration on -day liquidity hor...
TABLE 16.3 Portfolio of swaptions, slider configuration on -day liquidity hor...
Chapter 17
TABLE 17.1 Maximum relative percentage error for market sensitivities, EIM and ...
TABLE 17.2 Maximum relative percentage error for market sensitivities, EIM and ...
TABLE 17.3 Computational savings obtained by using CTs in TT format to compute ...
TABLE 17.4 Computational savings obtained by using CTs in TT format to compute ...
Chapter 18
TABLE 18.1 Summary and comparison of testing results for solution via CTs and D...
Chapter 19
TABLE 19.1 Parameters used to build CTs. Parameter denotes the rank of the S...
TABLE 19.2 Errors (time-scaled implied volatility and normalised price) and...
TABLE 19.3 Errors (time-scaled implied volatility and normalised price) and...
TABLE 19.4 Errors (time-scaled implied volatility and normalised price) and...
Chapter 20
TABLE 20.1 Balance sheet input variables for the optimisation routine.
Appendix D
TABLE D.1 Computational gain of running the CCR calculation for Barr...
TABLE D.2 Computational costs and savings obtained by using TT-forma...
TABLE D.3 Computational gain of running the CCR calculation with a C...
TABLE D.4 Computational gain of running the CCR calculation for a Be...
TABLE D.5 Computational gain of running the CCR calculation for a Be...
TABLE D.6 Computational gain of running the CCR calculation for Amer...
TABLE D.7 Computational gain of running the CCR calculation for Amer...
Appendix E
TABLE E.1 Computational savings obtained by using CTs in TT format t...
TABLE E.2 Computational savings obtained by using CTs in TT format t...
Appendix F
TABLE F.1 Maximum relative percentage error for EIM and PFIM ( quan...
TABLE F.2 Computational savings obtained by using full CTs to comput...
Appendix G
TABLE G.1 Maximum relative percentage error for EIM and PFIM ( quan...
TABLE G.2 Computational savings obtained by using full CTs to comput...
Chapter 1
Figure 1.1 Linear Regression fit to points in dimension . Filled in circles...
Figure 1.2 Linear regression fit (denoted by regression fit) to given data (d...
Figure 1.3 Left pane shows how the basic model of Linear Regression is not p...
Figure 1.4 Surface of loss function for basic Linear Regression.
Figure 1.5 Cross-section of cost function for DNN.
Figure 1.6 Probability density functions of normal distribution.
Figure 1.7 Figure showing underfitting and overfitting phenomenon. The plot ...
Figure 1.8 Figure showing how regression fit improves with regularisation. T...
Figure 1.9 Figure showing the different values for training and validation e...
Chapter 2
Figure 2.1 Diagram of a Perceptron.
Figure 2.2 Diagram of an artificial neuron.
Figure 2.3 Some of the most popular activation functions.
Figure 2.4 Biological neuron.
Figure 2.5 Artificial neural network with multidimensional output.
Figure 2.6 Artificial neural network with output of dimension 1.
Figure 2.7 A Deep Neural Net with input dimension , layers and the -th l...
Figure 2.8 Forward pass in backpropagation.
Figure 2.9 Backward pass in backpropagation.
Figure 2.10 Surface representing the cost function for which gradient descen...
Figure 2.11 Level curves example for cost function. Arrows denote possible p...
Figure 2.12 Image represents the incoming data point. The neuron focuses on ...
Figure 2.13 Max pooling.
Chapter 3
Figure 3.1 Chebyshev polynomials from degree to degree .
Figure 3.2 Tensor of dimension 1. Grid given by balls. Values on grid points...
Figure 3.3 Exponential divergence from Runge function by polynomial interpol...
Figure 3.4 Chebyshev points in one dimension.
Figure 3.5 Chebyshev grid in two dimensions.
Figure 3.6 Chebyshev interpolants convergence to Runge function.
Figure 3.7 Chebyshev interpolants convergence error to Black-Scholes functio...
Figure 3.8 Chebyshev interpolant in dimension .
Figure 3.9 Chebyshev interpolants error convergence to Black-Scholes functio...
Figure 3.10 Empirical error versus predicted error for Black-Scholes functio...
Figure 3.11 Empirical error versus predicted error for Black-Scholes functio...
Figure 3.12 Approximating Runge function with different grid distributions. ...
Figure 3.13 Comparison of errors obtained with equidistant tensors and CTs o...
Figure 3.14 CT evaluation in dimension 2.
Figure 3.15 Dashed curve is error obtained when evaluating with barycentric ...
Figure 3.16 Oscillations from the Gibbs phenomenon around a jump discontinui...
Chapter 4
Figure 4.1 Main steps of a typical risk calculation.
Figure 4.2 How the typical steps of a risk calculation are modified when usi...
Chapter 5
Figure 5.1 Autoencoder architecture.
Chapter 6
Figure 6.1 TT Tensor diagram.
Chapter 8
Figure 8.1 Path followed by the short rate within the space of swap rates. T...
Figure 8.2 Short rate direction at inside the space of directions spanned ...
Chapter 10
Figure 10.1 Image of parametrisation . The image has dimension but sits i...
Chapter 12
Figure 12.1 Illustration of how a DNN funnels information in a forward pass....
Figure 12.2 Hybrid DNN and interpolation architecture.
Figure 12.3 Illustration of static data key features.
Figure 12.4 Illustration of learning process with static training set.
Figure 12.5 Illustration of learning process with dynamic training set.
Figure 12.6 The -axis represents the number of learning iterations. The -a...
Figure 12.7 The -axis represents the number of learning iterations. The -a...
Figure 12.8 Cost function versus number of learning iterations static traini...
Figure 12.9 Cost function versus number of learning iterations dynamic train...
Chapter 14
Figure 14.1 Comparison of how errors of approximation (logarithmic scale), o...
Figure 14.2 Which hybrid solution to use depending on the dimension of the p...
Chapter 15
Figure 15.1 Monte Carlo simulation showing time points in the future and mod...
Figure 15.2 PV profiles — at expectation and th percentiles — for an IR swa...
Figure 15.3 PV profiles — at expectation and th percentiles — for a barrier...
Figure 15.4 Noise distribution of the Monte Carlo type (i.e. original pricin...
Figure 15.5 PV profiles — at expectation and th percentiles — for a cross-c...
Figure 15.6 PV profiles — at expectation and th percentiles — for a Bermuda...
Figure 15.7 PV profiles — at expectation and th percentiles — for a Bermuda...
Figure 15.8 PV profiles — at expectation and th percentiles — for a Bermuda...
Figure 15.9 PV profiles — at expectation and th percentiles — for a portfol...
Figure 15.10 Noise distribution of the Monte Carlo type of pricing function ...
Figure 15.11 PV profiles — at expectation and th percentiles — for a Bermud...
Chapter 16
Figure 16.1 Comparison of convergence rates between revaluation grids on equ...
Figure 16.2 Illustration of function approximation via CT or via Taylor expa...
Figure 16.3 Portfolio of Swaps, slider configuration . Top left: PCA dim.
Figure 16.4 This figure shows how the Orthogonal Chebyshev Slider relative E...
Figure 16.5 Portfolio of swaps, slider configuration , PCA dimension , eva...
Figure 16.6 Portfolio of swaptions, slider configuration , on -day liquidi...
Figure 16.7 This figure shows how the Orthogonal Chebyshev Slider relative E...
Figure 16.8 Portfolio of swaptions, slider configuration , PCA dimension ,...
Figure 16.9 Portfolio of swaptions, slider configuration , on -day liquidi...
Figure 16.10 This figure shows how the Orthogonal Chebyshev Slider relative ...
Figure 16.11 Portfolio of swaptions, slider configuration , PCA dimension
Figure 16.12 Daily rolling mean ratio over a period of 10 years for CT, line...
Figure 16.13 Daily rolling variance ratio over a period of 10 years for CT, ...
Chapter 17
Figure 17.1 Percentage relative errors of CTs for sensitivity to the first s...
Figure 17.2 Percentage relative errors of CTs for sensitivity to the USD/EUR...
Figure 17.3 IM profiles — expectation and quantiles — for FX Swap obtained...
Figure 17.4 Noise distribution for the MC-based spread option pricing functi...
Figure 17.5 Percentage relative errors of CTs for the first spot. Histograms...
Figure 17.6 Equity delta margin profiles — at expectation and quantiles — ...
Chapter 18
Figure 18.1 Average and maximum error of approximation heat map for full CTs...
Figure 18.2 Average and maximum error of approximation heat map for TT-forma...
Figure 18.3 Quantiles for the distribution of RMSE values obtained by calibr...
Figure 18.4 Average and maximum error of approximation heat map for CTs buil...
Figure 18.5 Distribution of RMSE values obtained by calibrating synthetica...
Chapter 19
Figure 19.1 Left-hand pane shows first pivot point and both horizontal and v...
Figure 19.2 Left-hand pane shows the normalised call price (Equation 19.4) i...
Figure 19.3 Domain of the normalised call pricing function is split into fou...
Figure 19.4 Domains over which the tests were performed. Domain is the sma...
Chapter 20
Figure 20.1 NII values for different input scenarios. Bars 1 to 8 are manual...
Figure 20.2 A 3D plot on the left pane showing the surface of NII in terms o...
Figure 20.3 MVA with a given counterparty, when one of the new payer swaps i...
Chapter 21
Figure 21.1 Illustration of a generic risk system with and without pricing c...
Chapter 23
Figure 23.1 The left pane shows the noisy spot-vol surface generated with a ...
Appendix B
Figure B.1 Chebyshev Tensor evaluation in dimension .
Appendix C
Figure C.1 Reduction in the number of calls to pricing function through the ...
Appendix F
Figure F.1 DIM profiles — expectation and quantiles — for a European Swapt...
Appendix G
Figure G.1 DIM profiles — expectation and quantiles — for a European Swapt...
Founded in 1807, John Wiley & Sons is the oldest independent publishing company in the United States. With offices in North America, Europe, Australia and Asia, Wiley is globally committed to developing and marketing print and electronic products and services for our customers' professional and personal knowledge and understanding.
The Wiley Finance series contains books written specifically for finance and investment professionals as well as sophisticated individual investors and their financial advisors. Book topics range from portfolio management to e-commerce, risk management, financial engineering, valuation and financial instrument analysis, as well as much more.
For a list of available titles, visit our Web site at www.WileyFinance.com.
I. RUIZ
M. ZERON
Foreword by P. Karasinski
This edition first published 2021
Copyright © 2022 by Ignacio Ruiz and Mariano Zeron.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Ignacio Ruiz and Mariano Zeron to be identified as the authors of this work has been asserted in accordance with law.
Registered Office
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
Editorial Office
The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.
Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging-in-Publication Data
Names: Ruiz, Ignacio, 1972- author. | Laris, Mariano Zeron Medina, author.
Title: Machine learning for risk calculations : a practitioner's view / Ignacio Ruiz, Mariano Zeron Medina Laris.
Description: Hoboken, New Jersey : Wiley, [2022] | Includes index.
Identifiers: LCCN 2021036694 (print) | LCCN 2021036695 (ebook) | ISBN 9781119791386 (hardback) | ISBN 9781119791393 (adobe pdf) | ISBN 9781119791409 (epub)
Subjects: LCSH: Machine learning. | Financial risk management.
Classification: LCC Q325.5 .R855 2022 (print) | LCC Q325.5 (ebook) | DDC 332.10285/631—dc23
LC record available at https://lccn.loc.gov/2021036694
LC ebook record available at http://lccn.loc.gov/2021036695
Cover image: © korkeng/Shutterstock
Cover design: Wiley
To my sister Cristina, a beautiful soul around us, an inspiration in my life.
To my parents, for their unwavering support.
This book has benefited significantly from the support and input of a whole range of individuals, whom we want to thank. In particular, we would like to thank our friend Emilio Viúdez, who directly contributed to many of the chapters in this book. Most importantly, he has been an extraordinary companion on the journey of which this book is one of the results.
We would also like to thank (in no particular order of importance) Jesus Alonso, Andrew Aziz, Dimitra Bampou, Russell Barker, Assad Bouayoun, Juan Antonio Burgos, Paul Burnett, Pablo Cassatella, Justin Chan, Lucia Cipolina, Alex Daminoff, Matthew Dear, Piero Del Boca, Thomas Devereux, Alberto Elices, Eduardo Epperlein and his group at NOMURA, Andrew Green, Stephen Hancock, Brian Huge, Marc Jeannin, Akshay Jha, Paul Jones, Christian Kappen, Piotr Karasinski, Gordon Lee, Udit Mahajan, Navneet Mathur, Adolfo Montoro, Cesar Mora, Rubén Moral, Yacine Moulay-Rchid, Laura Müller, Stuart Neil, Maria Nogueiras, Yogi Patel, Jose María Pesquero, Maxim Petrashev, Carlos Rioja, Samir Saurav, Joaquín Seco, Naimish Shah, Anton Simanenka, Jono Simpson, Takis Sironis, John Sleath, Robert Smith, Theo Stampoulis, Lauri Tamminen, Alok Tiwari, Alessandro Vecci, Satya Vemireddy and Hernan Zúñiga for the time and patience they had with us on different occasions.
The reader can find further resources on the topics of this book at mocaxintelligence.org and on the YouTube channel youtube.com/mocax.
I met Mariano and Ignacio at the Quantitative Finance 2019 conference in Rome. Mariano gave a presentation on his joint work with Ignacio on Chebyshev Tensor techniques for CVA pricing and FRTB capital. After the talk I went to speak to Mariano, as I had found the results he presented quite remarkable. That was the beginning of my productive relationship with both of them.
The 2008 banking crisis profoundly changed the derivatives industry. Before it, the derivatives business was driven by the creation of exotic trades. After the crisis, the paradigm changed and computing the risks carried on balance sheets became central to a degree not seen before. Being able to compute these risk numbers accurately, in a timely and cost-effective manner, is now the main driver of the business.
Historically, this has been achieved mostly by increasing the amount of hardware used for the computation, in conjunction with only a few limited algorithmic solutions. This route has become increasingly uneconomical, as computational needs have grown further due to regulation and market demand. In this book, Mariano and Ignacio offer a family of algorithmic solutions that substantially reduce the need for increased computational power, as well as solutions for some calculations that are very difficult to do without them.
This text applies the mathematics behind Chebyshev Tensors, combined with Deep Learning, within the specific contexts of many of the risk calculations that banks, hedge funds and other financial institutions need to run on a constant basis, with the aim of reducing the computational demand of these calculations while retaining the required accuracy. This is done in a robust manner, using the mathematical properties of the techniques involved as a starting point, but not forgetting that at times some well-understood heuristics are needed to extend the applicability of the mathematical methods chosen.
This thinking process is applied to a number of practical applications presented in the final chapters of the book, ranging from counterparty credit risk (CCR) and market risk to portfolio optimisation, among several others. The results presented in this book have the potential to be disruptive for the industry. I hope that the quantitative finance community will enjoy and benefit from the ideas put forward in this text.
Piotr Karasinski
The world of risk analytics has had an ever-growing demand for computing capacity since the early 2000s. When one of the book's authors started working in this field, he was asked to work on the CCR engine at Credit Suisse for the new Basel II regulation and IMM capital calculation. At the time, it was the latest big thing in the industry. The IMM-related calculations were among the most (if not the most) complicated calculations the bank had done up to that point. A few hundred CPUs were bought and installed in a state-of-the-art grid computing farm. The belief was that such a grid would be able to do any CCR calculation. However, it did not take long for the team to realise that more computing power was needed to match the computational requirements of new calculations being requested. Over the years, we have experienced a world in which, regardless of how much computing power the latest technologies provide, it soon proves insufficient to meet new demands and needs to be upgraded only a few years later.
Indeed, the world of banking, and in particular the derivatives business, has become a technology race (like many other industries, it must be said). As P. Karasinski says in his Foreword to this book, it used to be about creating and selling the next exotic product. Now it is about computing prices and increasingly sophisticated risk metrics in a prompt and efficient manner, partly as a result of regulations that have become more stringent since the 2008 crisis, and partly as a result of the higher standards for risk management the industry has developed. That is where broker-dealers now differentiate themselves and where the source of profitability currently lies.
Until recently, the computational cost of calculating risk numbers has mostly been addressed by throwing raw computing capacity at it, that is, by buying more and better hardware. Many tier-one banks are known to run farms of several tens of thousands of CPUs and GPUs, and banks are now also leasing cloud computing capacity from external vendors. This comes, of course, at a considerable cost, which needs to be managed; obviously, it cannot keep increasing forever without denting the profitability of the business.
Part of the reason why financial institutions have opted for more hardware is Moore's second law, which states that the computing capacity of transistor chips per dollar of capital expenditure grows exponentially.1 This certainly held until recently. However, that increase in computing capacity was driven by the constant miniaturisation of the basic elements of chips (semiconductor transistors, magnetic memory bits, etc.). Now that semiconductor transistors are reaching the 10-nanometer range, the rate of growth stated in Moore's law is stalling in commercial computers. This is illustrated by the fact that 10 years ago the processing capability of a new computer was massively superior to that of computers only a few years older; at present, it is only marginally better. The reason is that one atom measures roughly 0.1 nanometers, and when transistors shrink below a few tens of nanometers, quantum effects start to appear and temperature becomes a problem. Quantum computing, a most interesting topic in itself, is well outside the scope of this book, but the reality is that, for now and for the foreseeable future, hardware will only offer limited increases in computing capacity. As a result, the paradigm has changed from creating more computing power via hardware to developing algorithmic solutions that optimise calculations.
In parallel, the quantitative analytics community has done a lot of work to create algorithmic methods that accelerate calculations and decrease their hardware needs. A notable example has been the family of Adjoint Algorithmic Differentiation (AAD) solutions in the world of XVA pricing, which in its general version can compute as many XVA sensitivities as needed at the added cost of (roughly) 10 XVA pricing runs. Seen from the perspective of the times when computing one CVA run for a few netting sets was already a challenge, this improvement is remarkable. However, it comes at a considerable price: the implementation effort is substantial. This is particularly the case if one already has a functioning XVA platform and wants to adapt it to AAD. The task can be so daunting that many banks do not consider it a viable option.
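The mechanics behind adjoint differentiation can be illustrated with a toy reverse-mode sketch: one forward pass records the computation on a tape, and one backward sweep then yields the sensitivity to every input at once, at a cost that is a small multiple of one forward evaluation. The Python sketch below uses a made-up stand-in "payoff" rather than a real pricing model and is purely illustrative; production AAD relies on dedicated libraries or code generation.

```python
# Toy reverse-mode (adjoint) differentiation: all input sensitivities from
# one forward pass plus one backward sweep, regardless of the input count.
import math

class Var:
    """A scalar that records the operations applied to it."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents      # (parent_var, local_derivative) pairs
        self.adjoint = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

def exp(x):
    v = math.exp(x.value)
    return Var(v, ((x, v),))

def backward(output):
    """Propagate adjoints from the output back to every input."""
    output.adjoint = 1.0
    stack = [output]
    while stack:
        node = stack.pop()
        for parent, local in node.parents:
            parent.adjoint += local * node.adjoint
            stack.append(parent)

# 'price' is a stand-in payoff, not a real pricing model
s, r = Var(100.0), Var(0.05)
price = s * exp(r)        # forward pass
backward(price)           # one backward sweep gives both sensitivities
print(s.adjoint, r.adjoint)   # d(price)/ds and d(price)/dr
```

The point of the sketch is the shape of the computation: however many inputs the payoff has, one backward sweep over the recorded tape produces all their sensitivities.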
This book is based on the belief that the optimal solution to many of the computational challenges in finance lies in the union of algorithmic solutions and their appropriate software implementation, run on powerful hardware. The aim of this book is to review how some numerical mathematical methods, when applied thoughtfully, taking into account the specific characteristics of the calculations we want to improve, can create substantial computational enhancements. Indeed, the book is the direct result of the experience the authors have had, over the past few years, while trying to solve difficult (sometimes seemingly impossible) calculations in real-life settings within financial institutions.
The solutions proposed throughout this book apply mainly to existing risk engines within operating financial institutions. Some of these risk engines have been developed over many years by different business units and with different goals in mind. This has produced, in many cases, an amalgamation of risk engines that is suboptimal from an efficiency standpoint. Although starting new engines from scratch may correct the shortcomings of legacy systems, doing so requires not only a lot of time and money but also, in many cases, enormous projects. In fact, a number of banks have reportedly started and then stopped the development of global pricing and risk systems built from the ground up, due to the scale of the job. Quite often it makes more sense, from a practical perspective, to upgrade existing engines, improving what already exists and using the increasingly demanding business needs and regulatory environment as guidelines, instead of developing new ones. With this in mind, the solutions proposed in this book are highly pragmatic.
We also keep in mind that, for a solution to be implemented, budgets need to be approved by someone usually high up in the pyramid. Therefore, small(ish) incremental changes with tangible benefits are more likely to succeed than big, ambitious projects. Note, however, that this does not mean the solutions put forward in the book cannot be implemented in a system built from the ground up; in fact, in some cases that would be the optimal approach. All we say is that having the option of incremental changes that are easy to manage is always a bonus not to lose sight of.
One of the common threads in all solutions discussed in the book is that they are grounded in mathematically robust results. Ideally, we would like everything to be based on solid theoretical frameworks. However, as the reader will soon learn, sometimes heuristic rules need to be used in conjunction with mathematical theories. The right combination of mathematical theories and heuristics, partly determined by the context of the problem (for example, the characteristics of the systems being used), is what delivers the most effective outcome. When such heuristic rules are used or discussed, we make the point clear, indicating their range of validity and limitations, so that the quantitative analyst can make use of them safely.
Many of the computational problems that banks encounter are the result of having to evaluate a given function a large number of times under (only slightly) different inputs, together with the fact that such functions are costly to compute. Examples of these functions are Over-the-Counter derivative pricing functions, which need to be evaluated from several hundred to millions of times in risk calculations. From a computational standpoint, these evaluations tend to be the bottleneck in risk calculations. Our approach is to find a way to take advantage of the specifics of the risk calculation so that a very accurate and fast-to-compute replica of the pricing function can be generated. As a consequence, one computes the same risk metrics in practice, but more efficiently. Similar replication methods are applied to other computational challenges, such as model calibration, leading to significant improvements, too. Furthermore, the techniques presented in this book open the door to a new family of computations that, without them, would in many cases seem impossible to achieve, like balance sheet optimisations.
As just said, the solutions discussed in this book are rooted in identifying computationally expensive functions to evaluate — which create computational bottlenecks in calculations — and creating replicas of these problematic functions that can be efficiently computed while at the same time giving essentially the same results.
We start off in Part I with a general overview of Machine Learning techniques. We then focus on two of the most effective methods for replicating functions: Deep Neural Nets (DNNs) and Chebyshev Tensors (CTs). In mathematical terms, we delve into function approximation, because the goal is to create a mathematical object, one that comes with a computational architecture, that closely approximates the original function. In our case, we look for techniques that deliver replicas that can be evaluated substantially faster than the function they approximate and that can be calibrated with reasonable computational effort. At this point, discussions are mostly theoretical and few comments are made regarding applications. The goal is to provide a solid mathematical background that we can leverage in subsequent chapters.
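To give a flavour of what such a replica looks like in the simplest setting, the sketch below builds a one-dimensional Chebyshev interpolant of a stand-in "slow" function using numpy's chebyshev module. The function and degree here are chosen purely for illustration; the multi-dimensional machinery used later in the book is far richer.

```python
# Sketch: replace a "slow" 1-D function with a Chebyshev interpolant that
# is cheap to evaluate and, for smooth functions, converges exponentially.
import numpy as np
from numpy.polynomial import chebyshev as C

def slow_pricer(x):
    # stand-in for an expensive pricing function on [-1, 1]
    return np.exp(x) * np.sin(5.0 * x)

n = 30                                            # interpolation degree
nodes = np.cos(np.pi * np.arange(n + 1) / n)      # Chebyshev points on [-1, 1]
coeffs = C.chebfit(nodes, slow_pricer(nodes), n)  # build the replica once

# The replica is now cheap to evaluate anywhere on the domain
x_test = np.linspace(-1.0, 1.0, 1000)
max_err = np.max(np.abs(C.chebval(x_test, coeffs) - slow_pricer(x_test)))
print(max_err)
```

The cost structure is the key point: the expensive function is called only at the 31 Chebyshev nodes, after which every further evaluation is a cheap polynomial evaluation.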
Once the fundamental approximation methods have been established, we present, in Part II, a number of tools that will enable the optimal utilisation of the approximation methods in the applications of interest. We see these tools as the equivalent of nuts, bolts and spanners used to assemble the different components of a car: essential tools without which we cannot build the vehicle. A number of mathematical and computational tools will be discussed, these being the Composition Technique, Tensor Extension Algorithms, Sliding Techniques and Jacobian Techniques.
Then, Part III explains how the approximation methods from Part I and the tools in Part II can be combined to create solutions. In particular, we focus on how to use the toolkit with DNNs and with CTs, as well as how to use DNNs and CTs together in order to achieve a hybrid approximation method.
Following all the previous discussions, the book comes to life in Part IV, in which the theoretical solutions are applied to real computational problems that financial institutions face. We cover the fundamental calculations in CCR (XVAs, IMM capital, PFE); XVA sensitivities (hedging and CVA capital); Market Risk (VaR and FRTB); and the dynamic simulation of portfolio sensitivities in a Monte Carlo simulation, with a special focus on its application to the simulation of Initial Margin (XVAs, IMM capital, PFE and CVA capital). We discuss computational techniques that enable the efficient calibration of sophisticated pricing models (Front Office pricer calibration); we see how the fundamental approximation techniques can be used in the context of implied volatility evaluation (ultra-fast computations); the stable computation of sensitivities for exotic derivatives is covered (Front Office hedging); we discuss how to use the techniques presented in the previous parts of the book in the context of balance sheet and portfolio optimisation problems (profitability maximisation); and we elaborate on an originally unintended but positive side effect of the methods: how a pricing function can be “cloned” from one IT system to another (IT systems interaction).
The different software packages used to generate the results presented (mainly) in Part IV are referenced in a chapter toward the end of the book, along with the websites they can be downloaded from. The software package that implements most of the Chebyshev machinery used in this book, the MoCaX suite (developed by us), has dedicated sections in that chapter with examples of how to use it. We trust readers will find it useful.
Some of the methods discussed in this book fall under the scope of a patent. At the time of this book going to press, the patent holders are happy to provide a license to anyone interested in using them. For further information, please contact the authors.
We hope that the quantitative community finds this book interesting and useful, and we encourage anyone working in this field to get in touch with us. We very much enjoy collaborating, and we can be reached via LinkedIn, for example.
1
Moore's first law relates to the capacity of processors. Moore's second law relates to the monetary cost of processors.
The aim of this chapter is to present in the clearest possible manner the main concepts behind Machine Learning (ML) models. This will set a unified framework under which the approximation methods — which constitute the spearhead of the solutions used to tackle the computational problems in Part IV— are presented.
The main ideas presented in this chapter will be particularly relevant to Chapter 2, where DNNs are introduced. Without them, a good number of the ideas in Chapter 2 would not be as easy to digest.
The chapter starts with a quick introduction to the field of ML. We then touch upon its history, briefly describe the main areas in ML and mention the applications we are most interested in.
Then we delve into the core of the chapter, which is the presentation of the main concepts underpinning most ML models. These will be treated either in relation to the concept of training and predicting with an ML model — considering both the frequentist and the Bayesian approach — or in relation to the idea of model complexity.
Along the way, we will use the standard Linear Model to introduce and illustrate the main concepts. Despite its simplicity, the standard Linear Model shares the key ML concepts with most other models. It therefore makes sense to use it as a guiding thread.
Artificial Intelligence (AI), the field that studies the intelligence of machines — as opposed to natural intelligence, which is displayed by humans and animals — has been one of the most successful and thriving areas of study in the last few decades. Among its many branches, ML is the one that is concerned with algorithms that automatically improve through experience. This is a fundamental component of AI, as it enables learning from the structures and patterns of data, allowing non-human agents to make decisions.
An often-quoted formal definition of ML is the following:
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E” ([57]).
Intuitively speaking, this says that ML consists of a collection of methods and algorithms that automatically extract patterns from data with the purpose of performing predictive tasks.
Even though the term Machine Learning was coined in 1959 by Arthur Samuel — a leading figure in the fields of computer gaming and AI — some of the most basic and common ML algorithms predate the second half of the 20th century, the period with which ML is normally associated.
For example, the origins of Linear Regression can be traced to the beginning of the 19th century through the works of Legendre and Gauss ([70]). This technique was designed to determine the orbit of bodies around the sun by using astronomical data. Calculations would have been made by hand. Also, the amount of data collected would have been small (minuscule by today's standards). Yet, the technique proved successful even under these circumstances.
Other ML models were also developed decades before the advent of computers. For example, Principal Component Analysis (PCA), a well-known dimensionality reduction technique — used on a regular basis still today — was first developed in 1901 by Karl Pearson.
Also, the main ideas underpinning decision trees — the main constituent in random forests, one of the most powerful ML models — have existed, in some form or another, for centuries. Clustering techniques — a common technique employed in ML and data science these days — were used in psychology and anthropology as early as the 1930s.1
However, the models just mentioned did not develop into the form they have today until the second half of the 20th century. There were two key elements that helped change the landscape. The first was an unprecedented increase, throughout the 20th century, in the sophistication of statistical modelling. The second and probably most fundamental was the development of the computer as we know it today in the 1950s.
Before the advent of the computer, asking a machine to perform the tasks we nowadays do on a regular basis would have been unthinkable to the vast majority of scientists. Having access to a computer meant that computations with data sets could take place in a very short period of time. It also enabled researchers to consider larger and more complex data sets. This, alongside a larger appetite for more complex statistical models, led to the enhancement of old models and the creation of new ones.
For example, all sorts of bells and whistles were added to the 19th century version of Linear Regression to make it much more flexible, robust and capable of learning from non-linear data. Also, decision trees gave rise to random forests. At the same time, new models — some of the most powerful in ML today — were developed, such as support vector machines and Neural Nets.
It is worth mentioning that, despite the large number of new models developed over the last few decades, the Linear Model — as mentioned before, one of the oldest — is still used with great success to this day. Moreover, it has been one of the main building blocks for other, more sophisticated models. Despite its simplicity, it shares its main characteristics with pretty much all other ML models. As such, it will be, in coming sections, the example we use to illustrate the main facets of ML models and in particular those of DNNs.
In the first few years, even decades, of the ML era, the range of applications was limited. This was due not just to the simplicity of the models and the small number of people working with them but also to the fact that computers were only found in specialised research centres. As computers have become more powerful and ubiquitous — not only in a good number of industries but also as tools for personal use — the range and number of applications have grown substantially.
Nowadays, Machine Learning models are used in a wide range of applications: in forecasting, such as in weather and stock market prediction; in anomaly detection, for example, fraud detection; for classification, for example, to identify patients with specific medical conditions; for ranking tasks, as search engines do when recommending websites; for summarising, for example, sentiment analysis in social media; for decision-making in robotics. And the list goes on.
As was mentioned in Section 1.1, ML consists of a set of models that automatically learn patterns from data and use these patterns to perform some task. These methods are typically divided into three sub-categories: supervised learning, unsupervised learning and reinforcement learning.
Models and algorithms that fall into the category of supervised learning are those that learn from input and output data. Denote the input data by X. This consists of a set of N vectors x_1, …, x_N, where each vector x_i, for i = 1, …, N, lives in R^d, for some d ≥ 1. The variables in these vectors are typically called features or attributes of the set X. They can be discrete or continuous. The output data, denoted by Y, consists of a set of values y_1, …, y_N — typically real values — that represent a response or target variable. Again, this variable can be discrete or continuous.
Each element x_i in X pairs up with y_i in Y. The data used to train the algorithm therefore consists of data points (x_1, y_1), …, (x_N, y_N). The model learns patterns from X subject to this pairing, the idea being that once these patterns are learnt, the model can, with a high degree of accuracy, assign a value y to any new data point x.
One of the implicit assumptions in supervised learning is that the features in the data set X are powerful enough to predict the target variable y. For example, if we want to predict whether it will rain, we should consider features that are related to the chance of rain. One should not choose the amount of oil being extracted in Saudi Arabia as a feature to predict the chance of rain in Buenos Aires, but rather measurements such as atmospheric pressure, humidity and wind around Buenos Aires.
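As a minimal illustration of this setup, the following Python sketch builds training pairs (x_i, y_i) from a made-up function (which plays the role of the unknown pattern) and predicts the target for a new point by copying the target of its nearest neighbour in the data, one of the simplest supervised learning rules:

```python
# Minimal supervised-learning setup: training pairs (x_i, y_i), then a
# prediction for a new point x via its nearest neighbour in the data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))        # inputs x_i in R^2
Y = np.sin(X[:, 0]) + X[:, 1] ** 2           # targets y_i = f(x_i), f made up

def predict(x_new):
    """1-nearest-neighbour prediction: copy the target of the closest x_i."""
    i = np.argmin(np.linalg.norm(X - x_new, axis=1))
    return Y[i]

x = np.array([0.3, -0.2])                    # a point not in the training set
print(predict(x))
```

With 500 training points the nearest neighbour is close enough that the prediction lands near the true value sin(0.3) + 0.04; richer models replace the lookup with a fitted function, but the data structure is the same.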
There is a vast number of situations where supervised learning has been applied over the years. We make use of it on a regular basis in things such as email spam detection, image and speech recognition, fraud detection, weather prediction, medical diagnosing and so on.
By contrast, in unsupervised learning the data set consists of only the input set X. The goal is to find patterns in X that are not subject to a target variable. This, of course, opens up a range of options in terms of how one should train the algorithms.
Although not used as often as supervised learning, the range of applications is nonetheless vast. Unsupervised learning is used for clustering, anomaly detection, information compression, density estimation and latent variable learning. These techniques can help assign labels to data that are otherwise unlabelled. They can also help reduce the size of the feature space in data sets, making the data more portable and in some cases reducing the complexity of the data set (something that can make supervised algorithms perform better).
There is a type of unsupervised learning algorithm that is of particular relevance to this book. This encompasses all dimensionality reduction algorithms, such as Principal Component Analysis. Not only have they been used in finance for a long time and in a wide range of cases, but they constitute an important part of the Sliding Technique presented in Chapter 7.
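For concreteness, the following sketch computes PCA via the singular value decomposition on synthetic data that lie close to a two-dimensional subspace of R^5; all numbers here are made up for illustration.

```python
# Sketch of PCA as a dimensionality reduction step: centre the data,
# take the SVD, keep the leading components.
import numpy as np

rng = np.random.default_rng(1)
# 500 samples in R^5 that actually live close to a 2-D subspace
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 5))
data = latent @ mixing + 0.01 * rng.normal(size=(500, 5))

centred = data - data.mean(axis=0)
U, S, Vt = np.linalg.svd(centred, full_matrices=False)

explained = S**2 / np.sum(S**2)     # variance explained by each component
reduced = centred @ Vt[:2].T        # project onto the top 2 components
print(explained[:2].sum())
```

Here the top two components capture essentially all the variance, so the 5-dimensional data can be replaced by the 2-dimensional projection with almost no loss; this is the mechanism the Sliding Technique of Chapter 7 relies on.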
The third main sub-category in ML is reinforcement learning. This field deals with the ways in which a software agent ought to make decisions within an environment, where the aim is to maximise a predefined notion of gain or reward. Under this ML paradigm, there is no input/output data available to train from at the beginning. The model learns through interaction with the environment; the data from which it learns are constantly generated and depend on the model used. Reinforcement learning models have become popular in applications such as self-driving cars, reducing energy costs in different industries, and, famously, helping train a machine to beat the world's number one player at Go.2 Reinforcement learning has also become important in hyper-parameter optimisation, a very important aspect that comes up when working with many ML models in real-life applications. We discuss this topic in more detail in Section 1.4.3.
The applications of interest in this book are presented in Part IV. These essentially consist of resolving the computational bottleneck associated with the repeated call of functions in risk calculations. We explain the specifics of applications to CCR, market risk, model calibration, balance sheet optimisation, volatility surfaces and risk metric optimisation exercises. Although the main ML model used in recent years to tackle these problems is based on DNNs, more general (and basic) ML models have been used for a long time to tackle closely related problems.
Linear Regression has been used for years in many areas of finance. In fact, one of the most popular techniques of the last 20 years, used to speed up the pricing of exotic products and the computation of risk calculations, is Longstaff-Schwartz (least-squares Monte Carlo), which essentially relies on the repeated application of Linear Regression in a Monte Carlo simulation. It was first presented in [52] and opened up new avenues of research. Alongside Linear Regression, dimensionality reduction techniques, such as PCA, have long been used in many areas of finance. People often think of ML as a set of recently developed techniques, involving much more complex and expensive algorithms than the ones underpinning Linear Regression and PCA. However, both Linear Regression and PCA perfectly satisfy the conditions to be ML algorithms.
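The regression step at the heart of Longstaff-Schwartz can be sketched in a few lines. The toy below values a put with a single early-exercise date, regressing discounted payoffs on a quadratic polynomial of the spot to estimate continuation values; all parameters are illustrative and this is not a full pricer.

```python
# Sketch of the regression step in Longstaff-Schwartz (least-squares
# Monte Carlo): a put with one early-exercise date, toy parameters.
import numpy as np

rng = np.random.default_rng(2)
S0, K, r, sigma, dt = 100.0, 100.0, 0.05, 0.2, 0.5
n_paths = 50_000

# Simulate spot at the exercise date t1 and at maturity t2 (GBM steps)
z1, z2 = rng.normal(size=(2, n_paths))
S1 = S0 * np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z1)
S2 = S1 * np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z2)

payoff_T = np.maximum(K - S2, 0.0)     # put payoff at maturity
itm = K - S1 > 0                       # regress on in-the-money paths only

# Continuation value at t1 approximated by a quadratic in S1 (least squares)
coeffs = np.polyfit(S1[itm], np.exp(-r * dt) * payoff_T[itm], 2)
continuation = np.polyval(coeffs, S1)

exercise = itm & (K - S1 > continuation)           # early-exercise decision
cashflow = np.where(exercise, K - S1, np.exp(-r * dt) * payoff_T)
price = np.exp(-r * dt) * np.mean(cashflow)
print(price)
```

The expensive object here, the continuation value, is never computed exactly; it is replaced by a cheap regression fitted across the Monte Carlo paths, which is exactly the replication idea this book develops much further.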
For a long time, the ML models and algorithms used in finance were of the simpler kind. But with the advent of powerful computing capabilities, and of a range of well-implemented, easy-to-use DNN packages in the most common programming languages, practitioners have begun to use DNNs increasingly often.
In particular, risk calculations, risk optimisation and calibration of pricing functions — all exercises that demand large computational capabilities — stand to gain a lot from the use of approximating techniques. This book elaborates on two of the most powerful ones that we know of: DNNs and Chebyshev Tensors (CTs). Chapter 2 sets the theoretical framework for DNNs, and Chapter 3 does so for CTs.
Focusing on DNNs, there is a growing body of literature addressing the computational bottleneck of risk calculations using DNNs; examples are [25] in CCR and [41] in pricing function calibration. The core idea is to replace functions that are called thousands of times in a particular process with a DNN. Once the approximation has been achieved — through proper training — the DNN is used instead of the function. Because DNNs are fast to evaluate (unless they are too large), amounting to little more than simple linear algebra operations, the process that was once problematic from a computational point of view is reduced to a manageable computation.
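A minimal illustration of this replacement idea is given below: a tiny one-hidden-layer network, written in plain numpy and trained by full-batch gradient descent, learns to replicate a stand-in function that plays the role of the costly pricer. Real applications use proper DNN frameworks and far richer pricing functions; this sketch only shows the mechanics.

```python
# Toy DNN replica: train a one-hidden-layer net on samples of a function,
# then evaluate the cheap replica (two matrix products) instead of it.
import numpy as np

rng = np.random.default_rng(3)

def target(x):
    return np.sin(3.0 * x)       # stand-in for a costly pricing function

X = rng.uniform(-1, 1, size=(256, 1))   # training samples of the function
Y = target(X)

H = 32                                   # hidden-layer width
W1, b1 = rng.normal(size=(1, H)), np.zeros(H)
W2, b2 = rng.normal(scale=0.1, size=(H, 1)), np.zeros(1)

lr = 0.05
for step in range(5000):
    hidden = np.tanh(X @ W1 + b1)        # forward pass
    err = (hidden @ W2 + b2) - Y
    gW2 = hidden.T @ err / len(X)        # backward pass (mean squared error)
    gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - hidden**2)
    gW1 = X.T @ dh / len(X)
    gb1 = dh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

def replica(x):
    return np.tanh(x @ W1 + b1) @ W2 + b2   # fast: two matrix products

x_test = np.linspace(-1, 1, 200).reshape(-1, 1)
print(np.max(np.abs(replica(x_test) - target(x_test))))
```

Once trained, every call to `replica` costs only a couple of small matrix products, regardless of how expensive the original function was to evaluate.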
This section presents the core elements of Linear Regression. Linear Regression (more generally, the Linear Model) is very important in ML. Mathematically, it is one of the simplest models and among the easiest to understand. A big advantage is that it can be solved analytically, meaning it is easy to use and quick to deploy, features that many other models do not have. Despite their simplicity, however, Linear Models are used successfully on a regular basis in a wide range of contexts. Not only that, but Linear Models also constitute important modules or building blocks for more complex models. Finally — and of particular importance for this chapter — the fundamental concepts and characteristics of a huge range of ML models can be found in the Linear Model. As the latter is simpler, it makes sense to start there.
Consider the following example. A real estate company is interested in having an effective way of pricing houses in different areas of a city. Assume they have access to the surface area of the properties and the average household income of each post code. Moreover, assume they have the price for N of these properties. This means, in terms of input/output data (X, Y), that they have input data X = {x_1, …, x_N}, where each x_i = (s_i, m_i) represents a property, s_i its corresponding surface area, and m_i the average income of the post code where the property is located, and output data Y = {y_1, …, y_N}, where y_i is the price of the i-th property.
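Jumping slightly ahead, a toy numerical version of this example can be written in a few lines: synthetic surface areas, incomes and prices (all numbers made up for illustration), a least-squares fit of the Linear Model f(s, m) = w0 + w1*s + w2*m, and a prediction for a new property.

```python
# Toy house-pricing example: fit price ~ w0 + w1*surface + w2*income by
# least squares on synthetic data, then predict the price of a new house.
import numpy as np

rng = np.random.default_rng(4)
N = 100
surface = rng.uniform(40, 200, size=N)         # square metres
income = rng.uniform(20_000, 90_000, size=N)   # average post-code income

# Synthetic "true" prices with some noise (coefficients are made up)
price = 50_000 + 2_000 * surface + 3.0 * income \
        + rng.normal(0, 10_000, size=N)

# Design matrix with an intercept column
A = np.column_stack([np.ones(N), surface, income])
w, *_ = np.linalg.lstsq(A, price, rcond=None)   # the "learning" step

# Predict the price of a property not in the training data
s_new, m_new = 120.0, 55_000.0
print(w[0] + w[1] * s_new + w[2] * m_new)
```

The fitted weights recover the generating coefficients closely, so the prediction for the new property lands near the "true" value of about 455,000.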
The question we ask is whether the data can be used to obtain a function with which we can predict the price of properties. In particular, can we obtain a function f such that, given any pair (s, m), not necessarily in X — where s is the surface area of a property and m the average income of its post code — the value f(s, m) is a good proxy to the real price of the property? That is, can we learn the patterns present in
