State-of-the-art algorithmic deep learning and tensoring techniques for financial institutions
The computational demand of risk calculations in financial institutions has ballooned and shows no sign of stopping. It is no longer viable to simply add more computing power to deal with this increased demand. The solution? Algorithmic methods based on deep learning and Chebyshev tensors offer a practical way to reduce costs while simultaneously increasing risk calculation capabilities. Machine Learning for Risk Calculations: A Practitioner’s View provides an in-depth review of a number of these algorithmic solutions and demonstrates how they can be used to overcome the massive computational burden of risk calculations in financial institutions.
This book will get you started by reviewing fundamental techniques, including deep learning and Chebyshev tensors. You’ll then discover algorithmic tools that, in combination with the fundamentals, deliver actual solutions to the real problems financial institutions encounter on a regular basis. Numerical tests and examples demonstrate how these solutions can be applied to practical problems, including XVA and Counterparty Credit Risk, IMM capital, PFE, VaR, FRTB, Dynamic Initial Margin, pricing function calibration, volatility surface parametrisation, portfolio optimisation and others. Finally, you’ll uncover the benefits these techniques provide, the practicalities of implementing them, and the software which can be used.
Quants, IT professionals, and financial risk managers will benefit from this practitioner-oriented approach to state-of-the-art risk calculation.
Page count: 885
Year of publication: 2021
Cover
Title Page
Copyright
Dedication
Acknowledgements
Foreword
Motivation and aim of this book
BOOK OUTLINE
NOTE
PART One: Fundamental Approximation Methods
Chapter 1: Machine Learning
1.1 INTRODUCTION TO MACHINE LEARNING
1.2 THE LINEAR MODEL
1.3 TRAINING AND PREDICTING
1.4 MODEL COMPLEXITY
NOTES
Chapter 2: Deep Neural Nets
2.1 A BRIEF HISTORY OF DEEP NEURAL NETS
2.2 THE BASIC DEEP NEURAL NET MODEL
2.3 UNIVERSAL APPROXIMATION THEOREMS
2.4 TRAINING OF DEEP NEURAL NETS
2.5 MORE SOPHISTICATED DNNs
2.6 SUMMARY OF CHAPTER
NOTES
Chapter 3: Chebyshev Tensors
3.1 APPROXIMATING FUNCTIONS WITH POLYNOMIALS
3.2 CHEBYSHEV SERIES
3.3 CHEBYSHEV TENSORS AND INTERPOLANTS
3.4 EX ANTE ERROR ESTIMATION
3.5 WHAT MAKES CHEBYSHEV POINTS UNIQUE
3.6 EVALUATION OF CHEBYSHEV INTERPOLANTS
3.7 DERIVATIVE APPROXIMATION
3.8 CHEBYSHEV SPLINES
3.9 ALGEBRAIC OPERATIONS WITH CHEBYSHEV TENSORS
3.10 CHEBYSHEV TENSORS AND MACHINE LEARNING
3.11 SUMMARY OF CHAPTER
NOTES
PART Two: The toolkit — plugging in approximation methods
Chapter 4: Introduction: why is a toolkit needed
4.1 THE PRICING PROBLEM
4.2 RISK CALCULATION WITH PROXY PRICING
4.3 THE CURSE OF DIMENSIONALITY
4.4 THE TECHNIQUES IN THE TOOLKIT
Chapter 5: Composition techniques
5.1 LEVERAGING FROM EXISTING PARAMETRISATIONS
5.2 CREATING A PARAMETRISATION
5.3 SUMMARY OF CHAPTER
Chapter 6: Tensors in TT format and Tensor Extension Algorithms
6.1 TENSORS IN TT FORMAT
6.2 TENSOR EXTENSION ALGORITHMS
6.3 STEP 1 — OPTIMISING OVER TENSORS OF FIXED RANK
6.4 STEP 2 — OPTIMISING OVER TENSORS OF VARYING RANK
6.5 STEP 3 — ADAPTING THE SAMPLING SET
6.6 SUMMARY OF CHAPTER
NOTES
Chapter 7: Sliding Technique
7.1 SLIDE
7.2 SLIDER
7.3 EVALUATING A SLIDER
7.4 SUMMARY OF CHAPTER
Chapter 8: The Jacobian projection technique
8.1 SETTING THE BACKGROUND
8.2 WHAT WE CAN RECOVER
8.3 PARTIAL DERIVATIVES VIA PROJECTIONS ONTO THE JACOBIAN
NOTES
PART Three: Hybrid solutions — approximation methods and the toolkit
Chapter 9: Introduction
9.1 THE DIMENSIONALITY PROBLEM REVISITED
9.2 EXPLOITING THE COMPOSITION TECHNIQUE
Chapter 10: The Toolkit and Deep Neural Nets
10.1 BUILDING ON P USING THE IMAGE OF G
10.2 BUILDING ON f
Chapter 11: The Toolkit and Chebyshev Tensors
11.1 FULL CHEBYSHEV TENSOR
11.2 TT-FORMAT CHEBYSHEV TENSOR
11.3 CHEBYSHEV SLIDER
11.4 A FINAL NOTE
Chapter 12: Hybrid Deep Neural Nets and Chebyshev Tensors Frameworks
12.1 THE FUNDAMENTAL IDEA
12.2 DNN+CT WITH STATIC TRAINING SET
12.3 DNN+CT WITH DYNAMIC TRAINING SET
12.4 NUMERICAL TESTS
12.5 ENHANCED DNN+CT ARCHITECTURES AND FURTHER RESEARCH
NOTES
PART Four: Applications
Chapter 13: The aim
13.1 SUITABILITY OF THE APPROXIMATION METHODS
13.2 UNDERSTANDING THE VARIABLES AT PLAY
NOTE
Chapter 14: When to use Chebyshev Tensors and when to use Deep Neural Nets
14.1 SPEED AND CONVERGENCE
14.2 THE QUESTION OF DIMENSION
14.3 PARTIAL DERIVATIVES AND EX ANTE ERROR ESTIMATION
14.4 SUMMARY OF CHAPTER
NOTES
Chapter 15: Counterparty credit risk
15.1 MONTE CARLO SIMULATIONS FOR CCR
15.2 SOLUTION
15.3 TESTS
15.4 RESULTS ANALYSIS AND CONCLUSIONS
15.5 SUMMARY OF CHAPTER
NOTES
Chapter 16: Market Risk
16.1 VAR-LIKE CALCULATIONS
16.2 ENHANCED REVALUATION GRIDS
16.3 FUNDAMENTAL REVIEW OF THE TRADING BOOK
16.4 PROOF OF CONCEPT
16.5 STABILITY OF TECHNIQUE
16.6 RESULTS BEYOND VANILLA PORTFOLIOS — FURTHER RESEARCH
16.7 SUMMARY OF CHAPTER
NOTES
Chapter 17: Dynamic sensitivities
17.1 SIMULATING SENSITIVITIES
17.2 THE SOLUTION
17.3 AN IMPORTANT USE OF DYNAMIC SENSITIVITIES
17.4 NUMERICAL TESTS
17.5 DISCUSSION OF RESULTS
17.6 ALTERNATIVE METHODS
17.7 SUMMARY OF CHAPTER
NOTES
Chapter 18: Pricing model calibration
18.1 INTRODUCTION
18.2 SOLUTION
18.3 TEST DESCRIPTION
18.4 RESULTS WITH CHEBYSHEV TENSORS
18.5 RESULTS WITH DEEP NEURAL NETS
18.6 COMPARISON OF RESULTS VIA CT AND DNN
18.7 SUMMARY OF CHAPTER
NOTES
Chapter 19: Approximation of the implied volatility function
19.1 THE COMPUTATION OF IMPLIED VOLATILITY
19.2 SOLUTION
19.3 RESULTS
19.4 SUMMARY OF CHAPTER
NOTES
Chapter 20: Optimisation Problems
20.1 BALANCE SHEET OPTIMISATION
20.2 MINIMISATION OF MARGIN FUNDING COST
20.3 GENERALISATION — CURRENTLY “IMPOSSIBLE” CALCULATIONS
20.4 SUMMARY OF CHAPTER
NOTES
Chapter 21: Pricing Cloning
21.1 PRICING FUNCTION CLONING
21.2 SUMMARY OF CHAPTER
NOTES
Chapter 22: XVA sensitivities
22.1 FINITE DIFFERENCES AND PROXY PRICERS
22.2 PROXY PRICERS AND AAD
NOTES
Chapter 23: Sensitivities of exotic derivatives
23.1 BENCHMARK SENSITIVITIES COMPUTATION
23.2 SENSITIVITIES VIA CHEBYSHEV TENSORS
NOTES
Chapter 24: Software libraries relevant to the book
24.1 RELEVANT SOFTWARE LIBRARIES
24.2 THE MCX SUITE
Appendix A: Families of orthogonal polynomials
NOTE
Appendix B: Exponential convergence of Chebyshev Tensors
Appendix C: Chebyshev Splines on functions with no singularity points
NOTE
Appendix D: Computational savings details for CCR
D.1 BARRIER OPTION
D.2 CROSS-CURRENCY SWAP
D.3 BERMUDAN SWAPTION
D.4 AMERICAN OPTION
NOTES
Appendix E: Computational savings details for dynamic sensitivities
E.1 FX SWAP
E.2 EUROPEAN SPREAD OPTION
NOTE
Appendix F: Dynamic sensitivities on the market space
F.1 THE PARAMETRISATION
F.2 NUMERICAL TESTS
F.3 FUTURE WORK... WHEN k > 1
NOTES
Appendix G: Dynamic sensitivities and IM via Jacobian Projection technique
NOTES
Appendix H: MVA optimisation — further computational enhancement
Bibliography
Index
End User License Agreement
Chapter 1
TABLE 1.1 Table showing an example of data to train on.
TABLE 1.2 Underfitting and overfitting accuracy, measured using mean squared er...
TABLE 1.3 Table showing train and validation errors for different values of reg...
Chapter 12
TABLE 12.1 Maximum error of each hNN architecture.
Chapter 14
TABLE 14.1 This table shows how the number of Chebyshev points increases with d...
TABLE 14.2 This table shows how the number of Chebyshev point evaluations neede...
Chapter 15
TABLE 15.1 Computational gain of running the CCR calculation for an IR swap wit...
TABLE 15.2 Mean and maximum relative errors for PV profiles at expectation and
TABLE 15.3 Computational gain of running the CCR calculation for a barrier opti...
TABLE 15.4 Mean and maximum relative errors for PV profiles at expectation and
TABLE 15.5 Computational costs and savings obtained by using TT-format CTs to c...
TABLE 15.6 Mean and maximum relative errors for PV profiles at expectation and
TABLE 15.7 Computational gain of running the CCR calculation for a Bermudan Swa...
TABLE 15.8 Mean and maximum relative errors for PV profiles at expectation and
TABLE 15.9 Computational gain of running the CCR calculation for a Bermudan Swa...
TABLE 15.10 Mean and maximum relative errors for PV profiles at expectation and
TABLE 15.11 Computational gain of running the CCR calculation for a Bermudan Sw...
TABLE 15.12 Mean and maximum relative errors for PV profiles at expectation and
TABLE 15.13 Computational gain of running the CCR calculation with a CT in TT f...
TABLE 15.14 Mean and maximum relative errors for PV profiles at expectation and
TABLE 15.15 Computational gain of running the CCR calculation with a DNN on a p...
TABLE 15.16 Mean and maximum relative errors for PV profiles at expectation and
TABLE 15.17 Comparison of computational gains and CVA errors of approximation a...
Chapter 16
TABLE 16.1 Portfolio of swaps, slider configuration.
TABLE 16.2 Portfolio of swaptions, slider configuration on -day liquidity hor...
TABLE 16.3 Portfolio of swaptions, slider configuration on -day liquidity hor...
Chapter 17
TABLE 17.1 Maximum relative percentage error for market sensitivities, EIM and ...
TABLE 17.2 Maximum relative percentage error for market sensitivities, EIM and ...
TABLE 17.3 Computational savings obtained by using CTs in TT format to compute ...
TABLE 17.4 Computational savings obtained by using CTs in TT format to compute ...
Chapter 18
TABLE 18.1 Summary and comparison of testing results for solution via CTs and D...
Chapter 19
TABLE 19.1 Parameters used to build CTs. Parameter denotes the rank of the S...
TABLE 19.2 Errors (time-scaled implied volatility and normalised price) and...
TABLE 19.3 Errors (time-scaled implied volatility and normalised price) and...
TABLE 19.4 Errors (time-scaled implied volatility and normalised price) and...
Chapter 20
TABLE 20.1 Balance sheet input variables for the optimisation routine.
Appendix D
TABLE D.1 Computational gain of running the CCR calculation for Barr...
TABLE D.2 Computational costs and savings obtained by using TT-forma...
TABLE D.3 Computational gain of running the CCR calculation with a C...
TABLE D.4 Computational gain of running the CCR calculation for a Be...
TABLE D.5 Computational gain of running the CCR calculation for a Be...
TABLE D.6 Computational gain of running the CCR calculation for Amer...
TABLE D.7 Computational gain of running the CCR calculation for Amer...
Appendix E
TABLE E.1 Computational savings obtained by using CTs in TT format t...
TABLE E.2 Computational savings obtained by using CTs in TT format t...
Appendix F
TABLE F.1 Maximum relative percentage error for EIM and PFIM ( quan...
TABLE F.2 Computational savings obtained by using full CTs to comput...
Appendix G
TABLE G.1 Maximum relative percentage error for EIM and PFIM ( quan...
TABLE G.2 Computational savings obtained by using full CTs to comput...
Chapter 1
Figure 1.1 Linear Regression fit to points in dimension . Filled in circles...
Figure 1.2 Linear regression fit (denoted by regression fit) to given data (d...
Figure 1.3 Left pane shows how the basic model of Linear Regression is not p...
Figure 1.4 Surface of loss function for basic Linear Regression.
Figure 1.5 Cross-section of cost function for DNN.
Figure 1.6 Probability density functions of normal distribution.
Figure 1.7 Figure showing underfitting and overfitting phenomenon. The plot ...
Figure 1.8 Figure showing how regression fit improves with regularisation. T...
Figure 1.9 Figure showing the different values for training and validation e...
Chapter 2
Figure 2.1 Diagram of a Perceptron.
Figure 2.2 Diagram of an artificial neuron.
Figure 2.3 Some of the most popular activation functions.
Figure 2.4 Biological neuron.
Figure 2.5 Artificial neural network with multidimensional output.
Figure 2.6 Artificial neural network with output of dimension 1.
Figure 2.7 A Deep Neural Net with input dimension , layers and the -th l...
Figure 2.8 Forward pass in backpropagation.
Figure 2.9 Backward pass in backpropagation.
Figure 2.10 Surface representing the cost function for which gradient descen...
Figure 2.11 Level curves example for cost function. Arrows denote possible p...
Figure 2.12 Image represents the incoming data point. The neuron focuses on ...
Figure 2.13 Max pooling.
Chapter 3
Figure 3.1 Chebyshev polynomials from degree to degree .
Figure 3.2 Tensor of dimension 1. Grid given by balls. Values on grid points...
Figure 3.3 Exponential divergence from Runge function by polynomial interpol...
Figure 3.4 Chebyshev points in one dimension.
Figure 3.5 Chebyshev grid in two dimensions.
Figure 3.6 Chebyshev interpolants convergence to Runge function.
Figure 3.7 Chebyshev interpolants convergence error to Black-Scholes functio...
Figure 3.8 Chebyshev interpolant in dimension .
Figure 3.9 Chebyshev interpolants error convergence to Black-Scholes functio...
Figure 3.10 Empirical error versus predicted error for Black-Scholes functio...
Figure 3.11 Empirical error versus predicted error for Black-Scholes functio...
Figure 3.12 Approximating Runge function with different grid distributions. ...
Figure 3.13 Comparison of errors obtained with equidistant tensors and CTs o...
Figure 3.14 CT evaluation in dimension 2.
Figure 3.15 Dashed curve is error obtained when evaluating with barycentric ...
Figure 3.16 Oscillations from the Gibbs phenomenon around a jump discontinui...
Chapter 4
Figure 4.1 Main steps of a typical risk calculation.
Figure 4.2 How the typical steps of a risk calculation are modified when usi...
Chapter 5
Figure 5.1 Autoencoder architecture.
Chapter 6
Figure 6.1 TT Tensor diagram.
Chapter 8
Figure 8.1 Path followed by the short rate within the space of swap rates. T...
Figure 8.2 Short rate direction at inside the space of directions spanned ...
Chapter 10
Figure 10.1 Image of parametrisation . The image has dimension but sits i...
Chapter 12
Figure 12.1 Illustration of how a DNN funnels information in a forward pass....
Figure 12.2 Hybrid DNN and interpolation architecture.
Figure 12.3 Illustration of static data key features.
Figure 12.4 Illustration of learning process with static training set.
Figure 12.5 Illustration of learning process with dynamic training set.
Figure 12.6 The -axis represents the number of learning iterations. The -a...
Figure 12.7 The -axis represents the number of learning iterations. The -a...
Figure 12.8 Cost function versus number of learning iterations static traini...
Figure 12.9 Cost function versus number of learning iterations dynamic train...
Chapter 14
Figure 14.1 Comparison of how errors of approximation (logarithmic scale), o...
Figure 14.2 Which hybrid solution to use depending on the dimension of the p...
Chapter 15
Figure 15.1 Monte Carlo simulation showing time points in the future and mod...
Figure 15.2 PV profiles — at expectation and th percentiles — for an IR swa...
Figure 15.3 PV profiles — at expectation and th percentiles — for a barrier...
Figure 15.4 Noise distribution of the Monte Carlo type (i.e. original pricin...
Figure 15.5 PV profiles — at expectation and th percentiles — for a cross-c...
Figure 15.6 PV profiles — at expectation and th percentiles — for a Bermuda...
Figure 15.7 PV profiles — at expectation and th percentiles — for a Bermuda...
Figure 15.8 PV profiles — at expectation and th percentiles — for a Bermuda...
Figure 15.9 PV profiles — at expectation and th percentiles — for a portfol...
Figure 15.10 Noise distribution of the Monte Carlo type of pricing function ...
Figure 15.11 PV profiles — at expectation and th percentiles — for a Bermud...
Chapter 16
Figure 16.1 Comparison of convergence rates between revaluation grids on equ...
Figure 16.2 Illustration of function approximation via CT or via Taylor expa...
Figure 16.3 Portfolio of Swaps, slider configuration . Top left: PCA dim.
Figure 16.4 This figure shows how the Orthogonal Chebyshev Slider relative E...
Figure 16.5 Portfolio of swaps, slider configuration , PCA dimension , eva...
Figure 16.6 Portfolio of swaptions, slider configuration , on -day liquidi...
Figure 16.7 This figure shows how the Orthogonal Chebyshev Slider relative E...
Figure 16.8 Portfolio of swaptions, slider configuration , PCA dimension ,...
Figure 16.9 Portfolio of swaptions, slider configuration , on -day liquidi...
Figure 16.10 This figure shows how the Orthogonal Chebyshev Slider relative ...
Figure 16.11 Portfolio of swaptions, slider configuration , PCA dimension
Figure 16.12 Daily rolling mean ratio over a period of 10 years for CT, line...
Figure 16.13 Daily rolling variance ratio over a period of 10 years for CT, ...
Chapter 17
Figure 17.1 Percentage relative errors of CTs for sensitivity to the first s...
Figure 17.2 Percentage relative errors of CTs for sensitivity to the USD/EUR...
Figure 17.3 IM profiles — expectation and quantiles — for FX Swap obtained...
Figure 17.4 Noise distribution for the MC-based spread option pricing functi...
Figure 17.5 Percentage relative errors of CTs for the first spot. Histograms...
Figure 17.6 Equity delta margin profiles — at expectation and quantiles — ...
Chapter 18
Figure 18.1 Average and maximum error of approximation heat map for full CTs...
Figure 18.2 Average and maximum error of approximation heat map for TT-forma...
Figure 18.3 Quantiles for the distribution of RMSE values obtained by calibr...
Figure 18.4 Average and maximum error of approximation heat map for CTs buil...
Figure 18.5 Distribution of RMSE values obtained by calibrating synthetica...
Chapter 19
Figure 19.1 Left-hand pane shows first pivot point and both horizontal and v...
Figure 19.2 Left-hand pane shows the normalised call price (Equation 19.4) i...
Figure 19.3 Domain of the normalised call pricing function is split into fou...
Figure 19.4 Domains over which the tests were performed. Domain is the sma...
Chapter 20
Figure 20.1 NII values for different input scenarios. Bars 1 to 8 are manual...
Figure 20.2 A 3D plot on the left pane showing the surface of NII in terms o...
Figure 20.3 MVA with a given counterparty, when one of the new payer swaps i...
Chapter 21
Figure 21.1 Illustration of a generic risk system with and without pricing c...
Chapter 23
Figure 23.1 The left pane shows the noisy spot-vol surface generated with a ...
Appendix B
Figure B.1 Chebyshev Tensor evaluation in dimension .
Appendix C
Figure C.1 Reduction in the number of calls to pricing function through the ...
Appendix F
Figure F.1 DIM profiles — expectation and quantiles — for a European Swapt...
Appendix G
Figure G.1 DIM profiles — expectation and quantiles — for a European Swapt...
Founded in 1807, John Wiley & Sons is the oldest independent publishing company in the United States. With offices in North America, Europe, Australia and Asia, Wiley is globally committed to developing and marketing print and electronic products and services for our customers' professional and personal knowledge and understanding.
The Wiley Finance series contains books written specifically for finance and investment professionals as well as sophisticated individual investors and their financial advisors. Book topics range from portfolio management to e-commerce, risk management, financial engineering, valuation and financial instrument analysis, as well as much more.
For a list of available titles, visit our Web site at www.WileyFinance.com.
I. RUIZ
M. ZERON
Foreword by P. Karasinski
This edition first published 2021
Copyright © 2022 by Ignacio Ruiz and Mariano Zeron.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Ignacio Ruiz and Mariano Zeron to be identified as the authors of this work has been asserted in accordance with law.
Registered Office
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
Editorial Office
The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.
Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging-in-Publication Data
Names: Ruiz, Ignacio, 1972- author. | Laris, Mariano Zeron Medina, author.
Title: Machine learning for risk calculations : a practitioner's view / Ignacio Ruiz, Mariano Zeron Medina Laris.
Description: Hoboken, New Jersey : Wiley, [2022] | Includes index.
Identifiers: LCCN 2021036694 (print) | LCCN 2021036695 (ebook) | ISBN 9781119791386 (hardback) | ISBN 9781119791393 (adobe pdf) | ISBN 9781119791409 (epub)
Subjects: LCSH: Machine learning. | Financial risk management.
Classification: LCC Q325.5 .R855 2022 (print) | LCC Q325.5 (ebook) | DDC 332.10285/631—dc23
LC record available at https://lccn.loc.gov/2021036694
LC ebook record available at http://lccn.loc.gov/2021036695
Cover image: © korkeng/Shutterstock
Cover design: Wiley
To my sister Cristina, a beautiful soul around us, an inspiration in my life.
To my parents, for their unwavering support.
This book has benefited significantly from the support and input of a whole range of individuals, whom we want to thank. In particular, we would like to thank our friend Emilio Viúdez, who directly contributed to many of the chapters in this book. Most importantly, he has been an extraordinary companion on the journey of which this book is one of the results.
We would also like to thank (in no particular order of importance) Jesus Alonso, Andrew Aziz, Dimitra Bampou, Russell Barker, Assad Bouayoun, Juan Antonio Burgos, Paul Burnett, Pablo Cassatella, Justin Chan, Lucia Cipolina, Alex Daminoff, Matthew Dear, Piero Del Boca, Thomas Devereux, Alberto Elices, Eduardo Epperlein and his group at NOMURA, Andrew Green, Stephen Hancock, Brian Huge, Marc Jeannin, Akshay Jha, Paul Jones, Christian Kappen, Piotr Karasinski, Gordon Lee, Udit Mahajan, Navneet Mathur, Adolfo Montoro, Cesar Mora, Rubén Moral, Yacine Moulay-Rchid, Laura Müller, Stuart Neil, Maria Nogueiras, Yogi Patel, Jose María Pesquero, Maxim Petrashev, Carlos Rioja, Samir Saurav, Joaquín Seco, Naimish Shah, Anton Simanenka, Jono Simpson, Takis Sironis, John Sleath, Robert Smith, Theo Stampoulis, Lauri Tamminen, Alok Tiwari, Alessandro Vecci, Satya Vemireddy and Hernan Zúñiga for the time and patience they had with us on different occasions.
The reader can find further resources on the topics of this book at mocaxintelligence.org and on the YouTube channel youtube.com/mocax.
I met Mariano and Ignacio at the Quantitative Finance 2019 conference in Rome. Mariano gave a presentation on his joint work with Ignacio on Chebyshev Tensor techniques for CVA pricing and FRTB capital. After the talk I went to speak to Mariano, as I had found the results he presented quite remarkable. That was the beginning of my productive relationship with both of them.
The 2008 banking crisis profoundly changed the derivatives industry. Before it, the derivatives business was driven by the creation of exotic trades. After the crisis, the paradigm changed and computing the risks carried on balance sheets became central to a degree not seen before. Being able to compute these risk numbers accurately, in a timely and cost-effective manner, is now the main driver of the business.
Historically, this has been achieved mostly by increasing the amount of hardware used for the computation, in conjunction with only a few limited algorithmic solutions. This route has become increasingly uneconomical, as computational needs have grown further due to regulation and market demand. In this book, Mariano and Ignacio offer a family of algorithmic solutions that substantially reduce the need for increased computational power, as well as solutions for some calculations that are very difficult to do without them.
This text applies the mathematics behind Chebyshev Tensors, combined with Deep Learning, within the specific contexts of many of the risk calculations that banks, hedge funds and other financial institutions need to run on a constant basis, with the aim of reducing the computational demand of these calculations while retaining the required accuracy. This is done in a robust manner, using the mathematical properties of the techniques involved as a starting point, but not forgetting that at times some well-understood heuristics are needed to extend the applicability of the mathematical methods chosen.
This thinking process is applied to a number of practical applications presented in the final chapters of the book, ranging from counterparty credit risk (CCR) and market risk to portfolio optimisation, among several others. The results presented in this book have the potential to be disruptive for the industry. I hope that the quantitative finance community will enjoy and benefit from the ideas put forward in this text.
Piotr Karasinski
The world of risk analytics has had an ever-growing demand for computing capacity since the early 2000s. When one of the book's authors started working in this field, he was asked to work on the CCR engine at Credit Suisse for the new Basel II regulation and IMM capital calculation. At the time, it was the latest big thing in the industry. The IMM-related calculations were among the most (if not the most) complicated calculations the bank had done up to that point. A few hundred CPUs were bought and installed in a state-of-the-art grid computing farm. The belief was that such a grid would be able to do any CCR calculation. However, it did not take long for the team to realise that more computing power was needed to match the computational requirements of new calculations being requested. Over the years, we have experienced a world in which, regardless of how much computing power the latest technologies provide, it soon proves insufficient to meet new demands and needs to be upgraded only a few years later.
Indeed, the world of banking, and in particular the derivatives business, has become a technology race (like many other industries, it must be said). As P. Karasinski says in his Foreword to this book, it used to be about creating and selling the next exotic product. Now it is about computing prices and increasingly sophisticated risk metrics in a prompt and efficient manner, partly as a result of regulations that have become more stringent since the 2008 crisis, and partly as a result of the higher standards for risk management the industry has developed. That is where broker-dealers now differentiate themselves and where the source of profitability currently lies.
Until recently, the computational cost of calculating risk numbers has mostly been addressed by throwing raw computing capacity at it, that is, by buying more and better hardware. Many tier-one banks are known to run farms of several tens of thousands of CPUs and GPUs, and banks are now also leasing cloud computing capacity from external vendors. This comes, of course, at a considerable cost, which needs to be managed; obviously, it cannot keep increasing forever without denting the profitability of the business.
Part of the reason why financial institutions have opted for more hardware is Moore's second law, which states that the computing capacity of transistor chips per dollar of capital expenditure grows exponentially.1 This certainly held until recently. However, that increase in computing capacity was driven by the constant miniaturisation of the basic elements of chips (semiconductor transistors, magnetic memory bits, etc.). Now that semiconductor transistors are reaching the 10-nanometer range, the rate of growth stated in Moore's law is stalling in commercial computers. This is illustrated by the fact that 10 years ago the processing capability of a new computer was massively superior to that of computers only a few years older; at present, it is only marginally better. The reason is that one atom measures roughly 0.1 nanometers, and when transistors shrink below a few tens of nanometers, quantum effects start to appear and temperature becomes a problem. Quantum computing, a most interesting topic in itself, is well outside the scope of this book, but the reality is that, for now and for the foreseeable future, hardware will only offer limited increases in computing capacity. As a result, the paradigm has changed from creating more computing power via hardware to developing algorithmic solutions that optimise calculations.
In parallel, the quantitative analytics community has done a lot of work to create algorithmic methods that accelerate calculations and decrease their hardware needs. A notable example has been the family of Adjoint Algorithmic Differentiation (AAD) solutions in the world of XVA pricing, which in its general version can compute as many XVA sensitivities as needed at the added cost of (roughly) 10 XVA pricing runs. Seen from the perspective of the times when computing one CVA run for a few netting sets was already a challenge, this improvement is remarkable. However, it comes at a considerable price: the implementation effort is substantial. This is particularly the case if one already has a functioning XVA platform and wants to adapt it to AAD. The task can be so daunting that many banks do not consider it a viable option.
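The mechanics behind adjoint differentiation can be illustrated with a toy reverse-mode sketch: one forward pass records the computation on a tape, and one backward sweep then yields the sensitivity to every input at once, at a cost that is a small multiple of one forward evaluation. The Python sketch below uses a made-up stand-in "payoff" rather than a real pricing model and is purely illustrative; production AAD relies on dedicated libraries or code generation.

```python
# Toy reverse-mode (adjoint) differentiation: all input sensitivities from
# one forward pass plus one backward sweep, regardless of the input count.
import math

class Var:
    """A scalar that records the operations applied to it."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents      # (parent_var, local_derivative) pairs
        self.adjoint = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

def exp(x):
    v = math.exp(x.value)
    return Var(v, ((x, v),))

def backward(output):
    """Propagate adjoints from the output back to every input."""
    output.adjoint = 1.0
    stack = [output]
    while stack:
        node = stack.pop()
        for parent, local in node.parents:
            parent.adjoint += local * node.adjoint
            stack.append(parent)

# 'price' is a stand-in payoff, not a real pricing model
s, r = Var(100.0), Var(0.05)
price = s * exp(r)        # forward pass
backward(price)           # one backward sweep gives both sensitivities
print(s.adjoint, r.adjoint)   # d(price)/ds and d(price)/dr
```

The point of the sketch is the shape of the computation: however many inputs the payoff has, one backward sweep over the recorded tape produces all their sensitivities.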
This book is based on the belief that the optimal solution to many of the computational challenges in finance lies in the union of algorithmic solutions and their appropriate software implementation, run on powerful hardware. The aim of this book is to review how some numerical mathematical methods, when applied thoughtfully, taking into account the specific characteristics of the calculations we want to improve, can create substantial computational enhancements. Indeed, the book is the direct result of the experience the authors have had, over the past few years, while trying to solve difficult (sometimes seemingly impossible) calculations in real-life settings within financial institutions.
The solutions proposed throughout this book apply mainly to existing risk engines within operating financial institutions. Some of these risk engines have been developed over many years by different business units and with different goals in mind. This has produced, in many cases, an amalgamation of risk engines that is suboptimal from an efficiency standpoint. Although starting new engines from scratch may correct the shortcomings of legacy systems, doing so requires not only a lot of time and money but also, in many cases, enormous projects. In fact, a number of banks have reportedly started and then stopped the development of global pricing and risk systems built from the ground up, due to the scale of the job. Quite often it makes more sense, from a practical perspective, to upgrade existing engines, improving what already exists and using the increasingly demanding business needs and regulatory environment as guidelines, instead of developing new ones. With this in mind, the solutions proposed in this book are highly pragmatic.
We also keep in mind that, for a solution to be implemented, budgets need to be approved by someone usually high up in the pyramid. Therefore, small(ish) incremental changes with tangible benefits are more likely to succeed than big, ambitious projects. Note, however, that this does not mean the solutions put forward in the book cannot be implemented in a system built from the ground up; in fact, in some cases that would be the optimal approach. All we say is that having the option of incremental changes that are easy to manage is always a bonus not to lose sight of.
One of the common threads in all solutions discussed in the book is that they are grounded in mathematically robust results. Ideally, we would like everything to be based on solid theoretical frameworks. However, as the reader will soon learn, sometimes heuristic rules need to be used in conjunction with mathematical theories. The right combination of mathematical theories and heuristics, partly determined by the context of the problem (for example, the characteristics of the systems being used), is what delivers the most effective outcome. When such heuristic rules are used or discussed, we make the point clear, indicating their range of validity and limitations, so that the quantitative analyst can make use of them safely.
Many of the computational problems that banks encounter are the result of having to evaluate a given function a large number of times under (only slightly) different inputs, together with the fact that such functions are costly to compute. Examples of these functions are Over-the-Counter derivative pricing functions, which need to be evaluated from several hundred to millions of times in risk calculations. From a computational standpoint, these evaluations tend to be the bottleneck in risk calculations. Our approach is to find a way to take advantage of the specifics of the risk calculation so that a very accurate and fast-to-compute replica of the pricing function can be generated. As a consequence, one computes the same risk metrics in practice, but more efficiently. Similar replication methods are applied to other computational challenges, such as model calibration, leading to significant improvements, too. Furthermore, the techniques presented in this book open the door to a new family of computations that, without them, would in many cases seem impossible to achieve, like balance sheet optimisations.
As just said, the solutions discussed in this book are rooted in identifying computationally expensive functions to evaluate — which create computational bottlenecks in calculations — and creating replicas of these problematic functions that can be efficiently computed while at the same time giving essentially the same results.
We start off in Part I with a general overview of Machine Learning techniques. We then focus on two of the most effective methods for replicating functions: Deep Neural Nets (DNNs) and Chebyshev Tensors (CTs). In mathematical terms, we delve into function approximation, because the goal is to create a mathematical object, one that comes with a computational architecture, that closely approximates the original function. In our case, we look for techniques that deliver replicas that can be evaluated substantially faster than the function they approximate and that can be calibrated with reasonable computational effort. At this point, discussions are mostly theoretical and few comments are made regarding applications. The goal is to provide a solid mathematical background that we can leverage in subsequent chapters.
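To give a flavour of what such a replica looks like in the simplest setting, the sketch below builds a one-dimensional Chebyshev interpolant of a stand-in "slow" function using numpy's chebyshev module. The function and degree here are chosen purely for illustration; the multi-dimensional machinery used later in the book is far richer.

```python
# Sketch: replace a "slow" 1-D function with a Chebyshev interpolant that
# is cheap to evaluate and, for smooth functions, converges exponentially.
import numpy as np
from numpy.polynomial import chebyshev as C

def slow_pricer(x):
    # stand-in for an expensive pricing function on [-1, 1]
    return np.exp(x) * np.sin(5.0 * x)

n = 30                                            # interpolation degree
nodes = np.cos(np.pi * np.arange(n + 1) / n)      # Chebyshev points on [-1, 1]
coeffs = C.chebfit(nodes, slow_pricer(nodes), n)  # build the replica once

# The replica is now cheap to evaluate anywhere on the domain
x_test = np.linspace(-1.0, 1.0, 1000)
max_err = np.max(np.abs(C.chebval(x_test, coeffs) - slow_pricer(x_test)))
print(max_err)
```

The cost structure is the key point: the expensive function is called only at the 31 Chebyshev nodes, after which every further evaluation is a cheap polynomial evaluation.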
Once the fundamental approximation methods have been established, we present, in Part II, a number of tools that will enable the optimal utilisation of the approximation methods in the applications of interest. We see these tools as the equivalent of nuts, bolts and spanners used to assemble the different components of a car: essential tools without which we cannot build the vehicle. A number of mathematical and computational tools will be discussed, these being the Composition Technique, Tensor Extension Algorithms, Sliding Techniques and Jacobian Techniques.
Then, Part III explains how the approximation methods from Part I and the tools in Part II can be combined to create solutions. In particular, we focus on how to use the toolkit with DNNs and with CTs, as well as how to use DNNs and CTs together in order to achieve a hybrid approximation method.
Following all the previous discussions, the book comes to life in Part IV, in which the theoretical solutions are applied to real computational problems that financial institutions face. We cover the fundamental calculations in CCR (XVAs, IMM capital, PFE); XVA sensitivities (hedging and CVA capital); Market Risk (VaR and FRTB); and the dynamic simulation of portfolio sensitivities in a Monte Carlo simulation, with a special focus on its application to the simulation of Initial Margin (XVAs, IMM capital, PFE and CVA capital). We discuss computational techniques that enable the efficient calibration of sophisticated pricing models (Front Office pricer calibration); we see how the fundamental approximation techniques can be used in the context of implied volatility evaluation (ultra-fast computations); the stable computation of sensitivities for exotic derivatives is covered (Front Office hedging); we discuss how to use the techniques presented in the previous parts of the book in the context of balance sheet and portfolio optimisation problems (profitability maximisation); and we elaborate on an originally unintended but positive side effect of the methods: how a pricing function can be “cloned” from one IT system to another (IT systems interaction).
The different software packages used to generate the results presented (mainly) in Part IV are referenced in a chapter toward the end of the book, along with the websites they can be downloaded from. The software package that implements most of the Chebyshev machinery used in this book, the MoCaX suite (developed by us), has dedicated sections in that chapter with examples of how to use it. We trust readers will find it useful.
Some of the methods discussed in this book fall under the scope of a patent. At the time of this book going to press, the patent holders are happy to provide a license to anyone interested in using them. For further information, please contact the authors.
We hope that the quantitative community finds this book interesting and useful, and we encourage anyone working in this field to get in touch with us. We very much enjoy collaborating, and we can be reached via LinkedIn, for example.
1
Moore's first law relates to the capacity of processors. Moore's second law relates to the monetary cost of processors.
The aim of this chapter is to present in the clearest possible manner the main concepts behind Machine Learning (ML) models. This will set a unified framework under which the approximation methods — which constitute the spearhead of the solutions used to tackle the computational problems in Part IV— are presented.
The main ideas presented in this chapter will be particularly relevant to Chapter 2, where DNNs are introduced. Without them, a good number of the ideas in Chapter 2 would not be as easy to digest.
The chapter starts with a quick introduction to the field of ML. We then touch upon its history, briefly describe the main areas in ML and mention the applications we are most interested in.
Then we delve into the core of the chapter, which is the presentation of the main concepts underpinning most ML models. These will be treated either in relation to the concept of training and predicting with an ML model — considering both the frequentist and the Bayesian approach — or in relation to the idea of model complexity.
Along the way, we will use the standard Linear Model to introduce and illustrate the main concepts. Despite its simplicity, the standard Linear Model shares the key ML concepts with most other models. It therefore makes sense to use it as a guiding thread.
Artificial Intelligence (AI), the field that studies the intelligence of machines — as opposed to natural intelligence, which is displayed by humans and animals — has been one of the most successful and thriving areas of study in the last few decades. Among its many branches, ML is the one that is concerned with algorithms that automatically improve through experience. This is a fundamental component of AI, as it enables learning from the structures and patterns of data, allowing non-human agents to make decisions.
An often-quoted formal definition of ML is the following:
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E” ([57]).
Intuitively speaking, this says that ML consists of a collection of methods and algorithms that automatically extract patterns from data with the purpose of performing predictive tasks.
Even though the term Machine Learning was coined in 1959 by Arthur Samuel — a leading figure in the fields of computer gaming and AI — some of the most basic and common ML algorithms predate the second half of the 20th century, the period with which ML is normally associated.
For example, the origins of Linear Regression can be traced to the beginning of the 19th century through the works of Legendre and Gauss ([70]). This technique was designed to determine the orbit of bodies around the sun by using astronomical data. Calculations would have been made by hand. Also, the amount of data collected would have been small (minuscule by today's standards). Yet, the technique proved successful even under these circumstances.
Other ML models were also developed decades before the advent of computers. For example, Principal Component Analysis (PCA), a well-known dimensionality reduction technique — used on a regular basis still today — was first developed in 1901 by Karl Pearson.
Also, the main ideas underpinning decision trees — the main constituent in random forests, one of the most powerful ML models — have existed, in some form or another, for centuries. Clustering techniques — a common technique employed in ML and data science these days — were used in psychology and anthropology as early as the 1930s.1
However, the models just mentioned did not develop into the form they have today until the second half of the 20th century. There were two key elements that helped change the landscape. The first was an unprecedented increase, throughout the 20th century, in the sophistication of statistical modelling. The second and probably most fundamental was the development of the computer as we know it today in the 1950s.
Before the advent of the computer, asking a machine to perform the tasks we nowadays do on a regular basis would have been unthinkable to the vast majority of scientists. Having access to a computer meant that computations with data sets could take place in a very short period of time. It also enabled researchers to consider larger and more complex data sets. This, alongside a larger appetite for more complex statistical models, led to the enhancement of old models and the creation of new ones.
For example, all sorts of bells and whistles were added to the 19th century version of Linear Regression to make it much more flexible, robust and capable of learning from non-linear data. Also, decision trees gave rise to random forests. At the same time, new models — some of the most powerful in ML today — were developed, such as support vector machines and Neural Nets.
It is worth mentioning that, despite the large number of new models developed over the last few decades, the Linear Model — as mentioned before, one of the oldest — is still used with great success to this day. Moreover, it has been one of the main building blocks for other, more sophisticated models. Despite its simplicity, it shares its main characteristics with pretty much all other ML models. As such, it will be, in coming sections, the example we use to illustrate the main facets of ML models and in particular those of DNNs.
In the first few years, even decades, of the ML era, the range of applications was limited. This was due not just to the simplicity of the models and the small number of people working with them but also to the fact that computers were only found in specialised research centres. As computers have become more powerful and ubiquitous — not only in a good number of industries but also as tools for personal use — the range and number of applications have grown substantially.
Nowadays, Machine Learning models are used in a wide range of applications: in forecasting, such as in weather and stock market prediction; in anomaly detection, for example, fraud detection; for classification, for example, to identify patients with specific medical conditions; for ranking tasks, as search engines do when recommending websites; for summarising, for example, sentiment analysis in social media; for decision-making in robotics. And the list goes on.
As was mentioned in Section 1.1, ML consists of a set of models that automatically learn patterns from data and use these patterns to perform some task. These methods are typically divided into three sub-categories: supervised learning, unsupervised learning and reinforcement learning.
Models and algorithms that fall into the category of supervised learning are those that learn from input and output data. Denote the input data by X. This consists of a set of N vectors x_1, …, x_N, where each vector x_i, for i = 1, …, N, lives in R^d, for some d ≥ 1. The variables in these vectors are typically called features or attributes of the set X. They can be discrete or continuous. The output data, denoted by Y, consists of a set of values y_1, …, y_N — typically real values — that represent a response or target variable. Again, this variable can be discrete or continuous.
Each element x_i in X pairs up with y_i in Y. The data used to train the algorithm therefore consists of data points (x_1, y_1), …, (x_N, y_N). The model learns patterns from X subject to this pairing, the idea being that once these patterns are learnt, the model can, with a high degree of accuracy, assign a value y to any new data point x.
One of the implicit assumptions in supervised learning is that the features in the data set X are powerful enough to predict the target variable y. For example, if we want to predict whether it will rain, we should consider features that are related to the chance of rain. One should not choose the amount of oil being extracted in Saudi Arabia as a feature to predict the chance of rain in Buenos Aires, but rather measurements such as atmospheric pressure, humidity and wind around Buenos Aires.
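As a minimal illustration of this setup, the following Python sketch builds training pairs (x_i, y_i) from a made-up function (which plays the role of the unknown pattern) and predicts the target for a new point by copying the target of its nearest neighbour in the data, one of the simplest supervised learning rules:

```python
# Minimal supervised-learning setup: training pairs (x_i, y_i), then a
# prediction for a new point x via its nearest neighbour in the data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))        # inputs x_i in R^2
Y = np.sin(X[:, 0]) + X[:, 1] ** 2           # targets y_i = f(x_i), f made up

def predict(x_new):
    """1-nearest-neighbour prediction: copy the target of the closest x_i."""
    i = np.argmin(np.linalg.norm(X - x_new, axis=1))
    return Y[i]

x = np.array([0.3, -0.2])                    # a point not in the training set
print(predict(x))
```

With 500 training points the nearest neighbour is close enough that the prediction lands near the true value sin(0.3) + 0.04; richer models replace the lookup with a fitted function, but the data structure is the same.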
There is a vast number of situations where supervised learning has been applied over the years. We make use of it on a regular basis in things such as email spam detection, image and speech recognition, fraud detection, weather prediction, medical diagnosing and so on.
By contrast, in unsupervised learning the data set consists of only the input set X. The goal is to find patterns in X that are not subject to a target variable. This, of course, opens up a range of options in terms of how one should train the algorithms.
Although not used as often as supervised learning, the range of applications is nonetheless vast. Unsupervised learning is used for clustering, anomaly detection, information compression, density estimation and latent variable learning. These techniques can help assign labels to data that are otherwise unlabelled. They can also help reduce the size of the feature space in data sets, making the data more portable and in some cases reducing the complexity of the data set (something that can make supervised algorithms perform better).
There is a type of unsupervised learning algorithm that is of particular relevance to this book. This encompasses all dimensionality reduction algorithms, such as Principal Component Analysis. Not only have they been used in finance for a long time and in a wide range of cases, but they constitute an important part of the Sliding Technique presented in Chapter 7.
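For concreteness, the following sketch computes PCA via the singular value decomposition on synthetic data that lie close to a two-dimensional subspace of R^5; all numbers here are made up for illustration.

```python
# Sketch of PCA as a dimensionality reduction step: centre the data,
# take the SVD, keep the leading components.
import numpy as np

rng = np.random.default_rng(1)
# 500 samples in R^5 that actually live close to a 2-D subspace
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 5))
data = latent @ mixing + 0.01 * rng.normal(size=(500, 5))

centred = data - data.mean(axis=0)
U, S, Vt = np.linalg.svd(centred, full_matrices=False)

explained = S**2 / np.sum(S**2)     # variance explained by each component
reduced = centred @ Vt[:2].T        # project onto the top 2 components
print(explained[:2].sum())
```

Here the top two components capture essentially all the variance, so the 5-dimensional data can be replaced by the 2-dimensional projection with almost no loss; this is the mechanism the Sliding Technique of Chapter 7 relies on.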
The third main sub-category in ML is reinforcement learning. This field deals with the ways in which a software agent ought to make decisions within an environment, where the aim is to maximise a predefined notion of gain or reward. Under this ML paradigm, there is no input/output data available to train from at the beginning. The model learns through interaction with the environment; the data from which it learns are constantly generated and depend on the model used. Reinforcement learning models have become popular in applications such as self-driving cars, reducing energy costs in different industries, and, famously, helping train a machine to beat the world's number one player at Go.2 Reinforcement learning has also become important in hyper-parameter optimisation, a very important aspect that comes up when working with many ML models in real-life applications. We discuss this topic in more detail in Section 1.4.3.
The applications of interest in this book are presented in Part IV. These essentially consist of resolving the computational bottleneck associated with the repeated call of functions in risk calculations. We explain the specifics of applications to CCR, market risk, model calibration, balance sheet optimisation, volatility surfaces and risk metric optimisation exercises. Although the main ML model used in recent years to tackle these problems is based on DNNs, more general (and basic) ML models have been used for a long time to tackle closely related problems.
Linear Regression has been used for years in many areas of finance. In fact, one of the most popular techniques of the last 20 years, used to speed up the pricing of exotic products and the computation of risk calculations, is Longstaff-Schwartz (least-squares Monte Carlo), which essentially relies on the repeated application of Linear Regression in a Monte Carlo simulation. It was first presented in [52] and opened up new avenues of research. Alongside Linear Regression, dimensionality reduction techniques, such as PCA, have long been used in many areas of finance. People often think of ML as a set of recently developed techniques, involving much more complex and expensive algorithms than the ones underpinning Linear Regression and PCA. However, both Linear Regression and PCA perfectly satisfy the conditions to be ML algorithms.
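The regression step at the heart of Longstaff-Schwartz can be sketched in a few lines. The toy below values a put with a single early-exercise date, regressing discounted payoffs on a quadratic polynomial of the spot to estimate continuation values; all parameters are illustrative and this is not a full pricer.

```python
# Sketch of the regression step in Longstaff-Schwartz (least-squares
# Monte Carlo): a put with one early-exercise date, toy parameters.
import numpy as np

rng = np.random.default_rng(2)
S0, K, r, sigma, dt = 100.0, 100.0, 0.05, 0.2, 0.5
n_paths = 50_000

# Simulate spot at the exercise date t1 and at maturity t2 (GBM steps)
z1, z2 = rng.normal(size=(2, n_paths))
S1 = S0 * np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z1)
S2 = S1 * np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z2)

payoff_T = np.maximum(K - S2, 0.0)     # put payoff at maturity
itm = K - S1 > 0                       # regress on in-the-money paths only

# Continuation value at t1 approximated by a quadratic in S1 (least squares)
coeffs = np.polyfit(S1[itm], np.exp(-r * dt) * payoff_T[itm], 2)
continuation = np.polyval(coeffs, S1)

exercise = itm & (K - S1 > continuation)           # early-exercise decision
cashflow = np.where(exercise, K - S1, np.exp(-r * dt) * payoff_T)
price = np.exp(-r * dt) * np.mean(cashflow)
print(price)
```

The expensive object here, the continuation value, is never computed exactly; it is replaced by a cheap regression fitted across the Monte Carlo paths, which is exactly the replication idea this book develops much further.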
For a long time, the ML models and algorithms used in finance were of the simpler kind. But with the advent of powerful computing capabilities, and of a range of well-implemented, easy-to-use DNN packages in the most common programming languages, practitioners have begun to use DNNs increasingly often.
In particular, risk calculations, risk optimisation and calibration of pricing functions — all exercises that demand large computational capabilities — stand to gain a lot from the use of approximating techniques. This book elaborates on two of the most powerful ones that we know of: DNNs and Chebyshev Tensors (CTs). Chapter 2 sets the theoretical framework for DNNs, and Chapter 3 does so for CTs.
Focusing on DNNs, there is a growing body of literature addressing the computational bottleneck of risk calculations using DNNs; examples are [25] in CCR and [41] in pricing function calibration. The core idea is to replace functions that are called thousands of times in a particular process with a DNN. Once the approximation has been achieved — through proper training — the DNN is used instead of the function. Because DNNs are fast to evaluate (unless they are too large), amounting to little more than simple linear algebra operations, the process that was once problematic from a computational point of view is reduced to a manageable computation.
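A minimal illustration of this replacement idea is given below: a tiny one-hidden-layer network, written in plain numpy and trained by full-batch gradient descent, learns to replicate a stand-in function that plays the role of the costly pricer. Real applications use proper DNN frameworks and far richer pricing functions; this sketch only shows the mechanics.

```python
# Toy DNN replica: train a one-hidden-layer net on samples of a function,
# then evaluate the cheap replica (two matrix products) instead of it.
import numpy as np

rng = np.random.default_rng(3)

def target(x):
    return np.sin(3.0 * x)       # stand-in for a costly pricing function

X = rng.uniform(-1, 1, size=(256, 1))   # training samples of the function
Y = target(X)

H = 32                                   # hidden-layer width
W1, b1 = rng.normal(size=(1, H)), np.zeros(H)
W2, b2 = rng.normal(scale=0.1, size=(H, 1)), np.zeros(1)

lr = 0.05
for step in range(5000):
    hidden = np.tanh(X @ W1 + b1)        # forward pass
    err = (hidden @ W2 + b2) - Y
    gW2 = hidden.T @ err / len(X)        # backward pass (mean squared error)
    gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - hidden**2)
    gW1 = X.T @ dh / len(X)
    gb1 = dh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

def replica(x):
    return np.tanh(x @ W1 + b1) @ W2 + b2   # fast: two matrix products

x_test = np.linspace(-1, 1, 200).reshape(-1, 1)
print(np.max(np.abs(replica(x_test) - target(x_test))))
```

Once trained, every call to `replica` costs only a couple of small matrix products, regardless of how expensive the original function was to evaluate.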
This section presents the core elements of Linear Regression. Linear Regression (more generally, the Linear Model) is very important in ML. Mathematically, it is one of the simplest models and among the easiest to understand. A big advantage is that it can be solved analytically, meaning it is easy to use and quick to deploy, features that many other models do not have. Despite their simplicity, however, Linear Models are used successfully on a regular basis in a wide range of contexts. Not only that, but Linear Models also constitute important modules or building blocks for more complex models. Finally — and of particular importance for this chapter — the fundamental concepts and characteristics of a huge range of ML models can be found in the Linear Model. As the latter is simpler, it makes sense to start there.
Consider the following example. A real estate company is interested in having an effective way of pricing houses in different areas of a city. Assume they have access to the surface area of the properties and the average household income of each post code. Moreover, assume they have the price for N of these properties. This means, in terms of input/output data (X, Y), that they have input data X = {x_1, …, x_N}, where each x_i = (s_i, m_i) represents a property, s_i its corresponding surface area, and m_i the average income of the post code where the property is located, and output data Y = {y_1, …, y_N}, where y_i is the price of the i-th property.
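Jumping slightly ahead, a toy numerical version of this example can be written in a few lines: synthetic surface areas, incomes and prices (all numbers made up for illustration), a least-squares fit of the Linear Model f(s, m) = w0 + w1*s + w2*m, and a prediction for a new property.

```python
# Toy house-pricing example: fit price ~ w0 + w1*surface + w2*income by
# least squares on synthetic data, then predict the price of a new house.
import numpy as np

rng = np.random.default_rng(4)
N = 100
surface = rng.uniform(40, 200, size=N)         # square metres
income = rng.uniform(20_000, 90_000, size=N)   # average post-code income

# Synthetic "true" prices with some noise (coefficients are made up)
price = 50_000 + 2_000 * surface + 3.0 * income \
        + rng.normal(0, 10_000, size=N)

# Design matrix with an intercept column
A = np.column_stack([np.ones(N), surface, income])
w, *_ = np.linalg.lstsq(A, price, rcond=None)   # the "learning" step

# Predict the price of a property not in the training data
s_new, m_new = 120.0, 55_000.0
print(w[0] + w[1] * s_new + w[2] * m_new)
```

The fitted weights recover the generating coefficients closely, so the prediction for the new property lands near the "true" value of about 455,000.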
The question we ask is whether the data can be used to obtain a function with which we can predict the price of properties. In particular, can we obtain a function f such that, given any pair (s, m), not necessarily in X — where s is the surface area of a property and m the average income of its post code — the value f(s, m) is a good proxy to the real price of the property? That is, can we learn the patterns present in
