Financial Data Analytics with Machine Learning, Optimization and Statistics - Sam Chen - E-Book

Sam Chen

Description

An essential introduction to data analytics and Machine Learning techniques in the business sector

In Financial Data Analytics with Machine Learning, Optimization and Statistics, a team consisting of a distinguished applied mathematician and statistician, experienced actuarial professionals and working data analysts delivers an expertly balanced combination of traditional financial statistics, effective machine learning tools, and mathematics. The book focuses on contemporary techniques used for data analytics in the financial sector and the insurance industry with an emphasis on mathematical understanding and statistical principles and connects them with common and practical financial problems. Each chapter is equipped with derivations and proofs—especially of key results—and includes several realistic examples which stem from common financial contexts. The computer algorithms in the book are implemented using Python and R, two of the most widely used programming languages for applied science and in academia and industry, so that readers can implement the relevant models and use the programs themselves.

The book begins with a brief introduction to basic sampling theory and the fundamentals of simulation techniques, followed by a comparison between R and Python. It then discusses statistical diagnostics for financial security data and introduces common tools in financial forensics such as Benford's Law, Zipf's Law, and anomaly detection. Statistical estimation and the Expectation-Maximization (EM) and Majorization-Minimization (MM) algorithms are also covered. The book next focuses on univariate and multivariate dynamic volatility and correlation forecasting, with emphasis on the celebrated Kelly's formula, followed by a brief introduction to quantitative risk management and dependence modelling for extremal events. A practical treatment of numerical finance for traditional option pricing and the computation of Greeks follows. Other important data-driven topics in finance are covered as well: Principal Component Analysis (PCA) and recommender systems with their applications, and advanced regression learners such as kernel regression and logistic regression, with discussion of model assessment methods such as Receiver Operating Characteristic (ROC) curves and Area Under the Curve (AUC) for typical classification problems.

The book then moves on to other commonly used machine learning tools: linear classifiers such as perceptrons and their multilayer generalization (MLP), Support Vector Machines (SVM), and Classification and Regression Trees (CART) and Random Forests. Subsequent chapters focus on linear Bayesian learning, including the well-received credibility theory in actuarial science and functional kernel regression, and on non-linear Bayesian learning, such as the Naïve Bayes classifier and the Comonotone-Independence Bayesian Classifier (CIBer), recently developed independently by the authors and used successfully in InsurTech.

After an in-depth discussion of cluster analysis, such as K-means clustering and its inversion, the K-nearest neighbor (KNN) method, the book concludes by introducing some useful deep neural networks for FinTech, such as the potential use of the Long Short-Term Memory (LSTM) model for stock price prediction.

This book can help readers become well-equipped with the following skills:

  • To evaluate financial and insurance data quality, and use the distilled knowledge obtained from the data after applying data analytic tools to make timely financial decisions
  • To apply effective data dimension reduction tools to enhance supervised learning
  • To describe and select suitable data analytic tools from those introduced above for a given dataset, depending on whether the purpose is classification or regression

The book covers the competencies tested by several professional examinations, such as the Predictive Analytics Exam offered by the Society of Actuaries, and the Institute and Faculty of Actuaries' Actuarial Statistics Exam.

Besides being an indispensable resource for senior undergraduate and graduate students taking courses in financial engineering, statistics, quantitative finance, risk management, actuarial science, data science, and mathematics for AI, Financial Data Analytics with Machine Learning, Optimization and Statistics also belongs in the libraries of aspiring and practicing quantitative analysts working in commercial and investment banking.

Page count: 895

Year of publication: 2024

Table of Contents

Cover

Table of Contents

Title Page

Copyright

Dedication

About the Authors

Foreword

Preface

Acknowledgements

Introduction

DEVELOPMENT OF FINANCIAL DATA ANALYTICS

ORGANIZATION OF THE BOOK

REFERENCES

NOTE

PART One: Data Cleansing and Analytical Models

CHAPTER 1: Mathematical and Statistical Preliminaries

1.1 RANDOM VECTOR

1.2 MATRIX THEORY

1.3 VECTORS AND MATRIX NORMS

1.4 COMMON PROBABILITY DISTRIBUTIONS

1.5 INTRODUCTORY BAYESIAN STATISTICS

REFERENCES

NOTES

CHAPTER 2: Introduction to Python and R

2.1 WHAT IS PYTHON?

2.2 WHAT IS R?

2.3 PACKAGE MANAGEMENT IN PYTHON AND R

2.4 BASIC OPERATIONS IN PYTHON AND R

2.5 ONE-WAY ANOVA AND TUKEY'S HSD FOR STOCK MARKET INDICES

REFERENCES

NOTES

CHAPTER 3: Statistical Diagnostics of Financial Data

3.1 NORMALITY ASSUMPTION FOR RELATIVE STOCK PRICE CHANGES

3.2 STUDENT'S t-DISTRIBUTION FOR STOCK PRICE CHANGES

3.3 TESTING FOR MULTIVARIATE NORMALITY

3.4 SAMPLE CORRELATION MATRIX

3.5 EMPIRICAL PROPERTIES OF STOCK PRICES

3.A APPENDIX

REFERENCES

NOTE

CHAPTER 4: Financial Forensics

4.1 BENFORD'S LAW

4.2 SCALING INVARIANCE AND BENFORD'S LAW

4.3 BENFORD'S LAW IN BUSINESS REPORTS

4.4 BENFORD'S LAW IN GROWTH FIGURES

4.5 ZIPF'S LAW

4.6 ZIPF'S LAW AND COVID-19 FIGURES

4.A APPENDIX

REFERENCES

NOTES

CHAPTER 5: Numerical Finance

5.1 FUNDAMENTALS OF SIMULATION

5.2 VARIANCE REDUCTION TECHNIQUE

5.3 A REVIEW OF FINANCIAL CALCULUS AND DERIVATIVE PRICING

*5.4 GREEKS AND THEIR APPROXIMATIONS

REFERENCES

NOTES

CHAPTER 6: Approximation for Model Inference

6.1 EM ALGORITHM

6.2 MM ALGORITHM

*6.3 A SHORT COURSE ON THE THEORY OF MARKOV CHAINS

*6.4 MARKOV CHAIN MONTE CARLO

*6.A APPENDIX

REFERENCES

NOTES

CHAPTER 7: Time-Varying Volatility Matrix and Kelly Fraction

7.1 FLUCTUATION OF VOLATILITIES

7.2 EXPONENTIALLY WEIGHTED MOVING AVERAGE

7.3 ARIMA TIME SERIES MODEL

7.4 ARCH AND GARCH MODELS

*7.5 KELLY FRACTION

7.6 CALENDAR EFFECTS

*7.A APPENDIX

REFERENCES

NOTES

CHAPTER 8: Risk Measures, Extreme Values, and Copulae

8.1 VALUE-AT-RISK AND EXPECTED SHORTFALL

8.2 BASEL ACCORDS AND RISK MEASURES

8.3 HISTORICAL SIMULATION (BOOTSTRAPPING)

8.4 STATISTICAL MODEL BUILDING APPROACH

8.5 USE OF EXTREME VALUE THEORY

8.6 BACKTESTING

8.7 ESTIMATES OF EXPECTED SHORTFALL

8.8 DEPENDENCE MODELLING VIA COPULAE

*8.A APPENDIX

REFERENCES

NOTES

PART Two: Linear Models

CHAPTER 9: Principal Component Analysis and Recommender Systems

9.1 US ZERO‐COUPON RATES

9.2 PCA ALGORITHM

9.3 FINANCIAL INTERPRETATION OF PCS FOR US ZERO‐COUPON RATES

9.4 PCA AS AN EIGENVALUE PROBLEM

9.5 FACTOR MODELS VIA PCA

9.6 VALUE‐AT‐RISK VIA PCA

9.7 PORTFOLIO IMMUNIZATION

9.8 FACIAL RECOGNITION VIA PCA

9.9 NON‐LIFE INSURANCE VIA PCA

9.10 INVESTMENT STRATEGIES USING PCA

*9.11 RECOMMENDER SYSTEM

*9.A APPENDIX

REFERENCES

NOTES

CHAPTER 10: Regression Learning

10.1 SIMPLE AND MULTIPLE LINEAR REGRESSION MODELS AND BEYOND

10.2 POLYNOMIAL REGRESSION

10.3 GENERALIZED LINEAR MODELS

10.4 LOGISTIC REGRESSION

10.5 POISSON REGRESSION

10.6 MODEL EVALUATION AND CONSIDERATIONS IN PRACTICE

*10.7 PRINCIPAL COMPONENT REGRESSION

*10.A APPENDIX

REFERENCES

NOTES

CHAPTER 11: Linear Classifiers

11.1 PERCEPTRON

11.2 SUPPORT VECTOR MACHINE

*11.A APPENDIX

REFERENCES

NOTES

PART Three: Nonlinear Models

CHAPTER 12: Bayesian Learning

12.1 SIMPLE CREDIBILITY THEORY

*12.2 BAYESIAN ASYMPTOTIC INFERENCE

12.3 REVISITING POLYNOMIAL REGRESSION

12.4 BAYESIAN CLASSIFIERS

12.5 COMONOTONE‐INDEPENDENCE BAYES CLASSIFIER (CIBER)

12.A APPENDIX

REFERENCES

NOTES

CHAPTER 13: Classification and Regression Trees, and Random Forests

13.1 CLASSIFICATION (DECISION) TREES

*13.2 CONCEPTS OF ENTROPIES

13.3 INFORMATION GAIN

13.4 OTHER IMPURITY MEASURES FOR INFORMATION

13.5 SPLITTING AGAINST CONTINUOUS ATTRIBUTES

13.6 OVERFITTING IN CLASSIFICATION TREE

13.7 CLASSIFICATION TREES IN PYTHON AND R

13.8 REGRESSION TREES

13.9 RANDOM FOREST

13.A APPENDIX

REFERENCES

NOTES

CHAPTER 14: Cluster Analysis

14.1 K‐MEANS CLUSTERING

14.2 K‐NEAREST NEIGHBOUR

*14.3 KERNEL REGRESSION

*14.A APPENDIX

REFERENCES

NOTES

CHAPTER 15: Applications of Deep Learning in Finance

15.1 HUMAN BRAINS AND ARTIFICIAL NEURONS

15.2 FEEDFORWARD NETWORK

15.3 ANN WITH LINEAR OUTPUTS

15.4 ANN WITH LOGISTIC OUTPUTS

15.5 ADAPTIVE LEARNING RATE

15.6 TRAINING NEURAL NETWORKS VIA BACKPROPAGATION

15.7 MULTILAYER PERCEPTRON

15.8 UNIVERSAL APPROXIMATION THEOREM

15.9 LONG SHORT‐TERM MEMORY (LSTM)

REFERENCES

NOTES

Postlude

Index

End User License Agreement

List of Tables

Chapter 4

TABLE 4.1 The expected frequencies of the leading digits of randomly select...

TABLE 4.2 When Benford analysis is likely useful?

TABLE 4.3 When Benford analysis is not likely useful?

TABLE 4.4 Suggested “five-digit test” p-value criteria for the chi-square go...

TABLE 4.5 A summary table of Nigrini's “five-digit test” for US census data ...

TABLE 4.6 p-values of chi-square goodness-of-fit tests of nine countries ove...

TABLE 4.7 Regional COVID-19 cases by country extracted from [20].

Chapter 5

TABLE 5.1 Sample means and sample variances of various variance reduction te...

TABLE 5.2 Sample means and sample variances of various variance reduction te...

TABLE 5.3 The estimated price of Asian call option in Example 5.7 in Python ...

TABLE 5.4 The estimated price (top) and standard deviation (bottom) of 100 s...

TABLE 5.5 The estimated price (top) and standard deviation (bottom) of 100 s...

TABLE 5.6 Commonly used Greeks. Note that vega is not the name of any Greek ...

Chapter 7

TABLE 7.1 Properties of ACF and PACF for ARMA models.

Chapter 8

TABLE 8.1 1986 BIS V@R Amendment from [5].

Chapter 9

TABLE 9.1 Money exposure of a bond portfolio.

TABLE 9.2 The 20 selected candidates that have the 10 smallest and 10 large...

Chapter 10

TABLE 10.1 Confusion matrix for a medical test.

TABLE 10.2 A confusion matrix of a test of a rare disease.

TABLE 10.3 The dataset of 20 labeled samples with the respective predicted ...

TABLE 10.4 A confusion matrix of a test using the dataset in Table 10.3.

TABLE 10.5 Confusion matrix for a test obtained by a random classification ...

Chapter 12

TABLE 12.1 The contingency table of a sample dataset with two categorical f...

TABLE 12.2 Proportions of the first five continuous features in the trainin...

TABLE 12.3 Proportions of the first five continuous features in the trainin...

TABLE 12.4 Frequencies of the sixth categorical variable in the training da...

Chapter 13

TABLE 13.1 15 credit default samples with Class N or Y as their label ; at...

TABLE 13.2 Number of elements in Groups 1 and 2 after splitting by either a...

TABLE 13.3 Choosing the threshold value for the attribute “Taxable Income” ...

TABLE 13.4 Total counts on the number of trials required for Algorithm 13.1...

Chapter 14

TABLE 14.1 The correspondence between the original groups and the labels as...

List of Illustrations

Introduction

FIGURE 1 GitHub repository for this book.

Chapter 1

FIGURE 1.1 Comparison between MLE (a) and Bayesian inference (b). MLE draws ...

FIGURE 1.2 Density of with (red solid), 5 (orange dashed), 10 (blue dott...

FIGURE 1.3 The posterior distribution .

FIGURE 1.4 Posterior distributions of with different values of .

FIGURE 1.5 Predictive distributions of for various values of via Bayesia...

Chapter 2

FIGURE 2.1 Typing “pip install tensorflow” in Anaconda prompt.

FIGURE 2.2 Function documentation can be shown when the cursor is placed ove...

FIGURE 2.3 Histograms of the daily logarithmic returns from 2005 to 2007 in

FIGURE 2.4 Histograms of the daily logarithmic returns from 2018 to 2020 in

FIGURE 2.5 Histograms of the daily logarithmic returns from 2005 to 2007 in ...

FIGURE 2.6 Histograms of the daily logarithmic returns from 2018 to 2020 in ...

FIGURE 2.7 Boxplot, meanplot, and Tukey HSD plot for three stock market indi...

FIGURE 2.8 Boxplot, meanplot, Tukey HSD plot for three stock market indices ...

FIGURE 2.9 Boxplot, meanplot, and Tukey HSD plot for three stock market indi...

FIGURE 2.10 Boxplot, meanplot, and Tukey HSD plot for three stock market ind...

Chapter 3

FIGURE 3.1 Plots of prices and returns of HSBC, CLP, and CK via Python, gene...

FIGURE 3.2 Histograms and Normal Q-Q plots of the returns of HSBC, CLP, and ...

FIGURE 3.3 Plots of prices and returns of HSBC, CLP, and CK via R, generat...

FIGURE 3.4 Histograms and Normal Q-Q plots of returns of HSBC, CLP, and CK v...

FIGURE 3.5 Student's t Q-Q plots for the returns of HSBC, CLP, and CK via Py...

FIGURE 3.6 Student's t Q-Q plots for the returns of HSBC, CLP, and CK via R,...

FIGURE 3.7 Chi-square Q-Q plots for the returns of HSBC, CLP, and CK.

FIGURE 3.8 Correlation analysis on the returns of HSBC, CLP, and CK via Pyth...

FIGURE 3.9 Correlation analysis on the returns of HSBC, CLP, and CK via R.

FIGURE 3.10 Empirical properties of the prices and returns of HSBC, CLP, and...

FIGURE 3.11 Empirical properties of the prices and returns of HSBC, CLP, and...

FIGURE 3.12 ACF plots of the prices, returns and squared returns of HSBC, CL...

FIGURE 3.13 ACF plots of the prices, returns and squared returns of HSBC, CL...

Chapter 4

FIGURE 4.1 Summation over under the measure.

FIGURE 4.2 Histograms for the expected proportions of the set of all possibl...

FIGURE 4.3 Actual first digit frequencies vs those expected ones given by Be...

FIGURE 4.4 Actual second digit frequencies vs those expected given by Benfor...

FIGURE 4.5 Actual first-two digit frequencies vs those expected ones given b...

FIGURE 4.6 Actual first-three digit frequencies vs those expected ones given...

FIGURE 4.7 Actual last-two digit frequencies vs those expected ones given by...

FIGURE 4.8 Leading digit frequencies of nine countries over a period of 30 a...

FIGURE 4.9 Log-log plots of confirmed COVID-19 cases in Russia (black solid)...

FIGURE 4.10 Log-log plots of confirmed COVID-19 cases in each country.

Chapter 5

FIGURE 5.1 Histograms of the samples from pseudo generators for the uniform ...

FIGURE 5.2 Histograms of the samples from pseudo generators for the exponent...

FIGURE 5.3 Histograms of the samples from pseudo generators for normal distr...

FIGURE 5.4 Actual and simulated stock prices for HSBC (solid), CLP (dashed),...

FIGURE 5.5 Actual and simulated stock prices for HSBC (solid), CLP (dashed),...

FIGURE 5.6 The histograms of the sample of for for Euler–Maruyama (left)...

FIGURE 5.7 The histograms of the sample of for for Euler–Maruyama (left)...

FIGURE 5.8 Convergence of the pathwise differentiation method under a CEV mo...

FIGURE 5.9 Convergence of the pathwise differentiation method under a CEV mo...

FIGURE 5.10 Convergence of the pathwise differentiation method under a CEV m...

FIGURE 5.11 Convergence of the pathwise differentiation method under a CEV m...

FIGURE 5.12 Convergence of the pathwise differentiation method under a CEV m...

FIGURE 5.13 Convergence of the pathwise differentiation method under a CEV m...

FIGURE 5.14 Comparison of convergence of the pathwise differentiation and li...

FIGURE 5.15 Comparison of convergence of the pathwise differentiation and li...

Chapter 6

FIGURE 6.1 Illustration of the EM algorithm after the first two iterations....

FIGURE 6.2 A sample of handwritten digits from MNIST.

FIGURE 6.3 An artificial image (right) as the average of handwritten images ...

FIGURE 6.4 Examples of deficits in handwriting images.

FIGURE 6.5 MNIST with Bernoulli Mixture Model, generated by Programme 6.6 in...

FIGURE 6.6 Some examples of confusing handwritten digits.

FIGURE 6.7 MNIST with Bernoulli Mixture Model, generated by Programme 6.9 in...

FIGURE 6.8 Simulation paths and histograms for components of , generated in...

FIGURE 6.9 Simulation paths and histograms for components of , generated in...

FIGURE 6.10 Normal Q-Q plots of the standardized residuals from the SV model...

Chapter 7

FIGURE 7.1 Reproduced from Figures 3.1 and 3.3: Plots of returns of HSBC, CL...

FIGURE 7.2 90-day and 180-day simple moving standard deviations of HSBC arit...

FIGURE 7.3 90-day and 180-day simple moving standard deviations of HSBC arit...

FIGURE 7.4 Respective sample ACF and PACF plots of AR(2) and MA(2) models.

FIGURE 7.5 Plots of Bitcoin minute prices and standard deviations over 30-mi...

FIGURE 7.6 Plots of Bitcoin minute prices and standard deviations over 30-mi...

FIGURE 7.7 Sample ACF and PACF plots of Bitcoin prices in Python, generated ...

FIGURE 7.8 Sample ACF and PACF plots of Bitcoin prices in R, generated in Pr...

FIGURE 7.9 Plot of the residuals of the fitted ARIMA() model in Python, gen...

FIGURE 7.10 Plot of the residuals of the fitted ARIMA() model in R, generat...

FIGURE 7.11 Bitcoin price prediction with the fitted ARIMA() model on 31 De...

FIGURE 7.12 Bitcoin price prediction with the fitted ARIMA() model on 31 De...

FIGURE 7.13 Plots of model diagnostics for a GARCH() model fitted to HSBC r...

FIGURE 7.14 Plots of model diagnostics for a GARCH() model fitted to HSBC r...

FIGURE 7.15 Plots of GARCH() fitted volatilities, 90-day, and 180-day simpl...

FIGURE 7.16 Plots of GARCH() fitted volatilities, 90-day, and 180-day simpl...

FIGURE 7.17 Plots of model diagnostics for L-GARCH() fitted with HSBC retur...

FIGURE 7.18 Plots obtained in a DCC-GARCH() model in R, generated by Progra...

FIGURE 7.19 Illustration of the four cases of as in Propositions 7.1 and 7...

FIGURE 7.20 Changes of portfolio values in years 2003–2022, in terms of the ...

FIGURE 7.21 Bar chart of three stock indices based on the goodness index fro...

Chapter 8

FIGURE 8.1 Illustration of .

FIGURE 8.2 Illustration of , which equals the area of the blue region.

FIGURE 8.3 The location of Basel, Switzerland. Canton of Basel-Stadt map wit...

FIGURE 8.4 The fitted line of against using figures from Table 8.1.

FIGURE 8.5 Histogram of profit and loss during the last 250 days in Python, ...

FIGURE 8.6 Histogram of profit and loss during the last 250 days in R, gener...

FIGURE 8.7 Q-Q plots of the sample generated by pseudo observations of HSBC,...

FIGURE 8.8 Q-Q plots of the sample generated by pseudo observations of HSBC,...

FIGURE 8.9 Plots from the fitted Gaussian copula in Python, generated by Pro...

FIGURE 8.10 Plots from the fitted Gaussian copula in R, generated by Program...

FIGURE 8.11 Q-Q plots generated in Programme 8.22.

FIGURE 8.12 Q-Q plots generated in Programme 8.24.

FIGURE 8.13 Plots from the fitted -copula in Python, generated by Programme...

FIGURE 8.14 Plots from the fitted -copula via R, generated in Programme 8.2...

FIGURE 8.15 Q-Q plots of the empirical squared Mahalanobis distances against...

FIGURE 8.16 Q-Q plot and the plot of squared differences for the fitted Gaus...

FIGURE 8.17 Q-Q plot and the plot of squared differences for the fitted Gaus...

FIGURE 8.18 Q-Q plot and the plot of squared differences for the fitted Gaus...

FIGURE 8.19 Q-Q plot and the plot of squared differences for the fitted Gaus...

FIGURE 8.20 Scatter plots for bivariate risks by using different Archimedean...

FIGURE 8.21 Scatter plots for bivariate risks by using different Archimedean...

FIGURE 8.22 Bivariate Clayton copula: Fréchet Fréchet marginals with .

FIGURE 8.23 Bivariate Clayton copula: Fréchet Gumbel marginals with .

FIGURE 8.24 High-dimensional Clayton copula: Fréchet Gumbel Weibull marg...

FIGURE 8.25 Scatter plots of some extreme-value copulae in R.

FIGURE 8.26 Scatter plots of some extreme-value copulae in R (continued).

FIGURE 8.27 Gumbel–Hougaard copula: Bivariate Fréchet cases; also see [11]....

FIGURE 8.28 Gumbel–Hougaard copula: Gumbel marginals; also see [11].

FIGURE 8.29 Gumbel–Hougaard copula: Weibull marginals; also see [11].

Chapter 9

FIGURE 9.1 Scree plots of the US rates example.

FIGURE 9.2 Loadings of 's for the first three principal components of the U...

FIGURE 9.3 Scatter plot for the first three principal components from the US...

FIGURE 9.4 Scatter plot for the first three principal components from the US...

FIGURE 9.5 Facial recognition for the actress IU.

FIGURE 9.6 Testing results of matching faces, generated by Programme 9.23 in...

FIGURE 9.7 Bar chart of the Euclidean distances of the projected PCs between...

FIGURE 9.8 Testing results of matching faces, generated by Programme 9.24 in...

FIGURE 9.9 Bar chart of the Euclidean distances of the projected PCs between...

FIGURE 9.10 Property risk analysis through image processing.

FIGURE 9.11 Biomedical image recognition is commonly used to determine wheth...

FIGURE 9.12 Enhancing visibility for biomedical images. Source: [4].

FIGURE 9.13 Housing insurance analysis through satellite geographical images...

FIGURE 9.14 Performance comparison of the three portfolios from April to Jun...

FIGURE 9.15 Netflix Prize, October 2006. The rating matrix is large and sp...

FIGURE 9.16 Singular Value Decomposition of the rating matrix .

Chapter 10

FIGURE 10.1 Neighborhoods of Boston (https://upload.wikimedia.org/wikipedia/...

FIGURE 10.2 The output graph from Programme 10.1; the black circle dots are ...

FIGURE 10.3 The output graph from Programme 10.2; the black dots are the tra...

FIGURE 10.4 The output of the linear regression model for HSI generated by P...

FIGURE 10.5 Scatter plot of the Mahalanobis distances, generated in Programm...

FIGURE 10.6 Photo of an Iris virginica flower; original picture from...

FIGURE 10.7 Using the cleansed dataset d3 from Programme 10.11, lift charts ...

FIGURE 10.8 Illustration of TPR and FPR.

FIGURE 10.9 ROC curve for the data extracted from Table 10.3, where each poi...

FIGURE 10.10 ROC curves of two binary logistic regressions deployed to two d...

FIGURE 10.11 Evaluation of the performance of a classification algorithm usi...

FIGURE 10.12 Scree plot of the Boston housing training dataset in Python, ge...

FIGURE 10.13 Scree plot of the Boston housing training dataset in R, generat...

FIGURE 10.14 Causing singularity of for logistic regression as approachi...

Chapter 11

FIGURE 11.1 An example of a linear classifier for two‐dimensional feature ve...

FIGURE 11.2 Two yellow dots and two red crosses which cannot be classified b...

FIGURE 11.3 Perceptron: Category 1 (red crosses); Category 2 (yellow dots)

FIGURE 11.4 An update of the estimate for under gradient descent method at...

FIGURE 11.5 Some possible 's all of which satisfy sgn for all .

FIGURE 11.6 Performance of logistic regression represented by solid lines, a...

FIGURE 11.7 An example of an SVM model for two‐dimensional feature vectors....

FIGURE 11.8 Kernel trick: 2D to 3D transformation.

FIGURE 11.9 Applying quadratic kernel transformation to the data points gene...

FIGURE 11.10 SVM with RBF kernel resulting in different boundaries of the se...

FIGURE 11.11 Example of the existence of some outliers or noise in a dataset...

FIGURE 11.12 Plot of training and testing accuracies of the SVM models with ...

FIGURE 11.13 Plot of training and testing accuracies of the SVM models with ...

FIGURE 11.14 The MSE plot against parameter of the first three iterations ...

FIGURE 11.15 Gradient descent for SGD (dotted trace in grey), mini‐batch SGD...

FIGURE 11.16 A contour plot for Problem 11.25.

FIGURE 11.17 (a) A compact convex set; (b) A convex, but non‐compact, set; (...

FIGURE 11.18 The two paths of optimizations taken with respect to von Neuman...

FIGURE 11.19 Duality Gap

FIGURE 11.20 Value of the optimal portfolio and the S&P500 index in the test...

Chapter 12

FIGURE 12.1 Polynomial regression using Bayesian approach with various sampl...

FIGURE 12.2 Boxplot of ‐scores for all classifiers in the simulation study....

FIGURE 12.3 Boxplot of accuracies for all classifiers in the simulation stud...

FIGURE 12.4 Downloading CIBer package in Python via terminal.

FIGURE 12.5 GitHub for CIBer.

FIGURE 12.6 Class label distributions of bank churners (left) and default pr...

FIGURE 12.7 Boxplot of ‐score for each classifier in the bank churners data...

FIGURE 12.8 Boxplots of accuracies for all classifiers in the bank churners ...

FIGURE 12.9 Boxplots of recall rates for all classifiers in the bank churner...

FIGURE 12.10 Boxplots of ‐scores for all classifiers in the default premium...

FIGURE 12.11 Boxplots of accuracies for all classifiers in the default premi...

FIGURE 12.12 Boxplots of recall rates for all classifiers in the default pre...

Chapter 13

FIGURE 13.1 A classification tree showing criteria for choosing a Prince Cha...

FIGURE 13.2 An illustration for splitting tree with a sample at node by ...

FIGURE 13.3 Selection of attribute or based on information gain.

FIGURE 13.4 A graph of entropy, Gini‐index and misclassification error as ...

FIGURE 13.5 An illustration of trees with different numbers of terminal node...

FIGURE 13.6 Classification tree with five leaf nodes and four decision nodes...

FIGURE 13.7 Three possible trees formed from in Figure 13.6.

FIGURE 13.8 Classification trees for the 2002 data without removing outliers...

FIGURE 13.9 Plot of ln_MV versus HSI, with the horizontal line representin...

FIGURE 13.10 Classification tree for the iris flower dataset.

FIGURE 13.11 Plot of Petal.Length versus Petal.width, with the horizontal li...

FIGURE 13.12 An illustration of splitting at a parent node with a dataset ....

FIGURE 13.13 Regression tree splitting algorithm for one attribute.

FIGURE 13.14 Illustrations of a feasible partition of a space of feature vec...

FIGURE 13.15 A regression tree for the medical premium data via Python, gene...

FIGURE 13.16 A regression tree for the medical premium data via R, generated...

FIGURE 13.17 A graphical illustration of the random forest method.

FIGURE 13.18 Classification trees for the Taiwan credit default dataset.

FIGURE 13.19 An illustration of Wordle gameplay under normal mode. The targe...

Chapter 14

FIGURE 14.1 The illustration for the iterations with and ; the bisectors...

FIGURE 14.2 Scatter plots of pairwise components of observations for each gr...

FIGURE 14.3 Scatter plot matrix colored corresponding to the output from K‐m...

FIGURE 14.4 Boxplots for each feature variable in the cleansed 2002 HSI data...

FIGURE 14.5 An example of a two‐dimensional asymmetric sample.

FIGURE 14.6 Scatter plot for two variables (left), and boxplots for each fea...

FIGURE 14.7 Image segmentation using K‐means clustering.

FIGURE 14.8 Illustration of K‐NN classification, the newly added point is id...

FIGURE 14.9 Illustrations of three commonly adopted kernel functions.

FIGURE 14.10 Comparison of Nadaraya–Watson estimates and the true underlying...

FIGURE 14.11 Illustrations for , (colored circle) and the artificial clus...

Chapter 15

FIGURE 15.1 An illustration of the analogy between actual neurons and abstra...

FIGURE 15.2 An artificial neuron.

FIGURE 15.3 Commonly used activation functions for an axon.

FIGURE 15.4 Illustration of a 4‐3‐2 ANN.

FIGURE 15.5 Illustration of a 4‐2‐1 ANN.

FIGURE 15.6 The three solutions of four different initial seeds in the param...

FIGURE 15.7 Illustration of a 4‐2‐3 ANN.

FIGURE 15.8 Illustration of a 2‐2‐1 ANN.

FIGURE 15.9 Illustration of a multilayer perceptron. Each node is connected ...

FIGURE 15.10 A 10‐100‐100‐1 MLP model for stock price prediction.

FIGURE 15.11 Input and output for each neuron with weights assigned.

FIGURE 15.12 Predicting the day‐11 price for Anglo American (AAL), Goldman S...

FIGURE 15.13 Examples of sigmoid functions.

FIGURE 15.14 Illustration of approximating a function by simple functions.

FIGURE 15.15 Construction of a simple function.

FIGURE 15.16 A component in an ANN used to construct a rectangular partition...

FIGURE 15.17 A standard LSTM memory cell; also see [20].

FIGURE 15.18 Connections of cells in an LSTM model.

FIGURE 15.19 A two‐layer LSTM model with an equivalent unfolded representati...

FIGURE 15.20 LSTM backpropagation for the two‐layer model.

FIGURE 15.21 A 6‐8‐8‐1 LSTM model for Bitcoin price predictions.

FIGURE 15.22 Plot of Bitcoin price predictions via LSTM.

Praise for Financial Data Analytics

Really interesting, and an impressive masterpiece! Financial Data Analytics contains a rich amount of material, with original research findings in almost every chapter; many parts of the book will even be directly helpful for my own teaching in business school. In view of its dedication towards data-driven analytical tools genuinely needed in financial problems, I believe that it is the very book that defines the scope of financial data analytics.

—Alain Bensoussan, Fellow of AMS, IEEE, and SIAM; President of INRIA (1984–1996); President of CNES (Centre National d'Etudes Spatiales) (1996–2003); Chairman of ESA Council (European Space Agency) (1999–2002); Former Member of Advisory Board, Mathematical Finance; Lars Magnus Ericsson Chair Professor of Management, Naveen Jindal School of Management, University of Texas at Dallas

 

Financial Data Analytics provides a timely and thorough exploration of crucial topics in contemporary data science, specifically tailored for a quantitative finance audience. It skillfully balances introductory and advanced concepts, seamlessly integrating mathematical foundations with detailed coding examples. Designed to appeal to a broad audience, the content accommodates varying levels of familiarity with the mathematical and computational aspects of quantitative finance. The authors' adept presentation of complex ideas, coupled with practical applications, renders Financial Data Analytics an invaluable resource for both novice and seasoned professionals alike.

—KC Gary Chan, Fellow of ASA and IMS; President (Western North American Region), International Biometrics Society (2022); Professor, Department of Statistics, University of Washington

 

This book presents a wide coverage of state-of-the-art topics in data analytics, which are crucial in our current era of big data. Its organic blend of mathematical derivations of the theory and practical applications in FinTech and InsurTech via tailor-made implementable Python and R codes is exceptional. To give but a single example, based on my own research interests, the novel CIBer is a very interesting and original new tool. It is a joy to read Financial Data Analytics, a book that cannot be missed on the bookshelf of any researcher or student interested in this topic.

—Jan Dhaene, Full Professor, Director of Master of Science in Financial and Actuarial Engineering, and Head of Actuarial Research Group, Department of Accountancy, Finance and Insurance, Faculty of Business and Economics, KU Leuven; Head of Division “Actuariële Toepassingen voor Verzekerings-ondernemingen en Pensioenfondsbeheer”, KU Leuven Research and Development; Member of Institute of Actuaries of Belgium, Member and Vice-chair of Actuarial Education Network, International Actuarial Association

 

Financial Data Analytics is a fantastic book that offers rare gifts to industry practitioners. The important theories are brought to life through R and Python program codes, developed by the authors for the book, and easily adaptable for industry use. The book has comprehensive coverage of state-of-the-art techniques for every need. I like reading the practical applications, which help develop intuition for the more complicated methodologies, and surely someone can make a handsome profit implementing them. A super read, and a must-have for professionals if numbers rule your world.

—Kaiser Fung, Bestselling author, Numbers Rule Your World and Numbersense; Founding Director, MSc programme in Applied Analytics, Columbia University; Founder, Principal Analytics Prep

 

Financial Data Analytics is an exceptional book that integrates mathematics, practical examples, and real-life scenarios. With its focus on real datasets and practical programming codes in Python and R, the book offers a comprehensive exploration of various topics. It presents novel research findings and provides valuable insights for researchers, practitioners, and actuarial students. The book strikes a balance between foundational concepts and advanced techniques, making it an invaluable reference for professionals in the field. Additionally, its relevance extends to actuarial students preparing for their professional examinations. By redefining the landscape of financial data analytics in FinTech and InsurTech, this book establishes itself as a trusted guide in the industry.

—Simon Lam, Fellow of SOA, CFA, and FRM; President of The Actuarial Society of Hong Kong (2018, 2023); Deputy CEO & General Manager, Munich Re (Hong Kong)

 

The book will certainly play an impactful role in the advancement of financial analytics and should be on the bookshelf of every serious student of the topic.

—Wai Keung Li, Fellow of ASA and IMS; Emeritus Professor, The University of Hong Kong; Dean, Faculty of Liberal Arts and Social Sciences, The Education University of Hong Kong

 

Financial Data Analytics is a masterfully written book that encompasses a wide spectrum of statistical models and algorithms, with a special emphasis on financial and insurance applications. Drawing upon their multidisciplinary background and extensive research experience, as well as their close connection with the industry, the authors skillfully explain the theoretical underpinnings of both conventional and contemporary statistical methods that are truly relevant to the industry (including but not limited to regression learning, classification trees, neural networks, as well as the specification and assessment of these models), and amply illustrate the practical applications of these methods in various disciplines, by an abundance of real financial and insurance data, and using both Python and R. The dual focus on theory and applications, together with the discussion on recent advancements of the fields, makes the book one of a kind, even field-defining, among books on similar topics, and an ideal resource for anyone interested in understanding and implementing statistical models in this era of big data, as well as for students preparing for professional examinations on data analytics, such as the SRM, PA and ATPA exams of the Society of Actuaries.

—Ambrose Lo, Fellow of SOA, Chartered Enterprise Risk Analyst; Author of ACTEX Study Manual for SOA Exam SRM, ACTEX Study Manual for SOA Exam PA, and ACTEX Study Manual for SOA Exam ATPA

 

It is a tome!

—Suresh P. Sethi, Fellow of INFORMS, IEEE, POMS, and SIAM; Eugene McDermott Chair Professor of Operations Management, Naveen Jindal School of Management, University of Texas at Dallas

 

Financial Data Analytics is an encyclopedic documentation of in-depth and extensive statistical analysis in finance and beyond. It provides an end-to-end systematic approach to academics and practitioners with theories, tremendous examples and data, and algorithms with coding that are readily applicable in real life. The book encompasses four dimensions of coverage—theoretical framework to application and coding, distributional characteristics to data diagnosis and simulation, learning, and lastly coverage of both qualitative and quantitative data. The book consolidates classical knowledge with the most contemporary research in all subjects. Financial Data Analytics is one comprehensive biblical handbook for academic researchers, financial practitioners, and graduate students for both methodologies and applications. The book also lays a systematic framework for future extension and enrichment for financial data analytics.

—Nai-pan Tang, Former Chief Risk Officer and Member of Executive Committee, Hang Seng Bank; Former Deputy CEO and Chief Risk Officer, Shanghai Commercial Bank Ltd.; Former Director of the Board, Deputy CEO, Alternative CEO, Chief Risk Officer, and Vice Chairman of Asset Management, China CITIC Bank International; Director, The Hong Kong Institute of Bankers (2019–2021); Professor of Practice, Department of Finance, Chinese University of Hong Kong

 

Financial Data Analytics is a very impressive work with extensive coverage and many interesting topics. It stands out among similar books by the unique blend of detailed mathematical derivations, practical examples, real-life datasets, and readily available programme codes in Python and R. It also contains many novel results from the authors' recent research. Whether you are researchers on related fields, practitioners in the financial industry, or students preparing for a few exams at The Society of Actuaries or The Institute and Faculty of Actuaries, Financial Data Analytics will certainly be a valuable reference book.

—Hailiang Yang, Associate of SOA and Honorary Fellow of IFoA; Editor of Insurance: Mathematics and Economics; Professor, Department of Financial and Actuarial Mathematics, Xi'an Jiaotong–Liverpool University

 

Throughout my career at JP Morgan Chase and then CITIC Securities, I have participated in and witnessed how various data analytics tools revolutionized the financial industry. Financial Data Analytics is a perfect example echoing this trend, by discussing a wide spectrum of modern tools in data analytics, from both a theoretical viewpoint and a practical aspect, with readily implementable programme codes in both Python and R for real-life examples. This combination is so unique amongst the few books on data analytics; it not only reflects the broad range of theoretical knowledge of the authors, but also demonstrates their close ties with the finance and insurance industries. I actually had the opportunity to test some profit-making strategies mentioned in the book, and their performance was genuinely impressive. This book is far more than an academic monograph for scholars; it is certainly an illuminating guide for practitioners to explore their own alchemy of finance.

—Wei Zhou, Executive Director of Equity Derivatives Quantitative Research, JP Morgan Chase (2016–2021); Executive Director and Head of Quantitative Modelling, CITIC Securities

Founded in 1807, John Wiley & Sons is the oldest independent publishing company in the United States. With offices in North America, Europe, Australia and Asia, Wiley is globally committed to developing and marketing print and electronic products and services for our customers' professional and personal knowledge and understanding.

The Wiley Finance series contains books written specifically for finance and investment professionals as well as sophisticated individual investors and their financial advisors. Book topics range from portfolio management to e-commerce, risk management, financial engineering, valuation and financial instrument analysis, as well as much more.

For a list of available titles, visit our Web site at www.WileyFinance.com.

Financial Data Analytics

with Machine Learning, Optimization and Statistics

 

SAM CHEN

Hang Seng University of Hong Kong

KA CHUN CHEUNG

University of Hong Kong

PHILLIP YAM

Chinese University of Hong Kong

 

 

with programme codes by Kaiser Fan

 

 

 

 

 

This edition first published 2025

© 2025 by Sam Chen, Ka Chun Cheung, Phillip Yam.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Sam Chen, Ka Chun Cheung, Phillip Yam to be identified as the authors of this work has been asserted in accordance with law.

Registered Office(s)
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial Office
The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats. Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.

Limit of Liability/Disclaimer of Warranty

While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging-in-Publication Data Is Available:

ISBN 9781119863373 (Cloth)
ISBN 9781119863380 (ePDF)
ISBN 9781119863397 (ePub)
ISBN 9781119863403 (oBook)

Cover Design: Wiley
Cover Image: © da-kuk/Getty Images

To our parents and families

About the Authors

Yongzhao Chen (Sam) received his BSc in Actuarial Science with first class honours and PhD in Actuarial Science from The University of Hong Kong. He is currently an Assistant Professor at the Department of Mathematics, Statistics and Insurance of the Hang Seng University of Hong Kong. His research interests include actuarial science, especially credibility theory, and data analytics.

Ka Chun Cheung received his BSc in Actuarial Science with first class honours and PhD from The University of Hong Kong. He was the Director of the Actuarial Science Programme, and is currently Head and full Professor at the Department of Statistics and Actuarial Science in the School of Computing and Data Science, The University of Hong Kong. He is an Associate of the Society of Actuaries and an elected member of the International Statistical Institute. He serves on the editorial boards of Insurance: Mathematics and Economics and Journal of Industrial and Management Optimization. His current research interests include various topics in actuarial science, including optimal reinsurance, stochastic orders, dependence structures, and extreme value theory.

Phillip Yam received his BSc in Actuarial Science with first class honours and MPhil from The University of Hong Kong. Supported by two scholarships awarded by the Croucher Foundation (Hong Kong), he obtained an MASt (Master of Advanced Study) degree, Part III of the Mathematical Tripos, with Distinction in Mathematics from the University of Cambridge and a DPhil in Mathematics from the University of Oxford. During his postgraduate studies, he was awarded the E. M. Burnett Prize in Mathematics from the University of Cambridge, and a junior research fellowship from The Erwin Schrödinger International Institute for Mathematics and Physics of the University of Vienna.

Phillip is currently the Co-Director of the Interdisciplinary Major Programme in Quantitative Finance and Risk Management Science, and a full Professor at the Department of Statistics of The Chinese University of Hong Kong (CUHK). He is also Assistant Dean (Education) of CUHK Faculty of Science, and Fellow of the Centre for Promoting Science Education in the Faculty. He has been appointed as a research fellow in the Hausdorff Research Institute for Mathematics at the University of Bonn and a Visiting Professor in both the Department of Statistics at Columbia University in the City of New York and the Naveen Jindal School of Management at the University of Texas at Dallas. He has published about a hundred journal articles in actuarial science, applied mathematics, data analytics, engineering, financial mathematics, operations management, and statistics, and has also been serving on the editorial boards of several journals in these fields. Together with Alain Bensoussan and Jens Frehse, he wrote the first monograph on mean field games and mean field type control theory. His research project with the title “Comonotone-independence Bayes Classifier (CIBer)” was awarded a Silver Medal in the 48th International Exhibition of Inventions Geneva in 2023. Besides academia, he has provided consulting services for various financial institutions and insurance companies, and established close connections in these industries; many of his students also work in international investment banking and insurance companies.

Kaiser Fan received his BSc in Risk Management Science with first class honours and MPhil from The Chinese University of Hong Kong under the guidance of Professor Phillip Yam. As a data scientist, his research interests include data analytics and machine learning, especially deep learning. He contributed the programming and many of the examples and illustrations in the book.

Foreword

To the memory of

Tze Leung Lai (1945–2023)

Late Ray Lyman Wilbur Professor of Statistics,

Stanford University

We were saddened to hear about the sudden passing away of Professor Tze Leung Lai. We would like to thank him for his care for the younger generation, including ourselves, as well as all of his valuable guidance in the past two decades, since Ka Chun's and Phillip's senior-year undergraduate and master studies at The University of Hong Kong; we all learned a lot from him, both indirectly and directly. He was certainly a renowned scholar. Due to the pandemic, we could not make the trip to visit him in person; we were hoping to meet him again last summer after the pandemic eventually came to an end, only to learn that he departed too soon. During the book writing process, we sent a draft version of the book to him. He was glad of what we had achieved and also graciously offered to write a foreword for this book, which can no longer become a reality now. However, this foreword is always reserved for him. We thank you again for your generous offer, Professor Lai; thank you, our mentor, may you rest in peace.

Winter, 2023

Preface

In the field of finance, nothing is more important than gaining profits, and any innovation that draws people's attention must lead to at least the same level of profit as the existing methods; indeed, this has always been the main driving force of updates to relevant curricula over time. For instance, with the development of option pricing and portfolio selection in the 1970s, financial training from the mid-1980s to the 2000s heavily involved Itô's stochastic calculus and partial differential equations. On the other hand, ARCH-type volatility models, the precursors of GARCH, were proposed by Robert Engle in the early 1980s for a better estimation of parameters facilitating derivative valuation and portfolio management, and various academic curricula quickly followed suit by placing more emphasis on financial econometrics. Quantitative analysis of game theory, particularly the numerical algorithm for discovering equilibrium points, gained more importance and attention in academia after John Nash won the Nobel Memorial Prize in Economic Sciences in 1994; the trend continues today with further sophistication and generalization towards the context of mean field games in the last dozen years. In the 2000s, as behavioural finance was gaining increasing attention in society, people wanted to learn more methods in the realm of experimental behavioural economics and finance, especially on how the market makes use of statistical methods to understand the impact of different human behaviour and devise advertising strategies accordingly, which explains why case studies and primitive statistics have been prevalent in the financial classes in recent decades.

Recently, attention has been diverted towards AI. The revolutionary developments in machine learning and deep learning have brought new elements of data analytics into finance, particularly including the heated areas of InsurTech, FinTech and RegTech. To catch up with the trend, curriculum designs should be revised to cover financial or business data analytics in a comprehensive manner, and statistics is certainly at the core of them; this is precisely why we wanted to write this book. Among the few books in this field, involving the use of standard statistics in financial analysis with either Python or R, and statistical applications in financial engineering, focus is usually put on the possible financial applications of conventional statistical tools, yet some practical problems may require tools beyond traditional statistics, and we aim to address a few relevant issues in this book. Another important motivation for us is certainly the positive feedback from students regarding our teaching materials in the past decade, which we have consolidated as the foundation of this book.

This book investigates contemporary practical techniques of financial data analytics that are specific to real-life scenarios and leave room for high profit-making potential, with 15 chapters in total covering a wide range of important and frontier topics in this field. For example, we shall explore data analytics in investment strategy, financial forensics, and the immediate use of deep learning in finance. We also critically discuss the pros and cons of machine learning tools. While we raise caution against potential pitfalls of new approaches like deep learning, we also propose a novel feature engineering scheme as part of CIBer (see Chapter 12) to overcome limitations of existing methods regarding input features, which also achieves a promising classification performance. Examples are provided throughout the book, in which we focus on a few typical datasets from real-life financial markets to facilitate intuitive comparisons among models, allowing readers to form their own judgements on their pros and cons, and hence apply suitable data analysis methods to their own datasets. Executable detailed programme codes in Python and R are also readily available with corresponding examples. Practitioners including quants and fund managers can gain insights into the latest developments of data analytics from the book, and help to formulate effective investment strategies or to facilitate better product designs. This up-to-date knowledge may further help them conduct novel applied research in different business disciplines. It is also our hope that senior-year students and postgraduates can deepen their understanding of this field and find the book useful for their future academic research. Meanwhile, the contents of this book also cover a large part of syllabi of modules from different public professional examinations on predictive analytics, including but not limited to the Statistics for Risk Modeling (SRM) Exam and Predictive Analytics (PA) Exam of the Society of Actuaries, making it a suitable main or supplementary reading for these professional examinations.

To benefit the most from this book, it is advisable that readers have a solid background in probability and statistics, linear algebra, and advanced calculus, preferably at the sophomore level. For some parts of the book, basic knowledge of real and complex analysis would aid a full understanding. In particular, sections marked by asterisks require a higher level of mathematical understanding and may be omitted during the initial reading. For the convenience of readers, some of the relevant basic knowledge is reviewed in Chapter 1. Acquaintance with programming languages such as Python or R is also instrumental. To make the book more self-contained, a quick overview of these two programming languages is provided in Chapter 2. In addition, readers interested in more sophisticated investment strategies and derivative pricing also need a rudimentary background in Itô's stochastic calculus and continuous martingale theory, which is unavoidable given the technical nature of the subject matter.

In writing this book, our multidisciplinary background proved helpful; we all had diverse training in actuarial science, economics and finance, mathematics, probability and statistics, and our research also involves the application of data analytics in diverse applied areas. We have established long-term research collaborations with industry practitioners, and many of our undergraduate and postgraduate students, friends, and colleagues are also working in world-leading companies in the finance and insurance sectors. Growing up in the traditional global financial hub of Hong Kong has also equipped us with practical financial knowledge that benefits our pragmatic research, while we are not bound by the routine methods in solving both research and practical problems. In addition, our graduate student Kaiser Fan also made a unique contribution by implementing most programme codes in the examples in a tailor-made and illuminating fashion.

With the publication of this book, we welcome valuable feedback and comments from readers. Due to limitations in both scope and time, we could not delve into all topics in detail, and we apologize for any missing information. While we drew inspiration from a wide range of literature and tried to cite all of it, some sources may still have unintentionally slipped our minds over time, and we sincerely apologize for any oversights. We also benefited from courses on financial data analytics or equivalents, including teaching materials and course design, in renowned universities in Asia, Australia, Europe and North America.

While we believe our book offers a valuable collection of tools in financial data analytics, we deliberately left out blockchains, as they have shifted towards being an internet and network security concern rather than an analytical tool for financial information. Meanwhile, we have an upcoming book on deep learning with some applications in finance, to which we briefly hint in the closing chapter of this book. Hopefully readers will enjoy the current book and stay tuned for the upcoming release.

Sam Chen, Ka Chun Cheung, and Phillip Yam
Hong Kong, December 2023

Acknowledgements

First and foremost, we would like to thank Wiley for their kind consideration of our book, as well as their assistance and patience in different stages of preparation, so that we could have more time to polish this highly original book.