An essential introduction to data analytics and Machine Learning techniques in the business sector
In Financial Data Analytics with Machine Learning, Optimization and Statistics, a team consisting of a distinguished applied mathematician and statistician, experienced actuarial professionals and working data analysts delivers an expertly balanced combination of traditional financial statistics, effective machine learning tools, and mathematics. The book focuses on contemporary techniques used for data analytics in the financial sector and the insurance industry with an emphasis on mathematical understanding and statistical principles and connects them with common and practical financial problems. Each chapter is equipped with derivations and proofs—especially of key results—and includes several realistic examples which stem from common financial contexts. The computer algorithms in the book are implemented using Python and R, two of the most widely used programming languages for applied science and in academia and industry, so that readers can implement the relevant models and use the programs themselves.
The book begins with a brief introduction to basic sampling theory and the fundamentals of simulation techniques, followed by a comparison between R and Python. It then discusses statistical diagnostics for financial security data and introduces common tools in financial forensics such as Benford's Law, Zipf's Law, and anomaly detection. Statistical estimation and the Expectation-Maximization (EM) and Majorization-Minimization (MM) algorithms are also covered. The book next turns to univariate and multivariate dynamic volatility and correlation forecasting, with emphasis on the celebrated Kelly's formula, followed by a brief introduction to quantitative risk management and dependence modelling for extremal events. A practical treatment of numerical finance for traditional option pricing and the computation of Greeks follows, together with other important data-driven topics in finance: Principal Component Analysis (PCA) and recommender systems with their applications, and advanced regression learners such as kernel regression and logistic regression, with discussions of model assessment methods such as Receiver Operating Characteristic (ROC) curves and Area Under Curve (AUC) for typical classification problems.
The book then moves on to other commonly used machine learning tools like linear classifiers such as perceptrons and their generalization, the multilayered counterpart (MLP), Support Vector Machines (SVM), as well as Classification and Regression Trees (CART) and Random Forests. Subsequent chapters focus on linear Bayesian learning, including well-received credibility theory in actuarial science and functional kernel regression, and non-linear Bayesian learning, such as the Naïve Bayes classifier and the Comonotone-Independence Bayesian Classifier (CIBer) recently independently developed by the authors and used successfully in InsurTech.
After an in-depth discussion of cluster analysis methods such as K-means clustering and its inversion, the K-nearest neighbor (KNN) method, the book concludes by introducing some useful deep neural networks for FinTech, such as the potential use of the Long Short-Term Memory (LSTM) model for stock price prediction.
This book can help readers become well-equipped with the following skills:
The book covers the competencies tested by several professional examinations, such as the Predictive Analytics Exam offered by the Society of Actuaries, and the Institute and Faculty of Actuaries' Actuarial Statistics Exam.
Besides being an indispensable resource for senior undergraduate and graduate students taking courses in financial engineering, statistics, quantitative finance, risk management, actuarial science, data science, and mathematics for AI, Financial Data Analytics with Machine Learning, Optimization and Statistics also belongs in the libraries of aspiring and practicing quantitative analysts working in commercial and investment banking.
Page count: 895
Year of publication: 2024
Cover
Table of Contents
Title Page
Copyright
Dedication
About the Authors
Foreword
Preface
Acknowledgements
Introduction
DEVELOPMENT OF FINANCIAL DATA ANALYTICS
ORGANIZATION OF THE BOOK
REFERENCES
NOTE
PART One: Data Cleansing and Analytical Models
CHAPTER 1: Mathematical and Statistical Preliminaries
1.1 RANDOM VECTOR
1.2 MATRIX THEORY
1.3 VECTORS AND MATRIX NORMS
1.4 COMMON PROBABILITY DISTRIBUTIONS
1.5 INTRODUCTORY BAYESIAN STATISTICS
REFERENCES
NOTES
CHAPTER 2: Introduction to Python and R
2.1 WHAT IS PYTHON?
2.2 WHAT IS R?
2.3 PACKAGE MANAGEMENT IN PYTHON AND R
2.4 BASIC OPERATIONS IN PYTHON AND R
2.5 ONE-WAY ANOVA AND TUKEY'S HSD FOR STOCK MARKET INDICES
REFERENCES
NOTES
CHAPTER 3: Statistical Diagnostics of Financial Data
3.1 NORMALITY ASSUMPTION FOR RELATIVE STOCK PRICE CHANGES
3.2 STUDENT'S t-DISTRIBUTION FOR STOCK PRICE CHANGES
3.3 TESTING FOR MULTIVARIATE NORMALITY
3.4 SAMPLE CORRELATION MATRIX
3.5 EMPIRICAL PROPERTIES OF STOCK PRICES
3.A APPENDIX
REFERENCES
NOTE
CHAPTER 4: Financial Forensics
4.1 BENFORD'S LAW
4.2 SCALING INVARIANCE AND BENFORD'S LAW
4.3 BENFORD'S LAW IN BUSINESS REPORTS
4.4 BENFORD'S LAW IN GROWTH FIGURES
4.5 ZIPF'S LAW
4.6 ZIPF'S LAW AND COVID-19 FIGURES
4.A APPENDIX
REFERENCES
NOTES
CHAPTER 5: Numerical Finance
5.1 FUNDAMENTALS OF SIMULATION
5.2 VARIANCE REDUCTION TECHNIQUE
5.3 A REVIEW OF FINANCIAL CALCULUS AND DERIVATIVE PRICING
*5.4 GREEKS AND THEIR APPROXIMATIONS
REFERENCES
NOTES
CHAPTER 6: Approximation for Model Inference
6.1 EM ALGORITHM
6.2 MM ALGORITHM
*6.3 A SHORT COURSE ON THE THEORY OF MARKOV CHAINS
*6.4 MARKOV CHAIN MONTE CARLO
*6.A APPENDIX
REFERENCES
NOTES
CHAPTER 7: Time-Varying Volatility Matrix and Kelly Fraction
7.1 FLUCTUATION OF VOLATILITIES
7.2 EXPONENTIALLY WEIGHTED MOVING AVERAGE
7.3 ARIMA TIME SERIES MODEL
7.4 ARCH AND GARCH MODELS
*7.5 KELLY FRACTION
7.6 CALENDAR EFFECTS
*7.A APPENDIX
REFERENCES
NOTES
CHAPTER 8: Risk Measures, Extreme Values, and Copulae
8.1 VALUE-AT-RISK AND EXPECTED SHORTFALL
8.2 BASEL ACCORDS AND RISK MEASURES
8.3 HISTORICAL SIMULATION (BOOTSTRAPPING)
8.4 STATISTICAL MODEL BUILDING APPROACH
8.5 USE OF EXTREME VALUE THEORY
8.6 BACKTESTING
8.7 ESTIMATES OF EXPECTED SHORTFALL
8.8 DEPENDENCE MODELLING VIA COPULAE
*8.A APPENDIX
REFERENCES
NOTES
PART Two: Linear Models
CHAPTER 9: Principal Component Analysis and Recommender Systems
9.1 US ZERO‐COUPON RATES
9.2 PCA ALGORITHM
9.3 FINANCIAL INTERPRETATION OF PCS FOR US ZERO‐COUPON RATES
9.4 PCA AS AN EIGENVALUE PROBLEM
9.5 FACTOR MODELS VIA PCA
9.6 VALUE‐AT‐RISK VIA PCA
9.7 PORTFOLIO IMMUNIZATION
9.8 FACIAL RECOGNITION VIA PCA
9.9 NON‐LIFE INSURANCE VIA PCA
9.10 INVESTMENT STRATEGIES USING PCA
*9.11 RECOMMENDER SYSTEM
*9.A APPENDIX
REFERENCES
NOTES
CHAPTER 10: Regression Learning
10.1 SIMPLE AND MULTIPLE LINEAR REGRESSION MODELS AND BEYOND
10.2 POLYNOMIAL REGRESSION
10.3 GENERALIZED LINEAR MODELS
10.4 LOGISTIC REGRESSION
10.5 POISSON REGRESSION
10.6 MODEL EVALUATION AND CONSIDERATIONS IN PRACTICE
*10.7 PRINCIPAL COMPONENT REGRESSION
*10.A APPENDIX
REFERENCES
NOTES
CHAPTER 11: Linear Classifiers
11.1 PERCEPTRON
11.2 SUPPORT VECTOR MACHINE
*11.A APPENDIX
REFERENCES
NOTES
PART Three: Nonlinear Models
CHAPTER 12: Bayesian Learning
12.1 SIMPLE CREDIBILITY THEORY
*12.2 BAYESIAN ASYMPTOTIC INFERENCE
12.3 REVISITING POLYNOMIAL REGRESSION
12.4 BAYESIAN CLASSIFIERS
12.5 COMONOTONE‐INDEPENDENCE BAYES CLASSIFIER (CIBER)
12.A APPENDIX
REFERENCES
NOTES
CHAPTER 13: Classification and Regression Trees, and Random Forests
13.1 CLASSIFICATION (DECISION) TREES
*13.2 CONCEPTS OF ENTROPIES
13.3 INFORMATION GAIN
13.4 OTHER IMPURITY MEASURES FOR INFORMATION
13.5 SPLITTING AGAINST CONTINUOUS ATTRIBUTES
13.6 OVERFITTING IN CLASSIFICATION TREE
13.7 CLASSIFICATION TREES IN PYTHON AND R
13.8 REGRESSION TREES
13.9 RANDOM FOREST
13.A APPENDIX
REFERENCES
NOTES
CHAPTER 14: Cluster Analysis
14.1 K‐MEANS CLUSTERING
14.2 K‐NEAREST NEIGHBOUR
*14.3 KERNEL REGRESSION
*14.A APPENDIX
REFERENCES
NOTES
CHAPTER 15: Applications of Deep Learning in Finance
15.1 HUMAN BRAINS AND ARTIFICIAL NEURONS
15.2 FEEDFORWARD NETWORK
15.3 ANN WITH LINEAR OUTPUTS
15.4 ANN WITH LOGISTIC OUTPUTS
15.5 ADAPTIVE LEARNING RATE
15.6 TRAINING NEURAL NETWORKS VIA BACKPROPAGATION
15.7 MULTILAYER PERCEPTRON
15.8 UNIVERSAL APPROXIMATION THEOREM
15.9 LONG SHORT‐TERM MEMORY (LSTM)
REFERENCES
NOTES
Postlude
Index
End User License Agreement
Chapter 4
TABLE 4.1 The expected frequencies of the leading digits of randomly select...
TABLE 4.2 When Benford analysis is likely useful?
TABLE 4.3 When Benford analysis is not likely useful?
TABLE 4.4 Suggested “five-digit test” p-value criteria for the chi-square go...
TABLE 4.5 A summary table of Nigrini's “five-digit test” for US census data ...
TABLE 4.6 p-values of chi-square goodness-of-fit tests of nine countries ove...
TABLE 4.7 Regional COVID-19 cases by country extracted from [20].
Chapter 5
TABLE 5.1 Sample means and sample variances of various variance reduction te...
TABLE 5.2 Sample means and sample variances of various variance reduction te...
TABLE 5.3 The estimated price of Asian call option in Example 5.7 in Python ...
TABLE 5.4 The estimated price (top) and standard deviation (bottom) of 100 s...
TABLE 5.5 The estimated price (top) and standard deviation (bottom) of 100 s...
TABLE 5.6 Commonly used Greeks. Note that vega is not the name of any Greek ...
Chapter 7
TABLE 7.1 Properties of ACF and PACF for ARMA models.
Chapter 8
TABLE 8.1 1986 BIS V@R Amendment from [5].
Chapter 9
TABLE 9.1 Money exposure of a bond portfolio.
TABLE 9.2 The 20 selected candidates that have the 10 smallest and 10 large...
Chapter 10
TABLE 10.1 Confusion matrix for a medical test.
TABLE 10.2 A confusion matrix of a test of a rare disease.
TABLE 10.3 The dataset of 20 labeled samples with the respective predicted ...
TABLE 10.4 A confusion matrix of a test using the dataset in Table 10.3.
TABLE 10.5 Confusion matrix for a test obtained by a random classification ...
Chapter 12
TABLE 12.1 The contingency table of a sample dataset with two categorical f...
TABLE 12.2 Proportions of the first five continuous features in the trainin...
TABLE 12.3 Proportions of the first five continuous features in the trainin...
TABLE 12.4 Frequencies of the sixth categorical variable in the training da...
Chapter 13
TABLE 13.1 15 credit default samples with Class N or Y as their label ; at...
TABLE 13.2 Number of elements in Groups 1 and 2 after splitting by either a...
TABLE 13.3 Choosing the threshold value for the attribute “Taxable Income” ...
TABLE 13.4 Total counts on the number of trials required for Algorithm 13.1...
Chapter 14
TABLE 14.1 The correspondence between the original groups and the labels as...
Introduction
FIGURE 1 GitHub repository for this book.
Chapter 1
FIGURE 1.1 Comparison between MLE (a) and Bayesian inference (b). MLE draws ...
FIGURE 1.2 Density of with (red solid), 5 (orange dashed), 10 (blue dott...
FIGURE 1.3 The posterior distribution .
FIGURE 1.4 Posterior distributions of with different values of .
FIGURE 1.5 Predictive distributions of for various values of via Bayesia...
Chapter 2
FIGURE 2.1 Typing “pip install tensorflow” in Anaconda prompt.
FIGURE 2.2 Function documentation can be shown when the cursor is placed ove...
FIGURE 2.3 Histograms of the daily logarithmic returns from 2005 to 2007 in
FIGURE 2.4 Histograms of the daily logarithmic returns from 2018 to 2020 in
FIGURE 2.5 Histograms of the daily logarithmic returns from 2005 to 2007 in ...
FIGURE 2.6 Histograms of the daily logarithmic returns from 2018 to 2020 in ...
FIGURE 2.7 Boxplot, meanplot, and Tukey HSD plot for three stock market indi...
FIGURE 2.8 Boxplot, meanplot, Tukey HSD plot for three stock market indices ...
FIGURE 2.9 Boxplot, meanplot, and Tukey HSD plot for three stock market indi...
FIGURE 2.10 Boxplot, meanplot, and Tukey HSD plot for three stock market ind...
Chapter 3
FIGURE 3.1 Plots of prices and returns of HSBC, CLP, and CK via Python, gene...
FIGURE 3.2 Histograms and Normal Q-Q plots of the returns of HSBC, CLP, and ...
FIGURE 3.3 Plots of prices and returns of HSBC, CLP, and CK via R, generat...
FIGURE 3.4 Histograms and Normal Q-Q plots of returns of HSBC, CLP, and CK v...
FIGURE 3.5 Student's t Q-Q plots for the returns of HSBC, CLP, and CK via Py...
FIGURE 3.6 Student's t Q-Q plots for the returns of HSBC, CLP, and CK via R,...
FIGURE 3.7 Chi-square Q-Q plots for the returns of HSBC, CLP, and CK.
FIGURE 3.8 Correlation analysis on the returns of HSBC, CLP, and CK via Pyth...
FIGURE 3.9 Correlation analysis on the returns of HSBC, CLP, and CK via R.
FIGURE 3.10 Empirical properties of the prices and returns of HSBC, CLP, and...
FIGURE 3.11 Empirical properties of the prices and returns of HSBC, CLP, and...
FIGURE 3.12 ACF plots of the prices, returns and squared returns of HSBC, CL...
FIGURE 3.13 ACF plots of the prices, returns and squared returns of HSBC, CL...
Chapter 4
FIGURE 4.1 Summation over under the measure.
FIGURE 4.2 Histograms for the expected proportions of the set of all possibl...
FIGURE 4.3 Actual first digit frequencies vs those expected ones given by Be...
FIGURE 4.4 Actual second digit frequencies vs those expected given by Benfor...
FIGURE 4.5 Actual first-two digit frequencies vs those expected ones given b...
FIGURE 4.6 Actual first-three digit frequencies vs those expected ones given...
FIGURE 4.7 Actual last-two digit frequencies vs those expected ones given by...
FIGURE 4.8 Leading digit frequencies of nine countries over a period of 30 a...
FIGURE 4.9 Log-log plots of confirmed COVID-19 cases in Russia (black solid)...
FIGURE 4.10 Log-log plots of confirmed COVID-19 cases in each country.
Chapter 5
FIGURE 5.1 Histograms of the samples from pseudo generators for the uniform ...
FIGURE 5.2 Histograms of the samples from pseudo generators for the exponent...
FIGURE 5.3 Histograms of the samples from pseudo generators for normal distr...
FIGURE 5.4 Actual and simulated stock prices for HSBC (solid), CLP (dashed),...
FIGURE 5.5 Actual and simulated stock prices for HSBC (solid), CLP (dashed),...
FIGURE 5.6 The histograms of the sample of for for Euler–Maruyama (left)...
FIGURE 5.7 The histograms of the sample of for for Euler–Maruyama (left)...
FIGURE 5.8 Convergence of the pathwise differentiation method under a CEV mo...
FIGURE 5.9 Convergence of the pathwise differentiation method under a CEV mo...
FIGURE 5.10 Convergence of the pathwise differentiation method under a CEV m...
FIGURE 5.11 Convergence of the pathwise differentiation method under a CEV m...
FIGURE 5.12 Convergence of the pathwise differentiation method under a CEV m...
FIGURE 5.13 Convergence of the pathwise differentiation method under a CEV m...
FIGURE 5.14 Comparison of convergence of the pathwise differentiation and li...
FIGURE 5.15 Comparison of convergence of the pathwise differentiation and li...
Chapter 6
FIGURE 6.1 Illustration of the EM algorithm after the first two iterations....
FIGURE 6.2 A sample of handwritten digits from MNIST.
FIGURE 6.3 An artificial image (right) as the average of handwritten images ...
FIGURE 6.4 Examples of deficits in handwriting images.
FIGURE 6.5 MNIST with Bernoulli Mixture Model, generated by Programme 6.6 in...
FIGURE 6.6 Some examples of confusing handwritten digits.
FIGURE 6.7 MNIST with Bernoulli Mixture Model, generated by Programme 6.9 in...
FIGURE 6.8 Simulation paths and histograms for components of , generated in...
FIGURE 6.9 Simulation paths and histograms for components of , generated in...
FIGURE 6.10 Normal Q-Q plots of the standardized residuals from the SV model...
Chapter 7
FIGURE 7.1 Reproduced from Figures 3.1 and 3.3: Plots of returns of HSBC, CL...
FIGURE 7.2 90-day and 180-day simple moving standard deviations of HSBC arit...
FIGURE 7.3 90-day and 180-day simple moving standard deviations of HSBC arit...
FIGURE 7.4 Respective sample ACF and PACF plots of AR(2) and MA(2) models.
FIGURE 7.5 Plots of Bitcoin minute prices and standard deviations over 30-mi...
FIGURE 7.6 Plots of Bitcoin minute prices and standard deviations over 30-mi...
FIGURE 7.7 Sample ACF and PACF plots of Bitcoin prices in Python, generated ...
FIGURE 7.8 Sample ACF and PACF plots of Bitcoin prices in R, generated in Pr...
FIGURE 7.9 Plot of the residuals of the fitted ARIMA() model in Python, gen...
FIGURE 7.10 Plot of the residuals of the fitted ARIMA() model in R, generat...
FIGURE 7.11 Bitcoin price prediction with the fitted ARIMA() model on 31 De...
FIGURE 7.12 Bitcoin price prediction with the fitted ARIMA() model on 31 De...
FIGURE 7.13 Plots of model diagnostics for a GARCH() model fitted to HSBC r...
FIGURE 7.14 Plots of model diagnostics for a GARCH() model fitted to HSBC r...
FIGURE 7.15 Plots of GARCH() fitted volatilities, 90-day, and 180-day simpl...
FIGURE 7.16 Plots of GARCH() fitted volatilities, 90-day, and 180-day simpl...
FIGURE 7.17 Plots of model diagnostics for L-GARCH() fitted with HSBC retur...
FIGURE 7.18 Plots obtained in a DCC-GARCH() model in R, generated by Progra...
FIGURE 7.19 Illustration of the four cases of as in Propositions 7.1 and 7...
FIGURE 7.20 Changes of portfolio values in years 2003–2022, in terms of the ...
FIGURE 7.21 Bar chart of three stock indices based on the goodness index fro...
Chapter 8
FIGURE 8.1 Illustration of .
FIGURE 8.2 Illustration of , which equals the area of the blue region.
FIGURE 8.3 The location of Basel, Switzerland. Canton of Basel-Stadt map wit...
FIGURE 8.4 The fitted line of against using figures from Table 8.1.
FIGURE 8.5 Histogram of profit and loss during the last 250 days in Python, ...
FIGURE 8.6 Histogram of profit and loss during the last 250 days in R, gener...
FIGURE 8.7 Q-Q plots of the sample generated by pseudo observations of HSBC,...
FIGURE 8.8 Q-Q plots of the sample generated by pseudo observations of HSBC,...
FIGURE 8.9 Plots from the fitted Gaussian copula in Python, generated by Pro...
FIGURE 8.10 Plots from the fitted Gaussian copula in R, generated by Program...
FIGURE 8.11 Q-Q plots generated in Programme 8.22.
FIGURE 8.12 Q-Q plots generated in Programme 8.24.
FIGURE 8.13 Plots from the fitted t-copula in Python, generated by Programme...
FIGURE 8.14 Plots from the fitted t-copula via R, generated in Programme 8.2...
FIGURE 8.15 Q-Q plots of the empirical squared Mahalanobis distances against...
FIGURE 8.16 Q-Q plot and the plot of squared differences for the fitted Gaus...
FIGURE 8.17 Q-Q plot and the plot of squared differences for the fitted Gaus...
FIGURE 8.18 Q-Q plot and the plot of squared differences for the fitted Gaus...
FIGURE 8.19 Q-Q plot and the plot of squared differences for the fitted Gaus...
FIGURE 8.20 Scatter plots for bivariate risks by using different Archimedean...
FIGURE 8.21 Scatter plots for bivariate risks by using different Archimedean...
FIGURE 8.22 Bivariate Clayton copula: Fréchet Fréchet marginals with .
FIGURE 8.23 Bivariate Clayton copula: Fréchet Gumbel marginals with .
FIGURE 8.24 High-dimensional Clayton copula: Fréchet Gumbel Weibull marg...
FIGURE 8.25 Scatter plots of some extreme-value copulae in R.
FIGURE 8.26 Scatter plots of some extreme-value copulae in R (continued).
FIGURE 8.27 Gumbel–Hougaard copula: Bivariate Fréchet cases; also see [11]....
FIGURE 8.28 Gumbel–Hougaard copula: Gumbel marginals; also see [11].
FIGURE 8.29 Gumbel–Hougaard copula: Weibull marginals; also see [11].
Chapter 9
FIGURE 9.1 Scree plots of the US rates example.
FIGURE 9.2 Loadings of 's for the first three principal components of the U...
FIGURE 9.3 Scatter plot for the first three principal components from the US...
FIGURE 9.4 Scatter plot for the first three principal components from the US...
FIGURE 9.5 Facial recognition for the actress IU.
FIGURE 9.6 Testing results of matching faces, generated by Programme 9.23 in...
FIGURE 9.7 Bar chart of the Euclidean distances of the projected PCs between...
FIGURE 9.8 Testing results of matching faces, generated by Programme 9.24 in...
FIGURE 9.9 Bar chart of the Euclidean distances of the projected PCs between...
FIGURE 9.10 Property risk analysis through image processing.
FIGURE 9.11 Biomedical image recognition is commonly used to determine wheth...
FIGURE 9.12 Enhancing visibility for biomedical images. Source: [4].
FIGURE 9.13 Housing insurance analysis through satellite geographical images...
FIGURE 9.14 Performance comparison of the three portfolios from April to Jun...
FIGURE 9.15 Netflix Prize, October 2006. The rating matrix is large and sp...
FIGURE 9.16 Singular Value Decomposition of the rating matrix .
Chapter 10
FIGURE 10.1 Neighborhoods of Boston (https://upload.wikimedia.org/wikipedia/...
FIGURE 10.2 The output graph from Programme 10.1; the black circle dots are ...
FIGURE 10.3 The output graph from Programme 10.2; the black dots are the tra...
FIGURE 10.4 The output of the linear regression model for HSI generated by P...
FIGURE 10.5 Scatter plot of the Mahalanobis distances, generated in Programm...
FIGURE 10.6 Photo of an Iris virginica flower; original picture from...
FIGURE 10.7 Using the cleansed dataset d3 from Programme 10.11, lift charts ...
FIGURE 10.8 Illustration of TPR and FPR.
FIGURE 10.9 ROC curve for the data extracted from Table 10.3, where each poi...
FIGURE 10.10 ROC curves of two binary logistic regressions deployed to two d...
FIGURE 10.11 Evaluation of the performance of a classification algorithm usi...
FIGURE 10.12 Scree plot of the Boston housing training dataset in Python, ge...
FIGURE 10.13 Scree plot of the Boston housing training dataset in R, generat...
FIGURE 10.14 Causing singularity of for logistic regression as approachi...
Chapter 11
FIGURE 11.1 An example of a linear classifier for two‐dimensional feature ve...
FIGURE 11.2 Two yellow dots and two red crosses which cannot be classified b...
FIGURE 11.3 Perceptron: Category 1 (red crosses); Category 2 (yellow dots)
FIGURE 11.4 An update of the estimate for under gradient descent method at...
FIGURE 11.5 Some possible 's all of which satisfy sgn for all .
FIGURE 11.6 Performance of logistic regression represented by solid lines, a...
FIGURE 11.7 An example of an SVM model for two‐dimensional feature vectors....
FIGURE 11.8 Kernel trick: 2D to 3D transformation.
FIGURE 11.9 Applying quadratic kernel transformation to the data points gene...
FIGURE 11.10 SVM with RBF kernel resulting in different boundaries of the se...
FIGURE 11.11 Example of the existence of some outliers or noise in a dataset...
FIGURE 11.12 Plot of training and testing accuracies of the SVM models with ...
FIGURE 11.13 Plot of training and testing accuracies of the SVM models with ...
FIGURE 11.14 The MSE plot against parameter of the first three iterations ...
FIGURE 11.15 Gradient descent for SGD (dotted trace in grey), mini‐batch SGD...
FIGURE 11.16 A contour plot for Problem 11.25.
FIGURE 11.17 (a) A compact convex set; (b) A convex, but non‐compact, set; (...
FIGURE 11.18 The two paths of optimizations taken with respect to von Neuman...
FIGURE 11.19 Duality Gap
FIGURE 11.20 Value of the optimal portfolio and the S&P500 index in the test...
Chapter 12
FIGURE 12.1 Polynomial regression using Bayesian approach with various sampl...
FIGURE 12.2 Boxplot of ‐scores for all classifiers in the simulation study....
FIGURE 12.3 Boxplot of accuracies for all classifiers in the simulation stud...
FIGURE 12.4 Downloading CIBer package in Python via terminal.
FIGURE 12.5 GitHub for CIBer.
FIGURE 12.6 Class label distributions of bank churners (left) and default pr...
FIGURE 12.7 Boxplot of ‐score for each classifier in the bank churners data...
FIGURE 12.8 Boxplots of accuracies for all classifiers in the bank churners ...
FIGURE 12.9 Boxplots of recall rates for all classifiers in the bank churner...
FIGURE 12.10 Boxplots of ‐scores for all classifiers in the default premium...
FIGURE 12.11 Boxplots of accuracies for all classifiers in the default premi...
FIGURE 12.12 Boxplots of recall rates for all classifiers in the default pre...
Chapter 13
FIGURE 13.1 A classification tree showing criteria for choosing a Prince Cha...
FIGURE 13.2 An illustration for splitting tree with a sample at node by ...
FIGURE 13.3 Selection of attribute or based on information gain.
FIGURE 13.4 A graph of entropy, Gini‐index and misclassification error as ...
FIGURE 13.5 An illustration of trees with different numbers of terminal node...
FIGURE 13.6 Classification tree with five leaf nodes and four decision nodes...
FIGURE 13.7 Three possible trees formed from in Figure 13.6.
FIGURE 13.8 Classification trees for the 2002 data without removing outliers...
FIGURE 13.9 Plot of ln_MV versus HSI, with the horizontal line representin...
FIGURE 13.10 Classification tree for the iris flower dataset.
FIGURE 13.11 Plot of Petal.Length versus Petal.width, with the horizontal li...
FIGURE 13.12 An illustration of splitting at a parent node with a dataset ....
FIGURE 13.13 Regression tree splitting algorithm for one attribute.
FIGURE 13.14 Illustrations of a feasible partition of a space of feature vec...
FIGURE 13.15 A regression tree for the medical premium data via Python, gene...
FIGURE 13.16 A regression tree for the medical premium data via R, generated...
FIGURE 13.17 A graphical illustration of the random forest method.
FIGURE 13.18 Classification trees for the Taiwan credit default dataset.
FIGURE 13.19 An illustration of Wordle gameplay under normal mode. The targe...
Chapter 14
FIGURE 14.1 The illustration for the iterations with and ; the bisectors ...
FIGURE 14.2 Scatter plots of pairwise components of observations for each gr...
FIGURE 14.3 Scatter plot matrix colored corresponding to the output from ‐m...
FIGURE 14.4 Boxplots for each feature variable in the cleansed 2002 HSI data...
FIGURE 14.5 An example of a two‐dimensional asymmetric sample.
FIGURE 14.6 Scatter plot for two variables (left), and boxplots for each fea...
FIGURE 14.7 Image segmentation using K‐means clustering.
FIGURE 14.8 Illustration of K‐NN classification, the newly added point is id...
FIGURE 14.9 Illustrations of three commonly adopted kernel functions.
FIGURE 14.10 Comparison of Nadaraya–Watson estimates and the true underlying...
FIGURE 14.11 Illustrations for , (colored circle) and the artificial clus...
Chapter 15
FIGURE 15.1 An illustration of the analogy between actual neurons and abstra...
FIGURE 15.2 An artificial neuron.
FIGURE 15.3 Commonly used activation functions for an axon.
FIGURE 15.4 Illustration of a 4‐3‐2 ANN.
FIGURE 15.5 Illustration of a 4‐2‐1 ANN.
FIGURE 15.6 The three solutions of four different initial seeds in the param...
FIGURE 15.7 Illustration of a 4‐2‐3 ANN.
FIGURE 15.8 Illustration of a 2‐2‐1 ANN.
FIGURE 15.9 Illustration of a multilayer perceptron. Each node is connected ...
FIGURE 15.10 A 10‐100‐100‐1 MLP model for stock price prediction.
FIGURE 15.11 Input and output for each neuron with weights assigned.
FIGURE 15.12 Predicting the day‐11 price for Anglo American (AAL), Goldman S...
FIGURE 15.13 Examples of sigmoid functions.
FIGURE 15.14 Illustration of approximating a function by simple functions.
FIGURE 15.15 Construction of a simple function.
FIGURE 15.16 A component in an ANN used to construct a rectangular partition...
FIGURE 15.17 A standard LSTM memory cell; also see [20].
FIGURE 15.18 Connections of cells in a LSTM model.
FIGURE 15.19 A two‐layer LSTM model with an equivalent unfolded representati...
FIGURE 15.20 LSTM backpropagation for the two‐layer model.
FIGURE 15.21 A 6‐8‐8‐1 LSTM model for Bitcoin price predictions.
FIGURE 15.22 Plot of Bitcoin price predictions via LSTM.
Really interesting, and an impressive masterpiece! Financial Data Analytics contains a rich amount of material, with original research findings in almost every chapter; many parts of the book will even be directly helpful for my own teaching in business school. In view of its dedication towards data-driven analytical tools genuinely needed in financial problems, I believe that it is the very book that defines the scope of financial data analytics.
—Alain Bensoussan, Fellow of AMS, IEEE, and SIAM; President of INRIA (1984–1996); President of CNES (Centre National d'Etudes Spatiales) (1996–2003); Chairman of ESA Council (European Space Agency) (1999–2002); Former Member of Advisory Board, Mathematical Finance; Lars Magnus Ericsson Chair Professor of Management, Naveen Jindal School of Management, University of Texas at Dallas
Financial Data Analytics provides a timely and thorough exploration of crucial topics in contemporary data science, specifically tailored for a quantitative finance audience. It skillfully balances introductory and advanced concepts, seamlessly integrating mathematical foundations with detailed coding examples. Designed to appeal to a broad audience, the content accommodates varying levels of familiarity with the mathematical and computational aspects of quantitative finance. The authors' adept presentation of complex ideas, coupled with practical applications, renders Financial Data Analytics an invaluable resource for both novice and seasoned professionals alike.
—KC Gary Chan, Fellow of ASA and IMS; President (Western North American Region), International Biometrics Society (2022); Professor, Department of Statistics, University of Washington
This book presents a wide coverage of state-of-the-art topics in data analytics, which are crucial in our current era of big data. Its organic blend of mathematical derivations of the theory and practical applications in FinTech and InsurTech via tailor-made implementable Python and R codes is exceptional. To give but a single example, based on my own research interests, the novel CIBer is a very interesting and original new tool. It is a joy to read Financial Data Analytics, a book that cannot be missed on the bookshelf of any researcher or student interested in this topic.
—Jan Dhaene, Full Professor, Director of Master of Science in Financial and Actuarial Engineering, and Head of Actuarial Research Group, Department of Accountancy, Finance and Insurance, Faculty of Business and Economics, KU Leuven; Head of Division “Actuariële Toepassingen voor Verzekerings-ondernemingen en Pensioenfondsbeheer”, KU Leuven Research and Development; Member of Institute of Actuaries of Belgium, Member and Vice-chair of Actuarial Education Network, International Actuarial Association
Financial Data Analytics is a fantastic book that offers rare gifts to industry practitioners. The important theories are brought to life through R and Python program codes, developed by the authors for the book, and easily adaptable for industry use. The book has comprehensive coverage of state-of-the-art techniques for every need. I like reading the practical applications, which help develop intuition for the more complicated methodologies, and surely someone can make a handsome profit implementing them. A super read, and a must-have for professionals if numbers rule your world.
—Kaiser Fung, Bestselling author, Numbers Rule Your World and Numbersense; Founding Director, MSc programme in Applied Analytics, Columbia University; Founder, Principal Analytics Prep
Financial Data Analytics is an exceptional book that integrates mathematics, practical examples, and real-life scenarios. With its focus on real datasets and practical programming codes in Python and R, the book offers a comprehensive exploration of various topics. It presents novel research findings and provides valuable insights for researchers, practitioners, and actuarial students. The book strikes a balance between foundational concepts and advanced techniques, making it an invaluable reference for professionals in the field. Additionally, its relevance extends to actuarial students preparing for their professional examinations. By redefining the landscape of financial data analytics in FinTech and InsurTech, this book establishes itself as a trusted guide in the industry.
—Simon Lam, Fellow of SOA, CFA, and FRM; President of The Actuarial Society of Hong Kong (2018, 2023); Deputy CEO & General Manager, Munich Re (Hong Kong)
The book will certainly play an impactful role in the advancement of financial analytics and should be on the bookshelf of every serious student of the topic.
—Wai Keung Li, Fellow of ASA and IMS; Emeritus Professor, The University of Hong Kong; Dean, Faculty of Liberal Arts and Social Sciences, The Education University of Hong Kong
Financial Data Analytics is a masterfully written book that encompasses a wide spectrum of statistical models and algorithms, with a special emphasis on financial and insurance applications. Drawing upon their multidisciplinary background and extensive research experience, as well as their close connection with the industry, the authors skillfully explain the theoretical underpinnings of both conventional and contemporary statistical methods that are truly relevant to the industry (including but not limited to regression learning, classification trees, neural networks, as well as the specification and assessment of these models), and amply illustrate the practical applications of these methods in various disciplines, by an abundance of real financial and insurance data, and using both Python and R. The dual focus on theory and applications, together with the discussion on recent advancements of the fields, makes the book one of a kind, even field-defining, among books on similar topics, and an ideal resource for anyone interested in understanding and implementing statistical models in this era of big data, as well as for students preparing for professional examinations on data analytics, such as the SRM, PA and ATPA exams of the Society of Actuaries.
—Ambrose Lo, Fellow of SOA, Chartered Enterprise Risk Analyst; Author of ACTEX Study Manual for SOA Exam SRM, ACTEX Study Manual for SOA Exam PA, and ACTEX Study Manual for SOA Exam ATPA
It is a tome!
—Suresh P. Sethi, Fellow of INFORMS, IEEE, POMS, and SIAM; Eugene McDermott Chair Professor of Operations Management, Naveen Jindal School of Management, University of Texas at Dallas
Financial Data Analytics is an encyclopedic documentation of in-depth and extensive statistical analysis in finance and beyond. It provides an end-to-end systematic approach to academics and practitioners with theories, tremendous examples and data, and algorithms with coding that are readily applicable in real life. The book encompasses four dimensions of coverage—theoretical framework to application and coding, distributional characteristics to data diagnosis and simulation, learning, and lastly coverage of both qualitative and quantitative data. The book consolidates classical knowledge with the most contemporary research in all subjects. Financial Data Analytics is one comprehensive biblical handbook for academic researchers, financial practitioners, and graduate students for both methodologies and applications. The book also lays a systematic framework for future extension and enrichment for financial data analytics.
—Nai-pan Tang, Former Chief Risk Officer and Member of Executive Committee, Hang Seng Bank; Former Deputy CEO and Chief Risk Officer, Shanghai Commercial Bank Ltd.; Former Director of the Board, Deputy CEO, Alternative CEO, Chief Risk Officer, and Vice Chairman of Asset Management, China CITIC Bank International; Director, The Hong Kong Institute of Bankers (2019–2021); Professor of Practice, Department of Finance, Chinese University of Hong Kong
Financial Data Analytics is a very impressive work with extensive coverage and many interesting topics. It stands out among similar books by the unique blend of detailed mathematical derivations, practical examples, real-life datasets, and readily available programme codes in Python and R. It also contains many novel results from the authors' recent research. Whether you are researchers on related fields, practitioners in the financial industry, or students preparing for a few exams at The Society of Actuaries or The Institute and Faculty of Actuaries, Financial Data Analytics will certainly be a valuable reference book.
—Hailiang Yang, Associate of SOA and Honorary Fellow of IFoA; Editor of Insurance: Mathematics and Economics; Professor, Department of Financial and Actuarial Mathematics, Xi'an Jiaotong–Liverpool University
Throughout my career at JP Morgan Chase and then CITIC Securities, I have participated in and witnessed how various data analytics tools revolutionized the financial industry. Financial Data Analytics is a perfect example echoing this trend, by discussing a wide spectrum of modern tools in data analytics, from both a theoretical viewpoint and a practical aspect, with readily implementable programme codes in both Python and R for real-life examples. This combination is so unique amongst the few books on data analytics; it not only reflects the broad range of theoretical knowledge of the authors, but also demonstrates their close ties with the finance and insurance industries. I actually had the opportunity to test some profit-making strategies mentioned in the book, and their performance was genuinely impressive. This book is far more than an academic monograph for scholars; it is certainly an illuminating guide for practitioners to explore their own alchemy of finance.
—Wei Zhou, Executive Director of Equity Derivatives Quantitative Research, JP Morgan Chase (2016–2021); Executive Director and Head of Quantitative Modelling, CITIC Securities
Founded in 1807, John Wiley & Sons is the oldest independent publishing company in the United States. With offices in North America, Europe, Australia and Asia, Wiley is globally committed to developing and marketing print and electronic products and services for our customers' professional and personal knowledge and understanding.
The Wiley Finance series contains books written specifically for finance and investment professionals as well as sophisticated individual investors and their financial advisors. Book topics range from portfolio management to e-commerce, risk management, financial engineering, valuation and financial instrument analysis, as well as much more.
For a list of available titles, visit our Web site at www.WileyFinance.com.
SAM CHEN
Hang Seng University of Hong Kong
KA CHUN CHEUNG
University of Hong Kong
PHILLIP YAM
Chinese University of Hong Kong
with programme codes by Kaiser Fan
This edition first published 2025
© 2025 by Sam Chen, Ka Chun Cheung, Phillip Yam.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Sam Chen, Ka Chun Cheung, Phillip Yam to be identified as the authors of this work has been asserted in accordance with law.
Registered Office(s)
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
Editorial Office
The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats. Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.
Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging-in-Publication Data Is Available:
ISBN 9781119863373 (Cloth)
ISBN 9781119863380 (ePDF)
ISBN 9781119863397 (ePub)
ISBN 9781119863403 (oBook)
Cover Design: Wiley
Cover Image: © da-kuk/Getty Images
To our parents and families
Yongzhao Chen (Sam) received his BSc in Actuarial Science with first class honours and PhD in Actuarial Science from The University of Hong Kong. He is currently an Assistant Professor at the Department of Mathematics, Statistics and Insurance of the Hang Seng University of Hong Kong. His research interests include actuarial science, especially credibility theory, and data analytics.
Ka Chun Cheung received his BSc in Actuarial Science with first class honours and PhD from The University of Hong Kong. He was the Director of the Actuarial Science Programme, and is currently Head and full Professor at the Department of Statistics and Actuarial Science in the School of Computing and Data Science, The University of Hong Kong. He is an Associate of the Society of Actuaries and an elected member of the International Statistical Institute. He is serving on the editorial boards of Insurance: Mathematics and Economics and Journal of Industrial and Management Optimization. His current research interests include various topics in actuarial science, including optimal reinsurance, stochastic orders, dependence structures, and extreme value theory.
Phillip Yam received his BSc in Actuarial Science with first class honours and MPhil from The University of Hong Kong. Supported by two scholarships awarded by the Croucher Foundation (Hong Kong), he obtained an MASt (Master of Advanced Study) degree, Part III of the Mathematical Tripos, with Distinction in Mathematics from the University of Cambridge and a DPhil in Mathematics from the University of Oxford. During his postgraduate studies, he was awarded the E. M. Burnett Prize in Mathematics from the University of Cambridge, and the junior research fellowship from The Erwin Schrödinger International Institute for Mathematics and Physics of the University of Vienna.
Phillip is currently the Co-Director of the Interdisciplinary Major Programme in Quantitative Finance and Risk Management Science, and a full Professor at the Department of Statistics of The Chinese University of Hong Kong (CUHK). He is also Assistant Dean (Education) of CUHK Faculty of Science, and Fellow of the Centre for Promoting Science Education in the Faculty. He has been appointed as a research fellow in the Hausdorff Research Institute for Mathematics at the University of Bonn and a Visiting Professor in both the Department of Statistics at Columbia University in the City of New York and the Naveen Jindal School of Management at the University of Texas at Dallas. He has published about a hundred journal articles in actuarial science, applied mathematics, data analytics, engineering, financial mathematics, operations management, and statistics, and has also been serving on the editorial boards of several journals in these fields. Together with Alain Bensoussan and Jens Frehse, he wrote the first monograph on mean field games and mean field type control theory. His research project with the title “Comonotone-independence Bayes Classifier (CIBer)” was awarded a Silver Medal in the 48th International Exhibition of Inventions Geneva in 2023. Besides academia, he has provided consulting services for various financial institutions and insurance companies, and established close connections in these industries; many of his students also work in international investment banking and insurance companies.
Kaiser Fan received his BSc in Risk Management Science with first class honours and MPhil from The Chinese University of Hong Kong under the guidance of Professor Phillip Yam. As a data scientist, his research interests include data analytics and machine learning, especially deep learning. He contributed the programming and many of the examples and illustrations in the book.
To the memory of
Tze Leung Lai (1945–2023)
Late Ray Lyman Wilbur Professor of Statistics,
Stanford University
We were saddened to hear about the sudden passing away of Professor Tze Leung Lai. We would like to thank him for his care for the younger generation, including ourselves, as well as all of his valuable guidance in the past two decades, since Ka Chun's and Phillip's senior-year undergraduate and master's studies at The University of Hong Kong; we all learned a lot from him, both directly and indirectly. He was certainly a renowned scholar. Due to the pandemic, we could not make the trip to visit him in person; we were hoping to meet him again last summer after the pandemic eventually came to an end, only to learn that he departed too soon. During the book writing process, we sent a draft version of the book to him. He was glad of what we had achieved and also graciously offered to write a foreword for this book, which can no longer become a reality now. However, this foreword is always reserved for him. We thank you again for your generous offer, Professor Lai; thank you, our mentor, may you rest in peace.
Winter, 2023
In the field of finance, nothing is more important than gaining profits, and any innovation that draws people's attention must lead to at least the same level of profit as the existing methods; indeed, this has always been the main driving force of updates to relevant curricula over time. For instance, with the development of option pricing and portfolio selection in the 1970s, financial training from the mid-1980s to the 2000s heavily involved Itô's stochastic calculus and partial differential equations. On the other hand, volatility models of the ARCH and GARCH family, originating with Robert Engle's work in the early 1980s, were proposed for a better estimation of parameters facilitating derivative valuation and portfolio management, and various academic curricula quickly followed suit by placing more emphasis on financial econometrics. Quantitative analysis of game theory, particularly numerical algorithms for discovering equilibrium points, gained more importance and attention in academia after John Nash won the Nobel Memorial Prize in Economic Sciences in 1994; the trend continues today with further sophistication and generalization towards the context of mean field games in the last dozen years. In the 2000s, as behavioural finance was gaining increasing attention in society, people wanted to learn more methods in the realm of experimental behavioural economics and finance, especially on how the market makes use of statistical methods to understand the impact of different human behaviour and devise advertising strategies accordingly, which explains why case studies and primitive statistics have been prevalent in finance classes in recent decades.
Recently, attention has shifted towards AI. The revolutionary developments in machine learning and deep learning have brought new elements of data analytics into finance, particularly in the hot areas of InsurTech, FinTech and RegTech. To catch up with the trend, curriculum designs should be revised to cover financial or business data analytics in a comprehensive manner, and statistics is certainly at the core of them; this is precisely why we wanted to write this book. The few existing books in this field, which cover the use of standard statistics in financial analysis with either Python or R and statistical applications in financial engineering, usually focus on the possible financial applications of conventional statistical tools; yet some practical problems may require tools beyond traditional statistics, and we aim to address a few relevant issues in this book. Another important motivation for us is certainly the positive feedback from students regarding our teaching materials in the past decade, which we have consolidated as the foundation of this book.
This book investigates contemporary practical techniques of financial data analytics that are specific to real-life scenarios and leave room for high profit-making potential, with 15 chapters in total covering a wide range of important and frontier topics in this field. For example, we shall explore data analytics in investment strategy, financial forensics, and the immediate use of deep learning in finance. We also critically discuss the pros and cons of machine learning tools. While we raise caution against potential pitfalls of new approaches like deep learning, we also propose a novel feature engineering scheme as part of CIBer (see Chapter 12) to overcome limitations of existing methods regarding input features, which also achieves a promising classification performance. Examples are provided throughout the whole book, in which we focus on a few typical datasets from real-life financial markets to facilitate intuitive comparisons among models, allowing readers to form their own judgements on their pros and cons, and hence apply suitable data analysis methods to their own datasets. Detailed, executable programme codes in Python and R are also readily available with the corresponding examples. Practitioners including quants and fund managers can gain insights into the latest developments of data analytics from the book, which can help them formulate effective investment strategies or facilitate better product designs. This up-to-date knowledge may further help them conduct novel applied research in different business disciplines. It is also our hope that senior-year students and postgraduates can deepen their understanding of this field and find the book useful for their future academic research.
Meanwhile, the contents of this book also cover a large part of syllabi of modules from different public professional examinations on predictive analytics, including but not limited to the Statistics for Risk Modeling (SRM) Exam and Predictive Analytics (PA) Exam of the Society of Actuaries, making it a suitable main or supplementary reading for these professional examinations.
To benefit the most from this book, it is advisable that readers have a solid background in probability and statistics, linear algebra, and advanced calculus, preferably at the sophomore level. For some parts of the book, some basic knowledge of real and complex analysis would help readers understand them fully. In particular, sections marked with asterisks require a higher level of mathematical understanding and may be omitted during the initial reading. For the convenience of readers, some of the relevant basic knowledge is reviewed in Chapter 1. Acquaintance with programming languages such as Python or R is also instrumental. To make the book more self-contained, a quick overview of these two programming languages is provided in Chapter 2. In addition, readers interested in more sophisticated investment strategies and derivative pricing also need a rudimentary background in Itô's stochastic calculus and continuous martingale theory, which is unavoidable given the technical nature of the subject matter.
In writing this book, our multidisciplinary background proved helpful; we all had diverse training in actuarial science, economics and finance, mathematics, probability and statistics, and our research also involves the application of data analytics in diverse applied areas. We have established long-term research collaborations with industry practitioners, and many of our undergraduate and postgraduate students, friends, and colleagues are also working in world-leading companies in the finance and insurance sectors. Growing up in the traditional global financial hub of Hong Kong has also equipped us with practical financial knowledge that benefits our pragmatic research, while we are not bound by routine methods in solving both research and practical problems. In addition, our graduate student Kaiser Fan also made a unique contribution by implementing most programme codes in the examples in a tailor-made and illuminating fashion.
With the publication of this book, we welcome valuable feedback and comments from readers. Due to limitations in both scope and time, we could not delve into all topics in detail, and we apologize for any missing information. While we drew inspiration from a wide range of literature and tried to cite all of it, some sources may still have unintentionally slipped our minds over time, and we sincerely apologize for any oversights. We also benefited from courses on financial data analytics or equivalents, including teaching materials and course design, in renowned universities in Asia, Australia, Europe and North America.
While we believe our book offers a valuable collection of tools in financial data analytics, we deliberately left out blockchains, as they have shifted towards being an internet and network security concern rather than an analytical tool for financial information. Meanwhile, we have an upcoming book on deep learning with some applications in finance, at which we briefly hint in the closing chapter of this book. Hopefully readers will enjoy the current book and stay tuned for the upcoming release.
Sam Chen, Ka Chun Cheung, and Phillip Yam
Hong Kong, December 2023
First and foremost, we would like to thank Wiley for their kind consideration of our book, as well as their assistance and patience in different stages of preparation, so that we could have more time to polish this highly original book.