Discover foundational and advanced techniques in quantitative equity trading from a veteran insider.

In Quantitative Portfolio Management: The Art and Science of Statistical Arbitrage, distinguished physicist-turned-quant Dr. Michael Isichenko delivers a systematic review of the quantitative trading of equities, or statistical arbitrage. The book teaches you how to source financial data, learn patterns of asset returns from historical data, generate and combine multiple forecasts, manage risk, build a stock portfolio optimized for risk and trading costs, and execute trades.

In this important book, you'll discover:
* Machine learning methods of forecasting stock returns in efficient financial markets
* How to combine multiple forecasts into a single model by using secondary machine learning, dimensionality reduction, and other methods
* Ways of avoiding the pitfalls of overfitting and the curse of dimensionality, including topics of active research such as "benign overfitting" in machine learning
* The theoretical and practical aspects of portfolio construction, including multi-factor risk models, multi-period trading costs, and optimal leverage

Perfect for investment professionals, like quantitative traders and portfolio managers, Quantitative Portfolio Management will also earn a place in the libraries of data scientists and students in a variety of statistical and quantitative disciplines. It is an indispensable guide for anyone who hopes to improve their understanding of how to apply data science, machine learning, and optimization to the stock market.
Page count: 419
Year of publication: 2021
Cover
Title Page
Copyright
List of Figures
Code Listings
Preface
About this Book
Abstract
Acknowledgments
Introduction
Chapter 1: Market Data
1.1 Tick and bar data
1.2 Corporate actions and adjustment factor
1.3 Linear vs log returns
Chapter 2: Forecasting
2.1 Data for forecasts
2.2 Technical forecasts
2.3 Basic concepts of statistical learning
2.4 Machine learning
2.5 Dynamical modeling
2.6 Alternative reality
2.7 Timeliness-significance tradeoff
2.8 Grouping
2.9 Conditioning
2.10 Pairwise predictors
2.11 Forecast for securities from their linear combinations
2.12 Forecast research vs simulation
Chapter 3: Forecast Combining
3.1 Correlation and diversification
3.2 Portfolio combining
3.3 Mean-variance combination of forecasts
3.4 Combining features vs combining forecasts
3.5 Dimensionality reduction
3.6 Synthetic security view
3.7 Collaborative filtering
3.8 Alpha pool management
Chapter 4: Risk
4.1 Value at risk and expected shortfall
4.2 Factor models
4.3 Types of risk factors
4.4 Return and risk decomposition
4.5 Weighted PCA
4.6 PCA transformation
4.7 Crowding and liquidation
4.8 Liquidity risk and short squeeze
4.9 Forecast uncertainty and alpha risk
Chapter 5: Trading Costs and Market Elasticity
5.1 Slippage
5.2 Impact
5.3 Cost of carry
5.4 Market-wide impact and elasticity
Chapter 6: Portfolio Construction
6.1 Hedged allocation
6.2 Forecast from rule-based strategy
6.3 Single-period vs multi-period mean-variance utility
6.4 Single-name multi-period optimization
6.5 Multi-period portfolio optimization
6.6 Portfolio capacity
6.7 Portfolio optimization with forecast revision
6.8 Portfolio optimization with forecast uncertainty
6.9 Kelly criterion and optimal leverage
6.10 Intraday optimization and execution
Chapter 7: Simulation
7.1 Simulation vs production
7.2 Simulation and overfitting
7.3 Research and simulation efficiency
7.4 Paper trading
7.5 Bugs
Afterword: Economic and Social Aspects of Quant Trading
Appendix
A1 Secmaster mappings
A2 Woodbury matrix identities
A3 Toeplitz matrix
Index
Question Index
Quotes Index
Stories Index
End User License Agreement
Michael Isichenko
Copyright © 2021 by Michael Isichenko. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data
Names: Isichenko, Michael, author.
Title: Quantitative portfolio management : the art and science of statistical arbitrage / Michael Isichenko.
Description: Hoboken, New Jersey : John Wiley & Sons, Inc., [2021] | Includes bibliographical references and index.
Identifiers: LCCN 2021013923 (print) | LCCN 2021013924 (ebook) | ISBN 9781119821328 (cloth) | ISBN 9781119821229 (adobe pdf) | ISBN 9781119821212 (epub)
Subjects: LCSH: Portfolio management—Mathematical models. | Arbitrage.
Classification: LCC HG4529.5 .I83 2021 (print) | LCC HG4529.5 (ebook) | DDC 332.6—dc23
LC record available at https://lccn.loc.gov/2021013923
LC ebook record available at https://lccn.loc.gov/2021013924
Cover Design: Wiley
Cover Image: © Michael Isichenko
2.1 Fractional Brownian motion and the Hurst exponent
2.2 Three datasets
2.3 Bias-variance tradeoff in OLS regression
2.4 Piecewise-linear convex function
2.5 Convex regression
2.6 Curse of dimensionality in OLS regression
2.7 Spectra of random covariance matrices
2.8 Total variation denoising (TVD)
2.9 Cross-validated local linear regression (LLR)
2.10 Gaussian process (GP) regression
2.11 Ridge regression
2.12 Lasso regression
2.13 Double dip of generalization error
2.14 Stacked dominoes
2.15 Historical cost of computer CPU power
3.1 Sharpe triangle
3.2 Combining 1000 pnl time series
3.3 Street light fixture
4.1 Value at risk and expected shortfall
5.1 Linear vs concave impact
6.1 Optimal position with impact costs
6.2 Optimal position with slippage costs
6.3 Simulation of the Kelly criterion
2.1 Bias-variance tradeoff for 100 OLS features
2.2 Eigenvalues of random covariance matrix
2.3 Local linear regression (LLR) solver
2.4 Gaussian process (GP) using sklearn
2.5 Lasso regression using sklearn
2.6 Double dip of generalization error
7.1 Macros for readable C++
7.2 Bilingual coding
This book describes the process used by quantitative traders, or quants, a community the author has belonged to for a number of years. Quants are not usually trained as quants, but often come from one of the “hard sciences” such as mathematics, statistics, physics, electrical engineering, economics, or computer science. The author, a physicist by training, feels guilty for (ab)using the word describing a fundamental concept of quantum physics in the context of quantitative trading, but this slang is too rooted in the industry to be avoided. With quantitative finance professionals in mind, the intended audience is presumed to be interdisciplinary, fluent in mathematical notation, not foreign to algorithmic thinking, familiar with basic financial concepts such as market-neutral strategies, and not needing a definition of pnl. This book could also be of interest to readers who are thinking of joining the quant workforce and wondering if it is worth it.
The quant trading business, especially its alpha part, tends to be fairly secretive, but the traffic of portfolio managers and analysts between quant shops has created a body of common knowledge, some of which has been published in the literature. The book is an attempt to cover parts of this knowledge, as well as to add a few ideas developed by the author in his own free time. I appreciate the concern of some of the more advanced colleagues of mine about letting the tricks of the trade “out in the wild.” Those tricks, such as machine learning and optimization algorithms, are mostly in the public domain already but are spread over multiple fields. In addition to academic research, Wall Street can learn a lot from Silicon Valley, whose inhabitants have generated a tremendous and less secretive body of knowledge. Using an analogy with cryptography, security through obscurity is a popular approach in quantitative trading, but it gradually gives way to security by design ultimately rooted in the increasingly difficult forecasting of future asset prices, the holy skill and grail of quantitative portfolio management. The rest of the quant trading process, while not exactly trivial in scope, is within the reach of a reasonably trained scientist, this author included, who is willing and able to read Wikipedia1 and learn better coding.
The choice of topics for this book is aligned with the author's personal interests in the field, although an honest attempt is made to cover, in depth or in passing, all relevant parts of statistical arbitrage, a quantitative approach to equity trading. Whether or not a particular formula or approach is expected to help make money (or avoid losses) is not disclosed or opined upon, in part because any application success is data- and implementation-dependent, and in part to keep the reader in suspense. The book is also an attempt to strike a balance between what the author could say and is comfortable saying. In the field of quantitative trading, the more interesting stuff doesn't usually get published. In this book, the reader will hopefully find a few things that might be interesting or at least entertaining.
Any resemblance of described quantitative practices to past or existing firms is coincidental and may not be statistically significant. As Kurt Vonnegut admitted in Slaughterhouse-Five, All this happened, more or less. This book is for quants and, occasionally, about quants.
A lot of the quantitative portfolio management process involves data and code. The exposition style adopted in this book does not include too many charts, tables, or code snippets, although there are some. Instead, the focus is on ideas, motivation for various approaches, and mathematical description seeking a terse and elegant exposition whenever possible. Mathematical formulas tend to be more compact and expressive than code written in any programming language. In addition, and quoting Eugene Wigner,2 “the enormous usefulness of mathematics in the natural sciences is something bordering on the mysterious and ... there is no rational explanation for it.”
_____________
1 Accordingly, and for the reader's convenience, the electronic version of this book has multiple hyperlinks to Wikipedia and other URLs.
This book is an unlikely result of some 20 years of trial-and-error discovery. It is also a work in progress. The author will appreciate indication of any omission or error, as well as any feedback from the reader, whose comments are most welcome at [email protected].
M.I.
New York-Montauk, June 2020–May 2021.
_____________
2 E. Wigner, The Unreasonable Effectiveness of Mathematics in the Natural Sciences, Communications on Pure and Applied Mathematics, 13(I), February 1960.
Quantitative trading of financial securities is a multibillion-dollar business employing thousands of portfolio managers and quantitative analysts (“quants”) trained in mathematics, physics, or other “hard” sciences. The quants trade stocks and other securities, creating liquidity for investors and competing, as best they can, at finding and exploiting any mispricings with their systematic data-driven trading algorithms. The result is highly efficient financial markets, which nonetheless are not immune to events of crowding, bubbling, occasional liquidation panic, and “cobra effects” including the high-frequency trading (HFT) arms race. This book attempts a systematic description of the quant trading process by covering all its major parts including sourcing financial data, “learning” future asset returns from historical data, generating and combining forecasts, diversification and its limitations, risk and leverage management, building optimal portfolios of stocks subject to risk preferences and trading costs, and executing trades. The book highlights the difficulties of financial forecasting due to quantitative competition, the curse of dimensionality, and the propensity to overfitting. Some of the topics included in the book have not been previously discussed in the literature. The exposition seeks a balance between financial insight, mathematical ideas of statistical and machine learning, practical computational aspects, actual stories and thoughts “from the trenches,” as observed by a physicist turned quant, and even tough or funny questions asked at countless quant interviews.
The intended audience includes practicing quants, who will encounter things both familiar and novel (such as lesser-known ML algorithms, combining multiple alphas, or multi-period portfolio optimization), students and scientists thinking of joining the quant workforce (and wondering if it's worth it), financial regulators (mindful of the unintended cobra effects they may create), investors (trying to understand their risk-reward tradeoff), and the general public interested in quantitative and algorithmic trading from a broad scientific, social, and occasionally ironic standpoint.
The book presents a systematic review of the quantitative equity trading process, aka statistical arbitrage, including market and other financial data, alpha generation, risk, trading costs, and portfolio construction. Financial forecasting involves statistical learning of future asset returns on features extracted from relevant current and past data, including price-volume, fundamental and analyst, holdings and flows, news, alternative, and other publicly available datasets. Both theoretical and algorithmic machine learning (ML) aspects of financial forecasting are reviewed with an emphasis on regularization methods, bias-variance and other tradeoffs, generalization error, the curse of dimensionality, and traps of overfitting. ML involves a wealth of parametric, nonparametric, deep, online, and latent structure algorithms, whose success is data-dependent according to the “No free lunch” theorem. Meta-learning methods include hyperparameter optimization, boosting, and other ensemble methods. An important context of financial ML is competition-based market efficiency imposing limits on the acceptable complexity and expected performance of predictive models. Some topics of active research such as “benign overfitting” in interpolating deep neural nets and other ML algorithms are also covered. Several approaches of combining multiple forecasts are discussed using secondary ML, dimensionality reduction, and other methods, while highlighting correlation-based limits on alpha diversification. Multi-factor risk models and trading costs are reviewed including both theoretical and empirical aspects relevant to portfolio construction. Effects of price impact on stock market macro elasticity are also discussed. A unified framework of multi-period portfolio optimization is presented with several special closed-form solutions with impact and slippage costs and approximations for efficient algorithmic approaches. 
Optimal portfolio capacity and leverage are discussed, including a critical review of the Kelly criterion. The book also presents a brief review of intraday algorithmic execution and high-frequency trading (HFT) and raises fundamental questions of more efficient market design to benefit the general investing public.
This book wouldn't be possible without the author's interaction with many colleagues in academia and coworkers, competitors, and friends in the financial industry. The role of the early mentors, Vladimir Yankov (in physics) and Aaron Sosnick (in finance), was especially valuable in forming the author's ways of thinking about challenging problems and asking better questions.
Special thanks to all my superiors in the industry for prudently hiring or dismissing me, as appropriate for each occasion, and to all my peers and direct reports for the opportunity to learn from them.
I would like to thank Marco Avellaneda and Jean-Philippe Bouchaud for encouraging me to write up this material, as well as Aaron for discouraging it. A few fellow quants including, but not limited to, Colin Rust and Alexander Barzykin provided valuable comments and critique on various parts of the book draft. Their feedback is gratefully acknowledged.
Warm regards to those interviewers and interviewees who made the endless Q&A sessions more fun than they are supposed to be.
And thank you, Angela, for food, books, love, and understanding.
The time needed to write this book was an unexpected byproduct of the spread of the SARS-CoV-2 virus, which may have caused a temporary loss of smell, taste, or job, but hopefully not of sense of humor.
Science is what we understand well enough to explain to a computer. Art is everything else we do.
Donald Knuth
Financial investment is a way of increasing existing wealth by buying and selling assets of fluctuating value and bearing related risk. The value of a bona fide investment is expected to grow on average, or in expectation, albeit without a guarantee. The very fact that such activity, pure gambling aside, exists is rooted in the global accumulation of capital, or, loosely speaking, increase in commercial productivity through rational management and technological innovation. There are also demographic reasons for the stock market to grow—or occasionally crash.
Another important reason for investments is that people differ in their current need for money. Retirees have accumulated assets to spend while younger people need cash to pay for education or housing, entrepreneurs need capital to create new products and services, and so forth. The banking and financial industry serves as an intermediary between lenders and borrowers, facilitating loans, mortgages, and municipal and corporate bonds. In addition to debt, much of the investment is in equity. A major part of the US equity market is held by pension funds, including via mutual funds holdings.1 Aside from occasional crisis periods, the equity market has outperformed the inflation rate. Stock prices are correlated with the gross domestic product (GDP) in all major economies.2 Many index and mutual funds make simple diversified bets on national or global stock markets or industrial sectors, thus providing inexpensive investment vehicles to the public.
In addition to the traditional, long-only investments, many hedge funds utilize long-short and market-neutral strategies by betting on both asset appreciation and depreciation.3 Such strategies require alpha, or the process of continuous generation of specific views of future returns of individual assets, asset groups, and their relative movements. Quantitative alpha-based portfolio management is conceptually the same for long-only, long-short, or market-neutral strategies, which differ only in exposure constraints and resulting risk profiles. For reasons of risk and leverage, however, most quantitative equity portfolios are exactly or approximately market-neutral. Market-neutral quantitative trading strategies are often collectively referred to as statistical arbitrage or statarb. One can think of the long-only market-wide investments as sails relying on a breeze subject to a relatively stable weather forecast and hopefully blowing in the right direction, and market-neutral strategies as feeding on turbulent eddies and waves that are zero-mean disturbances not transferring anything material—other than wealth changing hands. The understanding and utilization of all kinds of pricing waves, however, involves certain complexity and requires a nontrivial data processing, quantitative, and operational effort. In this sense, market-neutral quant strategies are at best a zero-sum game with a natural selection of the fittest. This does not necessarily mean that half of the quants are doomed to fail in the near term: successful quant funds probably feed more on imperfect decisions and execution by retail investors, pension, and mutual funds than on less advanced quant traders. By doing so, quant traders generate needed liquidity for traditional, long-only investors. Trading profits of market-neutral hedge funds, which are ultimately losses (or reduced profits) of other market participants, can be seen as a cost of efficiency and liquidity of financial markets. 
Whether or not this cost is fair is hard to say.
_____________
1 Organization for Economic Co-operation and Development (OECD) presents a detailed analysis of world equity ownership: A. De La Cruz, A. Medina, Y. Tang, Owners of the World's Listed Companies, OECD Capital Market Series, Paris, 2019.
2 F. Jareño, A. Escribano, A. Cuenca, Macroeconomic variables and stock markets: an international study, Applied Econometrics and International Development, 19(1), 2019.
3 A.W. Lo, Hedge Funds: An Analytic Perspective - Updated Edition, Princeton University Press, 2010.
Historically, statistical arbitrage started as trading pairs of similar stocks using mean-reversion-type alpha signals betting on the similarity.4 The strategy appears to have been first used for proprietary trading at Morgan Stanley in the 1980s. The names often mentioned among the statarb pioneers include Gerry Bamberger, Nunzio Tartaglia, David E. Shaw, Peter Muller, and Jim Simons. The early success of statistical arbitrage started in top secrecy. In a rare confession, Peter Muller, the head of the Process Driven Trading (PDT) group at Morgan Stanley in the 1990s, wrote: “Unfortunately, the mere knowledge that it is possible to beat the market consistently may increase competition and make our type of trading more difficult. So why did I write this article? Well, one of the editors is a friend of mine and asked nicely. Plus, chances are you won't believe everything I'm telling you.”5 The pair trading approach soon developed into a more general portfolio trading using mean reversion, momentum, fundamentals, and any other types of forecast quants can possibly generate. The secrets proliferated, and multiple quantitative funds were started. Quantitative trading has been a growing and increasingly competitive part of the financial landscape since the early 1990s.
On many occasions within this book, it will be emphasized that it is difficult to build successful trading models and systems. Indeed, quants betting on their complex but often ephemeral models are not unlike behavioral speculators, albeit at a more technical level. John Maynard Keynes once offered an opinion of a British economist on American finance:6 “Even outside the field of finance, Americans are apt to be unduly interested in discovering what average opinion believes average opinion to be; and this national weakness finds its nemesis in the stock market... It is usually agreed that casinos should, in the public interest, be inaccessible and expensive. And perhaps the same is true of stock exchanges.”
_____________
4 M. Avellaneda, J.-H. Lee, Statistical arbitrage in the US equities market, Quantitative Finance, 10(7), pp. 761–782, 2010.
5 P. Muller, Proprietary trading: truth and fiction, Quantitative Finance, 1(1), 2001.
6 J.M. Keynes, The General Theory of Employment, Interest, and Money, Macmillan, 1936.
This book touches upon several theoretical and applied disciplines including statistical forecasting, machine learning, and optimization, each being a vast body of knowledge covered by many dedicated in-depth books and reviews. Financial forecasting, a poor man's time machine giving a glimpse of future asset prices, is based on big data research, statistical models, and machine learning. This activity is not pure math and is not specific to finance. There has been a stream of statistical ideas across applied fields, including statements that most research findings are false for most research designs and for most fields.7 Perhaps quants keep up the tradition when modeling financial markets. Portfolio optimization is a more mathematical subject logically decoupled from forecasting, which has to do with extracting maximum utility from whatever forecasts are available.
Our coverage is limited to topics more relevant to the quant research process and based on the author's experience and interests. Out of several asset classes available to quants, this book focuses primarily on equities, but the general mathematical approach makes some of the material applicable to futures, options, and other asset classes. Although part of the broader field of quantitative finance, the topics of this book do not include financial derivatives and their valuation, which may appear to be the main theme of quantitative finance, at least when judged by academic literature.8 Most of the academic approaches to finance are based on the premise of efficient markets,9 precluding profitable arbitrage. Acknowledging market efficiency as a pretty accurate, if pessimistic, zeroth-order approximation, our emphasis is on quantitative approaches to trading financial instruments for profit while controlling for risks. This activity constitutes statistical arbitrage.
When thinking about ways of profitable trading, the reader and the author would necessarily ask the more general question: what makes asset prices move, predictably or otherwise? Financial economics has long preached theories involving concepts such as fundamental information, noise and informed traders, supply and demand, adaptivity,10 and, more recently, inelasticity,11 which is a form of market impact (Sec. 5.4). In contrast to the somewhat axiomatic method of economists, physicists who became interested in finance have used their field's bottom-up approach involving market microstructure and ample market data.12 It is definitely supply and demand forces, and the details of market organization, that determine the price dynamics. The dynamics are complicated, in part because they are affected by how market participants learn and understand these dynamics and keep adjusting their trading strategies. From the standpoint of a portfolio manager (PM), price changes are made of two parts: the impact of his own portfolio and the impact of others. While the former can be treated as trading costs, which are partially under the PM's control, the latter is subject to statistical or dynamical modeling and forecasting.
_____________
7 J.P.A. Ioannidis, Why Most Published Research Findings Are False, PLoS Med 2(8): e124, 2005.
8 P. Wilmott, Frequently Asked Questions in Quantitative Finance, Wiley, 2009.
9 P.A. Samuelson, Proof That Properly Anticipated Prices Fluctuate Randomly, Industrial Management Review, 6, pp. 41–49, 1965.
10 A.W. Lo, The Adaptive Markets Hypothesis: Market Efficiency from an Evolutionary Perspective, Journal of Portfolio Management, 30(5), pp. 15–29, 2004.
Among other things, this book gives a fair amount of attention to the combination of multiple financial forecasts, an important question not well covered in the literature. Forecast combination is a more advanced version of the well-discussed theme of investment diversification. Just as it is difficult to make forecasts in efficient markets, it is also difficult, but not impossible, to optimally combine forecasts due to their correlation and what is known as the curse of dimensionality. To break the never-ending cycle of quantitative trial and error, it is important to understand fundamental limitations on what can and what can't be done.
The book is structured as follows. Chapter 1 briefly reviews raw and derived market data used by quants. Alpha generation, the central part of the quant process, is discussed in Chapter 2. This chapter starts with additional financial data usable for forecasting future asset returns. Both theoretical and algorithmic aspects of machine learning (ML) are discussed with an emphasis on challenges specific to financial forecasting. Once multiple alphas have been generated, they need to be combined to form the best possible forecast for each asset. A good way of combining alphas is an alpha in itself. ML approaches to forecast combining are discussed in Chapter 3. A formal view of risk management, as relevant to portfolio construction, is presented in Chapter 4. Trading costs, with an emphasis on their mathematical structure, are reviewed in Chapter 5. There, a case is made for a linear impact model that, while approximate, has the strong advantage of making several closed-form multi-period optimization solutions possible. The impact of a net flow of funds at a macro scale is also discussed with implications for stock market elasticity and bubbles. Chapter 6 describes the construction of a portfolio optimized for expected future profits subject to trading costs and risk preferences. This part tends to use the most math and includes previously unpublished results for multi-period portfolio optimization subject to impact and slippage costs. Related questions of portfolio capacity and optimal leverage, including the Kelly criterion, are also discussed. Chapter 7 concerns the purpose and implementation of a trading simulator and its role in quant research. A few auxiliary algorithmic and mathematical details are presented in appendices.
_____________
11 X. Gabaix, R.S.J. Koijen, In Search of the Origins of Financial Fluctuations: The Inelastic Markets Hypothesis, Swiss Finance Institute Research Paper No. 20-91, Available at SSRN: https://ssrn.com/abstract=3686935, 2021.
12 J.-P. Bouchaud, J.D. Farmer, F. Lillo, How markets slowly digest changes in supply and demand, arXiv:0809.0822 [q-fin.TR], 2008.
Computation is a primary tool in most parts of the quantitative trading process and in machine learning. Several aspects of computing, including coding style, efficiency, bugs, and environmental issues, are discussed throughout the book. A few important machine learning concepts, such as the bias-variance tradeoff (Secs. 2.3.5 and 2.4.12) and the curse of dimensionality (Sec. 2.4.10), are supported by small self-contained pieces of Python code generating meaningful plots. The reader is encouraged to experiment along these lines. It is often easier to do productive experimental mathematics than real math.
Some of the material covering statistics, machine learning, and optimization necessarily involves a fair amount of math and relies on academic and applied research in various, often disjoint, fields. Our exposition does not attempt to be mathematically rigorous and mostly settles for a “physicist's level of rigor” while trying to build a qualitative understanding of what's going on. Accordingly, the book is designed to be reasonably accessible and informative to a less technical reader who can skip over the more scary math and focus on the plain English around it. For example, the fairly technical method of boosting in ML (Sec. 2.4.14) is explained as follows: The idea of boosting is twofold: learning on someone else's errors and voting by majority.
The field of quantitative portfolio management is too broad for a single paper or book to cover. Important topics either omitted here or just mentioned in passing include market microstructure theory, algorithmic execution, big data management, and non-equity asset classes. Several books cover these and related topics.13,14,15,16,17 While citing multiple research papers in various fields, the author could not possibly do justice to all relevant or original multidisciplinary contributions. The footnote references include work that seemed useful, stimulating, or just fascinating when developing (or explaining) forecasting and optimization ideas for quantitative portfolio management. Among the many destinations where Google search brings us, the arXiv18 is an impressive open source of publications with a reasonably high signal-to-noise ratio.19
A note about footnotes. Citing sources in footnotes seems more user-friendly than at the end of chapters. Footnotes are also used for various reflections or mini stories that could be either meaningful or entertaining but often tangential to the main material.
Finally, in the spirit of the quant problem-solving sportsmanship, and for the reader's entertainment, a number of actual interview questions asked at various quant job interviews are inserted in different sections of the book and indexed at the end, along with the main index, quotes, and the stories.
_____________
13 R.C. Grinold, R.N. Kahn, Active Portfolio Management: A Quantitative Approach for Producing Superior Returns and Controlling Risk, McGraw-Hill, New York, 2000.
14 R.K. Narang, Inside the Black Box: A Simple Guide to Quantitative and High Frequency Trading, 2nd Edition, Wiley, 2013.
15 J.-P. Bouchaud, J. Bonart, J. Donier, M. Gould, Trades, Quotes and Prices: Financial Markets Under the Microscope, Cambridge University Press, 2018.
16 Z. Kakushadze, J.A. Serur, 151 Trading Strategies, Available at SSRN: https://ssrn.com/abstract=3247865, 2018.
17 Finding Alphas: A Quantitative Approach to Building Trading Strategies, 2nd Edition, Edited by I. Tulchinsky, Wiley, New York, 2019.
18 https://arxiv.org.
19 A. Jackson, From Preprints to E-prints: The Rise of Electronic Preprint Servers in Mathematics, Notices of the AMS, 49, 2002.
Perhaps the most useful predictors of future asset prices are past prices, trading volumes, and related exchange-originated data commonly referred to as technical, or price-volume, data. Market data comes from quotes and trades. The most comprehensive view of the equity market includes the exchange-specific limit order book by issue, which is built from limit orders forming buy and sell queues at different price depths, market orders, and their crossing (trades) per exchange rules such as price/time priority. In addition to the full depth of book tick stream, there are simplified datafeeds such as Level 2 (low-depth order book levels and trades), Level 1 (best bid and offer and trades), minute bars (cumulative quote and trade activity per discrete time intervals), and daily summary data (open, close, high, low, volume, etc.).
Depth of book data is primarily used by high frequency trading (HFT) strategies and execution algos provided by brokers and other firms, although one can argue that a suitable analysis of the order book could detect the presence of a big directional trader affecting a longer-term price movement. Most non-HFT quant traders utilize either daily or bar data—market data recorded with certain granularity such as every 5 minutes—for research and real-time data for production execution.1
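To make the notion of bar data concrete, here is a minimal sketch of aggregating a trade stream into fixed-interval bars. The function name `make_bars`, the `(time, price, size)` tuple layout, and the sample trades are assumptions made for illustration only, not a vendor format.

```python
def make_bars(trades, interval):
    """Aggregate (time_in_seconds, price, size) trades into OHLCV bars.

    Bars are keyed by the start time of their interval. Intervals without
    trades produce no bar in this sketch.
    """
    bars = {}
    # Sort by time only, so that simultaneous trades keep their input order.
    for t, price, size in sorted(trades, key=lambda trade: trade[0]):
        start = int(t // interval) * interval
        bar = bars.get(start)
        if bar is None:
            bars[start] = {"open": price, "high": price, "low": price,
                           "close": price, "volume": size}
        else:
            bar["high"] = max(bar["high"], price)
            bar["low"] = min(bar["low"], price)
            bar["close"] = price
            bar["volume"] += size
    return bars


# Hypothetical trades: (seconds since the open, price, shares).
trades = [(0.5, 10.0, 100), (59.0, 10.2, 50), (61.0, 10.1, 200),
          (119.9, 9.9, 10), (180.0, 10.0, 5)]
bars = make_bars(trades, interval=60)
for start in sorted(bars):
    print(start, bars[start])
```

A bar keyed by its start time is complete only at `start + interval`, so research code should treat it as available from that later time.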
Major financial information companies such as Thomson Reuters and Bloomberg offer market data at different levels of granularity, both historical and in real time. A quant strategy needs the history for research and simulation (Chapter 7) and real time for production trading. Historical simulation is never exactly the same as production trading but can, and must, be reasonably close to the modeled reality, lest research code have a lookahead bug, that is, violate the causality principle by using “future-in-the-past” data. As discussed in Chapter 2, highly competitive and efficient financial markets keep the predictability of future price movements at a very low level. As a result, even a subtle lookahead (Sec. 2.1.1) in a quant trading simulator can be picked up by a sensitive machine learning (ML) algorithm to generate a spurious forecast looking great in simulation but never working in production.
Compute the products:
From a quant interview
Equities as an asset class are subject to occasional corporate actions (“cax”) including dividends, splits, spin-offs, mergers, capital restructuring, and multi-way cax. Maintaining an accurate historical cax database is a challenge in itself. Failure to do so to a good approximation results in wrong asset returns and real-time performance not matching simulation (Sec. 7.1). For alpha research purposes it is generally sufficient to approximate each cax with two numbers, dividend $D$ and split $S$.2 The dividend can be an actual dividend paid by the issue in the universe currency such as US dollar (USD) or the current total value of any foreign currency dividend or stock spin-off.
_____________
1 Sometimes even real-time trading is done on bar data. The author has observed peculiar periodic pnl fluctuations of his medium-frequency US equity portfolio. The regular 30-minute price spikes indicated a repetitive portfolio rebalancing by a significant market participant whose trades were correlated with the author's positions.
Security return for day $d$ is defined as the relative change in the closing price $C$ from previous day $d-1$ to current day $d$:
$$R_d = \frac{C_d}{C_{d-1}} - 1. \tag{1.1}$$
To account for corporate actions, the prices are adjusted, that is, multiplied by an adjustment factor $A_d$ so that (1.1) gives a correct return on investment after the adjustment. In general, a multi-day return from day $d_1$ to day $d_2$ equals
$$R(d_1, d_2) = \frac{A_{d_2} C_{d_2}}{A_{d_1} C_{d_1}} - 1. \tag{1.2}$$
The adjustment factor is used only in a ratio across days and is therefore defined up to a constant normalizing coefficient. There are two ways of price adjustment: backward and forward. The backward adjustment used, for example, in the Bloomberg terminal is normalized so today's adjustment factor equals one and changes by cax events going back in time. On a new day, all values $A_d$ are recomputed.
Another way is forward adjustment, in which scheme $A_d$ starts with one on the first day of the security pricing history and then changes as
$$A_d = A_{d-1}\,\frac{S_d}{1 - D_d / C_{d-1}}. \tag{1.3}$$
Cax events are understood as those with $D_d \neq 0$ or $S_d \neq 1$. The past history of the forward adjustment is not changed by new entries. Therefore, the forward adjustment factor can be recorded and incrementally maintained along with price-volume data. If a backward adjustment factor is desired as of the current date, it can be computed as
$$A_d^{\text{back}} = \frac{A_d}{A_{\text{today}}}. \tag{1.4}$$
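The forward and backward adjustment schemes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's code: the one-day factor is taken to be the split divided by one minus the dividend over the previous close (a dividend reinvested at the previous close, paid on pre-split shares), and the function names and sample prices are hypothetical.

```python
def forward_adjustment(close, dividend, split):
    """Forward adjustment factors, starting from one on the first day.

    Assumed one-day factor: split[d] / (1 - dividend[d] / close[d - 1]).
    """
    factors = [1.0]
    for d in range(1, len(close)):
        if dividend[d] >= close[d - 1]:
            # Datafeed error or a reorganization needing a new security entry.
            raise ValueError(f"dividend on day {d} reaches the previous close")
        one_day = split[d] / (1.0 - dividend[d] / close[d - 1])
        factors.append(factors[-1] * one_day)
    return factors


def backward_adjustment(factors):
    """Renormalize the factors so that the latest one equals one."""
    return [a / factors[-1] for a in factors]


def adjusted_return(close, factors, d1, d2):
    """Return on investment from the close of day d1 to the close of day d2."""
    return factors[d2] * close[d2] / (factors[d1] * close[d1]) - 1.0


# Hypothetical history: a 2-for-1 split on day 2, a 1.00 dividend on day 3.
close = [100.0, 102.0, 51.0, 50.0]
dividend = [0.0, 0.0, 0.0, 1.0]
split = [1.0, 1.0, 2.0, 1.0]

fwd = forward_adjustment(close, dividend, split)
bwd = backward_adjustment(fwd)
print(fwd)
print(adjusted_return(close, fwd, 1, 2))  # split day with no value change
```

Only ratios of factors enter the return, so the forward and backward versions give the same returns. The forward list can be extended one day at a time without touching its history.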
_____________
2 This is clearly not enough for updating an actual trading portfolio for newly spun off regular or when-issued stock. For this, portfolio managers usually rely on maintenance performed by the prime broker.
The rationale for Eq. (1.3) is as follows. Dividend $D_d$ and split $S_d$ for day $d$ are always known in advance. One perfectly logical, if impractical, way to reinvest a dividend (including a monetized spin-off) $D_d$ per share is to borrow cash to buy $\delta$ additional shares of the same stock at the previous day close $C_{d-1}$ and then return the loan the morning after from the dividend proceeds. To stay fully invested, the total dividend amount $(1+\delta) D_d$ must equal the loan amount $\delta\, C_{d-1}$, therefore
$$\delta = \frac{D_d}{C_{d-1} - D_d}. \tag{1.5}$$
In terms of value at hand, this manipulation is equivalent to a $(1+\delta)$ stock split. If there is also a post-dividend split $S_d$, the one-day adjustment factor equals
$$a_d = (1+\delta)\, S_d = \frac{S_d}{1 - D_d / C_{d-1}}, \tag{1.6}$$
and formula (1.3) follows.3
Some quant shops have used a similar reinvestment logic of buying shares of stock at the new closing price $C_d$, resulting in a somewhat simpler day adjustment factor,
$$a'_d = S_d \left(1 + \frac{D_d}{C_d}\right). \tag{1.7}$$
This formula is fine as far as only daily data is concerned, but applying this adjustment to intraday prices results in a lookahead (Sec. 2.1.1) due to using a future, while intraday, closing price $C_d$. Intraday forecast features depending on such an adjustment factor can generate a wonderful forecast for dividend-paying stocks in simulation, but production trading using such forecasts will likely be disappointing. Formula (1.6) differs from the “simple” Eq. (1.7) by a typically small amount but is free from lookahead.
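A small worked example may help quantify the difference between the two one-day factors. The sketch below assumes the forms discussed above: the split divided by one minus the dividend over the previous close for Eq. (1.6), and the split times one plus the dividend over the new close for Eq. (1.7). The function names and prices are hypothetical.

```python
def factor_prev_close(div, split, prev_close):
    """One-day factor with the dividend reinvested at the previous close.

    Uses only data known before day d starts, hence no lookahead.
    """
    return split / (1.0 - div / prev_close)


def factor_new_close(div, split, new_close):
    """'Simple' one-day factor with the dividend reinvested at the new close.

    Needs the closing price of day d, which is unknown intraday.
    """
    return split * (1.0 + div / new_close)


# Hypothetical day: previous close 100.00, dividend 1.00, new close 99.50.
prev_close, new_close, div, split = 100.0, 99.5, 1.0, 1.0

a_causal = factor_prev_close(div, split, prev_close)
a_simple = factor_new_close(div, split, new_close)

# Adjusted one-day returns implied by each factor.
ret_causal = a_causal * new_close / prev_close - 1.0
ret_simple = a_simple * new_close / prev_close - 1.0
print(f"{ret_causal:.6f} {ret_simple:.6f} {ret_causal - ret_simple:.2e}")
```

With these numbers the two returns differ by about half a basis point, which is small but comparable to the predictability a quant is trying to capture.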
Dividend (including any spin-off) values found in actual historical data can occasionally reach or exceed the previous close value causing trouble in Eq. (1.3). Such conditions are rare and normally due to a datafeed error or a major capital reorganization warranting a termination of the security and starting a new one via a suitable entry in the security master (Sec. 2.1.2), even if the entity has continued under the same name.
_____________
3 Eqs. (1.3) and (1.6) apply to the convention that a dividend is paid on a pre-split (previous day) share. A post-split dividend convention is used by some data vendors and requires a straightforward modification of the adjustment factor. Simultaneous dividends and splits are infrequent.
Price adjustment is also used for non-equity asset classes. Instead of corporate actions, futures contracts have an expiration date and must be “rolled” to continue position exposure. The roll is done by closing an existing position shortly before its expiration and opening an equivalent dollar position for the next available expiration month. For futures on physical commodities, such as oil or metals, the price of a contract with a later expiration date is normally higher than a similar contract with an earlier expiration due to the cost of carry including storage and insurance. The monthly or quarterly rolling price difference can be thought of as a (possibly negative) dividend or a split and handled by a backward or forward adjustment factor using Eq. (1.3). Brokers provide services of trading “continuous futures,” or automatically rolled futures positions.
Given a list of consecutive daily portfolio pnls, compute, in linear time, its maximum drawdown.
From a quant interview
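One possible single-pass answer is sketched below. It assumes the drawdown is measured in pnl (dollar) terms as the largest drop of cumulative pnl from its running peak, with the starting level of zero counted as a peak. The function name is hypothetical.

```python
def max_drawdown(pnls):
    """Largest peak-to-trough drop of cumulative pnl, in one pass."""
    cumulative = 0.0
    peak = 0.0   # running maximum of cumulative pnl, starting from zero
    worst = 0.0  # largest drop from the running maximum seen so far
    for pnl in pnls:
        cumulative += pnl
        peak = max(peak, cumulative)
        worst = max(worst, peak - cumulative)
    return worst


# Cumulative pnl: 1, -1, 2, -2, -3, -1. Peak 2, trough -3.
print(max_drawdown([1.0, -2.0, 3.0, -4.0, -1.0, 2.0]))  # 5.0
```

The loop keeps only three running numbers, so time is linear in the number of days and memory is constant.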
The linear return (1.1), also known as simple or accounting return, defines a daily portfolio pnl $Q_d$ through the dollar position $\mathbf{P}_{d-1}$ held from the previous close:
$$Q_d = \mathbf{P}_{d-1} \cdot \mathbf{R}_d. \tag{1.8}$$
Here boldface notation is used for vectors in the space of portfolio securities. For pnl computation, the linear returns are cross-sectionally additive with position weights. Risk factor models (Sec. 4.2) add more prominence to the cross-sectional linear algebra of simple returns.
It is also convenient to use log returns
$$r_d = \log(1 + R_d), \tag{1.9}$$
which, unlike the linear returns, are serially additive, for a fixed initial investment in one asset, across time periods. In quant research, both types of return are used interchangeably.
Over short-term horizons of order one day, stock returns are of order 1%, so the difference between the linear and the logarithmic return
$$R - r = R - \log(1 + R) = \frac{R^2}{2} + O(R^3) \tag{1.10}$$
is of order $10^{-4}$, or a basis point (bps), which is in the ballpark of the return predictability (Sec. 2.3.3). The expectation, or forecast, of the log return (1.10) is
$$\mathrm{E}(r) \approx \mathrm{E}(R) - \frac{\sigma^2}{2}, \tag{1.11}$$
where $\sigma$ is the volatility (standard deviation) of the return. Due to the negative sign of the correction in (1.11), its effect can be meaningful even for a slightly non-dollar-neutral or volatility-exposed portfolio. Volatility is one of the commonly used risk factors (Sec. 4.3).
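These statements are easy to check numerically. The sketch below uses only the Python standard library. The sample returns, the 2% daily volatility, and the normal distribution of simulated returns are illustrative assumptions, not market facts.

```python
import math
import random

# 1. Gap between linear and log return for a 1% move: about R^2 / 2.
R = 0.01
gap = R - math.log1p(R)
print(f"gap = {gap:.3e}, R^2/2 = {R * R / 2:.3e}")

# 2. Log returns add up across days; linear returns compound.
daily = [0.01, -0.02, 0.015]
total_linear = math.prod(1.0 + x for x in daily) - 1.0
total_log = sum(math.log1p(x) for x in daily)
print(f"{math.log1p(total_linear):.12f} {total_log:.12f}")

# 3. Mean log return is below mean linear return by about sigma^2 / 2.
rng = random.Random(0)  # fixed seed for reproducibility
sigma = 0.02            # assumed daily volatility
sample = [rng.gauss(0.0, sigma) for _ in range(200_000)]
mean_linear = sum(sample) / len(sample)
mean_log = sum(math.log1p(x) for x in sample) / len(sample)
correction = mean_linear - mean_log
print(f"correction = {correction:.3e}, sigma^2/2 = {sigma ** 2 / 2:.3e}")
```

For a 2% volatility the correction is about 2 bps per day, which is not negligible next to a forecast of a few bps.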
The difference between linear and log returns affects forecasting (Chapter 2
It is very difficult to predict, especially the future.
Possibly Niels Bohr
For a non-gambler, investing money into moving things makes sense only when he or she is able to predict, more or less, where those things are moving. Predictability is actually not that bad in physics describing dynamics of matter by various ordinary or partial differential equations. The horizon of physical predictability is still limited by the Lyapunov exponentiation of nearby orbits. This exponential accumulation of uncertainty in a formally deterministic system is a signature of dynamical chaos, which almost invariably appears in dimensions greater than two.1 For financial assets, it is also easier to predict returns over shorter horizons, mainly due to higher costs of faster trading (Chapter 5). The rules of the game and clear players' stimuli such as greed and fear provide a degree of dynamical description to the markets (Sec. 2.5), but those dynamics are very noisy due to many dimensions/participants often making noisy decisions. In addition, actions of the more powerful and moneyed participants—informed traders—reduce the degree, and shorten the horizon, of price predictability. After all easy signals are arbed out, which they surely are, we are left with a pretty low predictability of future returns—on the order of bps2 per day—vs stock volatility.
_____________
1 A generic dynamical system $\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x})$ in a $d$-dimensional state space free of special symmetries and associated conservation laws generates deterministic trajectories $\mathbf{x}(t)$ typically filling allowed volume in a chaotic/fractal manner and exponentially sensitive to the initial condition $\mathbf{x}(0)$, unless topology rules out chaotic trajectories.
The predictability, or signal-to-noise ratio $\varepsilon$, is probably the most important small parameter of quantitative finance. In mathematics or theoretical physics, a small parameter, such as the fine structure constant, makes it easier to work out useful analytics by perturbation theory or series expansion, but the low price predictability does not seem very helpful to quants, who also face non-trivial transaction costs. The zeroth order approximation in $\varepsilon$ is of course the classical efficient market hypothesis preventing hedge funds from existence. The equivalent no-arbitrage argument is used in the Black-Scholes derivative pricing model.3 Market efficiency does not necessarily imply the existence of a well-defined true asset value. As Fisher Black famously noted,4 “We might define an efficient market as one in which price is within a factor of 2 of value.” In view of possibly inelastic dynamics of the aggregate stock market, the factor of 2 could easily turn into 5, posing some difficulties for the market efficiency at the macro level (Sec. 5.4).
In the first order in $\varepsilon$, where quant traders operate, there exists a popular but flawed argument that a small signal-to-noise ratio leaves viable only simple forecasts such as those based on low-dimensional ordinary least squares (OLS). While OLS regression (Sec. 2.4.3) remains one of the most frequently used forecasting tools, and for the good enough reason of low complexity if the number of predictors is low (Sec. 2.4.10), the examples of Secs. 2.3 and 2.4 make the case for other methods as well.
_____________
2 Here bps stands for basis points, or units of $10^{-4}$, commonly used for measuring small returns.
3 F. Black, M. Scholes, The Pricing of Options and Corporate Liabilities, Journal of Political Economy, 81(3), pp. 637–654, 1973.
4 F. Black, Noise, The Journal of Finance, 41(3), pp. 529–554, 1986.
Given a CSV file with 11 columns and 1000 rows, the last 10 rows with missing $y$s, fill in the blanks using this computer.
From a quant interview
Any future return prediction, or forecast (call it $y$), needs some predictor data (call it $\mathbf{x}$), possibly multi-dimensional, so the forecast is a function of the predictor: $y = f(\mathbf{x})$. We will call this forecast function a model. The data $\mathbf{x}$ must be available at the time of forecasting; the data $y$ is not yet directly observed and therefore modeled. To have predictive power, a predictor must be relevant to the outcome or, put simply, make sense for the context. Further, predictors are best cast in the form of features: formulaic or algorithmic combinations preferably expressing clear forecast ideas and maximizing predictive power. The terms predictor and feature are used interchangeably.
Forecasting financial time series involves both art and science. Feature engineering is mostly art based on financial intuition, educated priors, and a degree of understanding of market workings, although there are some general approaches as well.5 Generating a forecast from features is mostly science, either using methods of classical statistics (Sec. 2.3) or those of the actively developing field of machine learning (Sec. 2.4).
In addition to the price-volume data mentioned in Sec. 1.1, below is an incomplete list of data sources commonly used for equity forecasting. Most types of data are available from more than one provider, which are not listed here.
What is the minimal vertical size of a wall mirror needed for a 6-foot-tall person to see himself from head to toe?
From a quant interview
_____________
5 M. Kuhn, K. Johnson, Feature Engineering and Selection: A Practical Approach for Predictive Models, Chapman & Hall/CRC Data Science Series, 2019.
To be usable in quant research, financial data must be stored point-in-time (PIT), meaning a clear labeling of each record by the time when it became, or would have become, available in the normal course of datafeed updates. An incorrect time labeling, either in data storage or upon data loading in RAM, can lead to lookahead, or a research bug of inadvertently using future data, whereas real-time production trading is clearly free of such a “feature.” If a researcher sees a too-good-to-be-true simulation (Chapter 7) performance,6 the case for the lookahead is clear, but it is fair to assume that many quant strategies fail to perform due to subtle lookahead bugs that were never discovered.
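A minimal sketch of point-in-time access is given below, assuming each record is keyed by the time it became available rather than by the period it describes. The class name `PointInTimeSeries`, the ISO date strings, and the earnings figures are hypothetical.

```python
import bisect


class PointInTimeSeries:
    """Values keyed by the time they became available (hypothetical helper)."""

    def __init__(self):
        self._times = []   # availability times, kept sorted
        self._values = []

    def add(self, available_at, value):
        """Store a value under the time it became known."""
        i = bisect.bisect_right(self._times, available_at)
        self._times.insert(i, available_at)
        self._values.insert(i, value)

    def as_of(self, t):
        """Latest value available at or before time t, or None."""
        i = bisect.bisect_right(self._times, t)
        return self._values[i - 1] if i else None


# Hypothetical quarterly earnings per share. The quarter ending 2021-03-31
# is published only on 2021-04-28, so it is stored under that later date.
eps = PointInTimeSeries()
eps.add("2021-04-28", 1.40)
eps.add("2021-01-27", 1.68)

print(eps.as_of("2021-04-01"))  # 1.68: the new figure is not yet public
print(eps.as_of("2021-04-28"))  # 1.4
```

Keying the same record by its fiscal period end (2021-03-31) would make the new value visible four weeks early in simulation, which is exactly the kind of lookahead described above.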
