Dive into algo trading with step-by-step tutorials and expert insight. Machine Trading is a practical guide to building your algorithmic trading business. Written by a recognized trader with major institutional expertise, this book provides step-by-step instruction on quantitative trading and on the latest technologies available even outside the Wall Street sphere. You'll discover the latest platforms, which are becoming increasingly easy to use, gain access to new markets, and learn new quantitative strategies that are applicable to stocks, options, futures, currencies, and even bitcoins. The companion website provides downloadable software code, and you'll learn to design your own proprietary tools using MATLAB. The author's experiences provide deep insight into both the business and the human side of systematic trading and money management, and his evolution from proprietary trader to fund manager contains valuable lessons for investors at any level.
Algorithmic trading is booming, and the theories, tools, technologies, and the markets themselves are evolving at a rapid pace. This book gets you up to speed and walks you through the process of developing your own proprietary trading operation using the latest tools.
* Utilize the newer, easier algorithmic trading platforms
* Access markets previously unavailable to systematic traders
* Adopt new strategies for a variety of instruments
* Gain expert perspective into the human side of trading
The strength of algorithmic trading is its versatility. It can be used in any strategy, including market-making, inter-market spreading, arbitrage, or pure speculation; decision-making and implementation can be augmented at any stage, or may operate completely automatically. Traders looking to step up their strategy need look no further than Machine Trading for clear instruction and expert solutions.
Pages: 399
Publication year: 2016
Cover
Series Page
Title Page
Copyright
Dedication
Preface
Chapter 1: The Basics of Algorithmic Trading
Historical Market Data
Live Market Data
Backtesting and Trading Platforms
Brokers
Performance Metrics
Portfolio Optimization
Summary
Exercises
Endnotes
Chapter 2: Factor Models
Time‐series Factors
Cross‐sectional Factors
A Two‐Factor Model
Using Option Prices to Predict Stock Returns
Short Interest
Liquidity
Statistical Factors
Putting Them All Together
Summary
Exercises
Endnotes
Chapter 3: Time‐Series Analysis
AR(p)
ARMA(p, q)
VAR(p)
State Space Models
Summary
Exercises
Endnotes
Chapter 4: Artificial Intelligence Techniques
Stepwise Regression
Regression Tree
Cross Validation
Bagging
Random Subspace and Random Forest
Boosting
Classification Tree
Support Vector Machine
Hidden Markov Model
Neural Network
Data Aggregation and Normalization
Application to Stocks Selection
Summary
Exercises
Endnotes
Chapter 5: Options Strategies
Trading Volatility without Options
Predicting Volatility
Event‐Driven Strategies
Gamma Scalping
Dispersion Trading
Cross‐Sectional Mean Reversion of Implied Volatility
Summary
Exercises
Endnotes
Chapter 6: Intraday Trading and Market Microstructure
Latency Reduction
Order Type and Routing Optimization
Adverse Selection Reduction
Backtesting Intraday Strategies
Order Flow
Order Book Imbalance
Summary
Exercises
Endnotes
Chapter 7: Bitcoins
Bitcoin Facts
Time‐Series Techniques
Mean Reversion Strategy
Artificial Intelligence Techniques
Order Flow
Cross‐Exchange Arbitrage
Summary
Exercises
Endnotes
Chapter 8: Algorithmic Trading Is Good for Body and Soul
Your Mind and Your Health
Trading as a Service
Does This Stuff Really Work?
Keeping Up with the Latest Trends
Managing Other People's Money
Conclusion
Endnotes
Bibliography
About the Author
Index
End User License Agreement
Chapter 1: The Basics of Algorithmic Trading
Table 1.1 : Ranking of Three Programming Languages for Quant Trading (Ranked from 1 to 3 where 1 = best ranking and 3 = poorest ranking.)
Table 1.2 : Mean of Net vs. Log Returns
Chapter 2: Factor Models
Table 2.1 : Input Factor Loadings that Are Size‐Independent
Chapter 3: Time‐Series Analysis
Table 3.1 : Coefficients of an AR(10) Model Applied to AUD.USD
Table 3.2 : Coefficients of an ARMA(2, 5) Model Applied to AUD.USD
Table 3.3 : Constant Offsets, Autoregressive Coefficients, and Covariance of a VAR(1) Model Applied to Computer Hardware Stocks
Table 3.4 : Error Correction Matrix of a VEC(0) Model Applied to Computer Hardware Stocks
Table 3.5 : Estimated Values for B and D Matrices (Off‐Diagonal Elements Are 0)
Table 3.6 : Estimated Values for B
Chapter 4: Artificial Intelligence Techniques
Table 4.1 : Performance Comparison for Different Network Architectures with Retraining
Table 4.2 : Performance Comparison for Different Network Architectures with Averaging
Table 4.3 : Input Factors That Are Size‐Independent
Table 4.4 : Factors Selected by Stepwise Regression
Chapter 5: Options Strategies
Table 5.1 : Performance Comparison between Long SPY vs. Short VX
Table 5.2 : Contracts Trading Dates
Chapter 6: Intraday Trading and Market Microstructure
Table 6.1 . Order Book for Exchanges L and N
Table 6.2 . Order Book for Exchanges L and N
Table 6.3 . BBO for Outright Market for ED
Table 6.4 . BBO for Outright Market after Adding New Calendar Spread Limit Order
Table 6.5 . Data Structure for Aggressor Tag Method
Table 6.6 . Data Structure for Volume Bars
Chapter 7: Bitcoins
Table 7.1 : Comparison of Riskiness of Risky Assets
Chapter 1: The Basics of Algorithmic Trading
Figure 1.1 Algorithmic trading at a glance
Figure 1.2 : Efficient frontier
Chapter 2: Factor Models
Figure 2.1 : Fundamental cross‐sectional factor model on SPX component stocks
Figure 2.2 : One‐ and two‐factor models on SPX component stocks
Figure 2.3 : Liquidity factor on SPX component stocks
Figure 2.4 : Statistical factor on SPX component stocks
Chapter 3: Time‐Series Analysis
Figure 3.1 : AR(10) trading strategy applied to AUD.USD
Figure 3.2 : ARMA(2, 5) trading strategy applied to AUD.USD
Figure 3.3 : VAR(1) trading strategy applied to computer hardware stocks
Figure 3.4 : Kalman filter trading strategy applied to computer hardware stocks (in‐sample)
Figure 3.5 : Kalman filter trading strategy applied to computer hardware stocks (out‐of‐sample)
Figure 3.6 : Kalman filter estimate of the slope between EWC and EWA
Figure 3.7 : Kalman filter estimate of the offset between EWC and EWA
Figure 3.8 : Kalman filter trading strategy applied to EWC–EWA (in‐sample)
Figure 3.9 : Kalman filter trading strategy applied to EWC–EWA (out‐of‐sample)
Chapter 4: Artificial Intelligence Techniques
Figure 4.1 : Out‐of‐sample performance of stepwise regression on SPY
Figure 4.2 : Regression tree on SPY
Figure 4.3 : Trading model based on regression tree
Figure 4.4 : Leaving out a subset of training set for cross‐validation test
Figure 4.5 : Bagging with K bags (In bag 1, data samples 2 and 4 will be part of test set 1. In bag K, data samples 1 and 3 will be part of test set K.)
Figure 4.6 : Trading model based on regression tree with bagging (K = 5)
Figure 4.7 : Effect of boosting a regression tree on train vs. test set Sharpe ratio
Figure 4.8 : Support vector machine illustrated
Figure 4.9 : Support vector machine with cross validation
Figure 4.10 : Hidden states transition probabilities of an HMM
Figure 4.11 : A feed forward neural network for our example
Figure 4.12 : Cross‐validated regression tree on SPX component stocks
Figure 4.13 : Stepwise regression on SPX component stocks using fundamental factors
Chapter 5: Options Strategies
Figure 5.1 : Long SPY vs. Short VX
Figure 5.2 : Cumulative returns of VX‐ES roll returns strategy
Figure 5.3 : Hedge ratio between XIV and SPY determined by Kalman filter
Figure 5.4 : Cumulative returns of XIV‐SPY roll returns strategy
Figure 5.5 : Cumulative returns of
Strategy
Figure 5.6 :
Figure 5.7 :
Strategy
Figure 5.8 : Dispersion trading of SPX
Figure 5.9 : Leverage of AAPL options
Chapter 6: Intraday Trading and Market Microstructure
Figure 6.1 : Z (CDF of a Gaussian with zero mean and unit variance)
Chapter 7: Bitcoins
Figure 7.1 : AR(16) trading strategy applied to BTC.USD
Figure 7.2 : ARMA(3, 7) trading strategy applied to BTC.USD
Figure 7.3 : Bollinger band trading strategy
Figure 7.4 : Bollinger band trading strategy on BTC.USD equity curve
Chapter 8: Algorithmic Trading Is Good for Body and Soul
Figure 8.1 : Spread of GLD and GDX: Out‐of‐sample
Founded in 1807, John Wiley & Sons is the oldest independent publishing company in the United States. With offices in North America, Europe, Australia, and Asia, Wiley is globally committed to developing and marketing print and electronic products and services for our customers' professional and personal knowledge and understanding.
The Wiley Trading series features books by traders who have survived the market's ever‐changing temperament and have prospered, some by reinventing systems, others by getting back to basics. Whether a novice trader, professional, or somewhere in between, these books will provide the advice and strategies needed to prosper today and well into the future.
For more on this series, visit our website at www.WileyTrading.com.
Ernest P. Chan
Copyright © 2017 by Ernest P. Chan. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600, or on the Web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993, or fax (317) 572-4002.
Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.
Library of Congress Cataloging-in-Publication Data is available:
ISBN 978-1-119-21960-6 (Hardcover)
ISBN 978-1-119-21967-5 (ePDF)
ISBN 978-1-119-21965-1 (ePub)
Cover Design: Wiley
Cover Images: Wave © kdrkara90/Shutterstock; Abstract background © Marina Koven/Shutterstock; Fractal Realms © agsandrew/Shutterstock
To my mom, Ching, my spouse, Ben, and to the memory of my beloved father, Hung Yip.
The best way to learn something really well is to teach it to someone else (Bargh and Schul, 1980). So I confess that one major motivation for my writing this book, the third and the most advanced to date in a series, is to force myself to study in more depth the following topics:
The latest backtesting and trading platforms and the best and most cost‐effective vendors for all manner of data (Chapter 1);
How to pick the best broker for algorithmic executions and what precautions we should take (Chapter 1);
The simplest way to optimize allocations to different assets and strategies (Chapter 1);
Factor models in all their glory, including those derived from the options market, and why they can be useful to short‐term traders (Chapter 2);
Time series techniques: ARIMA, VAR, and state space models (with hidden variables) as applied to practical trading (Chapter 3);
Artificial intelligence/machine learning techniques: particularly methods that reduce overfitting (Chapter 4);
Options and volatility trading strategies, including those that involve portfolios of options (Chapter 5);
Intraday and higher frequency trading: market microstructure, order types and routing optimization, dark pools, adverse selection, order flow, and how to backtest intraday strategies with tick data (Chapter 6);
Bitcoins: bringing some of the techniques we covered to this new asset class (Chapter 7);
How to keep up with the latest knowledge (Chapter 8);
Transitioning from a proprietary trader to an investment advisor (Chapter 8).
I don't know if these topics will excite you or bring you profits, but my study of them has certainly improved my own money management skills. Besides, sharing knowledge and ideas is fun and ultimately conducive to creativity and profits.
You will find most of the materials quite accessible to anyone who has some experience in a quantitative field, be it computer science, engineering, or physics. Not much prior knowledge of trading and finance is assumed (except for the chapter on options, where we do assume basic familiarity). However, if you are completely new to trading, you may find my more basic treatments in Quantitative Trading (Chan, 2009) and Algorithmic Trading (Chan, 2013) easier to understand. This book can be treated as a continuation of my first two books, with coverage on topics that I have not discussed before, but it can also be read independently.
Although many prototype trading strategies have been included as examples, one should definitely not treat them as shrink‐wrapped products ready to deploy in live trading. As I have emphasized in my previous books, nobody should trade someone else's strategies without a thorough, independent backtest, removing all likely sources of biases and data errors, and trying various variations for improvement. Most, if not all, of the strategies I describe contain hidden biases of one kind or another, waiting for you to unearth and eliminate them.
I use MATLAB for all of my research in trading. I find it extremely user‐friendly, with constant improvements and new features, and an increasing number of specialized toolboxes that I can draw on. For example, without the Statistics and Machine Learning Toolbox, it would take much longer to explore using AI/ML techniques for trading. (See why Google scientist and machine learning expert Kevin Murphy prefers MATLAB to R for AI/ML research in Murphy, 2015.) In the past, readers have complained about the high price of a MATLAB license. But now, it costs only $150 for a “Home” license, with each additional toolbox costing only $45. No serious trader should compromise their productivity because of this small cost. I am also familiar with R, which is a close relative of MATLAB. But frankly, it is no match for MATLAB in terms of performance and user‐friendliness. A detailed comparison of these languages can be found in Chapters 1 and 6. If you don't already know MATLAB, it is very easy to request a one‐month trial license from mathworks.com and use its many free online tutorials to learn the language. One great advantage of MATLAB over R or other open‐source languages is that there is excellent customer support: If you have a question, just email or call the staff at MathWorks. (Often, someone with a PhD will answer your questions.)
I have taught many of these topics to both retail and institutional traders at my biannual workshops in London, as well as online (www.epchan.com). To help lecturers who would like to use this as a textbook for a special topics course on Algorithmic Trading, I have included many exercises at the end of most chapters. Some of these exercises should be treated as suggestions for open‐ended projects; there are no ready‐made answers.
Readers will also find all of the software and some data used in the examples on epchan.com/book3. The userid and password are embedded in Box 1.1. But unlike my previous books, some of the data involved in the example strategies are under strict licensing restrictions and therefore are unavailable for free download from my website. Readers are invited to purchase or rent them from their original sources, all of which are described in Chapter 1.
I have benefited from tips, ideas, and help from many people in putting the content together. An incomplete list would include:
Stephen Aikin, a renowned author (Aikin, 2012) and lecturer, who helped me understand implied quotes due to calendar spreads in the futures markets (Chapter 6).
David Don and Joseph Signorelli of Lime Brokerage, who corrected some of my misunderstandings of market microstructure (Chapter 6).
Jonathan Shore, infinitely knowledgeable about bitcoins, who helped compile some order book data in that market and shared them with me (Chapter 7).
Dr. Roger Hunter, CTO at our firm, QTS Capital Management, who reviewed my manuscript and who never failed to find software bugs in my code.
The team at Interactive Brokers (especially Joanne, Ragini, Mike, Greg, Ian, and Ralph), whose infinite patience with my questions about all issues related to trading is much appreciated.
I would like to thank Professor Thomas Miller of Northwestern University for hiring me to teach the Risk Analytics course at the Master of Science in Predictive Analytics program. In the same vein, I would also like to thank Matthew Clements and Jim Biss at Global Markets Training for organizing the London workshops for me over the years. Quite a few nuggets of knowledge in this book come out of materials or discussions from these courses and workshops.
Trading and research have been made a lot more interesting and enjoyable because I was able to work closely with our team at QTS, who contributed to research, ideas, and general knowledge, some of which find their way into this book. Among them, Roger, of course, without whom there wouldn't be QTS, but also Yang, Marcin, Sam, and last but not least, Ray.
Of course, none of my books would come into existence without the support of Wiley, especially my long‐time editor Bill Falloon, development editor Julie Kerr, production editor Caroline Maria, and copy editor Cheryl Ferguson (from whom no missing “end” to a “for”‐loop can escape). It was truly a great pleasure to work with them, and their enthusiasm and professionalism are greatly appreciated.
An algorithmic trading strategy feeds market data (historical or live) into a computer (backtest or automated execution) program. The program then submits orders to the broker through an API, and receives order status notifications back from the broker. The flowchart in Figure 1.1 illustrates this process.
Figure 1.1 Algorithmic trading at a glance
Notice that I deliberately use the same box to indicate the computer program that generates backtest results and live orders: This is the best way to ensure we are trading the exact same model that we have backtested.
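To make the single‐box idea concrete, here is a minimal Python sketch (the book's own code is in MATLAB). The trading logic lives in one object whose on_price method is called by both a backtest driver and a live driver, so the two cannot drift apart. The class MovingAverageStrategy, its toy trend rule, and the submit_order callback are hypothetical placeholders; submit_order stands in for whatever broker API adapter you actually use and is not any vendor's real interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class MovingAverageStrategy:
    """Hypothetical toy strategy: long when price is above its trailing mean."""
    lookback: int = 3
    prices: List[float] = field(default_factory=list)

    def on_price(self, price: float) -> int:
        """Return the target position (+1 long, 0 flat) given the latest price."""
        self.prices.append(price)
        if len(self.prices) < self.lookback:
            return 0  # not enough history yet
        window = self.prices[-self.lookback:]
        mean = sum(window) / self.lookback
        return 1 if price > mean else 0


def run_backtest(strategy: MovingAverageStrategy, prices: Iterable[float]) -> List[int]:
    """Feed historical prices through the strategy and record target positions."""
    return [strategy.on_price(p) for p in prices]


def run_live(strategy: MovingAverageStrategy,
             price_feed: Iterable[float],
             submit_order: Callable[[int], None]) -> None:
    """Feed live prices through the SAME strategy; send position changes to a broker adapter."""
    position = 0
    for price in price_feed:
        target = strategy.on_price(price)
        if target != position:
            submit_order(target - position)  # order quantity = change in position
            position = target
```

For example, feeding the prices [10, 11, 12, 11, 13] through run_backtest with lookback=3 yields the target positions [0, 0, 1, 0, 1], and running run_live over the same prices with a fresh strategy instance and a list's append method as the callback records the orders [1, -1, 1]. Because both drivers call the same on_price, any bug fixed in the backtest is automatically fixed in live trading.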
In this chapter, I will discuss the latest services, products, and their vendors applicable to each of the blocks in Figure 1.1. In addition, I will describe my favorite performance metrics, the way to determine the optimal leverage, and the simplest asset allocation method. Though I have touched on many (but not all) of these issues in my previous books, I have updated them here based on the state of the art. The FinTech industry has not been standing still, nor has my understanding of issues ranging from brokers' safety to subtleties of portfolio optimization.
For daily historical data in stocks and futures, I have been using CSI (csidata.com) for a long time. CSI has a very flexible and robust desktop application. The beauty of this application is that we can set a time in the evening when the data are automatically updated through an Internet connection with CSI's server. Also, the data can be stored in various convenient formats such as .txt, .csv, or .xlsx. We can ask it to automatically adjust historical stock (and ETF) prices for splits and dividends. For a little extra, CSI can also provide delisted stocks' historical data, so that you can have a survivorship‐bias‐free data set.1 (By the way, CSI data powers Yahoo! Finance's historical stock data.) For futures, we can choose different rollover methods to create continuous contracts. Or not, since the original contract prices are also available, and many professional traders prefer to backtest futures using original contract prices instead of back‐adjusted continuous contract prices. This is because the latter depend on a particular roll method and may have look‐ahead bias embedded (see Chan, 2013, for a detailed exploration of this issue). Finally, CSI has excellent customer support through email and phone.
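Why back‐adjusted prices depend on the roll method, and how they can embed look‐ahead, is easy to see in a minimal sketch of difference back‐adjustment between two contracts. This is one common method, assumed here for illustration; it is not necessarily any of the rollover choices CSI offers. The function name back_adjust and the single, fixed roll date are hypothetical simplifications.

```python
from typing import List, Sequence


def back_adjust(front: Sequence[float], back: Sequence[float], roll_index: int) -> List[float]:
    """Stitch two contracts into one difference-adjusted continuous series.

    front, back: prices of the expiring and the next contract on the same dates.
    roll_index:  first date index on which we switch to the back contract.
    All prices before the roll are shifted by the back-minus-front gap observed
    ON the roll date, so the stitched series has no artificial jump at the roll.
    """
    if len(front) != len(back):
        raise ValueError("front and back must cover the same dates")
    gap = back[roll_index] - front[roll_index]
    adjusted = [p + gap for p in front[:roll_index]]
    adjusted += list(back[roll_index:])
    return adjusted
```

With front = [100, 101, 102, 103], back = [105, 106, 108, 109], and roll_index = 2, the gap is 6 and the stitched series is [106, 107, 108, 109]. Rolling one day earlier (roll_index = 1) gives [105, 106, 108, 109] instead, so the entire earlier history changes with the roll rule. Note also that the adjusted price on the first date uses a gap that is only known on the later roll date, which is the look‐ahead the text warns about.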
An alternative to CSI is Quandl.com, which is a consolidator of many kinds of data from many different vendors. It also provides an API in different languages (including MATLAB, which I use in this book, or Python, which many other traders use) that we can use for data selection and download. Some of Quandl's data are free (daily data for stocks is one example), and others require payment. I have purchased, for example, fundamental stock data from them (see Chapter 2, Factor Models), and they are much more economical than established vendors such as Compustat.
Serious traders or academic finance researchers may prefer stock, ETF, and mutual fund data from CRSP (www.crsp.com). Their historical data are carefully compiled to be survivorship‐bias‐free, and dividends and splits are provided separately so you can decide how to utilize them in a backtest. But most importantly, they provide the best bid and offer (BBO) prices at the close. These are important because, as is explained in Box 6.4 in Chapter 6, using the usual consolidated closing prices from CSI or Quandl can inflate backtest performance for certain strategies. A similar issue arises from using the consolidated opening prices. The best open and close prices to use are the auction prices from the primary exchange. (See also Box 6.4 for an explanation of how we can extract such auction prices from tick data.) The second best open and close prices to use are the midprices computed from the BBO prices at the open and close. Unfortunately, CRSP does not provide the BBO prices at the open, so one must use intraday data for that purpose. For academic researchers, CRSP data can be obtained at a lower cost through WRDS (wrds‐web.wharton.upenn.edu), which is a consolidator of many high‐quality historical databases for scholarly research.
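A minimal sketch of the second‐best approach: compute closing midprices from the best bid and offer and derive daily returns from them rather than from last trade prices. The function names and the quote values in the usage note are made up for illustration and do not reflect CRSP's actual file layout.

```python
from typing import List, Sequence


def midprices(bids: Sequence[float], asks: Sequence[float]) -> List[float]:
    """Midprice = (best bid + best offer) / 2 for each date."""
    if len(bids) != len(asks):
        raise ValueError("bids and asks must have the same length")
    return [(b + a) / 2.0 for b, a in zip(bids, asks)]


def simple_returns(prices: Sequence[float]) -> List[float]:
    """Daily net returns: p[t] / p[t-1] - 1."""
    return [prices[t] / prices[t - 1] - 1.0 for t in range(1, len(prices))]
```

For instance, hypothetical closing quotes of 9.90/10.10 and 10.00/10.20 give midprices of 10.00 and 10.10, a return of about 1%. Had the last trades on those two days printed at the bid (9.90) and then at the ask (10.20), simple_returns([9.90, 10.20]) would report about 3%, even though the midprice barely moved.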
Of course, those serious traders who can afford to buy data from CRSP may also be able to afford a Bloomberg terminal subscription. One advantage of a Bloomberg terminal is that we can download the “primary exchange” close price for US stocks. A Bloomberg subscription also includes access to a large amount of historical and live data spanning a wide variety of instruments and, importantly, breaking news on every stock. I have found Bloomberg's news service to be superior to many other vendors'. Often, we see a stock move suddenly and are unable to find the reason anywhere but on Bloomberg's news feed. Bloomberg captures the most obscure news on the most obscure stocks in the shortest time frame, which is important when you have an event‐driven strategy. Bloomberg's historical US stock data are also survivorship‐bias‐free. (To be fair to Thomson Reuters' Eikon platform, a keen competitor to Bloomberg, I have not tested its news feed, so it may provide equally wide and timely coverage. One feature of Eikon impressed me in a demo: I was able to see the geographical locations of individual oil tankers and where they were headed. Apparently, oil traders use this to predict short‐term oil inventory, supply, and demand.)
For futures traders, daily data does not present much of a problem. CSI's data and the free data on Quandl are as good as any.
Daily options data can be obtained from ORATS.com as well as ivolatility.com. Both offerings are survivorship‐bias‐free. The institutional trader or academic researcher may also purchase from OptionMetrics, which is often part of the WRDS package (see above). One nice feature of all these databases: They include not just option closing prices but also the bid‐ask quotes at the close. This is important because some options, especially ones that are out‐of‐the‐money or have long tenor, may be traded infrequently. So the last trade price of the day may be very different from the bid‐ask quotes at the close, and is not indicative of the price we can actually trade at or near the close. These databases also include auxiliary but important information such as the Greeks and implied volatilities. Furthermore, they may include an implied volatility surface, which takes the implied volatilities computed from actual options and interpolates them to yield implied volatilities for strikes and tenors that are not actually listed.
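The interpolation step behind an implied volatility surface can be sketched, for a single tenor, as linear interpolation across strikes. Vendors build their surfaces with their own methods, which the text does not describe, and typically interpolate across tenors as well; the linear‐in‐strike rule, the flat extrapolation beyond the listed strikes, and the function name interp_iv are assumptions for illustration only.

```python
from bisect import bisect_left
from typing import Sequence


def interp_iv(strikes: Sequence[float], ivs: Sequence[float], k: float) -> float:
    """Linearly interpolate implied volatility at strike k from listed strikes.

    strikes must be sorted ascending and aligned with ivs. A strike outside
    the listed range is clamped to the nearest endpoint (flat extrapolation).
    """
    if k <= strikes[0]:
        return ivs[0]
    if k >= strikes[-1]:
        return ivs[-1]
    i = bisect_left(strikes, k)  # strikes[i-1] < k <= strikes[i]
    k0, k1 = strikes[i - 1], strikes[i]
    v0, v1 = ivs[i - 1], ivs[i]
    w = (k - k0) / (k1 - k0)     # fractional distance between the two strikes
    return v0 + w * (v1 - v0)
```

With listed strikes 90, 100, and 110 and implied volatilities of 25%, 20%, and 22%, the interpolated volatility at the unlisted strike 105 is 21%.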
Options historical data tend to be more expensive than stock or futures data, due to their voluminous nature. However, it can be cheaper to rent intraday option prices from QuantGo.com than to buy daily option prices from other vendors. We will talk more about QuantGo when we discuss intraday data in general. It would be a trivial programming exercise to extract the end‐of‐day quotes from the intraday data, by looking for the quotes with timestamps at or just before the daily closing time.
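The extraction of end‐of‐day quotes from intraday data might look like the following sketch. It assumes each quote is a (timestamp, bid, ask) tuple with timestamps in exchange local time and a 16:00 close; both are illustrative choices (actual closing times vary by product), and this is not the record format of QuantGo or any other vendor.

```python
from datetime import datetime, time
from typing import Dict, List, Tuple

Quote = Tuple[datetime, float, float]  # (timestamp, bid, ask) -- assumed layout


def end_of_day_quotes(quotes: List[Quote], close: time = time(16, 0)) -> Dict[str, Quote]:
    """For each calendar date, keep the last quote stamped at or before `close`.

    The input need not be sorted; quotes stamped after the close are ignored.
    Keys are ISO dates such as '2016-06-01'.
    """
    eod: Dict[str, Quote] = {}
    for q in sorted(quotes, key=lambda x: x[0]):
        ts = q[0]
        if ts.time() <= close:
            eod[ts.date().isoformat()] = q  # later quotes overwrite earlier ones
    return eod
```

Quotes arriving after the close (say at 16:00:05) are skipped, so a quote stamped exactly 16:00:00 becomes that day's end‐of‐day quote, and a day whose last quote came at 15:30 simply keeps that quote.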
Beyond daily price data, there are, of course, fundamental financial data for companies. I already mentioned that Quandl carries such data. Institutional traders would most likely look to Compustat for such data. For corporate earnings estimates by analysts, Thomson Reuters' IBES database is the standard. Both Compustat and IBES are available from WRDS. Meanwhile, crowd‐sourced earnings estimates are available from Estimize. There is some research that suggests Estimize's contributors can more accurately forecast actual earnings than traditional sell‐side analysts (Wang et al., 2014). An example strategy using Estimize's data is discussed in Deltix (2014). Short interest data are available from Compustat and SunGard's Astec database. SunGard's data have a lot more details culled from stock lenders and prime brokers around the Street than a simple short interest number. In addition, their data are available on an intraday basis as a live feed, though the historical data do not have historical time stamps.
News data is another type of data that is becoming fashionable. Many vendors sell elementized news feeds (i.e., news that is machine‐readable, which makes it easier to capture keywords, categories, and the like), including Bloomberg, Dow Jones, and Thomson Reuters. If it is too much trouble for your strategy to generate trading signals from raw news data, you can also buy news sentiment data from Ravenpack, Thomson Reuters News Analytics, MarketPsych, or Accern. (AcquireMedia's NewsEdge database is similar, but it provides only impact scores. This is a kind of unsigned sentiment score that doesn't tell you which way the stock will move, only that it will move, which may be suitable for options traders.) However, there is one problem with sentiment data: Different vendors have different ways to compute sentiment scores from the raw news. So the performance of a trading model depends to some extent on which vendor's sentiment scores it uses.
We will leave the topic of buying or renting intraday data to Chapter 6 on Intraday Trading, because the features associated with intraday data are intimately tied to the market microstructure. Here, we will just note that some of the historical intraday data vendors include tickdata.com, nanex.net, CQG, QuantGo.com, kibot, and of course, the various exchanges themselves.
Finding, buying, or renting data is both expensive and time‐consuming, though consolidators like Quandl and QuantGo have made it much less so. Another way to avoid dealing with acquiring data directly is to adopt a trading platform that comes integrated with data (though you may have to pay for the data separately). A good example is Quantopian.com, which provides free one‐minute bars of US stock trade data, together with many other forms of fundamental and news data at lower frequency. (I have been told that futures data will be available soon.) We will talk more about platforms like these in the section “Backtesting and Trading Platforms.”
Most if not all brokerages provide live market data to their clients, and if you are trading a daily strategy (i.e., you trade only at the market open or close), such data are usually more than sufficient. However, if you engage in intraday trading, then the quality of data becomes a bigger issue. As we will discuss more thoroughly in Chapter 6, low latency market data can be quite expensive to obtain. Vendors that provide data suitable for intraday strategies that can tolerate a latency of more than 25 ms (ms = millisecond) include eSignal, IQFeed, CQG, Interactive Data, Bloomberg, and many others. But vendors that provide data feeds with latency below 10 ms are far fewer: They include S&P Capital IQ (formerly QuantHouse), SR Labs, and Thomson Reuters. Of course, you can also subscribe to the direct feeds from the exchanges, but that is strictly for high frequency traders who find the high expense justified by the high return. (The exception is currency data: Most FX exchanges will give their customers a free direct feed.)
As with historical market data, many trading platforms also include live market data feeds. These will be discussed in the following section.
Traditionally, we would backtest our trading strategy on one platform (e.g., R) and once successful, write a different program to automate execution of the strategy that utilizes a broker's API. However, this proves to be quite bug‐prone: there is no way to ensure that the backtest and the live trading program encapsulate exactly the same trading logic. Fortunately, most backtest platforms nowadays have extended their ability to execute live as well; hence, we will combine the discussions on backtesting and trading platforms here.
As I mentioned in the Preface, MATLAB has been my favorite backtesting platform. It has a very comprehensive and user‐friendly interface for developing and debugging programs, and it has a wide array of toolboxes that cover almost every arcane mathematical or computational technique you will likely encounter in trading strategy development. One of these toolboxes, the Trading Toolbox, enables a MATLAB program to connect to a number of brokerages' APIs to receive market data, submit orders, and receive order status notifications. If you prefer not to buy the Trading Toolbox, there are at least three adaptors developed by third parties that enable you to do the same: exchangeapi.com's quant2ib, undocumentedmatlab.com's IB‐Matlab, and Jev Kuznetsov's free MATLAB‐to‐IB API available at MATLAB's File Exchange. I have discussed these options in some depth in Chan (2013). Finally, MATLAB is fast. (See Chapter 6 for a comparison of performance speed among MATLAB, R, and Python.) The only drawback of this platform is that it isn't free, but the “Home” license costs only $150, with each additional toolbox costing an extra $45. If you plan to buy MATLAB toolboxes, here are the three I recommend (in decreasing order of importance): Statistics and Machine Learning, Econometrics, and Financial Instruments (for options traders).
For those who prefer free, open‐source platforms, there are always R and Python.
R is very similar to MATLAB: It is an array‐processing language, and it has a large variety of specialized “packages” (the analogue of MATLAB's toolboxes), many of them perhaps more sophisticated than MATLAB's due to the large number of academic researchers who use R. There is a GUI development platform called RStudio, but I find its user interface to be quite crude compared to that of MATLAB, and hence debugging productivity is lower. R is also the slowest among the three languages, and the slowness is all the more problematic because, unlike MATLAB or Python, it cannot be compiled into C or C++ code to speed it up. (You can, however, use the Rcpp package to access compiled C++ code from within R.) As for automating executions, you can connect an R program to Interactive Brokers through a package called “IBrokers.”
Python is a language in the ascendant, though I know of quants who used it for backtesting as far back as 1998. Aside from being a standalone language of choice for many quantitative traders, it is also used by platforms such as Quantopian as their strategy specification language. Native Python is not an array‐processing language (though one can use SciPy packages, which do have this feature). While array processing is convenient for backtesting a large number of instruments simultaneously (e.g., a portfolio of stocks), it is not useful for writing an automated execution program. Hence, if we insist on using the same program for both backtesting and live execution, we can disregard this feature altogether. One major advantage of Python is that its code can be developed and debugged using Microsoft's Visual Studio, thus piggybacking on the full power of this well‐polished development environment. Another integrated development environment² (IDE) for Python that has received good reviews is PyCharm. Python's pandas library is a data analysis package similar to R, and the rpy2 package provides an interface to access all R packages. Python isn't as fast as MATLAB, though it is faster than R, and it can be compiled into C or C++.³ You can connect a Python trading program to Interactive Brokers for executions via IBPy or a number of other packages.
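To make the array‐processing distinction concrete, here is a sketch (assuming NumPy is installed; the function and class names are hypothetical) of one toy signal computed two ways: for all bars at once, as an array‐processing backtest would, and one bar at a time, as an execution program receives data. The two must agree, which is also a useful check when porting a backtest to a live system.

```python
from collections import deque

import numpy as np


def signals_vectorized(prices, lookback: int) -> np.ndarray:
    """Array style: +1 where price > mean of the previous `lookback` prices, else 0."""
    prices = np.asarray(prices, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(prices)))  # csum[k] = sum(prices[:k])
    sig = np.zeros(len(prices), dtype=int)
    t = np.arange(lookback, len(prices))               # bars with a full look-back window
    prev_mean = (csum[t] - csum[t - lookback]) / lookback
    sig[t] = (prices[t] > prev_mean).astype(int)
    return sig


class IncrementalSignal:
    """Event style: the same rule, updated as each new price arrives."""

    def __init__(self, lookback: int):
        self.window = deque(maxlen=lookback)

    def update(self, price: float) -> int:
        sig = 0
        if len(self.window) == self.window.maxlen:
            sig = int(price > sum(self.window) / len(self.window))
        self.window.append(price)  # deque discards the oldest price automatically
        return sig
```

The vectorized version handles every bar (and, with a 2‐D array, every instrument) in one pass, while the incremental version needs only the latest few prices, which is the form an execution program naturally takes.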
In Table 1.1, I provide my personal, arguably subjective, ranking of the various features and aspects of the three scripting languages most widely used in trading strategy development.
Table 1.1: Ranking of Three Programming Languages for Quant Trading (Ranked from 1 to 3 where 1 = best ranking and 3 = poorest ranking.)
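| Feature | MATLAB | R | Python |
|---|---|---|---|
| Ease of use | 1 | 2 | 2 |
| IDE polish | 1 | 3 | 1 |
| Speed | 1 | 3 | 2 |
| Toolboxes | 2 | 1 | 1 |
| Compilation to C/C++ | 1 | N/A | 1 |
| Connectivity to brokers | 1 | 2 | 2 |
| Customer support | 1 | N/A | N/A |
| Price | 2 | 1 | 1 |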
You may be wondering why I have left out some of the most common programming languages such as C/C++, Java, or C#. The fact is, the most productive way to develop trading strategies is to quickly build prototype programs to test a lot of them, and to quickly discard them if they fail to live up to their promise. Scripting languages (also called REPL⁴ languages) like MATLAB, R, or Python allow us to shorten this research cycle. However, productivity in research is not the same as productivity in building an automated trading system. The latter usually involves the standard bells and whistles of software development such as object‐oriented design, exception handling, version control, memory and speed optimization, and so forth. Such bells and whistles are actually quite cumbersome to implement in a scripting language, and, furthermore, it is quite easy to introduce bugs without a robust and extensible software architecture. Typically, once object‐oriented design is imposed on a scripting language program, it will run too slowly to be useful as an execution system.
To benefit from the best of both worlds, in our firm we do our initial research mostly in MATLAB or Python (though I often ask our research associates to test their strategies on Quantopian first, just to make sure their code does not have look‐ahead bias). After we settle on a strategy, our CTO, Roger, will then independently build a system in C# that can both backtest and execute live the same strategy, as a way to confirm the correctness of the initial backtest. Once it is confirmed, a figurative turn of the key will allow us to trade live, with much lower latency than if we were to execute using a scripting language. In building the execution system, we can often reuse existing classes that we have written for other strategies. This way, we have compressed the research life cycle without sacrificing software correctness, component reuse, or execution efficiency.
There are now many backtesting and automated execution platforms available that purport to make it easier to develop and deploy automated trading strategies both quickly and robustly, just as I described in the previous paragraph, but without using two different languages. I have written extensively about this in Chan (2013), so here I will restrict myself to those platforms that fit the following criteria:
It has integrated historical and live market data, or provides adaptors for popular data vendors.
It allows maximum flexibility in strategy design by relying on generic programming languages such as Python or Java.
It allows connection to popular broker APIs.
It allows backtesting and live trading in US equities, among other instruments.
The platforms that satisfy these criteria are Quantopian, QuantConnect, NinjaTrader, Marketcetera, AlgoTrader, OpenQuant, Deltix, and QuantHouse. (This is, of course, not an exhaustive list.) Some of these platforms, such as Quantopian, are free, while others, such as QuantHouse, have a price tag suitable for institutional traders.
If you are trading a high frequency strategy, it is possible that many of these platforms will not be adequate: either because they may be too slow, or because they are missing level 2 quotes. But there are platforms such as Lime Brokerage's Strategy Studio that are designed specifically for such demanding tasks. More about this will be discussed in Chapter 6.
Though these platforms have certainly made it easier to backtest and automate trading strategies, they usually come with some constraints on the exact type of strategies, data, or instruments that we are allowed to test. Not surprisingly, the more expensive a platform is, the more flexible it is. But if absolute flexibility coupled with low development cost is required, you will probably have to do what we do in our firm.
In this day and age, practically every broker will offer customers an API where they can subscribe to market data, submit orders, and receive order status notifications: See the last two blocks on the right of Figure 1.1. Of course, the ease of use and comprehensiveness of their APIs differ widely. If you would rather not deal directly with the vagaries and low‐level details of each broker's API, you can use one of the automated execution platforms that I described in the previous section.
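If you do build your own system, one way to insulate strategy code from the vagaries of individual broker APIs is a thin adapter layer covering the three functions named above: market data subscription, order submission, and order status notification. The sketch below is hypothetical throughout: `BrokerAdapter`, `PaperBroker`, and their methods are illustrative names, not any broker's actual API, and the paper broker simply fills every order at the last quote.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

QuoteHandler = Callable[[str, float], None]


@dataclass
class Order:
    order_id: int
    symbol: str
    quantity: int                       # positive = buy, negative = sell
    status: str = "Submitted"
    fill_price: Optional[float] = None


class BrokerAdapter(ABC):
    """Hypothetical broker-neutral interface: market data, orders, order status."""

    def __init__(self) -> None:
        self._status_listeners: List[Callable[[Order], None]] = []

    def on_order_status(self, callback: Callable[[Order], None]) -> None:
        self._status_listeners.append(callback)

    def _notify(self, order: Order) -> None:
        for callback in self._status_listeners:
            callback(order)

    @abstractmethod
    def subscribe(self, symbol: str, on_quote: QuoteHandler) -> None: ...

    @abstractmethod
    def submit_order(self, symbol: str, quantity: int) -> Order: ...


class PaperBroker(BrokerAdapter):
    """Toy implementation that fills every order immediately at the last quote."""

    def __init__(self) -> None:
        super().__init__()
        self._handlers: Dict[str, List[QuoteHandler]] = {}
        self._last_price: Dict[str, float] = {}
        self._next_id = 1

    def subscribe(self, symbol: str, on_quote: QuoteHandler) -> None:
        self._handlers.setdefault(symbol, []).append(on_quote)

    def push_quote(self, symbol: str, price: float) -> None:
        """Simulates the broker's market data feed delivering a quote."""
        self._last_price[symbol] = price
        for handler in self._handlers.get(symbol, []):
            handler(symbol, price)

    def submit_order(self, symbol: str, quantity: int) -> Order:
        order = Order(self._next_id, symbol, quantity)
        self._next_id += 1
        self._notify(order)                          # status: "Submitted"
        order.status = "Filled"
        order.fill_price = self._last_price.get(symbol)
        self._notify(order)                          # status: "Filled"
        return order
```

A strategy written against `BrokerAdapter` could then be moved to a different broker by writing a new subclass, without touching the strategy itself.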
Meanwhile, commissions are generally so low that they, too, are not a crucial factor in deciding whom to trade with. Brokers have other ways to make money from you besides commissions. One popular way, for some stockbrokers, is “payment for order flow.” Essentially, when you send your order to such a broker, it will route your order to a particular market maker or exchange that pays it a rebate (e.g., a penny per share) for executing the order. Brokers who do this must inform their customers of this practice when they open their accounts. If you prefer not to let your broker earn this rebate, and instead want to pocket it yourself, you can use brokers with Direct Market Access (e.g., Interactive Brokers or Lime Brokerage). For some FX brokers, a popular way to earn money is to widen the bid‐offer spread on a currency pair. This way, they earn the difference between the spread they quote you and the spread they pay in the interbank market where they execute your trade. This can make it hard to compare costs among different FX brokers, since this extra spread may change over time, but of course this is precisely the reason why some brokers adopt the practice. If you prefer transparency, you can restrict your search to FX brokers that charge only commissions, not an extra spread.
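The comparison problem is easier to see with numbers. The sketch below computes the round‐trip cost of buying and then selling EUR 1 million of EURUSD under two hypothetical pricing schemes: a commission‐only broker that passes through the interbank spread, and a no‐commission broker that widens each side of the quote by a markup. All spreads, markups, commission rates, and the 1.10 price are made‐up illustration values, and the mid price is assumed not to move during the round trip.

```python
PIP = 0.0001  # one pip for a pair like EURUSD quoted to four decimals


def round_trip_cost_usd(units: float, price: float, interbank_spread_pips: float,
                        markup_pips_per_side: float = 0.0,
                        commission_per_million_usd: float = 0.0) -> float:
    """USD cost of buying `units` of the base currency at the ask and selling at the bid."""
    # The broker's quoted spread is the interbank spread widened on both bid and ask.
    quoted_spread = (interbank_spread_pips + 2 * markup_pips_per_side) * PIP
    spread_cost = units * quoted_spread                  # buy at ask, sell at bid
    notional_usd = units * price
    commissions = 2 * commission_per_million_usd * notional_usd / 1e6  # two fills
    return spread_cost + commissions


# Hypothetical brokers trading EUR 1 million at EURUSD = 1.10
commission_broker = round_trip_cost_usd(1e6, 1.10, 0.2, commission_per_million_usd=20)
spread_broker = round_trip_cost_usd(1e6, 1.10, 0.2, markup_pips_per_side=0.3)
```

With these made‐up inputs the commission‐only broker costs about $64 per round trip and the spread broker about $80; the point is that the first figure can be read off a published fee schedule, while the second depends on an undisclosed markup that may change over time.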
If you are a futures or currency trader, there is one more item about your broker that you have to worry about: financial stability. For US securities (i.e., stock trading) accounts, many traders know that their cash⁵ is insured up to $250,000 by the Securities Investor Protection Corporation (SIPC). Many brokers also provide additional insurance through a private insurance company such as Lloyd's of London. However, none of this is applicable to commodities futures trading accounts; hence the furor in the wake of the MF Global and PFGBest bankruptcies (MarketWatch, 2012). It would be pointless to advise you to ascertain the financial strength of a commodities broker before opening an account: If the National Futures Association (NFA) couldn't detect that some of these large brokers (called Futures Commission Merchants, or FCMs) were not meeting capital requirements or were committing fraud, what chance do we have? However, there is one way out of this conundrum: If a securities broker is also an FCM, it might offer a way to automatically sweep the excess cash⁶ from an uninsured commodities account into an insured securities account. (For example, Interactive Brokers offers such a cash sweep service.)
Currency traders may have heard of Herstatt risk, or settlement risk (Galati, 2002). To illustrate, let's say we have sold 1 million EURUSD to a bank, and have delivered 1 million EUR to it. But before this bank can send us the corresponding amount of USD, it collapses (as was the case with Bankhaus Herstatt in Germany, 1974). We won't be getting our USD any time soon, and have lost 1 million EUR. It is a risk that is not solved by having an account at a reputable FX broker, because your counterparty may be some bank in a faraway country, and your broker isn't liable for your loss. This scenario is often used by some uninformed financial advisors to scare customers off investing in foreign currency strategies or funds. However, since the establishment of the CLS bank in 2002, an institution owned by some of the largest banks in the world, this risk has been largely eliminated. The CLS bank is a US financial institution regulated by the US Federal Reserve, and it facilitates simultaneous settlement of both legs of a currency transaction. It can do so because it maintains accounts with 18 central banks around the world. In our example above, we will receive the USD payment at the same time our counterparty receives our EUR payment, all via the CLS bank. It is almost like transacting on a stock exchange, where we are guaranteed that if we sold some shares, payment for those shares would be forthcoming, even if the trader who bought our shares went bankrupt. The 18 central banks whose currencies settle in CLS are listed at www.cls‐group.com/About/Pages/Currencies.aspx, and these are the currencies that have little or no settlement risk.
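The difference between delivering our leg first and payment‐versus‐payment settlement can be sketched with a toy model. This is a deliberately simplified, hypothetical illustration (the `Party` class and both settlement functions are invented for this example, and the 1.1 million USD leg assumes EURUSD at 1.10); it is not a description of how CLS Bank actually operates.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Party:
    name: str
    balances: Dict[str, float] = field(default_factory=dict)
    solvent: bool = True


def _move(payer: Party, payee: Party, ccy: str, amount: float) -> None:
    payer.balances[ccy] = payer.balances.get(ccy, 0.0) - amount
    payee.balances[ccy] = payee.balances.get(ccy, 0.0) + amount


def settle_sequential(us: Party, cpty: Party, give_ccy: str, give_amt: float,
                      get_ccy: str, get_amt: float) -> bool:
    """Herstatt-style exposure: we pay first, then depend on the counterparty."""
    _move(us, cpty, give_ccy, give_amt)
    if not cpty.solvent:            # counterparty collapses before paying us
        return False
    _move(cpty, us, get_ccy, get_amt)
    return True


def settle_pvp(us: Party, cpty: Party, give_ccy: str, give_amt: float,
               get_ccy: str, get_amt: float) -> bool:
    """Payment versus payment: both legs settle together, or neither does."""
    if not (us.solvent and cpty.solvent):
        return False
    _move(us, cpty, give_ccy, give_amt)
    _move(cpty, us, get_ccy, get_amt)
    return True
```

With a failed counterparty, the sequential version leaves us without our EUR and without the USD, while the payment‐versus‐payment version leaves both balances untouched.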
We have been discussing the risk that other parties (brokers or counterparties) fail to pay us in a transaction. But what if we are the ones who fail? More specifically, what if in a levered transaction, we lose more money in the brokerage account than our account equity? Are we liable for the negative equity? Can we lose more than what we invested in the account? The answer is yes and yes, if we invested as individuals. In a particularly poignant example, on January 15, 2015, the Swiss National Bank (Switzerland's central bank) suddenly announced that the Swiss franc would no longer be pegged to the euro (Economist, 2015). EURCHF plummeted by about 10 percent in seconds, and ultimately lost 29 percent in a day. Many FX traders' account equities went negative. Some smaller FX brokers failed because their customers wouldn't pay back the losses. In a situation like this, how do we make sure we won't be liable? The way to protect ourselves is not to invest in our own personal or family's name, but through a limited liability vehicle that we fully own, such as a corporation (or S‐corporation in the United States), limited liability company (in the United States), or limited partnership. Investing in someone else's limited partnership, such as a hedge fund, will also work. In case of negative equity, the limited liability vehicle will have to declare bankruptcy. This isn't great, but it is not catastrophic.
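The arithmetic behind negative equity is short. Assuming leverage is defined as gross position value divided by account equity, and ignoring currency conversion of the P&L, an instantaneous price move r changes equity by leverage × r (as a fraction of equity), so equity turns negative once leverage exceeds 1/|r|. The 100,000 account and the 5× leverage below are hypothetical; the 29 percent move is the EURCHF figure cited above.

```python
def equity_after_move(equity: float, leverage: float, price_return: float) -> float:
    """Equity after an instantaneous move of `price_return` in a long position
    whose gross value is `leverage` times equity (no chance to stop out)."""
    position_value = leverage * equity
    return equity + position_value * price_return


def leverage_to_wipe_out(price_return: float) -> float:
    """Smallest leverage at which a move of `price_return` erases all equity."""
    return 1.0 / abs(price_return)


# Hypothetical account: 100,000 of equity, long EURCHF at 5x leverage, then a 29% drop
after = equity_after_move(100_000, 5, -0.29)   # about -45,000: the customer owes the broker
threshold = leverage_to_wipe_out(-0.29)        # about 3.45x
```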
What performance metrics should a backtest program generate? I am typically interested in only five of them: CAGR (compound annual growth rate), Sharpe ratio, Calmar ratio, maximum drawdown, and maximum drawdown duration. I have discussed most of these except the Calmar ratio in detail in my previous books, so I will just briefly highlight some of their usage here.
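A minimal sketch of the five metrics, computed from a list of daily, unlevered strategy returns. Several conventions here are assumptions rather than anything specified in the text: the risk‐free rate is taken as zero in the Sharpe ratio, a year has 252 trading days, drawdown duration is counted in trading days, and the Calmar ratio is computed as CAGR divided by the absolute maximum drawdown over the whole sample (definitions that use only the trailing 36 months also exist).

```python
import math
from typing import Dict, Sequence


def performance_metrics(daily_returns: Sequence[float],
                        periods_per_year: int = 252) -> Dict[str, float]:
    """CAGR, Sharpe, Calmar, maximum drawdown, and maximum drawdown duration."""
    n = len(daily_returns)
    equity, peak = 1.0, 1.0
    max_dd = 0.0            # most negative value of equity / peak - 1
    last_peak_index = 0     # period at which the current high-water mark was set
    max_dd_duration = 0     # longest run of periods spent below a high-water mark
    for i, r in enumerate(daily_returns, start=1):
        equity *= 1.0 + r   # compounding at constant leverage
        if equity >= peak:
            peak, last_peak_index = equity, i
        else:
            max_dd = min(max_dd, equity / peak - 1.0)
            max_dd_duration = max(max_dd_duration, i - last_peak_index)
    years = n / periods_per_year
    cagr = equity ** (1.0 / years) - 1.0
    mean = sum(daily_returns) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in daily_returns) / (n - 1))
    sharpe = math.sqrt(periods_per_year) * mean / std if std > 0 else float("nan")
    calmar = cagr / abs(max_dd) if max_dd < 0 else float("inf")
    return {"cagr": cagr, "sharpe": sharpe, "calmar": calmar,
            "max_drawdown": max_dd, "max_drawdown_duration": max_dd_duration}
```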
CAGR is a bit different from average annualized returns: It assumes we do not transfer cash into or out of an account each time period, while maintaining the same leverage throughout. To take an extreme example, if a strategy returns 1 percent per trading day, CAGR will be 1.01^252 − 1 ≈ 1127 percent. That's compounding at work. On the other hand, the average annualized return would be just 252 × 0.01 = 252 percent, and it is the return we would get if we withdrew profit or added cash to make up for losses each time period. But I emphasize that we must keep the leverage constant to achieve this compounded growth. In other words, your positions or orders must be resized each day based on the account equity and the leverage, something that an automated trading program should be able to do quite easily.
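The arithmetic in this paragraph, and the role of daily resizing, can be checked with a short simulation. The function below is a hypothetical sketch: it either resizes the position to leverage × equity after every period (constant leverage, hence compounding) or keeps the initial position size throughout, which earns the same dollar P&L each period as withdrawing profits or topping up losses would.

```python
def final_equity(daily_returns, leverage=1.0, resize=True, equity=1.0):
    """Grow an account through a series of per-period instrument returns."""
    position = leverage * equity          # gross position value
    for r in daily_returns:
        equity += position * r            # P&L for the period
        if resize:
            position = leverage * equity  # keep leverage constant
    return equity


one_year = [0.01] * 252
cagr = final_equity(one_year) - 1.0                   # about 11.27, i.e., 1127 percent
simple = final_equity(one_year, resize=False) - 1.0   # 2.52, i.e., 252 percent
```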
In a backtest, I recommend we set the leverage to one. Otherwise, the higher the leverage we use, the higher the CAGR will be, up to a point determined by the Kelly formula below. So it is quite meaningless to pick an arbitrary leverage for a backtest. But to ensure that the leverage is one, we must measure returns by taking the Profit and Loss (P&L) and dividing it by the total gross market value of the position(s). For example, if we are long $100 of stock A while short $100 of stock B, and the P&L is $1, the unlevered return is just 0.5 percent.
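The unlevered‐return convention amounts to one division by gross market value. A minimal sketch, with a hypothetical helper `unlevered_return` that takes signed position values (positive for long, negative for short):

```python
def unlevered_return(pnl: float, position_values) -> float:
    """P&L divided by the gross market value (sum of absolute position values)."""
    gross = sum(abs(v) for v in position_values)
    return pnl / gross


# Long $100 of stock A and short $100 of stock B, with a P&L of $1
r = unlevered_return(1.0, [100.0, -100.0])   # 0.005, i.e., 0.5 percent
```

Dividing the same $1 by, say, $100 of account equity would instead give 1 percent, a return levered twice, which is why the gross market value rather than the equity belongs in the denominator here.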
I have written a lot before about using the Kelly formula (Chan, 2009):
