Page count: 269
Year of publication: 2015
Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author nor Packt Publishing, nor its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: May 2015
Production reference: 1190515
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78398-510-4
www.packtpub.com
Author
Michael Heydt
Reviewers
James Beveridge
Philipp Deutsch
Jon Gaither
Jim Holmström
Francesco Pochetti
Commissioning Editor
Kartikey Pandey
Content Development Editor
Merwyn D'souza
Technical Editor
Shashank Desai
Copy Editor
Sarang Chari
Project Coordinator
Neha Bhatnagar
Proofreaders
Stephen Copestake
Safis Editing
Indexer
Mariammal Chettiyar
Graphics
Sheetal Aute
Disha Haria
Production Coordinator
Conidon Miranda
Cover Work
Conidon Miranda
Michael Heydt is an independent consultant, educator, and trainer with nearly 30 years of professional software development experience, during which time he has focused on Agile software design and implementation using advanced technologies in multiple verticals, including media, finance, energy, and healthcare. He holds an MS degree in mathematics and computer science from Drexel University and an executive master's degree in technology management from the University of Pennsylvania's School of Engineering and Wharton Business School. His studies and research have focused on technology management, software engineering, entrepreneurship, information retrieval, data sciences, and computational finance. Since 2005, he has specialized in building energy and financial trading systems for major investment banks on Wall Street and for several global energy-trading companies, using .NET, C#, WPF, TPL, DataFlow, Python, R, Mono, iOS, and Android. His current interests include creating seamless applications using desktop, mobile, and wearable technologies that utilize high-concurrency, high-availability, and real-time data analytics; augmented and virtual reality; cloud services; messaging; computer vision; natural user interfaces; and software-defined networks. He is the author of numerous technology articles, papers, and books. He is a frequent speaker at .NET user groups and various mobile and cloud conferences, and he regularly delivers webinars and conducts training courses on emerging and advanced technologies. To know more about Michael, visit his website at http://bseamless.com/.
James Beveridge is a product analyst and machine learning specialist. He earned his BS degree in mathematics from Cal Poly, San Luis Obispo, CA. He has worked with the finance and analytics teams in technology and marketing companies in the Bay Area, Chicago, and New York. His current work focuses on segmentation and classification modeling, statistics, and product development. He has enjoyed contributing to this book as a technical reviewer.
Philipp Deutsch obtained degrees in mathematics and physics from the University of Vienna and the Vienna University of Technology before starting a career in financial services and consulting. He has worked on a number of projects involving data analytics across Europe, both in the banking and consumer/retail sectors, and has extensive experience in Python, R, and SQL. He currently lives in London.
Jon Gaither is a senior information systems student at Clemson University with a background in finance. He started learning Python during his sophomore year of college. Since then, he has dabbled in frameworks such as Flask, Django, and pandas purely out of interest. Outside of Python, Jon has studied Java, SAS, VBA, and SQL. His professional experience comes from internships in financial services and satellite communications.
Jim Holmström is soon to graduate with a bachelor's degree in engineering physics and a master's degree in machine learning from KTH Royal Institute of Technology, Stockholm.
He is currently a developer and partner at Watty—an electricity data analysis start-up that creates a breakdown of a household's energy spending from the total electricity consumption data. Watty's leading-edge technology stack has pandas as an integral part.
Both professionally and in his free time, he enjoys data analysis, functional programming, and well-structured code.
For more information, visit http://portfolio.jim.pm.
Francesco Pochetti graduated in physical chemistry in Rome in 2012 and was employed at Avio in Italy. He worked there for 2 years as a solid rocket propellant specialist, taking care of the formulation and development of rocket fuels for both military and aerospace purposes. In July 2014, he moved to Berlin to attend Data Science Retreat—a 3-month boot camp in data analysis and machine learning in Python and R. After this short German experience, he ended up at Amazon in Luxembourg, where he currently works as a business analyst for Kindle content.
In his spare time, he likes to read and play around with several programming languages, Python being among his preferred ones. You can follow him and his data-related projects at http://francescopochetti.com/.
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.
Mastering pandas for Finance will teach you how to model and solve real-world financial problems using Python, pandas, and several open source tools that assist with various financial tasks, such as option pricing and algorithmic trading.
This book brings together diverse concepts in finance to provide a unified reference for discovering and learning them, and it explains how to implement each one using a core of Python and pandas that provides a consistent experience across the different models and tools.
You will start by learning about the facilities pandas provides to model financial information, specifically time-series data, and how to use its built-in capabilities to manipulate time-series data, group data and derive aggregate results, and calculate common financial measurements, such as percentage changes, correlations between time series, and various moving-window operations. You will also learn to create key data visualizations for finance.
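A minimal sketch of three of the measurements mentioned above (percentage change, correlation between two return series, and a moving-window average). It is written against the current pandas API rather than the pandas 0.15.1 used in the book, and the tickers AAA and BBB and all prices are hypothetical values made up for illustration.

```python
import pandas as pd

# Hypothetical closing prices for two made-up tickers over six business days.
dates = pd.date_range("2015-01-05", periods=6, freq="B")
prices = pd.DataFrame(
    {
        "AAA": [100.0, 102.0, 101.0, 104.0, 103.0, 106.0],
        "BBB": [50.0, 50.5, 51.0, 50.0, 51.0, 52.0],
    },
    index=dates,
)

# Daily percentage change: p_t / p_(t-1) - 1. The first row has no prior
# price, so it is NaN.
returns = prices.pct_change()

# Correlation of the two return series (rows containing NaN are ignored).
corr = returns["AAA"].corr(returns["BBB"])

# Three-day simple moving average; the first two rows lack a full window
# and are therefore NaN.
sma3 = prices.rolling(window=3).mean()

print(returns.round(4))
print("correlation:", round(corr, 4))
print(sma3)
```

The same three calls work unchanged on real quote data once it is loaded into a DataFrame indexed by date.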
After establishing a strong foundation in using pandas to model financial time-series data, the book turns to using pandas as a tool to model the data required as a base for other financial calculations. The book covers diverse areas in which pandas can assist, including correlating Google Trends data with stock movements, creating algorithmic trading systems, and calculating options payoffs, prices, and behaviors. It also shows how to model portfolios and their risk, and how to optimize them for specific risk/return tolerances.
Chapter 1, Getting Started with pandas Using Wakari.io, walks you through using Wakari.io, an online collaborative data analytics platform that utilizes Python, IPython Notebook, and pandas. We will start with a brief overview of Wakari.io and then step through how to upgrade the default Python environment and install all of the tools used throughout this text. At the end, you will have a fully functional financial analytics platform supporting all of the examples we will cover.
Chapter 2, Introducing the Series and DataFrame, teaches you about the core pandas data structures: the Series and the DataFrame. You will learn how a Series expands on the functionality of the NumPy array to provide much richer representation and manipulation of sequences of data through the use of high-performance indices. You will then learn about the pandas DataFrame and how to use it to model two-dimensional tabular data.
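The difference between a NumPy array and a Series shows up in just a few lines. This sketch uses the current pandas API, where `loc` and `iloc` are the label-based and position-based accessors; the prices and volumes are hypothetical.

```python
import numpy as np
import pandas as pd

# A plain NumPy array: values are addressed only by integer position.
arr = np.array([101.5, 102.25, 100.75])

# The same values in a Series indexed by date (hypothetical closing prices).
dates = pd.to_datetime(["2015-01-05", "2015-01-06", "2015-01-07"])
s = pd.Series(arr, index=dates, name="close")

# Label-based lookup through the index...
print(s.loc[pd.Timestamp("2015-01-06")])  # 102.25
# ...while positional access remains available through iloc.
print(s.iloc[0])  # 101.5

# Vectorized arithmetic keeps the index labels attached to the result.
print(s * 2)

# A DataFrame models two-dimensional tabular data: each column shares the
# Series' date index, and columns may hold different dtypes.
df = pd.DataFrame({"close": s, "volume": [1200, 900, 1500]})
print(df)
print(df.dtypes)
```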
Chapter 3, Reshaping, Reorganizing, and Aggregating, focuses on how to use pandas to group data, enabling you to perform aggregate operations on grouped data to assist with deriving analytic results. You will learn to reorganize, group, and aggregate stock data and to use grouped data to calculate simple risk measurements.
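As a rough illustration of the group-and-aggregate pattern, the sketch below splits hypothetical daily returns by ticker and computes the mean and sample standard deviation of each group. Using the standard deviation of returns as the "simple risk measurement" is an assumption made for this example; the tickers and values are made up.

```python
import pandas as pd

# Hypothetical daily returns for two made-up tickers in "long" format:
# one row per ticker per day.
data = pd.DataFrame(
    {
        "ticker": ["AAA", "AAA", "AAA", "BBB", "BBB", "BBB"],
        "ret": [0.010, -0.020, 0.015, 0.002, 0.001, -0.003],
    }
)

# Split the rows by ticker, then aggregate each group's returns.
# The sample standard deviation of returns acts as a simple risk measure.
summary = data.groupby("ticker")["ret"].agg(["mean", "std"])
print(summary)
```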
Chapter 4, Time-series, explains how to use pandas to represent sequences of pricing data that are indexed by the progression of time. You will learn how pandas represents date and time as well as concepts such as periods, frequencies, time zones, and calendars. The focus then shifts to learning how to model time-series data with pandas and to perform various operations such as shifting, lagging, resampling, and moving window operations.
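A short sketch of three of the time-series operations named above (shifting, resampling, and time zone handling), using the current pandas API and ten business days of hypothetical prices. The choice of Friday-ending weeks and the America/New_York time zone are assumptions for the example.

```python
import pandas as pd

# Ten business days of hypothetical closing prices (Mon 2015-01-05 onward).
idx = pd.date_range("2015-01-05", periods=10, freq="B")
close = pd.Series(range(100, 110), index=idx, dtype="float64")

# Shifting (lagging) moves each value forward one period, so every row lines
# up with the previous day's price; the first value becomes NaN.
prev_close = close.shift(1)

# Resampling to weekly frequency (weeks ending Friday), keeping the last
# price observed in each week.
weekly_close = close.resample("W-FRI").last()

# Attaching a time zone to naive timestamps, then converting to UTC.
close_ny = close.tz_localize("America/New_York")
close_utc = close_ny.tz_convert("UTC")

print(prev_close.head(3))
print(weekly_close)
print(close_utc.index[:2])
```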
Chapter 5, Time-series Stock Data, leads you through retrieving historical stock quotes from Yahoo! Finance and performing various financial calculations on them. You will learn to retrieve quotes and calculate percentage changes, cumulative returns, moving averages, and volatility, and you will finish with demonstrations of several analysis techniques, including return distributions, correlation, and least squares analysis.
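Cumulative returns and volatility can be sketched in a few lines. The prices below are hypothetical, and annualizing with the square root of 252 trading days is a common convention assumed here rather than something taken from the chapter.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices (illustrative values only).
close = pd.Series(
    [100.0, 101.0, 99.0, 102.0, 104.0],
    index=pd.date_range("2015-01-05", periods=5, freq="B"),
)

daily_ret = close.pct_change()

# Cumulative return: compound the daily returns. The result on the last day
# equals close / first close - 1 (up to floating-point rounding).
cum_ret = (1 + daily_ret).cumprod() - 1

# Annualized volatility: standard deviation of daily returns scaled by
# sqrt(252), assuming roughly 252 trading days per year.
ann_vol = daily_ret.std() * np.sqrt(252)

print(cum_ret)
print("annualized volatility:", round(ann_vol, 4))
```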
Chapter 6, Trading Using Google Trends, demonstrates how to correlate index data with trends in Google searches. You will learn how to gather index data from Quandl along with trend data from Google, how to correlate these time series, and how to use that information to generate trade signals, which are then used to measure the effectiveness of the trading strategy compared to actual market performance.
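The flow from trend data to trade signals to a performance comparison can be sketched with synthetic weekly data. The signal rule here is an illustrative assumption, not necessarily the chapter's exact strategy: if this week's search interest is above the mean of the previous three weeks, go short (-1) for the next week, otherwise go long (+1). All series values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical weekly search-interest values and index closes.
weeks = pd.date_range("2015-01-02", periods=8, freq="W-FRI")
trend = pd.Series([50, 52, 49, 60, 58, 47, 45, 55], index=weeks, dtype="float64")
index_close = pd.Series(
    [2000, 2010, 2005, 1990, 1985, 2015, 2030, 2020], index=weeks, dtype="float64"
)

weekly_ret = index_close.pct_change()

# Correlation between weekly changes in search interest and index returns.
corr = trend.pct_change().corr(weekly_ret)

# Illustrative rule (an assumption): compare each week with the mean of the
# three prior weeks; 0 means "no position" until enough history exists.
prior_mean = trend.shift(1).rolling(window=3).mean()
signal = pd.Series(
    np.where(prior_mean.isna(), 0, np.where(trend > prior_mean, -1, 1)),
    index=weeks,
)

# Apply each week's signal to the following week's index return.
strategy_ret = (signal.shift(1) * weekly_ret).fillna(0)
strategy_cum = (1 + strategy_ret).prod() - 1
market_cum = index_close.iloc[-1] / index_close.iloc[0] - 1

print(signal.tolist())
print("strategy:", round(strategy_cum, 4), "market:", round(market_cum, 4))
```

Shifting the signal by one week before multiplying avoids using a week's search data to trade on that same week's return.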
Chapter 7, Algorithmic Trading, introduces you to the concepts of algorithmic trading through demonstrations of several trading strategies, including simple moving averages, exponentially weighted averages, crossovers, and pairs trading. You will then learn to implement these strategies with pandas data structures and to use Zipline, an open source backtesting tool, to simulate trading behavior on historical data.
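A moving-average crossover can be expressed directly with pandas data structures. This sketch uses hypothetical prices and arbitrarily chosen 2-day and 4-day windows; it only derives positions and trade points and does not use Zipline.

```python
import pandas as pd

# Hypothetical closing prices.
close = pd.Series(
    [10, 11, 12, 11, 10, 9, 10, 12, 13, 14],
    index=pd.date_range("2015-01-05", periods=10, freq="B"),
    dtype="float64",
)

# Short and long simple moving averages (window lengths chosen arbitrarily).
short_ma = close.rolling(window=2).mean()
long_ma = close.rolling(window=4).mean()

# Position: 1 (long) while the short average is above the long one, else 0.
# Comparisons involving NaN are False, so rows without a full window stay flat.
position = (short_ma > long_ma).astype(int)

# A crossover is where the position changes: +1 marks a buy, -1 a sell.
trades = position.diff().fillna(0)

print(pd.DataFrame({"close": close, "position": position, "trade": trades}))
```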
Chapter 8, Working with Options, teaches you to model and evaluate options. You will first learn briefly about options, how they function, and how to calculate their payoffs. You will then load options data from Yahoo! Finance into pandas data structures and examine various options attributes, such as implied volatility and volatility smiles and smirks. Finally, you will examine the pricing of options with the Black-Scholes model using Mibian and finish with an overview of the Greeks and how to calculate them using Mibian.
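To show what payoff and pricing calculations involve, here is a plain-Python sketch of expiration payoffs and the Black-Scholes formula for a European option on a non-dividend-paying stock. This is the textbook formula written out by hand, not Mibian's API; the function names are hypothetical, and the put price is derived from put-call parity.

```python
import math


def call_payoff(spot, strike):
    """Payoff of a long call at expiration: max(S - K, 0)."""
    return max(spot - strike, 0.0)


def put_payoff(spot, strike):
    """Payoff of a long put at expiration: max(K - S, 0)."""
    return max(strike - spot, 0.0)


def norm_cdf(x):
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _d1_d2(spot, strike, rate, vol, t):
    """The d1 and d2 terms of the Black-Scholes formula."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol**2) * t) / (vol * math.sqrt(t))
    return d1, d1 - vol * math.sqrt(t)


def bs_call(spot, strike, rate, vol, t):
    """Black-Scholes price of a European call (no dividends)."""
    d1, d2 = _d1_d2(spot, strike, rate, vol, t)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * t) * norm_cdf(d2)


def bs_put(spot, strike, rate, vol, t):
    """European put price from put-call parity: P = C - S + K * exp(-r * t)."""
    return bs_call(spot, strike, rate, vol, t) - spot + strike * math.exp(-rate * t)


def bs_call_delta(spot, strike, rate, vol, t):
    """Delta (one of the Greeks) of a European call: N(d1)."""
    d1, _ = _d1_d2(spot, strike, rate, vol, t)
    return norm_cdf(d1)


# At-the-money example: S = K = 100, r = 5%, volatility 20%, one year.
print(round(bs_call(100, 100, 0.05, 0.2, 1.0), 4))
print(round(bs_put(100, 100, 0.05, 0.2, 1.0), 4))
print(round(bs_call_delta(100, 100, 0.05, 0.2, 1.0), 4))
```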
Chapter 9, Portfolios and Risk, will teach you how to model portfolios of multiple stocks using pandas. You will learn about the concepts of Modern Portfolio Theory and how to apply those theories with pandas and Python to calculate the risk and returns of a portfolio, assign different weights to different instruments in a portfolio, derive the Sharpe ratio, calculate efficient frontiers and value at risk, and optimize portfolio instrument allocation.
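A minimal sketch of the core portfolio quantities: weighted portfolio returns, portfolio volatility from the covariance matrix (the square root of w^T Σ w), a Sharpe ratio, and a historical value at risk. The returns and weights are hypothetical. The zero risk-free rate, the 252-day annualization, and the 95% historical VaR are assumptions chosen for illustration. Efficient frontiers and optimization are not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns for two made-up assets.
returns = pd.DataFrame(
    {
        "AAA": [0.010, -0.005, 0.007, 0.002, -0.004],
        "BBB": [0.004, 0.001, -0.002, 0.003, 0.000],
    }
)
weights = np.array([0.6, 0.4])  # portfolio weights, summing to 1

# Portfolio daily return: weighted sum of the asset returns on each day.
port_ret = returns.dot(weights)

# Portfolio volatility from the covariance matrix: sqrt(w^T * Cov * w).
port_vol = np.sqrt(weights @ returns.cov().values @ weights)

# Annualized Sharpe ratio, assuming a zero risk-free rate and 252 trading days.
sharpe = port_ret.mean() / port_ret.std() * np.sqrt(252)

# One-day historical VaR at 95%: the loss at the 5th percentile of returns.
# Five observations are far too few in practice; this only shows the mechanics.
var_95 = -np.percentile(port_ret, 5)

print("volatility:", round(port_vol, 6))
print("Sharpe:", round(sharpe, 3), "VaR 95%:", round(var_95, 6))
```

The covariance-based volatility matches the standard deviation of `port_ret` computed directly, which is a useful sanity check when assembling a portfolio model.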
This book assumes that you have some familiarity with programming concepts, but even readers without programming experience, or without Python experience specifically, will be comfortable with the examples, as they focus on pandas constructs more than on Python or programming in general. The examples are based on Anaconda Python 2.7 and pandas 0.15.1. If you do not have either installed, guidance is provided in Chapter 1, Getting Started with pandas Using Wakari.io, on installing both on Windows, OS X, and Ubuntu systems. For those who prefer not to install any software, instructions are also given for using the Wakari.io online Python data analysis service. Data is either provided with the text or is available for download via pandas from data services such as Yahoo! Finance. We will also use several open source software packages, such as Zipline and Mibian, whose retrieval, installation, and usage will be explained in the appropriate chapters.
If you are interested in quantitative finance, financial modeling, and trading, or simply want to learn Python and pandas as applied to finance, then this book is for you. Some knowledge of Python and pandas is assumed, but the book spends time explaining all of the pandas concepts required in the context of their application to finance. Interest in financial concepts is helpful, but no prior knowledge is expected.
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.
To send us general feedback, simply send an e-mail to <[email protected]> and mention the book title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you. The code examples in the book are also publicly available on Wakari.io at https://wakari.io/sharing/bundle/Pandas4Finance/MasteringPandas4Finance_Index.
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated material.
We appreciate your help in protecting our authors, and our ability to bring you valuable content.
You can contact us at <[email protected]> if you are having a problem with any aspect of the book, and we will do our best to address it.
In Mastering pandas for Finance, we will examine the use of pandas to manage financial data and perform various financial analyses, with a specific focus on financial processes that can be facilitated by the capabilities of pandas, along with an occasional quantitative financial technique. I assume that you have basic knowledge of Python programming and have used IPython and IPython Notebooks. Knowledge of pandas is preferred, but we will cover enough information on pandas for any reader to understand the techniques being used. We will occasionally and briefly touch upon areas of quantitative finance, but mostly for informational purposes, and implementations for those areas are provided in the code of the text.
During this voyage of discovery, we will begin with an overview/review of the concepts and data structures in pandas that are important for financial analysis. We will then move into various concepts, techniques, tools, and examples of specific financial analysis problems as solved with Python, pandas, and several other Python libraries and tools, including Wakari, matplotlib, SciPy, Quandl, Zipline, and Mibian. The topics are varied, ranging from analysis of historical stock data, correlating search data with trends in stock prices, algorithmic trading and backtesting, and options modeling and pricing to portfolio and risk analysis.
In this first chapter, we will walk through creating an account and environment in Wakari.io and installing the code samples into that environment. I have chosen Wakari.io as a basis for a pandas-based financial environment because it is relatively painless to get up and running with all of the tools we will utilize, and also the samples provided in the code bundle of this book are in the IPython Notebook format, which is simple to use within Wakari.io.
The use of Wakari, however, does not prevent you from using your own Python environment. The examples in the text will run in any Python environment and were originally built using Anaconda and IPython Notebook with all of the mentioned tools installed. If you don't want to use Wakari, all the code examples in the text are presented as IPython code and will run in a properly configured IPython environment.
So, let's get started. In this chapter, we will cover the following topics:
Wakari (http://continuum.io/wakari) is a collaborative data analytics platform that allows you to explore data and create analytic scripts collaboratively using IPython Notebooks. It is an offering of Continuum Analytics, the creators of the Anaconda Python distribution, which is generally considered to be one of the best Python distributions. Wakari is offered either as a solution that you can run in your enterprise for a fee, or as a web- or cloud-based solution offered on a freemium basis. The following screenshot shows Wakari as an offering of Continuum Analytics:
The approach in this text will be to guide you in using the cloud-based Wakari solution. This environment provides an effective quick start to learning pandas and performing all the data analysis in this text, with minimal effort spent managing a local Python installation.
The cloud-based offering for Wakari is available at https://wakari.io. For convenience, from this point on, I will refer to Wakari.io as Wakari, but always know that I am referring to the cloud-based solution.
Wakari is a freemium service that allows you to run web-based Python distributions. Specifics on the free part of the freemium service can be found on the site, but all of the examples in this text can be run for free in the Wakari environment (at least at the time of writing this book). Wakari makes it easy to get started learning all of the concepts in this text, as well as many others.
The guidance in this chapter will take you through creating and setting up an online Python environment, which can run all of the examples in this book. To start, open your browser and enter https://wakari.io in the address bar. This will display the following page:
Sign up for a new account, and upon successful registration for the service, you will be presented with the following web interface to manage IPython Notebooks:
IPython Notebooks are a default feature in Wakari for developing Python applications. All the examples in this book were developed as IPython Notebooks, although the code can also be run sequentially in IPython or even plain Python. An advantage of IPython Notebooks is the ability to intermix Markdown with Python code within a semi-dynamic web page, which allows easy reuse of code and, perhaps more importantly, publishing of code on the Web.
As a matter of fact, you can find all the code files for this book on Wakari at https://wakari.io/sharing/bundle/Pandas4Finance/MasteringPandas4Finance_Index.
At the time of writing this book, the default Python environment provided by Wakari is Python 2.7.9, and more specifically, Anaconda 1.9.1 (all version numbers are as of the time of writing, so they may be newer when you read this). This is, in general, a good environment for what we want to accomplish in this book, although one package needs updating and several others need to be installed. In Wakari, pandas is currently at 0.16.0, which is satisfactory for our needs.
The packages that need updating or installing are matplotlib, tzlocal, html5lib, Quandl, Zipline, and Mibian.
We will go over each of these briefly and see how to install or update each one. In general, the update/install process will be performed using a shell within Wakari. One of Wakari's most useful features is the ability to run both interactive IPython sessions and operating system shells directly in the browser.
From a new environment within Wakari, you can open terminals using the Terminals tab. Click on the Terminals tab, and you will see the following screenshot, which represents a default IPython shell for your account (currently referred to as np18py27-1.9):
You can perform any Python programming within this web-based interface, including all of the examples in this book. However, the default Wakari environment needs a few updates and first-time installs to run all of the examples in the text.
We can perform updates to the environment by opening a shell. This can be performed by selecting Shell from the drop-down menu, along with np18py27-1.9, and pressing the +Tab button. After that, you will be presented with the following screenshot:
We are now in an OS shell that provides you with many options, including updating your Python environment, which we will now perform.
We need to update one package in the default Wakari environment: matplotlib. This is the graphics package we will use at various points in this book. For most purposes, the version in Wakari (1.3.1) is satisfactory, but the candlestick charts that we will create require updating matplotlib from 1.3.1 to a higher version. This is performed with the conda package manager using the conda update matplotlib command. When you issue this command, you will see something similar to the following in the terminal tab in your web browser:
The remainder of the packages need to be installed. All these installations follow the same process, although the commands differ slightly, alternating between pip and the conda package manager.
For time zone operations, tzlocal is used, and it is installed using pip, as shown here:
The samples do not use html5lib directly, but it is used indirectly by other libraries that we will use to read and parse data. We need to install it using conda, as shown here:
Quandl (https://www.quandl.com/) is a data provider whose data you can integrate into your applications via download or its API. The Python API, which we will use to access S&P 500 data, is free and can be installed using conda, as shown here:
Zipline is the backtesting/trading simulator that we will use. It is produced by Quantopian (https://www.quantopian.com/), a website that focuses on algorithmic trading and uses Zipline as one of its underlying technologies. Although Zipline is installed using conda, it requires the use of a different channel. Notice the slight variation in the use of conda to specify the Quantopian channel in the following screenshot:
The final package we need to install is Mibian, a small library that computes Black-Scholes option prices and their derivatives (the Greeks). It is installed using pip, as shown here:
We are now ready to run any of the sample Notebooks.
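Before opening the Notebooks, a quick import check can confirm that the packages above are visible to the interpreter. This is a hypothetical helper written for Python 3 (the book's Wakari environment was Python 2.7, where `importlib.util.find_spec` is unavailable). The module names are assumptions and can differ from install names; for example, older Quandl releases were imported as `Quandl` rather than `quandl`.

```python
import importlib.util

# Import names assumed from this chapter's package list; adjust if your
# installed versions use different module names.
REQUIRED = ["matplotlib", "tzlocal", "html5lib", "quandl", "zipline", "mibian"]


def check_modules(names):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules(REQUIRED).items():
        print(f"{name:12s} {'ok' if found else 'MISSING'}")
```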
To install the examples in Wakari, download the code bundle and unzip the files to a local directory. You will see a set of files as shown here:
To upload the files to Wakari, click on the upload files icon and drag the files into the Drag & Drop Here section of the web page:
Once dropped, click on the Upload Files button, and you will see the following files in your Wakari directory:
At this point, you should be able to open and run any of the Notebooks and even examine the data in the browser. As an example, the following screenshot demonstrates the Notebook for Chapter 2, Introducing the Series and DataFrame, opened in Wakari:
This chapter was a brief introduction to this book. You learned how to set up a Python environment in Wakari.io to run the code samples provided throughout the text. This included instructions on how to update the default Wakari.io Python environment with the packages required for all of the examples in the remainder of the text.
In the next chapter, we will dive into using pandas and its core data structures, Series and DataFrame. These will be central to representing data in later chapters, where we primarily use pandas DataFrame objects to hold the financial data to which we apply various financial analyses.
pandas provides a comprehensive set of data structures for working with and manipulating data and performing various statistical and financial analyses. The two primary data structures in pandas are Series and DataFrame. In this chapter, we will examine the Series object and how it extends a NumPy ndarray to provide operations such as indexed data retrieval, axis labeling, and automatic alignment. Then, we will move on to examine how DataFrame extends the capabilities of Series to represent columnar/tabular data whose columns can be of different data types.
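Automatic alignment is the feature that most distinguishes a Series from a bare ndarray: arithmetic matches values by index label, not by position. A minimal sketch with hypothetical tickers and values, using the current pandas API:

```python
import pandas as pd

# Two Series with partially overlapping labels (hypothetical tickers/values).
a = pd.Series([1.0, 2.0, 3.0], index=["AAA", "BBB", "CCC"])
b = pd.Series([10.0, 20.0, 30.0], index=["BBB", "CCC", "DDD"])

# Addition aligns on labels: the result covers the union of both indexes,
# and labels present in only one Series produce NaN.
total = a + b

# fill_value treats a label missing from one side as 0 instead of NaN.
total_filled = a.add(b, fill_value=0)

print(total)
print(total_filled)
```

With plain NumPy arrays, the same addition would pair values purely by position, silently adding AAA to BBB.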