Mastering Numerical Computing with NumPy - Umit Mert Cakmak - E-Book

Umit Mert Cakmak

Description

NumPy is one of the most important scientific computing libraries available for Python. Mastering Numerical Computing with NumPy teaches you how to achieve expert-level competency to perform complex operations, with in-depth coverage of advanced concepts.
Beginning with NumPy's arrays and functions, you will familiarize yourself with linear algebra concepts to perform vector and matrix math operations. You will thoroughly understand and practice data processing, exploratory data analysis (EDA), and predictive modeling. You will then move on to working on practical examples which will teach you how to use NumPy statistics in order to explore US housing data and develop a predictive model using simple and multiple linear regression techniques. Once you have got to grips with the basics, you will explore unsupervised learning and clustering algorithms, followed by understanding how to write better NumPy code while keeping advanced considerations in mind. The book also demonstrates the use of different high-performance numerical computing libraries and their relationship with NumPy. You will study how to benchmark the performance of different configurations and choose the best for your system.
By the end of this book, you will have become an expert in handling and performing complex data manipulations.

You can read the e-book in Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Page count: 214

Publication year: 2018




Mastering Numerical Computing with NumPy
Master scientific computing and perform complex operations with ease
Umit Mert Cakmak
Mert Cuhadaroglu
BIRMINGHAM - MUMBAI

Mastering Numerical Computing with NumPy

Copyright © 2018 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Pravin Dhandre
Acquisition Editor: Viraj Madhav
Content Development Editor: Snehal Kolte
Technical Editor: Dinesh Chaudhary
Copy Editor: Safis Editing
Project Coordinator: Manthan Patel
Proofreader: Safis Editing
Indexer: Priyanka Dhadke
Graphics: Tania Dutta
Production Coordinator: Arvindkumar Gupta

First published: June 2018

Production reference: 1270618

Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.

ISBN 978-1-78899-335-7

www.packtpub.com

mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

Improve your learning with Skill Plans built especially for you

Get a free eBook or video every month

Mapt is fully searchable

Copy and paste, print, and bookmark content

PacktPub.com

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Contributors

About the authors

Umit Mert Cakmak is a data scientist at IBM, where he excels at helping clients solve complex data science problems, from inception to delivery of deployable assets. His research spans multiple disciplines beyond his industry and he likes sharing his insights at conferences, universities, and meet-ups.

First and foremost, my heartiest thanks to my mother and father for their true love and support. I am grateful for the lessons that they have taught me. I would like to dedicate my writings to my family, friends, colleagues, and all the great people who relentlessly work to make the world a better place.

Mert Cuhadaroglu is a BI Developer at EPAM, developing E2E analytics solutions for complex business problems in various industries, mostly investment banking, FMCG, media, communication, and pharma. He consistently uses advanced statistical models and ML algorithms to provide actionable insights. Throughout his career, he has worked in several other industries, such as banking and asset management. He continues his academic research in AI for trading algorithms.

I am very grateful to my parents, who have always encouraged me to pursue knowledge. I would also like to thank my coauthor, friends, and the Packt team. I would like to dedicate my writings to my family, friends and all people who supported me in this endeavor.

About the reviewer

Tiago Antao is a computer scientist turned computational biologist with a PhD from the Liverpool School of Tropical Medicine in the UK. He is a co-author of Biopython, a major bioinformatics package written in Python.

In his career, he has worked at the University of Cambridge (UK) and the University of Oxford (UK), and is currently a research scientist at the University of Montana (USA). He is the author of Bioinformatics with Python Cookbook.

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Table of Contents

Title Page

Copyright and Credits

Mastering Numerical Computing with NumPy

Packt Upsell

Why subscribe?

PacktPub.com

Contributors

About the authors

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Working with NumPy Arrays

Technical requirements

Why do we need NumPy?

Who uses NumPy?

Introduction to vectors and matrices

Basics of NumPy array objects

NumPy array operations

Working with multidimensional arrays

Indexing, slicing, reshaping, resizing, and broadcasting

Summary

Linear Algebra with NumPy

Vector and matrix mathematics

What's an eigenvalue and how do we compute it?

Computing the norm and determinant

Solving linear equations

Computing gradient

Summary

Exploratory Data Analysis of Boston Housing Data with NumPy Statistics

Loading and saving files

Exploring our dataset

Looking at basic statistics

Computing histograms

Explaining skewness and kurtosis

Trimmed statistics

Box plots

Computing correlations

Summary

Predicting Housing Prices Using Linear Regression

Supervised learning and linear regression

Independent and dependent variables

Hyperparameters

Loss and error functions

Univariate linear regression with gradient descent

Using linear regression to model housing prices

Summary

Clustering Clients of a Wholesale Distributor Using NumPy

Unsupervised learning and clustering

Hyperparameters

The loss function

Implementing our algorithm for a single variable

Modifying our algorithm

Summary

NumPy, SciPy, Pandas, and Scikit-Learn

NumPy and SciPy

Linear regression with SciPy and NumPy

NumPy and pandas

Quantitative modeling with stock prices using pandas

SciPy and scikit-learn

K-means clustering in housing data with scikit-learn

Summary

Advanced NumPy

NumPy internals

How does NumPy manage memory?

Profiling NumPy code to understand the performance

Summary

Overview of High-Performance Numerical Computing Libraries

BLAS and LAPACK

ATLAS

Intel Math Kernel Library

OpenBLAS

Configuring NumPy with low-level libraries using AWS EC2

Installing BLAS and LAPACK

Installing OpenBLAS

Installing Intel MKL

Installing ATLAS

Compute-intensive tasks for benchmarking

Matrix decomposition

Singular-value decomposition

Cholesky decomposition

Lower-upper decomposition

Eigenvalue decomposition

QR decomposition

Working with sparse linear systems

Summary

Performance Benchmarks

Why do we need a benchmark?

Preparing for a performance benchmark

Performance with BLAS and LAPACK

Performance with OpenBLAS

Performance with ATLAS

Performance with Intel MKL

Results

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

Preface

If you are trying to hone your skills in the field of data science, there are many books and courses out there with varying levels of difficulty. What usually happens is that you start with introductory resources and then continue with more in-depth, technical ones to get a taste of a new field or technology. If you have followed this kind of learning path for some time, you must have realized that it becomes a very time-consuming journey. We, as lifelong learners, need books with a more compact representation of knowledge and experience, which requires the right balance between theory and practice. This book aims to bring beginner, intermediate, and advanced concepts together, and it is our humble effort to build up your knowledge from scratch.

This book assumes no previous background in scientific computing and introduces various subjects using practical examples. It may sometimes feel like separate topics pulled together randomly, with a flow that doesn't stick to one consistent path. This was a deliberate decision we made to give you a taste of several different topics and applications.

We hope that you will read this book to gain a broader overview of scientific computing as well as to master the nitty-gritty of NumPy and other supporting scientific libraries of Python such as SciPy and scikit-learn.

Who this book is for

This book is for everyone who would like to gain additional knowledge in the data science field. Mastering Numerical Computing with NumPy is for you if you are a Python programmer, data analyst, data engineer, or data science enthusiast who wants to master the intricacies of NumPy and build solutions for your numeric and scientific computational problems. You are expected to have familiarity with mathematics to get the most out of this book.

What this book covers

Chapter 1, Working with NumPy Arrays, explains the basics of numerical computing with NumPy, which is a Python library for working with multi-dimensional arrays and matrices used by scientific computing applications.

Chapter 2, Linear Algebra with NumPy, covers the basics of linear algebra and provides practical NumPy examples.

Chapter 3, Exploratory Data Analysis of Boston Housing Data with NumPy Statistics, explains exploratory data analysis and provides examples using the Boston Housing dataset.

Chapter 4, Predicting Housing Prices Using Linear Regression, covers supervised learning and provides a practical example for predicting housing prices using linear regression.

Chapter 5, Clustering Clients of a Wholesale Distributor Using NumPy, explains unsupervised learning and provides a practical example of a clustering algorithm to model a wholesale distributor sales dataset, which contains information on annual spending in monetary units for diverse product categories.

Chapter 6, NumPy, SciPy, Pandas, and Scikit-Learn, shows the relationship between NumPy and other libraries and provides examples of how they are used together.

Chapter 7, Advanced NumPy, explains the advanced considerations of NumPy library usage.

Chapter 8, Overview of High-Performance Numerical Computing Libraries, introduces several low-level, high-performance numerical computing libraries and their relationship with NumPy.

Chapter 9, Performance Benchmarks, takes a deep dive into the performance of NumPy algorithms depending on the underlying high-performance numerical computing libraries.

To get the most out of this book

Basic Python programming knowledge will definitely help, though it is not strictly necessary

Anaconda distribution for Python 3 will be enough to cover most of the examples used in this book

Download the example code files

You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

Log in or register at www.packtpub.com.

Select the SUPPORT tab.

Click on Code Downloads & Errata.

Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows

Zipeg/iZip/UnRarX for Mac

7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Mastering-Numerical-Computing-with-NumPy. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/MasteringNumericalComputingwithNumPy_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Another important parameter in this function is learning_rate."

A block of code is set as follows:

'sepal width (cm)','petal length (cm)','petal width (cm)'])

Any command-line input or output is written as follows:

$ sudo apt-get update

$ sudo apt-get upgrade

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Dependent is the variable that we want to predict."

Warnings or important notes appear like this.
Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.

Working with NumPy Arrays

Scientific computing is a multidisciplinary field, with its applications spanning across disciplines such as numerical analysis, computational finance, and bioinformatics.

Let's consider the case of financial markets. When you think about financial markets, there is a huge interconnected web of interactions. Governments, banks, investment funds, insurance companies, pension funds, individual investors, and others are involved in this exchange of financial instruments. You can't simply model all the interactions between market participants because everyone who is involved in financial transactions has different motives and different risk/return objectives. There are also other factors that affect the prices of financial assets. Even modeling one asset price requires a tremendous amount of work, and success is not guaranteed. In mathematical terms, the problem doesn't have a closed-form solution, which makes a great case for scientific computing, where you can use advanced computing techniques to attack such problems.

By writing computer programs, you gain the power to better understand the system you are working on. Usually, the program you write will be some sort of simulation, such as a Monte Carlo simulation. With a Monte Carlo simulation, you can, for example, model the price of option contracts. Pricing financial assets is good material for simulations, simply because of the complexity of financial markets. All of these mathematical computations need a powerful, scalable, and convenient structure for your data (which is mostly in matrix form). In other words, you need a more compact structure than a list in order to simplify your task. NumPy is a strong candidate for performant vector/matrix operations, and its extensive library of mathematical operations makes numeric computing easy and efficient.
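To make the idea concrete, here is a minimal sketch of a Monte Carlo estimate for the price of a European call option. It is not taken from the book: it assumes the asset follows geometric Brownian motion, and every parameter value and variable name (s0, strike, rate, sigma, maturity, n_paths) is an illustrative assumption.

```python
import numpy as np

# All parameter values below are illustrative assumptions.
s0 = 100.0      # current asset price
strike = 100.0  # option strike price
rate = 0.05     # annual risk-free interest rate
sigma = 0.2     # annual volatility
maturity = 1.0  # time to expiry in years
n_paths = 200_000

rng = np.random.default_rng(seed=42)

# Draw one standard normal shock per simulated path.
z = rng.standard_normal(n_paths)

# Terminal prices under geometric Brownian motion, computed for all paths at once.
s_t = s0 * np.exp((rate - 0.5 * sigma**2) * maturity + sigma * np.sqrt(maturity) * z)

# Discounted average payoff of a European call option.
payoff = np.maximum(s_t - strike, 0.0)
call_price = np.exp(-rate * maturity) * payoff.mean()

print(f"Estimated call price: {call_price:.2f}")
```

With these inputs, the estimate should land close to the Black-Scholes value of roughly 10.45, with a small sampling error that shrinks as n_paths grows. Note that the whole simulation is expressed as array operations, with no explicit Python loop over the paths.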

In this chapter, we will cover the following topics:

The importance of NumPy

Theoretical and practical information about vectors and matrices

NumPy array operations and their usage in multidimensional arrays

The question is, where should we start practicing coding skills? In this book, you will be using Python because of its huge adoption in the scientific community, and you will mainly work with a specific library called NumPy, which stands for numerical Python.

Technical requirements

In this book, we will use Jupyter Notebooks to edit and run Python code via a web browser. Jupyter is an open source platform that you can install by following the instructions at this link: http://jupyter.org/install.

This book uses Python 3.x, so when you open a new notebook, you should pick the Python 3 kernel. Alternatively, you can install Jupyter Notebook using Anaconda (Python version 3.6), which is highly recommended. You can install it by following the instructions at this link: https://www.anaconda.com/download/.
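Once the environment is set up, a quick check such as the following can confirm which Python and NumPy versions the notebook kernel is running. This is only a small sketch; the variable name is_python3 is an illustrative choice.

```python
import sys

import numpy as np

# The book targets Python 3.x, so check the major version of the running kernel.
print("Python version:", sys.version.split()[0])
print("NumPy version:", np.__version__)

is_python3 = sys.version_info.major == 3
print("Running on Python 3:", is_python3)
```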

Why do we need NumPy?

Python has become a rockstar programming language recently, not only because of its friendly syntax and readability, but also because it can be used for a variety of purposes. Python's ecosystem of libraries makes many kinds of computation relatively easy for programmers. Stack Overflow is one of the most popular websites for programmers. Users ask questions there and tag them with the programming languages they relate to. The following figure shows the growth of major programming languages, obtained by counting these tags and plotting the popularity of each language over the years. The research conducted by Stack Overflow can be explored further on their official blog (https://stackoverflow.blog/2017/09/06/incredible-growth-python/):

Growth of major programming languages

NumPy is the most fundamental package for scientific computing in Python and is the base for many other packages. Python was not initially designed for numerical computing; the need arose in the late 1990s, when Python started to become popular among engineers and programmers who needed faster vector operations. As you can see from the following figure, many popular machine learning and computational packages use some of NumPy's features, and most importantly, they use NumPy arrays heavily in their methods, which makes NumPy an essential library for scientific projects.

The figure shows some well-known libraries which use NumPy features:

NumPy stack

For numerical computing, you mainly work with vectors and matrices. You can manipulate them in different ways by using a range of mathematical functions. NumPy is a good fit for these kinds of situations since it allows users to complete their computations efficiently. Even though Python lists are very easy to create and manipulate, they don't support vectorized operations. Python lists have no fixed element type, so a for loop over a list, for example, is not very efficient because the data type needs to be checked at every iteration. In a NumPy array, however, the data type is fixed, and vectorized operations are supported. NumPy is not just more efficient than Python lists in multidimensional array operations; it also provides many mathematical methods that you can apply as soon as it's imported. NumPy is a core library of the scientific Python stack.
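The difference between a list and an array can be seen in a short sketch like the one below. The values and variable names (prices, doubled_list, doubled_arr) are made up for illustration.

```python
import numpy as np

prices = [10.0, 12.5, 9.75, 11.2]

# Pure Python: an explicit loop (here a list comprehension) visits each element.
doubled_list = [p * 2 for p in prices]

# NumPy: one vectorized expression is applied to the whole array.
prices_arr = np.array(prices)
doubled_arr = prices_arr * 2

print(doubled_list)
print(doubled_arr)

# Every element of the array shares one fixed data type.
print(prices_arr.dtype)

# Multiplying a list by an integer repeats the list instead of scaling it.
repeated = prices * 2
print(len(repeated))
```

Both approaches produce the same doubled values, but the NumPy version expresses the operation once for the whole array, which is what allows it to run in optimized compiled code rather than in the Python interpreter loop.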

SciPy has a strong relationship with NumPy, as it uses NumPy multidimensional arrays as the base data structure for its scientific functions for linear algebra, optimization, interpolation, integration, FFT, signal and image processing, and others. SciPy was built on top of the NumPy array framework and uplifted scientific programming with its advanced mathematical functions. As a result, some parts of the NumPy API have been moved to SciPy. This relationship with NumPy makes SciPy more convenient for advanced scientific computing in many cases.

To sum up, NumPy's advantages are as follows:

It's open source and zero-cost

It builds on Python, a high-level programming language with user-friendly syntax

It's more efficient than Python lists

It has more advanced built-in functions and is well-integrated with other libraries

Who uses NumPy?

In both academic and business circles, you will hear people talking about the tools and technologies they use in their work. Depending on the environment and conditions, you might need to work with specific technologies. For example, if your company has already invested in SAS, you will need to carry out your project in the SAS development environment suited to your problem.

However, one of the advantages of NumPy is that it's open source, and it costs nothing to use it in your project. If you have already coded in Python, it's easy to learn. If performance is your concern, you can embed C or Fortran code. Moreover, it will introduce you to a whole other set of libraries, such as SciPy and scikit-learn, which you can use to solve a wide range of problems.

Since data mining and predictive analytics have become really important recently, roles such as Data Scientist and Data Analyst are mentioned among the hottest jobs of the 21st century in many business publications, such as Forbes and Bloomberg. People who need to work with data and do analysis, modeling, or forecasting should become familiar with NumPy's usage and capabilities, as it will help them quickly prototype and test their ideas. If you are a working professional, your firm most probably wants to use data analysis methods in order to move one step ahead of its competitors. If a firm can better understand the data it has, it can understand the business better, and this will lead it to make better decisions. NumPy plays a critical role here, as it is capable of performing a wide range of operations and making your projects time-efficient.

Basics of NumPy array objects

As mentioned in the preceding section, what makes NumPy special is the usage of multidimensional arrays called ndarrays. All ndarray items are homogeneous and use the same size in memory. Let's start by importing NumPy and analyzing the structure of a NumPy array object by creating the array. You can easily import this library by typing the following statement into your console. You can use any naming convention instead of np, but in this book, np