Hands-On Data Science with Anaconda - Yuxing Yan - E-Book


Yuxing Yan

Description

Anaconda is an open source platform that brings together the best tools for data science professionals with more than 100 popular packages supporting Python, Scala, and R languages. Hands-On Data Science with Anaconda gets you started with Anaconda and demonstrates how you can use it to perform data science operations in the real world.
The book begins with setting up the environment for Anaconda platform in order to make it accessible for tools and frameworks such as Jupyter, pandas, matplotlib, Python, R, Julia, and more. You’ll walk through package manager Conda, through which you can automatically manage all packages including cross-language dependencies, and work across Linux, macOS, and Windows. You’ll explore all the essentials of data science and linear algebra to perform data science tasks using packages such as SciPy, contrastive, scikit-learn, Rattle, and Rmixmod.
Once you’re accustomed to all this, you’ll start with operations in data science such as cleaning, sorting, and data classification. You’ll move on to learning how to perform tasks such as clustering, regression, prediction, and building machine learning models and optimizing them. In addition to this, you’ll learn how to visualize data using the packages available for Julia, Python, and R.

The e-book can be read in Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Number of pages: 267

Year of publication: 2018




Hands-On Data Science with Anaconda
Utilize the right mix of tools to create high-performance data science applications
Dr. Yuxing Yan
James Yan
BIRMINGHAM - MUMBAI

Hands-On Data Science with Anaconda

Copyright © 2018 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Pravin Dhandre
Acquisition Editor: Divya Poojari
Content Development Editor: Dattatraya More
Technical Editor: Nirbhaya Shaji
Copy Editor: Safis Editing
Project Coordinator: Shweta H Birwatkar
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Graphics: Jisha Chirayil
Production Coordinator: Shantanu Zagade

First published: May 2018

Production reference: 1300518

Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.

ISBN 978-1-78883-119-2

www.packtpub.com

To my bosses: Mark Zaporowski (Canisius College), K.G. Viswanathan (Hofstra University), Lisa Fairchild (Loyola University), John Doe (Wharton School), David Ding (Nanyang Technological University), and Ben Amoako-Adu (Wilfrid Laurier University).
– Yuxing Yan
To my dad, mom, and sister.
– James Yan
mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

Improve your learning with Skill Plans built especially for you

Get a free eBook or video every month

Mapt is fully searchable

Copy and paste, print, and bookmark content

PacktPub.com

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Contributors

About the authors

Dr. Yuxing Yan graduated from McGill University with a PhD in finance. He has taught various finance courses at eight universities in Canada, Singapore, and the U.S. He has published 23 research and teaching-related papers, and is the author of 6 books. Two of his recent publications are Python for Finance and Financial Modelling using R. He is well-versed in R, Python, SAS, MATLAB, Octave, and C. In addition, he is an expert on financial data analytics.

I thank Ben Amoako-Adu, Brian Smith, Jin-Chun Duan, Jerome Detemple, Lawrence Kryzanowski, Chris Schull, Mark Keintz, Dong Xu, Eric Zhu, Paul Ratnaraj, Premal Vora, Shuguang Zhang, Mireia Gine, Shaojun Zhang, Qian Sun, Shaobo Ji, Xing Zhang, Changwen Miao, Karyl Leggio, K. G. Viswanathan, Mark Lennon, Qiyu Zhang, Xiaoning (my wife), Jing (my daughter) and James (my son) for their help and support.

James Yan is an undergraduate student at the University of Toronto (UofT), currently double-majoring in computer science and statistics. He has hands-on knowledge of Python, R, Java, MATLAB, and SQL. During his study at UofT, he has taken many related courses, such as Methods of Data Analysis I and II, Methods of Applied Statistics, Introduction to Databases, Introduction to Artificial Intelligence, and Numerical Methods, including a capstone course on AI in clinical medicine.

About the reviewer

Justin (Byung Uk) Lee completed his BA and master's in computer science at KAIST. He developed Korean Windows CE 1.0 and 2.0 at Microsoft while working for LG Electronics. Later, he ran his own business for more than 7 years, which proposed custom-tailored financial portfolios derived from data analysis. He then worked for several life and non-life insurers, including Samsung Life as a CMO and CSMO conducting CRM-based marketing. Currently, he intensively researches machine learning based big data finance analysis and financial applications using blockchain.

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Table of Contents

Title Page

Copyright and Credits

Hands-On Data Science with Anaconda

Dedication

Packt Upsell

Why subscribe?

PacktPub.com

Contributors

About the authors

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Ecosystem of Anaconda

Introduction

Reasons for using Jupyter via Anaconda

Using Jupyter without pre-installation

Miniconda

Anaconda Cloud

Finding help

Summary

Review questions and exercises

Anaconda Installation

Installing Anaconda

Anaconda for Windows

Testing Python

Using IPython

Using Python via Jupyter

Introducing Spyder

Installing R via Conda

Installing Julia and linking it to Jupyter

Installing Octave and linking it to Jupyter

Finding help

Summary

Review questions and exercises

Data Basics

Sources of data

UCI machine learning

Introduction to the Python pandas package

Several ways to input data

Inputting data using R

Inputting data using Python

Introduction to the Quandl data delivery platform

Dealing with missing data

Data sorting

Slicing and dicing datasets

Merging different datasets

Data output

Introduction to the cbsodata Python package

Introduction to the datadotworld Python package

Introduction to the haven and foreign R packages

Introduction to the dslabs R package

Generating Python datasets

Generating R datasets

Summary

Review questions and exercises

Data Visualization

Importance of data visualization

Data visualization in R

Data visualization in Python

Data visualization in Julia

Drawing simple graphs

Various bar charts, pie charts, and histograms

Adding a trend

Adding legends and other explanations

Visualization packages for R

Visualization packages for Python

Visualization packages for Julia

Dynamic visualization

Saving pictures as pdf

Saving dynamic visualization as HTML file

Summary

Review questions and exercises

Statistical Modeling in Anaconda

Introduction to linear models

Running a linear regression in R, Python, Julia, and Octave

Critical value and the decision rule

F-test, critical value, and the decision rule

An application of a linear regression in finance

Dealing with missing data

Removing missing data

Replacing missing data with another value

Detecting outliers and treatments

Several multivariate linear models

Collinearity and its solution

A model's performance measure

Summary

Review questions and exercises

Managing Packages

Introduction to packages, modules, or toolboxes

Two examples of using packages

Finding all R packages

Finding all Python packages

Finding all Julia packages

Finding all Octave packages

Task views for R

Finding manuals

Package dependencies

Package management in R

Package management in Python

Package management in Julia

Package management in Octave

Conda – the package manager

Creating a set of programs in R and Python

Finding environmental variables

Summary

Review questions and exercises

Optimization in Anaconda

Why optimization is important

General issues for optimization problems

Expressing various kinds of optimization problems as LPP

Quadratic optimization

Optimization in R

Optimization in Python

Optimization in Julia

Optimization in Octave

Example #1 – stock portfolio optimization

Example #2 – optimal tax policy

Packages for optimization in R

Packages for optimization in Python

Packages for optimization in Octave

Packages for optimization in Julia

Summary

Review questions and exercises

Unsupervised Learning in Anaconda

Introduction to unsupervised learning

Hierarchical clustering

k-means clustering

Introduction to Python packages – scipy

Introduction to Python packages – contrastive

Introduction to Python packages – sklearn (scikit-learn)

Introduction to R packages – rattle

Introduction to R packages – randomUniformForest

Introduction to R packages – Rmixmod

Implementation using Julia

Task view for Cluster Analysis

Summary

Review questions and exercises

Supervised Learning in Anaconda

A glance at supervised learning

Classification

The k-nearest neighbors algorithm

Bayes classifiers

Reinforcement learning

Implementation of supervised learning via R

Introduction to RTextTools

Implementation via Python

Using the scikit-learn (sklearn) module

Implementation via Octave

Implementation via Julia

Task view for machine learning in R

Summary

Review questions and exercises

Predictive Data Analytics – Modeling and Validation

Understanding predictive data analytics

Useful datasets

The AppliedPredictiveModeling R package

Time series analytics

Predicting future events

Seasonality

Visualizing components

R package – LiblineaR

R package – datarobot

R package – eclust

Model selection

Python package – model-catwalk

Python package – sklearn

Julia package – QuantEcon

Octave package – ltfat

Granger causality test

Summary

Review questions and exercises

Anaconda Cloud

Introduction to Anaconda Cloud

Jupyter Notebook in depth

Formats of Jupyter Notebook

Sharing of notebooks

Sharing of projects

Sharing of environments

Replicating others' environments locally

Downloading a package from Anaconda

Summary

Review questions and exercises

Distributed Computing, Parallel Computing, and HPCC

Introduction to distributed versus parallel computing

Task view for parallel processing

Sample programs in Python

Understanding MPI

R package Rmpi

R package plyr

R package parallel

R package snow

Parallel processing in Python

Parallel processing for word frequency

Parallel Monte-Carlo options pricing

Compute nodes

Anaconda add-on

Introduction to HPCC

Summary

Review questions and exercises

References

Chapter 01: Ecosystem of Anaconda

Chapter 02: Anaconda Installation

Chapter 03: Data Basics

Chapter 04: Data Visualization

Chapter 05: Statistical Modeling in Anaconda

Chapter 06: Managing Packages

Chapter 07: Optimization in Anaconda

Chapter 08: Unsupervised Learning in Anaconda

Chapter 09: Supervised Learning in Anaconda

Chapter 10: Predictive Data Analytics – Modelling and Validation

Chapter 11: Anaconda Cloud

Chapter 12: Distributed Computing, Parallel Computing, and HPCC

Other Books You May Enjoy

Leave a review - let other readers know what you think

Preface

Anaconda is an open source data science platform that brings the best tools for data science together. It is a data science stack that includes more than 100 popular packages based on Python, Scala, and R. With the help of its package manager, conda, users can work with hundreds of packages in different languages and perform data preprocessing, modeling, clustering, classification, and validation with ease.

This book will get you started with Anaconda and show how you can use it to perform data science operations in the real world. You will start by setting up the environment for the Anaconda platform and Jupyter, and by installing the relevant packages. You will then cover the basics of data science and linear algebra for performing data science tasks. Once you are ready to go, you will start with data science operations such as cleaning, sorting, and data classification. You will then learn how to perform tasks such as clustering, regression, prediction, building machine learning models, and optimizing them. You will also learn how to visualize data and share your projects.

During this course, you will learn how to use different packages, using Anaconda to get the best results. You will learn how to efficiently use conda — the package, dependency, and environment manager for Anaconda. You will also be introduced to several powerful features of Anaconda, such as additional projects, project add-ons, shared project drives, and powerful compute nodes that are available in the paid version for accomplishing advanced data handling processes. You will learn how to build scalable and functionally efficient packages, and how to perform heterogeneous data exploration, distributed computing, and more. You will learn to discover and share packages, notebooks, and environments to increase productivity. You will also learn about Anaconda Accelerate, a feature that can help you to achieve SLAs easily and optimize computational power.

In this book, we introduce four programming languages: R, Python, Octave, and Julia. There are several reasons for doing so. Firstly, all four are open source, which is one of the future trends. Secondly, one of the most obvious advantages of using the Anaconda platform is that it allows us to implement many programs written in different languages. However, for many new readers, learning four languages at the same time would be quite challenging. The best strategy is to focus on R and Python first. After a while, or after finishing the whole book, learn Octave or Julia on a second reading.

R: This is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows, and macOS. We think that R might be the easiest of many good computer languages, especially those that offer free software. The author has published a book entitled Financial Modeling using R; you can refer to its Amazon link at http://canisius.edu/~yany/webs/amazon2018R.shtml.

Python: This is an interpreted high-level programming language for general-purpose programming. For business analytics/data science, Python is probably the number one choice out of many promising computer languages. In 2017, the author published a book entitled Python for Finance (second edition); you can refer to its Amazon link at http://canisius.edu/~yany/webs/amazonP4F2.shtml.

Octave: This is a piece of software featuring a high-level programming language, primarily intended for numerical computations. Octave helps with solving linear and nonlinear problems numerically, as well as performing other numerical experiments. Octave is also free. Its syntax is largely compatible with MATLAB, which is quite popular on Wall Street and in other industries.

Julia: This is a high-level, high-performance dynamic programming language for numerical computing. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library. Julia’s base library, largely written in Julia itself, also integrates mature, best-of-breed, open source C and Fortran libraries for linear algebra, random number generation, signal processing, and string processing.

Happy reading!

Who this book is for

Hands-On Data Science with Anaconda is for you if you are a developer who is looking for the best tools on the market to perform data science operations. It's also ideal for data analysts and data science professionals who want to improve the efficiency of their data science applications using the best libraries in multiple languages. Basic programming knowledge with R or Python and basic knowledge of linear algebra is expected.

What this book covers

Chapter 1, Ecosystem of Anaconda, introduces some basic concepts such as the reasons why we use Anaconda and the advantages of using a full-fledged Anaconda and/or its baby version, Miniconda. Then, it covers the use of Anaconda online, without installation. We also test a few simple programs, written in R, Python, Julia, and Octave.

Chapter 2, Anaconda Installation, shows how to install Anaconda, how to test whether the installation is successful, how to launch Jupyter and use it to launch Python, how to launch Spyder and R, and how to find help. Most of these concepts or procedures are quite basic, so users who are quite confident with them can skip this chapter and go directly to the next chapter.

Chapter 3, Data Basics, discusses sources of open data, which include the Bureau of Labor Statistics, the Census Bureau, Professor French’s Data Library, the Federal Reserve’s Data Library, and the UCI (University of California at Irvine) Machine Learning Repository. After that, it explains how to input data; how to deal with missing data; how to sort, slice, and dice datasets; how to merge different datasets; and how to output data. For different languages, such as Python, R, Julia, and Octave, several relevant packages for data manipulation are introduced and discussed.

Chapter 4, Data Visualization, discusses various types of visual presentations, which include simple graphs, bar charts, pie charts, and histograms, written in different languages such as R, Python, and Julia. Visual presentations can help our audience understand our data better. For many complex concepts or theories, we could use visual presentations to help explain their logic and complexity. A typical example is the so-called bisection method or bisection search.

Chapter 5, Statistical Modeling in Anaconda, explains many important issues related to statistics, such as T-distribution, F-distribution, T-test, and F-test. We also discuss linear regression, how to deal with missing data, how to treat outliers, collinearity and its treatments, and how to run a multi-variable linear regression.

Chapter 6, Managing Packages, explains the importance of managing packages, how to find out all packages available for R, Python, and Julia, and how to find the manual for each package. In addition, we discuss the issue of package dependency and how to make our programming a little easier when dealing with packages.

Chapter 7, Optimization in Anaconda, discusses several optimization topics, including general optimization problems, expressing various kinds of optimization problems as LPPs, and quadratic optimization. Several examples are offered to make our discussion more practice-oriented, such as how to choose an optimal stock portfolio, how to optimize wealth and resources to promote sustainable development, and how much the government should really tax people. In addition, we introduce several packages for optimization in R, Python, Julia, and Octave.

Chapter 8, Unsupervised Learning in Anaconda, covers unsupervised learning. In particular, hierarchical clustering and k-means clustering are covered. For R and Python, several related packages are looked at in detail. For R: rattle, Rmixmod, and randomUniformForest; for Python: scipy.cluster, contrastive, and sklearn.

Chapter 9, Supervised Learning in Anaconda, discusses supervised learning, including classification, k-nearest neighbors algorithm, Bayes' classifiers, reinforcement learning, and specific R and Python-related modules, such as RTextTools and sklearn. In addition, you will see their implementation in R, Python, Julia, and Octave.

Chapter 10, Predictive Data Analytics – Modelling and Validation, covers predictive data analytics, modeling and validation, some useful datasets, time series analytics, how to predict future events, seasonality, and how to visualize our data. We mention sklearn and catwalk for Python; datarobot, LiblineaR, and eclust for R; QuantEcon for Julia; and ltfat for Octave.

Chapter 11, Anaconda Cloud, discusses Anaconda Cloud. Topics include Jupyter Notebook in depth, different formats of Jupyter notebooks, how to share notebooks with your partners, how to share different projects over different platforms, how to share your working environments, and how to replicate others' environments locally.

Chapter 12, Distributed Computing, Parallel Computing, and HPCC, covers distributed computing and Anaconda Accelerate. When our data or tasks become more complex, we need a good system or a set of tools to process data and run complex algorithms. For this purpose, distributed computing is one solution. In particular, we will explain compute nodes, project add-ons, parallel processing, and advanced Python for data parallelism.

To get the most out of this book

The chapters in this book require a PC or Mac with 8GB or 16GB of RAM (the higher, the better). Your machine should have at least a 2.2 GHz Core i3/i5 processor or an AMD equivalent.

Download the example code files

You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

1. Log in or register at www.packtpub.com.

2. Select the SUPPORT tab.

3. Click on Code Downloads & Errata.

4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows

Zipeg/iZip/UnRarX for Mac

7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-Data-Science-with-Anaconda. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/HandsOnDataSciencewithAnaconda_ColorImages.pdf.

Conventions used

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, path names, dummy URLs, user input, and Twitter handles are shown as follows: "The most widely used Python package for graphs and images is called matplotlib."

A block of code is set as follows:

import matplotlib.pyplot as plt
plt.plot([2,3,8,12])
plt.show()

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

import matplotlib.pyplot as plt
plt.plot([2,3,8,12])
plt.show()

Any command-line input or output is written as follows:

install.packages("rattle")

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "For the sources of data, we choose from seven potential formats, such as File, ARFF, ODBC, R Dataset, RData File, and we can load our data from there."

Warnings or important notes appear like this.
Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.

Ecosystem of Anaconda

In the preface, we mentioned that this book is designed for readers who are looking for tools in the area of data science. Existing data analysts and data science professionals who wish to improve the efficiency of their data science applications by using the best libraries with multiple languages will find this book quite useful. The platform discussed in detail across various chapters is Anaconda and the computational tools could be Python, R, Julia, or Octave. The beauty of using these programming languages is that they are all open source, as in free to download. In this chapter, we start from the very beginning: a simple introduction. For this book, we assume that readers have some basic knowledge related to several programming languages, such as R and Python. There are many books available, such as Python for Data Analysis by McKinney (2013) and Python for Finance by Yan (2017).

In this chapter, the following topics will be covered:

Introduction

Miniconda

Anaconda Cloud

Finding help

Introduction

Nowadays, we are overwhelmed by large amounts of information—see Shi, Zhang, and Khan (2017), or Fang and Zhang (2016)—the catchphrase being big data. However, defining it is still controversial, since many explanations are available. Davenport and Patil (2012) suggest that if your organization stores multiple petabytes of data, if the information most critical to your business resides in forms other than rows and columns of numbers, or if answering your biggest question would involve a mashup of several analytical efforts, you've got a big data opportunity.

Many users of data science or data analytics are learning several programming languages such as R and Python, but how can they use both of them at the same time? If John is using R while his teammate is using Python, how do they communicate with each other? How do team members share their packages, programs, and even their working environments? In this book, we try our best to offer a solution to all of these challenging tasks by introducing Anaconda, since it possesses several wonderful properties.

Generally speaking, R is a programming language for statistical computing and graphics that is supported by the R Foundation for Statistical Computing. Python is an interpreted, object-oriented programming language similar to Perl that has gained popularity because of its clear syntax and readability. Julia is a language for numerical computing with an extensive mathematical function library, designed for parallelism and cloud computing, while Octave is a mathematics-oriented, batch-oriented language for numerical computation. All four languages, R, Python, Julia, and Octave, are free.

Reasons for using Jupyter via Anaconda

In data science or data analytics, we usually work in a team. This means that each developer, researcher, or team member, might have his/her favorite programming language, such as Python, R, Octave, or Julia. If we could have a platform to run all of those languages, it would be great. Fortunately, Jupyter is such a platform, since this platform can accommodate over 40 languages, including Python, R, Julia, Octave, and Scala.

In Chapter 2, Anaconda Installation, we will show you how to run those four languages via Jupyter. Of course, there are other benefits of using Anaconda: we might worry less about the dependency of installed packages, manage packages more efficiently, and share our programs, projects, and working environments. In addition, Jupyter Notebooks can be shared with others using email, Dropbox, GitHub, and the Jupyter Notebook Viewer.

Using Jupyter without pre-installation

In Chapter 2, Anaconda Installation, we will discuss how to install Jupyter via Anaconda installation. However, we could launch Jupyter occasionally without pre-installation by going to the web page at https://jupyter.org/try:

The welcome screen will be presented with various options for trying out different languages.

For example, by clicking the Try Jupyter with Julia image, we would see the following screen:

To save space, the screenshot shows only the first part of the demo. Any reader could try the previous two steps to view the whole demo. In addition, if we click the Try Jupyter with R image, the following screen would show:

After selecting Try Jupyter with Python, you will be presented with the welcome screen for the same.

Next, we will show you how to execute a few simple commands in R, Python, and Julia. For example, we could use R on this platform to run a few simple command lines. In the following example, we enter pv=100, r=0.1, and n=5:

After clicking the Run button on the menu bar, we assign those values to the three variables. Then we can estimate the future value of this present value, as illustrated here:
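The screenshot of this step is not reproduced in this text. As a minimal sketch of the same calculation, written in Python rather than R, the standard compound-interest formula fv = pv(1 + r)^n can be applied to the three values entered above. The function name future_value is our own choice for illustration and does not come from the book.

```python
# Future value of a single present amount under compound interest.
# The function name `future_value` is a hypothetical helper for this sketch.
def future_value(pv, r, n):
    """Return pv compounded at rate r per period for n periods."""
    return pv * (1 + r) ** n

pv = 100   # present value
r = 0.1    # interest rate per period
n = 5      # number of periods

fv = future_value(pv, r, n)
print(round(fv, 2))  # 161.05
```

With a 10% rate, 100 today grows to about 161.05 after five periods.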

Similarly, we could try to use Python, as shown here:

In the preceding example, we import the Python package called scipy and give it a short name, sp. Although other short names could be used to represent the scipy package, it is a convention to use sp. Then, we use the sqrt() function included in the Python package.
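The screenshot of the Python example is not reproduced in this text, so the exact input is unknown. The following is only a sketch of a comparable square-root calculation. Because the top-level sqrt alias may not be available in every SciPy release, it assumes the standard library's math.sqrt instead, and the input value 2.0 is our own illustrative choice.

```python
import math

# Illustrative input; the value used in the book's screenshot is not known.
x = 2.0

# math.sqrt returns the non-negative square root of a non-negative float.
root = math.sqrt(x)
print(root)
```

Squaring the result should give back the input, up to floating-point rounding.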

For Julia, we could try the following code (shown in the following screenshot). Again, after going to File|New on the menu, we choose Julia 0.6.0. As of May 09, 2018, 0.6.0 is the current version for Julia. Note that your current version for Julia could be different:

In the code, we define a function called sphere_vol with just one input value, r (the radius). The answer is about 65.45 for an input value of 2.5, since 4/3 × π × 2.5³ ≈ 65.45.
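The Julia screenshot is not reproduced in this text. As a sketch of the same idea in Python, assuming the function computes the volume of a sphere from its radius with the formula V = (4/3)πr³, the calculation looks like this. The name sphere_vol is taken from the text, while the Python body is our own rendering.

```python
import math

def sphere_vol(r):
    """Volume of a sphere with radius r, using V = 4/3 * pi * r**3."""
    return 4.0 / 3.0 * math.pi * r ** 3

vol = sphere_vol(2.5)
print(round(vol, 2))  # 65.45
```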

Miniconda

Anaconda is a full distribution of Python and comes with over 1,000 open source packages after installation. Because of this, the total size is over 3 GB. Anaconda is a good choice if we intend to have many packages downloaded and pre-installed. Miniconda, on the other hand, contains only Python and the other libraries needed to run conda itself. Miniconda is about 400 MB, much smaller than the full version of Anaconda, so extra packages have to be downloaded and installed as needed.

There are many reasons why a new user might prefer a watered-down version of Anaconda. For example, they might not need so many packages. Another reason is that users might not have enough space. Those users could download Miniconda at https://conda.io/miniconda.html. Again, in Chapter 2, Anaconda Installation, we will discuss in detail how to install Anaconda and run programs written in different languages, such as Python, R, Julia, and Octave.
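Since a Miniconda installation starts with very few packages, it can be useful to check which packages are still missing before running a program. The following is a minimal sketch using only the Python standard library. The helper name missing_packages and the package list are our own assumptions for illustration.

```python
from importlib.util import find_spec

# Hypothetical helper: report which top-level packages cannot be imported
# in the current Python environment.
def missing_packages(names):
    """Return the names for which no importable module can be found."""
    return [name for name in names if find_spec(name) is None]

# Illustrative wish list; adjust it to the packages your program needs.
wanted = ["numpy", "pandas", "matplotlib"]
print(missing_packages(wanted))
```

Any name printed here is absent from the current environment and could then be installed with conda, for example with conda install numpy.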

Anaconda Cloud

Anaconda Cloud is used to collaborate with other users or group members; Chapter 11, Anaconda Cloud, explains it in more detail. For example, suppose we have a small group of ten developers working on the same project. We have to share our programs, common datasets, and working environments, and we could use Anaconda Cloud to do so. After going to https://anaconda.org/, we will be directed to the Anaconda home page.

Note that users have to register with Anaconda before they can use this function. For example, one of the authors has the link https://anaconda.org/paulyan/dashboard. After we register, we can see the following:

Later in the book, we devote a whole chapter to this.

Finding help

There are many websites we can visit to get help. The first allows us to find the user guide, shown at the following link: https://docs.anaconda.com/anaconda/user-guide/