Leverage the power of Julia to design and develop high performing programs
This learning path is for data scientists and for all those who work in technical and scientific computation projects. It will be great for Julia developers who are interested in high-performance technical computing.
This learning path assumes that you already have some basic working knowledge of Julia's syntax and high-level dynamic languages such as MATLAB, R, Python, or Ruby.
In this learning path, you will learn to use an interesting and dynamic programming language—Julia! You will get a chance to tackle your numerical and data problems with Julia. You'll begin the journey by setting up a running Julia platform before exploring its various built-in types. We'll then move on to the various functions and constructs in Julia. We'll walk through the two important collection types—arrays and matrices in Julia.
You will dive into how Julia uses type information to achieve its performance goals, and how to use multiple dispatch to help the compiler emit high performance machine code. You will see how Julia's design makes code fast, and you'll see its distributed computing capabilities.
By the end of this learning path, you will see how data works using simple statistics and analytics, and you'll discover its high and dynamic performance—its real strength, which makes it particularly useful in highly intensive computing tasks.
This learning path combines some of the best that Packt has to offer in one complete, curated package. It includes content from the following Packt products:
This hands-on manual will give you great explanations of the important concepts related to Julia programming.
Number of pages: 865
Year of publication: 2016
Leverage the power of Julia to design and develop high performing programs
A course in three modules
BIRMINGHAM - MUMBAI
Copyright © 2016 Packt Publishing
All rights reserved. No part of this course may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this course to ensure the accuracy of the information presented. However, the information contained in this course is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this course.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this course by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Published on: November 2016
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78712-570-4
www.packtpub.com
Authors
Ivo Balbaert
Avik Sengupta
Malcolm Sherrington
Reviewers
Pascal Bugnion
Michael Otte
Dustin Stansbury
Zhuo QL
Gururaghav Gopal
Dan Wlasiuk
Content Development Editor
Priyanka Mehta
Graphics
Disha Haria
Production Coordinator
Aparna Bhagat
Julia is a relatively young programming language. The initial design work on the Julia project began at MIT in August 2009, and by February 2012, it became open source. It is largely the work of three developers: Stefan Karpinski, Jeff Bezanson, and Viral Shah. These three, together with Alan Edelman, remain actively committed to Julia, and MIT currently hosts a variety of courses in Julia, many of which are available over the Internet.
Initially, Julia was envisaged by the designers as a scientific language sufficiently rapid to remove the necessity of modeling in an interactive language and subsequently having to redevelop in a compiled language, such as C or Fortran. At that time, the major scientific languages were proprietary ones such as MATLAB and Mathematica, and were relatively slow. There were clones of these languages in the open source domain, such as GNU Octave and Scilab, but these were even slower. When it launched, the community saw Julia as a replacement for MATLAB, but this is not exactly the case. Although the syntax of Julia is similar to MATLAB, so much so that anyone competent in MATLAB can easily learn Julia, it was not designed as a clone. It is a more feature-rich language with many significant differences that will be discussed in depth later.
The period since 2009 has seen the rise of two new computing disciplines: big data/cloud computing and data science. Big data processing on Hadoop is conventionally seen as the realm of Java programming, since Hadoop runs on the Java virtual machine. It is, of course, possible to process big data by using programming languages other than those that are Java-based and utilize the streaming-jar paradigm, and Julia can be used in a way similar to C++, C#, and Python.
The emergence of data science heralded the use of programming languages that were simple enough for analysts who had some programming skills but were not principally programmers. The two languages that stepped up to fill the breach have been R and Python. Both of these are relatively old, with their origins back in the 1990s. However, the popularity of these two has seen a rapid growth, ironically from around the time when Julia was introduced to the world. Even so, with such established and staid opposition, Julia has excited the scientific programming community and continues to make inroads in this space.
The aim of this course is to cover all aspects of Julia that make it appealing to the data scientist. The language is evolving quickly. Binary distributions are available for Windows, Mac OS X, and Linux, but these will lag behind the current sources. So, to do some serious work with Julia, it is important to understand how to obtain and build a running system from source. In addition, there are interactive development environments available for Julia and the course will discuss both the Jupyter and Juno IDEs.
Module 1, Getting Started with Julia, gives you a head start to tackle your numerical and data problems with Julia. Your journey will begin by learning how to set up a running Julia platform before exploring its various built-in types. You will then move on to cover the different functions and constructs in Julia. The module will then walk you through the two important collection types: arrays and matrices. Over the course of the module, you will also be introduced to homoiconicity, the meta-programming concept in Julia. Towards the concluding part of the module, you will also learn how to run external programs. This module will cover all you need to know about Julia to leverage its high speed and efficiency.
Module 2, Julia High Performance, will take you on a journey to understand the performance characteristics of your Julia programs, and enable you to utilize the promise of near C levels of performance in Julia. You will learn to analyze and measure the performance of Julia code, understand how to avoid bottlenecks, and design your program for the highest possible performance. In this module, you will also see how Julia uses type information to achieve its performance goals, and how to use multiple dispatch to help the compiler emit high performance machine code. Numbers and their arrays are obviously the key structures in scientific computing; you will see how Julia's design makes them fast.
Module 3, Mastering Julia, lets you compare the different ways of working with Julia and explore Julia's key features in depth by looking at design and build. You will see how data works using simple statistics and analytics, discover Julia's speed (its real strength, which makes it particularly useful in highly intensive computing tasks), and observe how Julia can cooperate with external processes in order to enhance graphics and data visualization. Finally, you will look into meta-programming and learn how it adds great power to the language, and establish networking and distributed computing with Julia.
Developing in Julia can be done under any of the familiar computing operating systems: Linux, OS X, and Windows. To explore the language in depth, the reader may wish to acquire the latest versions and to build from source under Linux. However, to work with the language using a binary distribution on any of the three platforms, the installation is very straightforward and convenient. In addition, Julia now comes pre-packaged with the Juno IDE, which just requires expansion from a compressed (zipped) archive.
Feedback from our readers is always welcome. Let us know what you think about this course—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail <[email protected]>, and mention the title of the course in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to any of our products, see our author guide at www.packtpub.com/authors.
Now that you are the proud owner of a Packt course, we have a number of things to help you to get the most from your purchase.
You can download the example code files for this course from your account at http://www.packtpub.com. If you purchased this course elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
You can download the code files by following these steps:
You can also download the code files by clicking on the Code Files button on the course's webpage at the Packt Publishing website. This page can be accessed by entering the course's name in the Search box. Please note that you need to be logged into your Packt account.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
The code bundle for the course is also hosted on GitHub at https://github.com/PacktPublishing/Julia-High-Performance-Programming. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books/courses—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this course. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your course, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book/course in the search field. The required information will appear under the Errata section.
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
If you have a problem with any aspect of this course, you can contact us at <[email protected]>, and we will do our best to address the problem.
Getting Started with Julia
Enter the exciting world of Julia, a high-performance language for technical computing
This introduction will present you with the reasons why Julia is quickly growing in popularity in the technical, data science, and high-performance computing arenas. We will cover the following topics:
The core designers and developers of Julia (Jeff Bezanson, Stefan Karpinski, and Viral Shah) have made it clear that Julia was born out of a deep frustration with the existing software toolset in the technical computing disciplines. Basically, it boils down to the following dilemma:
Julia was explicitly designed to bridge this gap. It lets you write high-performance code that uses CPU and memory resources as effectively as can be done in C, while working in pure Julia all the way down, which reduces the need for a low-level language. This way, you can rapidly iterate using a simple programming model from the problem prototype to near-C performance. The Julia developers have proven that working in one environment that has both expressive capabilities and pure speed is possible, using the recent advances in Low Level Virtual Machine Just in Time (LLVM JIT) compiler technologies (for more information, see http://en.wikipedia.org/wiki/LLVM).
In summary, they designed Julia to have the following specifications:
Notice that there is no need to indicate the multiplications.
It provides the computational power and speed without having to leave the Julia environment.
Metaprogramming and macro capabilities (due to its homoiconicity (refer to Chapter 7, Metaprogramming in Julia), inherited from Lisp), to increase its abstraction power.
Also, it is usable for general programming purposes, not only in pure computing disciplines.
It has built-in and simple to use concurrent and parallel capabilities to thrive in the multicore world of today and tomorrow.
Julia unites this all in one environment, something which was thought impossible until now by most researchers and language designers.
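The remark above about not having to indicate multiplications refers to numeric literal coefficients: a number written directly in front of a variable is read as a product. A minimal sketch follows; the polynomial and the name poly are illustrative assumptions, not taken from the text.

```julia
# Hypothetical polynomial: a number written directly before a variable
# is read as a multiplication, so 7x^2 means 7 * x^2.
poly(x) = 7x^2 + 30x + 5

# The same expression with every multiplication spelled out.
explicit(x) = 7 * x^2 + 30 * x + 5

println(poly(2))
println(poly(2) == explicit(2))
```

Both definitions compute the same value; the compact form simply reads closer to the mathematical notation.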
The Julia logo
Julia reconciles and brings together the technologies that before were considered separate, namely:
How can Julia have the flexibility of the first and the speed of the second category?
Julia has no static compilation step. The machine code is generated just-in-time by an LLVM-based JIT compiler. This compiler, together with the design of the language, helps Julia to achieve maximal performance for numerical, technical, and scientific computing. The key to this performance is type information, which is gathered by a fully automatic type inference engine that deduces types from the data contained in the variables. Because Julia has a dynamic type system, declaring the types of variables in the code is optional. Type annotations are not necessary, but they can be added to document the code, improve tooling possibilities, or in some cases, to give hints to the compiler to choose a more optimized execution path. This optional typing discipline is an aspect it shares with Dart. Typeless Julia is a valid and useful subset of the language, similar to traditional dynamic languages, but it nevertheless runs at statically compiled speeds.
Julia applies generic programming and polymorphic functions to the limit, writing an algorithm just once and applying it to a broad range of types. This provides common functionality across drastically different types; for example, size is a generic function with 50 concrete method implementations. A system called dynamic multiple dispatch efficiently picks the optimal method for all of a function's arguments from tens of method definitions. Depending on the actual types, very specific and efficient native code implementations of the function are chosen or generated, so Julia's type system lets it align closely with primitive machine operations.
In summary, data flow-based type inference implies multiple dispatch choosing specialized execution code.
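To make optional typing and multiple dispatch concrete, here is a minimal sketch. The function names double and combine, and the strings they return, are hypothetical and chosen only for illustration.

```julia
# Untyped ("typeless") definition: one generic function that works for
# any argument type supporting multiplication by a number.
double(x) = 2x

# Hypothetical function with three methods; dispatch looks at the types
# of all arguments, not only the first one.
combine(a::Integer, b::Integer) = "two integers"
combine(a::Integer, b::Float64) = "an integer and a float"
combine(a::Number, b::Number) = "two numbers"

println(double(3))         # integer in, integer out
println(double(1.5))       # float in, float out
println(combine(1, 2))
println(combine(1, 2.0))
println(combine(1.0, 2))   # only the most general method applies
```

The type annotations here do not make the code faster by themselves; they select which method applies to a given combination of argument types.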
However, do keep in mind that types are not statically checked. Exceptions due to type errors can occur at runtime, so thorough testing is mandatory. As to categorizing Julia in the programming language universe, it embodies multiple paradigms, such as procedural, functional, metaprogramming, and also (but not fully) object oriented. It is by no means an exclusively class-based language such as Java, Ruby, or C#. Nevertheless, its type system offers a kind of inheritance and is very powerful. Conversions and promotions for numeric and other types are elegant, friendly, and swift, and user-defined types are as fast and compact as built-in types. As for functional programming, Julia makes it very easy to design programs with pure functions that have no side effects; functions are first-class objects, as in mathematics.
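Two of the points above can be sketched in a few lines: a type error that only surfaces when the offending call runs, and functions used as ordinary values. All names here (circle_area, apply_twice, square) are hypothetical.

```julia
# Hypothetical function restricted to real numbers.
circle_area(r::Real) = pi * r^2

# Nothing is checked ahead of time: the call with a string argument
# only fails when it is actually executed.
outcome = try
    circle_area("3")
catch err
    isa(err, MethodError) ? "MethodError at runtime" : "other error"
end
println(outcome)

# Functions are first-class values: they can be stored in variables
# and passed to other functions.
apply_twice(f, x) = f(f(x))
square = x -> x^2
println(apply_twice(square, 3))
println(map(square, [1, 2, 3]))
```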
Julia also supports a multiprocessing environment based on a message passing model to allow programs to run via multiple processes (local or remote) using distributed arrays, enabling distributed programs based on any of the models for parallel programming.
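As a rough sketch of the message passing model, the following starts two local worker processes and spreads a computation over them. It assumes a current Julia release, where these functions live in the Distributed standard library, and it uses pmap rather than the distributed arrays mentioned above; the function slow_square is hypothetical.

```julia
using Distributed            # standard library in current Julia releases

addprocs(2)                  # start two local worker processes

# Define the (hypothetical) function on every process.
@everywhere slow_square(x) = (sleep(0.1); x^2)

# pmap hands the elements to available workers and collects the results
# on the calling process.
results = pmap(slow_square, 1:5)
println(results)
println(nworkers())

rmprocs(workers())           # shut the workers down again
```

The same calls work with remote machines once workers have been added on them; only the addprocs step changes.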
Julia is equally suited for general programming as is Python. It has as good and modern (Unicode capable) string processing and regular expressions as Perl or other languages. Moreover, it can also be used at the shell level, as a glue language to synchronize the execution of other programs or to manage other processes.
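A small sketch of the string handling mentioned above; the sample strings and the pattern are made up for illustration.

```julia
# Strings are Unicode aware: length counts characters, not bytes.
word = "αβγ"
println(length(word))        # number of characters
println(sizeof(word))        # number of bytes in the UTF-8 encoding

# Regular expressions use the r"..." literal.
m = match(r"(\d{4})-(\d{2})", "Published 2016-11")
println(m.match)             # the part of the string that matched
println(m.captures)          # the two parenthesized groups
```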
Julia has a standard library written in Julia itself, and a built-in package manager based on GitHub, which is called Metadata, to work with a steadily growing collection of external libraries called packages. It is cross platform, supporting GNU/Linux, Darwin/OS X, Windows, and FreeBSD for both x86/64 (64-bit) and x86 (32-bit) architectures.
Because speed is one of the ultimate targets of Julia, a benchmark comparison with other languages is displayed prominently on the Julia website (http://julialang.org/). It shows that Julia rivals C and Fortran, often staying within a factor of two of fully optimized C code, and leaves the traditional dynamic language category far behind. One of Julia's explicit goals is to have sufficiently good performance that you never have to drop down into C. This is in contrast to the following environments, where (even for NumPy) you often have to work with C to get enough performance when moving to production. So, a new era of technical computing can be envisioned, where libraries can be developed in a high-level language instead of in C or Fortran. Julia is especially good at running MATLAB and R-style programs. Let's compare them in somewhat more detail.
Julia is instantly familiar to MATLAB users; its syntax strongly resembles that of MATLAB, but Julia aims to be a much more general purpose language than MATLAB. The names of most functions in Julia correspond to the MATLAB/Octave names, and not the R names. Under the covers, however, the way the computations are done is extremely different. Julia also has equally powerful capabilities in linear algebra, the field where MATLAB is traditionally applied. However, using Julia won't give you the same license fee headaches. Moreover, the benchmarks show that it is from 10 to 1,000 times faster depending on the type of operation, also when compared to Octave (an open source alternative to MATLAB). Julia provides an interface to the MATLAB language with the package MATLAB.jl (https://github.com/lindahua/MATLAB.jl).
R was until now the language of choice in the statistics domain. Julia proves to be as usable as R in this domain, but again with a performance increase of a factor of 10 to 1,000. Doing statistics in MATLAB is frustrating, as is doing linear algebra in R, but Julia fits both purposes. Julia has a much richer type system than the vector-based types of R. Some statistics experts such as Douglas Bates heavily support and promote Julia as well. Julia provides an interface to the R language with the package Rif.jl (https://github.com/lgautier/Rif.jl).
Compared to Python, Julia again has a performance head start, by a factor of 10 to 30. Julia compiles code that reads like Python into machine code that performs like C. Furthermore, if necessary you can call Python functions from within Julia using the PyCall package (https://github.com/stevengj/PyCall.jl).
Because of the huge number of existing libraries in all these languages, any practical data scientist can, and will need to, mix Julia code with R or Python when the problem at hand demands it.
Julia can also be applied to data analysis and big data, because these often involve predictive analysis, modeling problems that can often be reduced to linear algebra algorithms, or graph analysis techniques, all things Julia is good at tackling.
In the field of High Performance Computing (HPC), a language such as Julia has long been lacking. With Julia, domain experts can experiment and quickly and easily express a problem in such a way that they can use modern HPC hardware as easily as a desktop PC. In other words, a language that gets users started quickly without the need to understand the details of the underlying machine architecture is very welcome in this area.
The following are the links that can be useful while using Julia:
In this introduction, we gave an overview of Julia's characteristics and compared them to the existing languages in its field. Julia's main advantage is its ability to generate specialized code for different input types. When coupled with the compiler's ability to infer these types, this makes it possible to write Julia code at an abstract level while achieving the efficiency associated with low-level code. Julia is already quite stable and production ready. The learning curve for Julia is very gentle; the idea is that people who don't care about fancy language features should be able to use it productively too, and learn about new features only when they become useful or needed.
This chapter guides you through the download and installation of all the necessary components of Julia. The topics covered in this chapter are as follows:
By the end of this chapter, you will have a running Julia platform. Moreover, you will be able to work with Julia's shell as well as with editors or integrated development environments with a lot of built-in features to make development more comfortable.
The Julia platform in binary (that is, executable) form can be downloaded from http://julialang.org/downloads/. It exists for three major platforms (Windows, Linux, and OS X) in 32- and 64-bit format, and is delivered as a package or in an archive format. You should use the current official stable release when doing serious professional work with Julia (at the time of writing, this is Version 0.3). If you would like to investigate the latest developments, install the upcoming version (which is now Version 0.4). The previous link contains detailed and platform-specific instructions for the installation. We will not repeat these instructions here completely, but we will summarize some important points.
You need to keep the following things in mind if you are using the Windows OS:
The Julia folder structure in Windows
The Julia REPL
For Ubuntu systems (Version 12.04 or later), there is a Personal Package Archive (PPA) for Julia (can be found at https://launchpad.net/~staticfloat/+archive/ubuntu/juliareleases) that makes the installation painless. All you need to do to get the stable version is to issue the following commands in a terminal session:
If you want to be at the bleeding edge of development, you can download the nightly builds instead of the stable releases. The nightly builds are generally less stable, but will contain the most recent features. To do so, replace the first of the preceding commands with:
This way, you can always upgrade to a more recent version by issuing the following commands:
The Julia executable lives in /usr/bin/julia (given by the JULIA_HOME variable or by the which julia command) and the standard library is installed in /usr/share/julia/base, with shared libraries in /usr/lib/x86_64-linux-gnu/Julia.
For other Linux versions, the best way to get Julia running is to build from source (refer to the next section).
Installation for OS X is straightforward, using the standard software installation tools for the platform. Add /Applications/Julia-n.m.app/Contents/Resources/julia/bin/Julia to your PATH to make Julia available everywhere on your computer.
If you want code to be run whenever you start a Julia session, put it in /home/.juliarc.jl on Ubuntu, ~/.juliarc.jl on OS X, or c:\Users\username\.juliarc.jl on Windows. For instance, if this file contains the following code:
Then, Julia starts up in its shell (or REPL as it is usually called) with the following text in the screenshot, which shows its character representation capabilities:
Using .juliarc.jl
Perform the following steps to build Julia from source:
This will download the Julia source code into a julia directory in the current folder.
The Julia building process needs the GNU compilation tools g++, gfortran, and m4, so make sure that you have installed them with the following command:
For more information on how to build Julia on Windows, OS X, and other systems, refer to https://github.com/JuliaLang/julia/.
Using parallelization
If you want the build to use n concurrent processes, compile the source with make -j n.
There are two ways of using Julia. As described in the previous section, we can use the Julia shell for interactive work. Alternatively, we can write programs in a text file, save them with a .jl extension, and let Julia execute the whole program sequentially.
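The second way of working, executing a whole file, can be sketched as follows. The file name hello_sketch.jl and its contents are assumptions made for illustration; from a terminal, the same file would be run by passing its name to the julia executable.

```julia
# Write a tiny script to a temporary location (hypothetical file name).
script_path = joinpath(tempdir(), "hello_sketch.jl")
open(script_path, "w") do io
    println(io, "greeting = \"Hello from a script\"")
    println(io, "println(greeting)")
end

# include evaluates the statements in the file from top to bottom,
# which is also handy in the REPL for loading a script interactively.
include(script_path)
```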
Most of the standard library in Julia (which can be found in /share/julia/base relative to where Julia was installed) is written in Julia itself. The rest of Julia's code ecosystem is contained in packages that are simply Git repositories. They are most often authored by external contributors, and already provide functionality for disciplines as diverse as bioinformatics, chemistry, cosmology, finance, linguistics, machine learning, mathematics, statistics, and high-performance computing. A searchable package list can be found at http://pkg.julialang.org/. Official Julia packages are registered in the METADATA.jl file in the Julia Git repository, available on GitHub at https://github.com/JuliaLang/METADATA.jl.
Julia's installation contains a built-in package manager, Pkg, for installing additional packages. The downloaded packages are stored in a cache, ready to be used by Julia, under the directory given by Pkg.dir(); the cache is located at c:\users\username\.julia\vn.m\.cache, /home/$USER/.julia/vn.m/.cache, or ~/.julia/vn.m/.cache. If you want to check which packages are installed, run the Pkg.status() command in the Julia REPL to get a list of packages with their versions, as shown in the following screenshot:
Packages list
The Pkg.installed() command gives you the same information, but in dictionary form, so it is usable in code. Version and dependency management is handled automatically by Pkg. Different versions of Julia can coexist even with incompatible packages, because each version has its own package cache.
If you get an error with Pkg.status() such as ErrorException("Unable to read directory METADATA."), issue a Pkg.init() command to create the package repository folders, and clone METADATA from Git. If the problem is not easy to find or the cache becomes corrupted somehow, you can just delete the .julia folder, enter Pkg.init(), and start with an empty cache. Then, add the packages you need.
Before adding a new package, it is always a good idea to update your package database for the already installed packages with the Pkg.update() command. Then, add a new package by issuing the Pkg.add("PackageName") command, and load it with using PackageName in code or in the REPL. For example, to add 2D plotting capabilities, install the Winston package with Pkg.add("Winston"). To make a graph of 100 random numbers between 0 and 1, execute the following commands:
The rand(100) call returns an array with 100 random numbers. The commands produce the following output:
A plot of white noise with Winston
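The data behind such a plot can be inspected without any plotting package. The sketch below only relies on rand from the base language; the Winston call in the comments is an assumption about how the plot would be produced and is left commented out so that the example runs without the package.

```julia
# rand(n) returns a vector of n Float64 values drawn uniformly from [0, 1).
samples = rand(100)
println(length(samples))
println(minimum(samples) >= 0 && maximum(samples) < 1)

# With Winston installed, plotting would presumably look like this
# (assumed usage, not verified against the original listing):
# using Winston
# plot(samples)
```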
After installing a new Julia version, update all the installed packages by running Pkg.update() in the REPL. For more detailed information, you can refer to http://docs.julialang.org/en/latest/manual/packages/.
Julia Studio is a free desktop app for working with Julia that runs on Linux, Windows, and OS X (http://forio.com/labs/julia-studio/). It works with the 0.3 release on Windows. On Linux and OS X it works with Version 0.2.1 at this time; if you want Julia Studio to work with Julia v0.3 on these platforms, you have to compile the source code of the Studio yourself. It contains a sophisticated editor and integrated REPL, version control with Git, and a very handy side pane with access to the command history, filesystem, packages, and the list of edited documents. It is created by Forio, a company that makes software for simulations, data explorations, interactive learning, and predictive analytics. In the following screenshot, you can see some of Julia Studio's features, such as the Console section and the green Run button (or F5) in the upper-right corner. The simple program fizzbuzz.jl prints, for each of the first 100 integers, "fizz" if the number is a multiple of 3, "buzz" if it is a multiple of 5, and "fizzbuzz" if it is a multiple of 15.
Julia Studio
Notice the # sign that indicates the beginning of comments, the elegant and familiar for loop and if elseif construct, and how they are closed with end. The expression 1:100 is a range; mod returns the remainder of the division; for positive numbers, mod(i, n) can also be written with the % operator as i % n. Using four spaces for indentation is a convention. Recently, Forio also developed Epicenter, a computational platform for hosting the server-side models (also in Julia), and building interactive web interfaces for these models.
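The fizzbuzz program itself is only visible in the screenshot. A minimal sketch that matches the description is given here; wrapping the logic in a function named fizzbuzz and printing the number itself in the remaining cases are assumptions, not a copy of the original listing.

```julia
# fizzbuzz sketch: classify a single integer
function fizzbuzz(i)
    if i % 15 == 0             # multiple of both 3 and 5
        return "fizzbuzz"
    elseif i % 5 == 0
        return "buzz"
    elseif mod(i, 3) == 0      # mod(i, 3) is equivalent to i % 3 here
        return "fizz"
    else
        return string(i)       # assumed behavior for all other numbers
    end
end

# 1:100 is a range covering the first 100 integers.
for i in 1:100
    println(fizzbuzz(i))
end
```

The multiple-of-15 test has to come first; otherwise such numbers would already be caught by the checks for 5 or 3.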
The popular Sublime Text editor (http://www.sublimetext.com/3) now has a plugin based on IJulia (https://github.com/quinnj/Sublime-IJulia) authored by Jacob Quinn. It gives you syntax highlighting, autocompletion, and an in-editor REPL, which you basically just open like any other text file, but it runs Julia code for you. You can also select some code from a code file and send it to the REPL with the shortcut CTRL + B, or send the entire file there. Sublime-IJulia provides a frontend to the IJulia backend kernel, so that you can start an IJulia frontend in a Sublime view and interact with the kernel. Here is a summary of the installation; for details, refer to the preceding URL:
Another promising IDE for Julia, and a work in progress by Mike Innes and Keno Fischer, is Juno, which is based on the Light Table environment. The docs at http://junolab.org/docs/installing.html provide detailed instructions for installing and configuring Juno. Here is a summary of the steps:
Light Table works extensively with a command palette that you can open by typing Ctrl + SPACE, entering a command, and then selecting it. Juno provides an integrated console, and you can evaluate single expressions in the code editor directly by typing Ctrl + Enter at the end of the line. A complete script is evaluated by typing Ctrl + Shift + Enter.
For terminal users, the available editors are as follows:
On Linux, gedit is very good. The Julia plugin works well and provides autocompletion. Notepad++ also has Julia support from the contrib directory mentioned earlier.
The SageMath project (https://cloud.sagemath.com/) runs Julia in the cloud within a terminal and lets you work with IPython notebooks. You can also work and teach with Julia in the cloud using the JuliaBox platform (https://juliabox.org/).
By now, you should have been able to install Julia in a working environment you prefer. You should also have some experience with working in the REPL. We will put this to good use starting in the next chapter, where we will meet the basic data types in Julia, by testing out everything in the REPL.