Julia: High Performance Programming - Ivo Balbaert - E-Book


Ivo Balbaert

Description

Leverage the power of Julia to design and develop high performing programs

About This Book

  • Get to know the best techniques to create blazingly fast programs with Julia
  • Stand out from the crowd by developing code that runs faster than your peers' code
  • Complete an extensive data science project through the entire cycle from ETL to analytics and data visualization

Who This Book Is For

This learning path is for data scientists and for all those who work in technical and scientific computation projects. It will be great for Julia developers who are interested in high-performance technical computing.

This learning path assumes that you already have some basic working knowledge of Julia's syntax and high-level dynamic languages such as MATLAB, R, Python, or Ruby.

What You Will Learn

  • Set up your Julia environment to achieve the highest productivity
  • Solve your tasks in a high-level dynamic language and use types for your data only when needed
  • Apply Julia to tackle problems concurrently and in a distributed environment
  • Get a sense of the possibilities and limitations of Julia's performance
  • Use Julia arrays to write high performance code
  • Build a data science project through the entire cycle of ETL, analytics, and data visualization
  • Display graphics and visualizations to carry out modeling and simulation in Julia
  • Develop your own packages and contribute to the Julia Community

In Detail

In this learning path, you will learn to use an interesting and dynamic programming language—Julia! You will get a chance to tackle your numerical and data problems with Julia. You'll begin the journey by setting up a running Julia platform before exploring its various built-in types. We'll then move on to the various functions and constructs in Julia. We'll walk through the two important collection types—arrays and matrices in Julia.

You will dive into how Julia uses type information to achieve its performance goals, and how to use multiple dispatch to help the compiler emit high performance machine code. You will see how Julia's design makes code fast, and you'll see its distributed computing capabilities.

By the end of this learning path, you will see how data works using simple statistics and analytics, and you'll discover Julia's high performance, its real strength, which makes it particularly useful in highly intensive computing tasks.

This learning path combines some of the best that Packt has to offer in one complete, curated package. It includes content from the following Packt products:

  • Getting Started with Julia by Ivo Balbaert
  • Julia High Performance by Avik Sengupta
  • Mastering Julia by Malcolm Sherrington

Style and approach

This hands-on manual will give you great explanations of the important concepts related to Julia programming.


Page count: 865

Year of publication: 2016




Table of Contents

Julia: High Performance Programming
Credits
Preface
What this learning path covers
What you need for this learning path
Who this learning path is for
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
I. Module 1
The Rationale for Julia
The scope of Julia
Julia's place among the other programming languages
A comparison with other languages for the data scientist
MATLAB
R
Python
Useful links
Summary
1. Installing the Julia Platform
Installing Julia
Windows version – usable from Windows XP SP2 onwards
Ubuntu version
OS X
Building from source
Working with Julia's shell
Startup options and Julia scripts
Packages
Adding a new package
Installing and working with Julia Studio
Installing and working with IJulia
Installing Sublime-IJulia
Installing Juno
Other editors and IDEs
How Julia works
Summary
2. Variables, Types, and Operations
Variables, naming conventions, and comments
Types
Integers
Floating point numbers
Elementary mathematical functions and operations
Rational and complex numbers
Characters
Strings
Formatting numbers and strings
Regular expressions
Ranges and arrays
Other ways to create arrays
Some common functions for arrays
How to convert an array of chars to a string
Dates and times
Scope and constants
Summary
3. Functions
Defining functions
Optional and keyword arguments
Anonymous functions
First-class functions and closures
Recursive functions
Map, filter, and list comprehensions
Generic functions and multiple dispatch
Summary
4. Control Flow
Conditional evaluation
Repeated evaluation
The for loop
The while loop
The break statement
The continue statement
Exception handling
Scope revisited
Tasks
Summary
5. Collection Types
Matrices
Tuples
Dictionaries
Keys and values – looping
Sets
Making a set of tuples
Example project – word frequency
Summary
6. More on Types, Methods, and Modules
Type annotations and conversions
Type conversions and promotions
The type hierarchy – subtypes and supertypes
Concrete and abstract types
User-defined and composite types
When are two values or objects equal or identical?
Multiple dispatch example
Types and collections – inner constructors
Type unions
Parametric types and methods
Standard modules and paths
Summary
7. Metaprogramming in Julia
Expressions and symbols
Eval and interpolation
Defining macros
Built-in macros
Testing
Debugging
Benchmarking
Starting a task
Reflection capabilities
Summary
8. I/O, Networking, and Parallel Computing
Basic input and output
Working with files
Reading and writing CSV files
Using DataFrames
Other file formats
Working with TCP sockets and servers
Interacting with databases
Parallel operations and computing
Creating processes
Using low-level communications
Parallel loops and maps
Distributed arrays
Summary
9. Running External Programs
Running shell commands
Interpolation
Pipelining
Calling C and FORTRAN
Calling Python
Performance tips
Tools to use
Summary
10. The Standard Library and Packages
Digging deeper into the standard library
Julia's package manager
Installing and updating packages
Publishing a package
Graphics in Julia
Using Gadfly on data
Summary
A. List of Macros and Packages
Macros
List of packages
II. Module 2
1. Julia is Fast
Julia – fast and dynamic
Designed for speed
JIT and LLVM
Types
How fast can Julia be?
Summary
2. Analyzing Julia Performance
Timing Julia code
Tic and Toc
The @time macro
The @timev macro
The Julia profiler
Using the profiler
ProfileView
Analyzing memory allocation
Using the memory allocation tracker
Statistically accurate benchmarking
Using Benchmarks.jl
Summary
3. Types in Julia
The Julia type system
Using types
Multiple dispatch
Abstract types
Julia's type hierarchy
Composite and immutable types
Type parameters
Type inference
Type-stability
Definitions
Fixing type-instability
Performance pitfalls
Identifying type-stability
Loop variables
Kernel methods
Types in storage locations
Arrays
Composite types
Parametric composite types
Summary
4. Functions and Macros – Structuring Julia Code for High Performance
Using globals
The trouble with globals
Fixing performance issues with globals
Inlining
Default inlining
Controlling inlining
Disabling inlining
Closures and anonymous functions
FastAnonymous
Using macros for performance
The Julia compilation process
Using macros
Evaluating a polynomial
Horner's method
The Horner macro
Generated functions
Using generated functions
Using generated functions for performance
Using named parameters
Summary
5. Fast Numbers
Numbers in Julia
Integers
Integer overflow
BigInt
The floating point
Unchecked conversions for unsigned integers
Trading performance for accuracy
The fastmath macro
The K-B-N summation
Subnormal numbers
Subnormal numbers to zero
Summary
6. Fast Arrays
Array internals in Julia
Array representation and storage
Column-wise storage
Bound checking
Removing the cost of bound checking
Configuring bound checks at startup
Allocations and in-place operations
Preallocating function output
Mutating versions
Array views
SIMD parallelization
Yeppp!
Writing generic library functions with arrays
Summary
7. Beyond the Single Processor
Parallelism in Julia
Starting a cluster
Communication between Julia processes
Programming parallel tasks
@everywhere
@spawn
Parallel for
Parallel map
Distributed arrays
Shared arrays
Threading
Summary
III. Module 3
1. The Julia Environment
Introduction
Philosophy
Role in data science and big data
Comparison with other languages
Features
Getting started
Julia sources
Exploring the source stack
Juno
IJulia
A quick look at some Julia
Julia via the console
Installing some packages
A bit of graphics: creating more realistic graphics with Winston
My benchmarks
Package management
Listing, adding, and removing
Choosing and exploring packages
Statistics and mathematics
Data visualization
Web and networking
Database and specialist packages
How to uninstall Julia
Adding an unregistered package
What makes Julia special
Parallel processing
Multiple dispatch
Homoiconic macros
Interlanguage cooperation
Summary
2. Developing in Julia
Integers, bits, bytes, and bools
Integers
Logical and arithmetic operators
Booleans
Arrays
Operations on matrices
Elemental operations
A simple Markov chain – cat and mouse
Char and strings
Characters
Strings
Unicode support
Regular expressions
Byte array literals
Version literals
An example
Real, complex, and rational numbers
Reals
Operators and built-in functions
Special values
BigFloats
Rationals
Complex numbers
Julia sets
Composite types
More about matrices
Vectorized and devectorized code
Multidimensional arrays
Broadcasting
Sparse matrices
Data arrays and data frames
Dictionaries, sets, and others
Dictionaries
Sets
Other data structures
Summary
3. Types and Dispatch
Functions
First-class objects
Passing arguments
Default and optional arguments
Variable argument list
Named parameters
Scope
The Queen's problem
Julia's type system
A look at the rational type
A vehicle datatype
Typealias and unions
Enumerations (revisited)
Multiple dispatch
Parametric types
Conversion and promotion
Conversion
Promotion
A fixed vector module
Summary
4. Interoperability
Interfacing with other programming environments
Calling C and Fortran
Mapping C types
Array conversions
Type correspondences
Calling a Fortran routine
Calling curl to retrieve a web page
Python
Some others to watch
The Julia API
Calling API from C
Metaprogramming
Symbols
Macros
Testing
Error handling
The enum macro
Tasks
Parallel operations
Distributed arrays
A simple MapReduce
Executing commands
Running commands
Working with the filesystem
Redirection and pipes
Perl one-liners
Summary
5. Working with Data
Basic I/O
Terminal I/O
Disk files
Text processing
Binary files
Structured datasets
CSV and DLM files
HDF5
XML files
DataFrames and RDatasets
The DataFrames package
DataFrames
RDatasets
Subsetting, sorting, and joining data
Statistics
Simple statistics
Samples and estimations
Pandas
Selected topics
Time series
Distributions
Kernel density
Hypothesis testing
GLM
Summary
6. Scientific Programming
Linear algebra
Simultaneous equations
Decompositions
Eigenvalues and eigenvectors
Special matrices
A symmetric eigenproblem
Signal processing
Frequency analysis
Filtering and smoothing
Digital signal filters
Image processing
Differential equations
The solution of ordinary differential equations
Non-linear ordinary differential equations
Partial differential equations
Optimization problems
JuMP
Optim
NLopt
Using with the MathProgBase interface
Stochastic problems
Stochastic simulations
SimJulia
Bank teller example
Bayesian methods and Markov processes
Monte Carlo Markov Chains
MCMC frameworks
Summary
7. Graphics
Basic graphics in Julia
Text plotting
Cairo
Winston
Data visualization
Gadfly
Compose
Graphic engines
PyPlot
Gaston
PGF plots
Using the Web
Bokeh
Plotly
Raster graphics
Cairo (revisited)
Winston (revisited)
Images and ImageView
Summary
8. Databases
A basic view of databases
The red pill or the blue pill?
Interfacing to databases
Other considerations
Relational databases
Building and loading
Native interfaces
ODBC
Other interfacing techniques
DBI
SQLite
MySQL
PostgreSQL
PyCall
JDBC
NoSQL datastores
Key-value systems
Document datastores
RESTful interfacing
JSON
Web-based databases
Graphic systems
Summary
9. Networking
Sockets and servers
Well-known ports
UDP and TCP sockets in Julia
A "Looking-Glass World" echo server
Named pipes
Working with the Web
A TCP web service
The JuliaWeb group
The "quotes" server
WebSockets
Messaging
E-mail
Twitter
SMS and esendex
Cloud services
Introducing Amazon Web Services
The AWS.jl package
The Google Cloud
Summary
10. Working with Julia
Under the hood
Femtolisp
The Julia API
Code generation
Performance tips
Best practice
Profiling
Lint
Debugging
Developing a package
Anatomy
Taxonomy
Using Git
Publishing
Community groups
Classifications
JuliaAstro
Cosmology models
The Flexible Image Transport System
The high-level API
The low-level API
JuliaGPU
What's missing?
Summary
A. Bibliography
Index

Julia: High Performance Programming


Leverage the power of Julia to design and develop high performing programs

A course in three modules

BIRMINGHAM - MUMBAI

Julia: High Performance Programming

Copyright © 2016 Packt Publishing

All rights reserved. No part of this course may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this course to ensure the accuracy of the information presented. However, the information contained in this course is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this course.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this course by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Published on: November 2016

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78712-570-4

www.packtpub.com

Credits

Authors

Ivo Balbaert

Avik Sengupta

Malcolm Sherrington

Reviewers

Pascal Bugnion

Michael Otte

Dustin Stansbury

Zhuo QL

Gururaghav Gopal

Dan Wlasiuk

Content Development Editor

Priyanka Mehta

Graphics

Disha Haria

Production Coordinator

Aparna Bhagat

Preface

Julia is a relatively young programming language. The initial design work on the Julia project began at MIT in August 2009, and by February 2012 it had become open source. It is largely the work of three developers: Stefan Karpinski, Jeff Bezanson, and Viral Shah. These three, together with Alan Edelman, remain actively committed to Julia, and MIT currently hosts a variety of courses in Julia, many of which are available over the Internet.

Initially, the designers envisaged Julia as a scientific language fast enough to make it unnecessary to model in an interactive language and then redevelop in a compiled language such as C or Fortran. At that time, the major scientific languages were proprietary ones such as MATLAB and Mathematica, and they were relatively slow. There were open source clones of these languages, such as GNU Octave and Scilab, but these were even slower. When Julia launched, the community saw it as a replacement for MATLAB, but this is not exactly the case. Julia's syntax is similar to MATLAB's, so much so that anyone competent in MATLAB can easily learn Julia, but it was not designed as a clone. It is a more feature-rich language with many significant differences that will be discussed in depth later.

The period since 2009 has seen the rise of two new computing disciplines: big data/cloud computing and data science. Big data processing on Hadoop is conventionally seen as the realm of Java programming, since Hadoop runs on the Java virtual machine. It is, of course, possible to process big data with languages that are not Java-based by using the streaming-jar paradigm, and Julia can be used in this way just as C++, C#, and Python can.
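To make the streaming-jar route concrete, here is a minimal sketch of a word-count mapper written in Julia. It assumes Julia 1.x and Hadoop streaming's convention that input records arrive as lines on standard input and results are written as tab-separated key/value lines on standard output. The function name `wordcount_mapper` is hypothetical and is not part of any Julia or Hadoop API.

```julia
# Hypothetical word-count mapper for Hadoop streaming (Julia 1.x).
# Streaming feeds input records as lines on stdin and expects
# "key<TAB>value" lines on stdout.
function wordcount_mapper(io_in::IO, io_out::IO)
    for line in eachline(io_in)
        # split with no delimiter splits on runs of whitespace
        for word in split(lowercase(line))
            println(io_out, word, '\t', 1)
        end
    end
end

# In a real mapper script, the entry point would be:
# wordcount_mapper(stdin, stdout)
```

The function takes generic `IO` arguments, so it can be tested with in-memory buffers. A matching reducer would sum the counts per key, and both scripts would be passed to the streaming jar with its `-mapper` and `-reducer` options.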

The emergence of data science heralded the use of programming languages that were simple for analysts who had some programming skills but were not principally programmers. The two languages that stepped up to fill the breach were R and Python. Both are relatively old, with origins in the 1990s. Ironically, however, both have grown rapidly in popularity from around the time Julia was introduced to the world. Even so, despite such esteemed and staid opposition, Julia has excited the scientific programming community and continues to make inroads in this space.

The aim of this course is to cover all aspects of Julia that make it appealing to the data scientist. The language is evolving quickly. Binary distributions are available for Windows, Mac OS X, and Linux, but these will lag behind the current sources. So, to do serious work with Julia, it is important to understand how to obtain the sources and build a running system from them. In addition, there are interactive development environments available for Julia, and the course will discuss both the Jupyter and Juno IDEs.

What this learning path covers

Module 1, Getting Started with Julia, gives you a head start in tackling your numerical and data problems with Julia. Your journey begins with setting up a running Julia platform, after which you will explore its various built-in types. You will then move on to the different functions and constructs in Julia. The module then walks you through the two important collection types: arrays and matrices. Over the course of the module, you will also be introduced to homoiconicity, the metaprogramming concept in Julia. Towards the end of the module, you will learn how to run external programs. This module covers all you need to know about Julia to leverage its high speed and efficiency.

Module 2, Julia High Performance, takes you on a journey to understand the performance characteristics of your Julia programs and enables you to realize the promise of near-C levels of performance in Julia. You will learn to analyze and measure the performance of Julia code, understand how to avoid bottlenecks, and design your program for the highest possible performance. In this module, you will also see how Julia uses type information to achieve its performance goals, and how to use multiple dispatch to help the compiler emit high performance machine code. Numbers and arrays of numbers are the key structures in scientific computing, and you will see how Julia's design makes them fast.

In Module 3, Mastering Julia, you will compare the different ways of working with Julia and explore Julia's key features in depth by looking at design and build. You will see how data works using simple statistics and analytics, and discover Julia's speed, its real strength, which makes it particularly useful in highly intensive computing tasks. You will also observe how Julia can cooperate with external processes to enhance graphics and data visualization. Finally, you will look into metaprogramming and learn how it adds great power to the language, and you will establish networking and distributed computing with Julia.

What you need for this learning path

Developing in Julia can be done under any of the familiar computing operating systems: Linux, OS X, and Windows. To explore the language in depth, the reader may wish to acquire the latest versions and to build from source under Linux. However, to work with the language using a binary distribution on any of the three platforms, the installation is very straightforward and convenient. In addition, Julia now comes pre-packaged with the Juno IDE, which just requires expansion from a compressed (zipped) archive.

Who this learning path is for

This learning path is for data scientists and for all those who work in technical and scientific computation projects. It will be great for Julia developers who are interested in high-performance technical computing.

This learning path assumes that you already have some basic working knowledge of Julia's syntax and high-level dynamic languages such as MATLAB, R, Python, or Ruby.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this course—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail <[email protected]>, and mention the title of the course in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to any of our products, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt course, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for this course from your account at http://www.packtpub.com. If you purchased this course elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

  1. Log in or register to our website using your e-mail address and password.
  2. Hover the mouse pointer on the SUPPORT tab at the top.
  3. Click on Code Downloads & Errata.
  4. Enter the name of the course in the Search box.
  5. Select the course for which you're looking to download the code files.
  6. Choose from the drop-down menu where you purchased this book from.
  7. Click on Code Download.

You can also download the code files by clicking on the Code Files button on the course's webpage at the Packt Publishing website. This page can be accessed by entering the course's name in the Search box. Please note that you need to be logged into your Packt account.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR / 7-Zip for Windows
  • Zipeg / iZip / UnRarX for Mac
  • 7-Zip / PeaZip for Linux

The code bundle for the course is also hosted on GitHub at https://github.com/PacktPublishing/Julia-High-Performance-Programming. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books/courses—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this course. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your course, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book/course in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this course, you can contact us at <[email protected]>, and we will do our best to address the problem.

Part I. Module 1

Getting Started with Julia

Enter the exciting world of Julia, a high-performance language for technical computing

The Rationale for Julia

This introduction presents the reasons why Julia is quickly growing in popularity in the technical, data science, and high-performance computing arenas. We will cover the following topics:

  • The scope of Julia
  • Julia's place among other programming languages
  • A comparison with other languages for the data scientist
  • Useful links

The scope of Julia

The core designers and developers of Julia (Jeff Bezanson, Stefan Karpinski, and Viral Shah) have made it clear that Julia was born out of a deep frustration with the existing software toolset in the technical computing disciplines. Basically, it boils down to the following dilemma:

  • Prototyping in this domain needs a high-level, easy-to-use, and flexible language that lets the developer concentrate on the problem itself instead of on low-level details of the language and computation.
  • The actual computation of a problem needs maximum performance. A factor of 10 in computation time makes a world of difference (think of one day versus ten days), so the production version often has to be (re)written in C or FORTRAN.

Before Julia, practitioners had to accept a "speed for convenience" trade-off. They used developer-friendly and expressive, but decades-old, interpreted languages such as MATLAB, R, or Python to express the problem at a high level. To program the performance-sensitive parts and speed up the actual computation, they had to resort to statically compiled languages such as C or FORTRAN, or even to assembly code. Mastering both levels is not easy: on the one hand, writing high-level code in MATLAB, R, or Python for prototyping, and on the other, writing code that does the same thing in C for the actual execution.

Julia was explicitly designed to bridge this gap. It lets you write high-performance code that uses CPU and memory resources as effectively as C does, while working in pure Julia all the way down, which reduces the need for a low-level language. This way, you can iterate rapidly with a simple programming model, from the problem prototype to near-C performance. The Julia developers have shown that a single environment with both expressive power and raw speed is possible, thanks to recent advances in Low Level Virtual Machine Just in Time (LLVM JIT) compiler technology (for more information, see http://en.wikipedia.org/wiki/LLVM).

In summary, they designed Julia to have the following specifications:

  • Julia is open source and free, with a liberal (MIT) license.
  • It is designed to be an easy-to-learn, easy-to-use, elegant, clear, dynamic, and interactive language that reduces development time. To that end, Julia code looks almost like pseudocode, with an obvious and familiar mathematical notation. For example, here is the definition of a polynomial function, straight from the code:
x -> 7x^3 + 30x^2 + 5x + 42

Notice that there is no need to write the multiplication operator: a numeric literal placed directly before a variable, as in 7x, means 7 * x.

  • It provides computational power and speed without your having to leave the Julia environment.
  • It offers metaprogramming and macro capabilities, made possible by its homoiconicity (inherited from Lisp; refer to Chapter 7, Metaprogramming in Julia), to increase its abstraction power.
  • It is usable for general programming purposes, not only in pure computing disciplines.
  • It has built-in, simple-to-use concurrent and parallel capabilities to thrive in the multicore world of today and tomorrow.
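You can try the polynomial from the list above directly at the Julia prompt. The following minimal sketch assumes Julia 1.x, and the name `p` is our own choice for illustration. It also shows that the same untyped definition works for integers, floating point numbers, and exact rationals.

```julia
# The polynomial from the text as an anonymous function bound to p (Julia 1.x).
# 7x means 7 * x, and 30x^2 parses as 30 * (x^2).
p = x -> 7x^3 + 30x^2 + 5x + 42

p(2)      # 7*8 + 30*4 + 5*2 + 42 = 228
p(0.5)    # the same definition works for floating point input: 52.875
p(1//2)   # and for exact rationals: 423//8
```

No types are declared. Julia compiles a specialized version of `p` for each argument type it is called with.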

Julia unites all of this in one environment, something most researchers and language designers had thought impossible.
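Homoiconicity, mentioned in the list above, means that Julia code is represented as an ordinary Julia data structure. This small sketch assumes Julia 1.x, and its variable names are hypothetical. It quotes an expression, inspects and modifies it as data, and then evaluates it. Macros, covered in Chapter 7, operate on this same representation.

```julia
# Quoting produces an Expr (a piece of code as data), not a value.
ex = :(7x^3 + 5)
typeof(ex)          # Expr
ex.head             # :call
ex.args[1]          # :+, the outermost call is to the + function

# Because the code is ordinary data, it can be rewritten...
ex.args[3] = 42     # replace the constant 5 with 42
# ...and then evaluated, here with a global binding for x.
x = 2
eval(ex)            # 7*2^3 + 42 = 98
```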


Julia's place among the other programming languages

Julia reconciles and brings together the technologies that before were considered separate, namely:

  • The dynamic, untyped, and interpreted languages on the one hand (Python, Ruby, Perl, MATLAB/Octave, R, and so on)
  • The statically typed and compiled languages on the other (C, C++, Fortran, and Fortress)

How can Julia have the flexibility of the first and the speed of the second category?

Julia has no static compilation step. Machine code is generated just in time by an LLVM-based JIT compiler. This compiler, together with the design of the language, helps Julia achieve maximal performance for numerical, technical, and scientific computing. The key to this performance is type information, which is gathered by a fully automatic and intelligent type inference engine that deduces types from the data contained in the variables. Because Julia has a dynamic type system, declaring the types of variables in the code is optional. Type annotations are not necessary, but they can be added to document the code, to improve tooling, or in some cases to give the compiler hints so that it chooses a more optimized execution path. Julia shares this optional typing discipline with Dart. Typeless Julia is a valid and useful subset of the language, similar to traditional dynamic languages, but it nevertheless runs at statically compiled speeds.

Julia takes generic programming and polymorphic functions to the limit: you write an algorithm just once and apply it to a broad range of types. This provides common functionality across drastically different types. For example, size is a generic function with 50 concrete method implementations. A system called dynamic multiple dispatch efficiently picks the optimal method, based on all of a function's arguments, from tens of method definitions. Depending on the actual types, very specific and efficient native code implementations of the function are chosen or generated, so Julia's type system lets it align closely with primitive machine operations.

Note

In summary, data flow-based type inference lets multiple dispatch choose specialized execution code.
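The following sketch illustrates the dispatch and inference described above, assuming Julia 1.x. The function `combine` is hypothetical. `methods` is standard, while `Base.return_types` is an unexported reflection helper whose behavior may change between versions, so treat its use here as illustrative.

```julia
# A hypothetical generic function `combine` with three methods.
# Dispatch considers the runtime types of *all* arguments.
combine(a::Number, b::Number) = a + b                   # numeric addition
combine(a::AbstractString, b::AbstractString) = a * b   # * concatenates strings
combine(a, b) = (a, b)                                  # untyped fallback

combine(1, 2.5)       # 3.5: Number/Number method, 1 is promoted to Float64
combine("ab", "cd")   # "abcd": string method
combine(1, "x")       # (1, "x"): no specific method matches, so the fallback runs

length(methods(combine))   # 3

# With no return type annotation, the compiler still infers a concrete
# result type for this particular combination of argument types.
Base.return_types(combine, (Int, Float64))   # [Float64]
```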

However, keep in mind that types are not statically checked. Exceptions due to type errors can occur at runtime, so thorough testing is mandatory. As for where Julia sits in the programming language universe, it embodies multiple paradigms: procedural, functional, metaprogramming, and also (but not fully) object-oriented. It is by no means an exclusively class-based language such as Java, Ruby, or C#. Nevertheless, its type system offers a kind of inheritance and is very powerful. Conversions and promotions for numeric and other types are elegant, friendly, and swift, and user-defined types are as fast and compact as built-in types. As for functional programming, Julia makes it very easy to design programs with pure functions that have no side effects, and functions are first-class objects, as in mathematics.
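Several claims in the paragraph above can be checked in a few lines. This sketch assumes Julia 1.x, where composite types are declared with `struct`. The names `square`, `Point`, and `apply_twice` are hypothetical and chosen only for illustration.

```julia
# Types are checked when a call is made, not ahead of time.
square(x::Number) = x^2
err = try
    square("abc")            # there is no method of square for a String
catch e
    e
end
err isa MethodError          # true: the type error surfaces at runtime

# Conversion and promotion between numeric types happen automatically.
promote(1, 2.5)              # (1.0, 2.5)
1//2 + 0.25                  # 0.75, the Rational is promoted to Float64

# A user-defined immutable type is stored as compactly as built-in numbers.
struct Point
    x::Float64
    y::Float64
end
isbitstype(Point)            # true: plain bits, no pointers
sizeof(Point)                # 16 bytes, the same as two Float64 values

# Functions are first-class values that can be passed as arguments.
apply_twice(f, x) = f(f(x))
apply_twice(square, 3)       # 81
```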

Julia also supports a multiprocessing environment based on a message passing model to allow programs to run via multiple processes (local or remote) using distributed arrays, enabling distributed programs based on any of the models for parallel programming.
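Here is a minimal sketch of the message-passing model. It assumes Julia 1.x, where `addprocs`, `remotecall_fetch`, `pmap`, and `@distributed` live in the `Distributed` standard library. The releases current when this course was written had them in `Base`, and some names differed. Distributed arrays come from a separate package and are not shown here.

```julia
# Message-passing parallelism with the Distributed standard library.
using Distributed
addprocs(2)                          # start two local worker processes

# Send a function and its argument to one worker, then fetch the result.
w = first(workers())
remotecall_fetch(x -> x^2, w, 7)     # 49, computed on worker w

# pmap spreads the calls over all workers and returns results in order.
pmap(x -> x^2, 1:4)                  # [1, 4, 9, 16]

# @distributed splits the loop across workers and combines partial sums with (+).
total = @distributed (+) for i in 1:100
    i
end                                  # 5050
```

The same calls work unchanged with remote workers. You would pass `addprocs` a list of machines instead of a count.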

Julia is as well suited to general programming as Python is. Its Unicode-capable string processing and its regular expressions are as good and modern as those of Perl or other languages. Moreover, it can also be used at the shell level, as a glue language to synchronize the execution of other programs or to manage other processes.
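Here is a short sketch of the string, regular expression, and shell capabilities mentioned above, assuming Julia 1.x. The external command assumes a Unix-like system where an `echo` program is on the path.

```julia
# Strings are UTF-8 encoded and Unicode aware.
s = "Julia ∈ languages"
length(s)          # 17 characters
ncodeunits(s)      # 19 bytes: '∈' takes three bytes in UTF-8

# Perl-compatible regular expressions are built in through r"..." literals.
m = match(r"(\w+) ∈ (\w+)", s)
m.captures         # ["Julia", "languages"]

# As a glue language, Julia runs external programs written in backticks.
out = read(`echo hello`, String)   # "hello\n"
```

Backtick commands are not passed to a shell. Julia runs the program directly, which avoids quoting pitfalls when arguments contain spaces or special characters.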

Julia has a standard library written in Julia itself, and a built-in package manager, Pkg, which is based on GitHub and uses a package registry called METADATA, to work with a steadily growing collection of external libraries called packages. It is cross-platform, supporting GNU/Linux, Darwin/OS X, Windows, and FreeBSD on both x86-64 (64-bit) and x86 (32-bit) architectures.

A comparison with other languages for the data scientist

Because speed is one of Julia's ultimate targets, a benchmark comparison with other languages is displayed prominently on the Julia website (http://julialang.org/). It shows that Julia rivals C and Fortran, often staying within a factor of two of fully optimized C code, and leaves the traditional dynamic languages far behind. One of Julia's explicit goals is to have sufficiently good performance that you never have to drop down into C. This is in contrast to the following environments, where (even with NumPy) you often have to work with C to get enough performance when moving to production. So, a new era of technical computing can be envisioned, in which libraries are developed in a high-level language instead of in C or Fortran. Julia is especially good at running MATLAB- and R-style programs. Let's compare them in somewhat more detail.

MATLAB

Julia is instantly familiar to MATLAB users; its syntax strongly resembles that of MATLAB, but Julia aims to be a much more general-purpose language. The names of most functions in Julia correspond to the MATLAB/Octave names, not the R names. Under the covers, however, the way computations are done is extremely different. Julia has equally powerful capabilities in linear algebra, the field where MATLAB is traditionally applied, but using Julia won't give you the same license fee headaches. Moreover, the benchmarks show that Julia is 10 to 1,000 times faster, depending on the type of operation, and this also holds when it is compared to Octave (an open source, largely MATLAB-compatible environment). Julia provides an interface to the MATLAB language with the package MATLAB.jl (https://github.com/lindahua/MATLAB.jl).
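As a small illustration of the MATLAB-like linear algebra syntax, here is a sketch assuming Julia 1.x, where det and eigvals live in the LinearAlgebra standard library. The matrix values are arbitrary.

```julia
using LinearAlgebra   # standard library providing det, eigvals, and more

A = [1 2; 3 4]        # 2x2 matrix literal: spaces separate columns, ; separates rows
b = [5, 6]            # column vector

x  = A \ b            # solve A*x = b, as with MATLAB's backslash: [-4.0, 4.5]
At = A'               # transpose (conjugate transpose for complex matrices)
P  = A * A            # matrix product: [7 10; 15 22]
E  = A .* A           # elementwise product, like MATLAB's .*: [1 4; 9 16]
d  = det(A)           # -2.0, up to floating-point rounding
λ  = eigvals(A)       # eigenvalues; their sum equals the trace, 5
```

Note that, unlike MATLAB, Julia's dot operators are a general broadcasting mechanism that works with any function, not only with arithmetic.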

R

Until now, R has been the language of choice in the statistics domain. Julia proves to be as usable as R in this domain, but again with a performance increase of a factor of 10 to 1,000. Doing statistics in MATLAB is frustrating, as is doing linear algebra in R, but Julia fits both purposes. Julia has a much richer type system than R's vector-based types. Some statistics experts, such as Douglas Bates, heavily support and promote Julia as well. Julia provides an interface to the R language with the package Rif.jl (https://github.com/lgautier/Rif.jl).
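A minimal sketch of everyday descriptive statistics, assuming Julia 1.x, where mean, median, and std come from the Statistics standard library. It also defines a small user-defined type to hint at the richer type system. The Measurement type and the sample data are hypothetical.

```julia
using Statistics   # mean, median, std, var, cor, ...

heights = [1.62, 1.75, 1.80, 1.68, 1.71]
avg = mean(heights)      # ≈ 1.712
mid = median(heights)    # 1.71
spread = std(heights)    # sample standard deviation

# A user-defined record type, richer than a plain numeric vector
# (hypothetical example type).
struct Measurement
    subject::String
    value::Float64
end

data = [Measurement("a", 1.0), Measurement("b", 3.0), Measurement("c", 5.0)]
avg_value = mean(m.value for m in data)   # 3.0
```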

Python

Again, Julia has a performance head start of a factor of 10 to 30 over Python. In addition, Julia compiles code that reads like Python into machine code that performs like C. If necessary, you can also call Python functions from within Julia using the PyCall package (https://github.com/stevengj/PyCall.jl).

Because of the huge number of existing libraries in all these languages, any practical data scientist can and will need to mix Julia code with R or Python code when the problem at hand demands it.

Julia can also be applied to data analysis and big data, because these often involve predictive analysis and modeling problems that can be reduced to linear algebra algorithms or graph analysis techniques, all of which Julia is good at tackling.

In the field of High Performance Computing (HPC), a language such as Julia has long been lacking. With Julia, domain experts can experiment and express a problem quickly and easily, in such a way that they can use modern HPC hardware as easily as a desktop PC. In other words, a language that gets users started quickly without requiring them to understand the details of the underlying machine architecture is very welcome in this area.

Useful links

The following are the links that can be useful while using Julia:

The main Julia website can be found at http://julialang.org/
For documentation, refer to http://docs.julialang.org/en/latest
View the packages at http://pkg.julialang.org/index.html
Subscribe to the mailing lists at http://julialang.org/community/
Get support at an IRC channel from http://webchat.freenode.net/?channels=julia

Summary

In this introduction, we gave an overview of Julia's characteristics and compared them to the existing languages in its field. Julia's main advantage is its ability to generate specialized code for different input types. Coupled with the compiler's ability to infer these types, this makes it possible to write Julia code at an abstract level while achieving the efficiency associated with low-level code. Julia is already quite stable and production ready. The learning curve for Julia is very gentle; the idea is that people who don't care about fancy language features should be able to use it productively too, and learn about new features only when they become useful or needed.

Chapter 1. Installing the Julia Platform

This chapter guides you through the download and installation of all the necessary components of Julia. The topics covered in this chapter are as follows:

Installing Julia
Working with Julia's shell
Start-up options and Julia scripts
Packages
Installing and working with Julia Studio
Installing and working with IJulia
Installing Sublime-IJulia
Installing Juno
Other editors and IDEs
Working of Julia

By the end of this chapter, you will have a running Julia platform. Moreover, you will be able to work with Julia's shell as well as with editors or integrated development environments with a lot of built-in features to make development more comfortable.

Installing Julia

The Julia platform in binary (that is, executable) form can be downloaded from http://julialang.org/downloads/. It exists for three major platforms (Windows, Linux, and OS X) in 32- and 64-bit format, and is delivered as a package or in an archive format. You should use the current official stable release when doing serious professional work with Julia (at the time of writing, this is Version 0.3). If you would like to investigate the latest developments, install the upcoming version (which is now Version 0.4). The previous link contains detailed and platform-specific instructions for the installation. We will not repeat these instructions here completely, but we will summarize some important points.

Windows version – usable from Windows XP SP2 onwards

You need to keep the following things in mind if you are using the Windows OS:

As a prerequisite, you need the 7-Zip extractor program, so first download and install it from http://www.7-zip.org/download.html.
Now, download the julia-n.m.p-win64.exe file to a temporary folder (n.m.p is the version number, such as 0.2.1 or 0.3.0; win32 and win64 denote the 32- and 64-bit versions, respectively; a release candidate file looks like julia-0.4.0-rc1-nnnnnnn-win64, where nnnnnnn is a checksum number such as 0480f1b).
Double-click on the file (or right-click and select Run as Administrator if you want Julia installed for all users on the machine). Click OK on the security dialog message, and then choose the installation directory (for example, c:\julia). This extracts the archive into the chosen folder, producing the following directory structure and taking some 400 MB of disk space:

The Julia folder structure in Windows

A menu shortcut will be created which, when clicked, starts the Julia command-line version or Read Evaluate Print Loop (REPL), as shown in the following screenshot:

The Julia REPL

On Windows, if you have chosen C:\Julia as your installation directory, the REPL is started by the C:\Julia\bin\julia.exe file. Add C:\Julia\bin to your PATH variable if you want the REPL to be available from any Command Prompt. The default installation folder on Windows is C:\Users\UserName\AppData\Local\Julia-n.m.p (where n.m.p is the version number, such as 0.3.2). More information on Julia in the Windows OS can be found at https://github.com/JuliaLang/julia/blob/master/README.windows.md.

Ubuntu version

For Ubuntu systems (Version 12.04 or later), there is a Personal Package Archive (PPA) for Julia (available at https://launchpad.net/~staticfloat/+archive/ubuntu/juliareleases) that makes the installation painless. All you need to do to get the stable version is to issue the following commands in a terminal session:

sudo add-apt-repository ppa:staticfloat/juliareleases
sudo add-apt-repository ppa:staticfloat/julia-deps
sudo apt-get update
sudo apt-get install julia

If you want to be at the bleeding edge of development, you can download the nightly builds instead of the stable releases. The nightly builds are generally less stable, but will contain the most recent features. To do so, replace the first of the preceding commands with:

sudo add-apt-repository ppa:staticfloat/julianightlies

This way, you can always upgrade to a more recent version by issuing the following commands:

sudo apt-get update
sudo apt-get upgrade

The Julia executable lives in /usr/bin/julia (given by the JULIA_HOME variable or by the which julia command) and the standard library is installed in /usr/share/julia/base, with shared libraries in /usr/lib/x86_64-linux-gnu/Julia.
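You can also ask a running Julia where it lives. The sketch below assumes Julia 0.7 or later, where the JULIA_HOME variable was replaced by Sys.BINDIR. Base.julia_cmd() returns the command that starts the same Julia binary.

```julia
# Directory holding the running julia executable (JULIA_HOME in older versions).
bindir = Sys.BINDIR

# The command that would start this same Julia binary, as a Cmd object.
jcmd = Base.julia_cmd()

println("Julia ", VERSION, " runs from ", bindir)
```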

For other Linux versions, the best way to get Julia running is to build from source (refer to the next section).

OS X

Installation for OS X is straightforward, using the standard software installation tools for the platform. Add the /Applications/Julia-n.m.app/Contents/Resources/julia/bin directory to your PATH variable to make Julia available everywhere on your computer.

If you want code to be run whenever you start a Julia session, put it in /home/username/.juliarc.jl on Ubuntu, ~/.juliarc.jl on OS X, or c:\Users\username\.juliarc.jl on Windows. For instance, if this file contains the following code:

println("Greetings! 你好! 안녕하세요?")

Then, Julia starts up in its shell (usually called the REPL) with the text shown in the following screenshot, which demonstrates its Unicode character capabilities:

Using .juliarc.jl
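Newer Julia versions renamed this file. As a sketch, assuming Julia 0.7 or later, the per-user startup file is config/startup.jl in the first depot, which is ~/.julia/config/startup.jl by default. The snippet below only reports whether such a file exists.

```julia
# In Julia 0.7 and later, the per-user startup file lives in the config
# folder of the first depot (by default ~/.julia/config/startup.jl).
startup_file = joinpath(first(DEPOT_PATH), "config", "startup.jl")

if isfile(startup_file)
    println("Startup file found at ", startup_file)
else
    println("No startup file yet; create ", startup_file, " to use one")
end
```

To start a session without running the startup file, launch Julia with julia --startup-file=no.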

Building from source

Perform the following steps to build Julia from source:

Download the source code, rather than the binaries, if you intend to contribute to the development of Julia itself, or if no Julia binaries are provided for your operating system or particular computer architecture. Building from source is quite straightforward on Ubuntu, so we will outline the procedure here. The Julia source code can be found on GitHub at https://github.com/JuliaLang/julia.git.
Compiling these sources will get you the latest Julia version, not the stable version (if you want the latter, download the binaries, and refer to the previous section).
Make sure you have git installed; if not, issue the command:
sudo apt-get -f install git
Then, clone the Julia sources with the following command:
git clone https://github.com/JuliaLang/julia.git

This will download the Julia source code into a julia directory in the current folder.

The Julia building process needs the GNU compilation tools g++, gfortran, and m4, so make sure that you have installed them with the following command:
sudo apt-get install gfortran g++ m4
Now go to the Julia folder and start the compilation process as follows:
cd julia
make
After a successful build, Julia starts up with the ./julia command.
Afterwards, if you want to download and compile the newest version, here are the commands to do this in the Julia source directory:
git pull
make clean
make

For more information on how to build Julia on Windows, OS X, and other systems, refer to https://github.com/JuliaLang/julia/.

Tip

Using parallelization

To speed up the build by running n compile jobs in parallel, compile the source with make -j n.

There are two ways of using Julia. As described in the previous section, we can use the Julia shell for interactive work. Alternatively, we can write programs in a text file, save them with a .jl extension, and let Julia execute the whole program sequentially.
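As a minimal sketch of the second approach, the following could be saved as hello.jl (a hypothetical file name) and run from a terminal with julia hello.jl Julia. It assumes only that ARGS holds the command-line arguments passed after the script name. The greet function is a hypothetical example.

```julia
# hello.jl: a hypothetical script, run as: julia hello.jl Julia
# ARGS holds the command-line arguments passed after the script name.

greet(name) = "Hello, $(name)!"

# Fall back to a default when no argument is given.
name = isempty(ARGS) ? "world" : ARGS[1]
println(greet(name))
```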

Packages

Most of the standard library in Julia (which can be found in /share/julia/base relative to where Julia was installed) is written in Julia itself. The rest of Julia's code ecosystem is contained in packages that are simply Git repositories. They are most often authored by external contributors, and already provide functionality for disciplines as diverse as bioinformatics, chemistry, cosmology, finance, linguistics, machine learning, mathematics, statistics, and high-performance computing. A searchable package list can be found at http://pkg.julialang.org/. Official Julia packages are registered in the METADATA.jl repository, available on GitHub at https://github.com/JuliaLang/METADATA.jl.

Julia's installation contains a built-in package manager, Pkg, for installing additional Julia packages written in Julia. Downloaded packages are stored in the package directory given by Pkg.dir(), with a cache located at c:\users\username\.julia\vn.m\.cache, /home/$USER/.julia/vn.m/.cache, or ~/.julia/vn.m/.cache on Windows, Linux, and OS X, respectively. To check which packages are installed, run the Pkg.status() command in the Julia REPL; this gives a list of packages with their versions, as shown in the following screenshot:

Packages list

The Pkg.installed() command gives you the same information, but in dictionary form, so it is usable in code. Version and dependency management is handled automatically by Pkg. Different versions of Julia can coexist with incompatible packages, because each version has its own package cache.
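In current Julia releases (1.4 and later), Pkg.installed() is deprecated in favor of Pkg.dependencies(). The sketch below assumes that newer API. Pkg.dependencies() returns a dictionary from package UUIDs to records with name, version, and is_direct_dep fields. The snippet builds a name-to-version dictionary of the packages you added explicitly.

```julia
using Pkg

# Dict mapping each package UUID to a PackageInfo record (Julia 1.4+).
deps = Pkg.dependencies()

# Keep only packages added explicitly to the active environment;
# version is `nothing` for some entries, such as standard libraries.
direct = Dict(info.name => info.version
              for (uuid, info) in deps if info.is_direct_dep)

for (name, version) in sort(collect(direct); by = first)
    println(rpad(name, 20), version)
end
```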

Tip

If you get an error with Pkg.status() such as ErrorException("Unable to read directory METADATA."), issue a Pkg.init() command to create the package repository folders, and clone METADATA from Git. If the problem is not easy to find or the cache becomes corrupted somehow, you can just delete the .julia folder, enter Pkg.init(), and start with an empty cache. Then, add the packages you need.

Adding a new package

Before adding a new package, it is always a good idea to update your package database for the already installed packages with the Pkg.update() command. Then, add a new package by issuing the Pkg.add("PackageName") command, and execute using PackageName in code or in the REPL. For example, to add 2D plotting capabilities, install the Winston package with Pkg.add("Winston"). To make a graph of 100 random numbers between 0 and 1, execute the following commands:

using Winston
plot(rand(100))

Here, rand(100) returns an array of 100 random numbers between 0 and 1. The plot command produces the following output:
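The data behind the plot can be examined without any plotting package. This sketch assumes Julia 1.x, where Random.seed! (from the Random standard library) makes the sequence reproducible. The seed value 42 is arbitrary.

```julia
using Random

Random.seed!(42)     # fix the seed so the "random" numbers are reproducible
x = rand(100)        # Vector{Float64} of 100 values drawn uniformly from [0, 1)
println(extrema(x))  # smallest and largest value in the sample
```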

A plot of white noise with Winston

After installing a new Julia version, update all the installed packages by running Pkg.update() in the REPL. For more detailed information, you can refer to http://docs.julialang.org/en/latest/manual/packages/.

Installing and working with Julia Studio

Julia Studio is a free desktop app for working with Julia that runs on Linux, Windows, and OS X (http://forio.com/labs/julia-studio/). At the time of writing, it works with the 0.3 release on Windows, but with Version 0.2.1 on Linux and OS X; if you want Julia Studio to work with Julia v0.3 on Linux and OS X, you have to compile the source code of the Studio yourself. It contains a sophisticated editor and integrated REPL, version control with Git, and a very handy side pane with access to the command history, filesystem, packages, and the list of edited documents. It is created by Forio, a company that makes software for simulations, data explorations, interactive learning, and predictive analytics. In the following screenshot, you can see some of Julia Studio's features, such as the Console section and the green Run button (or F5) in the upper-right corner. The simple program fizzbuzz.jl prints, for each of the first 100 integers, "fizz" if the number is a multiple of 3, "buzz" if it is a multiple of 5, and "fizzbuzz" if it is a multiple of 15.

Julia Studio

Notice the # sign that indicates the beginning of a comment, the elegant and familiar for loop and if-elseif construct, and how they are closed with end. The expression 1:100 is a range; mod returns the remainder of the division, and for positive numbers, mod(i, n) gives the same result as the i % n operator. Using four spaces for indentation is a convention. Recently, Forio also developed Epicenter, a computational platform for hosting server-side models (also in Julia) and building interactive web interfaces for these models.
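The screenshot's source is not reproduced in the text, so here is a sketch of a fizzbuzz.jl along the lines described. It uses # comments, a for loop over the range 1:100, an if-elseif chain closed by end, and mod. Printing the number itself when none of the rules apply is an assumption. Moving the test into a small function makes it easier to check.

```julia
# Return the FizzBuzz word for i; falls back to the number itself
# (that fallback is an assumption, not taken from the screenshot).
function fizzbuzz(i)
    if mod(i, 15) == 0          # multiple of both 3 and 5: test this first
        return "fizzbuzz"
    elseif mod(i, 3) == 0
        return "fizz"
    elseif mod(i, 5) == 0
        return "buzz"
    else
        return string(i)
    end
end

for i in 1:100
    println(fizzbuzz(i))
end
```

The multiple-of-15 case must come first. Otherwise, a number such as 30 would already match the multiple-of-3 branch.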

Installing Sublime-IJulia

The popular Sublime Text editor (http://www.sublimetext.com/3) now has a plugin based on IJulia (https://github.com/quinnj/Sublime-IJulia), authored by Jacob Quinn. It gives you syntax highlighting, autocompletion, and an in-editor REPL, which you open just like any other text file but which runs Julia code for you. You can also select some code from a code file and send it to the REPL with the shortcut CTRL + B, or send the entire file there. Sublime-IJulia provides a frontend to the IJulia backend kernel, so that you can start an IJulia frontend in a Sublime view and interact with the kernel. Here is a summary of the installation; for details, refer to the preceding URL:

From within the Julia REPL, install the ZMQ and IJulia packages.
From within Sublime Text, install the Package Control package (https://sublime.wbond.net/installation).
From within Sublime Text, install the IJulia package from the Sublime command palette.
Ctrl + Shift + P opens up a new IJulia console. Start entering commands, and press Shift + Enter to execute them. The Tab key provides command completion.

Installing Juno

Another promising IDE for Julia, and a work in progress by Mike Innes and Keno Fischer, is Juno, which is based on the Light Table environment. The docs at http://junolab.org/docs/installing.html provide detailed instructions for installing and configuring Juno. Here is a summary of the steps:

Get LightTable from http://lighttable.com.
Start LightTable, install the Juno plugin through its plugin manager, and restart LightTable.

Light Table works extensively with a command palette that you can open by typing Ctrl + SPACE, entering a command, and then selecting it. Juno provides an integrated console, and you can evaluate single expressions in the code editor directly by typing Ctrl + Enter at the end of the line. A complete script is evaluated by typing Ctrl + Shift + Enter.

Other editors and IDEs

For terminal users, the available editors are as follows:

Vim together with Julia-vim works great (https://github.com/JuliaLang/julia-vim)
Emacs with julia-mode.el from the https://github.com/JuliaLang/julia/tree/master/contrib directory

On Linux, gedit is very good. The Julia plugin works well and provides autocompletion. Notepad++ also has Julia support from the contrib directory mentioned earlier.

The SageMath project (https://cloud.sagemath.com/) runs Julia in the cloud within a terminal and lets you work with IPython notebooks. You can also work and teach with Julia in the cloud using the JuliaBox platform (https://juliabox.org/).

Summary

By now, you should have been able to install Julia in a working environment you prefer. You should also have some experience with working in the REPL. We will put this to good use starting in the next chapter, where we will meet the basic data types in Julia, by testing out everything in the REPL.