This book is for you if you are a data scientist or working on any technical or scientific computation projects. The book assumes you have a basic working knowledge of high-level dynamic languages such as MATLAB, R, Python, or Ruby.
Page count: 254
Year of publication: 2015
Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: February 2015
Production reference: 1200215
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78328-479-5
www.packtpub.com
Author
Ivo Balbaert
Reviewers
Pascal Bugnion
Michael Otte
Dustin E. Stansbury
Commissioning Editor
Kevin Colaco
Acquisition Editor
Kevin Colaco
Content Development Editor
Neeshma Ramakrishnan
Technical Editors
Mrunmayee Patil
Shali Sasidharan
Copy Editor
Rashmi Sawant
Project Coordinator
Purav Motiwalla
Proofreaders
Mario Cecere
Paul Hindle
Indexer
Monica Ajmera Mehta
Production Coordinator
Conidon Miranda
Cover Work
Conidon Miranda
Ivo Balbaert is currently a lecturer in (web) programming and databases at CVO Antwerpen (www.cvoantwerpen.be), a community college in Belgium. He received a PhD degree in applied physics from the University of Antwerp in 1986. He worked for 20 years in the software industry as a developer and consultant in several companies, and for 10 years as a project manager at the University Hospital of Antwerp. From 2000 onward, he switched to partly teaching and developing software (KHM Mechelen, CVO Antwerp).
He also wrote an introductory book in Dutch about developing in Ruby and Rails, Programmeren met Ruby en Rails, Van Duuren Media. In 2012, he authored a book on the Go programming language, The Way To Go, iUniverse. In 2013, in collaboration with Dzenan Ridzanovic, he authored Learning Dart and Dart Cookbook, both by Packt Publishing.
I would like to thank the technical reviewers Pascal Bugnion, Michael Otte, and Dustin Stansbury for the many useful remarks that improved the text.
Pascal Bugnion is a data scientist with a strong analytical background as well as a passion for software development. He pursued a materials science undergraduate degree at Oxford University. He then went on to complete a PhD in computational physics at Cambridge University, during which he developed and applied quantum Monte Carlo methods to solid-state physics. This resulted in four publications, including an article in Physical Review Letters, the leading physics journal. He now works as a database architect for SCL Elections, a company that specializes in predicting voter behavior.
Pascal is strongly interested in contributing to open source software, especially the Python scientific stack. He has contributed to NumPy, matplotlib, and IPython, and maintains ScikitMonaco, a Python library for Monte Carlo integration as well as GMaps, a Python module for embedding Google maps in IPython notebooks.
Michael Otte has interests that include the application of artificial intelligence to robotics, with a focus on path planning algorithms and multirobot systems. He has been using the Julia language since 2012 to implement motion planning, graph search, and other algorithms, many of which have appeared in top peer-reviewed publications. See www.ottelab.com for more details. He is currently a research associate with the Department of Aerospace Engineering Sciences at the University of Colorado at Boulder. Prior to this, he was a postdoctoral associate with the Laboratory for Information and Decision Systems (LIDS) at the Massachusetts Institute of Technology. He received his PhD and MS degrees at the University of Colorado at Boulder in computer science and a BS degree in aeronautical engineering and computer science from Clarkson University.
Dustin Stansbury received his BS degree in both physics and psychology from Appalachian State University and his PhD degree in vision science from the University of California, Berkeley. His graduate research focused on developing hierarchical statistical models of the mammalian visual and auditory systems. He currently works in the field of music retrieval and regularly contributes to his machine learning blog, theclevermachine.
Dustin has contributed a chapter to the textbook, Scene Vision: Making sense of what we see, MIT Press 2014, Cambridge MA.
Julia is a new programming language that was developed at MIT in the Applied Computing Group under the supervision of Prof. Alan Edelman. Its development started in 2009, and it was first presented publicly in February 2012. It is still a fairly young language when you look at the current version number (0.3), but its foundation is stable; the core language has had no backward-incompatible changes since Version 0.1. It is based on clear and solid principles, and its popularity is steadily increasing in the technical, data science, and high-performance computing arenas. In the section The Rationale for Julia, we present an overview of the principles on which Julia is based and compare them to other languages.
Chapter 1, Installing the Julia Platform, guides you through the installation of all the necessary components required for a Julia environment. It teaches you how to work with Julia's console (the REPL) and discusses some of the more elaborate development editors you can use.
Chapter 2, Variables, Types, and Operations, discusses the elementary built-in types in Julia, and the operations that can be performed on them, so that you are prepared to start writing code with them.
Chapter 3, Functions, explains why functions are the basic building blocks of Julia, and how to effectively use them.
Chapter 4, Control Flow, shows Julia's elegant control constructs, how to perform error handling, and how to use coroutines (called Tasks in Julia) to structure the execution of your code.
Chapter 5, Collection Types, explores the different types that group individual values, such as arrays and matrices, tuples, dictionaries, and sets.
Chapter 6, More on Types, Methods, and Modules, digs deeper into the type concept and explains how this is used in multiple dispatch to get C-like performance. Modules, a higher code organizing concept, are discussed as well.
Chapter 7, Metaprogramming in Julia, touches on the deeper layers of Julia, such as expressions and reflection capabilities, and demonstrates the power of macros.
Chapter 8, I/O, Networking, and Parallel Computing, shows how to work with data in files and databases using DataFrames. It also explores the networking capabilities and shows how to set up a parallel computing environment with Julia.
Chapter 9, Running External Programs, looks at how Julia interacts with the command line and other languages and also discusses performance tips.
Chapter 10, The Standard Library and Packages, digs deeper into the standard library and demonstrates the important packages for visualization of data.
Appendix, List of Macros and Packages, provides you with handy reference lists of the macros and packages used in this book.
To run the code examples in the book, you will need the Julia platform for your computer, which can be downloaded from http://julialang.org/downloads/. To work more comfortably with Julia scripts, a development environment such as IJulia, Sublime Text, or LightTable is advisable. Chapter 1, Installing the Julia Platform, contains detailed instructions to set up your Julia environment.
This book is intended for the data scientist and for all those who work in technical and scientific computation projects. It will get you up and running quickly with Julia so that you can start simplifying your project applications. The book assumes that you already have some basic working knowledge of high-level dynamic languages such as MATLAB, R, Python, or Ruby.
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.
This introduction presents the reasons why Julia is quickly growing in popularity in the technical, data science, and high-performance computing arenas. We will cover the following topics:
The core designers and developers of Julia (Jeff Bezanson, Stefan Karpinski, and Viral Shah) have made it clear that Julia was born out of a deep frustration with the existing software toolset in the technical computing disciplines. Basically, it boils down to the following dilemma:
Julia was explicitly designed to bridge this gap. It lets you write high-performance code that uses CPU and memory resources as effectively as can be done in C, and because you work in pure Julia all the way down, it reduces the need for a low-level language. This way, you can rapidly iterate using a simple programming model from the problem prototype to near-C performance. The Julia developers have proven that working in one environment that has both expressive capabilities and pure speed is possible using the recent advances in Low Level Virtual Machine Just in Time (LLVM JIT) compiler technologies (for more information, see http://en.wikipedia.org/wiki/LLVM).
In summary, they designed Julia to have the following specifications:
Notice that there is no need to indicate the multiplications.
It provides computational power and speed without having to leave the Julia environment.
It has metaprogramming and macro capabilities (due to its homoiconicity, inherited from Lisp; refer to Chapter 7, Metaprogramming in Julia) to increase its abstraction power.
It is usable for general programming purposes, not only in pure computing disciplines.
It has built-in and simple-to-use concurrent and parallel capabilities to thrive in the multicore world of today and tomorrow.
Julia unites all of this in one environment, something that was thought impossible until now by most researchers and language designers.
The Julia logo
Julia reconciles and brings together the technologies that before were considered separate, namely:
How can Julia have the flexibility of the first and the speed of the second category?
Julia has no static compilation step. The machine code is generated just-in-time by an LLVM-based JIT compiler. This compiler, together with the design of the language, helps Julia achieve maximal performance for numerical, technical, and scientific computing. The key to this performance is type information, which is gathered by a fully automatic type inference engine that deduces types from the data contained in the variables. Because Julia has a dynamic type system, declaring the types of variables in the code is optional. Indicating types is not necessary, but it can be done to document the code, improve tooling possibilities, or in some cases, give hints to the compiler to choose a more optimized execution path. This optional typing discipline is an aspect it shares with Dart. Typeless Julia is a valid and useful subset of the language, similar to traditional dynamic languages, but it nevertheless runs at statically compiled speeds.
Julia applies generic programming and polymorphic functions to the limit: an algorithm is written just once and applied to a broad range of types. This provides common functionality across drastically different types; for example, size is a generic function with 50 concrete method implementations. A system called dynamic multiple dispatch efficiently picks the optimal method, based on the types of all of a function's arguments, from tens of method definitions. Depending on the actual types, very specific and efficient native code implementations of the function are chosen or generated, so Julia's type system lets it align closely with primitive machine operations.
In summary, data-flow-based type inference lets multiple dispatch choose specialized execution code.
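The interplay of optional type annotations and multiple dispatch can be illustrated with a minimal sketch. The function names `describe` and `combine` are hypothetical and chosen only for this illustration; the code assumes a reasonably recent Julia release, and details may differ in the 0.3 version this text describes.

```julia
# Hypothetical functions, used only for illustration.

# Untyped fallback: accepts any argument, as in a traditional dynamic language.
describe(x) = "something else"

# More specific methods of the same function. The annotations are optional
# and only narrow the set of arguments each method applies to.
describe(x::Integer) = "an integer"
describe(x::Float64) = "a floating-point number"

# Dispatch looks at the types of all arguments, not just the first one.
combine(a, b) = "generic pair"
combine(a::Integer, b::Integer) = "integer with integer"
combine(a::Integer, b::Float64) = "integer with float"

println(describe(42))       # an integer
println(describe(3.14))     # a floating-point number
println(describe("text"))   # something else
println(combine(1, 2))      # integer with integer
println(combine(1, 2.0))    # integer with float
println(combine("a", 1))    # generic pair
```

Calling `methods(describe)` at the REPL lists the methods defined for a function, which is one way to inspect how many implementations a generic function such as `size` has.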
However, do keep in mind that types are not statically checked. Exceptions due to type errors can occur at runtime, so thorough testing is mandatory. As to categorizing Julia in the programming language universe, it embodies multiple paradigms, such as procedural, functional, metaprogramming, and also (but not fully) object-oriented. It is by no means an exclusively class-based language such as Java, Ruby, or C#. Nevertheless, its type system offers a kind of inheritance and is very powerful. Conversions and promotions for numeric and other types are elegant, friendly, and swift, and user-defined types are as fast and compact as built-in types. As for functional programming, Julia makes it very easy to design programs with pure functions that have no side effects; functions are first-class objects, as in mathematics.
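As a small sketch of functions as first-class values and of numeric promotion, consider the following. The names `square` and `compose_twice` are hypothetical helpers introduced only for this illustration, not functions from the book or the standard library.

```julia
# A pure function: its result depends only on its argument.
square(x) = x * x

# Functions are ordinary values, so they can be passed to other functions...
squares = map(square, [1, 2, 3])

# ...and returned from them. `compose_twice` is a hypothetical helper
# that builds a new function applying `f` two times.
compose_twice(f) = x -> f(f(x))
fourth_power = compose_twice(square)

# Numeric promotion: mixing an Int and a Float64 yields a Float64.
mixed = 1 + 2.5

println(squares)           # the squares of 1, 2, and 3
println(fourth_power(2))   # 16
println(typeof(mixed))     # Float64
```

Because `square` has no type annotations, the same definition also works for floating-point or rational arguments without any changes.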
Julia also supports a multiprocessing environment based on a message passing model to allow programs to run via multiple processes (local or remote) using distributed arrays, enabling distributed programs based on any of the models for parallel programming.
Julia is as well suited for general programming as Python. Its string processing and regular expressions are as good and modern (Unicode capable) as those of Perl or other languages. Moreover, it can also be used at the shell level, as a glue language to synchronize the execution of other programs or to manage other processes.
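The string and regular expression support mentioned above might look like the following in practice. This is a minimal sketch that assumes a recent Julia 1.x release (some function names were different in 0.3); the sample text in `line` is made up for the illustration.

```julia
# Hypothetical input line, used only for illustration.
line = "user=ann id=42"

# Regular expressions are written as r"..." literals.
m = match(r"id=(\d+)", line)

# `match` returns `nothing` when the pattern is not found,
# so check the result before using it.
if m !== nothing
    id = parse(Int, m.captures[1])
    println("found id ", id)   # found id 42
end

# Strings are Unicode aware: `length` counts characters, `sizeof` counts bytes.
word = "caf\u00e9"             # "café", written with an escape for the accent
println(length(word))          # 4
println(sizeof(word))          # 5
```

The difference between `length` and `sizeof` in the last two lines comes from the accented character occupying two bytes in UTF-8.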
Julia has a standard library written in Julia itself, and a built-in package manager, Pkg, which uses a GitHub-hosted registry called METADATA to work with a steadily growing collection of external libraries called packages. It is cross platform, supporting GNU/Linux, Darwin/OS X, Windows, and FreeBSD for both x86-64 (64-bit) and x86 (32-bit) architectures.
Because speed is one of the ultimate targets of Julia, a benchmark comparison with other languages is displayed prominently on the Julia website (http://julialang.org/). It shows that Julia rivals C and Fortran, often staying within a factor of two of fully optimized C code, and leaves the traditional dynamic language category far behind. One of Julia's explicit goals is to have sufficiently good performance that you never have to drop down into C. This is in contrast to the following environments, where (even for NumPy) you often have to work with C to get enough performance when moving to production. So, a new era of technical computing can be envisioned, where libraries can be developed in a high-level language instead of in C or Fortran. Julia is especially good at running MATLAB and R-style programs. Let's compare them in somewhat more detail.
Julia is instantly familiar to MATLAB users; its syntax strongly resembles that of MATLAB, but Julia aims to be a much more general purpose language than MATLAB. The names of most functions in Julia correspond to the MATLAB/Octave names, and not the R names. Under the covers, however, the way the computations are done is extremely different. Julia also has equally powerful capabilities in linear algebra, the field where MATLAB is traditionally applied. However, using Julia won't give you the same license fee headaches. Moreover, the benchmarks show that it is from 10 to 1,000 times faster depending on the type of operation, also when compared to Octave (an open source alternative to MATLAB). Julia provides an interface to the MATLAB language with the package MATLAB.jl (https://github.com/lindahua/MATLAB.jl).
R was until now the chosen development language in the statistics domain. Julia proves to be as usable as R in this domain, but again with a performance increase of a factor of 10 to 1,000. Doing statistics in MATLAB is frustrating, as is doing linear algebra in R, but Julia fits both purposes. Julia has a much richer type system than the vector-based types of R. Some statistics experts such as Douglas Bates heavily support and promote Julia as well. Julia provides an interface to the R language with the package Rif.jl (https://github.com/lgautier/Rif.jl).
Again, Julia has a performance head start of a factor of 10 to 30 as compared to Python. Julia compiles code that reads like Python into machine code that performs like C. Furthermore, if necessary, you can call Python functions from within Julia using the PyCall package (https://github.com/stevengj/PyCall.jl).
Because of the huge number of existing libraries in all these languages, any practical data scientist can and will need to mix Julia code with R or Python when the problem at hand demands it.
Julia can also be applied to data analysis and big data, because these often involve predictive analysis, modeling problems that can often be reduced to linear algebra algorithms, or graph analysis techniques, all things Julia is good at tackling.
In the field of High Performance Computing (HPC), a language such as Julia has long been lacking. With Julia, domain experts can experiment and quickly and easily express a problem in such a way that they can use modern HPC hardware as easily as a desktop PC. In other words, a language that gets users started quickly without the need to understand the details of the underlying machine architecture is very welcome in this area.
The following are the links that can be useful while using Julia:
In this introduction, we gave an overview of Julia's characteristics and compared them to the existing languages in its field. Julia's main advantage is its ability to generate specialized code for different input types. When coupled with the compiler's ability to infer these types, this makes it possible to write Julia code at an abstract level while achieving the efficiency associated with low-level code. Julia is already quite stable and production ready. The learning curve for Julia is very gentle; the idea is that people who don't care about fancy language features should be able to use it productively too and learn about new features only when they become useful or needed.
This chapter guides you through the download and installation of all the necessary components of Julia. The topics covered in this chapter are as follows:
By the end of this chapter, you will have a running Julia platform. Moreover, you will be able to work with Julia's shell as well as with editors or integrated development environments with a lot of built-in features to make development more comfortable.
The Julia platform in binary (that is, executable) form can be downloaded from http://julialang.org/downloads/. It exists for three major platforms (Windows, Linux, and OS X) in 32- and 64-bit format, and is delivered as a package or in an archive format. You should use the current official stable release when doing serious professional work with Julia (at the time of writing, this is Version 0.3). If you would like to investigate the latest developments, install the upcoming version (which is now Version 0.4). The previous link contains detailed and platform-specific instructions for the installation. We will not repeat these instructions here completely, but we will summarize some important points.
You need to keep the following things in mind if you are using the Windows OS:
The Julia folder structure in Windows
The Julia REPL