Over 40 recipes to get you up and running with programming using Julia
This book is for data scientists and data analysts who are familiar with the basics of the Julia language. Prior experience of working with high-level languages such as MATLAB, Python, R, or Ruby is expected.
Want to handle everything that Julia can throw at you and get the most out of it every day? This practical guide to programming with Julia for numerical computation will make you more productive and able to work with data more efficiently. The book starts with the main features of Julia to help you quickly refresh your knowledge of functions, modules, and arrays. We'll also show you how to use the Julia language to identify, retrieve, and transform data sets so you can perform data analysis and data manipulation.
Later on, you'll see how to optimize data science programs with parallel computing and memory allocation. You'll get familiar with the concepts of package development and networking to solve numerical problems using the Julia platform.
This book includes recipes on identifying and classifying data science problems, data modelling, data analysis, data manipulation, meta-programming, multidimensional arrays, and parallel computing. By the end of the book, you will acquire the skills to work more effectively with your data.
This book has a recipe-based approach to help you grasp the concepts of Julia programming.
Page count: 138
Year of publication: 2016
Copyright © 2016 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: September 2016
Production reference: 1260916
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78588-201-2
www.packtpub.com
Author: Jalem Raj Rohit
Copy Editor: Pranjali Chury
Reviewer: Jakub Glinka
Project Coordinator: Izzat Contractor
Commissioning Editor: Pratik Shah
Proofreader: Safis Editing
Acquisition Editor: Denim Pinto
Indexer: Tejal Daruwale Soni
Content Development Editor: Rohit Singh
Production Coordinator: Aparna Bhagat
Technical Editor: Abhishek R. Kotian
Cover Work: Aparna Bhagat
Jalem Raj Rohit is an IIT Jodhpur graduate with a keen interest in machine learning, data science, data analysis, computational statistics, and natural language processing (NLP). Rohit currently works as a senior data scientist at Zomato, also having worked as the first data scientist at Kayako.
He is part of the Julia project, where he develops data science models and contributes to the codebase. Raj is also a Mozilla contributor and volunteer, and he has interned at Scimergent Analytics.
I would like to thank my parents and my family for all their support and encouragement, which helped make this book possible.
Jakub Glinka is a mathematician, programmer, and data scientist.
He holds a master's degree in applied mathematics from Warsaw University with a specialization in mathematical statistics.
He has been associated with GfK since the beginning of his professional career. His area of expertise ranges from Bayesian modeling to machine learning. He is enthusiastic about new programming languages and currently relies heavily on R and Julia in his professional work.
Julia is a programming language that promises both speed and support for extensive data science applications. Apart from the official documentation of the language and the individual documentation for each package, there is no single resource that combines all of them and provides a detailed guide to carrying out machine learning and data science. This book aims to solve that problem by being a comprehensive guide to learning data science for a Julia programmer, from exploratory analytics through to visualization.
Chapter 1, Extracting and Handling Data, deals with the importance of the Julia programming language for data science and its applications. It also serves as a guide to handling data in the most widely available formats, and shows how to crawl and scrape data from the Internet.
Chapter 2, Metaprogramming, covers the concept of metaprogramming, where a language can express its own code as a data structure in the language itself. For example, Lisp expresses code in the form of Lisp lists, which are data structures in Lisp itself. Similarly, Julia can express its code as data structures.
Chapter 3, Statistics with Julia, teaches you how to perform statistics in Julia, along with the common problems of handling data arrays, distributions, estimation, and sampling techniques.
Chapter 4, Building Data Science Models, talks about various data science and statistical models. You will learn to design, customize, and apply them to various data science problems. This chapter will also teach you about model selection and the ways to learn how to build and understand robust statistical models.
Chapter 5, Working with Visualizations, teaches you how to visualize and present data, and also how to analyze the findings from the data science approach that you have taken to solve a particular problem. There are various types of visualizations to display your findings, such as the bar plot, the scatter plot, and the pie chart. It is very important to choose an appropriate method that reflects your findings and work in a sensible and aesthetically pleasing manner.
Chapter 6, Parallel Computing, talks about the concepts of parallel computing and handling a lot of data in Julia.
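As a small illustration of the idea summarized for Chapter 2, the following sketch shows Julia code being held as data, inspected, edited, and then evaluated. It is only a minimal example with literal numbers chosen for illustration, not a recipe from the book; the variable names `ex` and `result` are hypothetical.

```julia
# Quote an expression: the code is stored as data instead of being run.
ex = :(1 + 2 * 3)

# The quoted code is an ordinary Expr object whose parts can be inspected.
head = ex.head      # the kind of expression, here a function call
op   = ex.args[1]   # the function being called, here the symbol :+

# Because the code is data, it can be modified before it is evaluated.
ex.args[1] = :-     # turn 1 + 2 * 3 into 1 - 2 * 3

# Evaluate the modified expression.
result = eval(ex)
println(ex, " evaluates to ", result)
```

Here the multiplication remains nested as its own sub-expression inside `ex.args`, so swapping the outer operator leaves `2 * 3` untouched.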
You need beginner-level proficiency in the Julia programming language and experience with any programming language, preferably a dynamically typed one such as Python. The software requirements assume you have one of the following operating systems: Linux, Windows, or OS X. There are no specific hardware requirements, except that you should run all your code on a desktop or, preferably, a laptop.
This book is for beginner-level programmers, preferably Julia programmers who are looking to explore and learn the concepts in the domain of data science.
In this book, you will find several headings that appear frequently (Getting ready, How to do it…, How it works…, There's more…, and See also).
To give clear instructions on how to complete a recipe, we use these sections as follows:
Getting ready: This section tells you what to expect in the recipe, and describes how to set up any software or any preliminary settings required for the recipe.
How to do it…: This section contains the steps required to follow the recipe.
How it works…: This section usually consists of a detailed explanation of what happened in the previous section.
There's more…: This section consists of additional information about the recipe in order to make the reader more knowledgeable about the recipe.
See also: This section provides helpful links to other useful information for the recipe.
In this chapter, we will cover the following recipes:
This chapter deals with the importance of the Julia programming language for data science and its applications. It also serves as a guide to handling data in the most widely available formats and shows how to crawl and scrape data from the Internet.
Data science pipelines used in production need to be robust and highly fault-tolerant; without these properties, teams would be exposed to highly error-prone models. These pipelines therefore contain a subprocess called Extract-Transform-Load (ETL): the Extract step pulls the data from a source, the Transform step applies the transformations that cleanse the dataset, and the Load step loads the now-clean data into the local databases for use in production. This chapter will also teach you how to interact with websites by sending and receiving data through HTTP requests. Ingesting data is the first step in any data science and analytics pipeline, so this chapter covers some of the methods through which data can be brought into the pipeline from various data sources.
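The three ETL steps can be sketched in a few lines of plain Julia. This is only an illustrative outline under stated assumptions, not the approach taken in the recipes: the sample data and the names `raw`, `extract`, `transform`, `load!`, and `store` are all hypothetical, and an in-memory `Dict` stands in for the production database.

```julia
# Hypothetical raw input; in practice this would come from a file,
# a database, or the body of an HTTP response.
raw = """
name,score
alice,90
bob,
carol,75
"""

# Extract: split the raw text into rows of fields, skipping the header line.
function extract(text)
    lines = split(strip(text), '\n')
    return [split(line, ',') for line in lines[2:end]]
end

# Transform: drop rows whose score is missing and parse the rest as integers.
function transform(rows)
    return [(name = String(r[1]), score = parse(Int, r[2]))
            for r in rows if length(r) == 2 && !isempty(r[2])]
end

# Load: write the clean records into a store (a Dict standing in for a database).
function load!(store, records)
    for rec in records
        store[rec.name] = rec.score
    end
    return store
end

store = load!(Dict{String,Int}(), transform(extract(raw)))
println(store)
```

Keeping each step as a separate function means a failure can be traced to a single stage, which is one simple way to work toward the fault tolerance described above.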
Now, you are all set up to learn and experience Julia for data science.
Data Science is simply doing science with data. It applies to a surprisingly wide range of domains, such as engineering, business, marketing, and automotive, owing to the availability of a large amount of data in all these industries from which valuable insights can be extracted and understood.
With the growth of industries, the speed, volume, and variety of the data being produced are drastically increasing. And the tools that have to deal with this data are continuously being adapted, which led to the emergence of more evolved, powerful tools such as Julia.
Julia has been growing steadily as a powerful alternative to the current data science tools. Julia's diverse range of statistical packages along with its powerful compiler features make it a very strong competitor to the current top two programming languages of data science: R and Python. However, advanced users of R and Python can use Julia alongside each of them to reap the maximum benefits from the features of both.
