Big data is a trending concept that everyone wants to learn about. With its ability to process all kinds of data in real time, Storm is an important addition to your big data “bag of tricks.”
At the same time, Python is one of the fastest-growing programming languages today. It has become a top choice for both data science and everyday application development. Together, Storm and Python enable you to build and deploy real-time big data applications quickly and easily.
You will begin with some basic command tutorials to set up Storm and learn about its configuration in detail. You will then go through the requirement scenarios to create a Storm cluster. Next, you'll be provided with an overview of Petrel, followed by an example Twitter topology and persistence using Redis and MongoDB. Finally, you will build a production-quality Storm topology using development best practices.
Page count: 108
Year of publication: 2015
Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors nor Packt Publishing, nor its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: November 2015
Production reference: 1261115
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78439-285-7
www.packtpub.com
Authors
Kartik Bhatnagar
Barry Hart
Reviewers
Oscar Campos
Pavan Narayanan
Commissioning Editor
Usha Iyer
Acquisition Editor
Larissa Pinto
Content Development Editor
Anish Sukumaran
Technical Editor
Tanmayee Patil
Copy Editor
Vikrant Phadke
Project Coordinator
Izzat Contractor
Proofreader
Safis Editing
Indexer
Rekha Nair
Production Coordinator
Aparna Bhagat
Cover Work
Aparna Bhagat
Kartik Bhatnagar loves nature and likes to visit picturesque places. He is a technical architect in the big data analytics unit of Infosys. He is passionate about new technologies. He leads the development work on Apache Storm and MarkLogic NoSQL for a leading bank. Kartik has a total of 10 years of experience in software development for Fortune 500 companies in many countries. His expertise also includes the full Amazon Web Services (AWS) stack and modern open source libraries. He is active on Stack Overflow and is always eager to help young developers with new technologies. Kartik has also reviewed a book called Elasticsearch Blueprints, Packt Publishing. In the future, he wants to work on predictive analytics.
Barry Hart began using Storm in 2012 at AirSage. He quickly saw the potential of Storm but struggled with the limitations of the basic storm.py module that Storm provides for Python. In response, he developed Petrel, the first open source library for developing Storm applications in pure Python. He also contributed some bug fixes to the core Storm project.
When it comes to development, Barry has worked on a little of everything: Windows printer drivers, logistics planning frameworks, OLAP engines for the retail industry, database engines, and big data workflows.
Barry is currently an architect and senior Python/C++ developer at Pindrop Security, helping fight phone fraud in banking, insurance, investment, and other industries.
I want to thank my wonderful wife, Beth, for all her love and support. I would also like to thank my two little boys, who keep me young and make every day special.
Oscar Campos has been working with Python since early 2007. He is the author of the famous Anaconda Python IDE package for Sublime Text 3, available as free software at http://github.com/DamnWidget/anaconda.
He currently works as a senior software engineer at EXADS, programming high-concurrency backend system applications in Golang.
Oscar has also reviewed PySide GUI Application Development, Packt Publishing.
I want to thank my wife, Lydia, for all her support in every aspect of my life. Without you, nothing would be possible.
Pavan Narayanan is a blogger at DataScience Hacks (https://datasciencehacks.wordpress.com), experienced in developing mathematical programming and data analytics solutions. He has used Apache Storm to develop real-time analytics prototypes, and his interests include exploring problem-solving techniques, from industrial mathematics to machine learning. He can be reached at <[email protected]>.
Pavan has also reviewed Apache Mahout Essentials, Learning Apache Mahout Classification, and Mastering Machine Learning with R, all by Packt Publishing.
I would like to thank my family and God almighty for all the strength and endurance, and the folks at Packt Publishing for the opportunity to work on this book.
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.
Apache Storm is a powerful framework for creating complex workflows that ingest and process huge amounts of data. With its generic concepts of spouts and bolts, along with simple deployment and monitoring tools, it allows developers to focus on the specifics of their workflow without reinventing the wheel.
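To make the spout and bolt vocabulary concrete, here is a minimal, Storm-free sketch in plain Python. A spout is the source of a stream of tuples, and each bolt consumes tuples and may emit new ones; Storm's job is to run many copies of these components across a cluster, which this single-process loop does not attempt. The class names (`SentenceSpout`, `SplitBolt`, `CountBolt`) and the `run_topology` helper are hypothetical illustrations, not part of Storm's or Petrel's API. The code avoids type annotations and f-strings so that it runs on both Python 2.7, which this book targets, and Python 3.

```python
from collections import Counter

# Hypothetical, Storm-free illustration of the spout/bolt dataflow.
# None of these classes belong to Storm or Petrel.


class SentenceSpout(object):
    """Source of the stream: emits one-field tuples of the form (sentence,)."""

    def __init__(self, sentences):
        self.sentences = sentences

    def next_tuples(self):
        for sentence in self.sentences:
            yield (sentence,)


class SplitBolt(object):
    """Turns each (sentence,) tuple into several lowercase (word,) tuples."""

    def process(self, tup):
        for word in tup[0].lower().split():
            yield (word,)


class CountBolt(object):
    """Stateful bolt: keeps a running count per word and emits (word, count)."""

    def __init__(self):
        self.counts = Counter()

    def process(self, tup):
        word = tup[0]
        self.counts[word] += 1
        yield (word, self.counts[word])


def run_topology(sentences):
    """Wire spout -> split -> count in one process (Storm would distribute this)."""
    spout, split, count = SentenceSpout(sentences), SplitBolt(), CountBolt()
    for sentence_tuple in spout.next_tuples():
        for word_tuple in split.process(sentence_tuple):
            for _ in count.process(word_tuple):
                pass  # a further downstream bolt would consume (word, count) here
    return count.counts


if __name__ == "__main__":
    print(run_topology(["the cow jumped over the moon", "the man went to the store"]))
```

Keeping each component as a small class with a single processing method mirrors how Storm separates concerns: the wiring in `run_topology` plays the role that a topology definition plays in a real deployment.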
However, Storm is written in Java. Although it supports programming languages other than Java, the tooling for them is incomplete, with little documentation and few examples.
One of the authors of this book created Petrel, the first framework that supports the creation of Storm topologies in 100 percent Python. He has firsthand experience of the struggles involved in building a Python Storm topology on the Java tool set. This book closes this gap, providing a resource to help Python developers of all experience levels build their own applications using Storm.
Chapter 1, Getting Acquainted with Storm, provides detailed information about Storm's use cases, different installation modes, and configuration in Storm.
Chapter 2, The Storm Anatomy, covers Storm-specific terminology and processes, fault tolerance in Storm, tuning parallelism in Storm, and guaranteed tuple processing, with detailed explanations of each.
Chapter 3, Introducing Petrel, introduces a framework called Petrel for building Storm topologies in Python. This chapter walks through the installation of Petrel and includes a simple example.
Chapter 4, Example Topology – Twitter, provides an in-depth example of a topology that computes statistics on Twitter data in real time. The example introduces the use of tick tuples, which are useful for topologies that need to compute statistics or do other things on a schedule. In this chapter, you also see how topologies can access configuration data.
Chapter 5, Persistence Using Redis and MongoDB, updates the sample Twitter topology to use Redis, a popular key-value store. It shows you how to replace complex Python calculation logic with built-in Redis operations. The chapter concludes with an example of storing Twitter data in MongoDB, a popular NoSQL database, and using its aggregation capabilities to generate reports.
Chapter 6, Petrel in Practice, teaches practical skills that will make developers more productive using Storm. You learn how to use Petrel to create automated tests for your spout and bolt components that run outside of Storm. You also see how to use a graphical debugger to debug a topology running inside Storm.
Appendix, Managing Storm Using Supervisord, is a practical demonstration of monitoring and controlling Storm with supervisord across the cluster.
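The tick tuples mentioned for Chapter 4 can also be previewed in plain Python. In Storm, setting `topology.tick.tuple.freq.secs` causes a bolt to receive, at that interval, a tuple from the `__system` component on the `__tick` stream. The sketch below assumes that convention and shows a common pattern: count items between ticks, then emit a summary and reset on each tick. `Tup`, `WindowCountBolt`, and the `emitted` list (which stands in for real emit calls) are hypothetical, and the book's Petrel-based implementation may differ.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for a Storm tuple; not Storm's or Petrel's class.
Tup = namedtuple("Tup", ["component", "stream", "values"])

# Storm marks tick tuples with this component ID and stream ID.
SYSTEM_COMPONENT = "__system"
TICK_STREAM = "__tick"


def is_tick(tup):
    """Return True if the tuple is a Storm tick tuple."""
    return tup.component == SYSTEM_COMPONENT and tup.stream == TICK_STREAM


class WindowCountBolt(object):
    """Counts items between ticks; on each tick, records the top N and resets."""

    def __init__(self, top_n=3):
        self.top_n = top_n
        self.counts = Counter()
        self.emitted = []  # hypothetical stand-in for the bolt's emit() calls

    def process(self, tup):
        if is_tick(tup):
            # Scheduled work: summarize the current window, then start a new one.
            self.emitted.append(self.counts.most_common(self.top_n))
            self.counts = Counter()
        else:
            # Ordinary data tuple: update the running counts.
            self.counts[tup.values[0]] += 1


if __name__ == "__main__":
    bolt = WindowCountBolt(top_n=2)
    for word in ["storm", "python", "storm", "redis"]:
        bolt.process(Tup("split_bolt", "default", (word,)))
    bolt.process(Tup(SYSTEM_COMPONENT, TICK_STREAM, ()))
    print(bolt.emitted)
```

Resetting the counter on every tick produces non-overlapping windows; a sliding window would instead need to keep one bucket per interval and drop the oldest bucket on each tick.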
You will need a computer with Python 2.7, Java 7 JDK, and Apache Storm 0.9.3. Ubuntu is recommended but not required.
This book is for beginners as well as advanced Python developers who want to use Storm to process big data in real time. While familiarity with the Java runtime environment is helpful for installing and configuring Storm, all the code examples in this book are in Python.
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.
To send us general feedback, simply send an e-mail to <[email protected]>, and mention the book title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
