Building Python Real-Time Applications with Storm - Kartik Bhatnagar - E-Book

Kartik Bhatnagar

Description

Big data is a trending concept that everyone wants to learn about. With its ability to process all kinds of data in real time, Storm is an important addition to your big data “bag of tricks.”
At the same time, Python is one of the fastest-growing programming languages today. It has become a top choice for both data science and everyday application development. Together, Storm and Python enable you to build and deploy real-time big data applications quickly and easily.
You will begin with some basic command tutorials to set up Storm and learn about its configurations in detail. You will then go through the requirement scenarios to create a Storm cluster. Next, you’ll be provided with an overview of Petrel, followed by an example of Twitter topology and persistence using Redis and MongoDB. Finally, you will build a production-quality Storm topology using development best practices.

You can read this e-book in the Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Number of pages: 108

Year of publication: 2015


Table of Contents

Building Python Real-Time Applications with Storm
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Getting Acquainted with Storm
Overview of Storm
Before the Storm era
Key features of Storm
Storm cluster modes
Developer mode
Single-machine Storm cluster
Multimachine Storm cluster
The Storm client
Prerequisites for a Storm installation
Zookeeper installation
Storm installation
Enabling native (Netty only) dependency
Netty configuration
Starting daemons
Playing with optional configurations
Summary
2. The Storm Anatomy
Storm processes
Supervisor
Zookeeper
The Storm UI
Storm-topology-specific terminologies
The worker process, executor, and task
Worker processes
Executors
Tasks
Interprocess communication
A physical view of a Storm cluster
Stream grouping
Fault tolerance in Storm
Guaranteed tuple processing in Storm
XOR magic in acking
Tuning parallelism in Storm – scaling a distributed computation
Summary
3. Introducing Petrel
What is Petrel?
Building a topology
Packaging a topology
Logging events and errors
Managing third-party dependencies
Installing Petrel
Creating your first topology
Sentence spout
Splitter bolt
Word Counting Bolt
Defining a topology
Running the topology
Troubleshooting
Productivity tips with Petrel
Improving startup performance
Enabling and using logging
Automatic logging of fatal errors
Summary
4. Example Topology – Twitter
Twitter analysis
Twitter's Streaming API
Creating a Twitter app to use the Streaming API
The topology configuration file
The Twitter stream spout
Splitter bolt
Rolling word count bolt
The intermediate rankings bolt
The total rankings bolt
Defining the topology
Running the topology
Summary
5. Persistence Using Redis and MongoDB
Finding the top n ranked topics using Redis
The topology configuration file – the Redis case
Rolling word count bolt – the Redis case
Total rankings bolt – the Redis case
Defining the topology – the Redis case
Running the topology – the Redis case
Finding the hourly count of tweets by city name using MongoDB
Defining the topology – the MongoDB case
Running the topology – the MongoDB case
Summary
6. Petrel in Practice
Testing a bolt
Example – testing SplitSentenceBolt
Example – testing SplitSentenceBolt with WordCountBolt
Debugging
Installing Winpdb
Add Winpdb breakpoint
Launching and attaching the debugger
Profiling your topology's performance
Split sentence bolt log
Word count bolt log
Summary
A. Managing Storm Using Supervisord
Storm administration over a cluster
Introducing supervisord
Supervisord components
Supervisord installation
Configuration of supervisord.conf
Configuration of supervisord.conf on 172-31-19-62
Summary
Index

Building Python Real-Time Applications with Storm

Copyright © 2015 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: November 2015

Production reference: 1261115

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78439-285-7

www.packtpub.com

Credits

Authors

Kartik Bhatnagar

Barry Hart

Reviewers

Oscar Campos

Pavan Narayanan

Commissioning Editor

Usha Iyer

Acquisition Editor

Larissa Pinto

Content Development Editor

Anish Sukumaran

Technical Editor

Tanmayee Patil

Copy Editor

Vikrant Phadke

Project Coordinator

Izzat Contractor

Proofreader

Safis Editing

Indexer

Rekha Nair

Production Coordinator

Aparna Bhagat

Cover Work

Aparna Bhagat

About the Authors

Kartik Bhatnagar loves nature and likes to visit picturesque places. He is a technical architect in the big data analytics unit of Infosys. He is passionate about new technologies. He is leading the development work of Apache Storm and MarkLogic NoSQL for a major bank. Kartik has a total of 10 years of experience in software development for Fortune 500 companies in many countries. His expertise also includes the full Amazon Web Services (AWS) stack and modern open source libraries. He is active on the StackOverflow platform and is always eager to help young developers with new technologies. Kartik has also worked as a reviewer of a book called Elasticsearch Blueprints, Packt Publishing. In the future, he wants to work on predictive analytics.

Barry Hart began using Storm in 2012 at AirSage. He quickly saw the potential of Storm while suffering from the limitations of the basic storm.py that it provides. In response, he developed Petrel, the first open source library for developing Storm applications in pure Python. He also contributed some bug fixes to the core Storm project.

When it comes to development, Barry has worked on a little of everything: Windows printer drivers, logistics planning frameworks, OLAP engines for the retail industry, database engines, and big data workflows.

Barry is currently an architect and senior Python/C++ developer at Pindrop Security, helping fight phone fraud in banking, insurance, investment, and other industries.

I want to thank my wonderful wife, Beth, for all her love and support. I would also like to thank my two little boys, who keep me young and make every day special.

About the Reviewers

Oscar Campos has been working with Python since early 2007. He is the author of the famous Anaconda Python IDE package for Sublime Text 3, available as free software at http://github.com/DamnWidget/anaconda.

He currently works as a senior software engineer at EXADS, programming high-concurrency backend system applications in Golang.

Oscar has also reviewed PySide GUI Application Development, Packt Publishing.

I want to thank my wife, Lydia, for all her support in every aspect of my life—without you, nothing could be possible.

Pavan Narayanan is a blogger at DataScience Hacks (https://datasciencehacks.wordpress.com), experienced in developing mathematical programming and data analytics solutions. He has used Apache Storm to develop a real-time analytics prototype, and his interests include exploring problem-solving techniques, from industrial mathematics to machine learning. He can be reached at <[email protected]>.

Pavan has also reviewed Apache Mahout Essentials, Learning Apache Mahout Classification, and Mastering Machine Learning with R, all by Packt Publishing.

I would like to thank my family and God almighty for all the strength and endurance, and the folks at Packt Publishing for the opportunity to work on this book.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

Preface

Apache Storm is a powerful framework for creating complex workflows that ingest and process huge amounts of data. With its generic concepts of spouts and bolts, along with simple deployment and monitoring tools, it allows developers to focus on the specifics of their workflow without reinventing the wheel.
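To make the idea of spouts and bolts concrete, here is a minimal sketch in plain Python. It uses no Storm or Petrel APIs; the function names (`sentence_spout`, `splitter_bolt`, `word_count_bolt`) and the sample sentences are assumptions made up for illustration. A spout is modeled as a generator that produces a stream, and each bolt as a function that consumes one stream and produces another.

```python
# Plain-Python illustration only: no Storm or Petrel APIs are used here.
from collections import Counter


def sentence_spout():
    """Spout: the source of a stream; here it emits a few fixed sentences."""
    for sentence in ("the cow jumped over the moon", "the dog chased the cow"):
        yield sentence


def splitter_bolt(sentences):
    """Bolt: consumes sentences and emits one word at a time."""
    for sentence in sentences:
        for word in sentence.split():
            yield word


def word_count_bolt(words):
    """Bolt: consumes words and keeps a running count per word."""
    counts = Counter()
    for word in words:
        counts[word] += 1
    return counts


# Wire the components together: spout -> splitter -> counter.
counts = word_count_bolt(splitter_bolt(sentence_spout()))
print("the=%d cow=%d" % (counts["the"], counts["cow"]))
```

In a real topology, Storm runs each component as separate, possibly parallel tasks and moves the tuples between them; the chained function calls here only stand in for that wiring.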

However, Storm is written in Java. While it supports other programming languages, the tooling for them is incomplete, and documentation and examples are scarce.

One of the authors of this book created Petrel, the first framework that supports the creation of Storm topologies in 100 percent Python. He has firsthand experience with the struggles of building a Python Storm topology on the Java tool set. This book closes this gap, providing a resource to help Python developers of all experience levels in building their own applications using Storm.

What this book covers

Chapter 1, Getting Acquainted with Storm, provides detailed information about Storm's use cases, installation modes, and configuration.

Chapter 2, The Storm Anatomy, tells you about Storm-specific terminologies, processes, fault tolerance in Storm, tuning parallelism in Storm, and guaranteed tuple processing, with detailed explanations about each of these.

Chapter 3, Introducing Petrel, introduces a framework called Petrel for building Storm topologies in Python. This chapter walks through the installation of Petrel and includes a simple example.

Chapter 4, Example Topology – Twitter, provides an in-depth example of a topology that computes statistics on Twitter data in real time. The example introduces the use of tick tuples, which are useful for topologies that need to compute statistics or do other things on a schedule. In this chapter, you also see how topologies can access configuration data.
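As a rough sketch of how tick tuples drive scheduled work, the following plain-Python example counts words and reports the totals whenever a tick arrives. It does not use Petrel: the `Tup` type and the `PeriodicWordCount` class are hypothetical stand-ins for illustration. The one Storm detail it relies on is that tick tuples arrive from the `__system` component on the `__tick` stream, at an interval set by the `topology.tick.tuple.freq.secs` configuration option.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for a Storm tuple; not Petrel's actual tuple class.
Tup = namedtuple("Tup", ["component", "stream", "values"])


def is_tick(tup):
    # Storm sends tick tuples from the "__system" component on the "__tick" stream.
    return tup.component == "__system" and tup.stream == "__tick"


class PeriodicWordCount(object):
    """Counts words and reports the totals each time a tick tuple arrives."""

    def __init__(self):
        self.counts = Counter()
        self.reports = []

    def process(self, tup):
        if is_tick(tup):
            # Scheduled work: snapshot the counts, then start a new window.
            self.reports.append(dict(self.counts))
            self.counts.clear()
        else:
            self.counts[tup.values[0]] += 1


bolt = PeriodicWordCount()
stream = [
    Tup("splitter", "default", ["storm"]),
    Tup("splitter", "default", ["python"]),
    Tup("splitter", "default", ["storm"]),
    Tup("__system", "__tick", [10]),
    Tup("splitter", "default", ["python"]),
    Tup("__system", "__tick", [10]),
]
for tup in stream:
    bolt.process(tup)
print(bolt.reports)
```

Handling ticks inside the same `process` method as ordinary tuples means the bolt needs no timer thread of its own, since Storm delivers the schedule as part of the stream.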

Chapter 5, Persistence Using Redis and MongoDB, updates the sample Twitter topology for the use of Redis, a popular key-value store. It shows you how to simplify the complex Python calculation logic with built-in Redis operations. The chapter concludes with an example of storing Twitter data in MongoDB, a popular NoSQL database, and using its aggregation capabilities to generate reports.

Chapter 6, Petrel in Practice, teaches practical skills that will make developers more productive using Storm. You learn how to use Petrel to create automated tests for your spout and bolt components that run outside of Storm. You also see how to use a graphical debugger to debug a topology running inside Storm.

Appendix, Managing Storm Using Supervisord, is a practical demonstration of monitoring and controlling Storm across a cluster using supervisord.

What you need for this book

You will need a computer with Python 2.7, Java 7 JDK, and Apache Storm 0.9.3. Ubuntu is recommended but not required.

Who this book is for

This book is for beginners as well as advanced Python developers who want to use Storm to process big data in real time. While familiarity with the Java runtime environment is helpful for installing and configuring Storm, all the code examples in this book are in Python.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to <[email protected]>, and mention the book title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code