Learning Apache Thrift - Krzystof Rakowski - E-Book

Learning Apache Thrift E-Book

Krzystof Rakowski

0,0
27,59 €

-100%
Sammeln Sie Punkte in unserem Gutscheinprogramm und kaufen Sie E-Books und Hörbücher mit bis zu 100% Rabatt.

Mehr erfahren.
Beschreibung

With modern software systems being increasingly complex, providing a scalable communication architecture for applications in different languages is tedious. The Apache Thrift framework is the solution to this problem! It helps build efficient and easy-to-maintain services and offers a plethora of options matching your application type by supporting several popular programming languages, including C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, and Delphi.
This book will help you set aside the basics of service-oriented systems through your first Apache Thrift-powered app. Then, progressing to more complex examples, it will provide you with tips for running large-scale applications in production environments.
You will learn how to assess when Apache Thrift is the best tool to be used. To start with, you will run a simple example application, learning the framework's structure along the way; you will quickly advance to more complex systems that will help you solve various real-life problems. Moreover, you will be able to add a communication layer to every application written in one of the popular programming languages, with support for various data types and error handling. Further, you will learn how pre-eminent companies use Apache Thrift in their popular applications.
This book is a great starting point if you want to use one of the best tools available to develop cross-language applications in service-oriented architectures.

Das E-Book können Sie in Legimi-Apps oder einer beliebigen App lesen, die das folgende Format unterstützen:

EPUB
MOBI

Seitenzahl: 244

Veröffentlichungsjahr: 2015

Bewertungen
0,0
0
0
0
0
0
Mehr Informationen
Mehr Informationen
Legimi prüft nicht, ob Rezensionen von Nutzern stammen, die den betreffenden Titel tatsächlich gekauft oder gelesen/gehört haben. Wir entfernen aber gefälschte Rezensionen.



Table of Contents

Learning Apache Thrift
Credits
About the Author
About the Reviewer
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Introducing Apache Thrift
Distributed systems and their services
Service-oriented architecture
Distributed systems
Maintainability
Scalability
Testability
An introduction to Apache Thrift
Supported programming languages
Data types
Transports
Protocols
Versioning
Security
Interface description language
Apache Thrift and others
Custom protocols
XML-RPC and JSON-RPC
SOAP and WSDL
RESTful APIs
CORBA
Apache Avro
Protocol Buffers
When to choose Apache Thrift
Summary
2. Installing and Running Apache Thrift
Installing Apache Thrift on Linux
Installation requirements
Installing dependencies
Installing dependencies on CentOS
Installing dependencies on Debian and Ubuntu
Installing Apache Thrift
Installing Apache Thrift on Mac OS X
Installing Apache Thrift
Installing Apache Thrift on Windows
Testing the installation
Summary
3. Running Your First Apache Thrift Service and Client
Creating necessary project files
Creating a local copy of the Apache Thrift libraries
Defining our first service and generating files
The service code in PHP
The client code in Python
Running the code
What really happened?
Analyzing the code
The service description – IDL
The server script – PHP
The client script – Python
Summary
4. Understanding How Apache Thrift Works
Prepare your tools
Apache Thrift's architecture
Going about using the tool
Designing the services
Preparing the interface description
Generating service and client libraries
Implementing services and clients
Running server and clients
The network stack
Transport
Protocol
Processor
Server and client
Example
Apache Thrift's type system
Basic types
Special types
Structs
Unions
Containers
list
set
map
Usage of containers
Enums
Exceptions
Services
IDL syntax
Comments
Document
Headers
Thrift include
C++ include
Namespace
Definitions
const
typedef
Summary
5. Generating and Running Code in Different Languages
PHP
Generating the code
Examining the code
Transports
Protocols
Servers
Implementing and running the service
Implementing and running the client
Java
Generating the code
Examining the code
Transports
Protocols
Servers
Implementing and running the service
Implementing and running the client
Python
Generating the code
Examining the code
Transports
Protocols
Servers
Building the libraries
Implementing and running the service
Implementing and running the client
JavaScript
Generating the code
Examining the code
Transport, protocol, and servers
Implementing and running the client
Ruby
Generating the code
Examining the code
Transports
Protocols
Servers
Implementing and running the service
Implementing and running the client
C++
Generating the code
Examining the code
Transports
Protocols
Servers
Implementing and running the service
Implementing and running the client
Summary
6. Handling Errors in Apache Thrift
What are the type of errors that can occur?
Syntax errors
Runtime errors
Logic errors
What are exceptions and how to handle them?
Handling exceptions in Apache Thrift
An example code
Implementing the divide method
Running the application without error handling
Adding error handling to the server
Adding error handling to the client
Advanced error handling
Summary
7. An Example Client-Server Application
Our example application
Planning out your work
Getting a general idea of the example application
A technical overview of the application
get_distance
find_occurences
save_to_log
The server
Clients
Preparing the Apache Thrift document
The basic toolbox – base.thrift
The MyToolbox service – mytoolbox.thrift
Compiling the IDL files
Implementing the server
Imports
Displaying errors on the console (logger)
Implementing service methods
Creating the server
Running the server
Implementing and running clients
Creating a client in PHP
Creating a client in Ruby
Further testing and other exercises
Summary
8. Advanced Usage of Apache Thrift
Apache Thrift in production
Code version control systems
Code deployment
Apache Thrift versioning
Apache Thrift performance
Comparing Java servers
Comparing C++ servers
Service multiplexing
Security issues
General security tips
Transport Layer Security/Secure Sockets Layer
Generating keystores
Using keystores in the Java code
Real-world examples of the usage of Apache Thrift
FBThrift in Facebook
Apache Thrift in Evernote
Apache Thrift in Twitter
Apache Thrift in other companies
Summary
Index

Learning Apache Thrift

Learning Apache Thrift

Copyright © 2015 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: December 2015

Production reference: 1181215

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78588-274-6

www.packtpub.com

Credits

Author

Krzysztof Rakowski

Reviewer

Faisal Rahman

Commissioning Editor

Dipika Gaonkar

Acquisition Editor

Rahul Nair

Content Development Editor

Mehvash Fatima

Technical Editor

Ankita Thakur

Copy Editor

Sonia Cheema

Project Coordinator

Milton Dsouza

Proofreader

Safis Editing

Indexer

Hemangini Bari

Graphics

Jason Monteiro

Production Coordinator

Nilesh Mohite

Cover Work

Nilesh Mohite

About the Author

Krzysztof Rakowski has 13 years of professional experience in IT as a team leader, software developer and architect, and agile project manager. During the course of his career, he has helped major global brands establish their online presence using scalable, fault-tolerant, and high-performance systems. His broad experience comes from various industries, including interactive advertising, banking, retail, and e-commerce. He is a recognized expert, Zend Certified Engineer, and a Professional Scrum Master.

Currently, Krzysztof works for the largest online shop in central and eastern Europe—where he is responsible for supervising teams of software engineers and project managers who pair the smartest IT solutions with the best customer experience.

He enjoys sharing his knowledge through articles and presentations. He occasionally writes about his side projects on his website at www.rakowski.pro.

In his free time, Krzysztof likes to travel around the world with his wife, go snowboarding, or read a good book.

I would like to thank my wife, Anna, for her constant support, encouragement, and patience. I also want to thank my parents, parents-in-law, and brother for inspiring me to reach my goals.

This book wouldn't be possible without the generous support of the friendly people at Packt Publishing.

About the Reviewer

Faisal Rahman is a developer, writer, mentor, and tech enthusiast. His passion extends from architecting secure, scalable, and maintainable software to finding optimal algorithms and data structures for the smallest problems in a system. His research on optimization algorithms for known mathematical problems has been published in reputed journals. He is currently working as a software engineer at Microsoft.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

Fully searchable across every book published by PacktCopy and paste, print, and bookmark contentOn demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

I dedicate my work on this book to my son, Ignacy, who will be born as this book goes into print.

Preface

In 2007, Facebook's engineers needed to integrate the various applications powering their website. As their engineering culture encouraged selecting the best tools for a task without imposing strict rules regarding the choice of technology, their applications were written in a wide spectrum of different programming languages, which were considered the best for the given task.

Looking for the best solution to fulfill their needs, the engineers reviewed lots of different frameworks that were already available on the market. None of them was deemed sufficient in terms of performance or flexibility. They made a decision to develop their own solution, which became a standard to integrate all the services on Facebook.

As they considered their solution to be exceeding the current standards of the market, they released their code to the open source community, passing the task of maintaining their work on the project to the Apache Software Foundation. Since then, Apache Thrift has been developed by a large group of volunteers.

Now you can use Apache Thrift as a tool to expose your own services that are written in different languages and make your applications communicate with each other. Regardless of whether you intend to work on a small-scale application or huge enterprise, Apache Thrift may be one of the best tools for you.

In Learning Apache Thrift, you will find an introduction to various concepts of the services around you and some service-oriented architecture (SOA). Then you will learn how to use Apache Thrift in various projects. We will discuss advanced concepts too to see how the giants of the industry use this framework, and you will get some solid advice and much needed inspiration.

What this book covers

Chapter 1, Introducing Apache Thrift, gives you basic information about the environment where services are needed. You will learn about the history of Apache Thrift and its position in the market. This chapter provides some solid understanding of the context in which Apache Thrift exists.

Chapter 2, Installing and Running Apache Thrift, provides you with a quick tutorial that will allow you to have Apache Thrift up and running on your machine in no time. Instructions for Linux (Debian and CentOS), Windows, and Mac OS X are included.

Chapter 3, Running Your First Apache Thrift Service and Client, gives you the ability to see Apache Thrift in action. Simple instructions will get you through the process of setting up a server and client that run in two different programming languages (PHP and Python) and communicate with each other.

Chapter 4, Understanding How Apache Thrift Works, provides you with real knowledge of the framework's internals. You will learn about its components, network stacks, data types, interface description language (IDL), and the programming languages that are supported. You will also find out about its limitations and how to deal with them. This chapter is essential to understand the concept of "under the hood", and how to design your own Apache Thrift-supported services.

Chapter 5, Generating and Running Code in Different Languages, provides you with a toolbox of essential information about different popular programming languages and how you can use them with Apache Thrift. You may read it from the beginning to the end or just focus on those languages that interest you. The same example is used for every language, so you can easily compare the server's and client's implementation for each of them.

Chapter 6, Handling Errors in Apache Thrift, gives you information on how to deal with undesirable situations that may occur when you run your service or client. Handling errors is an important part of any programming project, and is especially essential when dealing with cross-platform applications where errors occur frequently due to the nature of the distributed architecture.

Chapter 7, An Example Client-Server Application, gathers knowledge from the whole book into one example client-server application. You will build the code step by step. The example touches every aspect of Apache Thrift and is a bit more complicated than what you have done until now. Three different languages will be used (PHP, Python, and Ruby).

Chapter 8, Advanced Usage of Apache Thrift, inspires you to further expand your Apache Thrift skills. You will learn how big companies use this framework, how to run your applications in production, and how to address security, performance, and scalability issues. You will be also be given access to other interesting Apache Thrift-related projects.

What you need for this book

To run the examples in this book, you will need any modern computer running Linux—CentOS or Debian (preferred)—Mac OS X, or Windows. You will also need some text editor to edit your code.

You will also need an Internet connection to download Apache Thrift and other required software on your computer.

Who this book is for

If you have some experience of developing applications in one or more languages that are supported by Apache Thrift (C++, Java, PHP, Python, Ruby, and others) and want to broaden your knowledge and skills in building cross-platform, scalable applications, then this book is for you.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.

Chapter 1. Introducing Apache Thrift

There is a milestone in the life of every sufficiently large application that marks the point when it is too big to be maintained as a monolith. For some systems, it is in their blueprints from the very beginning, while for others, it comes as a growth induced necessity and brings along the need for massive rebuild.

Apache Thrift is one of the tools that assist in building scalable, distributed systems, spanning across different platforms and languages. Originally developed for internal use by Facebook, now it is an open source software project backed by the Apache Foundation. It is characterized by a wide range of supported languages, flexibility, and performance.

In this chapter, you will learn about the scenarios where using Apache Thrift may be necessary. You will also get familiar with its basic properties and how it is compared to other similar frameworks. It is essential to know the big picture to be able to select the best tool for your job.

Let's see how you can put Apache Thrift to good use!

Distributed systems and their services

Imagine typical web applications that you use every day, such as search engines, messaging platforms, or social networks. Under one web address, they deliver different services. For example, a social network delivers people search, messaging, and users' profile pages. While you access them by one user interface—a web page written in HTML and JavaScript—what you see in your browser is only a gateway. Your request to message a friend is being relayed by the underlying application to the messaging service—an application which is specifically designed to deal with exchange of messages between the social network's users.

Service-oriented architecture

Messaging service, which we use as an example here, may be written in a completely different programming language than web application. It is a design decision. The system architect may decide that interface of your social network; the web pages that you see every time you log in will be easier to manage and maintain when they are written in, let's say, PHP or Ruby on Rails. However, messaging systems may be written in Python as the architect may decide that this language offers better libraries for this task. On the other hand, search engines or other tools that need superb performance are often written in C++. There may be also some internal corporate applications in Java or C#.

Those applications, of course, need to communicate with each other. But how to do that? There is a concept in software design called service-oriented architecture (SOA). We just discussed the first part of this principle. It focuses on creating applications around distinct tasks. If every task is performed by a different application, there is a need for some means of communication between them. To achieve this goal, applications expose services that are used by other applications. Typically, they are accessible over some medium, that is, an internal network or the Internet. They are self-contained and autonomous, which means they are independent of other services and are able to deliver complete response when queried. They should also be well documented so that any developer can use them.

Distributed systems

When—as in our example of social network—we have a system that consists of many autonomous services, we call such systems distributed systems. Depending on the scale, business needs, or technical constraints, the systems may be spread over lots of computers in a local network, the Internet, or just on a single machine. Benefitting from the SOA principles, you may run and test on your desktop computer distributed system of the same logical architecture, which will be then used on hundreds of servers in the production environment.

There are many advantages of SOA in distributed systems over monolithic applications. Let's discuss some of them.

Maintainability

The greatest advantage of distributed systems in SOA is their maintainability, which means ease of performing all the tasks related to the caretaking of the software. If the system consists of many applications, each dedicated to one task or type of tasks instead of one big monolith, some of the actions can be performed a lot easier:

You can select tools (that is, programming languages, libraries, and services) that are best for a given task. You can use different toolsets for search engine, message queues, or data-intensive calculations.Instead of having all the developers working on one application (that means one code base), you can split the team to work on many applications separately. You can even outsource some of the work to external teams or companies. This way, they won't get in each other's way. Smaller teams are more agile and yield better results.Communication between the different components of the system is narrowed to only one specified interface, which is easier to comprehend, monitor, and debug than lots of convoluted classes and methods.It is easier to respond to failures and fix bugs. Let's say there's some bug introduced that causes whole application to crash. In distributed systems, only one service may be down, while the whole system is operational. System operators or developers are able to replace the service with the stable version and do some tests to identify the bug or perform other actions without affecting the rest of the system.Introducing changes is a lot easier too. In the common workflow, if a new version of a service is to be deployed, it can be run as a separate instance with the old version simultaneously. System operators can switch the client application from the old to the new service and see whether everything performs correctly. If it does, the old service is turned off; otherwise, it is easy to switch back to the old service and fix the new one. It is even easier in the cloud environments.

Scalability

Many systems are required to perform well under a high load. It is not only the domain of web applications, but it is best pictured here: popular websites receive hundreds of millions of page views per day, which constitutes a high traffic load. To withstand such increasing stress, systems need to scale. The most obvious way, known by every computer user, is to add RAM or switch to a better CPU if applications don't run smoothly. But there is a limit to such scaling (called vertical scaling). You don't expect Google to run on a single powerful computer, do you?

The other type of scalability is horizontal scaling, which means adding more computers (called nodes) to the system. For example, our imaginary social network system may consist of several web application nodes, a few database nodes, and also some user search nodes. In properly designed systems, operators can add or remove nodes depending on the expected load and other circumstances. More sophisticated systems can even scale themselves, starting or stopping nodes in the cloud automatically, based on the traffic analysis.

SOA allows multiple nodes of the same function to be accessible to the clients. As services are self-contained, independent of the state of other services, and documented, developers can prepare their software without much care if they will be dealing with one or hundred nodes. In most scenarios, traffic to the services is managed by software or hardware load balancers, making it completely invisible for the client.

Testability

Another advantage of distributed systems is the easiness of testing them and finding and fixing bugs. Independence of services means that they can be tested in isolation from the whole system. Only a particular service's operation is being tested without any influence from other components. Because services should be well documented, it is easy to predict the desired output for a given input. If bugs are found, they can be evaluated and fixed without the need to consider them in the scope of whole system.