This book takes an end-to-end approach, from understanding NoSQL document design to implementing a full-fledged e-commerce application that uses Couchbase as its backend.
Starting with the architecture of Couchbase to get you up and running, this book quickly takes you through designing a NoSQL document and implementing highly scalable applications using the Java API. You will then be introduced to document design and get to know the various ways to administer Couchbase. After this, you will learn to store documents using buckets, and then to store, retrieve, and delete documents using the smart client based on the Java API. You will then retrieve documents using an SQL-like syntax called N1QL. Next, you will learn how to write MapReduce-based views. Finally, you will configure XDCR for disaster recovery and implement an e-commerce application using Couchbase.
Page count: 262
Year of publication: 2015
Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: November 2015
Production reference: 1171115
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78528-859-3
www.packtpub.com
Author
Henry Potsangbam
Reviewers
Tigran Babloyan
Clive Holloway
Marcus Johansson
Commissioning Editor
Neil Alexander
Acquisition Editor
Nikhil Karkal
Content Development Editor
Samantha Gonsalves
Technical Editor
Deepti Tuscano
Copy Editors
Merilyn Pereira
Vikrant Phadke
Project Coordinator
Kinjal Bari
Proofreader
Safis Editing
Indexer
Rekha Nair
Production Coordinator
Manu Joseph
Cover Work
Manu Joseph
Henry Potsangbam is an experienced software developer, administrator, and architect with more than 14 years of experience in enterprise application architecture, design, and development. He has worked in various domains, such as the e-commerce, retail, and energy sectors. He is an IBM-certified application and solution developer, an SAP Certified NetWeaver EP Consultant, and a CIPM (project management).
Always fascinated by and interested in exploring emerging technologies to solve business scenarios, Henry has been following NoSQL and Couchbase since its initial release around 2011.
In his spare time, he explores and educates professionals in big data technologies such as Hadoop (MapR, Hortonworks, and Cloudera), enterprise integration (Camel, Fuse ESB, and Mule), analytics with R, messaging with Kafka and RabbitMQ, the OSGi framework, NoSQL (Couchbase, Cassandra, and MongoDB), enterprise architecture, and so on. During his career, he architected a private cloud implementation using virtualization for a Fortune 500 company.
He also played an active role in provisioning infrastructure for one of the largest cash transfer programmes in the world.
I would like to thank my wife, Rajnita, and my sons, Henderson and Tiraj, who supported and encouraged me in spite of all the time I took away from them while writing this book.
I also want to thank Nikhil Karkal and Samantha Gonsalves, without whose efforts and encouragement this book quite possibly would not have happened.
I would also like to thank all the reviewers for providing valuable input and making this book a success.
Tigran Babloyan is a software developer and technical solution lead with over 8 years of commercial application development and consulting experience. He has played key roles in several Java Enterprise projects for companies such as Sun Microsystems, Oracle, DHL, and several governmental projects. Currently, besides his main duties as a Java development lead, he also consults several companies and start-ups on big data and NoSQL migration. Apache Lucene and Spark, Couchbase, and JavaEE are only a small part of Tigran's daily duties.
Clive Holloway is a New York based developer who has been working with web technologies for over 20 years—from website and mobile UI design, to systems architecture and database design. Surprisingly, he has a website: http://cliveholloway.net.
Marcus Johansson is currently working as a Berlin-based freelance developer, having previously worked on one of the world's most visited Couchbase projects during his time at Nokia.
Marcus writes about development in general and Drupal specifically at www.drupaldare.com.
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.
This book will enable you to understand Couchbase, its architecture, and how its flexible schema helps you develop agile applications without downtime. You will also learn how to design a document-based data schema and how to connect to Couchbase from Java-based applications using connection pooling. You will understand how to retrieve data using MapReduce-based views, how to extract documents from Couchbase buckets using N1QL, an SQL-like syntax, and how to achieve high availability with XDCR. It will also enable you to perform full text search by integrating ElasticSearch plugins.
Chapter 1, Introduction to Couchbase, introduces the concepts of NoSQL databases, provides the architecture, and introduces the various concepts of Couchbase. It will explain the installation of Couchbase in the Windows and Linux environments; finally, it will introduce the various logging and configuration folders.
Chapter 2, The Couchbase Administration Interface, provides an overview of the various administration interfaces provided by Couchbase. The reader will be able to use the various interfaces, such as the web admin UI, the administration REST API, and the command-line interface.
Chapter 3, Storing Documents in Couchbase Using Buckets, introduces the concept of buckets in detail. It will also explain how documents are stored in Couchbase and how it maintains them in a Couchbase cluster.
Chapter 4, Designing a Document for Couchbase, introduces the concepts of JSON, compares NoSQL with RDBMS, and explains how to manage relationships between various documents. It will also familiarize you with the document editor option for creating and editing documents using the web UI.
Chapter 5, Introducing Client SDK, explains the Couchbase SDK, focusing on the Java API. We will also explore some APIs that are used to connect to Couchbase and perform CRUD operations. It will also explain various concepts, such as locking and counters. The chapter further explains connection management of SDK.
Chapter 6, Retrieving Documents without Keys Using Views, explains the concepts of MapReduce, views, and reduce functions. It will also explain filtering and advanced concepts of views, along with retrieving geospatial data.
Chapter 7, Understanding SQL-Like Queries N1QL, introduces you to N1QL and explains how to retrieve documents using SQL-like syntax.
Chapter 8, Full Text Search Using ElasticSearch, explains how to provide full text search using ElasticSearch plugins. It will explain how to configure ElasticSearch plugins to connect to Couchbase.
Chapter 9, Data Replication and Compaction, explains cross datacenter replication (XDCR) between clusters. It also explains how data compaction happens in the Couchbase cluster.
Chapter 10, Administration, Tuning, and Monitoring, explains how to monitor, tune, and configure the Couchbase cluster. Along the way, we will explore some best practices as well. We will also see how to initiate data rebalancing, backing up, and so on.
Chapter 11, Case Study – An E-Commerce Application, explains a case on e-commerce and builds it using various features provided by Couchbase, such as document design, views, and so on.
This book requires Couchbase Enterprise Edition 3.0 to be installed on your machine so that you can try the various features discussed in this book. While writing applications that connect to the Couchbase cluster, you will be using the Couchbase Client and Java SDK 2.0, which can be downloaded using Maven 3.0. We will be writing code using the Eclipse Luna IDE. To understand full text search, you need to install an ElasticSearch cluster and the plugins that fetch data from Couchbase into ElasticSearch for indexing. You also require Apache Tomcat 8.0 to deploy the web application.
If you are new to the NoSQL document system or have little or no experience in NoSQL development and administration and are planning to deploy Couchbase for your next project, then this book is for you. It will be helpful to have a bit of familiarity with Java.
In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "You can use the rpm command to install Couchbase on Red Hat or CentOS."
A block of code is set as follows:
Any command-line input or output is written as follows:
New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "Clicking the Next button moves you to the next screen."
Warnings or important notes appear in a box like this.
Tips and tricks appear like this.
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.
This chapter will introduce a new type of database technology called NoSQL. You too are a contributor to the evolution of this technology. Surprised? You do have a Facebook account, upload pictures, and use messenger services such as WeChat and WhatsApp, right? The data in these services is generated at a fast rate and in huge amounts (terabytes per day). It also varies in format and structure. We usually use the term big data for such data. Such large amounts of data can't be handled efficiently by a traditional relational database management system, so a new approach was needed. This is how NoSQL came into existence. This chapter will introduce you to NoSQL and its fundamentals. Next, you will be introduced to one of the fastest NoSQL databases in the world, called Couchbase. Yes, you read that correctly! It is fast because all of the data is, by default, cached in RAM (volatile memory), and the most interesting part is that you don't need to do any configuration for caching the data. Everything will be taken care of by Couchbase Server. Following this, you will learn to install Couchbase Server in Windows and Linux environments. Finally, this chapter will introduce you to the various logs and configuration folders.
In this chapter, we will cover the following topics:
It's always a challenge to introduce a new technology, especially when it changes the fundamentals that have been taught for so long. An example is the one I am going to introduce right now. However, it's easy to comprehend it if we understand the rationale behind it. So, let's understand the need for NoSQL. Oh, hold on! We will elaborate on this later.
We are all aware of and use Relational Database Management Systems (RDBMS). An RDBMS is a database management system based on the relational model invented by E. F. Codd, with features such as normalization, joins, foreign keys, and so on. (Examples of such database management systems are MySQL, Oracle, DB2, and so on.) An RDBMS provides features such as transactions, table joins, locking mechanisms, ACID properties, and so on. However, there are some limitations to RDBMS, predominantly in terms of scalability and readiness for schema changes.
ACID stands for Atomicity, Consistency, Isolation, and Durability. These are properties that are essential for supporting transactions in any database system. In order to guarantee a meaningful and successful transaction, the system has to support all of these properties:
In order to get more clarity, let's look at a scenario. Your organization has recently launched an e-commerce application and you are the technical architect. Everything has been going smoothly and everyone, including your boss, is happy with the outcome. However, after a couple of months, you start getting complaints from the business team that the application is not performing well. After some investigation, you realize that the consumer base has grown, and hence user traffic has increased. The application server and the infrastructure are not able to handle such an increase in traffic. So what will you do? Think about it. If you are like most other architects, your initial measures would be to scale the application servers by introducing multiple servers behind a load balancer, or to increase system resources such as RAM and CPU. After you take these steps, the application seems to show some improvement.
But after a couple of weeks comes a realization that the same improvement needs to be done at the database server too. So, what can be done? You have two options:
The first is vertical scaling, wherein you increase the hardware resources in terms of CPU and RAM. The second is horizontal scaling, wherein you increase the number of server nodes.
However, there is a challenge here: we can't just scale the database server horizontally as we do for application servers. To scale database servers horizontally, we need a mechanism to distribute data across the servers, balance the load, and so on. The only easy way left is to increase your hardware resources. However, after a certain stage, physical servers can't expand further due to limits on sockets, chips, and so on. For example, a server with four CPU sockets cannot be scaled up beyond four CPUs. Therefore, we need to find a way to scale out, horizontally, when we anticipate an increase in the number of requests or the load on the database layer. Such a situation is encountered in most content-driven, social networking, and e-commerce sites, where a large number of transactions take place within milliseconds.
Besides this, due to dynamics in business functions, the database schema needs to be changed very frequently, which is very common in agile development. It is difficult to incorporate the changes in RDBMS. Sometimes you need to bring the application down to modify the schema, such as adding one column in a table. In order to address such issues, companies such as Facebook and Google started exploring alternatives to RDBMS for data storage that can scale out and handle changes in schemas seamlessly without any impact on business operations. These are the fundamentals of NoSQL.
NoSQL is a nonrelational database management system that is different from traditional relational database management systems in significant ways. It is designed for distributed data stores in which there are very large-scale data storage requirements (terabytes and petabytes of data). These types of data storage mechanisms may not require fixed schemas, avoid join operations, and typically scale horizontally.
The main feature of NoSQL is that it is schemaless. There is no fixed schema to store data. Also, there are traditionally no joins between data records or documents, although most NoSQL systems have now started providing join features. NoSQL allows distributed storage and utilizes computing resources, such as CPU and RAM, across the nodes that are part of the NoSQL cluster.
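To make the idea of a schemaless store a little more concrete, here is a minimal sketch in plain Java. It is not the Couchbase API; the class SchemalessStoreSketch, its methods, the keys, and all field values are hypothetical and exist only for illustration. It assumes that a document can be modeled as a free-form map of fields, and shows two documents with different fields living side by side without any schema change.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a document store; this is NOT a Couchbase API.
class SchemalessStoreSketch {
    // Each document is stored under a key as a free-form map of fields.
    private final Map<String, Map<String, Object>> documents = new HashMap<>();

    void save(String key, Map<String, Object> document) {
        documents.put(key, document);
    }

    // Returns the stored document, or null if the key is unknown.
    Map<String, Object> find(String key) {
        return documents.get(key);
    }

    // Builds a store holding two product documents with different fields (made-up data).
    static SchemalessStoreSketch sample() {
        SchemalessStoreSketch store = new SchemalessStoreSketch();

        Map<String, Object> book = new HashMap<>();
        book.put("type", "product");
        book.put("name", "Sample book");
        book.put("price", 25.0);
        store.save("product::1", book);

        // The second document adds a list and a nested object that the first one
        // does not have; nothing had to be altered or migrated beforehand.
        Map<String, Object> supplier = new HashMap<>();
        supplier.put("name", "Sample supplier");
        supplier.put("country", "UK");

        Map<String, Object> shirt = new HashMap<>();
        shirt.put("type", "product");
        shirt.put("name", "Sample T-shirt");
        shirt.put("sizes", Arrays.asList("S", "M", "L"));
        shirt.put("supplier", supplier);
        store.save("product::2", shirt);

        return store;
    }

    public static void main(String[] args) {
        SchemalessStoreSketch store = sample();
        System.out.println(store.find("product::1"));
        System.out.println(store.find("product::2"));
    }
}
```

In a relational table, the sizes and supplier fields would typically require new columns or additional tables; here the second document simply carries them.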
There are different types of NoSQL data stores. Let's try to cover the four main categories of NoSQL systems in brief:
Use Case: Multiplayer online gaming to manage each player session.
Column family store: A sparse matrix system that uses a row and a column as keys, for example, Apache HBase and Apache Cassandra.
Use Case: Streaming massive write loads, such as log analysis.
Graph store: This is used for relationship-intensive problems. An example is Neo4j.
Use Case: Complicated graph problems, such as moving from one point to another.
Document store: This is used to store hierarchical data structures directly in the database, for example, MongoDB (10Gen), CouchDB, and Couchbase.
Use Case: Storing structured product information.
Electronic data is generated at rapid speed from a variety of sources, such as social media, web server logs, and e-commerce transactions; these include Facebook, Google+, and e-commerce websites such as Amazon and eBay. Personal user information, social graphs, geolocation data, user-generated content, and machine logging data are just a few examples of areas in which data has been increasing exponentially. Such data is termed big data: it usually comes in a variety of formats, is generated at a rapid speed, and is large in volume. In order to derive information from big data, large amounts of data have to be processed, for which RDBMS was never designed! NoSQL databases evolved as a way to handle such huge volumes of data efficiently.
Most NoSQL databases provide the following benefits:
Since NoSQL is a distributed database system, you need to know a theorem called CAP to understand it better and to make better decisions about how the system behaves when failures occur in a distributed environment. Let me explain the CAP theorem to you. There are three important properties in this theorem:
The following is a Venn diagram depicting the CAP theorem:
So, you have understood what the CAP properties signify. The CAP theorem states that a distributed system can provide only two of these three properties at the same time. Depending on the use cases that it is intended to address, a database system chooses two out of the three.
There are a number of database systems available in the IT software market: RDBMS such as MySQL, or NoSQL such as MongoDB, Couchbase, Cassandra, and so on. How do you choose a database system that suits your business requirements? This theorem will help you decide. In our context, Couchbase has opted for AP (availability and partition tolerance). So, if your application demands availability and partition tolerance more than consistency, you could opt for Couchbase. However, Couchbase provides a feature called eventual consistency, which will be discussed later in Chapter 6, Retrieving Documents without Keys Using Views. This feature enables the developer to decide the consistency level per operation.
Having understood what NoSQL is all about and why it's a buzzword nowadays, let's try to understand Couchbase, which is the purpose of this book.
Couchbase Server is a persistent, distributed, document-based database that is part of the NoSQL database movement. It combines the capabilities of Apache CouchDB (a document-based model and indexing) with those of the Membase database (an integrated RAM caching layer), enabling it to support very fast operations, such as create, store, update, and retrieve.
Couchbase Server is a leading NoSQL database project that focuses on distributed database technology and the surrounding ecosystems. It supports both key-value and document-oriented use cases. All components are available under the Apache 2.0 License. It can be obtained as packaged software in both an enterprise edition, which is rigorously tested and comes with support, and a community edition, which is open source and does not come with support.
Let's cover some of the main features of Couchbase Server here:
Auto-sharding is a feature of NoSQL databases that spreads documents across the nodes in a cluster automatically. It remains transparent to the application that consumes the data from the cluster.
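The general idea behind auto-sharding can be sketched in a few lines of Java. This is a simplified illustration and not Couchbase's actual implementation: the class AutoShardingSketch, the partition count of 1024, the node names, and the round-robin assignment of partitions to nodes are all assumptions made for this example. It hashes a document key to a fixed partition and then looks up which node owns that partition, so the application never has to choose a node itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Simplified illustration of hash-based sharding; not Couchbase's real algorithm.
class AutoShardingSketch {
    // Assumed fixed number of logical partitions for this sketch.
    static final int PARTITIONS = 1024;

    // Hashes the document key (CRC32) into a partition number in [0, PARTITIONS).
    static int partitionFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % PARTITIONS);
    }

    // Maps a partition to a node; round-robin ownership is an assumption of this sketch.
    static String nodeFor(String key, String[] nodes) {
        return nodes[partitionFor(key) % nodes.length];
    }

    public static void main(String[] args) {
        String[] nodes = {"node-a", "node-b", "node-c"}; // hypothetical node names
        String[] keys = {"user::1001", "user::1002", "order::42"}; // hypothetical keys
        for (String key : keys) {
            System.out.println(key + " -> partition " + partitionFor(key)
                    + " on " + nodeFor(key, nodes));
        }
    }
}
```

Because the same key always hashes to the same partition, a client library can route each request to the right node on its own, which is what keeps sharding transparent to the application code.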
Couchbase clusters consist of multiple nodes. A cluster is a collection
