Absorb the knowledge required to utilize, manage, and deploy RethinkDB using Node.js
Getting Started with RethinkDB is ideal for developers who are new to RethinkDB and need a practical understanding to start working with it. No previous knowledge of database programming is required, although a basic knowledge of JavaScript or Node.js would be helpful.
RethinkDB is a high-performance document-oriented database with a unique set of features. This increasingly popular NoSQL database is used to develop real-time web applications and, together with Node.js, makes it easy to deploy them to the cloud.
Getting Started with RethinkDB is designed to get you working with RethinkDB as quickly as possible. Starting with the installation and configuration process, you will learn how to start importing data into the database and run simple queries using the intuitive ReQL query language.
After successfully running a few simple queries, you will be introduced to other topics such as clustering and sharding. You will get to know how to set up a cluster of RethinkDB nodes and spread database load across multiple machines. We will then move on to advanced queries and optimization techniques. You will discover how to work with RethinkDB from a Node.js environment and find out all about deployment techniques.
Finally, we'll finish by working on a fully-fledged example that uses the Node.js framework and advanced features such as Changefeeds to develop a real-time web application.
This is a step-by-step book that provides a practical approach to RethinkDB programming, and is explained in a conversational, easy-to-follow style.
Page count: 208
Year of publication: 2016
Copyright © 2016 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: March 2016
Production reference: 1100316
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78588-760-4
www.packtpub.com
Author: Gianluca Tiepolo
Reviewer: Brandon Martin
Acquisition Editor: Rahul Nair
Content Development Editor: Arshiya Ayaz Umer
Technical Editor: Rupali R. Shrawane
Copy Editor: Yesha Gangani
Project Coordinator: Kinjal Bari
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Graphics: Jason Monteiro
Production Coordinator: Aparna Bhagat
Cover Work: Aparna Bhagat
Gianluca Tiepolo is an accomplished software engineering professional and entrepreneur with several years of experience in developing software and products on a variety of technologies, from consumer applications to revolutionary projects focused on computer vision, data engineering, and database programming. His software stack ranges from traditional platforms, such as Hadoop and OpenCV, to modern platforms, such as Node.js and Redis.
He is the founder of Defrogs, a start-up that is building a new-generation data engineering platform to handle big data called TrisDB. Bringing in innovative data process development approaches, this organization focuses on cutting-edge technologies designed to scale small to large distributed data clusters. To date, TrisDB is used by more than 3 million developers around the world. He previously co-founded Sixth Sense Solutions, a start-up that develops interactive solutions for the retail and fashion industries. In 2013, he helped produce the largest touch-enabled surface in the world. Currently, he's working on a fashion platform called Stylobag and maintains several open source projects on his GitHub account.
In 2015, he reviewed the book called Building Web Applications with Python and Neo4j published by Packt Publishing.
Writing a book is an uphill, demanding task that cannot be accomplished by a single person without the support of several others, and this book is no different. I would like to thank everyone who has played a role in helping me write this book or helped me in my career. I am indebted and grateful to everyone; however, I would like to mention a few people who have been extremely important to me over these last few months.
First of all, I have to thank the team of editors, reviewers, and the entire team at Packt Publishing for this book. I especially want to thank Izzat Contractor, who initially suggested me as the author of this book, Rahul Nair, the acquisition editor for this project, and also Arshiya Ayaz, my content editor.
I also want to thank all my friends and colleagues who have unconditionally supported me throughout the writing of this book and have given me great inspiration. I especially want to thank Diego Frasson for everything that he has taught me and Marco Ippolito who always reminded me that impossible projects become possible if you believe in them. You guys have given me a great dose of motivation, and this book simply wouldn't have been possible without you!
I want to thank the creators of RethinkDB. I am extremely thankful to the RethinkDB team and its great community. I also have to thank the entire open source community that contributed to the technologies on which RethinkDB is based. Without these technologies, the database and, in turn, this book would not have been possible.
Last but not least, I want to thank my mum Claire for understanding me and supporting me during my long hours of work and writing.
Brandon Martin is an experienced full-stack programmer who has been programming for 10 years. He currently works as the lead engineer at Lumi Inc (www.lumi.com), which uses RethinkDB in production. He has been happily married for 15 years and has two children.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.
Databases are all around us. In the modern web, almost every website that we visit and the web-based applications that we use have a database system working behind the frontend. Web developers are constantly looking for new database solutions that adapt to the modern web, allowing them to store data in a simpler manner.
RethinkDB is a simple yet powerful document database available for Linux and OS X. Built on robust, feature-rich software, it provides features for developing real-time web applications that scale easily. RethinkDB is also open source, so the source code is freely downloadable from the project's GitHub repository.
This book provides an introduction to RethinkDB. The following chapters will give you some understanding and coding tips to install and configure the database and start developing web applications with RethinkDB in no time.
Chapter 1, Introducing RethinkDB, explains how to download and install RethinkDB on both Linux and OS X.
Chapter 2, The ReQL Query Language, explains the basics of RethinkDB's query language and how to use it to run simple queries on the database.
Chapter 3, Clustering, Sharding, and Replication, explores the different techniques you can use to scale RethinkDB.
Chapter 4, Performance Tuning and Advanced Queries, presents best practices for obtaining optimal performance and explores more advanced queries.
Chapter 5, Programming RethinkDB in Node.js, explains how to interact with the database using the Node.js programming language.
Chapter 6, RethinkDB Administration and Deployment, teaches you how to maintain your RethinkDB database instance and how to deploy it to the cloud.
Chapter 7, Developing Real-Time Web Applications, explores how to develop a full-fledged Node.js web application based on RethinkDB.
To get the most out of this book, you'll need a computer or server running OS X or a Linux distribution.
You will also need an Internet connection and administration privileges to download and install the database.
Finally, you will need a text editor to edit configuration files and write code. There are many freely available editors, such as Nano, Emacs, or Gedit. Choose the one you prefer!
This book is targeted at anyone interested in learning how to get started with the RethinkDB database. No prior database programming experience is required; however, you should be comfortable with installing software, editing configuration files, and using the terminal or command line. A basic knowledge of Node.js is recommended, but not required.
In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "The default RethinkDB package includes various control scripts including the init script /etc/init.d/rethinkdb."
A block of code is set as follows:
Any command-line input or output is written as follows:
New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "You can do this from the Data Explorer section of the web interface by clicking on the Options icon and checking the query profiler checkbox, as you can see from the following screenshot."
Warnings or important notes appear in a box like this.
Tips and tricks appear like this.
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail us and mention the book's title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
You can download the code files by following these steps:
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from http://www.packtpub.com/sites/default/files/downloads/GettingStartedwithRethinkDB_ColorImages.pdf
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
If you have a problem with any aspect of this book, you can contact us, and we will do our best to address the problem.
RethinkDB is an open source, distributed, document-oriented database built to store JSON documents and to scale across multiple machines with very little effort. It's easy to set up and use, and it has a powerful query language that supports advanced queries such as table joins, aggregations, and MapReduce.
This chapter covers the major design decisions that have made RethinkDB what it is now, including the unique features it offers for real-time application development. We're going to start by looking at the basics of RethinkDB, why it is different, and why the new approach has everybody excited about using it to build the next generation of web apps.
In this chapter, you will also learn the following:
The RethinkDB development team provides prepackaged binaries for some platforms, whereas the source code is available on GitHub. You will learn to install the database using both methods. The choice of which type of package to use depends on which one is more appropriate for your system; Ubuntu, Debian, CentOS, and OS X users may prefer using the provided binaries, whereas users using different platforms can install RethinkDB by downloading and compiling the source code.
Traditional database systems have existed for many years, and they all have a familiar structure and common methods of communicating, inserting, and querying for information; however, the relatively recent rise and diffusion of NoSQL databases have given developers an increasingly large amount of choice on what to use for their data storage.
Although new scalability capabilities have most certainly revolutionized the performance that these databases can deliver, most NoSQL systems still rely on the creation of a specific structure that is organized collectively into a record of data. Additionally, the access model of these systems has not changed to adapt to today's modern web applications; to get information in, you add a record of data, and to get the information out, you query the database by polling specific values or fields, as illustrated by the following diagram:
However, as technology evolves, it's often worth rethinking how we do tasks. RethinkDB takes a completely different approach to the database structure and methods of storing and retrieving information.
What follows is an overview of RethinkDB's main features along with accompanying considerations of how it differs from other NoSQL databases.
RethinkDB is designed for building real-time applications. Using a feature called Changefeeds, developers can program the database to continuously push data updates to applications in real time. This fundamental architecture choice removes the problems caused by continuously polling the database: the database itself serves data to applications in real time, reducing the time and complexity required to develop scalable web apps. The following diagram illustrates how this works:
The best part about how RethinkDB handles Changefeeds is that you don't need to rework your queries to implement them. A Changefeed query looks identical to a normal query apart from the changes() command appended to it. Currently, the changes() command works on a large subset of queries and allows a client to receive updates on a table, a single document, or even the results of a specific query as they happen.
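To make the push model concrete, here is a minimal in-memory sketch in plain JavaScript. It is not the RethinkDB driver: the ToyTable class and its upsert() method are hypothetical names invented for illustration, and only the idea of subscribing through changes() and receiving the old and new values of a document mirrors the behavior described above.

```javascript
// Toy in-memory table illustrating the push model behind Changefeeds.
// This is NOT the RethinkDB driver; ToyTable and upsert() are hypothetical.
class ToyTable {
  constructor() {
    this.docs = new Map();   // primary key -> document
    this.listeners = [];     // callbacks registered through changes()
  }

  // Register a callback that is invoked on every write, mimicking a feed.
  changes(listener) {
    this.listeners.push(listener);
  }

  // Insert or replace a document, then push the change to every subscriber.
  upsert(doc) {
    const oldVal = this.docs.has(doc.id) ? this.docs.get(doc.id) : null;
    this.docs.set(doc.id, doc);
    for (const listener of this.listeners) {
      listener({ old_val: oldVal, new_val: doc });
    }
  }
}

const scores = new ToyTable();
const received = [];

// The application subscribes once; it never polls the table afterwards.
scores.changes((change) => received.push(change));

scores.upsert({ id: 1, player: 'alice', score: 10 });
scores.upsert({ id: 1, player: 'alice', score: 25 });

console.log(received.length);            // 2
console.log(received[1].old_val.score);  // 10
console.log(received[1].new_val.score);  // 25
```

The application code contains no loop that re-reads the table; every write is delivered to the subscriber as it happens, which is the property that polling-based designs lack.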
RethinkDB is a very good solution when flexibility and rapid iteration are of primary importance. Its other big strength is its ability to scale horizontally with very little effort or changes required to how you interact with the database. Horizontal scalability consists of expanding the storage capacity and processing power of a database by adding more servers to a cluster. A single database node is greatly limited by the capacity of the server that hosts it. So, if the dataset exceeds available capacity, data must be sharded among multiple database instances that are connected to each other.
Thankfully, the RethinkDB team set out to make scaling really easy for developers. Users should not have to worry about these issues at all wherever possible. So, with RethinkDB, you can set up a cluster, create table-level shards, and run cross-shard joins and aggregations in less than five minutes using the web interface.
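As a rough illustration of what sharding a table means, the following plain JavaScript sketch routes documents to shards by primary key range. It assumes range-based partitioning, and the split points, the shardFor() function, and the sample keys are all hypothetical; in RethinkDB itself the database does this work for you.

```javascript
// Toy range-based router; splitPoints and shardFor are hypothetical names.
// Keys below 'h' go to shard 0, keys from 'h' up to (not including) 'p' go
// to shard 1, and every other key goes to shard 2.
const splitPoints = ['h', 'p'];

function shardFor(primaryKey) {
  let shard = 0;
  // Advance past every split point that the key is greater than or equal to.
  while (shard < splitPoints.length && primaryKey >= splitPoints[shard]) {
    shard += 1;
  }
  return shard;
}

console.log(shardFor('alice')); // 0
console.log(shardFor('henry')); // 1
console.log(shardFor('maria')); // 1
console.log(shardFor('zoe'));   // 2
```

Adding a server to the cluster corresponds to adding a split point here: each shard then owns a narrower range of keys, so the data and the load are spread across more machines.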
The RethinkDB query language, ReQL, is a data-driven, abstract, advanced language that embeds itself perfectly in the programming language that you use to build your applications; in fact, in ReQL, queries are constructed simply by making function calls in any programming language that you prefer. ReQL is designed to be pragmatic and works like a fluent API—a set of functions that you can chain together to compose queries. It supports advanced queries including massively parallelized distributed computation. All queries are automatically parallelized on the database server and, whenever possible, query execution is split across multiple cores and datacenters. RethinkDB will automatically break large queries into stages and execute each stage in parallel by combining intermediate data to return a complete query result.
Official RethinkDB client drivers are available for JavaScript, Python and Ruby; however, support for other programming languages is available through community-supported drivers.
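The fluent style described above can be sketched in plain JavaScript. The ToyQuery class below is a hypothetical stand-in and not ReQL or any RethinkDB driver: it only shows how chained function calls can build up a query that is evaluated when run() is called. The sample data is invented for illustration.

```javascript
// Toy fluent query builder over an in-memory array.
// ToyQuery is a hypothetical illustration, not the RethinkDB driver.
class ToyQuery {
  constructor(rows, steps = []) {
    this.rows = rows;
    this.steps = steps; // pending transformations, applied only by run()
  }

  // Return a new query with one more step, leaving this query unchanged.
  _chain(step) {
    return new ToyQuery(this.rows, [...this.steps, step]);
  }

  filter(predicate) {
    return this._chain((rows) => rows.filter(predicate));
  }

  orderBy(field) {
    return this._chain((rows) =>
      [...rows].sort((a, b) => (a[field] > b[field] ? 1 : a[field] < b[field] ? -1 : 0))
    );
  }

  limit(count) {
    return this._chain((rows) => rows.slice(0, count));
  }

  // Execute every queued step in order and return the final rows.
  run() {
    return this.steps.reduce((rows, step) => step(rows), this.rows);
  }
}

const people = [
  { name: 'Alex', age: 33 },
  { name: 'Maria', age: 41 },
  { name: 'Ken', age: 19 },
  { name: 'Lena', age: 27 },
];

const result = new ToyQuery(people)
  .filter((person) => person.age >= 21)
  .orderBy('age')
  .limit(2)
  .run();

console.log(result.map((person) => person.name)); // [ 'Lena', 'Alex' ]
```

Because each call returns a new query object and nothing executes until run(), the whole chain can be handed over for execution as one unit, which is the same shape a real query takes when it is sent to a database server.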
RethinkDB is different by design. It aims to be both developer-friendly and operations-oriented, combining an easy-to-use query language with simple controls for operating at scale, while remaining highly available and extremely scalable.
Since its first release, RethinkDB has gained a large, vibrant developer community more quickly than almost any other database; in fact, today, RethinkDB is the second most popular database on GitHub and is becoming the database of choice for many big and small companies, with hundreds of technology start-ups already using it in production.
One of the reasons behind RethinkDB's popularity among developers is its data model. JSON has become the de facto standard for data interchange in modern web applications, and a persistence layer that naturally stores, queries, and manages JSON makes life easier for developers. RethinkDB is a document database built from the ground up to take advantage of JSON's feature set. When developers have to work with objects in databases, it can be troublesome at times due to data mapping and impedance issues. Document-oriented databases solve these issues by replacing the concept of a row with a more flexible model called the document, as documents are objects. After all, programmers who tend to work with objects are going to be much more familiar with storing and querying such data in RethinkDB. If you've never worked with a document before, consider the following example that represents a person using JSON:
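The original listing is not reproduced here, so the following is a stand-in sketch written as a JavaScript object literal with JSON-style quoted keys. Every field name and value is hypothetical and chosen only to show a string, a number, a Boolean, an array, and an embedded object in one document.

```javascript
// A hypothetical person document; every field name and value is illustrative.
const person = {
  "firstName": "Alex",
  "lastName": "Jones",
  "age": 33,
  "isActive": true,
  "emails": ["alex@example.com", "ajones@example.org"],
  "address": {
    "city": "London",
    "country": "UK"
  }
};

// Serialize the object to JSON text, then parse it back to confirm that the
// document survives the round trip unchanged.
const json = JSON.stringify(person, null, 2);
const parsed = JSON.parse(json);

console.log(json);
console.log(parsed.address.city); // London
```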
As you can see from the preceding example, a document always begins and ends with curly braces, keys and values are separated by colons, and key/value pairs are separated by commas. The key is always a string. A typical JSON document lets you represent values as numbers, strings, Booleans, arrays, objects, and the null value; RethinkDB adds other data types that you can use to model your data: binary data, and dates and times. Since version 1.15, RethinkDB also supports geospatial queries, so you can include geometry within your JSON documents.
By allowing embedded objects and arrays in JSON, the document-oriented approach used by RethinkDB lets you represent complex relationships with a single document. This fits naturally into the way in which web developers think and model their data.
Traditional relational and document databases, more often than not, use locks at various levels to ensure proper data consistency during concurrent access to the database. In a typical NoSQL database that uses locking, once a write request comes in, all readers are blocked until the write completes. In use cases that require large volumes of writes, this architecture can eventually lead to reads getting queued up, resulting in significant performance degradation.
RethinkDB solves this problem by implementing block-level Multi-Version Concurrency Control (MVCC)—a method commonly used by database management systems that provides concurrent access to the database without locking it. Whenever a write operation occurs while there is an ongoing read, the database takes a snapshot of the data block for each relevant shard and temporarily maintains different versions of the blocks in order to execute both read and write operations at the same time.
The main difference between MVCC and lock models is that in MVCC, locks acquired for reading data don't conflict with locks acquired for writing data, and so, reading never blocks writing and vice versa. The concurrency model used by RethinkDB ensures, for example, that you can run an hour-long MapReduce job without blocking the database.
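The snapshot idea can be sketched in a few lines of plain JavaScript. This is a deliberately simplified illustration, not RethinkDB's implementation: the hypothetical ToyVersionedStore class copies a whole object on each write, whereas the text above describes versioning at the level of data blocks.

```javascript
// Toy multi-version store; ToyVersionedStore is a hypothetical illustration.
class ToyVersionedStore {
  constructor(initial) {
    // Each version is frozen, so a version that a reader holds cannot change.
    this.current = Object.freeze({ ...initial });
  }

  // A reader captures the version that is current when its read begins.
  snapshot() {
    return this.current;
  }

  // A writer never mutates an existing version; it publishes a new one.
  write(key, value) {
    this.current = Object.freeze({ ...this.current, [key]: value });
  }
}

const store = new ToyVersionedStore({ balance: 100 });

const longRead = store.snapshot();  // a long-running read starts here
store.write('balance', 250);        // a concurrent write proceeds unblocked

console.log(longRead.balance);          // 100
console.log(store.snapshot().balance);  // 250
```

The long-running read keeps seeing the version it started with, while new readers see the result of the write; neither side waits for the other.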
For distributed