Getting Started with RethinkDB - Gianluca Tiepolo - E-Book


Gianluca Tiepolo

Description

Absorb the knowledge required to utilize, manage, and deploy RethinkDB using Node.js

About This Book

  • Make the most of RethinkDB, an open source and scalable database, to simplify building web applications
  • Run powerful queries using ReQL, which is the most convenient language to manipulate JSON documents with
  • Develop fully-fledged real-time web apps using Node.js and RethinkDB

Who This Book Is For

Getting Started with RethinkDB is ideal for developers who are new to RethinkDB and need a practical understanding to start working with it. No previous knowledge of database programming is required, although a basic knowledge of JavaScript or Node.js would be helpful.

What You Will Learn

  • Download and install the database on your system
  • Configure RethinkDB's settings and start using the web interface
  • Import data into RethinkDB
  • Run queries using the ReQL language
  • Create shards, replicas, and RethinkDB clusters
  • Use an index to improve database performance
  • Get to know all the RethinkDB deployment techniques

In Detail

RethinkDB is a high-performance, document-oriented database with a unique set of features. This increasingly popular NoSQL database is used to develop real-time web applications and, together with Node.js, makes it easy to deploy them to the cloud.

Getting Started with RethinkDB is designed to get you working with RethinkDB as quickly as possible. Starting with the installation and configuration process, you will learn how to start importing data into the database and run simple queries using the intuitive ReQL query language.

After successfully running a few simple queries, you will be introduced to other topics such as clustering and sharding. You will get to know how to set up a cluster of RethinkDB nodes and spread database load across multiple machines. We will then move on to advanced queries and optimization techniques. You will discover how to work with RethinkDB from a Node.js environment and find out all about deployment techniques.

Finally, we'll finish by working on a fully-fledged example that uses Node.js and advanced features such as Changefeeds to develop a real-time web application.

Style and approach

This is a step-by-step book that provides a practical approach to RethinkDB programming, and is explained in a conversational, easy-to-follow style.

Pages: 208

Year of publication: 2016




Table of Contents

Getting Started with RethinkDB
Credits
About the Author
Acknowledgement
About the Reviewer
www.PacktPub.com
eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Introducing RethinkDB
Rethinking the database
Changefeeds
Horizontal scalability
Powerful query language
Developer-oriented
Document-oriented
Lock-free architecture
Immediate consistency
Secondary indexes
Distributed joins
Installing RethinkDB
Installing RethinkDB on Ubuntu/Debian Linux
Installing RethinkDB on CentOS and Fedora
Installing RethinkDB on OS X
Installing RethinkDB using Homebrew
Building RethinkDB from source
Configuring RethinkDB
Running as a daemon
Creating a configuration file
Starting RethinkDB
Running a query
Summary
2. The ReQL Query Language
Documents
Document databases
JSON document format
Keys
Arrays
Embedded documents
Data modeling
Introducing ReQL
An explicit query language
Building a query
Inserting data
Batch inserts
Reading data
Filtering results
Manipulating results
Updating data
Updating existing attributes
Adding new attributes
Deleting data
Removing all documents
Deleting a table
Deleting a database
Summary
3. Clustering, Sharding, and Replication
An introduction to scaling
What kind of system is it?
Scaling reads
Scaling writes
Scaling data
Clustering RethinkDB
Creating a cluster
Adding a server to the cluster
Running queries on the cluster
Replication
Adding a secondary replica
Failover
Sharding
Sharding a table
Summary
4. Performance Tuning and Advanced Queries
Performance tuning
Increasing the cache size
Increasing concurrency
Using soft durability mode
Bulk data import
Introducing indexing
Evaluating query performance
Creating and using an index
Compound indexes
Advanced queries
Limits, skips, and sorts
The limit command
The skip command
Sorting documents
Finding a random document
Grouping
Aggregations
Average
Maximum
The pluck command
Summary
5. Programming RethinkDB in Node.js
Introducing Node.js
An increasingly popular technology
An event-driven design
Installing Node.js
Installing on Linux
Installing on Mac OS X
Running Node.js
Installing the RethinkDB module
Connecting to RethinkDB
Running a simple query
Inserting documents
Reading documents
Updating and deleting documents
Introducing Changefeeds
A simple example using Changefeeds
Summary
6. RethinkDB Administration and Deployment
RethinkDB administration tools
Backing up your data
Backing up a single table
Setting up automatic backups
Restoring your data
Securing RethinkDB
Securing the web interface
Securing the driver port
Monitoring RethinkDB
Monitoring issues
Monitoring running jobs
Deploying RethinkDB
Summary
7. Developing Real-Time Web Applications
Introducing real-time web applications
Examples of real-time web apps
Going real time on the Web
Polling
AJAX
WebSockets
Developing web applications with Node.js
Express.js
Routing
Templating
Socket.io
Using RethinkDB in Node.js web applications
Database polling
Message queues
Changefeeds
Your first real-time web application
Structuring the Notes web application
Creating the Node.js server
Configuring the router
Writing the application logic
Interacting with RethinkDB
Implementing Changefeeds
Programming Socket.io
Programming the frontend
The view
Running our web application
Summary
Index

Getting Started with RethinkDB


Copyright © 2016 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: March 2016

Production reference: 1100316

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78588-760-4

www.packtpub.com

Credits

Author

Gianluca Tiepolo

Reviewer

Brandon Martin

Acquisition Editor

Rahul Nair

Content Development Editor

Arshiya Ayaz Umer

Technical Editor

Rupali R. Shrawane

Copy Editor

Yesha Gangani

Project Coordinator

Kinjal Bari

Proofreader

Safis Editing

Indexer

Tejal Daruwale Soni

Graphics

Jason Monteiro

Production Coordinator

Aparna Bhagat

Cover Work

Aparna Bhagat

About the Author

Gianluca Tiepolo is an accomplished software engineering professional and entrepreneur with several years of experience in developing software and products on a variety of technologies, from consumer applications to revolutionary projects focused on computer vision, data engineering, and database programming. His software stack ranges from traditional platforms, such as Hadoop and OpenCV, to modern platforms, such as Node.js and Redis.

He is the founder of Defrogs, a start-up that is building a new-generation data engineering platform to handle big data called TrisDB. Bringing in innovative data process development approaches, this organization focuses on cutting-edge technologies designed to scale small to large distributed data clusters. To date, TrisDB is used by more than 3 million developers around the world. He previously co-founded Sixth Sense Solutions, a start-up that develops interactive solutions for the retail and fashion industries. In 2013, he helped produce the largest touch-enabled surface in the world. Currently, he's working on a fashion platform called Stylobag and maintains several open source projects on his GitHub account.

In 2015, he reviewed the book called Building Web Applications with Python and Neo4j published by Packt Publishing.

Acknowledgement

Writing a book is both an uphill and demanding task that cannot be accomplished by a single person without the support of several others, and this book is no different. I would like to thank everyone who has played a role in helping me write this book or helped me in my career. I am indebted and grateful to everyone; however, I would like to mention a few people who have been extremely important to me over these last few months.

First of all, I have to thank the team of editors, reviewers, and the entire team at Packt Publishing for this book. I especially want to thank Izzat Contractor, who initially suggested me as the author of this book, Rahul Nair, the acquisition editor for this project, and also Arshiya Ayaz, my content editor.

I also want to thank all my friends and colleagues who have unconditionally supported me throughout the writing of this book and have given me great inspiration. I especially want to thank Diego Frasson for everything that he has taught me and Marco Ippolito who always reminded me that impossible projects become possible if you believe in them. You guys have given me a great dose of motivation, and this book simply wouldn't have been possible without you!

I want to thank the creators of RethinkDB. I am extremely thankful to the RethinkDB team and its great community. I also have to thank the entire open source community that contributed to the technologies on which RethinkDB is based. Without these technologies, the database and, in turn, this book would not have been possible.

Last but not least, I want to thank my mum Claire for understanding me and supporting me during my long hours of work and writing.

About the Reviewer

Brandon Martin is an experienced full-stack programmer who has been programming for 10 years. Currently, he works as the lead engineer at Lumi Inc (www.lumi.com), where the team uses RethinkDB in production. He has been happily married for 15 years and has two children.

www.PacktPub.com

eBooks, discount offers, and more

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

  • Fully searchable across every book published by Packt
  • Copy and paste, print, and bookmark content
  • On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

Preface

Databases are all around us. In the modern web, almost every website that we visit and the web-based applications that we use have a database system working behind the frontend. Web developers are constantly looking for new database solutions that adapt to the modern web, allowing them to store data in a simpler manner.

RethinkDB aims to be both simple and powerful, and it is a document database available for Linux and OS X. Built on robust, feature-rich software, it provides a set of features for developing real-time web applications that can be scaled with very little effort. RethinkDB is also open source, so its source code can be freely downloaded from the project's GitHub repository.

This book provides an introduction to RethinkDB. The following chapters will give you some understanding and coding tips to install and configure the database and start developing web applications with RethinkDB in no time.

What this book covers

Chapter 1, Introducing RethinkDB, explains how to download and install RethinkDB on both Linux and OS X.

Chapter 2, The ReQL Query Language, explains the basics of RethinkDB's query language and how to use it to run simple queries on the database.

Chapter 3, Clustering, Sharding, and Replication, explores the different techniques you can use to scale RethinkDB.

Chapter 4, Performance Tuning and Advanced Queries, presents best practices for obtaining optimal performance and explores more advanced queries.

Chapter 5, Programming RethinkDB in Node.js, explains how to interact with the database from Node.js using JavaScript.

Chapter 6, RethinkDB Administration and Deployment, teaches you how to maintain your RethinkDB database instance and how to deploy it to the cloud.

Chapter 7, Developing Real-Time Web Applications, explores how to develop a full-fledged Node.js web application based on RethinkDB.

What you need for this book

To get the most out of this book, you'll need a computer or server running OS X or a Linux distribution.

You will also need an Internet connection and administration privileges to download and install the database.

Finally, you will need a text editor to edit configuration files and write code. There are many freely available editors, such as Nano, Emacs, or Gedit. Choose the one you prefer!

Who this book is for

This book is targeted at anyone interested in learning how to get started with the RethinkDB database. No prior database programming experience is required; however, you should be comfortable with installing software, editing configuration files, and using the terminal or command line. A basic knowledge of Node.js is recommended, but not required.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "The default RethinkDB package includes various control scripts including the init script /etc/init.d/rethinkdb."

A block of code is set as follows:

r.db("test").table("people").indexStatus("name_and_age")

{
  "function": <binary, 125 bytes, "24 72 65 71 6c 5f...">,
  "geo": false,
  "index": "name_and_age",
  "multi": false,
  "outdated": false,
  "ready": true
}

Any command-line input or output is written as follows:

rethinkdb import -f data.json --table test.people
sudo pip install rethinkdb

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "You can do this from the Data Explorer section of the web interface by clicking on the Options icon and checking the query profiler checkbox, as you can see from the following screenshot."

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

  1. Log in or register to our website using your e-mail address and password.
  2. Hover the mouse pointer on the SUPPORT tab at the top.
  3. Click on Code Downloads & Errata.
  4. Enter the name of the book in the Search box.
  5. Select the book for which you're looking to download the code files.
  6. Choose from the drop-down menu where you purchased this book from.
  7. Click on Code Download.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR / 7-Zip for Windows
  • Zipeg / iZip / UnRarX for Mac
  • 7-Zip / PeaZip for Linux

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from http://www.packtpub.com/sites/default/files/downloads/GettingStartedwithRethinkDB_ColorImages.pdf

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.

Chapter 1. Introducing RethinkDB

RethinkDB is an open source, distributed, document-oriented database built to store JSON documents and to scale across multiple machines with very little effort. It's easy to set up and use, and it has a powerful query language that supports advanced queries such as table joins, aggregations, and MapReduce.

This chapter covers the major design decisions that have made RethinkDB what it is now, including the unique features it offers for real-time application development. We're going to start by looking at the basics of RethinkDB, why it is different, and why its new approach has everybody excited about using it to build the next generation of web apps.

In this chapter, you will also learn the following:

  • Installing the database on Linux and OS X
  • Configuring it
  • Running a query using the web interface

The RethinkDB development team provides prepackaged binaries for some platforms, and the source code is available on GitHub. You will learn to install the database using both methods. Which type of package to use depends on your system: Ubuntu, Debian, CentOS, and OS X users may prefer the provided binaries, whereas users on other platforms can install RethinkDB by downloading and compiling the source code.

Rethinking the database

Traditional database systems have existed for many years, and they all share a familiar structure and common methods of communicating, inserting, and querying for information; however, the relatively recent rise and spread of NoSQL databases has given developers an increasingly wide choice of data stores.

Although new scalability capabilities have greatly improved the performance that these databases can deliver, most NoSQL systems still rely on the creation of a specific structure that is organized collectively into a record of data. Additionally, the access model of these systems has not changed to adapt to today's modern web applications: to get information in, you add a record of data, and to get information out, you query the database, polling for specific values or fields, as illustrated by the following diagram:

However, as technology evolves, it's often worth rethinking how we do tasks. RethinkDB takes a completely different approach to the database structure and methods of storing and retrieving information.

What follows is an overview of RethinkDB's main features along with accompanying considerations of how it differs from other NoSQL databases.

Changefeeds

RethinkDB is designed for building real-time applications. Using a feature called Changefeeds, developers can program the database to continuously push data updates to applications in real time. This fundamental architectural choice avoids the problems caused by continuously polling the database: because the database itself pushes data to applications as it changes, the time and complexity required to develop scalable web apps are reduced. The following diagram illustrates how this works:

The best part about how RethinkDB handles Changefeeds is that you don't need to modify your queries much to use them. A changefeed query looks identical to a normal query apart from the changes() command appended to it. Currently, the changes command works on a large subset of queries and allows a client to receive updates on a table, a single document, or even the results of a specific query as they happen.
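To make the push model concrete, here is a minimal in-memory sketch in plain JavaScript (Node.js). It does not use the RethinkDB driver: `ToyTable`, `upsert`, `remove`, and `changes` are hypothetical names chosen for illustration. The notifications only mimic the `{ old_val, new_val }` objects that RethinkDB changefeeds deliver, with `old_val` null for inserts and `new_val` null for deletes. With the real driver, the feed would instead be opened by appending `.changes()` to a query such as `r.table("people")`.

```javascript
// Toy, in-memory model of push-based change notifications (NOT the RethinkDB driver).
const { EventEmitter } = require("events");

class ToyTable extends EventEmitter {
  constructor() {
    super();
    this.docs = new Map(); // primary key -> current document
  }

  // Insert or replace a document, then push the change to every subscriber.
  upsert(doc) {
    const oldVal = this.docs.has(doc.id) ? this.docs.get(doc.id) : null;
    this.docs.set(doc.id, doc);
    this.emit("change", { old_val: oldVal, new_val: doc });
  }

  // Delete a document; as in a changefeed notification, new_val is null.
  remove(id) {
    if (!this.docs.has(id)) return;
    const oldVal = this.docs.get(id);
    this.docs.delete(id);
    this.emit("change", { old_val: oldVal, new_val: null });
  }

  // Hypothetical stand-in for appending .changes(): register a subscriber callback.
  changes(callback) {
    this.on("change", callback);
  }
}

const people = new ToyTable();
const received = [];
people.changes((change) => received.push(change));

people.upsert({ id: 1, name: "Alex" }); // insert: old_val is null
people.upsert({ id: 1, name: "Alex Jones" }); // update: both values present
people.remove(1); // delete: new_val is null

console.log(received.length); // 3
```

The subscriber never asks for data; every write is delivered to it as it happens, which is the key difference from the polling model.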

Horizontal scalability

RethinkDB is a very good solution when flexibility and rapid iteration are of primary importance. Its other big strength is its ability to scale horizontally with very little effort and few changes to how you interact with the database. Horizontal scalability means expanding the storage capacity and processing power of a database by adding more servers to a cluster. A single database node is limited by the capacity of the server that hosts it, so if the dataset exceeds the available capacity, the data must be sharded among multiple database instances that are connected to each other.

Thankfully, the RethinkDB team set out to make scaling really easy for developers, so that, wherever possible, users don't have to worry about these issues at all. With RethinkDB, you can set up a cluster, create table-level shards, and run cross-shard joins and aggregations in less than five minutes using the web interface.
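RethinkDB partitions a table's documents into shards by ranges of the primary key. The following JavaScript is only a toy illustration of range-based routing. `shardFor` and the split points are hypothetical, and in RethinkDB the shard boundaries are managed by the database rather than written by hand.

```javascript
// Toy range-based shard router: a sketch of the idea, not RethinkDB's implementation.
// splitPoints must be sorted; shard i holds keys in [splitPoints[i-1], splitPoints[i]).
function shardFor(primaryKey, splitPoints) {
  let shard = 0;
  // Advance past every split point that the key is greater than or equal to.
  while (shard < splitPoints.length && primaryKey >= splitPoints[shard]) {
    shard += 1;
  }
  return shard;
}

// Hypothetical split points that divide a table into three shards.
const splitPoints = ["h", "p"];

// Group primary keys by the shard (and therefore the server) that would store them.
const byShard = new Map();
for (const key of ["alex", "maria", "zoe", "brandon", "paula"]) {
  const shard = shardFor(key, splitPoints);
  if (!byShard.has(shard)) byShard.set(shard, []);
  byShard.get(shard).push(key);
}

console.log(byShard.get(0)); // [ 'alex', 'brandon' ]
```

In this simplified model, adding split points gives each shard a narrower key range, so the data and the load can be spread across more machines without changing how individual documents are addressed.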

Powerful query language

The RethinkDB query language, ReQL, is a data-driven, abstract, advanced language that embeds itself in the programming language you use to build your applications: in ReQL, queries are constructed simply by making function calls in your host language. ReQL is designed to be pragmatic and works like a fluent API, a set of functions that you can chain together to compose queries. It supports advanced queries, including massively parallelized distributed computation. All queries are automatically parallelized on the database server and, whenever possible, query execution is split across multiple cores and datacenters. RethinkDB automatically breaks large queries into stages, executes each stage in parallel, and combines the intermediate results to return a complete query result.
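To illustrate the fluent, chainable style, here is a toy query builder over an in-memory array. Its method names echo ReQL commands (`filter`, `orderBy`, `pluck`, `limit`), but `ToyQuery` is hypothetical and runs entirely in JavaScript. It is not the ReQL driver and does not reproduce ReQL's semantics or its server-side execution.

```javascript
// Toy fluent query builder over an in-memory array (hypothetical; NOT the ReQL driver).
// Each method returns a new query object, so calls can be chained.
class ToyQuery {
  constructor(rows) {
    this.rows = rows;
  }

  filter(predicate) {
    return new ToyQuery(this.rows.filter(predicate));
  }

  orderBy(field) {
    const compare = (a, b) => (a[field] < b[field] ? -1 : a[field] > b[field] ? 1 : 0);
    // Copy before sorting so the previous query object is left untouched.
    return new ToyQuery([...this.rows].sort(compare));
  }

  // Keep only the named fields of each row.
  pluck(...fields) {
    return new ToyQuery(
      this.rows.map((row) => {
        const out = {};
        for (const f of fields) if (f in row) out[f] = row[f];
        return out;
      })
    );
  }

  limit(n) {
    return new ToyQuery(this.rows.slice(0, n));
  }

  run() {
    return this.rows;
  }
}

const people = [
  { firstName: "Alex", lastName: "Jones", yearOfBirth: 1991 },
  { firstName: "Maria", lastName: "Rossi", yearOfBirth: 1985 },
  { firstName: "Sam", lastName: "Brown", yearOfBirth: 1995 },
];

// Compare the chaining shape of a ReQL query such as
// r.table("people").filter(...).orderBy("lastName").pluck("firstName", "lastName").limit(2)
const result = new ToyQuery(people)
  .filter((p) => p.yearOfBirth > 1990)
  .orderBy("lastName")
  .pluck("firstName", "lastName")
  .limit(2)
  .run();

console.log(result);
```

Because each call returns a new query object, intermediate queries can be stored and extended later, which is what makes chained composition convenient.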

Tip

At the time of writing, official RethinkDB client drivers are available for JavaScript, Python, and Ruby; support for other programming languages is available through community-supported drivers.
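The staged execution described in the ReQL paragraph can be sketched as a two-step aggregation. The following JavaScript is an illustration under simplifying assumptions, not how RethinkDB schedules work: each inner array stands in for one shard, and the stages run sequentially here even though the per-shard step is independent and could run in parallel. `partialAverage`, `mergePartials`, and `distributedAverage` are hypothetical helpers.

```javascript
// Toy two-stage aggregation (hypothetical helpers; runs sequentially in this sketch).
// Stage 1: each "shard" computes a partial { sum, count } over its own rows.
function partialAverage(rows, field) {
  return rows.reduce(
    (acc, row) => ({ sum: acc.sum + row[field], count: acc.count + 1 }),
    { sum: 0, count: 0 }
  );
}

// Stage 2: merge the partial results into a single { sum, count }.
function mergePartials(partials) {
  return partials.reduce(
    (acc, p) => ({ sum: acc.sum + p.sum, count: acc.count + p.count }),
    { sum: 0, count: 0 }
  );
}

function distributedAverage(shards, field) {
  const partials = shards.map((rows) => partialAverage(rows, field)); // independent per shard
  const total = mergePartials(partials);
  return total.count === 0 ? null : total.sum / total.count;
}

const shards = [
  [{ yearOfBirth: 1991 }, { yearOfBirth: 1985 }],
  [{ yearOfBirth: 1995 }],
];

console.log(distributedAverage(shards, "yearOfBirth"));
```

The shards return `{ sum, count }` pairs rather than averages on purpose: averaging the per-shard averages (1988 and 1995) would give 1991.5, which is wrong whenever shards hold different numbers of documents.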

Developer-oriented

RethinkDB is different by design. It aims to be developer-friendly, combining an easy-to-use query language with simple controls for operating at scale, while keeping an operations-oriented focus on high availability and scalability.

Since its first release, RethinkDB has gained a large, vibrant developer community faster than almost any other database; at the time of writing, RethinkDB is the second most popular database on GitHub and is becoming the database of choice for many big and small companies, with hundreds of technology start-ups already using it in production.

Document-oriented

One of the reasons behind RethinkDB's popularity among developers is its data model. JSON has become the de facto standard for data interchange in modern web applications, and a persistence layer that naturally stores, queries, and manages JSON makes life easier for developers. RethinkDB is a document database built from the ground up to take advantage of JSON's feature set. Working with objects in traditional databases can be troublesome because of data mapping and impedance mismatch issues. Document-oriented databases solve these issues by replacing the concept of a row with a more flexible model called the document; documents are objects, so programmers who work with objects will find storing and querying such data in RethinkDB very familiar. If you've never worked with a document before, consider the following example that represents a person using JSON:

{
  "firstName": "Alex",
  "lastName": "Jones",
  "yearOfBirth": 1991,
  "phoneNumbers": {
    "home": "02-345678",
    "mobile": "345-12345678"
  },
  "interests": [ "programming", "football", "chess" ]
}

As you can see from the preceding example, a document always begins and ends with curly braces, keys and values are separated by colons, and key/value pairs are separated by commas. The key is always a string. Standard JSON lets you represent values as numbers, strings, booleans, arrays, objects, and null; RethinkDB adds further data types that you can use to model your data, namely binary data and dates and times. Since version 1.15, RethinkDB also supports geospatial queries, so you can include geometry within your JSON documents.
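The following plain JavaScript snippet (no RethinkDB involved) shows how the standard JSON types surface once a document is parsed, and why native date support matters: JSON has no date type, so a `Date` survives a JSON round trip only as an ISO 8601 string. The `active` and `middleName` fields are added to the earlier example purely for illustration, and `jsonType` is a hypothetical helper.

```javascript
// Plain JSON, no RethinkDB involved: report the JSON type of each top-level value.
function jsonType(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value; // "number", "string", "boolean" or "object"
}

// "active" and "middleName" are illustrative additions to the earlier example.
const person = JSON.parse(`{
  "firstName": "Alex",
  "yearOfBirth": 1991,
  "active": true,
  "middleName": null,
  "phoneNumbers": { "home": "02-345678" },
  "interests": ["programming", "football", "chess"]
}`);

for (const [key, value] of Object.entries(person)) {
  console.log(key, jsonType(value));
}

// JSON has no date type: a Date comes back from a round trip as a plain string.
const roundTripped = JSON.parse(JSON.stringify({ createdAt: new Date(0) }));
console.log(typeof roundTripped.createdAt); // string
```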

By allowing embedded objects and arrays in JSON, the document-oriented approach used by RethinkDB lets you represent complex relationships with a single document. This fits naturally into the way in which web developers think and model their data.
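As a small illustration, the hypothetical order document below embeds its line items instead of storing them in a separate table, so a total can be computed from one document without a join. The field names are assumptions made for this example, and the code is plain JavaScript rather than a ReQL query.

```javascript
// Hypothetical order document: line items are embedded rather than kept in another table.
const order = {
  id: "order-1001",
  customer: { name: "Alex Jones" },
  items: [
    { sku: "book-42", qty: 2, price: 15.5 },
    { sku: "pen-7", qty: 3, price: 1.25 },
  ],
};

// Everything needed for the total lives in one document, so no join is required.
const total = order.items.reduce((sum, item) => sum + item.qty * item.price, 0);
console.log(order.customer.name, total); // Alex Jones 34.75
```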

Lock-free architecture

Traditional relational and document databases, more often than not, use locks at various levels to ensure data consistency during concurrent access. In a typical NoSQL database that uses locking, once a write request comes in, all readers are blocked until the write completes. This means that in use cases that require large volumes of writes, reads can end up queued behind writes, resulting in significant performance degradation.

RethinkDB solves this problem by implementing block-level Multi-Version Concurrency Control (MVCC)—a method commonly used by database management systems that provides concurrent access to the database without locking it. Whenever a write operation occurs while there is an ongoing read, the database takes a snapshot of the data block for each relevant shard and temporarily maintains different versions of the blocks in order to execute both read and write operations at the same time.

The main difference between MVCC and lock models is that in MVCC, locks acquired for reading data don't conflict with locks acquired for writing data, and so, reading never blocks writing and vice versa. The concurrency model used by RethinkDB ensures, for example, that you can run an hour-long MapReduce job without blocking the database.
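The snapshot behavior described above can be illustrated with a toy multi-version store in JavaScript. This is a conceptual sketch of MVCC under simplifying assumptions: a single process, versions kept per key rather than per data block, and old versions never discarded. `ToyMvccStore` and its methods are hypothetical and say nothing about RethinkDB's actual block-level implementation.

```javascript
// Toy multi-version store: a conceptual MVCC sketch, not RethinkDB's block-level design.
// Writers append new versions; readers pin a snapshot and ignore newer versions.
class ToyMvccStore {
  constructor() {
    this.versions = new Map(); // key -> array of { version, value }, oldest first
    this.clock = 0; // global version counter
  }

  write(key, value) {
    this.clock += 1;
    if (!this.versions.has(key)) this.versions.set(key, []);
    this.versions.get(key).push({ version: this.clock, value });
  }

  // A snapshot is just the version counter at the moment the read starts.
  snapshot() {
    const asOf = this.clock;
    return {
      read: (key) => {
        const history = this.versions.get(key) || [];
        // Walk backwards to the newest version visible at this snapshot.
        for (let i = history.length - 1; i >= 0; i--) {
          if (history[i].version <= asOf) return history[i].value;
        }
        return undefined;
      },
    };
  }
}

const store = new ToyMvccStore();
store.write("balance", 100);

const longRead = store.snapshot(); // e.g. the start of a long-running job
store.write("balance", 250); // a concurrent write proceeds without waiting

console.log(longRead.read("balance")); // 100
console.log(store.snapshot().read("balance")); // 250
```

The long-running reader keeps seeing the value from when it started, while the writer neither waits for it nor disturbs it. A real system would also discard old versions once no active snapshot can still see them.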

Immediate consistency

For distributed