Amazon SimpleDB Developer Guide - Prabhakar Chaganti - E-Book

Description

SimpleDB is a highly scalable, simple-to-use, and inexpensive database in the cloud from Amazon Web Services. But in order to use SimpleDB, you really have to change your mindset. This isn't a traditional relational database; in fact it's not relational at all. For developers who have experience working with relational databases, this may lead to misconceptions as to how SimpleDB works. This practical book aims to address your preconceptions on how SimpleDB will work for you. You will be quickly led through the differences between relational databases and SimpleDB, and the implications of using SimpleDB. Throughout this book, there is an emphasis on demonstrating key concepts with practical examples for Java, PHP, and Python developers. You will be introduced to this massively scalable schema-less key-value data store: what it is, how it works, and why it is such a game-changer. You will then explore the basic functionality offered by SimpleDB including querying, code samples, and a lot more. This book will help you deploy services outside the Amazon cloud and access them from any web host. You will see how SimpleDB gives you the freedom to focus on application development. As you work through this book you will be able to optimize the performance of your applications using parallel operations, caching with memcache, asynchronous operations, and more.

Format: EPUB

Pages: 236

Year of publication: 2010

Amazon SimpleDB Developer Guide

Prabhakar Chaganti

Rich Helms

Amazon SimpleDB Developer Guide

Copyright © 2010 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: June 2010

Production Reference: 2240510

Published by Packt Publishing Ltd.

32 Lincoln Road

Olton

Birmingham, B27 6PA, UK.

ISBN 978-1-847197-34-4

www.packtpub.com

Cover Image by Tina Negus (<[email protected]>)

Credits

Authors

Prabhakar Chaganti

Rich Helms

Reviewers

Deepak Anupalli

Anders Samuelsson

Ashley Tate

Acquisition Editor

James Lumsden

Development Editors

Dhwani Devater

Reshma Sundaresan

Technical Editor

Ishita Dhabalia

Indexer

Monica Ajmera Mehta

Editorial Team Leader

Gagandeep Singh

Project Team Leader

Lata Basantani

Project Coordinator

Joel Goveya

Proofreader

Lynda Silwoski

Graphics

Nilesh Mohite

Production Coordinator

Adline Swetha Jesuthas

Cover Work

Adline Swetha Jesuthas

Foreword

Most software developers who work on the Internet love change. Change presents a new challenge, a new paradigm, and new technologies to learn. To realize this, all you have to do is look at the evolution of computers. During the 70s, we worked in a world of mainframes and raised floors. Only special people got to touch the computer, while others had to be content watching from outside of the fishbowl.

The 80s brought the mini-computer with dedicated CRT terminals. You could show data on the screen in any color as long as it was green, but the computer was down the hall in the back room. The 80s also introduced the personal computer. As PC power grew, the mini was replaced with the LAN-connected PC.

The 90s saw the advent of the Internet, and people dialed in, and in the early 2000s, the Internet went viral. As high-speed connections became common, the Internet replaced corporate networks. Computers went from rooms to luggables to "in my briefcase" to "in my pocket."

In 2010, we are seeing the growth of cloud computing. Selecting a brand and model of server computer is being replaced with renting a virtual server at a hosting service like Amazon. The purchaser of these virtual servers doesn't have to select a hardware "brand." I no more care about the brand of computer than I would care about what brand of pipe the water utility used to connect to my house. All I am buying is cycles and reliability.

This move to virtual servers also changes the capital required to propose the next viral application. I don't need to buy a large database cluster, hoping for the acceptance to fill it. I am billed for usage, not capacity. SimpleDB is one of those virtual offerings and the topic of this book.

Rich Helms

About the Authors

Prabhakar Chaganti is the founder and CTO of Ylastic, a startup that is building a single unified interface to architect, manage, and monitor a user's entire AWS Cloud computing environment: EC2, S3, RDS, AutoScaling, ELB, Cloudwatch, SQS, and SimpleDB. He is the author of Xen Virtualization and GWT Java AJAX Programming, both by Packt Publishing, and is also the winner of the community choice award for the most innovative virtual appliance in the VMware Global Virtual Appliance Challenge. He hangs out on Twitter as @pchaganti.

"It's never been done" is a call to action for Rich Helms. He has built a career on breaking new ground in the computer field. He developed CARES (Computer Assisted Recovery Enhancement System) for the Metropolitan Toronto Police in Canada. CARES was the first computer system in the world for aging missing children. CARES has been internationally recognized as pioneering work in child aging. Rich has also created several generations of e-learning platforms including Learn it script and most recently Educate Press.

Rich can be reached at http://webmasterinresidence.ca.

Rich is a seasoned software developer with over 30 years of experience. He spent 22 years in various positions at IBM including Chief Image Technology Architect. His credentials range from deep technical work (five patents in hardware and software) to running multinational R&D.

About the Reviewers

Deepak Anupalli is Architect for the Server Engineering group at Pramati Technologies. He has deep insight into various Java/J2EE technologies. He represents Pramati on the EJB and JPA expert groups and has led the Java EE 5 certification effort of Pramati Server. He is currently leading the effort to build a standards-based web-scale Application server. He is a visiting faculty member with IIIT-Hyderabad for a course on middleware and also speaks at various technology conferences. He holds a graduate degree in Computer Science and Engineering from National Institute of Technology (NIT Warangal, India).

Anders Samuelsson has over 25 years of experience in the computing industry. The main focus during this time has been with computer security. He currently works for Amazon.com with Amazon Web Services.

I'd like to thank my wife Malena and my son Daniel and daughter Ida, for always standing by me and allowing me to spend time helping out with this book. I love you forever.

Ashley Tate is the founder of Coditate Software and the creator of Simple Savant, an advanced C# interface to SimpleDB. He is currently working on GridRoom, an application for collaborative sports-video review built on several Amazon Web Services, including SimpleDB. He lives near Atlanta with his wife and four children. You can find him online at http://blog.coditate.com.

 

I would like to dedicate this book to my brother Madhukar, who gave us all a big scare, and with typical panache came out of it stronger than ever, my sister-in-law Meghna for putting the rock of Gibraltar to shame and showing us all how to handle and deal with adversity, and my nephew Yuv, the two year old fire cracker. My two daughters Anika and Anya were understanding and patient beyond their years as I stuck to my Mac at all kinds of weird hours. Above all, this book would not have made it into the station without the constant support, love and encouragement from my lovely wife Nitika!

  --Prabhakar Chaganti
 

A special thanks to Dorothea, Mike, Mary, our little girl Margaret, and the gang at WCDR.

  --Rich Helms

Preface

SimpleDB is a highly scalable, simple-to-use, and inexpensive database in the cloud from Amazon Web Services. But in order to use SimpleDB, you really have to change your mindset. This isn't a traditional relational database; in fact it's not relational at all. For developers who have experience working with relational databases, this may lead to misconceptions as to how SimpleDB works.

This practical book aims to address your preconceptions on how SimpleDB will work for you. You will be led quickly through the differences between relational databases and SimpleDB, and the implications of using SimpleDB. Throughout this book, there is an emphasis on demonstrating key concepts with practical examples for Java, PHP, and Python developers.

You will be introduced to this massively scalable schema-less key-value data store: what it is, how it works, and why it is such a game-changer. You will then explore the basic functionality offered by SimpleDB including querying, code samples, and a lot more. This book will help you deploy services outside the Amazon cloud and access them from any web host.

You will see how SimpleDB gives you the freedom to focus on application development. As you work through this book you will be able to optimize the performance of your applications using parallel operations, caching with memcache, asynchronous operations, and more.

Gain in-depth understanding of Amazon SimpleDB with PHP, Java, and Python examples, and run optimized database-backed applications on Amazon's Web Services cloud.

What this book covers

Chapter 1, Getting to Know SimpleDB, explores SimpleDB and the advantages of utilizing it to build web-scale applications.

Chapter 2, Getting Started with SimpleDB, moves on to set up an AWS account, enable SimpleDB service for the account, and install and set up libraries for Java, PHP, and Python. It also illustrates several SimpleDB operations using these libraries.

Chapter 3, SimpleDB versus RDBMS, sheds light on the differences between SimpleDB and a traditional RDBMS, as well as the pros and cons of using SimpleDB as the storage engine in your application.

Chapter 4, The SimpleDB Data Model, takes a detailed look at the SimpleDB data model and different methods for interacting with a domain, its items, and their attributes. It further talks about the domain metadata and reviews the various constraints imposed by SimpleDB on domains, items, and attributes.

Chapter 5, Data Types, discusses the techniques needed for storing different data types in SimpleDB, and explores a technique for storing numbers, Boolean values, and dates. It also teaches you about XML-restricted characters and encoding them using base64 encoding.

Chapter 6, Querying, describes the Select syntax for retrieving results from SimpleDB, and looks at the various operators and how to create predicates that allow you to get back the information you need.

Chapter 7, Storing Data on S3, introduces you to Amazon S3 and its use for storing large files. It modifies a sample domain to add metadata, including a file key that is also used to name the MP3 file uploaded to S3. The example used in this chapter shows you a simple way to store metadata on SimpleDB while storing the associated content, in the form of binary files, on Amazon S3.

Chapter 8, Tuning and Usage Costs, mainly covers the BoxUsage of different SimpleDB queries and the usage costs, along with viewing the usage activity reports.

Chapter 9, Caching, explains memcached and Cache_Lite in detail and their use for caching. It further explores how you can use memcached with SimpleDB, through libraries in Java, PHP, and Python, to avoid making unnecessary requests to SimpleDB.

Chapter 10, Parallel Processing, analyzes how to utilize multiple threads for running parallel operations against SimpleDB in Java, PHP, and Python in order to speed up processing times and take advantage of the excellent support for concurrency in SimpleDB.

What you need for this book

To get started with the book and try out the code samples included here, you will need the following software:

For Python:

Python 2.5 (http://python.org/download/)

Boto latest version (http://code.google.com/p/boto/downloads/list)

For Java:

JDK6 latest version (http://java.sun.com/javase/downloads/index.jsp)

Typica latest version (http://typica.googlecode.com/files/typica-1.6.zip)

For the PHP part:

PHP with curl support enabled

GeSHi (optional): If the Generic Syntax Highlighter package is installed, the PHP source will be formatted when displayed in the samples. GeSHi is available free from http://qbnz.com/highlighter/

Who this book is for

If you are a developer wanting to build scalable, web-based database applications using SimpleDB, then this book is for you. You do not need to know anything about SimpleDB to read and learn from this book, and no basic knowledge is strictly necessary. This guide will help you to start from scratch and build advanced applications.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to <[email protected]>, and mention the book title via the subject of your message.

If there is a book that you need and would like to see us publish, please send us a note in the SUGGEST A TITLE form on www.packtpub.com or e-mail <[email protected]>.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book on, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Tip

Downloading the example code for the book

Visit https://www.packtpub.com//sites/default/files/downloads/7344_Code.zip to directly download the example code.

The downloadable files contain instructions on how to use them.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration, and help us to improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/support, selecting your book, clicking on the let us know link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata added to any list of existing errata. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.

Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or web site name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

You can contact us at <[email protected]> if you are having a problem with any aspect of the book, and we will do our best to address it.

Chapter 1. Getting to Know SimpleDB

Most developers would describe a modern database as relational with stored procedures and cross-table functions such as join. So why would you use a database that has none of these capabilities? The answer is scalability.

This morning, CNN ran a story on your new web application. Yesterday you had 10 concurrent users, and now your site is viral with 50,000 users signing on. Which database will handle 50,000 concurrent users without a complex expensive cluster? The answer is SimpleDB.

Why SimpleDB?

Scalability

Pay only for your use

Access from any web-based system

No fixed schema

Challenges?

New metaphor: write seldom, read many

Eventual consistency

SimpleDB is one of the core Amazon Web Services, which include Amazon Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2). Amazon SimpleDB stores your structured data as key-value pairs in the Amazon Web Services (AWS) cloud and lets you run real-time queries against this data. You can scale it easily in response to increased load from your successful applications without the need for a costly clustered database server setup.

SimpleDB, as illustrated in the following diagram, is designed to be used either as an independent data storage component in your applications or in conjunction with some of the other services from Amazon's stable of Cloud Services, such as Amazon S3 and Amazon EC2.

The biggest challenge in SimpleDB is learning to think in its unique metaphor. Like speaking a new language, you need to stop translating and start thinking in that language. Rather than thinking of SimpleDB as a database, approach it as a spreadsheet with some XML characteristics.

SimpleDB functionality can be accessed from almost any programming language (such as Python, Ruby, Java, PHP, Erlang, and Perl) using super simple HTTP-based requests. You can get started anytime you like, and you pay for it based on how much you use it. It is very different from a relational database, and takes a completely different approach toward storing and querying data. It follows the convention of eventual consistency. Think of it as a single master database for updates and a large collection of read database slaves. Any changes made to your data will need to be propagated across all the different copies. This can sometimes take a few seconds depending upon the system load at that time and network latency, which means that a consumer of your domain and data may not see the changes immediately. The changes will eventually be propagated throughout SimpleDB, but this is an important consideration you need to think about when designing your application.
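The master-and-replicas picture above can be made concrete with a small toy model. This is only a sketch of the analogy used in the text, not a description of how SimpleDB is implemented; the class name `EventuallyConsistentStore` and its methods are hypothetical and exist purely for illustration.

```python
class EventuallyConsistentStore:
    """Toy model: one writable master plus read replicas that lag behind."""

    def __init__(self, replica_count=3):
        self.master = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # writes not yet copied to the replicas

    def put(self, key, value):
        # A write lands on the master at once and is queued for the replicas.
        self.master[key] = value
        self.pending.append((key, value))

    def get(self, key, replica_index=0):
        # Reads are served by a replica, so they may return stale data.
        return self.replicas[replica_index].get(key)

    def propagate(self):
        # Copy every queued write to every replica, oldest first.
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending = []


store = EventuallyConsistentStore()
store.put("customer-1", "Alice")
stale = store.get("customer-1")   # replicas have not caught up yet
store.propagate()
fresh = store.get("customer-1")   # the change is now visible to readers
```

In this sketch the first read misses the new value and the second one sees it. An application built on an eventually consistent store has to tolerate that window, for example by not assuming that a value it has just written can be read back immediately.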

Experimenting with SimpleDB

As SimpleDB is so different, it helps to have a tool for manipulating and exploring the database. When developing with a MySQL database, phpMyAdmin allows the developer to work directly on the database. SimpleDB has a similar free Firefox plugin called sdbtool (http://code.google.com/p/sdbtool/). Another Firefox plugin used in the more advanced examples is S3Fox (http://www.s3fox.net/) for administering the Amazon S3 storage. In this book, we cover several basic sample applications. We have also provided code to show each SimpleDB application using three languages: Java, PHP, and Python.

As access to SimpleDB can be from any site on the Web, the PHP samples can be downloaded and run directly from your site. To run any sample, an Amazon account is required. These samples let you explore most of the SimpleDB API, as well as some of the S3 API capabilities.

You can both download and try the PHP samples from http://www.webmasterinresidence.ca/simpledb/.

How does SimpleDB work?

The best way to wrap your head around the way SimpleDB works is to picture a spreadsheet that contains your structured data. For instance, a contact database that stores information on your customers can be represented in SimpleDB as follows:

As SimpleDB is a different database metaphor, new terms have been introduced. The use of this new terminology by Amazon stresses that the traditional assumptions may not be valid.

Domain

The entire customers table will be represented as the domain Customers. Domains group similar data for your application, and you can have up to 100 domains per AWS account. If required, you can increase this limit further by filling out a form on the SimpleDB website. The data stored in these domains is retrieved by making queries against the specific domain. There is no concept of joins as in the relational database world; therefore, queries run within a specific domain and not across domains.
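Because queries run within one domain at a time, any combination of data from two domains has to be assembled by the application. The following is a minimal in-memory sketch of that idea, assuming a hypothetical `Orders` domain alongside `Customers`; the `select` helper and the sample data are made up for illustration and are not the SimpleDB API.

```python
# Two independent domains, modeled as dictionaries of item name -> attributes.
domains = {
    "Customers": {"101": {"name": "John Smith"}},
    "Orders": {"9001": {"customer_id": "101", "total": "25.00"}},
}


def select(domain_name, predicate):
    """Return the item names in ONE domain whose attributes satisfy predicate."""
    domain = domains[domain_name]
    return sorted(name for name, attrs in domain.items() if predicate(attrs))


# There is no join: the application runs one query per domain
# and connects the results itself.
customer_ids = select("Customers", lambda a: a["name"] == "John Smith")
order_ids = select("Orders", lambda a: a["customer_id"] in customer_ids)
```

The second query uses the result of the first, which is the application-side equivalent of a relational join.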

Item

Each customer is represented by a unique Customer ID. Items are similar to rows in a database table. Each item identifies a single object and contains data for that individual item as a number of key-value attributes. Each item is identified by a unique key or identifier, or in traditional terminology, the primary key. SimpleDB does not support the concept of auto-incrementing keys, and most people use a generated key such as the Unix timestamp combined with the user identifier or something similar as the unique identifier for an item. You can have up to one billion items in each domain.
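A generated key of the kind described above might be built as follows. This is a sketch only: the function name `make_item_name`, the separator, and the sample user identifier are assumptions made for illustration, and the `now` parameter exists just to make the result reproducible.

```python
import time


def make_item_name(user_id, now=None):
    """Build an item name from a Unix timestamp and a user identifier."""
    timestamp = int(time.time() if now is None else now)
    return "%d-%s" % (timestamp, user_id)


# Passing a fixed timestamp gives a predictable result.
example = make_item_name("jsmith", now=1275350400)
```

Note that a key with one-second resolution is not unique if the same user creates two items within the same second; adding a random suffix or a finer-grained timestamp is one way to reduce that risk.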

Attributes

Each customer item will have distinguishing characteristics that are represented by an attribute. A customer will have a name, a phone number, an address, and other such attributes, which are similar to the columns in a table in a database. SimpleDB even enables you to have different attributes for each item in a domain. This kind of schema independence lets you mix and match items within a domain to satisfy the needs of your application easily, while at the same time enabling you to take advantage of the benefits of the automatic indexing provided by SimpleDB. If your company suddenly decides to start marketing using Twitter, you can simply add a new attribute to your customer domain for the customers who have a Twitter ID! In traditional database terminology, there is no need to add a new column to the table.
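The Twitter example can be pictured with plain dictionaries. This is an in-memory illustration of schema independence, not SimpleDB client code; the customer data and the helper `items_with_attribute` are invented for the example.

```python
# A domain modeled as item name -> attributes. Items need not share attributes.
customers = {
    "101": {"name": "John Smith", "phone": "555-0100"},
    "102": {"name": "Jane Doe", "phone": "555-0101"},
}

# Only one customer has a Twitter ID, so only that item gains the attribute.
customers["102"]["twitter"] = "@janedoe"


def items_with_attribute(domain, attribute):
    """Return the names of the items that carry the given attribute."""
    return sorted(name for name, attrs in domain.items() if attribute in attrs)


with_twitter = items_with_attribute(customers, "twitter")
```

Item `101` is left untouched: nothing resembling an `ALTER TABLE` step is needed, and items without a Twitter ID simply do not have the attribute.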

Values

Each customer attribute will be associated with a value, which is the same as a cell in a spreadsheet or the value of a column in a database. A relational database or a spreadsheet supports only a single value for each cell or column, while SimpleDB allows you to have multiple values for a single attribute. This lets you do things such as store multiple e-mail addresses for a customer while taking advantage of automatic indexing, without the need for you to manually create new and separate columns for each e-mail address, and then index each new column. In a relational DB, a separate table with a join would be used to store the multiple values. Unlike a delimited list in a character field, the multiple values are indexed, enabling quick searching.

It is a simple way of modeling your data, but at the same time, it is different from a relational database model that is familiar to most users. The following table compares SimpleDB components with a spreadsheet and a relational database:

Relational Database | Spreadsheet | SimpleDB
Table               | Worksheet   | Domain
Row                 | Row         | Item
Column              | Cell        | Attribute
Value               | Value       | Value(s)
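The "Value(s)" entry is the main departure from the other two models: one attribute can hold several values, and each value can be searched individually. The sketch below models that with sets and a simple lookup table. It is a toy illustration under stated assumptions: the class `ToyDomain`, its methods, and the sample e-mail addresses are hypothetical, and the source does not describe how SimpleDB builds its indexes internally.

```python
from collections import defaultdict


class ToyDomain:
    """In-memory stand-in for a domain with multi-valued, indexed attributes."""

    def __init__(self):
        # item name -> attribute name -> set of values
        self.items = defaultdict(lambda: defaultdict(set))
        # (attribute name, value) -> set of item names holding that value
        self.index = defaultdict(set)

    def put(self, item_name, attribute, value):
        # Adding a value never overwrites the values already stored.
        self.items[item_name][attribute].add(value)
        self.index[(attribute, value)].add(item_name)

    def find(self, attribute, value):
        # Lookup by any single value, without scanning every item.
        return sorted(self.index.get((attribute, value), set()))


domain = ToyDomain()
domain.put("101", "email", "john@example.com")
domain.put("101", "email", "jsmith@example.org")
domain.put("102", "email", "jane@example.com")

emails = sorted(domain.items["101"]["email"])
matches = domain.find("email", "jsmith@example.org")
```

Both addresses of item `101` are kept, and a search on the second address still finds the item, which is what a delimited list in a single character field would not give you.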

How do I interact with SimpleDB?