With the explosion of data, the open source Apache Hadoop project is gaining traction, thanks to the large ecosystem that has grown around its core components, the Hadoop Distributed File System (HDFS) and MapReduce. Connecting SQL Server to Hadoop has become increasingly important because the two are complementary: petabytes of unstructured data can be stored in Hadoop but may take hours to query, while terabytes of structured data can be stored in SQL Server 2012 and queried in seconds. This creates a need to transfer and integrate data between Hadoop and SQL Server.
Microsoft SQL Server 2012 with Hadoop is aimed at SQL Server developers. It will quickly show you how to get Hadoop activated on SQL Server 2012 (it ships with this version). Once this is done, the book will focus on how to manage big data with Hadoop and use Hadoop Hive to query the data. It will also cover topics such as using the in-memory functions of SQL Server and using BI tools with big data.
Microsoft SQL Server 2012 with Hadoop focuses on data integration techniques between relational (SQL Server 2012) and non-relational (Hadoop) worlds. It will walk you through different tools for the bi-directional movement of data with practical examples.
You will learn to use open source connectors like Sqoop to import and export data between SQL Server 2012 and Hadoop, and to work with leading in-memory BI tools to create ETL solutions using the Hive ODBC driver for your data movement projects. Finally, this book will give you a glimpse of present-day self-service BI tools such as Excel and Power View, which consume Hadoop data and provide powerful insights into it.
Page count: 73
Year of publication: 2013
Copyright © 2013 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: August 2013
Production Reference: 1200813
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78217-798-2
www.packtpub.com
Cover Image by Aniket Sawant (<[email protected]>)
Author
Debarchan Sarkar
Reviewer
Atdhe Buja Msc
Acquisition Editor
James Jones
Commissioning Editor
Shaon Basu
Technical Editor
Chandni Maishery
Project Coordinator
Akash Poojary
Proofreader
Mario Cecere
Indexer
Rekha Nair
Tejal Soni
Graphics
Abhinash Sahu
Production Coordinator
Nilesh R. Mohite
Cover Work
Nilesh R. Mohite
Debarchan Sarkar is a Microsoft Data Platform engineer who hails from Calcutta, the "city of joy", India. He has been a seasoned SQL Server engineer with Microsoft, India for the last six years and has now started venturing into the open source world, specifically the Apache Hadoop framework. He is a SQL Server Business Intelligence specialist with subject matter expertise in SQL Server Integration Services.
Debarchan is currently working on another book with Apress on Microsoft's Hadoop distribution, HDInsight.
I would like to thank my parents, Devjani Sarkar and Asok Sarkar, for their continuous support and encouragement behind this book.
Atdhe Buja Msc is a Certified Ethical Hacker, Database Administrator (MCITP, OCA11g), and a developer with good management skills. He is a DBA at the Ministry of Public Administration, Pristina, RKS, where he also manages several e-governance projects, and he has eight years' experience with SQL Server.
Atdhe is a regular columnist for UBT News. He holds an MSc in Computer Science and Engineering and a Bachelor's degree in Management and Information, and is continuing his studies for a Bachelor's degree in Political Science at UP.
He is specialized and certified in many technologies, such as SQL Server 2000, 2005, 2008, 2008 R2, Oracle 11g, CEH-Ethical Hacker, Windows Server, MS Project, System Center Operation Manager, and Web Design.
His capabilities go beyond the above mentioned knowledge!
I thank my wife Donika Bajrami and my family Buja for all the encouragement and support.
You might want to visit www.PacktPub.com for support files and downloads related to your book.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
http://PacktLib.PacktPub.com
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read and search across Packt's entire library of books.
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.
Get notified! Find out when new books are published by following @PacktEnterprise on Twitter, or the Packt Enterprise Facebook page.
Data management needs have evolved from traditional relational storage to both relational and non-relational storage, and a modern information management platform needs to support all types of data. To deliver insight on any data, you need a platform that provides a complete set of data management capabilities across relational, non-relational, and streaming data, while being able to move data seamlessly from one type to another and to monitor and manage all your data regardless of its type or structure. Apache Hadoop is the widely accepted Big Data tool; similarly, when it comes to RDBMS, SQL Server 2012 is perhaps the most powerful, in-memory, and dynamic data storage and management system. This book enables the reader to bridge the gap between Hadoop and SQL Server, in other words, between the non-relational and relational data management worlds. The book specifically focuses on the data integration and visualization solutions that are available with the rich Business Intelligence suite of SQL Server and their seamless communication with Apache Hadoop and Hive.
Chapter 1, Introduction to Big Data and Hadoop, introduces the reader to the world of Big Data and Hadoop. This chapter explains the need for Big Data solutions and the current market trends, and helps the reader stay a step ahead of the data explosion that is soon to happen.
Chapter 2, Using Sqoop – SQL Server Hadoop Connector, covers the open source Sqoop-based Hadoop Connector for Microsoft SQL Server. This chapter explains the basic Sqoop commands to import and export data between SQL Server and Hadoop.
Chapter 3, Using the Hive ODBC Driver, explains the ways to consume data from Hadoop and Hive using the Open Database Connectivity (ODBC) interface. This chapter shows you how to create an SQL Server Integration Services package to move data from Hadoop to SQL Server using the Hive ODBC driver.
Chapter 4, Creating a data model with SQL Server Analysis Services, illustrates how to consume data from Hadoop and Hive in SQL Server Analysis Services. The reader will learn to use the Hive ODBC driver to create a Linked Server from SQL Server to Hive and build an Analysis Services multidimensional model.
Chapter 5, Using Microsoft's Self-Service Business Intelligence Tools, introduces the reader to the rich set of self-service BI tools available with the SQL Server 2012 BI suite. This chapter explains how to build powerful visualizations on Hadoop data quickly and easily with a few mouse clicks.
The following are the software prerequisites for running the samples in the book:
