Microsoft SQL Server 2012 with Hadoop - Debarchan Sarkar - E-Book

Microsoft SQL Server 2012 with Hadoop E-Book

Debarchan Sarkar

Description

With the explosion of data, the open source Apache Hadoop ecosystem is gaining traction, thanks to the large set of tools that has grown up around its core components, the Hadoop Distributed File System (HDFS) and MapReduce. Having SQL Server talk to Hadoop has become increasingly important because the two are complementary. Petabytes of unstructured data can be stored in Hadoop, where queries may take hours, while terabytes of structured data can be stored in SQL Server 2012 and queried in seconds. This creates a need to transfer and integrate data between Hadoop and SQL Server.

Microsoft SQL Server 2012 with Hadoop is aimed at SQL Server developers. It will quickly show you how to get Hadoop activated on SQL Server 2012 (it ships with this version). Once this is done, the book will focus on how to manage big data with Hadoop and use Hadoop Hive to query the data. It will also cover topics such as using SQL Server's in-memory functions and using BI tools with big data.

Microsoft SQL Server 2012 with Hadoop focuses on data integration techniques between relational (SQL Server 2012) and non-relational (Hadoop) worlds. It will walk you through different tools for the bi-directional movement of data with practical examples.

You will learn to use open source connectors like Sqoop to import and export data between SQL Server 2012 and Hadoop, and to work with leading in-memory BI tools to create ETL solutions using the Hive ODBC driver for developing your data movement projects. Finally, this book will give you a glimpse of present-day self-service BI tools such as Excel and Power View, which consume Hadoop data and provide powerful insights into it.

You can read this e-book in the Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Page count: 73

Year of publication: 2013




Table of Contents

Microsoft SQL Server 2012 with Hadoop
Credits
About the Author
About the Reviewer
www.PacktPub.com
Support files, eBooks, discount offers and more
Why Subscribe?
Free Access for Packt account holders
Instant Updates on New Packt Books
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Errata
Piracy
Questions
1. Introduction to Big Data and Hadoop
Big Data – what's the big deal?
The Apache Hadoop framework
HDFS
MapReduce
NameNode
Secondary NameNode
DataNode
JobTracker
TaskTracker
Hive
Pig
Flume
Sqoop
Oozie
HBase
Mahout
Summary
2. Using Sqoop – The SQL Server Hadoop Connector
The SQL Server-Hadoop Connector
Installation prerequisites
A Hadoop cluster on Linux
Installing and configuring Sqoop
Setting up the Microsoft JDBC driver
Downloading the SQL Server-Hadoop Connector
Installing the SQL Server-Hadoop Connector
The Sqoop import tool
Importing the tables in Hive
The Sqoop export tool
Data types
Summary
3. Using the Hive ODBC Driver
The Hive ODBC Driver
SQL Server Integration Services (SSIS)
SSIS as an ETL – extract, transform, and load tool
Developing the package
Creating the project
Creating the Data Flow
Creating the source Hive connection
Creating the destination SQL connection
Creating the Hive source component
Creating the SQL destination component
Mapping the columns
Running the package
Summary
4. Creating a Data Model with SQL Server Analysis Services
Configuring the SQL Linked Server to Hive
The Linked Server script
Using OpenQuery
Creating a view
Creating an SSAS data model
Summary
5. Using Microsoft's Self-Service Business Intelligence Tools
PowerPivot enhancements
Power View for Excel
Summary
Index

Microsoft SQL Server 2012 with Hadoop


Copyright © 2013 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: August 2013

Production Reference: 1200813

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78217-798-2

www.packtpub.com

Cover Image by Aniket Sawant

Credits

Author

Debarchan Sarkar

Reviewer

Atdhe Buja Msc

Acquisition Editor

James Jones

Commissioning Editor

Shaon Basu

Technical Editor

Chandni Maishery

Project Coordinator

Akash Poojary

Proofreader

Mario Cecere

Indexer

Rekha Nair

Tejal Soni

Graphics

Abhinash Sahu

Production Coordinator

Nilesh R. Mohite

Cover Work

Nilesh R. Mohite

About the Author

Debarchan Sarkar is a Microsoft Data Platform engineer who hails from Calcutta, the "city of joy", India. He has been a seasoned SQL Server engineer with Microsoft, India for the last six years and has now started venturing into the open source world, specifically the Apache Hadoop framework. He is a SQL Server Business Intelligence specialist with subject matter expertise in SQL Server Integration Services.

Debarchan is currently working on another book with Apress on Microsoft's Hadoop distribution, HDInsight.

I would like to thank my parents, Devjani Sarkar and Asok Sarkar for their continuous support and encouragement behind this book.

About the Reviewer

Atdhe Buja MSc is a Certified Ethical Hacker, a Database Administrator (MCITP, OCA11g), and a developer with good management skills. He is a DBA at the Ministry of Public Administration, Pristina, RKS, where he also manages some e-governance projects, and he has eight years' experience in SQL Server.

Atdhe is a regular columnist for UBT News. He holds an MSc in Computer Science and Engineering and a Bachelor's degree in Management and Information, and he is continuing his studies for a Bachelor's degree in Political Science at UP.

He is specialized and certified in many technologies, such as SQL Server 2000, 2005, 2008, 2008 R2, Oracle 11g, CEH-Ethical Hacker, Windows Server, MS Project, System Center Operation Manager, and Web Design.

His capabilities go beyond the above mentioned knowledge!

I thank my wife Donika Bajrami and my family Buja for all the encouragement and support.

www.PacktPub.com

Support files, eBooks, discount offers and more

You might want to visit www.PacktPub.com for support files and downloads related to your book.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

http://PacktLib.PacktPub.com

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read and search across Packt's entire library of books.

Why Subscribe?

Fully searchable across every book published by Packt
Copy and paste, print and bookmark content
On demand and accessible via web browser

Free Access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.

Instant Updates on New Packt Books

Get notified! Find out when new books are published by following @PacktEnterprise on Twitter, or the Packt Enterprise Facebook page.

Preface

Data management needs have evolved from traditional relational storage to both relational and non-relational storage, and a modern information management platform needs to support all types of data. To deliver insight on any data, you need a platform that provides a complete set of capabilities for data management across relational, non-relational, and streaming data, that can move data seamlessly from one type to another, and that can monitor and manage all your data regardless of its type or structure. Apache Hadoop is the widely accepted Big Data tool; similarly, when it comes to RDBMS, SQL Server 2012 is perhaps the most powerful, in-memory, and dynamic data storage and management system. This book enables the reader to bridge the gap between Hadoop and SQL Server, in other words, between the non-relational and relational data management worlds. The book specifically focuses on the data integration and visualization solutions that are available with the rich Business Intelligence suite of SQL Server and their seamless communication with Apache Hadoop and Hive.

What this book covers

Chapter 1, Introduction to Big Data and Hadoop, introduces the reader to the Big Data and Hadoop world. This chapter explains the need for Big Data solutions, the current market trends, and enables the user to be a step ahead during the data explosion that is soon to happen.

Chapter 2, Using Sqoop – The SQL Server Hadoop Connector, covers the open source Sqoop-based Hadoop Connector for Microsoft SQL Server. This chapter explains the basic Sqoop commands to import and export data between SQL Server and Hadoop.

Chapter 3, Using the Hive ODBC Driver, explains the ways to consume data from Hadoop and Hive using the Open Database Connectivity (ODBC) interface. This chapter shows you how to create an SQL Server Integration Services package to move data from Hadoop to SQL Server using the Hive ODBC driver.

Chapter 4, Creating a Data Model with SQL Server Analysis Services, illustrates how to consume data from Hadoop and Hive from SQL Server Analysis Services. The reader will learn to use the Hive ODBC driver to create a Linked Server from SQL Server to Hive and build an Analysis Services multidimensional model.

Chapter 5, Using Microsoft's Self-Service Business Intelligence Tools, introduces the reader to the rich set of self-service BI tools available with the SQL Server 2012 BI suite. This chapter explains how to build powerful visualizations on Hadoop data quickly and easily with a few mouse clicks.
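
To give a feel for the kind of Sqoop commands Chapter 2 deals with, here is a minimal sketch in Python that only assembles the command lines for an import and an export. It is not taken from the book: the helper names, host, database, table, user, and HDFS paths are all hypothetical, and the default SQL Server port 1433 is assumed. The flags used (`--connect`, `--username`, `-P`, `--table`, `--target-dir`, `--export-dir`, `--num-mappers`) are standard Sqoop options.

```python
import shlex
from typing import List


def jdbc_url(host: str, database: str, port: int = 1433) -> str:
    """Return a JDBC URL in the form used by the Microsoft JDBC driver."""
    return f"jdbc:sqlserver://{host}:{port};databaseName={database}"


def sqoop_import_args(host: str, database: str, table: str,
                      username: str, target_dir: str,
                      mappers: int = 1) -> List[str]:
    """Build the argument list to copy a SQL Server table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url(host, database),
        "--username", username,
        "-P",  # prompt for the password rather than passing it inline
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(mappers),
    ]


def sqoop_export_args(host: str, database: str, table: str,
                      username: str, export_dir: str) -> List[str]:
    """Build the argument list to copy HDFS files into a SQL Server table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url(host, database),
        "--username", username,
        "-P",
        "--table", table,
        "--export-dir", export_dir,
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    cmd = sqoop_import_args("sqlhost", "Sales", "Orders",
                            "etl_user", "/data/orders")
    # shlex.join quotes the JDBC URL, whose ';' would otherwise split the command.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps the semicolon in the JDBC URL from being interpreted by a shell; the list could be handed to `subprocess.run` on a machine where Sqoop and the JDBC driver are installed.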

What you need for this book

Following are the software prerequisites for running the samples in the book:

Apache Hadoop 1.0 cluster with Hive 0.9 configured
SQL Server 2012 with Integration Services and Analysis Services installed
Microsoft Office 2013

Who this book is for