Microsoft SQL Server 2012 Integration Services: An Expert Cookbook - Reza Rad - E-Book

Microsoft SQL Server 2012 Integration Services: An Expert Cookbook E-Book

Reza Rad

0,0
34,79 €

-100%
Sammeln Sie Punkte in unserem Gutscheinprogramm und kaufen Sie E-Books und Hörbücher mit bis zu 100% Rabatt.

Mehr erfahren.
Beschreibung

SQL Server Integration Services (SSIS) is a leading tool in the data warehouse industry - used for performing extraction, transformation, and load operations. This book is aligned with the most common methodology associated with SSIS known as Extract Transform and Load (ETL); ETL is responsible for the extraction of data from several sources, their cleansing, customization, and loading into a central repository normally called Data Warehouse or Data Mart.Microsoft SQL Server 2012 Integration Services: An Expert Cookbook covers all the aspects of SSIS 2012 with lots of real-world scenarios to help readers understand usages of SSIS in every environment. Written by two SQL Server MVPs who have in-depth knowledge of SSIS having worked with it for many years.This book starts by creating simple data transfer packages with wizards and illustrates how to create more complex data transfer packages, troubleshoot packages, make robust SSIS packages, and how to boost the performance of data consolidation with SSIS. It then covers data flow transformations and advanced transformations for data cleansing, fuzzy and term extraction in detail. The book then dives deep into making a dynamic package with the help of expressions and variables, and performance tuning and consideration.

Das E-Book können Sie in Legimi-Apps oder einer beliebigen App lesen, die das folgende Format unterstützen:

EPUB
MOBI

Seitenzahl: 612

Veröffentlichungsjahr: 2012

Bewertungen
0,0
0
0
0
0
0
Mehr Informationen
Mehr Informationen
Legimi prüft nicht, ob Rezensionen von Nutzern stammen, die den betreffenden Titel tatsächlich gekauft oder gelesen/gehört haben. Wir entfernen aber gefälschte Rezensionen.



Table of Contents

Microsoft SQL Server 2012 Integration Services: An Expert Cookbook
Credits
Foreword
About the Authors
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers and more
Why Subscribe?
Free Access for Packt account holders
Instant Updates on New Packt Books
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Getting Started with SQL Server Integration Services
Introduction
Import and Export Wizard: First experience with SSIS
Getting ready
How to do it...
How it works...
There's more...
Mapping columns
Configure transfer settings for multiple tables
Mapping data types
Querying the source database
See also
Getting started with SSDT
How to do it...
How it works...
Creating the first SSIS Package
Getting ready
How to do it...
How it works...
Getting familiar with Data Flow Task
Getting ready
How to do it...
How it works...
SSIS 2012 versus previous versions in Developer Experience
Getting ready
How to do it...
How it works...
2. Control Flow Tasks
Introduction
Executing T-SQL commands: Execute SQL Task
How to do it...
How it works...
There's more...
Full result set
XML result set
BypassPrepare
See also
Handling file and folder operations: File System Task
Getting ready
How to do it...
There's more...
Sending and receiving files through FTP: FTP Task
How to do it...
How it works...
There's more...
FTP Connection Manager's advanced options
File Connection Manager
Working with variables
Executing other packages: Execute Package Task
Getting ready
How to do it...
How it works...
There's more...
Advanced options for executing external packages
Running external applications: Execute Process Task
Getting ready
How to do it...
How it works...
Reading data from web methods: Web Service Task
How to do it...
How it works…
Transforming, validating, and querying XML: XML Task
Getting ready
How to do it...
How it works...
There's more...
XPath
Merge
Diff
Patch
See also
Profiling table statistics: Data Profiling Task
Getting ready
How to do it...
How it works...
There's more...
Batch insertion of data into a database: Bulk Insert Task
Getting ready
How to do it...
How it works...
Querying system information: WMI Data Reader Task
Getting ready
How to do it...
How it works...
There's more...
WMI code creator—a useful tool for WMI
See also
Querying system events: WMI Event Watcher Task
Getting ready
How to do it...
How it works...
Transferring SQL server objects: DBMS Tasks
Getting ready
How to do it...
How it works…
3. Data Flow Task Part 1—Extract and Load
Introduction
Working with database connections in Data Flow
Getting ready
How to do it...
How it works...
ADO.NET Source
OLE DB Destination
There's more...
Connect to ODBC data providers
Exploring All properties of connection managers
Advanced Editor
Source and Destination Assistant
Shared Connection Manager
SQL Server Destination
Fast Load option
Querying source data dynamically
See also
Working with flat files in Data Flow
Getting ready
How to do it...
How it works...
Passing data between packages—Raw Source and Destination
How to do it...
How it works...
Raw File Destination
Raw File Source
There's more...
Raise performance with Raw Files
Importing XML data with XML Source
Getting ready
How to do it...
How it works...
There's more...
Hierarchical XML data
Loading data into memory—Recordset Destination
Getting ready
How to do it...
How it works...
Extracting and loading Excel data
Getting ready
How to do it...
How it works...
Data access
Data types
Change Data Capture
Getting ready
How to do it...
How it works...
4. Data Flow Task Part 2—Transformations
Introduction
Derived Column: adding calculated columns
Getting ready
How to do it...
How it works...
Audit Transformation: logging in Data Flow
Getting ready
How to do it...
How it works...
Aggregate Transform: aggregating the data stream
How to do it...
How it works...
Conditional Split: dividing the data stream based on conditions
Getting ready
How to do it...
How it works...
Lookup Transform: performing the Upsert scenario
Getting ready
How to do it...
How it works...
Specify how to handle rows with no matching entries
Connection tab
Columns tab
Two different outputs
There's more...
Cache mode
Connection type
OLE DB Command: executing SQL statements on each row in the data stream
Getting ready
How to do it...
How it works...
OLE DB
Parameter markers
There's more...
Fetching the output parameter into the data stream column
Merge and Union All transformations: combining input data rows
Getting ready
How to do it...
How it works...
Merge Join Transform: performing different types of joins in data flow
How to do it...
How it works...
Joining key(s)
Join types
There's more...
Performance issue with Sort Transformation
Order by clause
IsSorted
SortKeyPosition
Merge Join versus Lookup
Sort Transform properties
Remove rows with duplicate sort values
Sort Order
Sort Type
Comparison Flags
Pass Through
Multicast: creating copies of the data stream
Getting ready
How to do it...
How it works...
Working with BLOB fields: Export Column and Import Column transformations
Getting ready
How to do it...
How it works...
Export Column
Import Column
Slowly Changing Dimensions (SCDs) in SSIS
Getting ready
How to do it...
How it works...
5. Data Flow Task Part 3—Advanced Transformation
Introduction
Pivot and Unpivot Transformations
Getting ready
How to do it...
How it works...
Pivot Transformation
Unpivot Transformation
There's more…
Pivot Transformation; Working with the Advanced Editor
Text Analysis with Term Lookup and Term Extraction transformations
Getting ready
How to do it...
How it works...
There's more...
Excluding Terms
DQS Cleansing Transformation—Cleansing Data
Getting ready
How to do it...
How it works...
There's more...
The DQS Cleansing Component acts asynchronously
Advanced settings
Status Column
Fuzzy Transformations—how SSIS understands fuzzy similarities
Getting ready
How to do it...
How it works...
6. Variables, Expressions, and Dynamism in SSIS
Introduction
Variables and data types
Getting ready
How to do it...
How it works...
Using expressions in Control Flow
Getting ready
How to do it...
How it works...
DelayValidation
IsVariable
Real-world scenarios
Using expressions in Data Flow
Getting ready
How to do it...
How it works...
The Expression Task
How to do it...
How it works...
There's more...
Alternative method
Dynamic connection managers
Getting ready
How to do it...
How it works...
Dynamic data transfer with different data structures
Getting ready
How to do it...
How it works...
Other sources and destinations
Flat file to SQL server
SQL query to flat file
SQL server to SQL server on the same server
Excel to SQL server
Other DB engines with SQL server
Any other source and destination
7. Containers and Precedence Constraints
Introduction
Sequence Container: putting all tasks in an executable object
How to do it...
How it works...
For Loop Container: looping through static enumerator till a condition is met
Getting ready
How to do it...
How it works...
Foreach Loop Container: looping through result set of a database query
Getting ready
How to do it...
How it works...
ADO Enumerator
Variable Mappings
Type Cast in expressions in the derived column
Foreach Loop Container: looping through files using File Enumerator
Getting ready
How to do it...
How it works...
Foreach Loop Container: looping through data table
Getting ready
How to do it...
How it works...
DelayValidation
Precedence Constraints: how to control the flow of task execution
How to do it...
How it works...
Types of Constraints
Evaluation operation
Multiple constraints
8. Scripting
Introduction
The Script Task: Scripting through Control Flow
How to do it...
How it works...
ReadOnlyVariables
ReadWriteVariables
Script
Code description
Script results
Fire Events
The Script Component as a Transformation
How to do it...
How it works...
Script
See also
The Script Component as a Source
Getting ready
How to do it...
How it works...
CreateNewOutputRows
AddRow
See also
The Script Component as a Destination
How to do it...
How it works...
The Asynchronous Script Component
Getting ready
How to do it...
How it works...
SynchronousInputID
InputName_ProcessInput
There's more...
Script component methods
AcquireConnection
Log
PostExecute
PreExecute
ProcessInput
ReleaseConnection
9. Deployment
Introduction
Project Deployment Model: Project Deployment from SSDT
Getting ready
How to do it...
How it works...
Using Integration Services Deployment Wizard and command-line utility for deployment
Getting ready
How to do it...
How it works...
ISPAC file
Integration Services Deployment Wizard graphical user interface
Integration Services Deployment Wizard command-line utility
The Package Deployment Model, Using SSDT to deploy package
Getting ready
How to do it...
How it works...
File system
SQL Server
SSIS package store
Creating and running Deployment Utility
How to do it...
How it works...
Deployment Utility Ingredients
DTUTIL—the command-line utility for deployment
How to do it...
How it works...
There's more...
DTUTIL exit codes
Multiple packages deployment
Protection level: Securing sensitive data
How to do it...
How it works...
Sensitive data
Protection level types
Do Not Save Sensitive
Encrypt Sensitive with User Key
Encrypt Sensitive with Password
Encrypt All with User Key
Encrypt All with Password
Rely on Server Storage
There's more...
The ProtectionLevel property of a package
Protection level in project properties
Protection level in DTUTIL
10. Debugging, Troubleshooting, and Migrating Packages to 2012
Introduction
Troubleshooting with Progress and Execution Results tab
Getting ready
How to do it...
How it works...
Breakpoints, Debugging the Control Flow
Getting ready
How to do it...
How it works...
How to enable breakpoints
Breakpoints window
Breakpoint menu options
Monitoring windows
Breakpoints limitation in SSIS
Script breakpoint support
Handling errors in Data Flow
Getting ready
How to do it...
How it works...
Error columns and understanding them
Migrating packages to 2012
How to do it...
How it works...
Data Tap
Getting ready
How to do it...
How it works...
Create the execution
Create the Data Tap
Running the package
11. Event Handling and Logging
Introduction
Logging over Legacy Deployment Model
Getting ready
How to do it...
How it works...
Logging over Project Deployment Model
Getting ready
How to do it...
How it works...
There's more...
The SSISDB database
The SSIS dashboards
Some useful SSIS Catalog features
1 SSISDB
2 PacktPub
3 Projects
4 R02_Project Deployment Mode
Using event handlers and system variables for custom logging
Getting ready
How to do it...
Traditional approach
Project Deployment Mode
How it works...
There's more...
The traditional approach—Legacy Deloyment Model
The new approach—Project Deployment Model
Enriching default views
Get row counts
Custom reporting services reports
12. Execution
Introduction
Execution from SSMS
Getting ready
How to do it...
How it works...
There's more...
Logging level
Package validation
Execution and validation reports
Legacy package execution from SSMS
Execution from a command-line utility
Getting ready
How to do it...
How it works...
DTEXEC
There's more...
Execution with DTExecUI
Execution with SSIS catalog procedures
32-bit / 64-bit issue
Execution from a scheduled SQL Server Agent job
Getting ready
How to do it...
How it works...
There's more...
Running SQL Server Agent job under a proxy account
Creating an SQL Server job more easily
13. Restartability and Robustness
Introduction
Parameters: Passing values to packages from outside
Getting ready
How to do it...
How it works...
There's more...
Environment
Package configuration: Legacy method to inter-relation
Getting ready
How to do it...
How it works...
XML configuration file
SQL server
Environment variable
Registry entry
Parent package variable
There's more...
Indirect configuration
Configuration priority
Transactions: Doing multiple operations atomic
Getting ready
How to do it...
How it works...
Required
Supported
NotSupported
Checkpoints: The power of restartability
Getting ready
How to do it...
How it works...
CheckpointFileName
FailPackageOnFailure
CheckpointUsage
SaveCheckpoints
SSIS reports and catalog views
Getting ready
How to do it...
How it works...
There's more...
SSIS Catalog views
Simple catalog views
14. Programming SSIS
Introduction
Creating and configuring Control Flow Tasks programmatically
Getting ready
How to do it...
How it works...
Working with Data Flow components programmatically
Getting ready
How to do it...
How it works...
Executing and managing packages programmatically
Getting ready
How to do it...
How it works...
Saving packages
Loading packages
There's more...
Methods of the Application class
Set parameter's value programmatically
Creating and using Custom Tasks
How to do it...
How it works...
Signing the project
Adding assembly to GAC
There's more...
Creating custom Data Flow component
Creating UI Editor for custom object
15. Performance Boost in SSIS
Introduction
Control Flow Task and variables considerations for boosting performance
How to do it...
Using SSISDB Catalog
Progress Bar
Windows Performance Monitor
BIDS Helper
How it works...
Data Flow best practices in Extract and Load
Getting ready
How to do it...
Optimize Queries
OLE DB Destination
Data conversions
Update data into destination
How it works...
There's more...
Data Flow best practices in Transformations
Getting ready
How to do it...
Remove unused columns from the pipeline
Avoid unnecessary sorting in the pipeline
Lookup reference data
Make use of the Cache Transformation to store data in memory
How it works...
There's more...
Replace lookup by changing SQL statement at the source
Working with buffer size
How to do it...
How it works...
Working with performance counters
How to do it...
How it works...
Index

Microsoft SQL Server 2012 Integration Services: An Expert Cookbook

Microsoft SQL Server 2012 Integration Services: An Expert Cookbook

Copyright © 2012 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: May 2012

Production Reference: 1140512

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-84968-524-5

www.packtpub.com

Cover Image by Artie Ng (<[email protected]>)

Credits

Authors

Reza Rad

Pedro Perfeito

Reviewers

Phil Brammer

Brenner Grudka Lira

April L. Rains

Rafael Salas

Milla Smirnova

Acquisition Editor

Rukshana Khambatta

Lead Technical Editors

Kedar Bhat

Meeta Rajani

Technical Editors

Joyslita D'Souza

Manasi Poonthottam

Aaron Rosario

Project Coordinator

Leena Purkait

Proofreaders

Mario Cecere

Chris Smith

Indexer

Monica Ajmera Mehta

Graphics

Valentina D'silva

Manu Joseph

Production Coordinator

Aparna Bhagat

Cover Work

Aparna Bhagat

Foreword

Data Transformation Services (DTS) was Microsoft's first entrance into the world of advanced data transformation and task-oriented tools, allowing users to rapidly move data from one point to another, or to perform common tasks such as FTPing files from one server to another. New to SQL Server 2000, this tool was the foundation for many developers' toolkits. The UI was easy to use and understand, precedence constraints could be applied between tasks ensuring business rules were maintained, and custom code could be added to perform advanced tasks not found in the boxed feature set. DTS is still the bar against which many measure SQL Server Integration Services.

SQL Server Integration Services (SSIS), introduced in SQL Server 2005 and largely unchanged through SQL Server 2008 R2, was a rewrite of both the toolset and the paradigm by which developers were used to thinking as compared to the relatively easy-to-use DTS. SSIS has its strengths in separating the work surface of a DTS package into distinct parts, the Control Flow and the Data Flow. The Control Flow is designed to direct the "flow" of the package, ensure dependencies are met before executing a downstream task, perform looping operations over a varied list of sources, execute SQL statements, and so on. A Data Flow Task is designed to move data from one source to another, transforming data along the way. The separation allows for greater flexibility in developing a package by limiting the scope of what a developer can edit at once, and by allowing specific tasks to be copied and subsequently reused.

SSIS is not without its list of negatives, however. Through SQL Server 2008 R2, an SSIS package was a single entity, which could be executed in any number of places from within Business Intelligence Developer Studio, from the filesystem, or on a SQL Server instance. In a shop that has a large number of packages deployed, it was extremely difficult to manage all of the packages and track all of the activities that the packages were doing. This meant that developers were forced to write their own logging solutions to capture data such as row counts, start and end times, audit information, and any other pertinent information necessary to support the package. SSIS also has a steep learning curve, which many developers find very hard to overcome.

SQL Server 2012 introduces some very welcome additions to the existing SSIS product. The most welcome addition, and the one I am most excited about, is the inclusion of a true server-side component to SSIS. Choosing to deploy packages to the server will allow developers and administrators to finally get ease of deployment, and capture the most often requested information about the execution of packages. This server component, called the SSIS Catalog, and its new project deployment model allow administrators to override logging levels, set input parameters, and view built-in reports in an easy-to-use presentation format. In the new project deployment model, the project build process creates a .ispac file, which can be shared with any person doing the physical deployment of the project. The file includes all of the packages in the project, any shared project-level connections, and other metadata pertaining to the project. Double-clicking on the file will start the deployment wizard. Very easy.

Some other changes found in SQL Server 2012 SSIS are a revamped design surface helping to meet accessibility requirements, full undo/redo capability, a removed limit of 4,000 characters on expressions, ability to change variable scopes, and so on.

This book will walk you through, step-by-step, each major feature of SSIS in SQL Server 2012, and how to use them. Pedro and Reza have given contextual examples where possible, and you will be able to download and implement them yourself to help you follow along each recipe. If you are an experienced SSIS developer or you are new to the product, this book will be an often-referenced resource in your bookshelf. Pedro and Reza have put together a great reference book that I know you'll enjoy.

Phil Brammer

Microsoft MVP – SQL Server

About the Authors

Reza Rad is an author, trainer, speaker, and consultant. He has a BSc in Computer Engineering; he has more than 10 years' experience in programming and development mostly on Microsoft technologies. He received the Microsoft Most Valuable Professional (MVP) award in SQL Server in 2011 and 2012 for his dedication in Microsoft BI and specially SSIS. He has been working on the Microsoft BI suite for more than six years. He is an SSIS/MSBI/.NET Trainer and also software and BI Consultant at some companies and institutes. His articles on different aspects of technologies, specially on SSIS, can be found on his blog http://www.rad.pasfu.com.

He was the co-author of SQL Server MVP Deep Dives Volume 2. He is one of the active members on online technical forums such as MSDN and Experts-Exchange. He is a Microsoft Certified Professional (MCP); Microsoft Certified Technology Specialist (MCTS) and Microsoft Certified IT Professional (MCITP) in Business Intelligence (BI). His e-mail address is <[email protected]>.

I would like to thank my wife who has been a wonderful supporter in writing this book; she encouraged me a lot to complete this book, she was a light during my difficult moments.

I would also like to thank my parents and sister, who were my teachers for many years of my life.

I would like to thank Pedro, my good friend who helped a lot in writing this book. He did a good job in completing this book in his busy hours with full-time job and teaching.

Pedro Perfeito was born in 1977 in Portugal and currently works as a BI Senior Consultant and Developer at Novabase. He's also an invited teacher in master and short-master BI degrees at IUL-ISCTE (Lisbon) and at Universidade Portucalense (UPT-Porto) respectively. He received the Microsoft award Microsoft Most Valuable Professional (MVP) in 2010, 2011, and 2012 for all his dedication and contribution in helping theoretical and practical issues in the various BI communities. He is also the co-author of SQL Server MVP Deep Dives Volume 2. He has several Microsoft certifications including MCP, MCSD, MCTS-Web, MCTS-BI, and MCITP-BI. He also has worldwide certifications in the area of BI provided by TDWI/CBIP (The Data Warehouse Institute, http://www.tdwi.org). He's currently preparing for his PhD degree on BI. For further details you can visit his personal blog at http://www.pedrocgd.blogspot.com or even contact him directly at <[email protected]>.

I would like to express my gratitude to all teams at Packt who trusted me—a Portuguese author—and helped me complete this book. I would like to thank my friend and co-author of this book Reza Rad because without him this book would not have been possible.

I have furthermore to thank Barbara Chambel for all the support she gave me since the first moment at Novabase, to Luis Ferreira (Project Manager at Banco de Portugal) and Simão Fernandes (ex-student and colleague at Novabase) for all hints and complaints from the previous SSIS version (you both know which ones I mean!) and for all my Master BI students from Universidade Portucalense (Oporto) and from ISCTE-IUL (Lisbon) who have directly and indirectly motivated me in this challenge.

I am deeply indebted to Dr. Maria José Trigueiros for all the encouragement to go inside this amazing world of Business Intelligence and make my dream come true. She's not physically with us but she will be remembered for ever.

Especially, I would like to give my special thanks to my family and my girlfriend Joana whose patient love helped me to complete this work!

Thanks to all who I haven't mentioned here and who believed in me, even more than myself.

About the Reviewers

Phil Brammer, a fifth year Microsoft MVP in SQL Server, has over 12 years' data warehousing experience in various technologies from reporting through ETL to database administration. He has worked with SSIS since 2007 and he continues to play an active role in the SSIS community via online resources as well as his technical blog site, SSISTalk.com. He has contributed to SQL Saturdays, SQL PASS Summits, and the first volume of the SQL Server MVP Deep Dives book.

Most recently he has taken on the role of a full-time operational DBA managing over 120 database instances in the health-care insurance industry. He is an avid golfer and loves spending time with his wife and two children.

Brenner Grudka Lira joined NeuroTech as a Data Analyst in 2012. He has a Bachelor's degree in Computer Science from the Catholic University of Pernambuco in Recife, Brazil. He also has experience in building and modeling Data Warehouses and has knowledge of Oracle Warehouse Builder, SQL Server Integration Services, SAP Business Objects, and Oracle Business Intelligence Standard Edition One. Today, he is dedicated to the study of Business Intelligence with focus on the ETL process and Risk Management in Financial Operations.

April L. Rains has 13 years of experience building Business Intelligence, Web, and Windows applications using Microsoft tools and platforms. Working in the transportation and logistics industry for many years provided numerous opportunities for ETL, EAI, and trading partner EDI using both SSIS and BizTalk. She has a wide range of hands-on experience in multiple roles across the application lifecycle. You can e-mail her at <[email protected]> or contact her through her website at www.aprilrains.com.

I would like to thank my son Kieran who provides amazing and never-ending inspiration to me.

Rafael Salas is a Data Warehousing and Business Intelligence professional with more than a decade of experience in many industries and Fortune 500 companies. He provides technical leadership and helps organizations to improve performance through Business Intelligence strategies and solutions. His credentials include a Bachelor's degree in Computer Sciences, a Master's degree in Business and Technology, and a number of industry certifications. He has been recognized as Microsoft Most Valuable Professional (MVP) since 2007 and is a published author, blogger, and frequent speaker at conferences and technology community events. His specialties include architecture, Data Warehouse appliances, data integration, data quality, OLAP databases, and Dimensional Modeling. You can find more about him on his blog at www.rafael-salas.com.

Milla Smirnova is a Data Architect, DBA, and BI specialist. She possesses over 10 years of experience in Information Technology; most of those years of experience are in SQL Server Administration and Development. As her involvement with Business Intelligence technologies increased drastically within the last few years so has her passion for ETL design, development, and optimization utilizing SSIS.

I would like to thank my wonderful husband Larry for all his help and support. I would like to thank Maria and Nikolay as well.

I would also like to thank everyone at Packt Publishing for their encouragement and guidance.

www.PacktPub.com

Support files, eBooks, discount offers and more

You might want to visit www.PacktPub.com for support files and downloads related to your book.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

http://PacktLib.PacktPub.com

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read and search across Packt's entire library of books. 

Why Subscribe?

Fully searchable across every book published by PacktCopy and paste, print and bookmark contentOn demand and accessible via web browser

Free Access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.

Instant Updates on New Packt Books

Get notified! Find out when new books are published by following @PacktEnterprise on Twitter, or the Packt Enterprise Facebook page.

Preface

Microsoft SQL Server 2012 Integration Services: An Expert Cookbook is a complete guide for everyone, from a novice to a professional in Integration Services 2012. SQL Server Integration Services is an ETL tool, which stands for Extract Transform and Load. There is a need for a data transfer system in all operational systems these days, and SSIS is one of the best data transfer tools. In this book, all aspects of SSIS 2012 are discussed with lots of real-world scenarios to help readers to understand usage of SSIS in every environment.

What this book covers

Chapter 1, Getting Started with SQL Server Integration Services, provides an overview of the ETL concepts and ETL terminologies, why ETL is needed in the technology world, and what problems ETL will solve. Then an overview of SSIS as an ETL tool is provided to help readers to get an overall view of the other parts of the book.

Chapter 2, Control Flow Tasks, explores all Control Flow Tasks with real-world samples of each Task. The reader will learn what each Task stands for, what is its usage, real-world scenarios, and the new tasks available in SSIS 2012.

Chapter 3, Data Flow Task Part 1—Extract and Load, explains the data sources and data destinations under the Data Flow Task. Data Flow Task is the most functional part of SSIS, to which an SSIS Developer probably dedicates most time.

Chapter 4, Data Flow Task Part 2—Transformations, explores the transformations used to apply data quality and business rules that are essential to prepare data loaded into destinations. Data Flow Task provides an easy way to transform source data into the form needed by its destination in several different ways.

Chapter 5, Data Flow Task Part 3—Advanced Transformation, briefly discusses Advanced Transformations. In real-world scenarios, different data sources don't provide the same structure, so there is a need to unify them in a unique structure. There are some transformations in SSIS Data Flow Task that use complex ways to apply such changes on data stream. We call them Advanced Transformations.

Chapter 6, Variables, Expressions, and Dynamism in SSIS, describes how SSIS works with dynamism with the aid of expressions, what are the limitations of some tasks in dynamism, and what are the alternative solutions. SSIS as an executable unit needs to have a structure for declaring in-memory variables and store some data in memory to pass between Tasks through the execution phase. Besides the variables, there is a built-in statement language in SSIS components and Tasks to do many operations such as data conversion, data splitting based on a condition, or creating text filenames based on date. In this chapter, readers will learn how to work with variables and expressions in many scenarios. Dynamism is the most powerful aspect of an ETL tool in data transfer operations.

Chapter 7, Containers and Precedence Constraints, explains three types of containers and precedence constraint in the SSIS Control Flow, which help developers to control the flow of task execution. All of these containers and the precedence constraints are covered in this chapter with real-world samples.

Chapter 8, Scripting, explains the powerful aspect of SSIS: scripting—developers can use scripting whenever other tasks or transformations can't help them to fulfill their requirements. There are two places for scripting in SSIS the—Script Task in Control Flow and Script Component in Data Flow. Scripting in both of these components will be covered in this chapter with samples.

Chapter 9, Deployment, describes how to deploy the developed packages and projects to a production environment, discussing different methods of deployment with the pros and cons of each way in real-world scenarios.

Chapter 10, Debugging, Troubleshooting, and Migrating Packages to 2012, explains the ability of SSIS to debug and troubleshoot like all robust systems. Developers need to know how to face problems in Control Flow or Data Flow, how to handle errors in Data Flow Task, and troubleshoot them. Debugging and troubleshooting have two sides in SSIS—Control Flow and Data Flow. This chapter describes both sides with appropriate examples. Also, this chapter has two recipes on migrating packages from the previous versions to 2012.

Chapter 11, Event Handling and Logging, explores all aspects of event handlers in SSIS besides logging in custom or built-in modes. SSIS provides a set of handlers for events on executable objects of Control Flow, which helps developers to handle these events and design appropriate operations on them. These event handlers also help developers to do some custom logging in their packages. There is a built-in logging feature in SSIS which can be used in general logging scenarios.

Chapter 12, Execution, covers different methods of package execution, and the properties and settings that can be configured at the time of execution.

Chapter 13, Restartability and Robustness, covers all these aspects of SSIS: SSIS has the structure to get input parameters from other applications. On the other hand, Packages can operate in a restartable mode. They can store their state at the time of failure and continue execution from that state next time. They are also capable of running Tasks in packages as a transaction.

Chapter 14, Programming SSIS, explains library classes for creating package and tasks, configuring them, deployment of a package, and running the package. Integration Services provide a set of .NET library classes and methods to do all parts of SSIS lifecycle operations from .NET programming.

Chapter 15, Performance Boost in SSIS, covers recommendations and best practices for raising the performance of packages and Data Flow. As an advanced part of each tool, there are some tips to raise the performance; they are described in this chapter.

What you need for this book

You need to have Microsoft SQL Server 2012 Business Intelligence Edition for running all recipes of this book.

Visual Studio 2010 is also needed for Chapter 14, Programming SSIS, which is about creating SQL Server Integration Services packages programmatically; so if you want to read and practice all the recipes in this book it is necessary to have Microsoft Visual Studio 2010.

Who this book is for

If you are a SQL database administrator or developer looking to explore all the aspects of SSIS and need to use SSIS in the data transfer parts of systems, then this is the best guide for you. Basic understanding of working with SQL Server Integration Services is required.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.

Code words in text are shown as follows: "This Data Flow reads some customer data (first name and last name) from an Excel file, applies some common transformations and inserts the data into an SQL table named SalesLT.Customer."

A block of code is set as follows:

<title>The First Book</title> <title>Becoming Somebody</title> <title>The Poet's First Poem</title>

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

<xsd:element name="genre" type="xsd:string"/> <xsd:element name="price" type="xsd:float" minOccurs="0" maxOccurs="unbounded" /> <xsd:element name="pub_date" type="xsd:date" minOccurs="0" maxOccurs="unbounded" /> <xsd:element name="review" type="xsd:string"/>

Any command-line input or output is written as follows:

x "C:\SSIS\Ch02_ControlFlowTasks\R03_FTP Task\LocalFolder\files.7z"

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "If any error occurs while executing the process, the error can be stored into a variable with the StandardErrorVaraible option".

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to <[email protected]>, and mention the book title via the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/support, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.

Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

You can contact us at <[email protected]> if you are having a problem with any aspect of the book, and we will do our best to address it.

Chapter 1. Getting Started with SQL Server Integration Services

by Reza Rad and Pedro Perfeito

In this chapter, we will cover the following topics:

Import and Export Wizard: First experience with SQL Server Integration Services (SSIS)Getting started with SSDTCreating the first SSIS packageGetting familiar with Data Flow TaskSSIS 2012 versus previous versions in Developer Experience

Introduction

As technology evolves, it is always necessary to integrate data between different systems. The integration component is increasingly gaining importance, especially the component responsible for data quality as well as the cleaning rules applied between source and destination databases. Different vendors have their own integration tools and components, and Microsoft with its SSIS tool is recognized as one of the leaders in this field.

SSIS can be used to perform a broad range of data integration tasks, and the most common scenarios are applied to Data Warehousing. The known term associated with Data Warehousing is the Extract Transform and Load (ETL) that is responsible for the extraction of data from several sources, their cleansing, customization, and loading into a central repository (for example, to a Data Warehouse, Data Mart, Hub, and so on). SSIS is also used in other scenarios, for example data migration and data consolidation. Data Migration is the one-time movement of data between databases and computer systems, and is needed when changes occur or when we upgrade our systems. Data Consolidation combines and integrates data from disparate systems and assumes high importance in a business environment with increasing acquisitions and mergers. The following diagram adapted from TDWI (www.tdwi.org) helps clarify the different scenarios where SSIS could be used:

New business challenges are driving organizations to adopt data integration projects. Some of these challenges are:

Increasing demand for real-time information reporting and analysisLarge volumes of data spread along the entire organizationThe need to comply with regulations, which often require to continuously track all changes to data and not just the net result of those changes

Although SSIS is an amazing tool for data integration, the same work can be done manually in almost all cases. As you can imagine, performing data integration tasks manually could be hard to maintain in terms of code, hard to scale properly, and would require more time to implement. From our perspective, since we have SSIS, there is no real reason to do it manually. The cost of ownership is not a problem either, because SSIS is included with SQL Server licenses that most organizations have already acquired.

In this chapter you will learn how to work with SSIS, how to create packages for data transfer, and you'll perform some simple operations with SSIS Package. At the end, we will highlight several improvements which are included in this new version.

Note

As we will cover many recipes in this book, it is advisable to have Adventure Works SQL 2012 sample database installed.

Getting started with SSDT

This recipe is an overview of SQL Server Data Tools (SSDT), where a user will spend most of his/her time while developing and maintaining SSIS projects.

This version is based on Visual Studio 2010, and the whole structure that supports the process of developing such projects has been significantly improved. Working with SSDT is not only easier for advanced users who require more flexibility, but also for beginners who can enjoy some new and interesting user interfaces to help them take their first steps with SSDT. Previous versions of SSIS used Business Intelligence Development Studio (BIDS) as their development environment.

How to do it...

Open SQLServerDataTools (SSDT) through the shortcut placed under Microsoft SQL Server 2012 or Open Microsoft Visual Studio 2010 under the Microsoft Visual Studio 2010 Start menu folders.

Once SSDT is open, a start page will be seen by default. The Start Page window contains useful information about the SSDT environment such as recently opened projects, links to create or open an existing project, and is also a useful area with several resources and the latest news to help stay up to date about several Microsoft platforms such as Windows, Web, Cloud, and so on.

Now that SSDT is already opened, let's create a new SSIS project from the Start Page window in order to understand the basic steps as well as the remaining windows placed in the SSIS project example.

Click on New Project… and a Windows dialog will appear.Under Installed Templates, expand Business Intelligence and click on Integration Services. In the center pane, select Integration Services Project.Name the project as R02_Getting Started with SSDT. Name the solution as Ch01_Getting Start with SQL Server Integration Services in C:\SSIS and click on OK. An empty SSIS project will be created using the Project Deployment Model approach (default) with an empty package included.In the Solution Explorer pane, right-click on the SSIS Package folder, and choose Add Existing Package.In the Add Copy of Existing Package dialog box, set Package location to File System and choose the package path from the file that you saved in the previous recipe from this address: C:\SSIS\Ch01\Ch01R01_ImportExportWizard.dtsx.The new package will be added under the SSIS Packages folder, double-click on the package name in Solution Explorer to open it in Package Designer.Double-click on Preparation SQL Task 1 and the Execute SQL Task Editor dialog will open. Verify the SQL Statement property with a click on the ellipsis button in front of SQL Statement, and then close the editor.Double-click on Data Flow Task 1, and you will be redirected to the Data Flow tab, there are three source or destination combinations in Data Flow.Double-click on the Source-Department component and the OLE DB Source Editor will open, verify the table name there.Double-click on Destination-Department, and in the OLE DB Destination Editor, verify the connection and table name.

The next recipe will explain the process of creating a new SSIS Package in more detail, and for that reason this recipe will focus on how we could get more value from SSDT to make the development and maintenance easier and faster.

How it works...

Now that the SSDT is open with an empty package, let's describe some of the windows that you should be familiar with, as shown in the next screenshot:

By default, SSDT creates a new and empty SSIS Package named package.dtsx. A package is a collection of SSIS objects including connection managers, tasks and components.

Package design area ( 1 )

Control Flow is the most important tab; it's where a developer "explains" to SSIS what the package will do. The remaining tabs such as Data Flow (see recipe), Parameters (see Chapter 11, Event Handling and Logging), Event Handlers (see Chapter 10, Debugging, Troubleshooting, and Migrating Packages to 2012), the Package Explorer and Progress bar (available just at runtime) are also important and will be described in later recipes.

Solution Exlorer ( 2 )

The Solution Explorer section contains projects and their files.

Each project consists of Project Parameters, Connection Managers, SSIS Packages, and the Miscellaneous folder.

Project Parameters are parameters which are public for all packages in the project. We will discuss parameters in later chapters.

The Connection Managers folder in the Solution Explorer consists of shared connection managers which are shared between all packages in a project.

All SSIS Packages will be listed under the SSIS Packages folder.

The Miscellaneous folder can consist of any other files that are relevant to projects and packages, files such as documentation files, screenshots, and so on.

Properties panel ( 3 )

In this panel, it's possible to read and edit the properties of each selected object in the Design area or Solution Explorer.

SSIS Toolbox ( 4 )

In the SSIS Package, there are tasks and components which will be available depending on the tab selected in the Package design area. When the tab selected is Control Flow, the SSIS Toolbox will be grouped into four areas. The groups of tasks are the Favorites, Common, Containers, and Other Tasks. With these tasks, it's possible to control and inform SSIS about what should be done during execution. An interesting tip is that you can add tasks to the Favorites area anytime you like by right-clicking on each task and selecting Move to Favorites.

Note that the SSIS Toolbox is completely different on the Data Flow tab; we will talk about it in later recipes.

Connection Managers ( 5 )

Connection Managers are connections from the SSIS Package's components to source or destination data providers. There are different types of connection managers, some of them which are much in use are OLE DB Connection manager, Flat File Connection Manager and so on.

Each connection manager can be used in one or more components in the SSIS Package to work with underlying data provider. Some data providers require the installation of special drivers to have connections to their data source.

Each connection manager which is relevant to the current package will be listed in the Connection Manager's pane. Some connections are bold, these are referenced from a shared project's connection manager.

We will discuss more about connections in the next recipes.

Variables Pane ( 6 )

Each task in SSIS Package can send information to other tasks and it is possible by resorting to Variables. Package variables, their data types, their scope, and other properties exist in this pane, which will be described in greater detail in later chapters.

Creating the first SSIS Package

After understanding the SSDT environment, you will be able to make your first SSIS Package. Depending on the Data Integration project's complexity, it's always recommended to think carefully while selecting tasks and components to apply, as well as the order in which to execute them.

In this recipe, the first package created will read the number of records in an Adventure Works Microsoft Sample table and store it inside the SSIS Package.

Getting ready

You can reuse the recipe created in the previous section (adding a new package to it), or start from scratch as explained in the following steps:

Open SQL Server Data Tools (SSDT).Click on New Project… and a Windows dialog will appear.Click on the Business Intelligence Projects tab and under the Installed Templates section, select Integration Services Project.Provide a name and location for the SSIS project and an empty structure as well as a package will be created.In the Solution Explorer, select the empty package.dtsx and rename it to: P01_FirstSSISPackage.dtsx.

How to do it...

Now that the SSIS project is created, ensure that the empty package created by default is open and follow these steps:

Create an OLEDB Connection to the Adventure Works Microsoft sample database at the package level. (As already mentioned, if you create this connection in the Solution Explorer window, the connection will be created at the project level and will be automatically included inside all the existent packages).In the Configure OLE DB Connection Manager Editor, select New... to create a new connection. If the connection already exists in the Data Connections list then select it here.

Tip

Ensure that the Control Flow tab is selected in the Package Designer area.

Drag-and-drop Execute SQL Task from SSIS Toolbox and place into the control flow design surface.Double-click to edit Execute SQL Task