Pentaho Data Integration Kitchen - Sergio Ramazzina - E-Book

Sergio Ramazzina

Description

Pentaho Data Integration (PDI) is a modern, powerful, and easy-to-use ETL system that lets you develop ETL processes simply. Through extensive descriptions and a good set of samples, you gain the experience and skills you need to run processes from the command line or to schedule them.

Instant Pentaho Data Integration Kitchen How-to will help you understand the correct way to work with the PDI command line tools. One recipe covers how to configure memory requirements so that your processes run effectively, and a further set of recipes shows you the different ways to start PDI processes.

We start with a recap of how transformations and jobs are designed using Spoon and then move on to configuring the memory requirements needed to run your processes properly from the command line.

We dive into the various flags that control the logging system, specifying the logging output and the log verbosity. We focus on delivering all the knowledge you need to run ETL processes with the command line tools easily and proficiently.

Page count: 87

Year of publication: 2013


Table of Contents

Instant Pentaho Data Integration Kitchen
Credits
About the Author
About the Reviewer
www.PacktPub.com
Support files, eBooks, discount offers and more
Why Subscribe?
Free Access for Packt account holders
Preface
How the story began…
Kettle components
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Instant Pentaho Data Integration Kitchen
Designing a simple PDI transformation (Simple)
Getting ready
How to do it...
There's more...
How to quickly find the steps to use
Designing a simple PDI job (Simple)
Getting ready
How to do it...
How it works...
There's more...
Why a proper naming for tasks and steps is so important
Using internal variables to write location-independent processes
The important role of icon and color indicators
Configuring command-line tools to run properly (Simple)
Getting ready
How to do it...
There's more...
Making things easier by writing custom scripts
Executing PDI jobs from a filesystem (Simple)
Getting ready
How to do it...
Executing PDI jobs packaged in archive files (Intermediate)
Getting ready
How to do it...
How it works...
There's more...
Changes in job and transformation design
Executing PDI jobs from the repository (Simple)
Getting ready
How to do it...
There's more...
Changes in job and transformation design
How to define a filesystem repository
Defining a database repository
Dealing with the execution log (Simple)
Getting ready
How to do it...
There's more...
Understanding the log to identify where our process fails
Separating execution logfiles by date and time
Discovering your PDI repository from the command line (Simple)
Getting ready
How to do it...
Exporting jobs and transformations to the .zip files (Simple)
Getting ready
How to do it...
How it works...
There's more...
Managing PDI processes return code (Simple)
Getting ready
How to do it...
There's more...
A summary of Kitchen/Pan exit codes
Scheduling PDI jobs and transformations (Intermediate)
Getting ready
How to do it...
There's more...
Understanding crontab malfunctions

Instant Pentaho Data Integration Kitchen

Copyright © 2013 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: July 2013

Production Reference: 1240713

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-84969-690-6

www.packtpub.com

Credits

Author

Sergio Ramazzina

Reviewer

Joel Latino

Acquisition Editor

Erol Staveley

Commissioning Editor

Shreerang Deshpande

Technical Editor

Sampreshita Maheshwari

Copy Editor

Insiya Morbiwala

Project Coordinator

Suraj Bist

Proofreader

Paul Hindle

Production Coordinator

Zahid Shaikh

Cover Work

Prachali Bhiwandkar

Cover Image

Aditi Gajjar

About the Author

Sergio Ramazzina is a software architect/trainer with over 20 years of experience working on a large number of projects for banks and major Italian companies as well as designing complex enterprise solutions in Java/JavaEE and Ruby. He started using Pentaho products from the very beginning (late 2003), gaining vast experience by deploying Pentaho as an open source, standalone BI solution. He also deeply integrated Pentaho as the analytics engine of choice in other applications he designed. Starting from 2009, based on his experience in the Java/JavaEE world and because of his appreciation for the open source world and its principles, he began participating actively as a contributor to some Pentaho projects, such as JPivot, Saiku, CDF, and CDA, and he has achieved the title of Pentaho Active Contributor.

In late 2010, he founded Serasoft, a young Italian consulting company specialized in the design and delivery of open source business intelligence solutions, and he started participating as a BI architect and Pentaho expert in a wide range of projects where open source BI and Pentaho played the leading roles. He is also the CTO of Athilab (Athirat Innovation Lab), sharing his experience in the design and delivery of high-value innovative enterprise solutions. He is always looking for innovative solutions that can help users make their work more efficient. He is also passionate about skiing, tennis, and photography.

About the Reviewer

Joel Latino was born in Ponte de Lima, Portugal, in 1989. He has been working in the IT industry since 2010, mostly as a software developer and BI developer.

He started his career at Xpand-IT—a Portuguese company specialized in strategic planning, consulting, implementation, and the maintenance of enterprise software that is fully adapted to the customer's needs—and earned his graduate degree in Informatics Engineering at the School of Technology and Management of the Viana do Castelo Polytechnic Institute.

Joel mainly focuses on open source web technology, databases, and business intelligence, and is fascinated by mobile technologies. He is responsible for developing some plugins for Pentaho Data Integration, such as the Android and Apple push notification steps.

I would like to thank my parents for supporting me throughout my career and endeavors.

www.PacktPub.com

Support files, eBooks, discount offers and more

You might want to visit www.PacktPub.com for support files and downloads related to your book.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

http://PacktLib.PacktPub.com

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read and search across Packt's entire library of books.

Why Subscribe?

Fully searchable across every book published by Packt
Copy and paste, print and bookmark content
On demand and accessible via web browser

Free Access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.

Preface

Pentaho Data Integration (PDI) is an ETL tool that was born 10 years ago. Its creator, Matt Casters, celebrated the 10th anniversary of the product, originally named Kettle, on March 8th, 2013 (you can read the celebratory post on Matt's blog at: http://www.ibridge.be/?p=211). The term K.E.T.T.L.E. is an acronym that stands for Kettle Extraction Transformation Transport Load Environment. When Pentaho acquired Kettle, its name was changed to Pentaho Data Integration, but many developers continue to call it by its old name: Kettle.

How the story began…

The history of Kettle began in 2001 when Matt Casters, Pentaho Data Integration's chief architect and creator of Kettle, was working as a BI consultant. He had the idea of writing his own ETL tool to have a better and cheaper way to transfer data from one place to another. He was looking for a different solution, something better than inventing ugly data warehouse solutions written in PL/SQL, VB, or shell scripts. He spent two years doing a thorough analysis of the problem. Because he was busy all the time with his work as a consultant, he worked on this project either during the weekends or at night. After this phase, he came out with a set of analysis documents and a couple of test programs written in C. He was not fully satisfied with the result, so in early 2003 he started looking towards Java and continued his work on the product on this platform, which in those years was gaining more traction in the market. By mid-2003, the first version of the ETL design tool, named Stir (now called Spoon), came to life.

It is interesting to see a screenshot of how things were then:

Stir featured a big X on the graphical view, and neither the log view nor most of the step dialogs worked; still, it is useful for understanding the starting point of this adventure. A number of other releases followed, each with a different set of new features or bug fixes.

In 2004, the work was reasonably stable and he was able to deploy Kettle to a customer for the first time. Because of this "real-world" situation, a lot of things needed to be fixed and new features needed to be implemented. That was why, in those days, things were advancing a lot faster than they had in the first three years. The code base grew so fast that several rounds of refactoring and code cleanup were needed. Version 2.0 was one of the last "unstructured" versions. It was thanks to the Java expertise of companies such as ixor (Wim De Clercq especially) that Kettle survived and changed radically. They helped Matt a lot with refactoring and code reorganization to give the application a better structure and to simplify the code. At that time, Kettle had a fairly complete first release with support for slowly changing dimensions, junk dimensions, 28 steps, and 13 database connectors.

The application, initially closed source, was open sourced in late 2005. The first version under the new license was published in December 2005, and the response from the community was massive.

Kettle components

Today, PDI is one of the best open source ETL solutions; it is made up of the following components:

Spoon