Pentaho Data Integration (PDI) is a modern, powerful, and easy-to-use ETL system that lets you develop ETL processes simply. With extensive explanations and a good set of samples, this book helps you gain the experience and skills you need to run processes from the command line or schedule them.
Instant Pentaho Data Integration Kitchen How-to will help you understand the correct way to work with the PDI command-line tools. We start with a recipe on configuring your memory requirements so that your processes run effectively, and then move on to a set of recipes that show you the different ways to start PDI processes.
The book opens with a recap of how transformations and jobs are designed using Spoon, and then moves on to configuring memory requirements so that your processes run properly from the command line.
We dive into the various flags that control the logging system, which specify the log output and the log verbosity. The book delivers all the knowledge you need to run ETL processes with the command-line tools easily and proficiently.
Pages: 87
Year of publication: 2013
Copyright © 2013 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, nor its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: July 2013
Production Reference: 1240713
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-84969-690-6
www.packtpub.com
Author
Sergio Ramazzina
Reviewer
Joel Latino
Acquisition Editor
Erol Staveley
Commissioning Editor
Shreerang Deshpande
Technical Editor
Sampreshita Maheshwari
Copy Editor
Insiya Morbiwala
Project Coordinator
Suraj Bist
Proofreader
Paul Hindle
Production Coordinator
Zahid Shaikh
Cover Work
Prachali Bhiwandkar
Cover Image
Aditi Gajjar
Sergio Ramazzina is a software architect and trainer with over 20 years of experience working on a large number of projects for banks and major Italian companies, as well as designing complex enterprise solutions in Java/JavaEE and Ruby. He has used Pentaho products from the very beginning (late 2003), gaining vast experience deploying Pentaho as a standalone open source BI solution. He has also deeply integrated Pentaho as the analytics engine of choice in other applications he designed. Starting in 2009, drawing on his experience in the Java/JavaEE world and his appreciation for the open source world and its principles, he began contributing actively to some Pentaho projects, such as JPivot, Saiku, CDF, and CDA, and he has earned the title of Pentaho Active Contributor.
In late 2010, he founded Serasoft, a young Italian consulting company specializing in the design and delivery of open source business intelligence solutions, and he began working as a BI architect and Pentaho expert on a wide range of projects where open source BI and Pentaho were the main heroes. He is also the CTO of Athilab (Athirat Innovation Lab), where he shares his experience in the design and delivery of high-value, innovative enterprise solutions. He is always looking for innovative solutions that help users work more efficiently. He is also passionate about skiing, tennis, and photography.
Joel Latino was born in Ponte de Lima, Portugal, in 1989. He has been working in the IT industry since 2010, mostly as a software developer and BI developer.
He started his career at Xpand-IT, a Portuguese company specializing in strategic planning, consulting, implementation, and maintenance of enterprise software fully adapted to the customer's needs, and he earned his graduate degree in Informatics Engineering at the School of Technology and Management of the Viana do Castelo Polytechnic Institute.
Joel mainly focuses on open source web technology, databases, and business intelligence, and he is also fascinated by mobile technologies. He has developed several plugins for Pentaho Data Integration, such as the Android and Apple push notification steps.
I would like to thank my parents for supporting me throughout my career and endeavors.
You might want to visit www.PacktPub.com for support files and downloads related to your book.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
http://PacktLib.PacktPub.com
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read and search across Packt's entire library of books.
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.
Pentaho Data Integration (PDI) is an ETL tool that was born 10 years ago. Its creator, Matt Casters, celebrated the 10th anniversary of this product, originally named Kettle, on March 8, 2013 (you can read the celebratory post on Matt's blog at http://www.ibridge.be/?p=211). K.E.T.T.L.E. is a recursive acronym that stands for Kettle Extraction Transformation Transport Load Environment. When Pentaho acquired Kettle, its name was changed to Pentaho Data Integration, but many developers still call it by its old name: Kettle.
The history of Kettle began in 2001, when Matt Casters, Pentaho Data Integration's chief architect and the creator of Kettle, was working as a BI consultant. He had the idea of writing his own ETL tool to have a better and cheaper way to move data from one place to another. He was looking for a different solution, something better than inventing ugly data warehouse solutions written in PL/SQL, VB, or shell scripts. He spent two years doing a thorough analysis of the problem; because he was always busy with his consulting work, he worked on this project on weekends or at night. At the end of this phase, he had a set of analysis documents and a couple of test programs written in C. He was not fully satisfied with the result, so in early 2003 he turned to Java, a platform that was gaining traction in the market at the time, and continued his work on the product there. By mid-2003, the first version of the ETL design tool, named Stir (now called Spoon), came to life.
It is interesting to see a screenshot of how things were then:
Stir displayed a big X in the graphical view, the log view did not work, and neither did most step dialogs; still, it is useful for understanding where this adventure started. Several more releases followed, each bringing new features or bug fixes.
By 2004, the work was reasonably stable, and he was able to deploy Kettle at a customer site for the first time. Real-world use meant that a lot of things needed to be fixed and new features needed to be implemented. That is why, in those days, things advanced a lot faster than they had in the first three years. The code base grew so fast that several rounds of refactoring and code cleanup were needed. Version 2.0 was one of the last "unstructured" versions. But it was thanks to the Java expertise of companies such as ixor (Wim De Clerq especially) that Kettle survived and changed radically. They helped Matt a lot with refactoring and code reorganization to give the application a better structure and to simplify the code. At that time, Kettle had a fairly complete first release with support for slowly changing dimensions, junk dimensions, 28 steps, and 13 database connectors.
The application, initially closed source, was open sourced in late 2005: the first version under the new license was published in December 2005, and the response from the community was massive.
Today, PDI is one of the best open source ETL solutions. It is made up of the following components:
