43,19 €
Alteryx allows you to create data manipulation and analytic workflows with a simple, easy-to-use, code-free UI, and perform fast-executing workflows, offering multiple ways to achieve the same results. The Alteryx Designer Cookbook is a comprehensive guide to maximizing your Alteryx skills and determining the best ways to perform data operations
This book's recipes will guide you through an analyst's complete journey, covering all aspects of the data life cycle. The first set of chapters will teach you how to read data from various sources to obtain reports and pass it through the required adjustment operations for analysis. After an explanation of the Alteryx platform components with a particular focus on Alteryx Designer, you’ll be taken on a tour of what and how you can accomplish by using this tool. Along the way, you’ll learn best practices and design patterns. The book also covers real-world examples to help you apply your understanding of the features in Alteryx to practical scenarios
By the end of this book, you’ll have enhanced your proficiency with Alteryx Designer and an improved ability to execute tasks within the tool efficiently
Das E-Book können Sie in Legimi-Apps oder einer beliebigen App lesen, die das folgende Format unterstützen:
Seitenzahl: 449
Veröffentlichungsjahr: 2023
Over 60 recipes to transform your data into insights and take your productivity to a new level
Alberto Guisande
BIRMINGHAM—MUMBAI
Copyright © 2023 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Reshma Raman
Publishing Product Manager: Apeksha Shetty
Content Development Editor: Joseph Sunil
Technical Editor: Devanshi Ayare
Copy Editor: Safis Editing
Project Coordinator: Farheen Fathima
Proofreader: Safis Editing
Indexer: Manju Arasan
Production Designer: Alishon Mendonca
Marketing Coordinator: Vinishka Kalra
First published: October 2023
Production reference: 1231023
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1R.
ISBN 978-1-80461-508-9
www.packtpub.com
To Mariana, my beloved wife, Agustina and Martín, the best daughter and son you can aspire to have. Thank you for being the reason that I open my eyes every morning and for making me better every day.
To my parents, for setting the foundations for who I am now.
To my fellow Alteryx ACEs, a second family I found in life.
To Roberto, the best business partner (cum friend) you can find, thank you for being so patient with me over the years.
Alberto Guisande has been an Alteryx ACE since 2015 and is currently a C.A.O. at Decision Science, a company dedicated to providing data science consulting services to its clients. He has held top management positions in banking and consulting companies and General Management in logistics and Supply Chain.
With over 15 years of experience in the field of data science, he has helped numerous companies define and adopt an analytical culture, based on the exploitation of their data. He has created centers of excellence in numerous companies in America and serves as Chief Data Officer (CDO) in several of his clients.
He is certified in Alteryx Core, Alteryx Advanced, Alteryx Server, Tableau, and Vertica among others. In 2021, he was appointed as Veritone Ambassador by Veritone, an AI & ML leader company.
He is an international instructor in the Data Science for Business and Big Data for Business courses at the Instituto Tecnológico de Monterrey, having also collaborated in the creation of their pensums. He also continuously participates as a speaker at specialized data science conferences and exhibitions.
He serves as a mentor in thePredictive Analytics for Business Nanodegree Program at Udacity.com
Having Alteryx in my toolbox is the best thing that ever happened to me as a data professional – Alberto Guisande
I want to thank all the people who dedicated time to help with this book, especially the team from Packt (Apeksha, Kirti, and Joseph), and my family for being so patient while I worked on it.
Thales Donizeti Silva, an engineer with advanced degrees in Business Analytics and AI, has over 8 years of experience in Business Intelligence. Based in Sao Paulo, Brazil, he’s a Senior Analytics Engineering Consultant at phData and has led multiple Alteryx projects in the United States over the last 4 years. Thales is an Alteryx ACE and holds the Alteryx Designer Expert certification. Beyond his professional roles, Thales leads the Brazilian Alteryx User Group, owns the website “Alteryx Para Todos”, runs the YouTube channel “Alteryx Simplifica”, and mentors in the “Adote um Júnior” volunteer project. He’s a Top Community Contributor with over 500 solutions in the Alteryx Community and has presented in 2 Alteryx Conferences.
Jean-Baptiste has been an Alteryx fan since he first tried it, loving the logic of the tool and working the same way the tool does. Since then, he has skilled up completing weekly challenges, writing blogs, passing certifications, and so on. He then became an Alteryx ACE in 2022, won the European Alteryx Grand Prix in Amsterdam, and passed the Expert Certification all in the same year. He always tries to give back to the community by helping others and building challenges for others to solve.
Marcus Montenegro is a certified expert in Alteryx, Matillion, and KNIME with years of experience in analytics engineering projects. He co-founded the first Matillion user group, is one of the Alteryx user group leaders in Brazil, and is one of the few Alteryx ACEs around the world. At phData, Marcus leads a team of consultants, assisting them in growing, delivering services, and pursuing business and professional objectives.
Due to his passion for teaching and assisting others, Marcus created his own YouTube channel to teach Alteryx to the Brazilian community for free.
Marcus has worked with clients to integrate multiple systems into Alteryx implementations and has taught new specialists for every company he has worked for.
Alteryx allows you to create repetitive data manipulation and analytic workflows with a simple, easy-to-use, code-free/code-friendly UI and performs very fast executing those workflows, offering multiple ways to achieve the same results.
This book is the result of compiling Alteryx users’ frequent day-to-day use cases/situations they deal with and a comprehensive guide for leveraging your skills in Alteryx and finding the best way to perform these operations.
Through the chapters, this book’s recipes will take you through an analyst’s complete journey, covering all aspects of the data life cycle, from reading it, no matter what source you do it from, to obtaining reports/reports, and passing through the adjustment operations necessary for analysis. You’ll learn how to read numerous and disparate files and databases and handle huge amounts of data. You’ll then perform operations and transformations to your data to suit your needs. Also, you’ll learn how to group data in different ways and criteria and blend different types of data sources. You’ll understand how to pivot and un-pivot your data for easy manipulation and perform aggregations and calculations on your data. You will also learn how to encapsulate reusable logic into components (Macros), connect and download data from APIs and save your data or create reports in various formats.
By the end of this book, you will notice how your use of Alteryx Designer has improved and how you have discovered how to best perform tasks within it.
This book is intended for any professional data user (from business intelligence professionals, data analysts, Citizen Data Scientists to Data Scientists) with a basic knowledge of Alteryx Designer, who wants to leverage its usage.
In summary, anyone who wants to:
Learn how to tackle analytics operations like a masterImprove their data manipulation skillsSave a lot of time and effort to get your data-driven insightsUse best practices to get better with Alteryx DesignerChapter 1, Inputting Data from Files, will show you how Alteryx Designer makes connecting to any type of file very easy. Every format has its own requirements and oddities. You will get to know the best way to unleash the whole reading and writing capabilities of the Designer.
We’ll show you recipes to handle and improve the way you connect and consume different file formats.
Chapter 2, Working with Databases, will explore some recipes to improve reading and writing from databases. As it does with files, Alteryx Designer allows you to connect with a vast amount of databases.
Chapter 3, Preparing Data, will guide you to help shape messy data into a workable dataset.
Chapter 4, Transforming Data, will explore various data transformations such as extracting/parsing fields, breaking one column into many, transposing, or converting data types, which are very common and widely needed operations when you work with data.
Chapter 5, Parsing Data,will show you how to extract or interpret what’s inside a field value or making an unreadable field readable.
Chapter 6, Grouping Data, will show you how classifying your data into groups is very important when you work in real life use cases.
Chapter 7, Blending and Merging Datasets, will teach you how to convert the data into information and enrich it with various techniques like relating, joining, or unioning datasets.
Chapter 8, Aggregating Data, will showcase recipes to summarize and sort data. You will also get your first brush with visualizations.
Chapter 9, Dynamic Operations, will allow you to take advantage of the dynamic toolset that Alteryx Designer offers. These will allow you to handle the data dynamically.
Chapter 10, Macros and Apps, will show how to reuse repetitive pieces of logic, build applications with user input, to make Alteryx workflows even more dynamic and simpler to maintain.
Chapter 11, Downloads, APIs, and Web Services, will explore how to make use of the cloud and web services to get data from various sources.
Chapter 12, Developer Tools, showcases varioous external tools to solve some challenges that data presents. Alteryx offers a lot of options to incorporate source code, use existing programs, or execute external scripts to accomplish these tasks.
Chapter 13, Reporting with Alteryx, will show you how to properly communicate your data. These recipes will guide you in converting your information into business insights.
Chapter 14, Outputting Data, is our last step in this journey. Saving the resulting data from our workflows for later use is crucial to every analyst in every Company. These recipes will guide you into effective saving of your resulting data.
You’ll need to install Alteryx Designer on your computer (at the time of writing this book, the latest version was 22.4, and 23.1 was in beta).
For some recipes, access to a database engine will be necessary.
Software/Hardware covered in the book
OS Requirements
Alteryx Designer
Windows, Mac OS X (can be used with the help of Parallels or similar mechanisms to run Windows applications within Mac OS)
If you don’t have an Alteryx License, you can get a free trial from here:https://bit.ly/ayx_trial_Cookbook
Or by scanning the following QR Code:
The test sets available for download contain the whole finished workflow/s. I strongly recommend using them to compare with the ones you create in the book.
You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Alteryx-Designer-Cookbook. In case there’s an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “ Drop an Input Data tool onto the canvas and point it to ..\DATA\Customers_by_City\ARVADA.csv.”
A block of code is set as follows:
IF CONTAINS([Name],[TY]) THEN "TY" ELSEIF CONTAINS([Name],ToString(ToNumber([TY])-1)) THEN "LY" ELSE "" ENDIFWhen we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
!IsEmpty([Description])Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: “Add another Dynamic Select tool following the Dynamic Rename tool.”
Tips or important notes
Appear like this.
In this book, you will find several headings that appear frequently (Getting ready, How to do it..., How it works..., There’s more..., and See also).
To give clear instructions on how to complete a recipe, use these sections as follows:
This section tells you what to expect in the recipe and describes how to set up any software or any preliminary settings required for the recipe.
This section contains the steps required to follow the recipe.
This section usually consists of a detailed explanation of what happened in the previous section.
This section consists of additional information about the recipe in order to make you more knowledgeable about the recipe.
This section provides helpful links to other useful information for the recipe.
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packtpub.com.
Once you’ve read Alteryx Designer Cookbook, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.
Thanks for purchasing this book!
Do you like to read on the go but are unable to carry your print books everywhere?
Is your eBook purchase not compatible with the device of your choice?
Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.
Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.
The perks don’t stop there, you can get exclusive access to discounts, newsletters, and great free content in your inbox daily
Follow these simple steps to get the benefits:
Scan the QR code or visit the link belowhttps://packt.link/free-ebook/9781804615089
Submit your proof of purchaseThat’s it! We’ll send your free PDF and other benefits to your email directlyAs data workers, no matter how mature and evolved your actual working environment is, you’ll have to deal with a good number of files and different formats, even when they refer to the same kind of data.
Through the years, I’ve seen a lot of analysts struggling with the same problems, such as vendor files that are not consistent, structure changes through versions of the same file, and so on.
In this chapter, we’ll start defining recipes for those common situations you’ll encounter when reading files using Alteryx Designer.
We’ll read multiple Excel files from a single directory, multiple worksheets from an Excel file, and multiple Excel files having different structures between worksheets. Specifically, we’ll look at the following recipes:
Reading a worksheet from multiple Excel files all at onceReading all worksheets from an Excel file all at once—same schemaDifferent schema Excel worksheets all at onceSaving and preserving Excel sheet formatsAccessing and determining which files are to be processed from a file repositoryHandling formats when reading text filesTo begin practicing and reproducing these recipes by yourself, you need to have Alteryx Designer installed on your computer. If you have no access to it, you can download a 1-month trial here: https://bit.ly/ayx_trial_Cookbook.
To follow along with the recipes included in this chapter, you can download the ch1 folder from GitHub here:
https://github.com/PacktPublishing/Alteryx-Designer-Cookbook/tree/main/ch1
Important note
If you decided to apply these recipes to your own data, please read carefully the Getting ready section of each one, to be sure that your data fits the use case.
This is one of the most common situations you’ll find when working with files. Maybe it’s sales data, purchase orders, or any type of data—at the end, you’ll have a bunch of Excel files in a folder or subfolders and you’ll need to read, consolidate, and analyze them.
This recipe will help you with the case where those files are in the same folder (including its subfolders) and the files have the same structure (columns).
Even though the recipe was crafted using Excel files (because they present an extra complication, including different worksheets within), it can be adjusted and will work with any file format you need to deal with.
If you want to try this recipe before applying it to your own data, we have a test set here: https://github.com/PacktPublishing/Alteryx-Designer-Cookbook/tree/main/ch1/Recipe1. You’ll find a ZIP file with a directory structure and files within it. We provided one file per year of sales data from 2018 to 2022, called MonthlySales_XXXX.xlsx (XXXX meaning the year of sales contained within that specific file), monthly summarized (1 month of each year per row), stored in a worksheet called MonthlySales.
If you’re planning to follow this recipe with your own data, make sure your files are within the same folder (and/or subfolders within a unique parent folder) and all the worksheets you want to input have the same name:
Figure 1.1: Folder hierarchy
The directory structure should look like this:
Data Root Folder Files within the root Folder Data Sub Folder 1 Files within Sub Folder 1 Data Sub Folder 2 Files within Sub Folder 2We’ll get started with the following steps:
Drag an Input Data tool (from the In/Out category) and drop it onto the canvas.Point your input to one of the files you want to read (preferably on the topmost folder) and select it:Figure 1.2: Selecting a file
Since we are selecting an Excel file, Alteryx Designer will ask you to select a worksheet within the selected file, so we do that (remember that for this recipe, all worksheets you need to read must have the same name and—of course— structure):
Figure 1.3: Selecting a worksheet
You’ll get your Input Data tool configuration pointing to this file and with the default configurations:
Figure 1.4: Input Data configurations
Now is when we start to exploit Alteryx Designer’s configuration options to fulfill our needs.
Option 4 in the Configuration panel reads Search SubDirs. If we enable it, Alteryx will not only search for the file/s pointed to in the file selection, but also will be looking through all the subfolders beneath the actual folder:Figure 1.5: Enabling SubDirs
So far, Alteryx will be looking for the MonthlySales_2022.xlsx file in the \DATA subfolder and all of its subfolders.
Replace the 2022 part of the filename with a * sign, like this:Figure 1.6: Replacing part of the filename
And with that little change, you’ll be telling Alteryx Designer to read the MonthlySales worksheet from all files within the \DATA folder and its subfolders that start with MonthlySales_ and have anything in their names before the extension (.xlsx).
Give it a try, and you’ll see that the contents of Sales for all years are now read by Alteryx.
Alteryx Designer allows you to use wildcard characters (* and ?) that give you the option of replacing a specific character or a sequence of them.
Use * when you need to replace any string, no matter its length.
Use ? when you need to replace a character in the indicated position.
For instance, MonthlySales_*.xlsx will tell Alteryx Designer to read all files having MonthlySales_ in the filename, followed by any character or characters (MonthlySales_1234.xlsx, MonthlySales_1.xlsx, and so on), and that end with a .xlsx extension.
When using ?, it’ll replace one character per ? used, in the exact position where it is. For instance, MonthlySales?2018.xlsx will read MonthlySales_2018.xlsx, MonthlySales-2018.xlsx, MonthlySalesX2018.xlsx, and so on, but MonthlySales%%2018.xlsx won’t (because there are two characters between MonthlySalesand 2018).
So, for example, this configuration will allow you to read all CSV files contained in C:\DATASETS\DATA\ that start with CityBike:
Figure 1.7: Reading multiple CSVs at once
If you want to keep track of where every record comes from, you can use the Output File Name as Field option in the Configuration pane. This will create an additional column in your dataset, with the filename from where that record was read:
Figure 1.8: Getting a record’s source file
Also, within Alteryx Designer, you can refer to a file location using the “dots convention," where a single dot (.) specifies the current workflow directory and two dots (..) refers to a parent folder from the actual workflow location.
Have a look at the following examples:
.\mydata.csv → Refers to a mydata.csv file located in the same folder as the actual workflow..\DATA\mydata.csv → Refers to a mydata.csv file located in a folder called DATA, at the same hierarchy level as the folder containing the actual workflowAnother common situation is to have all your data spread across multiple worksheets. So, we need to read them all and put them together:
Figure 1.9: Our data file worksheets
As always, we’ll make it simple for you, since Alteryx Designer allows us to.
If you want to follow along with this recipe, please download our test set here:
https://github.com/PacktPublishing/Alteryx-Designer-Cookbook/tree/main/ch1/Recipe2
Here, you’ll find an Excel file with yearly data spread across multiple worksheets, and by applying this recipe, we’ll be reading them all at once.
If you want to try this recipe with your data, make sure of the following:
All your worksheets have the same schema (field/column names and data types).You have a list of worksheets to import (we’ll get this directly when first accessing the file using a helpful option Alteryx has).You have a file with the schema to be used as a template. This file should be always available to the workflow, and won’t be imported if it’s not within the files-to-import list. If your file meets the first requirement, you can use the same file as the template.Use thefollowing steps:
Drop an Input Data tool onto the canvas, and point it to the Excel file you need to read:Figure 1.10: Selecting a file to read
Alteryx will prompt you for which of the worksheets you want to read.
Select the Import only the list of sheet names option:Figure 1.11: Selecting a list of sheet names
The selection will produce a list of sheets within the selected file, like this one:
Figure 1.12: Results after reading the file
It’s important to note that at this point, we still don’t have the actual contents of those sheets, only their names.
To read the contents, we need the help of the Dynamic Input tool, so we drop one onto the canvas (from the Developer category).Click on Edit… in the Input Data Source Template section, and you’ll notice that the screen that pops up is the same as the Input Datatool one.Point Input Data Source Template to your actual Excel file (or one that has the same schema as the one you need to read), and since we’re reading an Excel file, Alteryx Designer will ask for the sheet we want to connect to.Select one of the sheets and hit OK:Figure 1.13: Selecting one of the sheets to use as a template
Hit OK again, and your template will be ready to be used as the sheet’s schema for the upcoming data.Now, we need to tell the Dynamic Input tool how we expect it to behave.
So, for the Read a List of Data Sources option, we are going to use the contents of the [Sheet Names] field, and for the Action field, we’ll select Change File/Table Name:
Figure 1.14: Setting data sources/tables/sheet names
Figure 1.15: Setting the action to perform, based on field values
If you run your workflow, you’ll see how the contents of each worksheet are read by Alteryx and stacked together:
Figure 1.16: Workflow results
As its name suggests, the Dynamic Input tool performs several dynamic reading operations based on the configuration we set up.
In this particular case, we instructed the tool to read a list of worksheets (or tables) based on what’s been gathered by the Input Data tool (the sheet names within our file), stating that the expected scheme or structure for those sheets must match the one we used as a template.
So, our first input record is Sales_2018 (the name of the first sheet name), so Alteryx Designer compares its schema against the one we indicated. If it matches, Alteryx reads the data within it; if not, it throws an error. In both cases, Alteryx moves to the next record and performs the same tasks until we reach the end of the file.
Sometimes we need to keep track of where every record comes from. Alteryx Designer allows us to do that very easily.
Just select the Output File Name as Field option, and Alteryx will add a new field to your data, containing—depending on which option you selected—either File Name Only or Full Path (the entire absolute path to the file) information:
Figure 1.17: Output File Name as Field option
The most complicated situation is when we need to consolidate Excel files that have different schemas.
As we saw in recipe #2, if the schema of the worksheets is not the same, we get a warning from Alteryx Designer, and the contents of that worksheet are not read:
Figure 1.18: Warning for unmatching schema
In this recipe, we’ll create a batch macro to read each worksheet within an Excel file, even when they have a different schema.
We prepared a test set for you to follow along with this recipe, available at https://github.com/PacktPublishing/Alteryx-Designer-Cookbook/tree/main/ch1/Recipe3.
If you decided to use your data to try this recipe, make sure that your files have different schemas (with some fields in common so that you can see the effects of different configurations).
Make sure that under the Options→User Settings→Edit User Settings→Macros tab, you have at least one directory/folder set up as a macro repository (this makes it easier for Alteryx Designer to find your macros). As you can see, there can be several repositories configured, but one of them will be the default one:
Figure 1.19: Default macro repositories
Figure 1.20: Input Data tool configuration
From the Interface category, drop a Control Parameter tool onto the canvas and change its label to something meaningful (this is going to be your tool’s input anchor):Figure 1.21: Control Parameter tool in the canvas
Once you have both tools, click and drag from the black Q connector of the Control Parameter tool and connect it to the lightning icon on top of the Input Data tool. You’ll notice that Alteryx Designer inserts an Action tool in the middle.Click on the Action tool to configure it, and select Update Input Data Tool from the Select an action type: dropdown, making sure that the Required option is selected:Figure 1.22: Updating the action type
From the Interface category, drop a Macro Output tool and connect it to the Input Data tool output anchor (this is going to be the output anchor of your tool):Figure 1.23: Completed macro
Open Interface Designer (from the View menu or by pressing Ctrl + Alt + D):Figure 1.24: Interface Designer
Click on the gear icon at the left of the pop-up window so that you can access the output options:Figure 1.25: Macro output options
Make sure that Output fields change based on macro’s configuration or data input is checked, and select Auto Configure by Name (Wait Until All Iterations Run) from the Output Mode option:Figure 1.26: Macro output options configured
Now, save the workflow.If you pay attention, Alteryx Designer will present you the Save dialog, pointing to the same folder you have configured as the default macro repository, and the extension of the file to be saved is .yxmc (“yx” for Alteryx; “mc” for macro).
Name your macro BatchReadWorksheets.yxmc and save it. Close your workflow.Now, create a new workflow and right-click on the canvas.Go to Insert, and you’ll notice a Macro… option way down on the menu:Figure 1.27: Inserting a macro into a new workflow
Click on Macro…, and Alteryx will present you with a File Open dialog. Navigate to your macro repository and select the BatchReadWorksheets.yxmc file.You’ll see that Alteryx Designer inserted a “tool” in the canvas with one input and one output anchor:
Figure 1.28: Batch macro waiting for data to be connected
Now, drop a Directory tool from the In/Out category onto the canvas and connect its output to our tool’s parameter input anchor (marked with ¿).Point the Directory tool to the directory where your Excel files are:Figure 1.29: Directory tool pointing to the folder
Now, click again on our macro, and select the [FullPath] field from the Choose Field: FullPath to Read option:Figure 1.30: Selecting a field to use as the path to read files
Drop a Browse tool following the macro and run the workflow.You’ll see that all records from the Excel files are there, but the structures are not quite complete.
Notice that there’ll be columns with a lot of Null values. This happens because some of the worksheets don’t have that particular column, so Alteryx Designer fills them with Null values.
When working with Excel files, we often have to report using existing templates that have already been formatted, and we need to respect those formats.
Alteryx allows you to preserve the existing formatting, and we’ll go through the process of getting this done within Alteryx Designer.
To follow along with this recipe, you can download our test set from here:
https://github.com/PacktPublishing/Alteryx-Designer-Cookbook/tree/main/ch1/Recipe4
If you decide to try it with your data, make sure you define ranges on which the data must be saved.
To make this recipe work, we need to have a range defined within the Excel file.
Our sample set contains a pre-formatted Excel worksheet, with explicit and conditional formats:
Figure 1.31: Excel template
We are going to tackle this recipe in two phases:
Grabbing the templateWriting to the templateWe’ll start reading our data source, so drop an Input Data tool onto the canvas and point it to MonthlySales_*.xlsx (you can revisit recipe #1 to see how this works):Figure 1.32: Reading the data source
This’ll be the data we’ll process and later save in our report template file.
Drop a Block Until Done tool (from the Developer category) following the Input Data tool.To be able to save data into the template, we need to make a copy of the actual template in the final location where we want our output. We’re going to use \OUTPUT as the location.
Also, we need to save our workflow first, for which Alteryx Designer provides a very handy variable ([Engine.WorkflowDirectory]) that stores the value to where our workflow is stored; otherwise, it’ll point to a temp directory and we’ll lose our relative references to the working files.
Drop a Sample tool onto the canvas and configure it to keep only the first record (First N rows selected and 1):Figure 1.33: Configuring the Sample tool
Drop a Formula tool and connect it to the output anchor labeled 1 in the Block Until Done tool.Create a new column called Copy_Command and use this formula as its expression:
'copy "' + [Engine.WorkflowDirectory] +'..\TEMPLATE\ReportingTemplate.xlsx" "' + [Engine.WorkflowDirectory]+ '..\OUTPUT\"'As stated earlier, [Engine.WorkflowDirectory] refers to the folder where the workflow is stored (and runs from). So, no matter where we put it or copy it, it’ll keep the relative references (such as ..\DATA, for example).
Since our data contains more columns than needed, we’ll drop a Select tool onto the canvas, to only keep the [Copy_command] field:
Figure 1.34: Keeping only the [Copy_command] data field
To finish this step, we need to tell Alteryx Designer to execute the OS command contained within the [Copy_command] field, so for that, we need a Run Command tool (it’s in the Developer category).
Drop it and configure it as in the following screenshot, making sure that you use %temp%\Copy.bat as the Write Source value and the same value for the Run External Program command:Figure 1.35: Run Command tool configuration
So far, if we run this workflow, we’ll see that the original template located in the \TEMPLATES folder was copied to the \OUTPUT folder.
From now on, we can use it to write data within.
We can see we have three sections in the template to “fill” (Monthly Sales, starting at A11; Annual Sales, starting at F11; and there is another one— B3, where we are going to insert the date of the report’s issue). You can see that we named those ranges too in the template: MONTHLY_SALES, YEARLY_SALES, and REPORT_DATE.
Additionally, the template has some conditional formatting rules configured, and a chart inserted to show the monthly sales in a line chart.
To fill the monthly sales section of the report, we only need to write the contents of our incoming data, so for that, drop a Sort tool and connect it to the output anchor of the Block Until Done tool, labeled 2. Sort the data by date in ascending order, and following that, drop an Output Data tool onto the canvas.Point the Write to File or Database value to where the template was copied (in our case, it’s the OUTPUT folder located one level up (..\) to where our workflow is in the hierarchy), and select the ReportingTemplate.xlsx file.Make sure that options 5 and 6 (Skip Field Names and Preserve Formatting on Overwrite (Range Required)) are checked:Figure 1.36: Configuring the Output Data tool
When Alteryx asks to specify a sheet, we have two options:
Use the named range as the sheet name if we have one:Figure 1.37: Specifying a named range to write data to
Specify a range within the worksheet to save our data (notice that we define A11:B as the range because we know we’ll be starting at A11, but we don’t know where we’ll be ending at B):Figure 1.38: Specifying a range within a worksheet to write data to
For yearly sales, we need to perform an extra step, which consists of summarizing sales values per year.
To be able to do this with the actual data we have, we need to extract the [Year] value from the actual [DATE] field. This can be easily done by dropping a Formula tool onto the canvas. Connect this Formula tool to the output anchor 3 of the existing Block Until Done tool.Add a new column on the Formula tool named [YEAR] and use this expression to extract the year from the [DATE] field: DateTimeFormat([DATE],"%Y")Now that we have the [YEAR] value of each record, we need a Summarize tool to perform the corresponding aggregations.
Drop a Summarize tool from the Transform category and configure it, like this:Figure 1.39: Summarizing tool configuration process
Note
To configure the Summarize tool, first, select a field from the top box, then click on Add, and from the dropdown, select the operation you want to apply to that particular field.
So, this configuration will give us the total sum of [MONTHLY_SALES] and the maximum (MAX) value of [DATE] per [YEAR]:
Figure 1.40: Summarized sales per year
Now, to save the data, drop another Output Data tool and configure it.Select ReportingTemplate.xlsx from the OUTPUT directory, and use the YEARLY_SALES named range to write the data. Again, make sure that options 5 and 6 (Skip Field Names and Preserve Formatting on Overwrite (Range Required)) are selected; otherwise, you’ll not only overwrite existing data but you’ll be removing the existing formats:Figure 1.41: Writing yearly sales to the template
To finish, we’ll add the date on which the report was issued.
To do so, we need a Formula tool that creates that value for us. Drop one, connect it to the Select output we added in the previous step (the same anchor the Output Data tool connects to), and create a new column called [REPORT_DATE] with this expression: DateTimeStart()Add a Select tool following the Formula tool, and just keep the [REPORT_DATE] field and a Sample tool to get the first row (First N rows, 1).Finally, add a new Block Until Done tool and then drop an Output Data tool, and again select our ..\OUTPUT\ReportingTemplate.xlsx file. In this case, select the REPORT_DATE named range and make sure the Skip Field Names and Preserve Formatting on Overwrite (Range Required) options are both checked:Figure 1.42: Writing the report-issuing date to a named range
Your workflow should look very close to this:
Figure 1.43: Overview of the finished workflow
If you run the workflow and open \OUTPUT\ReportTemplate.xlsx, it’ll look like this:
Figure 1.44: Final results (ranges and chart updated)
Let’s start explaining the Block Until Done tool, firstly according to Alteryx’s documentation (https://help.alteryx.com/20221/designer/block-until-done-tool):
“Use Block Until Done to stop datasets from moving downstream until the last record in the set has been processed by all previous tools. In addition, this tool ensures that only a single output stream at a time receives the completed dataset. Subsequent streams are blocked until all the dataset records are pushed through the 1st stream.”
As you will have realized by now, this tool is essential for performing multiple write operations within the same file from the same workflow, serializing the operations connected to it.
The order in which we connected the tools to the Block Until Done tool is important too because they determine the order in which those operations will be executed.
Thanks to the Run Command tool, we can execute external commands such as the one we created to copy files.
It is a very powerful tool to interact with our environment, including the OS, or custom applications we can integrate into Alteryx:
Figure 1.45: The “anatomy” of the Run Command tool
You can learn more about the Block Until Done tool here:
https://help.alteryx.com/current/designer/block-until-done-tool
And you can find out more about the Run Command tool here:
https://help.alteryx.com/current/designer/run-command-tool
Sometimes, data is put in shared folders across the company’s network, and we have no control over when and how that data is updated. We have some users updating files within a shared folder, and we need to keep the last updated one from a set of various versions of the same file.
This recipe will help you determine the newest files to be processed from within a folder.
To follow along with this recipe, use the ch1\r5 files from GitHub:
https://github.com/PacktPublishing/Alteryx-Designer-Cookbook/tree/main/ch1/Recipe5
If you decided to use your data to follow up on this recipe, the only thing you need to make sure of is that all your files are in one directory (and/or subdirectories).
Select *.xlsx as the File Specification value and make sure Include subdirectories is checked:
Figure 1.46: Directory tool configuration
Run your workflow and focus on the Results panel of Alteryx Designer:Figure 1.47: Workflow results
You’ll see important information about the files within the selected directory, such as the following:
[FullPath]—Absolute path to the file[Directory]—Absolute path to the directory containing the file[FileName]—Name and extension of the file[ShortFileName]—For backward compatibility, shows the 8.3 filename format (mostly, you won’t use this)[CreationTime]—Datetime of file creation (registered by the OS)[LastAccessTime]—Last time the file was accessed[LastWriteTime]—Last time the file was written within[Size]—Size in bytesAlong with these values, you’ll see additional ones, depending on which attributes the OS has on each file:
Figure 1.48: Additional workflow results
We can use all these values to create conditions to read files.
For example:
Getting the last modified fileAfter the Directory tool, drop a Sort tool and sort by [LastWriteTime] in descending order.Drop a Sample tool, select First N rows, and set 1.If our files contain worksheets with the same name, we can apply recipe #2 (we’ll follow this path here) to read the selected file’s content, using the [FullPath] data instead of [List of Sheet Names]. If not, we can reshape recipe #3 to read the selected file content (which will be explained in the There’s more… section of this recipe).
To do so, we need to add a Formula tool to tell Alteryx which worksheet we’ll be reading from the instructed file/s.
Add the Formula tool and create a new column called ToRead, with this expression: [FullPath] + "|||MonthlySales"This will add the worksheet name (MonthlySales) to be read from the selected files (the ||| separator is how Alteryx Designer recognizes tables/worksheets within databases/files).
Now, drop a Dynamic Input tool following the Formula tool:Figure 1.49: Our workflow so far
Change the Dynamic Input tool configuration. Using the recently generated [ToRead] field, we are telling the tool to carry out the following action: Change Entire File Path (this is where we’ll change the recipe #2 configuration):Figure 1.50: Dynamic Input tool configuration
Running the workflow will give us the contents of the last modified file within a folder:
Figure 1.51: Dynamic Input tool configuration
The Directory tool allows you to target a specific directory/folder (and its subfolders if wanted), select a file pattern, and return a list of files from that folder with their attributes, allowing us to make decisions about which files to keep and what to discard in our workflow.
Once we got the file list from the desired location, we sorted the files by one of their attributes (LastWriteTime) in descending order, so the newest file will be always the first in our list, and having the Sample tool configured to get the first (1) record, we are completely certain that always gets the last modified one.
Note
We can make endless decisions here, using Filter tools to select files based on any criteria we need to apply to finally read the ones we need, so feel free to take this recipe and adjust it to get the content you need from a bunch of files sitting in a folder.
We made this recipe using Excel files because this format presents us with an additional challenge since they may contain several worksheets (or tables) within each file, and we need to tell Alteryx Designer which ones we need to read. That’s why we need to add a Formula tool before reading the actual contents of the files, adding the ||| notation to the [FilePath] value and building the entire path, not only to the file but to the worksheet within it.
From there, we can now leverage the Dynamic Input tool with another of its powerful methods, Change Entire File Path, based on the results of the [ToRead] field we created in the Formula tool.
As stated before, if you combine this recipe with recipe #3, you can have a more dynamic way to read files, since you won’t be depending on the worksheets having the same name (you’ll be picking them up from the <List of Sheet Names> option Alteryx gives you).
Also, you can keep track of the files already processed simply by appending the [FullPath] field of the processed file/s and the date you processed those files to some kind of custom log (maybe another Excel file).
This method is very useful to avoid reprocessing files that have already been processed (before reading the contents, just join your Directory tool data stream with your custom log and only process the unjoined records coming from the Directory tool) and will save you a lot of time:
Figure 1.52: Joining input with a log of processed files
Lots of applications provide data in a flat, fixed-width format. Even when Alteryx doesn’t have a .txt format reader, it allows us to extract data from this type of file, by offering a parsing functionality (similar to what Excel has to determine columns and field types).
Every time we need to access this type of file, we need to know how and where is data organized, and most of the time, these files come with a “data dictionary” telling us the internal structure of the files.
In this recipe, we’ll guide you to build an automated fixed-length file parser, leveraging some of Alteryx Designer’s functionalities.
We got you covered to try this recipe with a sample set here:
https://github.com/PacktPublishing/Alteryx-Designer-Cookbook/tree/main/ch1/Recipe6
This sample data has a dictionary, indicating to us the name, type, size, and scale of each field within it. You can find the data dictionary in a file called FileStructure.xlsx inside the DATA DEFINITION folder of the test set:
Name
Type
Size
Scale
TRIPDURATION
Int32
4
STARTTIME
DateTime
19
STOPTIME
DateTime
19
START_STATION_ID
V_String
4
START_STATION_NAME
V_String
33
START_STATION_LATITUDE
FixedDecimal
13
9
START_STATION_LONGITUDE
FixedDecimal
13
9
STATION_ID
Int32
4
END_STATION_NAME
V_String
33
END_STATION_LATITUDE
FixedDecimal
13
9
END_STATION_LONGITUDE
FixedDecimal
13
9
BIKE_ID
V_String
5
MEMBERSHIP_TYPE
V_String
1
USERTYPE
V_String
10
BIRTH_YEAR
V_String
4
GENDER
Byte
1
Table 1: FileStructure.xlsx content
If you’ll be following along with your data, please make sure that you have a fixed text file and the field definitions for that file (field names, data types, and lengths).
Figure 1.53: Input data
Now, add a Formula tool to the Input Data tool output anchor and create a new column, also named XML, with the following expression: ' <field name="'+[Name]+'" type="'+ [Type] +'" length="'+ ToString([Size])+ IIF([Type]="FixedDecimal",'" scale="'+ToString([Scale]),'') +'" />'Use a Select tool following the Formula tool to only select the XML field (be sure to uncheck the *Unknownfield too).Drop a Union tool and connect the output anchors of the tools in this order:First text input—Its connection will be labeled #1Select tool—Will become connection #2Second text input—Connection #3At this point, youll have something similar to this:
Figure 1.54: Connections to the Union tool
Now, click on the Union tool, make sure the method (on the top dropdown) is set to Auto Config by Name, which is the default setting, and check Set a Specific Output Order at the bottom of the configuration panel to set the order:Figure 1.55: Setting the specific output order of records in the Union tool
To finish, we need to save the resulting XML file onto the DATA folder for later use, so we need to drop an Output Data tool following the Union tool and configure it, as in the following screenshot:Figure 1.56: Saving the XML file (.flat)
Notice the \0 delimiter in option 3 and the unchecked box for First Row Contains Field Names (option 4).
If you browse to your \DATA folder, you’ll see the GeneratedFlatFile.flat file there.
