Alteryx Designer Cookbook - Alberto Guisande - E-Book

Alteryx Designer Cookbook E-Book

Alberto Guisande

0,0
43,19 €

-100%
Sammeln Sie Punkte in unserem Gutscheinprogramm und kaufen Sie E-Books und Hörbücher mit bis zu 100% Rabatt.

Mehr erfahren.
Beschreibung

Alteryx allows you to create data manipulation and analytic workflows with a simple, easy-to-use, code-free UI, and perform fast-executing workflows, offering multiple ways to achieve the same results. The Alteryx Designer Cookbook is a comprehensive guide to maximizing your Alteryx skills and determining the best ways to perform data operations
This book's recipes will guide you through an analyst's complete journey, covering all aspects of the data life cycle. The first set of chapters will teach you how to read data from various sources to obtain reports and pass it through the required adjustment operations for analysis. After an explanation of the Alteryx platform components with a particular focus on Alteryx Designer, you’ll be taken on a tour of what and how you can accomplish by using this tool. Along the way, you’ll learn best practices and design patterns. The book also covers real-world examples to help you apply your understanding of the features in Alteryx to practical scenarios
By the end of this book, you’ll have enhanced your proficiency with Alteryx Designer and an improved ability to execute tasks within the tool efficiently

Das E-Book können Sie in Legimi-Apps oder einer beliebigen App lesen, die das folgende Format unterstützen:

EPUB
MOBI

Seitenzahl: 449

Veröffentlichungsjahr: 2023

Bewertungen
0,0
0
0
0
0
0
Mehr Informationen
Mehr Informationen
Legimi prüft nicht, ob Rezensionen von Nutzern stammen, die den betreffenden Titel tatsächlich gekauft oder gelesen/gehört haben. Wir entfernen aber gefälschte Rezensionen.



Alteryx Designer Cookbook

Over 60 recipes to transform your data into insights and take your productivity to a new level

Alberto Guisande

BIRMINGHAM—MUMBAI

Alteryx Designer Cookbook

Copyright © 2023 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Reshma Raman

Publishing Product Manager: Apeksha Shetty

Content Development Editor: Joseph Sunil

Technical Editor: Devanshi Ayare

Copy Editor: Safis Editing

Project Coordinator: Farheen Fathima

Proofreader: Safis Editing

Indexer: Manju Arasan

Production Designer: Alishon Mendonca

Marketing Coordinator: Vinishka Kalra

First published: October 2023

Production reference: 1231023

Published by Packt Publishing Ltd.

Grosvenor House

11 St Paul’s Square

Birmingham

B3 1R.

ISBN 978-1-80461-508-9

www.packtpub.com

To Mariana, my beloved wife, Agustina and Martín, the best daughter and son you can aspire to have. Thank you for being the reason that I open my eyes every morning and for making me better every day.

To my parents, for setting the foundations for who I am now.

To my fellow Alteryx ACEs, a second family I found in life.

To Roberto, the best business partner (cum friend) you can find, thank you for being so patient with me over the years.

Contributors

About the author

Alberto Guisande has been an Alteryx ACE since 2015 and is currently a C.A.O. at Decision Science, a company dedicated to providing data science consulting services to its clients. He has held top management positions in banking and consulting companies and General Management in logistics and Supply Chain.

With over 15 years of experience in the field of data science, he has helped numerous companies define and adopt an analytical culture, based on the exploitation of their data. He has created centers of excellence in numerous companies in America and serves as Chief Data Officer (CDO) in several of his clients.

He is certified in Alteryx Core, Alteryx Advanced, Alteryx Server, Tableau, and Vertica among others. In 2021, he was appointed as Veritone Ambassador by Veritone, an AI & ML leader company.

He is an international instructor in the Data Science for Business and Big Data for Business courses at the Instituto Tecnológico de Monterrey, having also collaborated in the creation of their pensums. He also continuously participates as a speaker at specialized data science conferences and exhibitions.

He serves as a mentor in thePredictive Analytics for Business Nanodegree Program at Udacity.com

Having Alteryx in my toolbox is the best thing that ever happened to me as a data professional – Alberto Guisande

I want to thank all the people who dedicated time to help with this book, especially the team from Packt (Apeksha, Kirti, and Joseph), and my family for being so patient while I worked on it.

About the reviewers

Thales Donizeti Silva, an engineer with advanced degrees in Business Analytics and AI, has over 8 years of experience in Business Intelligence. Based in Sao Paulo, Brazil, he’s a Senior Analytics Engineering Consultant at phData and has led multiple Alteryx projects in the United States over the last 4 years. Thales is an Alteryx ACE and holds the Alteryx Designer Expert certification. Beyond his professional roles, Thales leads the Brazilian Alteryx User Group, owns the website “Alteryx Para Todos”, runs the YouTube channel “Alteryx Simplifica”, and mentors in the “Adote um Júnior” volunteer project. He’s a Top Community Contributor with over 500 solutions in the Alteryx Community and has presented in 2 Alteryx Conferences.

Jean-Baptiste has been an Alteryx fan since he first tried it, loving the logic of the tool and working the same way the tool does. Since then, he has skilled up completing weekly challenges, writing blogs, passing certifications, and so on. He then became an Alteryx ACE in 2022, won the European Alteryx Grand Prix in Amsterdam, and passed the Expert Certification all in the same year. He always tries to give back to the community by helping others and building challenges for others to solve.

Marcus Montenegro is a certified expert in Alteryx, Matillion, and KNIME with years of experience in analytics engineering projects. He co-founded the first Matillion user group, is one of the Alteryx user group leaders in Brazil, and is one of the few Alteryx ACEs around the world. At phData, Marcus leads a team of consultants, assisting them in growing, delivering services, and pursuing business and professional objectives.

Due to his passion for teaching and assisting others, Marcus created his own YouTube channel to teach Alteryx to the Brazilian community for free.

Marcus has worked with clients to integrate multiple systems into Alteryx implementations and has taught new specialists for every company he has worked for.

Table of Contents

Preface

1

Inputting Data from Files

Technical requirements

Reading a worksheet from multiple Excel files all at once

Getting ready

How to do it…

How it works…

There’s more…

Reading all worksheets from an Excel file at once – same schema

Getting ready

How to do it…

How it works…

There’s more…

Different schema Excel worksheets all at once

Getting ready

How to do it…

Saving and preserving Excel sheet formats

Getting ready

How to do it…

How it works…

See also (follow-up steps)

Accessing and determining which files are to be processed from a file repository

Getting ready

How to do it…

How it works…

There’s more…

Handling formats when reading text files

Getting ready

How to do it…

How it works…

2

Working with Databases

Technical requirements

Cursor behavior, but more efficient

Getting ready

How to do it…

How it works…

There’s more…

Working with Calgary databases

Getting ready

How to do it…

How it works…

There’s more…

DCM – setting up credentials

Getting ready

How to do it…

How it works…

DCM – setting up a connection

Getting ready

How to do it…

How it works…

There’s more…

Getting information from your In-DB connection/query

Getting ready

How to do it…

How it works…

There’s more…

See also (follow-up steps)

3

Preparing Data

Technical requirements

Saving field configurations with the Select tool

Getting ready

How to do it…

How it works…

Working with ranges

Getting ready

How to do it…

How it works…

There’s more…

Applying data transformations to more than one field at a time

Getting ready

How to do it…

How it works…

There’s more…

Using the previous/next N records' data in the current record operation

Getting ready

How to do it…

How it works…

There’s more…

4

Transforming Data

Pivoting (and un-pivoting) your data

Getting ready

How to do it…

How it works…

Fixing your field names after a Cross Tab operation

Getting ready

How to do it…

How it works…

Laying out your data in columns

Getting ready

How to do it…

How it works…

There’s more…

Arranging data

Getting ready

How to do it…

How it works…

There’s more…

5

Data Parsing

Technical requirements

Preparing, identifying, and classifying free text

Getting ready

How to do it…

How it works…

There’s more…

Getting the last part of a delimited string

Getting ready

How to do it…

How it works…

Working with RegEx

Getting ready

How to do it…

See also

6

Grouping Data

Grouping data

Getting ready

How to do it…

How it works…

Finding groups within groups

Getting ready

How to do it…

How it works…

There’s more…

Exploring the Make Group tool

Getting ready

How to do it…

How it works…

Binning numeric fields

Getting ready

How to do it…

How it works…

7

Blending and Merging Datasets

Technical requirements

Join or Find Replace

Getting ready

How to do it…

Unioning data sources with different structures

Getting ready

How to do it…

How it works…

Advanced joining (using conditions in your joins)

Getting ready

How to do it…

How it works…

Appending fields to your data

Getting ready

How to do it…

How it works…

There’s more…

8

Aggregating Data

Technical requirements

Using Cross Tab to summarize values

Getting ready

How to do it…

How it works…

Preparing data for visualizations

Getting ready

How to do it…

How it works…

See also

Grouping and summarizing your data

Getting ready

How to do it…

Appending fields to data sources

Getting ready

How to do it…

How it works…

There’s more...

9

Dynamic Operations

Dynamically renaming fields

Getting ready

How to do it…

How it works…

Dynamically selecting fields

Getting ready

How to do it…

Dynamically replacing values

Getting ready

How to do it…

How it works…

10

Macros and Apps

Technical requirements

Distributing data connections

Getting ready

How to do it…

How it works…

There’s more…

Using Detour in your data stream

Getting ready

How to do it…

How it works…

Enabling/disabling tool containers

Getting ready

How to do it…

How it works…

There’s more…

Working with the underlying XML

Getting ready

How to do it…

Populating a list box with the fields of a File Browse tool selection

Getting ready

How to do it…

Notifying yourself or others

Getting ready

How to do it…

How it works…

11

Downloads, APIs, and Web Services

Technical requirements

Managing APIs’ XML responses

Getting ready

How to do it…

How it works…

Parsing JSON

Getting ready

How to do it…

Getting paged results from APIs (Iterative Macros)

Getting ready

How to do it…

There’s more…

12

Developer Tools

Executing external programs/batches/scripts from inside a workflow

Getting ready

How to do it…

How it works…

There’s more…

See also

Basic Python usage – performing optical character recognition (OCR) from images

Getting ready

How to do it…

How it works…

Using binary large objects (blobs) to read and write raw/binary fields

Getting ready

How to do it…

There’s more…

13

Reporting with Alteryx

Technical requirements

Loading images for reports dynamically

Getting ready

How to do it…

Arranging your report snippets

Getting ready

How to do it…

How it works…

Rendering your reports

Getting ready

How to do it…

Using HTML and JavaScript in your reports

Getting ready

How to do it…

There’s more…

14

Outputting Data

Technical requirements

Using the Render tool to conditionally format your Excel files

Getting ready

How to do it…

Creating your own execution metadata

Getting ready

How to do it…

Using Events to back up your files

Getting ready

How to do it…

How it works…

There’s more…

See also (follow-up steps)

Using Pre and Post SQL commands

Getting ready

How to do it…

Index

Other Books You May Enjoy

Preface

Alteryx allows you to create repetitive data manipulation and analytic workflows with a simple, easy-to-use, code-free/code-friendly UI and performs very fast executing those workflows, offering multiple ways to achieve the same results.

This book is the result of compiling Alteryx users’ frequent day-to-day use cases/situations they deal with and a comprehensive guide for leveraging your skills in Alteryx and finding the best way to perform these operations.

Through the chapters, this book’s recipes will take you through an analyst’s complete journey, covering all aspects of the data life cycle, from reading it, no matter what source you do it from, to obtaining reports/reports, and passing through the adjustment operations necessary for analysis. You’ll learn how to read numerous and disparate files and databases and handle huge amounts of data. You’ll then perform operations and transformations to your data to suit your needs. Also, you’ll learn how to group data in different ways and criteria and blend different types of data sources. You’ll understand how to pivot and un-pivot your data for easy manipulation and perform aggregations and calculations on your data. You will also learn how to encapsulate reusable logic into components (Macros), connect and download data from APIs and save your data or create reports in various formats.

By the end of this book, you will notice how your use of Alteryx Designer has improved and how you have discovered how to best perform tasks within it.

Who this book is for

This book is intended for any professional data user (from business intelligence professionals, data analysts, Citizen Data Scientists to Data Scientists) with a basic knowledge of Alteryx Designer, who wants to leverage its usage.

In summary, anyone who wants to:

Learn how to tackle analytics operations like a masterImprove their data manipulation skillsSave a lot of time and effort to get your data-driven insightsUse best practices to get better with Alteryx Designer

What this book covers

Chapter 1, Inputting Data from Files, will show you how Alteryx Designer makes connecting to any type of file very easy. Every format has its own requirements and oddities. You will get to know the best way to unleash the whole reading and writing capabilities of the Designer.

We’ll show you recipes to handle and improve the way you connect and consume different file formats.

Chapter 2, Working with Databases, will explore some recipes to improve reading and writing from databases. As it does with files, Alteryx Designer allows you to connect with a vast amount of databases.

Chapter 3, Preparing Data, will guide you to help shape messy data into a workable dataset.

Chapter 4, Transforming Data, will explore various data transformations such as extracting/parsing fields, breaking one column into many, transposing, or converting data types, which are very common and widely needed operations when you work with data.

Chapter 5, Parsing Data,will show you how to extract or interpret what’s inside a field value or making an unreadable field readable.

Chapter 6, Grouping Data, will show you how classifying your data into groups is very important when you work in real life use cases.

Chapter 7, Blending and Merging Datasets, will teach you how to convert the data into information and enrich it with various techniques like relating, joining, or unioning datasets.

Chapter 8, Aggregating Data, will showcase recipes to summarize and sort data. You will also get your first brush with visualizations.

Chapter 9, Dynamic Operations, will allow you to take advantage of the dynamic toolset that Alteryx Designer offers. These will allow you to handle the data dynamically.

Chapter 10, Macros and Apps, will show how to reuse repetitive pieces of logic, build applications with user input, to make Alteryx workflows even more dynamic and simpler to maintain.

Chapter 11, Downloads, APIs, and Web Services, will explore how to make use of the cloud and web services to get data from various sources.

Chapter 12, Developer Tools, showcases varioous external tools to solve some challenges that data presents. Alteryx offers a lot of options to incorporate source code, use existing programs, or execute external scripts to accomplish these tasks.

Chapter 13, Reporting with Alteryx, will show you how to properly communicate your data. These recipes will guide you in converting your information into business insights.

Chapter 14, Outputting Data, is our last step in this journey. Saving the resulting data from our workflows for later use is crucial to every analyst in every Company. These recipes will guide you into effective saving of your resulting data.

To get the most out of this book

You’ll need to install Alteryx Designer on your computer (at the time of writing this book, the latest version was 22.4, and 23.1 was in beta).

For some recipes, access to a database engine will be necessary.

Software/Hardware covered in the book

OS Requirements

Alteryx Designer

Windows, Mac OS X (can be used with the help of Parallels or similar mechanisms to run Windows applications within Mac OS)

If you don’t have an Alteryx License, you can get a free trial from here:https://bit.ly/ayx_trial_Cookbook

Or by scanning the following QR Code:

The test sets available for download contain the whole finished workflow/s. I strongly recommend using them to compare with the ones you create in the book.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Alteryx-Designer-Cookbook. In case there’s an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “ Drop an Input Data tool onto the canvas and point it to ..\DATA\Customers_by_City\ARVADA.csv.”

A block of code is set as follows:

IF CONTAINS([Name],[TY]) THEN "TY" ELSEIF CONTAINS([Name],ToString(ToNumber([TY])-1)) THEN "LY" ELSE "" ENDIF

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

!IsEmpty([Description])

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: “Add another Dynamic Select tool following the Dynamic Rename tool.”

Tips or important notes

Appear like this.

Sections

In this book, you will find several headings that appear frequently (Getting ready, How to do it..., How it works..., There’s more..., and See also).

To give clear instructions on how to complete a recipe, use these sections as follows:

Getting ready

This section tells you what to expect in the recipe and describes how to set up any software or any preliminary settings required for the recipe.

How to do it…

This section contains the steps required to follow the recipe.

How it works…

This section usually consists of a detailed explanation of what happened in the previous section.

There’s more…

This section consists of additional information about the recipe in order to make you more knowledgeable about the recipe.

See also

This section provides helpful links to other useful information for the recipe.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.

Share Your Thoughts

Once you’ve read Alteryx Designer Cookbook, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere?

Is your eBook purchase not compatible with the device of your choice?

Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.

The perks don’t stop there, you can get exclusive access to discounts, newsletters, and great free content in your inbox daily

Follow these simple steps to get the benefits:

Scan the QR code or visit the link below

https://packt.link/free-ebook/9781804615089

Submit your proof of purchaseThat’s it! We’ll send your free PDF and other benefits to your email directly

1

Inputting Data from Files

As data workers, no matter how mature and evolved your actual working environment is, you’ll have to deal with a good number of files and different formats, even when they refer to the same kind of data.

Through the years, I’ve seen a lot of analysts struggling with the same problems, such as vendor files that are not consistent, structure changes through versions of the same file, and so on.

In this chapter, we’ll start defining recipes for those common situations you’ll encounter when reading files using Alteryx Designer.

We’ll read multiple Excel files from a single directory, multiple worksheets from an Excel file, and multiple Excel files having different structures between worksheets. Specifically, we’ll look at the following recipes:

Reading a worksheet from multiple Excel files all at onceReading all worksheets from an Excel file all at once—same schemaDifferent schema Excel worksheets all at onceSaving and preserving Excel sheet formatsAccessing and determining which files are to be processed from a file repositoryHandling formats when reading text files

Technical requirements

To begin practicing and reproducing these recipes by yourself, you need to have Alteryx Designer installed on your computer. If you have no access to it, you can download a 1-month trial here: https://bit.ly/ayx_trial_Cookbook.

To follow along with the recipes included in this chapter, you can download the ch1 folder from GitHub here:

https://github.com/PacktPublishing/Alteryx-Designer-Cookbook/tree/main/ch1

Important note

If you decided to apply these recipes to your own data, please read carefully the Getting ready section of each one, to be sure that your data fits the use case.

Reading a worksheet from multiple Excel files all at once

This is one of the most common situations you’ll find when working with files. Maybe it’s sales data, purchase orders, or any type of data—at the end, you’ll have a bunch of Excel files in a folder or subfolders and you’ll need to read, consolidate, and analyze them.

This recipe will help you with the case where those files are in the same folder (including its subfolders) and the files have the same structure (columns).

Even though the recipe was crafted using Excel files (because they present an extra complication, including different worksheets within), it can be adjusted and will work with any file format you need to deal with.

Getting ready

If you want to try this recipe before applying it to your own data, we have a test set here: https://github.com/PacktPublishing/Alteryx-Designer-Cookbook/tree/main/ch1/Recipe1. You’ll find a ZIP file with a directory structure and files within it. We provided one file per year of sales data from 2018 to 2022, called MonthlySales_XXXX.xlsx (XXXX meaning the year of sales contained within that specific file), monthly summarized (1 month of each year per row), stored in a worksheet called MonthlySales.

If you’re planning to follow this recipe with your own data, make sure your files are within the same folder (and/or subfolders within a unique parent folder) and all the worksheets you want to input have the same name:

Figure 1.1: Folder hierarchy

The directory structure should look like this:

Data Root Folder          Files within the root Folder          Data Sub Folder 1                   Files within Sub Folder 1          Data Sub Folder 2                   Files within Sub Folder 2

How to do it…

We’ll get started with the following steps:

Drag an Input Data tool (from the In/Out category) and drop it onto the canvas.Point your input to one of the files you want to read (preferably on the topmost folder) and select it:

Figure 1.2: Selecting a file

Since we are selecting an Excel file, Alteryx Designer will ask you to select a worksheet within the selected file, so we do that (remember that for this recipe, all worksheets you need to read must have the same name and—of course— structure):

Figure 1.3: Selecting a worksheet

You’ll get your Input Data tool configuration pointing to this file and with the default configurations:

Figure 1.4: Input Data configurations

Now is when we start to exploit Alteryx Designer’s configuration options to fulfill our needs.

Option 4 in the Configuration panel reads Search SubDirs. If we enable it, Alteryx will not only search for the file/s pointed to in the file selection, but also will be looking through all the subfolders beneath the actual folder:

Figure 1.5: Enabling SubDirs

So far, Alteryx will be looking for the MonthlySales_2022.xlsx file in the \DATA subfolder and all of its subfolders.

Replace the 2022 part of the filename with a * sign, like this:

Figure 1.6: Replacing part of the filename

And with that little change, you’ll be telling Alteryx Designer to read the MonthlySales worksheet from all files within the \DATA folder and its subfolders that start with MonthlySales_ and have anything in their names before the extension (.xlsx).

Give it a try, and you’ll see that the contents of Sales for all years are now read by Alteryx.

How it works…

Alteryx Designer allows you to use wildcard characters (* and ?) that give you the option of replacing a specific character or a sequence of them.

Use * when you need to replace any string, no matter its length.

Use ? when you need to replace a character in the indicated position.

For instance, MonthlySales_*.xlsx will tell Alteryx Designer to read all files having MonthlySales_ in the filename, followed by any character or characters (MonthlySales_1234.xlsx, MonthlySales_1.xlsx, and so on), and that end with a .xlsx extension.

When using ?, it’ll replace one character per ? used, in the exact position where it is. For instance, MonthlySales?2018.xlsx will read MonthlySales_2018.xlsx, MonthlySales-2018.xlsx, MonthlySalesX2018.xlsx, and so on, but MonthlySales%%2018.xlsx won’t (because there are two characters between MonthlySalesand 2018).

So, for example, this configuration will allow you to read all CSV files contained in C:\DATASETS\DATA\ that start with CityBike:

Figure 1.7: Reading multiple CSVs at once

There’s more…

If you want to keep track of where every record comes from, you can use the Output File Name as Field option in the Configuration pane. This will create an additional column in your dataset, with the filename from where that record was read:

Figure 1.8: Getting a record’s source file

Also, within Alteryx Designer, you can refer to a file location using the “dots convention," where a single dot (.) specifies the current workflow directory and two dots (..) refers to a parent folder from the actual workflow location.

Have a look at the following examples:

.\mydata.csv → Refers to a mydata.csv file located in the same folder as the actual workflow..\DATA\mydata.csv → Refers to a mydata.csv file located in a folder called DATA, at the same hierarchy level as the folder containing the actual workflow

Reading all worksheets from an Excel file at once – same schema

Another common situation is to have all your data spread across multiple worksheets. So, we need to read them all and put them together:

Figure 1.9: Our data file worksheets

As always, we’ll make it simple for you, since Alteryx Designer allows us to.

Getting ready

If you want to follow along with this recipe, please download our test set here:

https://github.com/PacktPublishing/Alteryx-Designer-Cookbook/tree/main/ch1/Recipe2

Here, you’ll find an Excel file with yearly data spread across multiple worksheets, and by applying this recipe, we’ll be reading them all at once.

If you want to try this recipe with your data, make sure of the following:

All your worksheets have the same schema (field/column names and data types).You have a list of worksheets to import (we’ll get this directly when first accessing the file using a helpful option Alteryx has).You have a file with the schema to be used as a template. This file should be always available to the workflow, and won’t be imported if it’s not within the files-to-import list. If your file meets the first requirement, you can use the same file as the template.

How to do it…

Use thefollowing steps:

Drop an Input Data tool onto the canvas, and point it to the Excel file you need to read:

Figure 1.10: Selecting a file to read

Alteryx will prompt you for which of the worksheets you want to read.

Select the Import only the list of sheet names option:

Figure 1.11: Selecting a list of sheet names

The selection will produce a list of sheets within the selected file, like this one:

Figure 1.12: Results after reading the file

It’s important to note that at this point, we still don’t have the actual contents of those sheets, only their names.

To read the contents, we need the help of the Dynamic Input tool, so we drop one onto the canvas (from the Developer category).Click on Edit… in the Input Data Source Template section, and you’ll notice that the screen that pops up is the same as the Input Datatool one.Point Input Data Source Template to your actual Excel file (or one that has the same schema as the one you need to read), and since we’re reading an Excel file, Alteryx Designer will ask for the sheet we want to connect to.Select one of the sheets and hit OK:

Figure 1.13: Selecting one of the sheets to use as a template

Hit OK again, and your template will be ready to be used as the sheet’s schema for the upcoming data.

Now, we need to tell the Dynamic Input tool how we expect it to behave.

So, for the Read a List of Data Sources option, we are going to use the contents of the [Sheet Names] field, and for the Action field, we’ll select Change File/Table Name:

Figure 1.14: Setting data sources/tables/sheet names

Figure 1.15: Setting the action to perform, based on field values

If you run your workflow, you’ll see how the contents of each worksheet are read by Alteryx and stacked together:

Figure 1.16: Workflow results

How it works…

As its name suggests, the Dynamic Input tool performs several dynamic reading operations based on the configuration we set up.

In this particular case, we instructed the tool to read a list of worksheets (or tables) based on what’s been gathered by the Input Data tool (the sheet names within our file), stating that the expected scheme or structure for those sheets must match the one we used as a template.

So, our first input record is Sales_2018 (the name of the first sheet name), so Alteryx Designer compares its schema against the one we indicated. If it matches, Alteryx reads the data within it; if not, it throws an error. In both cases, Alteryx moves to the next record and performs the same tasks until we reach the end of the file.

There’s more…

Sometimes we need to keep track of where every record comes from. Alteryx Designer allows us to do that very easily.

Just select the Output File Name as Field option, and Alteryx will add a new field to your data, containing—depending on which option you selected—either File Name Only or Full Path (the entire absolute path to the file) information:

Figure 1.17: Output File Name as Field option

Different schema Excel worksheets all at once

The most complicated situation is when we need to consolidate Excel files that have different schemas.

As we saw in recipe #2, if the schema of the worksheets is not the same, we get a warning from Alteryx Designer, and the contents of that worksheet are not read:

Figure 1.18: Warning for unmatching schema

In this recipe, we’ll create a batch macro to read each worksheet within an Excel file, even when they have a different schema.

Getting ready

We prepared a test set for you to follow along with this recipe, available at https://github.com/PacktPublishing/Alteryx-Designer-Cookbook/tree/main/ch1/Recipe3.

If you decided to use your data to try this recipe, make sure that your files have different schemas (with some fields in common so that you can see the effects of different configurations).

Make sure that under the Options→User Settings→Edit User Settings→Macros tab, you have at least one directory/folder set up as a macro repository (this makes it easier for Alteryx Designer to find your macros). As you can see, there can be several repositories configured, but one of them will be the default one:

Figure 1.19: Default macro repositories

How to do it…

Let’s start by creating a new workflow.Drop an Input Data tool onto your canvas and point it to the file you want to read:

Figure 1.20: Input Data tool configuration

From the Interface category, drop a Control Parameter tool onto the canvas and change its label to something meaningful (this is going to be your tool’s input anchor):

Figure 1.21: Control Parameter tool in the canvas

Once you have both tools, click and drag from the black Q connector of the Control Parameter tool and connect it to the lightning icon on top of the Input Data tool. You’ll notice that Alteryx Designer inserts an Action tool in the middle.Click on the Action tool to configure it, and select Update Input Data Tool from the Select an action type: dropdown, making sure that the Required option is selected:

Figure 1.22: Updating the action type

From the Interface category, drop a Macro Output tool and connect it to the Input Data tool output anchor (this is going to be the output anchor of your tool):

Figure 1.23: Completed macro

Open Interface Designer (from the View menu or by pressing Ctrl + Alt + D):

Figure 1.24: Interface Designer

Click on the gear icon at the left of the pop-up window so that you can access the output options:

Figure 1.25: Macro output options

Make sure that Output fields change based on macro’s configuration or data input is checked, and select Auto Configure by Name (Wait Until All Iterations Run) from the Output Mode option:

Figure 1.26: Macro output options configured

Now, save the workflow.

If you pay attention, Alteryx Designer will present you the Save dialog, pointing to the same folder you have configured as the default macro repository, and the extension of the file to be saved is .yxmc (“yx” for Alteryx; “mc” for macro).

Name your macro BatchReadWorksheets.yxmc and save it. Close your workflow.Now, create a new workflow and right-click on the canvas.Go to Insert, and you’ll notice a Macro… option way down on the menu:

Figure 1.27: Inserting a macro into a new workflow

Click on Macro…, and Alteryx will present you with a File Open dialog. Navigate to your macro repository and select the BatchReadWorksheets.yxmc file.

You’ll see that Alteryx Designer inserted a “tool” in the canvas with one input and one output anchor:

Figure 1.28: Batch macro waiting for data to be connected

Now, drop a Directory tool from the In/Out category onto the canvas and connect its output to our tool’s parameter input anchor (marked with ¿).Point the Directory tool to the directory where your Excel files are:

Figure 1.29: Directory tool pointing to the folder

Now, click again on our macro, and select the [FullPath] field from the Choose Field: FullPath to Read option:

Figure 1.30: Selecting a field to use as the path to read files

Drop a Browse tool following the macro and run the workflow.

You’ll see that all records from the Excel files are there, but the structures are not quite complete.

Notice that there’ll be columns with a lot of Null values. This happens because some of the worksheets don’t have that particular column, so Alteryx Designer fills them with Null values.

Saving and preserving Excel sheet formats

When working with Excel files, we often have to report using existing templates that have already been formatted, and we need to respect those formats.

Alteryx allows you to preserve the existing formatting, and we’ll go through the process of getting this done within Alteryx Designer.

Getting ready

To follow along with this recipe, you can download our test set from here:

https://github.com/PacktPublishing/Alteryx-Designer-Cookbook/tree/main/ch1/Recipe4

If you decide to try it with your data, make sure you define ranges on which the data must be saved.

To make this recipe work, we need to have a range defined within the Excel file.

Our sample set contains a pre-formatted Excel worksheet, with explicit and conditional formats:

Figure 1.31: Excel template

How to do it…

We are going to tackle this recipe in two phases:

Grabbing the templateWriting to the templateWe’ll start reading our data source, so drop an Input Data tool onto the canvas and point it to MonthlySales_*.xlsx (you can revisit recipe #1 to see how this works):

Figure 1.32: Reading the data source

This’ll be the data we’ll process and later save in our report template file.

Drop a Block Until Done tool (from the Developer category) following the Input Data tool.

To be able to save data into the template, we need to make a copy of the actual template in the final location where we want our output. We’re going to use \OUTPUT as the location.

Also, we need to save our workflow first, for which Alteryx Designer provides a very handy variable ([Engine.WorkflowDirectory]) that stores the value to where our workflow is stored; otherwise, it’ll point to a temp directory and we’ll lose our relative references to the working files.

Drop a Sample tool onto the canvas and configure it to keep only the first record (First N rows selected and 1):

Figure 1.33: Configuring the Sample tool

Drop a Formula tool and connect it to the output anchor labeled 1 in the Block Until Done tool.

Create a new column called Copy_Command and use this formula as its expression:

'copy "' + [Engine.WorkflowDirectory] +'..\TEMPLATE\ReportingTemplate.xlsx" "' + [Engine.WorkflowDirectory]+ '..\OUTPUT\"'

As stated earlier, [Engine.WorkflowDirectory] refers to the folder where the workflow is stored (and runs from). So, no matter where we put it or copy it, it’ll keep the relative references (such as ..\DATA, for example).

Since our data contains more columns than needed, we’ll drop a Select tool onto the canvas, to only keep the [Copy_command] field:

Figure 1.34: Keeping only the [Copy_command] data field

To finish this step, we need to tell Alteryx Designer to execute the OS command contained within the [Copy_command] field, so for that, we need a Run Command tool (it’s in the Developer category).

Drop it and configure it as in the following screenshot, making sure that you use %temp%\Copy.bat as the Write Source value and the same value for the Run External Program command:

Figure 1.35: Run Command tool configuration

So far, if we run this workflow, we’ll see that the original template located in the \TEMPLATES folder was copied to the \OUTPUT folder.

From now on, we can use it to write data within.

We can see we have three sections in the template to “fill” (Monthly Sales, starting at A11; Annual Sales, starting at F11; and there is another one— B3, where we are going to insert the date of the report’s issue). You can see that we named those ranges too in the template: MONTHLY_SALES, YEARLY_SALES, and REPORT_DATE.

Additionally, the template has some conditional formatting rules configured, and a chart inserted to show the monthly sales in a line chart.

To fill the monthly sales section of the report, we only need to write the contents of our incoming data, so for that, drop a Sort tool and connect it to the output anchor of the Block Until Done tool, labeled 2. Sort the data by date in ascending order, and following that, drop an Output Data tool onto the canvas.Point the Write to File or Database value to where the template was copied (in our case, it’s the OUTPUT folder located one level up (..\) to where our workflow is in the hierarchy), and select the ReportingTemplate.xlsx file.Make sure that options 5 and 6 (Skip Field Names and Preserve Formatting on Overwrite (Range Required)) are checked:

Figure 1.36: Configuring the Output Data tool

When Alteryx asks to specify a sheet, we have two options:

Use the named range as the sheet name if we have one:

Figure 1.37: Specifying a named range to write data to

Specify a range within the worksheet to save our data (notice that we define A11:B as the range because we know we’ll be starting at A11, but we don’t know where we’ll be ending at B):

Figure 1.38: Specifying a range within a worksheet to write data to

For yearly sales, we need to perform an extra step, which consists of summarizing sales values per year.

To be able to do this with the actual data we have, we need to extract the [Year] value from the actual [DATE] field. This can be easily done by dropping a Formula tool onto the canvas. Connect this Formula tool to the output anchor 3 of the existing Block Until Done tool.Add a new column on the Formula tool named [YEAR] and use this expression to extract the year from the [DATE] field: DateTimeFormat([DATE],"%Y")

Now that we have the [YEAR] value of each record, we need a Summarize tool to perform the corresponding aggregations.

Drop a Summarize tool from the Transform category and configure it, like this:

Figure 1.39: Summarizing tool configuration process

Note

To configure the Summarize tool, first, select a field from the top box, then click on Add, and from the dropdown, select the operation you want to apply to that particular field.

So, this configuration will give us the total sum of [MONTHLY_SALES] and the maximum (MAX) value of [DATE] per [YEAR]:

Figure 1.40: Summarized sales per year

Now, to save the data, drop another Output Data tool and configure it.Select ReportingTemplate.xlsx from the OUTPUT directory, and use the YEARLY_SALES named range to write the data. Again, make sure that options 5 and 6 (Skip Field Names and Preserve Formatting on Overwrite (Range Required)) are selected; otherwise, you’ll not only overwrite existing data but you’ll be removing the existing formats:

Figure 1.41: Writing yearly sales to the template

To finish, we’ll add the date on which the report was issued.

To do so, we need a Formula tool that creates that value for us. Drop one, connect it to the Select output we added in the previous step (the same anchor the Output Data tool connects to), and create a new column called [REPORT_DATE] with this expression: DateTimeStart()Add a Select tool following the Formula tool, and just keep the [REPORT_DATE] field and a Sample tool to get the first row (First N rows, 1).Finally, add a new Block Until Done tool and then drop an Output Data tool, and again select our ..\OUTPUT\ReportingTemplate.xlsx file. In this case, select the REPORT_DATE named range and make sure the Skip Field Names and Preserve Formatting on Overwrite (Range Required) options are both checked:

Figure 1.42: Writing the report-issuing date to a named range

Your workflow should look very close to this:

Figure 1.43: Overview of the finished workflow

If you run the workflow and open \OUTPUT\ReportTemplate.xlsx, it’ll look like this:

Figure 1.44: Final results (ranges and chart updated)

How it works…

Let’s start explaining the Block Until Done tool, firstly according to Alteryx’s documentation (https://help.alteryx.com/20221/designer/block-until-done-tool):

“Use Block Until Done to stop datasets from moving downstream until the last record in the set has been processed by all previous tools. In addition, this tool ensures that only a single output stream at a time receives the completed dataset. Subsequent streams are blocked until all the dataset records are pushed through the 1st stream.”

As you will have realized by now, this tool is essential for performing multiple write operations within the same file from the same workflow, serializing the operations connected to it.

The order in which we connected the tools to the Block Until Done tool is important too because they determine the order in which those operations will be executed.

Thanks to the Run Command tool, we can execute external commands such as the one we created to copy files.

It is a very powerful tool to interact with our environment, including the OS, or custom applications we can integrate into Alteryx:

Figure 1.45: The “anatomy” of the Run Command tool

See also (follow-up steps)

You can learn more about the Block Until Done tool here:

https://help.alteryx.com/current/designer/block-until-done-tool

And you can find out more about the Run Command tool here:

https://help.alteryx.com/current/designer/run-command-tool

Accessing and determining which files are to be processed from a file repository

Sometimes, data is put in shared folders across the company’s network, and we have no control over when and how that data is updated. We have some users updating files within a shared folder, and we need to keep the last updated one from a set of various versions of the same file.

This recipe will help you determine the newest files to be processed from within a folder.

Getting ready

To follow along with this recipe, use the ch1\r5 files from GitHub:

https://github.com/PacktPublishing/Alteryx-Designer-Cookbook/tree/main/ch1/Recipe5

If you decided to use your data to follow up on this recipe, the only thing you need to make sure of is that all your files are in one directory (and/or subdirectories).

How to do it…

Place a Directory tool onto the canvas and point it to where your data is (in our example set, it’s on ..\DATA\).

Select *.xlsx as the File Specification value and make sure Include subdirectories is checked:

Figure 1.46: Directory tool configuration

Run your workflow and focus on the Results panel of Alteryx Designer:

Figure 1.47: Workflow results

You’ll see important information about the files within the selected directory, such as the following:

[FullPath]—Absolute path to the file[Directory]—Absolute path to the directory containing the file[FileName]—Name and extension of the file[ShortFileName]—For backward compatibility, shows the 8.3 filename format (mostly, you won’t use this)[CreationTime]—Datetime of file creation (registered by the OS)[LastAccessTime]—Last time the file was accessed[LastWriteTime]—Last time the file was written within[Size]—Size in bytes

Along with these values, you’ll see additional ones, depending on which attributes the OS has on each file:

Figure 1.48: Additional workflow results

We can use all these values to create conditions to read files.

For example:

Getting the last modified fileAfter the Directory tool, drop a Sort tool and sort by [LastWriteTime] in descending order.Drop a Sample tool, select First N rows, and set 1.

If our files contain worksheets with the same name, we can apply recipe #2 (we’ll follow this path here) to read the selected file’s content, using the [FullPath] data instead of [List of Sheet Names]. If not, we can reshape recipe #3 to read the selected file content (which will be explained in the There’s more… section of this recipe).

To do so, we need to add a Formula tool to tell Alteryx which worksheet we’ll be reading from the instructed file/s.

Add the Formula tool and create a new column called ToRead, with this expression: [FullPath] + "|||MonthlySales"

This will add the worksheet name (MonthlySales) to be read from the selected files (the ||| separator is how Alteryx Designer recognizes tables/worksheets within databases/files).

Now, drop a Dynamic Input tool following the Formula tool:

Figure 1.49: Our workflow so far

Change the Dynamic Input tool configuration. Using the recently generated [ToRead] field, we are telling the tool to carry out the following action: Change Entire File Path (this is where we’ll change the recipe #2 configuration):

Figure 1.50: Dynamic Input tool configuration

Running the workflow will give us the contents of the last modified file within a folder:

Figure 1.51: Dynamic Input tool configuration

How it works…

The Directory tool allows you to target a specific directory/folder (and its subfolders if wanted), select a file pattern, and return a list of files from that folder with their attributes, allowing us to make decisions about which files to keep and what to discard in our workflow.

Once we got the file list from the desired location, we sorted the files by one of their attributes (LastWriteTime) in descending order, so the newest file will be always the first in our list, and having the Sample tool configured to get the first (1) record, we are completely certain that always gets the last modified one.

Note

We can make endless decisions here, using Filter tools to select files based on any criteria we need to apply to finally read the ones we need, so feel free to take this recipe and adjust it to get the content you need from a bunch of files sitting in a folder.

We made this recipe using Excel files because this format presents us with an additional challenge since they may contain several worksheets (or tables) within each file, and we need to tell Alteryx Designer which ones we need to read. That’s why we need to add a Formula tool before reading the actual contents of the files, adding the ||| notation to the [FilePath] value and building the entire path, not only to the file but to the worksheet within it.

From there, we can now leverage the Dynamic Input tool with another of its powerful methods, Change Entire File Path, based on the results of the [ToRead] field we created in the Formula tool.

There’s more…

As stated before, if you combine this recipe with recipe #3, you can have a more dynamic way to read files, since you won’t be depending on the worksheets having the same name (you’ll be picking them up from the <List of Sheet Names> option Alteryx gives you).

Also, you can keep track of the files already processed simply by appending the [FullPath] field of the processed file/s and the date you processed those files to some kind of custom log (maybe another Excel file).

This method is very useful to avoid reprocessing files that have already been processed (before reading the contents, just join your Directory tool data stream with your custom log and only process the unjoined records coming from the Directory tool) and will save you a lot of time:

Figure 1.52: Joining input with a log of processed files

Handling formats when reading text files

Lots of applications provide data in a flat, fixed-width format. Even when Alteryx doesn’t have a .txt format reader, it allows us to extract data from this type of file, by offering a parsing functionality (similar to what Excel has to determine columns and field types).

Every time we need to access this type of file, we need to know how and where is data organized, and most of the time, these files come with a “data dictionary” telling us the internal structure of the files.

In this recipe, we’ll guide you to build an automated fixed-length file parser, leveraging some of Alteryx Designer’s functionalities.

Getting ready

We got you covered to try this recipe with a sample set here:

https://github.com/PacktPublishing/Alteryx-Designer-Cookbook/tree/main/ch1/Recipe6

This sample data has a dictionary, indicating to us the name, type, size, and scale of each field within it. You can find the data dictionary in a file called FileStructure.xlsx inside the DATA DEFINITION folder of the test set:

Name

Type

Size

Scale

TRIPDURATION

Int32

4

STARTTIME

DateTime

19

STOPTIME

DateTime

19

START_STATION_ID

V_String

4

START_STATION_NAME

V_String

33

START_STATION_LATITUDE

FixedDecimal

13

9

START_STATION_LONGITUDE

FixedDecimal

13

9

STATION_ID

Int32

4

END_STATION_NAME

V_String

33

END_STATION_LATITUDE

FixedDecimal

13

9

END_STATION_LONGITUDE

FixedDecimal

13

9

BIKE_ID

V_String

5

MEMBERSHIP_TYPE

V_String

1

USERTYPE

V_String

10

BIRTH_YEAR

V_String

4

GENDER

Byte

1

Table 1: FileStructure.xlsx content

If you’ll be following along with your data, please make sure that you have a fixed text file and the field definitions for that file (field names, data types, and lengths).

How to do it…

Start by dropping a Text Input tool onto the canvas and create a field named XML, with the following content: <flatfile version="1">   <file eoltype="crlf" allowShortLines="t" allowLongLines="f" trimWhiteSpace="t" />   <fields>Drop another Text Input tool, and create a field also called XML, with the following content:   </fields> </flatfile>Drop an Input Data tool into your canvas, point it to the ..\DATA DEFINITION\FileStructure.xlsx file, and select the Sheet1 worksheet from it:

Figure 1.53: Input data

Now, add a Formula tool to the Input Data tool output anchor and create a new column, also named XML, with the following expression: '    <field name="'+[Name]+'" type="'+ [Type] +'" length="'+ ToString([Size])+ IIF([Type]="FixedDecimal",'" scale="'+ToString([Scale]),'') +'" />'Use a Select tool following the Formula tool to only select the XML field (be sure to uncheck the *Unknownfield too).Drop a Union tool and connect the output anchors of the tools in this order:First text input—Its connection will be labeled #1Select tool—Will become connection #2Second text input—Connection #3

At this point, youll have something similar to this:

Figure 1.54: Connections to the Union tool

Now, click on the Union tool, make sure the method (on the top dropdown) is set to Auto Config by Name, which is the default setting, and check Set a Specific Output Order at the bottom of the configuration panel to set the order:

Figure 1.55: Setting the specific output order of records in the Union tool

To finish, we need to save the resulting XML file onto the DATA folder for later use, so we need to drop an Output Data tool following the Union tool and configure it, as in the following screenshot:

Figure 1.56: Saving the XML file (.flat)

Notice the \0 delimiter in option 3 and the unchecked box for First Row Contains Field Names (option 4).

If you browse to your \DATA folder, you’ll see the GeneratedFlatFile.flat file there.