QlikView: Advanced Data Visualization

Miguel Ángel García

Description

Build powerful data analytics applications with this business intelligence tool and overcome all your business challenges

Key Features

  • Master time-saving techniques and make your QlikView development more efficient
  • Perform geographical analysis and sentiment analysis in your QlikView applications
  • Explore advanced QlikView techniques, tips, and tricks to deliver complex business requirements

Book Description

QlikView is one of the most flexible and powerful business intelligence platforms around, and if you want to transform data into insights, it is one of the best options you have at hand. Use this Learning Path to explore the many features of QlikView, realize the potential of your data, and present it as impactful and engaging visualizations.

Each chapter in this Learning Path starts with an understanding of a business requirement and its associated data model and then helps you create insightful analysis and data visualizations around it. You will look at problems that you might encounter while visualizing complex data insights using QlikView, and learn how to troubleshoot these and other not-so-common errors. This Learning Path contains real-world examples from a variety of business domains, such as sales, finance, marketing, and human resources.

With all the knowledge that you gain from this Learning Path, you will have everything you need to implement your next QlikView project like a pro.

This Learning Path includes content from the following Packt products:

  • QlikView for Developers by Miguel Ángel García, Barry Harmsen
  • Mastering QlikView by Stephen Redmond
  • Mastering QlikView Data Visualization by Karl Pover

What you will learn

  • Deliver common business requirements using advanced techniques
  • Load data from disparate sources to build associative data models
  • Understand when to apply more advanced data visualization
  • Utilize the built-in aggregation functions for complex calculations
  • Build a data architecture that supports scalable QlikView deployments
  • Troubleshoot common data visualization errors in QlikView
  • Protect your QlikView applications and data

Who this book is for

This Learning Path is designed for developers who want to go beyond their technical knowledge of QlikView and understand how to create analysis and data visualizations that solve real business needs. To grasp the concepts explained in this Learning Path, you should have a basic understanding of the common QlikView functions and some hands-on experience with the tool.


Page count: 886

Publication year: 2018




Table of Contents

QlikView: Advanced Data Visualization
Why subscribe?
PacktPub.com
Contributors
About the authors
Packt is Searching for Authors Like You
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Get in touch
Reviews
1. Performance Tuning and Scalability
Reviewing basic performance tuning techniques
Removing unneeded data
Reducing the number of rows
Reducing the number of columns
Replacing text keys with numbers
Resolving synthetic keys
Reviewing the basics
Generating test data
Generating dimension values
Generating fact table rows
Understanding how QlikView stores its data
A great primer
Looking at things from a simple level
Exporting the memory statistics for a document
Strategies to reduce the data size and improve performance
Optimizing field values and keys
Optimizing data by removing keys using ApplyMap
Optimizing performance by removing keys by joining tables
Optimizing memory by removing low cardinality fields
Testing chart performance for different load options
Turning the cache off
Examining the chart calculation time for different scenarios
Optimizing performance by creating counter fields
Optimizing performance by combining fact tables?
Optimizing your numbers
Optimizing chart calculation times
The QlikView calculation engine
Creating flags for well-known conditions
Sorting for well-known conditions
Using Direct Discovery
Direct Discovery restrictions
Direct Discovery syntax
Looking at an example Direct Query
Testing scalability with JMeter
Obtaining the scalability tools
Installing JMeter
Installing the scalability tools
About the scalability tools
Running an example execution
Summary
2. QlikView Data Modeling
Reviewing basic data modeling
Associating data
Automatically associating tables
Understanding synthetic keys
Creating composite keys
Using string concatenation
Using one of the Hash functions
Using the AutoNumber function
Realizing that facts are calculated at the level of their table
Joining data
Understanding Join and Keep
Inner joins
Left and right joins
Outer joins
Cartesian joins
Understanding the effect of duplicate key values on joins
Understanding Keep
Concatenating rows
Reviewing Concatenate
Differentiating Concatenate and Join
Mapping data with ApplyMap
Reviewing the basic functionality of ApplyMap
Mapping numbers
Using ApplyMap instead of Join with duplicate rows
Dimensional data modeling
Differentiating between facts and dimensions
Understanding the grain
Understanding star schemas
Summing with facts
Discovering more about facts
Transaction fact tables
Periodic snapshot fact tables
Factless fact tables
Dealing with nulls in fact tables in QlikView
Designing dimension tables
Denormalizing dimensions and conformed dimensions
Understanding surrogate keys
Dealing with missing or late arriving dimension values
Defining Kimball's four-step dimensional design process
Selecting the business process
Declaring the grain
Identifying the dimensions
Identifying the facts
Learning some useful reusable dimension methods
Creating a calendar dimension
Unwrapping hierarchies
Creating leaves with Hierarchy
Creating parent associations with HierarchyBelongsTo
Creating dimensional facts
Handling slowly changing dimensions
Taking the most recently changed record using FirstSortedValue
Using IntervalMatch with SCDs
Using hash to manage from/to dates
Dealing with multiple fact tables in one model
Joining the fact tables together
Concatenating fact tables
Changing the grain of a fact table
Linking fact tables of different grains
Drilling across with document chaining
Summary
3. Best Practices for Loading Data
Reviewing data loading concepts
Getting data from anywhere
Loading data from QlikView
Loading similar files with concatenation
Loading dissimilar files with Concatenate and For Each
Understanding QlikView Data files
Storing tables to QVD
Using QVD files
Understanding why you should use an ETL approach
Speeding up overall data loading
Reusing extracted data in multiple documents
Applying common business rules across multiple documents
Creating conformed dimensions
Provisioning a self-service data layer
Using an ETL approach to create QVD data layers
Creating a StoreAndDrop subroutine
Extracting data
Creating an extractor folder structure
Differentiating types of scripts
Executing the extractors
Transforming data
Creating a transformer and model folder structure
Executing transformers
Loading data
Creating a UserApp folder structure
Executing the load step
Mastering loading techniques
Loading data incrementally
Establishing the script for the basic process
Running an incremental load when data is only added
Loading incrementally when data might be modified
Handling deletions from the source system
Handling situations where there is no modify date
Partially reloading only one part of the data model
Replacing a table
Adding new rows to a table
Managing script execution in partial reloads
Loading the content of another QVW
Using QlikView Expressor for ETL
Introducing Expressor
Understanding why to use Expressor for ETL
Understanding workspaces, libraries, projects, and artifacts
Creating a workspace
Managing extensions
Working with libraries and projects
Understanding artifacts
Configuring connections
Configuring a File connection
Connecting to a database
Creating a QVX Connector Connection
Configuring types and schemas
Adding additional Atomic types
Creating Composite types
Configuring a schema
Creating and packaging a basic dataflow
Understanding the dataflow toolbox
Inputs
Outputs
Transformers
Utility
Creating the dataflow
Configuring a Read File operator
Adding a Transformation operation
Creating a QVX output
Packaging the dataflow
Summary
4. Advanced Expressions
Reviewing basic concepts
Searching in QlikView
Searching for text
Wildcard search
Normal search
Fuzzy search
Associative search
Advanced search
Searching numeric fields
Numeric search
Automatic interpretation of searches
Multiple values search
Searching in multiple listboxes
Understanding bookmarks
Saving a bookmark
Managing bookmarks
Using variables in QlikView
SET versus LET
Using variables to hold common expressions
Using variables with Dollar-sign Expansion
Limiting calculations
Sum of If
Flag arithmetic
Calculations using variables
Data islands
Set Analysis
Explaining what we mean by a set
Set identifiers
Set modifiers
Understanding Dollar-sign Expansion
Following the two-step process
Following the steps in the script debugger
Following the steps in a chart expression
Understanding when the steps happen in chart expressions
Using parameters with variables and Dollar-sign Expansion
Using variables in expressions
Using advanced Set Analysis
Identifying the identifiers
Understanding that modifiers are sets
Set arithmetic
Using searches in Set Analysis
Using Dollar-sign Expansion with Set Analysis
Comparing to other fields
Direct field comparison
Using Concat with Dollar-sign Expansion
Using the P and E element functions
Set Analysis with Alternate States
Using Alternate States as identifiers
Comparing fields between states
Calculating vertically
Using inter-record and range functions
Applying the Total qualifier
Creating advanced aggregations with Aggr
Using Aggr to calculate a control chart
Calculated dimensions
No to nodistinct
Summary
5. Advanced Scripting
Reviewing the basic concepts
Using Table Files Wizard
Using relative paths
Delimited files
Fixed width files
XML files
HTML files
QVD/QVX files
Connecting to databases
Using the Connect button
Understanding the Connect To statement
Explaining the Force 32 Bit option
The Select wizard
Counting records
RecNo
RowNo
FieldValueCount
NoOfRows
NoOfColumns
Loading data quickly
Understanding compression settings
Optimal loading from QVD
Using an Exists clause
Preloading fields into QVDs
Applying variables and the Dollar-sign Expansion in the script
Examining common usage
Holding dates
Holding paths
Examining variable values during reloads
Nesting Dollar-sign Expansions
Passing parameters to variables – macro functions
Subroutines
Using control structures
Branching with conditional statements
If … Then … ElseIf
A note about conditional functions
Switch … Case
When and Unless
Looping in the script
AutoGenerate
For … Next loops
For Each … Next loops
FileList
DirList
Do … Loop
Exiting
Exiting the script
Exiting other constructs
Using variables for error handling
ErrorMode
ScriptError
ScriptErrorCount and ScriptErrorList
Examining advanced Table File Wizard options
Enabling a transformation step
Garbage
Fill
Column
Context
Unwrap
Rotate
Using the Crosstable wizard
Looking at data from different directions
Putting things first
First
FirstSortedValue
Looking backwards
Previous
Peek
Reusing code
Summary
6. What's New in QlikView 12?
Common QIX Engine
64-bit only
Online documentation
Security improvements
Mobile touch improvements
Improved clustering and scaling
Clustering improvements in QlikView 12.1
What is new in the Qlik product portfolio?
Qlik Sense Enterprise and the Qlik Analytics Platform
In what way are QlikView and Qlik Sense similar?
In what way do QlikView and Qlik Sense differ?
What does this mean for users?
What is the Qlik Analytics Platform?
Qlik NPrinting
Qlik Web Connectors
Qlik GeoAnalytics
Qlik DataMarket
How do the products in the Qlik product portfolio fit together?
Summary
7. Styling Up
Design requirements
The Document Properties window
The Sheet Properties dialog
Setting the object properties
Caption colors and style
Changing the caption colors
The Color Area and Color dialog windows
Setting the caption font
Setting the content font
Setting the global font
Propagating the object appearance
Setting the default Sheet Object Style
Hiding captions
Working with listboxes
Adding listboxes
The List Box Properties dialog
The General tab
The Expressions tab
The Sort tab
The Presentation tab
The Number tab
The Font tab
The Layout tab
The Caption tab
The Multi Box
The Current Selections Box
Making selections from the Current Selections Box
Adding a Bookmark Object
Aligning and resizing sheet objects
Selecting objects
Moving objects
Resizing objects
Resizing a Multi Box
Aligning sheet objects
Do a little housekeeping
Creating and applying a default color map
Defining chart colors
Setting the default color map
Summary
8. Building Dashboards
User types
Dashboard users
Analysts
Report users
Applying the DAR principle to Airline Operations
Document requirements
Creating the Analysis sheet
Adding a new chart
Bar Chart
Additional bar chart properties
Style
Presentation
Expressions and the Edit Expression window
Expressions
The Edit Expression window
Fields
Functions
Variables
Images
The Expression Overview window
Line Chart
Additional line chart properties
Expressions
Style
Presentation
Combo Chart
Container
Scatter Chart
Button
Statistics box
Creating the new Dashboard sheet
Linked Objects
Gauges
Cloning the object for re-use
Adding Air Time %
More Gauge styles
Adding a Text object
Using a Text Object to display an image
Adding actions to a Text object
Adding a Pie chart
Dimension Limits
Adding the dimension value to the data point values
Creating the Reports sheet
Variables
The Expression Overview window in action
Copying sheets
KPIs per airline, origin, and destination country
Cyclic and Drill-down groups
Straight table
Not all expressions are numbers
Pivot tables
Auto minimize
The Report Editor window
Other charts
Radar Chart
Mekko Chart
Grid Chart
Funnel Chart
Block Chart
Trellis Chart
Summary
9. Advanced Data Transformation
Data architecture
Two-stage architecture
Three-stage architecture
Setting up our environment
Loading data already stored in QlikView
Cloning a QlikView data model
Loading from RAM
Resident load
Aggregating data
Aggregating the Flight Data table
The Transformation output
Aggregation functions
Sorting tables
Ordering the Order-By fields
The Peek function
Merging forces
A refresher
The objective
Getting it done
Loading the table
Sorting the table
Peeking previous records
A solo exercise
Dealing with slowly changing dimensions
The Carrier Decode table
IntervalMatch magic
Expanding the intervals
Some considerations
Applying IntervalMatch to the Carrier Decode table
Ordering, peeking, and matching all at once
The use case
Incremental loads
Summary
10. Security
Hidden script
Section access
Section access fields
Reduction fields
Initial data reduction
Omitting fields
Document-level security
Sheet-level security
Summary
11. Data Visualization Strategy
Data exploration, visualization, and discovery
Data teams and roles
Data research and development
Data governance team
Agile development
User story
Minimum Viable Product
QlikView Deployment Framework
Exercise 11.1
Summary
12. Sales Perspective
Sales perspective data model
Exercise 12.1
Data quality issues
Missing dimension values
Missing fact values
Data formatting and standardization
Case
Unwanted characters
Dates and time
Master calendar
Customer stratification
Pareto analysis
Exercise 12.2
Exercise 12.3
Customer churn
Exercise 12.4
Exercise 12.5
QlikView extensions and the cycle plot
Exercise 12.6
Governance – design template
Summary
13. Financial Perspective
Financial perspective data model
Exercise 13.1
Financial report metadata
AsOfCalendar
Income statement
Exercise 13.2
Custom format cell
Exercise 13.3
Balance sheet
Exercise 13.4
Exercise 13.5
Cash flow statement
Exercise 13.6
Summary

QlikView: Advanced Data Visualization

Copyright © 2018 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: December 2018

Production reference: 1191218

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78995-599-6

www.packtpub.com

mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

  • Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
  • Learn better with Skill Plans built especially for you
  • Get a free eBook or video every month
  • Mapt is fully searchable
  • Copy and paste, print, and bookmark content

PacktPub.com

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Contributors

About the authors

Miguel Ángel García is a Business Intelligence consultant and Qlik Solutions Architect from Monterrey, Mexico. He has worked on many successful Qlik implementations, from inception through delivery, performing a wide variety of roles on each project; his experience and skills range from pre-sales to application development and design, technical architecture, system administration, functional analysis, and overall project execution. He currently holds the QlikView Designer, QlikView Developer, and QlikView System Administrator certifications.

Barry Harmsen is a Business Intelligence consultant based in the Netherlands, where he runs Bitmetric, a boutique consulting firm specializing in the Qlik product suite. Originally from a background of traditional business intelligence, data warehousing, and performance management, Barry made the shift in 2008 to Qlik and a more user-centric form of business intelligence. Since then, he and his team have helped many clients get the most out of their investments in the Qlik platform.

Barry is one of the four Qlik experts teaching at the Masters Summit for Qlik, an advanced 3-day training for experienced Qlik professionals. He also runs the Amsterdam chapter of the Qlik Dev Group, an open and informal gathering where Qlik professionals can share knowledge and experiences.

Stephen Redmond is the CTO and Qlik Luminary at CapricornVentis, a QlikView Elite Partner. He is the author of several books, including QlikView for Developers Cookbook and QlikView Server and Publisher, both published by Packt Publishing. He is also the author of the popular DevLogix series for SalesLogix developers. In 2006, after many years of working with CRM systems, reporting and analysis solutions, and data integration, Stephen started working with QlikView. Since then, CapricornVentis has become QlikView's top partner in the UK and Ireland territories, and with Stephen as the head of the team, they have implemented QlikView for a wide variety of enterprise and large-business customers across a wide range of sectors, from the public sector to financial services to large retailers. In 2014, Stephen was awarded Luminary status by Qlik in recognition of his product advocacy. He regularly contributes to online forums, including the Qlik Community.

Karl Pover is the owner and principal consultant of Evolution Consulting, which provides QlikView consulting services throughout Mexico. Since 2006, he has been dedicated to providing QlikView presales, implementation, and training for more than 50 customers. He is the author of Learning QlikView Data Visualization, and he has also been a Qlik Luminary since 2014.

Packt is Searching for Authors Like You

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Preface

QlikView is one of the most flexible and powerful business intelligence platforms around, and if you want to transform data into insights, it is one of the best options you have at hand. Use this Learning Path to explore the many features of QlikView, realize the potential of your data, and present it as impactful and engaging visualizations.

Each chapter in this Learning Path starts with an understanding of a business requirement and its associated data model and then helps you create insightful analysis and data visualizations around it. You will look at problems that you might encounter while visualizing complex data insights using QlikView, and learn how to troubleshoot these and other not-so-common errors. This Learning Path contains real-world examples from a variety of business domains, such as sales, finance, marketing, and human resources.

With all the knowledge that you gain from this Learning Path, you will have everything you need to implement your next QlikView project like a pro.

Who this book is for

This Learning Path is designed for developers who want to go beyond their technical knowledge of QlikView and understand how to create analysis and data visualizations that solve real business needs. To grasp the concepts explained in this Learning Path, you should have a basic understanding of the common QlikView functions and some hands-on experience with the tool.

What this book covers

Chapter 1, Performance Tuning and Scalability, is where we look at understanding how QlikView stores its data so that we can optimize that storage in our applications. We will also look at topics such as Direct Discovery and testing implementations using JMeter.

Chapter 2, QlikView Data Modeling, looks in detail at dimensional data modeling and learning about fact and dimension tables and using best practices from Ralph Kimball in QlikView. We also learn about how to handle slowly changing dimensions (SCDs), multiple fact tables, and drilling across with document chaining.

Chapter 3, Best Practices for Loading Data, is where we look at implementing ETL strategies with QVD files. We also introduce QlikView Expressor.

Chapter 4, Advanced Expressions, is where we look at areas such as the Dollar-sign Expansion, set analysis, and vertical calculations using Total and Aggr.

Chapter 5, Advanced Scripting, looks at optimizing loads, Dollar-sign Expansion in the script, and control structures. We also introduce the concept of code reuse.

Chapter 6, What's New in QlikView 12, presents a summary of the changes in the QlikView software, as well as in the Qlik ecosystem in general, that happened since the previous version of this book was published in 2012. In this chapter, we will bring you up to speed with the changes over the past few years.

Chapter 7, Styling Up, will help us learn how to style our QlikView documents. We will learn about the various document and sheet properties and will use them to manage the visual style of our document. We will also take a closer look at some of the most fundamental objects and learn how we can change their appearance.

Chapter 8, Building Dashboards, introduces us to the three basic types of QlikView users, and how we can best cater to their needs. We will learn about the various charting options that are available in QlikView, and will see how we can add interactivity to our QlikView documents. We will also be introduced to basic calculations.

Chapter 9, Advanced Data Transformation, returns to the topic of data transformation. We will learn about the most commonly used data architectures that can ease QlikView development and administration. Next, we will take a close look at aggregating and sorting data in the data model. In the final part of the chapter, we will learn how to take advantage of some of QlikView's most powerful data transformation capabilities.

Chapter 10, Security, shows how to secure our QlikView documents. We will see how we can allow only authorized users to open our documents and will learn how we can limit what a user can do and see within our document.

Chapter 11, Data Visualization Strategy, begins our journey to create a data-driven organization using QlikView.

Chapter 12, Sales Perspective, explains the data model's importance to data visualization, and shows us how to create advanced analyses, such as customer stratification, churn prediction, and seasonal trends.

Chapter 13, Financial Perspective, illustrates the usage of metadata to format an income statement, a balance sheet, and a cash flow statement.

Chapter 14, Marketing Perspective, walks us through various types of visualization that reveal customer profiles, potential markets, social media sentiment, and the sales pipeline.

Chapter 15, Working Capital Perspective, describes how to analyze days sales of inventory, days sales outstanding, and days payable outstanding, at both a high and a detailed level. It also explains how they are important in order to determine customer stratification.

Chapter 16, Operations Perspective, shows us how to analyze our service levels, predict supplier lead times, and investigate whether on-time deliveries depend on the supplier.

Chapter 17, Human Resources, reveals how to visualize personnel productivity and personal behavior analysis.

Chapter 18, Fact Sheets, demonstrates an ad hoc design method to create a customer fact sheet that includes bullet graphs, sparklines, and a customized UX.

Chapter 19, Balanced Scorecard, details a more formal design method to build an information dashboard containing balanced scorecard metrics.

Chapter 20, Troubleshooting Analysis, takes a look at resources and methods to debug problems in our QlikView applications.

Chapter 21, Mastering Qlik Sense Data Visualization, explains what Qlik Sense means to a QlikView developer and proposes a plan to master Qlik Sense data visualization.

To get the most out of this book

To use this book, you primarily need the QlikView Desktop software. You will need a PC running Windows XP or later, with 2 GB of hard disk space and 2 GB of RAM. A 64-bit machine is required if you want to use QlikView 12 or a higher version, and is the recommended environment for this book and QlikView development in general. If you prefer to use a 32-bit machine, you can install QlikView 11 instead.

For the best understanding, a general knowledge of BI and its terminology is required. A basic understanding of databases and SQL is preferred, but not compulsory.

Download the example code files

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

1. Log in or register at http://www.packtpub.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the on-screen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR / 7-Zip for Windows
  • Zipeg / iZip / UnRarX for Mac
  • 7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/QlikView-Advanced-Data-Visualization. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. For example; "In the QVScriptGenTool_0_7 64Bit\Analyzer folder there is a ZIP file called FolderTemplate.zip."

A block of code is set as follows:

Sales:
Load * INLINE [
Country, Sales
USA, 1000
UK, 940
Japan, 543
];

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

Sales:
Load * INLINE [
Country, Sales
USA, 1000
UK, 940
Japan, 543
];

Any command-line input or output is written as follows:

C:\Program Files\QlikView\qv.exe

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "Click on the Execution tab."

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email [email protected], and mention the book's title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report it to us. Please visit http://www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit http://authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.

Chapter 1. Performance Tuning and Scalability

"The way Moore's Law occurs in computing is really unprecedented in other walks of life. If the Boeing 747 obeyed Moore's Law, it would travel a million miles an hour, it would be shrunken down in size, and a trip to New York would cost about five dollars. Those enormous changes just aren't part of our everyday experience."

— Nathan Myhrvold, former Chief Technology Officer at Microsoft, 1995

The way Moore's Law has benefitted QlikView is really unprecedented amongst other BI systems.

QlikView began life in 1993 in Lund, Sweden. The product was originally titled "QuickView", but the name had to change when the company couldn't obtain a trademark on it, and thus "QlikView" was born.

After years of steady growth, something really good happened for QlikView around 2005/2006—the Intel x64 processors became the dominant processors in Windows servers. QlikView had, for a few years, supported the Itanium version of Windows; however, Itanium never became a dominant server processor. Intel and AMD started shipping the x64 processors in 2004 and, by 2006, most servers sold came with an x64 processor—whether the customer wanted 64-bit or not. Because the x64 processors could support either x86 or x64 versions of Windows, the customer didn't even have to know. Even those customers who purchased the x64 version of Windows 2003 didn't really know this because all of their x86 software would run just as well (perhaps with a few tweaks).

But x64 Windows was fantastic for QlikView! Any x86 process is limited to a maximum of 2 GB of addressable memory. While 2 GB is quite a lot of memory, it wasn't enough to hold the volume of data that a true enterprise-class BI tool needed to handle. In fact, up until version 9 of QlikView, there was an in-built limitation of about 2 billion rows (actually, 2 to the power of 31) in the number of records that QlikView could load. On x86 processors, QlikView was really confined to the desktop.

x64 was a very different story. Early Intel implementations of x64 could address up to 64 GB of memory. More recent implementations allow up to 256 TB, although Windows Server 2012 can only address 4 TB. Memory is suddenly less of an obstacle to enterprise data volumes.

The other change that happened with processors was the introduction of multi-core architecture. At the time, it was common for a high-end server to come with 2 or 4 processors. Manufacturers came up with a method of putting multiple processors, or cores, on one physical processor. Nowadays, it is not unusual to see a server with 32 cores. High-end servers can have many, many more.

One of QlikView's design features that benefitted from this is its multithreaded calculation engine. That means that many of QlikView's calculations will execute across all available processor cores. Unlike many other applications, if you add more cores to your QlikView server, you will, in general, add more performance.

So, when it comes to looking at performance and scalability, very often, the first thing that people look at to improve things is to replace the hardware. This is valid of course! QlikView will almost always work better with newer, faster hardware. But before you go ripping out your racks, you should have a good idea of exactly what is going on with QlikView. Knowledge is power; it will help you tune your implementation to make the best use of the hardware that you already have in place.

The following are the topics we'll be covering in this chapter:

Reviewing basic performance tuning techniques
Generating test data
Understanding how QlikView stores its data
Looking at strategies to reduce the data size and to improve performance
Using Direct Discovery
Testing scalability with JMeter

Reviewing basic performance tuning techniques

There are many ways in which you may have learned to develop with QlikView. Some of them may have talked about performance and some may not have. Typically, you start to think about performance at a later stage when users start complaining about slow results from a QlikView application or when your QlikView server is regularly crashing because your applications are too big.

In this section, we are going to quickly review some basic performance tuning techniques that you should, hopefully, already be aware of. Then, we will start looking at how we can advance your knowledge to master level.

Removing unneeded data

Removing unneeded data might seem easy in theory, but sometimes it is not so easy to implement—especially when you need to negotiate with the business. However, the quickest way to improve the performance of a QlikView application is to remove data from it. If you can reduce your number of fact rows by half, you will vastly improve performance. The different options are discussed in the next sections.

Reducing the number of rows

The first option is to simply reduce the number of rows. Here we are interested in Fact or Transaction table rows—the largest tables in your data model. Reducing the number of dimension table rows rarely produces a significant performance improvement.

The easiest way to reduce the number of these rows is usually to limit the table by a value such as the date. It is always valuable to ask the question, "Do we really need all the transactions for the last 10 years?" If you can reduce this, say to 2 years, then the performance will improve significantly.

We can also choose to rethink the grain of the data—to what level of detail we hold the information. By aggregating the data to a higher level, we will often vastly reduce the number of rows.
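Both approaches can be sketched in load script. The table and field names below are illustrative, not from this book's dataset:

```
// Limit by date: keep only the last two years of transactions
Transaction:
Load
  TransactionId,
  AccountId,
  TransactionDate,
  LineValue
From Transaction.qvd (QVD)
Where TransactionDate >= AddMonths(Today(), -24);

// Or rethink the grain: aggregate to one row per account per month
MonthlySales:
Load
  AccountId,
  MonthStart(TransactionDate) As Month,
  Sum(LineValue) As MonthlySalesValue
From Transaction.qvd (QVD)
Group By AccountId, MonthStart(TransactionDate);
```

Note that adding a Where clause or a Group By to a QVD load will usually prevent an optimized QVD load, so there is a reload-time cost to pay for the smaller in-memory footprint.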

Reducing the number of columns

The second option is to reduce the width of tables—again, especially Fact or Transaction tables. This means looking at fields that might be in your data model but do not actually get used in the application. One excellent way of establishing this is to use the Document Analyzer tool by Rob Wunderlich to examine your application (http://robwunderlich.com/downloads).

As well as other excellent uses, Rob's tool looks at multiple areas of an application to establish whether fields are being used or not. It will give you an option to view fields that are not in use and has a useful DROP FIELD Statements listbox from which you can copy the possible values. The following screenshot shows an example (from the default document downloadable from Rob's website):

Adding these DROP FIELD statements into the end of a script makes it very easy to remove fields from your data model without having to dive into the middle of the script and try to remove them during the load—which could be painful.

There is a potential issue here; if you have users using collaboration objects—creating their own charts—then this tool will not detect that usage. However, if you use the DROP FIELD option, then it is straightforward to add a field back if a user complains that one of their charts is not working.

Of course, the best practice would be to take the pain and remove the fields from the script by either commenting them out or removing them completely from their load statements. This is more work, because you may break things and have to do additional debugging, but it will result in a better performing script.
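Whichever route you take, the cleanup itself is just a series of standard DROP FIELD statements appended to the end of the script; the field and table names here are hypothetical examples:

```
// Fields reported as unused by Document Analyzer
Drop Field ShipmentComment;
Drop Field LegacyRegionCode From Transaction;
```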

Replacing text keys with numbers

Often, you will have a text value in a key field, for example, something like an account number that has alphanumeric characters. These are actually quite poor for performance compared to an integer value and should be replaced with numeric keys.

Note

There is some debate here about whether this makes a difference at all, but the effect is to do with the way the data is stored under the hood, which we will explore later. Generated numeric keys are stored slightly differently than text keys, which makes things work better.

The strategy is to leave the text value (account number) in the dimension table for use in display (if you need it!) and then use the AutoNumber function to generate a numeric value—also called a surrogate key—to associate the two tables.

For example, replace the following:

Account:
Load
  AccountId,
  AccountName,
  …
From Account.qvd (QVD);

Transaction:
Load
  TransactionId,
  AccountId,
  TransactionDate,
  …
From Transaction.qvd (QVD);

With the following:
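Following the strategy above, the rewritten load might look like this (a sketch; the surrogate key field name is my own):

```
Account:
Load
  AutoNumber(AccountId) As AccountKey,
  AccountId,        // keep the text value for display
  AccountName,
  …
From Account.qvd (QVD);

Transaction:
Load
  TransactionId,
  AutoNumber(AccountId) As AccountKey,
  TransactionDate,
  …
From Transaction.qvd (QVD);
```

The two tables now associate on the numeric AccountKey field, while the alphanumeric AccountId survives only in the dimension table for display.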

Generating test data

It is enormously useful to be able to quickly generate test data so that we can create QlikView applications and test different aspects of development and discover how different development methods work. By creating our own set of data, we can abstract problems away from the business issues that we are trying to solve because the data is not connected to those problems. Instead, we can resolve the technical issue underlying the business issue. Once we have resolved that issue, we will have built an understanding that allows us to more quickly resolve the real problems with the business data.

We might contemplate that if we are developers who only have access to a certain dataset, then we will only learn to solve the issues in that dataset. For true mastery, we need to be able to solve issues in many different scenarios, and the only way that we can do that is to generate our own test data.

Generating dimension values

Dimension tables will generally have lower numbers of records; there are a number of websites online that will generate this type of data for you.

The following screenshot demonstrates setting up a Customer extract in Mockaroo:

This allows us to create 1,000 customer records that we can include in our QlikView data model. The extract is in the CSV format, so it is quite straightforward to load into QlikView.

Generating fact table rows

While we might often delegate the creation of test dimension tables to a third-party website like this, we should always try to generate the Fact table data ourselves.

A good way to do this is to simply generate rows with a combination of the AutoGenerate() and Rand() functions.

For even more advanced use cases, we can look at using statistical functions such as NORMINV to generate normal distributions. There is a good article on this written by Henric Cronström on Qlik Design Blog at http://community.qlik.com/blogs/qlikviewdesignblog/2013/08/26/monte-carlo-methods.

We should be aware that the AutoGenerate() function on its own simply generates empty rows of data; it is the Rand() function that populates those rows with values.
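A minimal sketch of generating a fact table this way (the row count, field names, and value ranges are all arbitrary choices of mine, not from the book's dataset):

```
Transaction:
Load
  RecNo() As TransactionId,
  Ceil(Rand() * 1000) As CustomerID,
  Ceil(Rand() * 100) As ProductID,
  Ceil(Rand() * 10) As Quantity,
  Round(Rand() * 500, 0.01) As SalesPrice,
  Date(MakeDate(2013, 1, 1) + Floor(Rand() * 730)) As TransactionDate
AutoGenerate(100000);
```

Each Rand() call returns a value between 0 and 1, so wrapping it in Ceil, Round, or date arithmetic shapes it into IDs, prices, or dates as needed.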

Understanding how QlikView stores its data

QlikView is really good at storing data. It operates on data in memory, so being able to store a lot of data in a relatively small amount of memory gives the product a great advantage—especially as Moore's Law continues to give us bigger and bigger servers.

Understanding how QlikView stores its data is fundamental in mastering QlikView development. Writing load script with this understanding will allow you to load data in the most efficient way so that you can create the best performing applications. Your users will love you.

A great primer

A great primer on how QlikView stores its data is available on Qlik Design Blog, written by Henric Cronström (http://community.qlik.com/blogs/qlikviewdesignblog/2012/11/20/symbol-tables-and-bit-stuffed-pointers).

Note

Henric joined QlikView in 1994, so he knows quite a bit about exactly how it works.

Looking at things from a simple level

From a simple level, consider the following small table:

First name    Surname    Country
John          Smith      USA
Jane          Smith      USA
John          Doe        Canada

For the preceding table, QlikView will create three symbol tables like the following:

First name symbol table:

Index    Value
1010     John
1011     Jane

Surname symbol table:

Index    Value
1110     Smith
1111     Doe

Country symbol table:

Index    Value
110      USA
111      Canada

And the data table will look like the following:

First name    Surname    Country
1010          1110       110
1011          1110       110
1010          1111       111

This set of tables will take up less space than the original data table for the following three reasons:

The binary indexes are bit-stuffed in the data table; they only take up as much space as needed.
The binary index, even though repeated, will take up less space than the text values. The Unicode text just for "USA" takes up several bytes; the binary index takes less space than that.
Each larger text value is stored only once, in the symbol tables.

So, to summarize, each field in the data model will be stored in a symbol table (unless, as we will see later, it is a sequential integer value) that contains the unique values and an index value. Every table that you create in the script—including any synthetic key tables—will be represented as a data table containing just the index pointers.

Note

Because the data table indexes are bit-stuffed, and because data is stored in bytes, adding another bit or two to the indexes may not actually increase the overall width of a data table record.

Exporting the memory statistics for a document

To help us understand what is going on in a particular QlikView document, we can export details about where all the memory is being used. This export file will tell us how much memory is being used by each field in the symbol tables, the data tables, chart objects, and so on.

Perform the following steps to export the memory statistics for a document:

1. To export the memory statistics, open Document Properties from the Settings menu (Ctrl + Alt + D). On the General tab, click on the Memory Statistics button, as shown in the following screenshot:
2. After you click on the button, you will be prompted to enter file information. Once you have entered the path and filename, the file will be exported. It is a tab-delimited data file:
3. The easiest way to analyze this file is to import it into a new QlikView document:

We can now see exactly how much space our data is taking up in the symbol tables and in the data tables. We can also look at chart calculation performance to see whether there are long running calculations that we need to tune. Analyzing this data will allow us to make valuable decisions about where we can improve performance in our QlikView document.
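As a minimal sketch, the exported file can be loaded like any other tab-delimited text file; the path, filename, and exact format specification below are assumptions, so use whatever the file wizard generates for your export:

```
MemStats:
LOAD *
FROM [C:\Temp\DocumentMemStats.mem]
(txt, utf8, embedded labels, delimiter is '\t');
```

From there, a straight table over the Class and Bytes fields is usually enough to show where the memory is going.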

Strategies to reduce the data size and improve performance

Using some of the test data that we have generated, or any other data that you might want, we can discover more about how QlikView handles different scenarios. Understanding these different situations will give you real mastery over data load optimization.

Optimizing field values and keys

To begin with, let's see what happens when we load two largish tables that are connected by a key. So, let's ignore the dimension tables and load the order data using a script like the following:

Order:
LOAD OrderID,
 OrderDate,
 CustomerID,
 EmployeeID,
 Freight
FROM [..\Scripts\OrderHeader.qvd] (qvd);

OrderLine:
LOAD OrderID,
 LineNo,
 ProductID,
 Quantity,
 SalesPrice,
 SalesCost,
 LineValue,
 LineCost
FROM [..\Scripts\OrderLine.qvd] (qvd);

The preceding script will result in a database memory profile that looks like the following. In the following screenshot, Database has been selected for Class:

There are some interesting readings in this table. For example, we can see that when the main data table—OrderLine—is stored with just its pointer records, it takes up just 923,085 bytes for 102,565 records. That is an average of only 9 bytes per record. This shows the space benefit of the bit-stuffed pointer mechanism as described in Henric's blog post.

The largest individual symbol table is the OrderDate field. This is very typical of a TimeStamp field, which will often be highly unique, have long decimal values, and have the Dual text value, and so often takes up a lot of memory—28 bytes per value.

The number part of a TimeStamp field contains an integer representing the date (number of days since 30th December 1899) and a decimal representing the time. So, let's see what happens with this field if we turn it into just an integer—a common strategy with these fields as the time portion may not be important:

Order:
LOAD OrderID,
 Floor(OrderDate) As DateID,
 ...

This changes things considerably:

The number of unique values has been vastly reduced, because the highly unique date and time values have been replaced with a much lower cardinality (2171) date integer, and the amount of memory consumed is also vastly reduced as the integer values are only taking 8 bytes instead of the 28 being taken by each value of the TimeStamp field.

The next field that we will pay attention to is OrderID. This is the key field, and key fields are always worth examining to see whether they can be improved. In our test data, the OrderID field is alphanumeric—this is not uncommon for such data. Alphanumeric data will tend to take up more space than numeric data, so it is a good idea to convert it to integers using the AutoNumber function.

AutoNumber accepts a text value and will return a sequential integer. If you pass the same text value, it will return the same integer. This is a great way of transforming alphanumeric ID values into integers. The code will look like the following:

Order:
LOAD AutoNumber(OrderID) As OrderID,
 Floor(OrderDate) As DateID,
 ...

OrderLine:
LOAD AutoNumber(OrderID) As OrderID,
 LineNo,
 ...

This will result in a memory profile like the following:

The OrderID field, now a sequential integer generated by AutoNumber, can be stored far more compactly than its alphanumeric original.

Testing chart performance for different load options

As well as information about memory use in each data table and symbol table, we can recall that the Memory Statistics option will also export information about charts—both memory use and calculation time. This means that we can create a chart, especially one with multiple dimensions and expressions, and see how long the chart takes to calculate for different scenarios.

Let's load the Order Header and Order Line data with the Calendar information loaded inline (as in the first part of the last example) in the following manner:

Order:
LOAD AutoNumber(OrderID, 'Order') As OrderID,
 Floor(OrderDate) As DateID,
 Year(OrderDate) As Year,
 Month(OrderDate) As Month,
 Day(OrderDate) As Day,
 Date(MonthStart(OrderDate), 'YYYY-MM') As YearMonth,
 AutoNumber(CustomerID, 'Customer') As CustomerID,
 AutoNumber(EmployeeID, 'Employee') As EmployeeID,
 Freight
FROM [..\Scripts\OrderHeader.qvd] (qvd);

OrderLine:
LOAD AutoNumber(OrderID, 'Order') As OrderID,
 LineNo,
 ProductID,
 Quantity,
 SalesPrice,
 SalesCost,
 LineValue,
 LineCost
FROM [..\Scripts\OrderLine.qvd] (qvd);

Now we can add a chart to the document with several dimensions and expressions like this:

We have used YearMonth and CustomerID as dimensions. This is deliberate because these two fields will be in separate tables once we move the calendar fields into a separate table.

The expressions that we have used are shown in the following table:

Expression Label    Expression
Sales $             Sum(LineValue)
Sales $ Color       ColorMix1(Sum(LineValue)/Max(total Aggr(Sum(LineValue), YearMonth, CustomerID)), White(), ARGB(255, 0, 128, 255))
Cost $              Sum(LineCost)
Margin $            Sum(LineValue)-Sum(LineCost)
Margin %            (Sum(LineValue)-Sum(LineCost))/Sum(LineValue)
Cum. Sales $        RangeSum(Above(Sum(LineValue),0,RowNo()))
# Orders            Count(DISTINCT OrderID)
Product 101         Sum(If(ProductID=101,1,0))
Product 102-106     Sum(If(Match(ProductID,102,103,104,105,106), 1, 0))

Turning the cache off

The cache in QlikView is enormously important. Calculations and selections are cached as you work with a QlikView document. The next time you open a chart with the same selections, the chart will not be recalculated; you will get the cached answer instead. This really speeds up QlikView performance. Even within a chart, you might have multiple expressions using the same calculation (such as dividing two expressions by each other to obtain a ratio)—the results will make use of caching.

This caching is really useful for a working document, but a pain if we want to gather statistics on one or more charts. With the cache on, we need to close a document and the QlikView desktop, reopen the document in a new QlikView instance, and open the chart. To help us test the chart performance, it can therefore be a good idea to turn off the cache.

Note

Note that you need to be very careful with this dialog as you could break things in your QlikView installation. Turning off the cache is not recommended for normal use of the QlikView desktop as it can seriously interfere with the performance of QlikView. Turning off the cache to gather accurate statistics on chart performance is pretty much the only use case that one might ever come across for turning off the cache. There is a reason why it is a hidden setting!

Examining the chart calculation time for different scenarios

Now that the cache is turned off, we can open our chart and it will always take the maximum time to calculate. We can then export the memory information as usual and load it into another copy of QlikView (here, Sheetobject has been selected for Class):

What we could do now is make some selections and save them as bookmarks. By closing the QlikView desktop client and then reopening it, and then opening the document and running through the bookmarks, we can export the memory file and create a calculation for Avg Calc Time. Because there is no cache involved, this should be a valid representation.

Now, we can comment out the inline calendar and create the Calendar table (as we did in a previous exercise):

Order:
LOAD AutoNumber(OrderID, 'Order') As OrderID,
 Floor(OrderDate) As DateID,
// Year(OrderDate) As Year,
// Month(OrderDate) As Month,
// Day(OrderDate) As Day,
// Date(MonthStart(OrderDate), 'YYYY-MM') As YearMonth,
 AutoNumber(CustomerID, 'Customer') As CustomerID,
 AutoNumber(EmployeeID, 'Employee') As EmployeeID,
 Freight
FROM [..\Scripts\OrderHeader.qvd] (qvd);

OrderLine:
//Left Join (Order)
LOAD AutoNumber(OrderID, 'Order') As OrderID,
 LineNo,
 ProductID,
 Quantity,
 SalesPrice,
 SalesCost,
 LineValue,
 LineCost
FROM [..\Scripts\OrderLine.qvd] (qvd);

//exit Script;

Calendar:
Load Distinct
 DateID,
 Year(DateID) As Year,
 Month(DateID) As Month,
 Day(DateID) As Day,
 Date(MonthStart(DateID), 'YYYY-MM') As YearMonth
Resident Order;

For the dataset size that we are using, we should see no difference in calculation time between the two data structures. As previously established, the second option has a smaller in-memory data size, so that would always be the preferred option.

Optimizing performance by creating counter fields

For many years, it has been a well-established fact among QlikView consultants that a Count() function with a Distinct clause is a very expensive calculation. Over the years, I have heard that Count can be up to 1000 times more expensive than Sum. Actually, since about Version 9 of QlikView, this is no longer true, and the Count function is a lot more efficient.
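For completeness, the counter-field technique that gives this section its name works by adding a constant 1 to the fact table in the script so that a plain Sum can stand in for a row count in the frontend. This is a sketch; the counter field name is my own:

```
OrderLine:
LOAD AutoNumber(OrderID) As OrderID,
 LineNo,
 ProductID,
 Quantity,
 LineValue,
 1 As OrderLineCounter
FROM [..\Scripts\OrderLine.qvd] (qvd);
```

In a chart, Sum(OrderLineCounter) then replaces Count(LineNo). Given how much more efficient Count has become since around Version 9, it is worth benchmarking both versions with the memory statistics extract before committing to the extra field.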

Optimizing chart calculation times

Once we have optimized our data model, we can turn our focus onto chart performance. There are a few different things that we can do to make sure that our expressions are optimal, and we can use the memory file extract to test them.

Some of the expressions will actually involve revisiting the data model. If we do, we will need to weigh up the cost of that performance with changes to memory, and so on.

It will be useful to begin with an explanation of how the QlikView calculation engine works.

The QlikView calculation engine

QlikView is very clever in how it does its calculations. As well as the data storage, as discussed earlier in this chapter, it also stores the binary state of every field and of every data table dependent on user selection—essentially, depending on the green/white/grey state of each field, it is either included or excluded. This area of storage is called the state space and is updated by the QlikView logical inference engine every time a selection is made. There is one bit in the state space for every value in the symbol table or row in the data table—as such, the state space is much smaller than the data itself and hence much faster to query.

There are three steps to a chart being calculated:

1. The user makes a selection, causing the logical inference engine to reset and recalculate the state space. This should be a multithreaded operation.
2. On one thread per object, the state space is queried to gather together all of the combinations of dimensions and values necessary to perform the calculation. The state space is being queried, so this is a relatively fast operation, but could be a potential bottleneck if there are many visible objects on the screen.
3. On multiple threads per object, the expression is calculated. This is where we see the cores in the task manager all go to 100 percent at the same time. Having 100 percent CPU is expected and desired because QlikView will "burst" calculations across all available processor cores, which makes this a very fast process, relative to the size and complexity of the calculation. We call it a burst because, except for the most complex of calculations, the 100 percent CPU should only be for a short time.

Of course, the very intelligent cache comes into play as well and everything that is calculated is stored for potential subsequent use. If the same set of selections are met (such as hitting the Back button), then the calculation is retrieved from the cache and will be almost instantaneous.

Now that we know more about how QlikView performs its calculations, we can look at a few ways that we can optimize things.

Creating flags for well-known conditions

We cannot anticipate every possible selection or query that a user might make, but there are often some quite well-known conditions that will generally be true most of the time and may be commonly used in calculations. In this example, we will look at Year-to-Date and Last Year-to-Date—commonly used on dashboards.

The following is an example of a calculation that might be used in a gauge:

Sum(If(YearToDate(Date), LineValue, 0))
 / Sum(If(YearToDate(Date,-1), LineValue, 0))
 - 1

This uses the YearToDate() function to check whether the date is in the current year to date or in the year to date period for last year (using the -1 for the offset parameter). This expression is a sum of an if statement, which is generally not recommended. Also, these are quite binary—a date is either in the year to date or not—so are ideal candidates for the creation of flags. We can do this in the Calendar table in the following script:

Calendar:
Load Distinct
 DateID,
 -YearToDate(DateID) As YTD_Flag,
 -YearToDate(DateID,-1) As LYTD_Flag,
 Date(DateID) As Date,
 Year(DateID) As Year,
 Month(DateID) As Month,
 Day(DateID) As Day,
 Date(MonthStart(DateID), 'YYYY-MM') As YearMonth
Resident Order;

Note

Note the - sign before the function. This is because YearToDate is a Boolean function that returns either true or false, which in QlikView is represented by -1 and 0. If the value is in the year to date, then the function will return -1, so I add the - to change that to 1. A - sign before 0 will make no difference.

In a particular test dataset, we might see an increase from 8,684 bytes to 13,026—not an unexpected increase and not significant because the Calendar table is relatively small. We are creating these flags to improve performance in the frontend and need to accept a small change in the data size.

The significant change comes when we change the expression in the chart to the following:

Sum(LineValue*YTD_Flag)/Sum(LineValue*LYTD_Flag)-1

In a sample dataset, we might see that the calculation reduces from, say, 46 to, say, 16—a 65 percent reduction. This calculation could also be written using Set Analysis as follows:

Sum({<YTD_Flag={1}>} LineValue)/Sum({<LYTD_Flag={1}>} LineValue)-1

However, this might only get a calc time of 31—only a 32.6 percent reduction. Very interesting!

If we think about it, the simple calculation of LineValue*YTD_Flag is going to do a multithreaded calculation using values that are derived from the small and fast in-memory state space. Both If and Set Analysis are going to add additional load to the calculation of the set of values that are going to be used in the calculation.

In this case, the flag field is in a dimension table, Calendar, and the value field is in the fact table. It is, of course, possible to generate the flag field in the fact table instead. In this case, the calculation is likely to run even faster, especially on very large datasets. This is because there is no join of data tables required. However, the thing to bear in mind is that the additional pointer indexes in the Calendar table will require relatively little space whereas the additional width of the fact table, because of the large numbers of rows, will be something to consider. However, saying that, the pointers to the flag values are very small, so you do need a really long fact table for it to make a big difference. In some cases, the additional bit necessary to store the pointer in the bit-stuffed table will not make any difference at all, and in other cases, it may add just one byte.

Set Analysis can be very powerful, but it is worth considering that it often has to go, depending on the formula, outside the current state space, and that will cause additional calculation to take place that may be achieved in a simpler manner by creating a flag field in the script and using it in this way. Even if you have to use Set Analysis, the best performing comparisons are going to be using numeric comparisons, so creating a numeric flag instead of a text value will improve the set calculation performance. For example, consider the following expression:

Sum({<YTD_Flag={1}>} LineValue)

This will execute much faster than the following expression:

Sum({<YTD_Flag={'Yes'}>} LineValue)

So, when should we use Set Analysis instead of multiplying by flags? Barry Harmsen has done some testing indicating that when the dimension table is large relative to the fact table, Set Analysis is faster than the flag-multiplication method. The reasoning is that the multiply method processes every record, even those contributing 0, so the larger the table, the more it has to process. The Set Analysis method first reduces the scope and then applies the calculation only to that subset.

Of course, if we have to introduce more advanced logic, that might include AND/OR/NOT operations, then Set Analysis is the way to go—but try to use numeric flags.
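As an illustration of such logic, set modifiers and set operators can combine numeric flags directly; the Returned_Flag field below is a hypothetical example, not part of the model described above:

```
// AND: both conditions in the same modifier
Sum({<YTD_Flag={1}, Returned_Flag={0}>} LineValue)

// OR: union of two sets using the + operator
Sum({<YTD_Flag={1}> + <Returned_Flag={1}>} LineValue)

// NOT: exclude a flag value with the -= operator
Sum({<YTD_Flag -= {1}>} LineValue)
```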

Sorting for well-known conditions

Any time that you need to sort a chart or listbox, that sort needs to be calculated. Of course, a numeric sort will always be the fastest. An alphabetic sort is a lot slower, just by its nature. One of the very slowest sorts is where we want to sort by expression.

For example, let's imagine that we wish to sort our Country list by a fixed order, defined by the business. We could use a sort expression like this:

Match(Country,'USA','Canada','Germany','United Kingdom','China','India','Russia','France','Ireland')

The problem is that this is a text comparison that will be continually evaluated. What we can do instead is load a temporary sort table in the script. We load this towards the beginning of the script because its values need to be the first ones loaded into the field's symbol table; something like the following:

Country_Sort:
Load * Inline [
Country
USA
Canada
Germany
United Kingdom
China
India
Russia
France
Ireland
];

Then, as we won't need this table in our data, we should remember to drop it at the end of the script—after the main data has been loaded:

Drop Table Country_Sort;

Now, when we use this field anywhere, we can turn off all of the sort options and use the last one, Load Order. This doesn't need to be evaluated, so it will always calculate quickly.

Using Direct Discovery

Traditionally, QlikView has been a totally in-memory tool. If you want to analyze any information, you need to get all of the data into memory. This has caused problems for many enterprise organizations because of the sheer size of the data that they want to analyze. You can get quite a lot of data into QlikView; billions of rows are not uncommon on very large servers, but there is a limit. Especially in the last few years, as businesses have started to take note of the buzz around Big Data, many believed that QlikView could not play in this area.

Direct Discovery was introduced with QlikView Version 11.20. In Version 11.20 SR5, it was updated with a new, more sensible syntax. This syntax is also available in Qlik Sense. What Direct Discovery does is allow a QlikView model to connect directly to a data source without having to load all of the data into memory. Instead, we load only dimension values and, when necessary, QlikView generates a query to retrieve the required results from the database.
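As a minimal sketch using the SR5 syntax, a Direct Discovery load might look like the following; the table and field names are illustrative assumptions:

```
// DIMENSION fields are loaded into memory as symbol tables;
// MEASURE fields stay in the database and are aggregated on demand
DIRECT QUERY
    DIMENSION ProductID, CustomerID, OrderDate
    MEASURE LineValue, Quantity
    FROM SalesFact;
```

The dimension fields then behave like any other in-memory fields for selections, while expressions over the measure fields are translated into queries against the source.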

Of course, this does have the potential to undermine some of the things that make QlikView very popular, such as the sub-second response to selections. Every time a user makes a selection, QlikView generates a query to pass through to the database connection. The faster the data connection, the faster the response, so a performant data warehouse is a boon for Direct Discovery. But speed is not always everything; with Direct Discovery, we can connect to any valid connection that we might normally connect to with the QlikView script, including ODBC connectors to Big Data sources such as Cloudera or Google.

Tip

Here we will get an introduction to using Direct Discovery, but for more depth we should read the technical documentation published on the Qlik Community, for example, the SR5 technical addendum at http://community.qlik.com/docs/DOC-3710.

Direct Discovery restrictions

There are a few restrictions of Direct Discovery that will probably be addressed with subsequent service releases:

Only one direct table is supported: This restriction has been lifted in QlikView 11.20 SR7 and Qlik Sense 1.0. Prior to those versions, you could only have one direct query in your data model. All other tables in the data model must be in-memory.

Set Analysis and complex expressions are not supported: Because the query is generated on the fly, it just can't work with the likes of a Set Analysis query. Essentially, only calculations that can be performed on the source database (Sum, Count, Avg, Min, Max) will work via Direct Discovery.

Only SQL compliant data sources