Build powerful data analytics applications with this business intelligence tool and overcome all your business challenges
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: December 2018
Production reference: 1191218
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78995-599-6
www.packtpub.com
mapt.io
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Miguel Ángel García is a Business Intelligence consultant and Qlik Solutions Architect from Monterrey, Mexico. Having worked throughout many successful Qlik implementations, from inception through implementation, and performed across a wide variety of roles on each project, his experience and skills range from pre-sales to applications development and design, technical architecture, system administration, as well as functional analysis and overall project execution. He currently holds the QlikView Designer, QlikView Developer, and QlikView System Administrator Certifications.
Barry Harmsen is a Business Intelligence Consultant based in the Netherlands. Here he runs Bitmetric, a boutique consulting firm specialized in the Qlik product suite. Originally from a background of traditional business intelligence, data warehousing, and performance management, in 2008 Barry made the shift to Qlik and a more user-centric form of Business Intelligence. Since then, he and his team have helped many clients get the most out of their investments in the Qlik platform.
Barry is one of the four Qlik experts teaching at the Masters Summit for Qlik, an advanced 3-day training for experienced Qlik professionals. He also runs the Amsterdam chapter of the Qlik Dev Group, an open and informal gathering where Qlik professionals can share knowledge and experiences.
Stephen Redmond is the CTO and Qlik Luminary at CapricornVentis - a QlikView Elite Partner. He is the author of several books, including QlikView for Developers Cookbook and QlikView Server and Publisher, both published by Packt Publishing. He is also the author of the popular DevLogix series for SalesLogix developers. In 2006, after many years of working with CRM systems, reporting and analysis solutions, and data integration, Stephen started working with QlikView. Since then, CapricornVentis has become QlikView's top partner in the UK and Ireland territories, and with Stephen as the head of the team, they have implemented QlikView for a wide variety of enterprise and large-business customers across a wide range of sectors, from the public sector to financial services to large retailers. In 2014, Stephen was awarded Luminary status by Qlik in recognition of his product advocacy. He regularly contributes to online forums, including the Qlik Community.
Karl Pover is the owner and principal consultant of Evolution Consulting, which provides QlikView consulting services throughout Mexico. Since 2006, he has been dedicated to providing QlikView presales, implementation, and training for more than 50 customers. He is the author of Learning QlikView Data Visualization, and he has also been a Qlik Luminary since 2014.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
QlikView is one of the most flexible and powerful business intelligence platforms around, and if you want to transform data into insights, it is one of the best options you have at hand. Use this Learning Path, to explore the many features of QlikView to realize the potential of your data and present it as impactful and engaging visualizations.
Each chapter in this Learning Path starts with an understanding of a business requirement and its associated data model and then helps you create insightful analysis and data visualizations around it. You will look at problems that you might encounter while visualizing complex data insights using QlikView, and learn how to troubleshoot these and other not-so-common errors. This Learning Path contains real-world examples from a variety of business domains, such as sales, finance, marketing, and human resources.
With all the knowledge that you gain from this Learning Path, you will have all the experience you need to implement your next QlikView project like a pro.
This Learning Path is designed for developers who want to go beyond their technical knowledge of QlikView and understand how to create analysis and data visualizations that solve real business needs. To grasp the concepts explained in this Learning Path, you should have a basic understanding of the common QlikView functions and some hands-on experience with the tool.
Chapter 1, Performance Tuning and Scalability, is where we look at understanding how QlikView stores its data so that we can optimize that storage in our applications. We will also look at topics such as Direct Discovery and testing implementations using JMeter.
Chapter 2, QlikView Data Modeling, looks in detail at dimensional data modeling; we learn about fact and dimension tables and apply Ralph Kimball's best practices in QlikView. We also learn how to handle slowly changing dimensions (SCDs), multiple fact tables, and drilling across with document chaining.
Chapter 3, Best Practices for Loading Data, is where we look at implementing ETL strategies with QVD files. We also introduce QlikView Expressor.
Chapter 4, Advanced Expressions, is where we look at areas such as the Dollar-sign Expansion, set analysis, and vertical calculations using Total and Aggr.
Chapter 5, Advanced Scripting, looks at optimizing loads, Dollar-sign Expansion in the script, and control structures. We also introduce the concept of code reuse.
Chapter 6, What's New in QlikView 12, presents a summary of the changes in the QlikView software, as well as in the Qlik ecosystem in general, since the previous edition of this book was published in 2012, bringing you up to speed with the past few years.
Chapter 7, Styling Up, will help us learn how to style our QlikView documents. We will learn about the various document and sheet properties and will use them to manage the visual style of our document. We will also take a closer look at some of the most fundamental objects and learn how we can change their appearance.
Chapter 8, Building Dashboards, introduces us to the three basic types of QlikView users, and how we can best cater to their needs. We will learn about the various charting options that are available in QlikView, and will see how we can add interactivity to our QlikView documents. We will also be introduced to basic calculations.
Chapter 9, Advanced Data Transformation, returns to the topic of data transformation. We will learn about the most commonly used data architectures that can ease QlikView development and administration. Next, we will take a close look at aggregating and sorting data in the data model. In the final part of the chapter, we will learn how to take advantage of some of QlikView's most powerful data transformation capabilities.
Chapter 10, Security, shows how to secure our QlikView documents. We will see how we can allow only authorized users to open our documents and will learn how we can limit what a user can do and see within our document.
Chapter 11, Data Visualization Strategy, begins our journey to create a data-driven organization using QlikView.
Chapter 12, Sales Perspective, explains the data model's importance to data visualization, and shows us how to create advanced analyses, such as customer stratification, churn prediction, and seasonal trends.
Chapter 13, Financial Perspective, illustrates the usage of metadata to format an income statement, a balance sheet, and a cash flow statement.
Chapter 14, Marketing Perspective, walks us through various types of visualization that reveal customer profiles, potential markets, social media sentiment, and the sales pipeline.
Chapter 15, Working Capital Perspective, describes how to analyze days sales of inventory, days sales outstanding, and days payable outstanding, at both a high and a detailed level. It also explains their importance in determining customer stratification.
Chapter 16, Operations Perspective, shows us how to analyze our service levels, predict supplier lead times, and investigate whether on-time deliveries depend on the supplier.
Chapter 17, Human Resources, reveals how to visualize personnel productivity and perform personnel behavior analysis.
Chapter 18, Fact Sheets, demonstrates an ad hoc design method to create a customer fact sheet that includes bullet graphs, sparklines, and a customized UX.
Chapter 19, Balanced Scorecard, details a more formal design method to build an information dashboard containing balanced scorecard metrics.
Chapter 20, Troubleshooting Analysis, takes a look at resources and methods to debug problems in our QlikView applications.
Chapter 21, Mastering Qlik Sense Data Visualization, explains what Qlik Sense means to a QlikView developer and proposes a plan to master Qlik Sense data visualization.
To use this book, you primarily need the QlikView Desktop software. With regard to computer requirements, you will need a PC with Windows XP or later, 2 GB of hard disk space, and 2 GB of RAM. A 64-bit machine is required if you want to use QlikView 12 or a higher version, and is the recommended environment for this book and QlikView development in general. If you prefer to use a 32-bit machine, you can install QlikView 11 instead.
For best understanding, a general knowledge of BI and its terminology is required. Basic understanding of databases and SQL is preferred, but not compulsory for this book.
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
1. Log in or register at www.packtpub.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
- WinRAR / 7-Zip for Windows
- Zipeg / iZip / UnRarX for Mac
- 7-Zip / PeaZip for Linux
The code bundle for the book is also hosted on GitHub https://github.com/PacktPublishing/QlikView-Advanced-Data-Visualization. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. For example: "In the QVScriptGenTool_0_7 64Bit\Analyzer folder there is a ZIP file called FolderTemplate.zip."
A block of code is set as follows:
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
Any command-line input or output is written as follows:
New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "Click on the Execution tab."
Warnings or important notes appear in a box like this.
Tips and tricks appear like this.
Feedback from our readers is always welcome.
General feedback: Email [email protected], and mention the book's title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report it to us. Please visit http://www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit http://authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packtpub.com.
"The way Moore's Law occurs in computing is really unprecedented in other walks of life. If the Boeing 747 obeyed Moore's Law, it would travel a million miles an hour, it would be shrunken down in size, and a trip to New York would cost about five dollars. Those enormous changes just aren't part of our everyday experience."
— Nathan Myhrvold, former Chief Technology Officer at Microsoft, 1995
The way Moore's Law has benefitted QlikView is really unprecedented amongst other BI systems.
QlikView began life in 1993 in Lund, Sweden. Originally titled "QuickView", the product had to be renamed when a copyright on that name could not be obtained, and thus "QlikView" was born.
After years of steady growth, something really good happened for QlikView around 2005/2006—the Intel x64 processors became the dominant processors in Windows servers. QlikView had, for a few years, supported the Itanium version of Windows; however, Itanium never became a dominant server processor. Intel and AMD started shipping the x64 processors in 2004 and, by 2006, most servers sold came with an x64 processor—whether the customer wanted 64-bit or not. Because the x64 processors could support either x86 or x64 versions of Windows, the customer didn't even have to know. Even those customers who purchased the x64 version of Windows 2003 didn't really know this because all of their x86 software would run just as well (perhaps with a few tweaks).
But x64 Windows was fantastic for QlikView! Any x86 process is limited to a maximum of 2 GB of physical memory. While 2 GB is quite a lot of memory, it wasn't enough to hold the volume of data that a true enterprise-class BI tool needed to handle. In fact, up until version 9 of QlikView, there was an in-built limitation of about 2 billion rows (actually, 2 to the power of 31) in the number of records that QlikView could load. On x86 processors, QlikView was really confined to the desktop.
x64 was a very different story. Early Intel implementations of x64 could address up to 64 GB of memory. More recent implementations allow up to 256 TB, although Windows Server 2012 can only address 4 TB. Memory is suddenly less of an obstacle to enterprise data volumes.
The other change that happened with processors was the introduction of multi-core architecture. At the time, it was common for a high-end server to come with 2 or 4 processors. Manufacturers came up with a method of putting multiple processors, or cores, on one physical processor. Nowadays, it is not unusual to see a server with 32 cores. High-end servers can have many, many more.
One of QlikView's design features that benefitted from this was that their calculation engine is multithreaded. That means that many of QlikView's calculations will execute across all available processor cores. Unlike many other applications, if you add more cores to your QlikView server, you will, in general, add more performance.
So, when it comes to looking at performance and scalability, very often, the first thing that people look at to improve things is to replace the hardware. This is valid of course! QlikView will almost always work better with newer, faster hardware. But before you go ripping out your racks, you should have a good idea of exactly what is going on with QlikView. Knowledge is power; it will help you tune your implementation to make the best use of the hardware that you already have in place.
The following are the topics we'll be covering in this chapter:
There are many ways in which you may have learned to develop with QlikView. Some of them may have talked about performance and some may not have. Typically, you start to think about performance at a later stage when users start complaining about slow results from a QlikView application or when your QlikView server is regularly crashing because your applications are too big.
In this section, we are going to quickly review some basic performance tuning techniques that you should, hopefully, already be aware of. Then, we will start looking at how we can advance your knowledge to master level.
Removing unneeded data might seem easy in theory, but sometimes it is not so easy to implement—especially when you need to negotiate with the business. However, the quickest way to improve the performance of a QlikView application is to remove data from it. If you can reduce your number of fact rows by half, you will vastly improve performance. The different options are discussed in the next sections.
The first option is to simply reduce the number of rows. Here we are interested in Fact or Transaction table rows—the largest tables in your data model. Reducing the number of dimension table rows rarely produces a significant performance improvement.
The easiest way to reduce the number of these rows is usually to limit the table by a value such as the date. It is always valuable to ask the question, "Do we really need all the transactions for the last 10 years?" If you can reduce this, say to 2 years, then the performance will improve significantly.
We can also choose to rethink the grain of the data—to what level of detail we hold the information. By aggregating the data to a higher level, we will often vastly reduce the number of rows.
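As a sketch, assuming the transactions are already stored in a QVD file and using illustrative field names, both approaches might look like the following:

    // Keep only the last two years of transactions
    Transactions:
    LOAD *
    FROM Transactions.qvd (qvd)
    WHERE OrderDate >= AddYears(Today(), -2);

    // Alternatively, aggregate to a higher grain: one row per product per day
    DailySales:
    LOAD
        OrderDate,
        ProductID,
        Sum(Quantity)  as Quantity,
        Sum(LineValue) as LineValue
    FROM Transactions.qvd (qvd)
    GROUP BY OrderDate, ProductID;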
The second option is to reduce the width of tables—again, especially Fact or Transaction tables. This means looking at fields that might be in your data model but do not actually get used in the application. One excellent way of establishing this is to use the Document Analyzer tool by Rob Wunderlich to examine your application (http://robwunderlich.com/downloads).
As well as other excellent uses, Rob's tool looks at multiple areas of an application to establish whether fields are being used or not. It will give you an option to view fields that are not in use and has a useful DROP FIELD Statements listbox from which you can copy the possible values. The following screenshot shows an example (from the default document downloadable from Rob's website):
Adding these DROP FIELD statements into the end of a script makes it very easy to remove fields from your data model without having to dive into the middle of the script and try to remove them during the load—which could be painful.
There is a potential issue here; if you have users using collaboration objects—creating their own charts—then this tool will not detect that usage. However, if you use the DROP FIELD option, then it is straightforward to add a field back if a user complains that one of their charts is not working.
Of course, the best practice would be to take the pain and remove the fields from the script by either commenting them out or removing them completely from their load statements. This is more work, because you may break things and have to do additional debugging, but it will result in a better performing script.
Often, you will have a text value in a key field, for example, something like an account number that has alphanumeric characters. These are actually quite poor for performance compared to an integer value and should be replaced with numeric keys.
There is some debate here about whether this makes a difference at all, but the effect is to do with the way the data is stored under the hood, which we will explore later. Generated numeric keys are stored slightly differently than text keys, which makes things work better.
The strategy is to leave the text value (account number) in the dimension table for use in display (if you need it!) and then use the AutoNumber function to generate a numeric value—also called a surrogate key—to associate the two tables.
For example, replace the following:
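A sketch, with illustrative table and field names, keyed on the raw alphanumeric value:

    Accounts:
    LOAD
        AccountNumber,
        AccountName
    FROM Accounts.qvd (qvd);

    Transactions:
    LOAD
        AccountNumber,
        TransactionDate,
        TransactionValue
    FROM Transactions.qvd (qvd);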
With the following:
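The same loads with a generated numeric surrogate key (AccountID is an assumed name):

    Accounts:
    LOAD
        AutoNumber(AccountNumber) as AccountID,
        AccountNumber,  // kept in the dimension table for display
        AccountName
    FROM Accounts.qvd (qvd);

    Transactions:
    LOAD
        AutoNumber(AccountNumber) as AccountID,
        TransactionDate,
        TransactionValue
    FROM Transactions.qvd (qvd);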
It is enormously useful to be able to quickly generate test data so that we can create QlikView applications and test different aspects of development and discover how different development methods work. By creating our own set of data, we can abstract problems away from the business issues that we are trying to solve because the data is not connected to those problems. Instead, we can resolve the technical issue underlying the business issue. Once we have resolved that issue, we will have built an understanding that allows us to more quickly resolve the real problems with the business data.
We might contemplate that if we are developers who only have access to a certain dataset, then we will only learn to solve the issues in that dataset. For true mastery, we need to be able to solve issues in many different scenarios, and the only way that we can do that is to generate our own test data to do that with.
Dimension tables will generally have lower numbers of records; there are a number of websites online that will generate this type of data for you.
The following screenshot demonstrates setting up a Customer extract in Mockaroo:
This allows us to create 1,000 customer records that we can include in our QlikView data model. The extract is in the CSV format, so it is quite straightforward to load into QlikView.
While we might often delegate the creation of test dimension tables to a third-party website like this, we should always try to generate the Fact table data ourselves.
A good way to do this is to simply generate rows with a combination of the AutoGenerate() and Rand() functions.
For even more advanced use cases, we can look at using statistical functions such as NORMINV to generate normal distributions. There is a good article on this written by Henric Cronström on the Qlik Design Blog at http://community.qlik.com/blogs/qlikviewdesignblog/2013/08/26/monte-carlo-methods.
We should be aware of the AutoGenerate() function, which simply generates empty rows of data, and the Rand() function, which returns a random value between 0 and 1.
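As a minimal sketch, with entirely illustrative ranges and field names, we might combine them like this:

    // Generate 100,000 random fact rows for testing
    TestFacts:
    LOAD
        Floor(Rand() * 1000) + 1      as CustomerID,   // 1,000 possible customers
        Floor(Rand() * 200) + 1       as ProductID,    // 200 possible products
        Today() - Floor(Rand() * 730) as OrderDate,    // a date in the last two years
        Round(Rand() * 500, 0.01)     as LineValue     // a value up to 500.00
    AUTOGENERATE 100000;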
QlikView is really good at storing data. It operates on data in memory, so being able to store a lot of data in a relatively small amount of memory gives the product a great advantage—especially as Moore's Law continues to give us bigger and bigger servers.
Understanding how QlikView stores its data is fundamental in mastering QlikView development. Writing load script with this understanding will allow you to load data in the most efficient way so that you can create the best performing applications. Your users will love you.
A great primer on how QlikView stores its data is available on the Qlik Design Blog, written by Henric Cronström (http://community.qlik.com/blogs/qlikviewdesignblog/2012/11/20/symbol-tables-and-bit-stuffed-pointers).
Henric joined QlikView in 1994, so he knows quite a bit about exactly how it works.
At a simple level, consider the following small table:

    First name | Surname | Country
    John       | Smith   | USA
    Jane       | Smith   | USA
    John       | Doe     | Canada
For the preceding table, QlikView will create three symbol tables like the following:
First name:

    Index | Value
    1010  | John
    1011  | Jane

Surname:

    Index | Value
    1110  | Smith
    1111  | Doe

Country:

    Index | Value
    110   | USA
    111   | Canada
And the data table will look like the following:
    First name | Surname | Country
    1010       | 1110    | 110
    1011       | 1110    | 110
    1010       | 1111    | 111
This set of tables will take up less space than the original data table for the following three reasons:
1. Each distinct value is stored only once, in its symbol table, no matter how many times it occurs in the data.
2. The data table holds only index pointers, which are far smaller than the values they point to.
3. The pointers are bit-stuffed, using only as many bits as are needed to represent the number of distinct values in each field.
So, to summarize, each field in the data model will be stored in a symbol table (unless, as we will see later, it is a sequential integer value) that contains the unique values and an index value. Every table that you create in the script—including any synthetic key tables—will be represented as a data table containing just the index pointers.
Because the data table indexes are bit-stuffed, and because data is stored in bytes, adding another bit or two to the indexes may not actually increase the overall width of a data table record.
To help us understand what is going on in a particular QlikView document, we can export details about where all the memory is being used. This export file will tell us how much memory is being used by each field in the symbol tables, the data tables, chart objects, and so on.
Perform the following steps to export the memory statistics for a document:
1. Open the document in QlikView Desktop.
2. Go to Settings | Document Properties and select the General tab.
3. Click on the Memory Statistics button and choose a location to save the .mem file to.
We can now see exactly how much space our data is taking up in the symbol tables and in the data tables. We can also look at chart calculation performance to see whether there are long running calculations that we need to tune. Analyzing this data will allow us to make valuable decisions about where we can improve performance in our QlikView document.
Using some of the test data that we have generated, or any other data that you might want, we can discover more about how QlikView handles different scenarios. Understanding these different situations will give you real mastery over data load optimization.
To begin with, let's see what happens when we load two largish tables that are connected by a key. So, let's ignore the dimension tables and load the order data using a script like the following:
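A sketch of such a load, assuming the two tables have already been extracted to QVD files (the file and field names here are illustrative):

    OrderHeader:
    LOAD
        OrderID,
        OrderDate,
        CustomerID
    FROM OrderHeader.qvd (qvd);

    OrderLine:
    LOAD
        OrderID,
        ProductID,
        Quantity,
        LineValue,
        LineCost
    FROM OrderLine.qvd (qvd);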
The preceding script will result in a database memory profile that looks like the following. In the following screenshot, Database has been selected for Class:
There are some interesting readings in this table. For example, we can see that when the main data table—OrderLine—is stored with just its pointer records, it takes up just 923,085 bytes for 102,565 records. That is an average of only 9 bytes per record. This shows the space benefit of the bit-stuffed pointer mechanism as described in Henric's blog post.
The largest individual symbol table is the OrderDate field. This is very typical of a TimeStamp field, which will often be highly unique, have long decimal values, and have the Dual text value, and so often takes up a lot of memory—28 bytes per value.
The number part of a TimeStamp field contains an integer representing the date (number of days since 30th December 1899) and a decimal representing the time. So, let's see what happens with this field if we turn it into just an integer—a common strategy with these fields as the time portion may not be important:
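As a sketch, we can wrap the field in Floor() during the load:

    OrderHeader:
    LOAD
        OrderID,
        Floor(OrderDate) as OrderDate,  // keep the date part, drop the time
        CustomerID
    FROM OrderHeader.qvd (qvd);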
This changes things considerably:
The number of unique values has been vastly reduced, because the highly unique date and time values have been replaced with a much lower cardinality (2171) date integer, and the amount of memory consumed is also vastly reduced as the integer values are only taking 8 bytes instead of the 28 being taken by each value of the TimeStamp field.
The next field that we will pay attention to is OrderID. This is the key field, and key fields are always worth examining to see whether they can be improved. In our test data, the OrderID field is alphanumeric—this is not uncommon for such data. Alphanumeric data will tend to take up more space than numeric data, so it is a good idea to convert it to integers using the AutoNumber function.
AutoNumber accepts a text value and will return a sequential integer. If you pass the same text value, it will return the same integer. This is a great way of transforming alphanumeric ID values into integers. The code will look like the following:
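A sketch of the change, applying AutoNumber consistently in both loads so that the keys still associate:

    OrderHeader:
    LOAD
        AutoNumber(OrderID) as OrderID,
        Floor(OrderDate)    as OrderDate,
        CustomerID
    FROM OrderHeader.qvd (qvd);

    OrderLine:
    LOAD
        AutoNumber(OrderID) as OrderID,
        ProductID,
        Quantity,
        LineValue,
        LineCost
    FROM OrderLine.qvd (qvd);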
This will result in a memory profile like the following:
The OrderID field now takes up far less memory: because AutoNumber produces sequential integers, QlikView does not need to store the values in a symbol table at all.
As well as information about memory use in each data table and symbol table, we can recall that the Memory Statistics option will also export information about charts—both memory use and calculation time. This means that we can create a chart, especially one with multiple dimensions and expressions, and see how long the chart takes to calculate for different scenarios.
Let's load the Order Header and Order Line data with the Calendar information loaded inline (as in the first part of the last example) in the following manner:
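A sketch of what this might look like, with illustrative calendar derivations in the header table:

    OrderHeader:
    LOAD
        AutoNumber(OrderID) as OrderID,
        Floor(OrderDate)    as OrderDate,
        Year(OrderDate)     as Year,
        Month(OrderDate)    as Month,
        Date(MonthStart(OrderDate), 'YYYY-MM') as YearMonth,
        CustomerID
    FROM OrderHeader.qvd (qvd);

    OrderLine:
    LOAD
        AutoNumber(OrderID) as OrderID,
        ProductID,
        Quantity,
        LineValue,
        LineCost
    FROM OrderLine.qvd (qvd);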
Now we can add a chart to the document with several dimensions and expressions like this:
We have used YearMonth and CustomerID as dimensions. This is deliberate because these two fields will be in separate tables once we move the calendar fields into a separate table.
The expressions that we have used are shown in the following table:
    Expression Label | Expression
    Sales $          | Sum(LineValue)
    Sales $ Color    | ColorMix1(Sum(LineValue)/Max(total Aggr(Sum(LineValue), YearMonth, CustomerID)), White(), ARGB(255, 0, 128, 255))
    Cost $           | Sum(LineCost)
    Margin $         | Sum(LineValue)-Sum(LineCost)
    Margin %         | (Sum(LineValue)-Sum(LineCost))/Sum(LineValue)
    Cum. Sales $     | RangeSum(Above(Sum(LineValue),0,RowNo()))
    # Orders         | Count(DISTINCT OrderID)
    Product 101      | Sum(If(ProductID=101,1,0))
    Product 102-106  | Sum(If(Match(ProductID,102,103,104,105,106), 1, 0))
The cache in QlikView is enormously important. Calculations and selections are cached as you work with a QlikView document. The next time you open a chart with the same selections, the chart will not be recalculated; you will get the cached answer instead. This really speeds up QlikView performance. Even within a chart, you might have multiple expressions using the same calculation (such as dividing two expressions by each other to obtain a ratio)—the results will make use of caching.
This caching is really useful for a working document, but a pain if we want to gather statistics on one or more charts. With the cache on, we need to close a document and the QlikView desktop, reopen the document in a new QlikView instance, and open the chart. To help us test the chart performance, it can therefore be a good idea to turn off the cache.
Note that you need to be very careful with this dialog as you could break things in your QlikView installation. Turning off the cache is not recommended for normal use of the QlikView desktop as it can seriously interfere with the performance of QlikView. Turning off the cache to gather accurate statistics on chart performance is pretty much the only use case that one might ever come across for turning off the cache. There is a reason why it is a hidden setting!
Now that the cache is turned off, we can open our chart and it will always calculate at the maximum time. We can then export the memory information as usual and load it into another copy of QlikView (here, the Class of Sheetobject is selected):
What we could do now is make some selections and save them as bookmarks. By closing the QlikView desktop client and then reopening it, and then opening the document and running through the bookmarks, we can export the memory file and create a calculation for Avg Calc Time. Because there is no cache involved, this should be a valid representation.
Now, we can comment out the inline calendar and create the Calendar table (as we did in a previous exercise):
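A minimal sketch of such a Calendar table, derived from the dates already loaded in the header table:

    Calendar:
    LOAD DISTINCT
        OrderDate,
        Year(OrderDate)  as Year,
        Month(OrderDate) as Month,
        Date(MonthStart(OrderDate), 'YYYY-MM') as YearMonth
    RESIDENT OrderHeader;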
For the dataset size that we are using, we should see no difference in calculation time between the two data structures. As previously established, the second option has a smaller in-memory data size, so that would always be the preferred option.
For many years, it has been a well-established fact among QlikView consultants that a Count() function with a Distinct clause is a very expensive calculation. Over the years, I have heard that Count can be up to 1000 times more expensive than Sum. Actually, since about Version 9 of QlikView, this is no longer true, and the Count function is a lot more efficient.
Once we have optimized our data model, we can turn our focus onto chart performance. There are a few different things that we can do to make sure that our expressions are optimal, and we can use the memory file extract to test them.
Some of the expressions will actually involve revisiting the data model. If we do, we will need to weigh up the cost of that performance with changes to memory, and so on.
It will be useful to begin with an explanation of how the QlikView calculation engine works.
QlikView is very clever in how it does its calculations. As well as the data storage, as discussed earlier in this chapter, it also stores the binary state of every field and of every data table dependent on user selection—essentially, depending on the green/white/grey state of each field, it is either included or excluded. This area of storage is called the state space and is updated by the QlikView logical inference engine every time a selection is made. There is one bit in the state space for every value in the symbol table or row in the data table—as such, the state space is much smaller than the data itself and hence much faster to query.
There are three steps to a chart being calculated:
1. The logical inference engine updates the state space to reflect the user's current selections.
2. Based on that state space, the values required for the calculation are gathered from the data tables and symbol tables.
3. The calculation itself is executed, multithreaded across the available processor cores.
Of course, the very intelligent cache comes into play as well, and everything that is calculated is stored for potential subsequent use. If the same set of selections is met again (such as hitting the Back button), then the calculation is retrieved from the cache and will be almost instantaneous.
Now that we know more about how QlikView performs its calculations, we can look at a few ways that we can optimize things.
We cannot anticipate every possible selection or query that a user might make, but there are often some quite well-known conditions that will generally be true most of the time and may be commonly used in calculations. In this example, we will look at Year-to-Date and Last Year-to-Date—commonly used on dashboards.
The following is an example of a calculation that might be used in a gauge:
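A sketch of the kind of expression meant here, using the field names from the earlier examples:

    Sum(If(YearToDate(OrderDate), LineValue))
      / Sum(If(YearToDate(OrderDate, -1), LineValue))
      - 1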
This uses the YearToDate() function to check whether the date is in the current year to date or in the year to date period for last year (using the -1 for the offset parameter). This expression is a sum of an if statement, which is generally not recommended. Also, these are quite binary—a date is either in the year to date or not—so are ideal candidates for the creation of flags. We can do this in the Calendar table in the following script:
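As a sketch, the flag fields might be created like this:

    Calendar:
    LOAD DISTINCT
        OrderDate,
        Year(OrderDate)  as Year,
        Month(OrderDate) as Month,
        -YearToDate(OrderDate)     as YTD_Flag,   // 1 if in the current year to date
        -YearToDate(OrderDate, -1) as LYTD_Flag   // 1 if in last year's year to date
    RESIDENT OrderHeader;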
Note the - sign before the function. This is because YearToDate is a Boolean function that returns either true or false, which in QlikView is represented by -1 and 0. If the value is in the year to date, then the function will return -1, so I add the - to change that to 1. A - sign before 0 will make no difference.
In a particular test dataset, we might see an increase from 8,684 bytes to 13,026—not an unexpected increase and not significant because the Calendar table is relatively small. We are creating these flags to improve performance in the frontend and need to accept a small change in the data size.
The significant change comes when we change the expression in the chart to the following:
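Using the flag fields, the expression becomes a simple multiplication:

    Sum(LineValue * YTD_Flag)
      / Sum(LineValue * LYTD_Flag)
      - 1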
In a sample dataset, we might see that the calculation reduces from, say, 46 to, say, 16—a 65 percent reduction. This calculation could also be written using Set Analysis as follows:
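A sketch of the Set Analysis equivalent:

    Sum({<YTD_Flag = {1}>} LineValue)
      / Sum({<LYTD_Flag = {1}>} LineValue)
      - 1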
However, this might only get a calc time of 31—only a 32.6 percent reduction. Very interesting!
If we think about it, the simple calculation of LineValue*YTD_Flag is going to do a multithreaded calculation using values that are derived from the small and fast in-memory state space. Both If and Set Analysis are going to add additional load to the calculation of the set of values that are going to be used in the calculation.
In this case, the flag field is in a dimension table, Calendar, and the value field is in the fact table. It is, of course, possible to generate the flag field in the fact table instead. In this case, the calculation is likely to run even faster, especially on very large datasets, because no join of data tables is required. The thing to bear in mind is that the additional pointer indexes in the Calendar table require relatively little space, whereas the additional width of the fact table, because of its large number of rows, is something to consider. That said, the pointers to the flag values are very small, so you need a really long fact table for it to make a big difference. In some cases, the additional bit necessary to store the pointer in the bit-stuffed table will not make any difference at all, and in other cases, it may add just one byte.
Set Analysis can be very powerful, but it is worth considering that it often has to go, depending on the formula, outside the current state space, and that will cause additional calculation to take place that may be achieved in a simpler manner by creating a flag field in the script and using it in this way. Even if you have to use Set Analysis, the best performing comparisons are going to be using numeric comparisons, so creating a numeric flag instead of a text value will improve the set calculation performance. For example, consider the following expression:
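A sketch, reusing the numeric YTD_Flag field from earlier:

    Sum({<YTD_Flag = {1}>} LineValue)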
This will execute much faster than the following expression:
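And a sketch of the slower text equivalent, assuming a hypothetical text flag field named PeriodType:

    Sum({<PeriodType = {'YTD'}>} LineValue)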
So, when should we use Set Analysis instead of multiplying by flags? Barry Harmsen has done some testing that indicates that if the dimension table is much larger relative to the fact table, then using Set Analysis is faster than the flag fields. The reasoning is that the multiply method will process all records (even those containing 0), so in larger tables, it has more to process. The Set Analysis method will first reduce the scope, and apply the calculation to that subset.
Of course, if we have to introduce more advanced logic, that might include AND/OR/NOT operations, then Set Analysis is the way to go—but try to use numeric flags.
Any time that you need to sort a chart or listbox, that sort needs to be calculated. Of course, a numeric sort will always be the fastest. An alphabetic sort is a lot slower, just by its nature. One of the very slowest sorts is where we want to sort by expression.
For example, let's imagine that we wish to sort our Country list by a fixed order, defined by the business. We could use a sort expression like this:
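For instance, a sketch with an illustrative fixed order:

    Match(Country, 'USA', 'Canada', 'Germany', 'France', 'Japan')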
The problem is that this is a text comparison that will be continually evaluated. What we can do instead is to load a temporary sort table in the script. We load this towards the beginning of the script because it needs to be the initial load of the symbol table; something like the following:
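A sketch of such a load, placed at the top of the script so that the Country symbol table is built in the desired order:

    CountrySort:
    LOAD * INLINE [
        Country
        USA
        Canada
        Germany
        France
        Japan
    ];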
Then, as we won't need this table in our data, we should remember to drop it at the end of the script—after the main data has been loaded:
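With the illustrative table above, that is simply:

    DROP TABLE CountrySort;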
Now, when we use this field anywhere, we can turn off all of the sort options and use the last one—Load Order. This doesn't need to be evaluated so will always calculate quickly:
Traditionally, QlikView has been a totally in-memory tool. If you want to analyze any information, you need to get all of the data into memory. This has caused problems for many enterprise organizations because of the sheer size of data that they wanted to analyze. You can get quite a lot of data into QlikView—billions of rows are not uncommon on very large servers, but there is a limit. Especially in the last few years where businesses have started to take note of the buzz around Big Data, many believed that QlikView could not play in this area.
Direct Discovery was introduced with QlikView Version 11.20. In Version 11.20 SR5, it was updated with a new, more sensible syntax. This syntax is also available in Qlik Sense. What Direct Discovery does is allow a QlikView model to connect directly to a data source without having to load all of the data into memory. Instead, we load only dimension values and, when necessary, QlikView generates a query to retrieve the required results from the database.
Of course, this does have the potential to reduce some of the things that make QlikView very popular—the sub-second response to selections, for example. Every time a user makes a selection, QlikView generates a query to pass through to the database connection. The faster the data connection, the faster the response, so a performant data warehouse is a boon for Direct Discovery. But speed is not always everything—with Direct Discovery, we can connect to any valid connection that we might normally connect to with the QlikView script; this includes ODBC connectors to Big Data sources such as Cloudera or Google.
Here we will get an introduction to using Direct Discovery, but we should also read the more detailed technical documentation published on Qlik Community, for example, the SR5 technical addendum at http://community.qlik.com/docs/DOC-3710.
There are a few restrictions of Direct Discovery that will probably be addressed with subsequent service releases:
