Build powerful data analytics applications with this business intelligence tool and overcome all your business challenges
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: December 2018
Production reference: 1191218
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78995-599-6
www.packtpub.com
mapt.io
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Miguel Ángel García is a Business Intelligence consultant and Qlik Solutions Architect from Monterrey, Mexico. Having worked throughout many successful Qlik implementations, from inception through implementation, and performed across a wide variety of roles on each project, his experience and skills range from pre-sales to applications development and design, technical architecture, system administration, as well as functional analysis and overall project execution. He currently holds the QlikView Designer, QlikView Developer, and QlikView System Administrator Certifications.
Barry Harmsen is a Business Intelligence Consultant based in the Netherlands. Here he runs Bitmetric, a boutique consulting firm specialized in the Qlik product suite. Originally from a background of traditional business intelligence, data warehousing, and performance management, in 2008 Barry made the shift to Qlik and a more user-centric form of Business Intelligence. Since then, he and his team have helped many clients get the most out of their investments in the Qlik platform.
Barry is one of the four Qlik experts teaching at the Masters Summit for Qlik, an advanced 3-day training for experienced Qlik professionals. He also runs the Amsterdam chapter of the Qlik Dev Group, an open and informal gathering where Qlik professionals can share knowledge and experiences.
Stephen Redmond is the CTO and Qlik Luminary at CapricornVentis - a QlikView Elite Partner. He is the author of several books, including QlikView for Developers Cookbook and QlikView Server and Publisher, both published by Packt Publishing. He is also the author of the popular DevLogix series for SalesLogix developers. In 2006, after many years of working with CRM systems, reporting and analysis solutions, and data integration, Stephen started working with QlikView. Since then, CapricornVentis has become QlikView's top partner in the UK and Ireland territories, and with Stephen as the head of the team, they have implemented QlikView for a wide variety of enterprise and large-business customers across a wide range of sectors, from the public sector to financial services to large retailers. In 2014, Stephen was awarded Luminary status by Qlik in recognition of his product advocacy. He regularly contributes to online forums, including the Qlik Community.
Karl Pover is the owner and principal consultant of Evolution Consulting, which provides QlikView consulting services throughout Mexico. Since 2006, he has been dedicated to providing QlikView presales, implementation, and training for more than 50 customers. He is the author of Learning QlikView Data Visualization, and he has also been a Qlik Luminary since 2014.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
QlikView is one of the most flexible and powerful business intelligence platforms around, and if you want to transform data into insights, it is one of the best options you have at hand. Use this Learning Path, to explore the many features of QlikView to realize the potential of your data and present it as impactful and engaging visualizations.
Each chapter in this Learning Path starts with an understanding of a business requirement and its associated data model and then helps you create insightful analysis and data visualizations around it. You will look at problems that you might encounter while visualizing complex data insights using QlikView, and learn how to troubleshoot these and other not-so-common errors. This Learning Path contains real-world examples from a variety of business domains, such as sales, finance, marketing, and human resources.
With all the knowledge that you gain from this Learning Path, you will have all the experience you need to implement your next QlikView project like a pro.
This Learning Path is designed for developers who want to go beyond their technical knowledge of QlikView and understand how to create analysis and data visualizations that solve real business needs. To grasp the concepts explained in this Learning Path, you should have a basic understanding of the common QlikView functions and some hands-on experience with the tool.
Chapter 1, Performance Tuning and Scalability, is where we look at understanding how QlikView stores its data so that we can optimize that storage in our applications. We will also look at topics such as Direct Discovery and testing implementations using JMeter.
Chapter 2, QlikView Data Modeling, looks in detail at dimensional data modeling; we learn about fact and dimension tables and apply Ralph Kimball's best practices in QlikView. We also learn how to handle slowly changing dimensions (SCDs), multiple fact tables, and drilling across with document chaining.
Chapter 3, Best Practices for Loading Data, is where we look at implementing ETL strategies with QVD files. We also introduce QlikView Expressor.
Chapter 4, Advanced Expressions, is where we look at areas such as the Dollar-sign Expansion, set analysis, and vertical calculations using Total and Aggr.
Chapter 5, Advanced Scripting, looks at optimizing loads, Dollar-sign Expansion in the script, and control structures. We also introduce the concept of code reuse.
Chapter 6, What's New in QlikView 12, presents a summary of the changes in the QlikView software, as well as in the Qlik ecosystem in general, since the previous edition of this book was published in 2012, bringing you up to speed with the past few years.
Chapter 7, Styling Up, will help us learn how to style our QlikView documents. We will learn about the various document and sheet properties and will use them to manage the visual style of our document. We will also take a closer look at some of the most fundamental objects and learn how we can change their appearance.
Chapter 8, Building Dashboards, introduces us to the three basic types of QlikView users, and how we can best cater to their needs. We will learn about the various charting options that are available in QlikView, and will see how we can add interactivity to our QlikView documents. We will also be introduced to basic calculations.
Chapter 9, Advanced Data Transformation, returns to the topic of data transformation. We will learn about the most commonly used data architectures that can ease QlikView development and administration. Next, we will take a close look at aggregating and sorting data in the data model. In the final part of the chapter, we will learn how to take advantage of some of QlikView's most powerful data transformation capabilities.
Chapter 10, Security, shows how to secure our QlikView documents. We will see how we can allow only authorized users to open our documents and will learn how we can limit what a user can do and see within our document.
Chapter 11, Data Visualization Strategy, begins our journey to create a data-driven organization using QlikView.
Chapter 12, Sales Perspective, explains the data model's importance to data visualization, and shows us how to create advanced analyses, such as customer stratification, churn prediction, and seasonal trends.
Chapter 13, Financial Perspective, illustrates the usage of metadata to format an income statement, a balance sheet, and a cash flow statement.
Chapter 14, Marketing Perspective, walks us through various types of visualization that reveal customer profiles, potential markets, social media sentiment, and the sales pipeline.
Chapter 15, Working Capital Perspective, describes how to analyze days sales of inventory, days sales outstanding, and days payable outstanding, at both a high and a detailed level. It also explains their importance in determining customer stratification.
Chapter 16, Operations Perspective, shows us how to analyze our service levels, predict supplier lead times, and investigate whether on-time deliveries depend on the supplier.
Chapter 17, Human Resources, reveals how to visualize personnel productivity and perform personnel behavior analysis.
Chapter 18, Fact Sheets, demonstrates an ad hoc design method to create a customer fact sheet that includes bullet graphs, sparklines, and a customized UX.
Chapter 19, Balanced Scorecard, details a more formal design method to build an information dashboard containing balanced scorecard metrics.
Chapter 20, Troubleshooting Analysis, takes a look at resources and methods to debug problems in our QlikView applications.
Chapter 21, Mastering Qlik Sense Data Visualization, explains what Qlik Sense means to a QlikView developer and proposes a plan to master Qlik Sense data visualization.
To use this book, you primarily need the QlikView Desktop software. With regard to computer requirements, you will need a PC with Windows XP or later, 2 GB of hard disk space, and 2 GB of RAM. A 64-bit machine is required if you want to use QlikView 12 or a higher version, and is the recommended environment for this book and QlikView development in general. If you prefer to use a 32-bit machine, you can install QlikView 11 instead.
For best understanding, a general knowledge of BI and its terminology is required. Basic understanding of databases and SQL is preferred, but not compulsory for this book.
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
1. Log in or register at www.packtpub.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
- WinRAR / 7-Zip for Windows
- Zipeg / iZip / UnRarX for Mac
- 7-Zip / PeaZip for Linux
The code bundle for the book is also hosted on GitHub https://github.com/PacktPublishing/QlikView-Advanced-Data-Visualization. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. For example: "In the QVScriptGenTool_0_7 64Bit\Analyzer folder there is a ZIP file called FolderTemplate.zip."
A block of code is set as follows:
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
Any command-line input or output is written as follows:
New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "Click on the Execution tab."
Warnings or important notes appear in a box like this.
Tips and tricks appear like this.
Feedback from our readers is always welcome.
General feedback: Email [email protected], and mention the book's title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report it to us. Please visit http://www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit http://authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packtpub.com.
"The way Moore's Law occurs in computing is really unprecedented in other walks of life. If the Boeing 747 obeyed Moore's Law, it would travel a million miles an hour, it would be shrunken down in size, and a trip to New York would cost about five dollars. Those enormous changes just aren't part of our everyday experience."
— Nathan Myhrvold, former Chief Technology Officer at Microsoft, 1995
The way Moore's Law has benefitted QlikView is really unprecedented amongst other BI systems.
QlikView began life in 1993 in Lund, Sweden. Originally titled "QuickView", the product had to be renamed when a copyright on that name could not be obtained, and thus "QlikView" was born.
After years of steady growth, something really good happened for QlikView around 2005/2006—the Intel x64 processors became the dominant processors in Windows servers. QlikView had, for a few years, supported the Itanium version of Windows; however, Itanium never became a dominant server processor. Intel and AMD started shipping the x64 processors in 2004 and, by 2006, most servers sold came with an x64 processor—whether the customer wanted 64-bit or not. Because the x64 processors could support either x86 or x64 versions of Windows, the customer didn't even have to know. Even those customers who purchased the x64 version of Windows 2003 didn't really know this because all of their x86 software would run just as well (perhaps with a few tweaks).
But x64 Windows was fantastic for QlikView! Any x86 process is limited to a maximum of 2 GB of physical memory. While 2 GB is quite a lot of memory, it wasn't enough to hold the volume of data that a true enterprise-class BI tool needed to handle. In fact, up until version 9 of QlikView, there was an in-built limitation of about 2 billion rows (actually, 2 to the power of 31) in the number of records that QlikView could load. On x86 processors, QlikView was really confined to the desktop.
x64 was a very different story. Early Intel implementations of x64 could address up to 64 GB of memory. More recent implementations allow up to 256 TB, although Windows Server 2012 can only address 4 TB. Memory is suddenly less of an obstacle to enterprise data volumes.
The other change that happened with processors was the introduction of multi-core architecture. At the time, it was common for a high-end server to come with 2 or 4 processors. Manufacturers came up with a method of putting multiple processors, or cores, on one physical processor. Nowadays, it is not unusual to see a server with 32 cores. High-end servers can have many, many more.
One of QlikView's design features that benefitted from this was that their calculation engine is multithreaded. That means that many of QlikView's calculations will execute across all available processor cores. Unlike many other applications, if you add more cores to your QlikView server, you will, in general, add more performance.
So, when it comes to looking at performance and scalability, very often, the first thing that people look at to improve things is to replace the hardware. This is valid of course! QlikView will almost always work better with newer, faster hardware. But before you go ripping out your racks, you should have a good idea of exactly what is going on with QlikView. Knowledge is power; it will help you tune your implementation to make the best use of the hardware that you already have in place.
The following are the topics we'll be covering in this chapter:
There are many ways in which you may have learned to develop with QlikView. Some of them may have talked about performance and some may not have. Typically, you start to think about performance at a later stage when users start complaining about slow results from a QlikView application or when your QlikView server is regularly crashing because your applications are too big.
In this section, we are going to quickly review some basic performance tuning techniques that you should, hopefully, already be aware of. Then, we will start looking at how we can advance your knowledge to master level.
Removing unneeded data might seem easy in theory, but sometimes it is not so easy to implement—especially when you need to negotiate with the business. However, the quickest way to improve the performance of a QlikView application is to remove data from it. If you can reduce your number of fact rows by half, you will vastly improve performance. The different options are discussed in the next sections.
The first option is to simply reduce the number of rows. Here we are interested in Fact or Transaction table rows—the largest tables in your data model. Reducing the number of dimension table rows rarely produces a significant performance improvement.
The easiest way to reduce the number of these rows is usually to limit the table by a value such as the date. It is always valuable to ask the question, "Do we really need all the transactions for the last 10 years?" If you can reduce this, say to 2 years, then the performance will improve significantly.
We can also choose to rethink the grain of the data—to what level of detail we hold the information. By aggregating the data to a higher level, we will often vastly reduce the number of rows.
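As a sketch, assuming the transactions are already stored in a QVD file and using illustrative field names, both approaches might look like the following:

    // Keep only the last two years of transactions
    Transactions:
    LOAD *
    FROM Transactions.qvd (qvd)
    WHERE OrderDate >= AddYears(Today(), -2);

    // Alternatively, aggregate to a higher grain: one row per product per day
    DailySales:
    LOAD
        OrderDate,
        ProductID,
        Sum(Quantity)  as Quantity,
        Sum(LineValue) as LineValue
    FROM Transactions.qvd (qvd)
    GROUP BY OrderDate, ProductID;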
The second option is to reduce the width of tables—again, especially Fact or Transaction tables. This means looking at fields that might be in your data model but do not actually get used in the application. One excellent way of establishing this is to use the Document Analyzer tool by Rob Wunderlich to examine your application (http://robwunderlich.com/downloads).
As well as other excellent uses, Rob's tool looks at multiple areas of an application to establish whether fields are being used or not. It will give you an option to view fields that are not in use and has a useful DROP FIELD Statements listbox from which you can copy the possible values. The following screenshot shows an example (from the default document downloadable from Rob's website):
Adding these DROP FIELD statements into the end of a script makes it very easy to remove fields from your data model without having to dive into the middle of the script and try to remove them during the load—which could be painful.
There is a potential issue here; if you have users using collaboration objects—creating their own charts—then this tool will not detect that usage. However, if you use the DROP FIELD option, then it is straightforward to add a field back if a user complains that one of their charts is not working.
Of course, the best practice would be to take the pain and remove the fields from the script by either commenting them out or removing them completely from their load statements. This is more work, because you may break things and have to do additional debugging, but it will result in a better performing script.
Often, you will have a text value in a key field, for example, something like an account number that has alphanumeric characters. These are actually quite poor for performance compared to an integer value and should be replaced with numeric keys.
There is some debate here about whether this makes a difference at all, but the effect is to do with the way the data is stored under the hood, which we will explore later. Generated numeric keys are stored slightly differently than text keys, which makes things work better.
The strategy is to leave the text value (account number) in the dimension table for use in display (if you need it!) and then use the AutoNumber function to generate a numeric value—also called a surrogate key—to associate the two tables.
For example, replace the following:
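A sketch, with illustrative table and field names, keyed on the raw alphanumeric value:

    Accounts:
    LOAD
        AccountNumber,
        AccountName
    FROM Accounts.qvd (qvd);

    Transactions:
    LOAD
        AccountNumber,
        TransactionDate,
        TransactionValue
    FROM Transactions.qvd (qvd);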
With the following:
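The same loads with a generated numeric surrogate key (AccountID is an assumed name):

    Accounts:
    LOAD
        AutoNumber(AccountNumber) as AccountID,
        AccountNumber,  // kept in the dimension table for display
        AccountName
    FROM Accounts.qvd (qvd);

    Transactions:
    LOAD
        AutoNumber(AccountNumber) as AccountID,
        TransactionDate,
        TransactionValue
    FROM Transactions.qvd (qvd);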
It is enormously useful to be able to quickly generate test data so that we can create QlikView applications and test different aspects of development and discover how different development methods work. By creating our own set of data, we can abstract problems away from the business issues that we are trying to solve because the data is not connected to those problems. Instead, we can resolve the technical issue underlying the business issue. Once we have resolved that issue, we will have built an understanding that allows us to more quickly resolve the real problems with the business data.
We might contemplate that if we are developers who only have access to a certain dataset, then we will only learn to solve the issues in that dataset. For true mastery, we need to be able to solve issues in many different scenarios, and the only way that we can do that is to generate our own test data to do that with.
Dimension tables will generally have lower numbers of records; there are a number of websites online that will generate this type of data for you.
The following screenshot demonstrates setting up a Customer extract in Mockaroo:
This allows us to create 1,000 customer records that we can include in our QlikView data model. The extract is in the CSV format, so it is quite straightforward to load into QlikView.
While we might often delegate the creation of test dimension tables to a third-party website like this, we should always try to generate the Fact table data ourselves.
A good way to do this is to simply generate rows with a combination of the AutoGenerate() and Rand() functions.
For even more advanced use cases, we can look at using statistical functions such as NORMINV to generate normal distributions. There is a good article on this written by Henric Cronström on the Qlik Design Blog at http://community.qlik.com/blogs/qlikviewdesignblog/2013/08/26/monte-carlo-methods.
We should be aware of the AutoGenerate() function, which simply generates empty rows of data, and the Rand() function, which returns a random value between 0 and 1.
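As a minimal sketch, with entirely illustrative ranges and field names, we might combine them like this:

    // Generate 100,000 random fact rows for testing
    TestFacts:
    LOAD
        Floor(Rand() * 1000) + 1      as CustomerID,   // 1,000 possible customers
        Floor(Rand() * 200) + 1       as ProductID,    // 200 possible products
        Today() - Floor(Rand() * 730) as OrderDate,    // a date in the last two years
        Round(Rand() * 500, 0.01)     as LineValue     // a value up to 500.00
    AUTOGENERATE 100000;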
QlikView is really good at storing data. It operates on data in memory, so being able to store a lot of data in a relatively small amount of memory gives the product a great advantage—especially as Moore's Law continues to give us bigger and bigger servers.
Understanding how QlikView stores its data is fundamental in mastering QlikView development. Writing load script with this understanding will allow you to load data in the most efficient way so that you can create the best performing applications. Your users will love you.
A great primer on how QlikView stores its data is available on the Qlik Design Blog, written by Henric Cronström (http://community.qlik.com/blogs/qlikviewdesignblog/2012/11/20/symbol-tables-and-bit-stuffed-pointers).
Henric joined QlikView in 1994, so he knows quite a bit about exactly how it works.
At a simple level, consider the following small table:

    First name | Surname | Country
    John       | Smith   | USA
    Jane       | Smith   | USA
    John       | Doe     | Canada
For the preceding table, QlikView will create three symbol tables like the following:
First name:

    Index | Value
    1010  | John
    1011  | Jane

Surname:

    Index | Value
    1110  | Smith
    1111  | Doe

Country:

    Index | Value
    110   | USA
    111   | Canada
And the data table will look like the following:
    First name | Surname | Country
    1010       | 1110    | 110
    1011       | 1110    | 110
    1010       | 1111    | 111
This set of tables will take up less space than the original data table for the following three reasons:
1. Each distinct value is stored only once, in its symbol table, no matter how many times it occurs in the data.
2. The data table holds only index pointers, which are far smaller than the values they point to.
3. The pointers are bit-stuffed, using only as many bits as are needed to represent the number of distinct values in each field.
So, to summarize, each field in the data model will be stored in a symbol table (unless, as we will see later, it is a sequential integer value) that contains the unique values and an index value. Every table that you create in the script—including any synthetic key tables—will be represented as a data table containing just the index pointers.
Because the data table indexes are bit-stuffed, and because data is stored in bytes, adding another bit or two to the indexes may not actually increase the overall width of a data table record.
To help us understand what is going on in a particular QlikView document, we can export details about where all the memory is being used. This export file will tell us how much memory is being used by each field in the symbol tables, the data tables, chart objects, and so on.
Perform the following steps to export the memory statistics for a document:
1. Open the document in QlikView Desktop.
2. Go to Settings | Document Properties and select the General tab.
3. Click on the Memory Statistics button and choose a location to save the .mem file to.
We can now see exactly how much space our data is taking up in the symbol tables and in the data tables. We can also look at chart calculation performance to see whether there are long running calculations that we need to tune. Analyzing this data will allow us to make valuable decisions about where we can improve performance in our QlikView document.
Using some of the test data that we have generated, or any other data that you might want, we can discover more about how QlikView handles different scenarios. Understanding these different situations will give you real mastery over data load optimization.
To begin with, let's see what happens when we load two largish tables that are connected by a key. So, let's ignore the dimension tables and load the order data using a script like the following:
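A sketch of such a load, assuming the two tables have already been extracted to QVD files (the file and field names here are illustrative):

    OrderHeader:
    LOAD
        OrderID,
        OrderDate,
        CustomerID
    FROM OrderHeader.qvd (qvd);

    OrderLine:
    LOAD
        OrderID,
        ProductID,
        Quantity,
        LineValue,
        LineCost
    FROM OrderLine.qvd (qvd);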
The preceding script will result in a database memory profile that looks like the following. In the following screenshot, Database has been selected for Class:
There are some interesting readings in this table. For example, we can see that when the main data table—OrderLine—is stored with just its pointer records, it takes up just 923,085 bytes for 102,565 records. That is an average of only 9 bytes per record. This shows the space benefit of the bit-stuffed pointer mechanism as described in Henric's blog post.
The largest individual symbol table is the OrderDate field. This is very typical of a TimeStamp field, which will often be highly unique, have long decimal values, and have the Dual text value, and so often takes up a lot of memory—28 bytes per value.
The number part of a TimeStamp field contains an integer representing the date (number of days since 30th December 1899) and a decimal representing the time. So, let's see what happens with this field if we turn it into just an integer—a common strategy with these fields as the time portion may not be important:
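As a sketch, we can wrap the field in Floor() during the load:

    OrderHeader:
    LOAD
        OrderID,
        Floor(OrderDate) as OrderDate,  // keep the date part, drop the time
        CustomerID
    FROM OrderHeader.qvd (qvd);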
This changes things considerably:
The number of unique values has been vastly reduced, because the highly unique date and time values have been replaced with a much lower cardinality (2171) date integer, and the amount of memory consumed is also vastly reduced as the integer values are only taking 8 bytes instead of the 28 being taken by each value of the TimeStamp field.
The next field that we will pay attention to is OrderID. This is the key field, and key fields are always worth examining to see whether they can be improved. In our test data, the OrderID field is alphanumeric—this is not uncommon for such data. Alphanumeric data will tend to take up more space than numeric data, so it is a good idea to convert it to integers using the AutoNumber function.
AutoNumber accepts a text value and will return a sequential integer. If you pass the same text value, it will return the same integer. This is a great way of transforming alphanumeric ID values into integers. The code will look like the following:
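A sketch of the change, applying AutoNumber consistently in both loads so that the keys still associate:

    OrderHeader:
    LOAD
        AutoNumber(OrderID) as OrderID,
        Floor(OrderDate)    as OrderDate,
        CustomerID
    FROM OrderHeader.qvd (qvd);

    OrderLine:
    LOAD
        AutoNumber(OrderID) as OrderID,
        ProductID,
        Quantity,
        LineValue,
        LineCost
    FROM OrderLine.qvd (qvd);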
This will result in a memory profile like the following:
The OrderID field now takes up far less memory: because AutoNumber produces sequential integers, QlikView does not need to store the values in a symbol table at all.
As well as information about memory use in each data table and symbol table, we can recall that the Memory Statistics option will also export information about charts—both memory use and calculation time. This means that we can create a chart, especially one with multiple dimensions and expressions, and see how long the chart takes to calculate for different scenarios.
Let's load the Order Header and Order Line data with the Calendar information loaded inline (as in the first part of the last example) in the following manner:
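A sketch of what this might look like, with illustrative calendar derivations in the header table:

    OrderHeader:
    LOAD
        AutoNumber(OrderID) as OrderID,
        Floor(OrderDate)    as OrderDate,
        Year(OrderDate)     as Year,
        Month(OrderDate)    as Month,
        Date(MonthStart(OrderDate), 'YYYY-MM') as YearMonth,
        CustomerID
    FROM OrderHeader.qvd (qvd);

    OrderLine:
    LOAD
        AutoNumber(OrderID) as OrderID,
        ProductID,
        Quantity,
        LineValue,
        LineCost
    FROM OrderLine.qvd (qvd);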
Now we can add a chart to the document with several dimensions and expressions like this:
We have used YearMonth and CustomerID as dimensions. This is deliberate because these two fields will be in separate tables once we move the calendar fields into a separate table.
The expressions that we have used are shown in the following table:
    Expression Label | Expression
    Sales $          | Sum(LineValue)
    Sales $ Color    | ColorMix1(Sum(LineValue)/Max(total Aggr(Sum(LineValue), YearMonth, CustomerID)), White(), ARGB(255, 0, 128, 255))
    Cost $           | Sum(LineCost)
    Margin $         | Sum(LineValue)-Sum(LineCost)
    Margin %         | (Sum(LineValue)-Sum(LineCost))/Sum(LineValue)
    Cum. Sales $     | RangeSum(Above(Sum(LineValue),0,RowNo()))
    # Orders         | Count(DISTINCT OrderID)
    Product 101      | Sum(If(ProductID=101,1,0))
    Product 102-106  | Sum(If(Match(ProductID,102,103,104,105,106), 1, 0))
The cache in QlikView is enormously important. Calculations and selections are cached as you work with a QlikView document. The next time you open a chart with the same selections, the chart will not be recalculated; you will get the cached answer instead. This really speeds up QlikView performance. Even within a chart, you might have multiple expressions using the same calculation (such as dividing two expressions by each other to obtain a ratio)—the results will make use of caching.
This caching is really useful for a working document, but a pain if we want to gather statistics on one or more charts. With the cache on, we need to close a document and the QlikView desktop, reopen the document in a new QlikView instance, and open the chart. To help us test the chart performance, it can therefore be a good idea to turn off the cache.
Note that you need to be very careful with this dialog as you could break things in your QlikView installation. Turning off the cache is not recommended for normal use of the QlikView desktop as it can seriously interfere with the performance of QlikView. Turning off the cache to gather accurate statistics on chart performance is pretty much the only use case that one might ever come across for turning off the cache. There is a reason why it is a hidden setting!
Now that the cache is turned off, we can open our chart and it will always calculate at the maximum time. We can then export the memory information as usual and load it into another copy of QlikView (here, the Class of Sheetobject is selected):
What we could do now is make some selections and save them as bookmarks. By closing the QlikView desktop client and then reopening it, and then opening the document and running through the bookmarks, we can export the memory file and create a calculation for Avg Calc Time. Because there is no cache involved, this should be a valid representation.
Now, we can comment out the inline calendar and create the Calendar table (as we did in a previous exercise):
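A minimal sketch of such a Calendar table, derived from the dates already loaded in the header table:

    Calendar:
    LOAD DISTINCT
        OrderDate,
        Year(OrderDate)  as Year,
        Month(OrderDate) as Month,
        Date(MonthStart(OrderDate), 'YYYY-MM') as YearMonth
    RESIDENT OrderHeader;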
For the dataset size that we are using, we should see no difference in calculation time between the two data structures. As previously established, the second option has a smaller in-memory data size, so that would always be the preferred option.
For many years, it has been a well-established fact among QlikView consultants that a Count() function with a Distinct clause is a very expensive calculation. Over the years, I have heard that Count can be up to 1000 times more expensive than Sum. Actually, since about Version 9 of QlikView, this is no longer true, and the Count function is a lot more efficient.
Once we have optimized our data model, we can turn our focus onto chart performance. There are a few different things that we can do to make sure that our expressions are optimal, and we can use the memory file extract to test them.
Some of the expressions will actually involve revisiting the data model. If we do, we will need to weigh up the cost of that performance with changes to memory, and so on.
It will be useful to begin with an explanation of how the QlikView calculation engine works.
QlikView is very clever in how it does its calculations. As well as the data storage, as discussed earlier in this chapter, it also stores the binary state of every field and of every data table dependent on user selection—essentially, depending on the green/white/grey state of each field, it is either included or excluded. This area of storage is called the state space and is updated by the QlikView logical inference engine every time a selection is made. There is one bit in the state space for every value in the symbol table or row in the data table—as such, the state space is much smaller than the data itself and hence much faster to query.
There are three steps to a chart being calculated:
1. The logical inference engine updates the state space to reflect the user's current selections.
2. Based on that state space, the values required for the calculation are gathered from the data tables and symbol tables.
3. The calculation itself is executed, multithreaded across the available processor cores.
Of course, the very intelligent cache comes into play as well, and everything that is calculated is stored for potential subsequent use. If the same set of selections is met again (such as hitting the Back button), then the calculation is retrieved from the cache and will be almost instantaneous.
Now that we know more about how QlikView performs its calculations, we can look at a few ways that we can optimize things.
We cannot anticipate every possible selection or query that a user might make, but there are often some quite well-known conditions that will generally be true most of the time and may be commonly used in calculations. In this example, we will look at Year-to-Date and Last Year-to-Date—commonly used on dashboards.
The following is an example of a calculation that might be used in a gauge:
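A sketch of the kind of expression meant here, using the field names from the earlier examples:

    Sum(If(YearToDate(OrderDate), LineValue))
      / Sum(If(YearToDate(OrderDate, -1), LineValue))
      - 1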
This uses the YearToDate() function to check whether the date is in the current year to date or in the year to date period for last year (using the -1 for the offset parameter). This expression is a sum of an if statement, which is generally not recommended. Also, these are quite binary—a date is either in the year to date or not—so are ideal candidates for the creation of flags. We can do this in the Calendar table in the following script:
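As a sketch, the flag fields might be created like this:

    Calendar:
    LOAD DISTINCT
        OrderDate,
        Year(OrderDate)  as Year,
        Month(OrderDate) as Month,
        -YearToDate(OrderDate)     as YTD_Flag,   // 1 if in the current year to date
        -YearToDate(OrderDate, -1) as LYTD_Flag   // 1 if in last year's year to date
    RESIDENT OrderHeader;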
Note the - sign before the function. This is because YearToDate is a Boolean function that returns either true or false, which in QlikView is represented by -1 and 0. If the value is in the year to date, then the function will return -1, so I add the - to change that to 1. A - sign before 0 will make no difference.
In a particular test dataset, we might see an increase from 8,684 bytes to 13,026—not an unexpected increase and not significant because the Calendar table is relatively small. We are creating these flags to improve performance in the frontend and need to accept a small change in the data size.
The significant change comes when we change the expression in the chart to the following:
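Using the flag fields, the expression becomes a simple multiplication:

    Sum(LineValue * YTD_Flag)
      / Sum(LineValue * LYTD_Flag)
      - 1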
In a sample dataset, we might see that the calculation reduces from, say, 46 to, say, 16—a 65 percent reduction. This calculation could also be written using Set Analysis as follows:
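A sketch of the Set Analysis equivalent:

    Sum({<YTD_Flag = {1}>} LineValue)
      / Sum({<LYTD_Flag = {1}>} LineValue)
      - 1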
However, this might only get a calc time of 31—only a 32.6 percent reduction. Very interesting!
If we think about it, the simple calculation of LineValue*YTD_Flag is going to do a multithreaded calculation using values that are derived from the small and fast in-memory state space. Both If and Set Analysis are going to add additional load to the calculation of the set of values that are going to be used in the calculation.
In this case, the flag field is in a dimension table, Calendar, and the value field is in the fact table. It is, of course, possible to generate the flag field in the fact table instead. In this case, the calculation is likely to run even faster, especially on very large datasets, because no join of data tables is required. The thing to bear in mind is that the additional pointer indexes in the Calendar table require relatively little space, whereas the additional width of the fact table, because of its large number of rows, is something to consider. That said, the pointers to the flag values are very small, so you need a really long fact table for it to make a big difference. In some cases, the additional bit necessary to store the pointer in the bit-stuffed table will not make any difference at all, and in other cases, it may add just one byte.
Set Analysis can be very powerful, but it is worth considering that it often has to go, depending on the formula, outside the current state space, and that will cause additional calculation to take place that may be achieved in a simpler manner by creating a flag field in the script and using it in this way. Even if you have to use Set Analysis, the best performing comparisons are going to be using numeric comparisons, so creating a numeric flag instead of a text value will improve the set calculation performance. For example, consider the following expression:
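A sketch, reusing the numeric YTD_Flag field from earlier:

    Sum({<YTD_Flag = {1}>} LineValue)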
This will execute much faster than the following expression:
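And a sketch of the slower text equivalent, assuming a hypothetical text flag field named PeriodType:

    Sum({<PeriodType = {'YTD'}>} LineValue)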
So, when should we use Set Analysis instead of multiplying by flags? Barry Harmsen has done some testing that indicates that if the dimension table is much larger relative to the fact table, then using Set Analysis is faster than the flag fields. The reasoning is that the multiply method will process all records (even those containing 0), so in larger tables, it has more to process. The Set Analysis method will first reduce the scope, and apply the calculation to that subset.
Of course, if we have to introduce more advanced logic, that might include AND/OR/NOT operations, then Set Analysis is the way to go—but try to use numeric flags.
Any time that you need to sort a chart or listbox, that sort needs to be calculated. Of course, a numeric sort will always be the fastest. An alphabetic sort is a lot slower, just by its nature. One of the very slowest sorts is where we want to sort by expression.
For example, let's imagine that we wish to sort our Country list by a fixed order, defined by the business. We could use a sort expression like this:
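For instance, a sketch with an illustrative fixed order:

    Match(Country, 'USA', 'Canada', 'Germany', 'France', 'Japan')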
The problem is that this is a text comparison that will be continually evaluated. What we can do instead is to load a temporary sort table in the script. We load this towards the beginning of the script because it needs to be the initial load of the symbol table; something like the following:
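A sketch of such a load, placed at the top of the script so that the Country symbol table is built in the desired order:

    CountrySort:
    LOAD * INLINE [
        Country
        USA
        Canada
        Germany
        France
        Japan
    ];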
Then, as we won't need this table in our data, we should remember to drop it at the end of the script—after the main data has been loaded:
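With the illustrative table above, that is simply:

    DROP TABLE CountrySort;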
Now, when we use this field anywhere, we can turn off all of the sort options and use the last one—Load Order. This doesn't need to be evaluated so will always calculate quickly:
Traditionally, QlikView has been a totally in-memory tool. If you want to analyze any information, you need to get all of the data into memory. This has caused problems for many enterprise organizations because of the sheer size of data that they wanted to analyze. You can get quite a lot of data into QlikView—billions of rows are not uncommon on very large servers, but there is a limit. Especially in the last few years where businesses have started to take note of the buzz around Big Data, many believed that QlikView could not play in this area.
Direct Discovery was introduced with QlikView Version 11.20. In Version 11.20 SR5, it was updated with a new, more sensible syntax. This syntax is also available in Qlik Sense. What Direct Discovery does is allow a QlikView model to connect directly to a data source without having to load all of the data into memory. Instead, we load only dimension values and, when necessary, QlikView generates a query to retrieve the required results from the database.
Of course, this does have the potential to reduce some of the things that make QlikView very popular—the sub-second response to selections, for example. Every time a user makes a selection, QlikView generates a query to pass through to the database connection. The faster the data connection, the faster the response, so a performant data warehouse is a boon for Direct Discovery. But speed is not always everything—with Direct Discovery, we can connect to any valid connection that we might normally connect to with the QlikView script; this includes ODBC connectors to Big Data sources such as Cloudera or Google.
Here we will get an introduction to using Direct Discovery, but we should also read the more detailed technical documentation published on Qlik Community, for example, the SR5 technical addendum at http://community.qlik.com/docs/DOC-3710.
There are a few restrictions of Direct Discovery that will probably be addressed with subsequent service releases:
