The Business Intelligence Semantic Model (BISM) is a technology designed to deliver analytical information to users through a variety of mechanisms, including model structure, definition, and design. This book demonstrates how to create BISM models so that information can be presented to users in an intuitive and easy-to-use format. Once the model is defined, we also show you how it can be managed and maintained so that the data in it remains current and secure.
Microsoft Tabular Modeling Cookbook is a comprehensive guide to developing, managing, and using analytical models built on the Business Intelligence Semantic Model (BISM). This title covers a range of modeling situations and common data analysis problems to show you the techniques required to turn data into information using tabular modeling.
Microsoft Tabular Modeling Cookbook examines three areas of tabular modeling: model development, model management and maintenance, and reporting. This book is a practical guide on how to develop semantic models and turn business data into information. It covers all phases of the model lifecycle from creation to administration and finally reporting. It also shows you how to create models which are designed to analyze data.
All stages of BISM modeling, from development through management to reporting, are covered. The sections on development examine in detail the wide range of techniques and tricks required to build models, including moving data into the model, structuring the model to manipulate the data, and finally the formulas required to answer common business questions.
Finally, the book examines methods of reporting on the data within the model, including the creation of data-driven workbooks and reports for a powerful end user experience.
Copyright © 2013 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing and its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: December 2013
Production Reference: 1171213
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78217-088-4
www.packtpub.com
Cover Image by Neston Simeoes (<[email protected]>)
Author
Paul te Braak
Reviewers
Anindita Basak
Steve Hughes
Cosmin Ioan
Stevo Smočilac
Acquisition Editors
Sam Birch
Edward Gordon
Lead Technical Editor
Ankita Shashi
Technical Editors
Pankaj Kadam
Pramod Kumavat
Adrian Raposo
Project Coordinator
Shiksha Chaturvedi
Copy Editors
Alisha Aranha
Sayanee Mukherjee
Deepa Nambiar
Alfida Paiva
Laxmi Subramanian
Proofreader
Linda Morris
Indexer
Rekha Nair
Graphics
Yuvraj Mannari
Production Coordinator
Nilesh R. Mohite
Cover Work
Nilesh R. Mohite
Paul te Braak (<[email protected]>) is a leading Business Intelligence Consultant based in Australia. He has been involved in Information Management for over 15 years, with the past 9 years focusing on the Microsoft Business Intelligence stack. His areas of interest include data modeling, data mining, and visualization. He is an active participant in the SQL Server community, speaks at various local and international events, and organizes a regional SQL Server Saturday. His blog can be found at www.paultebraak.wordpress.com.
I would like to thank everyone who has contributed to this book. Like most projects, there are many behind-the-scenes people who have assisted me, and I am truly grateful to those people. The book would never have been complete without your help!
Firstly, I'd like to thank my wife for her understanding and acceptance during the project when I spent nights and weekends working. I am sure that my responsibilities at home have decreased (well, they've been removed), and this has afforded me the time to focus on writing.
I would also like to thank Cathy Dumas for recommending me to Packt Publishing at the start of the project, and all the reviewers who have provided their input along the way and looked over my work with an objective view. I would like to thank the members of the community who are always willing to participate in events and forums that help us improve our knowledge.
Finally, I'd like to thank Packt Publishing and all the associated staff (believe me, there have been quite a few) for the opportunity to write for them.
Anindita Basak is currently working as a senior system analyst at Sonata Software in the Windows Azure Pro Direct Delivery group of Microsoft. She has worked as a senior software engineer on the implementation of various enterprise applications on Windows Azure and Windows Phone. She started her journey with Windows Azure in the Microsoft Cloud Integration Engineering (CIE) team and worked as a support engineer in Microsoft India (R&D) Pvt. Ltd. With six years of experience in the Microsoft .NET technology stack, she is solely focused on the cloud and Microsoft mobility. As an MVB, she loves to share her technical experience and expertise through her blog at http://anindita9.wordpress.com.
She recently worked as a technical reviewer for the books HDInsight Essentials and Microsoft SQL Server 2012 with Hadoop by Packt Publishing.
She holds a B.E in Information Technology from West Bengal University of Technology (formerly IIIT Calcutta). She has attended various business conferences and technology seminars of Microsoft.
I would like to thank my grandpapa, Mr. Kanti Das Basak; my mom, Anjana; my dad, Ajit Kumar Basak; and my affectionate brother, Aditya. Without their help, I could not have achieved any of my life's goals.
Steve Hughes is a Practice Lead at Magenic. In his current role, he develops strategy and helps guide data, Business Intelligence, collaboration, and data integration development using Microsoft technologies, including SQL Server, SharePoint, and BizTalk. He continues to deliver data and Business Intelligence solutions using these platforms. He has been working with technology for over 15 years with much of that time spent on creating Business Intelligence solutions. He is passionate about using data effectively and helping customers understand that data is valuable and profitable. Steve can often be found at Professional Association for SQL Server (PASS) events, where he serves as a regional mentor and is active with the Minnesota SQL Server User Group (MNPASS). He shares his insights about the field on his blog at http://dataonwheels.wordpress.com.
I would like to thank my family for their continued support on these projects.
Cosmin Ioan is a data warehouse and Business Intelligence architect with over 16 years' experience in the Information Technology field, spanning development languages, systems administration, RDBMS and OLAP design, architecture, troubleshooting, and scalability on the Microsoft, Oracle, and Sybase platforms. He has worked in consultancy and full-time roles for public and private companies alike, such as Motorola, Citrix, Aetna, and Sheridan Healthcorp, chiefly building data warehouse and systems integration solutions that allow companies to better harness and give meaning to their data assets.
When not working, Cosmin enjoys scuba diving and racquet sports.
Writing a technical book and reviewing one are never easy tasks. Given the inherent time constraints and ever-changing technology, keeping a delicate balance between product depth and breadth, while serving the target audience of any one technical product, is always a challenging proposition. My thanks to Paul for taking up such a challenge and allowing me to be part of the effort, and to the fine team at Packt Publishing for their endeavor to publish quality technical books.
Stevo Smočilac is an associate principal consultant at Magenic, a Microsoft Gold Certified Partner, who specializes in Business Intelligence solutions.
He has over 12 years' experience in software development, the last seven of which have focused on designing, implementing, managing, and administering technical solutions developed using Microsoft SQL Server and the Microsoft Business Intelligence stack. He has been involved in all phases of the BI development lifecycle, from envisioning through operational support, and he is passionate about the field of Business Intelligence.
Stevo is currently a Virtual Technology Solutions Professional (V-TSP) for Business Intelligence, a Microsoft Certified IT professional, and holds a B.Tech degree in Information Technology.
Originally from South Africa, he now resides in (the much colder) Northeastern United States with his wife Talya.
You might want to visit www.PacktPub.com for support files and downloads related to your book.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
http://PacktLib.PacktPub.com
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read and search across Packt's entire library of books.
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.
Get notified! Find out when new books are published by following @PacktEnterprise on Twitter, or the Packt Enterprise Facebook page.
In 2010, Microsoft announced a change to its Business Intelligence environment, saying that it would focus its development efforts on semantic modeling. The technology then used for analysis was SQL Server Analysis Services (SSAS), which relied on disk-based storage and the distinct steps of model development, deployment, and processing—a function usually under the control of IT. The new technology would house all its data in memory and allow the user (or model designer) to change the model in real time and view those changes instantaneously. In addition, the platform sought to remove many of the barriers that had existed in the traditional Business Intelligence landscape. It offered a uniform platform for data analysis across an entire organization: the same model can now be used by an individual user in Excel, deployed to SharePoint (for team Business Intelligence), or deployed directly to a server (for corporate Business Intelligence). This removes a large proportion of the rework traditionally involved in Business Intelligence projects and led to the catchcry "BI to the masses" (meaning that anyone can model a Business Intelligence solution). A free add-in was released for Excel 2010, and the 2012 release of Analysis Services (in SQL Server) included a new storage mode called tabular.
This was an interesting challenge to the traditional methods of implementing Business Intelligence models. Under the traditional structure, Business Intelligence was essentially controlled by an IT department that used a waterfall methodology; an analytical project had distinct phases involving the separation of duties and, more importantly, the separation of people. Those who had to use the data models were often drawn into a back-and-forth battle to make the model work as the business user required.
Tabular models were then introduced, and overnight Excel users were able to consume massive amounts of data and create their own analytical models without the need to involve IT (other than for access to the data, of course!). The product extended the familiar pivot table by allowing users to create pivot tables from many different data sources (and removed the requirement for a pivot table to be sourced from a single data table). More importantly, the ability to create models for the analysis of data was delivered directly to those who needed it most—the analytical end users. The restrictions on analysis and data manipulation that they had previously encountered were removed.
This book is primarily written for those users—individuals who need to answer questions based on large amounts of data. For this reason, we focus on how these users can use that technology to build models in Excel using PowerPivot. We simply don't want to exclude those users who need it the most and do not have access to the more traditional tools developed for corporate BI. Furthermore, these techniques are also directly applicable to corporate tabular models.
Finally, the book looks at how these models can be managed and incorporated into production environments and corporate systems to provide robust and secure reporting systems.
Chapter 1, Getting Started with Excel, covers the basics of the tabular model, that is, how to get started with modeling and summarizing the data. This chapter includes a basic overview of how the tabular model works and how the model presents to an end user (we also look at some general data modeling principles, so that you can better understand the underlying structure of the datasets that you use). In doing so, we look at the basics of combining data within the model, calculations, and the control (and formatting) of what an end user can see.
Chapter 2, Importing Data, examines how different forms of data can be incorporated and managed within the model. In doing so, we examine some common sources of data which are used (for example, text files) and examine ways that these sources can be controlled and defined. We also examine some non-traditional sources (for example, data that is presented in a report).
Chapter 3, Advanced Browsing Features, examines how the model can be structured to provide an intuitive and desirable user experience. We examine a variety of techniques that include model properties and configurations, data structures and design styles, which can be used to control and present data within the model. We also examine how to create some common analytical features (for example, calculation styles, value bounds, ratios, and key performance indicators) and how these can be used.
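A ratio of the kind mentioned above can be sketched as a DAX measure. This is an illustrative example only; the table and column names (Sales, sales_amount) are assumptions and not necessarily the book's exact sample:

```dax
-- Share of overall sales: the numerator respects the current
-- filter context, while the denominator removes any filter
-- placed on the Products table
Product Share :=
DIVIDE (
    SUM ( Sales[sales_amount] ),
    CALCULATE ( SUM ( Sales[sales_amount] ), ALL ( Products ) )
)
```

Placed in a pivot table against product rows, such a measure shows each product's contribution to the unfiltered total.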
Chapter 4, Time Calculations and Date Functions, explains how time and calendar calculations are added and used within the model. This chapter looks at defining the commonly used month-to-date and year-to-date calculations, as well as comparative calculations (for example, the same period last year). We also look at alternate calendars (for example, the 445 calendar), running averages, and shell calculations.
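As a taste of the patterns covered in that chapter, a year-to-date measure and its prior-year comparison can be sketched as follows (table and column names are assumptions, not the book's exact sample):

```dax
-- Year-to-date sales, accumulated along the Dates table
Sales YTD :=
TOTALYTD ( SUM ( Sales[sales_amount] ), Dates[date] )

-- Sales for the same period in the previous year
Sales LY :=
CALCULATE (
    SUM ( Sales[sales_amount] ),
    SAMEPERIODLASTYEAR ( Dates[date] )
)
```

Both rely on a properly marked date table, which is one of the configuration steps the chapter walks through.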
Chapter 5, Applied Modeling, discusses some advanced modeling functionality and how the model can be used to manipulate its own data, thus presenting new information. For example, we look at the dynamic generation of bins (that is, the grouping of data), currency calculations, many-to-many relationships, and stock calculations over time. We also look at how the model can be used to allocate its own data, so that datasets imported into the model at various levels of aggregation can be presented under a consistent view.
Chapter 6, Programmatic Access via Excel, explains how the tabular model can open a new world of possibilities for analysis in Excel by allowing the creation of interactive reports and visualizations that combine massive amounts of data. This chapter looks at how Excel and the tabular model can be used to provide an intuitive reporting environment through the use of VBA (Visual Basic for Applications), Excel's internal programming language.
Chapter 7, Enterprise Design and Features, examines the corporate considerations of tabular model design and the additional requirements of the model in that environment. We look at the various methods of upgrading PowerPivot models, perspectives, and the application of security.
Chapter 8, Enterprise Management, examines how the model is managed in a corporate environment (that is, on SQL Server Analysis Services). This chapter looks at various techniques for deploying the tabular model to an SSAS server and the manipulation of objects once they have been deployed (for example, the addition and reconfiguration of data sources). We look at the addition of new data to the model through partitions and the processing of the model data through SQL Server Agent jobs.
Chapter 9, Querying the Tabular Model with DAX, shows how to query the model using the language of the tabular model—DAX (Data Analysis Expressions). We look at how to retrieve data from the model and then go on to combine data from different parts of the model, create aggregate summaries and calculations, and finally filter data.
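As an illustrative taste of such a query, the following sketch returns total sales by product (the table, column, and renamed-field names here are assumptions based on the sample model, not the book's exact query):

```dax
-- Total sales by product name, evaluated as a table
EVALUATE
ADDCOLUMNS (
    VALUES ( Products[Product Name] ),
    "Total Sales", CALCULATE ( SUM ( Sales[sales_amount] ) )
)
ORDER BY Products[Product Name]
```

Queries such as this can be issued against a deployed model from SQL Server Management Studio or any client that speaks to Analysis Services.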
Chapter 10, Visualizing Data with Power View, explains how Power View can be used to analyze data in tabular models. This chapter looks at how to use Power View and how to configure and design a tabular model for use with Power View.
Appendix, Installing PowerPivot and Sample Databases, shows how to install PowerPivot in Excel 2010 and install the sample data used in this book.
As a book which covers many aspects of tabular modeling, the recipes can be followed using a variety of software that incorporates tabular modeling. Although we focus on PowerPivot in Excel 2010 (this is still the most prevalent installation in corporate environments), the recipes can also be completed in Excel 2013. When recipes focus on server and corporate features, SQL Server Analysis Services 2012 (in tabular storage mode) is used. The complete list of software applications used in this book is:
This book is designed for two types of users. First and foremost, it is designed for those users who wish to create tabular models for analysis, regardless of whether they create models for personal use in Excel (using PowerPivot) or server-based models deployed to Analysis Services. For those modelers, we show how to design, create, and manipulate the model so that it can be used to answer the types of questions that arise in business. For these users and consumers of model data, we also show how the model can be used to provide intuitive and interactive reports (both in Excel and Power View). Our goal for these users is to give them the skills to build a model capable of answering their business questions.
The second category of users comprises those who are responsible for the maintenance of models in corporate environments. These are administrators who must ensure that the corporate model data is up-to-date and secure. For these users, we show tricks and techniques to deploy the model and keep it running smoothly.
In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "Each product is identified by a product_id value."
A block of code is set as follows:
New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "Then on the PowerPivot tab, click on the Create Linked Table button."
Warnings or important notes appear in a box like this.
Tips and tricks appear like this.
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.
To send us general feedback, simply send an e-mail to <[email protected]>, and mention the book title via the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.
Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated material.
We appreciate your help in protecting our authors, and our ability to bring you valuable content.
You can contact us at <[email protected]> if you are having a problem with any aspect of the book, and we will do our best to address it.
In this chapter, we will cover:
This chapter is designed as an introduction to tabular modeling using PowerPivot. It shows the process by which a user imports data into PowerPivot for Excel, creates relationships between the datasets, and then reports on the data.
The data used in this chapter is based on the orders of a fictitious bicycle company named Adventure Works. The data comprises six datasets:
The tabular modeling lifecycle revolves around three primary steps. These are:
This chapter examines these steps and allows the reader to become familiar with the tabular (PowerPivot) design environment.
An Excel workbook can contain only one tabular model, and that model contains tables of data (which may or may not be related). The first step in creating a model is to import data into it. There are many techniques for doing this, and some have advantages over others, but for now let's consider only loading data that exists in an Excel worksheet into the model.
The installation instructions for PowerPivot in Excel 2010 are covered in the Appendix, Installing PowerPivot and Sample Databases, of this book.
Open the Excel workbook named SalesBook which is available from the Packt Publishing website to examine the worksheets within the book. Each sheet contains a dataset for Products, Subcategories, Categories, Customers, Dates, and Sales.
This recipe looks at importing data into the PowerPivot model through linked tables. These are very convenient to use when the data is stored in Excel. Additionally, once the data has been imported into PowerPivot, it retains a connection to the Excel table. This means that when the data is changed in Excel, the change also flows through to the PowerPivot model.
Excel will automatically highlight the data range.
When a linked table is created in PowerPivot, Excel creates a named range in the Excel workbook. This is then linked to the PowerPivot model (note that there is a small chain symbol before each of the tables). Also, note that the tables in Excel are formatted with alternate blue coloring. The named ranges can be viewed in Excel by clicking on the Name Manager button on the Formulas tab.
A table (table range) is actually an Excel feature that PowerPivot utilizes. A table can be defined in Excel, given a meaningful name and then imported into PowerPivot, so that the name of the table in PowerPivot is the same as the named range in Excel.
Ensure that the Customers sheet is selected in Excel and that a cell within the Customers data is selected. On the Home tab, click on the Format as Table button and choose a table style; any style is suitable, and the one chosen here is shown in the following screenshot:
Note that the data is now formatted with alternating colors (based on the selected style). Return to the Name Manager window and double-click the table that relates to the Customers worksheet. A new window will open, allowing you to edit the name. Replace the name Table6 with Customers and click on OK. The Table6 name is replaced by Customers in the Name Manager window.
Now, create a linked table in the same manner as we did before and note that the name of the table imported into PowerPivot is Customers.
If you want to select an entire table in Excel, simply choose the table name from the Name Box drop-down list in the formula bar in the upper-left corner. This is shown in the following screenshot:
A PowerPivot workbook contains two products that allow the user to analyze data. Firstly, there is the xVelocity in-memory analytics engine (the tabular model), which is a columnar database embedded in the workbook. Secondly, there is a client tool that allows the model to be queried; it also displays the results to the user in the form of a pivot table or pivot chart. In Excel 2010, the client tool was restricted to pivot table functionality (that is, a pivot table or pivot chart). In Excel 2013, the tool set has been extended to include Power View. The important distinction here is that the client tool is used to present the model to the user. This recipe shows how to control the way the model is presented to the user.
This recipe uses the model that has already been created in the prior recipe Creating the model. If this model has not been created, follow the recipe to ensure that the model has been loaded with data.
Start from an existing model within PowerPivot.
Before
After
Table       | Column           | New name
------------|------------------|--------------
Products    | product_id       | Product ID
Products    | product_name     | Product Name
Products    | colour           | Colour
Products    | size_range       | Size Range
Products    | size             | Size
Subcategory | subcategory_name | Subcategory
Category    | category_name    | Category
Dates       | date             | Day
Dates       | year             | Year
Dates       | month_name       | Month
Dates       | half_name        | Half
Sales       | order_number     | SO Number
Customers   | customer_id      | Customer ID
Customers   | customer_name    | Customer Name
Customers   | country_code     | Country Code
Customers   | state_code       | State Code
You can also rename fields by right-clicking on the field and selecting Rename Column from the pop-up in PowerPivot. Alternatively, you can double-click on the field name (so that it changes the color of the field) and rename it.
PowerPivot mimics Excel in the way that you can select multiple fields by dragging your mouse across several columns (with the left button continually pressed). You can also select the first column, hold the Shift key, and select the final column.
Unlike Excel, multiple columns cannot be selected by using the Ctrl key and selecting multiple fields.
You can achieve the same result by dragging the Day field and dropping it in the Row Labels area of the pivot.
The semantic model defines the metadata structure of the model and includes information such as table names, column names, and data presentation formats. The model designer interacts with the semantic model through its presentation layer in a real-time manner (note that the model did not have to be deployed to a server), so that the changes made in the model are immediately available to the user.
The modeling environment behaves in a What You See Is What You Get (WYSIWYG) manner which means that any changes made to the design environment are reflected in the model that is presented to the user.
There are two methods that the model designer can use to examine the structure of the model. So far, we have only examined the data view. The diagram view shows all tables and columns (including hierarchies) that are used within the model and presents them on a design surface. This is shown in the next recipe.
In addition to table names, column names, and data formats, a semantic model defines how tables within a model relate to each other. This relationship is important because it defines the output of calculations (which are defined in the model). This recipe shows how to create relationships and the effect that these relationships have on the model.
This recipe assumes that the model in the recipe Managing the appearance of tables and fields has been created.
The reader should recognize that the model is designed to show sales information by product, date, and customer. This type of modeling scenario is commonly referred to as a star schema and is shown in the following diagram. The Sales table is referred to as a fact table (since it stores the data facts that we wish to analyze—sales amount, tax amount, and so on) and the other tables are referred to as dimension (subject) tables because they hold descriptive information.
Extending the model further, the Products table is linked to the Subcategory table, and the Subcategory table is linked to the Category table. This is shown in the following diagram and is sometimes called a snowflake schema, since the dimension tables are not directly connected to the fact table:
An important point to note is that each dimension table has a unique identifying field; for example, a product is uniquely identified in the Products table by the product_id field. This is commonly referred to as the primary key of the table.
In contrast, the referring column (product_id in the Sales table) can contain many occurrences of each product_id value and is commonly referred to as a foreign key.
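Once a relationship is defined between these keys, DAX can traverse it from the many side to the one side. For instance, a calculated column on the Sales table can pull in the matching product name. This is a sketch only; the renamed column Product Name is assumed from the earlier recipe:

```dax
-- Calculated column on the Sales table: follows the
-- Sales[product_id] -> Products[product_id] relationship
= RELATED ( Products[Product Name] )
```

RELATED only works in the direction of the relationship (from the fact row to its single matching dimension row), which is why the distinction between primary and foreign keys matters.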
Start with the workbook that was developed in the prior recipe.
Source table
Source column
Related table
Related column
Sales
customer_id
Customers
Customer ID
Sales
order_date
Dates
Day
Relationships can be created using this window. When the Create button is clicked, the same Create Relationships window opens; however, the Create Relationships window is not populated with the source table and columns.
You can also switch between the data and diagram views by toggling the two buttons at the bottom-right side of the PowerPivot application status bar.
