139,99 €
Business Intelligence (BI) has emerged as a field which seeks to support managers in decision-making. It encompasses the techniques, methods and tools for conducting analytically-based IT solutions, which are referred to as OLAP (OnLine Analytical Processing). Within this field, SQL has a role as a leader and is continuously evolving to cover both transactional and analytical data management. This book discusses the functions provided by Microsoft® SQL Server 2014/2016 in terms of business intelligence. The analytic functions are considered as an enrichment of the SQL language. They combine a series of practical functions to answer complex analysis requests with all the simplicity, elegance and acquired performance of the SQL language. Drawing on the wide experience of the author in teaching and research, as well as insights from contacts in the industry, this book focuses on the issues and difficulties faced by academics (students and teachers) and professionals engaged in data analysis with the SQL Server 2014/2016 database management system.
Sie lesen das E-Book in den Legimi-Apps auf:
Seitenzahl: 146
Veröffentlichungsjahr: 2019
Cover
Introduction
1 Data Analysis Fundamentals with the SQL Language
1.1. Data at the heart of the information system
1.2. SQL Server and data analysis
2 Queries
2.1. Data filtering
2.2. Sorting data
2.3. Data pagination
2.4. Subqueries
2.5. Options for the FROM clause
3 Operators
3.1. Joins
3.2. Set operators
3.3. Pivoting operators
4 Functions
4.1. Predefined functions
4.2. Aggregation functions
4.3. Windowing functions
4.4. Analytic functions
References
Index
End User License Agreement
Chapter 1
Table 1.1. SQL standards
Table 1.2. SQL Server versions
Chapter 4
Table 4.1. The most commonly used aggregation functions
Table 4.2. Categories of analytic functions used with the windowing mechanism
Introduction
Figure I.1. Number of Indeed job offers by programming language (January 2019)
Figure I.2. Outline of the work
Figure I.3. Mind map of the introduction
Figure I.4. Steps of programming with a procedural language
Figure I.5. Data query with the SQL language
Figure I.6. OLTP versus OLAP
Figure I.7. Code-first versus DB-first
Chapter 1
Figure 1.1. Enterprise architecture
Figure 1.2. Mind map of the first section
Figure 1.3. Database engineering process
Figure 1.4. Features of a DBMS
Figure 1.5. The main relational DBMSs
Figure 1.6. Edgar Frank Codd (23 August 1923–18 April 2003, United Kingdom). Tur...
Figure 1.7. The relational model: a model based on set theory
Figure 1.8. SQL language commands
Figure 1.9. Mind map of the second section
Figure 1.10. DBMS ranking (top 10)
Figure 1.11. SQL Server services
Figure 1.12. SQL Server editions
Figure 1.13. The evolution of SQL analytic functions in Microsoft SQL Server
Figure 1.14. Integration of the R server into SQL Server version 2016
Figure 1.15. Integration of the Python language into SQL Server version 2017
Chapter 2
Figure 2.1. Mind map of the first section
Figure 2.2. Installation of the full-text search service
Figure 2.3. Mind map of the second section
Figure 2.4. Estimated execution plan of a request
Figure 2.5. Mind map of the third section
Figure 2.6. Mental map of the fourth section
Figure 2.7. Operation of autonomous subqueries
Figure 2.8. Operation of correlated subqueries
Figure 2.9. Order of writing and execution of the different clauses of a query
Figure 2.10. Mind map of the fifth section
Figure 2.11. Reasons for using views
Chapter 3
Figure 3.1. Mind map of the first section
Figure 3.2. Cartesian product of countries with continents
Figure 3.3. Relational and set operators of SQL language
Figure 3.4. Mind map of the second section
Figure 3.5. Mind map of the third section
Chapter 4
Figure 4.1. Mind map of the first section
Figure 4.2. Built-in SQL Server functions
Figure 4.3. Mind map of the second section
Figure 4.4. Filters in a query using aggregation functions with the GROUP BY cla...
Figure 4.5. The ROLLUP option
Figure 4.6. The CUBE option
Figure 4.7. The GROUPING SETS option
Figure 4.8. Dashboard with aggregation functions. For a color version of this fi...
Figure 4.9. Mind map of the third section
Figure 4.10. Data windowing with dataset partitioning
Figure 4.11. Mind map of the fourth section
Cover
Table of Contents
Begin Reading
v
iii
iv
ix
x
xi
xii
xiii
xiv
xv
xvi
xvii
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
137
138
139
140
141
142
143
144
145
146
147
148
149
151
Series Editor
Jean-Charles Pomerol
Riadh Ghlala
First published 2019 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:
ISTE Ltd
27-37 St George’s Road
London SW19 4EU
UK
www.iste.co.uk
John Wiley & Sons, Inc.
111 River Street
Hoboken, NJ 07030
USA
www.wiley.com
© ISTE Ltd 2019
The rights of Riadh Ghlala to be identified as the author of this work have been asserted by him in accordance with the Copyright, Designs and Patents Act 1988.
Library of Congress Control Number: 2019939237
British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISBN 978-1-78630-412-4
Nowadays, data analysis is a skill that is in high demand in order to develop indicators and metrics to monitor our information systems. In order to pursue this mission, several methods, techniques and tools have been developed to cover this need in various aspects.
Database Management Systems (DBMS), along with their querying language, Structured Query Language (SQL), are considered a very important path in data analysis. Based on analytic queries directly executed on the server side, a wide range of indicators can be developed without going through intermediate application layers.
Since its invention in the 1980s, SQL has demonstrated its dominance in data management. According to a study conducted by Indeed1, SQL has the highest ranking among skills sought by IT talent researchers.
Figure I.1 shows the position of SQL in relation to other computer languages.
The spread of SQL is explained by its characteristics of simplicity, high performance and diversity. Indeed, this language, which was previously designed for the definition and manipulation of data, has now become essential for analyzing this data.
Figure I.1.Number of Indeed job offers by programming language (January 2019)
This book, composed of four chapters, studies SQL as a tool for data management and analysis.
Although the concepts of this language are standardized, this book presents them with the Microsoft SQL Server Database Management System (DBMS) and focuses primarily on the analytical aspect of this language.
Figure I.2.Outline of the work
This introduction will be devoted to the presentation of SQL as a querying language for transactional and/or analytical data processing. Figure I.3 presents a mind map of the different concepts that will be covered in this introduction.
Figure I.3.Mind map of the introduction
The world of information technology contains a variety of languages, each dealing with a particular area of our information system. Among these languages, we find that the SQL language is designated as a querying language as opposed to the majority of other programming languages such as C#, Java or Python.
Programming languages, also known as procedural languages (we also consider object-oriented languages which are procedural languages with object-oriented mechanisms), are languages that allow developers to express their needs when automating particular tasks in the information system and also to specify to the machine (the computer) the method to apply to execution. Figure I.4 shows the process followed for programming with a procedural language.
Figure I.4.Steps of programming with a procedural language
The strength of querying languages, and in particular SQL, is that they ask users to focus on expressing their needs and to delegate the development of execution plans reflecting methods of execution to the tool used, which is the DBMS. Figure I.5 shows the process followed for requesting data using SQL.
Figure I.5.Data query with the SQL language
Data represents the static axis of any information system. The processing of this data, which is done using the SQL language, is divided into two categories: definition and manipulation. In addition to the definition of data usually performed on the database server side, the manipulation of this data is entrusted to the business logic layer to execute transactions or perform queries.
Transactional processing, also known as OnLine Transactional Processing (OLTP), consists of acting upon the data to change its status by addition, modification or deletion. This type of processing must be subject to very strict controls to ensure proper functioning in a multi-user and distributed context.
Figure I.6.OLTP versus OLAP
Analytical processing, also known as OnLine Analytical Processing (OLAP), consists of interrogating the data in order to develop synthetic information that can be used as indicators for the decision-maker. Figure I.6 compares the two types of processing, OLTP and OLAP.
The IT solutions that manage our information systems present numerous variants of OLAP. Indeed, this analytical processing can be performed with SQL queries directly executed on the database server or by integrating these queries into the application layer programs.
The term “code-first” refers to an approach that favors the use of the business logic layer for the development of a new need in the information system. In the case of analytical processing, this approach consists of developing the indicators requested by managers through programming or through graphical interfaces to generate the necessary programs. This approach has several disadvantages, such as the complexity of the programs to be developed, the security of data exchanges between these programs and the database, and the performance dilemma, especially when handling large amounts of information.
The term “database-first” refers to an approach that promotes, for the development of a new need in the information system, the implementation of this need directly in the database via a functionality provided by the DBMS. In the case of analytical processing, this approach consists of developing the indicators requested by decision-makers through appropriate SQL queries designated by the analytic functions of SQL or simply analytic SQL. This approach is important in information systems because of the simplicity, performance and security of the data in these types of requests. The use of this approach requires the mastery of these analytic functionalities of the SQL language.
Figure I.7.Code-first versus DB-first
Data analytics is a crucial need in information systems. This need can be met by various IT solutions for the development of the different indicators required by managers to supervise their firm’s activities in a highly competitive economic environment. SQL, as a querying language, is constantly being enriched with new features to improve its analytical capacity and therefore to master this aspect of analytical processing (OLAP), especially in database-first mode.
1
Indeed
is a job search engine:
https://www.indeed.com/
.
The development of IT solutions becomes more complicated from one day to the next due to the ever-increasing complexity of information systems and the need to align business needs with these IT solutions. In every generation of information system, data remains the cornerstone of these IT solutions. This importance is due to the static aspect of data in real-world modeling and the ability to provide any processing requested by application developers to meet the needs of the business logic layer.
Figure 1.1 shows the position of data in the urbanization of information systems according to the Enterprise Architecture (EA) vision1. This model puts two of the components on equal footing: data and applications.
Indeed, the role of data in meeting the needs of the business logic layer is essential, whether through the applications component or by direct access to the data component by means of a query language such as SQL.
Figure 1.1.Enterprise architecture
This first section will be devoted to introducing the basics of databases, the SQL language and Microsoft SQL Server DBMS.
Figure 1.2 presents a mind map of the different concepts that will be covered in this section.
A database (referred to as DB) is a collection of homogeneous data relating to a real-world domain (information system), stored in a structured way and with as little redundancy as possible to facilitate and accelerate access to these data later on.
These data must be accessible by different users, possibly simultaneously. Thus, the notion of database is generally coupled with that of computer networks in order to be able to share this information.
Figure 1.2.Mind map of the first section
The engineering process of a database goes through three essential steps: design, generation of the database schema in an appropriate formalization (in our case, the relational model) and finally implementation of this database.
Figure 1.3.Database engineering process
Database management consists of the implementation, operation and administration of the database. These tasks are performed by software using a set of commands written in a query language called SQL.
The range of DBMSs is very diverse. The choice of a DBMS is made according to several criteria: functionalities provided, cost, performance, security, etc.
Figure 1.4.Features of a DBMS
Figure 1.5.The main relational DBMSs
In 1970, Edgar F. Codd, a researcher at IBM2, proposed in a mathematical thesis the representation of data, links and even the expected results of a search in a database in table form. However, IBM — which was then working on another type of database — did not begin to take an interest in it until 1978, when the concept interested Lawrence Ellison, the founder of a start-up that became Oracle Corporation.
This proposal is the basis of the relational data model, a model used by almost all DBMSs. These DBMSs are qualified as Relational DBMSs (RDBMSs). In the past, this label was obvious since the majority of DBMSs were relational. Today, with a very diversified landscape including SQL, NewSQL and NoSQL, it is essential to specify the qualification of the DBMS.
This book discusses the analytic functions of the SQL language in relational DBMSs. The illustrative examples are based on the use of Microsoft SQL Server 2014/2016 DBMS.
Figure 1.6.Edgar Frank Codd (23 August 1923–18 April 2003, United Kingdom). Turing Prize in 1981
Figure 1.7.The relational model: a model based on set theory
SQL (Structured Query Language) is a computer language used to communicate with a database. Unlike programming languages where the user is called upon to specify the execution algorithm, SQL makes it possible to provide results by simply formulating our needs in the form of queries close to natural languages.
The SQL language consists of a set of commands that can be classified into four sub-languages according to their roles in data management within the database.
Figure 1.8 shows the SQL language commands and their classifications which are:
— Data Definition Language (DDL) containing commands for creating, modifying and deleting objects managed in the database, the most important of which are the tables that represent the data storage structures;
— Data Manipulation Language (DML) containing commands to search, insert, modify, delete and merge data into tables;
— Data Control Language (DCL) containing the commands for managing access control (allocation, removal and prohibition) to data;
— Transaction Control Language (TCL) containing transaction control commands.
Figure 1.8.SQL language commands
