Install data analytics into your brain with this comprehensive introduction
Data Analytics & Visualization All-in-One For Dummies collects the essential information on mining, organizing, and communicating data, all in one place. Clocking in at around 850 pages, this tome of a reference delivers eight books in one, so you can build a solid foundation of knowledge in data wrangling. Data analytics professionals are highly sought after these days, and this book will put you on the path to becoming one. You’ll learn all about sources of data like data lakes, and you’ll discover how to extract data using tools like Microsoft Power BI, organize the data in Microsoft Excel, and visually present the data in a way that makes sense using Tableau. You’ll even get an intro to the Python, R, and SQL coding needed to take your data skills to a new level. With this Dummies guide, you’ll be well on your way to becoming a priceless data jockey.
New and novice data analysts will love this All-in-One reference on how to make sense of data. Get ready to watch as your career in data takes off.
Page count: 1043
Publication year: 2024
Cover
Title Page
Copyright
Introduction
About This Book
Foolish Assumptions
Icons Used in This Book
Beyond the Book
Where to Go from Here
Book 1: Learning Data Analytics & Visualizations Foundations
Chapter 1: Exploring Definitions and Roles
What Is Data, Really?
Discovering Business Intelligence
Understanding Data Analytics
Exploring Data Management
Diving into Data Analysis
Visualizing Data
Chapter 2: Delving into Big Data
Identifying the Roles of Data
What’s All the Fuss about Data?
Identifying Important Data Sources
Role of Big Data in Data Science and Engineering
Connecting Big Data with Business Intelligence
Analyzing Data with Enterprise Business Intelligence Practices
Chapter 3: Understanding Data Lakes
Rock-Solid Water
A Really Great Lake
Expanding the Data Lake
More Than Just the Water
Different Types of Data
Different Water, Different Data
Refilling the Data Lake
Everyone Visits the Data Lake
Chapter 4: Wrapping Your Head Around Data Science
Inspecting the Pieces of the Data Science Puzzle
Choosing the Best Tools for Your Data Science Strategy
Getting a Handle on SQL and Relational Databases
Investing Some Effort into Database Design
Narrowing the Focus with SQL Functions
Making Life Easier with Excel
Chapter 5: Telling Powerful Stories with Data Visualization
Data Visualizations: The Big Three
Designing to Meet the Needs of Your Target Audience
Picking the Most Appropriate Design Style
Selecting the Appropriate Data Graphic Type
Testing Data Graphics
Adding Context
Book 2: Using Power BI for Data Analytics & Visualization
Chapter 1: Power BI Foundations
Looking Under the Power BI Hood
Knowing Your Power BI Terminology
Power BI Products in a Nutshell
Chapter 2: The Quick Tour of Power BI
Power BI Desktop: A Top-Down View
Services: Far and Wide
Chapter 3: Prepping Data for Visualization
Getting Data from the Source
Managing Data Source Settings
Working with Shared versus Local Datasets
Storage and Connection Modes
Data Sources Oh My!
Cleansing, Transforming, and Loading Your Data
Chapter 4: Tweaking Data for Primetime
Stepping through the Data Lifecycle
Resolving Inconsistencies
Evaluating and Transforming Column Data Types
Configuring Queries for Data Loading
Resolving Errors During Data Import
Chapter 5: Designing and Deploying Data Models
Creating a Data Model Masterpiece
Managing Relationships
Arranging Data
Publishing Data Models
Chapter 6: Tackling Visualization Basics in Power BI
Looking at Report Fundamentals and Visualizations
Choosing the Best Visualization for the Job
Chapter 7: Digging into Complex Visualization and Table Data
Dealing with Table-Based and Complex Visualizations
Using AI Tools to Create Questions and Answers
Formatting and Configuring Report Visualizations
Diving into Dashboards
Chapter 8: Sharing and Collaborating with Power BI
Working Together in a Workspace
Slicing and Dicing Data
Troubleshooting the Use of Data Lineage
Datasets, Dataflows, and Lineage
Defending Your Data Turf
Book 3: Using Tableau for Data Analytics & Visualization
Chapter 1: Tableau Foundations
Understanding Key Tableau Terms
Getting to Know the Tableau Product Line
Choosing the Right Version
Knowing What Tools You Need in Each Stage of the Data Life Cycle
Understanding User Types and Their Capabilities
Chapter 2: Connecting Your Data
Understanding Data Source Options
Connecting to Data
Setting Up and Planning the Data Source
Relating and Combining Data Sources
Working with Data Relationships
Joining Data
Chapter 3: Diving into the Tableau Prep Lifecycle
Dabbling in Data Flows
Saving Prep Data
Chapter 4: Advanced Data Prep Approaches in Tableau
Peering into Data Structures
Structuring for Data Visualization
Normalizing Data
Chapter 5: Touring Tableau Desktop
Getting Hands-On in the Tableau Desktop Workspace
Making Use of the Tableau Desktop Menus
Tooling Around in the Toolbar
Understanding Sheets versus Workbooks
Chapter 6: Storytelling Foundations in Tableau
Working with Dashboards
Creating a Compelling Story
Chapter 7: Visualizing Data in Tableau
Introducing the Visualizations
Converting a Visualization to a Crosstab
Publishing Visualizations
Chapter 8: Collaborating and Publishing with Tableau Cloud
Strolling through the Tableau Cloud Experience
Evaluating Personal Features in Tableau Cloud
Sharing Experiences and Collaborating with Others
Book 4: Extracting Information with SQL
Chapter 1: SQL Foundations
SQL and the Relational Model
Sets, Relations, Multisets, and Tables
Functional Dependencies
Keys
Views
Users
Privileges
Schemas
Catalogs
Connections, Sessions, and Transactions
Routines
Paths
Chapter 2: Drilling Down to the SQL Nitty-Gritty
Executing SQL Statements
Using Reserved Words Correctly
SQL’s Data Types
Handling Null Values
Applying Constraints
Chapter 3: Values, Variables, Functions, and Expressions
Entering Data Values
Working with Functions
Using Expressions
Chapter 4: SELECT Statements and Modifying Clauses
Finding Needles in Haystacks with the SELECT Statement
Modifying Clauses
Chapter 5: Tuning Queries
SELECT DISTINCT
Temporary Tables
The ORDER BY Clause
The HAVING Clause
The OR Logical Connective
Chapter 6: Complex Query Design
What Is a Subquery?
What Subqueries Do
Using Subqueries in INSERT, DELETE, and UPDATE Statements
Tuning Considerations for Statements Containing Nested Queries
Tuning Correlated Subqueries
UNION
INTERSECT
EXCEPT
Chapter 7: Joining Data Together in SQL
JOINS
ON versus WHERE
Join Conditions and Clustering Indexes
Book 5: Performing Statistical Data Analysis & Visualization with R Programming
Chapter 1: Using Open Source R for Data Science
Downloading Open Source R
Comprehending R’s Basic Vocabulary
Delving into Functions and Operators
Iterating in R
Observing How Objects Work
Sorting Out R’s Popular Statistical Analysis Packages
Examining Packages for Visualizing, Mapping, and Graphing in R
Chapter 2: R: What It Does and How It Does It
The Statistical (and Related) Ideas You Just Have to Know
Getting R
Getting RStudio
A Session with R
R Functions
User-Defined Functions
Comments
R Structures
for Loops and if Statements
Chapter 3: Getting Graphical
Finding Patterns
Doing the Basics: Base R Graphics, That Is
Chapter 4: Kicking It Up a Notch to ggplot2
Histograms
Bar Plots
Dot Charts
Bar Plots Re-revisited
Scatter Plots
Scatter Plot Matrix
Box Plots
Book 6: Applying Python Programming to Data Science
Chapter 1: Discovering the Match between Data Science and Python
Creating the Data Science Pipeline
Understanding Python’s Role in Data Science
Learning to Use Python Fast
Working with Python
Using the Python Ecosystem for Data Science
Chapter 2: Using Python for Data Science and Visualization
Using Python for Data Science
Sorting Out the Various Python Data Types
Putting Loops to Good Use in Python
Having Fun with Functions
Keeping Cool with Classes
Checking Out Some Useful Python Libraries
Chapter 3: Getting a Crash Course in Matplotlib
Starting with a Graph
Setting the Axis, Ticks, and Grids
Defining the Line Appearance
Using Labels, Annotations, and Legends
Chapter 4: Visualizing the Data
Choosing the Right Graph
Creating Advanced Scatterplots
Plotting Time Series
Plotting Geographical Data
Visualizing Graphs
Index
About the Authors
Advertisement Page
Connect with Dummies
End User License Agreement
Book 1 Chapter 2
TABLE 2-1 Quantification of Data Storage
TABLE 2-2 The Differences Between Data and Information
Book 1 Chapter 5
TABLE 5-1 Types of Data Visualization, by Audience
Book 2 Chapter 1
TABLE 1-1 Power BI Desktop, Common, Service Features
Book 2 Chapter 4
TABLE 4-1 Join Types
TABLE 4-2 Fuzzy Matching Options
Book 2 Chapter 5
TABLE 5-1 Buttons On the Power BI Model View Home Ribbon
Book 3 Chapter 1
TABLE 1-1 Licensing Differences between Tableau Server and Tableau Cloud
TABLE 1-2 Tools to Utilize For the Tableau Data Life Cycle
Book 3 Chapter 2
TABLE 2-1 Connection Types in Tableau Desktop and Prep
TABLE 2-2 Data Source Planning Categories and Questions
Book 3 Chapter 3
TABLE 3-1 Join Relationship Types for Input Step Data Flows
Book 3 Chapter 4
TABLE 4-1 Field Types Categories
Book 4 Chapter 1
TABLE 1-1 PROJECT Relation
TABLE 1-2 PROJECTS Relation
Book 4 Chapter 3
TABLE 3-1 Sample Literals of Various Data Types
TABLE 3-2 Photographic Paper Price List per 20 Sheets
TABLE 3-3 Examples of String Value Expressions
Book 4 Chapter 4
TABLE 4-1 SQL’s Comparison Predicates
TABLE 4-2 SQL’s LIKE Predicate
Book 4 Chapter 6
TABLE 6-1 Ford Small-Block V-8s, 1960–1980
TABLE 6-2 Chevy Small-Block V-8s, 1960–1980
Book 4 Chapter 7
TABLE 7-1 LOCATION
TABLE 7-2 DEPT
TABLE 7-3 EMPLOYEE
Book 5 Chapter 1
TABLE 1-1 Popular Operators
Book 5 Chapter 3
TABLE 3-1 Types and Frequencies of Cars in the Cars93 Data Frame
TABLE 3-2 US Commercial Space Revenues 1990–1994 (in Millions of Dollars)
Book 6 Chapter 3
TABLE 3-1 Matplotlib Line Styles
TABLE 3-2 Matplotlib Colors
TABLE 3-3 Matplotlib Markers
Data Analytics & Visualization All-in-One For Dummies®
Published by: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774, www.wiley.com
Copyright © 2024 by John Wiley & Sons, Inc., Hoboken, New Jersey
Media and software compilation copyright © 2024 by John Wiley & Sons, Inc. All rights reserved.
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Trademarks: Wiley, For Dummies, the Dummies Man logo, Dummies.com, Making Everything Easier, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.
LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: THE PUBLISHER AND THE AUTHORS MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS WORK AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES OR PROMOTIONAL MATERIALS. THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR EVERY SITUATION. THIS WORK IS SOLD WITH THE UNDERSTANDING THAT THE PUBLISHER IS NOT ENGAGED IN RENDERING LEGAL, ACCOUNTING, OR OTHER PROFESSIONAL SERVICES. IF PROFESSIONAL ASSISTANCE IS REQUIRED, THE SERVICES OF A COMPETENT PROFESSIONAL PERSON SHOULD BE SOUGHT. NEITHER THE PUBLISHER NOR THE AUTHORS SHALL BE LIABLE FOR DAMAGES ARISING HEREFROM. THE FACT THAT AN ORGANIZATION OR WEBSITE IS REFERRED TO IN THIS WORK AS A CITATION AND/OR A POTENTIAL SOURCE OF FURTHER INFORMATION DOES NOT MEAN THAT THE AUTHORS OR THE PUBLISHER ENDORSES THE INFORMATION THE ORGANIZATION OR WEBSITE MAY PROVIDE OR RECOMMENDATIONS IT MAY MAKE. FURTHER, READERS SHOULD BE AWARE THAT INTERNET WEBSITES LISTED IN THIS WORK MAY HAVE CHANGED OR DISAPPEARED BETWEEN WHEN THIS WORK WAS WRITTEN AND WHEN IT IS READ.
For general information on our other products and services, please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993, or fax 317-572-4002. For technical support, please visit https://hub.wiley.com/community/support/dummies.
Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.
Library of Congress Control Number: 2024932207
ISBN 978-1-394-24409-6 (pbk); ISBN 978-1-394-24411-9 (ePDF); ISBN 978-1-394-24410-2 (epub)
Everywhere you go in the business world, you are likely to encounter executives who make decisions driven by tidbits of raw data that together tell a meaningful story. In fact, in our everyday worlds, websites and mobile apps now use powerful visualizations, rather than extensive written passages, to explain complex numbers and concepts. The phrase “a picture speaks a thousand words” rings true in the world of data analytics and visualization, and for good reason.
Data analytics and visualization allow anyone to turn raw data into meaningful stories and insights. You, as the analyst, act as the detective. Instead of having to solve a mystery with clues, you are given datasets that, if presented with enough clarity, can answer complex questions using trend and pattern analysis. If you review a dataset enough, you’ll inevitably have an ah-ha moment in your interpretation quest, but if the dataset can be presented visually, you can accelerate your understanding like a racecar going from 0 to 100 miles per hour in seconds.
Data analytics and visualization help you uncover creative ways to showcase data in a manner that is both informative and engaging. Data often starts out as nothing more than a bunch of jumbled numbers; turning those numbers into a story that can influence decisions and drive change is incredibly powerful. Global enterprises rely on folks who have the skills you are about to learn in this book to determine business strategies, make corporate decisions, and influence change. If you are ready to learn these skills, you are in for a treat with this book.
If you’ve picked up this book, you might be on a quest to piece together the many terms being thrown around about data, the most precious tool in the information economy. Data is a business asset that sits at the intersection of many disciplines; the resultant product from data can be methodologies, processes, algorithms, and system outputs. To the end user, though, the end game is extracting knowledge and insights from the byproducts of data and acting on them.
Book 1 covers the foundational aspects of the data analytics and visualization lifecycle that every user must understand to be proficient in analytics and visualization. Books 2 and 3 focus on the two leading tools in the enterprise business intelligence market used to perform complex data analytics and visualization tasks: Microsoft Power BI and Tableau. Books 4 through 6 cover the key programming languages used by both proprietary and open-source data analytics and visualization platforms to extract, assess, and visualize data at scale when commercial off-the-shelf enterprise business platforms are unavailable.
This book uses the following technical conventions:
Bold text means that you’re meant to type the text just as it appears in the book. The exception is when you’re working through a steps list: Because each step is bold, the text to type is not bold.
Web addresses and programming code appear in monofont. If you’re reading a digital version of this book on a device connected to the Internet, note that you can click the web address to visit that website, like this: www.dummies.com.
For command sequences in software, this book uses the command arrow. Here’s an example that uses Microsoft Word: Click the Office button and then choose Page Layout ⇒ Margins ⇒ Narrow to decrease the default margin setting.
To make the content more accessible, we divided it into six books:
Book 1, “Learning Data Analytics & Visualization Foundations.”
Book 1 introduces terms and fundamental concepts. You learn about big data, data lakes, and data science, and you see how you can apply visualization tools to create meaningful stories based on data you collect.
Book 2, “Using Power BI for Data Analysis & Visualization.”
Book 2 covers Microsoft Power BI, a data analysis and visualization tool used by many large organizations. This book illustrates how you can use Power BI to make sense of structured, unstructured, and semi-structured data, and develop robust business analytics outputs for your organization.
Book 3, “Using Tableau for Data Analysis & Visualization.”
Book 3 covers Tableau, a data analysis and visualization tool favored by researchers and educational institutions. In this book, you discover how to prepare data and present your findings using Tableau’s storytelling and visualization features. You also see how to collaborate and publish your work with Tableau Cloud.
Book 4, “Extracting Information with SQL.”
Book 4 describes SQL and the relational database model. You discover how SQL is a powerful tool that nonprogrammers can use to write complex queries to get the most out of their data, and more.
Book 5, “Performing Statistical Data Analysis & Visualization with R Programming.”
Book 5 introduces the open-source R programming language. You see how you can use R to perform statistical data analysis, data visualization, and other data science tasks.
Book 6, “Applying Python Programming to Data Science.”
Book 6 describes how Python is used as a data science and visualization tool. The book includes a “crash course” on Matplotlib.
To get the most out of this book, you need the following:
Access to the Internet:
This may sound a bit obvious, but even with a desktop client such as Power BI Desktop or Tableau Desktop, you need an Internet connection to access datasets from the Internet.
A meaningful dataset:
A meaningful dataset includes at least 300 to 400 records containing a minimum of five or six columns’ worth of data.
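As a rough illustration of the dataset guideline above, the following minimal Python sketch checks whether a CSV file has enough rows and columns. The function name `is_meaningful`, the threshold constants, and the sample column names are hypothetical, chosen only for this example. The thresholds use the lower bounds of the ranges quoted above (300 records, five columns), and the sketch assumes the first line of the file is a header row.

```python
import csv
import io

MIN_ROWS = 300     # lower bound of the "300 to 400 records" guideline
MIN_COLUMNS = 5    # lower bound of the "five or six columns" guideline


def is_meaningful(csv_file, min_rows=MIN_ROWS, min_columns=MIN_COLUMNS):
    """Return True if the CSV has at least min_rows data rows and min_columns columns."""
    reader = csv.reader(csv_file)
    header = next(reader, [])           # assumption: first line is a header row
    row_count = sum(1 for _ in reader)  # count the remaining data rows
    return row_count >= min_rows and len(header) >= min_columns


# Build a small in-memory sample so the sketch runs without any external file.
sample = io.StringIO()
writer = csv.writer(sample)
writer.writerow(["id", "region", "product", "units", "revenue"])
for i in range(350):
    writer.writerow([i, "north", "widget", i % 7, i * 1.5])
sample.seek(0)

print(is_meaningful(sample))
```

To check a real file, you could pass an open file handle instead, for example `open("sales.csv", newline="")`, where `sales.csv` is a placeholder name.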
Throughout this book, icons in the margins highlight certain types of valuable information that call out for your attention. Here are the icons you’ll encounter and a brief description of each.
Best Practice icons highlight points of common knowledge among seasoned professionals in the data industry. If you don’t want to look like a complete newbie, follow the well-worn advice described in these paragraphs.
Tips point out shortcuts or essential suggestions that you can use to do things faster and more efficiently.
Remember icons mark small but helpful suggestions. Think of them as road signs that point to a potentially better route.
The Technical Stuff icon marks information of a highly technical nature that you can normally skip over. When appropriate, these paragraphs also suggest specialized resources you may find helpful down the road.
The Warning icon makes you aware of a common issue or product challenge many users face. Don’t fret, but do take note when you see this icon.
In addition to the abundance of information and guidance related to data analytics and visualization provided in this book, you get access to even more help and information online at Dummies.com. Check out this book’s online Cheat Sheet. Just go to www.dummies.com and search for “Data Analytics & Visualization All-in-One For Dummies Cheat Sheet.”
The book has three core themes: foundational concepts, tools, and programming languages.
If you want to learn the essential data analytics and visualization concepts, including learning the lingo of the land, head to Book 1.
If you’re looking to get up to speed on Microsoft’s Enterprise BI tools, head to Book 2. For Tableau, an Enterprise BI tool that is heavily used in communities where data is regulated, such as banking, healthcare, insurance, and government, head to Book 3.
The underpinning for data analytics and visualization is SQL, a query language. To get a crash course on SQL, which is necessary for any proprietary or open-source data analytics and visualization platform, head to Book 4.
Finally, Books 5 and 6