Communicate the data that is powering our changing world with this essential text
The advent of machine learning and neural networks in recent years, along with other technologies under the broader umbrella of ‘artificial intelligence,’ has produced an explosion in Data Science research and applications. Data Visualization, which combines the technical knowledge of how to work with data and the visual and communication skills required to present it, is an integral part of this subject. The expansion of Data Science is already leading to greater demand for new approaches to Data Visualization, a process that promises only to grow.
Data Visualization in R and Python offers a thorough overview of the key dimensions of this subject. Beginning with the fundamentals of data visualization with Python and R, two key environments for data science, the book proceeds to lay out a range of tools for data visualization and their applications in web dashboards, data science environments, graphics, maps, and more. With an eye towards remarkable recent progress in open-source systems and tools, this book offers a cutting-edge introduction to this rapidly growing area of research and technological development.
Data Visualization in R and Python readers will also find:
Data Visualization in R and Python is ideal for any student or professional looking to understand the working principles of this key field.
Page count: 598
Year of publication: 2024
Cover
Table of Contents
Title Page
Copyright
Preface
Introduction
About the Companion Website
Part I: Static Graphics with ggplot (R) and Seaborn (Python)
1 Scatterplots and Line Plots
1.1 R: ggplot
1.2 Python: Seaborn
2 Bar Plots
2.1 R: ggplot
2.2 Python: Seaborn
3 Facets
3.1 R: ggplot
3.2 Python: Seaborn
4 Histograms and Kernel Density Plots
4.1 R: ggplot
4.2 Python: Seaborn
5 Diverging Bar Plots and Lollipop Plots
5.1 R: ggplot
5.2 Python: Seaborn
6 Boxplots
6.1 R: ggplot
6.2 Python: Seaborn
7 Violin Plots
7.1 R: ggplot
7.2 Python: Seaborn
8 Overplotting, Jitter, and Sina Plots
8.1 Overplotting
8.2 R: ggplot
8.3 Python: Seaborn
9 Half-Violin Plots
9.1 R: ggplot
9.2 Python: Seaborn
10 Ridgeline Plots
10.1 History of the Ridgeline
10.2 R: ggplot
11 Heatmaps
11.1 R: ggplot
11.2 Python: Seaborn
12 Marginals and Plots Alignment
12.1 R: ggplot
12.2 Python: Seaborn
13 Correlation Graphics and Cluster Maps
13.1 R: ggplot
13.2 Python: Seaborn
13.3 R: ggplot
13.4 Python: Seaborn
Part II: Interactive Graphics with Altair
14 Altair Interactive Plots
14.1 Scatterplots
14.2 Line Plots
14.3 Bar Plots
14.4 Bubble Plots
14.5 Heatmaps and Histograms
Part III: Web Dashboards
15 Shiny Dashboards
15.1 General Organization
15.2 Second Version: Graphics and Style Options
15.3 Third Version: Tabs, Widgets, and Advanced Themes
15.4 Observe and Reactive
16 Advanced Shiny Dashboards
16.1 First Version: Sidebar, Widgets, Customized Themes, and Reactive/Observe
16.2 Second Version: Tabs, Shinydashboard, and Web Scraping
16.3 Third Version: Altair Graphics
17 Plotly Graphics
17.1 Plotly Graphics
18 Dash Dashboards
18.1 Preliminary Operations: Import and Data Wrangling
18.2 First Dash Dashboard: Base Elements and Layout Organization
18.3 Second Dash Dashboard: Sidebar, Widgets, Themes, and Style Options
18.4 Third Dash Dashboard: Tabs and Web Scraping of HTML Tables
18.5 Fourth Dash Dashboard: Light Theme, Custom CSS Style Sheet, and Interactive Altair Graphics
Part IV: Spatial Data and Geographic Maps
19 Geographic Maps with R
19.1 Spatial Data
19.2 Choropleth Maps
19.3 Multiple and Annotated Maps
19.4 Spatial Data (sp) and Simple Features (sf)
19.5 Overlaid Graphical Layers
19.6 Shape Files and GeoJSON Datasets
19.7 Venice: Open Data Cartography and Other Maps
19.8 Thematic Maps with tmap
19.9 Rome’s Accommodations: Intersecting Geometries with Simple Features and tmap
20 Geographic Maps with Python
20.1 New York City: Plotly
20.2 Overlaid Layers
20.3 Geopandas: Base Map, Data Frame, and Overlaid Layers
20.4 Folium
20.5 Altair: Choropleth Map
Index
End User License Agreement
Chapter 1
Figure 1.1 Output of the ggplot function with x and y aesthetics.
Figure 1.2 First ggplot’s scatterplot.
Figure 1.3 Scatterplot with color aesthetic.
Figure 1.4 Scatterplot with color aesthetic for marital status variable.
Figure 1.5 Scatterplot with income as dependent variable and color aesthetic...
Figure 1.6 (a/b) Scatterplots with four variables.
Figure 1.7 United States’ inflation values 1960–2022.
Figure 1.8 Inflation values for a sample of countries.
Figure 1.9 Dots colors based on an aesthetic when over a threshold, otherwis...
Figure 1.10 Markers colored based on two thresholds and textual labels, US i...
Figure 1.11 Temperature measurement in some US cities, minimum temperatures....
Figure 1.12 A problematic line plot, groups are not respected.
Figure 1.13 Line plot connecting points of same country.
Figure 1.14 Line plot with style options.
Figure 1.15 Scatterplot of the United States’ GDP time series from the World...
Figure 1.16 Scatterplot of the GDP for a sample of countries.
Figure 1.17 Scatterplot with markers styled differently for from year 2000 a...
Figure 1.18 Temperature measurement in some US cities, maximum temperatures....
Figure 1.19 Line plot of GDP variations for a sample of countries.
Figure 1.20 Line plot with line style varied according to country.
Figure 1.21 Line plot and scatterplot overlapped.
Figure 1.22 Line plot with markers automatically added.
Chapter 2
Figure 2.1 Bar plot with two variables.
Figure 2.2 Bar plot with custom color palette, horizontal bar orientation, a...
Figure 2.3 Bar plot with ranges of values for PM10 derived from a continuous...
Figure 2.4 Bar plot with ordered bars and x ticks rotated.
Figure 2.5 Bar plot with three variables and groups of bars.
Figure 2.6 Bar plot with month names and the legend moved outside the plot....
Figure 2.7 Bar plot with stacked bars.
Figure 2.8 Bar plot with ranges of values derived from a continuous variable...
Figure 2.9 Bar plots with quantile representation, subplots, and style optio...
Chapter 3
Figure 3.1 Temperature measurement in some US cities, minimum temperatures, ...
Figure 3.2 Facet visualization with bar plots, some facets not readable due ...
Figure 3.3 Facet visualization with independent scale on y-axis.
Figure 3.4 Facet visualization with bar plots, facets are all well-readable ...
Figure 3.5 Temperature measurement in some US cities, maximum temperatures, ...
Figure 3.6 Facets and bar plot visualization.
Figure 3.7 Incorrect facet visualization (single facet detail).
Figure 3.8 Facet visualization with the general method, unbalanced facets.
Figure 3.9 Facet visualization with the general method, independent scales....
Figure 3.10 Facet visualization with balanced and meaningful bar plots.
Chapter 4
Figure 4.1 Number of bins equals to 30.
Figure 4.2 Bin width equal to 10.
Figure 4.3 Facets visualization with histograms.
Figure 4.4 Histogram for bivariate analysis with rectangular tiles.
Figure 4.5 Histogram for bivariate analysis with hexagonal tiles.
Figure 4.6 Histogram for bivariate analysis with facet visualization.
Figure 4.7 Kernel density for bivariate analysis with isodensity curves.
Figure 4.8 Kernel density for bivariate analysis with color gradient, NYC ma...
Figure 4.9 Kernel density for bivariate analysis with color gradient, NYC mi...
Figure 4.10 Histogram for univariate analysis, bin width equals 20.
Figure 4.11 Histogram for univariate analysis and kernel density, bin width ...
Figure 4.12 Histogram for univariate analysis with stacked bars.
Figure 4.13 Histogram for bivariate analysis and continuous variables.
Figure 4.14 Histogram for bivariate analysis with a categorical variable.
Figure 4.15 Histogram for bivariate analysis and facet visualization.
Figure 4.16 Histogram with logarithmic scale.
Figure 4.17 Histogram with logarithmic scale and symmetric log.
Figure 4.18 Histogram with stacked visualization, logarithmic scale, and sym...
Figure 4.19 Histogram with stacked visualization, logarithmic scale, and sym...
Chapter 5
Figure 5.1 Diverging bar plot, yearly wheat production variations for Argent...
Figure 5.2 Diverging bar plot with ordered bars and annotation, yearly varia...
Figure 5.3 Lollipop plot, yearly wheat production variations for Argentina....
Figure 5.4 Lollipop plot ordered by values and annotation, yearly variations...
Figure 5.5 Diverging bar plot, yearly wheat production variations for the Un...
Figure 5.6 Diverging bar plot, yearly wheat production variations for the Un...
Chapter 6
Figure 6.1 Boxplot statistics.
Figure 6.2 Boxplot, air quality in Milan, 2021.
Figure 6.3 Boxplot with three variables, confused result.
Figure 6.4 Boxplot with three variables, unbalanced facet visualization.
Figure 6.5 Boxplot with three variables, balanced facet visualization.
Figure 6.6 Box plot with three variables, the result is confused.
Figure 6.7 Boxplot with three variables, facet visualization.
Chapter 7
Figure 7.1 Violin plot, OECD/Pisa tests, male and female students, Mathemati...
Figure 7.2 Density plot, OECD/Pisa tests, male and female students, Mathemat...
Figure 7.3 Boxplot, OECD/Pisa tests, male and female students, Mathematics s...
Figure 7.4 Violin plot and scatterplot combined and correctly overlapped and...
Figure 7.5 Violin plot and boxplot combined and correctly overlapped and dod...
Figure 7.6 OECD/Pisa tests, male and female students, Mathematics, Reading, ...
Figure 7.7 Violin plot, bike thefts in Berlin, and bike values.
Figure 7.8 Violin plot, bike thefts in Berlin for each month of years 2021 a...
Figure 7.9 Bar plot, bike thefts in Berlin for each month of years 2021 and ...
Figure 7.10 Violin plot, bike thefts in Berlin for bike type and month, year...
Chapter 8
Figure 8.1 Categorical scatterplot with jitter, OECD/Pisa tests results for ...
Figure 8.2 Categorical scatterplot with reduced jitter.
Figure 8.3 Categorical scatterplot with increased jitter.
Figure 8.4 Violin plot and scatterplot with jitter, OECD/Pisa tests results ...
Figure 8.5 Violin plot, boxplot, and scatterplot with jitter, OECD/Pisa test...
Figure 8.6 Sina plot, OECD/Pisa tests results for male and female students, ...
Figure 8.7 Sina plot and violin plot combined, OECD/Pisa tests results for m...
Figure 8.8 Sina plot and boxplot, OECD/Pisa tests results for male and femal...
Figure 8.9 Sina plot with stacked groups of data points and color based on l...
Figure 8.10 Beeswarm plot, OECD/Pisa test results for male and female studen...
Figure 8.11 Comparing overplotting, jitter, sina plot, and beeswarm plot.
Figure 8.12 Strip plot, bike thefts in Berlin.
Figure 8.13 Swarm plot, men’s and ladies’ bike thefts in Berlin, October 202...
Figure 8.14 Sina plot, men’s and ladies’ bike thefts in Berlin in January 20...
Chapter 9
Figure 9.1 Half-violin plot, custom function, OECD/Pisa test results for mal...
Figure 9.2 Half-violin plot, boxplot, and scatterplot with jitter correctly ...
Figure 9.3 OECD/Pisa tests, male and female students, Mathematics, Reading, ...
Figure 9.4 Left-side half-violin plots, male and female students, Mathematic...
Figure 9.5 Raincloud plot, male and female students, Mathematics, Reading, a...
Figure 9.6 Violin plot with groups of two subsets of points, bike thefts in ...
Figure 9.7 Half-violin plots with sticks.
Figure 9.8 Half-violin plots with quartiles.
Chapter 10
Figure 10.1 “Many consecutive pulses from CP1919,” in Harold Dumont Craft, J...
Figure 10.2 Ridgeline plot, OECD-Pisa tests, default alphabetical order base...
Figure 10.3 Ridgeline plot, OECD-Pisa tests, custom order based on arithmeti...
Figure 10.4 Ridgeline plot, OECD-Pisa tests, custom order based on arithmeti...
Figure 10.5 Ridgeline plot, OECD-Pisa tests, custom order based on arithmeti...
Chapter 11
Figure 11.1 Heatmap, bike thefts in Berlin for months and hours of day.
Figure 11.2 Heatmap, bike thefts in Berlin for months and hours and style el...
Figure 11.3 Heatmap, number of bike thefts in Berlin for months and hours.
Figure 11.4 Heatmap, value of stolen bikes in Berlin for months and hours.
Chapter 12
Figure 12.1 Marginal with scatterplot and histograms, bike thefts in Berlin ...
Figure 12.2 Plots aligned in a vertical grid, marginals, bike thefts in Berl...
Figure 12.3 Marginal with scatterplot and rug plots, bike thefts in Berlin (...
Figure 12.4 Marginal with categorical scatterplot and rug plot, number of st...
Figure 12.5 Subplots, a scatter plot and a boxplot horizontally aligned, sto...
Figure 12.6 Subplots, a scatter plot and a boxplot vertically aligned, stole...
Figure 12.7 Joint plot with density plots as marginals, stolen bikes in Berl...
Figure 12.8 Joint grid with scatterplot and rug plots as marginals, stolen b...
Chapter 13
Figure 13.1 Cluster map, bike thefts in Berlin (2021–2022), values scaled by...
Figure 13.2 Cluster map, bike thefts in Berlin (2021–2022), values scaled by...
Figure 13.3 Cluster map, stolen bikes in Berlin (2021–2022), scaled by colum...
Figure 13.4 Cluster map, stolen bikes in Berlin (2021–2022), scaled by rows....
Figure 13.5 Diagonal correlation heatmap, stolen bikes in Berlin (2021–2022)...
Figure 13.6 Diagonal correlation heatmap, stolen bikes in Berlin, correlatio...
Figure 13.7 Scatterplot heatmap, stolen bikes in Berlin (2021–2022), correla...
Chapter 14
Figure 14.1 Altair, scatterplot with color aesthetic and style options.
Figure 14.2 Altair, horizontal alignments of plots and differences from assi...
Figure 14.3 Altair, facet visualization.
Figure 14.4 (a) Dynamic tooltip (example 1). (b) Dynamic tooltip (example 2)...
Figure 14.5 (a) Dynamic legend, year 2005. (b) Dynamic legend, year 2010.
Figure 14.6 (a) Dynamic zoom, zoom in. (b) Dynamic zoom, zoom out.
Figure 14.7 Mouse hover, contextual change of color.
Figure 14.8 Drop-down menu.
Figure 14.9 Radio buttons.
Figure 14.10 (a) Selection with brush and synchronized table (example 1). (b...
Figure 14.11 (a) (Left plot) brush selection; (right plot) synchronized plot...
Figure 14.12 (a) Plot as interactive legend, all years selected. (b) Plot as...
Figure 14.13 Line plots, mean per capita, total expenditure, and total arriv...
Figure 14.14 Line plots with mouse hover, Oceania’s line is highlighted (the...
Figure 14.15 (a) Line plot with mouse hover and coordinated visualization of...
Figure 14.16 Line plot with mouse hover and coordinated visualization in all...
Figure 14.17 (Left): Bar plot with segment for the arithmetic mean.
Figure 14.18 (Right): Bar plot with horizontal orientation and annotations....
Figure 14.19 Diverging bar plots, pirate attacks, yearly and monthly variati...
Figure 14.20 Plot with two distinct y-axes and corresponding scales.
Figure 14.21 Stacked bar plot, pirate attacks, and countries where they took...
Figure 14.22 Bar plot with sorted bars and annotations.
Figure 14.23 (a) Synchronized bar plots, default visualization, without sele...
Figure 14.24 Bar plots and tables synchronized with slider, homeless in the ...
Figure 14.25 (a) Bar plots and slider, homeless in the US States (year 2022)...
Figure 14.26 (a) Bubble plot and slider, homeless in the US States (year 202...
Figure 14.27 Heatmap with dynamic tooltip, homelessness in the US States (% ...
Figure 14.28 Univariate histogram, 100 bins, homeless in the United States (...
Figure 14.29 Bivariate histogram, 20 bins, and scatterplot, homeless in the ...
Figure 14.30 Bivariate histogram, 20 bins, and rug plot, homeless in the Uni...
Part 3
Figure 1 Design for Tandem Cart, 1850–74, Gift of William Brewster, 1923, Th...
Chapter 15
Figure 15.1 (a) Shiny, test MAT, and country AL (Albania) selected. (b) Shin...
Figure 15.2 (a) Table and plot, test READ and country KR (Korea) selected. (...
Figure 15.3 (a) A table, two plots, and light theme. (b) A table, two plots,...
Figure 15.4 (a) Tab MAT, default theme. (b) Tab READ, dark theme. (c) Google...
Chapter 16
Figure 16.1 (a) Layout with default configuration with years range 2000–2021...
Figure 16.2 Excerpt of XML representation of a web-scraped HTML page.
Figure 16.3 Selecting the table element through the Chrome’s Inspect Element...
Figure 16.4 First data frame obtained through web scraping from an HTML page...
Figure 16.5 Second data frame obtained through web scraping from an HTML pag...
Figure 16.6 (a) Expeditions tab, default visualization. (b) Summiteers tab, ...
Figure 16.7 Static and interactive Altair graphics in a Shiny dashboard.
Chapter 17
Figure 17.1 Plotly, scatterplot with default dynamic tooltip.
Figure 17.2 Plotly, scatterplot with extended dynamic tooltip.
Figure 17.3 Plotly, line plot with tooltip.
Figure 17.4 Plotly, scatterplot with a histogram and a rug plot as marginals...
Figure 17.5 Plotly, facet visualization.
Chapter 18
Figure 18.1 Dash dashboard with Plotly graphic.
Figure 18.2 (a) Slider with default range. (b) Slider with modified range (2...
Figure 18.3 (a) Dash, graphic, slider, and data table with interactive featu...
Figure 18.4 (a) Color palette selector and centered, resized data table (exa...
Figure 18.5 Sidebar and reactive data table, all country checkbox selected. ...
Figure 18.6 (a) Dash dashboard, default appearance. (b) Detail of the scatte...
Figure 18.7 (a) First tab with a selection of countries from the drop-down m...
Figure 18.8 (a) First tab, data table, reactive graphics, and layout. (b) Se...
Chapter 19
Figure 19.1 World map from package maps.
Figure 19.2 Italy’s border map.
Figure 19.3 Provinces of Italy.
Figure 19.4 Choropleth map with an incoherent association between data and g...
Figure 19.5 Regions of Italy.
Figure 19.6 Choropleth map with coherent data and geographical areas.
Figure 19.7 Choropleth maps, from left to right: ratio of dogs per resident,...
Figure 19.8 Annotated map with dots and city names for Milan, Bologna, and R...
Figure 19.9 ggplot image transformed into a Plotly HTML object.
Figure 19.10 Maps from Natural Earth, Sweden and Denmark’s borders and regio...
Figure 19.11 Railroad and land maps from Natural Earth.
Figure 19.12 Land and railroad maps of Western Europe.
Figure 19.13 Busiest railway stations and railroad network in Western Europe...
Figure 19.14 (a/b) Venice, streets, and canals cartographic layers.
Figure 19.15 Venice municipality border map.
Figure 19.16 Venice, Municipality area, streets, and canals layers.
Figure 19.17 Venice, historical insular part, map with overlaid layers.
Figure 19.18 (a/b) Venice, ggmap, Stamen Terrain, and Toner tiled web maps....
Figure 19.19 Venice, Leaflet base map from OpenStreetMap. (a) Full view. (b)...
Figure 19.20 (a/b/c) Venice, Leaflet tile maps from Stamen, Carto, and ESRI....
Figure 19.21 Venice, ggmap, tiled web maps with cartographic layers. (a) Ope...
Figure 19.22 Venice, Leaflet with Carto Positron tile map, and cartographic ...
Figure 19.23 Venice, Leaflet, civic numbers with dynamic popups associated....
Figure 19.24 Venice, Leaflet, pedestrian areas.
Figure 19.25 Venice, ggplot, markers with annotations.
Figure 19.26 (a) Venice, Leaflet, aggregate circular marker and popup, full ...
Figure 19.27 (a/b) Rome, tmap, choropleth maps of neighborhoods and district...
Figure 19.28 (a) Rome, tmap, historical villas, plot mode (static). (b) Rome...
Figure 19.29 (a) Rome, tmap view mode, city center archaeological map with E...
Figure 19.30 Rome, accommodations for topographic area, wrong bubble plot.
Figure 19.31 (a) Rome, tmap, full map with bubbles centered on centroids and...
Figure 19.32 Rome, tmap, quantiles, and custom legend labels.
Figure 19.33 Rome, tmap, standard quantile subdivision, and legend labels.
Figure 19.34 Rome region tmap, road map with dynamic popups.
Figure 19.35 (a) Rome, tmap, Bed and Breakfasts, full map. (b) Rome, tmap, H...
Figure 19.36 (a) Rome, tmap, hotels, full map. (b) Rome, tmap, hotels, zoom ...
Chapter 20
Figure 20.1 NYC, plotly.express, choropleth map of licensed dogs.
Figure 20.2 NYC, plotly.express, most popular dog breed for zip code.
Figure 20.3 NYC, plotly.express, most popular dog breed for zip code, OpenSt...
Figure 20.4 NYC, plotly go, base map, and dog runs layer.
Figure 20.5 NYC, plotly go, overlaid layers, Choropleth map, and dog runs, C...
Figure 20.6 NYC, plotly.express and geopandas, dog runs, extended tooltip.
Figure 20.7 NYC, plotly go and geopandas, dog runs, extended tooltip.
Figure 20.8 NYC, plotly go and geopandas, dog breeds and dog runs with disti...
Figure 20.9 (a) NYC, plotly go and geopandas, dog breeds, dog run areas, and...
Figure 20.10 NYC, Folium, base map with default tiled web map from OpenStree...
Figure 20.11 NYC, Folium, markers, popups, and tooltips, Stamen Terrain tile...
Figure 20.12 (a/b) NYC, Folium, marker’s popups with HTML iframe and image (...
Figure 20.13 NYC, Folium, base map, and GeoJSON layer with FEMA sea level ri...
Figure 20.14 NYC, Folium choropleth map, rodent inspections finding rat acti...
Figure 20.15 NYC, Folium and geopandas, rodent inspections finding rat activ...
Figure 20.16 NYC, Folium heatmap of rodent inspections with rat activity.
Figure 20.17 (a/b) Altair, NYC zip code areas, and boroughs.
Figure 20.18 Altair, NYC subway stations with popups.
Figure 20.19 Altair, choropleth maps for ethnic groups (from left to right: ...
Marco Cremonini
University of Milan, Italy
Copyright © 2025 by John Wiley & Sons Inc. All rights reserved, including rights for text and data mining and training of artificial intelligence technologies or similar technologies
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.
Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates in the United States and other countries and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging-in-Publication Data is applied for:
Hardback ISBN 9781394289486
Cover Design: Manuela Ruggeri
Cover Images: Courtesy of Marco Cremonini, © ulimi/Getty Images
The idea for this handbook came to me when I noticed something that made me pause and reflect. When I mentioned data visualization to a person who knew just a little about it, perhaps adding that it involves representing data and the results of data analysis with figures, sometimes even interactive ones, the reaction was often curiosity with a shade of perplexity: the name sounded nice, but what is it, exactly? After all, if we have a table with data and we want to produce a graph, isn’t it enough to search in a menu, choose the stylized figure of the graph we want to create, and click? Is there so much to say as to fill an entire book? When I added that what I was talking about were graphic tools completely different from those of office automation and that, to tell the truth, it doesn’t even stop at graphics, even interactive ones, because there are also dashboards, i.e., the latest evolution of data visualization, in which real dynamic web applications are created, a shadow of concern generally crossed the interlocutor’s face. At that moment, I typically played the ace up my sleeve by saying that in data visualization there are also maps, geographical maps (why not?). Those are data too: they are spatial data, geographical data, and the maps are produced with zoom, flags, and colored areas. There are cartographic maps as well, and you may work with maps of New York, Tokyo, Paris, Rome, New Delhi, you name it.
At that point the interlocutors usually looked puzzled: the references they had from common experience were lost, and they didn’t really know what this data visualization was about, only that there actually seemed to be a lot to say, enough to fill an entire book.
If anyone recognizes themselves in this interlocutor, be assured that you are in good company. Good in a literal, not figurative, sense, because data visualization is the Cinderella of data science: many admire it, but always from a certain distance; it arrives last, and at the best moment it is forced to step back because there is no longer enough time to teach, study, or practice it. Yet it frequently happens that those who are given the right opportunity to study and practice it sense that it could be decidedly interesting and could certainly prove useful and applicable in an infinite number of fields. This is due to a property that data visualization has and that is absent in data analysis or code development: it stimulates visual creativity together with logic. Even statisticians and programmers use creativity (those who deny it have never been either of them), but that is logical creativity. With data visualization, another, otherwise neglected dimension of data science comes into play: the visual language combined with computational logic. Data are represented in an expressive form that is no longer just logical and symbolic but also perceptive and sensorial: shapes and colors come into play, the once passive observer starts interacting, and projections of geographical areas suddenly become artifacts to use in visual communication. Data visualization conveys different knowledge and logic through an expressive form that always has a double nature: computational for the data that feeds it, visual and sometimes interactive for the language it uses to communicate with the observer.
There is enough to fill more than a single book. In fact, what is contained in this book is one part of the discourse on data visualization, the more practical and operational one. Other publications approach data visualization from complementary angles, such as the aesthetic composition of graphics, the storytelling behind visual communication, and the syntax and semantics of a visual language together with sensory perception and psychology, and there is a lot to say about each of these topics. All of them are essential for a complete understanding of the aim and extent of data visualization, but together they simply don’t fit in a single handbook unless presented in a truly superficial fashion; for this reason, almost every book on data visualization focuses more explicitly on a few of those aspects. This book is dedicated to the more operational and computational issues, because you have to know the low-level logic behind modern data visualization artifacts, and you have to know and practice with the tools. They are not all alike: “just pick the easiest to use and you’re all set” is definitely not good advice. Moreover, given the liveliness of the market for proprietary data visualization tools, it is easy to forget about open-source ones, which instead rival and often surpass what proprietary tools are able to offer, perhaps with a little more initial effort, but not much.
To conclude, data visualization probably deserves better consideration in educational programs and recognition as a coherent and evolving discipline. It can be a lot of fun to study and practice; it could also make you pause and reflect on tools for communicating data science results with a visual language; and it includes many different aspects from diverse disciplines, both theoretical and practical, all converging and enmeshing in a coherent body of knowledge. These are all good characteristics for curious people. The Cinderella role of data visualization can be overcome by recognizing its educational and professional value and, no less important, its creative stimulus.
October 8, 2024
Marco Cremonini
University of Milan
When you mention data visualization to a person who doesn’t know it, perhaps adding that it involves data and the results of data analysis with figures, sometimes even interactive ones, the reaction you observe is often that the person in front of you looks intrigued but doesn’t know exactly what it consists of. After all, if we have a table with data and we want to produce a graph, isn’t it enough to open the usual application, go to a certain drop-down menu, choose the stylized figure of the graph you want to create and click? Is there so much to say to fill an entire book? At that moment, when you perceive that the interlocutor is thinking of the well-known spreadsheet product, you may add that those described in the book are graphic tools completely different from those of office automation and, to tell the truth, we don’t even stop at the graphics, even if interactive, but there are also dashboards, namely the latest evolution of data visualization, when it is transformed into dynamic web applications, and to obtain dashboards it is not sufficient to click on menus but you have to go deeper into the inner logic and mechanisms. It’s then that the expression of the interlocutor is generally crossed by a shadow of concern and you can play the ace up your sleeve by saying that in data visualization there are also maps, geographical maps, sure, those are made by data too: spatial data and geographical data, and the maps can be produced with the many available widgets such as zoom, flags, and colored areas; and we even go beyond simple maps, because there are also cartographic maps with layers of cartographic quality, such as maps of Rome, of Venice, of New York, of the most famous, and also not-so-famous cities and places, possibly with very detailed geographical information.
At that point the interlocutor has likely lost the references she or he had from the usual experience with office automation products and doesn’t really know what this data visualization is, only that there seems to be a lot to say, enough to fill an entire book. If anyone recognizes themselves in this imaginary interlocutor (imaginary up to a certain point, to be honest), know that you are in good company. Good in a literal, not figurative, sense, because data visualization is a little like the Cinderella of data science that many admire from a certain distance: it arrives last in a project and sometimes it does not receive the attention it deserves. Yet there are many who, given the right opportunity to study and practice it, sense that it could be interesting and enjoyable, and that it could certainly prove useful and applicable in an infinite number of areas and situations. This is due to a property that data visualization has and that is instead absent in traditional data analysis or code development: it stimulates visual creativity together with logic. Even statisticians and programmers use creativity (those who deny it have never really practiced one of those disciplines), but that is logical creativity. With data visualization, another dimension of data science that is otherwise neglected comes into play: the visual language combined with computational logic. The data are represented in an expressive form that is no longer just logical and formal, but also perceptual and sensory, made of shapes, colors, and the use and projection of space, and it always carries a meaning that the originator wishes to convey and that the observers will interpret, often subjectively. Data visualization combines different kinds of knowledge and logic in an expressive form that always has a double soul: computational for the data that feeds it, visual and sometimes interactive for the language it uses to communicate with the observer.
Data visualization always has a double nature: it is a key part of data science for its methods, techniques, and tools, and it is storytelling. Whoever produces visual representations from data tells a story that may take different guises and may produce different reactions. There is enough to fill more than a single book.
The text is divided into four parts, already mentioned in the previous introduction. The first part presents the fundamentals of data visualization with Python and R, the two reference languages and environments for data science, employed to create static graphs as a direct result of previous data wrangling (import, transformation) and analysis activity. The reference libraries for this first part are Seaborn for Python and ggplot2 for R. Both are modern open-source graphics libraries in constant evolution, developed by their core developers with contributions from their respective communities, which are very large and lively in pursuing continuous innovation. Seaborn is the more recent of the two and partly represents an evolved interface to Python’s traditional matplotlib graphics library, made more functional and enriched with features and graph types popular in modern data visualization. Ggplot2 is the traditional graphics library for R, widely recognized as one of the best ever, both in the open-source and in the proprietary world. Ggplot is full of high-level features and constantly evolving, and it receives contributions from researchers and developers from various scientific and application fields. It is simply an unavoidable tool for anyone approaching data visualization. The two have different designs. Seaborn is more traditional, with a collection of functions and options for the different types of charts supported. Ggplot, instead, is organized by overlapping graphic layers, according to a design that goes by the name of grammar of graphics, shared by some of the most widespread digital graphics tools and suitable for developing even unconventional types of graphics, thanks to the extreme flexibility it allows. This first part covers about a third of the work.
The second part introduces Altair, a Python library capable of producing interactive graphics in HTML and JSON format, as well as static versions in bitmap (PNG and JPG) and vector (SVG) formats. Altair is a young but solid graphic library because in all respects it represents a modern interface of Vega-Lite, a graphic library with an established tradition for web applications thanks to the declarative syntax in JSON format. Altair offers the same web-oriented functionality as Vega-Lite for typical data science use, with a syntax that supports the definition of overlapping graphical layers and aesthetics composed with a syntax that is easy to use and common to similar tools. This second part presents a higher level of difficulty than the first, but certainly within reach for those who have acquired the fundamental knowledge given by the first part. The first and second parts cover approximately half of the work.
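To give a rough idea of the declarative JSON style mentioned above, the following sketch builds a minimal Vega-Lite specification with two overlapping layers (a line and points) as a plain Python dictionary and serializes it with the standard json module. The data values and field names are invented for illustration. In practice Altair generates specifications of this kind from its own Python syntax, so this is only a view of the underlying format, not the way charts are written with Altair.

```python
import json

# Hypothetical data, invented for illustration only.
records = [
    {"year": 2019, "value": 3.1},
    {"year": 2020, "value": 2.4},
    {"year": 2021, "value": 3.8},
    {"year": 2022, "value": 4.2},
]

# A minimal layered Vega-Lite specification written as a Python dictionary.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": records},
    # Encodings (the aesthetics) declared here are shared by all layers.
    "encoding": {
        "x": {"field": "year", "type": "ordinal"},
        "y": {"field": "value", "type": "quantitative"},
    },
    # Two overlapping graphical layers drawn on the same axes.
    "layer": [
        {"mark": "line"},
        {"mark": {"type": "point", "filled": True}},
    ],
}

# Serialize the specification to JSON text.
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

The chart is described entirely by data, encodings, and layers, with no drawing instructions: adding a further layer means adding one more entry to the list.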
The third and fourth parts present advanced data visualization content. The difficulty increases and so does the commitment required; in return, we enter two worlds in their own right: that of web dashboards and that of spatial data and maps. The term dashboard may be new to many, but dashboards are not. Whenever you access environments on the web that show menus and graphic objects configurable according to the user’s choices, with content in the form of data or graphs, what you are using is most likely a dashboard. If you access the Open Data of a large institution, such as the Organisation for Economic Co-operation and Development (OECD) or the United Nations, or even an internal company application that displays graphs and statistics, you are most likely using a dashboard. Numerous systems and products for creating dashboards with different technologies are available; it is a vast market. In data science environments with Python and R, there are two formidable tools, Plotly/Dash and Shiny, respectively. They are professional tools, and the list of relevant organizations using them is long. They are also irreplaceable teaching tools for learning the logic and basic mechanisms of a dashboard, which, in its final form, is a web application and is therefore integrated with the typical technology of web pages and websites. However, a Dash or Shiny dashboard is also something else: it is the end point of a pipeline that begins with the fundamentals of data science (data import, data wrangling, data analysis) and continues with static and dynamic graphs. The dashboard is the final stage in which everything is concentrated and integrated: logic, mechanisms, requirements, and creativity. Technically, dashboards are challenging because of the reactive logic that makes them dynamic and interactive and because of the integration of various components.
The text discusses and develops examples of medium complexity, with different solutions, from web scraping of online content to the integration of Altair interactive graphics.
The second world that opens up, that of geographical maps, is undeniably fascinating. Spatial data and choropleth maps, the simplest ones with colored areas (such as maps with areas colored according to the coalition that won the elections or the unemployment rate by province, region, or nation), but also maps based on cartographic data, are the form that data science gives to a discipline that has very ancient roots and still constitutes an almost independent environment composed of high-resolution maps and geographic information systems (GIS), with its own specializations and professional skills. Until a few years ago, data science tools could not even touch that world, but today they have come surprisingly close. This is thanks to extraordinary progress in open-source systems and tools, in Python but above all in R, which now offers formidable tools that can also use shapefiles from technical cartography and geographic coordinate systems that follow international standards. In the examples presented, geographic and cartographic files for Venice, Rome, and New York are used with the aim of showing the impressive potential offered by the Python and R tools.
It is simple to specify to whom this text is addressed: it is addressed to everyone. Anyone who finds data visualization interesting, and images useful for their work, study, and the skills they are building, will find a learning path that starts from the fundamentals and goes up to cartography and web applications. Of course, saying “it’s aimed at everyone” is simple, then doubts may arise in the reader, “but am I also part of those everyone?” Trying to make a list of those included in this “everyone” will inevitably leave out someone, but we could certainly mention students, researchers, and instructors of social, political, and economic sciences. In addition to many generic data, they may have spatial data to represent (e.g. movement of people and goods, global supply chains, logistics, spatial or ethnographic analyses). Next, students, researchers, and instructors of marketing, communication, public relations, journalism, media, and advertising, for whom interactive representations via the web and graphics in general are important, as products and skills. Also, students, researchers, and instructors of scientific and medical disciplines could be interested, they often deal with sophisticated graphic representations, for example in biology or epidemiology, without forgetting that the graphic contributions from the genomics and molecular biology community are among the most numerous. Students, researchers, and instructors of engineering, management, or bioengineering, for example, use data science tools and visualization as an integral part of their analyses. Historians, archaeologists, and paleontologists produce high-quality graphic representations, so the text can be useful for them too. Well, the list is already long, and I’m definitely forgetting someone who should be mentioned instead.
In general, undergraduate and graduate students, teachers, researchers, and Ph.D. students will find many examples and explanations to help them graphically present content and organize exercises. Likewise, professionals and companies, for their corporate and institutional communication and training, may enjoy the material presented in the text.
What I’m trying to say is that data visualization, like data science as a whole, is not a sectoral discipline for which you need a specific background, such as that of a statistician, computer scientist, engineer, or graphic designer. It is not necessary at all. In fact, the opposite is needed: data visualization and data science should be as transversal as possible, studied and used by all those who, because of their education and work interests in their specific field, from economics to paleontology, from psychology to molecular biology, find themselves working with data, whether numerical, textual, or spatial, and find it useful to obtain high-quality visual representations from those data, perhaps interactive or structured in dashboards.
To follow and learn the contents of the text it is necessary to know the fundamentals of data science with Python and R, meaning those concerned with importing and reading operations of datasets and the typical data wrangling operations (sorting, aggregations, shape and type transformations, selections, and so on). Numerous examples are presented in the text which include the data wrangling part (the cases where it is longer can be found in the Supplementary Material), so to replicate a visualization all the necessary code is available, starting from reading the Open Data. Therefore, it is not required to independently produce the preliminary part of operations on the data, but it is necessary to be able to interpret the logic and the operations that are performed. Hence the need to know the fundamentals, as well as the possibility of producing variations of the examples.
Another aspect that may appear problematic is knowledge of the fundamentals of data science with both Python and R because often one only knows one of the two environments and languages. In this regard, I would like to reassure anyone who finds themselves in this situation. If you know data wrangling operations with R or Python, interpreting the logic of those carried out with the other language requires little effort, at most the details of the syntax will need some specific attention and learning efforts. But here a second consideration comes into play: knowledge of both Python and R is particularly useful in modern data science, those who know only one of the two probably just need a good opportunity to learn the second, discovering that the learning curve is much smoother than could have been imagined and the effort will be certainly reasonable. The advantage will be considerable in terms of new features and tools that will become available.
The organization into parts also suggests a progression and division in learning and teaching. The first part is also suitable for those who have just learned the fundamentals of data science and can be carried out in parallel with the study of those fundamentals. Most static graphs require basic data wrangling operations and generating graphs can be a great educational tool for demonstrating the logic and use of data wrangling operations. The presentation of the different types of static graphs follows an order of increasing complexity, from the first intuitive and easily modifiable in infinite variations, up to the last ones which require knowledge of some important properties of statistical analysis. The difficulty level of code is generally low. The second part is a natural continuation of the first. The Altair library has a linear and clear syntax, so the greater difficulty introduced by the interactive features, especially in terms of computational logic, is completely within reach for anyone who has learned the fundamentals contained in the first part. The result will be motivating, the Altair interactive graphics are of excellent quality, allowing various configurations and alternative solutions.
Between these two parts and the subsequent third and fourth parts, there is a gap in terms of what is required and what is learned, for this reason in the initial introductory part the last two parts were presented as advanced content. It is necessary to have acquired a good familiarity with the fundamentals, confidence in searching for information in the documentation of libraries, and knowing how to patiently and methodically manage errors. In other words, you need to have done a good number of exercises with the fundamental part.
For the third part on dashboards, it is necessary to have basic knowledge of HTML, CSS, and in general how a traditional web page is made. They are not difficult notions, but it may take some time to acquire them. You don’t need more advanced knowledge, such as JavaScript or web application frameworks. You also need to have gained some confidence in writing scripts in Python and R. In both cases you learn the basic reactive mechanisms to manage interactivity, it is a different logic from the traditional one.
For the fourth part on maps, it is necessary to learn the fundamental notions of geographic coordinate systems, the form of geographic data with the typical organization in geometries, and the often-necessary coordinate transformations. The tools used are partly known, ggplot for R and pandas for Python, but many new ones will be encountered because in any case, not only in the world of cartography but also in that of data science, the logic, methods, and tools to use spatial data have specificities that distinguish them. As mentioned initially, there are some initial difficulties to overcome and it is required to go into the details of the shape of the spatial data, but the use of these data and the production of geographical maps is fascinating, right from the first and simple choropleth maps. However, it is right after those initial maps that there’s the real beauty of working with spatial data and geographic maps.
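As one small illustration of what a coordinate transformation involves, the sketch below converts longitude and latitude in degrees (WGS 84, EPSG:4326) into the spherical Web Mercator coordinates in meters (EPSG:3857) used by many web maps. The function name and the approximate coordinates of Venice are assumptions made for this example. In practice, such transformations are normally delegated to dedicated geospatial libraries rather than written by hand.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used by Web Mercator


def to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator x/y in meters."""
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    x = EARTH_RADIUS_M * lon
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y


# Approximate position of Venice (assumed values, for illustration only).
x, y = to_web_mercator(12.33, 45.44)
print(f"x = {x:.0f} m, y = {y:.0f} m")
```

The same point on the Earth is described by very different numbers in the two systems, which is why spatial layers must share a coordinate system before they can be drawn on the same map.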
As always, or almost always, much remains excluded from the content of a book, sometimes simply due to the need not to exceed a certain number of pages, sometimes out of pure forgetfulness, and often due to a conscious choice by the Author. All three reasons also apply to this work. For the first, that is simply how a publisher works; for the second, other than apologizing I don’t know what to say, since those are things I’ve forgotten; for the third, however, there is something to comment on, if for no other reason than to explain the motives for the deliberate exclusions.
The first obvious exclusion is the absence of proprietary technologies and tools. For data visualization there are many proprietary solutions, from very specialized ones produced by small companies to generalist ones produced by big players. Manufacturers of data visualization software will say that their tools are better than those presented in this book. In some respects that might be true, but almost always it is false and, in general, any claim to be better than the open-source tools of Python and R would require several distinctions and clarifications that are rarely presented. One of the main reasons given is the ease of use of the graphical interfaces of proprietary tools compared to the low-level programming of open-source ones. This is an old, worn-out, and now out-of-date issue that is slowly, perhaps, starting to be overcome. It is obvious that learning to click sequences of buttons and menus or drag graphic icons is initially simpler than writing code with a programming language. The initial learning curve is different in the two cases. The point, however, lies in that adjective, “initial.” What happens next? What is the purpose of learning to use these tools? If the purpose is educational, that is, teaching and learning the fundamentals and advanced contents of data visualization, there is practically no choice: only the environments and tools that expose low-level details are teaching tools. The others simply aren’t. They are suitable for professional training courses on that particular tool, but not for basic teaching or learning. This is enough to exclude any proprietary tool from this text. It should be noted that some of the most modern proprietary tools (or perhaps those made by intelligent manufacturers) are integrating the open-source technologies of Python and R into their frameworks, with the idea of offering both possibilities.
Then there is a specific and perhaps surprising exclusion among the basic chart types, and not one of the exotic kind that very few use; on the contrary, it is one of the most widespread. The excluded type is the pie chart, and the reason is simply that it is not useful in the true sense of data visualization in data science. The statement may seem surprising: in what sense are pie charts, ubiquitous and used millions of times, not useful? I will briefly explain the reason, which is also shared by many who deal with data visualization. A graph is produced to visually represent the information contained in certain data, and this representation rests on at least two conditions: (1) that the visual representation is clear and interpretable in an unambiguous way and (2) that with the graph, the information contained in the data is easier to understand than in tabular form (or at least no more difficult). Pie charts satisfy neither condition. They are ambiguous because the relative size of the slices is often unclear and, above all, they make it more difficult to interpret the data than the equivalent table. In other words, if the table with the values is presented instead of the pie chart, the reader has easier, clearer, and more understandable information. Bar charts, on the contrary, are one of the fundamental types of graphics, despite the fact that pie charts are simply the polar coordinate representation of a bar chart. So why this difference, and why are pie charts so common? The reason for the difference is that visually evaluating angles is considerably more difficult than comparing linear heights. Pie charts are mostly used because they give a touch of color to an otherwise monotonous text, not for their informative content. And what about the difficulty of evaluating the slice proportions? Well, the numerical values are often added to the slices, which in practice means rewriting the data table right over the graphic.
To conclude, data visualization deserves more space in educational programs and clearer recognition as a coherent and evolving discipline and body of knowledge. The Cinderella role of data visualization within data science can be overcome by recognizing its educational value and, no less importantly, its creative stimulus.
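The difference between comparing heights and comparing angles can be made concrete with a small sketch. It assumes five invented shares (not data from the book) and a hypothetical helper, slice_angles, that maps each value to the angle of its pie slice; a bar chart would instead draw the same values directly as heights from a common baseline.

```python
# Hypothetical shares, invented for illustration only.
shares = {"A": 23, "B": 21, "C": 20, "D": 19, "E": 17}


def slice_angles(values):
    """Map each value to the angle (in degrees) of its pie slice."""
    total = sum(values.values())
    return {name: 360 * value / total for name, value in values.items()}


angles = slice_angles(shares)

# Same data, two visual encodings: linear height versus angle.
for name, value in shares.items():
    print(f"{name}: bar height = {value:>2}, slice angle = {angles[name]:.1f} degrees")
```

With these numbers, telling B from C in a pie chart means distinguishing a slice of 75.6 degrees from one of 72.0 degrees, each starting at a different orientation, whereas in a bar chart the two heights sit side by side on the same baseline.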
This book is accompanied by a companion website:
https://www.wiley.com/go/Cremonini/DataVisualization1e
This website includes:
Codes
Figures
Datasets
The grammar of graphics was cited in the Introduction and will continue to be mentioned in the rest of the text. Here is a brief summary. The concept of a grammar of graphics was proposed by Leland Wilkinson in the late 1990s with the idea of creating grammatical, mathematical, and aesthetic rules to define the graphics produced by statistical analysis. Unlike the fixed definition of chart types based on stylized reference schemes, a grammar of graphics would allow previously unknown flexibility. In Wilkinson’s definition, seven fundamental components were identified, but the construction by overlapping layers was not yet highlighted. It was Hadley Wickham, core developer of ggplot, who in 2010 introduced the layered grammar of graphics, which updated Wilkinson’s approach by reviewing the fundamental elements. The layered definition builds the representation of the data by combining statistics and geometries, two of the fundamental elements, together with positions, aesthetics, scales, a coordinate system, and possibly facets. We will find all these elements in ggplot and Altair, the two graphic libraries organized according to the grammar of graphics considered in this book, as well as in the recent but still preliminary Seaborn Objects interface of Seaborn, the reference graphic library for Python.
Leland Wilkinson, The Grammar of Graphics, 2nd Ed., Springer-Verlag New York 2005, https://doi.org/10.1007/0-387-28695-0.
Leland Wilkinson, The Grammar of Graphics, Chapter 13, Handbook of Computational Statistics, Concepts and Methods, 2nd Ed., Gentle J.E., Härdle W. K. and Mori Y. (eds.), Springer-Verlag Berlin Heidelberg 2012, https://doi.org/10.1007/978-3-642-21551-3.
Wickham, H. (2010). A Layered Grammar of Graphics. Journal of Computational and Graphical Statistics 19 (1): 3–28. http://dx.doi.org/10.1198/jcgs.2009.07098.
Scatterplots, with the main variant represented by line plots, are the fundamental type of graphic for pairs of continuous variables or for a continuous variable and a categorical variable and, in addition to representing the most common type of graphic together with bar plots (or bar charts/bar graphs), form the basis for numerous variations. The logic that guides a scatterplot graphic is to represent with markers