Data Visualization in R and Python - Marco Cremonini - E-Book

Data Visualization in R and Python E-Book

Marco Cremonini

Description

Communicate the data that is powering our changing world with this essential text

The advent of machine learning and neural networks in recent years, along with other technologies under the broader umbrella of ‘artificial intelligence,’ has produced an explosion in Data Science research and applications. Data Visualization, which combines the technical knowledge of how to work with data and the visual and communication skills required to present it, is an integral part of this subject. The expansion of Data Science is already leading to greater demand for new approaches to Data Visualization, a process that promises only to grow.

Data Visualization in R and Python offers a thorough overview of the key dimensions of this subject. Beginning with the fundamentals of data visualization with Python and R, two key environments for data science, the book proceeds to lay out a range of tools for data visualization and their applications in web dashboards, data science environments, graphics, maps, and more. With an eye towards remarkable recent progress in open-source systems and tools, this book offers a cutting-edge introduction to this rapidly growing area of research and technological development.

Data Visualization in R and Python readers will also find:

  • Coverage suitable for anyone with a foundational knowledge of R and Python
  • Detailed treatment of tools including the ggplot2, Seaborn, and Altair libraries, Plotly/Dash, Shiny, and others
  • Case studies accompanying each chapter, with full explanations for data operations and logic for each, based on Open Data from many different sources and of different formats

Data Visualization in R and Python is ideal for any student or professional looking to understand the working principles of this key field.


Page count: 598

Year of publication: 2024




Table of Contents

Cover

Table of Contents

Title Page

Copyright

Preface

Introduction

About the Companion Website

Part I: Static Graphics with ggplot (R) and Seaborn (Python)

1 Scatterplots and Line Plots

1.1 R: ggplot

1.2 Python: Seaborn

2 Bar Plots

2.1 R: ggplot

2.2 Python: Seaborn

3 Facets

3.1 R: ggplot

3.2 Python: Seaborn

4 Histograms and Kernel Density Plots

4.1 R: ggplot

4.2 Python: Seaborn

5 Diverging Bar Plots and Lollipop Plots

5.1 R: ggplot

5.2 Python: Seaborn

6 Boxplots

6.1 R: ggplot

6.2 Python: Seaborn

7 Violin Plots

7.1 R: ggplot

7.2 Python: Seaborn

8 Overplotting, Jitter, and Sina Plots

8.1 Overplotting

8.2 R: ggplot

8.3 Python: Seaborn

9 Half-Violin Plots

9.1 R: ggplot

9.2 Python: Seaborn

10 Ridgeline Plots

10.1 History of the Ridgeline

10.2 R: ggplot

11 Heatmaps

11.1 R: ggplot

11.2 Python: Seaborn

12 Marginals and Plots Alignment

12.1 R: ggplot

12.2 Python: Seaborn

13 Correlation Graphics and Cluster Maps

13.1 R: ggplot

13.2 Python: Seaborn

13.3 R: ggplot

13.4 Python: Seaborn

Part II: Interactive Graphics with Altair

14 Altair Interactive Plots

14.1 Scatterplots

14.2 Line Plots

14.3 Bar Plots

14.4 Bubble Plots

14.5 Heatmaps and Histograms

Part III: Web Dashboards

15 Shiny Dashboards

15.1 General Organization

15.2 Second Version: Graphics and Style Options

15.3 Third Version: Tabs, Widgets, and Advanced Themes

15.4 Observe and Reactive

16 Advanced Shiny Dashboards

16.1 First Version: Sidebar, Widgets, Customized Themes, and Reactive/Observe

16.2 Second Version: Tabs, Shinydashboard, and Web Scraping

16.3 Third Version: Altair Graphics

17 Plotly Graphics

17.1 Plotly Graphics

18 Dash Dashboards

18.1 Preliminary Operations: Import and Data Wrangling

18.2 First Dash Dashboard: Base Elements and Layout Organization

18.3 Second Dash Dashboard: Sidebar, Widgets, Themes, and Style Options

18.4 Third Dash Dashboard: Tabs and Web Scraping of HTML Tables

18.5 Fourth Dash Dashboard: Light Theme, Custom CSS Style Sheet, and Interactive Altair Graphics

Part IV: Spatial Data and Geographic Maps

19 Geographic Maps with R

19.1 Spatial Data

19.2 Choropleth Maps

19.3 Multiple and Annotated Maps

19.4 Spatial Data (sp) and Simple Features (sf)

19.5 Overlaid Graphical Layers

19.6 Shape Files and GeoJSON Datasets

19.7 Venice: Open Data Cartography and Other Maps

19.8 Thematic Maps with tmap

19.9 Rome’s Accommodations: Intersecting Geometries with Simple Features and tmap

20 Geographic Maps with Python

20.1 New York City: Plotly

20.2 Overlaid Layers

20.3 Geopandas: Base Map, Data Frame, and Overlaid Layers

20.4 Folium

20.5 Altair: Choropleth Map

Index

End User License Agreement

List of Illustrations

Chapter 1

Figure 1.1 Output of the ggplot function with x and y aesthetics.

Figure 1.2 First ggplot’s scatterplot.

Figure 1.3 Scatterplot with color aesthetic.

Figure 1.4 Scatterplot with color aesthetic for marital status variable.

Figure 1.5 Scatterplot with income as dependent variable and color aesthetic...

Figure 1.6 (a/b) Scatterplots with four variables.

Figure 1.7 United States’ inflation values 1960–2022.

Figure 1.8 Inflation values for a sample of countries.

Figure 1.9 Dots colors based on an aesthetic when over a threshold, otherwis...

Figure 1.10 Markers colored based on two thresholds and textual labels, US i...

Figure 1.11 Temperature measurement in some US cities, minimum temperatures....

Figure 1.12 A problematic line plot, groups are not respected.

Figure 1.13 Line plot connecting points of same country.

Figure 1.14 Line plot with style options.

Figure 1.15 Scatterplot of the United States’ GDP time series from the World...

Figure 1.16 Scatterplot of the GDP for a sample of countries.

Figure 1.17 Scatterplot with markers styled differently for from year 2000 a...

Figure 1.18 Temperature measurement in some US cities, maximum temperatures....

Figure 1.19 Line plot of GDP variations for a sample of countries.

Figure 1.20 Line plot with line style varied according to country.

Figure 1.21 Line plot and scatterplot overlapped.

Figure 1.22 Line plot with markers automatically added.

Chapter 2

Figure 2.1 Bar plot with two variables.

Figure 2.2 Bar plot with custom color palette, horizontal bar orientation, a...

Figure 2.3 Bar plot with ranges of values for PM10 derived from a continuous...

Figure 2.4 Bar plot with ordered bars and x ticks rotated.

Figure 2.5 Bar plot with three variables and groups of bars.

Figure 2.6 Bar plot with month names and the legend moved outside the plot....

Figure 2.7 Bar plot with stacked bars.

Figure 2.8 Bar plot with ranges of values derived from a continuous variable...

Figure 2.9 Bar plots with quantile representation, subplots, and style optio...

Chapter 3

Figure 3.1 Temperature measurement in some US cities, minimum temperatures, ...

Figure 3.2 Facet visualization with bar plots, some facets not readable due ...

Figure 3.3 Facet visualization with independent scale on y-axis.

Figure 3.4 Facet visualization with bar plots, facets are all well-readable ...

Figure 3.5 Temperature measurement in some US cities, maximum temperatures, ...

Figure 3.6 Facets and bar plot visualization.

Figure 3.7 Incorrect facet visualization (single facet detail).

Figure 3.8 Facet visualization with the general method, unbalanced facets.

Figure 3.9 Facet visualization with the general method, independent scales....

Figure 3.10 Facet visualization with balanced and meaningful bar plots.

Chapter 4

Figure 4.1 Number of bins equal to 30.

Figure 4.2 Bin width equal to 10.

Figure 4.3 Facets visualization with histograms.

Figure 4.4 Histogram for bivariate analysis with rectangular tiles.

Figure 4.5 Histogram for bivariate analysis with hexagonal tiles.

Figure 4.6 Histogram for bivariate analysis with facet visualization.

Figure 4.7 Kernel density for bivariate analysis with isodensity curves.

Figure 4.8 Kernel density for bivariate analysis with color gradient, NYC ma...

Figure 4.9 Kernel density for bivariate analysis with color gradient, NYC mi...

Figure 4.10 Histogram for univariate analysis, bin width equals 20.

Figure 4.11 Histogram for univariate analysis and kernel density, bin width ...

Figure 4.12 Histogram for univariate analysis with stacked bars.

Figure 4.13 Histogram for bivariate analysis and continuous variables.

Figure 4.14 Histogram for bivariate analysis with a categorical variable.

Figure 4.15 Histogram for bivariate analysis and facet visualization.

Figure 4.16 Histogram with logarithmic scale.

Figure 4.17 Histogram with logarithmic scale and symmetric log.

Figure 4.18 Histogram with stacked visualization, logarithmic scale, and sym...

Figure 4.19 Histogram with stacked visualization, logarithmic scale, and sym...

Chapter 5

Figure 5.1 Diverging bar plot, yearly wheat production variations for Argent...

Figure 5.2 Diverging bar plot with ordered bars and annotation, yearly varia...

Figure 5.3 Lollipop plot, yearly wheat production variations for Argentina....

Figure 5.4 Lollipop plot ordered by values and annotation, yearly variations...

Figure 5.5 Diverging bar plot, yearly wheat production variations for the Un...

Figure 5.6 Diverging bar plot, yearly wheat production variations for the Un...

Chapter 6

Figure 6.1 Boxplot statistics.

Figure 6.2 Boxplot, air quality in Milan, 2021.

Figure 6.3 Boxplot with three variables, confused result.

Figure 6.4 Boxplot with three variables, unbalanced facet visualization.

Figure 6.5 Boxplot with three variables, balanced facet visualization.

Figure 6.6 Box plot with three variables, the result is confused.

Figure 6.7 Boxplot with three variables, facet visualization.

Chapter 7

Figure 7.1 Violin plot, OECD/Pisa tests, male and female students, Mathemati...

Figure 7.2 Density plot, OECD/Pisa tests, male and female students, Mathemat...

Figure 7.3 Boxplot, OECD/Pisa tests, male and female students, Mathematics s...

Figure 7.4 Violin plot and scatterplot combined and correctly overlapped and...

Figure 7.5 Violin plot and boxplot combined and correctly overlapped and dod...

Figure 7.6 OECD/Pisa tests, male and female students, Mathematics, Reading, ...

Figure 7.7 Violin plot, bike thefts in Berlin, and bike values.

Figure 7.8 Violin plot, bike thefts in Berlin for each month of years 2021 a...

Figure 7.9 Bar plot, bike thefts in Berlin for each month of years 2021 and ...

Figure 7.10 Violin plot, bike thefts in Berlin for bike type and month, year...

Chapter 8

Figure 8.1 Categorical scatterplot with jitter, OECD/Pisa tests results for ...

Figure 8.2 Categorical scatterplot with reduced jitter.

Figure 8.3 Categorical scatterplot with increased jitter.

Figure 8.4 Violin plot and scatterplot with jitter, OECD/Pisa tests results ...

Figure 8.5 Violin plot, boxplot, and scatterplot with jitter, OECD/Pisa test...

Figure 8.6 Sina plot, OECD/Pisa tests results for male and female students, ...

Figure 8.7 Sina plot and violin plot combined, OECD/Pisa tests results for m...

Figure 8.8 Sina plot and boxplot, OECD/Pisa tests results for male and femal...

Figure 8.9 Sina plot with stacked groups of data points and color based on l...

Figure 8.10 Beeswarm plot, OECD/Pisa test results for male and female studen...

Figure 8.11 Comparing overplotting, jitter, sina plot, and beeswarm plot.

Figure 8.12 Strip plot, bike thefts in Berlin.

Figure 8.13 Swarm plot, men’s and ladies’ bike thefts in Berlin, October 202...

Figure 8.14 Sina plot, men’s and ladies’ bike thefts in Berlin in January 20...

Chapter 9

Figure 9.1 Half-violin plot, custom function, OECD/Pisa test results for mal...

Figure 9.2 Half-violin plot, boxplot, and scatterplot with jitter correctly ...

Figure 9.3 OECD/Pisa tests, male and female students, Mathematics, Reading, ...

Figure 9.4 Left-side half-violin plots, male and female students, Mathematic...

Figure 9.5 Raincloud plot, male and female students, Mathematics, Reading, a...

Figure 9.6 Violin plot with groups of two subsets of points, bike thefts in ...

Figure 9.7 Half-violin plots with sticks.

Figure 9.8 Half-violin plots with quartiles.

Chapter 10

Figure 10.1 “Many consecutive pulses from CP1919,” in Harold Dumont Craft, J...

Figure 10.2 Ridgeline plot, OECD-Pisa tests, default alphabetical order base...

Figure 10.3 Ridgeline plot, OECD-Pisa tests, custom order based on arithmeti...

Figure 10.4 Ridgeline plot, OECD-Pisa tests, custom order based on arithmeti...

Figure 10.5 Ridgeline plot, OECD-Pisa tests, custom order based on arithmeti...

Chapter 11

Figure 11.1 Heatmap, bike thefts in Berlin for months and hours of day.

Figure 11.2 Heatmap, bike thefts in Berlin for months and hours and style el...

Figure 11.3 Heatmap, number of bike thefts in Berlin for months and hours.

Figure 11.4 Heatmap, value of stolen bikes in Berlin for months and hours.

Chapter 12

Figure 12.1 Marginal with scatterplot and histograms, bike thefts in Berlin ...

Figure 12.2 Plots aligned in a vertical grid, marginals, bike thefts in Berl...

Figure 12.3 Marginal with scatterplot and rug plots, bike thefts in Berlin (...

Figure 12.4 Marginal with categorical scatterplot and rug plot, number of st...

Figure 12.5 Subplots, a scatter plot and a boxplot horizontally aligned, sto...

Figure 12.6 Subplots, a scatter plot and a boxplot vertically aligned, stole...

Figure 12.7 Joint plot with density plots as marginals, stolen bikes in Berl...

Figure 12.8 Joint grid with scatterplot and rug plots as marginals, stolen b...

Chapter 13

Figure 13.1 Cluster map, bike thefts in Berlin (2021–2022), values scaled by...

Figure 13.2 Cluster map, bike thefts in Berlin (2021–2022), values scaled by...

Figure 13.3 Cluster map, stolen bikes in Berlin (2021–2022), scaled by colum...

Figure 13.4 Cluster map, stolen bikes in Berlin (2021–2022), scaled by rows....

Figure 13.5 Diagonal correlation heatmap, stolen bikes in Berlin (2021–2022)...

Figure 13.6 Diagonal correlation heatmap, stolen bikes in Berlin, correlatio...

Figure 13.7 Scatterplot heatmap, stolen bikes in Berlin (2021–2022), correla...

Chapter 14

Figure 14.1 Altair, scatterplot with color aesthetic and style options.

Figure 14.2 Altair, horizontal alignments of plots and differences from assi...

Figure 14.3 Altair, facet visualization.

Figure 14.4 (a) Dynamic tooltip (example 1). (b) Dynamic tooltip (example 2)...

Figure 14.5 (a) Dynamic legend, year 2005. (b) Dynamic legend, year 2010.

Figure 14.6 (a) Dynamic zoom, zoom in. (b) Dynamic zoom, zoom out.

Figure 14.7 Mouse hover, contextual change of color.

Figure 14.8 Drop-down menu.

Figure 14.9 Radio buttons.

Figure 14.10 (a) Selection with brush and synchronized table (example 1). (b...

Figure 14.11 (a) (Left plot) brush selection; (right plot) synchronized plot...

Figure 14.12 (a) Plot as interactive legend, all years selected. (b) Plot as...

Figure 14.13 Line plots, mean per capita, total expenditure, and total arriv...

Figure 14.14 Line plots with mouse hover, Oceania’s line is highlighted (the...

Figure 14.15 (a) Line plot with mouse hover and coordinated visualization of...

Figure 14.16 Line plot with mouse hover and coordinated visualization in all...

Figure 14.17 (Left): Bar plot with segment for the arithmetic mean.

Figure 14.18 (Right): Bar plot with horizontal orientation and annotations....

Figure 14.19 Diverging bar plots, pirate attacks, yearly and monthly variati...

Figure 14.20 Plot with two distinct y-axes and corresponding scales.

Figure 14.21 Stacked bar plot, pirate attacks, and countries where they took...

Figure 14.22 Bar plot with sorted bars and annotations.

Figure 14.23 (a) Synchronized bar plots, default visualization, without sele...

Figure 14.24 Bar plots and tables synchronized with slider, homeless in the ...

Figure 14.25 (a) Bar plots and slider, homeless in the US States (year 2022)...

Figure 14.26 (a) Bubble plot and slider, homeless in the US States (year 202...

Figure 14.27 Heatmap with dynamic tooltip, homelessness in the US States (% ...

Figure 14.28 Univariate histogram, 100 bins, homeless in the United States (...

Figure 14.29 Bivariate histogram, 20 bins, and scatterplot, homeless in the ...

Figure 14.30 Bivariate histogram, 20 bins, and rug plot, homeless in the Uni...

Part 3

Figure 1 Design for Tandem Cart, 1850–74, Gift of William Brewster, 1923, Th...

Chapter 15

Figure 15.1 (a) Shiny, test MAT, and country AL (Albania) selected. (b) Shin...

Figure 15.2 (a) Table and plot, test READ and country KR (Korea) selected. (...

Figure 15.3 (a) A table, two plots, and light theme. (b) A table, two plots,...

Figure 15.4 (a) Tab MAT, default theme. (b) Tab READ, dark theme. (c) Google...

Chapter 16

Figure 16.1 (a) Layout with default configuration with years range 2000–2021...

Figure 16.2 Excerpt of XML representation of a web-scraped HTML page.

Figure 16.3 Selecting the table element through the Chrome’s Inspect Element...

Figure 16.4 First data frame obtained through web scraping from an HTML page...

Figure 16.5 Second data frame obtained through web scraping from an HTML pag...

Figure 16.6 (a) Expeditions tab, default visualization. (b) Summiteers tab, ...

Figure 16.7 Static and interactive Altair graphics in a Shiny dashboard.

Chapter 17

Figure 17.1 Plotly, scatterplot with default dynamic tooltip.

Figure 17.2 Plotly, scatterplot with extended dynamic tooltip.

Figure 17.3 Plotly, line plot with tooltip.

Figure 17.4 Plotly, scatterplot with a histogram and a rug plot as marginals...

Figure 17.5 Plotly, facet visualization.

Chapter 18

Figure 18.1 Dash dashboard with Plotly graphic.

Figure 18.2 (a) Slider with default range. (b) Slider with modified range (2...

Figure 18.3 (a) Dash, graphic, slider, and data table with interactive featu...

Figure 18.4 (a) Color palette selector and centered, resized data table (exa...

Figure 18.5 Sidebar and reactive data table, all country checkbox selected. ...

Figure 18.6 (a) Dash dashboard, default appearance. (b) Detail of the scatte...

Figure 18.7 (a) First tab with a selection of countries from the drop-down m...

Figure 18.8 (a) First tab, data table, reactive graphics, and layout. (b) Se...

Chapter 19

Figure 19.1 World map from package maps.

Figure 19.2 Italy’s border map.

Figure 19.3 Provinces of Italy.

Figure 19.4 Choropleth map with an incoherent association between data and g...

Figure 19.5 Regions of Italy.

Figure 19.6 Choropleth map with coherent data and geographical areas.

Figure 19.7 Choropleth maps, from left to right: ratio of dogs per resident,...

Figure 19.8 Annotated map with dots and city names for Milan, Bologna, and R...

Figure 19.9 ggplot image transformed into a Plotly HTML object.

Figure 19.10 Maps from Natural Earth, Sweden and Denmark’s borders and regio...

Figure 19.11 Railroad and land maps from Natural Earth.

Figure 19.12 Land and railroad maps of Western Europe.

Figure 19.13 Busiest railway stations and railroad network in Western Europe...

Figure 19.14 (a/b) Venice, streets, and canals cartographic layers.

Figure 19.15 Venice municipality border map.

Figure 19.16 Venice, Municipality area, streets, and canals layers.

Figure 19.17 Venice, historical insular part, map with overlaid layers.

Figure 19.18 (a/b) Venice, ggmap, Stamen Terrain, and Toner tiled web maps....

Figure 19.19 Venice, Leaflet base map from OpenStreetMap. (a) Full view. (b)...

Figure 19.20 (a/b/c) Venice, Leaflet tile maps from Stamen, Carto, and ESRI....

Figure 19.21 Venice, ggmap, tiled web maps with cartographic layers. (a) Ope...

Figure 19.22 Venice, Leaflet with Carto Positron tile map, and cartographic ...

Figure 19.23 Venice, Leaflet, civic numbers with dynamic popups associated....

Figure 19.24 Venice, Leaflet, pedestrian areas.

Figure 19.25 Venice, ggplot, markers with annotations.

Figure 19.26 (a) Venice, Leaflet, aggregate circular marker and popup, full ...

Figure 19.27 (a/b) Rome, tmap, choropleth maps of neighborhoods and district...

Figure 19.28 (a) Rome, tmap, historical villas, plot mode (static). (b) Rome...

Figure 19.29 (a) Rome, tmap view mode, city center archaeological map with E...

Figure 19.30 Rome, accommodations for topographic area, wrong bubble plot.

Figure 19.31 (a) Rome, tmap, full map with bubbles centered on centroids and...

Figure 19.32 Rome, tmap, quantiles, and custom legend labels.

Figure 19.33 Rome, tmap, standard quantile subdivision, and legend labels.

Figure 19.34 Rome region tmap, road map with dynamic popups.

Figure 19.35 (a) Rome, tmap, Bed and Breakfasts, full map. (b) Rome, tmap, H...

Figure 19.36 (a) Rome, tmap, hotels, full map. (b) Rome, tmap, hotels, zoom ...

Chapter 20

Figure 20.1 NYC, plotly.express, choropleth map of licensed dogs.

Figure 20.2 NYC, plotly.express, most popular dog breed for zip code.

Figure 20.3 NYC, plotly.express, most popular dog breed for zip code, OpenSt...

Figure 20.4 NYC, plotly go, base map, and dog runs layer.

Figure 20.5 NYC, plotly go, overlaid layers, Choropleth map, and dog runs, C...

Figure 20.6 NYC, plotly.express and geopandas, dog runs, extended tooltip.

Figure 20.7 NYC, plotly go and geopandas, dog runs, extended tooltip.

Figure 20.8 NYC, plotly go and geopandas, dog breeds and dog runs with disti...

Figure 20.9 (a) NYC, plotly go and geopandas, dog breeds, dog run areas, and...

Figure 20.10 NYC, Folium, base map with default tiled web map from OpenStree...

Figure 20.11 NYC, Folium, markers, popups, and tooltips, Stamen Terrain tile...

Figure 20.12 (a/b) NYC, Folium, marker’s popups with HTML iframe and image (...

Figure 20.13 NYC, Folium, base map, and GeoJSON layer with FEMA sea level ri...

Figure 20.14 NYC, Folium choropleth map, rodent inspections finding rat acti...

Figure 20.15 NYC, Folium and geopandas, rodent inspections finding rat activ...

Figure 20.16 NYC, Folium heatmap of rodent inspections with rat activity.

Figure 20.17 (a/b) Altair, NYC zip code areas, and boroughs.

Figure 20.18 Altair, NYC subway stations with popups.

Figure 20.19 Altair, choropleth maps for ethnic groups (from left to right: ...



Data Visualization in R and Python

Marco Cremonini

University of Milan, Italy

Copyright © 2025 by John Wiley & Sons Inc. All rights reserved, including rights for text and data mining and training of artificial intelligence technologies or similar technologies.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey. Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.

Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates in the United States and other countries and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.

Library of Congress Cataloging-in-Publication Data is applied for:

Hardback ISBN 9781394289486

Cover Design: Manuela Ruggeri

Cover Images: Courtesy of Marco Cremonini, © ulimi/Getty Images

Preface

The idea for this handbook came to me when I noticed something that made me pause and reflect. Whenever I mentioned data visualization to someone who knew just a little about it, perhaps adding that it involves representing data and the results of data analysis with figures, sometimes even interactive ones, the reaction was often curiosity with a shade of perplexity: the name sounds nice, but what is it, exactly? After all, if we have a table of data and want to produce a graph, isn’t it enough to search a menu, choose the stylized icon of the graph we want to create, and click? Is there really so much to say that it fills an entire book? When I added that I was talking about graphic tools completely different from those of office automation and that, to tell the truth, it does not even stop at graphics, interactive or not, because there are also dashboards, the latest evolution of data visualization, in which real dynamic web applications are created, a shadow of concern generally crossed my interlocutor’s face. At that moment, I typically played the ace up my sleeve by saying that data visualization also includes maps, geographical maps (why not?). Those are data too, spatial and geographical data, and the maps come with zoom, flags, and colored areas. There are cartographic maps as well: you may work with maps of New York, Tokyo, Paris, Rome, New Delhi, you name it.

At that point, my interlocutors usually looked puzzled. The references they had from common experience were lost, and they no longer really knew what this data visualization was about, only that there actually seemed to be a lot to say, enough to fill an entire book.

If anyone recognizes themselves in this interlocutor, be assured that you are in good company. Good in a literal not figurative sense, because data visualization is the Cinderella of data science that many admire but always from a certain distance, it arrives last and at the best moment it is forced to step back because there is no longer enough time to teach, study, or practice it. Yet, it frequently happens that those who, given the right opportunity to study and practice it, sense that it could be decidedly interesting, certainly prove useful and applicable in an infinite number of fields. This is due to a property that data visualization has and is instead absent in data analysis or code development: it stimulates visual creativity together with logic. Even statisticians and programmers use creativity, those who deny it have never been neither of them, but that is logical creativity. With data visualization, another dimension of otherwise neglected data science comes into play, the visual language combined with computational logic, meaning that data are represented with an expressive form that is no longer just logical and symbolic, but also perceptive, sensorial, shapes together with colors come into play, the once passive observer starts interacting, or projections of geographical areas suddenly become artifacts to use in a visual communication. Data visualization conveys different knowledge and logic for an expressive form that always has a double nature: computational for the data that feeds it, visual and sometimes interactive for the language it uses to communicate with the observer. 
There is enough to fill more than a single book. In fact, what is contained in this book is one part of the discourse on data visualization, the more practical and operational one. Other publications approach data visualization by considering complementary aspects, such as the aesthetic composition of graphics, the storytelling behind a visual communication, and the syntax and semantics of a visual language together with sensory perception and psychology, and there is a lot to say about each of these topics. All of them are essential for a complete understanding of the aim and extent of data visualization, but together they just don’t fit in a single handbook unless presented in a truly superficial fashion. For this reason, almost every book on data visualization focuses more explicitly on a few of those aspects. This book is dedicated to the more operational and computational issues, because you have to know the low-level logic behind modern data visualization artifacts, and you have to know and practice with the tools. They are not all alike: “just pick the easiest to use and you’re all set” is definitely not good advice. Moreover, given the liveliness of the market for proprietary data visualization tools, it is easy to forget about open-source ones, which instead rival and often surpass what proprietary tools are able to offer, maybe with a little more initial effort, but not much.

To conclude, data visualization probably deserves better consideration in educational programs and recognition as a coherent and evolving discipline. It can be a lot of fun to study and practice; it could also make you pause and reflect on tools for communicating data science results with a visual language; and it includes many different aspects from diverse disciplines, both theoretical and practical, all converging and meshing into a coherent body of knowledge. These are all good characteristics for curious persons. The Cinderella role of data visualization can be overcome by recognizing its educational and professional value and, no less important, its creative stimulus.

October 8, 2024

Marco Cremonini

University of Milan

Introduction

When you mention data visualization to a person who doesn’t know it, perhaps adding that it involves representing data and the results of data analysis with figures, sometimes even interactive ones, the reaction you observe is often that the person in front of you looks intrigued but doesn’t know exactly what it consists of. After all, if we have a table with data and we want to produce a graph, isn’t it enough to open the usual application, go to a certain drop-down menu, choose the stylized figure of the graph we want to create, and click? Is there so much to say as to fill an entire book? At that moment, when you perceive that the interlocutor is thinking of the well-known spreadsheet product, you may add that the graphic tools described in the book are completely different from those of office automation and that, to tell the truth, we don’t even stop at graphics, even interactive ones. There are also dashboards, namely the latest evolution of data visualization, in which it is transformed into dynamic web applications, and to obtain dashboards it is not sufficient to click on menus: you have to go deeper into the inner logic and mechanisms. It’s then that the expression of the interlocutor is generally crossed by a shadow of concern, and you can play the ace up your sleeve by saying that data visualization also includes maps. Geographical maps, sure: those are made from data too, spatial and geographical data, and they can be produced with the many available widgets such as zoom, flags, and colored areas. We even go beyond simple maps, because there are also cartographic maps with layers of cartographic quality, such as maps of Rome, Venice, New York, and other famous and not-so-famous cities and places, possibly with very detailed geographical information.

At that point the interlocutor has likely lost the references she or he had from the usual experience with office automation products and doesn’t really know what this data visualization is, only that there seems to be a lot to say, enough to fill an entire book. If anyone recognizes themselves in this imaginary interlocutor (imaginary up to a certain point, to be honest), know that you are in good company. Good in a literal, not figurative, sense, because data visualization is a little like the Cinderella of data science: many admire it from a certain distance, it arrives last in a project, and sometimes it does not receive the attention it deserves. Yet there are many who, given the right opportunity to study and practice it, sense that it could be interesting and enjoyable and that it could certainly prove useful and applicable in an infinite number of areas and situations. This is due to a property that data visualization has and that is instead absent in traditional data analysis or code development: it stimulates visual creativity together with logic. Even statisticians and programmers use creativity (those who deny it have never really practiced one of those disciplines), but that is logical creativity. With data visualization, another dimension of data science that is otherwise neglected comes into play: the visual language combined with computational logic. The data are represented with an expressive form that is no longer just logical and formal but also perceptive and sensorial; it works with shapes, colors, and the use and projections of space, and it is always accompanied by a meaning that the originator wishes to convey and that the observers will interpret, often subjectively. Data visualization conveys different knowledge and logic through an expressive form that always has a double soul: computational for the data that feeds it, visual and sometimes interactive for the language it uses to communicate with the observer.
Data visualization always has a double nature: it is a key part of data science for its methods, techniques, and tools, and it is storytelling. Whoever produces visual representations from data tells a story that may take different guises and may produce different reactions. There is enough to fill more than a single book.

Organization of the Work: Foundations and Advanced Contents

The text is divided into four parts, already mentioned in the preceding introduction. The first part presents the fundamentals of data visualization with Python and R, the two reference languages and environments for data science, employed to create static graphs as the direct result of a previous data wrangling (import, transformation) and analysis activity. The reference libraries for this first part are Seaborn for Python and ggplot2 for R. Both are modern open-source graphics libraries in constant evolution, developed by their core developers with contributions from the respective communities, which are very large and lively in pursuing continuous innovation. Seaborn is the more recent of the two and partly represents an evolved interface to Python’s traditional matplotlib graphics library, made more functional and enriched with features and graph types popular in modern data visualization. Ggplot2 is the traditional graphic library for R, widely recognized as one of the best ever, in both the open-source and the proprietary world. Ggplot is full of high-level features, is constantly evolving, and receives contributions from researchers and developers in various scientific and application fields. It is simply an unavoidable tool for anyone approaching data visualization. The two libraries have different designs. Seaborn is more traditional, with a collection of functions and options for the different types of charts supported. Ggplot, instead, is organized by overlapping graphic layers, according to a design known as the grammar of graphics, which is shared by some of the most widespread digital graphics tools and is suitable for developing even unconventional types of graphics, thanks to the extreme flexibility it allows. This first part covers about a third of the work.

The second part introduces Altair, a Python library capable of producing interactive graphics in HTML and JSON format, as well as static versions in bitmap (PNG and JPG) and vector (SVG) formats. Altair is a young but solid graphic library because, in all respects, it is a modern interface to Vega-Lite, a graphic library with an established tradition in web applications thanks to its declarative syntax in JSON format. Altair offers the same web-oriented functionality as Vega-Lite for typical data science use, with an easy-to-use syntax, common to similar tools, that supports the definition of overlapping graphical layers and aesthetics. This second part presents a higher level of difficulty than the first, but it is certainly within reach of those who have acquired the fundamental knowledge provided by the first part. The first and second parts cover approximately half of the work.
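To give a feel for the declarative JSON syntax mentioned above, here is a minimal sketch, using only Python's standard library, of what a layered Vega-Lite specification can look like. The toy data and the field names `year` and `value` are invented for illustration; in practice Altair generates a specification of this kind from Python objects, so this is not how the book builds its charts.

```python
import json

# Hypothetical toy data: the field names "year" and "value" are invented.
rows = [
    {"year": 2020, "value": 3.1},
    {"year": 2021, "value": 3.8},
    {"year": 2022, "value": 4.4},
]

# Encoding shared by both layers: which data field maps to which visual channel.
encoding = {
    "x": {"field": "year", "type": "ordinal"},
    "y": {"field": "value", "type": "quantitative"},
}

# A layered chart: a line with points drawn on top, both from the same data.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": rows},
    "layer": [
        {"mark": "line", "encoding": encoding},
        {"mark": "point", "encoding": encoding},
    ],
}

# Serialize the specification to the JSON text a web page would consume.
spec_json = json.dumps(spec, indent=2)
print(spec_json)
```

The specification only declares what to draw (data, marks, encodings, layers); how to draw it is left to the rendering library, which is what makes the format convenient for web applications.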

The third and fourth parts present advanced data visualization contents. The difficulty increases and so does the commitment required; in return, we enter two worlds in their own right: that of web dashboards and that of spatial data and maps. The term dashboard may be new to many, but dashboards are not. Whenever you access environments on the web that show menus, graphic objects configurable according to the user’s choices, and content in the form of data or graphs, what you are using is most likely a dashboard. If you access the Open Data of a large institution, such as the Organisation for Economic Co-operation and Development (OECD) or the United Nations, or even an internal company application that displays graphs and statistics, you are most likely using a dashboard. Numerous systems and products for creating dashboards with different technologies are available; it is a vast market. In data science environments with Python and R, there are two formidable tools, Plotly/Dash and Shiny, respectively. They are professional tools, and the list of relevant organizations using them is long. They are also irreplaceable teaching tools for learning the logic and basic mechanisms of a dashboard, which, in its final form, is a web application and is therefore integrated with the typical technology of web pages and websites. However, a Dash or Shiny dashboard is also something else: it is the terminal point of a pipeline that begins with the fundamentals of data science (data import, data wrangling, data analysis) and continues with static and dynamic graphs. The dashboard is the end point in which everything is concentrated and integrated: logic, mechanisms, requirements, and creativity. Technically, dashboards are challenging because of the reactive logic that makes them dynamic and interactive and because of the integration of various components.
The text discusses and develops examples of medium complexity, with different solutions, from web scraping of online content to the integration of Altair interactive graphics.

The second world that opens up, that of geographical maps, is undeniably fascinating. Spatial data and choropleth maps, the simplest ones with colored areas (such as maps with areas colored according to the coalition that won the elections or the unemployment rate by province, region, or nation), but also maps based on cartographic data, are data science’s interpretation of a discipline that has very ancient roots and still constitutes an almost independent environment composed of high-resolution maps and geographic information systems (GIS), with its own specializations and professional skills. Until a few years ago, data science tools could not even touch that world, but today they have come surprisingly close. This is thanks to extraordinary progress in open-source systems and tools, in Python but above all in R, which now offers formidable tools capable of also using shapefiles from technical cartography and geographic coordinate systems compliant with international standards. In the examples presented, geographic and cartographic files from Venice, Rome, and New York are used with the aim of showing the impressive potential offered by the Python and R tools.

Who is it Aimed at?

It is simple to specify to whom this text is addressed: it is addressed to everyone. Anyone who finds data visualization interesting, and images useful for their work, study, and the skills they are building, will find a learning path that starts from the fundamentals and goes up to cartography and web applications. Of course, saying “it’s aimed at everyone” is simple, but then doubts may arise in the reader: “Am I also part of that everyone?” Any list of those included in this “everyone” will inevitably leave someone out, but we could certainly mention students, researchers, and instructors of social, political, and economic sciences. In addition to many generic data, they may have spatial data to represent (e.g., movement of people and goods, global supply chains, logistics, spatial or ethnographic analyses). Next come students, researchers, and instructors of marketing, communication, public relations, journalism, media, and advertising, for whom interactive representations via the web and graphics in general are important, both as products and as skills. Students, researchers, and instructors of scientific and medical disciplines could also be interested: they often deal with sophisticated graphic representations, for example in biology or epidemiology, and the graphic contributions from the genomics and molecular biology community are among the most numerous. Students, researchers, and instructors of engineering, management, or bioengineering, for example, use data science tools and visualization as an integral part of their analyses. Historians, archaeologists, and paleontologists produce high-quality graphic representations, so the text can be useful for them too. Well, the list is already long, and I’m definitely forgetting someone who should be mentioned instead.

In general, undergraduate and graduate students, teachers, researchers, and Ph.D. students will find many examples and explanations to help them graphically present content and organize exercises. Likewise, professionals and companies, for their corporate and institutional communication and training, may enjoy the material presented in the text.

What I’m trying to say is that data visualization, like data science as a whole, is not a sectoral discipline for which you need a specific background, such as that of a statistician, computer scientist, engineer, or graphic designer. That is not necessary at all; in fact, the opposite is needed: data visualization and data science should be as transversal as possible, studied and used by all those who, through their education and work interests in their specific field, from economics to paleontology, from psychology to molecular biology, find themselves working with data, whether numerical, textual, or spatial, and find it useful to obtain high-quality visual representations from those data, perhaps interactive or structured in dashboards.

What is Required and What is Learned

To follow and learn the contents of the text, it is necessary to know the fundamentals of data science with Python and R, meaning those concerning the import and reading of datasets and the typical data wrangling operations (sorting, aggregations, shape and type transformations, selections, and so on). Numerous examples presented in the text include the data wrangling part (the cases where it is longer can be found in the Supplementary Material), so all the code necessary to replicate a visualization is available, starting from reading the Open Data. Therefore, you are not required to produce the preliminary data operations independently, but you do need to be able to interpret the logic and the operations that are performed. Hence the need to know the fundamentals, which also makes it possible to produce variations of the examples.

Another aspect that may appear problematic is the required knowledge of the fundamentals of data science with both Python and R, because often one knows only one of the two environments and languages. In this regard, I would like to reassure anyone who finds themselves in this situation. If you know data wrangling operations with R or Python, interpreting the logic of those carried out with the other language requires little effort; at most, the details of the syntax will need some specific attention and learning effort. But here a second consideration comes into play: knowledge of both Python and R is particularly useful in modern data science. Those who know only one of the two probably just need a good opportunity to learn the second, and they will discover that the learning curve is much smoother than they might have imagined and that the effort is certainly reasonable. The advantage will be considerable in terms of the new features and tools that become available.

The organization into parts also suggests a progression and division in learning and teaching. The first part is also suitable for those who have just learned the fundamentals of data science and can be carried out in parallel with the study of those fundamentals. Most static graphs require basic data wrangling operations, and generating graphs can be a great educational tool for demonstrating the logic and use of those operations. The presentation of the different types of static graphs follows an order of increasing complexity, from the first ones, which are intuitive and easily modifiable in infinite variations, up to the last ones, which require knowledge of some important properties of statistical analysis. The difficulty level of the code is generally low. The second part is a natural continuation of the first. The Altair library has a linear and clear syntax, so the greater difficulty introduced by the interactive features, especially in terms of computational logic, is completely within reach of anyone who has learned the fundamentals contained in the first part. The result will be motivating: the Altair interactive graphics are of excellent quality and allow various configurations and alternative solutions.

Between these two parts and the subsequent third and fourth parts, there is a gap in terms of what is required and what is learned. For this reason, the last two parts were presented as advanced content in the introduction. It is necessary to have acquired a good familiarity with the fundamentals, confidence in searching for information in the documentation of libraries, and the ability to manage errors patiently and methodically. In other words, you need to have done a good number of exercises with the fundamental part.

For the third part on dashboards, it is necessary to have basic knowledge of HTML and CSS and, in general, of how a traditional web page is made. These are not difficult notions, but it may take some time to acquire them. You don’t need more advanced knowledge, such as JavaScript or web application frameworks. You also need to have gained some confidence in writing scripts in Python and R. In both cases you learn the basic reactive mechanisms for managing interactivity, which follow a logic different from the traditional one.

For the fourth part on maps, it is necessary to learn the fundamental notions of geographic coordinate systems, the form of geographic data with its typical organization in geometries, and the often-necessary coordinate transformations. The tools used are partly known, ggplot for R and pandas for Python, but many new ones will be encountered because, not only in the world of cartography but also in that of data science, the logic, methods, and tools for using spatial data have specificities that distinguish them. As mentioned initially, there are some initial difficulties to overcome and it is necessary to go into the details of the shape of spatial data, but using these data and producing geographical maps is fascinating, right from the first simple choropleth maps. It is right after those initial maps, however, that the real beauty of working with spatial data and geographic maps emerges.

What is Excluded

As always, or almost always, much remains excluded from the content of a book, sometimes simply because of the need not to exceed a certain number of pages, sometimes out of pure forgetfulness, and often because of a conscious choice by the author. All three reasons also apply to this work. For the first, that is simply how a publisher works; for the second, other than apologizing I don’t know what to say, since those are things I’ve forgotten; for the third, however, there is something to comment on, if for no other reason than to explain the motives for the exclusions made by choice.

The first obvious exclusion is that of proprietary technologies and tools. For data visualization there are many proprietary solutions, from very specialized ones produced by small companies to generalist ones produced by big players. Manufacturers of data visualization software will say that their tools are better than those presented in this book. In some respects that might be true, but almost always it is false, and in general a claim to be better than the open-source tools of Python and R would require several distinctions and clarifications that are rarely presented. One of the main reasons given is the ease of use of the graphical interfaces of proprietary tools compared with the low-level programming of open-source ones. It is an old, worn-out, and now out-of-date issue that is slowly, perhaps, starting to be overcome. It is obvious that learning to click sequences of buttons and menus or to drag graphic icons is initially simpler than writing code in a programming language. The initial learning curve is different in the two cases. The point, however, lies in that adjective, “initial.” What happens next? What is the purpose of learning to use these tools? If the purpose is educational, that is, teaching and learning the fundamentals and advanced contents of data visualization, there is practically no choice: only the environments and tools that expose low-level details are teaching tools. The others simply aren’t. They are suitable for professional training courses on that particular instrument, but not for basic teaching or learning. This is enough to exclude any proprietary instrument from this text. It should be noted that some of the most modern proprietary tools (or perhaps those made by intelligent manufacturers) are integrating the open-source technologies of Python and R into their frameworks, with the idea of offering both possibilities.

Then there is a specific and perhaps surprising exclusion among the basic chart types, and not one of the exotic kind that very few use; on the contrary, it is one of the most widespread, very widespread indeed. The excluded type is the pie chart, and the reason is simply that it is not useful in the true sense of data visualization in data science. The statement will seem surprising: in what sense are pie charts, ubiquitous and used millions of times, not useful? I will briefly explain the reason, which is also shared by many who deal with data visualization. A graph is produced to visually represent the information contained in certain data, and this representation is based on at least two conditions: (1) that the visual representation is clear and interpretable in an unambiguous way and (2) that with the graph, the information contained in the data is easier to understand than in tabular form (or at least of equal difficulty). Pie charts satisfy neither condition. They are ambiguous because the relative size of the slices is often unclear and, above all, they make it more difficult to interpret the data than the equivalent table. In other words, if the table with the values is presented instead of the pie chart, the reader has easier, clearer, and more understandable information. Bar charts, by contrast, are one of the fundamental types of graphics, despite the fact that pie charts are simply the polar coordinate representation of a bar chart. So why this difference, and why are pie charts so common? The reason for the difference is that visually evaluating angles is considerably more difficult than comparing linear heights. Pie charts are mostly used because they give a touch of color to an otherwise monotonous text, not for their informative content. And what about the difficulty of evaluating the slice proportions? Well, the numerical values are often added to the slices, which in practice means rewriting the data table right over the graphic.
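The contrast between angles and heights can be made concrete with a small sketch. The shares below are invented for illustration; the code maps the same values once to bar lengths (drawn as text) and once to slice angles, as a pie chart would.

```python
# Hypothetical shares for five categories (invented values summing to 100).
shares = {"A": 23, "B": 21, "C": 20, "D": 19, "E": 17}

total = sum(shares.values())

# Bar chart: each value maps to a length. Pie chart: each value maps to an angle.
angles = {name: 360 * value / total for name, value in shares.items()}

print("category  value  bar                      slice angle")
for name, value in shares.items():
    bar = "#" * value  # one character per unit, so lengths compare directly
    print(f"{name:<9} {value:>5}  {bar:<24} {angles[name]:6.1f} deg")
```

The angles are the same numbers rescaled to 360 degrees. The text bars can be ranked at a glance because they share a common baseline, whereas neighboring slices such as B and C differ by only a few degrees, which is hard to judge by eye without printing the values on the slices.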

To conclude, data visualization deserves more space in educational programs and clearer recognition as a coherent and evolving discipline and body of knowledge. Its Cinderella role in data science can be overcome by recognizing its educational value and, no less important, its creative stimulus.

About the Companion Website

This book is accompanied by a companion website:

https://www.wiley.com/go/Cremonini/DataVisualization1e

This website includes:

Codes

Figures

Datasets

Part I Static Graphics with ggplot (R) and Seaborn (Python)

Grammar of Graphics

The grammar of graphics was cited in the Introduction and will continue to be mentioned in the rest of the text, so a brief summary is given here. The concept was proposed by Leland Wilkinson at the end of the 1990s with the idea of creating grammatical, mathematical, and aesthetic rules to define the graphics produced by statistical analysis. Compared with a fixed catalog of chart types based on stylized reference schemes, a grammar of graphics promised a previously unknown flexibility. In Wilkinson’s definition, seven fundamental components were identified, but the construction by overlapping layers was not yet highlighted. It was Hadley Wickham, the creator of ggplot2, who in 2010 introduced the layered grammar of graphics, which updated Wilkinson’s approach by reviewing the fundamental elements. The layered definition builds the representation of the data by combining statistics and geometries, two of the fundamental elements, together with positions, aesthetics, scales, a coordinate system, and possibly facets. We will find all these elements in ggplot and Altair, the two graphic libraries organized according to the grammar of graphics considered in this book, as well as in the recent but still preliminary Seaborn Objects interface of Seaborn, the reference graphic library for Python.
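As a purely illustrative sketch of how these elements fit together, the following Python code models a plot as a stack of layers. The class and attribute names (`Layer`, `Plot`, `add`) are hypothetical and do not belong to ggplot, Altair, or Seaborn; the split between per-layer elements and plot-wide elements is an assumption that follows Wickham's layered formulation.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """One layer: data plus how it is summarized, drawn, and placed."""
    data: list                    # rows, here plain dicts
    aesthetics: dict              # visual channel -> data field
    geometry: str                 # e.g. "point", "line", "bar"
    statistic: str = "identity"   # transformation applied before drawing
    position: str = "identity"    # adjustment for overlapping elements


@dataclass
class Plot:
    """A plot: a stack of layers sharing scales, coordinates, and facets."""
    layers: list = field(default_factory=list)
    scales: dict = field(default_factory=dict)
    coordinates: str = "cartesian"
    facets: tuple = ()

    def add(self, layer):
        # Return self so that layers can be chained one after another.
        self.layers.append(layer)
        return self


# Invented toy data and mapping, used only to exercise the sketch.
rows = [{"x": 1, "y": 2.0}, {"x": 2, "y": 3.5}, {"x": 3, "y": 3.0}]
mapping = {"x": "x", "y": "y"}

plot = (
    Plot(scales={"y": "linear"})
    .add(Layer(rows, mapping, geometry="point"))
    .add(Layer(rows, mapping, geometry="line", statistic="smooth"))
)

print([layer.geometry for layer in plot.layers])
```

The point of the sketch is that a chart is not picked from a fixed catalog: changing a geometry, a statistic, or the coordinate system of the same structure yields a different graphic, which is where the flexibility of the approach comes from.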

References

Wilkinson, L. (2005). The Grammar of Graphics, 2nd Ed. Springer-Verlag New York. https://doi.org/10.1007/0-387-28695-0.

Wilkinson, L. (2012). The Grammar of Graphics. Chapter 13 in Handbook of Computational Statistics, Concepts and Methods, 2nd Ed. (eds. J.E. Gentle, W.K. Härdle and Y. Mori). Springer-Verlag Berlin Heidelberg. https://doi.org/10.1007/978-3-642-21551-3.

Wickham, H. (2010). A Layered Grammar of Graphics. Journal of Computational and Graphical Statistics 19 (1): 3–28. http://dx.doi.org/10.1198/jcgs.2009.07098.

1 Scatterplots and Line Plots

Scatterplots, with the main variant represented by line plots, are the fundamental type of graphic for pairs of continuous variables or for a continuous variable and a categorical variable and, in addition to representing the most common type of graphic together with bar plots (or bar charts/bar graphs), form the basis for numerous variations. The logic that guides a scatterplot graphic is to represent with markers