Data Science For Dummies - Lillian Pierson - E-Book

Data Science For Dummies E-Book

Lillian Pierson

Description

Discover how data science can help you gain in-depth insight into your business - the easy way! Jobs in data science abound, but few people have the data science skills needed to fill these increasingly important roles. Data Science For Dummies is the perfect starting point for IT professionals and students who want a quick primer covering all areas of the expansive data science space. With a focus on business cases, the book explores topics in big data, data science, and data engineering, and how these three areas are combined to produce tremendous value. If you want to pick up the skills you need to begin a new career or initiate a new project, reading this book will help you understand which technologies, programming languages, and mathematical methods to focus on. While this book serves as a wildly fantastic guide through the broad aspects of the topic, including the sometimes intimidating fields of big data and data science, it is not an instructional manual for hands-on implementation.

Here's what to expect in Data Science For Dummies:

* Provides a background in big data and data engineering before moving on to data science and how it's applied to generate value.
* Includes coverage of big data frameworks and applications like Hadoop, MapReduce, Spark, MPP platforms, and NoSQL.
* Explains machine learning and many of its algorithms, as well as artificial intelligence and the evolution of the Internet of Things.
* Details data visualization techniques that can be used to showcase, summarize, and communicate the data insights you generate.

It's a big, big data world out there - let Data Science For Dummies help you get started harnessing its power so you can gain a competitive edge for your organization.


Page count: 526

Year of publication: 2015




Data Science For Dummies®

Published by: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774, www.wiley.com

Copyright © 2015 by John Wiley & Sons, Inc., Hoboken, New Jersey

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Trademarks: Wiley, For Dummies, the Dummies Man logo, Dummies.com, Making Everything Easier, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: THE PUBLISHER AND THE AUTHOR MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS WORK AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES OR PROMOTIONAL MATERIALS. THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR EVERY SITUATION. THIS WORK IS SOLD WITH THE UNDERSTANDING THAT THE PUBLISHER IS NOT ENGAGED IN RENDERING LEGAL, ACCOUNTING, OR OTHER PROFESSIONAL SERVICES. IF PROFESSIONAL ASSISTANCE IS REQUIRED, THE SERVICES OF A COMPETENT PROFESSIONAL PERSON SHOULD BE SOUGHT. NEITHER THE PUBLISHER NOR THE AUTHOR SHALL BE LIABLE FOR DAMAGES ARISING HEREFROM. THE FACT THAT AN ORGANIZATION OR WEBSITE IS REFERRED TO IN THIS WORK AS A CITATION AND/OR A POTENTIAL SOURCE OF FURTHER INFORMATION DOES NOT MEAN THAT THE AUTHOR OR THE PUBLISHER ENDORSES THE INFORMATION THE ORGANIZATION OR WEBSITE MAY PROVIDE OR RECOMMENDATIONS IT MAY MAKE. FURTHER, READERS SHOULD BE AWARE THAT INTERNET WEBSITES LISTED IN THIS WORK MAY HAVE CHANGED OR DISAPPEARED BETWEEN WHEN THIS WORK WAS WRITTEN AND WHEN IT IS READ.

For general information on our other products and services, please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993, or fax 317-572-4002. For technical support, please visit www.wiley.com/techsupport.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Control Number: 2014955780

ISBN 978-1-118-84155-6 (pbk); ISBN 978-1-118-84145-7 (ebk); ISBN 978-1-118-84152-5

Visit www.dummies.com/cheatsheet/datascience to view this book's cheat sheet.

Table of Contents

Cover

Foreword

Introduction

About This Book

Foolish Assumptions

Icons Used in This Book

Beyond the Book

Where to Go from Here

Part I: Getting Started With Data Science

Chapter 1: Wrapping Your Head around Data Science

Seeing Who Can Make Use of Data Science

Looking at the Pieces of the Data Science Puzzle

Getting a Basic Lay of the Data Science Landscape

Chapter 2: Exploring Data Engineering Pipelines and Infrastructure

Defining Big Data by Its Four Vs

Identifying Big Data Sources

Grasping the Difference between Data Science and Data Engineering

Boiling Down Data with MapReduce and Hadoop

Identifying Alternative Big Data Solutions

Data Engineering in Action — A Case Study

Chapter 3: Applying Data Science to Business and Industry

Incorporating Data-Driven Insights into the Business Process

Distinguishing Business Intelligence and Data Science

Knowing Who to Call to Get the Job Done Right

Exploring Data Science in Business: A Data-Driven Business Success Story

Part II: Using Data Science to Extract Meaning from Your Data

Chapter 4: Introducing Probability and Statistics

Introducing the Fundamental Concepts of Probability

Introducing Linear Regression

Simulations

Introducing Time Series Analysis

Chapter 5: Clustering and Classification

Introducing the Basics of Clustering and Classification

Identifying Clusters in Your Data

Chapter 6: Clustering and Classification with Nearest Neighbor Algorithms

Making Sense of Data with Nearest Neighbor Analysis

Seeing the Importance of Clustering and Classification

Classifying Data with Average Nearest Neighbor Algorithms

Classifying with K-Nearest Neighbor Algorithms

Using Nearest Neighbor Distances to Infer Meaning from Point Patterns

Solving Real-World Problems with Nearest Neighbor Algorithms

Chapter 7: Mathematical Modeling in Data Science

Introducing Multi-Criteria Decision Making (MCDM)

Using Numerical Methods in Data Science

Mathematical Modeling with Markov Chains and Stochastic Methods

Chapter 8: Modeling Spatial Data with Statistics

Generating Predictive Surfaces from Spatial Point Data

Using Trend Surface Analysis on Spatial Data

Part III: Creating Data Visualizations that Clearly Communicate Meaning

Chapter 9: Following the Principles of Data Visualization Design

Understanding the Types of Visualizations

Focusing on Your Audience

Picking the Most Appropriate Design Style

Knowing When to Add Context

Knowing When to Get Persuasive

Choosing the Most Appropriate Data Graphic Type

Choosing Your Data Graphic

Chapter 10: Using D3.js for Data Visualization

Introducing the D3.js Library

Knowing When to Use D3.js (and When Not To)

Getting Started in D3.js

Understanding More Advanced Concepts and Practices in D3.js

Chapter 11: Web-Based Applications for Visualization Design

Using Collaborative Data Visualization Platforms

Visualizing Spatial Data with Online Geographic Tools

Visualizing with Open Source: Web-Based Data Visualization Platforms

Knowing When to Stick with Infographics

Chapter 12: Exploring Best Practices in Dashboard Design

Focusing on the Audience

Starting with the Big Picture

Getting the Details Right

Testing Your Design

Chapter 13: Making Maps from Spatial Data

Getting into the Basics of GIS

Analyzing Spatial Data

Getting Started with Open-Source QGIS

Part IV: Computing for Data Science

Chapter 14: Using Python for Data Science

Understanding Basic Concepts in Python

Getting on a First-Name Basis with Some Useful Python Libraries

Using Python to Analyze Data — An Example Exercise

Chapter 15: Using Open Source R for Data Science

Introducing the Fundamental Concepts

Previewing R Packages

Chapter 16: Using SQL in Data Science

Getting Started with SQL

Using SQL and Its Functions in Data Science

Chapter 17: Software Applications for Data Science

Making Life Easier with Excel

Using KNIME for Advanced Data Analytics

Part V: Applying Domain Expertise to Solve Real-World Problems Using Data Science

Chapter 18: Using Data Science in Journalism

Exploring the Five Ws and an H

Collecting Data for Your Story

Finding and Telling Your Data’s Story

Bringing Data Journalism to Life: Washington Post’s The Black Budget

Chapter 19: Delving into Environmental Data Science

Modeling Environmental-Human Interactions with Environmental Intelligence

Modeling Natural Resources in the Raw

Using Spatial Statistics to Predict for Environmental Variation across Space

Chapter 20: Data Science for Driving Growth in E-Commerce

Making Sense of Data for E-Commerce Growth

Optimizing E-Commerce Business Systems

Chapter 21: Using Data Science to Describe and Predict Criminal Activity

Temporal Analysis for Crime Prevention and Monitoring

Spatial Crime Prediction and Monitoring

Probing the Problems with Data Science for Crime Analysis

Part VI: The Part of Tens

Chapter 22: Ten Phenomenal Resources for Open Data

Digging through Data.gov

Checking Out Canada Open Data

Diving into data.gov.uk

Checking Out U.S. Census Bureau Data

Knowing NASA Data

Wrangling World Bank Data

Getting to Know Knoema Data

Queuing Up with Quandl Data

Exploring Exversion Data

Mapping OpenStreetMap Spatial Data

Chapter 23: Ten (or So) Free Data Science Tools and Applications

Making Custom Web-Based Data Visualizations with Free R Packages

Checking Out More Scraping, Collecting, and Handling Tools

Checking Out More Data Exploration Tools

Checking Out More Web-Based Visualization Tools

About the Author

Cheat Sheet

Advertisement Page

Connect with Dummies

End User License Agreement


Foreword

We live in exciting, even revolutionary times. As our daily interactions move from the physical world to the digital world, nearly every action we take generates data. Information pours from our mobile devices and our every online interaction. Sensors and machines collect, store and process information about the environment around us. New, huge data sets are now open and publicly accessible.

This flood of information gives us the power to make more informed decisions, react more quickly to change, and better understand the world around us. However, it can be a struggle to know where to start when it comes to making sense of this data deluge. What data should one collect? What methods are there for reasoning from data? And, most importantly, how do we get the answers from the data to answer our most pressing questions about our businesses, our lives, and our world?

Data science is the key to making this flood of information useful. Simply put, data science is the art of wrangling data to predict our future behavior, uncover patterns to help prioritize or provide actionable information, or otherwise draw meaning from these vast, untapped data resources.

I often say that one of my favorite interpretations of the word “big” in Big Data is “expansive.” The data revolution is spreading to so many fields that it is now incumbent on people working in all professions to understand how to use data, just as people had to learn how to use computers in the 80’s and 90’s. This book is designed to help you do that.

I have seen firsthand how radically data science knowledge can transform organizations and the world for the better. At DataKind, we harness the power of data science in the service of humanity by engaging data science and social sector experts to work on projects addressing critical humanitarian problems. We are also helping drive the conversation about how data science can be applied to solve the world’s biggest challenges. From using satellite imagery to estimate poverty levels to mining decades of human rights violations to prevent further atrocities, DataKind teams have worked with many different nonprofits and humanitarian organizations just beginning their data science journeys. One lesson resounds through every project we do: The people and organizations that are most committed to using data in novel and responsible ways are the ones who will succeed in this new environment.

Just holding this book means you are taking your first steps on that journey, too. Whether you are a seasoned researcher looking to brush up on some data science techniques or are completely new to the world of data, Data Science For Dummies will equip you with the tools you need to show whatever you can dream up. You’ll be able to demonstrate new findings from your physical activity data, to present new insights from the latest marketing campaign, and to share new learnings about preventing the spread of disease.

We truly are at the forefront of a new data age, and those who learn data science will be able to take part in this thrilling new adventure, shaping our path forward in every field. For you, that adventure starts now. Welcome aboard!

Jake Porway

Founder and Executive Director of DataKind™

Introduction

The power of big data and data science is revolutionizing the world. From the modern business enterprise to the lifestyle choices of today’s digital citizen, data science insights are driving changes and improvements in every arena. Although data science may be a new topic to many, it’s a skill that any individual who wants to stay relevant in her career field and industry needs to know.

Although other books dealing with data science tend to focus heavily on using Microsoft Excel to learn basic data science techniques, Data Science For Dummies goes deeper by introducing Python, the R statistical programming language, D3.js, SQL, Excel, and a whole plethora of open-source applications that you can use to get started in practicing data science. Some books on data science are needlessly wordy, with authors going in circles trying to get to a point. Not so here. Unlike books authored by stuffy-toned, academic types, I’ve written this book in friendly, approachable language — because data science is a friendly and approachable subject!

To be honest, up until now, the data science realm has been dominated by a few select data science wizards who tend to present the topic in a manner that’s unnecessarily over-technical and intimidating. Basic data science isn’t that hard or confusing. Data science is simply the practice of using a set of analytical techniques and methodologies to derive and communicate valuable and actionable insights from raw data. The purpose of data science is to optimize processes and to support improved data-informed decision making, thereby generating an increase in value — whether value is represented by number of lives saved, number of dollars retained, or percentage of revenues increased. In Data Science For Dummies, I introduce a broad array of concepts and approaches that you can use when extracting valuable insights from your data.

Remember, a lot of times data scientists get so caught up analyzing the bark of the trees that they simply forget to look for their way out of the forest. This is a common pitfall that you should avoid at all costs. I’ve worked hard to make sure that this book presents the core purpose of each data science technique and the goals you can accomplish by utilizing them.

About This Book

In keeping with the For Dummies brand, this book is organized in a modular, easy-to-access format. This format allows you to use the book as a practical guidebook and ad hoc reference. In other words, you don’t need to read through, cover to cover. Just take what you want and leave the rest. I’ve taken great care to use real-world examples that illustrate data science concepts that may otherwise be overly abstract.

Web addresses and programming code appear in monofont. If you’re reading a digital version of this book on a device connected to the Internet, you can click a web address to visit that website, like this: www.dummies.com.

Foolish Assumptions

In writing this book, I’ve assumed that readers are at least technical enough to have mastered advanced Microsoft Excel — pivot tables, grouping, sorting, plotting, and the like. Being strong in algebra, basic statistics, or even business calculus helps, as well. Foolish or not, it’s my high hope that all readers have a subject-matter expertise to which they can apply the skills presented in this book. Since data scientists must be capable of intuitively understanding the implications and applications of the data insights they derive, subject-matter expertise is a major component of data science.

Icons Used in This Book

As you make your way through this book, you’ll see the following icons in the margins:

The Tip icon marks tips (duh!) and shortcuts that you can use to make subject mastery easier.

Remember icons mark the information that’s especially important to know. To siphon off the most important information in each chapter, just skim through these icons.

The Technical Stuff icon marks information of a highly technical nature that you can normally skip over.

The Warning icon tells you to watch out! It marks important information that may save you headaches.

Beyond the Book

This book includes the following external resources:

Data Science Cheat Sheet: This book comes with a handy Cheat Sheet at www.dummies.com/cheatsheet/datascience. The Cheat Sheet lists helpful shortcuts, as well as abbreviated definitions for essential processes and concepts described in the book. You can use it as a quick-and-easy reference when doing data science.

Online articles on the practical application of data science: This book has Parts pages that link to www.dummies.com, where you can find a number of articles that extend the topics covered. More specifically, these articles present best practices, how-to’s, and case studies that exemplify the power of data science in practice. The articles are available on the book’s Extras page (www.dummies.com/extras/datascience).

Updates: I’ll be updating this book on a regular basis. You can find updates on the Downloads tab of the book’s product page. On the book’s Extras page (www.dummies.com/extras/datascience), an article will either describe the update or provide a link to take readers to the Downloads tab for access to updated content. Any errata will appear in this section, as well.

Where to Go from Here

Just to reemphasize the point, this book’s modular design allows you to pick up and start reading anywhere you want. Although you don’t need to read cover to cover, a few good starter chapters include Chapters 1, 2, and 9.

Part I

Getting Started With Data Science

For great online content, check out http://www.dummies.com.

In this part . . .

Get introduced to the field of data science.

Define big data.

Explore solutions for big data problems.

See how real-world businesses put data science to good use.

Chapter 1

Wrapping Your Head around Data Science

In This Chapter

Defining data science

Defining data science by its key components

Identifying viable data science solutions to your own data challenges

For quite some time now, we’ve all been absolutely deluged by data. It’s coming off of every computer, every mobile device, every camera, and every sensor — and now it’s even coming off of watches and other wearable technologies. It’s generated in every social media interaction we make, every file we save, every picture we take, every query we submit; it’s even generated when we do something as simple as get directions to the closest ice cream shop from Google.

Continue reading in the full edition!
