Discover how data science can help you gain in-depth insight into your business - the easy way! Jobs in data science abound, but few people have the data science skills needed to fill these increasingly important roles. Data Science For Dummies is the perfect starting point for IT professionals and students who want a quick primer covering all areas of the expansive data science space. With a focus on business cases, the book explores topics in big data, data science, and data engineering, and how these three areas are combined to produce tremendous value. If you want to pick up the skills you need to begin a new career or initiate a new project, reading this book will help you understand which technologies, programming languages, and mathematical methods to focus on. While this book serves as a wildly fantastic guide through the broad aspects of the topic, including the sometimes intimidating field of big data and data science, it is not an instructional manual for hands-on implementation.
Here's what to expect in Data Science For Dummies:
* Provides a background in big data and data engineering before moving on to data science and how it's applied to generate value.
* Includes coverage of big data frameworks and applications like Hadoop, MapReduce, Spark, MPP platforms, and NoSQL.
* Explains machine learning and many of its algorithms, as well as artificial intelligence and the evolution of the Internet of Things.
* Details data visualization techniques that can be used to showcase, summarize, and communicate the data insights you generate.
It's a big, big data world out there - let Data Science For Dummies help you get started harnessing its power so you can gain a competitive edge for your organization.
Page count: 526
Year of publication: 2015
Data Science For Dummies®
Published by: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774, www.wiley.com
Copyright © 2015 by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Trademarks: Wiley, For Dummies, the Dummies Man logo, Dummies.com, Making Everything Easier, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.
LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: THE PUBLISHER AND THE AUTHOR MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS WORK AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES OR PROMOTIONAL MATERIALS. THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR EVERY SITUATION. THIS WORK IS SOLD WITH THE UNDERSTANDING THAT THE PUBLISHER IS NOT ENGAGED IN RENDERING LEGAL, ACCOUNTING, OR OTHER PROFESSIONAL SERVICES. IF PROFESSIONAL ASSISTANCE IS REQUIRED, THE SERVICES OF A COMPETENT PROFESSIONAL PERSON SHOULD BE SOUGHT. NEITHER THE PUBLISHER NOR THE AUTHOR SHALL BE LIABLE FOR DAMAGES ARISING HEREFROM. THE FACT THAT AN ORGANIZATION OR WEBSITE IS REFERRED TO IN THIS WORK AS A CITATION AND/OR A POTENTIAL SOURCE OF FURTHER INFORMATION DOES NOT MEAN THAT THE AUTHOR OR THE PUBLISHER ENDORSES THE INFORMATION THE ORGANIZATION OR WEBSITE MAY PROVIDE OR RECOMMENDATIONS IT MAY MAKE. FURTHER, READERS SHOULD BE AWARE THAT INTERNET WEBSITES LISTED IN THIS WORK MAY HAVE CHANGED OR DISAPPEARED BETWEEN WHEN THIS WORK WAS WRITTEN AND WHEN IT IS READ.
For general information on our other products and services, please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993, or fax 317-572-4002. For technical support, please visit www.wiley.com/techsupport.
Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.
Library of Congress Control Number: 2014955780
ISBN 978-1-118-4155-6 (pbk); ISBN 978-1-118-84145-7 (ebk); ISBN 978-1-118-84152-5
Table of Contents
Cover
Foreword
Introduction
About This Book
Foolish Assumptions
Icons Used in This Book
Beyond the Book
Where to Go from Here
Part I: Getting Started With Data Science
Chapter 1: Wrapping Your Head around Data Science
Seeing Who Can Make Use of Data Science
Looking at the Pieces of the Data Science Puzzle
Getting a Basic Lay of the Data Science Landscape
Chapter 2: Exploring Data Engineering Pipelines and Infrastructure
Defining Big Data by Its Four Vs
Identifying Big Data Sources
Grasping the Difference between Data Science and Data Engineering
Boiling Down Data with MapReduce and Hadoop
Identifying Alternative Big Data Solutions
Data Engineering in Action — A Case Study
Chapter 3: Applying Data Science to Business and Industry
Incorporating Data-Driven Insights into the Business Process
Distinguishing Business Intelligence and Data Science
Knowing Who to Call to Get the Job Done Right
Exploring Data Science in Business: A Data-Driven Business Success Story
Part II: Using Data Science to Extract Meaning from Your Data
Chapter 4: Introducing Probability and Statistics
Introducing the Fundamental Concepts of Probability
Introducing Linear Regression
Simulations
Introducing Time Series Analysis
Chapter 5: Clustering and Classification
Introducing the Basics of Clustering and Classification
Identifying Clusters in Your Data
Chapter 6: Clustering and Classification with Nearest Neighbor Algorithms
Making Sense of Data with Nearest Neighbor Analysis
Seeing the Importance of Clustering and Classification
Classifying Data with Average Nearest Neighbor Algorithms
Classifying with K-Nearest Neighbor Algorithms
Using Nearest Neighbor Distances to Infer Meaning from Point Patterns
Solving Real-World Problems with Nearest Neighbor Algorithms
Chapter 7: Mathematical Modeling in Data Science
Introducing Multi-Criteria Decision Making (MCDM)
Using Numerical Methods in Data Science
Mathematical Modeling with Markov Chains and Stochastic Methods
Chapter 8: Modeling Spatial Data with Statistics
Generating Predictive Surfaces from Spatial Point Data
Using Trend Surface Analysis on Spatial Data
Part III: Creating Data Visualizations that Clearly Communicate Meaning
Chapter 9: Following the Principles of Data Visualization Design
Understanding the Types of Visualizations
Focusing on Your Audience
Picking the Most Appropriate Design Style
Knowing When to Add Context
Knowing When to Get Persuasive
Choosing the Most Appropriate Data Graphic Type
Choosing Your Data Graphic
Chapter 10: Using D3.js for Data Visualization
Introducing the D3.js Library
Knowing When to Use D3.js (and When Not To)
Getting Started in D3.js
Understanding More Advanced Concepts and Practices in D3.js
Chapter 11: Web-Based Applications for Visualization Design
Using Collaborative Data Visualization Platforms
Visualizing Spatial Data with Online Geographic Tools
Visualizing with Open Source: Web-Based Data Visualization Platforms
Knowing When to Stick with Infographics
Chapter 12: Exploring Best Practices in Dashboard Design
Focusing on the Audience
Starting with the Big Picture
Getting the Details Right
Testing Your Design
Chapter 13: Making Maps from Spatial Data
Getting into the Basics of GIS
Analyzing Spatial Data
Getting Started with Open-Source QGIS
Part IV: Computing for Data Science
Chapter 14: Using Python for Data Science
Understanding Basic Concepts in Python
Getting on a First-Name Basis with Some Useful Python Libraries
Using Python to Analyze Data — An Example Exercise
Chapter 15: Using Open Source R for Data Science
Introducing the Fundamental Concepts
Previewing R Packages
Chapter 16: Using SQL in Data Science
Getting Started with SQL
Using SQL and Its Functions in Data Science
Chapter 17: Software Applications for Data Science
Making Life Easier with Excel
Using KNIME for Advanced Data Analytics
Part V: Applying Domain Expertise to Solve Real-World Problems Using Data Science
Chapter 18: Using Data Science in Journalism
Exploring the Five Ws and an H
Collecting Data for Your Story
Finding and Telling Your Data’s Story
Bringing Data Journalism to Life: Washington Post’s The Black Budget
Chapter 19: Delving into Environmental Data Science
Modeling Environmental-Human Interactions with Environmental Intelligence
Modeling Natural Resources in the Raw
Using Spatial Statistics to Predict for Environmental Variation across Space
Chapter 20: Data Science for Driving Growth in E-Commerce
Making Sense of Data for E-Commerce Growth
Optimizing E-Commerce Business Systems
Chapter 21: Using Data Science to Describe and Predict Criminal Activity
Temporal Analysis for Crime Prevention and Monitoring
Spatial Crime Prediction and Monitoring
Probing the Problems with Data Science for Crime Analysis
Part VI: The Part of Tens
Chapter 22: Ten Phenomenal Resources for Open Data
Digging through Data.gov
Checking Out Canada Open Data
Diving into data.gov.uk
Checking Out U.S. Census Bureau Data
Knowing NASA Data
Wrangling World Bank Data
Getting to Know Knoema Data
Queuing Up with Quandl Data
Exploring Exversion Data
Mapping OpenStreetMap Spatial Data
Chapter 23: Ten (or So) Free Data Science Tools and Applications
Making Custom Web-Based Data Visualizations with Free R Packages
Checking Out More Scraping, Collecting, and Handling Tools
Checking Out More Data Exploration Tools
Checking Out More Web-Based Visualization Tools
About the Author
Cheat Sheet
Advertisement Page
Connect with Dummies
End User License Agreement
We live in exciting, even revolutionary times. As our daily interactions move from the physical world to the digital world, nearly every action we take generates data. Information pours from our mobile devices and our every online interaction. Sensors and machines collect, store and process information about the environment around us. New, huge data sets are now open and publicly accessible.
This flood of information gives us the power to make more informed decisions, react more quickly to change, and better understand the world around us. However, it can be a struggle to know where to start when it comes to making sense of this data deluge. What data should one collect? What methods are there for reasoning from data? And, most importantly, how do we draw answers from the data to our most pressing questions about our businesses, our lives, and our world?
Data science is the key to making this flood of information useful. Simply put, data science is the art of wrangling data to predict our future behavior, uncover patterns to help prioritize or provide actionable information, or otherwise draw meaning from these vast, untapped data resources.
I often say that one of my favorite interpretations of the word “big” in Big Data is “expansive.” The data revolution is spreading to so many fields that it is now incumbent on people working in all professions to understand how to use data, just as people had to learn how to use computers in the 1980s and 1990s. This book is designed to help you do that.
I have seen firsthand how radically data science knowledge can transform organizations and the world for the better. At DataKind, we harness the power of data science in the service of humanity by engaging data science and social sector experts to work on projects addressing critical humanitarian problems. We are also helping drive the conversation about how data science can be applied to solve the world’s biggest challenges. From using satellite imagery to estimate poverty levels to mining decades of human rights violations to prevent further atrocities, DataKind teams have worked with many different nonprofits and humanitarian organizations just beginning their data science journeys. One lesson resounds through every project we do: The people and organizations that are most committed to using data in novel and responsible ways are the ones who will succeed in this new environment.
Just holding this book means you are taking your first steps on that journey, too. Whether you are a seasoned researcher looking to brush up on some data science techniques or are completely new to the world of data, Data Science For Dummies will equip you with the tools you need to show whatever you can dream up. You’ll be able to demonstrate new findings from your physical activity data, to present new insights from the latest marketing campaign, and to share new learnings about preventing the spread of disease.
We truly are at the forefront of a new data age, and those who learn data science will be able to take part in this thrilling new adventure, shaping our path forward in every field. For you, that adventure starts now. Welcome aboard!
Jake Porway
Founder and Executive Director of DataKind™
The power of big data and data science is revolutionizing the world. From the modern business enterprise to the lifestyle choices of today’s digital citizen, data science insights are driving changes and improvements in every arena. Although data science may be a new topic to many, it’s a skill that any individual who wants to stay relevant in her career field and industry needs to know.
Although other books dealing with data science tend to focus heavily on using Microsoft Excel to learn basic data science techniques, Data Science For Dummies goes deeper by introducing Python, the R statistical programming language, D3.js, SQL, Excel, and a whole plethora of open-source applications that you can use to get started in practicing data science. Some books on data science are needlessly wordy, with authors going in circles trying to get to a point. Not so here. Unlike books authored by stuffy-toned, academic types, I’ve written this book in friendly, approachable language — because data science is a friendly and approachable subject!
To be honest, up until now, the data science realm has been dominated by a few select data science wizards who tend to present the topic in a manner that’s unnecessarily over-technical and intimidating. Basic data science isn’t that hard or confusing. Data science is simply the practice of using a set of analytical techniques and methodologies to derive and communicate valuable and actionable insights from raw data. The purpose of data science is to optimize processes and to support improved data-informed decision making, thereby generating an increase in value — whether value is represented by number of lives saved, number of dollars retained, or percentage of revenues increased. In Data Science For Dummies, I introduce a broad array of concepts and approaches that you can use when extracting valuable insights from your data.
Remember, a lot of times data scientists get so caught up analyzing the bark of the trees that they simply forget to look for their way out of the forest. This is a common pitfall that you should avoid at all costs. I’ve worked hard to make sure that this book presents the core purpose of each data science technique and the goals you can accomplish by using it.
In keeping with the For Dummies brand, this book is organized in a modular, easy-to-access format. This format allows you to use the book as a practical guidebook and ad hoc reference. In other words, you don’t need to read through, cover to cover. Just take what you want and leave the rest. I’ve taken great care to use real-world examples that illustrate data science concepts that may otherwise be overly abstract.
Web addresses and programming code appear in monofont. If you’re reading a digital version of this book on a device connected to the Internet, you can click a web address to visit that website, like this: www.dummies.com.
In writing this book, I’ve assumed that readers are at least technical enough to have mastered advanced Microsoft Excel — pivot tables, grouping, sorting, plotting, and the like. Being strong in algebra, basic statistics, or even business calculus helps, as well. Foolish or not, it’s my high hope that all readers have a subject-matter expertise to which they can apply the skills presented in this book. Since data scientists must be capable of intuitively understanding the implications and applications of the data insights they derive, subject-matter expertise is a major component of data science.
As you make your way through this book, you’ll see the following icons in the margins:
The Tip icon marks tips (duh!) and shortcuts that you can use to make subject mastery easier.
Remember icons mark the information that’s especially important to know. To siphon off the most important information in each chapter, just skim through these icons.
The Technical Stuff icon marks information of a highly technical nature that you can normally skip over.
The Warning icon tells you to watch out! It marks important information that may save you headaches.
This book includes the following external resources:
Data Science Cheat Sheet: This book comes with a handy Cheat Sheet at www.dummies.com/cheatsheet/datascience. The Cheat Sheet lists helpful shortcuts, as well as abbreviated definitions for essential processes and concepts described in the book. You can use it as a quick-and-easy reference when doing data science.
Online articles on the practical application of data science: This book has Parts pages that link to www.dummies.com, where you can find a number of articles that extend the topics covered. More specifically, these articles present best practices, how-to’s, and case studies that exemplify the power of data science in practice. The articles are available on the book’s Extras page (www.dummies.com/extras/datascience).
Updates: I’ll be updating this book on a regular basis. You can find updates on the Downloads tab of the book’s product page. On the book’s Extras page (www.dummies.com/extras/datascience), an article will either describe the update or provide a link to take readers to the Downloads tab for access to updated content. Any errata will appear in this section, as well.
Just to reemphasize the point, this book’s modular design allows you to pick up and start reading anywhere you want. Although you don’t need to read cover to cover, a few good starter chapters include Chapters 1, 2, and 9.
Part I
For great online content, check out http://www.dummies.com.
In this part . . .
Get introduced to the field of data science.
Define big data.
Explore solutions for big data problems.
See how real-world businesses put data science to good use.
Chapter 1
In This Chapter
Defining data science
Defining data science by its key components
Identifying viable data science solutions to your own data challenges
For quite some time now, we’ve all been absolutely deluged by data. It’s coming off of every computer, every mobile device, every camera, and every sensor — and now it’s even coming off of watches and other wearable technologies. It’s generated in every social media interaction we make, every file we save, every picture we take, every query we submit; it’s even generated when we do something as simple as get directions to the closest ice cream shop from Google.