Learn what it takes to succeed in the most in-demand tech job
Harvard Business Review calls it the sexiest tech job of the 21st century. Data scientists are in demand, and this unique book shows you exactly what employers want and the skill set that separates the quality data scientist from other talented IT professionals. Data science involves extracting, creating, and processing data to turn it into business value. With over 15 years of big data, predictive modeling, and business analytics experience, author Vincent Granville is no stranger to data science. In this one-of-a-kind guide, he provides insight into the essential data science skills, such as statistics and visualization techniques, and covers everything from analytical recipes and data science tricks to common job interview questions, sample resumes, and source code.
The applications are endless and varied: automatically detecting spam and plagiarism, optimizing bid prices in keyword advertising, identifying new molecules to fight cancer, assessing the risk of meteorite impact. Complete with case studies, this book is a must, whether you're looking to become a data scientist or to hire one.
Developing Analytic Talent: Becoming a Data Scientist is essential reading for those aspiring to this hot career choice and for employers seeking the best candidates.
Page count: 553
Year of publication: 2014
Table of Contents
Cover
Chapter 1: What Is Data Science?
Real Versus Fake Data Science
The Data Scientist
Data Science Applications in 13 Real-World Scenarios
Data Science History, Pioneers, and Modern Trends
Summary
Chapter 2: Big Data Is Different
Two Big Data Issues
Examples of Big Data Techniques
What MapReduce Can’t Do
Communication Issues
Data Science: The End of Statistics?
The Big Data Ecosystem
Summary
Chapter 3: Becoming a Data Scientist
Key Features of Data Scientists
Types of Data Scientists
Data Scientist Demographics
Training for Data Science
Data Scientist Career Paths
Summary
Chapter 4: Data Science Craftsmanship, Part I
New Types of Metrics
Choosing Proper Analytics Tools
Visualization
Statistical Modeling Without Models
Three Classes of Metrics: Centrality, Volatility, Bumpiness
Statistical Clustering for Big Data
Correlation and R-Squared for Big Data
Computational Complexity
Structured Coefficient
Identifying the Number of Clusters
Internet Topology Mapping
Securing Communications: Data Encoding
Summary
Chapter 5: Data Science Craftsmanship, Part II
Data Dictionary
Hidden Decision Trees
Model-Free Confidence Intervals
Random Numbers
Four Ways to Solve a Problem
Causation Versus Correlation
How Do You Detect Causes?
Life Cycle of Data Science Projects
Predictive Modeling Mistakes
Logistic-Related Regressions
Experimental Design
Analytics as a Service and APIs
Miscellaneous Topics
New Synthetic Variance for Hadoop and Big Data
Summary
Chapter 6: Data Science Application Case Studies
Stock Market
Encryption
Fraud Detection
Digital Analytics
Miscellaneous
Summary
Chapter 7: Launching Your New Data Science Career
Job Interview Questions
Testing Your Own Visual and Analytic Thinking
From Statistician to Data Scientist
Taxonomy of a Data Scientist
400 Data Scientist Job Titles
Salary Surveys
Summary
Chapter 8: Data Science Resources
Professional Resources
Career-Building Resources
Summary
Introduction
Who This Book Is For
What This Book Covers
How This Book Is Structured
What You Need to Use This Book
Conventions
Sometimes, understanding what something is includes having a clear picture of what it is not. Understanding data science is no exception. Thus, this chapter begins by investigating what data science is not, because the term has been much abused and a lot of hype surrounds big data and data science. You will first consider the difference between true data science and fake data science. Next, you will learn how new data science training has evolved from traditional university degree programs. Then you will review several examples of how modern data science can be used in real-world scenarios.
Finally, you will review the history of data science and its evolution from computer science, business optimization, and statistics into modern data science and its trends. At the end of the chapter, you will find a Q&A section from recent discussions I’ve had that illustrate the conflicts between data scientists, data architects, and business analysts.
This chapter asks more questions than it answers, but you will find the answers discussed in more detail in subsequent chapters. The purpose of this approach is for you to become familiar with how data scientists think, what is important in the big data industry today, what is becoming obsolete, and what people interested in a data science career don’t need to learn. For instance, you need to know statistics, computer science, and machine learning, but not everything from these domains. You don’t need to know the details of the complexity of sorting algorithms (just the general results), and you don’t need to know how to compute a generalized inverse matrix (a core topic of statistical theory), nor even know what one is, unless you specialize in the numerical aspects of data science.
Books, certificates, and graduate degrees in data science are spreading like mushrooms after the rain. Unfortunately, many are just a mirage: people taking advantage of the new paradigm to quickly repackage old material (such as statistics and R programming) with the new label “data science.”
Expanding on the R programming example of fake data science, note that R is an open source statistical programming language and environment that is at least 20 years old, and is the successor of the commercial product S+. R was and still is limited to in-memory data processing and has been very popular in the statistical community, sometimes appreciated for the great visualizations that it produces. Modern environments have extended R’s capabilities beyond the in-memory limitation by creating libraries or integrating R in a distributed architecture, such as RHadoop (R + Hadoop). Of course other languages exist, such as SAS, but they haven’t gained as much popularity as R. In the case of SAS, this is because of its high price and the fact that it was more popular in government organizations and brick-and-mortar companies than in the fields that experienced rapid growth over the last 10 years, such as digital data (search engine, social, and mobile data, and collaborative filtering). Finally, R is not unlike the C, Perl, or Python programming languages in terms of syntax (they all share the same syntax roots), and thus it is easy for a wide range of programmers to learn. It also comes with many libraries and a nice user interface. SAS, on the other hand, is more difficult to learn.
To add to the confusion, executives and decision makers building a new team of data scientists sometimes don’t know exactly what they are looking for, and they end up hiring pure tech geeks, computer scientists, or people lacking proper big data experience. The problem is compounded by Human Resources (HR) staff who do not know any better and thus produce job ads that repeat the same keywords: Java, Python, MapReduce, R, Hadoop, and NoSQL. But is data science really a mix of these skills?
Sure, MapReduce is just a generic framework to handle big data by splitting the data into subsets, processing them separately on different machines, and then putting all the pieces back together. So it’s the distributed architecture aspect of processing big data, and these farms of servers and machines are called the cloud.
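The split, process, and recombine pattern described above can be illustrated with a minimal single-machine sketch in Python. This is only an illustration under stated assumptions, not how Hadoop or any real cluster framework is implemented: the word-count task, the sample lines, and the helper names (split_into_chunks, map_chunk, reduce_counts, word_count) are all hypothetical, and the "machines" are simulated by ordinary function calls.

```python
from collections import Counter
from functools import reduce


def split_into_chunks(records, n_chunks):
    """Split the input into roughly equal subsets, one per (hypothetical) worker."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_chunk(chunk):
    """Map step: count words in one subset, independently of all the others."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


def word_count(records, n_chunks=3):
    chunks = split_into_chunks(records, n_chunks)
    # On a real cluster each map_chunk call would run on a different machine;
    # here they simply run one after another (an assumption of this sketch).
    partials = [map_chunk(chunk) for chunk in chunks]
    # Put all the pieces back together.
    return reduce(reduce_counts, partials, Counter())


# Illustrative sample input, not taken from the book.
lines = [
    "big data is different",
    "data science is not statistics",
    "big data big hype",
]
totals = word_count(lines)
```

Because each subset is processed without reference to the others, the map step is the part that can be distributed across machines; only the final merge needs to see all the partial results.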
Hadoop is an implementation of MapReduce, just like C++ is an implementation (still used in finance) of object-oriented programming. NoSQL means “Not Only SQL” and is used to describe database or data management systems that support new, more efficient ways to access data (for instance, MapReduce), sometimes as a layer hidden below SQL (the standard database querying language).
Continue reading in the full edition!