Dive deeper into SPSS Statistics for more efficient, accurate, and sophisticated data analysis and visualization.

SPSS Statistics for Data Analysis and Visualization goes beyond the basics of SPSS Statistics to show you advanced techniques that exploit the full capabilities of SPSS. The authors explain when and why to use each technique, and then walk you through the execution with a pragmatic, nuts-and-bolts example. Coverage includes extensive, in-depth discussion of advanced statistical techniques, data visualization, predictive analytics, and SPSS programming, including automation and integration with other languages such as R and Python. You'll learn the best methods to power through an analysis, with more efficient, elegant, and accurate code.

IBM SPSS Statistics is complex: true mastery requires a deep understanding of statistical theory, the user interface, and programming. Most users never encounter all of the methods SPSS offers, leaving many little-known modules undiscovered. This book walks you through tools you may never have noticed, and shows you how they can streamline your workflow and help you produce more accurate results.

* Conduct a more efficient and accurate analysis
* Display complex relationships and create better visualizations
* Model complex interactions and master predictive analytics
* Integrate R and Python with SPSS Statistics for more efficient, more powerful code

These "hidden tools" can help you produce charts that simply wouldn't be possible any other way, and the support for other programming languages gives you better options for solving complex problems. If you're ready to take advantage of everything this powerful software package has to offer, SPSS Statistics for Data Analysis and Visualization is the expert-led training you need.
Page count: 607
Publication year: 2017
Keith McCormick and Jesus Salcedo
with Jon Peck and Andrew Wheeler
SPSS® Statistics for Data Analysis and Visualization
Published by
John Wiley & Sons, Inc.
10475 Crosspoint Boulevard
Indianapolis, IN 46256
www.wiley.com
Copyright © 2017 by John Wiley & Sons, Inc., Indianapolis, Indiana
Published simultaneously in Canada
ISBN: 978-1-119-00355-7
ISBN: 978-1-119-00557-5 (ebk)
ISBN: 978-1-119-00366-3 (ebk)
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought. Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or Web site is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or website may provide or recommendations it may make. Further, readers should be aware that Internet websites listed in this work may have changed or disappeared between when this work was written and when it is read.
For general information on our other products and services please contact our Customer Care Department within the United States at (877) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.
Library of Congress Control Number: 2017936609
Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. SPSS is a registered trademark of International Business Machines Corporation. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.
We would like to dedicate this book to Jon Peck, who retired after more than 30 years with SPSS and IBM while this book was in its final stages. We wish him the best of retirements, even though he probably won't be able to resist staying in the SPSS community in some form.
Keith McCormick is a data mining consultant, trainer, and speaker. A passionate user of SPSS for 25 years, he has trained thousands on how to effectively use SPSS Statistics and SPSS Modeler. He blogs at keithmccormick.com.
Jesus Salcedo is an independent statistical consultant. He is a former SPSS Curriculum Team Lead and Senior Education Specialist, who has written numerous SPSS training courses and trained thousands of users.
Jon Peck, recently retired from IBM and SPSS, was instrumental in developing and introducing the R and Python connections to the SPSS community. This expertise made him uniquely qualified to produce Chapter 18. He is the author of all the extension commands discussed in that chapter and has a patent pending on the algorithm in the SPSSINC TURF procedure discussed there. He can be reached at [email protected].
Andrew Wheeler is a professor of criminology at the University of Texas at Dallas and a former crime analyst. The application of geospatial techniques in his research created the opportunity for a powerful real-world example in Chapter 8. He has used SPSS for over 10 years and often blogs SPSS tutorials at andrewpwheeler.wordpress.com.
Jon Peck, now retired from IBM, was a senior engineer, statistician, and product strategist for SPSS and IBM for 32 years. He earned a Ph.D. in economics from Yale University, and taught econometrics and statistics there for 13 years before joining SPSS. He designed and contributed to many features of SPSS Statistics and has consulted with and trained many users. He remains active on social media and in consulting.
Terry Taerum has fifteen years’ experience as a statistician at the University of Alberta, fifteen years as a data analyst at SPSS Inc., and five years as a predictive analyst and consultant with IBM.
Project Editor
Tom Dinse
Technical Editors
Jon Peck
Terry Taerum
Production Editor
Dassi Zeidel
Copy Editor
Kim Cofer
Production Manager
Katie Wisor
Manager of Content Development & Assembly
Mary Beth Wakefield
Marketing Manager
Christie Hilbrich
Professional Technology & Strategy Director
Barry Pruett
Business Manager
Amy Knies
Executive Editor
Jim Minatel
Project Coordinator, Cover
Brent Savage
Proofreader
Nancy Carrasco
Indexer
Johnna VanHoose Dinse
Cover Designer
Wiley
Cover Image
iStock.com/agsandrew
Keith and Jesus are especially proud to have worked with Bob Elliot before he retired. Our good friend Dean Abbott recommended Keith to Bob when Bob was seeking a follow-up to Dean’s excellent Applied Predictive Analytics, but one focused specifically on SPSS Statistics. Without both of them, this book would not have been created.
Terry’s and Jon’s contributions extended well beyond technical reviewing. We consider both of them mentors and friends. Jon took over technical reviewing when Terry returned to IBM in a new role. Jon, in particular, was an interlocutor and trusted advisor, and we produced a better book as a result.
Tom, our project editor, had to be patient with us. Deadlines slipped, contributors became unavailable, and Bob retired before the book was complete. Whenever it seemed that something wasn’t quite as it should be, it was often Tom who ultimately made it right. He deserves credit for multiple roles, and we thank him.
We would also like to thank the many SPSSers we turn to when we have a question, even if they haven’t heard from us in a while. We love the sense of community that we have all managed to maintain even when so many have moved on to other roles. And we thank Jason for capturing that sense of community in his foreword.
Foreword
Introduction
The Audience for This Book
How This Book Is Organized
How to Use This Book
The Themes of the Book
Understanding the SPSS Bundles and the SPSS Modules
The New SPSS Subscription Bundles
What’s New in SPSS 23 and 24?
Part I: Advanced Statistics
Chapter 1: Comparing and Contrasting IBM SPSS AMOS with Other Multivariate Techniques
T-Test
Factor Analysis and Unobserved Variables in SPSS
AMOS
Chapter 2: Monte Carlo Simulation and IBM SPSS Bootstrapping
Monte Carlo Simulation
Monte Carlo Simulation in IBM SPSS Statistics
Creating an SPSS Model File
IBM SPSS Bootstrapping
Chapter 3: Regression with Categorical Outcome Variables
Regression Approaches in SPSS
Logistic Regression
Ordinal Regression Theory
Ordinal Regression Dialogs
Ordinal Regression Output
Categorical Regression Theory
Categorical Regression Dialogs
Categorical Regression Output
Chapter 4: Building Hierarchical Linear Models
Overview of Hierarchical Linear Mixed Models
Mixed Models…Linear
Mixed Models…Linear (Output)
Mixed Models…Generalized Linear
Mixed Models…Generalized Linear (Output)
Adjusting Model Structure
Part II: Data Visualization
Chapter 5: Take Your Data Visualizations to the Next Level
Graphics Options in SPSS Statistics
Understanding the Revolutionary Approach in The Grammar of Graphics
Bar Chart Case Study
Bubble Chart Case Study
Chapter 6: The Code Behind SPSS Graphics: Graphics Production Language
Introducing GPL: Bubble Chart Case Study
GPL Help
Bubble Chart Case Study Part Two
Double Regression Line Case Study
Arrows Case Study
MBTI Bubble Chart Case Study
Chapter 7: Mapping in IBM SPSS Statistics
Creating Maps with the Graphboard Template Chooser
Chapter 8: Geospatial Analytics
Geospatial Association Rules
Case Study: Crime and 311 Calls
Spatio-Temporal Prediction
Case Study: Predicting Weekly Shootings
Chapter 9: Perceptual Mapping with Correspondence Analysis, GPL, and OMS
Starting with Crosstabs
Correspondence Analysis
Multiple Correspondence Analysis
Applying OMS and GPL to the MCA Perceptual Map
Chapter 10: Display Complex Relationships with Multidimensional Scaling
Metric and Nonmetric Multidimensional Scaling
Nonmetric Scaling of Psychology Sub-Disciplines
Multidimensional Scaling Dialog Options
Multidimensional Scaling Output Interpretation
Subjective Approach to Dimension Interpretation
Statistical Approach to Dimension Interpretation
Part III: Predictive Analytics
Chapter 11: SPSS Statistics versus SPSS Modeler: Can I Be a Data Miner Using SPSS Statistics?
What Is Data Mining?
What Is IBM SPSS Modeler?
Can Data Mining Be Done in SPSS Statistics?
Hypothesis Testing, Type I Error, and Hold-Out Validation
Significance of the Model and Importance of Each Independent Variable
The Importance of Finding and Modeling Interactions
Classic and Important Data Mining Tasks
Chapter 12: IBM SPSS Data Preparation
Identify Unusual Cases
Optimal Binning
Chapter 13: Model Complex Interactions with IBM SPSS Neural Networks
Why “Neural” Nets?
XOR Example Syntax
Neural Net Results with the XOR Variables
Comparing Regression to Neural Net with the Bank Salary Case Study
Chapter 14: Powerful and Intuitive: IBM SPSS Decision Trees
Building a Tree with the CHAID Algorithm
Review of the CHAID Algorithm
CRT for Classification
The Scoring Wizard
Chapter 15: Find Patterns and Make Predictions with K Nearest Neighbors
Using KNN to Find “Neighbors”
The Titanic Dataset and KNN Used as a Classifier
The Trade-Offs between Bias and Variance
Comparing Our Models: Decision Trees, Neural Nets, and KNN
Building an Ensemble
Part IV: Syntax, Data Management, and Programmability
Chapter 16: Write More Efficient and Elegant Code with SPSS Syntax Techniques
A Syntax Primer for the Uninitiated
The Case Study
Chapter 17: Automate Your Analyses with SPSS Syntax and the Output Management System
Overview of the Output Management System
Running OMS from Menus
Automatically Writing Selected Categories of Output to Different Formats
Suppressing Output
Working with OMS Data
Running OMS from Syntax
Chapter 18: Statistical Extension Commands
What Is an Extension Command?
TURF Analysis—Designing Product Bundles
Quantile Regression—Predicting Airline Delays
Comparing Ordinary Least Squares with Quantile Regression Results
Support Vector Machines—Predicting Loan Default
Computing Cohen’s d Measure of Effect Size for a T-Test
EULA
Chapter 1
Table 1.1
Table 1.2
Table 1.3
Table 1.4
Chapter 3
Table 3.1
Table 3.2
Table 3.3
Chapter 11
Table 11.1
Table 11.2
Table 11.3
Chapter 13
Table 13.1
Table 13.2
Table 13.3
Chapter 14
Table 14.1
Chapter 15
Table 15.1
Chapter 18
Table 18.1
Table 18.2
Introduction
Figure I-1:
Example of version 24 custom table
Figure I-2:
The Extension Hub
Chapter 1
Figure 1.1
T-test dialog
Figure 1.2
T-test results
Figure 1.3
Chart Builder dialog
Figure 1.4
Chart results with regression lines added
Figure 1.5
General Linear Model menu options
Figure 1.6
Univariate dialog
Figure 1.7
Univariate options subdialog
Figure 1.8
ANCOVA results
Figure 1.9
Multivariate dialog
Figure 1.10
MANOVA Multivariate Tests
Figure 1.11
Additional MANOVA results
Figure 1.12
T-test dialog
Figure 1.13
MANCOVA Multivariate Tests results
Figure 1.14
MANCOVA Tests of Between-Subjects Effects results
Figure 1.15
MANCOVA Parameter Estimates
Figure 1.16
MANCOVA dialog with four covariates
Figure 1.17
Pillai’s Trace results
Figure 1.18
MANCOVA Between-Subjects Effects
Figure 1.19
MANCOVA Parameter Estimates
Figure 1.20
Alternate MANCOVA Multivariate Tests
Figure 1.21
Alternate MANCOVA Parameter Estimates
Figure 1.22
Factor Analysis menu
Figure 1.23
Factor Analysis dialog
Figure 1.24
Extraction subdialog
Figure 1.25
Factor Analysis results
Figure 1.26
An AMOS model similar to our MANCOVA
Figure 1.27
The AMOS interface
Figure 1.28
An AMOS version of our factor analysis
Figure 1.29
Our “General Model”
Figure 1.30
Estimating a verbal effect using regression
Figure 1.31
Estimating a combined score effect using regression
Figure 1.32
General Model with S/N
Figure 1.33
The View menu
Figure 1.34
Outline pane of the AMOS Text Output
Figure 1.35
Top portion of Model Fit Summary
Figure 1.36
RMSEA results
Figure 1.37
Hoelter results
Figure 1.38
General Model with S/N and sex
Figure 1.39
General Model with S/N and rank
Figure 1.40
Dual causality
Figure 1.41
Our best model
Chapter 2
Figure 2.1
Simulation: Model Source dialog
Figure 2.2
Fans dataset
Figure 2.3
Completed Linear Regression dialog
Figure 2.4
Completed Linear Regression: Save dialog
Figure 2.5
Linear regression results
Figure 2.6
Simulated Fields panel
Figure 2.7
Fit Details dialog
Figure 2.8
Model tab
Figure 2.9
Correlations panel
Figure 2.10
Advanced Options panel
Figure 2.11
Density Functions panel
Figure 2.12
Output panel
Figure 2.13
Save panel
Figure 2.14
Model Type table
Figure 2.15
Input Distributions table
Figure 2.16
Correlations table
Figure 2.17
Stopping Criteria table
Figure 2.18
Simulation Summary table
Figure 2.19
Descriptive Statistics of Scale Targets table
Figure 2.20
Descriptive Statistics of Scale Inputs table
Figure 2.21
Correlations table
Figure 2.22
Probability Density chart
Figure 2.23
Chart Options dialog
Figure 2.24
Edited Probability Density chart
Figure 2.25
Tornado chart
Figure 2.26
Default frequencies report
Figure 2.27
The Frequencies menu with the Bootstrap submenu
Figure 2.28
The Bootstrap submenu
Figure 2.29
Frequency table with bootstrap results
Figure 2.30
Split File menu
Figure 2.31
Frequency table with bootstrap results and with a split applied
Figure 2.32
Descriptives table with bootstrap results
Figure 2.33
Regression coefficients with standard confidence interval
Figure 2.34
Regression coefficients with bootstrap confidence intervals
Chapter 3
Figure 3.1
Ordinal Regression dialog
Figure 3.2
Options dialog
Figure 3.3
Distribution of the satisfied variable
Figure 3.4
Output dialog
Figure 3.5
Location dialog
Figure 3.6
Scale dialog
Figure 3.7
Warning of cells with frequency of zero
Figure 3.8
Case Processing Summary table
Figure 3.9
Model Fitting Information table
Figure 3.10
Goodness-of-Fit table
Figure 3.11
Pseudo R-Square table
Figure 3.12
Parameter Estimates table
Figure 3.13
Test of Parallel Lines table
Figure 3.14
Crosstab between actual and predicted outcomes
Figure 3.15
Categorical Regression dialog
Figure 3.16
Define Scale dialog
Figure 3.17
Discretization dialog
Figure 3.18
Missing Values dialog
Figure 3.19
Options dialog
Figure 3.20
Regularization dialog
Figure 3.21
Output dialog
Figure 3.22
Save dialog
Figure 3.23
Plots dialog
Figure 3.24
Case Processing Summary table
Figure 3.25
Model Summary table
Figure 3.26
ANOVA table
Figure 3.27
Coefficients table
Figure 3.28
Correlations and Tolerance table
Figure 3.29
Quantifications table: recommend
Figure 3.30
Quantifications table: satisfied
Figure 3.31
Quantifications table: valuable
Figure 3.32
Quantifications table: when_purchased
Figure 3.33
Transformation plot: recommend
Figure 3.34
Transformation plot: satisfied
Figure 3.35
Transformation plot: valuable
Figure 3.36
Transformation plot: when_purchased
Chapter 4
Figure 4.1
Merchandise sales data
Figure 4.2
Analyze ➪ Mixed Models menu options
Figure 4.3
Specify Subjects and Repeated dialog
Figure 4.4
Linear Mixed Models dialog
Figure 4.5
Fixed Effects dialog
Figure 4.6
Random Effects dialog
Figure 4.7
Random Effects dialog
Figure 4.8
Estimation dialog
Figure 4.9
Statistics dialog
Figure 4.10
EM Means dialog
Figure 4.11
Save dialog
Figure 4.12
Model Dimension table
Figure 4.13
Information Criteria table
Figure 4.14
Type III Tests of Fixed Effects table
Figure 4.15
Estimates of Fixed Effects table
Figure 4.16
Estimates of Covariance Parameters table
Figure 4.17
Estimates of Covariance Parameters table for a null model
Figure 4.18
Data Structure dialog
Figure 4.19
Fields & Effects: Target dialog
Figure 4.20
Fields & Effects: Fixed Effects dialog
Figure 4.21
Fields & Effects: Random Effects dialog
Figure 4.22
Model Summary
Figure 4.23
Data Structure
Figure 4.24
Predicted by Observed
Figure 4.25
Fixed Effects (diagram)
Figure 4.26
Fixed Effects (table)
Figure 4.27
Fixed Coefficients (diagram)
Figure 4.28
Fixed Coefficients (table)
Figure 4.29
Covariance Parameters
Figure 4.30
No random effects
Figure 4.31
Model Summary
Figure 4.32
Covariance Parameters
Figure 4.33
Fixed Coefficients
Chapter 5
Figure 5.1
Graphs menu
Figure 5.2
Legacy Bar Charts menu
Figure 5.3
Chart Builder main menu
Figure 5.4
Basic Elements submenu
Figure 5.5
Graphboard Template Chooser main menu
Figure 5.6
Graphboard Template Chooser Basic tab
Figure 5.7
Graphboard Template Chooser fields specified
Figure 5.8
Detailed Tab
Figure 5.9
Bar chart
Figure 5.10
Graphboard Editor
Figure 5.11
Regions sorted
Figure 5.12
Region: Range as summary
Figure 5.13
Bubble Chart Detailed tab
Figure 5.14
Bubble Chart
Figure 5.15
Edited Bubble Chart
Chapter 6
Figure 6.1
Chart Builder Gallery tab
Figure 6.2
Preview of grouped scatterplot
Figure 6.3
Groups/Point ID tab
Figure 6.4
Element Properties
Figure 6.5
Bubble plot
Figure 6.6
Help and Reference options
Figure 6.7
Bubble plot with changes
Figure 6.8
Bubble plot with red and blue states
Figure 6.9
Bubble plot with bands
Figure 6.10
Bubble plot with polygon
Figure 6.11
PainTreat data file
Figure 6.12
Scatterplot between pain and physical therapy during the first time period
Figure 6.13
Double regression line
Figure 6.14
Preview of scatterplot with panel variable
Figure 6.15
Change in pain by drug treatment
Figure 6.16
MBTI bubble chart
Figure 6.17
Preview of bubble plot
Figure 6.18
Relationship between class rank and SAT scores and MBTI results
Chapter 7
Figure 7.1
Worldwide sales data
Figure 7.2
Bar chart of customer location
Figure 7.3
Map of customer locations
Figure 7.4
One variable selected
Figure 7.5
Select Maps dialog
Figure 7.6
Completed Detailed tab
Figure 7.7
Choropleth of Counts
Figure 7.8
Two categorical variables selected
Figure 7.9
Choropleth of Values
Figure 7.10
Pie of counts on a map
Figure 7.11
One categorical and one continuous variable selected
Figure 7.12
Choropleth of Sums
Figure 7.13
Two categorical and one continuous variable selected
Figure 7.14
Bars on a Map
Figure 7.15
Two continuous variables selected
Figure 7.16
Coordinates on a Reference map
Figure 7.17
Two continuous variables and one categorical variable selected
Figure 7.18
Coordinates on a Choropleth of Counts map
Figure 7.19
Four continuous variables selected
Figure 7.20
Arrows on a Reference map
Chapter 8
Figure 8.1
Opening the Geospatial Modeling Wizard
Figure 8.2
Adding map data
Figure 8.3
Assigning context and prediction data
Figure 8.4
Associating map data
Figure 8.5
Associating fields from data to the map
Figure 8.6
Assigned geospatial coordinates
Figure 8.7
Setting the coordinate system
Figure 8.8
Setting the prediction variables
Figure 8.9
Setting the condition variables
Figure 8.10
Setting output for geospatial association rules
Figure 8.11
Setting the rules
Figure 8.12
Defining bins
Figure 8.13
Rule 14: Theft F/Auto and Graffiti > 3
Figure 8.14
Rule 32: Assault w/Dangerous Weapon
Figure 8.15
Setting target and predictor fields for spatio-temporal modeling
Figure 8.16
Setting the time intervals
Figure 8.17
Setting the output
Figure 8.18
Scoring a separate file
Figure 8.19
Regression coefficient tables
Figure 8.20
Predictions of future shootings
Chapter 9
Figure 9.1
Culture and sport perceptual map
Figure 9.2
Crosstabs main menu
Figure 9.3
Crosstabs: Statistics submenu
Figure 9.4
Crosstabs results
Figure 9.5
Crosstabs Cell Display submenu
Figure 9.6
Crosstabs Style submenu
Figure 9.7
Crosstabs results (with highlighting style)
Figure 9.8
Dimension Reduction menu
Figure 9.9
Correspondence Analysis
Figure 9.10
Initial attempt of the perceptual map
Figure 9.11
Correspondence Analysis Dimension Summary
Figure 9.12
Improved perceptual map
Figure 9.13
OMS Control Panel
Figure 9.14
Variable View of OMS results
Figure 9.15
Modified Data View of OMS results
Figure 9.16
Correspondence Analysis dimension summary
Figure 9.17
Perceptual map with GPL modifications
Figure 9.18
Correspondence Analysis dimension summary
Figure 9.19
Correspondence Analysis dimension summary
Figure 9.20
Optimal Scaling submenu
Figure 9.21
Multiple Correspondence Analysis main menu
Figure 9.22
Variable Plots submenu
Figure 9.23
Very crowded Joint Category Plot
Figure 9.24
Discrimination Measures Plot
Figure 9.25
Discrimination Measures table
Figure 9.26
Sorting the mean discrimination measures
Figure 9.27
Draft map with few variables
Figure 9.28
MCA Output submenu
Figure 9.29
Coordinates (partial)
Figure 9.30
OMS Control Panel
Figure 9.31
Dataset produced by OMS
Figure 9.32
OMS Control Panel
Figure 9.33
MCA perceptual map using OMS and GPL
Figure 9.34
MCA version of Figure 9.12
Chapter 10
Figure 10.1
Object points plot
Figure 10.2
Dissimilarity matrix of psychology data
Figure 10.3
Ice cream preference data
Figure 10.4
Proximities matrix ice cream preference data
Figure 10.5
Analyze ➪ Scale menu options
Figure 10.6
Multidimensional Scaling: Data Format dialog
Figure 10.7
Multidimensional Scaling: (Proximities in Matrices Across Columns) dialog
Figure 10.8
Multidimensional Scaling: Model dialog
Figure 10.9
Multidimensional Scaling: Restrictions dialog
Figure 10.10
Multidimensional Scaling: Options dialog
Figure 10.11
Multidimensional Scaling: Plots dialog
Figure 10.12
Multidimensional Scaling: Output dialog
Figure 10.13
Scree plot of normalized raw stress
Figure 10.14
Stress and Fit Measures table displaying results for a one-dimensional solution
Figure 10.15
Stress and Fit Measures for three multidimensional scaling solutions
Figure 10.16
Stress decomposition table
Figure 10.17
Coordinates for a two-dimensional solution
Figure 10.18
Object points plot
Figure 10.19
Plot of the actual and transformed proximities (two-dimensional solution)
Figure 10.20
Residuals plot of distances
Figure 10.21
Dimension coordinates and aggregated mean rating on additional scales
Figure 10.22
Modified correlation procedure syntax
Figure 10.23
Correlations between dimensions and aggregated mean rating scales
Chapter 11
Figure 11.1
An SPSS Modeler “stream”
Figure 11.2
Stream with model
Figure 11.3
ANOVA results showing a significant difference
Figure 11.4
Post hoc results table showing a variety of test results
Figure 11.5
A Decision Tree
Figure 11.6
Two very different slopes
Figure 11.7
A closer look at the stream
Figure 11.8
A stream shown on the full canvas area
Figure 11.9
Partition node settings
Figure 11.10
Result set
Figure 11.11
Feature Selection node and model added to stream
Figure 11.12
0 screened fields
Figure 11.13
Distribution node results
Figure 11.14
Generated Balance node calculation
Figure 11.15
Stream with two models added
Figure 11.16
Analysis node results
Figure 11.17
Ensemble methods
Figure 11.18
Comparing models to ensemble results
Figure 11.19
Stream with scoring of test.csv added
Figure 11.20
Scoring results for 10 passengers
Chapter 12
Figure 12.1
Data menu
Figure 12.2
Identify Unusual Cases: Variables dialog
Figure 12.4
Identify Unusual Cases: Save dialog
Figure 12.5
Identify Unusual Cases: Missing Values dialog
Figure 12.6
Identify Unusual Cases: Options dialog
Figure 12.7
Case Processing Summary table
Figure 12.8
Anomaly Case Index List table
Figure 12.9
Anomaly Case Peer ID List table
Figure 12.10
Anomaly Case Reason List table
Figure 12.11
Scale Variable Norms table
Figure 12.12
Categorical Variable Norms table
Figure 12.13
Anomaly Index Summary table
Figure 12.14
Reason 1 table
Figure 12.15
Sorting data
Figure 12.16
New variables sorted
Figure 12.17
Transform menu
Figure 12.18
Optimal Binning: Variables dialog
Figure 12.19
Optimal Binning: Output dialog
Figure 12.20
Optimal Binning: Save dialog
Figure 12.21
Optimal Binning: Missing Values dialog
Figure 12.22
Optimal Binning: Options dialog
Figure 12.23
Descriptive Statistics table
Figure 12.24
Model Entropy table
Figure 12.25
Binning summary table
Figure 12.26
Logistic Regression dialog
Figure 12.27
Variables in the Equation table
Chapter 13
Figure 13.1
An illustration of a perceptron
Figure 13.2
A flat regression line
Figure 13.3
Two regression lines
Figure 13.4
Displaying interaction
Figure 13.5
An illustration of a multilayer perceptron
Figure 13.6
The neural net “topology”
Figure 13.7
Parameter Estimates for the neural net
Figure 13.8
Alternative weights from a second neural net
Figure 13.9
Topology with outcome declared as nominal
Figure 13.10
Parameter Estimates with outcome declared as nominal
Figure 13.11
Training submenu
Figure 13.12
Options submenu
Figure 13.13
The Selection Variable option in Linear Regression
Figure 13.14
Set Rule submenu
Figure 13.15
Complete Regression Output
Figure 13.16
Multilayer Perceptron main menu
Figure 13.17
Partitions submenu
Figure 13.18
Network topology diagram
Figure 13.19
Comparing performance
Figure 13.20
Regression results with interaction terms
Figure 13.21
Updated results with three models compared
Figure 13.22
Results for a more complex regression
Figure 13.23
Topology diagram for the more complex neural net
Figure 13.24
Comparing five models
Figure 13.25
Variables selected for the neural net
Figure 13.26
Neural net topology diagram
Figure 13.27
Classification accuracy results
Figure 13.28
Adding additional variables
Figure 13.29
The Save submenu
Figure 13.30
Model accuracy for the second attempt
Chapter 14
Figure 14.1
Decision tree main menu
Figure 14.2
Validation submenu
Figure 14.3
Training Sample tree
Figure 14.4
Test Sample tree
Figure 14.5
Overall accuracy results
Figure 14.6
Crosstab results for Sex variable
Figure 14.7
Crosstab results for Pclass variable
Figure 14.8
Crosstab showing all three variables
Figure 14.9
Decision tree criteria
Figure 14.10
Training tree after changing settings
Figure 14.11
Accuracy results for the larger tree
Figure 14.12
Decision tree main menu
Figure 14.13
Initial CRT tree
Figure 14.14
Accuracy results for CRT tree
Figure 14.15
Pruning criteria submenu
Figure 14.16
Second CRT tree
Figure 14.17
Second CRT tree accuracy results
Figure 14.18
Using a random assignment
Figure 14.19
Results using the random assignment
Figure 14.20
Scoring Wizard first menu
Figure 14.21
Scoring Wizard second menu
Figure 14.22
Scoring Wizard third menu
Figure 14.23
Scoring Wizard fourth menu
Figure 14.24
Predictive scores for some passengers in the Test dataset
Chapter 15
Figure 15.1
Nearest Neighbor Analysis main menu
Figure 15.2
Partitions submenu
Figure 15.3
Mr. Svensson as focal record
Figure 15.4
Mr. Svensson’s neighbors
Figure 15.5
Mr. Svensson’s Peers Chart
Figure 15.6
Model accuracy
Figure 15.7
Neighbors submenu
Figure 15.8
Optimal value for k
Figure 15.9
Results for k=6
Figure 15.10
Results for k=4
Figure 15.11
Output submenu
Figure 15.12
Output submenu
Figure 15.13
Output submenu
Figure 15.14
Comparing the models with Descriptives
Figure 15.15
Descriptives with Bootstrapping
Figure 15.16
All four models compared
Chapter 16
Figure 16.1
Frequencies main dialog
Figure 16.2
Resulting Syntax in the Syntax Editor
Figure 16.3
Syntax Help
Figure 16.4
Frequencies command in the Syntax editor
Figure 16.5
Help menu showing the Command Syntax Reference
Figure 16.6
Frequencies dialog and Frequencies commands
Figure 16.7
Charts subdialog
Figure 16.8
Main dialog and Cell Display subdialog
Figure 16.9
Three CROSSTABS examples
Figure 16.10
Data View (above) and Variable View (below)
Figure 16.11
Define Variable Properties dialog
Figure 16.12
Declaring the Sales_Amount variable
Figure 16.13
Adding a Value Label to Category_Code
Figure 16.14
Pasted code from the Define Variable Properties dialog
Figure 16.15
Value Labels with additional category codes
Figure 16.16
A few rows of address information in the customer data
Figure 16.17
Type and Label subdialog
Figure 16.18
The STRING command
Figure 16.19
Code examples using scratch variables
Figure 16.20
City names in mixed case
Figure 16.21
City names in descending case
Figure 16.22
A few rows of the transactional dataset
Figure 16.23
First screen of the Restructure Wizard
Figure 16.24
Second screen of the Restructure Wizard
Figure 16.25
The Utilities Menu
Figure 16.26
Count Values within Cases menu option
Figure 16.27
RECODE command in Syntax Help
Figure 16.28
First screen of Add Variables
Figure 16.29
Second screen of Add Variables
Chapter 17
Figure 17.1
Utilities menu options
Figure 17.2
Output Management System Control Panel dialog
Figure 17.3
Completed Output Management System Control Panel dialog
Figure 17.4
Outline pane options
Figure 17.5
OMS label added
Figure 17.6
OMS: Options dialog
Figure 17.7
Output destination added
Figure 17.8
New OMS request added
Figure 17.9
Second OMS request added
Figure 17.10
OMS Control Panel: Summary dialog
Figure 17.11
Crosstabs dialog
Figure 17.12
Output without Case Processing Summary table
Figure 17.13
Bivariate Correlations dialog
Figure 17.14
Traditional correlations output
Figure 17.15
Manipulated correlations syntax
Figure 17.16
Manipulated correlations output
Figure 17.17
Ending OMS requests
Figure 17.18
OMS Control Panel: Summary dialog
Figure 17.19
Correlations dataset
Figure 17.20
Select Cases: If dialog
Figure 17.21
Descriptives dialog
Figure 17.22
Average correlation
Figure 17.23
OMS syntax
Figure 17.24
OMS Identifiers dialog
Chapter 18
Figure 18.1
The menus show the installed extensions. Extension commands have a white “+” icon.
Figure 18.2
Available extensions are listed on the website.
Figure 18.3
Two preference sets
Figure 18.4
Computing reach manually
Figure 18.5
Maximum Group Size: 1. Reach and Frequency
Figure 18.6
The TURF dialog box
Figure 18.7
Maximum Group Size: 2. Reach and Frequency
Figure 18.8
Maximum Group Size: 4. Reach and Frequency
Figure 18.9
Effect of group size
Figure 18.10
Arrival delays by airport
Figure 18.11
Arrival Delays by Airport
Figure 18.12
Regression residuals histogram
Figure 18.13
OLS vs. QR coefficients for month
Figure 18.14
Residual correlations
Figure 18.15
Q-Q plot of OLS against QUANTREG residuals
Figure 18.16
QR coefficients by quantile
Figure 18.17
Discriminant classification
Figure 18.18
Logistic regression classification
Figure 18.19
SVM classification
Figure 18.20
SVM parameter tuning
Figure 18.21
Entering grid search parameters
Figure 18.22
SVM Parameter tuning with two parameters
Figure 18.23
SVM classification result with tuning
Figure 18.24
SVM classification results with weighting
Figure 18.25
The SVM dialog
Figure 18.26
The T-Test dialog
Figure 18.27
The T-TEST output
Figure 18.28
The Python plugin code for Cohen’s d
Figure 18.29
The Calculate with a Pivot Table dialog
Figure 18.30
The TABLE CALC syntax
Figure 18.31
The modified T-TEST output
In my various roles at SPSS and IBM, I met Keith and Jesus many years ago. Both have over 20 years of statistical consulting experience, and both have been training people in statistics and in how to use SPSS for many years. Each has in fact trained thousands of students. They are uniquely qualified to bring the message and content of this book to you, and they have done so with rigor and grace. SPSS offers a great many techniques and procedures for both simple and complex analysis, and Keith and Jesus introduce you to this rich tapestry in a way that will pay dividends for years to come as you use data and analytics to drive societal change. This book goes beyond the elementary treatments found in most other books on SPSS Statistics, yet it is written for users who do not necessarily have an advanced statistical background. It can make readers better analysts by expanding their toolkits to include powerful techniques that they might not otherwise consider but that can have a big payoff in increased insight.
Keith and Jesus’ outstanding new book on SPSS Statistics has brought back so many thoughts about this great product and the influence it has had on so many people that I thought I would briefly reminisce.
I first became involved with this software when I went to work for SPSS in 1995 as Director of Quality Assurance. A year earlier, SPSS had released its first Microsoft Windows product, which, while solid, did not really take advantage of the amazing possibilities a true graphical interface could provide. This was a huge and important time for the company, as the SPSS team was hard at work revolutionizing both the front-end user interface and the output to create a standard that is still in place and considered best of breed today. These innovations enabled sophisticated pivot table output as well as much more customized graphical output than had ever been attempted before. Indeed, in the years to come, it was that spirit of always getting ahead of every technological trend that would keep this software at the heart of what the data analysis community demanded.
When I say the heart of the data analysis community, I am not in any way exaggerating. This software has been used by hundreds of thousands of students in college and graduate school and by similar numbers in government and commercial environments worldwide. Over the years, when I introduced myself, literally hundreds, if not thousands, of people have said to me, “I used SPSS in college.” And of course, I can’t leave out the bootleg copies I have seen in innumerable places during my travels and have personally purchased on the streets of Santiago and Beijing.
Impressive? Absolutely. But of course the real question is … WHY is SPSS so heavily used and so well loved? WHY has its community of users stayed vibrant and loyal even eight years after the company itself was acquired by IBM?
The answer is the combination of power, simplicity, and elegance. This is a big statement. To back it up, and apropos of the subject matter, I’ll contribute a data point as my best evidence. A few years ago, when I was still with IBM (which acquired SPSS in 2009), we hired a summer intern who had used our software for a semester in college. After about a month on the job, we debriefed her on the progress of her user interface design assignment. She discussed at length the challenges she was having in coming up with a design that met the standard of the rest of the product: simplicity backed by immense power. This led to a discussion of the first time she used the product as a student. Of course, opening a “statistics” product for the first time filled this iPhone-using millennial with much trepidation; however, as she described to us, within just a few minutes she was loading and manipulating data, building predictive models, and producing output for her class. Shortly after that, she was digging into the depths of the power the product provided. Even a user practically raised on the beautiful user designs of the smartphone era was right at home using SPSS. That is an amazing statement in and of itself. Think about it! It is made even more extraordinary by the fact that this same student interacted with professors and researchers on her campus who were using, and in fact relying on, that very same product to do their cutting-edge work. As I said, the answer is the combination of power, simplicity, and elegance.
This amazing simplicity does not come at the expense of power. As Keith and Jesus make clear in this book, SPSS Statistics is an incredibly powerful tool for data analysis and visualization. Even today, no tool works with users of every level (novice, intermediate, or expert) to uncover meanings and relationships in data as powerfully as SPSS does. Further, once the data has been prepared, the models built, and the analysis done, no software is better at explaining the results to the non-analysts who have to act on them. This increases the value of the tool immeasurably, since it creates the understanding and confidence needed to deploy its insights in the real world and create real value. Having seen this done so many times, by so many people, in so many domains, I can say to those of you starting with this product for the first time that I truly envy you: you are about to start on a journey of learning and getting results that will amaze you, and the people you work with.
Let’s put this all in perspective. This product is now in its sixth decade of existence. That’s right: it first came out in the late 1960s. How many products can you name that have survived and prospered for that long? Not many. The Leica M camera and the Porsche 911, with their classic, timeless designs, come to mind, but not much else. How many COMPUTER products? Even fewer; perhaps only the venerable IBM mainframe. Yet here we have IBM SPSS Statistics, not only surviving but as relevant and vital as ever, right in the midst of the new age of big data and machine learning: heavily used by experts who dig deep into data and model building, yet usable by novices in the iPhone era as well.
Now, let us shift our focus from celebrating the vibrancy and staying power of the SPSS journey to the heart of what Keith and Jesus have addressed in this book. This is first and foremost a book for data analysis practitioners at intermediate and advanced levels. That raises the question of how this product can help that audience create the most value in the modern era.
Unlike in the world of the late 1960s, when SPSS was created, we now live in an age with many tools for quick analysis of datasets. For example, Tableau is a fine tool that lets business-oriented users with less data analysis training get immediate and useful visual insights from their data. So what need is there for IBM SPSS Statistics in this new world?
To answer that question, let me take you back several years to a conference called “MinneAnalytics,” sponsored by a Minnesota-based organization of analytic professionals, where I delivered a presentation on Advanced Analytics called “What’s Your World View?” In that presentation, I envisioned a rapidly approaching new age where “big data” would meet advanced analytic techniques running in real time and that combination would drive every decision-making aspect of how our society would work. I compared the importance of this movement to previous huge steps that changed the very foundation of society, including the invention of the automobile and the invention of assembly-line production for manufacturing many different types of goods.
Well, a mere three years later, that “future” society is already here, right now. It is happening all around us. Analytics on big data is driving decision making and processes everywhere you look. Hospitals apply real-time analytics to data feeds from patient-monitoring instruments in intensive care units to alert doctors automatically when a patient in the ICU is about to take a turn for the worse. Trucking companies use analytics to intervene proactively when the system predicts that one of their drivers will have an accident. Airplanes and cars apply real-time analytics to engine sensor data to predict failures and prompt pilots and drivers to take action before such failures occur. Indeed, big data analytics has become one of the most disruptive forces in business history and is unleashing new value creation quite literally wherever you look. All of these examples clearly show a fundamental point: quick visual understanding is one thing, but deep insight yielding confidence in a predictive model that is deployed in real time at critical decision points at vast scale is quite another. It is in this realm of confirmation and confidence that SPSS Statistics shines like no other.
Mass deployment of advanced analytics will create benefits for society that are, for all intents and purposes, unimaginable. That assumes, of course, that the deployed analytics are in fact correct (with the right tweaking and trade-offs between accuracy and stability) and are deployed properly. It is an almost unique benefit of SPSS that no matter what language those analytics are built in (SPSS, R, Python, supervised or unsupervised, standard or machine learning, executed programmatically or through visual interfaces, or any other variant you can think of), the product can be used to confirm that the desired results will be achieved and to understand the risks involved. It can also be used to explain the results to others in the enterprise, aligning those who need to be in the know on exactly how analytics drive their new business models. There is no better “hub” for data scientists to practice their craft and contribute their value to the creation of a new world: a world of staggering rates of change guided or driven by data and analytics.
IBM SPSS Statistics is the perfect tool for this new world when used by well-trained analysts who can put all the data and all the insights together without mistakes to create the most value. People who can take the output of machine learning, add traditional data and then other new forms of data (such as sensor and social media data), and get insights well beyond the quick insights from Tableau and other surface-level tools. People who know how to use the advanced capabilities of the tool, such as mixed model analysis of data at different levels (for example, within a hierarchy) to find even deeper insights. Such a tool, in the hands of such people (well-trained data scientists), can drive us into this remarkable new world with both confidence and safety. To become one of those who drive this societal transformation using SPSS, you can benefit from having this book as your guide.
Enjoy the book…and enjoy the next 50 years of IBM SPSS Statistics as well!
— Jason Verlen
Jason Verlen is currently Senior Vice President of Product Management and Marketing at CCC Information Services, based in Chicago. Before moving to CCC he spent 20 years at SPSS and then IBM (after its acquisition of SPSS) in various roles ending with being named Vice President of Big Data Analytics at IBM.
This book is a collaboration between me (Keith) and several other career-long “SPSSers,” and the editorial decisions about what to cover, and how to cover it, are greatly affected by that fact. My own career took a turn down a road that led to a life of learning, teaching, and consulting about SPSS almost 20 years ago. I was contemplating a PhD in Psychometrics at the University of North Carolina, Chapel Hill. My plans didn’t get much further than auditing some prerequisites and establishing residency. So, on paper, I hadn’t made much progress, but relocating 1,000 miles (from Massachusetts) and purchasing a house represented a milestone in my life and career. I’m still in that same house (more than 22 years now), and I’m still using SPSS almost daily. Like many things in life, it seems almost accidental. I was doing contract statistics work using SPSS, working from home while I planned for a life in graduate school, and I drove up to Arlington, VA, to take advantage of what SPSS training then called the training “subscription.”
The concept was to take as many classes as you can manage in a year. It was remarkably cost effective. I was able to convince my primary contract client to pay for the subscription under the condition that I covered all other expenses, and didn’t let it affect my deadlines. I already had several years of daily SPSS use under my belt, so I was hardly a rookie, but it was too good to pass up. I found a summer sublet in Washington, DC, took advantage of the training classes almost daily for a couple of months, learned all the latest features, learned about modules that I had never tried, made some good new friends, and worked late into the evening trying to keep my contract research work on schedule. Then suddenly I was asked if I wanted to relocate and take on teaching the basic classes in that same office. I declined the full-time position (the grad school idea was still alive), but I did start making occasional trips. Within a year they were frequent trips, and it became effectively full time, including training trips all over the United States and Canada.
A bit of nostalgia, perhaps, but there is a good reason to reflect on that time period in SPSS Inc.’s history. As Jason Verlen notes in his foreword to this book, the mid to late ’90s was a pivotal time in the development of SPSS. With Windows 95 came a whole new world, and SPSS Inc. leaped into the fray. Also, in the late ’90s, SPSS Inc. bought ISL, and with it, Clementine. The revolutionary software package then became SPSS Clementine, and is now called IBM SPSS Modeler. While this book is dedicated to SPSS Statistics and not SPSS Modeler, my career has never been quite the same since. Although that was the acquisition that most influenced my career, it was certainly not the only one. There were numerous acquisitions during that period, growing the SPSS family to include products like AMOS, SPSS Data Collection, and Showcase.
It was also a bit of a golden age in SPSS training. Almost 20 of us offered SPSS training frequently. On any given day, there were at least a couple of SPSS training events being held in one of several cities that had permanent full-time SPSS training facilities. Traveling to public training was common then; online training hadn’t yet arrived. It was simply how training was done. In this very active, live, corporate-managed, instructor-led training economy, more than 30 distinct classes were offered, representing 50–60+ days of training content. It took me three years before I found myself teaching 80% of them, and even longer before I had taught all of them. Classroom training was seen as a key way to support the user community, so even classes that were infrequent, and therefore not very profitable, were still scheduled to support the product. Everything changes over time, and traveling cross-country to a corporate training center for 5 continuous days of training, with a stack of huge books and 16 strangers from other companies, certainly seems quaint now.
For all of us who experienced it as trainers and participants, however, it changed us forever. One of the things that always struck me, and that still knocks me off my feet, was that the 32 books we used were not enough! SPSS had so many great new features coming out with each new version that it was hard to keep up, even though we were in the classroom three-quarters of the time. The Arlington office frequently had another trainer teaching in a room next door, so we would have lunch together and admit to each other that we had left ourselves with a few too many pages for day three. Day three! And that was just the Regression class! We’d sometimes lament that someone had shown up for a class but had skipped one or more of the three prerequisites. Can you imagine? Seven days of prerequisites to take a training class! It just wouldn’t work to require that many days now, but we worked hard, covered a lot of ground, and went through all the software output, step by step. Then we would make a change to the model, or respond to an audience question, and go through the entire output again, step by step. Go ahead and admit it: if you are like us, it probably sounds great. And it was.
My friend and coauthor Jesus Salcedo had a similar experience, and in those same classrooms. He also had an interest in psychometrics, except that he actually acted on his interest and earned his PhD. We met in the very busy New York City SPSS Training office when I was sent there as a contractor during his tenure. He was the full-time trainer in that office. We’d often chat about our favorite course guides (and least favorite) and became friends over an occasional shared meal in that Empire State Building office, or nearby in New York’s Koreatown neighborhood. So, the perspective that we both start with is that SPSS is a big topic, a worthy topic, and frankly, a sometimes intimidating topic. We still feel this way today. There is so much to learn that we struggle to keep up with everything new. At a consultancy where we worked for a time as a team, we put together a series of monthly seminars that proved to us again that there was always something new to learn. Each and every month, we discovered new features when we were preparing for a new topic. So tens of thousands of training hours later, we still learn something new all the time.
Of course, we aren’t asked to show what we can really do as often as we used to be, because training these days is rushed. We are often asked to cover two days’ worth of information in just one, or five days’ in just two, or ten days’ in just four. It happens all the time. We are pros, and we do as we are asked, but we know, we really know, that doing a proper job takes more time. The book market is flooded with beginner SPSS books. The more advanced books tend to be advanced in theory but not at all advanced in the practice of using SPSS: its efficient use, or the sophisticated use of its features. A major motivation in writing this book is the loss of organizational memory that has occurred as in-depth, specialized SPSS training courses have started to disappear over the last ten years.
So, with this book, we get to call the shots, and what we are trying to offer all of you is a chance to learn some intermediate to advanced topics thoroughly enough that you will be tempted to use them yourself, very possibly for the first time. We don’t try to cover every topic; in fact, we cover barely two dozen out of the hundred that we could have chosen. This is not at all encyclopedic. Nor is it a book-length treatment of a single subject. It gives you a taste of what attending one of our classes 15 years ago might have been like: a couple of hours’ worth on each of several interesting, powerful topics that you might not even know existed.
We think that this book fills an important niche. Books on the fundamentals of using SPSS Statistics are not in short supply. There are certainly dozens of them. Some are better than others. Naturally, we are proud of our own contribution: IBM SPSS Statistics For Dummies, 3rd Edition (Wiley, 2011). However, this book is certainly not a book about the fundamentals of setting up SPSS properly, or running routine statistics like T-tests or Chi-Square. Nor is this book a good choice for reviewing Statistics 101. Knowledge of topics like Ordinary Least Squares regression and ANOVA is assumed.
Since beginning the quest to contribute something we felt was new and needed for the SPSS Statistics community, Jesus Salcedo and I have consistently had the same audience in mind. We have imagined the intermediate-level practitioner, perhaps relatively new or perhaps even a long-time user of SPSS, who is stuck in a rut. In a sense, we imagine ourselves. If it weren’t for our training careers, which forced us to learn the new features as soon as they came out, we probably wouldn’t be familiar with all of the techniques in this book. We use the shortcuts because we are active in the corporate SPSS community, yet we meet veteran users all the time who don’t even know they exist. We have our own personal favorite techniques, tips, and tricks, but we know many users who know their theory very well yet haven’t discovered a key feature that could make their analysis more effective, even though it has been available for the last 10 versions. I mention this specifically because of a constant, even humorous, but telling exchange:
“Wow, that is amazing. I’m so glad that they added that feature. It must be brand new.”
“Actually, we’ve had that since version X. It’s been around for about 8 years.”
