SPSS Statistics for Data Analysis and Visualization - Keith McCormick - E-Book


Description

Dive deeper into SPSS Statistics for more efficient, accurate, and sophisticated data analysis and visualization.

SPSS Statistics for Data Analysis and Visualization goes beyond the basics of SPSS Statistics to show you advanced techniques that exploit the full capabilities of SPSS. The authors explain when and why to use each technique, and then walk you through the execution with a pragmatic, nuts and bolts example. Coverage includes extensive, in-depth discussion of advanced statistical techniques, data visualization, predictive analytics, and SPSS programming, including automation and integration with other languages like R and Python. You'll learn the best methods to power through an analysis, with more efficient, elegant, and accurate code.

IBM SPSS Statistics is complex: true mastery requires a deep understanding of statistical theory, the user interface, and programming. Most users don't encounter all of the methods SPSS offers, leaving many little-known modules undiscovered. This book walks you through tools you may have never noticed, and shows you how they can be used to streamline your workflow and enable you to produce more accurate results.

* Conduct a more efficient and accurate analysis
* Display complex relationships and create better visualizations
* Model complex interactions and master predictive analytics
* Integrate R and Python with SPSS Statistics for more efficient, more powerful code

These "hidden tools" can help you produce charts that simply wouldn't be possible any other way, and the support for other programming languages gives you better options for solving complex problems. If you're ready to take advantage of everything this powerful software package has to offer, SPSS Statistics for Data Analysis and Visualization is the expert-led training you need.


Page count: 607

Year of publication: 2017




SPSS® Statistics for Data Analysis and Visualization

Keith McCormick, Jesus Salcedo

with Jon Peck and Andrew Wheeler


Published by John Wiley & Sons, Inc., 10475 Crosspoint Boulevard, Indianapolis, IN 46256, www.wiley.com

Copyright © 2017 by John Wiley & Sons, Inc., Indianapolis, Indiana

Published simultaneously in Canada

ISBN: 978-1-119-00355-7
ISBN: 978-1-119-00557-5 (ebk)
ISBN: 978-1-119-00366-3 (ebk)

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought. Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or Web site is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or website may provide or recommendations it may make. Further, readers should be aware that Internet websites listed in this work may have changed or disappeared between when this work was written and when it is read.

For general information on our other products and services please contact our Customer Care Department within the United States at (877) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Control Number: 2017936609

Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. SPSS is a registered trademark of International Business Machines Corporation. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

We would like to dedicate this book to Jon Peck, who retired from more than 30 years with SPSS and IBM while this book was in its final stages. We wish him the best of retirements even though he probably won't be able to resist staying in the SPSS community in some form.

About the Authors

Keith McCormick is a data mining consultant, trainer, and speaker. A passionate user of SPSS for 25 years, he has trained thousands on how to effectively use SPSS Statistics and SPSS Modeler. He blogs at keithmccormick.com.

Jesus Salcedo is an independent statistical consultant. He is a former SPSS Curriculum Team Lead and Senior Education Specialist, who has written numerous SPSS training courses and trained thousands of users.

Jon Peck, recently retired from IBM and SPSS, was instrumental in developing and introducing the R and Python connections to the SPSS community. This expertise made him uniquely qualified to produce Chapter 18. He is the author of all the extension commands discussed in that chapter and has a patent pending on the algorithm in the SPSSINC TURF procedure discussed there. He can be reached at [email protected].

Andrew Wheeler is a professor of criminology at the University of Texas at Dallas and a former crime analyst. The application of geospatial techniques in his research created the opportunity for a powerful real world example in Chapter 8. He has used SPSS for over 10 years, and often blogs SPSS tutorials at andrewpwheeler.wordpress.com.

About the Technical Editors

Jon Peck, now retired from IBM, was a senior engineer, statistician, and product strategy person for SPSS and IBM for 32 years. He earned a Ph.D. in economics from Yale University, and taught econometrics and statistics there for 13 years before joining SPSS. He designed and contributed to many features of SPSS Statistics and has consulted with and trained many users. He remains active on social media and in consulting.

Terry Taerum has fifteen years’ experience as a statistician at the University of Alberta, fifteen years as a data analyst at SPSS Inc., and five years as a predictive analyst and consultant with IBM Inc.

Credits

Project Editor

Tom Dinse

Technical Editors

Jon Peck

Terry Taerum

Production Editor

Dassi Zeidel

Copy Editor

Kim Cofer

Production Manager

Katie Wisor

Manager of Content Development & Assembly

Mary Beth Wakefield

Marketing Manager

Christie Hilbrich

Professional Technology & Strategy Director

Barry Pruett

Business Manager

Amy Knies

Executive Editor

Jim Minatel

Project Coordinator, Cover

Brent Savage

Proofreader

Nancy Carrasco

Indexer

Johnna VanHoose Dinse

Cover Designer

Wiley

Cover Image

iStock.com/agsandrew

Acknowledgments

Keith and Jesus are especially proud to have worked with Bob Elliot before he retired. Our good friend Dean Abbott recommended Keith to Bob when Bob was seeking out a follow up to Dean’s excellent Applied Predictive Analytics, but specifically in SPSS Statistics. Without both of them, this book would not have been created.

Terry’s and Jon’s contribution extended well beyond technical reviewing. We consider both of them mentors and friends. Jon took over technical reviewing when Terry took on a new role with a return to IBM. Jon, in particular, was an interlocutor and trusted advisor, and we produced a better book as a result.

Tom, our project editor, had to be patient with us. Deadlines slipped, contributors became unavailable, and Bob retired before the book was complete. Whenever it seemed that something wasn’t quite as it should be, it was often Tom that ultimately made it right. He deserves credit for multiple roles, and we thank him.

We would also like to thank all of the many SPSSers that we turn to when we have a question even if they haven’t heard from us in a while. We love the sense of community that we have all managed to maintain even when so many have moved on to other roles. And we thank Jason for capturing that sense of community in his foreword.

CONTENTS

Foreword

Introduction

The Audience for This Book

How This Book Is Organized

How to Use This Book

The Themes of the Book

Understanding the SPSS Bundles and the SPSS Modules

The New SPSS Subscription Bundles

What’s New in SPSS 23 and 24?

Part I: Advanced Statistics

Chapter 1: Comparing and Contrasting IBM SPSS AMOS with Other Multivariate Techniques

T-Test

Factor Analysis and Unobserved Variables in SPSS

AMOS

Chapter 2: Monte Carlo Simulation and IBM SPSS Bootstrapping

Monte Carlo Simulation

Monte Carlo Simulation in IBM SPSS Statistics

Creating an SPSS Model File

IBM SPSS Bootstrapping

Chapter 3: Regression with Categorical Outcome Variables

Regression Approaches in SPSS

Logistic Regression

Ordinal Regression Theory

Ordinal Regression Dialogs

Ordinal Regression Output

Categorical Regression Theory

Categorical Regression Dialogs

Categorical Regression Output

Chapter 4: Building Hierarchical Linear Models

Overview of Hierarchical Linear Mixed Models

Mixed Models…Linear

Mixed Models…Linear (Output)

Mixed Models…Generalized Linear

Mixed Models…Generalized Linear (Output)

Adjusting Model Structure

Part II: Data Visualization

Chapter 5: Take Your Data Visualizations to the Next Level

Graphics Options in SPSS Statistics

Understanding the Revolutionary Approach in The Grammar of Graphics

Bar Chart Case Study

Bubble Chart Case Study

Chapter 6: The Code Behind SPSS Graphics: Graphics Production Language

Introducing GPL: Bubble Chart Case Study

GPL Help

Bubble Chart Case Study Part Two

Double Regression Line Case Study

Arrows Case Study

MBTI Bubble Chart Case Study

Chapter 7: Mapping in IBM SPSS Statistics

Creating Maps with the Graphboard Template Chooser

Chapter 8: Geospatial Analytics

Geospatial Association Rules

Case Study: Crime and 311 Calls

Spatio-Temporal Prediction

Case Study: Predicting Weekly Shootings

Chapter 9: Perceptual Mapping with Correspondence Analysis, GPL, and OMS

Starting with Crosstabs

Correspondence Analysis

Multiple Correspondence Analysis

Applying OMS and GPL to the MCA Perceptual Map

Chapter 10: Display Complex Relationships with Multidimensional Scaling

Metric and Nonmetric Multidimensional Scaling

Nonmetric Scaling of Psychology Sub-Disciplines

Multidimensional Scaling Dialog Options

Multidimensional Scaling Output Interpretation

Subjective Approach to Dimension Interpretation

Statistical Approach to Dimension Interpretation

Part III: Predictive Analytics

Chapter 11: SPSS Statistics versus SPSS Modeler: Can I Be a Data Miner Using SPSS Statistics?

What Is Data Mining?

What Is IBM SPSS Modeler?

Can Data Mining Be Done in SPSS Statistics?

Hypothesis Testing, Type I Error, and Hold-Out Validation

Significance of the Model and Importance of Each Independent Variable

The Importance of Finding and Modeling Interactions

Classic and Important Data Mining Tasks

Chapter 12: IBM SPSS Data Preparation

Identify Unusual Cases

Optimal Binning

Chapter 13: Model Complex Interactions with IBM SPSS Neural Networks

Why “Neural” Nets?

XOR Example Syntax

Neural Net Results with the XOR Variables

Comparing Regression to Neural Net with the Bank Salary Case Study

Chapter 14: Powerful and Intuitive: IBM SPSS Decision Trees

Building a Tree with the CHAID Algorithm

Review of the CHAID Algorithm

CRT for Classification

The Scoring Wizard

Chapter 15: Find Patterns and Make Predictions with K Nearest Neighbors

Using KNN to Find “Neighbors”

The Titanic Dataset and KNN Used as a Classifier

The Trade-Offs between Bias and Variance

Comparing Our Models: Decision Trees, Neural Nets, and KNN

Building an Ensemble

Part IV: Syntax, Data Management, and Programmability

Chapter 16: Write More Efficient and Elegant Code with SPSS Syntax Techniques

A Syntax Primer for the Uninitiated

The Case Study

Chapter 17: Automate Your Analyses with SPSS Syntax and the Output Management System

Overview of the Output Management System

Running OMS from Menus

Automatically Writing Selected Categories of Output to Different Formats

Suppressing Output

Working with OMS Data

Running OMS from Syntax

Chapter 18: Statistical Extension Commands

What Is an Extension Command?

TURF Analysis—Designing Product Bundles

Quantile Regression—Predicting Airline Delays

Comparing Ordinary Least Squares with Quantile Regression Results

Support Vector Machines—Predicting Loan Default

Computing Cohen’s d Measure of Effect Size for a T-Test

EULA

List of Tables

Chapter 1

Table 1.1

Table 1.2

Table 1.3

Table 1.4

Chapter 3

Table 3.1

Table 3.2

Table 3.3

Chapter 11

Table 11.1

Table 11.2

Table 11.3

Chapter 13

Table 13.1

Table 13.2

Table 13.3

Chapter 14

Table 14.1

Chapter 15

Table 15.1

Chapter 18

Table 18.1

Table 18.2

List of Illustrations

Introduction

Figure I-1:

Example of version 24 custom table

Figure I-2:

The Extension Hub

Chapter 1

Figure 1.1

T-test dialog

Figure 1.2

T-test results

Figure 1.3

Chart Builder dialog

Figure 1.4

Chart results with regression lines added

Figure 1.5

General Linear Model menu options

Figure 1.6

Univariate dialog

Figure 1.7

Univariate options subdialog

Figure 1.8

ANCOVA results

Figure 1.9

Multivariate dialog

Figure 1.10

MANOVA Multivariate Tests

Figure 1.11

Additional MANOVA results

Figure 1.12

T-test dialog

Figure 1.13

MANCOVA Multivariate Tests results

Figure 1.14

MANCOVA Tests of Between-Subjects Effects results

Figure 1.15

MANCOVA Parameter Estimates

Figure 1.16

MANCOVA dialog with four covariates

Figure 1.17

Pillai’s Trace results

Figure 1.18

MANCOVA Between-Subjects Effects

Figure 1.19

MANCOVA Parameter Estimates

Figure 1.20

Alternate MANCOVA Multivariate Tests

Figure 1.21

Alternate MANCOVA Parameter Estimates

Figure 1.22

Factor Analysis menu

Figure 1.23

Factor Analysis dialog

Figure 1.24

Extraction subdialog

Figure 1.25

Factor Analysis results

Figure 1.26

An AMOS model similar to our MANCOVA

Figure 1.27

The AMOS interface

Figure 1.28

An AMOS version of our factor analysis

Figure 1.29

Our “General Model”

Figure 1.30

Estimating a verbal effect using regression

Figure 1.31

Estimating a combined score effect using regression

Figure 1.32

General Model with S/N

Figure 1.33

The View menu

Figure 1.34

Outline pane of the AMOS Text Output

Figure 1.35

Top portion of Model Fit Summary

Figure 1.36

RMSEA results

Figure 1.37

Hoelter results

Figure 1.38

General Model with S/N and sex

Figure 1.39

General Model with S/N and rank

Figure 1.40

Dual causality

Figure 1.41

Our best model

Chapter 2

Figure 2.1

Simulation: Model Source dialog

Figure 2.2

Fans dataset

Figure 2.3

Completed Linear Regression dialog

Figure 2.4

Completed Linear Regression: Save dialog

Figure 2.5

Linear regression results

Figure 2.6

Simulated Fields panel

Figure 2.7

Fit Details dialog

Figure 2.8

Model tab

Figure 2.9

Correlations panel

Figure 2.10

Advanced Options panel

Figure 2.11

Density Functions panel

Figure 2.12

Output panel

Figure 2.13

Save panel

Figure 2.14

Model Type table

Figure 2.15

Input Distributions table

Figure 2.16

Correlations table

Figure 2.17

Stopping Criteria table

Figure 2.18

Simulation Summary table

Figure 2.19

Descriptive Statistics of Scale Targets table

Figure 2.20

Descriptive Statistics of Scale Inputs table

Figure 2.21

Correlations table

Figure 2.22

Probability Density chart

Figure 2.23

Chart Options dialog

Figure 2.24

Edited Probability Density chart

Figure 2.25

Tornado chart

Figure 2.26

Default frequencies report

Figure 2.27

The Frequencies menu with the Bootstrap submenu

Figure 2.28

The Bootstrap submenu

Figure 2.29

Frequency table with bootstrap results

Figure 2.30

Split File menu

Figure 2.31

Frequency table with bootstrap results and with a split applied

Figure 2.32

Descriptives table with bootstrap results

Figure 2.33

Regression coefficients with standard confidence interval

Figure 2.34

Regression coefficients with bootstrap confidence intervals

Chapter 3

Figure 3.1

Ordinal Regression dialog

Figure 3.2

Options dialog

Figure 3.3

Distribution of the satisfied variable

Figure 3.4

Output dialog

Figure 3.5

Location dialog

Figure 3.6

Scale dialog

Figure 3.7

Warning of cells with frequency of zero

Figure 3.8

Case Processing Summary table

Figure 3.9

Model Fitting Information table

Figure 3.10

Goodness-of-Fit table

Figure 3.11

Pseudo R-Square table

Figure 3.12

Parameter Estimates table

Figure 3.13

Test of Parallel Lines table

Figure 3.14

Crosstab between actual and predicted outcomes

Figure 3.15

Categorical Regression dialog

Figure 3.16

Define Scale dialog

Figure 3.17

Discretization dialog

Figure 3.18

Missing Values dialog

Figure 3.19

Options dialog

Figure 3.20

Regularization dialog

Figure 3.21

Output dialog

Figure 3.22

Save dialog

Figure 3.23

Plots dialog

Figure 3.24

Case Processing Summary table

Figure 3.25

Model Summary table

Figure 3.26

ANOVA table

Figure 3.27

Coefficients table

Figure 3.28

Correlations and Tolerance table

Figure 3.29

Quantifications table: recommend

Figure 3.30

Quantifications table: satisfied

Figure 3.31

Quantifications table: valuable

Figure 3.32

Quantifications table: when_purchased

Figure 3.33

Transformation plot: recommend

Figure 3.34

Transformation plot: satisfied

Figure 3.35

Transformation plot: valuable

Figure 3.36

Transformation plot: when_purchased

Chapter 4

Figure 4.1

Merchandise sales data

Figure 4.2

Analyze ➪ Mixed Models menu options

Figure 4.3

Specify Subjects and Repeated dialog

Figure 4.4

Linear Mixed Models dialog

Figure 4.5

Fixed Effects dialog

Figure 4.6

Random Effects dialog

Figure 4.7

Random Effects dialog

Figure 4.8

Estimation dialog

Figure 4.9

Statistics dialog

Figure 4.10

EM Means dialog

Figure 4.11

Save dialog

Figure 4.12

Model Dimension table

Figure 4.13

Information Criteria table

Figure 4.14

Type III Tests of Fixed Effects table

Figure 4.15

Estimates of Fixed Effects table

Figure 4.16

Estimates of Covariance Parameters table

Figure 4.17

Estimates of Covariance Parameters table for a null model

Figure 4.18

Data Structure dialog

Figure 4.19

Fields & Effects: Target dialog

Figure 4.20

Fields & Effects: Fixed Effects dialog

Figure 4.21

Fields & Effects: Random Effects dialog

Figure 4.22

Model Summary

Figure 4.23

Data Structure

Figure 4.24

Predicted by Observed

Figure 4.25

Fixed Effects (diagram)

Figure 4.26

Fixed Effects (table)

Figure 4.27

Fixed Coefficients (diagram)

Figure 4.28

Fixed Coefficients (table)

Figure 4.29

Covariance Parameters

Figure 4.30

No random effects

Figure 4.31

Model Summary

Figure 4.32

Covariance Parameters

Figure 4.33

Fixed Coefficients

Chapter 5

Figure 5.1

Graphs menu

Figure 5.2

Legacy Bar Charts menu

Figure 5.3

Chart Builder main menu

Figure 5.4

Basic Elements submenu

Figure 5.5

Graphboard Template Chooser main menu

Figure 5.6

Graphboard Template Chooser Basic tab

Figure 5.7

Graphboard Template Chooser fields specified

Figure 5.8

Detailed Tab

Figure 5.9

Bar chart

Figure 5.10

Graphboard Editor

Figure 5.11

Regions sorted

Figure 5.12

Region: Range as summary

Figure 5.13

Bubble Chart Detailed tab

Figure 5.14

Bubble Chart

Figure 5.15

Edited Bubble Chart

Chapter 6

Figure 6.1

Chart Builder Gallery tab

Figure 6.2

Preview of grouped scatterplot

Figure 6.3

Groups/Point ID tab

Figure 6.4

Element Properties

Figure 6.5

Bubble plot

Figure 6.6

Help and Reference options

Figure 6.7

Bubble plot with changes

Figure 6.8

Bubble plot with red and blue states

Figure 6.9

Bubble plot with bands

Figure 6.10

Bubble plot with polygon

Figure 6.11

PainTreat data file

Figure 6.12

Scatterplot between pain and physical therapy during the first time period

Figure 6.13

Double regression line

Figure 6.14

Preview of scatterplot with panel variable

Figure 6.15

Change in pain by drug treatment

Figure 6.16

MBTI bubble chart

Figure 6.17

Preview of bubble plot

Figure 6.18

Relationship between class rank and SAT scores and MBTI results

Chapter 7

Figure 7.1

Worldwide sales data

Figure 7.2

Bar chart of customer location

Figure 7.3

Map of customer locations

Figure 7.4

One variable selected

Figure 7.5

Select Maps dialog

Figure 7.6

Completed Detailed tab

Figure 7.7

Choropleth of Counts

Figure 7.8

Two categorical variables selected

Figure 7.9

Choropleth of Values

Figure 7.10

Pie of counts on a map

Figure 7.11

One categorical and one continuous variable selected

Figure 7.12

Choropleth of Sums

Figure 7.13

Two categorical and one continuous variable selected

Figure 7.14

Bars on a Map

Figure 7.15

Two continuous variables selected

Figure 7.16

Coordinates on a Reference map

Figure 7.17

Two continuous variables and one categorical variable selected

Figure 7.18

Coordinates on a Choropleth of Counts map

Figure 7.19

Four continuous variables selected

Figure 7.20

Arrows on a Reference map

Chapter 8

Figure 8.1

Opening the Geospatial Modeling Wizard

Figure 8.2

Adding map data

Figure 8.3

Assigning context and prediction data

Figure 8.4

Associating map data

Figure 8.5

Associating fields from data to the map

Figure 8.6

Assigned geospatial coordinates

Figure 8.7

Setting the coordinate system

Figure 8.8

Setting the prediction variables

Figure 8.9

Setting the condition variables

Figure 8.10

Setting output for geospatial association rules

Figure 8.11

Setting the rules

Figure 8.12

Defining bins

Figure 8.13

Rule 14: Theft F/Auto and Graffiti > 3

Figure 8.14

Rule 32: Assault w/Dangerous Weapon

Figure 8.15

Setting target and predictor fields for spatio-temporal modeling

Figure 8.16

Setting the time intervals

Figure 8.17

Setting the output

Figure 8.18

Scoring a separate file

Figure 8.19

Regression coefficient tables

Figure 8.20

Predictions of future shootings

Chapter 9

Figure 9.1

Culture and sport perceptual map

Figure 9.2

Crosstabs main menu

Figure 9.3

Crosstabs: Statistics submenu

Figure 9.4

Crosstabs results

Figure 9.5

Crosstabs Cell Display submenu

Figure 9.6

Crosstabs Style submenu

Figure 9.7

Crosstabs results (with highlighting style)

Figure 9.8

Dimension Reduction menu

Figure 9.9

Correspondence Analysis

Figure 9.10

Initial attempt of the perceptual map

Figure 9.11

Correspondence Analysis Dimension Summary

Figure 9.12

Improved perceptual map

Figure 9.13

OMS Control Panel

Figure 9.14

Variable View of OMS results

Figure 9.15

Modified Data View of OMS results

Figure 9.16

Correspondence Analysis dimension summary

Figure 9.17

Perceptual map with GPL modifications

Figure 9.18

Correspondence Analysis dimension summary

Figure 9.19

Correspondence Analysis dimension summary

Figure 9.20

Optimal Scaling submenu

Figure 9.21

Multiple Correspondence Analysis main menu

Figure 9.22

Variable Plots submenu

Figure 9.23

Very crowded Joint Category Plot

Figure 9.24

Discrimination Measures Plot

Figure 9.25

Discrimination Measures table

Figure 9.26

Sorting the mean discrimination measures

Figure 9.27

Draft map with few variables

Figure 9.28

MCA Output submenu

Figure 9.29

Coordinates (partial)

Figure 9.30

OMS Control Panel

Figure 9.31

Dataset produced by OMS

Figure 9.32

OMS Control Panel

Figure 9.33

MCA perceptual map using OMS and GPL

Figure 9.34

MCA version of Figure 9.12

Chapter 10

Figure 10.1

Object points plot

Figure 10.2

Dissimilarity matrix of psychology data

Figure 10.3

Ice cream preference data

Figure 10.4

Proximities matrix ice cream preference data

Figure 10.5

Analyze ➪ Scale menu options

Figure 10.6

Multidimensional Scaling: Data Format dialog

Figure 10.7

Multidimensional Scaling: (Proximities in Matrices Across Columns) dialog

Figure 10.8

Multidimensional Scaling: Model dialog

Figure 10.9

Multidimensional Scaling: Restrictions dialog

Figure 10.10

Multidimensional Scaling: Options dialog

Figure 10.11

Multidimensional Scaling: Plots dialog

Figure 10.12

Multidimensional Scaling: Output dialog

Figure 10.13

Scree plot of normalized raw stress

Figure 10.14

Stress and Fit Measures table displaying results for a one-dimensional solution

Figure 10.15

Stress and Fit Measures for three multidimensional scaling solutions

Figure 10.16

Stress decomposition table

Figure 10.17

Coordinates for a two-dimensional solution

Figure 10.18

Object points plot

Figure 10.19

Plot of the actual and transformed proximities (two-dimensional solution)

Figure 10.20

Residuals plot of distances

Figure 10.21

Dimension coordinates and aggregated mean rating on additional scales

Figure 10.22

Modified correlation procedure syntax

Figure 10.23

Correlations between dimensions and aggregated mean rating scales

Chapter 11

Figure 11.1

An SPSS Modeler “stream”

Figure 11.2

Stream with model

Figure 11.3

ANOVA results showing a significant difference

Figure 11.4

Post hoc results table showing a variety of test results

Figure 11.5

A Decision Tree

Figure 11.6

Two very different slopes

Figure 11.7

A closer look at the stream

Figure 11.8

A stream shown on the full canvas area

Figure 11.9

Partition node settings

Figure 11.10

Result set

Figure 11.11

Feature Selection node and model added to stream

Figure 11.12

0 screened fields

Figure 11.13

Distribution node results

Figure 11.14

Generated Balance node calculation

Figure 11.15

Stream with two models added

Figure 11.16

Analysis node results

Figure 11.17

Ensemble methods

Figure 11.18

Comparing models to ensemble results

Figure 11.19

Stream with scoring of test.csv added

Figure 11.20

Scoring results for 10 passengers

Chapter 12

Figure 12.1

Data menu

Figure 12.2

Identify Unusual Cases: Variables dialog

Figure 12.4

Identify Unusual Cases: Save dialog

Figure 12.5

Identify Unusual Cases: Missing Values dialog

Figure 12.6

Identify Unusual Cases: Options dialog

Figure 12.7

Case Processing Summary table

Figure 12.8

Anomaly Case Index List table

Figure 12.9

Anomaly Case Peer ID List table

Figure 12.10

Anomaly Case Reason List table

Figure 12.11

Scale Variable Norms table

Figure 12.12

Categorical Variable Norms table

Figure 12.13

Anomaly Index Summary table

Figure 12.14

Reason 1 table

Figure 12.15

Sorting data

Figure 12.16

New variables sorted

Figure 12.17

Transform menu

Figure 12.18

Optimal Binning: Variables dialog

Figure 12.19

Optimal Binning: Output dialog

Figure 12.20

Optimal Binning: Save dialog

Figure 12.21

Optimal Binning: Missing Values dialog

Figure 12.22

Optimal Binning: Options dialog

Figure 12.23

Descriptive Statistics table

Figure 12.24

Model Entropy table

Figure 12.25

Binning summary table

Figure 12.26

Logistic Regression dialog

Figure 12.27

Variables in the Equation table

Chapter 13

Figure 13.1

An illustration of a perceptron

Figure 13.2

A flat regression line

Figure 13.3

Two regression lines

Figure 13.4

Displaying interaction

Figure 13.5

An illustration of a multilayer perceptron

Figure 13.6

The neural net “topology”

Figure 13.7

Parameter Estimates for the neural net

Figure 13.8

Alternative weights from a second neural net

Figure 13.9

Topology with outcome declared as nominal

Figure 13.10

Parameter Estimates with outcome declared as nominal

Figure 13.11

Training submenu

Figure 13.12

Options submenu

Figure 13.13

The Selection Variable option in Linear Regression

Figure 13.14

Set Rule submenu

Figure 13.15

Complete Regression Output

Figure 13.16

Multilayer Perceptron main menu

Figure 13.17

Partitions submenu

Figure 13.18

Network topology diagram

Figure 13.19

Comparing performance

Figure 13.20

Regression results with interaction terms

Figure 13.21

Updated results with three models compared

Figure 13.22

Results for a more complex regression

Figure 13.23

Topology diagram for the more complex neural net

Figure 13.24

Comparing five models

Figure 13.25

Variables selected for the neural net

Figure 13.26

Neural net topology diagram

Figure 13.27

Classification accuracy results

Figure 13.28

Adding additional variables

Figure 13.29

The Save submenu

Figure 13.30

Model accuracy for the second attempt

Chapter 14

Figure 14.1

Decision tree main menu

Figure 14.2

Validation submenu

Figure 14.3

Training Sample tree

Figure 14.4

Test Sample tree

Figure 14.5

Overall accuracy results

Figure 14.6

Crosstab results for Sex variable

Figure 14.7

Crosstab results for Pclass variable

Figure 14.8

Crosstab showing all three variables

Figure 14.9

Decision tree criteria

Figure 14.10

Training tree after changing settings

Figure 14.11

Accuracy results for the larger tree

Figure 14.12

Decision tree main menu

Figure 14.13

Initial CRT tree

Figure 14.14

Accuracy results for CRT tree

Figure 14.15

Pruning criteria submenu

Figure 14.16

Second CRT tree

Figure 14.17

Second CRT tree accuracy results

Figure 14.18

Using a random assignment

Figure 14.19

Results using the random assignment

Figure 14.20

Scoring Wizard first menu

Figure 14.21

Scoring Wizard second menu

Figure 14.22

Scoring Wizard third menu

Figure 14.23

Scoring Wizard fourth menu

Figure 14.24

Predictive scores for some passengers in the Test dataset

Chapter 15

Figure 15.1

Nearest Neighbor Analysis main menu

Figure 15.2

Partitions submenu

Figure 15.3

Mr. Svensson as focal record

Figure 15.4

Mr. Svensson’s neighbors

Figure 15.5

Mr. Svensson’s Peers Chart

Figure 15.6

Model accuracy

Figure 15.7

Neighbors submenu

Figure 15.8

Optimal value for k

Figure 15.9

Results for k=6

Figure 15.10

Results for k=4

Figure 15.11

Output submenu

Figure 15.12

Output submenu

Figure 15.13

Output submenu

Figure 15.14

Comparing the models with Descriptives

Figure 15.15

Descriptives with Bootstrapping

Figure 15.16

All four models compared

Chapter 16

Figure 16.1

Frequencies main dialog

Figure 16.2

Resulting Syntax in the Syntax Editor

Figure 16.3

Syntax Help

Figure 16.4

Frequencies command in the Syntax editor

Figure 16.5

Help menu showing the Command Syntax Reference

Figure 16.6

Frequencies dialog and Frequencies commands

Figure 16.7

Charts subdialog

Figure 16.8

Main dialog and Cell Display subdialog

Figure 16.9

Three CROSSTABS examples

Figure 16.10

Data View (above) and Variable View (below)

Figure 16.11

Define Variable Properties dialog

Figure 16.12

Declaring the Sales_Amount variable

Figure 16.13

Adding a Value Label to Category_Code

Figure 16.14

Pasted code from the Define Variable Properties dialog

Figure 16.15

Value Labels with additional category codes

Figure 16.16

A few rows of address information in the customer data

Figure 16.17

Type and Label subdialog

Figure 16.18

The STRING command

Figure 16.19

Code examples using scratch variables

Figure 16.20

City names in mixed case

Figure 16.21

City names in descending case

Figure 16.22

A few rows of the transactional dataset

Figure 16.23

First screen of the Restructure Wizard

Figure 16.24

Second screen of the Restructure Wizard

Figure 16.25

The Utilities Menu

Figure 16.26

Count Values within Cases menu option

Figure 16.27

RECODE command in Syntax Help

Figure 16.28

First screen of Add Variables

Figure 16.29

Second screen of Add Variables

Chapter 17

Figure 17.1

Utilities menu options

Figure 17.2

Output Management System Control Panel dialog

Figure 17.3

Completed Output Management System Control Panel dialog

Figure 17.4

Outline pane options

Figure 17.5

OMS label added

Figure 17.6

OMS: Options dialog

Figure 17.7

Output destination added

Figure 17.8

New OMS request added

Figure 17.9

Second OMS request added

Figure 17.10

OMS Control Panel: Summary dialog

Figure 17.11

Crosstabs dialog

Figure 17.12

Output without Case Processing Summary table

Figure 17.13

Bivariate Correlations dialog

Figure 17.14

Traditional correlations output

Figure 17.15

Manipulated correlations syntax

Figure 17.16

Manipulated correlations output

Figure 17.17

Ending OMS requests

Figure 17.18

OMS Control Panel: Summary dialog

Figure 17.19

Correlations dataset

Figure 17.20

Select Cases: If dialog

Figure 17.21

Descriptives dialog

Figure 17.22

Average correlation

Figure 17.23

OMS syntax

Figure 17.24

OMS Identifiers dialog

Chapter 18

Figure 18.1

The menus show the installed extensions. Extension commands have a white “+” icon.

Figure 18.2

Available extensions are listed on the website.

Figure 18.3

Two preference sets

Figure 18.4

Computing reach manually

Figure 18.5

Maximum Group Size: 1. Reach and Frequency

Figure 18.6

The TURF dialog box

Figure 18.7

Maximum Group Size: 2. Reach and Frequency

Figure 18.8

Maximum Group Size: 4. Reach and Frequency

Figure 18.9

Effect of group size

Figure 18.10

Arrival delays by airport

Figure 18.11

Arrival Delays by Airport

Figure 18.12

Regression residuals histogram

Figure 18.13

OLS vs. QR coefficients for month

Figure 18.14

Residual correlations

Figure 18.15

Q-Q plot of OLS against QUANTREG residuals

Figure 18.16

QR coefficients by quantile

Figure 18.17

Discriminant classification

Figure 18.18

Logistic regression classification

Figure 18.19

SVM classification

Figure 18.20

SVM parameter tuning

Figure 18.21

Entering grid search parameters

Figure 18.22

SVM Parameter tuning with two parameters

Figure 18.23

SVM classification result with tuning

Figure 18.24

SVM classification results with weighting

Figure 18.25

The SVM dialog

Figure 18.26

The T-Test dialog

Figure 18.27

The T-TEST output

Figure 18.28

The Python plugin code for Cohen’s d

Figure 18.29

The Calculate with a Pivot Table dialog

Figure 18.30

The TABLE CALC syntax

Figure 18.31

The modified T-TEST output

Foreword

In my various roles at SPSS and IBM I met Keith and Jesus many years ago. They both have over 20 years of statistical consulting experience, and they both have been training people on statistics and how to use SPSS for many years. Each has in fact trained thousands of students. They are uniquely qualified to bring the message and content of this book to you, and they have done so with rigor and grace. SPSS has so many techniques and procedures to perform both simple and complex analysis, and Keith and Jesus will introduce you to this rich tapestry so that it pays dividends in benefiting your endeavors in driving societal change based on data and analytics for years to come. This book goes beyond the elementary treatments found in most of the other books on SPSS Statistics but is written for users who do not necessarily have an advanced statistical background. It can make the reader a better analyst by expanding their toolkit to include powerful techniques that he or she might not otherwise consider but that can have a big payoff in increased insight.

Keith and Jesus’ outstanding new book on SPSS Statistics has brought back so many thoughts about this great product and the influence it has had on so many people that I thought I would briefly reminisce.

I first became involved with this software when I went to work for SPSS in 1995 as Director of Quality Assurance. A year earlier, SPSS had released its first Microsoft Windows product—which, while solid, did not really take advantage of the amazing possibilities a true graphical interface could provide. This was a huge and important time for the company as the SPSS team was hard at work revolutionizing both the front-end user interface and the output to create a standard that is still in place and considered best of breed today. These innovations enabled sophisticated pivot table output as well as much more customized graphical output than had ever been attempted before. Indeed, in the years to come it was that spirit of always getting ahead of every technological trend that would keep this software right in the heart of what the data analysis community demanded.

When I say the heart of the data analysis community I am not in any way exaggerating. This software has been used by hundreds of thousands of students in college and graduate school and by similar numbers in government and commercial environments worldwide. Over the years I have literally had hundreds, if not thousands of people say to me “I used SPSS in college” when I introduced myself. And of course, I can’t leave out the bootleg copies I have seen in innumerable places during my travels and personally purchased on the streets of Santiago and Beijing.

Impressive? Absolutely. But of course the real question is … WHY is SPSS so heavily used and so well loved? WHY has its community of users stayed vibrant and loyal even eight years after the company itself was acquired by IBM?

The answer is the combination of power and simplicity combined with elegance. This is a big statement. To back this up, and apropos of the subject matter, I’ll contribute a data point as my best evidence. A few years ago, when I was still with IBM (which acquired SPSS in 2009), we hired a summer intern who had used our software for a semester in college. After about a month on the job, we debriefed her on the progress of her user interface design assignment. She discussed at length the challenges she was having coming up with a design that was up to the standard of the rest of the product in terms of simplicity, backed by immense power. This led to a discussion of the first time she used the product as a student. Of course, opening a “statistics” product for the first time filled this iPhone-using millennial with much trepidation; however, as she described to us, within just a few minutes she was loading and manipulating data, building predictive models, and producing output for her class. In just a short time beyond that she was digging into the depths of some of the power the product provided. Even a user nearly born and bred with the beautiful user designs of the smartphone consumer era was right at home using SPSS. What an amazing statement in and of itself. Think about it! This is made even more extraordinary because this same student had interactions with professors and researchers on her campus who were using (in fact, relying on) that very same product to do their cutting-edge work. As I said, the answer is the combination of power and simplicity combined with elegance.

This amazing simplicity does not come at the expense of power. As Keith and Jesus make clear in this book, SPSS Statistics is an incredibly powerful tool for data analysis and visualization. Even today there is no tool that works with its users of any level (novice, intermediate, or expert) to uncover meanings and relationships in data as powerfully as SPSS does. Further, once the data has been prepared, the models built, and the analysis done, there is no software available that is better at explaining the results to non-data analysts who have to act on it. This increases the value of the tool immeasurably—since it creates the understanding and confidence to deploy its insights into the real world to create real value. Having seen this done so many times, by so many people, in so many domains, I can say to those starting with this product for the first time that I truly envy you—you are about to start on a journey of learning and getting results that will amaze you—and the people you work with.

Let’s put this all in perspective. This product is now in its sixth decade of existence. That’s right: it first came out in the late 1960s. How many products can you name that have survived and prospered for that long? Not many. The Leica M camera and the Porsche 911 car with their classic timeless designs come to mind, but not much else. How many COMPUTER products? Even fewer; perhaps only the venerable IBM mainframe, in fact. But here we have IBM SPSS Statistics, not only surviving but still as relevant and vital as ever, right in the midst of the new age of big data and machine learning, heavily used by experts who dig deep into data and model building, but usable by novices in the iPhone era as well.

Now, let us switch our focus from celebrating the vibrancy and staying power of the SPSS journey and into the heart of what Keith and Jesus have addressed in this book. This is first and foremost a book for data analysis practitioners at intermediate and advanced levels. The question this begs is how this product can help that audience create the most value in the modern era.

Unlike the world of the late 1960s when SPSS was created, we now live in an age where there are many tools to do quick and fast analysis of datasets. For example, Tableau is a fine tool for more business-oriented users with less data analysis training to get immediate and useful visual insights from their data. So what then is the need for IBM SPSS Statistics in this new world?

To answer that question, let me take you back several years to a conference called “MinneAnalytics,” sponsored by a Minnesota-based organization of analytic professionals, where I delivered a presentation on Advanced Analytics called “What’s Your World View?” In that presentation, I envisioned a rapidly approaching new age where “big data” would meet advanced analytic techniques running in real time and that combination would drive every decision-making aspect of how our society would work. I compared the importance of this movement to previous huge steps that changed the very foundation of society, including the invention of the automobile and the invention of assembly-line production for manufacturing many different types of goods.

Well, a mere three years later that “future” society is here already—right now. It is happening all around us. Analytics on big data is driving decision making and processes everywhere you look. Hospitals apply real-time analytics to data feeds from patient-monitoring instruments in intensive care units to message doctors automatically that their patient in the ICU will shortly take a turn for the worse. Firms managing trucking use analytics to intervene proactively when the system tells them one of their drivers is predicted to have an accident. Airplanes and cars apply real-time analytics to engine sensors to predict failure and inform the pilots and drivers to take action before such failure occurs. Indeed, big data analytics has become one of the most disruptive forces in business history and is unleashing new value creation quite literally wherever you look. All of these examples clearly show a fundamental point—quick visual understanding is one thing—but deep insight yielding confidence in a predictive model that is deployed in real time at critical decision points at vast scale is quite another. It is in this realm of confirmation and confidence that SPSS Statistics shines like no other.

Mass deployment of advanced analytics will create benefits for society that are for all intents and purposes unimaginable, assuming, of course, that the deployed analytics are in fact correct (with the right tweaking and trade-offs between accuracy and stability) and deployed properly. It is the almost unique benefit of SPSS that no matter how those analytics are built (in SPSS, R, or Python; supervised or unsupervised; standard or machine learning; executed programmatically or through visual interfaces; or any other variant you can think of), the product can be used to build confidence that the desired results will be achieved and to understand the risks involved. It can also be used to explain the results to others in the enterprise, aligning those who need to be in the know on exactly and precisely how analytics drive their new business models. There is no better “hub” for data scientists to practice their craft and contribute their value to the creation of a new world: a new world of staggering rates of change guided or driven by data and analytics.

IBM SPSS Statistics is the perfect tool for this new world when used by well-trained analysts who can put all the data and all the insights together without mistakes to create the most value. People who can take the output of machine learning, add traditional data and then other new forms of data (like sensors and social media for example), to get insights well beyond those quick insights from Tableau and other surface-level tools. People who know how to use the advanced capabilities of the tool, such as the ability to do mixed model analysis of data at different levels (for example, within a hierarchy to find even deeper insights). Such a tool, in the hands of such people—well-trained data scientists—can drive us into this new remarkable world with both confidence and safety. To become one of those who drive this societal transformation using SPSS you can benefit from having this book as your guide.

Enjoy the book…and enjoy the next 50 years of IBM SPSS Statistics as well!

 —  Jason Verlen

Jason Verlen is currently Senior Vice President of Product Management and Marketing at CCC Information Services, based in Chicago. Before moving to CCC he spent 20 years at SPSS and then IBM (after its acquisition of SPSS) in various roles ending with being named Vice President of Big Data Analytics at IBM.

Introduction

This book is a collaboration between me (Keith) and several other career-long “SPSSers,” and the editorial decisions about what to cover, and how to cover it, are greatly affected by that fact. My own career took a turn down a road that led to a life of learning, teaching, and consulting about SPSS almost 20 years ago. I was contemplating a PhD in Psychometrics at the University of North Carolina, Chapel Hill. My plans didn’t get much further than auditing some prerequisites and establishing residency. So, on paper, I hadn’t made much progress, but moving 1000 miles (from Massachusetts) to relocate and purchasing a house represented a milestone in my life and career. I’m still in that same house (more than 22 years now), and I’m still using SPSS almost daily. Like many things in life, it seems almost accidental. I was doing contract statistics work using SPSS, working from home while I planned for a life in graduate school, and I drove up to Arlington, VA to take advantage of what SPSS training then called the training “subscription.”

The concept was to take as many classes as you can manage in a year. It was remarkably cost effective. I was able to convince my primary contract client to pay for the subscription under the condition that I covered all other expenses, and didn’t let it affect my deadlines. I already had several years of daily SPSS use under my belt, so I was hardly a rookie, but it was too good to pass up. I found a summer sublet in Washington, DC, took advantage of the training classes almost daily for a couple of months, learned all the latest features, learned about modules that I had never tried, made some good new friends, and worked late into the evening trying to keep my contract research work on schedule. Then suddenly I was asked if I wanted to relocate and take on teaching the basic classes in that same office. I declined the full-time position (the grad school idea was still alive), but I did start making occasional trips. Within a year they were frequent trips, and it became effectively full time, including training trips all over the United States and Canada.

A bit of nostalgia, perhaps, but there is a good reason to reflect on that time period in SPSS Inc.’s history. As Jason Verlen notes in his foreword to this book, the mid to late ’90s was a pivotal time in the development of SPSS. With Windows 95 came a whole new world, and SPSS Inc. leaped into the fray. Also, in the late ’90s, SPSS Inc. bought ISL, and with it, Clementine. The revolutionary software package then became SPSS Clementine, and is now called IBM SPSS Modeler. While this book is dedicated to SPSS Statistics and not SPSS Modeler, my career certainly was never quite the same since. Although that was the acquisition that most influenced my career, it was certainly not the only one. There were numerous acquisitions during that period, growing the SPSS family to include products like AMOS, SPSS Data Collection, and Showcase.

It was also a bit of a golden age in SPSS training. Almost 20 of us offered SPSS training frequently. On any given day, there were at least a couple of SPSS training events being held in one of several cities that had permanent full-time SPSS training facilities. Traveling to public training was common then; online training hadn’t yet arrived. It simply was how training was done. In this very active, live, corporate-managed, instructor-led training economy, more than 30 distinct classes were offered, representing 50–60+ days of training content. It took me three years before I found myself teaching 80% of them, and even longer before I taught all of them. Classroom training was seen as a key way to support the user community, so even classes that were infrequent, and therefore not very profitable, were still scheduled to support the product. Everything changes over time, and certainly traveling cross-country to a corporate training center for 5 continuous days of training, with a stack of huge books, along with 16 strangers from other companies, seems quaint now.

For all of us who experienced it as trainers and participants, however, we are forever changed. One of the things that always struck me, and that still knocks me off my feet, was that the 32 books we used were not enough! SPSS had so many great new features coming out with each new version that it was hard to keep up, even though we were in the classroom three-quarters of the time. The Arlington office frequently had another trainer teaching in a room next door, so we would have lunch together, and admit to each other that we had left ourselves with a few too many pages for day three. Day three! And that was just the Regression class! We’d sometimes lament that someone had shown up for a class, but had skipped one or more of the three prerequisites. Can you imagine? Seven days of prerequisites to take a training class! It just wouldn’t work to require that many days now, but we worked hard, and covered a lot of ground, and we went through all the software output, step by step. Then we would make a change to the model, or respond to an audience question, and go through the entire output again, step by step. Go ahead and admit it—if you are like us it probably sounds great. And it was.

My friend and coauthor Jesus Salcedo had a similar experience, and in those same classrooms. He also had an interest in psychometrics, except that he actually acted on his interest and earned his PhD. We met in the very busy New York City SPSS Training office when I was sent there as a contractor during his tenure. He was the full-time trainer in that office. We’d often chat about our favorite course guides (and least favorite) and became friends over an occasional shared meal in that Empire State Building office, or nearby in New York’s Koreatown neighborhood. So, the perspective that we both start with is that SPSS is a big topic, a worthy topic, and frankly, a sometimes intimidating topic. We still feel this way today. There is so much to learn that we struggle to keep up with everything new. At a consultancy where we worked for a time as a team, we put together a series of monthly seminars that proved to us again that there was always something new to learn. Each and every month, we discovered new features when we were preparing for a new topic. So tens of thousands of training hours later, we still learn something new all the time.

Of course, we aren’t asked to really show what we can do as often as we used to be. The reason, of course, is that training these days is rushed. We are often asked to cover two days’ worth of information in just one, or five days’ in just two, or ten days’ in just four. It happens all the time. We are pros, and we do as we are asked, but we know, we really know, that to do a proper job it takes more time. The book market is flooded with rookie SPSS books. The more advanced books tend to be more advanced in the theory, but not at all advanced in the practice of using SPSS, its efficient use, or the sophisticated use of its features. A major motivation in writing this book is the loss of organizational memory that has occurred since in-depth specialized SPSS training courses have started to disappear over the last ten years.

So, with this book, we get to call the shots, and what we are trying to offer all of you is a chance to learn some intermediate to advanced topics thoroughly enough that you will be tempted to use them yourself, very possibly for the first time. We don’t try to cover every topic—barely two dozen out of a hundred that we could have chosen, in fact. This is not at all encyclopedic. It certainly is also not a book-length treatment on a single subject. It gives you a taste of what attending one of our classes 15 years ago might have been like—a couple hours’ worth on each of several interesting, powerful topics that you might not even know existed.

The Audience for This Book

We think that this book fills an important niche. Books on the fundamentals of using SPSS Statistics are not in short supply. There are certainly dozens of them. Some are better than others. Naturally, we are proud of our own contribution: IBM SPSS Statistics For Dummies, 3rd Edition (Wiley, 2011). However, this book is certainly not a book about the fundamentals of setting up SPSS properly, or running routine statistics like T-tests or Chi-Square. Nor is this book a good choice for reviewing Statistics 101. Knowledge of topics like Ordinary Least Squares regression and ANOVA is assumed.

Since beginning the quest to contribute something we felt was new and needed for the SPSS Statistics community, Jesus Salcedo and I have consistently thought of the same audience. We have imagined the intermediate-level practitioner, perhaps relatively new or perhaps even a long-time user of SPSS, who is stuck in a rut. We imagine ourselves in a sense. If it weren’t for our training careers, forcing us to learn the new features as soon as they come out, we probably wouldn’t be familiar with all of the techniques in this book. We use the shortcuts because we are active in the corporate community of SPSS, yet we meet veteran users all the time who don’t even know they exist. We have our own personal favorite techniques, tips, and tricks, but we know many users who know their theory very well, yet haven’t discovered a key feature that could make their analysis more effective, even though it’s been in the last 10 versions. I mention this specifically because it is a constant, even humorous, but telling exchange:

“Wow, that is amazing. I’m so glad that they added that feature. It must be brand new.”

“Actually, we’ve had that since version X. It’s been around for about 8 years.”