Data Analysis with IBM SPSS Statistics - Kenneth Stehlik-Barry - E-Book

Description

Master data management & analysis techniques with IBM SPSS Statistics 24

About This Book

  • Leverage the power of IBM SPSS Statistics to perform efficient statistical analysis of your data
  • Choose the right statistical technique to analyze different types of data and build efficient models from your data with ease
  • Overcome any hurdle that you might come across while learning the different SPSS Statistics concepts with clear instructions, tips and tricks

Who This Book Is For

This book is designed for analysts and researchers who need to work with data to discover meaningful patterns but do not have the time (or inclination) to become programmers. We assume a foundational understanding of statistics such as one would learn in a basic course or two on statistical techniques and methods.

What You Will Learn

  • Install and set up SPSS to create a working environment for analytics
  • Techniques for exploring data visually and statistically, assessing data quality and addressing issues related to missing data
  • How to import different kinds of data and work with it
  • Organize data for analytical purposes (create new data elements, sampling, weighting, subsetting, and restructure your data)
  • Discover basic relationships among data elements (bivariate data patterns, differences in means, correlations)
  • Explore multivariate relationships
  • Leverage the offerings to draw accurate insights from your research, and benefit your decision-making

In Detail

SPSS Statistics is a software package used for interactive and batch statistical analysis. Analytical tools such as SPSS can readily provide even a novice user with an overwhelming amount of information and a broad range of options for analyzing patterns in the data.

The journey starts with installing and configuring SPSS Statistics for first use and exploring the data to understand its potential (as well as its limitations). Use the right statistical analysis technique, such as regression or classification, and analyze your data in the best possible manner. Work with graphs and charts to visualize your findings. With this information in hand, the discovery of patterns within the data can be undertaken. Finally, the high-level objective of developing predictive models that can be applied to other situations will be addressed.

By the end of this book, you will have a firm understanding of the various statistical analysis techniques offered by SPSS Statistics, and be able to master its use for data analysis with ease.

Style and approach

Provides a practical orientation to understanding a set of data and examining the key relationships among the data elements. Shows useful visualizations to enhance understanding and interpretation. Outlines a roadmap that focuses the process so decisions regarding how to proceed can be made easily.

Number of pages: 356

Year of publication: 2017




Data Analysis with IBM SPSS Statistics

Implementing Data Modeling, Descriptive Statistics and ANOVA

Kenneth Stehlik-Barry

Anthony J. Babinec

BIRMINGHAM - MUMBAI

Data Analysis with IBM SPSS Statistics

Copyright © 2017 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

 

First published: September 2017

 

Production reference: 1190917

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.

ISBN 978-1-78728-381-7

 

www.packtpub.com

Credits

Authors: Kenneth Stehlik-Barry, Anthony J. Babinec
Copy Editor: Manisha Sinha
Reviewers: James Mott, James Sugrue
Project Coordinator: Manthan Patel
Commissioning Editor: Amey Varangaonkar
Proofreader: Safis Editing
Acquisition Editor: Tushar Gupta
Indexer: Tejal Daruwale Soni
Content Development Editor: Tejas Limkar
Graphics: Tania Dutta
Technical Editor: Dharmendra Yadav
Production Coordinator: Deepika Naik

About the Authors

Kenneth Stehlik-Barry, PhD, joined SPSS as Manager of Training in 1980 after using SPSS for his own research for several years. Working with others at SPSS, including Anthony Babinec, he developed a series of courses related to the use of SPSS and taught these courses to numerous SPSS users. He also managed the technical support and statistics groups at SPSS. Along with Norman Nie, the founder of SPSS, and Jane Junn, a political scientist, he co-authored Education and Democratic Citizenship. Dr. Stehlik-Barry has used SPSS extensively to analyze data from SPSS and IBM customers to discover valuable patterns that can be used to address pertinent business issues. He received his PhD in Political Science from Northwestern University and currently teaches in the Master of Science in Predictive Analytics program there.

 

Anthony J. Babinec joined SPSS as a statistician in 1978 after assisting Norman Nie, SPSS founder, in a research methods class at the University of Chicago. Anthony developed SPSS courses and trained many SPSS users. He also wrote many examples found in SPSS documentation and worked in technical support. Anthony led a business development effort to find products implementing then-emerging new technologies such as CHAID decision trees and neural networks and helped SPSS customers successfully apply them. Anthony uses SPSS in consulting engagements and teaches IBM customers how to use its advanced features. He received his BA and MA in sociology with a specialization in advanced statistics from the University of Chicago and teaches classes at the Institute for Statistics Education. He is on the Board of Directors of the Chicago Chapter of the American Statistical Association, where he has served in different positions including President.

Acknowledgement

A book such as this is always a collaboration that extends beyond the authors. We owe a debt of gratitude to many, and we would like to begin by thanking our family members, Janis, Cassiopeia, Leila, and Thea Stehlik-Barry, Tony's wife Terri M. Long, and their children Gina and Anthony. Authoring a book inevitably takes time away from family, and the patience of our spouses and children is much appreciated. We would also like to thank our late parents, Leo and Patricia Barry and Anthony and Dorothy Babinec. They fostered our love of learning and supported our scholastic pursuits during our youth.

We would also like to acknowledge the late Norman Nie, a founder of SPSS and highly regarded social scientist. Norman was an empirical researcher and SPSS was his tool as well as his creation. His use of SPSS for his own analysis led to many valuable additions to the software. Ken coauthored Education and Democratic Citizenship with Norman and Jane Junn. Tony was a teaching assistant and research assistant with Norman at the University of Chicago. Norman was a colleague, mentor, and a valued friend and is greatly missed.

The team at Packt was enormously helpful in bringing this book to fruition. Tejas Limkar, our most frequent contact person, brought enthusiasm and encouragement to the project and kept things on track. Tushar Gupta was instrumental in launching the book initially, and Dharmendra Yadav drove the final push to get it completed. We also thank those at Packt who worked behind the scenes to deal with the graphics, editing, proofing, and production tasks.

Finally, we would like to thank Colin Shearer, our IBM/SPSS colleague who put us in touch with Tushar at Packt initially, and our two reviewers, James Mott and James Sugrue. They are long-term colleagues of the authors and have a very deep knowledge of SPSS Statistics. Their feedback helped to make this a better book. We also thank our many colleagues at SPSS Inc., who collectively over the years built SPSS Statistics into the great product it has become.

 

Kenneth Stehlik-Barry

Anthony J. Babinec

About the Reviewers

James Mott, Ph.D, is a senior education consultant with extensive experience in teaching statistical analysis, modeling, Data Mining and Predictive Analytics. He has over 30 years of experience using SPSS products in his own research including IBM SPSS Statistics, IBM SPSS Modeler, and IBM SPSS Amos. He has also been actively teaching these products to IBM/SPSS customers for over 30 years. In addition, he is an experienced historian with expertise in the research and teaching of 20th Century United States Political history and Quantitative Methods.

Specialties: Data Mining, Quantitative Methods, Statistical Analysis, Teaching and Consulting.

 

 

James Sugrue has been selling and supporting SPSS Statistics since 1982. He is currently the president of Channel Group Inc. Channel Group Inc. began in 1996 as the holding company for the SPSS Inc. operations in Argentina, Chile, Paraguay, Uruguay, Bolivia, and Mexico. In 1998, they acquired the Quantime Inc. (Quantum, Quanvert, and so on) operations in Latin America. They later became the regional overlay team for the SPSS Market Research product line (Dimensions, Data Collection) for all of Latin America and the Caribbean.


Table of Contents

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

Installing and Configuring SPSS

The SPSS installation utility

Installing Python for the scripting

Licensing SPSS

Confirming the options available

Launching and using SPSS

Setting parameters within the SPSS software

Executing a basic SPSS session

Summary

Accessing and Organizing Data

Accessing and organizing data overview

Reading Excel files

Reading delimited text data files

Saving IBM SPSS Statistics files

Reading IBM SPSS Statistics files

Demo - first look at the data - frequencies

Variable properties

Variable properties - name

Variable properties - type

Variable properties - width

Variable properties - decimals

Variable properties - label

Variable properties - values

Variable properties - missing

Variable properties - columns

Variable properties - align

Variable properties - measure

Variable properties - role

Demo - adding variable properties to the Variable View

Demo - adding variable properties via syntax

Demo - defining variable properties

Summary

Statistics for Individual Data Elements

Getting the sample data

Descriptive statistics for numeric fields

Controlling the descriptives display order

Frequency distributions

Discovering coding issues using frequencies

Using frequencies to verify missing data patterns

Explore procedure

Stem and leaf plot

Boxplot

Using explore to check subgroup patterns

Summary

Dealing with Missing Data and Outliers

Outliers

Frequencies for histogram and percentile values

Descriptives for standardized scores

The Examine procedure for extreme values and boxplot

Detecting multivariate outliers

Missing data

Missing values in Frequencies

Missing values in Descriptives

Missing value patterns

Replacing missing values

Summary

Visually Exploring the Data

Graphs available in SPSS procedures

Obtaining bar charts with frequencies

Obtaining a histogram with frequencies

Creating graphs using chart builder

Building a scatterplot

Create a boxplot using chart builder

Summary

Sampling, Subsetting, and Weighting

Select cases dialog box

Select cases - If condition is satisfied

Example

If condition is satisfied combined with Filter

If condition is satisfied combined with Copy

If condition is satisfied combined with Delete unselected cases

The Temporary command

Select cases based on time or case range

Using the filter variable

Selecting a random sample of cases

Split File

Weighting

Summary

Creating New Data Elements

Transforming fields in SPSS

The RECODE command

Creating a dummy variable using RECODE

Using RECODE to rescale a field

Respondent's income using the midpoint of a selected category

The COMPUTE command

The IF command

The DO IF/ELSE IF command

General points regarding SPSS transformation commands

Summary

Adding and Matching Files

SPSS Statistics commands to merge files

Example of one-to-many merge - Northwind database

Customer table

Orders table

The Customer-Orders relationship

SPSS code for a one-to-many merge

Alternate SPSS code

One-to-one merge - two data subsets from GSS2016

Example of combining cases using ADD FILES

Summary

Aggregating and Restructuring Data

Using aggregation to add fields to a file

Using aggregated variables to create new fields

Aggregating up one level

Preparing the data for aggregation

Second level aggregation

Preparing aggregated data for further use

Matching the aggregated file back to find specific records

Restructuring rows to columns

Patient test data example

Performing calculations following data restructuring

Summary

Crosstabulation Patterns for Categorical Data

Percentages in crosstabs

Testing differences in column proportions

Crosstab pivot table editing

Adding a layer variable

Adding a second layer

Using a Chi-square test with crosstabs

Expected counts

Context sensitive help

Ordinal measures of association

Interval with nominal association measure

Nominal measures of association

Summary

Comparing Means and ANOVA

SPSS procedures for comparing Means

The Means procedure

Adding a second variable 

Test of linearity example

Testing the strength of the nonlinear relationship

Single sample t-test

The independent samples t-test

Homogeneity of variance test

Comparing subsets

Paired t-test

Paired t-test split by gender

One-way analysis of variance

Brown-Forsythe and Welch statistics

Planned comparisons

Post hoc comparisons

The ANOVA procedure

Summary

Correlations

Pearson correlations

Testing for significance

Mean differences versus correlations

Listwise versus pairwise missing values

Comparing pairwise and listwise correlation matrices

Pivoting table editing to enhance correlation matrices

Creating a very trimmed matrix

Visualizing correlations with scatterplots

Rank order correlations

Partial correlations

Adding a second control variable

Summary

Linear Regression

Assumptions of the classical linear regression model

Example - motor trend car data

Exploring associations between the target and predictors

Fitting and interpreting a simple regression model

Residual analysis for the simple regression model

Saving and interpreting casewise diagnostics

Multiple regression - Model-building strategies

Summary

Principal Components and Factor Analysis

Choosing between principal components analysis and factor analysis

PCA example - violent crimes

Simple descriptive analysis

SPSS code - principal components analysis

Assessing factorability of the data

Principal components analysis of the crime variables

Principal component analysis – two-component solution

Factor analysis - abilities

The reduced correlation matrix and its eigenvalues

Factor analysis code

Factor analysis results

Summary

Clustering

Overview of cluster analysis

Overview of SPSS Statistics cluster analysis procedures

Hierarchical cluster analysis example

Descriptive analysis

Cluster analysis - first attempt

Cluster analysis with four clusters

K-means cluster analysis example

Descriptive analysis

K-means cluster analysis of the Old Faithful data

Further cluster profiling

Other analyses to try

Twostep cluster analysis example

Summary

Discriminant Analysis

Descriptive discriminant analysis

Predictive discriminant analysis

Assumptions underlying discriminant analysis

Example data

Statistical and graphical summary of the data

Discriminant analysis setup - key decisions

Priors

Pooled or separate

Dimensionality

Syntax for the wine example

Examining the results

Scoring new observations

Summary

Preface

SPSS Statistics is a software package used for interactive and batch statistical analysis. Analytical tools such as SPSS can readily provide even a novice user with an overwhelming amount of information and a broad range of options to analyze patterns in the data. This book provides comprehensive coverage of IBM's premier statistics and data analysis tool, IBM SPSS Statistics. It is designed for business professionals who wish to analyze their data. By the end of this book, you will have a firm understanding of the various statistical analysis techniques offered by SPSS Statistics, and be able to master its use for data analysis with ease.

What this book covers

Chapter 1, Installing and Configuring SPSS, covers the initial installation of SPSS and the configuration of the system for use on the user’s machine.

Chapter 2, Accessing and Organizing Data, covers the process of opening various types of data files (Excel, CSV, and SPSS) in SPSS and performing some simple tasks, such as labeling data elements. It demonstrates how to save new versions of the data that incorporate the changes so that they are available for subsequent use.

Chapter 3, Statistics for Individual Data Elements, is about the tools in SPSS that are available for obtaining descriptive statistics for each field in a data file.

Chapter 4, Dealing with Missing Data and Outliers, focuses on assessing data quality with respect to missing information and extreme values. It also deals with the techniques that can be used to address these problems.

Chapter 5, Visually Exploring the Data, discusses topics such as histograms, bar charts, box and whisker plots, and scatter plots.

Chapter 6, Sampling, Subsetting, and Weighting, describes the options available in SPSS for taking samples from a dataset, creating subgroups within the data, and assigning weights to individual rows.

Chapter 7, Creating New Data Elements, discusses when it is useful to define new data elements to support analysis objectives and the process involved in building these elements in SPSS.

Chapter 8, Adding and Matching Files, describes the process of combining multiple data files to create a single file for use in an analysis. Both appending multiple files and merging files to add information are addressed.

Chapter 9, Aggregating and Restructuring Data, covers two topics: changing the unit of analysis via aggregation, and restructuring the data from wide to long or long to wide to facilitate analysis.

Chapter 10, Crosstabulation Patterns for Categorical Data, covers descriptive and inferential analysis of categorical data in two-way and multi-way contingency tables.

Chapter 11, Comparing Means and ANOVA, is about descriptive and inferential analysis involving the mean of a variable across groups.

Chapter 12, Correlations, discusses descriptive and inferential analysis of associations involving numeric variables via the use of the Pearson correlation coefficient and some analogs.

Chapter 13, Linear Regression, covers using linear regression to develop predictions of numeric target variables.

Chapter 14, Principal Components and Factor Analysis, is about the use of principal components analysis and factor analysis to understand patterns among the variables.

Chapter 15, Clustering, covers methods to find groups in the data through analyzing the data rows.

Chapter 16, Discriminant Analysis, discusses using discriminant analysis to develop classifications involving categorical target variables.

What you need for this book

You will need: IBM SPSS Statistics 24 (or higher).

Here are the download links to the software:

Trial: https://www.ibm.com/analytics/us/en/technology/spss/spss-trials.html

Info on subscription: https://www-01.ibm.com/software/analytics/subscriptionandsupport/spss.html

Info on hardware specs: https://www.ibm.com/software/reports/compatibility/clarity-reports/report/html/osForProduct

You will also need Windows 10 or another recent version of Windows.

IBM SPSS Statistics is available via trial download. However, the trial period is on the order of 14 days, which is probably too short.

IBM SPSS Statistics is available via an annual single-user license and various other licenses and, relatively recently, via a subscription.

Price lists and terms probably vary by country.

IBM SPSS Statistics is packaged as Base plus optional modules. We made an effort to use only elements of SPSS Base.

Detailed installation steps can be found in the IBM SPSS Statistics installation documentation at http://www-01.ibm.com/support/docview.wss?uid=swg24041224.

Who this book is for

This book is designed for analysts and researchers who need to work with data to discover meaningful patterns but do not have the time (or inclination) to become programmers. We assume a foundational understanding of statistics such as one would learn in a basic course or two on statistical techniques and methods.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning. Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "we will focus on the Extreme Values table and the boxplot."

A block of code is set as follows:

FREQUENCIES VARIABLES=Price
  /FORMAT=NOTABLE
  /PERCENTILES=1.0 5.0 10.0 25.0 50.0 75.0 90.0 95.0 99.0
  /STATISTICS=MINIMUM MAXIMUM
  /HISTOGRAM
  /ORDER=ANALYSIS.

Any command-line input or output is written as follows:

RECODE quality (1 thru 3=0) (4 thru 5=1) INTO qualsatpos.

VARIABLE LABELS qualsatpos 'Satisfied with Quality'.

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "In order to download new modules, we will go to Files | Settings | Project Name | Project Interpreter."

Warnings or important notes appear like this.
Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of. To send us general feedback, simply email [email protected], and mention the book's title in the subject of your message. If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files emailed directly to you. You can download the code files by following these steps:

1. Log in or register to our website using your email address and password.
2. Hover the mouse pointer on the SUPPORT tab at the top.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box.
5. Select the book for which you're looking to download the code files.
6. Choose from the drop-down menu where you purchased this book from.
7. Click on Code Download.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR / 7-Zip for Windows

Zipeg / iZip / UnRarX for Mac

7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Data-Analysis-with-IBM-SPSS-Statistics. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books, maybe a mistake in the text or the code, we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title. To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the internet, please provide us with the location address or website name immediately so that we can pursue a remedy. Please contact us at [email protected] with a link to the suspected pirated material. We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.

Installing and Configuring SPSS

If the SPSS Statistics package is not already available for you to use, you will need to start by installing the software. This chapter establishes the foundation for using this tool for data analysis. Even if the software is available on your computer, you will want to become familiar with setting up the environment properly in order to make the analysis process efficient and effective.

It is also a good idea to run a basic SPSS job to verify that everything is working as it should and to see the resources that are provided by way of tutorials and sample datasets.
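As a minimal sketch of such a verification job, the following syntax can be pasted into a Syntax window (File | New | Syntax) and run once the software is installed and licensed. The variable names id and score and the five data rows are made up for illustration and do not come from the book's sample files; reading the data inline simply avoids any dependence on where the sample datasets are installed.

```spss
* Hypothetical smoke test: reads five inline cases and summarizes one field.
DATA LIST FREE / id score.
BEGIN DATA
1 10
2 20
3 30
4 40
5 50
END DATA.

* A results table in the Viewer indicates that the installation is working.
DESCRIPTIVES VARIABLES=score
  /STATISTICS=MEAN STDDEV MIN MAX.
```

If the job runs, the Viewer should show a Descriptive Statistics table for score with five cases and a mean of 30.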

Before you can use IBM SPSS Statistics for data analysis, you will need to install and configure the software. Typically, an analyst or researcher will use their desktop/laptop to analyze the data and this is where the SPSS software will be installed.

When you purchase the software, or obtain it through your organization, you will receive an executable with a name such as SPSS_Statistics_24_win_64.exe. The 64 in this file name indicates that the 64-bit version of the software was selected, and the 24 indicates that version 24 of SPSS is being installed.

Running this .exe file will launch the installation process but prior to this, there are some things to consider. During the installation process, you will be asked where you want the files associated with SPSS to be stored. Most often, users will put the software in the same location that they use for other applications on their machine. This is usually the C:\Program Files folder.

Topics that will be covered in this chapter include the following:

  • Running the SPSS installation utility
  • Setting parameters during the installation process
  • Licensing the SPSS software
  • Setting parameters within the SPSS software
  • Executing a basic SPSS session

The SPSS installation utility

To begin the installation, double-click on the installation .exe file that you downloaded. You should see a screen similar to the one shown in the following screenshot:

Once the extraction is finished, two license-related screens will appear. Click on Next on the first screen and, after accepting the license terms (read through them first if you want), click on Next again on the second screen to continue with the installation.

Installing Python for the scripting

SPSS includes a scripting language that can be used to automate various processes within the software. While the scripting language will not be covered in this section, you may find it useful down the road.

The scripting is done via the Python language, and part of the installation process involves installing Python. The next three screens deal with installing Python and agreeing to the associated license terms. We recommend that you include Python as part of your basic software installation for SPSS. The following screenshot shows the initial screen where you indicate that the Python component is to be included in the installation:

On the two following screens, accept the license terms for Python and click on Next to proceed.
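Once SPSS is installed and licensed, one way to confirm that the Python component is usable is to run a short program block from a Syntax window. This is only a sketch, and it assumes that the Python 3 plug-in was included in the installation; the PYTHON3 keyword selects that plug-in, and the block does nothing more than load the spss module and print the interpreter version.

```spss
* Checks that the Python 3 integration responds (assumes it was installed).
BEGIN PROGRAM PYTHON3.
import sys
import spss
print(sys.version)
END PROGRAM.
```

If the Python component is missing, SPSS reports an error for the BEGIN PROGRAM command instead of printing a version string.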

As part of the installation, you will be asked where to put the files associated with the SPSS software. By default, they will be placed in the C:\Program Files\IBM\SPSS\Statistics\24 folder, where 24 refers to the version of the SPSS software that you are installing. You can change the location for these files using the Browse button but unless you have a compelling reason to do so, we recommend using the default setting shown in the following screenshot.

If you are concerned about having sufficient disk space on the C: drive, you can use the Available Space button to see how much free disk space is available.

Depending on the options you have licensed (SPSS consists of a base package along with options such as Advanced Statistics, Decision Trees, Forecasting, and so on), you may need up to 2 GB of disk space. After specifying the folder to use for the SPSS files, click on Next and, on the following screen, click on Install to begin the process:

The process of copying the files to the folder and performing the installation may take a couple of minutes. A screen displays the progress of the file copying step. Installing the Python component for use within SPSS results in a screen as shown in the following screenshot. There are no buttons associated with this screen, only a display of the files being compiled:

Licensing SPSS

When the screen titled InstallShield Wizard Completed appears, you can click on Finish to launch SPSS and perform the final step. SPSS uses an activation code to license the product after purchase. You should have obtained this code when you downloaded the software initially. It is typically a 20-character code with a mix of numbers and letters.

On the screen shown in the following screenshot, click on License Product to initiate the authorization of the software:

The SPSS home screen shown in the preceding screenshot contains several useful links that you may want to explore, such as the Get started with tutorials link at the bottom. If you no longer want to see this screen each time you launch SPSS, check the box at the lower left.

Use the Next button to proceed through this screen and the two following screens. The authorized user license choice on the last screen is the right choice, unless your organization has provided you with information for a concurrent user setup. If this is the case, change the setting to that option before proceeding.

The following screenshot shows the screen where you will enter your authorization code to activate the software via the Internet. While you can enter the code manually, it is easier to use copy/paste to ensure the characters are entered correctly.

Confirming the options available

The authorization code unlocks SPSS Statistics base along with any options that you are entitled to use. If your purchase included the Forecasting option, for example, there would be a Forecasting choice on the Analyze menu within the SPSS software. Some of the options included in the activation code used in this example are shown in the following screenshot:

Scroll through the license information to see which options are included in your SPSS license.

In the installation example shown here, the user purchased the Grad Pack version of SPSS, which includes a specific set of options along with the base portion of the software. The expiration date for the license just entered is displayed as well.

Launching and using SPSS

After reviewing the options that you have available, click on Finish to exit the installation process. Launch SPSS Statistics by going to the main Windows menu and finding it under Recently added in the upper left of the screen. The SPSS home screen (the first screenshot in the Licensing SPSS section) is displayed initially. The tutorials included with SPSS can be accessed via the link on this screen, but they are also available via the Help menu within SPSS. Close this dialog box and the SPSS Data Editor window will be displayed.

The Data Editor window resembles a spreadsheet in terms of the layout, with the columns representing fields and the rows representing cases. As no data file has been loaded at this point, the window will have no content in the cells. Go to the Edit menu and select Options at the very bottom, as shown in the following screenshot: 

Setting parameters within the SPSS software

The General tab, which is where some of the basic settings can be changed, is displayed. It is likely that you will not need to change any of these specifications initially, but at some point, you may want to alter these default settings. Click on the File Locations tab to display the dialog box in the following screenshot. Again, there is typically no need to change the settings initially, but be aware that SPSS creates temporary files during a session that are deleted when you exit the software.

If you are working with large volumes of data, you may need to direct these files to a location with more space, such as a network drive or an external device connected to your machine:

SPSS maintains a Journal file, which logs all the commands created as you move through various dialog boxes and make selections. This file provides an audit trail of sorts that can be quite useful. The file is set up to be appended and it is recommended that you keep this setting in place. As only the commands are logged in this file, it does not become very large, even over many months of using SPSS.

Executing a basic SPSS session

Click on OK to return to the Data Editor window. To confirm that the software is ready for use, go to the File menu and select Open Data. Navigate to the location where SPSS Statistics was installed, and down through the folders to the Samples\English subfolder. The path shown here is typically where the sample SPSS data files that ship with the software get installed:

C:\Program Files\IBM\SPSS\Statistics\24\Samples\English

A list of sample SPSS data files (those with a .sav extension) will be displayed. For this example, select the bankloan.sav file, as shown in the following screenshot, and click on Open:

The Data Editor window now displays the name of the file just opened in the title bar with the fields (variables in SPSS terminology) as the column names and the actual values in the rows. Here, each row represents a bank customer and the columns contain their associated information. Only the first 12 rows are visible in the following screenshot, but after scrolling down, you will see more.

There are 850 rows in total:

Go to the Analyze menu and select Descriptive Statistics | Frequencies, as shown in the following screenshot:

The Frequencies dialog box shown here has a Bootstrap button on the lower right. This is present because the license used for this installation included the Bootstrap option, which results in this added feature appearing in appropriate places within SPSS.

The dialog box shown in the previous image allows you to select fields and obtain basic descriptive statistics for them.

For this initial check of the software installation, select just the education field, which is shown by its label, Level of education, as shown in the following screenshot. You can double-click on the label or highlight it and use the arrow in the middle of the screen to make the selection:

The descriptive statistics requested for the education field are presented in a new output window as shown in the following image. The left side of the output window is referred to in SPSS as the navigation pane and it lists the elements available for viewing in the main portion of the window. The frequency table for education shows that there are five levels of education present in the data for the bank's customers and that over half, 54.1%, of these 850 customers did not complete high school. This very simple example will confirm that the SPSS Statistics software is installed and ready to use on your machine. 
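To make clear what a frequency table computes, the counts and percentages can be reproduced in a few lines of Python. This is only a minimal sketch and not SPSS output: the `frequency_table` helper is a hypothetical name, and the six education codes are made-up values, not rows from bankloan.sav.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, percent) rows, sorted by value."""
    counts = Counter(values)
    total = len(values)
    return [
        (value, n, round(100.0 * n / total, 1))
        for value, n in sorted(counts.items())
    ]


# Hypothetical education codes for six customers (not the real bankloan data)
education = [1, 1, 2, 1, 3, 2]

for value, count, percent in frequency_table(education):
    print(f"{value:>5} {count:>5} {percent:>6.1f}")
```

Each row holds a data code, the number of cases with that code, and that count as a percentage of all cases, which is the same information the Frequency and Percent columns convey.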

Refer to the following image for a better understanding of descriptive statistics and the navigation pane:

To complete this check of the installation process, go to the File menu and select Exit at the bottom. You will be prompted to save the newly created output window, which was automatically assigned the name *Output1. There is no need to save the results of the frequency table that was created, but you can do so if you like.

The title bar of the output window shows the name *Output1, which was generated automatically by SPSS. The * indicates that the window contains material that has not been saved.

Summary

In this first chapter, we covered the basic installation of IBM SPSS Statistics on a local machine running Windows. The standard install includes the Python scripting component and requires licensing the software via the Internet. Although the default settings for things like file locations and display options were not modified, you saw how these elements can be changed later if there is a need to do so.

Once SPSS was installed and licensed, the software was launched and a very basic example was covered. This should give you a sense of how to get started analyzing your own data, as well as confirm that everything is functioning as expected.

Congratulations! You are now ready to begin exploring the capabilities of SPSS Statistics on your own data or on one of the sample data sets, such as the one used in the sample session above. Be sure to take advantage of the tutorials within the Help system to facilitate the process of learning SPSS.

Accessing and Organizing Data

This chapter shows you how to read common file formats, such as an Excel sheet or a delimited text file, into IBM SPSS Statistics. We focus on these formats because most software programs can read them. In addition, many analysts use Excel for simple data activities such as data handling and producing charts. Beyond these simple activities, however, Excel is limited in the data analytic capabilities it provides, so researchers have turned to IBM SPSS Statistics for its extensive statistical and analytical capabilities.

In order to use IBM SPSS Statistics, you must first read your data into the IBM SPSS Statistics Data Editor window. Once you have successfully read the data, you provide variable properties to enrich the description of the data. After you have established the variable properties for the variables in your file, you have set the stage to produce informative statistical analyses and charts.

We will cover the following topics in this chapter:

Accessing and organizing data overview

Reading Excel files

Reading delimited text files

Saving IBM SPSS Statistics files

Reading IBM SPSS Statistics files

Looking at the data with frequencies

Specifying variable properties

Accessing and organizing data overview

Once you have read the data into IBM SPSS Statistics, you should do at least a cursory check of the input data. Do you see numeric data? String data? Is the data in the expected scale and range? Is the data complete? Even if your data is not very large in either the number of rows or columns, it can be difficult to assess via a simple visual inspection. For this reason, you might use SPSS Statistics to produce a tabular summary of variables showing counts and percentages. Doing so produces tables showing all the data codes in the designated variables. Once you have defined the SPSS Variable Properties such as value labels, you can control the tabular display to show data values (the data codes), value labels, or both.
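The questions in this check (numeric or string, expected range, completeness) can be illustrated with a short Python sketch. It is an illustration of the idea only, not something SPSS runs: `summarize_column` is a hypothetical helper, and the sample values are invented.

```python
def summarize_column(values):
    """Classify raw text values as numeric or string and report basic checks."""

    def is_number(text):
        try:
            float(text)
            return True
        except ValueError:
            return False

    # Blank cells are counted separately so completeness can be assessed
    non_blank = [v for v in values if v.strip() != ""]
    numeric = bool(non_blank) and all(is_number(v) for v in non_blank)

    summary = {
        "type": "numeric" if numeric else "string",
        "blank": len(values) - len(non_blank),
    }
    if numeric:
        numbers = [float(v) for v in non_blank]
        summary["min"] = min(numbers)
        summary["max"] = max(numbers)
    return summary


# Hypothetical raw columns as they might arrive from a text file
print(summarize_column(["34", "51", "", "29"]))
print(summarize_column(["male", "female", "male"]))
```

A column is treated as numeric only when every non-blank value parses as a number, so a single stray text entry is enough to flag it as a string column worth inspecting.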

A further consideration is how the data values are represented for categorical variables. Let's consider Respondent's Sex as an example.

Your categorical values in an Excel spreadsheet could be string values such as male or female. If so, then IBM SPSS Statistics can read these values.

However, it is a common practice in the survey research community to use numeric codes to represent categories. In general, use sequential numbers starting from 1 to enumerate the categories. In this example, the data codes would be 1 and 2, although which code is assigned to which gender is arbitrary. Say that males are represented by a 1 code and females are represented by a 2 code.

A drawback of using numeric codes is that tabular summaries such as a summary table of counts will list the number of 1s and 2s, but the reader would not know that 1 represents male and 2 represents female. The way to handle this situation is to use value labels, one of a number of Variable Properties you can define after successfully reading the data.
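The effect of value labels can be sketched in Python as a lookup from data codes to readable text. This is an analogy under stated assumptions, not how SPSS stores labels internally: `VALUE_LABELS` and `labeled_counts` are hypothetical names, and the five codes are invented.

```python
from collections import Counter

# Hypothetical labels, following the 1 = male, 2 = female example in the text
VALUE_LABELS = {1: "male", 2: "female"}


def labeled_counts(codes, labels):
    """Count each code and report it under its label (raw code if unlabeled)."""
    counts = Counter(codes)
    return {labels.get(code, str(code)): n for code, n in sorted(counts.items())}


sex = [1, 2, 2, 1, 2]
print(labeled_counts(sex, VALUE_LABELS))
```

The stored data remains the numeric codes; only the display changes, which is why a table can show codes, labels, or both.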

Another consideration is: what if Respondent's Sex is not known for a specific individual? If the variable is a string variable, you could represent an unknown value of Respondent's Sex as a string value such as 'unknown', or you might represent the absence of information with a string of blanks such as '   '.

If Respondent's Sex is a numeric field, an unknown value could be represented by a distinct number code such as 3, assuming that males and females would be represented by 1 and 2, respectively. In either situation, you would like your summary tables and statistics to take into account the absence of information indicated in the values 'unknown' or 3. The way to handle this situation is to use the missing values command. There is more on this next.

Value labels and missing values are two examples of variable properties, which are properties internal to IBM SPSS Statistics that are associated with each variable in the data. You can save these properties along with the data. When added, these properties inform the analysis and display of data in IBM SPSS Statistics. For example, for a variable indicating Respondent's Sex, value labels could provide the gender labels 'male' and 'female', clarifying which data code represents which gender. Likewise, by defining data codes as missing values, you would ensure that SPSS Statistics excludes these cases from the calculation of valid percents, for example.
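The difference between a percent and a valid percent can be shown with a small Python sketch. It assumes the coding from the example above (1 and 2 for the known categories, 3 declared as missing); the `percents` helper and the five data values are hypothetical.

```python
from collections import Counter


def percents(codes, missing=()):
    """Return {code: (percent, valid_percent)}.

    percent uses all cases as the base; valid_percent uses only cases whose
    code is not declared missing, and is None for the missing codes themselves.
    """
    counts = Counter(codes)
    total = len(codes)
    valid_total = sum(n for code, n in counts.items() if code not in missing)
    result = {}
    for code, n in sorted(counts.items()):
        pct = round(100.0 * n / total, 1)
        valid = None if code in missing else round(100.0 * n / valid_total, 1)
        result[code] = (pct, valid)
    return result


# Hypothetical data: 3 stands for an unknown value of Respondent's Sex
sex = [1, 2, 2, 1, 3]
print(percents(sex, missing={3}))
```

With code 3 declared missing, each known category accounts for 40% of all five cases but 50% of the four valid cases.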

Menus versus syntax

The examples in this chapter start from the menus but suggest the use of the Paste button to paste constructed syntax to the Syntax window. In the Syntax window, you can run the just-pasted syntax. We discuss elements of the syntax, but encourage you to use the Help button to learn more about individual commands.

Reading Excel files

Here is a snapshot of a portion of an Excel spreadsheet:

Note that row 1 of the Excel spreadsheet is a header row containing variable names for the columns.

When working with Excel spreadsheets or delimited text files, use row 1 of the file to supply variable names that you intend to use in SPSS Statistics. 
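The role of a header row is easy to see in a generic delimited-text reader. The following Python sketch uses the standard `csv` module rather than SPSS, and the three-column file content is invented for illustration.

```python
import csv
import io

# Hypothetical delimited text; row 1 holds the variable names
raw = "id,sex,age\n1,2,34\n2,1,51\n"

reader = csv.DictReader(io.StringIO(raw))
variable_names = reader.fieldnames  # taken from row 1
cases = list(reader)                # remaining rows, one dict per case

print(variable_names)
print(cases[0]["age"])  # values arrive as text until they are converted
```

Because the names come from row 1, every later row can be addressed by variable name rather than by column position.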

IBM SPSS Statistics can directly read an Excel sheet. There are different implementations in different recent releases of IBM SPSS Statistics but, in general, the capability exists on the File menu. In IBM SPSS Statistics 24, use the following path:

File | Import Data

Here is the Read Excel File dialog box:

By default, IBM SPSS Statistics shows the entire range of data that it encounters in the Excel sheet. You can use the Range