Master data management & analysis techniques with IBM SPSS Statistics 24
This book is designed for analysts and researchers who need to work with data to discover meaningful patterns but do not have the time (or inclination) to become programmers. We assume a foundational understanding of statistics such as one would learn in a basic course or two on statistical techniques and methods.
SPSS Statistics is a software package used for batched and non-batched statistical analysis. Analytical tools such as SPSS can readily provide even a novice user with an overwhelming amount of information and a broad range of options for analyzing patterns in the data.
The journey starts with installing and configuring SPSS Statistics for first use and exploring the data to understand its potential (as well as its limitations). Use the right statistical analysis technique, such as regression or classification, and analyze your data in the best possible manner. Work with graphs and charts to visualize your findings. With this information in hand, the discovery of patterns within the data can be undertaken. Finally, the high-level objective of developing predictive models that can be applied to other situations will be addressed.
By the end of this book, you will have a firm understanding of the various statistical analysis techniques offered by SPSS Statistics, and be able to master its use for data analysis with ease.
Provides a practical orientation to understanding a set of data and examining the key relationships among the data elements. Shows useful visualizations to enhance understanding and interpretation. Outlines a roadmap that focuses the process so that decisions regarding how to proceed can be made easily.
Page count: 356
Year of publication: 2017
BIRMINGHAM - MUMBAI
Copyright © 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: September 2017
Production reference: 1190917
ISBN 978-1-78728-381-7
www.packtpub.com
Authors
Kenneth Stehlik-Barry
Anthony J. Babinec
Copy Editor
Manisha Sinha
Reviewers
James Mott
James Sugrue
Project Coordinator
Manthan Patel
Commissioning Editor
Amey Varangaonkar
Proofreader
Safis Editing
Acquisition Editor
Tushar Gupta
Indexer
Tejal Daruwale Soni
Content Development Editor
Tejas Limkar
Graphics
Tania Dutta
Technical Editor
Dharmendra Yadav
Production Coordinator
Deepika Naik
Kenneth Stehlik-Barry, PhD, joined SPSS as Manager of Training in 1980 after using SPSS for his own research for several years. Working with others at SPSS, including Anthony Babinec, he developed a series of courses related to the use of SPSS and taught these courses to numerous SPSS users. He also managed the technical support and statistics groups at SPSS. Along with Norman Nie, the founder of SPSS, and Jane Junn, a political scientist, he co-authored Education and Democratic Citizenship. Dr. Stehlik-Barry has used SPSS extensively to analyze data from SPSS and IBM customers to discover valuable patterns that can be used to address pertinent business issues. He received his PhD in Political Science from Northwestern University and currently teaches in the Master of Science in Predictive Analytics program there.
Anthony J. Babinec joined SPSS as a statistician in 1978 after assisting Norman Nie, SPSS founder, in a research methods class at the University of Chicago. Anthony developed SPSS courses and trained many SPSS users. He also wrote many examples found in SPSS documentation and worked in technical support. Anthony led a business development effort to find products implementing then-emerging new technologies such as CHAID decision trees and neural networks and helped SPSS customers successfully apply them. Anthony uses SPSS in consulting engagements and teaches IBM customers how to use its advanced features. He received his BA and MA in sociology with a specialization in advanced statistics from the University of Chicago and teaches classes at the Institute for Statistics Education. He is on the Board of Directors of the Chicago Chapter of the American Statistical Association, where he has served in different positions including President.
A book such as this is always a collaboration that extends beyond the authors. We owe a debt of gratitude to many and we would like to begin by thanking our family members, Janis, Cassiopeia, Leila, and Thea Stehlik-Barry, Tony's wife Terri M. Long, and their children Gina and Anthony. Authoring a book inevitably takes time away from family and the patience of our spouses and children is much appreciated. We would also like to thank our late parents, Leo and Patricia Barry and Anthony and Dorothy Babinec. They fostered our love of learning and supported our scholastic pursuits during our youth.
We would also like to acknowledge the late Norman Nie, a founder of SPSS and highly regarded social scientist. Norman was an empirical researcher and SPSS was his tool as well as his creation. His use of SPSS for his own analysis led to many valuable additions to the software. Ken coauthored Education and Democratic Citizenship with Norman and Jane Junn. Tony was a teaching assistant and research assistant with Norman at the University of Chicago. Norman was a colleague, mentor, and a valued friend and is greatly missed.
The team at Packt was enormously helpful in bringing this book to fruition. Tejas Limkar, our most frequent contact person, brought enthusiasm and encouragement to the project and kept things on track. Tushar Gupta was instrumental in launching the book initially, and Dharmendra Yadav drove the final push to get it completed. We also thank those at Packt who worked behind the scenes to deal with the graphics, editing, proofing, and production tasks.
Finally, we would like to thank Colin Shearer, our IBM/SPSS colleague who put us in touch with Tushar at Packt initially, and our two reviewers, James Mott and James Sugrue. They are long-term colleagues of the authors and have a very deep knowledge of SPSS Statistics. Their feedback helped to make this a better book. We also thank our many colleagues at SPSS Inc., who collectively over the years built SPSS Statistics into the great product it has become.
Kenneth Stehlik-Barry
Anthony J. Babinec
James Mott, PhD, is a senior education consultant with extensive experience in teaching statistical analysis, modeling, data mining, and predictive analytics. He has over 30 years of experience using SPSS products in his own research, including IBM SPSS Statistics, IBM SPSS Modeler, and IBM SPSS Amos. He has also been actively teaching these products to IBM/SPSS customers for over 30 years. In addition, he is an experienced historian with expertise in the research and teaching of 20th-century United States political history and quantitative methods.
Specialties: Data Mining, Quantitative Methods, Statistical Analysis, Teaching and Consulting.
James Sugrue has been selling and supporting SPSS Statistics since 1982. He is currently the president of Channel Group Inc. Channel Group Inc. began in 1996 as the holding company for the SPSS Inc. operations in Argentina, Chile, Paraguay, Uruguay, Bolivia, and Mexico. In 1998, they acquired the Quantime Inc. (Quantum, Quanvert, and so on) operations in Latin America. They later became the regional overlay team for the SPSS Market Research product line (Dimensions, Data Collection) for all of Latin America and the Caribbean.
For support files and downloads related to your book, please visit www.PacktPub.com. Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details. At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www.packtpub.com/mapt
Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review.
If you'd like to join our team of regular reviewers, you can email us at [email protected]. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
Installing and Configuring SPSS
The SPSS installation utility
Installing Python for the scripting
Licensing SPSS
Confirming the options available
Launching and using SPSS
Setting parameters within the SPSS software
Executing a basic SPSS session
Summary
Accessing and Organizing Data
Accessing and organizing data overview
Reading Excel files
Reading delimited text data files
Saving IBM SPSS Statistics files
Reading IBM SPSS Statistics files
Demo - first look at the data - frequencies
Variable properties
Variable properties - name
Variable properties - type
Variable properties - width
Variable properties - decimals
Variable properties - label
Variable properties - values
Variable properties - missing
Variable properties - columns
Variable properties - align
Variable properties - measure
Variable properties - role
Demo - adding variable properties to the Variable View
Demo - adding variable properties via syntax
Demo - defining variable properties
Summary
Statistics for Individual Data Elements
Getting the sample data
Descriptive statistics for numeric fields
Controlling the descriptives display order
Frequency distributions
Discovering coding issues using frequencies
Using frequencies to verify missing data patterns
Explore procedure
Stem and leaf plot
Boxplot
Using explore to check subgroup patterns
Summary
Dealing with Missing Data and Outliers
Outliers
Frequencies for histogram and percentile values
Descriptives for standardized scores
The Examine procedure for extreme values and boxplot
Detecting multivariate outliers
Missing data
Missing values in Frequencies
Missing values in Descriptives
Missing value patterns
Replacing missing values
Summary
Visually Exploring the Data
Graphs available in SPSS procedures
Obtaining bar charts with frequencies
Obtaining a histogram with frequencies
Creating graphs using chart builder
Building a scatterplot
Create a boxplot using chart builder
Summary
Sampling, Subsetting, and Weighting
Select cases dialog box
Select cases - If condition is satisfied
Example
If condition is satisfied combined with Filter
If condition is satisfied combined with Copy
If condition is satisfied combined with Delete unselected cases
The Temporary command
Select cases based on time or case range
Using the filter variable
Selecting a random sample of cases
Split File
Weighting
Summary
Creating New Data Elements
Transforming fields in SPSS
The RECODE command
Creating a dummy variable using RECODE
Using RECODE to rescale a field
Respondent's income using the midpoint of a selected category
The COMPUTE command
The IF command
The DO IF/ELSE IF command
General points regarding SPSS transformation commands
Summary
Adding and Matching Files
SPSS Statistics commands to merge files
Example of one-to-many merge - Northwind database
Customer table
Orders table
The Customer-Orders relationship
SPSS code for a one-to-many merge
Alternate SPSS code
One-to-one merge - two data subsets from GSS2016
Example of combining cases using ADD FILES
Summary
Aggregating and Restructuring Data
Using aggregation to add fields to a file
Using aggregated variables to create new fields
Aggregating up one level
Preparing the data for aggregation
Second level aggregation
Preparing aggregated data for further use
Matching the aggregated file back to find specific records
Restructuring rows to columns
Patient test data example
Performing calculations following data restructuring
Summary
Crosstabulation Patterns for Categorical Data
Percentages in crosstabs
Testing differences in column proportions
Crosstab pivot table editing
Adding a layer variable
Adding a second layer
Using a Chi-square test with crosstabs
Expected counts
Context sensitive help
Ordinal measures of association
Interval with nominal association measure
Nominal measures of association
Summary
Comparing Means and ANOVA
SPSS procedures for comparing Means
The Means procedure
Adding a second variable
Test of linearity example
Testing the strength of the nonlinear relationship
Single sample t-test
The independent samples t-test
Homogeneity of variance test
Comparing subsets
Paired t-test
Paired t-test split by gender
One-way analysis of variance
Brown-Forsythe and Welch statistics
Planned comparisons
Post hoc comparisons
The ANOVA procedure
Summary
Correlations
Pearson correlations
Testing for significance
Mean differences versus correlations
Listwise versus pairwise missing values
Comparing pairwise and listwise correlation matrices
Pivoting table editing to enhance correlation matrices
Creating a very trimmed matrix
Visualizing correlations with scatterplots
Rank order correlations
Partial correlations
Adding a second control variable
Summary
Linear Regression
Assumptions of the classical linear regression model
Example - motor trend car data
Exploring associations between the target and predictors
Fitting and interpreting a simple regression model
Residual analysis for the simple regression model
Saving and interpreting casewise diagnostics
Multiple regression - Model-building strategies
Summary
Principal Components and Factor Analysis
Choosing between principal components analysis and factor analysis
PCA example - violent crimes
Simple descriptive analysis
SPSS code - principal components analysis
Assessing factorability of the data
Principal components analysis of the crime variables
Principal component analysis – two-component solution
Factor analysis - abilities
The reduced correlation matrix and its eigenvalues
Factor analysis code
Factor analysis results
Summary
Clustering
Overview of cluster analysis
Overview of SPSS Statistics cluster analysis procedures
Hierarchical cluster analysis example
Descriptive analysis
Cluster analysis - first attempt
Cluster analysis with four clusters
K-means cluster analysis example
Descriptive analysis
K-means cluster analysis of the Old Faithful data
Further cluster profiling
Other analyses to try
Twostep cluster analysis example
Summary
Discriminant Analysis
Descriptive discriminant analysis
Predictive discriminant analysis
Assumptions underlying discriminant analysis
Example data
Statistical and graphical summary of the data
Discriminant analysis setup - key decisions
Priors
Pooled or separate
Dimensionality
Syntax for the wine example
Examining the results
Scoring new observations
Summary
SPSS Statistics is a software package used for batched and non-batched statistical analysis. Analytical tools such as SPSS can readily provide even a novice user with an overwhelming amount of information and a broad range of options to analyze patterns in the data. This book provides comprehensive coverage of IBM's premier statistics and data analysis tool, IBM SPSS Statistics. It is designed for business professionals who wish to analyze their data. By the end of this book, you will have a firm understanding of the various statistical analysis techniques offered by SPSS Statistics, and be able to master its use for data analysis with ease.
Chapter 1, Installing and Configuring SPSS, covers the initial installation of SPSS and the configuration of the system for use on the user’s machine.
Chapter 2, Accessing and Organizing Data, covers the process of opening various types of data files (Excel, CSV, and SPSS) in SPSS and performing some simple tasks, such as labeling data elements. It demonstrates how to save new versions of the data that incorporate the changes so that they are available for subsequent use.
Chapter 3, Statistics for Individual Data Elements, is about the tools in SPSS that are available for obtaining descriptive statistics for each field in a data file.
Chapter 4, Dealing with Missing Data and Outliers, focuses on assessing data quality with respect to missing information and extreme values. It also deals with the techniques that can be used to address these problems.
Chapter 5, Visually Exploring the Data, discusses topics such as histograms, bar charts, box and whisker plots, and scatter plots.
Chapter 6, Sampling, Subsetting, and Weighting, describes the options available in SPSS for taking samples from a dataset, creating subgroups with the data, and assigning weights to individual rows.
Chapter 7, Creating New Data Elements, discusses when it is useful to define new data elements to support analysis objectives and the process involved in building these elements in SPSS.
Chapter 8, Adding and Matching Files, describes the process of combining multiple data files to create a single file for use in an analysis. Both appending multiple files and merging files to add information are addressed.
Chapter 9, Aggregating and Restructuring Data, is about two topics--changing the unit of analysis via aggregation, and restructuring the data from wide to long or long to wide to facilitate analysis.
Chapter 10, Crosstabulation Patterns for Categorical Data, covers descriptive and inferential analysis of categorical data in two-way and multi-way contingency tables.
Chapter 11, Comparing Means and ANOVA, is about descriptive and inferential analysis involving the mean of a variable across groups.
Chapter 12, Correlations, discusses descriptive and inferential analysis of associations involving numeric variables via the use of the Pearson correlation coefficient and some analogs.
Chapter 13, Linear Regression, covers using linear regression to develop predictions of numeric target variables.
Chapter 14, Principal Components and Factor Analysis, is about the use of principal components analysis and factor analysis to understand patterns among the variables.
Chapter 15, Clustering, covers methods to find groups in the data through analyzing the data rows.
Chapter 16, Discriminant Analysis, discusses using discriminant analysis to develop classifications involving categorical target variables.
You will need: IBM SPSS Statistics 24 (or higher).
Here are the download links to the software:
Trial
https://www.ibm.com/analytics/us/en/technology/spss/spss-trials.html
Info on subscription:
https://www-01.ibm.com/software/analytics/subscriptionandsupport/spss.html
Info on hardware specs:
https://www.ibm.com/software/reports/compatibility/clarity-reports/report/html/osForProduct
You will also need Windows 10 or another recent version of Windows.
IBM SPSS Statistics is available via trial download. However, the trial period is on the order of 14 days, which is probably too short.
IBM SPSS Statistics is available via annual single-user license and various other licenses, and relatively recently, via a subscription.
Price lists and terms probably vary by country.
IBM SPSS Statistics is packaged as Base plus optional modules. We made an effort to use only elements of SPSS Base.
Detailed installation steps can be found in the IBM SPSS Statistics installation documentation at http://www-01.ibm.com/support/docview.wss?uid=swg24041224.
This book is designed for analysts and researchers who need to work with data to discover meaningful patterns but do not have the time (or inclination) to become programmers. We assume a foundational understanding of statistics such as one would learn in a basic course or two on statistical techniques and methods.
In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning. Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "we will focus on the Extreme Values table and the boxplot."
A block of code is set as follows:
FREQUENCIES VARIABLES=Price
  /FORMAT=NOTABLE
  /PERCENTILES=1.0 5.0 10.0 25.0 50.0 75.0 90.0 95.0 99.0
  /STATISTICS=MINIMUM MAXIMUM
  /HISTOGRAM
  /ORDER=ANALYSIS.
Any command-line input or output is written as follows:
RECODE quality (1 thru 3=0) (4 thru 5=1) INTO qualsatpos.
VARIABLE LABELS qualsatpos 'Satisfied with Quality'.
New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "In order to download new modules, we will go to Files | Settings | Project Name | Project Interpreter."
Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of. To send us general feedback, simply email [email protected], and mention the book's title in the subject of your message. If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files emailed directly to you. You can download the code files by following these steps:
Log in or register to our website using your email address and password.
Hover the mouse pointer on the SUPPORT tab at the top.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box.
Select the book for which you're looking to download the code files.
Choose from the drop-down menu where you purchased this book from.
Click on Code Download.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Data-Analysis-with-IBM-SPSS-Statistics. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books (maybe a mistake in the text or the code), we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title. To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy of copyrighted material on the internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the internet, please provide us with the location address or website name immediately so that we can pursue a remedy. Please contact us at [email protected] with a link to the suspected pirated material. We appreciate your help in protecting our authors and our ability to bring you valuable content.
If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.
If the SPSS Statistics package is not already available for you to use, you will need to start by installing the software. This section establishes the foundation to use this tool for data analysis. Even if the software is available on your computer, you will want to become familiar with setting up the environment properly in order to make the analyzing process efficient and effective.
It is also a good idea to run a basic SPSS job to verify that everything is working as it should and to see the resources that are provided by way of tutorials and sample datasets.
Before you can use IBM SPSS Statistics for data analysis, you will need to install and configure the software. Typically, an analyst or researcher will use their desktop/laptop to analyze the data and this is where the SPSS software will be installed.
Running the installation .exe file will launch the installation process, but before doing so, there are some things to consider. During the installation process, you will be asked where you want the files associated with SPSS to be stored. Most often, users will put the software in the same location that they use for other applications on their machine. This is usually the C:\Program Files folder.
Topics that will be covered in this chapter include the following:
Running the SPSS installation utility
Setting parameters during the installation process
Licensing the SPSS software
Setting parameters within the SPSS software
Executing a basic SPSS session
To begin the installation, double-click on the installation .exe file that you downloaded. You should see a screen similar to the one shown in the following screenshot:
Once the extraction is finished, two license-related screens will appear. Click on Next on the first screen and, after accepting the license terms (read through them first if you want), click on Next again on the second screen to continue with the installation.
SPSS includes a scripting language that can be used to automate various processes within the software. While the scripting language will not be covered in this section, you may find it useful down the road.
The scripting is done via the Python language, and part of the installation process involves installing Python. The next three screens deal with installing Python and agreeing to the associated license terms. We recommend that you include Python as part of your basic software installation for SPSS. The following screenshot shows the initial screen where you indicate that the Python component is to be included in the installation:
On the two following screens, accept the license terms for Python and click on Next to proceed.
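Once the installation is complete and SPSS has been licensed and launched, one way to confirm that the Python component is usable is to run a short program block from a syntax window (File | New | Syntax). The following is only a minimal sketch: it assumes that the Python 3 component was included in the installation, and it does nothing beyond printing the version of the Python interpreter that SPSS is using.

```spss
* Confirm that SPSS can start its Python interpreter (assumes Python 3 was installed).
BEGIN PROGRAM PYTHON3.
import sys
# Print the version of the Python interpreter embedded in SPSS
print(sys.version)
END PROGRAM.
```

If the block runs without an error, the version string appears in the output window. An error at this point usually suggests that the Python component was not selected during installation.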
As part of the installation, you will be asked where to put the files associated with the SPSS software. By default, they will be placed in the C:\Program Files\IBM\SPSS\Statistics\24 folder, where 24 refers to the version of the SPSS software that you are installing. You can change the location for these files using the Browse button but unless you have a compelling reason to do so, we recommend using the setting shown in the image after the paragraph.
Depending on the options you have licensed (SPSS consists of a base package along with options such as Advanced Statistics, Decision Trees, Forecasting, and so on), you may need up to 2 GB of disk space. After specifying the folder to use for the SPSS files, click on Next and, on the following screen, click on Install to begin the process:
The process of copying the files to the folder and performing the installation may take a couple of minutes. A screen displays the progress of the file copying step. Installing the Python component for use within SPSS results in a screen as shown in the following screenshot. There are no buttons associated with this screen, only a display of the files being compiled:
When the screen titled InstallShield Wizard Completed appears, you can click on Finish to launch SPSS and perform the final step. SPSS uses an activation code to license the product after purchase. You should have obtained this code when you downloaded the software initially. It is typically a 20-character code with a mix of numbers and letters.
On the screen shown in the following screenshot, click on License Product to initiate the authorization of the software:
Use the Next button to proceed through this screen and the two following screens. The authorized user license choice on the last screen is the right choice, unless your organization has provided you with information for a concurrent user setup. If this is the case, change the setting to that option before proceeding.
The following screenshot shows the screen where you will enter your authorization code to activate the software via the Internet. While you can enter the code manually, it is easier to use copy/paste to ensure the characters are entered correctly.
The authorization code unlocks SPSS Statistics base along with any options that you are entitled to use. If your purchase included the Forecasting option, for example, there would be a Forecasting choice on the Analyze menu within the SPSS software. Some of the options included in the activation code used in this example are shown in the following screenshot:
Scroll through the license information to see which options are included in your SPSS license.
After reviewing the options that you have available, click on Finish to exit the installation process. Launch SPSS Statistics by going to the main Windows menu and finding it under Recently added in the upper left of the screen. The screen shown in the first screenshot of the Licensing SPSS section is displayed initially. The tutorials included with SPSS can be accessed via the link on this screen, but they are also available via the Help menu within SPSS. Close this dialog box and the SPSS Data Editor window will be displayed.
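The list of licensed options reviewed during installation can also be retrieved later, once SPSS is running. As a small sketch, the following command can be run from a syntax window (File | New | Syntax); the exact layout of the resulting table may differ between versions.

```spss
* List the licensed components and their expiration dates.
SHOW LICENSE.
```

This is a convenient check when a procedure is missing from the Analyze menu and you want to see whether the corresponding option is included in your license.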
The Data Editor window resembles a spreadsheet in terms of the layout, with the columns representing fields and the rows representing cases. As no data file has been loaded at this point, the window will have no content in the cells. Go to the Edit menu and select Options at the very bottom, as shown in the following screenshot:
The General tab, which is where some of the basic settings can be changed, is displayed. It is likely that you will not need to change any of these specifications initially, but at some point, you may want to alter these default settings. Click on the File Locations tab to display the dialog box in the following screenshot. Again, there is typically no need to change the settings initially, but be aware that SPSS creates temporary files during a session that are deleted when you exit the software.
If you are working with large volumes of data, you may need to direct these files to a location with more space, such as a network drive or an external device connected to your machine:
Click on OK to return to the Data Editor window. To confirm that the software is ready for use, go to the File menu and select Open Data. Navigate to the location where SPSS Statistics was installed, and down through the folders to the Samples\English subfolder. The path shown here is typically where the sample SPSS data files that ship with the software get installed:
C:\Program Files\IBM\SPSS\Statistics\24\Samples\English
A list of sample SPSS data files (those with a .sav extension) will be displayed. For this example, select the bankloan.sav file, as shown in the following screenshot, and click on Open:
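If you prefer to confirm from outside SPSS that the sample files are where you expect them, a few lines of Python will do it. The following is only a minimal sketch: the helper name list_sav_files is made up for this illustration, and the folder is assumed to be the typical install location, which may differ on your machine.

```python
from pathlib import Path

# Assumed default location of the sample files; adjust it if SPSS Statistics
# was installed somewhere else on your machine.
SAMPLES_DIR = Path(r"C:\Program Files\IBM\SPSS\Statistics\24\Samples\English")


def list_sav_files(folder):
    """Return the sorted names of the .sav files found directly in folder."""
    folder = Path(folder)
    if not folder.is_dir():
        return []  # folder not found: SPSS may be installed in another location
    return sorted(
        p.name for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".sav"
    )


if __name__ == "__main__":
    names = list_sav_files(SAMPLES_DIR)
    print(f"{len(names)} sample data files found")
    print("bankloan.sav present:", "bankloan.sav" in names)
```

An empty list simply means the folder was not found at the assumed path, in which case you can browse to the install location from the Open Data dialog instead.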
The Data Editor window now displays the name of the file just opened in the title bar with the fields (variables in SPSS terminology) as the column names and the actual values in the rows. Here, each row represents a bank customer and the columns contain their associated information. Only the first 12 rows are visible in the following screenshot, but after scrolling down, you will see more.
There are 850 rows in total:
Go to the Analyze menu and select Descriptive Statistics | Frequencies, as shown in the following screenshot:
The dialog box shown in the previous image allows you to select fields and obtain basic descriptive statistics for them.
For this initial check of the software installation, select just the education field, which is shown by its label, Level of education, as shown in the following screenshot. You can double-click on the label or highlight it and use the arrow in the middle of the screen to make the selection:
The descriptive statistics requested for the education field are presented in a new output window as shown in the following image. The left side of the output window is referred to in SPSS as the navigation pane and it lists the elements available for viewing in the main portion of the window. The frequency table for education shows that there are five levels of education present in the data for the bank's customers and that over half, 54.1%, of these 850 customers did not complete high school. This very simple example will confirm that the SPSS Statistics software is installed and ready to use on your machine.
Refer to the following image for a better understanding of descriptive statistics and the navigation pane:
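To make the arithmetic behind a frequency table concrete, here is a small Python sketch of the same idea: count how often each value occurs and express each count as a percentage of all cases. It is an illustration only, not how SPSS Statistics computes its output, and the education codes in it are made up rather than taken from bankloan.sav.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, percent) rows sorted by value."""
    counts = Counter(values)
    total = sum(counts.values())
    return [
        (value, n, round(100.0 * n / total, 1))
        for value, n in sorted(counts.items())
    ]


if __name__ == "__main__":
    # Made-up education codes for illustration; not the bankloan.sav data.
    education = [1, 1, 2, 1, 3, 2, 1, 4, 1, 2]
    print(f"{'Value':>5} {'Count':>5} {'Percent':>7}")
    for value, count, percent in frequency_table(education):
        print(f"{value:>5} {count:>5} {percent:>7.1f}")
```

Each row corresponds to one line of a frequency table: the data code, the number of cases with that code, and the share of all cases it represents.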
To complete this check of the installation process, go to the File menu and select Exit at the bottom. You will be prompted to save the newly-created output window, which was automatically assigned the name, *Output1. There is no need to save the results of the frequency table that was created, but you can do so if you like.
In this first chapter, we covered the basic installation of IBM SPSS Statistics on a local machine running Windows. The standard install includes the Python scripting component and requires licensing the software via the Internet. Although the default settings for things like file locations and display options were not modified, you saw how these elements can be changed later if there is a need to do so.
Once SPSS was installed and licensed, the software was launched and a very basic example was covered. This should give you a sense of how to get started analyzing your own data as well as confirm that everything is functioning as expected in terms of using the tool.
Congratulations! You are now ready to begin exploring the capabilities of SPSS Statistics on your data or using one of the sample data sets such as the one used in the sample session above. Be sure to take advantage of the tutorials within the Help system to facilitate the process of learning SPSS.
This chapter shows you how to read common file formats such as an Excel sheet or a delimited text file into IBM SPSS Statistics. The rationale for showing the reading of these formats is that most software programs read these file formats. In addition, many analysts use Excel for simple data activities such as data handling and producing charts. However, beyond these simple activities, Excel is limited in the data analytic capabilities it provides, so researchers have turned to IBM SPSS Statistics for its extensive statistical and analytical capabilities.
In order to use IBM SPSS Statistics, you must first read your data into the IBM SPSS Statistics Data Editor window. Once you successfully read the data, you provide variable properties to enrich the description of the data. After you have established the variable properties for the variables in your file, you have set the stage to produce informative statistical analyses and charts.
We will cover the following topics in this chapter:
Accessing and organizing data overview
Reading Excel files
Reading delimited text files
Saving IBM SPSS Statistics files
Reading IBM SPSS Statistics files
Looking at the data with frequencies
Specifying variable properties
Once you read the data to IBM SPSS Statistics, you should at least do a cursory data check of the inputted data. Do you see numeric data? String data? Is the data in the expected scale and range? Is the data complete? Of course, even if your data is not really very large in either the number of rows or columns, it can be difficult to assess via a simple visual inspection. For this reason, you might use SPSS Statistics to produce a tabular summary of variables showing counts and percentages. Doing so produces tables showing all the data codes in the designated variables. Once you have defined the SPSS Variable Properties such as value labels, you can control the tabular display to show data values (the data codes), value labels, or both.
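The questions in this cursory check (numeric or string, expected range, completeness) can be sketched in a few lines of Python. This is only an illustration of the logic under simple assumptions: the helper name summarize_column is invented here, the values are assumed to arrive as raw text cells, and nothing below reflects how SPSS Statistics itself inspects data.

```python
def summarize_column(values):
    """Summarize raw cell values (strings) for a quick data check."""
    # Treat empty or all-blank cells as absent information.
    non_blank = [v.strip() for v in values if v.strip() != ""]
    numbers = []
    for v in non_blank:
        try:
            numbers.append(float(v))
        except ValueError:
            pass  # not a number; the column will be reported as string
    is_numeric = bool(non_blank) and len(numbers) == len(non_blank)
    return {
        "n": len(values),
        "blank": len(values) - len(non_blank),
        "type": "numeric" if is_numeric else "string",
        "min": min(numbers) if is_numeric else None,
        "max": max(numbers) if is_numeric else None,
    }


if __name__ == "__main__":
    # Made-up columns for illustration only.
    print(summarize_column(["23", "41", "", "35"]))
    print(summarize_column(["male", "female", " "]))
```

A summary like this answers the first-pass questions quickly, while a table of counts and percentages shows every data code that actually occurs in a variable.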
A further consideration is how the data values are represented for categorical variables. Let's consider Respondent's Sex as an example.
Your categorical values in an Excel spreadsheet could be string values such as male or female. If so, then IBM SPSS Statistics can read these values.
Alternatively, the values could be numeric codes such as 1 and 2. A drawback of using numeric codes is that tabular summaries such as a summary table of counts will list the number of 1s and 2s, but the reader would not know that 1 represents male and 2 represents female. The way to handle this situation is to use value labels, one of a number of Variable Properties you can define after successfully reading the data.
Another consideration is: what if Respondent's Sex is not known for a specific individual? If the variable is a string variable, you could represent an unknown value of Respondent's Sex as a string value such as 'unknown', or you might represent the absence of information with a string of blanks such as ' '.
If Respondent's Sex is a numeric field, an unknown value could be represented by a distinct number code such as 3, assuming that males and females would be represented by 1 and 2, respectively. In either situation, you would like your summary tables and statistics to take into account the absence of information indicated in the values 'unknown' or 3. The way to handle this situation is to use the missing values command. There is more on this next.
Value labels and missing values are two examples of variable properties, which are properties internal to IBM SPSS Statistics that are associated with each variable in the data. You can save these properties along with the data. When added, these properties inform the analysis and display of data in IBM SPSS Statistics. For example, for a variable indicating Sex of Respondent, value labels could provide gender labels 'male' and 'female' that would clarify which data code represented which gender. Or, by defining data codes as missing values, you would ensure that SPSS Statistics excluded these cases from the calculation of valid percents, for example.
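The effect of these two properties on a summary table can be illustrated with a short Python sketch. It assumes the coding used in the example above (1 for male, 2 for female, 3 for unknown) and treats code 3 as missing. The function name and the data are hypothetical, and the code mimics the idea of a valid percent rather than reproducing SPSS Statistics itself.

```python
from collections import Counter

# Coding scheme assumed from the example in the text.
VALUE_LABELS = {1: "male", 2: "female", 3: "unknown"}
MISSING_VALUES = {3}


def labeled_frequencies(codes, labels, missing):
    """Return (label, count, percent, valid_percent) rows sorted by code.

    percent uses all cases as the base; valid_percent uses only the cases
    whose code is not declared missing, and is None for missing codes.
    """
    counts = Counter(codes)
    total = sum(counts.values())
    valid_total = sum(n for code, n in counts.items() if code not in missing)
    rows = []
    for code, n in sorted(counts.items()):
        percent = round(100.0 * n / total, 1)
        if code in missing:
            valid_percent = None
        else:
            valid_percent = round(100.0 * n / valid_total, 1)
        rows.append((labels.get(code, str(code)), n, percent, valid_percent))
    return rows


if __name__ == "__main__":
    # Made-up responses for illustration only.
    sex = [1, 2, 2, 1, 3, 2, 1, 2, 3, 2]
    for row in labeled_frequencies(sex, VALUE_LABELS, MISSING_VALUES):
        print(row)
```

With the made-up data, the two unknown cases still appear in the counts, but the valid percents for male and female are based on the eight cases with known values instead of all ten.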
Here is a snapshot of a portion of an Excel spreadsheet:
Note that row 1 of the Excel spreadsheet is a header row containing variable names for the columns.
IBM SPSS Statistics can directly read an Excel sheet. There are different implementations in different recent releases of IBM SPSS Statistics but, in general, the capability exists on the File menu. In IBM SPSS Statistics 24, use the following path:
File | Import Data
Here is the Read Excel File dialog box:
By default, IBM SPSS Statistics shows the entire range of data that it encounters in the Excel sheet. You can use the Range
