46,44 €
Leverage the full potential of SAS to get unique, actionable insights from your data
Key Features
Book Description
SAS is one of the leading enterprise tools in the world today when it comes to data management and analysis. It enables the fast and easy processing of data and helps you gain valuable business insights for effective decision-making. This book will serve as a comprehensive guide that will prepare you for the SAS certification exam.
After a quick overview of the SAS architecture and components, the book will take you through the different approaches to importing and reading data from different sources using SAS. You will then cover SAS Base and 4GL, understanding data management and analysis, along with exploring SAS functions for data manipulation and transformation. Next, you'll discover SQL procedures and get up to speed on creating and validating queries. In the concluding chapters, you'll learn all about data visualization, right from creating bar charts and sample geographic maps through to assigning patterns and formats. In addition to this, the book will focus on macro programming and its advanced aspects.
By the end of this book, you will be well versed in SAS programming and have the skills you need to easily handle and manage your data-related problems in SAS.
What you will learn
Who this book is for
Budding or experienced data professionals who want to get started with SAS will benefit from this book. Those looking to prepare for the SAS certification exam will also find this book to be a useful resource. Some understanding of basic data management concepts will help you get the most out of this book.
Das E-Book können Sie in Legimi-Apps oder einer beliebigen App lesen, die das folgende Format unterstützen:
Seitenzahl: 306
Veröffentlichungsjahr: 2019
Copyright © 2019 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Screenshots in this book are created with SAS® University Edition software. Copyright 2014, SAS Institute Inc., Cary, NC, USA. All Rights Reserved. Reproduced with permission of SAS Institute Inc., Cary, NC
The modifiers of the SCAN function in this book are taken from SAS® 9.2 Language Reference: Dictionary, Fourth Edition, Copyright 2011, SAS Institute Inc., USA. All Rights Reserved. Reproduced with permission of SAS Institute Inc, Cary, NC
Commissioning Editor:Amey VarangaonkarAcquisition Editor: Yogesh DeokarContent Development Editor: Athikho Sapuni RishanaSenior Editor: Sofi RogersTechnical Editor: Dinesh ChaudharyCopy Editor: Safis EditingProject Coordinator: Kirti PisatProofreader: Safis EditingIndexer: Pratik ShirodkarProduction Designer: Shraddha Falebhai
First published: September 2019
Production reference: 1260919
Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.
ISBN 978-1-78883-982-2
www.packt.com
Packt.com
Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Fully searchable for easy access to vital information
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Harish Gulati is a consultant, analyst, modeler, and trainer based in London. He has 16 years of financial, consulting, and project management experience across leading banks, management consultancies, and media hubs. He enjoys demystifying his complex line of work in his spare time. This has led him to be an author and orator at analytical forums. His published books include SAS for Finance by Packt and Role of a Data Analyst, published by the British Chartered Institute of IT (BCS). He has an MBA in brand communications and a degree in psychology.
Harshil Gandhi is part of SAS India's Consulting team. As a consultant, he provides consulting and implementation services, including requirements gathering, analysis, solution development/implementation, and knowledge transfer. He also assists sales teams with relevant activities and ensures the highest levels of customer satisfaction. Harshil is good at rapidly prototyping solutions thanks to his background in data science (MTech). He is also a visiting scholar at NMIMS Mumbai.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Hands-On SAS For Data Analysis
About Packt
Why subscribe?
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Section 1: SAS Basics
Introduction to SAS Programming
SAS dataset fundamentals
Creating an SAS table
Compile phase
Execution phase
Dataset creation example
SAS programming language – basic syntax
Data step
Proc SQL
SAS LOG
Naming conventions in SAS
Naming conventions for Teradata in SAS
Dataset options
Compression
Encryption
Indexing
SAS operators
Arithmetic operators
Comparison operators
Logical operators
Formats
Formatting to make the data readable
Specifying a format to make it meaningful
Altering the data type
Subsetting datasets
WHERE and IF statements
Using OPTIONS
DROP or KEEP options
Viewing properties
Dictionary tables
Role of _ALL_ and _IN_
Summary
Data Manipulation and Transformation
Length of a variable
Case conversion and alignment
LowCase, PropCase, and UpCase
AnyUpper, AnyLower, and NoTupper
Left and right
String identification
The Scan function
Index, Indexc, and Indexw
Find
Dealing with blanks
Compress, Strip, and Trim
Missing and multiple values
COALESCE and COALESCEC
Interval calculations
INTNX and INTCK
Concatenation
CAT
CATS, CATT, and CATX
LAGS
Logic and control
IFC and IFN
WHICHC or WHICHN
CHOOSEC or CHOOSEN
Number manipulation
CEIL, FLOOR, INT, and ROUND
Summary
Section 2: Merging, Optimizing, and Descriptive Statistics
Combining, Indexing, Encryption, and Compression Techniques Simplified
Introduction to combining
Concatenating
Interleaving
Merging
Updating
Modifying
Concatenation
Different variable length and additional variables
Duplicate values
Different data types
Leveraging the temporary variable
PROC APPEND
Interleaving
Merging
By Matching
Overlapping variables
One-to-many merging
Program data vector
Many-to-many merging
Indexing
Unique values
Missing values
Encryption
Summary
Power of Statistics, Reporting, Transforming Procedures, and Functions
Proc Freq
Cross tabulation
Restricting Proc Freq output
Cross tabulation with a controlling variable
Proc Freq and statistical tests
Proc Univariate
Basic statistics and extreme observations
Tests of normality
Tests for location
Proc Means and Summary
Proc Means
Proc Summary
Proc Corr
Proc REG
Proc Transpose
Summary
Section 3: Advanced Programming
Advanced Programming Techniques - SAS Macros
What are macros?
Macro variable processing
Macro resolution tracking
Macro definition processing
Comparing positional and keywords parameters
Data-driven programming
Leveraging automatic global macro variables
Macros that evaluate
Writing efficient macros
Summary
Powerful Functions, Options, and Automatic Variables Simplified
NOMPREPLACE and MREPLACE
NOMCOMPILE and NCOMPILE
MCOMPILENOTE
NOMEXECNOTE and MEXECNOTE
MAUTOCOMPLOC
MACRO and NOMACRO
Available macro functions 
Exchanging values between the DATA step and macro variables
Choosing between CALL SYMGET and CALL SYMPUT
CALL EXECUTE
Altering the CALL SYMPUT example
Resolving macro variables
Macro variable names within text
Macro variables and libraries
Indirect macro referencing
Series of macro variable references with a single macro call
Multiple ampersands
Macro quoting
Using the %STR quote
Using the %NRSTR quote
Using the %BQOUTE and %NRBQOUTE quotes
Summary
Section 4: SQL in SAS
Advanced Programming Techniques Using PROC SQL
Comparing data steps and Proc SQL
Proc SQL joins
Inner join
Left join
Right join
Full join
One-to-many join
Many-to-many join
Proc SQL essentials
Subsetting
Grouping and summarizing
Dictionary tables
Summary
Deep Dive into PROC SQL
SAS views in Proc SQL
SQL views syntax
Describing views
Optimizing performance using views
Making changes with Proc SQL
Deleting
Altering
Identifying duplicates using Proc SQL
Creating an index in Proc SQL
Macros and Proc SQL
Creating a macro variable using Into
Creating multiple macro variables using Into
Summary
Section 5: Data Visualization and Reporting
Data Visualization
The role of data visualization in analytics
Histograms
Line plots
Vertical and horizontal bar charts
Scatter charts
Box plot
Summary
Reporting and Output Delivery System
Proc Tabulate
Comparing multiple Proc Means and Proc Tabulates
Multiple tables using Proc Tabulate
Choosing the statistics
Formatting the output
Two-dimensional output
Specifying the ODS destination
Formatting ODS files
Multiple sheets
Applying filters
Controlling the print options
Changing the default cells
ODS Excel charts
Color-coding the output
Copying over the formula
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think
SAS is one of the leading enterprise tools in the world today in the fields of data management and analysis. It enables faster, easier processing of data and empowers you to get valuable business insights for effective decision-making.
This book will serve as an all-encompassing, comprehensive guide that you can refer to while preparing for your SAS certification exam. After a quick walk-through of the SAS architecture and components, this book teaches you the different ways to import and read data from different sources using SAS. You will become familiar with SAS Base, the 4GL programming language, and SQL procedures, with comprehensive coverage of topics such as data management and data analysis. You will then move on to learn about the advanced aspects of macro-programming.
By the end of this book, you will be an expert in SAS programming and will be able to handle and manage your data-related problems in SAS with ease.
If you are a data professional who's new to SAS programming and wants to be an expert at it, this is the book for you. Those looking to prepare for the SAS certification exam will also find this book to be a very handy resource. Some understanding of basic data management concepts will help you get the most out of this book.
Chapter 1, Introduction to SAS Programming, introduces programming concepts and instills in you the confidence to write basic SAS programs. We will explore what happens behind the scenes in SAS and thereby ensure that the fundamentals are in place to learn advanced concepts in the book.
Chapter 2, Data Manipulation and Transformation, includes comprehensive coverage of data manipulation, including tasks such as numeric-to-character conversion, handling missing values and blanks, and logic and control functions.
Chapter 3, Combining, Indexing, Encryption, and Compression Techniques Simplified, will focus on understanding the pros and cons of various data table combination techniques. We will explore the pros and cons of techniques using examples and look under the hood to see how SAS processes code.
Chapter 4, Power of Statistics, Reporting, Transforming Procedures, and Functions, looks at built-in SAS procedures that help reduce the coding effort required on your part and provide you with the ability to transform data, produce statistics, run statistical tests, and produce reports.
Chapter 5, Advanced Programming Techniques – SAS Macros, focuses on understanding the concept of loops and SAS macros. This chapter will help you move to advanced programming within SAS.
Chapter 6, Powerful Functions, Options, and Automatic Variables Simplified, focuses on mastering SAS macros using system options and functions that help debug and optimize code.
Chapter 7, Advanced Programming Techniques Using PROC SQL, is all about the PROC SQL procedure. We will start by understanding basic concepts such as Cartesian joins, then explore the pros and cons of using DATA steps over PROC SQL. Using various examples, we will perform multiple data tasks using PROC SQL.
Chapter 8, Deep Dive into PROC SQL, will introduce the unique benefits of combining our understanding of PROC SQL with macros.
Chapter 9, Data Visualization, covers data visualization, which is vital in the world of big data, where visual analysis is key to understanding the insights that are generated from data reporting and data mining. We will look at why visualization is important and how we can produce charts in SAS to help deliver the value that data has to offer effectively.
Chapter 10, Reporting and Output Delivery System, focuses on the packaging and production of data reports and insights in multiple formats and platforms.
B
asic knowledge of
SAS programming is what you need to get the most out of this book.
You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
Log in or register at
www.packt.com
.
Select the
Support
tab.
Click on
Code Downloads
.
Enter the name of the book in the
Search
box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-SAS-For-Data-Analysis. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/9781788839822_ColorImages.pdf.
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "We have now specified the desired length of the Make variable."
A block of code is set as follows:
Data Cars;Length Make $ 15. Default=4;Input Make $ Year;Datalines;Porsche_Cayenne 2018
Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "The process of breaking this information is called tokenization."
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
This part introduces the reader to the SAS environment—writing your first program and providing a glimpse into how SAS works in the background to execute the program. Readers will learn how to manipulate and transform data using a variety of functions.
This section comprises the following chapters:
Chapter 1
,
Introduction to SAS Programming
Chapter 2
,
Data Manipulation and Transformation
In this chapter, we will learn and master basic SAS programming techniques. For the uninitiated of you, this chapter should be a stepping stone to SAS programming. For experienced SAS programmers, this chapter will help you revise some behind the scenes functionalities and tricks of SAS. In either case, this chapter will lay the foundation for how good an advanced SAS programmer you can be. As we progress through this chapter, we will cover the following topics:
SAS dataset fundamentals
SAS programming language—basic syntax
SAS LOG
Dataset options
SAS operators
Formats
Subsetting datasets
The SAS dataset contains values that are organized as rows and columns that can be processed (read/written) by SAS. The dataset can be a data file (table) or view. Either way, a dataset is typically rectangular in format. The dataset has a descriptor portion and data portion. While in the following table, we can only see the column/variable names, the descriptor portion holds further information such as the number of rows (more commonly referred to as observations) in the dataset, date and time of creation, and the operating environment in which it was created. This section is called the data portion, which holds all the data values:
The maximum number of observations that can be counted for a SAS dataset is determined by the long integer data type size for the operating environment. In operating environments with a 32-bit long integer, the maximum number is 231-1 or approximately 2 billion observations (2,147,483,647). In operating environments with a 64-bit long integer, the maximum number is 263-1 or approximately 9.2 quintillion observations. Operating machines with a 32-bit long integer are likely to reach the maximum observation count of 2 billion observations in some real-world scenarios. However, the 64-big long integer machines are unlikely to ever reach the upper limit of observations permitted.
While dealing with SAS data, we are less concerned with the exact number of observations. It doesn't matter whether they are 5 million or 6 million observations. However, it should be much faster to query a 500-observation table compared to one with 5 million observations. The observations merely help in estimating the processing time. Throughout the book, we will learn about programming techniques that will help speed up processing. In this chapter, we will learn about compression.
The aspect more important than observations is the number of records per ID variable. In the cost of living table, we have 15 observations. Each observation is a record of a different city. In this case, the variable city has become an ID variable. In a transactional table of retail sales, you may have hundreds of records for each loyalty card. The multiple records may represent the basket of goods that have been purchased over a period of time. All the records would be linked to a single loyalty card number.
Data is seldom fully populated for each variable. For example, a data table constructed using responses from a questionnaire may have missing responses from a few respondents if the question that's being asked isn't mandatory. This information may not be available for each variable and such instances would be set to missing values in the table. A period (.) represents a missing numeric record, whereas a space (" ") represents a missing character record.
The task that SAS performs to create a table can be categorized into two phases:
Compile
Execute
The following flowchart shows us the detailed compilation and execution process in table creation:
The compile phase is one that is often not well understood by the users as this is the backend processing the output that was generated. Tasks within the compile phase include syntax check, input buffer (not created if reading an existing dataset), program data vector (PDV), and descriptor information:
Syntax check
: In the syntax check task, SAS checks whether the code syntax is correct and then converts the programming statements into machine code to help execute the code. Only if the syntax is correct does SAS proceed with other tasks in the compile phase.
Input buffer
: The input buffer is a logical area in memory in which SAS reads each record of data from a raw data file where the program executes. In the case when a dataset is created from another SAS dataset, an input buffer is not created.
PDV
: This is a logical area of memory where SAS builds the dataset by writing each observation one at a time. Data is read from the input buffer. Values are assigned to the variables in the PDV. The values are written to the dataset as a single observation. There are two automatic variables created in PDV, namely
_N_
and
_ERROR_
.
Both these variables are not part of the output dataset that's created. The
_N_
variable signifies the number of iterations of the data step. The
_ERROR_
variable captures the number of instances in each data step when an error occurs.
Descriptor information
: This contains information about the dataset and includes both the dataset and variable attributes.
In this phase, SAS writes the PDV values to the output dataset for the current observation. The values of the PDV are set to missing. If there are any more records to read, then the program goes back to the top of the data step and executes it again. The next observation is built and stored in the output dataset. This process goes on until there are no more records to read. After this, the dataset is then closed and SAS goes to the next DATA or PROC step (if available) in the program file.
Let's look at the steps in the compile and execution phase while creating the cost of living dataset we showcased in the preceding screenshot. We will run the following program to create the dataset:
DATA COST_LIVING;INPUT City $12. Index Prev_yr_index Housing Food Travel Utility Education Leisure Other;DATALINES;Adelaide 85 83 35 10 10 9 14 10 12Beijing 90 92 40 10 15 10 18 5 2Copenhagen 65 64 25 15 10 10 12 12 16Doha 56 50 30 15 5 10 10 20 10Dubai 75 76 30 16 14 10 20 8 2Dublin 45 43 30 10 8 12 10 15 15Hong Kong 83 88 45 5 10 15 15 9 1Johannesburg 35 40 45 5 5 15 15 10 5Manila 41 42 25 10 15 15 20 10 5Moscow 48 53 40 20 5 5 10 10 10Mumbai 83 85 40 10 15 15 10 9 1Munich 65 64 35 10 10 10 10 10 15 New York 89 85 40 10 15 10 20 5 5Oslo 60 58 25 15 5 5 15 20 15Paris 70 70 30 10 5 10 10 20 15Seoul 73 75 30 10 10 10 15 15 10Singapore 75 74 35 15 10 10 20 5 5Tokyo 87 85 40 15 10 5 15 14 1Zurich 63 61 30 10 10 15 10 10 15;RUN;
In its current form, the program will execute without errors. This is because the first phase of compile that checks for syntax errors will not come across any coding errors. For illustration purposes, we can try and remove the ; after the DATALINES command. The following error will be encountered when we try to run the modified code and no output table will be generated:
Let's review the steps in SAS processing for the preceding date creation program to understand how the PDV is generated. After the syntax check is done, the input buffer and the PDV are created. The PDV contains all the variables that are declared in the input statement. Initially, all the variable values are set to missing. The automatic _N_ and _ERROR_variables are both set to 0:
Input Buffer----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+-
PDV:
City
Index
Prev_yr_index
Housing
Food
Travel
Utility
Education
Leisure
Other
.
.
.
.
.
.
.
.
.
The Cityvariable has been declared as a character variable and the rest of the variables are numeric. The missing character values are written as blanks and the missing numeric values are written as periods. After this stage, the data step executes and the data values are first assigned to the input buffer before being written to the PDV:
Input Buffer----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+-Adelaide 85 83 35 10 10 9 14 10 12
PDV:
City
Index
Prev_yr_index
Housing
Food
Travel
Utility
Education
Leisure
Other
Adelaide
85
83
35
10
10
9
14
10
12
At this point, SAS writes the data values in the PDV to the output dataset. The _N_variable is set to 1 and _ERROR_ is set to 0. The PDV is set to missing values. Since we have more lines of data to read, the program will keep executing:
Input Buffer----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+-Adelaide 85 83 35 10 10 9 14 10 12
PDV:
City
Index
Prev_yr_index
Housing
Food
Travel
Utility
Education
Leisure
Other
.
.
.
.
.
.
.
.
.
For the second line of data, the following values in the input buffer and PDV will be written:
Input Buffer----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+-Beijing 90 92 40 10 15 10 18 5 2
PDV:
City
Index
Prev_yr_index
Housing
Food
Travel
Utility
Education
Leisure
Other
Beijing
90
92
40
10
15
10
18
5
2
The observations will now be written to the dataset and the variable _N_ will be incremented by 1 to take its value to 2. The _ERROR_variable will again be reset to 0 as no errors have been encountered. This process will continue until the time the last data observations have been read by the program and sent to the output dataset.
We used code in the first program to create an output dataset. The dataset was created by what is known as data steps. Data steps are a collection of statements that can help create, modify, and control the output. SAS also leverages Structured Query Language (SQL). Let's review the basic syntax for data steps and SQL within SAS. We will continue to explore more advanced versions of code throughout this book.
The following code represents one of the simplest forms of a data step:
DATA WORK.Air;SET SASHELP.Air;RUN;
Using theSETstatement in this data step, we have specified the dataset that we want to refer to. There are no conditional statements in the program that are selecting a proportion of records from theAirdataset in theSASHELPlibrary. As a result, all the contents of the dataset in the set statement will get copied over to the dataset specified in the data statement. In the set statement, it is necessary to specify a dataset. However, if you specify _LAST_ in theSETstatement, the dataset that was last created in the SAS session will be used instead. Finally, theRuncommand is specified to execute the program. The only instance when the command isn't needed in a data statement is when creating a dataset using anINPUTstatement (as shown in the following code block).
The use of DATA signifies that we are using the data step. In this statement, we specify the output dataset. WORK is what is known as a library in SAS. A library is like a Windows folder that stores files and various other things such as formats and catalogs. Every SAS session (each instance of a SAS software invocation is a separate session) is assigned its own temporary workspace. The temporary workspace is known as the Work library. At the end of the session, the temporary work session is cleared and unless saved in a permanent library, all the contents of the Work library are deleted. If the Work library is not specified, the dataset will be created in the temporary workspace by default.
A permanent library is one that can be assigned using a physical path. In BASE SAS, this can be a physical folder located in the computer drives. In the case of SAS Enterprise Guide Studio and other software, this may be a space on the server. The dataset name consists of two parts—the library name followed by the dataset name. Both are separated by a period. For the creation of datasets in the Work library, users only need to specify the dataset name. If no dataset name is specified, SAS names the dataset as D1, D2, and so on:
DATA;INPUT Id;DATALINES;12;RUN;
Since this is the first program in our SAS session without a dataset name specified, the name D1 will be assigned to the dataset:
We will explore the data step options in further detail throughout this book.
SAS leverages SQL through a built-in procedure. While we aren’t going to focus on SQL in this book, let's look at the basic structure of a SQL query in SAS:
PROC SQL; CREATE TABLE Table_Name AS SELECT FROM WHERE GROUP BY;QUIT;
PROC is the command that's used to specify a built-in procedure in SAS. In the case of the preceding program, we are referring to the SQL procedure. Just like in the data step, we start off by specifying the table name that we are creating. We then list the variables that we want to select from another dataset. The name of the dataset that has been selected is named in the FROM statement. The WHERE clause is used to sub-select the data. The GROUP BY clause is used for summary functions. Finally, we end the procedure by specifying the QUIT argument.
The SAS LOG section of your coding environment is where all the actions performed that have been by the user in the current session are stored. These include instances of program submission and also messages about any programs that you might have terminated while they were executing. Apart from these, the SAS LOG also contains system-generated messages. These are of two types. The first instance is where the SAS version and a few other details about your system are written to the LOG. The second is when responses to the user code are generated and written to the LOG. The response could state that the program has run successfully, failed, or has some syntax issues that have been ignored. The responses are categorized in NOTE, INFO, WARNING, and ERROR categories. Program submission messages can be easily identified in the LOG as they have a line number associated with them.
Let's examine the LOG that's generated after running the first program from the preceding Data step section:
On the left-hand side of the LOG is the line number. Line number 1 contains the default settings that are in place for this SAS session. The log for our program starts getting generated in line 73. From line 73 to line 75, the program that has been specified in the program editor window is replicated in the log. There are no errors being produced in the log, unlike the one shown in the Data creation example section. The note that's produced mentions the number of observations read from the input dataset. It also contains the number of observations and variables in the output dataset. The time it took to execute the query is also mentioned at the end of the notes.
By reading the LOG, the user can review the program executed and notes, warnings, or errors produced, review the summary produced about the input and output dataset, and check query execution time. After the first run, the user may want to modify the program. This could be because the output is not in sync with the intended requirement or an error has been generated. In any case, understanding of the log is required before we can edit the program.
Some of the frequently used functionalities in SAS where naming conventions need to be followed are variables, datasets, formats or informats that are user-created, arrays, labels, macro variables, library names, and file references.
The general rules for SAS names are as follows:
Variable names can be up to 32 characters, whereas some other names such as library names can be up to 8 characters.
SAS isn't case sensitive in name specification, unlike some other programming languages or other statistical packages.
The name cannot start with a number. It needs to start with a letter or an underscore. The second character can be a number or an underscore. No special characters apart from underscore are allowed in the name. Underscores are frequently used by programmers for variable names where multiple words are involved, for example,
Order_Date
,
Payment_Date
, or
Delivery_Date
. A variable name cannot contain a blank. Hence, the underscore becomes an important way to make the variable names more legible for users of your code and data.
In some instances of SAS names for
filerefs
, some special characters are allowed.
Some names are reserved for SAS functions and keywords that are used by the system. For instance, you cannot specify a library name that is the default SAS library associated with your SAS installation. These include SASHELP, SASUSER, and WORK.
SAS already has a macro variable called sysdate. Users shouldn't attempt to create a macro variable with the same name.
The Teradata naming conventions are different from the SAS names. Some of the key aspects are the following:
Unlike SAS, where the name can be up to 32 characters, Teradata names in SAS need to be between 1 and 30 characters.
The name can contain the letters A to Z, numbers, underscores, and also the dollar and pound signs.
You can specify a name in double quotes. That way, it can contain any characters except double quotation marks.
Names in double quotes are not case sensitive.
There are many built-in SAS options that apply to the dataset. The broad purpose of these options is to help us do the following:
Rename variables
Select variables for subsetting
Retain select variables
Specify the password for a dataset, compress it, and encrypt it
Throughout this book, we will be looking at various dataset options. We will begin by exploring the compress, encrypt, and index options.