Hands-On SAS for Data Analysis - Harish Gulati - E-Book

Hands-On SAS for Data Analysis E-Book

Harish Gulati

0,0
46,44 €

-100%
Sammeln Sie Punkte in unserem Gutscheinprogramm und kaufen Sie E-Books und Hörbücher mit bis zu 100% Rabatt.
Mehr erfahren.
Beschreibung

Leverage the full potential of SAS to get unique, actionable insights from your data




Key Features



  • Build enterprise-class data solutions using SAS and become well-versed in SAS programming


  • Work with different data structures, and run SQL queries to manipulate your data


  • Explore essential concepts and techniques with practical examples to confidently pass the SAS certification exam



Book Description



SAS is one of the leading enterprise tools in the world today when it comes to data management and analysis. It enables the fast and easy processing of data and helps you gain valuable business insights for effective decision-making. This book will serve as a comprehensive guide that will prepare you for the SAS certification exam.







After a quick overview of the SAS architecture and components, the book will take you through the different approaches to importing and reading data from different sources using SAS. You will then cover SAS Base and 4GL, understanding data management and analysis, along with exploring SAS functions for data manipulation and transformation. Next, you'll discover SQL procedures and get up to speed on creating and validating queries. In the concluding chapters, you'll learn all about data visualization, right from creating bar charts and sample geographic maps through to assigning patterns and formats. In addition to this, the book will focus on macro programming and its advanced aspects.







By the end of this book, you will be well versed in SAS programming and have the skills you need to easily handle and manage your data-related problems in SAS.




What you will learn



  • Explore a variety of SAS modules and packages for efficient data analysis


  • Use SAS 4GL functions to manipulate, merge, sort, and transform data


  • Gain useful insights into advanced PROC SQL options in SAS to interact with data


  • Get to grips with SAS Macro and define your own macros to share data


  • Discover the different graphical libraries to shape and visualize data with


  • Apply the SAS Output Delivery System to prepare detailed reports



Who this book is for



Budding or experienced data professionals who want to get started with SAS will benefit from this book. Those looking to prepare for the SAS certification exam will also find this book to be a useful resource. Some understanding of basic data management concepts will help you get the most out of this book.

Das E-Book können Sie in Legimi-Apps oder einer beliebigen App lesen, die das folgende Format unterstützen:

EPUB

Seitenzahl: 306

Veröffentlichungsjahr: 2019

Bewertungen
0,0
0
0
0
0
0
Mehr Informationen
Mehr Informationen
Legimi prüft nicht, ob Rezensionen von Nutzern stammen, die den betreffenden Titel tatsächlich gekauft oder gelesen/gehört haben. Wir entfernen aber gefälschte Rezensionen.



Hands-On SAS for Data Analysis

 

 

A practical guide to performing effective queries, data visualization, and reporting techniques

 

 

 

 

 

 

 

 

 

 

 

 

 

Harish Gulati

 

 

 

 

 

 

 

 

 

 

BIRMINGHAM - MUMBAI

Hands-On SAS For Data Analysis

Copyright © 2019 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Screenshots in this book are created with SAS® University Edition software. Copyright 2014, SAS Institute Inc., Cary, NC, USA. All Rights Reserved. Reproduced with permission of SAS Institute Inc., Cary, NC

The modifiers of the SCAN function in this book are taken from SAS® 9.2 Language Reference: Dictionary, Fourth Edition, Copyright 2011, SAS Institute Inc., USA. All Rights Reserved. Reproduced with permission of SAS Institute Inc, Cary, NC

 

 

Commissioning Editor:Amey VarangaonkarAcquisition Editor: Yogesh DeokarContent Development Editor: Athikho Sapuni RishanaSenior Editor: Sofi RogersTechnical Editor: Dinesh ChaudharyCopy Editor: Safis EditingProject Coordinator: Kirti PisatProofreader: Safis EditingIndexer: Pratik ShirodkarProduction Designer: Shraddha Falebhai

First published: September 2019

Production reference: 1260919

Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.

ISBN 978-1-78883-982-2

www.packt.com

 

Packt.com

Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

Improve your learning with Skill Plans built especially for you

Get a free eBook or video every month

Fully searchable for easy access to vital information

Copy and paste, print, and bookmark content

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks. 

Contributors

About the author

Harish Gulati is a consultant, analyst, modeler, and trainer based in London. He has 16 years of financial, consulting, and project management experience across leading banks, management consultancies, and media hubs. He enjoys demystifying his complex line of work in his spare time. This has led him to be an author and orator at analytical forums. His published books include SAS for Finance by Packt and Role of a Data Analyst, published by the British Chartered Institute of IT (BCS). He has an MBA in brand communications and a degree in psychology.

 

About the reviewer

Harshil Gandhi is part of SAS India's Consulting team. As a consultant, he provides consulting and implementation services, including requirements gathering, analysis, solution development/implementation, and knowledge transfer. He also assists sales teams with relevant activities and ensures the highest levels of customer satisfaction. Harshil is good at rapidly prototyping solutions thanks to his background in data science (MTech).  He is also a visiting scholar at NMIMS Mumbai.

 

 

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Table of Contents

Title Page

Copyright and Credits

Hands-On SAS For Data Analysis

About Packt

Why subscribe?

Contributors

About the author

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Section 1: SAS Basics

Introduction to SAS Programming

SAS dataset fundamentals

Creating an SAS table

Compile phase

Execution phase

Dataset creation example

SAS programming language – basic syntax

Data step

Proc SQL

SAS LOG

Naming conventions in SAS

Naming conventions for Teradata in SAS

Dataset options

Compression

Encryption

Indexing

SAS operators

Arithmetic operators

Comparison operators

Logical operators

Formats

Formatting to make the data readable

Specifying a format to make it meaningful

Altering the data type

Subsetting datasets

WHERE and IF statements

Using OPTIONS

DROP or KEEP options

Viewing properties

Dictionary tables

Role of _ALL_ and _IN_

Summary

Data Manipulation and Transformation

Length of a variable

Case conversion and alignment

LowCase, PropCase, and UpCase

AnyUpper, AnyLower, and NoTupper

Left and right

String identification

The Scan function

Index, Indexc, and Indexw

Find

Dealing with blanks

Compress, Strip, and Trim

Missing and multiple values

COALESCE and COALESCEC

Interval calculations

INTNX and INTCK

Concatenation

CAT

CATS, CATT, and CATX

LAGS

Logic and control

IFC and IFN

WHICHC or WHICHN

CHOOSEC or CHOOSEN

Number manipulation

CEIL, FLOOR, INT, and ROUND

Summary

Section 2: Merging, Optimizing, and Descriptive Statistics

Combining, Indexing, Encryption, and Compression Techniques Simplified

Introduction to combining

Concatenating

Interleaving

Merging

Updating

Modifying

Concatenation

Different variable length and additional variables

Duplicate values

Different data types

Leveraging the temporary variable

PROC APPEND

Interleaving

Merging

By Matching

Overlapping variables

One-to-many merging

Program data vector

Many-to-many merging

Indexing

Unique values

Missing values

Encryption

Summary

Power of Statistics, Reporting, Transforming Procedures, and Functions

Proc Freq

Cross tabulation

Restricting Proc Freq output

Cross tabulation with a controlling variable

Proc Freq and statistical tests

Proc Univariate

Basic statistics and extreme observations

Tests of normality

Tests for location

Proc Means and Summary

Proc Means

Proc Summary

Proc Corr

Proc REG

Proc Transpose

Summary

Section 3: Advanced Programming

Advanced Programming Techniques - SAS Macros

What are macros?

Macro variable processing

Macro resolution tracking

Macro definition processing

Comparing positional and keywords parameters

Data-driven programming

Leveraging automatic global macro variables

Macros that evaluate

Writing efficient macros

Summary

Powerful Functions, Options, and Automatic Variables Simplified

NOMPREPLACE and MREPLACE

NOMCOMPILE and NCOMPILE

MCOMPILENOTE

NOMEXECNOTE and MEXECNOTE

MAUTOCOMPLOC

MACRO and NOMACRO

Available macro functions 

Exchanging values between the DATA step and macro variables

Choosing between CALL SYMGET and CALL SYMPUT

CALL EXECUTE

Altering the CALL SYMPUT example

Resolving macro variables

Macro variable names within text

Macro variables and libraries

Indirect macro referencing

Series of macro variable references with a single macro call

Multiple ampersands

Macro quoting

Using the %STR quote

Using the %NRSTR quote

Using the %BQOUTE and %NRBQOUTE quotes

Summary

Section 4: SQL in SAS

Advanced Programming Techniques Using PROC SQL

Comparing data steps and Proc SQL

Proc SQL joins

Inner join

Left join

Right join

Full join

One-to-many join

Many-to-many join

Proc SQL essentials

Subsetting

Grouping and summarizing

Dictionary tables

Summary

Deep Dive into PROC SQL

SAS views in Proc SQL

SQL views syntax

Describing views

Optimizing performance using views

Making changes with Proc SQL

Deleting

Altering

Identifying duplicates using Proc SQL

Creating an index in Proc SQL

Macros and Proc SQL

Creating a macro variable using Into

Creating multiple macro variables using Into

Summary

Section 5: Data Visualization and Reporting

Data Visualization

The role of data visualization in analytics

Histograms

Line plots

Vertical and horizontal bar charts

Scatter charts

Box plot

Summary

Reporting and Output Delivery System

Proc Tabulate

Comparing multiple Proc Means and Proc Tabulates

Multiple tables using Proc Tabulate

Choosing the statistics

Formatting the output

Two-dimensional output

Specifying the ODS destination

Formatting ODS files

Multiple sheets

Applying filters

Controlling the print options

Changing the default cells

ODS Excel charts

Color-coding the output

Copying over the formula

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

Preface

SAS is one of the leading enterprise tools in the world today in the fields of data management and analysis. It enables faster, easier processing of data and empowers you to get valuable business insights for effective decision-making.

This book will serve as an all-encompassing, comprehensive guide that you can refer to while preparing for your SAS certification exam. After a quick walk-through of the SAS architecture and components, this book teaches you the different ways to import and read data from different sources using SAS. You will become familiar with SAS Base, the 4GL programming language, and SQL procedures, with comprehensive coverage of topics such as data management and data analysis. You will then move on to learn about the advanced aspects of macro-programming.

By the end of this book, you will be an expert in SAS programming and will be able to handle and manage your data-related problems in SAS with ease.

Who this book is for

If you are a data professional who's new to SAS programming and wants to be an expert at it, this is the book for you. Those looking to prepare for the SAS certification exam will also find this book to be a very handy resource. Some understanding of basic data management concepts will help you get the most out of this book.

What this book covers

Chapter 1, Introduction to SAS Programming, introduces programming concepts and instills in you the confidence to write basic SAS programs. We will explore what happens behind the scenes in SAS and thereby ensure that the fundamentals are in place to learn advanced concepts in the book.

Chapter 2, Data Manipulation and Transformation, includes comprehensive coverage of data manipulation, including tasks such as numeric-to-character conversion, handling missing values and blanks, and logic and control functions.

Chapter 3, Combining, Indexing, Encryption, and Compression Techniques Simplified, will focus on understanding the pros and cons of various data table combination techniques. We will explore the pros and cons of techniques using examples and look under the hood to see how SAS processes code.

Chapter 4, Power of Statistics, Reporting, Transforming Procedures, and Functions, looks at built-in SAS procedures that help reduce the coding effort required on your part and provide you with the ability to transform data, produce statistics, run statistical tests, and produce reports. 

Chapter 5, Advanced Programming Techniques – SAS Macros, focuses on understanding the concept of loops and SAS macros. This chapter will help you move to advanced programming within SAS.

Chapter 6, Powerful Functions, Options, and Automatic Variables Simplified, focuses on mastering SAS macros using system options and functions that help debug and optimize code.

Chapter 7, Advanced Programming Techniques Using PROC SQL, is all about the PROC SQL procedure. We will start by understanding basic concepts such as Cartesian joins, then explore the pros and cons of using DATA steps over PROC SQL. Using various examples, we will perform multiple data tasks using PROC SQL.

Chapter 8, Deep Dive into PROC SQL, will introduce the unique benefits of combining our understanding of PROC SQL with macros.

Chapter 9, Data Visualization, covers data visualization, which is vital in the world of big data, where visual analysis is key to understanding the insights that are generated from data reporting and data mining. We will look at why visualization is important and how we can produce charts in SAS to help deliver the value that data has to offer effectively.

Chapter 10, Reporting and Output Delivery System, focuses on the packaging and production of data reports and insights in multiple formats and platforms.

To get the most out of this book

B

asic knowledge of 

SAS programming is what you need to get the most out of this book.

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

Log in or register at

www.packt.com

.

Select the

Support

tab.

Click on

Code Downloads

.

Enter the name of the book in the

Search

box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows

Zipeg/iZip/UnRarX for Mac

7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-SAS-For-Data-Analysis. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/9781788839822_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "We have now specified the desired length of the Make variable."

A block of code is set as follows:

Data Cars;Length Make $ 15. Default=4;Input Make $ Year;Datalines;Porsche_Cayenne 2018

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "The process of breaking this information is called tokenization."

Warnings or important notes appear like this.
Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.

Section 1: SAS Basics

This part introduces the reader to the SAS environment—writing your first program and providing a glimpse into how SAS works in the background to execute the program. Readers will learn how to manipulate and transform data using a variety of functions.

This section comprises the following chapters:

Chapter 1

Introduction to SAS Programming

Chapter 2

Data Manipulation and Transformation

Introduction to SAS Programming

In this chapter, we will learn and master basic SAS programming techniques. For the uninitiated of you, this chapter should be a stepping stone to SAS programming. For experienced SAS programmers, this chapter will help you revise some behind the scenes functionalities and tricks of SAS. In either case, this chapter will lay the foundation for how good an advanced SAS programmer you can be. As we progress through this chapter, we will cover the following topics:

SAS dataset fundamentals

SAS programming language—basic syntax

SAS LOG

Dataset options

SAS operators

Formats

Subsetting datasets

SAS dataset fundamentals

The SAS dataset contains values that are organized as rows and columns that can be processed (read/written) by SAS. The dataset can be a data file (table) or view. Either way, a dataset is typically rectangular in format. The dataset has a descriptor portion and data portion. While in the following table, we can only see the column/variable names, the descriptor portion holds further information such as the number of rows (more commonly referred to as observations) in the dataset, date and time of creation, and the operating environment in which it was created. This section is called the data portion, which holds all the data values:

The maximum number of observations that can be counted for a SAS dataset is determined by the long integer data type size for the operating environment. In operating environments with a 32-bit long integer, the maximum number is 231-1 or approximately 2 billion observations (2,147,483,647). In operating environments with a 64-bit long integer, the maximum number is 263-1 or approximately 9.2 quintillion observations. Operating machines with a 32-bit long integer are likely to reach the maximum observation count of 2 billion observations in some real-world scenarios. However, the 64-big long integer machines are unlikely to ever reach the upper limit of observations permitted.

While dealing with SAS data, we are less concerned with the exact number of observations. It doesn't matter whether they are 5 million or 6 million observations. However, it should be much faster to query a 500-observation table compared to one with 5 million observations. The observations merely help in estimating the processing time. Throughout the book, we will learn about programming techniques that will help speed up processing. In this chapter, we will learn about compression.

The aspect more important than observations is the number of records per ID variable. In the cost of living table, we have 15 observations. Each observation is a record of a different city. In this case, the variable city has become an ID variable. In a transactional table of retail sales, you may have hundreds of records for each loyalty card. The multiple records may represent the basket of goods that have been purchased over a period of time. All the records would be linked to a single loyalty card number.

Data is seldom fully populated for each variable. For example, a data table constructed using responses from a questionnaire may have missing responses from a few respondents if the question that's being asked isn't mandatory. This information may not be available for each variable and such instances would be set to missing values in the table. A period (.) represents a missing numeric record, whereas a space (" ") represents a missing character record.

Please remember that a 0 value and missing values aren't the same.

Creating an SAS table

The task that SAS performs to create a table can be categorized into two phases:

Compile

Execute

The following flowchart shows us the detailed compilation and execution process in table creation:

Compile phase

The compile phase is one that is often not well understood by the users as this is the backend processing the output that was generated. Tasks within the compile phase include syntax check, input buffer (not created if reading an existing dataset), program data vector (PDV), and descriptor information:

Syntax check

: In the syntax check task, SAS checks whether the code syntax is correct and then converts the programming statements into machine code to help execute the code. Only if the syntax is correct does SAS proceed with other tasks in the compile phase.

Input buffer

: The input buffer is a logical area in memory in which SAS reads each record of data from a raw data file where the program executes. In the case when a dataset is created from another SAS dataset, an input buffer is not created.

PDV

: This is a logical area of memory where SAS builds the dataset by writing each observation one at a time. Data is read from the input buffer. Values are assigned to the variables in the PDV. The values are written to the dataset as a single observation. There are two automatic variables created in PDV, namely

_N_

and

_ERROR_

.

Both these variables are not part of the output dataset that's created. The

_N_

variable signifies the number of iterations of the data step. The

_ERROR_

variable captures the number of instances in each data step when an error occurs.

Descriptor information

: This contains information about the dataset and includes both the dataset and variable attributes.

Execution phase

In this phase, SAS writes the PDV values to the output dataset for the current observation. The values of the PDV are set to missing. If there are any more records to read, then the program goes back to the top of the data step and executes it again. The next observation is built and stored in the output dataset. This process goes on until there are no more records to read. After this, the dataset is then closed and SAS goes to the next DATA or PROC step (if available) in the program file.

Dataset creation example

Let's look at the steps in the compile and execution phase while creating the cost of living dataset we showcased in the preceding screenshot. We will run the following program to create the dataset:

DATA COST_LIVING;INPUT City $12. Index Prev_yr_index Housing Food Travel Utility Education Leisure Other;DATALINES;Adelaide 85 83 35 10 10 9 14 10 12Beijing 90 92 40 10 15 10 18 5 2Copenhagen 65 64 25 15 10 10 12 12 16Doha 56 50 30 15 5 10 10 20 10Dubai 75 76 30 16 14 10 20 8 2Dublin 45 43 30 10 8 12 10 15 15Hong Kong 83 88 45 5 10 15 15 9 1Johannesburg 35 40 45 5 5 15 15 10 5Manila 41 42 25 10 15 15 20 10 5Moscow 48 53 40 20 5 5 10 10 10Mumbai 83 85 40 10 15 15 10 9 1Munich 65 64 35 10 10 10 10 10 15 New York 89 85 40 10 15 10 20 5 5Oslo 60 58 25 15 5 5 15 20 15Paris 70 70 30 10 5 10 10 20 15Seoul 73 75 30 10 10 10 15 15 10Singapore 75 74 35 15 10 10 20 5 5Tokyo 87 85 40 15 10 5 15 14 1Zurich 63 61 30 10 10 15 10 10 15;RUN;

In its current form, the program will execute without errors. This is because the first phase of compile that checks for syntax errors will not come across any coding errors. For illustration purposes, we can try and remove the ; after the DATALINES command. The following error will be encountered when we try to run the modified code and no output table will be generated:

Let's review the steps in SAS processing for the preceding date creation program to understand how the PDV is generated. After the syntax check is done, the input buffer and the PDV are created. The PDV contains all the variables that are declared in the input statement. Initially, all the variable values are set to missing. The automatic _N_ and _ERROR_variables are both set to 0:

Input Buffer----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+-

PDV:

City

Index

Prev_yr_index

Housing

Food

Travel

Utility

Education

Leisure

Other

.

.

.

.

.

.

.

.

.

 

The Cityvariable has been declared as a character variable and the rest of the variables are numeric. The missing character values are written as blanks and the missing numeric values are written as periods. After this stage, the data step executes and the data values are first assigned to the input buffer before being written to the PDV:

Input Buffer----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+-Adelaide 85 83 35 10 10 9 14 10 12

PDV:

City

Index

Prev_yr_index

Housing

Food

Travel

Utility

Education

Leisure

Other

Adelaide

85

83

35

10

10

9

14

10

12

 

At this point, SAS writes the data values in the PDV to the output dataset. The _N_variable is set to 1 and _ERROR_ is set to 0. The PDV is set to missing values. Since we have more lines of data to read, the program will keep executing:

Input Buffer----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+-Adelaide 85 83 35 10 10 9 14 10 12

PDV:

City

Index

Prev_yr_index

Housing

Food

Travel

Utility

Education

Leisure

Other

 

.

.

.

.

.

.

.

.

.

 

For the second line of data, the following values in the input buffer and PDV will be written:

Input Buffer----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+-Beijing 90 92 40 10 15 10 18 5 2

PDV:

City

Index

Prev_yr_index

Housing

Food

Travel

Utility

Education

Leisure

Other

Beijing

90

92

40

10

15

10

18

5

2

 

The observations will now be written to the dataset and the variable _N_ will be incremented by 1 to take its value to 2. The _ERROR_variable will again be reset to 0 as no errors have been encountered. This process will continue until the time the last data observations have been read by the program and sent to the output dataset.

SAS programming language – basic syntax

We used code in the first program to create an output dataset. The dataset was created by what is known as data steps. Data steps are a collection of statements that can help create, modify, and control the output. SAS also leverages Structured Query Language (SQL). Let's review the basic syntax for data steps and SQL within SAS. We will continue to explore more advanced versions of code throughout this book.

Data step

The following code represents one of the simplest forms of a data step:

DATA WORK.Air;SET SASHELP.Air;RUN;

Using theSETstatement in this data step, we have specified the dataset that we want to refer to. There are no conditional statements in the program that are selecting a proportion of records from theAirdataset in theSASHELPlibrary. As a result, all the contents of the dataset in the set statement will get copied over to the dataset specified in the data statement. In the set statement, it is necessary to specify a dataset. However, if you specify _LAST_ in theSETstatement, the dataset that was last created in the SAS session will be used instead. Finally, theRuncommand is specified to execute the program. The only instance when the command isn't needed in a data statement is when creating a dataset using anINPUTstatement (as shown in the following code block).

The use of DATA signifies that we are using the data step. In this statement, we specify the output dataset. WORK is what is known as a library in SAS. A library is like a Windows folder that stores files and various other things such as formats and catalogs. Every SAS session (each instance of a SAS software invocation is a separate session) is assigned its own temporary workspace. The temporary workspace is known as the Work library. At the end of the session, the temporary work session is cleared and unless saved in a permanent library, all the contents of the Work library are deleted. If the Work library is not specified, the dataset will be created in the temporary workspace by default.

A permanent library is one that can be assigned using a physical path. In BASE SAS, this can be a physical folder located in the computer drives. In the case of SAS Enterprise Guide Studio and other software, this may be a space on the server. The dataset name consists of two parts—the library name followed by the dataset name. Both are separated by a period. For the creation of datasets in the Work library, users only need to specify the dataset name. If no dataset name is specified, SAS names the dataset as D1, D2, and so on:

DATA;INPUT Id;DATALINES;12;RUN;

Since this is the first program in our SAS session without a dataset name specified, the name D1 will be assigned to the dataset:

We will explore the data step options in further detail throughout this book.

Proc SQL

SAS leverages SQL through a built-in procedure. While we aren’t going to focus on SQL in this book, let's look at the basic structure of a SQL query in SAS:

PROC SQL; CREATE TABLE Table_Name AS SELECT FROM WHERE GROUP BY;QUIT;

PROC is the command that's used to specify a built-in procedure in SAS. In the case of the preceding program, we are referring to the SQL procedure. Just like in the data step, we start off by specifying the table name that we are creating. We then list the variables that we want to select from another dataset. The name of the dataset that has been selected is named in the FROM statement. The WHERE clause is used to sub-select the data. The GROUP BY clause is used for summary functions. Finally, we end the procedure by specifying the QUIT argument.

SAS LOG

The SAS LOG section of your coding environment is where all the actions performed that have been by the user in the current session are stored. These include instances of program submission and also messages about any programs that you might have terminated while they were executing. Apart from these, the SAS LOG also contains system-generated messages. These are of two types. The first instance is where the SAS version and a few other details about your system are written to the LOG. The second is when responses to the user code are generated and written to the LOG. The response could state that the program has run successfully, failed, or has some syntax issues that have been ignored. The responses are categorized in NOTE, INFO, WARNING, and ERROR categories. Program submission messages can be easily identified in the LOG as they have a line number associated with them.

Let's examine the LOG that's generated after running the first program from the preceding Data step section:

On the left-hand side of the LOG is the line number. Line number 1 contains the default settings that are in place for this SAS session. The log for our program starts getting generated in line 73. From line 73 to line 75, the program that has been specified in the program editor window is replicated in the log. There are no errors being produced in the log, unlike the one shown in the Data creation example section. The note that's produced mentions the number of observations read from the input dataset. It also contains the number of observations and variables in the output dataset. The time it took to execute the query is also mentioned at the end of the notes.

By reading the LOG, the user can review the program executed and notes, warnings, or errors produced, review the summary produced about the input and output dataset, and check query execution time. After the first run, the user may want to modify the program. This could be because the output is not in sync with the intended requirement or an error has been generated. In any case, understanding of the log is required before we can edit the program.

Naming conventions in SAS

Some of the frequently used functionalities in SAS where naming conventions need to be followed are variables, datasets, formats or informats that are user-created, arrays, labels, macro variables, library names, and file references.

The general rules for SAS names are as follows:

Variable names can be up to 32 characters, whereas some other names such as library names can be up to 8 characters.

SAS isn't case sensitive in name specification, unlike some other programming languages or other statistical packages.

The name cannot start with a number. It needs to start with a letter or an underscore. The second character can be a number or an underscore. No special characters apart from underscore are allowed in the name. Underscores are frequently used by programmers for variable names where multiple words are involved, for example,

Order_Date

,

Payment_Date

, or

Delivery_Date

. A variable name cannot contain a blank. Hence, the underscore becomes an important way to make the variable names more legible for users of your code and data.

In some instances of SAS names for

filerefs

, some special characters are allowed.

Some names are reserved for SAS functions and keywords that are used by the system. For instance, you cannot specify a library name that is the default SAS library associated with your SAS installation. These include SASHELP, SASUSER, and WORK.

SAS already has a macro variable called sysdate. Users shouldn't attempt to create a macro variable with the same name.

The maximum length of arrays, labels, variables, numeric formats, macros, and datasets is 32. Character formats have a length of 31. Character and numeric informats have a length of 30 and 31, respectively. File references and library names have a maximum length of 8.

Naming conventions for Teradata in SAS

The Teradata naming conventions are different from the SAS names. Some of the key aspects are the following:

Unlike SAS, where the name can be up to 32 characters, Teradata names in SAS need to be between 1 and 30 characters.

The name can contain the letters A to Z, numbers, underscores, and also the dollar and pound signs.

You can specify a name in double quotes. That way, it can contain any characters except double quotation marks.

Names in double quotes are not case sensitive.

Dataset options

There are many built-in SAS options that apply to the dataset. The broad purpose of these options is to help us do the following:

Rename variables

Select variables for subsetting

Retain select variables

Specify the password for a dataset, compress it, and encrypt it

Throughout this book, we will be looking at various dataset options. We will begin by exploring the compress, encrypt, and index options.