SAS for Finance - Harish Gulati - E-Book

SAS for Finance E-Book

Harish Gulati

Description

SAS is a groundbreaking tool for advanced predictive and statistical analytics used by top banks and financial corporations to establish insights from their financial data.

SAS for Finance offers you the opportunity to leverage the power of SAS analytics in redefining your data. Packed with real-world examples from leading financial institutions, the author discusses statistical models using time series data to resolve business issues.

This book shows you how to exploit the capabilities of this high-powered package to create clean, accurate financial models. You can easily assess the pros and cons of models to suit your unique business needs.

By the end of this book, you will be able to leverage the true power of SAS to design and develop accurate analytical models to gain deeper insights into your financial data.


Page count: 290




SAS for Finance
Forecasting and data analysis techniques with real-world examples to build powerful financial models
Harish Gulati
BIRMINGHAM - MUMBAI

SAS for Finance

Copyright © 2018 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Amey Varangaonkar
Acquisition Editor: Divya Poojari
Content Development Editor: Amrita Noronha
Technical Editor: Nilesh Sawakhande
Copy Editor: Safis Editing
Project Coordinator: Shweta H Birwatkar
Proofreader: Safis Editing
Indexer: Aishwarya Gangawane
Graphics: Jisha Chirayil
Production Coordinator: Shantanu Zagade

First published: May 2018

Production reference: 1250518

Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.

ISBN 978-1-78862-456-5

www.packtpub.com

mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

Improve your learning with Skill Plans built especially for you

Get a free eBook or video every month

Mapt is fully searchable

Copy and paste, print, and bookmark content

PacktPub.com

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Contributors

About the author

Harish Gulati is a consultant, analyst, modeler, and trainer based in London. He has 15 years' financial, consulting, and project management experience with leading banks, management consultancies, and media hubs. He enjoys demystifying his complex line of work in his spare time. This has led to him being an author and orator at analytical forums. He has also co-authored Role of a Data Analyst, published by the British Chartered Institute of IT (BCS). He has an MBA in brand communications and a degree in psychology and statistics.

About the reviewer

Rashmi Gupta is an entrepreneur and consultant for established media and financial brands in the field of marketing and digital analytics. She is currently the director of Agile Fintech Partners. Artificial intelligence is a subject area that interests her, and she is currently building her expertise in the area.

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Table of Contents

Title Page

Copyright and Credits

SAS for Finance

Packt Upsell

Why subscribe?

PacktPub.com

Contributors

About the author

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

Disclaimer

Time Series Modeling in the Financial Industry

Time series illustration

The importance of time series

Forecasting across industries

Characteristics of time series data

Seasonality

Trend

Outliers and rare events

Disruptions

Challenges in data

Influencer variables

Definition changes

Granularity required

Legacy issues

System differences

Source constraints

Vendor changes

Archiving policy

Good versus bad forecasts

Use of time series in the financial industry

Predicting stock prices and making portfolio decisions

Adhering to Basel norms

Demand planning

Inflation forecasting

Managing customer journeys and maintaining loyalty

Summary

References

Forecasting Stock Prices and Portfolio Decisions using Time Series

Portfolio forecasting

A portfolio demands decisions

Forecasting process

Visualization of time series data

Business case study

Data collection and transformation

Model selection and fitting

Part A – Fit statistics

Part B - Diagnostic plots

Part C - Residual plots

Dealing with multicollinearity

Role of autocorrelation

Scoring based on PROC REG

ARIMA

Validation of models

Model implementation

Recap of key terms

Summary

Credit Risk Management

Risk types

Basel norms

Credit risk key metrics

Exposure at default

Probability of default

Loss given default

Expected loss

Aspects of credit risk management

Basel and regulatory authority guidelines

Governance

Validation

Data

PD model build

Genmod procedure

Proc logistic

Proc Genmod probit

Summary

Budget and Demand Forecasting

The need for the Markov model

Business problem

Markovian model approach

ARIMA model approach

Markov method for imputation

Summary

Inflation Forecasting for Financial Planning

What is inflation?

Reasons for inflation

Inflation outcome and the Phillips curve

Winners and losers

Business case for forecasting inflation

Data-gathering exercise

Modeling methodology

Multivariate regression model

Forward selection model

Backward selection

Maximize R

Univariate model

Summary

Managing Customer Loyalty Using Time Series Data

Advantages of survival modeling

Key aspects of survival analysis

Data structure

Business problem

Data preparation and exploration

Non-parametric procedure analysis

Survival curve for groups

Survival curve and covariates

Parametric procedure analysis

Semi-parametric procedure analysis

Summary

Transforming Time Series – Market Basket and Clustering

Market basket analysis

Segmentation and clustering

MBA business problem

Data preparation for MBA

Assumptions for MBA

Analysis of a set size of two

A segmentation business problem

Segmentation overview

Clustering methodologies

Segmentation suitability in the current scenario

Segmentation modeling

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

Preface

SAS is the world's largest privately held software business that offers an integrated suite of software solutions to manage data, produce reports, and build statistical models.

Who this book is for

The book introduces statistical models in the finance industry in a simplified manner. It has real-world examples supported by data and code that reproduce the models. The chapters explain the relevance of the models to business problems, and the discussions about the diagnostics explain how the models can be implemented. The book uses various graphical illustrations, rather than having a focus on equations, to help the reader understand complex models. The book is designed to be a quick introduction to various modeling techniques by explaining their key concepts.

The intended reader is someone aspiring to work in the financial industry, or one of the many financial industry professionals who want to explore its various facets. The reader could also be a student curious to know how theoretical knowledge is applied in the industry, or a finance professional who wants to up-skill and move on to another role. The book's audience may also include any individual who works as a data analyst, data scientist, data architect, data engineer, analytics and insights professional, business analyst, or someone who integrates the outputs of models in business strategy but isn't aware of how problems are solved.

What this book covers

Chapter 1, Time Series Modeling in the Financial Industry, introduces time series modeling, discusses its importance and the characteristics and challenges of data, and explains its use in the financial industry. The chapter also discusses the way forecasting is used across industries and what is meant by a good or bad forecast.

Chapter 2, Forecasting Stock Prices and Portfolio Decisions using Time Series, discusses the concept of portfolio forecasting and the decisions involved in managing portfolios. After exploring the forecasting process and the visualization of time series data, the chapter discusses modeling techniques and explains how to select the most suitable one based on real-world modeling examples.

Chapter 3, Credit Risk Management, provides context regarding the highly regulated nature of the industry. Basel norms and key terms such as PD, LGD, EAD, and EL are discussed. A PD model build methodology is briefly discussed.

Chapter 4, Budget and Demand Forecasting, helps create an understanding of the Markov model and showcases how to build a model. The chapter goes on to compare the Markov model forecast with ARIMA-generated forecasts. It also explains how Markov Chain Monte Carlo can be used for data imputation.

Chapter 5, Inflation Forecasting for Financial Planning, defines inflation, explores the reasons for inflation, and discusses its outcomes using the theory of the Phillips curve. The chapter also shows how to leverage various procedures for data quality checks. Univariate and multivariate modeling techniques are used for forecasting, and their results are compared.

Chapter 6, Managing Customer Loyalty using Time Series Data, introduces survival modeling, data preparation techniques, and various methodologies, including parametric and semi-parametric methods. It does this in the context of solving a business problem related to customer loyalty.

Chapter 7, Transforming Time Series – Market Basket and Clustering, provides multiple business examples while discussing the background and methodology of these techniques.

To get the most out of this book

Basic knowledge of undergraduate-level mathematics is necessary. However, no advanced mathematical degree is required to decipher how the financial industry uses time series modeling to solve problems. Functional knowledge of SAS is desirable but isn't mandatory.

SAS University Edition is free software that is used throughout the book. Download details can be found at https://www.sas.com/en_gb/software/university-edition.html.

Download the example code files

You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

1. Log in or register at www.packtpub.com.

2. Select the SUPPORT tab.

3. Click on Code Downloads & Errata.

4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows

Zipeg/iZip/UnRarX for Mac

7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/SAS-for-Finance. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/PacktPublishing/SASforFinance_ColorImages.pdf.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.

Disclaimer

SAS Institute Inc. hereby grants the author permission to use screenshots of SAS output using SAS® University Edition software. It is with the understanding that the data produced will be customized/provided by the author.

Created with SAS® University Edition software. Copyright 2014, SAS Institute Inc., Cary, NC, USA. All Rights Reserved. Reproduced with permission of SAS Institute Inc., Cary, NC.

Time Series Modeling in the Financial Industry

A space center is monitoring the weather pattern to schedule a departure time for its latest Martian explorer. An economist is readying his gross domestic product (GDP) forecasts to be used by equity traders, who are eager to know if we had a quarter of growth or another economic contraction. In both cases, they are relying on time series data: in the former instance to forecast a weather event, and in the latter to determine in which direction GDP forecasts are headed. So, what do we mean by time series?

A series can be defined as a number of events, objects, or people of a similar or related kind coming one after another; if we add the dimension of time, we get a time series. A time series can be defined as a series of data points in time order. For example, the space center will use data from the last few years to predict the weather pattern. The data collection would have started a few years ago, and subsequent data points would have given rise to the order in which the data has been collected. Another aspect of the data that we usually observe is periodicity. For example, weather data would usually be collected daily, if not hourly. The periodicity of time series data is a slow-moving dimension, as it seldom changes. The periodicity of recording observations is broadly driven by three factors: relevance, behavior, and purpose. In the case of weather patterns, we probably need to know how the weather will change over the course of the day. The point of sale (POS) data from an individual's debit card transactions will be recorded every time the card is used. GDP data, however, is usually aggregated in a time series format every quarter, as these numbers are usually reported on a quarterly basis by central banks or related institutions.
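To make the idea of periodicity concrete, the following is a minimal sketch of rolling event-level observations up to a quarterly series. It is written in plain Python rather than SAS, and the function names (quarter_of, aggregate_by_quarter) and the transaction values are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from datetime import date


def quarter_of(d: date) -> str:
    """Return a label such as '2017Q3' for the calendar quarter containing d."""
    return f"{d.year}Q{(d.month - 1) // 3 + 1}"


def aggregate_by_quarter(observations):
    """Sum (date, value) pairs into one total per calendar quarter."""
    totals = defaultdict(float)
    for day, value in observations:
        totals[quarter_of(day)] += value
    # Labels sort chronologically for four-digit years, so this gives time order.
    return dict(sorted(totals.items()))


# Hypothetical card transactions: one record every time the card is used.
transactions = [
    (date(2017, 1, 5), 20.0),
    (date(2017, 2, 17), 35.5),
    (date(2017, 4, 2), 12.0),
    (date(2017, 9, 30), 8.5),
]
quarterly = aggregate_by_quarter(transactions)
```

The same event-level data could equally be aggregated by day or month; the choice depends on the relevance, behavior, and purpose factors described above.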

In this chapter, we will explore the following topics:

Time series illustration

The importance of time series

Forecasting across industries

Characteristics of time series data

Challenges in data

Good versus bad forecasts

The use of time series in the financial industry

Time series illustration

The following graph shows the quarterly GDP growth of one of Europe's leading economies. The series has been compiled at a quarterly level, and all data points from 2005 to Q3 in 2017 have been used to plot the graph. We can see that there was a decline in GDP between Q3 in 2006 and Q1 in 2009, but GDP has primarily seen an upward trajectory since then, as follows:

Figure 1.1: GDP quarterly growth of a leading European economy

The importance of time series

What importance, if any, does time series have and how will it be relevant in the future? These are just a couple of fundamental questions that any user should find answers to before delving further into the subject. Let's try to answer this by posing a question. Have you heard the terms big data, artificial intelligence (AI), and machine learning (ML)?

These three terms make learning time series analysis relevant. Big data is primarily about a large amount of data that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interaction. AI is a kind of technology that is being developed by data scientists, computational experts, and others to enable processes to become more intelligent, while ML is an enabler that is helping to implement AI. All three of these terms are interlinked with the data they use, and a lot of this data is time series in nature. This could be either financial transaction data, the behavior pattern of individuals during various parts of the day, or related to life events that we might experience. An effective mechanism that enables us to capture the data, store it, analyze it, and then build algorithms to predict transactions, behavior (and life events, in this instance) will depend on how big data is utilized and how AI and ML are leveraged.

A common perception in the industry is that time series data is used for forecasting only. In practice, time series data is used for:

Pattern recognition

Forecasting

Benchmarking

Evaluating the influence of a single factor on the time series

Quality control

For example, a retailer may identify a pattern in clothing sales every time it gets a celebrity endorsement, or an analyst may decide to use car sales volume data from 2012 to 2017 to set a selling benchmark in units. An analyst might also build a model to quantify the effect of Lehman's crash at the height of the 2008 financial crisis in pushing up the price of gold. Variance in the success of treatments across time periods can also be used to highlight a problem, the tracking of which may enable a hospital to take remedial measures. These are just some of the examples that showcase how time series analysis isn't limited to just forecasting. In this chapter, we will review how the financial industry and others use forecasting, discuss what a good and a bad forecast is, and hope to understand the characteristics of time series data and its associated problems.

Forecasting across industries

Since one of the primary uses of time series data is forecasting, it's wise that we learn about some of its fundamental properties. To understand what the industry means by forecasting and the steps involved, let's visit a common misconception about the financial industry: only lending activities require forecasting. We need forecasting in order to grant personal loans, mortgages, overdrafts, or simply assess someone's eligibility for a credit card, as the industry uses forecasting to assess a borrower's affordability and their willingness to repay the debt. Even deposit products such as savings accounts, fixed-term savings, and bonds are priced based on some forecasts. How we forecast and the rationale for that methodology are different in borrowing and lending cases, however. All of these areas are related to time series, as we inevitably end up using time series data as part of the overall analysis that drives financial decisions. Let's understand the forecasts involved here a bit better. When we are assessing an individual's lending needs and limits, we are forecasting for a single person yet comparing the individual to a pool of good and bad customers who have been offered similar products. We are also assessing the individual's financial circumstances and behavior through industry-available scoring models or by assessing their past behavior, with the financial provider assessing the lending criteria.

In the case of deposit products, as long as the customer is eligible to transact (can open an account and has passed know your customer (KYC), anti-money laundering (AML), and other checks), financial institutions don't perform forecasting at an individual level. However, the behavior of a particular customer is primarily driven by the interest rate offered by the financial institution. The interest rate, in turn, is driven by the forecasts the financial institution has done to assess its overall treasury position. The treasury is the department that manages the bank's central pool of money and is responsible for ensuring that all departments are funded; that funding is generated by lending and by attracting deposits at a lower rate than the rate at which the bank lends. The treasury forecasts its requirements for lending and deposits, while various teams within the treasury adhere to those limits. Therefore, a pricing manager for a deposit product will price the product in such a way that the product will attract enough deposits to meet the forecasted targets shared by the treasury; the pricing manager also has to ensure that those targets aren't overshot by a significant margin, as the treasury only expects to manage a forecasted target.

In both lending and deposit decisions, financial institutions do tend to use forecasting. A lot of these forecasts are interlinked, as we saw in the example of the treasury's expectations and the subsequent pricing decision for a deposit product. To decide on its future lending and borrowing positions, the treasury must have used time series data to determine what the potential business appetite for lending and borrowing in the market is, and would have assessed that against the current cash flow situation within the relevant teams and institutions.

Characteristics of time series data

Any time series analysis has to take into account the following factors:

Seasonality

Trend

Outliers and rare events

Disruptions and step changes

Seasonality

Seasonality is a phenomenon that occurs each calendar year. The same behavior can be observed each year. A good forecasting model will be able to incorporate the effect of seasonality in its forecasts. Christmas is a great example of seasonality, where retailers have come to expect higher sales over the festive period.

Seasonality can extend into months but is usually only observed over days or weeks. When looking at time series where the periodicity is hours, you may find a seasonality effect for certain hours of the day. Some of the reasons for seasonality include holidays, climate, and changes in social habits. For example, travel companies usually run far fewer services on Christmas Day, citing a lack of demand. During most holidays people love to travel, but this lack of demand on Christmas Day could be attributed to social habits, where people tend to stay at home or have already traveled. Social habit therefore becomes a driving factor in the seasonality of journeys undertaken on Christmas Day.

It's easier for the forecaster when a particular seasonal event occurs on a fixed calendar date each year; the issue comes when some popular holidays depend on lunar movement, such as Easter, Diwali, and Eid. These holidays may occur in different weeks or months over the years, which will shift the seasonality effect. Also, if some holidays fall closer to other holiday periods, it may lead to individuals taking extended holidays and travel sales may increase more than expected in such years. The coffee shop near the office may also experience lower sales for a longer period. Changes in the weather can also impact seasonality; for example, a longer, warmer summer may be welcome in the UK, but this would impact retail sales in the autumn as most shoppers wouldn't need to buy a new wardrobe. In hotter countries, sales of air-conditioners would increase substantially compared to the summer months' usual seasonality. Forecasters could offset this unpredictability in seasonality by building in a weather forecast variable. We will explore similar challenges in the chapters ahead.

Seasonality shouldn't be confused with a cyclic effect. A cyclic effect is observed over a longer period of generally two years or more. The property sector is often associated with having a cyclic effect, where it has long periods of growth or slowdown before the cycle continues.
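One simple way to quantify seasonality is a seasonal index: the average for each season divided by the overall average. The sketch below is plain Python rather than SAS, uses hypothetical quarterly figures, and assumes the series has no trend; with a trending series the trend would need to be removed first.

```python
from statistics import mean


def seasonal_indices(values, period):
    """Ratio of each season's average to the overall average.

    values: observations in time order, covering whole cycles only.
    period: number of observations per cycle (4 for quarterly data).
    """
    if len(values) % period != 0:
        raise ValueError("values must cover whole cycles")
    overall = mean(values)
    # values[i::period] picks out the same season from every cycle.
    return [mean(values[i::period]) / overall for i in range(period)]


# Hypothetical quarterly sales for three years, ordered Q1..Q4 within each year.
sales = [100, 100, 100, 120,
         100, 100, 100, 120,
         100, 100, 100, 120]
indices = seasonal_indices(sales, period=4)
```

An index above 1 marks a season that runs above the average level (Q4 in this made-up data), and an index below 1 marks a season that runs below it.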

Trend

A trend is merely a long-term direction of observed behavior that is found by plotting data against a time component. A trend may indicate an increase or decrease in behavior. Trends may not even be linear, but a broad movement can be identified by analyzing plotted data.
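As a rough illustration, the broad direction of a series can be summarized by fitting a straight line to the observations against their position in time. This is a minimal Python sketch with hypothetical data; the function name linear_trend is an assumption, and a straight line is only one possible summary, since trends need not be linear.

```python
def linear_trend(values):
    """Least-squares slope and intercept of values against time 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


# Hypothetical observations in time order.
series = [10.0, 12.0, 11.0, 14.0, 15.0, 17.0]
slope, intercept = linear_trend(series)
```

A positive slope suggests an increase in the observed behavior over the period, and a negative slope suggests a decrease.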

Outliers and rare events

Outliers and rare events are terms that are often used interchangeably by businesses. These concepts can have a big impact on data, and some sort of outlier treatment is usually applied to data before it is used for modeling. It is almost impossible to predict an outlier or rare event, but they do affect a trend. An example of an outlier could be a customer walking into a branch to deposit an amount that is 100 times the daily average of that branch. In this case, the forecaster wouldn't expect that trend to continue.
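A minimal sketch of one possible outlier treatment follows, in Python with hypothetical branch deposits. It flags any value that is many times the median; the median is used instead of the mean because the outlier itself pulls the mean upward. The threshold of 10 times the median is an assumption for illustration, not a recommended rule.

```python
from statistics import median


def flag_outliers(values, ratio=10.0):
    """Return the positions of values more than `ratio` times the median."""
    typical = median(values)
    return [i for i, v in enumerate(values) if v > ratio * typical]


# Hypothetical daily deposits at one branch; one day is about 100 times the norm.
daily_deposits = [980, 1020, 1000, 995, 100000, 1010, 990]
outlier_days = flag_outliers(daily_deposits)

# One simple treatment: exclude the flagged days before modeling.
cleaned = [v for i, v in enumerate(daily_deposits) if i not in outlier_days]
```

Whether flagged observations are excluded, capped, or kept with an explanatory note is a modeling decision that depends on the business context.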

Disruptions

Disruptions and step changes are becoming more common in time series data. One reason for this is the abundance of available data and the growing ability to store and analyze it. Disruptions could include instances when a business hasn't been able to trade as normal. Flooding at the local pub may lead to reduced sales for a few days, for example. While analyzing daily sales across a pub chain, an analyst may have to make note of a disruptive event and its impact on the chain's revenue. Step changes are also more common now due to technological shifts, mergers and acquisitions, and business process re-engineering. When two companies announce a merger, they often try to sync their data. They might have been selling x and y quantities individually, but after the merger will expect to sell x + y + c (where c is the positive or negative effect of the merger). Over time, when someone plots sales data in this case, they will probably spot a step change in sales that happened around the time of the merger, as shown in the following screenshot:

Figure 1.2: Online travel booking chart showing characteristics of time series

In the trend graph, we can see that online travel bookings are increasing. In the step change and disruptions chart, we can see that Q1 of 2012 saw a substantive increase in bookings, whereas Q1 of 2014 saw a substantive dip. The increase was due to the merger of two companies that took place in Q1 of 2012. The decrease in Q1 of 2014 was attributed to prolonged snow storms in Europe and the ash cloud disruption from volcanic activity over Iceland. While online bookings kept increasing after the step change, the disruption caused by the snow storm and ash cloud only had an effect on sales in Q1 of 2014. In this case, the modeler will have to treat the merger and the disruption differently while using them in the forecast, as disruption could be disregarded as an outlier and treated accordingly. Also note that the seasonality chart shows that Q4 of each year sees almost a 20% increase in travel bookings, and this pattern continues each calendar year.
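The difference between a step change and a one-off disruption can be sketched with two indicator series: one that switches on permanently from the merger quarter, and one that is on only in the disrupted quarter. The Python below is an illustration under stated assumptions: all figures are hypothetical, the function names are invented for this example, the series is assumed to start in Q1 of 2011, and the estimate of c ignores trend and seasonality.

```python
from statistics import mean


def step_indicator(n, start):
    """0 before position `start`, 1 from `start` onward (a permanent shift)."""
    return [1 if i >= start else 0 for i in range(n)]


def pulse_indicator(n, at):
    """1 only at position `at` (a one-off disruption), 0 elsewhere."""
    return [1 if i == at else 0 for i in range(n)]


def merger_effect(sales_x, sales_y, combined_sales):
    """Estimate c in 'x + y + c' as average post-merger sales minus the
    sum of the two companies' average pre-merger sales."""
    return mean(combined_sales) - (mean(sales_x) + mean(sales_y))


# Hypothetical quarterly series of 16 quarters, assumed to start in Q1 2011.
n_quarters = 16
merger = step_indicator(n_quarters, start=4)     # Q1 2012 onward
disruption = pulse_indicator(n_quarters, at=12)  # Q1 2014 only

# Hypothetical sales before and after the merger.
pre_x = [100, 102, 98]
pre_y = [50, 49, 51]
post = [160, 161, 159]
c = merger_effect(pre_x, pre_y, post)
```

Indicators like these could be supplied to a model as explanatory variables, so that the merger is carried forward into the forecast while the disruption is confined to the single quarter in which it occurred.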

Challenges in data

If your client says they have an abundance of good quality data, be sure to take it with a pinch of salt. Data collection and processing are cost- and time-intensive tasks. There is always a chance that some data within the organization may not be of as high a quality as another data set. The problems in time series are often compounded by the time element. Due to challenges in data, organizations need to recalculate metrics and make changes to historical data, source additional data and build a mechanism to store it, as well as reconcile data when there are different definitions of data or a new data source doesn't reconcile with the previous sources. Data processing and collection may not be difficult for an organization if the requirement was for data going forward; historical data recalibration across a large time period, on the other hand, will present some challenges, as shown in the following diagram:

Figure 1.3: Challenges in data

Influencer variables

The relationship between the forecasted (dependent) variable and other influencer or independent variables changes over a period of time. For example, most forecasting models weren't able to predict the global economic crash of 2008. Post-crash, modelers in a leading bank tried to rebuild their models with new variables that would be able to predict behavior better. Some of these new influencer variables weren't available in the central database. A vendor was selected to provide history and a continuing data feed to enable the availability of such variables in the future. In this case, since the influencer variables changed, the modeler had to look outside the scope of available variables in the central database and try to find better fitting variables.

Definition changes

A financial institution recently changed its definition of defaulting as it moved from a standard approach to an advanced, internal rating-based approach to potentially reduce its capital requirements and credit risk. The change to the Basel definition meant that the institution's entire historical database needed to be modified. In this case, the new default definition in the institution is calculated using at least six other variables that need to be stored and checked for data quality across several years.

Granularity required

After changing influencer variables and running models for a couple of years, a modeler was informed by a data team that the central bank is planning to stop providing granular data for one of the modeling variables and that the metric would still be published but in an aggregated manner. This may impact the usability of such data for modeling purposes. A change in a variable in a regulatory environment has multiple overheads. In this scenario, the modeler would have to engage with the data team to understand which variable can be used as a substitute. The IT team would then have to ensure that the variable (if not already available as a regular feed) is made available to the data team and the modeler. The material impact of the change in the variable on the modeling output would also have to be studied and documented. There might be an instance where a modeling governance team has to be notified of changes in the model. In an ideal governance environment, if code changes are needed to accommodate a new variable, testing would be undertaken before any change is implemented. A simple change in variable granularity can trigger the need for multiple subsequent tasks.

Legacy issues

Legacy issues may mean that some amount of information isn't available in the central database. This could be due to the fact that a system was upgraded recently and didn't have the capability to capture some data.

System differences

System differences arise because of a user's behavior or the way systems process data. Users of a telephone banking customer relationship management (CRM) system, for example, may not be capturing the income details of callers, whereas branch officers using a different CRM frontend will be. The branch data may only be sparsely available (only when the customer divulges income), but has the potential to be more accurate.

Source constraints

Source constraints could arise simply because of the way some systems are designed. A system designed to store data at a customer level will not be able to efficiently store data that has been aggregated at an account level, for example. Vendor changes may impact data quality or the frequency of when data is available. Organizations also have differing archival policies, so if a modeler is looking to use time series data that goes as far back as a decade, some of this data may have already been archived and retrieval may be a time-consuming affair.

Vendor changes

Most organizations end up using a particular vendor for a long period of time. In some instances, it is the bank's transactional data system, in others the CRM tool or data mining software. In all of these cases, there is dependency on a vendor. Most vendors would like to develop a relationship with their client and grow alongside them, where contracts would be re-negotiated but with greater functionality and scalable software provided. There are times, however, when a client and a vendor decide to part ways. Be mindful that this can be a painful exercise and a lot of things can go wrong. Such a transition might lead to temporary or long-term breaks in data; some systems may be switched off and the new system replacing it might not be able to capture data in the same manner. A vendor who supplied customer leads or risk ratings may no longer be contracted and data quality may even suffer from a new vendor. Businesses and modelers need to be aware of these challenges.

Archiving policy