The Excel Analyst's Guide to Access - Michael Alexander - E-Book

The Excel Analyst's Guide to Access E-Book

Michael Alexander

0,0
25,99 €

-100%
Sammeln Sie Punkte in unserem Gutscheinprogramm und kaufen Sie E-Books und Hörbücher mit bis zu 100% Rabatt.
Mehr erfahren.
Beschreibung

The ultimate handbook for Excel analysts who need reporting solutions using Access Excel and Access are intended to work together. This book offers a comprehensive review of the extensive analytical and reporting functionality that Access provides and how it enhances Excel reporting functions. Sales managers, operations analysts, administrative assistants, office managers, and many others who rely heavily on data can benefit from learning to integrate Excel and Access, and this book shows you how. Coverage includes: * Data Analysis in Access & the Basics of Access * Beyond Select Queries * Transforming Your Data with Access * Working with Calculations and Dates * Performing Conditional Analysis * Adding Dimension with Subqueries and Domain Aggregate Functions * Running Descriptive Statistics in Access * Scheduling and Running Batch Analysis * Leveraging VBA to Enhance Data Analysis * Reports, Dashboards, and Visualization in Access * Presenting Data with Access Reports * Using Pivot Tables and Pivot Charts in Access * Enhancing Queries and Reports with Visualizations * Advanced Excel and Access Integration Techniques * Getting Access Data into Excel * Using VBA to Move Data between Excel and Access * Exploring Excel and Access Automation * Integrating Excel and Access with XML * Integrating Excel and Other Office Applications * Access VBA Fundamentals * Understanding and Using SQL * and more! The Excel Analyst's Guide to Access helps you get more from both applications. Note: CD-ROM/DVD and other supplementary materials are not included as part of eBook file.

Sie lesen das E-Book in den Legimi-Apps auf:

Android
iOS
von Legimi
zertifizierten E-Readern

Seitenzahl: 680

Bewertungen
0,0
0
0
0
0
0
Mehr Informationen
Mehr Informationen
Legimi prüft nicht, ob Rezensionen von Nutzern stammen, die den betreffenden Titel tatsächlich gekauft oder gelesen/gehört haben. Wir entfernen aber gefälschte Rezensionen.



Table of Contents

Title Page

Copyright

Dedication

About the Author

Credits

Acknowledgments

Introduction

What to Expect from This Book

What Not to Expect from This Book

Skills Required for This Book

How This Book is Organized

Conventions

Companion Database

Part I: Fundamentals of Data Analysis in Access

Chapter 1: The Case for Data Analysis in Access

Where Data Analysis with Excel Can Go Wrong

Deciding Whether to Use Access or Excel

An Excel User's Guide to Access: Don't Panic!

Summary

Chapter 2: Access Basics

Access Tables

Understanding the Relational Database Concept

Query Basics

The Top Ten Query Errors

Summary

Chapter 3: Beyond Select Queries

Aggregate Queries

Action Queries

Crosstab Queries

Summary

Part II: Basic Analysis Techniques

Chapter 4: Transforming Your Data with Access

Finding and Removing Duplicate Records

Common Transformation Tasks

Summary

Chapter 5: Working with Calculations and Dates

Using Calculations in Your Analysis

Using Dates in Your Analysis

The New Calculated Data Type

Summary

Chapter 6: Performing Conditional Analysis

Using Parameter Queries

Using Conditional Functions

Summary

Part III: Advanced Analysis Techniques

Chapter 7: Adding Dimension with Subqueries and Domain Aggregate Functions

Enhancing Your Analysis with Subqueries

Domain Aggregate Functions

Summary

Chapter 8: Running Descriptive Statistics in Access

Basic Descriptive Statistics

Advanced Descriptive Statistics

Summary

Chapter 9: Scheduling and Running Batch Analysis

Introduction to Access Macros

Setting Up and Managing Batch Analysis

Scheduling Macros to Run Nightly

Summary

Chapter 10: Leveraging VBA to Enhance Data Analysis

Creating and Using Custom Functions

Controlling Analytical Processes with Forms

Processing Data Behind the Scenes

Summary

Part IV: Reports, Dashboards, and Visualization in Access

Chapter 11: Presenting Data with Access Reports

Access Report Basics

Creating and Modifying Grouped Reports

Creating a Report from Scratch

Summary

Chapter 12: Using Pivot Tables and Pivot Charts in Access

Pivot Tables in Access?

The Anatomy of a Pivot Table

Creating a Basic Pivot Table

Creating an Advanced Pivot Table with Details

Saving Your Pivot Table

Sending Your Access Pivot Table to Excel

Pivot Table Options

Working with Pivot Charts in Access

Summary

Chapter 13: Enhancing Queries and Reports with Visualizations

Basic Visualization Techniques

Advanced Visualization Techniques

Summary

Part V: Advanced Excel and Access Integration Techniques

Chapter 14: Getting Access Data into Excel

The Different Methods for Importing Access Data

Summary

Chapter 15: Using VBA to Move Data between Excel and Access

Understanding ADO Fundamentals

Common Scenarios Where VBA Can Help

Summary

Chapter 16: Exploring Excel and Access Automation

Understanding the Concept of Binding

Automating Excel from Access

Automating Access from Excel

Summary

Chapter 17: Integrating Excel and Access with XML

Why XML?

Understanding XML

Creating a Simple Reporting Solution with XML

Creating a Data Entry Process Using XML

Summary

Chapter 18: Integrating Excel and Other Office Applications

Integrating Excel with Microsoft Word

Integrating Excel with PowerPoint

Integrating Excel and Outlook

Summary

Part VI: Appendixes

Appendix A: Access VBA Fundamentals

Covering the Basics in 10 Steps

Letting Access Teach You VBA

Appendix B: Understanding and Using SQL

Understanding Basic SQL

Getting Fancy with Advanced SQL Statements

Using SQL Specific Queries

Appendix C: Query Performance, Database Corruption, and Other Thoughts

Optimizing Query Performance

Handling Database Corruption

Getting Help in Access

Appendix D: Data Analyst's Function Reference

Abs

Asc

Atn

Choose

Chr

Cos

Date

DateAdd

DateDiff

DatePart

DateSerial

DateValue

Day

DDB

Domain Aggregate Functions

Exp

FormatCurrency

FormatDateTime

FormatNumber

FormatPercent

FV

Hour

IIf

InStr

InStrRev

IPmt

IRR

IsError

IsNull

IsNumeric

LCase

Left

Len

Log

Mid

Minute

MIRR

Month

MonthName

Now

NPer

NPV

NZ

Partition

Pmt

PPmt

PV

Rate

Replace

Right

Rnd

Round

Second

Sgn

Sin

SLN

Space

SQL Aggregate Functions

Sqr

Str

StrConv

String

StrReverse

Switch

SYD

Tan

Time

TimeSerial

TimeValue

Trim, LTrim, RTrim

TypeName

UCase

Val

VarType

Weekday

WeekdayName

Year

Index

The Excel® Analyst's Guide to Access®

Published by

Wiley Publishing, Inc.

10475 Crosspoint Boulevard

Indianapolis, IN 46256

www.wiley.com

Copyright © 2010 by Wiley Publishing, Inc., Indianapolis, Indiana

Published simultaneously in Canada

ISBN: 978-0-470-56701-2

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation warranties of fitness for a particular purpose. No warranty may be created or extended by sales or promotional materials. The advice and strategies contained herein may not be suitable for every situation. This work is sold with the understanding that the publisher is not engaged in rendering legal, accounting, or other professional services. If professional assistance is required, the services of a competent professional person should be sought. Neither the publisher nor the author shall be liable for damages arising herefrom. The fact that an organization or Web site is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or Web site may provide or recommendations it may make. Further, readers should be aware that Internet Web sites listed in this work may have changed or disappeared between when this work was written and when it is read.

For general information on our other products and services please contact our Customer Care Department within the United States at (877) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Library of Congress Control Number: 2010922044

Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates, in the United States and other countries, and may not be used without written permission. Excel and Access are registered trademarks of Microsoft Corporation in the United States and/or other countries. All other trademarks are the property of their respective owners. Wiley Publishing, Inc. is not associated with any product or vendor mentioned in this book.

This is dedicated to the fans of DataPigTechnologies.com…all 12 of you.

About the Author

Mike Alexander is a Microsoft Certified Application Developer (MCAD) and author of several books on advanced business analysis with Microsoft Access and Excel. He has more than 15 years experience consulting and developing Office solutions. Michael has been named a Microsoft MVP for his ongoing contributions to the Excel community.

In his spare time he runs a free tutorial site, www.datapigtechnologies.com, where he shares basic Access and Excel tips to the Office community.

Credits

Executive Editor

Carol Long

Project Editor

Maureen Spears

Technical Editor

Dick Kusleika

Production Editor

Kathleen Wisor

Copy Editor

C.M. Jones

Editorial Director

Robyn B. Siesky

Editorial Manager

Mary Beth Wakefield

Marketing Manager

Ashley Zurcher

Production Manager

Tim Tate

Vice President and Executive Group Publisher

Richard Swadley

Vice President and Executive Publisher

Barry Pruett

Associate Publisher

Jim Minatel

Project Coordinator, Cover

Lynsey Stanford

Proofreader

Sheilah Ledwidge, Word One

Indexer

Robert Swanson

Cover Designer

Ryan Sneed

Acknowledgments

My deepest thanks to Carol Long and Maureen Spears, for all the hours of work put into bringing this book to life. Thanks also to Dick Kusleika (one of the best tech editors I've worked with) for suggesting numerous improvements to the examples and text in this book. Finally, a special thank you goes out to the wife and kids for putting up with all the time spent locked away on this project.

Introduction

If you were to ask a random sampling of people what data analysis is, most would say that it is the process of calculating and summarizing data to get an answer to a question. In one sense, they are correct. However, the actions they are describing represent only a small part of the process known as data analysis.

For example, if you were asked to analyze how much revenue in sales your company made last month, what would you have to do in order to complete that analysis? You would just calculate and summarize the sales for the month, right? Well, where would you get the sales data? Where would you store the data? Would you have to clean up the data when you got it? How would you present your analysis: by week, by day, by location? The point being made here is that the process of data analysis is made up of more than just calculating and summarizing data.

A more representative definition of data analysis is the process of systematically collecting, transforming, and analyzing data in order to present meaningful conclusions. To better understand this concept, think of data analysis as a process that encapsulates four fundamental actions: collection, transformation, analysis, and presentation.

Collection. Collection encompasses the gathering and storing of data—that is, where you obtain your data, how you will receive your data, how you will store your data, and how you will access your data when it comes time to perform some analysis.Transformation. Transformation is the process of ensuring your data is uniform in structure, free from redundancy, and stable. This generally entails things like establishing a table structure, cleaning text, removing blanks, and standardizing data fields.Analysis. Analysis is the investigation of the component parts of your data and their relationships to your data source as a whole. You are analyzing your data when you are calculating, summarizing, categorizing, comparing, contrasting, examining, or testing your data.Presentation. In the context of data analysis, presentation deals with how you make the content of your analysis available to a certain audience. That is, how you choose to display your results. Some considerations that go along with presentation of your analysis include the platform you will use, the levels of visibility you will provide, and the freedom you will give your audience to change their view.

As you think about these four fundamental actions, think about this reality: most analysts are severely limited to one tool—Excel. This means that all of the complex actions involved in each of these fundamentals are mostly being done with and in Excel. What's the problem with that? Well, Excel is not designed to do many of these actions. However, many analysts are so limited in their toolsets that they often go into hand-to-hand combat with their data, creating complex workarounds and inefficient processes.

What this book highlights is that there are powerful functionalities in Access that can help you go beyond your one dimensional spreadsheet and liberate you from the daily grind of managing and maintaining redundant analytical processes. Indeed, using Access for your data analysis needs can help you streamline your analytical processes, increase your productivity, and analyze the larger datasets that have reached Excel's limitations.

Throughout this book, you will come to realize that Access is not the dry database program used only for storing data and building departmental applications. Access possesses strong data analysis functionalities that are easy to learn and certainly applicable to many types of organizations and data systems.

What to Expect from This Book

After reading the first three chapters, you will be able to demonstrate proficiency in Access, executing powerful analysis on large datasets that have long since reached Excel's limitations. After the first nine chapters, you'll be able to add depth and dimension to your analysis with advanced Access functions, building complex analytical processes with ease. By the end of the book, you'll be creating your own custom functions, performing batch analysis, and developing automated procedures that essentially run on their own. You'll also you will be able to analyze large amounts of data in a meaningful way, quickly slice data into various views on the fly, automate redundant analysis, save time, and increase productivity.

What Not to Expect from This Book

It's important to note that there are aspects of Access and data analysis that are out of the scope of this book.

While this book does cover the fundamentals of Access, it is always in the light of data analysis and it is written from a data analyst's point of view. This is not meant to be an all-encompassing book on Access. That being said, if you are a first-time user of Access, you can feel confident that this book will provide you with a solid introduction to Access that will leave you with valuable skills you can use in your daily operations.

This book is not meant to be a book on data management theory and best practices. Nor is it meant to expound on high-level business intelligence concepts. This is more of a “technician's” book, providing hands-on instruction that introduces Access as an analytical tool that can provide powerful solutions to common analytical scenarios and issues.

Finally, while this book does contain a chapter that demonstrates various techniques to perform a whole range of statistical analysis, it is important to note that this book does not cover statistics theory, methodology, or best practices.

Skills Required for This Book

In order to get the most out of this book, it's best that you have certain skills before diving into the topics highlighted in this book. The ideal candidate for this book will have:

Some experience working with data and familiarity with the basic concepts of data analysis such as working with tables, aggregating data, and performing calculationsExperience using Excel with a strong grasp of concepts such as table structures, filtering, sorting and using formulasSome basic knowledge of Access; enough to know it exists and to have opened a database once or twice

How This Book is Organized

The following sections discuss this books structure and what it has to offer.

Part I: Fundamentals of Data Analysis in Access

Part I, which includes Chapters 1, 2 and 3, provides a condensed introduction to Access. Here, you will learn some of the basic fundamentals of Access, along with the essential query skills required throughout the rest of the book. Topics covered in this part are: relational database concepts, query basics, using aggregate queries, action queries, and Crosstab queries.

Part II: Basic Analysis Techniques

Part II introduces you to some of the basic analytical tools and techniques available in Access. Chapter 4 covers data transformation, providing examples of how to clean and shape raw data into staging areas. Chapter 5 provides in-depth instruction on how to create and utilize custom calculations in analysis. Chapter 5 also shows you how to work with dates, using them in simple date calculations. Chapter 6 introduces you to some conditional analysis techniques that allow for the addition of business logic into analytical processes.

Part III: Advanced Analysis Techniques

Part III demonstrates many of the advanced techniques that truly bring data analysis to the next level. Chapter 7 introduces you to powerful subquery and domain aggregate functionality. Chapter 8 demonstrates many of the advanced statistical analysis that can be performed using subqueries and domain aggregate functions. Chapter 9 provides you with an in-depth look at Access macros and how to schedule batch data processing. Chapter 10 not only shows you how to use SQL and VBA to run data analysis without queries, but also how to create your own custom functions.

Part IV: Reports, Dashboards and Visualizations in Access

Part IV focuses on building reports and visualizations using Access. In Chapter 11, you will cover the basics of turning data into a slick-looking PDF-style Access reports. The chapter also talks about creating charts in Access to enhance the look and feel of Access reports. Chapter 12 discusses the real-world benefits of using the built-in PivotTable and PivotChart functionality found in Access. Chapter 13 demonstrates some of the innovative ways you can implement dashboard-style visualizations in your Access Queries and Reports.

Part V: Advanced Excel and Access Integration Techniques

Part V turns your attention to automation and integration, showing you how your reporting mechanisms can be enhanced by leveraging other programs and platforms. Chapter 14 discusses the various ways to move data between Excel and Access using VBA and ADO. Chapter 15 focuses on using Excel and Access automation to manage the inevitable need to show parts of your reporting through Excel. In Chapter 16, you're introduced to the automation techniques, which allow Excel and Access to take control of one another, resulting in some interesting reporting options. In Chapter 17, you get a thorough introduction to XML including a detailed explanation of how XML can collect and transfer data. You conclude with Chapter 18, where you'll get a taste of some of the techniques you can use to integrate Excel and other applications in the Microsoft Office suite. Here, you will be shown how to perform the most common tasks in more efficient ways through integration.

Part VI: Appendixes

Part VI includes useful reference materials that will assist you in your everyday dealings with Access. Appendix A provides a high-level overview of VBA for those users who are new to the world of Access programming. Appendix B introduces SQL, offering a concise tutorial on SQL syntax and usage. Appendix C offers ideas on how to avoid performance and corruption issues when working with Access databases. Appendix D details many of the built-in Access functions that are available to data analysts.

Conventions

To help you get the most from the text and keep track of what's happening, we've used a number of conventions throughout the book.

TRICKS OF THE TRADE

These features give you in-depth information on how to handle specific problems you might encounter when working with Access.

WARNING

These hold important, not-to-be forgotten information that is directly relevant to the surrounding text.

NOTE

Notes, tips, hints, tricks, and asides to the current discussion are presented like this.

As for styles in the text:

We highlight new terms and important words when we introduce them.

We show keyboard strokes like this: Ctrl+A.We show file names, URLs, and code within the text like so: persistence.properties.We present code as follows:

We use a monofont type for code examples.

Companion Database

The examples demonstrated throughout this book can be found in the companion database. This sample database is located at www.wiley.com/go/excelanalystguide.

Part I

Fundamentals of Data Analysis in Access

In this Part

Chapter 1: The Case for Data Analysis in Access

Chapter 2: Access Basics

Chapter 3: Beyond Select Queries

Chapter 1

The Case for Data Analysis in Access

When you ask most people which software tool they use for their daily data analysis, the answer you most often get is Excel. Indeed, if you were to enter the key words data analysis in an Amazon.com search, you would get a plethora of books on how to analyze your data with Excel. Well if so many people seem to agree that using Excel to analyze data is the way to go, why bother using Access for data analysis? The honest answer: to avoid the limitations and issues that plague Excel.

This is not meant to disparage Excel. Many people across varying industries have used Excel for years, considering it the tool of choice for performing and presenting data analysis. Anyone who does not understand Excel in today's business world is undoubtedly hiding that shameful fact. The interactive, impromptu analysis that Excel can perform makes it truly unique in the industry.

However, Excel is not without its limitations, as you will see in the following section.

Where Data Analysis with Excel Can Go Wrong

Years of consulting experience have brought me face to face with managers, accountants, and analysts who all have had to accept one simple fact: Their analytical needs had outgrown Excel. They all met with fundamental issues that stemmed from one or more of Excel's three problem areas: scalability, transparency of analytical processes, and separation of data and presentation.

Scalability

Scalability is the ability of an application to develop flexibly to meet growth and complexity requirements. In the context of this chapter, scalability refers to Excel's ability to handle ever-increasing volumes of data. Most Excel aficionados will be quick to point out that as of Excel 2007, you can place 1,048,576 rows of data into a single Excel worksheet. This is an overwhelming increase from the limitation of 65,536 rows imposed by previous versions of Excel. However, this increase in capacity does not solve all of the scalability issues that inundate Excel.

Imagine that you are working in a small company and you are using Excel to analyze your daily transactions. As time goes on, you build a robust process complete with all the formulas, pivot tables, and macros you need to analyze the data stored in your neatly maintained worksheet.

As your data grows, you will first notice performance issues. Your spreadsheet will become slow to load and then slow to calculate. Why will this happen? It has to do with the way Excel handles memory. When an Excel file is loaded, the entire file is loaded into memory. Excel does this to allow for quick data processing and access. The drawback to this behavior is that each time something changes in your spreadsheet, Excel has to reload the entire spreadsheet into memory. The net result in a large spreadsheet is that it takes a great deal of memory to process even the smallest change in your spreadsheet. Eventually, each action you take in your gigantic worksheet will become an excruciating wait.

Your pivot tables will require bigger pivot caches, almost doubling your Excel workbook's file size. Eventually, your workbook will be too big to distribute easily. You may even consider breaking down the workbook into smaller workbooks (possibly one for each region). This causes you to duplicate your work. Not to mention the extra time and effort it would take should you want to recombine those workbooks.

In time, you may eventually reach the 1,048,576-row limit of your worksheet. What happens then? Do you start a new worksheet? How do you analyze two datasets on two different worksheets as one entity? Are your formulas still good? Will you have to write new macros?

These are all issues you need to deal with.

Of course, you will have the Excel power-users, who will find various clever ways to work around these limitations. In the end, though, they will always be just workarounds. Eventually, even these power-users will begin to think less about the most effective way to perform and present analysis of their data and more about how to make something “fit” into Excel without breaking their formulas and functions. Excel is flexible enough that a proficient user can make most things fit into Excel just fine. However, when users think only in terms of Excel, they are undoubtedly limiting themselves, albeit in an incredibly functional way.

In addition, these capacity limitations often force Excel users to have the data prepared for them. That is, someone else extracts large chunks of data from a large database and then aggregates and shapes the data for use in Excel. Should the serious analyst always be dependent on someone else for his or her data needs? What if an analyst could be given the tools to access vast quantities of data without being reliant on others to provide data? Could that analyst be more valuable to the organization? Could that analyst focus on the accuracy of the analysis and the quality of the presentation, instead of routing Excel data maintenance?

Access is an excellent (many would say logical) next step for the analyst who faces an ever-increasing data pool. Since an Access table takes very few performance hits with larger datasets and has no predetermined row limitations, an analyst can handle larger datasets without requiring the data to be summarized or prepared to fit into Excel. Since many tasks can be duplicated in both Excel and Access, an analyst proficient at both will be prepared for any situation. The alternative is telling everyone, “Sorry, it is not in Excel.”

Also, if ever a process becomes more crucial to the organization and needs to be tracked in a more enterprise-acceptable environment, it will be easier to upgrade and scale up if that process is already in Access.

NOTE

An Access table is limited to 256 columns but has no row limitation. This is not to say that Access has unlimited data storage capabilities. Every bit of data causes the Access database to grow in file size. An Access database has a file-size limitation of 2 gigabytes.

Transparency of Analytical Processes

One of Excel's most attractive features is its flexibility. Each cell can contain text, a number, a formula, or practically anything else the user defines. Indeed, this is one of the fundamental reasons Excel is such an effective tool for data analysis. Users can use named ranges, formulas, and macros to create an intricate system of interlocking calculations, linked cells, and formatted summaries that work together to create a final analysis.

So what is the problem with that? The problem is that there is no transparency of analytical processes. Thus, it is extremely difficult to determine what is actually going on in a spreadsheet. Anyone who has had to work with a spreadsheet created by someone else knows all too well the frustration that comes with deciphering the various gyrations of calculations and links being used to perform some analysis. Small spreadsheets performing modest analysis are painful to decipher, while large, elaborate, multi-worksheet workbooks are virtually impossible to decode, often leaving you to start from scratch.

Even auditing tools available with most Excel add-in packages provide little relief. Figure 1.1 shows the results of a formula auditing tool run on an actual workbook used by a real company. It's a list of all the formulas in this workbook. The idea is to use this list to find and make sense of existing formulas. Notice that line one shows that there are 156 formulas. Yeah, this list helps a lot; good luck.

Figure 1.1 Formula auditing tools don't help much in deciphering spreadsheets.

Compared to Excel, Access might seem rigid, strict, and unwavering in its rules. No, you can't put formulas directly into data fields. No, you can't link a data field to another table. To many users, Excel is the cool gym teacher who lets you do anything, while Access is the cantankerous librarian who has nothing but error messages for you. All this rigidity comes with a benefit, however.

Since only certain actions are allowable, you can more easily come to understand what is being done with a set of data in Access. If a dataset is being edited, a number is being calculated, or if any portion of the dataset is being affected as a part of an analytical process, you will readily see that action. This is not to say that users can't do foolish and confusing things in Access. However, you definitely will not encounter hidden steps in an analytical process such as hidden formulas, hidden cells, or named ranges in dead worksheets.

Separation of Data and Presentation

Data should be separate from presentation; you do not want the data to become too tied to any particular way of presenting it. For example, when you receive an invoice from a company, you don't assume that the financial data on that invoice is the true source of your data. Rather, it is a presentation of your data. It can be presented to you in other manners and styles on charts or on Web sites, but such representations are never the actual source of data. This sounds obvious, but it becomes an important distinction when you study an approach of using Access and Excel together for data analysis.

What exactly does this concept have to do with Excel? People who perform data analysis with Excel, more often than not, tend to fuse the data, the analysis, and the presentation together. For example, you will often see an Excel Workbook that has 12 worksheets, each representing a month. On each worksheet, data for that month is listed along with formulas, pivot tables, and summaries. What happens when you are asked to provide a summary by quarter? Do you add more formulas and worksheets to consolidate the data on each of the month worksheets? The fundamental problem in this scenario is that the worksheets actually represent data values that are fused into the presentation of your analysis. The point here is that data should not be tied to a particular presentation, no matter how apparently logical or useful it may be. However, in Excel, it happens all the time.

In addition, as previously discussed, because all manners and phases of analysis can be done directly within a spreadsheet, Excel cannot effectively provide adequate transparency to the analysis. Each cell has the potential of holding formulas, becoming hidden, and containing links to other cells. In Excel, this blurs the line between analysis and data, which makes it difficult to determine exactly what is going on in a spreadsheet. Moreover, it takes a great deal of effort in the way of manual maintenance to ensure that edits and unforeseen changes don't affect previous analyses.

Access inherently separates its analytical components into tables, queries, and reports. By separating these elements, Access makes data less sensitive to changes and creates a data analysis environment where you can easily respond to new requests for analysis without destroying previous analyses.

Many who use Excel will find themselves manipulating its functionalities to approximate this database behavior. If you find yourself in this situation, you must consider that if you are using Excel's functionality to make it behave like a database application, perhaps the real thing just might have something to offer. Utilizing Access for data storage and analytical needs would enhance overall data analysis and would allow the Excel power-user to focus on the presentation in his or her spreadsheets.

In the future, there will be more data, not less. Likewise, there will be more demands for complex data analysis, not fewer. Power-users are going to need to add some tools to their repertoire in order to get away from being simply spreadsheet mechanics. Excel can be stretched to do just about anything, but maintaining such creative solutions can be a tedious manual task. You can be sure that the sexy part of data analysis is not in routine data management within Excel. Rather, it is in the creation of slick processes and utilities that will provide your clients with the best solution for any situation.

Deciding Whether to Use Access or Excel

After such a critical view of Excel, it is important to say that the key to your success in the sphere of data analysis will not come from discarding Excel altogether and exclusively using Access. Your success will come from proficiency with both applications and the ability to evaluate a project and determine the best platform to use for your analytical needs. Are there hard-and-fast rules you can follow to make this determination? The answer is no, but there are some key indicators in every project you can consider as guidelines to determine whether to use Access or Excel. These indicators are the size of the data; the data's structure; the potential for data evolution; the functional complexity of the analysis; and the potential for shared processing.

Size of Data

The size of your dataset is the most obvious consideration you will have to take into account. Although Excel can handle more data than in previous versions, it is generally a good rule to start considering Access if your dataset begins to approach 100,000 rows. The reason for this is the fundamental way Access and Excel handle data.

When you open an Excel file, the entire file is loaded into memory to ensure quick data processing and access. The drawback to this behavior is that Excel requires a great deal of memory to process even the smallest change in your spreadsheet. You may have noticed that when you try to perform an AutoFilter on a large formula-intensive dataset, Excel is slow to respond, giving you a Calculating indicator in the status bar. The larger your dataset is, the less efficient the data crunching in Excel will be.

Access, on the other hand, does not follow the same behavior as Excel. When you open an Access table, it may seem as though the whole table is opening for you, but in reality, Access is storing only a portion of data into memory at a time. This ensures the cost-effective use of memory and allows for more efficient data crunching on larger datasets. In addition, Access allows you to make use of Indexes that enable you to search, sort, filter, and query extremely large datasets very quickly.

Data Structure

If you are analyzing data that resides in a table that has no relationships with other tables, Excel is a fine choice for your analytical needs. However, if you have a series of tables that interact with each other (such as a Customers table, an Orders table, and an Invoices table), you should consider using Access. Access is a relational database, which means it is designed to handle the intricacies of interacting datasets. Some of these are the preservation of data integrity, the prevention of redundancy, and the efficient comparison and querying of data between the datasets. You will learn more about the concept of table relationships in Chapter 2.

Data Evolution

Excel is an ideal choice for quickly analyzing data used as a means to an end, such as a temporary dataset crunched to obtain a more valuable subset of data. The result of a pivot table is a perfect example of this kind of one-time data crunching. However, if you are building a long-term analytical process with data that has the potential of evolving and growing, Access is a better choice. Many analytical processes that start in Excel begin small and run fine, but as time passes these processes grow in both size and complexity until they reach the limits of Excel's capabilities. The message here is that you should use some foresight and consider future needs when determining which platform is best for your scenario.

Functional Complexity

There are far too many real-life examples of analytical projects where processes are forced into Excel even when Excel's limitations have been reached. How many times have you seen a workbook that contains an analytical process encapsulating multiple worksheets, macros, pivot tables, and formulas that add, average, count, look up, and link to other workbooks? The fact is that when Excel-based analytical processes become overly complex, they are difficult to manage, difficult to maintain, and difficult to translate to others. Consider using Access for projects that have complex, multiple-step analytical processes.

Shared Processing

Although it is possible to have multiple users work on one central Excel spreadsheet located on a network, ask anyone who has tried to coordinate and manage a central spreadsheet how difficult and restrictive it is. Data conflicts, loss of data, locked-out users, and poor data integrity are just a few examples of some of the problems you will encounter if you try to build a multiple-user process with Excel. Consider using Access for your shared processes. Access is better suited for a shared environment for many reasons, some of which are:

The ability for users to concurrently enter and update dataInherent protection against data conflictsPrevention of data redundancyProtection against data entry errors

An Excel User's Guide to Access: Don't Panic!

Many seasoned managers, accountants, and analysts come to realize that just because something can be done in Excel does not necessarily mean Excel is the best way to do it. This is the point when they decide to open Access for the first time. When they do open Access, the first object that looks familiar to them is the Access table. In fact, Access tables look so similar to an Excel spreadsheet that most Excel users try to use tables just like a spreadsheet. However, when they realize that they can't type formulas directly into the table or duplicate most of Excel's behavior and functionality, most of them wonder just what exactly the point of using Access is.

When many Excel experts find out that Access does not behave or look like Excel, they write Access off as too difficult or as taking too much time to learn. However, the reality is that many of the concepts behind how data is stored and managed in Access are those with which the user is already familiar. Any Excel user has already learned such concepts in order to perform and present complex analysis. Investing a little time up front to see just how Access can be made to work for you can save a great deal of time in automating routine data processes.

Throughout this book, you will learn various techniques in which you can use Access to perform much of the data analysis you are now performing exclusively in Excel. This section is a brief introduction to Access from an Excel expert's point of view. Here, you will focus on the big-picture items in Access. If some of the Access terms mentioned here are new or not terribly familiar, be patient. They will be covered in detail as the book progresses.

Tables

What will undoubtedly look most familiar to you are Access tables. Tables appear almost identical to spreadsheets with familiar cells, rows, and columns. However, the first time you attempt to type a formula in one of the cells, you will see that Access tables do not possess Excel's flexible, multi-purpose nature that allows any cell to take on almost any responsibility or function.

The Access table is simply a place to store data, such as numbers and text. All of the analysis and number crunching happens somewhere else. This way, data will never be tied to any particular analysis or presentation. Data is in raw form, which leaves users to determine how they want to analyze or display it.

Chapter 2 will help you get started with a gentle introduction to Access basics.

Queries

You may have heard of Access queries but have never been able to relate to them.

Consider this: In Excel, when you use AutoFilter, a VLookup formula, or Subtotals, you are essentially running a query. A query is a question you pose against your data in order to get an answer or a result. The answer to a query can be a single data item, a Yes/No answer, or many rows of data. In Excel, the concept of querying data is a bit nebulous, as it can take the form of the different functionalities, such as formulas, AutoFilters, and PivotTables.

In Access, a query is an actual object that has its own functionalities. A query is separate from a table, ensuring that data is never tied to any particular analysis. You will cover queries extensively in subsequent chapters. Your success in using Microsoft Access to enhance your data analysis will depend on your ability to create all manners of both simple and complex queries.

Chapter 3 begins your full emersion into all the functionality you can get from Access queries.

Reports

Access reports are an incredibly powerful component of Microsoft Access, which allow data to be presented in a variety of styles. Access reports, in and of themselves, provide an excellent illustration of one of the main points of this book: Data should be separate from analysis and presentation. The report serves as the presentation layer for a database, displaying various views into the data within. Acting as the presentation layer for your database, reports are inherently disconnected from the way your data is stored and structured. As long as the report receives the data it requires in order to accurately and cleanly present its information, it will not care where the information comes from.

Access reports can have mixed reputations. On one hand, they can provide clean-looking PDF-esque reports that are ideal for invoices and form letters. On the other hand, Access reports are not ideal for showing the one-shot displays of data that Excel can provide. However, Access reports can easily be configured to prepare all manners of report styles, such as crosstabs, matrices, tabular layouts, and subtotaled layouts. You'll explore all the reporting options available to you starting in Chapter 11.

Macros and VBA

Just as Excel has macro and VBA functionality, Microsoft Access has its equivalents. This is where the true power and flexibility of Microsoft Access data analysis resides. Whether you are using them in custom functions, batch analysis, or automation, macros and VBA can add a customized flexibility that is hard to match using any other means. For example, you can use macros and VBA to automatically perform redundant analyses and recurring analytical processes, leaving you free to work on other tasks. Macros and VBA also allow you to reduce the chance of human error and to ensure that analyses are preformed the same way every time. Starting in Chapter 9, you will explore the benefits of macros and VBA and how you can leverage them to schedule and run batch analysis.

Summary

Although Excel is considered the premier tool for data analysis, Excel has some inherent characteristics that often lead to issues revolving around scalability, transparency of analytic processes, and confusion between data and presentation. Access has a suite of analytical tools that can help you avoid many of the issues that arise from Excel.

First, Access can handle very large datasets and has no predetermined row limitation. This allows for the management and analysis of large datasets without the scalability issues that plague Excel. Access also forces transparency—the separation of data and presentation by separating data into functional objects (that is, tables, queries, and reports) and by applying stringent rules that protect against bad processes and poor habits.

As you go through this book, it is important to remember that your goal is not to avoid Excel altogether but rather to broaden your toolset and to understand that, often, Access offers functionality that both enhances your analytical processes and makes your life easier.

Chapter 2

Access Basics

When working with Access for the first time, it is tempting to start filling tables right away and querying data to get fast results, but it's important to understand the basics of the relational database concept before pounding away at data. A good understanding of how a relational database works will help you take full advantage of Access as a powerful data analysis solution. This chapter covers the fundamentals of Access and methods to bring data into the program.

Access Tables

Upon opening any existing Access database, you notice that the Database window, shown in Figure 2.1, contains a task pane on the left. Using the topmost dropdown box, change the navigation category to All Access Objects. You will get six sections. Each section represents one of the six database objects: Tables, Queries, Forms, Reports, Macros, and Modules. The Tables section is appropriately at the top of the list because it is the precise location where your data will be stored. All other database objects will refer to the tables in your database for data, whether asking questions of the data or creating reports based on the data. This section covers the basics to get you working with Access tables.

Figure 2.1 The navigation pane on the left allows you to navigate through the six types of database objects: Tables, Queries, Forms, Reports, Macros, and Modules.

Table Basics

One way to think of a table is as a collection of data concerning a specific type of entity (such as customers, branches, transactions, products, and so on). You want each of these entities to have its own unique table. Among the many advantages to storing your data using this approach is eliminating or significantly decreasing duplicate information. Later in this chapter, you will learn about the dangers inherent in storing data with excessive duplications.

Opening a Table in the Datasheet View

Open your sample database and go to the Tables section in the navigation pane. Double-click the Dim_Customers table. When the table opens, it is in the Datasheet view. In this view, you are able to directly view and edit the contents of the table. As you can see in Figure 2.2, the names of the columns are at the top.

Figure 2.2 Opening the table in Datasheet view allows you to view and edit the data stored in the table.

Identifying Important Table Elements

A table is composed of rows, with each row representing an individual entity. In the Dim_Customers table, each row represents a single distinct customer. The proper database terminology for a row is record.

A table is also composed of columns, with each column representing a particular piece of information common to all instances of the table's entities. In Dim_Customers, each column represents some attribute of the customer record. The proper database terminology for a column is field.

TIP

The number of records in a table is visible at the bottom left of the Datasheet view, next to the record selectors.

Opening a Table in the Design View

Through the Design view of a table, you are able to set the field names and data types. To get to the Design view of the Dim_Customers table, go to the Home tab and select View ⇒ Design View, as demonstrated in Figure 2.3.

Figure 2.3 You can configure the structure of your table by switching to Design view.

As you can see in Figure 2.4, the Design view shows you the fields that compose the Dim_Customers table in an easy-to-manage view.

Figure 2.4 Opening the table in the Design view allows you to add field names or change existing ones.

Note how each field has a Field Name and a Data Type. The Field Name is the descriptive text string given to that particular column of a table. It appears at the top of the table when it is in the Datasheet view. The Data Type of the field ensures that only a certain type of data is allowed in the field. If a data type is tagged as a Number, Access does not allow any text to be entered into that field. By setting the data type of each column, you go a long way toward ensuring the integrity and consistency of the data.

TIP

It's good practice to avoid putting any spaces in your field names. When constructing queries or referring to tables in VBA code, spaces in the field names can lead to problems. If you need to indicate a space in your field name, use the underscore character. Keep in mind that your field names cannot include a period (.), an exclamation point (!), an accent grave (`), or brackets ([ ]).

Exploring Data Types

With the Design view of the Dim_Customers table open, select the Data type section of the first field and click the drop-down arrow. A list of predefined data type choices becomes visible. These data types are Text; Memo; Number; Date/Time; Currency; AutoNumber; Yes/No; OLE Object; Hyperlink; Attachment; and Calculated.

NOTE

When in Design View, you will also see a data type selection called Lookup Wizard. This selection is actually not a data type at all; it's a mechanism used to activate the Lookup Wizard in order to create lookup fields. The Lookup Wizard is beyond the scope of this book.

Text: Any combination of letters, numbers, spaces, and characters is text. This is by far the most common data type. Although text can be a number, it should not be a number used in a calculation. Examples of common uses of the Text data type are customer names, customer numbers (using customer numbers in calculations would have no meaning), and addresses. The maximum number of characters allowed in a Text field is 255 characters.Memo: If you need to store text data that exceeds the 255-character limit of the Text field, the Memo field should be used. Long descriptions or notes about the record can be stored in fields of this type.Number: This type is for all numerical data used in calculations, except currency (which has its own data type). Actually, Number is several data types under one heading. When you select Number as a data type in the Design view of the table, you go to the Field Size field at the top of the General tab. When you select the drop-down arrow, you get the following options: Byte, Integer, Long Integer, Single, Double, Replication ID, and Decimal. Probably the most commonly used field sizes of the Number data type are Long Integer and Double. Long Integer should be selected if the numbers are whole numbers that do not have any non-zeros to the right of the decimal point. Double should be selected if numbers with decimals need to be stored.Date/Time: Another data type often used in calculations is Date/Time. Recording the time that certain events occur is among the more important uses of this data type. Recording dates and times allows you to compare data by time durations, be it months, years, or whatever. In the business world, the date field can be crucial to analysis, especially in identifying seasonal trends or year-over-year comparisons.Currency: A special calculation data type, Currency is ideal for storing all data that represents amounts of money.AutoNumber: This data type is actually a Long Integer automatically and sequentially created for each new record added to a table. The AutoNumber can be one mechanism by which you can uniquely identify each record in a table. You will not enter data into this field.Yes/No: There are situations where the data that needs to be represented is in a simple Yes/No format. Although you could use the Text data type for creating a True/False field, it is much more intuitive to use Access's native data type for this purpose.OLE Object: This data type is not encountered very often in data analysis. It is used when the field must store a binary file, such as a picture or sound file.Hyperlink: When you need to store an address to a Web site, this is the preferred data type.Attachment: This data type was introduced with Access 2007. When you set a field to the Attachment type, you can attach images, spreadsheet files, documents, charts, and other types of supported files to the records in your database. You can also configure the field to view and edit attached files.Calculated: This data type is new to Access 2010. With Calculated type fields, you can build mathematical operations, textual evaluations, or any other calculation directly into your table.

Creating a Table from Scratch

Access provides several methods for creating a table. The ideal way to create a table in Access is with the Design view. Why? The Design view allows for a compact work area so you can add fields, reposition fields, and assign attributes easily.

Imagine that the human resources department asks you to create a simple list of employees in Access.

1. To create this table in the Design view, go to the application ribbon and select the Create tab and then the Table Design button. This opens an empty table called Table1 in Design view.

2. The idea here is to create a list of fields that describe employee attributes. Among the more common attributes in this situation are the following: EmployeeNumber, FirstName, LastName, Address, City, State, Zip, and HourlyWage. You begin by entering the names of the columns going down the list. When you have entered all of the required column names, your dialog box should look like Figure 2.5.

3. As you enter the field names, the data types default to the most common data type, Text. You now want to set the data type for each field or at least change the data type of each non-text field. Choosing the correct data type for the first field, EmployeeNumber, may be initially confusing. With the word “Number” in the field, you might think that Number would be the logical choice for the data type. Actually, the rule of thumb is that if the field will not be used in a calculation, it is best to set its data type to Text. Because there is no logical reason to perform a calculation on an employee's EmployeeNumber, the EmployeeNumber data type should remain Text. Another reason for using the Text data type for the field EmployeeNumber is that there could be a need to use alphabetic or other characters in the field.

As you go through the field names, it should be fairly obvious that you will want to set all of the fields to Text, except for HourlyWage. This field will almost certainly be used in calculations, and it will represent a monetary value, so you should change the data type to Currency.

At this point, your Design view should look similar to Figure 2.6.

4. Now you can save and name your table. Click File on the application Ribbon and select Save As. If you are using Access 2007, click the Office Icon and select Save As. This opens the Save As dialog box where you will give your newly created table an appropriate name like “Employees” or “EmployeeMaster.”

Figure 2.5 Enter the column names you want to see in your table.

Figure 2.6 You have created your first table!

Keep in mind that at this point, this table has no data. You can begin entering employee information directly into the table through the Datasheet view. For tables with a small number of records, you can enter your records manually. However, most sets of data are quite large, so other techniques of bringing data into Access are introduced later in this chapter.

NOTE

When you save a table, you may be prompted to set a primary key. Primary keys are explained later (see Setting Primary Keys in this chapter). In most cases, Access will try to choose one for you. It's generally good practice to accept Access' recommendation to create a primary key if you do not already have one on mind.

Working with Field Properties

When working with data in tables, you may encounter situations that require the data be restricted or to adhere to some default specifications in particular columns. You can define these requirements by using the field properties.

The field properties affect how the data is stored and presented, among other things. The list of field properties that are available to you is dependent on the data type chosen for that field. Some field properties are specific to Text fields, and others are specific to Number fields. The field properties can be found in the Design view, as illustrated in Figure 2.7. As you click on each field, you will see the field properties for that field.

Figure 2.7 You can find the Field Properties in Design view beneath the field names.

Some of the most important field properties to note are:

Field Size: You encountered the Field Size before, when working with the Number data type. This property also exists for the common Text data type. This property allows you to set a maximum size limit on data entered in that column. For the Text data type, size refers to the length (number of characters and spaces) of the Text data in that column. For example, looking at the Employees table, you see a field for State. Your firm tells you that the names of states should be recorded using their two-letter designation. If you set the field size to “2” for the State column, the user will be unable to type any text that is longer than two characters. So with Access, you are not only able to force a certain data type in a particular column, you can also customize that individual column to accept data only in the rigid format that you specify.Format: This property allows you to set the precise manner in which Access displays or prints the data that is located in its Tables. As with Field Size, the format available to select will depend on the data type of that column. For example, with a Currency field, you can display the data in a form that uses a dollar sign, a Euro sign, or no sign at all. With these settings, the data itself does not change—just how it displays. Another very useful function of Format is with Date/Time data types. Whether you want to display data in the long format or short format, this property allows you to set that option.Input Mask: This feature can be useful in data entry situations. Where Format controls how data is displayed, Input Mask controls how data is entered into a particular field. Input mask is available for the following data types: Text, Number, Date/Time, and Currency. For example, if a user needs to enter a telephone number, the input mask can create the characters and structure with which you are all familiar. As the user types, the number automatically assumes a phone number format: (###) ###-####.Decimal Places: In number fields, you can set the number of decimal places to the right of the decimal point that will be recorded. There is an Auto setting, which defers to the Format setting to determine the correct number of places. Apart from Auto, you are able to select 0 to 15 for the number of decimal places.Default Value: An important database concept, the default value can help save time in the data entry process. The default value is automatically placed in that column every time a new record is added. Defaults can be overridden by simply entering data into the field.Required: Another important property, Required simply forces a user to enter some value, using the proper data type, in the designated field. The user cannot add a new record if the Required field is not properly filled. As with Input Mask, this property is an excellent mechanism for asserting more control over the data entry process.

Setting Primary Keys

In some tables, you will need to ensure the uniqueness of each record. This is typically achieved by including a field whose records will not have duplicate values. One example of this is a field for Social Security numbers. Each person has one and only one unique Social Security number. By definition, you cannot have a Social Security number that represents two people. This type of unique column is what you call a primary key.

You can identify a given field as the primary key for you table in Design view—by right clicking the chosen field and selecting Primary Key. When you save your table, Access determines whether your selected fields have any null (or blank) values or duplicate data (data duplicating in multiple records for a single field). If there are blanks or duplicates, Access informs you with an error message. You must fill in the blanks with unique values and remove any duplicates if that column is indeed to become the primary key for the table.

TIP

Sometimes a table will have two or more fields that together uniquely identify a record. In these cases, you will need to create what is called a compound key. For example, imagine a table with both an invoice number and a product number. A sales representative may have sold multiple products to a customer on the same invoice. Therefore, when you look at each value separately, you'll find duplicate values of each field. By combining invoice and product number, however, you can create a compound primary key that is truly unique for each record. If you need a compound key, do the following:

1. Select the first field that will be included in your compound key by clicking on the grey square to the right of the Field Name. Then while holding down the Control key on your keyboard, click the grey square next to the second field to be included.

2.