Extending Excel with Python and R - Steven Sanderson - E-Book

Extending Excel with Python and R E-Book

Steven Sanderson

0,0
29,99 €

-100%
Sammeln Sie Punkte in unserem Gutscheinprogramm und kaufen Sie E-Books und Hörbücher mit bis zu 100% Rabatt.
Mehr erfahren.
Beschreibung

– Extending Excel with Python and R is a game changer resource written by experts Steven Sanderson, the author of the healthyverse suite of R packages, and David Kun, co-founder of Functional Analytics.
– This comprehensive guide transforms the way you work with spreadsheet-based data by integrating Python and R with Excel to automate tasks, execute statistical analysis, and create powerful visualizations.
– Working through the chapters, you’ll find out how to perform exploratory data analysis, time series analysis, and even integrate APIs for maximum efficiency.
– Both beginners and experts will get everything you need to unlock Excel's full potential and take your data analysis skills to the next level.
– By the end of this book, you’ll be able to import data from Excel, manipulate it in R or Python, and perform the data analysis tasks in your preferred framework while pushing the results back to Excel for sharing with others as needed.

Das E-Book können Sie in Legimi-Apps oder einer beliebigen App lesen, die das folgende Format unterstützen:

EPUB
MOBI

Seitenzahl: 394

Veröffentlichungsjahr: 2024

Bewertungen
0,0
0
0
0
0
0
Mehr Informationen
Mehr Informationen
Legimi prüft nicht, ob Rezensionen von Nutzern stammen, die den betreffenden Titel tatsächlich gekauft oder gelesen/gehört haben. Wir entfernen aber gefälschte Rezensionen.



Extending Excel with Python and R

Unlock the potential of analytics languages for advanced data manipulation and visualization

Steven Sanderson

David Kun

Extending Excel with Python and R

Copyright © 2024 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Associate Group Product Manager: Kaustubh Manglurkar

Publishing Product Manager: Apeksha Shetty

Book Project Manager: Kirti Pisat

Senior Editor: Tiksha Lad

Technical Editor: Rahul Limbachiya

Copy Editor: Safis Editing

Proofreader: Safis Editing and Tiksha Lad

Indexer: Subalakshmi Govindhan

Production Designer: Shankar Kalbhor

Senior DevRel Marketing Coordinator: Nivedita Singh

First published: April 2024

Production reference: 1030424

Published by

Packt Publishing Ltd.

Grosvenor House

11 St Paul’s Square

Birmingham

B3 1RB, UK.

ISBN 978-1-80461-069-5

www.packtpub.com

To my wife, I love you. Thank you for putting the children to bed every night so I could work on this book. I would not have been able to work on it without that time. To my children, Daddy loves you. And to my mom, thank you for thinking any idea I had was a good one.

– Steven Sanderson

To my family, for illuminating every step of this journey with your love and unwavering support.

– David Kun

Contributors

About the authors

Steven Sanderson, MPH, is an applications manager for the patient accounts department at Stony Brook Medicine. He received his bachelor’s degree in economics and his master’s in public health from Stony Brook University. He has worked in healthcare in some capacity for just shy of 20 years. He is the author and maintainer of the healthyverse set of R packages. He likes to read material related to social and labor economics and has recently turned his efforts back to his guitar with the hope that his kids will follow suit as a hobby they can enjoy together.

I want to thank Packt for this opportunity and David, my co-author, for working on this with me. I also want to thank my family as this took a good deal of time to put together.

David Kun is a mathematician and actuary who has always worked in the gray zone between quantitative teams and ICT, aiming to build a bridge. He is a co-founder and director of Functional Analytics and the creator of the ownR Infinity platform. As a data scientist, he also uses ownR for his daily work. His projects include time series analysis for demand forecasting, computer vision for design automation, and visualization.

I extend my heartfelt gratitude to my partner, Ania, and my children for the encouragement throughout the journey of writing this, my inaugural book.

About the reviewers

Jesús Martín de la Sierra Silva is a dedicated R developer with an engineering background who has been involved in large IT projects transforming raw data into insightful actionable decisions. He provides decision-making capabilities based on a statistical approach, by exploring connections and patterns in data and producing valuable visualizations. Jesús also specializes in creating user-friendly applications that seamlessly integrate complex computations such as process mining, forecasting, network analysis, or machine learning predictive models. For years, he has promoted the transition from spreadsheet software to R to bring a powerful perspective on modern data analysis.

David Napoli holds a master’s degree in aerospace engineering and All but Dissertation in Health Services Research/Biostatistics. David has 25 years of experience working with data, with extensive experience in quantitative development, including the development, maintenance, and governance of data repositories, risk adjustment methodologies, IBNR determination, and analytic platform innovation. David also has 25 years of experience in advanced data visualization, statistical modeling and evaluation, and survival analysis. David has applied his analytics proficiency in multiple roles, including director of strategic analytics and distinguished faculty instructor of data visualization, data analytics, and advanced analytics curriculums.

Mehmet Sinan İyisoy is a biostatistician with years of experience in medical statistics. He holds BSc and MSc degrees in mathematics and statistics. Starting out as a mathematician, his career transitioned through phases of programmer, system administrator, and biostatistician. Sinan has consistently shown a strong dedication to data and programming. He is an enthusiastic user of a wide array of technologies, predominantly consisting of open source components. He has accumulated experience across various institutions and diverse roles for which his profound expertise was central. With a passion for statistics, Sinan now continues to contribute to scientific literature and public knowledge while working at Necmettin Erbakan University.

Shane Alex Jose has a master’s in statistics. His passion for coding stemmed from tutoring graduate-level Python courses – yet he claims he’s more confident in R. Oddly, he relishes debugging and considers it very cathartic! Currently, Shane is working as an analyst programmer in EvoEnergy’s data analytics team – enhancing models and building/testing internal R packages/shiny dashboards used across the company. Having worked across multiple sectors, Shane is fascinated by the variety of data used across various industries and the impact data can have (especially if incorrectly governed). He aims to become confident across multiple specializations to help identify gaps and develop unique solutions/insights that could potentially address these gaps.

Table of Contents

Preface

Part 1: The Basics – Reading and Writing Excel Files from R and Python

1

Reading Excel Spreadsheets

Technical requirements

Working with R packages for Excel manipulation

Reading Excel files to R

Installing and loading libraries

Reading multiple sheets with readxl and a custom function

Python packages for Excel manipulation

Python packages for Excel manipulation

Considerations

Opening an Excel sheet from Python and reading the data

Using pandas

Using openpyxl

Reading in multiple sheets with Python (openpyxl and custom functions)

The importance of reading multiple sheets

Using openpyxl to access sheets

Reading data from each sheet

Retrieving sheet data with openpyxl

Combining data from multiple sheets

Custom function for reading multiple sheets

Customizing the code

Summary

2

Writing Excel Spreadsheets

Technical requirements

Packages to write into Excel files

writexl

openxlsx

xlsx

A comprehensive recap and insights

Creating and manipulating Excel sheets using Python

Why export data to Excel?

Keeping it simple – exporting data to Excel with pandas

Advanced mode – openpyxl for Excel manipulation

Creating a new workbook

Adding sheets to the workbook

Deleting a sheet

Manipulating an existing workbook

Choosing between openpyxl and pandas

Other alternatives

Summary

3

Executing VBA Code from R and Python

Technical requirements

Installing and explaining the RDCOMClient R library

Installing RDCOMClient

Executing sample VBA with RDCOMClient

Integrating VBA with Python using pywin32

Why execute VBA code from Python?

Setting up the environment

Error handling with the environment setup

Writing and executing VBA code

Automating Excel tasks

Pros and cons of executing VBA from Python

Summary

4

Automating Further – Task Scheduling and Email

Technical requirements

Installing and understanding the tasksheduleR library

Creating sample scripts

RDCOMClient for Outlook

Using the Microsoft365R and blastula packages

Microsoft365R

The blastula package

Scheduling Python scripts

Introduction to Python script scheduling

Built-in scheduling options

Third-party scheduling libraries

Best practices and considerations for robust automation

Email notifications and automation with Python

Introduction to email notifications in Python

Setting up email services

Sending basic emails

Sending email notifications for script status

Summary

Part 2: Making It Pretty – Formatting, Graphs, and More

5

Formatting Your Excel Sheet

Technical requirements

Installing and using styledTables in R

Installing and using basictabler in R

Advanced options for formatting with Python

Cell formatting

Conditional formatting

Pivot tables

Summary

6

Inserting ggplot2/matplotlib Graphs

Technical requirements

Some basics

Visualizing data with ggplot2

Visualizing data with the cowplot package

Bar charts and dumbbell plots

Enhancing your Excel reports with plotnine2, matplotlib, and plotly graphs

Enhancing Excel reports with visualizations

An introduction to data visualization libraries

Plotnine – elegant grammar of graphics

Plotly – interactive visualizations

Matplotlib – classic and customizable plots

Creating graphs with plotnine (Python’s ggplot2)

Understanding the grammar of graphics

Generating various plot types

Customizing the visual elements of a plotnine plot

Incorporating additional layers

Generating graphs with matplotlib

Customizing visual elements of a matplotlib plot

Other visualization libraries

plotly

seaborn

Embedding visualizations into Excel

A basic embedding process

Summary

7

Pivot Tables and Summary Tables

Technical requirements

Making a table with the Base R xtabs function

Making a table with the gt package

Creating pivot tables with tidyquant

Creating and managing pivot tables in Python with win32com and pypiwin32

Creating pivot tables with Python: the basics

Setting up the Python environment

Creating pivot tables

Manipulating pivot tables

Groupings in pivot tables

Summary

Part 3: EDA, Statistical Analysis, and Time Series Analysis

8

Exploratory Data Analysis with R and Python

Technical requirements

Understanding data with skimr

Using the GGally package in R

Using the DataExplorer package

Getting started with EDA for Python

Data cleaning in Python for Excel data

Handling missing data

Dealing with duplicates

Handling data type conversion

Excel-specific data issues

Performing EDA in Python

Summary statistics

Data distribution

Associations between variables

Scatter plots

Visualizing key attributes

Summary

9

Statistical Analysis: Linear and Logistic Regression

Technical requirements

Linear regression

Logistic regression

Frameworks

Performing linear regression in R

Linear regression in base R

Linear regression with tidymodels and purrr

Performing logistic regression in R

Logistic regression with base R

Performing logistic regression using tidymodels

Performing linear regression in Python using Excel data

Logistic regression in Python using Excel data

Summary

10

Time Series Analysis: Statistics, Plots, and Forecasting

Technical requirements

Generating random time series objects in R

Manipulating the time series parameters

Time series plotting

Creating ACF and PACF plots in R

Auto ARIMA modeling with healthyR.ts

Creating a Brownian motion with healthyR.ts

Time series analysis in Python – statistics, plots, and forecasting

Time series plotting – basic plots and ACF/PACF plots

Autocorrelation function (ACF) plot

Partial autocorrelation function (PACF) plot

Time series statistics and statistical forecasting

Statistical analysis for time series data

Understanding predictive modeling approaches

Forecasting with statsmodels

Time series forecasting with Facebook’s prophet

Time series forecasting with deep learning – LSTM

Summary

Part 4: The Other Way Around – Calling R and Python from Excel

11

Calling R/Python Locally from Excel Directly or via an API

Technical requirements

Calling R and Python from Excel locally

Why you would want to call R/Python from Excel locally

Setting up an environment

Steps to set up BERT for R

Steps to set up xlwings for Python

Calling R/Python directly from Excel

Executing R with VBA and BERT

Interacting with Excel via BERT

Calling Python from Excel using xlwings

The Run button

Macros

UDFs

Calling R and Python from Excel via an API

An introduction to APIs

Open source solutions for exposing R as API endpoints

Open-source solutions for exposing Python as an API endpoint

Calling APIs from Excel VBA

Commercial API solutions for R and Python

Summary

Part 5: Data Analysis and Visualization with R and Python for Excel Data – A Case Study

12

Data Analysis and Visualization with R and Python in Excel – A Case Study

Technical requirements

Getting visualizations with R

Getting the data

Visualizing the data

Performing a simple ML model with R

Data preprocessing

Getting visualizations with Python

Getting the data

Visualizing the data

Performing a simple ML model with Python

Data preprocessing

Index

Other Books You May Enjoy

Part 1:The Basics – Reading and Writing Excel Files from R and Python

This introductory part lays the foundation for working with Excel files in both R and Python. The chapters cover essential tasks such as reading and writing Excel spreadsheets using popular libraries such as R and Python, enabling you to automate tasks and further enhance your Excel workflows using tools such as RDCOMClient, blastula, schedule, and smtplib for scheduling runs and sending emails.readxl, openxlsx, xlsx, pandas, and openpyxl. Additionally, you’ll learn how to execute VBA code.

This part has the following chapters:

Chapter 1, Reading Excel SpreadsheetsChapter 2, Writing Excel SpreadsheetsChapter 3, Executing VBA Code from R and PythonChapter 4, Automating Further – Task Scheduling and Email