Polars Cookbook - Yuki Kakegawa - E-Book

Polars Cookbook E-Book

Yuki Kakegawa

Description

The Polars Cookbook is a comprehensive, hands-on guide to Python Polars, one of the first resources dedicated to this powerful data processing library. Written by Yuki Kakegawa, a seasoned data analytics consultant who has worked with industry leaders like Microsoft and Stanford Health Care, this book offers targeted, real-world solutions to data processing, manipulation, and analysis challenges. The book also includes a foreword by Marco Gorelli, a core contributor to Polars, ensuring expert insights into Polars' applications.
From installation to advanced data operations, you’ll be guided through data manipulation, advanced querying, and performance optimization techniques. You’ll learn to work with large datasets, conduct sophisticated transformations, leverage powerful features like chaining, and understand their caveats. This book also shows you how to integrate Polars with other Python libraries such as pandas, NumPy, and PyArrow, and explore deployment strategies for both on-premises and cloud environments like AWS, BigQuery, GCS, Snowflake, and S3.
With use cases spanning data engineering, time series analysis, statistical analysis, and machine learning, Polars Cookbook provides essential techniques for optimizing and securing your workflows. By the end of this book, you'll possess the skills to design scalable, efficient, and reliable data processing solutions with Polars.

You can read this e-book in Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Number of pages: 271

Year of publication: 2024

Polars Cookbook

Over 60 practical recipes to transform, manipulate, and analyze your data using Python Polars 1.x

Yuki Kakegawa

Copyright © 2024 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Apeksha Shetty

Publishing Product Manager: Deepesh Patel

Book Project Manager: Farheen Fathima and Urvi Sharma

Senior Editor: Nazia Shaikh

Technical Editor: Kavyashree K S

Copy Editor: Safis Editing

Proofreader: Nazia Shaikh

Indexer: Pratik Shirodkar

Production Designers: Aparna Bhagat, Shankar Kalbhor, and Prafulla Nikalje

Senior DevRel Marketing Coordinator: Nivedita Singh

First published: August 2024

Production reference: 2020924

Published by Packt Publishing Ltd.

Grosvenor House

11 St Paul’s Square

Birmingham

B3 1RB, UK

ISBN 978-1-80512-115-2

www.packtpub.com

First and foremost, I’m forever grateful to my wife, who encouraged me to take on this endeavor and supported me throughout the process. Without her support and sacrifice, I couldn’t have written this book, let alone built my career.

Second, a big thanks to the Packt team, who ensured the quality and timeline of the book.

Third, I’d like to thank the author of Polars, Ritchie Vink, and other contributors who made Polars come to life and continue to develop it.

Finally, I’d like to express my gratitude to you, the readers. Thank you for reading my book.

– Yuki Kakegawa

Foreword

"Came for the speed, stayed for the syntax"

That's a common refrain among Polars enthusiasts. Indeed, the Polars API is truly beautiful: not only does it make for very readable code, but it also allows you to express complex aggregations that just aren't expressible with the pandas API.

Yuki has been a long-time fan of Polars. He has professional experience as a consultant. It's great to see him pair these together to produce a cookbook of practical recipes that you can use to solve real problems.

When should you use Polars? I think the best time is when you're starting a new project. Porting pandas code to Polars is certainly possible, but it's not necessarily easy. If you try thinking in Polars at the start of a new project, you'll likely surprise yourself with how expressive its API truly is, you'll use it idiomatically, and you'll make full use of its amazing features.

I'm sure you'll love learning about Polars whilst reading this book. And when you start your next data science project - please join the Polars Discord to say hello! Would love to hear about your experience!

Marco Gorelli

Polars and Pandas Contributor | Senior Software Engineer, Quansight

Contributors

About the author

Yuki Kakegawa is a data analytics professional with a background in computer science. Yuki has worked in the data space for the past several years, most of which has been spent in consulting, focusing on data engineering, analytics, and business intelligence. His clients are from various industries, such as healthcare, education, insurance, and private equity. He has worked with various companies, including Microsoft and Stanford Health Care, to name a couple.

He also runs Orem Data, a data analytics consultancy that helps companies improve their existing data and analytics infrastructure.

Aside from work, Yuki enjoys playing baseball and softball with his wife and friends.

About the reviewer

Mihai Gurău is an analytics and data professional with over eight years of experience, focusing on the “why?” behind analytics to drive meaningful action. In the airline industry, he has helped build bespoke revenue management decision support tools. His process mining implementation work effectively melded analytics with enterprise IT systems for process discovery and improvement. Nowadays, he contributes to fine-tuning product analytics and building robust data platform components for map-making and connected products and services. Beyond his professional pursuits, Mihai enjoys watersports and tries to keep abreast of relevant advancements in data and analytics engineering.

Table of Contents

Preface

1

Getting Started with Python Polars

Technical requirements

Introducing key features in Polars

Speed and efficiency

Expressions

The lazy API

See also

The Polars DataFrame

Getting ready

How to do it...

How it works...

There’s more...

See also

Polars Series

Getting ready

How to do it...

How it works...

There’s more...

See also

The Polars LazyFrame

How to do it...

How it works...

There’s more...

See also

Selecting columns and filtering data

Getting ready

How to do it...

How it works...

There’s more...

See also

Creating, modifying, and deleting columns

Getting ready

How to do it...

How it works...

There’s more...

See also

Understanding method chaining

Getting ready

How to do it...

How it works...

There’s more...

See also

Processing larger-than-RAM datasets

How to do it...

How it works...

There’s more...

See also

2

Reading and Writing Files

Technical requirements

Reading and writing CSV files

How to do it...

How it works...

There’s more...

See also

Reading and writing Parquet files

Getting ready

How to do it...

How it works...

There’s more...

See also

Reading and writing Delta Lake tables

Getting ready

How to do it...

How it works...

There’s more...

See also

Reading and writing JSON files

How to do it...

How it works...

There’s more...

See also

Reading and writing Excel files

Getting ready

How to do it...

How it works...

See also

Reading and writing other data file formats

Getting ready

How to do it...

How it works...

There’s more...

See also

Reading and writing multiple files

How to do it...

How it works...

There’s more...

See also

Working with databases

Getting ready

How to do it...

How it works...

See also

3

An Introduction to Data Analysis in Python Polars

Technical requirements

Inspecting the DataFrame

How to do it...

How it works...

There’s more...

See also

Casting data types

How to do it...

How it works...

There’s more...

See also

Handling duplicate values

How to do it...

How it works...

There’s more...

See also

Masking sensitive data

How to do it...

How it works...

There’s more...

See also

Visualizing data using Plotly

Getting ready

How to do it...

How it works...

See also

Detecting and handling outliers

Getting ready

How to do it...

How it works...

There’s more...

See also

4

Data Transformation Techniques

Technical requirements

Exploring basic aggregations

How to do it...

How it works...

There’s more...

See also

Using group by aggregations

Getting ready

How to do it...

How it works...

There’s more...

See also

Aggregating values across multiple columns

Getting ready

How to do it...

How it works...

There’s more...

See also

Computing with window functions

Getting ready

How to do it...

How it works...

There’s more...

See also

Applying UDFs

Getting ready

How to do it...

How it works...

There’s more...

See also

Using SQL for data transformations

Getting ready

How to do it...

How it works...

See also

5

Handling Missing Data

Technical requirements

Identifying missing data

Getting ready

How to do it...

How it works...

See also

Deleting rows and columns containing missing data

Getting ready

How to do it...

How it works...

There’s more...

See also

Filling in missing data

Getting ready

How to do it...

How it works...

There’s more...

See also

6

Performing String Manipulations

Technical requirements

Filtering strings

How to do it...

How it works...

There’s more...

See also

Converting strings into date, time, and datetime

How to do it...

How it works...

See also

Extracting substrings

How to do it...

How it works...

There’s more...

See also

Cleaning strings

Getting ready

How to do it...

How it works...

See also

Splitting strings into lists and structs

Getting ready

How to do it...

How it works...

See also

Concatenating and combining strings

Getting ready

How to do it...

How it works...

See also

7

Working with Nested Data Structures

Technical requirements

Creating lists

How to do it...

How it works...

There’s more...

See also

Aggregating elements in lists

How to do it...

How it works...

There’s more...

See also

Accessing and selecting elements in lists

Getting ready

How to do it...

How it works...

There’s more...

See also

Applying logic to each element in lists

Getting ready

How to do it...

How it works...

There’s more...

See also

Working with structs and JSON data

Getting ready

How to do it...

How it works...

There’s more...

See also

8

Reshaping and Tidying Data

Technical requirements

Turning columns into rows

How to do it...

How it works...

See also

Turning rows into columns

Getting ready

How to do it...

How it works...

There’s more...

See also

Joining DataFrames

Getting ready

How to do it...

How it works...

There’s more...

See also

Concatenating DataFrames

Getting ready

How to do it...

How it works...

There’s more...

See also

Other techniques for reshaping data

Getting ready

How to do it...

How it works...

See also

9

Time Series Analysis

Technical requirements

Working with date and time

How to do it...

How it works...

There’s more...

See also

Applying rolling window calculations

How it works...

There’s more...

See also

Resampling techniques

How to do it...

How it works...

See also

Time series forecasting with the functime library

Getting ready

How to do it...

How it works...

There’s more...

See also

10

Interoperability with Other Python Libraries

Technical requirements

Converting to and from a pandas DataFrame

Getting ready

How to do it...

How it works...

There’s more...

See also

Converting to and from NumPy arrays

Getting ready

How to do it...

How it works...

There’s more...

See also

Interoperating with PyArrow

Getting ready

How to do it...

How it works...

See also

Integrating with DuckDB

Getting ready

How to do it...

How it works...

See also

11

Working with Common Cloud Data Sources

Technical requirements

Working with Amazon S3

Getting ready

How to do it...

How it works...

See also

Working with Azure Blob Storage

Getting ready

How to do it...

How it works...

There’s more...

See also

Working with Google Cloud Storage

Getting ready

How to do it...

How it works...

See also

Working with BigQuery

Getting ready

How to do it...

How it works...

See also

Working with Snowflake

Getting ready

How to do it...

How it works...

See also

12

Testing and Debugging in Polars

Technical requirements

Debugging chained operations

How to do it...

How it works...

There’s more...

See also

Inspecting and optimizing the query plan

Getting ready

How to do it...

How it works...

There’s more...

See also

Testing data quality with cuallee

Getting ready

How to do it...

How it works...

There’s more...

See also

Running unit tests with pytest

Getting ready

How to do it...

How it works...

There’s more...

See also

Index

Other Books You May Enjoy