Graph Data Modeling in Python - Gary Hutson - E-Book

Graph Data Modeling in Python E-Book

Gary Hutson

Description

Graphs have become increasingly integral to powering the products and services we use in our daily lives, driving social media, online shopping recommendations, and even fraud detection. With this book, you’ll see how a good graph data model can help enhance efficiency and unlock hidden insights through complex network analysis.

Graph Data Modeling in Python will guide you through designing, implementing, and harnessing a variety of graph data models using the popular open source Python libraries NetworkX and igraph. Following practical use cases and examples, you’ll find out how to design optimal graph models capable of supporting a wide range of queries and features. Moreover, you’ll seamlessly transition from traditional relational databases and tabular data to the dynamic world of graph data structures that allow powerful, path-based analyses. As well as learning how to manage a persistent graph database using Neo4j, you’ll also get to grips with adapting your network model to evolving data requirements.

By the end of this book, you’ll be able to transform tabular data into powerful graph data models. In essence, you’ll build your knowledge from beginner to advanced-level practitioner in no time.

Format: EPUB

Page count: 369

Publication year: 2023




Graph Data Modeling in Python

A practical guide to curating, analyzing, and modeling data with graphs

Gary Hutson

Matt Jackson

BIRMINGHAM—MUMBAI

Graph Data Modeling in Python

Copyright © 2023 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Reshma Raman

Publishing Product Manager: Arindam Majumder

Senior Editor: Nathanya Dias

Technical Editor: Rahul Limbachiya

Copy Editor: Safis Editing

Project Coordinator: Farheen Fathima

Proofreader: Safis Editing

Indexer: Subalakshmi Govindhan

Production Designer: Joshua Misquitta

Marketing Coordinator: Nivedita Singh

First published: July 2023

Production reference: 1210623

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80461-803-5

www.packtpub.com

To my son, Charlie, my wife, Kerry, and my supportive parents, Carol and Eric, plus my parents-in-law, Patricia and John. Thanks for all your love, support, and always being there to offer sage advice.

– Gary Hutson

To Lori, for her patience and support.

– Matt Jackson

Contributors

About the authors

Gary Hutson is an experienced Python and graph database developer. He has experience in Python, R, C, SQL, and many other programming languages, and has been working with databases of some form for 20+ years. Professionally, he works as the Head of Graph Data Science and Machine Learning for a company that uses machine learning (ML) and graph data science techniques to detect risks on social media and other platforms. He is experienced in many graph and ML techniques, specializing in natural language processing, computer vision, deep learning, and ML. His passion is using open source technologies to create useful toolsets and practical applied solutions, as this was the focus of his master’s degree.

Matt Jackson is a lead data scientist specializing in graph theory and network analytics. His interest in graphs was sparked during his PhD in systems biology, where network analysis was used to uncover novel features of cell organization. Since then, he has worked in diverse industries, from academia to intelligence, highlighting patterns and risk in complex data by harnessing the latest in graph algorithms and machine learning.

About the reviewer

Atul Kadlag, a seasoned professional in the business intelligence, data, and analytics industry, possesses diverse experience across different technologies and a proven track record of success. A self-motivated learner, he has excelled at working at various multinational companies for more than 15 years, leading transformative initiatives in business intelligence, data warehouses, and data analytics. Atul has immense experience in handling end-to-end projects in business intelligence and data warehouse technologies, and is dedicated to driving positive change and inspiring others through continuous learning, making a lasting impact in the data industry. His expertise involves SQL, Python, and business intelligence and data warehousing technologies.

Table of Contents

Preface

Part 1: Getting Started with Graph Data Modeling

1

Introducing Graphs in the Real World

Technical requirements

Why should you use graphs?

Composite components of a graph

The fundamentals of nodes and edges and the properties of a graph

Undirected graphs

Directed graphs

Node properties

Heterogeneous graphs

Schema design

Comparing RDBs and GDBs

GDBs to the rescue

The use of graphs across various industries

Introduction to NetworkX and igraph

NetworkX basics

igraph basics

Summary

2

Working with Graph Data Models

Technical requirements

Making the transition from tabular to graph data

Examining the data

Designing a schema

Implementing the model in Python

Adding nodes and attributes

Adding edges

Writing a generic graph import method

The most popular TV show – a real-world use case

Examining the graph structure

Measuring connectedness

Looking at the top degree nodes

Using select() to interrogate the graph

Properties of our popular nodes

Summary

Part 2: Making the Graph Transition

3

Data Model Transformation – Relational to Graph Databases

Technical requirements

Recommending a game to a user

Installing MySQL

Setting up a MySQL database

Querying MySQL in Python

Examining the data in Python

Path-based analytics in tabular data

From relational to graph databases

Schema design

Ingestion considerations

Path-based analytics in igraph

Our recommendation system

Generic MySQL to igraph methods

A more advanced recommendation system using Jaccard similarity

Summary

4

Building a Knowledge Graph

Technical requirements

Introducing knowledge graphs

Cleaning the data for our knowledge graph

Ingesting data into a knowledge graph

Designing a knowledge graph schema

Linking text to terms

Constructing the knowledge graph

Knowledge graph analysis and community detection

Examining the knowledge graph structure

Identifying abstracts of interest

Identifying fields with community detection

Summary

Part 3: Storing and Productionizing Graphs

5

Working with Graph Databases

Technical requirements

Using graph databases

Neo4j as a graph database

The Cypher query language

Querying Neo4j from Python

Storing a graph in Neo4j

Preprocessing data

Moving nodes, edges, and properties to Neo4j

Optimizing travel with Python and Cypher

Travel recommendations

Moving to ingestion pipelines

Summary

6

Pipeline Development

Technical requirements

Graph pipeline development

A graph database for retail

Designing a schema and pipeline

Setting up a new database

Schema design

Adding static product information

Simulating customer interactions

Making product recommendations

Product recommendations by brand

Drawing on other customers’ purchases

Using similarity scores to recommend products

Summary

7

Refactoring and Evolving Schemas

Technical requirements

Refactoring reasoning

Change in relational and graph databases

Effectively evolving with graph schema design

Putting the changes into development

Initializing a new database

Adding constraints

Pre-change schema

Updating the schema

Summary

Part 4: Graphing Like a Pro

8

Perfect Projections

Technical requirements

What are projections?

How to use a projection

Creating a projection in igraph

Creating a projection in Neo4j

Putting the projection to work

Analyzing the igraph actor projection

Exploring connected components

Exploring cliques in our graph

Analyzing the Neo4j film projection

Summary

9

Common Errors and Debugging

Technical requirements

Debugging graph issues

Common igraph issues

No nodes in the graph

Node IDs in igraph

Adding properties

Using the select method

Chained statements and select

Efficiency and path lengths

Common Neo4j issues

Slow writing from file to Neo4j

Indexing for query performance

Caching results

Memory limitations

Handling duplicates with MERGE

Handling duplicates with constraints

EXPLAIN, PROFILE, and the eager operator

Summary

Index

Other Books You May Enjoy

Part 1: Getting Started with Graph Data Modeling

This part is our first look at graph data modeling in Python. It covers what you need to know about graph data modeling: why and when to use graphs, the fundamentals of graphs and how they are used in industry, and the core packages you will work with in these chapters, igraph and NetworkX.

Moving on from the fundamentals, we will then look at how to work with graph data models and work through a television recommendation use case as a Python pipeline.

This will serve as the entry-level part of this book and it has the following chapters:

Chapter 1, Introducing Graphs in the Real World
Chapter 2, Working with Graph Data Models

2

Working with Graph Data Models

This chapter builds on what you have learned so far, taking you from a business problem, through obtaining the data, to getting that data graph-ready. The aim is to teach you the fundamental skills needed to start working with graph data models at pace.

It focuses on the key skills needed to get up to speed with graph data models and on many of the attributes of a graph structure. The following sections will familiarize you with igraph and show how to use it to ingest data into your graph.

From there, we'll build your understanding of how to model nodes and edges in a graph. This culminates in a use case that reinforces what you learn in this chapter.

The use case will touch on the key techniques needed to model a graph structure and what is meant by degree centrality.

In this chapter, we’re going to cover the following main topics:

Making the transition from tabular to graph data
Implementing the model in Python
The most popular TV show – a real-world use case

Technical requirements

We will be using Jupyter notebooks to run our coding exercises; this requires python>=3.8.0, along with the following packages, which will need to be installed with the pip install command in your environment:

networkx==2.8.8
igraph==0.9.8
matplotlib

All notebooks, with the coding exercises, are available at the following GitHub link: https://github.com/PacktPublishing/Graph-Data-Modeling-in-Python/tree/main/CH02.
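Before opening the notebooks, it can help to confirm that the environment matches the requirements listed above. The following is a minimal sketch using only the standard library; the helper name check_requirements is our own and is not part of the book's code, and the sketch assumes the packages are installed under the distribution names given in the requirements list.

```python
from importlib import metadata

# Pins copied from the requirements list; None means "any version is fine".
REQUIRED = {"networkx": "2.8.8", "igraph": "0.9.8", "matplotlib": None}


def check_requirements(required=REQUIRED):
    """Return {package: (installed_version_or_None, matches_pin)}.

    Hypothetical helper for illustration only.
    """
    report = {}
    for name, pinned in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        ok = installed is not None and (pinned is None or installed == pinned)
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_requirements().items():
        status = "OK" if ok else "check this package"
        print(f"{name}: installed={installed} -> {status}")
```

A package reported as not matching can be installed or re-pinned with pip install, as described above.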

Making the transition from tabular to graph data

To introduce the power of a graph data model, we will first focus on using a real social media dataset, from Facebook. This open source data contains information on Facebook pages: their names and the type of page. Four types of pages are included, namely those for TV shows, companies, politicians, and governmental organizations. In addition, we have data on mutual likes between pages. If two pages like each other on Facebook, this is represented in our data.
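To make the shape of this data concrete, here is a small sketch of how page rows and mutual-like rows map onto nodes and undirected edges. It uses only the standard library rather than igraph. The rows, the column names (id, name, page_type), and the build_graph helper are made up for illustration and are not taken from the real dataset.

```python
from collections import defaultdict

# Hypothetical page rows: one row per Facebook page (becomes a node).
pages = [
    {"id": 0, "name": "Example TV Show", "page_type": "tvshow"},
    {"id": 1, "name": "Example Company", "page_type": "company"},
    {"id": 2, "name": "Example Politician", "page_type": "politician"},
    {"id": 3, "name": "Example Agency", "page_type": "government"},
]

# Hypothetical mutual-like rows: one row per pair of pages that like each other.
mutual_likes = [(0, 1), (0, 2), (2, 3)]


def build_graph(pages, mutual_likes):
    """Turn the two tables into node attributes plus an undirected adjacency map."""
    nodes = {
        row["id"]: {"name": row["name"], "page_type": row["page_type"]}
        for row in pages
    }
    adjacency = defaultdict(set)
    for a, b in mutual_likes:
        # A mutual like has no direction, so record it on both endpoints.
        adjacency[a].add(b)
        adjacency[b].add(a)
    return nodes, adjacency


nodes, adjacency = build_graph(pages, mutual_likes)

# Degree of a page = number of pages it shares a mutual like with.
degree = {node_id: len(adjacency[node_id]) for node_id in nodes}
```

Because a mutual like is symmetric, each edge is stored on both endpoints, which is what makes the graph undirected.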