Graphs have become increasingly integral to powering the products and services we use in our daily lives, driving social media, online shopping recommendations, and even fraud detection. With this book, you’ll see how a good graph data model can help enhance efficiency and unlock hidden insights through complex network analysis.
Graph Data Modeling in Python will guide you through designing, implementing, and harnessing a variety of graph data models using the popular open source Python libraries NetworkX and igraph. Following practical use cases and examples, you’ll find out how to design optimal graph models capable of supporting a wide range of queries and features. Moreover, you’ll seamlessly transition from traditional relational databases and tabular data to the dynamic world of graph data structures that allow powerful, path-based analyses. As well as learning how to manage a persistent graph database using Neo4j, you’ll also get to grips with adapting your network model to evolving data requirements.
By the end of this book, you’ll be able to transform tabular data into powerful graph data models. In essence, you’ll build your knowledge from beginner to advanced-level practitioner in no time.
Pages: 369
Publication year: 2023
A practical guide to curating, analyzing, and modeling data with graphs
Gary Hutson
Matt Jackson
BIRMINGHAM—MUMBAI
Copyright © 2023 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Reshma Raman
Publishing Product Manager: Arindam Majumder
Senior Editor: Nathanya Dias
Technical Editor: Rahul Limbachiya
Copy Editor: Safis Editing
Project Coordinator: Farheen Fathima
Proofreader: Safis Editing
Indexer: Subalakshmi Govindhan
Production Designer: Joshua Misquitta
Marketing Coordinator: Nivedita Singh
First published: July 2023
Production reference: 1210623
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80461-803-5
www.packtpub.com
To my son, Charlie, my wife, Kerry, and my supportive parents, Carol and Eric, plus my parents-in-law, Patricia and John. Thanks for all your love, support, and always being there to offer sage advice.
– Gary Hutson
To Lori, for her patience and support.
– Matt Jackson
Gary Hutson is an experienced Python and graph database developer. He has experience in Python, R, C, SQL, and many other programming languages, and has been working with databases in some form for more than 20 years. Professionally, he works as the Head of Graph Data Science and Machine Learning for a company that uses machine learning (ML) and graph data science techniques to detect risks on social media and other platforms. He is experienced in many graph and ML techniques, specializing in natural language processing, computer vision, and deep learning. His passion is using open source technologies to create useful toolsets and practical applied solutions, as this was the focus of his master’s degree.
Matt Jackson is a lead data scientist specializing in graph theory and network analytics. His interest in graphs was sparked during his PhD in systems biology, where network analysis was used to uncover novel features of cell organization. Since then, he has worked in diverse industries, from academia to intelligence, highlighting patterns and risk in complex data by harnessing the latest in graph algorithms and machine learning.
Atul Kadlag, a seasoned professional in the business intelligence, data, and analytics industry, has diverse experience across different technologies and a proven track record of success. A self-motivated learner, he has worked at various multinational companies for more than 15 years, leading transformative initiatives in business intelligence, data warehousing, and data analytics. Atul has extensive experience handling end-to-end projects in business intelligence and data warehouse technologies, and is dedicated to driving positive change and inspiring others through continuous learning, making a lasting impact in the data industry. His expertise spans SQL, Python, and business intelligence and data warehousing technologies.
This part is our first foray into graph data modeling in Python. It covers what you need to know about graph data modeling: why and when to use graphs, the fundamentals of graphs and how they are used in industry, and the core packages you will work with in these chapters, igraph and NetworkX.
Moving on from the fundamentals, we then look at how to work with graph data models and walk through a television recommendation use case as a Python pipeline.
This is the entry-level part of the book, and it contains the following chapters:
Chapter 1, Introducing Graphs in the Real World
Chapter 2, Working with Graph Data Models
This chapter builds on what you have learned so far, taking you from a business problem, through obtaining the data, to getting that data ready for a graph. The aim is to teach you the fundamental skills needed to start working with graph data models at pace.
It focuses on the key skills you need to get up to speed with graph data models and on the main attributes of a graph structure. In the following sections, the aim is to familiarize you with igraph and show you how to use it to ingest data into your graph.
From there, we'll build your understanding of how to model nodes and edges in a graph, culminating in a use case that cements what you learn in this chapter.
The use case covers the key techniques needed to model a graph structure and explains what is meant by degree centrality.
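Degree centrality can be sketched in a few lines. A node's degree is the number of edges attached to it; NetworkX's degree_centrality normalizes this by dividing by n - 1, the largest degree a node can have in a simple graph of n nodes. The following is a minimal, dependency-free sketch of that calculation on a small undirected edge list. The page names and likes are hypothetical and illustrative only; they are not taken from the Facebook dataset used in this chapter.

```python
from collections import Counter


def degree_centrality(edges):
    """Return {node: degree / (n - 1)} for an undirected, simple graph.

    Sketch of the normalization NetworkX uses. Note: a bare edge list
    cannot represent isolated nodes, so n counts only nodes with edges.
    """
    degree = Counter()
    for u, v in edges:
        # Each undirected edge adds one to the degree of both endpoints.
        degree[u] += 1
        degree[v] += 1
    n = len(degree)
    if n <= 1:
        # NetworkX returns 1 for every node when the graph has at most one node.
        return {node: 1.0 for node in degree}
    return {node: d / (n - 1) for node, d in degree.items()}


# Hypothetical mutual-like pairs between four pages.
likes = [
    ("Show A", "Show B"),
    ("Show A", "Company X"),
    ("Show A", "Politician Y"),
    ("Show B", "Company X"),
]

centrality = degree_centrality(likes)
most_central = max(centrality, key=centrality.get)
print(most_central, centrality[most_central])  # Show A 1.0
```

"Show A" is linked to all three other pages, so its degree of 3 divided by n - 1 = 3 gives the maximum score of 1.0. With NetworkX installed, nx.degree_centrality(nx.Graph(likes)) should produce the same values for this edge list.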
In this chapter, we’re going to cover the following main topics:
Making the transition from tabular to graph data
Implementing the model in Python
The most popular TV show – a real-world use case
We will use Jupyter notebooks to run the coding exercises. These require python>=3.8.0, along with the following packages, which you will need to install in your environment with the pip install command:
networkx==2.8.8
igraph==0.9.8
matplotlib
All notebooks, with the coding exercises, are available at the following GitHub link: https://github.com/PacktPublishing/Graph-Data-Modeling-in-Python/tree/main/CH02.
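Before opening the notebooks, it can help to confirm that the environment matches these requirements. The following is a minimal sketch that uses only the standard library (sys and importlib.metadata, the latter available from Python 3.8). It assumes the packages are installed under the distribution names networkx, igraph, and matplotlib, and it reports a package as missing (None) rather than failing if one is not found.

```python
import sys
from importlib import metadata

# Versions pinned in the requirements above; matplotlib is left unpinned (None).
PINNED = {"networkx": "2.8.8", "igraph": "0.9.8", "matplotlib": None}


def check_environment(pinned=PINNED):
    """Report whether Python is >= 3.8 and which pinned packages are installed."""
    report = {"python_ok": sys.version_info >= (3, 8)}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # not installed under this (assumed) distribution name
        report[name] = {
            "installed": installed,
            # An unpinned package only needs to be present.
            "matches_pin": installed is not None
            and (wanted is None or installed == wanted),
        }
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(key, value)
```

In a notebook, calling check_environment() in a cell and inspecting the returned dictionary is enough. A matches_pin value of False for networkx or igraph means the installed version differs from the pinned one, so results may not match the notebooks exactly.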
To introduce the power of a graph data model, we will first focus on a real social media dataset from Facebook. This open source data contains information on Facebook pages: their names and their page types. Four types of pages are included: TV shows, companies, politicians, and governmental organizations. In addition, the data records mutual likes between pages; if two pages like each other on Facebook, that relationship is captured in our data.
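The step from tabular to graph data can be sketched before any graph library is involved. Below, two small tables are represented as plain Python lists: one row per page (an id, a name, and a page type) and one row per mutual like (a pair of page ids). The column names, ids, and rows are hypothetical and only mirror the description above; they are not the real dataset's schema. Pages become nodes, mutual likes become undirected edges, and the result is stored as an adjacency mapping that supports simple path-based questions.

```python
from collections import defaultdict

# Hypothetical node table: one row per Facebook page.
pages = [
    {"id": 0, "name": "Show A", "page_type": "tvshow"},
    {"id": 1, "name": "Company X", "page_type": "company"},
    {"id": 2, "name": "Politician Y", "page_type": "politician"},
    {"id": 3, "name": "Agency Z", "page_type": "government"},
]

# Hypothetical edge table: one row per pair of pages that like each other.
mutual_likes = [(0, 1), (0, 2), (1, 3)]


def build_graph(node_rows, edge_rows):
    """Turn tabular rows into node attributes plus an undirected adjacency map."""
    nodes = {row["id"]: {"name": row["name"], "page_type": row["page_type"]}
             for row in node_rows}
    adjacency = defaultdict(set)
    for a, b in edge_rows:
        # A mutual like is symmetric, so record the link in both directions.
        adjacency[a].add(b)
        adjacency[b].add(a)
    return nodes, adjacency


nodes, adjacency = build_graph(pages, mutual_likes)

# Direct neighbors: which pages share a mutual like with "Show A"?
show_a = next(i for i, attrs in nodes.items() if attrs["name"] == "Show A")
print(sorted(nodes[n]["name"] for n in adjacency[show_a]))
# ['Company X', 'Politician Y']

# Two hops away: linked to one of Show A's neighbors, but not to Show A itself.
two_hop = {m for n in adjacency[show_a] for m in adjacency[n]}
two_hop -= adjacency[show_a] | {show_a}
print(sorted(nodes[n]["name"] for n in two_hop))  # ['Agency Z']
```

The chapter uses igraph for this ingestion step; the dictionaries here only illustrate the shape of the transformation. The two-hop query, which would need a self-join on a flat table, becomes a short traversal once the data is in graph form.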
