Graph Machine Learning - Claudio Stamile - E-Book

Description

Graph Machine Learning introduces a set of tools for processing network data and for leveraging the relationships between entities in predictive, modeling, and analytics tasks. The first chapters introduce graph theory and graph machine learning, as well as the scope of their potential use. You'll then learn about the main machine learning models for graph representation learning: their purpose, how they work, and how they can be implemented in a wide range of supervised and unsupervised learning applications. You'll build a complete machine learning pipeline, including data processing, model training, and prediction, in order to exploit the full potential of graph data. After covering the basics, you'll be taken through real-world scenarios such as extracting data from social networks, text analytics and natural language processing (NLP) using graphs, and financial transaction systems on graphs. You'll also learn how to build and scale out data-driven applications for graph analytics to store, query, and process network information, and explore the latest trends on graphs. By the end of this machine learning book, you will have learned the essential concepts of graph theory and the algorithms and techniques used to build successful machine learning applications.

You can read this e-book in the Legimi apps or in any app that supports one of the following formats:

EPUB
MOBI

Number of pages: 374

Year of publication: 2021

Graph Machine Learning

Take graph data to the next level by applying machine learning techniques and algorithms

Claudio Stamile

Aldo Marzullo

Enrico Deusebio

BIRMINGHAM—MUMBAI

Graph Machine Learning

Copyright © 2021 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Group Product Manager: Kunal Parikh

Publishing Product Manager: Devika Battike

Senior Editor: Roshan Kumar

Content Development Editor: Sean Lobo

Technical Editor: Sonam Pandey

Copy Editor: Safis Editing

Project Coordinator: Aparna Ravikumar Nair

Proofreader: Safis Editing

Indexer: Vinayak Purushotham

Production Designer: Joshua Misquitta

First published: May 2021

Production reference: 1270521

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-80020-449-2

www.packt.com

Alla memoria di mio Zio, Franchino Avolio. Alle ruote delle bici troppo sgonfie, all'infanzia che mi ha regalato.

In memory of my uncle, Franchino Avolio. To the wheels of bikes that are too flat, to the childhood he gave me.

– Claudio Stamile

To my family, my roots.

– Aldo Marzullo

To Lili, for always reminding me with your 'learning' process how wonderful the human brain and life are.

– Enrico Deusebio

Contributors

About the authors

Claudio Stamile received an M.Sc. degree in computer science from the University of Calabria (Cosenza, Italy) in September 2013 and, in September 2017, he received his joint Ph.D. from KU Leuven (Leuven, Belgium) and Université Claude Bernard Lyon 1 (Lyon, France). During his career, he has developed a solid background in artificial intelligence, graph theory, and machine learning, with a focus on the biomedical field. He is currently a senior data scientist at CGnal, a consulting firm fully committed to helping its top-tier clients implement data-driven strategies and build AI-powered solutions to promote efficiency and support new business models.

Aldo Marzullo received an M.Sc. degree in computer science from the University of Calabria (Cosenza, Italy) in September 2016. During his studies, he developed a solid background in several areas, including algorithm design, graph theory, and machine learning. In January 2020, he received his joint Ph.D. from the University of Calabria and Université Claude Bernard Lyon 1 (Lyon, France), with a thesis entitled Deep Learning and Graph Theory for Brain Connectivity Analysis in Multiple Sclerosis. He is currently a postdoctoral researcher at the University of Calabria and collaborates with several international institutions.

Enrico Deusebio is currently the chief operating officer at CGnal, a consulting firm that helps its top-tier clients implement data-driven strategies and build AI-powered solutions. He has been working with data and large-scale simulations using high-performance facilities and large-scale computing centers for over 10 years, in both academic and industrial contexts. He has collaborated and worked with top-tier universities, such as the University of Cambridge, the University of Turin, and the Royal Institute of Technology (KTH) in Stockholm, where he obtained a Ph.D. in 2014. He also holds B.Sc. and M.Sc. degrees in aerospace engineering from Politecnico di Torino.

About the reviewers

Kacper Kubara is a technical co-founder of Artemo and a data engineer at Annual Insight, and is currently pursuing a postgraduate degree in AI at the University of Amsterdam. Although his research focuses on graph representation learning, he is also interested in the tools and methods that help to bridge the gap between the AI industry and academia.

Tural Gulmammadov has been leading a group of data scientists and machine learning engineers at Oracle to tackle applied machine learning problems from various industries. He is dedicated to and motivated by the applications of graph theory and discrete mathematics in machine learning over distributed computational environments. He is a cognitive science, statistics, and psychology enthusiast, as well as a chess player, painter, seasonal horse rider, and paddler.

Table of Contents

Preface

Section 1 – Introduction to Graph Machine Learning

Chapter 1: Getting Started with Graphs

Technical requirements

Introduction to graphs with networkx

Types of graphs

Graph representations

Plotting graphs

networkx

Gephi

Graph properties

Integration metrics

Segregation metrics

Centrality metrics

Resilience metrics

Benchmarks and repositories

Examples of simple graphs

Generative graph models

Benchmarks

Dealing with large graphs

Summary 

Chapter 2: Graph Machine Learning

Technical requirements

Understanding machine learning on graphs

Basic principles of machine learning

The benefit of machine learning on graphs

The generalized graph embedding problem

The taxonomy of graph embedding machine learning algorithms

The categorization of embedding algorithms

Summary 

Section 2 – Machine Learning on Graphs

Chapter 3: Unsupervised Graph Learning

Technical requirements

The unsupervised graph embedding roadmap

Shallow embedding methods

Matrix factorization

Skip-gram

Autoencoders

TensorFlow and Keras – a powerful combination

Our first autoencoder

Denoising autoencoders

Graph autoencoders

Graph neural networks

Variants of GNNs

Spectral graph convolution

Spatial graph convolution

Graph convolution in practice

Summary 

Chapter 4: Supervised Graph Learning

Technical requirements

The supervised graph embedding roadmap 

Feature-based methods 

Shallow embedding methods 

Label propagation algorithm

Label spreading algorithm

Graph regularization methods

Manifold regularization and semi-supervised embedding

Neural Graph Learning

Planetoid

Graph CNNs

Graph classification using GCNs

Node classification using GraphSAGE

Summary 

Chapter 5: Problems with Machine Learning on Graphs

Technical requirements

Predicting missing links in a graph

Similarity-based methods

Embedding-based methods

Detecting meaningful structures such as communities

Embedding-based community detection

Spectral methods and matrix factorization

Probability models

Cost function minimization

Detecting graph similarities and graph matching

Graph embedding-based methods

Graph kernel-based methods

GNN-based methods

Applications

Summary 

Section 3 – Advanced Applications of Graph Machine Learning

Chapter 6: Social Network Graphs

Technical requirements

Overview of the dataset

Dataset download

Loading the dataset using networkx

Network topology and community detection

Topology overview

Node centrality

Community detection

Embedding for supervised and unsupervised tasks

Task preparation

node2vec-based link prediction

GraphSAGE-based link prediction

Hand-crafted features for link prediction

Summary of results

Summary

Chapter 7: Text Analytics and Natural Language Processing Using Graphs

Technical requirements

Providing a quick overview of a dataset

Understanding the main concepts and tools used in NLP

Creating graphs from a corpus of documents

Knowledge graphs

Bipartite document/entity graphs

Building a document topic classifier

Shallow learning methods

Graph neural networks

Summary

Chapter 8: Graph Analysis for Credit Card Transactions

Technical requirements

Overview of the dataset

Loading the dataset and graph building using networkx

Network topology and community detection

Network topology

Community detection

Embedding for supervised and unsupervised fraud detection

Supervised approach to fraudulent transaction identification

Unsupervised approach to fraudulent transaction identification

Summary

Chapter 9: Building a Data-Driven Graph-Powered Application

Technical requirements

Overview of Lambda architectures

Lambda architectures for graph-powered applications

Graph processing engines

Graph querying layer

Selecting between Neo4j and GraphX

Summary

Chapter 10: Novel Trends on Graphs

Technical requirements 

Learning about data augmentation for graphs

Sampling strategies

Exploring data augmentation techniques

Learning about topological data analysis

Topological machine learning

Applying graph theory in new domains

Graph machine learning and neuroscience

Graph theory and chemistry and biology

Graph machine learning and computer vision

Recommendation systems

Summary

Why subscribe?

Other Books You May Enjoy

Preface

Graph Machine Learning provides a new set of tools for processing network data and for leveraging the relationships between entities in predictive, modeling, and analytics tasks.

You will start with a brief introduction to graph theory and graph machine learning, and learn about their potential. As you proceed, you will become well versed in the main machine learning models for graph representation learning: their purpose, how they work, and how they can be implemented in a wide range of supervised and unsupervised learning applications. You'll then build a complete machine learning pipeline, including data processing, model training, and prediction, in order to exploit the full potential of graph data. Moving on, you will cover real-world scenarios, such as extracting data from social networks, text analytics and natural language processing using graphs, and financial transaction systems on graphs. Finally, you will learn how to build and scale out data-driven applications for graph analytics to store, query, and process network information, before exploring the latest trends on graphs.

By the end of this machine learning book, you will have learned the essential concepts of graph theory and the algorithms and techniques used to build successful machine learning applications.

Who this book is for

This book is for data analysts, graph developers, graph analysts, and graph professionals who want to leverage the information embedded in the connections and relations between data points, unravel hidden structures, and exploit topological information to boost their analysis and models' performance. The book will also be useful for data scientists and machine learning developers who want to build machine learning-driven graph databases. A beginner-level understanding of graph databases and graph data is required. An intermediate-level working knowledge of Python programming and machine learning is also expected to make the most out of this book.

What this book covers

Chapter 1, Getting Started with Graphs, introduces the basic concepts of graph theory using the NetworkX Python library.

Chapter 2, Graph Machine Learning, introduces the main concepts of graph machine learning and graph embedding techniques.

Chapter 3, Unsupervised Graph Learning, covers recent unsupervised graph embedding methods.

Chapter 4, Supervised Graph Learning, covers recent supervised graph embedding methods.

Chapter 5, Problems with Machine Learning on Graphs, introduces the most common machine learning tasks on graphs.

Chapter 6, Social Network Graphs, shows an application of machine learning algorithms to social network data.

Chapter 7, Text Analytics and Natural Language Processing Using Graphs, shows the application of machine learning algorithms to natural language processing tasks.

Chapter 8, Graph Analysis for Credit Card Transactions, shows the application of machine learning algorithms to credit card fraud detection.

Chapter 9, Building a Data-Driven Graph-Powered Application, introduces some technologies and techniques that are useful for dealing with large graphs.

Chapter 10, Novel Trends on Graphs, introduces some novel trends (algorithms and applications) in graph machine learning.

To get the most out of this book

A Jupyter or a Google Colab notebook is sufficient to cover all the examples. For some chapters, Neo4j and Gephi are also required.

If you are using the digital version of this book, we advise you to type the code yourself or access the code via the GitHub repository (link available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Graph-Machine-Learning. If the code is updated, the changes will be published in the same GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781800204492_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system."

A block of code is set as follows:

html, body, #map {
  height: 100%;
  margin: 0;
  padding: 0
}

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

Jupyter==1.0.0
networkx==2.5
matplotlib==3.2.2
node2vec==0.3.3
karateclub==1.0.19
scipy==1.6.2
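The pinned versions listed above can be compared against whatever is installed locally. The following is only a minimal sketch, assuming Python 3.8 or later (for the standard-library importlib.metadata module); the helper names parse_pins, installed_version, and check_pins are hypothetical and are not part of the book's code.

```python
from importlib import metadata

# Pins copied from the list above.
PINS = """\
Jupyter==1.0.0
networkx==2.5
matplotlib==3.2.2
node2vec==0.3.3
karateclub==1.0.19
scipy==1.6.2
"""


def parse_pins(text):
    """Turn 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins):
    """Return {name: (pinned, installed)} for every package that differs."""
    report = {}
    for name, wanted in pins.items():
        found = installed_version(name)
        if found != wanted:
            report[name] = (wanted, found)
    return report


if __name__ == "__main__":
    for name, (wanted, found) in check_pins(parse_pins(PINS)).items():
        print(f"{name}: pinned {wanted}, installed {found or 'not installed'}")
```

A mismatch does not necessarily mean the examples will fail; the report is only a quick way to see where the local environment differs from the versions the book lists.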

Any command-line input or output is written as follows:

$ mkdir css
$ cd css

Bold: Indicates a new term, an important word, or words that you see on screen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."

Tips or important notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in, and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.

Section 1 – Introduction to Graph Machine Learning

This section gives a brief introduction to graph machine learning and shows the potential of graphs combined with the right machine learning algorithms. It also provides a general overview of graph theory and of the Python libraries that allow the reader to work with (that is, create, modify, and plot) graph data structures.

This section comprises the following chapters:

Chapter 1, Getting Started with Graphs

Chapter 2, Graph Machine Learning
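To give a feel for what creating and modifying a graph data structure means before the chapters begin, here is a small library-free sketch. It is an assumption-laden illustration, not the book's approach: the book works with networkx, whereas the SimpleGraph class below is a hypothetical stand-in that stores an undirected graph without self-loops as a dictionary of neighbor sets. Plotting is left out.

```python
class SimpleGraph:
    """Undirected graph stored as a dict mapping each node to a set of neighbors.

    Hypothetical teaching class; self-loops are assumed not to occur.
    """

    def __init__(self):
        self.adj = {}

    def add_node(self, node):
        # Create an empty neighbor set only if the node is new.
        self.adj.setdefault(node, set())

    def add_edge(self, u, v):
        # An undirected edge is recorded in both neighbor sets.
        self.add_node(u)
        self.add_node(v)
        self.adj[u].add(v)
        self.adj[v].add(u)

    def remove_edge(self, u, v):
        # Removing an edge keeps both endpoints in the graph.
        self.adj[u].discard(v)
        self.adj[v].discard(u)

    def degree(self, node):
        # Number of neighbors of the node.
        return len(self.adj[node])

    def edges(self):
        # Report each undirected edge once, with its endpoints sorted.
        return sorted(
            {tuple(sorted((u, v))) for u, nbrs in self.adj.items() for v in nbrs}
        )


# Create a graph, then modify it by removing one edge.
g = SimpleGraph()
for u, v in [("Milan", "Paris"), ("Paris", "Dublin"), ("Milan", "Rome")]:
    g.add_edge(u, v)
g.remove_edge("Milan", "Rome")
```

After these operations, Rome is still a node of the graph but has no neighbors, which illustrates that nodes and edges are managed separately.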