Neo4j is a graph database that allows traversing huge amounts of data with ease. This book aims at quickly getting you started with the popular graph database Neo4j.
Starting with a brief introduction to graph theory, this book will show you the advantages of using graph databases along with data modeling techniques for graph databases. You'll gain practical hands-on experience with commonly used and lesser-known features for updating a graph store with Neo4j's Cypher query language. Furthermore, you'll also learn to create awesome procedures using APOC and extend Neo4j's functionality, enabling integration, algorithmic analysis, and other advanced spatial operation capabilities on data.
Through the course of the book you will come across implementation examples on the latest updates in Neo4j, such as in-graph indexes, scaling, performance improvements, visualization, data refactoring techniques, security enhancements, and much more. By the end of the book, you'll have gained the skills to design and implement modern spatial applications, from graphing data to unraveling business capabilities with the help of real-world use cases.
Year of publication: 2017
BIRMINGHAM - MUMBAI
Copyright © 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: August 2014
Second Edition: October 2017
Production reference: 1171017
ISBN 978-1-78646-614-3
www.packtpub.com
Authors
Jérôme Baton
Rik Van Bruggen
Copy Editor
Tasneem Fatehi
Reviewers
Taffy Brecknock
Jose Ernesto Echeverria
Adriano Longo
Project Coordinator
Manthan Patel
Commissioning Editor
Amey Varangaonkar
Proofreader
Safis Editing
Acquisition Editor
Vinay Argekar
Indexer
Tejal Daruwale Soni
Content Development Editor
Jagruti Babaria
Tejas Limkar
Graphics
Tania Dutta
Technical Editor
Dinesh Chaudhary
Dharmendra Yadav
Production Coordinator
Deepika Naik
Jérôme Baton started hacking computers at the age of skin problems, gaming first then continued his trip by self-learning Basic on Amstrad CPC, peaking on coding a full screen horizontal starfield, and messing the interlace of the video controller so that sprites appeared twice as high in horizontal beat'em up games. Disks were three inches for 178 Kb then.
Then, for gaming reasons, he switched to Commodore Amiga and its fantastic AMOS Basic. Later caught by seriousness and studies, he wrote Turbo Pascal, C, COBOL, Visual C++, and Java on PCs and mainframes at university, and even Logo in high school. Then, Java happened and he became a consultant, mostly on backend code of websites in many different businesses.
Jérôme authored several articles in French on Neo4j, JBoss Forge, an Arduino workshop for Devoxx4Kids, and reviewed kilos of books on Android. He has a weakness for wordplay, puns, spoonerisms, and Neo4j that relieves him from join(t) pains.
Jérôme also has the joy to teach in French universities, currently at I.U.T de Paris, Université Paris V - René Descartes (Neo4j, Android), and Université de Troyes (Neo4j), where he does his best to enterTRain the students.
When not programming, Jérôme enjoys photography, doing electronics, everything DIY, understanding how things work, trying to be clever or funny on Twitter, and spends a lot of time trying to understand his kids and life in general.
Rik Van Bruggen is the VP of Sales for Neo Technology for Benelux, UK, and the Nordic region. He has been working for startup companies for most of his career, including eCom Interactive Expertise, SilverStream Software, Imprivata, and Courion. While he has an interest in technology, his real passion is business and how to make technology work for a business. He lives in Antwerp, Belgium, with his wife and three lovely kids, and enjoys technology, orienteering, jogging, and Belgian beer.
I would like to thank many people for this project that is truly a great personal achievement for me.
First of all, Rik Van Bruggen, who is the original author of this book and literally, the giant on whose shoulders I stand. Secondly, Vinay and Jagruti from Packt Publishing for their patience with a slow writer.
Thank you, William LyOn, Cédric FauVEt, Mark NEedham, BenOit Simard, Michael Hunger, Craig Taverner, and Jim Webber from Neo4j for their help and sharing their knowledge over the last few years on Stack Overflow, on Slack, or in person.
This would not have been possible if I myself had not had inspiring teachers such as Daniel 'DG' Guillaume, Françoise Meunier, Florence Fessy-Mesatfa, and Jérôme Fessy from IUT de Paris, and Dr. Robert T Hughes, Richard N Griffith, and Graham Winstanley from the University of Brighton.
Going further in the past, there are more teachers from whom I learned pedagogy and who inspired me to share; I remember you, Mrs. Legrand, Mrs. Viala, and Mr. Bouhadda. Also, not being a native English speaker, I was at first very bad at speaking English. Extra energy from Mrs Goddard and Mrs Maluski really unlocked this second language for me.
Teachers change lives!
Also thanks to the doctors of my national health service without whom I would be a souvenir already. Vive la Sécurité Sociale!
Basically, I would like to thank all the people I learned from, be they teachers or not. Including my students.
Thank you, Romin Irani (@iRomin), my friend--you are an example.
Thank you, Anny Naïm, you are a truly shining person.
Above all, love you, kiddos!
I really should make a graph of all the people I would like to thank.
Taffy Brecknock has worked in the IT industry for more than 20 years. During his career, he has worked as a software developer, managed development teams, and has been responsible for application design and more recently systems architecture.
He has held roles with both public and private sector organizations. While working with the Australian Government, Taffy got first-hand exposure to the use of connected data in law enforcement. Having used relational database systems as the data repository, he is familiar with the shortcomings of using this paradigm to model such systems.
After learning about graph databases, specifically Neo4j, he has become extremely interested in the many different applications of this technology. He feels that there are few problems in today's business world that cannot benefit from being modeled in a graph.
Jose Ernesto Echeverria started working with relational databases in the 90s, and has been working with Neo4j since 2014. He prefers graph databases over others, given their capabilities for real-world modeling and their adaptability to change. As a polyglot programmer, he has used languages such as Java, Ruby, and R with Neo4j in order to solve data management problems of multinational corporations. He is a regular attendee of GraphConnect, OSCON, and RailsConf. When not working, he enjoys spending time with family, road trips, Minecraft projects with his children, as well as reading and drinking craft beers.
Adriano Longo is a freelance data analyst based in the Netherlands with a passion for Neo4j's relationship-oriented data model.
He is specialized in querying, processing, and modeling data with Cypher, R, Python, and SQL and has worked on climate prediction models at UEA's Climatic Research Unit before focusing on analytical solutions for the private sector.
Today, Adriano uses Neo4j and Linkurious.js to explore the complex web of relationships that nefarious actors use to obfuscate their abuse of environmental and financial regulations--making dirty secrets less transparent, one graph at a time.
For support files and downloads related to your book, please visit www.PacktPub.com. Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details. At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www.packtpub.com/mapt
Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1786466147.
If you'd like to join our team of regular reviewers, you can email us at [email protected]. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
Graph Theory and Databases
Introducing Neo4j 3.x and a history of graphs
Definition and usage of the graph theory
Social studies
Biological studies
Computer science
Flow problems
Route problems
Web search
Background
Navigational databases
Relational databases
NoSQL databases
Key-value stores
Column-family stores
Document stores
Graph databases
The Property Graph model of graph databases
Node labels
Relationship types
Why use graph databases, or not
Why use a graph database?
Complex queries
In-the-clickstream queries on live data
Pathfinding queries
When not to use a graph database and what to use instead
Large set-oriented queries
Graph global operations
Simple aggregate-oriented queries
Test questions
Summary
Getting Started with Neo4j
Key concepts and characteristics of Neo4j
Built for graphs from the ground up
Transactional ACID-compliant database
Made for online transaction processing
Designed for scalability
A declarative query language - Cypher
Sweet spot use cases of Neo4j
Complex join-intensive queries
Pathfinding queries
Committed to open source
The features
The support
The license conditions
Installing Neo4j
Installing Neo4j on Windows
Installing Neo4j on Mac or Linux
Using Neo4j in a cloud environment
Sandbox
Using Neo4j in a Docker container
Installing Docker
Preparing the filesystem
Running Neo4j in a Docker container
Test questions
Summary
Modeling Data for Neo4j
The four fundamental data constructs
How to start modeling for graph databases
What we know – ER diagrams and relational schemas
Introducing complexity through join tables
A graph model – a simple, high-fidelity model of reality
Graph modeling – best practices and pitfalls
Graph modeling best practices
Designing for query-ability
Aligning relationships with use cases
Looking for n-ary relationships
Granulate nodes
Using in-graph indexes when appropriate
Graph database modeling pitfalls
Using rich properties
Node representing multiple concepts
Unconnected graphs
The dense node pattern
Test questions
Summary
Getting Started with Cypher
Writing the Cypher syntax
Key attributes of Cypher
Being crude with the data
Create data
Read data
Update data
Delete data
Key operative words in Cypher
Syntax norms
More that you need to know
With a little help from my friends
The Cypher refcard
The openCypher project
Summary
Awesome Procedures on Cypher - APOC
Installing APOC
On a hardware server
On a Docker container
Verifying APOC installation
Functions and procedures
My preferred usages
A little help from a friend
Graph overview
Several key usages
Setup
Random graph generators
PageRank
Timeboxed execution of Cypher statements
Linking of a collection of nodes
There's more in APOC
Test questions
Summary
Extending Cypher
Building an extension project
Creating a function
Creating a procedure
Custom aggregators
Unmanaged extensions
HTTP and JAX-RS refreshers
Registering
Accessing
Streaming JSON responses
Summary
Query Performance Tuning
Explain and profile instructions
A query plan
Operators
Indexes
Force index usage
Force label usage
Rules of thumb
Explain all the queries
Rows
Do not overconsume
Cartesian or not?
Simplicity
Summary
Importing Data into Neo4j
LOAD CSV
Scaling the import
Importing from a JSON source
Importing from a JDBC source
Test setup
Importing all the systems
Importing from an XML source
Summary
Going Spatial
What is spatial?
Refresher
Not faulty towers
What is so spatial then?
Neo4j's spatial features
APOC spatial features
Geocoding
Setting up OSM as provider
Setting up Google as provider
Neo4j spatial
Online demo
Features
Importing OpenStreetMap data
Large OSM Imports
Easy way
The tougher way to import data
Restroom please
Understanding WKT and BBOX
Removing all the geo data
Summary
Security
Authentication and authorization
Roles
Other roles
Users management
Linking Neo4j to an LDAP directory
Starting the directory
Configuring Neo4j to use LDAP
Test questions
Summary
Visualizations for Neo4j
The power of graph visualizations
Why graph visualizations matter!
Interacting with data visually
Looking for patterns
Spot what's important
The basic principles of graph visualization
Open source visualization libraries
D3.js
GraphViz
Sigma.js
Vivagraph.js
yWorks
Integrating visualization libraries in your application
Visualization solutions
Gephi
Keylines
Keylines graph visualization
Linkurio.us
Neo4j Browser
Tom Sawyer Software for graph visualization
Closing remarks on visualizations - pitfalls and issues
The fireworks effect
The loading effect
Cytoscape example
Source code
Questions and answers
Summary
Data Refactoring with Neo4j
Preliminary step
Simple changes
Renaming
Adding data
Adding data with a default value
Adding data with specific values
Checking our values
Removing data
Great changes
Know your model
Refactoring tools
Property to label
Property to node
Related node to label
Merging nodes
Relations
Consequences
Summary
Clustering
Why set up a cluster?
Concepts
Core servers
Read replica servers
High throughput
Data redundancy
High availability
Bolt
Building a cluster
The core servers
The read replicas
The bolt+routing protocol
Disaster recovery
Summary
Use Case Example - Recommendations
Recommender systems dissected
Using a graph model for recommendations
Specific query examples for recommendations
Recommendations based on product purchases
Recommendations based on brand loyalty
Recommendations based on social ties
Bringing it all together - compound recommendations
Business variations on recommendations
Fraud detection systems
Access control systems
Social networking systems
Questions and answers
Summary
Use Case Example - Impact Analysis and Simulation
Impact analysis systems dissected
Impact analysis in business process management
Modeling your business as a graph
Which applications are used in which buildings?
Which buildings are affected if something happens to Appl_9?
What business processes with an RTO of 0-2 hours would be affected by a fire at location Loc_100?
Impact simulation in a cost calculation environment
Modeling your product hierarchy as a graph
Working with a product hierarchy graph
Calculating the price based on a full sweep of the tree
Calculating the price based on intermediate pricing
Impact simulation on product hierarchy
Questions and answers
Summary
Tips and Tricks
Reset password
Check for other hosts
Getting the first line of a CSV file
Enabling SSH on a Raspberry Pi
Creating guides for the Neo4j browser
Data backup and restore
Community version
Enterprise version
Tools
Cypher-shell
Data integration tools
Modeling tools
Arrows
OmniGraffle
Community projects
Online documentation
Community
More proverbs
Learning Neo4j 3.x will give you the keys to graph databases and Neo4j in particular. From concepts to applications, you will learn a lot about Neo4j and will wonder why you would ever use relational databases again.
Chapter 1, Graph Theory and Databases, explains the fundamental theoretical and historical underpinnings of graph database technology. Additionally, this chapter positions graph databases in an ever-changing database landscape. It compares the technology/industry with other data technologies out there.
Chapter 2, Getting Started with Neo4j, introduces the specific Neo4j implementation of a graph database and looks at key concepts and characteristics.
Chapter 3, Modeling Data for Neo4j, covers the basic modeling techniques for graph databases.
Chapter 4, Getting Started with Cypher, provides an overview of the Cypher query language.
Chapter 5, Awesome Procedures on Cypher - APOC, introduces the APOC library. You will learn how to use it within Cypher queries, get information on it, and find the procedure you need among the hundreds provided by the community.
Chapter 6, Extending Cypher, talks about adding functions and procedures to a Neo4j instance. Write your own APOC.
Chapter 7, Query Performance Tuning, shows you how to tune your Cypher queries for better performance.
Chapter 8, Importing Data into Neo4j, explains how to import data from different kinds of sources.
Chapter 9, Going Spatial, covers the geolocation capabilities of Neo4j, APOC, and Neo4j Spatial.
Chapter 10, Security, covers authentication and authorization in Neo4j.
Chapter 11, Visualizations for Neo4j, shows you how to display your data.
Chapter 12, Data Refactoring with Neo4j, explains how to change the data model to fit new requirements.
Chapter 13, Clustering, sets up a causal cluster using the Neo4j Enterprise edition.
Chapter 14, Use-Case Example – Recommendations, digs into a specific graph database use case--real-time recommendations--and explains it using a specific example dataset/query patterns.
Chapter 15, Use-Case Example – Impact Analysis and Simulation, analyzes the impact of a change in the network on the rest of the network. In this part of the book, we will explain and explore that use case.
Appendix, Tips and Tricks, provides tips and more knowledge. Don't miss it.
To run the software and examples, you will need a decent developer station with Java 7 or better, with 4 GB of RAM and 2 GB of free disk space.
Examples are provided for the GNU/Linux systems.
Most chapters apply to Neo4j Community Edition and Neo4j Enterprise Edition, except Chapter 10, Security, and Chapter 13, Clustering.
In the later chapters, two laptops, several Raspberry Pis, and Docker containers are used.
This book is for developers who want an alternative way to store and process data within their applications or developers who have to deal with highly connected data. No previous graph database experience is required; however, some basic database knowledge will help you understand the concepts more easily.
Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of. To send us general feedback, simply email [email protected], and mention the book's title in the subject of your message. If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files emailed directly to you. You can download the code files by following these steps:
Log in or register to our website using your email address and password.
Hover the mouse pointer on the SUPPORT tab at the top.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box.
Select the book for which you're looking to download the code files.
Choose from the drop-down menu where you purchased this book from.
Click on Code Download.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Learning-Neo4j-3x-Second-Edition. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books (maybe a mistake in the text or the code), we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title. To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy of copyrighted material on the internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the internet, please provide us with the location address or website name immediately so that we can pursue a remedy. Please contact us at [email protected] with a link to the suspected pirated material. We appreciate your help in protecting our authors and our ability to bring you valuable content.
If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.
People have different ways of learning new topics. We know that background information can contribute greatly to a better understanding of new topics. That is why, in this chapter of our Learning Neo4j 3.x book, we will start with a bit of background information, not to recount the tales of history, but to give you the necessary context that can lead to a better understanding of the topics.
In order to do so, we will address the following topics:
Graphs: What they are and where they came from. This section will aim to set the record straight on what, exactly, our subject will contain, and what it won't.
Graph theory: What it is and what it is used for. This section will give you quite a few examples of graph theory applications, and it will also start hinting at applications for graph databases, such as Neo4j, later on.
Databases: What the different kinds of databases are and what they are used for. This section will help you identify the right database for your projects.
So, let's dig right in.
Many people have used the word graph at some point in their professional or personal lives. However, chances are that they did not use it in the way that we will be using it in this book. Most people--obviously not you, otherwise you probably would not have picked up this book--actually think about something very different when talking about a graph. They think about pie charts and bar charts. They think about graphics, not graphs.
In this book, we will be working with a completely different type of subject--the graphs that you might know from your math classes. I, for one, distinctly remember being taught the basics of discrete mathematics in one of my university classes, and I also remember finding it terribly complex and difficult to work with. Little did I know that my later professional career would use these techniques in a software context, let alone that I would be writing a book on this topic.
So, what are graphs? To explain this, I think it is useful to put a little historic context around the concept. Graphs are actually quite old as a concept. They were invented, or at least first described, in an academic paper by the well-known Swiss mathematician, Leonhard Euler. He was trying to solve an age-old problem that we now know as the Seven Bridges of Königsberg. The problem at hand was pretty simple to understand.
Königsberg was a beautiful medieval city in the Prussian Empire situated on the river Pregel. It is located between Poland and Lithuania in today's Russia. If you try to look it up on any modern-day map, you will most likely not find it as it is currently known as Kaliningrad. The Pregel not only cut Königsberg into left- and right-bank sides of the city, but it also created an island in the middle of the river, which was known as the Kneiphof. The result of this peculiar situation was a city that was cut into four parts (we will refer to them as A, B, C, and D), which were connected by seven bridges (labelled a, b, c, d, e, f, and g in the following diagram). This gives us the following situation:
The seven bridges are connected to the four different parts of the city
The essence of the problem that people were trying to solve was to take a tour of the city, visiting every one of its parts and crossing every single one of its bridges, without having to walk a single bridge or street twice.
In the following diagram, you can see how Euler illustrated this problem in his original 1736 paper:
Essentially, it was a pathfinding problem, like many others (for example, the knight's tour problem, or the traveling salesman problem). It does not seem like a very difficult assignment at all now, does it? However, at the time, people really struggled with it and were trying to figure it out for the longest time. It was not until Euler got involved and took a very different, mathematical approach to the problem that it got solved once and for all.
Euler did the following two things that I find really interesting:
First and foremost, he decided not to take the traditional brute force method to solve the problem (in this case, drawing a number of different route options on the map and trying to figure out--essentially by trial and error--if there was such a route through the city), but to do something different. He took a step back and took a different look at the problem by creating what I call an abstract version of the problem at hand, which is essentially a model of the problem domain that he was trying to work with. In his mind, at least, Euler must have realized that the citizens of Königsberg were focusing their attention on the wrong part of the problem--the streets. Euler quickly came to the conclusion that the streets of Königsberg did not really matter to find a solution to the problem. The only things that mattered for his pathfinding operation were the following:
The parts of the city
The bridges connecting the parts of the city
Now, all of a sudden, we seem to have a very different problem at hand, which can be accurately represented in what is often regarded as the world's first graph:
Secondly, Euler solved the puzzle at hand by applying a mathematical algorithm on the model that he created. Euler's logic was simple--if I want to take a walk in the town of Königsberg, then I will have to do as follows:
I will have to start somewhere in any one of the four parts of the city
I will have to leave that part of the city; in other words, I will have to cross one of the bridges to go to another part of the city
I will then have to cross another five bridges, leaving and entering different parts of the city
Finally, I will end the walk through Königsberg in another part of the city
Therefore, Euler argued, the first and last parts of the city must have an odd number of bridges connecting them to other parts of the city (because you leave the first part and you arrive at the last part), while the other two parts of the city must have an even number of bridges, because every time you arrive at one of these parts, you also have to leave it again.
This number of bridges connecting the parts of the city has a very special meaning in the model that Euler created, the graph representation of the model. We call this the degree of the nodes in the graph. In order for there to be a path through Königsberg that only crossed every bridge once, Euler proved that all he had to do was to apply a very simple algorithm that would establish the degree (in other words, count the number of bridges) of every part of the city. This is shown in the following diagram:
This is how Euler solved the famous Seven Bridges of Königsberg problem. By proving that there was no part of the city that had an even number of bridges, he also proved that the required walk in the city could not be done. Adding one more bridge would immediately make it possible, but with the state of the city and its bridges at the time, there was no way one could take such an Eulerian walk of the city.
By doing so, Euler created the world's first graph. The concepts and techniques of his research, however, are universally applicable; in order to do such a walk on any connected graph, the graph must have either zero or exactly two vertices with an odd degree, and all other vertices must have an even degree.
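Euler's degree-counting argument can be sketched in a few lines of Python. This is only an illustration, not code from the book: the assignment of bridges to the parts A, B, C, and D is an assumption (A is taken to be the Kneiphof island), since the original diagram is not reproduced here, and the function names are made up for this example.

```python
from collections import Counter

# Assumed layout (hypothetical labelling): A is the Kneiphof island,
# B and C are the two river banks, D is the land to the east.
# Each tuple is one bridge between two parts of the city.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]


def degrees(edges):
    """Count how many bridges (edges) touch each part of the city (node)."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return dict(counts)


def is_connected(edges):
    """Check that every node can be reached from every other node."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    if not adjacency:
        return True
    start = next(iter(adjacency))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == len(adjacency)


def has_eulerian_walk(edges):
    """A walk crossing every edge exactly once needs 0 or 2 odd-degree nodes."""
    odd = [node for node, degree in degrees(edges).items() if degree % 2 == 1]
    return is_connected(edges) and len(odd) in (0, 2)


print(degrees(BRIDGES))                              # {'A': 5, 'B': 3, 'C': 3, 'D': 3}
print(has_eulerian_walk(BRIDGES))                    # False
print(has_eulerian_walk(BRIDGES + [("B", "C")]))     # True
```

With all four parts having an odd degree, the check fails; adding a single extra bridge (here, hypothetically, between B and C) leaves only two odd-degree parts, and the walk becomes possible.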
To summarize, a graph is nothing more than an abstract, mathematical representation of two or more entities, which are somehow connected or related to each other. Graphs model pairwise relations between objects. They are, therefore, always made up of the following components:
The nodes of the graph, usually representing the objects mentioned previously: In math, we usually refer to these structures as vertices; but for this book and in the context of graph databases such as Neo4j, we will always refer to vertices as nodes.
The links between the nodes of the graph: In math, we refer to these structures as edges, but again, for the purpose of this book, we will refer to these links as relationships.
The structure of how nodes and relationships are connected to each other makes a graph: Many important qualities, such as the number of edges connected to a node (what we referred to as degrees), can be assessed. Many other such indicators also exist.
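To make these three components concrete, here is a minimal Python sketch of a graph built from nodes and relationships. The class names, the `connect` and `degree` methods, and the sample people are all hypothetical and chosen only for illustration; this is not how Neo4j stores data internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A vertex; in this book we call it a node."""
    name: str


@dataclass(frozen=True)
class Relationship:
    """An edge between two nodes; in this book we call it a relationship."""
    start: Node
    end: Node
    kind: str


class Graph:
    """The structure: a set of nodes plus the relationships connecting them."""

    def __init__(self):
        self.nodes = set()
        self.relationships = []

    def connect(self, start, end, kind):
        # Register both endpoints and record the relationship between them.
        self.nodes.update((start, end))
        relationship = Relationship(start, end, kind)
        self.relationships.append(relationship)
        return relationship

    def degree(self, node):
        # Number of relationship endpoints that touch the given node.
        return sum((r.start == node) + (r.end == node) for r in self.relationships)


alice, bob, carol = Node("Alice"), Node("Bob"), Node("Carol")
graph = Graph()
graph.connect(alice, bob, "KNOWS")
graph.connect(alice, carol, "KNOWS")

print(graph.degree(alice))  # 2
print(graph.degree(bob))    # 1
```

Keeping relationships as first-class objects, rather than as a property of a node, mirrors the vocabulary used in the rest of the book.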
Now that we have discussed graphs and understand a bit more about their nature and history, it's time to look at the discipline that was created on top of these concepts, often referred to as graph theory.
When Euler invented the first graph, he was trying to solve a very specific problem of the citizens of Königsberg, with a very specific representation/model and a very specific algorithm. It turns out that there are quite a few problems that can be addressed as follows:
Described using the graph metaphor of objects and pairwise relations between them
Solved by applying a mathematical algorithm to this structure
The mechanism is the same, and the scientific discipline that studies these modeling and solution patterns using graphs is often referred to as graph theory; it is considered to be a part of discrete mathematics.
There are lots of different types of graphs that have been analyzed in this discipline, as you can see from the following diagram:
Graph theory, the study of graph models and algorithms, has turned out to be a fascinating field of study, which has been used in many different disciplines to solve some of the most interesting questions facing mankind. Interestingly enough, it has seldom really been applied with rigor in the different fields of science that can benefit from it; maybe scientists today don't have the multidisciplinary approach required (providing expertise from graph theory and their specific field of study) to do so.
So, let's talk about some of these fields of study, without attempting an exhaustive list of all applicable fields. Still, I do believe that some of these examples will be of interest for our future discussions in this book and will work up an appetite for the types of applications for which we will use a graph-based database such as Neo4j.
For the longest time, people have understood that the way humans interact with one another is actually very easy to describe as a network. People interact with people every day. People influence one another every day. People exchange ideas every day. As they do, these interactions cause ripple effects through the social environment that they inhabit. Modelling these interactions as a graph has been of primary importance to better understand global demographics, political movements, and, last but not least, the commercial adoption of certain products by certain groups. With the advent of online social networks, this graph-based approach to social understanding has taken a whole new direction. Companies such as Google, Facebook, Twitter, LinkedIn, and many others have undertaken very specific efforts to include graph-based systems in the way they target their customers and users, and in doing so, they have changed many of our daily lives quite fundamentally. See the following diagram, featuring a visualization of my LinkedIn network:
We often say in marketing taglines: Graphs Are Everywhere. When we do so, we are actually describing reality in a very real and fascinating way. In biology, too, researchers have known for quite some time that biological components (proteins, molecules, genes, and so on) and their interactions can accurately be modelled and described by means of a graph structure, and doing so yields many practical advantages. In metabolic pathways (see the following diagram for the human metabolic system), for example, graphs can help us understand how the different parts of the human body interact with each other. In metaproteomics (the study of all protein samples taken from the natural environment), researchers analyze how different kinds of proteins interact with one another, and the results are used to better steer chemical and biological production processes.
Some of the earliest computers were built with graphs in mind. Graph Compute Engines solved scheduling problems for railroads as early as the late 19th century, and the usage of graphs in computer science has only accelerated since then. In today's applications, use cases vary from chip design, network management, recommendation systems, and UML modeling to algorithm generation and dependency analysis. The following is an example of a UML diagram:
Dependency analysis is probably one of the more interesting of these use cases. Using pathfinding algorithms, software and hardware engineers have been analyzing the effects of changes in the design of their artifacts on the rest of the system. If a change is made to one part of the code (for example, a particular object is renamed), the dependency analysis algorithms can easily walk the graph of the system to find out what other classes will be affected by that change.
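To make the idea of walking the graph of a system concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the algorithm of any particular tool: the class names and the `DEPENDS_ON` list are hypothetical, and a plain breadth-first traversal stands in for the pathfinding algorithms mentioned above. Each pair reads as "the first class depends on the second", and the function follows those relationships backwards to collect everything that a change could affect.

```python
from collections import defaultdict, deque

# Hypothetical system: each pair is (dependent, dependency).
DEPENDS_ON = [
    ("OrderService", "Order"),
    ("InvoiceService", "Order"),
    ("CheckoutController", "OrderService"),
    ("ReportJob", "InvoiceService"),
    ("Order", "Customer"),
]


def affected_by(changed, depends_on):
    """Return every class that depends, directly or indirectly, on `changed`."""
    # Reverse index: for each class, which classes depend on it directly.
    dependents = defaultdict(set)
    for dependent, dependency in depends_on:
        dependents[dependency].add(dependent)

    affected = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            # Skip classes already collected and the changed class itself,
            # which also keeps the traversal finite on cyclic dependencies.
            if dependent not in affected and dependent != changed:
                affected.add(dependent)
                queue.append(dependent)
    return affected
```

In this made-up system, `affected_by("Order", DEPENDS_ON)` returns the four classes that sit above `Order`, while `affected_by("ReportJob", DEPENDS_ON)` returns an empty set because nothing depends on `ReportJob`.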
