Learning Neo4j 3.x E-Book

Jerome Baton

Publisher: Packt Publishing
Category: Technical literature
Language: English

Description

Neo4j is a graph database that allows traversing huge amounts of data with ease. This book aims at quickly getting you started with the popular graph database Neo4j.
Starting with a brief introduction to graph theory, this book will show you the advantages of using graph databases along with data modeling techniques for graph databases. You'll gain practical hands-on experience with commonly used and lesser-known features for updating a graph store with Neo4j's Cypher query language. Furthermore, you'll also learn to create awesome procedures using APOC and extend Neo4j's functionality, enabling integration, algorithmic analysis, and other advanced spatial operation capabilities on data.
Through the course of the book you will come across implementation examples on the latest updates in Neo4j, such as in-graph indexes, scaling, performance improvements, visualization, data refactoring techniques, security enhancements, and much more. By the end of the book, you'll have gained the skills to design and implement modern spatial applications, from graphing data to unraveling business capabilities with the help of real-world use cases.

Details

Format: EPUB

Year of publication: 2017

Sample

Learning Neo4j 3.x

Second Edition

Effective data modeling, performance tuning and data visualization techniques in Neo4j

Jérôme Baton

Rik Van Bruggen

BIRMINGHAM - MUMBAI

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: August 2014

Second Edition: October 2017

Production reference: 1171017

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-78646-614-3

www.packtpub.com

Credits

Authors

Jérôme Baton

Rik Van Bruggen

Copy Editor

Tasneem Fatehi

Reviewers

Taffy Brecknock

Jose Ernesto Echeverria

Adriano Longo

Project Coordinator

Manthan Patel

Commissioning Editor

Amey Varangaonkar

Proofreader

Safis Editing

Acquisition Editor

Vinay Argekar

Indexer

Tejal Daruwale Soni

Content Development Editor

Jagruti Babaria

Tejas Limkar

Graphics

Tania Dutta

Technical Editor

Dinesh Chaudhary

Dharmendra Yadav

Production Coordinator

Deepika Naik

About the Authors

Jérôme Baton started hacking computers at the age of skin problems, gaming first, then continuing his trip by teaching himself Basic on an Amstrad CPC, peaking at coding a full-screen horizontal starfield and messing with the interlace of the video controller so that sprites appeared twice as high in horizontal beat 'em up games. Disks were three inches for 178 KB then.

Then, for gaming reasons, he switched to Commodore Amiga and its fantastic AMOS Basic. Later caught by seriousness and studies, he wrote Turbo Pascal, C, COBOL, Visual C++, and Java on PCs and mainframes at university, and even Logo in high school. Then, Java happened and he became a consultant, mostly on backend code of websites in many different businesses.

Jérôme authored several articles in French on Neo4j, JBoss Forge, an Arduino workshop for Devoxx4Kids, and reviewed kilos of books on Android. He has a weakness for wordplay, puns, spoonerisms, and Neo4j that relieves him from join(t) pains.

Jérôme also has the joy of teaching in French universities, currently at I.U.T de Paris, Université Paris V - René Descartes (Neo4j, Android), and Université de Troyes (Neo4j), where he does his best to enterTRain the students.

"If you would be a real seeker after truth, it is necessary that at least once in your life you doubt, as far as possible, all things." - René Descartes

When not programming, Jérôme enjoys photography, doing electronics, everything DIY, understanding how things work, trying to be clever or funny on Twitter, and spends a lot of time trying to understand his kids and life in general.

Rik Van Bruggen is the VP of Sales for Neo Technology for Benelux, UK, and the Nordic region. He has been working for startup companies for most of his career, including eCom Interactive Expertise, SilverStream Software, Imprivata, and Courion. While he has an interest in technology, his real passion is business and how to make technology work for a business. He lives in Antwerp, Belgium, with his wife and three lovely kids, and enjoys technology, orienteering, jogging, and Belgian beer.

Acknowledgement

I would like to thank many people for this project that is truly a great personal achievement for me.

First of all, Rik Van Bruggen, who is the original author of this book and literally, the giant on whose shoulders I stand. Secondly, Vinay and Jagruti from Packt Publishing for their patience with a slow writer.

Thank you, William LyOn, Cédric FauVEt, Mark NEedham, BenOit Simard, Michael Hunger, Craig Taverner, and Jim Webber from Neo4j for their help and sharing their knowledge over the last few years on Stack Overflow, on Slack, or in person.

This would not have been possible if I myself had not had inspiring teachers such as Daniel 'DG' Guillaume, Françoise Meunier, Florence Fessy-Mesatfa, and Jérôme Fessy from IUT de Paris, and Dr. Robert T Hughes, Richard N Griffith, and Graham Winstanley from the University of Brighton.

Going further in the past, there are more teachers from whom I learned pedagogy and inspired me to share; I remember you, Mrs. Legrand, Mrs. Viala, and Mr. Bouhadda. Also, not being a native English speaker, I was at first very bad at speaking English. Extra energy from Mrs Goddard and Mrs Maluski really unlocked this second language for me.

Teachers change lives!

Also thanks to the doctors of my national health service without whom I would be a souvenir already. Vive la Sécurité Sociale!

Basically, I would like to thank all the people I learned from, be they teachers or not. Including my students.

Thank you, Romin Irani (@iRomin), my friend--you are an example.

Thank you, Anny Naïm, you are a truly shining person.

Above all, love you, kiddos!

I really should make a graph of all the people I would like to thank.

About the Reviewers

Taffy Brecknock has worked in the IT industry for more than 20 years. During his career, he has worked as a software developer, managed development teams, and has been responsible for application design and more recently systems architecture.

He has held roles with both public and private sector organizations. While working with the Australian Government, Taffy got first-hand exposure to the use of connected data in law enforcement. Having used relational database systems as the data repository, he has direct experience of the shortcomings of using this paradigm to model such systems.

After learning about graph databases, specifically Neo4j, he has become extremely interested in the many different applications of this technology. He feels that there are few problems in today's business world that cannot benefit from being modeled in a graph.

Jose Ernesto Echeverria started working with relational databases in the 90s, and has been working with Neo4j since 2014. He prefers graph databases over others, given their capabilities for real-world modeling and their adaptability to change. As a polyglot programmer, he has used languages such as Java, Ruby, and R with Neo4j in order to solve data management problems of multinational corporations. He is a regular attendee of GraphConnect, OSCON, and RailsConf. When not working, he enjoys spending time with family, road trips, Minecraft projects with his children, as well as reading and drinking craft beers.

Adriano Longo is a freelance data analyst based in the Netherlands with a passion for Neo4j's relationship-oriented data model.

He specializes in querying, processing, and modeling data with Cypher, R, Python, and SQL, and worked on climate prediction models at UEA's Climatic Research Unit before focusing on analytical solutions for the private sector.

Today, Adriano uses Neo4j and Linkurious.js to explore the complex web of relationships that nefarious actors use to obfuscate their abuse of environmental and financial regulations--making dirty secrets less transparent, one graph at a time.

www.PacktPub.com

For support files and downloads related to your book, please visit www.PacktPub.com. Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details. At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www.packtpub.com/mapt

Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

Why subscribe?

Fully searchable across every book published by Packt

Copy and paste, print, and bookmark content

On demand and accessible via a web browser

Customer Feedback

Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1786466147.

If you'd like to join our team of regular reviewers, you can email us at [email protected]. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

Graph Theory and Databases

Introducing Neo4j 3.x and a history of graphs

Definition and usage of the graph theory

Social studies

Biological studies

Computer science

Flow problems

Route problems

Web search

Background

Navigational databases

Relational databases

NoSQL databases

Key-value stores

Column-family stores

Document stores

Graph databases

The Property Graph model of graph databases

Node labels

Relationship types

Why use graph databases, or not

Why use a graph database?

Complex queries

In-the-clickstream queries on live data

Pathfinding queries

When not to use a graph database and what to use instead

Large set-oriented queries

Graph global operations

Simple aggregate-oriented queries

Test questions

Summary

Getting Started with Neo4j

Key concepts and characteristics of Neo4j

Built for graphs from the ground up

Transactional ACID-compliant database

Made for online transaction processing

Designed for scalability

A declarative query language - Cypher

Sweet spot use cases of Neo4j

Complex join-intensive queries

Pathfinding queries

Committed to open source

The features

The support

The license conditions

Installing Neo4j

Installing Neo4j on Windows

Installing Neo4j on Mac or Linux

Using Neo4j in a cloud environment

Sandbox

Using Neo4j in a Docker container

Installing Docker

Preparing the filesystem

Running Neo4j in a Docker container

Test questions

Summary

Modeling Data for Neo4j

The four fundamental data constructs

How to start modeling for graph databases

What we know – ER diagrams and relational schemas

Introducing complexity through join tables

A graph model – a simple, high-fidelity model of reality

Graph modeling – best practices and pitfalls

Graph modeling best practices

Designing for query-ability

Aligning relationships with use cases

Looking for n-ary relationships

Granulate nodes

Using in-graph indexes when appropriate

Graph database modeling pitfalls

Using rich properties

Node representing multiple concepts

Unconnected graphs

The dense node pattern

Test questions

Summary

Getting Started with Cypher

Writing the Cypher syntax

Key attributes of Cypher

Being crude with the data

Create data

Read data

Update data

Delete data

Key operative words in Cypher

Syntax norms

More that you need to know

With a little help from my friends

The Cypher refcard

The openCypher project

Summary

Awesome Procedures on Cypher - APOC

Installing APOC

On a hardware server

On a Docker container

Verifying APOC installation

Functions and procedures

My preferred usages

A little help from a friend

Graph overview

Several key usages

Setup

Random graph generators

PageRank

Timeboxed execution of Cypher statements

Linking of a collection of nodes

There's more in APOC

Test questions

Summary

Extending Cypher

Building an extension project

Creating a function

Creating a procedure

Custom aggregators

Unmanaged extensions

HTTP and JAX-RS refreshers

Registering

Accessing

Streaming JSON responses

Summary

Query Performance Tuning

Explain and profile instructions

A query plan

Operators

Indexes

Force index usage

Force label usage

Rules of thumb

Explain all the queries

Rows

Do not overconsume

Cartesian or not?

Simplicity

Summary

Importing Data into Neo4j

LOAD CSV

Scaling the import

Importing from a JSON source

Importing from a JDBC source

Test setup

Importing all the systems

Importing from an XML source

Summary

Going Spatial

What is spatial?

Refresher

Not faulty towers

What is so spatial then?

Neo4j's spatial features

APOC spatial features

Geocoding

Setting up OSM as provider

Setting up Google as provider

Neo4j spatial

Online demo

Features

Importing OpenStreetMap data

Large OSM Imports

Easy way

The tougher way to import data

Restroom please

Understanding WKT and BBOX

Removing all the geo data

Summary

Security

Authentication and authorization

Roles

Other roles

Users management

Linking Neo4j to an LDAP directory

Starting the directory

Configuring Neo4j to use LDAP

Test questions

Summary

Visualizations for Neo4j

The power of graph visualizations

Why graph visualizations matter!

Interacting with data visually

Looking for patterns

Spot what's important

The basic principles of graph visualization

Open source visualization libraries

D3.js

GraphViz

Sigma.js

Vivagraph.js

yWorks

Integrating visualization libraries in your application

Visualization solutions

Gephi

Keylines

Keylines graph visualization

Linkurio.us

Neo4j Browser

Tom Sawyer Software for graph visualization

Closing remarks on visualizations - pitfalls and issues

The fireworks effect

The loading effect

Cytoscape example

Source code

Questions and answers

Summary

Data Refactoring with Neo4j

Preliminary step

Simple changes

Renaming

Adding data

Adding data with a default value

Adding data with specific values

Checking our values

Removing data

Great changes

Know your model

Refactoring tools

Property to label

Property to node

Related node to label

Merging nodes

Relations

Consequences

Summary

Clustering

Why set up a cluster?

Concepts

Core servers

Read replica servers

High throughput

Data redundancy

High availability

Bolt

Building a cluster

The core servers

The read replicas

The bolt+routing protocol

Disaster recovery

Summary

Use Case Example - Recommendations

Recommender systems dissected

Using a graph model for recommendations

Specific query examples for recommendations

Recommendations based on product purchases

Recommendations based on brand loyalty

Recommendations based on social ties

Bringing it all together - compound recommendations

Business variations on recommendations

Fraud detection systems

Access control systems

Social networking systems

Questions and answers

Summary

Use Case Example - Impact Analysis and Simulation

Impact analysis systems dissected

Impact analysis in business process management

Modeling your business as a graph

Which applications are used in which buildings?

Which buildings are affected if something happens to Appl_9?

What business processes with an RTO of 0-2 hours would be affected by a fire at location Loc_100?

Impact simulation in a cost calculation environment

Modeling your product hierarchy as a graph

Working with a product hierarchy graph

Calculating the price based on a full sweep of the tree

Calculating the price based on intermediate pricing

Impact simulation on product hierarchy

Questions and answers

Summary

Tips and Tricks

Reset password

Check for other hosts

Getting the first line of a CSV file

Enabling SSH on a Raspberry Pi

Creating guides for the Neo4j browser

Data backup and restore

Community version

Enterprise version

Tools

Cypher-shell

Data integration tools

Modeling tools

Arrows

OmniGraffle

Community projects

Online documentation

Community

More proverbs

Preface

Learning Neo4j 3.x will give you the keys to graph databases and Neo4j in particular. From concepts to applications, you will learn a lot about Neo4j and may wonder why you would ever use relational databases again.

What this book covers

Chapter 1, Graph Theory and Databases, explains the fundamental theoretical and historical underpinnings of graph database technology. Additionally, this chapter positions graph databases in an ever-changing database landscape. It compares the technology/industry with other data technologies out there.

Chapter 2, Getting Started with Neo4j, introduces the specific Neo4j implementation of a graph database and looks at key concepts and characteristics.

Chapter 3, Modeling Data for Neo4j, covers the basic modeling techniques for graph databases.

Chapter 4, Getting Started with Cypher, provides an overview of the Cypher query language.

Chapter 5, Awesome Procedures on Cypher - APOC, introduces the APOC library. You will learn how to use it within Cypher queries, get information on it, and find the procedure you need among the hundreds provided by the community.

Chapter 6, Extending Cypher, talks about adding functions and procedures to a Neo4j instance. Write your own APOC.

Chapter 7, Query Performance Tuning, shows you how to tune your Cypher queries for better performance.

Chapter 8, Importing Data into Neo4j, explains how to import data from different kinds of sources.

Chapter 9, Going Spatial, covers the geolocation capabilities of Neo4j, APOC, and Neo4j Spatial.

Chapter 10, Security, covers authentication and authorization in Neo4j.

Chapter 11, Visualizations for Neo4j, shows you how to display your data.

Chapter 12, Data Refactoring with Neo4j, explains how to change the data model to fit new requirements.

Chapter 13, Clustering, sets up a causal cluster using the Neo4j Enterprise edition.

Chapter 14, Use-Case Example – Recommendations, digs into a specific graph database use case--real-time recommendations--and explains it using a specific example dataset/query patterns.

Chapter 15, Use-Case Example – Impact Analysis and Simulation, analyzes the impact of a change in the network on the rest of the network. In this part of the book, we will explain and explore that use case.

Appendix, Tips and Tricks, provides tips and additional knowledge; don't miss it.

What you need for this book

To run the software and examples, you will need a decent developer station with Java 8 or later (Neo4j 3.x does not run on Java 7), 4 GB of RAM, and 2 GB of free disk space.

Examples are provided for GNU/Linux systems.

Most chapters apply to Neo4j Community Edition and Neo4j Enterprise Edition, except Chapter 10, Security, and Chapter 13, Clustering.

In the later chapters, two laptops, several Raspberry Pis, and Docker containers are used.

Who this book is for

This book is for developers who want an alternative way to store and process data within their applications or developers who have to deal with highly connected data. No previous graph database experience is required; however, some basic database knowledge will help you understand the concepts more easily.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book, including what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of. To send us general feedback, simply email [email protected], and mention the book's title in the subject of your message. If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files emailed directly to you. You can download the code files by following these steps:

Hover the mouse pointer on the SUPPORT tab at the top.

Click on Code Downloads & Errata.

Enter the name of the book in the search box.

Select the book for which you're looking to download the code files.

Choose from the drop-down menu where you purchased this book from.

Click on Code Download.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR / 7-Zip for Windows

Zipeg / iZip / UnRarX for Mac

7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Learning-Neo4j-3x-Second-Edition. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books (maybe a mistake in the text or the code), we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title. To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the internet, please provide us with the location address or website name immediately so that we can pursue a remedy. Please contact us at [email protected] with a link to the suspected pirated material. We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.

Graph Theory and Databases

People have different ways of learning new topics. We know that background information can contribute greatly to a better understanding of new topics. That is why, in this chapter of our Learning Neo4j 3.x book, we will start with a bit of background information, not to recount the tales of history, but to give you the necessary context that can lead to a better understanding of the topics.

In order to do so, we will address the following topics:

Graphs: What they are and where they came from. This section will aim to set the record straight on what, exactly, our subject will contain, and what it won't.

Graph theory: What it is and what it is used for. This section will give you quite a few examples of graph theory applications, and it will also start hinting at applications for graph databases such as Neo4j, which we will come to later on.

Databases: What the different kinds of databases are and what they are used for. This section will help you decide which database is right for your projects.

So, let's dig right in.

Introducing Neo4j 3.x and a history of graphs

Many people have used the word graph at some point in their professional or personal lives. However, chances are that they did not use it in the way that we will be using it in this book. Most people--obviously not you, otherwise you probably would not have picked up this book--actually think about something very different when talking about a graph. They think about pie charts and bar charts. They think about graphics, not graphs.

In this book, we will be working with a completely different type of subject--the graphs that you might know from your math classes. I, for one, distinctly remember being taught the basics of discrete mathematics in one of my university classes, and I also remember finding it terribly complex and difficult to work with. Little did I know that my later professional career would use these techniques in a software context, let alone that I would be writing a book on this topic.

So, what are graphs? To explain this, I think it is useful to put a little historic context around the concept. Graphs are actually quite old as a concept. They were invented, or at least first described, in an academic paper by the well-known Swiss mathematician, Leonhard Euler. He was trying to solve an age-old problem that we now know as the Seven Bridges of Königsberg. The problem at hand was pretty simple to understand.

Königsberg was a beautiful medieval city in the Prussian Empire situated on the river Pregel. It is located between Poland and Lithuania in today's Russia. If you try to look it up on any modern-day map, you will most likely not find it as it is currently known as Kaliningrad. The Pregel not only cut Königsberg into left- and right-bank sides of the city, but it also created an island in the middle of the river, which was known as the Kneiphof. The result of this peculiar situation was a city that was cut into four parts (we will refer to them as A, B, C, and D), which were connected by seven bridges (labelled a, b, c, d, e, f, and g in the following diagram). This gives us the following situation:

The seven bridges are connected to the four different parts of the city

The essence of the problem that people were trying to solve was to take a tour of the city, visiting every one of its parts and crossing every single one of its bridges, without having to cross a single bridge or walk a single street twice.

In the following diagram, you can see how Euler illustrated this problem in his original 1736 paper:

Illustration of the problem as mentioned by Euler in his paper in 1736

Essentially, it was a pathfinding problem, like many others (for example, the knight's tour problem, or the traveling salesman problem). It does not seem like a very difficult assignment at all now, does it? However, at the time, people really struggled with it and had been trying to figure it out for the longest time. It was not until Euler got involved and took a very different, mathematical approach to the problem that it got solved once and for all.

Euler did the following two things that I find really interesting:

First and foremost, he decided not to take the traditional brute force method to solve the problem (in this case, drawing a number of different route options on the map and trying to figure out--essentially by trial and error--if there was such a route through the city), but to do something different. He took a step back and took a different look at the problem by creating what I call an abstract version of the problem at hand, which is essentially a model of the problem domain that he was trying to work with. In his mind, at least, Euler must have realized that the citizens of Königsberg were focusing their attention on the wrong part of the problem--the streets. Euler quickly came to the conclusion that the streets of Königsberg did not really matter to find a solution to the problem. The only things that mattered for his pathfinding operation were the following:

The parts of the city

The bridges connecting the parts of the city

Now, all of a sudden, we seem to have a very different problem at hand, which can be accurately represented in what is often regarded as the world's first graph:

Simplifying Königsberg

Secondly, Euler solved the puzzle at hand by applying a mathematical algorithm on the model that he created. Euler's logic was simple--if I want to take a walk in the town of Königsberg, then I will have to do as follows:

I will have to start somewhere in any one of the four parts of the city

I will have to leave that part of the city; in other words, I will have to cross one of the bridges to go to another part of the city

I will then have to cross another five bridges, leaving and entering different parts of the city

Finally, I will end the walk through Königsberg in another part of the city

Therefore, Euler argued, the first and last parts of the city must each have an odd number of bridges connecting them to other parts of the city (because you leave the first part once more than you enter it, and you enter the last part once more than you leave it), while the other two parts must each have an even number of bridges, because every time you arrive in one of these parts you also leave it again.

This number of bridges connecting the parts of the city has a very special meaning in the model that Euler created, the graph representation of the model. We call this the degree of the nodes in the graph. In order for there to be a path through Königsberg that only crossed every bridge once, Euler proved that all he had to do was to apply a very simple algorithm that would establish the degree (in other words, count the number of bridges) of every part of the city. This is shown in the following diagram:

Simplified town

This is how Euler solved the famous Seven Bridges of Königsberg problem. By showing that no part of the city had an even number of bridges (all four parts had an odd number), he also proved that the required walk in the city could not be done. Adding one more bridge would immediately make it possible, but with the state of the city and its bridges at the time, there was no way one could take such an Eulerian walk of the city.

By doing so, Euler created the world's first graph. The concepts and techniques of his research, however, are universally applicable: in order to do such a walk on any connected graph, the graph must have zero or two vertices with odd degree, and all other vertices must have even degree.
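Euler's degree test is small enough to sketch in a few lines of Python. The following is only an illustration, not code from the book: the labels A to D for the four parts of the city and the function names are assumptions made for this example, and the test assumes the graph is connected.

```python
from collections import Counter

# The seven bridges, each written as a pair of land masses.
# The labels are illustrative: A is the central island, B and C are
# the two river banks, and D is the eastern part of the city.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"),
    ("B", "D"),
    ("C", "D"),
]


def degrees(edges):
    """Count how many bridge ends touch each part of the city."""
    counts = Counter()
    for start, end in edges:
        counts[start] += 1
        counts[end] += 1
    return dict(counts)


def has_eulerian_walk(edges):
    """Apply the degree test: zero or two nodes of odd degree.

    Assumes the graph is connected; connectivity is not checked here.
    """
    odd_nodes = [node for node, d in degrees(edges).items() if d % 2 == 1]
    return len(odd_nodes) in (0, 2)


print(degrees(BRIDGES))
print(has_eulerian_walk(BRIDGES))
print(has_eulerian_walk(BRIDGES + [("B", "C")]))
```

With the historical layout, all four parts have an odd degree, so the test fails. Adding a hypothetical eighth bridge between B and C leaves only two odd nodes, and the walk becomes possible.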

To summarize, a graph is nothing more than an abstract, mathematical representation of two or more entities, which are somehow connected or related to each other. Graphs model pairwise relations between objects. They are, therefore, always made up of the following components:

The nodes of the graph, usually representing the objects mentioned previously: In math, we usually refer to these structures as vertices; but for this book and in the context of graph databases such as Neo4j, we will always refer to vertices as nodes.

The links between the nodes of the graph: In math, we refer to these structures as edges, but again, for the purpose of this book, we will refer to these links as relationships.

The structure of how nodes and relationships are connected to each other makes a graph: Many important qualities, such as the number of edges connected to a node (what we referred to as degrees), can be assessed. Many other such indicators also exist.
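To make these three components concrete, here is a minimal sketch in Python. It is an illustration only and does not reflect how Neo4j stores data; the class names, the `relate` and `degree` helpers, and the sample people are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relationship:
    """A link between two nodes (an edge, in mathematical terms)."""
    start: str
    end: str
    kind: str


@dataclass
class Graph:
    """Nodes plus relationships; the structure emerges from both."""
    nodes: set = field(default_factory=set)
    relationships: list = field(default_factory=list)

    def relate(self, start, end, kind):
        # Adding a relationship also registers both of its end nodes.
        self.nodes.update((start, end))
        self.relationships.append(Relationship(start, end, kind))

    def degree(self, node):
        # Number of relationship ends attached to the node.
        return sum((r.start == node) + (r.end == node)
                   for r in self.relationships)


g = Graph()
g.relate("Ann", "Bob", "KNOWS")
g.relate("Bob", "Cid", "KNOWS")
print(sorted(g.nodes), g.degree("Bob"))
```

The degree is derived from the relationships rather than stored on the node, which mirrors the point above: it is a quality of the structure, not of any single component.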

Now that we have discussed graphs and understand a bit more about their nature and history, it's time to look at the discipline that was created on top of these concepts, often referred to as graph theory.

Definition and usage of graph theory

When Euler invented the first graph, he was trying to solve a very specific problem of the citizens of Königsberg, with a very specific representation/model and a very specific algorithm. It turns out that there are quite a few problems that can be addressed as follows:

Described using the graph metaphor of objects and pairwise relations between them

Solved by applying a mathematical algorithm to this structure

The mechanism is the same in each case. The scientific discipline that studies these modeling and solution patterns using graphs is often referred to as graph theory, and it is considered to be a part of discrete mathematics.

There are lots of different types of graphs that have been analyzed in this discipline, as you can see from the following diagram:

Graph types

Graph theory, the study of graph models and algorithms, has turned out to be a fascinating field of study, which has been used in many different disciplines to solve some of the most interesting questions facing mankind. Interestingly enough, it has seldom really been applied with rigor in the different fields of science that can benefit from it; maybe scientists today don't have the multidisciplinary approach required (providing expertise from graph theory and their specific field of study) to do so.

So, let's talk about some of these fields of study a bit, without giving you an exhaustive list of all applicable fields. I do believe that some of these examples will be of interest for our future discussions in this book and will work up an appetite for the types of applications for which we will use a graph-based database such as Neo4j.

Social studies

For the longest time, people have understood that the way humans interact with one another is actually very easy to describe as a network. People interact with people every day. People influence one another every day. People exchange ideas every day. As they do, these interactions cause ripple effects through the social environment that they inhabit. Modelling these interactions as a graph has been of primary importance to better understand global demographics, political movements, and, last but not least, the commercial adoption of certain products by certain groups. With the advent of online social networks, this graph-based approach to social understanding has taken a whole new direction. Companies such as Google, Facebook, Twitter, LinkedIn, and many others have undertaken very specific efforts to include graph-based systems in the way they target their customers and users, and in doing so, they have changed many of our daily lives quite fundamentally. See the following diagram, featuring a visualization of my LinkedIn network:

Rik's professional network representation

Biological studies

We often say in marketing taglines: Graphs Are Everywhere. When we do so, we are actually describing reality in a very real and fascinating way. In biology, too, researchers have known for quite some time that biological components (proteins, molecules, genes, and so on) and their interactions can accurately be modelled and described by means of a graph structure, and doing so yields many practical advantages. In metabolic pathways (see the following diagram for the human metabolic system), for example, graphs can help us understand how the different parts of the human body interact with each other. In metaproteomics (the study of all protein samples taken from the natural environment), researchers analyze how different kinds of proteins interact with one another, and the results are used to better steer chemical and biological production processes.

A diagram representing the human metabolic system

Computer science

Some of the earliest computers were built with graphs in mind. Graph Compute Engines solved scheduling problems for railroads as early as the late 19th century, and the usage of graphs in computer science has only accelerated since then. In today's applications, use cases vary from chip design, network management, recommendation systems, and UML modeling to algorithm generation and dependency analysis. The following is an example of a UML diagram:

An example of a UML diagram

The latter, dependency analysis, is probably one of the more interesting use cases. Using pathfinding algorithms, software and hardware engineers have been analyzing the effects of changes in the design of their artifacts on the rest of the system. If a change is made to one part of the code (for example, a particular object is renamed), the dependency analysis algorithms can easily walk the graph of the system to find out what other classes will be affected by that change.
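As a rough sketch of such a walk, the following Python snippet treats "depends on" declarations as a graph and uses a breadth-first traversal to collect everything that directly or indirectly depends on a changed class. The class names and the `affected_by` function are hypothetical, chosen only for this illustration; real dependency analysis tools work on far richer models.

```python
from collections import defaultdict, deque

# Hypothetical system: each class maps to the classes it depends on.
DEPENDS_ON = {
    "OrderService": ["Order", "Customer"],
    "InvoiceService": ["Order"],
    "ReportJob": ["InvoiceService"],
    "Order": ["Money"],
    "Customer": [],
    "Money": [],
}


def affected_by(changed, depends_on):
    """Return every class that directly or indirectly depends on `changed`."""
    # Invert the graph so that we can walk from a class to its dependents.
    dependents = defaultdict(set)
    for cls, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(cls)

    # Breadth-first walk over the inverted relationships.
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


print(sorted(affected_by("Order", DEPENDS_ON)))
```

Renaming `Order` in this made-up system touches `OrderService` and `InvoiceService` directly, and `ReportJob` indirectly through `InvoiceService`, which is exactly the kind of ripple effect the traversal is meant to surface.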