Nacho González Bullón for Playtomic

Neo4j 101

Introduction

At Playtomic we have added a new database to the family: Neo4j. This is the first post in a series in which we will explain what it is, how we use it, and other interesting things about this database.

What is Neo4j

We have plenty of options when picking a database for our project. The range usually goes from classic relational databases (MySQL or PostgreSQL) to NoSQL databases. In the latter group, we often use document-oriented databases (MongoDB, CouchDB), wide-column stores (BigTable, Cassandra), or key-value stores (Redis, Dynamo). But sometimes the data we need to store and query does not fit well in any of these types. For some of these cases, we can use Neo4j.

Neo4j is a graph database, which means that data is organised in nodes and relationships. A node corresponds to an entity in other databases: it has zero or more labels and any number of properties. A relationship connects two nodes: it has exactly one type, a direction, and any number of properties. So graph databases have entities and relationships, nothing new. What makes them shine, and sets them apart from other databases, is that they treat relationships as first-class citizens, on a par with the entities. Neo4j is therefore a great fit when what you need to know about is the relationships in your data. Common use cases include:

  • Fraud detection: by detecting uncommon patterns of relationships between nodes.
  • Real-time recommendations: based on the relationships between the nodes we can provide useful recommendations to the users.
  • Network representations: this allows network managers to analyse and predict problems and to design better topologies.
  • Identity and access management: for complex cases, a database capable of traversing relationships within milliseconds might be the way to go.
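
To make the node and relationship model described above more concrete, here is a minimal sketch in Cypher, the Neo4j query language introduced later in this post. The labels (Person, Player, Club), the relationship type (MEMBER_OF) and all property names and values are hypothetical and chosen only for illustration; they are not part of any dataset used in this post.

```cypher
// Hypothetical example data, for illustration only.
// A node with two labels (Person and Player) and two properties.
CREATE (ana:Person:Player {name: 'Ana', level: 3})
// A node with a single label and a single property.
CREATE (club:Club {name: 'Example Club'})
// A directed relationship of type MEMBER_OF that carries its own property.
CREATE (ana)-[:MEMBER_OF {since: 2021}]->(club)
RETURN ana, club
```

Note that the relationship stores data (since) just like a node does, which is what treating relationships as first-class citizens looks like in practice.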

But there is more: Neo4j offers many graph algorithms (through its Graph Data Science library) that allow us to analyse our data and get the most out of it. They are organised in six big groups:

  • Community detection: to evaluate how nodes form communities, whether the graph is partitioned, and where its weak links are.
  • Centrality: to determine the importance of distinct nodes in the graph.
  • Similarity: to evaluate how alike nodes are.
  • Heuristic link prediction: to predict new relationships based on the topology of the network.
  • Pathfinding & search: to find the shortest paths between nodes or to check whether any path between them exists.
  • Node embedding: to compute low-dimensional vector representations of the nodes for use in machine learning.
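
As a small taste of the pathfinding group, the sketch below uses shortestPath, a function built into Cypher itself rather than one of the library algorithms listed above. It assumes the Northwind sample data (see the note in the Cypher section of this post) has been loaded and that it contains a product named 'Chai'; the limit of 6 hops is an arbitrary choice.

```cypher
// Assumes the Northwind sample graph; 'Chai' is assumed to exist in it.
MATCH (a:Product {productName: 'Chai'}),
      (b:Product {productName: 'Mozzarella di Giovanni'})
// Look for the shortest chain of relationships, in any direction,
// that connects the two products in at most 6 hops (arbitrary limit).
MATCH path = shortestPath((a)-[*..6]-(b))
RETURN path
```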

How to install and start working with Neo4j

The easiest way to install Neo4j is Neo4j Desktop. This app allows us to create local databases, install tools and plug-ins, and start working with our data. It also allows us to connect to remote databases (cloud-hosted or Docker instances, for example). The app also provides several features, such as:

  • database information
  • commands to load example datasets, tutorials and exercises
  • storage of your favourite queries
  • a history of your queries

And of course you can run your Cypher queries, taking advantage of syntax highlighting, and get your results in several formats, such as a table or a graph.

Cypher: what is it and first queries

Note: to run the following examples, run :play northwind-graph in Neo4j Desktop against your local database; you will be presented with instructions to load the sample data.

Cypher is the query language used in Neo4j. It shines for its pattern-matching syntax, which produces very readable queries: you draw the path you want your query to traverse. In a pattern we describe nodes (enclosed in parentheses), relationships (enclosed in square brackets) and the direction of each relationship using ASCII arrows (-> or <-). The simplest query we can think of is one that returns all the nodes with a given label, limited to a number of results.

MATCH (p:Product)
RETURN p LIMIT 25

Here we say that we want to match all nodes with the label Product and return at most 25 of them.

Results from MATCH(p:Product) RETURN p LIMIT 25

Now we can think of a more complex query that retrieves products and the category or categories they belong to.

MATCH (p:Product)-[:PART_OF]->(c:Category) 
RETURN p, c LIMIT 25

Here we match all Product nodes and the Category they are PART_OF, and the arrow defines the direction of the relationship. Finally, we limit the results to 25.

Results from MATCH (p:Product)-[:PART_OF]->(c:Category) RETURN p, c LIMIT 25
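
When a table is more useful than a graph, the same pattern can return selected properties instead of whole nodes. The sketch below assumes that, in the Northwind sample graph, Category nodes have a categoryName property; the aliases product and category are illustrative names.

```cypher
// Same pattern as the previous query, but returning properties
// so that the result reads naturally as a table.
// Assumes Category nodes carry a categoryName property.
MATCH (p:Product)-[:PART_OF]->(c:Category)
RETURN p.productName AS product, c.categoryName AS category
ORDER BY category, product
LIMIT 25
```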

But of course the Cypher language allows us to filter the results. We can, for example, get the orders of a given product.

MATCH (p:Product {productName:'Mozzarella di Giovanni'})<-[:ORDERS]-(o:Order) 
RETURN p, o LIMIT 25

Here we are filtering products by their productName. Also pay attention to the direction of the arrow, which now goes from right to left, because an order orders products. Of course, you can write the query with the nodes in the opposite order and the arrow reversed; in both cases the result is exactly the same.

MATCH (o:Order)-[:ORDERS]->(p:Product {productName:'Mozzarella di Giovanni'}) 
RETURN p, o LIMIT 25

Results from MATCH (p:Product {productName:'Mozzarella di Giovanni'})<-[:ORDERS]-(o:Order) RETURN p, o LIMIT 25
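
The inline property map used in the queries above is shorthand for an equality filter. As a sketch, the same filter can be written with a WHERE clause, which is also the form to reach for when the condition is not a plain equality:

```cypher
// Equivalent to the inline {productName: ...} filter used above.
MATCH (p:Product)<-[:ORDERS]-(o:Order)
WHERE p.productName = 'Mozzarella di Giovanni'
RETURN p, o LIMIT 25
```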

But we can go even further and query the database to get all the orders that contain Mozzarella di Giovanni and consist of more than 4 products.

MATCH (o:Order)-[:ORDERS]->(p:Product {productName:'Mozzarella di Giovanni'})
WITH apoc.node.degree(o,'ORDERS>') AS numOfProducts, p, o
WHERE numOfProducts > 4
RETURN p, o, numOfProducts LIMIT 25

Here we are taking advantage of two new concepts:

  • the WITH clause, which allows us to pipe the results of one part of the query into the next. In this case we use it to compute the number of products in each order so that we can filter on it.
  • the APOC library, which provides us with tons of procedures and functions to perform different tasks. In this case we use the apoc.node.degree function to get the number of ORDERS relationships that come out of each Order node.

Results from MATCH(o:Order)-[:ORDERS]->(p:Product{productName:'Mozzarella di Giovanni'}) WITH apoc.node.degree(o,'ORDERS>') AS numOfProducts, p, o WHERE numOfProducts > 4 RETURN p, o, numOfProducts LIMIT 25
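
APOC is a plugin, so apoc.node.degree is only available in databases where it has been installed. If it is not, a similar result can be obtained with plain Cypher. The following is a sketch rather than a drop-in replacement: it assumes that every ORDERS relationship leaving an Order points to a Product, as in the Northwind sample, and the variable name other is illustrative.

```cypher
// Orders that contain the product we are interested in.
MATCH (o:Order)-[:ORDERS]->(p:Product {productName: 'Mozzarella di Giovanni'})
// Expand every product of each of those orders
// (assumes ORDERS relationships always point to Product nodes).
MATCH (o)-[:ORDERS]->(other:Product)
// Aggregate per order: count() groups by the non-aggregated values o and p.
WITH o, p, count(other) AS numOfProducts
WHERE numOfProducts > 4
RETURN p, o, numOfProducts LIMIT 25
```

The aggregation in WITH plays the role of apoc.node.degree here, at the cost of expanding each order's products before counting them.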

These are the basics of Cypher. Obviously there is much more to it, and you can continue learning from the Neo4j guides and documentation.

Conclusions

Neo4j is an interesting player in the database market and it is worth a try if your project needs what it provides. We will add more posts to this series to explain some other interesting aspects of Neo4j.

Cover photo by Clint Adair on Unsplash