When it comes to graph databases, there is no doubt that Neo4j is the shining star.

When it comes to running Neo4j in production, a Neo4j cluster is probably the best approach.

When it comes to a Neo4j cluster, in my humble opinion, there are some guidelines you should follow.

  1. Make sure you never run out of disk space. This might sound trivial, but recovering from a full disk can be super-painful, as it will definitely make your cluster go out of sync and may get your databases quarantined.
  2. A Causal Cluster is only causal up to a certain point. When a core server is lost, databases may still go offline and your cluster may go out of sync, even if you have enough core servers to tolerate the loss of that server.
  3. A database might remain quarantined even after the underlying issue has been resolved. Databases usually get quarantined for a reason, and usually a justified one. However, once the issue that caused the quarantine is fixed, the database sometimes stays quarantined (this time for no good reason).
  4. Have enough core servers to afford losses. The number of core servers you need is n = 2F + 1, where F is the number of servers you are willing to lose and n is the total number of core servers required to survive that many losses.
  5. Have a simple and efficient deployment mechanism. With clusters, especially in production, the most important and critical aspect is day-to-day maintenance. And don’t delude yourself: you will have maintenance, and lots of it, all the time. So you must have a super-simple, standardized deployment mechanism that makes ad-hoc changes easy.
  6. Use a customized Helm chart. Make sure you control the exact version running in production to avoid surprises. This is a pessimistic approach I recommend taking with any third-party dependency you use. You never know when something will change on the vendor’s side, or how such a change will affect your application. Better safe than sorry.
  7. Have a separate mechanism for handling volumes. Neo4j servers are deployed as StatefulSets, which do not allow volume size changes through the Helm chart once deployed. So make sure you have a standardized mechanism for that as well. Didn’t I already write that in item 5? Well, it’s important.
  8. Think very carefully before you attend to a cluster issue. Your first actions when handling a cluster issue can have a huge effect on the overall outcome. Read more here.
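
For item 1, even a trivial check goes a long way. Below is a minimal Python sketch, not a production monitor, that flags a filesystem whose free space drops below a threshold. The 20% threshold and the `/data` mount path are my own assumptions; adjust them to wherever your Neo4j volumes are actually mounted.

```python
import shutil

# Assumption: alert when less than 20% of the volume is free. Pick your own value.
FREE_RATIO_THRESHOLD = 0.20


def disk_is_low(path: str, threshold: float = FREE_RATIO_THRESHOLD) -> bool:
    """Return True when free space on the filesystem holding `path`
    is below `threshold` (a fraction of its total capacity)."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    return usage.free / usage.total < threshold


# Usage (assumption: Neo4j data is mounted at /data inside the pod):
#   if disk_is_low("/data"):
#       print("WARNING: Neo4j data volume is running out of space")
```

In practice you would probably run something like this as a periodic job, or simply rely on your existing monitoring stack. The point is to get warned long before the disk is actually full.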
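
For items 2 and 3, it helps to spot databases that are offline or quarantined as early as possible. The sketch below is a rough illustration, assuming the official `neo4j` Python driver and a Neo4j version whose `SHOW DATABASES` output includes `requestedStatus` and `currentStatus` columns (4.x and later). The function names, URI and credentials are hypothetical placeholders.

```python
from typing import Iterable, List, Mapping, Tuple


def unhealthy_databases(rows: Iterable[Mapping[str, object]]) -> List[Tuple[object, object, object]]:
    """Return (name, address, currentStatus) for every row whose current status
    differs from the requested one, e.g. 'offline' or 'quarantined' while
    'online' was requested."""
    return [
        (row.get("name"), row.get("address"), row.get("currentStatus"))
        for row in rows
        if row.get("currentStatus") != row.get("requestedStatus")
    ]


def fetch_database_status(uri: str, user: str, password: str) -> List[dict]:
    """Run SHOW DATABASES against the system database and return plain dicts.
    Requires the official `neo4j` driver (pip install neo4j)."""
    # Imported here so unhealthy_databases() works without the driver installed.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session(database="system") as session:
            return [record.data() for record in session.run("SHOW DATABASES")]
    finally:
        driver.close()


# Usage (placeholder URI and credentials):
#   for name, address, status in unhealthy_databases(
#           fetch_database_status("neo4j://core-1:7687", "neo4j", "secret")):
#       print(f"{name} on {address} is {status}")
```

Flagging a mismatch is only the first step. As item 8 says, think carefully before acting on it.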
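
To make the formula in item 4 concrete, here is a small Python sketch that applies it in both directions. It assumes majority-based (Raft) consensus among the core servers, which is what n = 2F + 1 expresses.

```python
def required_core_servers(failures_tolerated: int) -> int:
    """Core servers needed to survive `failures_tolerated` losses: n = 2F + 1."""
    if failures_tolerated < 0:
        raise ValueError("failures_tolerated must be non-negative")
    return 2 * failures_tolerated + 1


def tolerated_failures(core_servers: int) -> int:
    """Inverse of the formula: n core servers keep a majority after
    losing at most F = floor((n - 1) / 2) of them."""
    if core_servers < 1:
        raise ValueError("core_servers must be at least 1")
    return (core_servers - 1) // 2


for n in range(1, 8):
    print(f"{n} core servers -> tolerates {tolerated_failures(n)} loss(es)")
```

Note that an even number of core servers buys you nothing extra: 4 core servers tolerate the loss of only one, exactly like 3.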
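
For item 6, one way to enforce "control the exact version" is to make your deployment script refuse anything but an exact chart version. Helm's `--version` flag also accepts ranges such as `^4.4.0`, and when it is omitted Helm uses the latest version, so a guard like the hypothetical helper below can help. It assumes your customized chart is published to a chart repository you control; the release, chart and namespace names are made up.

```python
import re
from typing import List

# Exact MAJOR.MINOR.PATCH, optionally with a pre-release/build suffix.
# Rejects ranges that Helm would also accept, such as "^4.4.0", "~4.4" or ">=4.0".
EXACT_VERSION = re.compile(r"\d+\.\d+\.\d+(?:[-+][0-9A-Za-z.-]+)?")


def helm_upgrade_command(release: str, chart: str, version: str,
                         values_file: str, namespace: str) -> List[str]:
    """Build a `helm upgrade --install` argv pinned to an exact chart version."""
    if not EXACT_VERSION.fullmatch(version):
        raise ValueError(f"refusing to deploy non-exact chart version {version!r}")
    return [
        "helm", "upgrade", "--install", release, chart,
        "--version", version,
        "--values", values_file,
        "--namespace", namespace,
    ]


# Usage (hypothetical names), e.g. with subprocess.run(cmd, check=True):
#   cmd = helm_upgrade_command("neo4j-core", "my-repo/neo4j-custom", "4.4.10",
#                              "values-prod.yaml", "graph")
```

Keeping the command in one place also serves item 5: every deployment goes through the same, boring path.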
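
For item 7, a separate volume mechanism could be as simple as patching the PersistentVolumeClaims directly instead of going through the Helm chart. A minimal sketch, assuming the official `kubernetes` Python client, a StorageClass with `allowVolumeExpansion: true`, and a hypothetical volume claim template named `data`. Kubernetes only supports growing a PVC this way, not shrinking it.

```python
from typing import Dict, List


def statefulset_pvc_names(claim_template: str, statefulset: str, replicas: int) -> List[str]:
    """Kubernetes names StatefulSet PVCs <claimTemplate>-<statefulSet>-<ordinal>."""
    return [f"{claim_template}-{statefulset}-{ordinal}" for ordinal in range(replicas)]


def resize_patch(new_size: str) -> Dict:
    """Patch body that raises a PVC's storage request, e.g. to "200Gi"."""
    return {"spec": {"resources": {"requests": {"storage": new_size}}}}


def resize_statefulset_volumes(namespace: str, claim_template: str,
                               statefulset: str, replicas: int, new_size: str) -> None:
    """Patch every PVC of a StatefulSet. Requires the `kubernetes` client
    (pip install kubernetes) and a StorageClass with allowVolumeExpansion: true."""
    # Imported here so the pure helpers above work without the client installed.
    from kubernetes import client, config

    config.load_kube_config()  # uses your local kubeconfig / current context
    core = client.CoreV1Api()
    for name in statefulset_pvc_names(claim_template, statefulset, replicas):
        core.patch_namespaced_persistent_volume_claim(name, namespace, resize_patch(new_size))


# Usage (hypothetical names: claim template "data", StatefulSet "neo4j-core"):
#   resize_statefulset_volumes("graph", "data", "neo4j-core", 3, "200Gi")
```

Keep in mind that the StatefulSet's own `volumeClaimTemplates` still record the old size, and that field cannot be changed in place, so your Helm values will drift from reality unless you handle that too. A common workaround is deleting the StatefulSet with `kubectl delete statefulset <name> --cascade=orphan` and re-applying it with the new size. Try this on a non-production cluster first.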
