Neo4j

Neo4j is a native graph database that treats relationships between data as first-class entities. You can connect a Databricks cluster to a Neo4j cluster using the neo4j-spark-connector, which offers Spark APIs for RDDs, DataFrames, GraphX, and GraphFrames. The connector uses the binary Bolt protocol to transfer data to and from the Neo4j server.

Neo4j Configuration

Neo4j can be deployed on various cloud providers, including Azure, DigitalOcean, and AWS EC2.

To deploy Neo4j, please see the official Neo4j cloud deployment guide. This guide assumes Neo4j 3.2.2.

Make sure the Neo4j password has been changed from the default (you should be prompted to change it the first time you access Neo4j). Then modify conf/neo4j.conf so that Neo4j accepts remote connections. For more information, see Configuring Neo4j Connectors.

# conf/neo4j.conf

# Bolt connector
dbms.connector.bolt.enabled=true
#dbms.connector.bolt.tls_level=OPTIONAL
dbms.connector.bolt.listen_address=0.0.0.0:7687

# HTTP Connector. There must be exactly one HTTP connector.
dbms.connector.http.enabled=true
#dbms.connector.http.listen_address=0.0.0.0:7474

# HTTPS Connector. There can be zero or one HTTPS connectors.
dbms.connector.https.enabled=true
#dbms.connector.https.listen_address=0.0.0.0:7473
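You can sanity-check an edited neo4j.conf before moving on to Databricks. The Scala sketch below parses the key=value lines and reports whether the Bolt connector would listen on a non-loopback address. It is a hypothetical helper: BoltRemoteCheck is not part of Neo4j or the connector. It understands only the Neo4j 3.x setting names shown above and IPv4 addresses. It also assumes that Neo4j 3.x binds to localhost when no listen address is configured, and it deliberately treats Bolt as disabled unless dbms.connector.bolt.enabled=true is present.

```scala
// Hypothetical helper, not part of Neo4j or neo4j-spark-connector.
// Assumes Neo4j 3.x setting names and IPv4 addresses only.
object BoltRemoteCheck {
  private val Loopback = Set("localhost", "127.0.0.1")

  /** Parse key=value lines, skipping blank lines and # comments. */
  def parse(lines: Seq[String]): Map[String, String] =
    lines.map(_.trim)
      .filter(line => line.nonEmpty && !line.startsWith("#") && line.contains("="))
      .map { line =>
        val i = line.indexOf('=')
        line.substring(0, i).trim -> line.substring(i + 1).trim
      }
      .toMap

  /** "0.0.0.0:7687" -> "0.0.0.0"; ":7687" -> "" (no host given). */
  private def hostOf(address: String): String = {
    val i = address.lastIndexOf(':')
    if (i >= 0) address.substring(0, i) else address
  }

  /** Host the Bolt connector binds to, or None unless Bolt is explicitly enabled. */
  def boltHost(conf: Map[String, String]): Option[String] =
    if (!conf.get("dbms.connector.bolt.enabled").contains("true")) None
    else {
      // Assumption: with no listen address configured, Neo4j 3.x binds to localhost.
      val default = conf.get("dbms.connectors.default_listen_address")
        .filter(_.nonEmpty)
        .getOrElse("localhost")
      Some(conf.get("dbms.connector.bolt.listen_address")
        .map(hostOf)
        .filter(_.nonEmpty)
        .getOrElse(default))
    }

  /** True if Bolt is enabled and bound to a non-loopback address. */
  def acceptsRemoteBolt(conf: Map[String, String]): Boolean =
    boltHost(conf).exists(host => !Loopback(host))
}
```

With the Bolt lines from the excerpt above, acceptsRemoteBolt returns true. If you comment out dbms.connector.bolt.listen_address, it returns false, because the sketch then falls back to localhost.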

Databricks

  1. You’ll need two libraries: neo4j-spark-connector and graphframes. See this guide for instructions on how to attach libraries directly from Spark Packages.

  2. Create a cluster with the following Spark configuration:

    spark.neo4j.bolt.url bolt://<ip-of-neo4j-instance>:7687
    spark.neo4j.bolt.user <username>
    spark.neo4j.bolt.password <password>
    
  3. Import the relevant libraries and test the connection:

    import org.neo4j.spark._
    import org.graphframes._
    
    val neo = Neo4j(sc)
    
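    // Simple Cypher query to check the connection. RETURN id(n) yields Long values,
    // and count forces Spark to actually run the query against Neo4j.
    val testConnection = neo.cypher("MATCH (n) RETURN id(n)").loadRdd[Long].count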
    
    /*
    
    The example query below assumes the following sample data has been loaded into Neo4j:
    
      UNWIND range(1,100) as id
      CREATE (p:Person {id:id}) WITH collect(p) as people
      UNWIND people as p1
      UNWIND range(1,10) as friend
      WITH p1, people[(p1.id + friend) % size(people)] as p2
      CREATE (p1)-[:KNOWS {years: abs(p2.id - p1.id)}]->(p2)
    
    */
    
    val graphFrame = neo.pattern(("Person","id"),("KNOWS",null), ("Person","id")).partitions(3).rows(1000).loadGraphFrame
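You can model the sample-data Cypher script in plain Scala to see what it produces before loading it into a GraphFrame. This is a hypothetical sketch: SampleGraph and Knows are illustrative names, and no Neo4j or Spark is involved. It assumes collect(p) returns the Person nodes in creation order, so that people[k] has id k + 1. In this model, years is the absolute difference between the two ids. Under these assumptions, every person KNOWS ten others.

```scala
// Hypothetical plain-Scala model of the sample-data Cypher; no Neo4j or Spark needed.
// Assumption: collect(p) returns Person nodes in creation order, so people(k) == k + 1.
object SampleGraph {
  final case class Knows(src: Int, dst: Int, years: Int)

  // UNWIND range(1,100) AS id CREATE (p:Person {id:id}) ... collect(p) AS people
  val people: Vector[Int] = (1 to 100).toVector

  // UNWIND people AS p1, UNWIND range(1,10) AS friend,
  // p2 = people[(p1.id + friend) % size(people)]
  val knows: Vector[Knows] = for {
    p1     <- people
    friend <- (1 to 10).toVector
    p2      = people((p1 + friend) % people.size)
  } yield Knows(p1, p2, math.abs(p2 - p1))

  def main(args: Array[String]): Unit = {
    println(s"vertices:       ${people.size}")
    println(s"edges:          ${knows.size}")
    println(s"distinct edges: ${knows.map(k => (k.src, k.dst)).distinct.size}")
    println(s"self-loops:     ${knows.count(k => k.src == k.dst)}")
  }
}
```

Running main prints 100 vertices, 1000 edges, 1000 distinct edges, and 0 self-loops. If the pattern load returns the whole graph, these numbers are a reference point for graphFrame.vertices.count and graphFrame.edges.count.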