DEV Community

Cover image for Introducing The Spatial Cypher Cheat Sheet
William Lyon
William Lyon

Posted on • Originally published atlyonwjon

Introducing The Spatial Cypher Cheat Sheet

A Resource For Working With Geospatial Data In Neo4j

In this post we explore some techniques for working with geospatial data in Neo4j. We will cover some basic spatial Cypher functions, spatial search, routing algorithms, and different methods of importing geospatial data into Neo4j.

I recently went through the examples in the Spatial Cypher Cheat Sheet in an episode of the Neo4j Livestream, you can watch therecordingbelow:

The first page of theSpatial Cypher Cheat Sheetintroduces Cypher and the property graph data model, the spatial types available in the Neo4j database, as well as some of the spatial functions available in Cypher. We also touch on importing geospatial data into Neo4j (from CSV and GeoJSON) as well as some of the path-finding algorithms (breadth-first search, Dijkstra’s Algorithm, and A*).

Spatial Cypher cheat sheet - page 1

Page 2 of theSpatial Cypher Cheat Sheetcovers using Neo4j with Python. First, using the Neo4j Python driver to query data from Neo4j to build a GeoDataFrame. We then explore using the OSMNx Python package for fetching data from OpenStreetMap to load a road network into Neo4j.

Spatial Cypher cheat sheet - page 2

You candownload the PDF version of the Spatial Cypher Cheat Sheet,or read on for the same content in the rest of this blog post.

Introduction To Geospatial Cypher Functions With Neo4j

  • Intro To Cypher & The Property Graph Data Model
  • Spatial Point Type
  • Spatial Cypher Functions
  • Data Import
  • Routing With Path Finding Algorithms

Using Neo4j With Python

  • The Neo4j Python Driver
  • Creating A GeoDataFrame From Data Stored In Neo4j
  • Working With OpenStreetMap Data
  • Loading A Road Network With OSMNx
  • Analyzing and Visualizing Road Networks With Neo4j Bloom And Graph Data Science

Introduction To Geospatial Cypher Functions With Neo4j

Intro To Cypher And The Property Graph Data Model

Neo4j is a database management system (DBMS) that uses the property graph data model which is composed of nodes, relationships, and properties to model, store, and query data as a graph. Nodes can have one or more labels, relationships have a single type and direction. Key-value pair properties can be stored on nodes and relationships.

Airline flight graph data model

The Cypher query language is used to query data and interact with Neo4j. Cypher is a declarative query language that uses ASCII-art like syntax to define graph patterns that form the basis of most query operations. Nodes are defined with parenthesis, relationships with square brackets, and can be combined to create complex graph patterns. Common Cypher commands areMATCH(find where the graph pattern exists),CREATE(add data to the database using the specified graph pattern), andRETURN(return a subset of the data as a result of a traversal through the graph.

MATCHp=(sfo:Airport{iata:"SFO"})-[:FLIGHT_TO*2]->(rsw:Airport{iata:"RSW"})RETURNp
Enter fullscreen mode Exit fullscreen mode

Spatial Point Type

Neo4j supports 2D or 3D geographic (WGS84) or cartesian coordinate reference system (CRS) points. Here we create a point by specifying latitude/longitude, WGS84 is assumed when using latitude/longitude:

RETURNpoint({latitude:37.62245,longitude:-122.383989})
Enter fullscreen mode Exit fullscreen mode

Point data can be stored as properties on nodes or relationships. Here we create an Airport node and set its location as a point.

CREATE(a:Airport)
SETa.iata="SFO",
a.location=point({latitude:37.62245,longitude:-122.383989})
RETURNa
Enter fullscreen mode Exit fullscreen mode

Database indexes are used to speed up search performance. Here we create a database index on the location property for Airport nodes. This will help us find airports faster when searching for airports by location (radius distance or bounding box spatial search).

CREATEPOINTINDEXairportIndexFOR(a:Airport)ON(a.location)
Enter fullscreen mode Exit fullscreen mode

Spatial Cypher Functions

Radius Distance Search

To find nodes close to a point or other nodes in the graph we can use thepoint.distance()function to perform a radius distance search

MATCH(a:Airport)
WHEREpoint.distance(
a.location,
point({latitude:37.55948,longitude:-122.32544}))<20000
RETURNa
Enter fullscreen mode Exit fullscreen mode

Airline flight plot

Within Bounding Box

To search for nodes within a bounding box we can use thepoint.withinBBox()function.

MATCH(a:Airport)
WHEREpoint.withinBBox(
a.location,
point({longitude:-122.325447,latitude:37.55948}),
point({longitude:-122.314675,latitude:37.563596}))
RETURNa
Enter fullscreen mode Exit fullscreen mode

Bounding box search

See also my other blog post that goes into a bit more detail on spatial search functionality with Neo4j, including point in polygon:Spatial Search Functionality With Neo4j

Geocoding

To geocode a location description into latitude, longitude location we can use theapoc.spatial.geocode()procedure. By default this procedure uses the Nominatim geocoding API but can be configured to use other geocoding services, such as Google Cloud.

CALLapoc.spatial.geocode('SFO Airport')YIELDlocation
---------------------------------------------------------------
{
"description":"San Francisco International Airport, 780, South Airport Boulevard, South San Francisco, San Mateo County, CAL Fire Northern Region, California, 94128, United States",
"longitude":-122.38398938548363,
"latitude":37.622451999999996,
}

Enter fullscreen mode Exit fullscreen mode

Data Import

We can use Cypher to import data into Neo4j from formats such as CSV and JSON, including GeoJSON.

CSV

Using theLOAD CSVCypher command to create an airport routing graph.

1 - Create a constraint on the field that identies uniquesness, in this case Airport IATA code. This ensures we won’t create duplicate airports but also creates a database index to improve performance of our data import steps below.

CREATECONSTRAINTFOR(a:Airport)REQUIREa.iataISUNIQUE;
Enter fullscreen mode Exit fullscreen mode

2 - Create Airport nodes, storing their location, name, IATA code, etc as node properties.

LOADCSVWITHHEADERS
FROM"https://cdn.neo4jlabs /data/flights/airports.csv"
ASrow
MERGE(a:Airport{iata:row.IATA_CODE})
ONCREATESETa.city=row.CITY,
a.name=row.AIRPORT,
a.state=row.STATE,
a.country=row.country,
a.location=
point({latitude:toFloat(row.LATITUDE),
longitude:toFloat(row.LONGITUDE)
});
Enter fullscreen mode Exit fullscreen mode

3 - Create FLIGHT_TO relationships connecting airports with a connecting flight. Increment the num_flights counter variable to keep track of the number of flights between airports per year.

:auto
LOADCSVWITHHEADERS
FROM"https://cdn.neo4jlabs /data/flights/flights.csv"ASrow
CALL{
WITHrow
MATCH(origin:Airport{iata:row.ORIGIN_AIRPORT})
MATCH(dest:Airport{iata:row.DESTINATION_AIRPORT})
MERGE(origin)-[f:FLIGHT_TO]->(dest)
ONCREATESET
f.num_flights=0,f.distance=toInteger(row.DISTANCE)
ONMATCHSET
f.num_flights=f.num_flights+1
}INTRANSACTIONSOF50000ROWS;
Enter fullscreen mode Exit fullscreen mode

GeoJSON

We can also store arrays of Points to represent complex geometries like lines and polygons, for example to represent land parcels.

CALLapoc.load.json('https://cdn.neo4jlabs /data/landgraph/parcels.geojson')
YIELDvalue
UNWINDvalue.featuresASfeature
CREATE(p:Parcel)SET
p.coordinates=[coordINfeature.geometry.coordinates[0]|point({latitude:coord[1],longitude:coord[0]})]
p+=feature.properties;
Enter fullscreen mode Exit fullscreen mode

Routing With Path Finding Algorithms

Shortest Path

The shortestPath function performs a binary breadth-first search to find the shortest path between nodes in the graph.

MATCHp=shortestPath(
(:Airport{iata:"SFO"})-[:FLIGHT_TO*..10]->(:Airport{iata:"RSW"})
)RETURNp
Enter fullscreen mode Exit fullscreen mode

Shortest path

ShortestWeightedPath

Often we want to consider the shortest weighted path taking into account distance, time or some other cost stored as relationship properties. Dijkstra and A* are two algorithms that take relationship (or edge) weights into account when calculating the shortest path.

Dijkstra's Algorithm

Dijkstra's Algorithm

Dijkstra’s algorithm is similar to a breadth-first search, but takes into account relationship properties (distance) and prioritizes exploring low-cost routes first using a priority queue.

MATCH(origin:Airport{iata:"SFO"})
MATCH(dest:Airport{iata:"RSW"})
CALL
apoc.algo.dijkstra(
origin,
dest,
"FLIGHT_TO",
"distance"
)
YIELDpath,weight
UNWINDnodes(path)ASn
RETURN{
airport:n.iata,
lat:n.location.latitude,
lng:n.location.longitude
}ASroute
Enter fullscreen mode Exit fullscreen mode

A* Algorithm

A* algorithm

The A* algorithm adds a heuristic function to choose which paths to explore. In our case the heuristic is the distance to the final destination.

MATCH(origin:Airport{iata:"SFO"})
MATCH(dest:Airport{iata:"RSW"})
CALL
apoc.algo.aStarConfig(
origin,
dest,
"FLIGHT_TO",
{
pointPropName:"location",
weight:"distance"
}
)
YIELDweight,path
RETURNweight,path
Enter fullscreen mode Exit fullscreen mode

There are additional path-finding algorithms available in Neo4j'sGraph Data Science Library.

Using Neo4j With Python For Geospatial Data

The Neo4j Python Driver

In this section we’ll use the Neo4j Python Driver to create a GeoDataFrame of our flight data. We’ll also compute weighted degree centrality so we can plot airport size relative to their “importance” in the US airline network. The Neo4j Python Driver can be installed with:

pip install neo4j
Enter fullscreen mode Exit fullscreen mode

Creating A GeoDataFrame From Data Stored In Neo4j

First we import theneo4jPython package, define our connection credentials for our Neo4j instance (here we are using a local Neo4j instance), and create adriverinstance.

importneo4j

NEO4J_URI="neo4j://localhost:7689"
NEO4J_USER="neo4j"
NEO4J_PASSWORD="letmeinnow"
NEO4J_DATABASE="neo4j"

driver=neo4j.GraphDatabase.driver(
NEO4J_URI,auth=(NEO4J_USER,NEO4J_PASSWORD)
)
Enter fullscreen mode Exit fullscreen mode

Next, we define a Cypher query to fetch data from Neo4j. In addition to fetching each flight between airports we computeweighted degree centrality,a measure of each node's importance in the network, the sum of all relationship weights connected to a given node, in this case the number of flights per year for each airport.

We also return the geometry of our origin and destination airports, and the flight route as Well Known Text (WKT).POINTfor the airports andLINESTRINGfor the flight route. We’ll parse this WKT when defining the geometry in our GeoDataFrame.

AIRPORT_QUERY="""
MATCH (origin:Airport)-[f:FLIGHT_TO]->(dest:Airport)
CALL {
WITH origin
MATCH (origin)-[f:FLIGHT_TO]-()
RETURN sum(f.num_flights) AS origin_centrality
}
CALL {
WITH dest
MATCH (dest)-[f:FLIGHT_TO]-()
RETURN sum(f.num_flights) AS dest_centrality
}
RETURN {
origin_wkt:
"POINT ( "+ origin.location.longitude +" "+ origin.location.latitude +" ) ",
origin_iata: origin.iata,
origin_city: origin.city,
origin_centrality: origin_centrality,
dest_centrality: dest_centrality,
dest_wkt:
"POINT ( "+ dest.location.longitude +" "+ dest.location.latitude +" ) ",
dest_iata: dest.iata,
dest_city: dest.city,
distance: f.distance,
num_flights: f.num_flights,
geometry:
"LINESTRING ( "+ origin.location.longitude +" "+ origin.location.latitude +", "
+ dest.location.longitude +" "+ dest.location.latitude +" ) "
}
"""
Enter fullscreen mode Exit fullscreen mode

Next we define a Python function to execute our Cypher query and process the results into a GeoPandas GeoDataFrame. The Neo4j Python driver has a.to_df()method which will convert a Neo4j result set to a Pandas DataFrame. Note that we parse the WKT columns into GeoSeries and convert the pandas DataFrame into a GeoPandas GeoDataFrame.

defget_airport(tx):
results=tx.run(AIRPORT_QUERY)
df=results.to_df(expand=True)
df.columns=
['origin_city','origin_wkt','dest_city','dest_wkt','origin_centrality','distance','origin_iata',
'geometry','num_flights','dest_centrality','dest_iata']
df['geometry']=geopandas.GeoSeries.from_wkt(df['geometry'])
df['origin_wkt']=geopandas.GeoSeries.from_wkt(df['origin_wkt'])
df['dest_wkt']=geopandas.GeoSeries.from_wkt(df['dest_wkt'])
gdf=geopandas.GeoDataFrame(df,geometry='geometry')
returngdf

withdriver.session(database=NEO4J_DATABASE)assession:
airport_df=session.execute_read(get_airport)
Enter fullscreen mode Exit fullscreen mode

Airline flight GeoDataFrame

We now have a GeoDataFrame where each row is a flight route between two airports. We can plot the airport and routes, using the centrality metric to size airport nodes: more important airports should be larger.

world=geopandas.read_file(
geopandas.datasets.get_path('naturalearth_lowres')
)

ax=world[world.continent=='North America']
.plot(color='white',edgecolor='black')

flights_gdf=flights_gdf.set_geometry("origin_wkt")
flights_gdf.plot(ax=ax,markersize='origin_centrality')

flights_gdf=flights_gdf.set_geometry("geometry")
flights_gdf.plot(ax=ax,markersize=0.1,linewidth=0.01)
Enter fullscreen mode Exit fullscreen mode

Flight plot

Working With OpenStreetMap Data

In this section we will import data from OpenStreetMap into Neo4j using the OSMNx Python package. Below is the property graph data model we will use to model the road network of Boston.

pipinstallosmnx
Enter fullscreen mode Exit fullscreen mode

Loading A Road Network With OSMNx

importosmnxasox

G=ox.graph_from_place("Boston, MA, USA",network_type="drive")
fig,ax=ox.plot_graph(G)

gdf_nodes,gdf_relationships=ox.graph_to_gdfs(G)
gdf_nodes.reset_index(inplace=True)
gdf_relationships.reset_index(inplace=True)

Enter fullscreen mode Exit fullscreen mode

Here is our nodes GeoDataFrame. Each row represents an intersection in the Boston road network:

GeoDataFrame of Boston road network nodes

Here is our relationships GeoDataFrame. Each row represents a road segment connecting two intersections.

GeoDataFrame of Boston road network relationships

We'll define a Cypher query to add intersection nodes from the nodes GeoDataFrame and add road segments from the relationships GeoDataFrame connecting intersection nodes.

First let's create a constraint to ensure we don't have duplicate Intersection nodes (this will also create a node index to improve lookups during import):

CREATECONSTRAINTFOR(i:Intersection)REQUIREi.osmidISUNIQUE
Enter fullscreen mode Exit fullscreen mode

We can also create an index on theosmidproperty of theROAD_SEGMENTrelationship to improve import performance:

CREATEINDEXFOR()-[r:ROAD_SEGMENT]-()ONr.osmid
Enter fullscreen mode Exit fullscreen mode

Graph data model for our road network

Because our GeoDataFrames can be very large we break them up into batches to avoid sending too much data to the database at once.

node_query='''
UNWIND $rows AS row
WITH row WHERE row.osmid IS NOT NULL
MERGE (i:Intersection {osmid: row.osmid})
SET i.location =
point({latitude: row.y, longitude: row.x }),
i.ref = row.ref,
i.highway = row.highway,
i.street_count = toInteger(row.street_count)
RETURN COUNT(*) as total
'''

rels_query='''
UNWIND $rows AS road
MATCH (u:Intersection {osmid: road.u})
MATCH (v:Intersection {osmid: road.v})
MERGE (u)-[r:ROAD_SEGMENT {osmid: road.osmid}]->(v)
SET r.oneway = road.oneway,
r.lanes = road.lanes,
r.ref = road.ref,
r.name = road.name,
r.highway = road.highway,
r.max_speed = road.maxspeed,
r.length = toFloat(road.length)
RETURN COUNT(*) AS total
'''

definsert_data(tx,query,rows,batch_size=1000):
total=0
batch=0

whilebatch*batch_size<len(rows):
results=tx.run(query,parameters={
'rows':
rows[batch*batch_size:(batch+1)*batch_size]
.to_dict('records')
}).data()
print(results)
total+=results[0]['total']
batch+=1

withdriver.session()assession:
session.execute_write(insert_data,node_query,gdf_nodes.drop(columns=['geometry']))
session.execute_write(insert_data,rels_query,gdf_relationships.drop(columns=['geometry']))
Enter fullscreen mode Exit fullscreen mode

In this post we introduced some of the spatial functionality natively supported by Neo4j including the point type and related Cypher functions and demonstrated how to accomplish various spatial search operations as well as a brief look at routing with graph algorithms.

Resources

Top comments(0)