OD matrix to Graph tutorial#
This notebook demonstrates the conversion of Origin Destination (OD) Matrix into graph. The function od_matrix_to_graph
supports ODs from edgelist
(a DataFrame with source, target, and weight columns) and adjacency
(a square matrix DataFrame or NumPy array). To construct a geospatial graph, it needs the data of zones_gdf
that contains the information of zones.
1. Overview#
In this tutorial you will:
Learn the function inputs and outputs
Build a tiny mock example (both edgelist and adjacency)
Build a real graph from London MSOA zones and an ODMG flow table
Visualize results and optionally convert to NetworkX
2. Imports and setup#
[ ]:
# Imports
import numpy as np
import pandas as pd
import geopandas as gpd
import networkx as nx
import matplotlib.pyplot as plt
import city2graph as c2g
plt.rcParams["figure.figsize"] = (10, 8)
3. Edge list#
We’ll create 4 toy zones laid out in a square, then an edge list with a single weight column.
Creating sample zones#
First, we’ll create a simple 4x4 grid of zones to work with. Each zone will be a 1×1 degree square polygon with a unique identifier like “G00”, “G10”, etc. This gives us 16 zones total that we can reference in our OD data.
[2]:
# Sixteen zones as a 4x4 grid of square cells (polygons) with explicit ids
from shapely.geometry import box
cells = []
ids = []
# Iterate rows (y) and columns (x) to build 1x1 degree squares
for i in range(4): # rows (y = 0..3)
for j in range(4): # cols (x = 0..3)
cells.append(box(j, i, j + 1, i + 1))
ids.append(f"G{j}{i}") # e.g., G00, G10, ..., G33
zones_gdf = gpd.GeoDataFrame({"zone_id": ids}, geometry=cells, crs="EPSG:4326")
zones_gdf.head()
[2]:
zone_id | geometry | |
---|---|---|
0 | G00 | POLYGON ((1 0, 1 1, 0 1, 0 0, 1 0)) |
1 | G10 | POLYGON ((2 0, 2 1, 1 1, 1 0, 2 0)) |
2 | G20 | POLYGON ((3 0, 3 1, 2 1, 2 0, 3 0)) |
3 | G30 | POLYGON ((4 0, 4 1, 3 1, 3 0, 4 0)) |
4 | G01 | POLYGON ((1 1, 1 2, 0 2, 0 1, 1 1)) |
Creating the edge list and converting to graph#
Now we’ll create a simple edge list DataFrame with source-target pairs and flow values. This represents movement between zones - for example, 5 units flowing from zone G00 to G10. We’ll then use od_matrix_to_graph()
to convert this into spatial graph format.
[3]:
# Edge list with single weight column using grid zone ids (subset for clarity)
E = pd.DataFrame({
"source": ["G00", "G00", "G10", "G01", "G22", "G23", "G12"],
"target": ["G10", "G01", "G11", "G11", "G32", "G33", "G22"],
"flow": [5, 2, 3, 7, 4, 6, 5],
})
nodes_gdf1, edges_gdf1 = c2g.od_matrix_to_graph(
E, zones_gdf, zone_id_col="zone_id",
matrix_type="edgelist",
source_col="source", target_col="target",
weight_cols=["flow"],
threshold=None, # drop zeros only
include_self_loops=False,
compute_edge_geometry=True,
directed=True,
as_nx=False,
)
/Users/yutasato/Projects/Liverpool/city2graph/city2graph/mobility.py:137: UserWarning: Geographic CRS detected; distance/length measures may be inaccurate (requirement 3.5)
_validate_crs(zones_gdf)
/Users/yutasato/Projects/Liverpool/city2graph/city2graph/mobility.py:1103: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
centroids = zones_gdf.geometry.centroid
The function returns two GeoDataFrames: nodes_gdf1
contains the zone geometries, and edges_gdf1
contains the flow edges with geometries connecting zone centroids.
[4]:
edges_gdf1.head()
[4]:
weight | flow | geometry | ||
---|---|---|---|---|
source | target | |||
G00 | G01 | 2 | 2 | LINESTRING (0.5 0.5, 0.5 1.5) |
G10 | 5 | 5 | LINESTRING (0.5 0.5, 1.5 0.5) | |
G01 | G11 | 7 | 7 | LINESTRING (0.5 1.5, 1.5 1.5) |
G10 | G11 | 3 | 3 | LINESTRING (1.5 0.5, 1.5 1.5) |
G12 | G22 | 5 | 5 | LINESTRING (1.5 2.5, 2.5 2.5) |
Visualizing the spatial graph#
Now let’s create a visualization showing both the zone boundaries and the flow edges. The edge thickness will be proportional to the flow weight.
[5]:
# Quick plot of nodes (grid cells) and edges
ax = zones_gdf.boundary.plot(color="#2E86AB", alpha=0.8, linewidth=0.8)
edges_gdf1.plot(ax=ax, color="#E67E22", linewidth=np.log1p(edges_gdf1["weight"]) * 1.2)
zones_gdf.centroid.plot(ax=ax, color="#2E86AB", markersize=20, alpha=0.8)
plt.title("Mock edgelist graph on 4x4 grid (directed)")
plt.axis("equal"); plt.axis("off")
plt.show()
/var/folders/_n/l2f9tkgn3g17dj7hnsjprssc0000gn/T/ipykernel_51323/3909391215.py:4: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
zones_gdf.centroid.plot(ax=ax, color="#2E86AB", markersize=20, alpha=0.8)

4. Undirected with multiple weights#
od_matrix_to_graph
also supports undirected way of summing flows, and multiple weights. We’ll add two weight columns and merge reciprocals by summing. We must choose a threshold_col
as the primary weight (canonical weight
).
Creating an undirected graph with multiple weights#
In this example, we’ll create edge data with reciprocal pairs (e.g., G00→G10 and G10→G00) and multiple weight columns. When directed=False
, reciprocal edges are merged by summing their weights.
[15]:
# Multi-weight undirected example on 4x4 grid (subset of pairs with reciprocals)
E2 = pd.DataFrame({
"source": ["G00", "G10", "G01", "G11", "G22", "G32", "G23", "G33"],
"target": ["G10", "G00", "G11", "G01", "G32", "G22", "G33", "G23"],
"trips": [5, 1, 7, 2, 4, 3, 8, 2],
"cost": [10, 2, 12, 3, 7, 4, 15, 3],
})
nodes_gdf2, edges_gdf2 = c2g.od_matrix_to_graph(
E2, zones_gdf, zone_id_col="zone_id",
matrix_type="edgelist",
source_col="source", target_col="target",
weight_cols=["trips", "cost"],
threshold=3, threshold_col="trips",
include_self_loops=False,
compute_edge_geometry=True,
directed=False, # undirected: sum reciprocals
as_nx=False,
)
/Users/yutasato/Projects/Liverpool/city2graph/city2graph/mobility.py:137: UserWarning: Geographic CRS detected; distance/length measures may be inaccurate (requirement 3.5)
_validate_crs(zones_gdf)
/Users/yutasato/Projects/Liverpool/city2graph/city2graph/mobility.py:1103: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
centroids = zones_gdf.geometry.centroid
Plotting the undirected graph#
The visualization will show the merged edges with their combined weights. Notice how reciprocal flows are now represented as single undirected edges.
[16]:
ax = zones_gdf.boundary.plot(color="#2E86AB", alpha=0.8, linewidth=0.8)
edges_gdf2.plot(ax=ax, color="#9B59B6", linewidth=np.log1p(edges_gdf2["weight"]) * 1.2)
zones_gdf.centroid.plot(ax=ax, color="#2E86AB", markersize=20, alpha=0.8)
plt.title("Mock edgelist graph on 4x4 grid (undirected, primary=trips)")
plt.axis("equal"); plt.axis("off")
plt.show()
/var/folders/_n/l2f9tkgn3g17dj7hnsjprssc0000gn/T/ipykernel_51323/133846062.py:3: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
zones_gdf.centroid.plot(ax=ax, color="#2E86AB", markersize=20, alpha=0.8)

5. Adjacency matrix#
You can pass a square pandas DataFrame (index and columns must match) or a NumPy array.
Building an adjacency matrix#
Instead of an edge list, we can provide data as an adjacency matrix. Here we’ll create a 16×16 DataFrame where rows and columns represent zones, and cell values represent flows.
[8]:
# Build an adjacency DataFrame matching the 4x4 grid zone ids
ids = zones_gdf["zone_id"]
# Create a sparse 16x16 matrix with a few flows
A = pd.DataFrame(0, index=ids, columns=ids, dtype=float)
# add some directed flows
A.loc["G00", "G10"] = 5
A.loc["G00", "G01"] = 2
A.loc["G10", "G11"] = 3
A.loc["G01", "G11"] = 7
A.loc["G22", "G32"] = 4
A.loc["G23", "G33"] = 6
A.loc["G12", "G22"] = 5
nodes_gdf3, edges_gdf3 = c2g.od_matrix_to_graph(
A, zones_gdf, zone_id_col="zone_id",
matrix_type="adjacency",
include_self_loops=False,
threshold=None,
directed=True,
)
/Users/yutasato/Projects/Liverpool/city2graph/city2graph/mobility.py:137: UserWarning: Geographic CRS detected; distance/length measures may be inaccurate (requirement 3.5)
_validate_crs(zones_gdf)
/Users/yutasato/Projects/Liverpool/city2graph/city2graph/mobility.py:1103: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
centroids = zones_gdf.geometry.centroid
Visualizing the adjacency matrix graph#
The result should be identical to our first example since we’re using the same flow values, just in matrix format instead of edge list format.
[9]:
ax = zones_gdf.boundary.plot(color="#2E86AB", alpha=0.8, linewidth=0.8)
edges_gdf3.plot(ax=ax, color="#27AE60", linewidth=np.log1p(edges_gdf3["weight"]) * 1.2)
zones_gdf.centroid.plot(ax=ax, color="#2E86AB", markersize=20, alpha=0.8)
plt.title("Mock adjacency graph on 4x4 grid (directed)")
plt.axis("equal"); plt.axis("off")
plt.show()
/var/folders/_n/l2f9tkgn3g17dj7hnsjprssc0000gn/T/ipykernel_51323/2081733941.py:3: UserWarning: Geometry is in a geographic CRS. Results from 'centroid' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
zones_gdf.centroid.plot(ax=ax, color="#2E86AB", markersize=20, alpha=0.8)

6. England & Wales MSOA example#
This section mirrors the development notebook and uses local files under dev/
. You can adjust paths as needed.
Loading real-world data#
Now we’ll work with actual migration data from the UK Office for National Statistics . For the dataset, we employed the OD data of England and Wales from the UK Census 2021. We’ll load MSOA (Middle Layer Super Output Areas) zone boundaries for the zone unit.
[10]:
# Paths (adjust to your environment if needed)
# Relative to repository root
ZONE_GPKG = "../../../dev/Middle_layer_Super_Output_Areas_December_2021_Boundaries_EW_BGC_V3_-1334546435986816930.gpkg"
OD_CSV = "../../../dev/odmg/odmg01ew/ODMG01EW_MSOA.csv"
zones_london = gpd.read_file(ZONE_GPKG)
od_london = pd.read_csv(OD_CSV)
print(f"zones_london: {len(zones_london)} rows, CRS={zones_london.crs}")
od_london.head(3)
zones_london: 7264 rows, CRS=EPSG:27700
[10]:
Migrant MSOA one year ago code | Migrant MSOA one year ago label | Middle layer Super Output Areas code | Middle layer Super Output Areas label | Count | |
---|---|---|---|---|---|
0 | -8 | Does not apply | E02000001 | City of London 001 | 6237 |
1 | -8 | Does not apply | E02000002 | Barking and Dagenham 001 | 7622 |
2 | -8 | Does not apply | E02000003 | Barking and Dagenham 002 | 10285 |
Converting London migration data to graph#
The ODMG dataset has specific column names for the migration data. We’ll map these to our function parameters and create an undirected graph to represent bidirectional migration flows.
[11]:
# Column mapping for ODMG dataset
source_col = "Migrant MSOA one year ago code"
target_col = "Middle layer Super Output Areas code"
weight_col = "Count"
zone_id_col = "MSOA21CD"
od_nodes, od_edges = c2g.od_matrix_to_graph(
od_london, zones_london, zone_id_col=zone_id_col,
matrix_type="edgelist",
source_col=source_col, target_col=target_col,
weight_cols=[weight_col],
threshold=None,
include_self_loops=False,
compute_edge_geometry=True,
directed=False,
as_nx=False,
)
len(od_nodes), len(od_edges)
/Users/yutasato/Projects/Liverpool/city2graph/city2graph/mobility.py:175: UserWarning: Dropped 36661 edges referencing unknown zone IDs (requirement 3.6)
aligned = _align_edgelist_zones(
[11]:
(7264, 1228547)
Inspecting the London graph structure#
Let’s examine the structure of our London migration network to understand what we’ve created.
[12]:
# Inspect a few rows
od_nodes.head()
[12]:
MSOA21CD | MSOA21NM | MSOA21NMW | BNG_E | BNG_N | LAT | LONG | GlobalID | geometry | |
---|---|---|---|---|---|---|---|---|---|
MSOA21CD | |||||||||
E02000001 | E02000001 | City of London 001 | 532384 | 181355 | 51.515621 | -0.093490 | {71249043-B176-4306-BA6C-D1A993B1B741} | MULTIPOLYGON (((532135.138 182198.131, 532071.... | |
E02000002 | E02000002 | Barking and Dagenham 001 | 548267 | 189685 | 51.586521 | 0.138756 | {997A80A8-0EBE-461C-91EB-3E4122571A6E} | MULTIPOLYGON (((548881.563 190845.265, 548845.... | |
E02000003 | E02000003 | Barking and Dagenham 002 | 548259 | 188520 | 51.576061 | 0.138149 | {62DED9D9-F53A-454D-AF35-04404D9DBE9B} | MULTIPOLYGON (((549102.438 189324.625, 549120.... | |
E02000004 | E02000004 | Barking and Dagenham 003 | 551004 | 186412 | 51.556389 | 0.176828 | {511181CD-E71F-4C63-81EE-E8E76744A627} | MULTIPOLYGON (((551550.056 187364.705, 551551.... | |
E02000005 | E02000005 | Barking and Dagenham 004 | 548733 | 186824 | 51.560692 | 0.144267 | {B0C823EB-69E0-4AE7-9E1C-37715CF3FE87} | MULTIPOLYGON (((549099.634 187656.076, 549057.... |
[13]:
od_edges.head()
[13]:
weight | Count | geometry | ||
---|---|---|---|---|
source | target | |||
E02000001 | E02000012 | 1 | 1 | LINESTRING (532485.482 181271.782, 545635.991 ... |
E02000024 | 1 | 1 | LINESTRING (532485.482 181271.782, 524413.978 ... | |
E02000029 | 1 | 1 | LINESTRING (532485.482 181271.782, 527246.245 ... | |
E02000030 | 1 | 1 | LINESTRING (532485.482 181271.782, 522736.368 ... | |
E02000035 | 3 | 3 | LINESTRING (532485.482 181271.782, 525825.887 ... |
Calculating Network centrality#
[17]:
G = c2g.gdf_to_nx(od_nodes, od_edges)
[19]:
# Calculate centralities
degree_centrality = nx.degree_centrality(G)
# Set as node attributes
nx.set_node_attributes(G, degree_centrality, 'degree_centrality')
[20]:
od_nodes, od_edges = c2g.nx_to_gdf(G)
[22]:
od_nodes.head()
[22]:
MSOA21CD | MSOA21NM | MSOA21NMW | BNG_E | BNG_N | LAT | LONG | GlobalID | geometry | degree_centrality | |
---|---|---|---|---|---|---|---|---|---|---|
MSOA21CD | ||||||||||
E02000001 | E02000001 | City of London 001 | 532384 | 181355 | 51.515621 | -0.093490 | {71249043-B176-4306-BA6C-D1A993B1B741} | MULTIPOLYGON (((532135.138 182198.131, 532071.... | 0.116205 | |
E02000002 | E02000002 | Barking and Dagenham 001 | 548267 | 189685 | 51.586521 | 0.138756 | {997A80A8-0EBE-461C-91EB-3E4122571A6E} | MULTIPOLYGON (((548881.563 190845.265, 548845.... | 0.039928 | |
E02000003 | E02000003 | Barking and Dagenham 002 | 548259 | 188520 | 51.576061 | 0.138149 | {62DED9D9-F53A-454D-AF35-04404D9DBE9B} | MULTIPOLYGON (((549102.438 189324.625, 549120.... | 0.061683 | |
E02000004 | E02000004 | Barking and Dagenham 003 | 551004 | 186412 | 51.556389 | 0.176828 | {511181CD-E71F-4C63-81EE-E8E76744A627} | MULTIPOLYGON (((551550.056 187364.705, 551551.... | 0.042269 | |
E02000005 | E02000005 | Barking and Dagenham 004 | 548733 | 186824 | 51.560692 | 0.144267 | {B0C823EB-69E0-4AE7-9E1C-37715CF3FE87} | MULTIPOLYGON (((549099.634 187656.076, 549057.... | 0.055349 |
[44]:
from matplotlib.collections import LineCollection
# Filter for flows >= 10
top_edges = od_edges[od_edges[weight_col] >= 10]
# Compute percentile ranks for weights
percentile_ranks = top_edges[weight_col].rank(pct=True)
# Normalize alpha values based on percentile ranks
alpha_values = percentile_ranks
# Create subplots: left for network, right for degree centrality
fig, axs = plt.subplots(1, 2, figsize=(24, 12))
# Left subplot: Network visualization
axs[0].set_title("England & Wales MSOA Migration Network: Flows ≥10\n(Alpha by Percentile Rank)", fontsize=22)
zones_london.plot(ax=axs[0], color="#BDC3C7", edgecolor="white", linewidth=0.3, alpha=0.6)
# Convert LineString geometries to coordinate lists for LineCollection
lines = [list(line.coords) for line in top_edges.geometry]
colors = [(0/255, 100/255, 200/255, alpha) for alpha in alpha_values] # Blue base color with alpha
linewidths = np.log1p(top_edges[weight_col]) * 0.2
lc = LineCollection(lines, colors=colors, linewidths=linewidths)
axs[0].add_collection(lc)
zones_london.centroid.plot(ax=axs[0], color="#2C3E50", markersize=5, alpha=0.8)
axs[0].set_axis_off()
# Right subplot: Degree centrality
axs[1].set_title("England & Wales MSOA Migration Network: Degree Centrality\n(Migration Flows 2021 - Quantile Classification)", fontsize=22)
od_nodes.plot(
column='degree_centrality',
ax=axs[1],
cmap='viridis',
edgecolor='white',
linewidth=0.1,
legend=True,
scheme='quantiles',
k=4,
legend_kwds={'title': 'Degree Centrality (Quantiles)', 'loc': 'upper left', 'bbox_to_anchor': (1, 1), 'fontsize': 14}
)
axs[1].set_axis_off()
axs[1].set_facecolor('#f8f9fa')
plt.tight_layout()
plt.show()
