Transportation Network Analysis with City2Graph¶
This notebook demonstrates the power of City2Graph for processing and analyzing public transportation networks. We'll use the General Transit Feed Specification (GTFS) data format to showcase how City2Graph transforms complex transit schedules into intuitive graph representations suitable for:
- Urban accessibility analysis - Understanding travel patterns and reachability
- Network visualization - Creating compelling maps of transit flows
- Graph neural networks - Converting transportation data into ML-ready formats
- Spatial analysis - Examining the relationship between transit and urban morphology
The City2Graph library simplifies the complex process of converting GTFS data into actionable insights, making transportation network analysis accessible to researchers, planners, and data scientists.
1. Environment Setup and Dependencies¶
Before we dive into transportation analysis, let's import the necessary libraries. City2Graph integrates seamlessly with the geospatial Python ecosystem, building on familiar tools like GeoPandas, Shapely, and NetworkX.
# Geospatial data processing
import geopandas as gpd
import networkx as nx
import rustworkx as rx
import pandas as pd
# Mapping and visualization
import contextily as ctx
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
# Network analysis
import osmnx as ox
# Others
from pathlib import Path
# The star of the show: city2graph for transportation network analysis
import city2graph as c2g
2. Loading GTFS Data with City2Graph¶
What is GTFS?¶
The General Transit Feed Specification (GTFS) is the global standard for public transportation schedules and geographic information. GTFS data contains multiple interconnected tables describing:
- Routes: Transit lines (bus routes, train lines, etc.)
- Stops: Physical locations where passengers board/alight
- Trips: Individual vehicle journeys along routes
- Stop times: Scheduled arrival/departure times at each stop
- Calendar: Service patterns (weekdays, weekends, holidays)
City2Graph's GTFS Advantage¶
While GTFS data is powerful, it's typically stored as separate CSV files that require complex joins and processing. City2Graph simplifies this workflow by:
- Automatic parsing of zipped GTFS files
- Spatial integration - converting coordinates to proper GeoDataFrames
- Data validation and type coercion for reliable analysis
- Seamless integration with the Python geospatial stack with compatibility to GeoDataFrame, nx.MultiGraph, PyTorch Geometric Data(), etc.
Let's see this in action with Transport for London data:
# Load GTFS data into DuckDB (current API)
sample_gtfs_path = Path("./data/itm_london_gtfs.zip")
print("Loading London Transport GTFS data into DuckDB...")
gtfs_con = c2g.load_gtfs(sample_gtfs_path)
print("GTFS database ready.")
Loading London Transport GTFS data into DuckDB...
FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))
GTFS database ready.
# Inspect loaded GTFS tables from DuckDB
tables_df = gtfs_con.execute("SHOW TABLES").df()
table_names = sorted(tables_df["name"].tolist())
print(f"Found {len(table_names)} data tables")
print(", ".join(table_names))
for table in ["stops", "routes", "stop_times"]:
if table in table_names:
n = gtfs_con.execute(f"SELECT COUNT(*) AS n FROM {table}").df().loc[0, 'n']
print(f"Total {table}: {n:,}")
Found 10 data tables agency, calendar, calendar_dates, feed_info, frequencies, routes, shapes, stop_times, stops, trips Total stops: 24,745 Total routes: 1,087 Total stop_times: 17,807,603
Understanding the GTFS Data Structure¶
City2Graph's current load_gtfs() function returns a DuckDB connection with GTFS tables loaded in-memory. This is convenient for large feeds because we can query only what we need with SQL and convert results to pandas/GeoPandas when needed.
Let's explore each component to understand how transit systems are structured:
- calendar: Service schedules with weekday/weekend availability windows
- trips: Individual vehicle trips linked to route and service
- routes: Route metadata (line names, route type, agency)
- stop_times: Ordered stop sequences and scheduled times for each trip
- stops: Stop attributes plus geometry (spatial points)
Together, these components provide a complete view of transit operations for robust network analysis and visualization.
# Explore GTFS tables available in DuckDB
print("Available GTFS tables:")
for i, table_name in enumerate(table_names, 1):
num_records = gtfs_con.execute(
f"SELECT COUNT(*) AS n FROM {table_name}"
).df().loc[0, 'n']
print(f" {i}. {table_name}: {num_records:,} records")
print("\n" + "=" * 50)
print("GTFS Table Descriptions:")
print("=" * 50)
print("agency - Transit operators (TfL, etc.)")
print("calendar - Service patterns (weekdays/weekends)")
print("routes - Transit lines")
print("stops - Physical stop locations (with coordinates)")
print("stop_times - Scheduled arrivals/departures")
print("trips - Individual vehicle journeys")
print("calendar_dates - Service exceptions (holidays, etc.)")
table_names
Available GTFS tables: 1. agency: 56 records 2. calendar: 576 records 3. calendar_dates: 40,264 records 4. feed_info: 1 records 5. frequencies: 61 records 6. routes: 1,087 records 7. shapes: 147,172 records 8. stop_times: 17,807,603 records 9. stops: 24,745 records 10. trips: 488,935 records ================================================== GTFS Table Descriptions: ================================================== agency - Transit operators (TfL, etc.) calendar - Service patterns (weekdays/weekends) routes - Transit lines stops - Physical stop locations (with coordinates) stop_times - Scheduled arrivals/departures trips - Individual vehicle journeys calendar_dates - Service exceptions (holidays, etc.)
['agency', 'calendar', 'calendar_dates', 'feed_info', 'frequencies', 'routes', 'shapes', 'stop_times', 'stops', 'trips']
# Agency information - Who operates the transit services?
print("Transit Agencies:")
print("This table contains information about transportation operators")
gtfs_con.execute("SELECT * FROM agency LIMIT 5").df()
Transit Agencies: This table contains information about transportation operators
| agency_id | agency_name | agency_url | agency_timezone | agency_lang | agency_phone | agency_noc | |
|---|---|---|---|---|---|---|---|
| 0 | OP11949 | Golden Tours | https://www.traveline.info | Europe/London | EN | None | GTSL |
| 1 | OP14145 | Quality Line | https://www.traveline.info | Europe/London | EN | None | QULN |
| 2 | OP14161 | London Underground (TfL) | https://www.traveline.info | Europe/London | EN | None | LULD |
| 3 | OP14162 | NATIONAL EXPRESS OPERAT | https://www.traveline.info | Europe/London | EN | None | SS |
| 4 | OP14163 | London Docklands Light Railway - TfL | https://www.traveline.info | Europe/London | EN | None | LDLR |
gtfs_con.execute("SELECT * FROM calendar LIMIT 5").df()
| service_id | monday | tuesday | wednesday | thursday | friday | saturday | sunday | start_date | end_date | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 20250530 | 20260228 |
| 1 | 2 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 20250530 | 20260228 |
| 2 | 3 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 20250530 | 20260228 |
| 3 | 19 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 20250530 | 20260228 |
| 4 | 21 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 20250530 | 20260228 |
gtfs_con.execute("SELECT * FROM calendar_dates LIMIT 5").df()
| service_id | date | exception_type | |
|---|---|---|---|
| 0 | 20458 | 20250903 | 1 |
| 1 | 31969 | 20251001 | 2 |
| 2 | 31438 | 20250711 | 2 |
| 3 | 20583 | 20251105 | 1 |
| 4 | 33097 | 20251201 | 1 |
gtfs_con.execute("SELECT * FROM routes LIMIT 5").df()
| route_id | agency_id | route_short_name | route_long_name | route_type | |
|---|---|---|---|---|---|
| 0 | 58 | OP5050 | 025 | None | 200 |
| 1 | 89 | OP5050 | 444 | None | 200 |
| 2 | 100 | OP5050 | 007 | None | 200 |
| 3 | 116 | OP5050 | 022 | None | 200 |
| 4 | 289 | OP53 | 372 | None | 3 |
# Stops - The spatial foundation of transit networks
stops_df = gtfs_con.execute(
"""
SELECT * EXCLUDE (geometry), ST_AsText(geometry) AS geometry_wkt
FROM stops
LIMIT 5
"""
).df()
stops_preview = gpd.GeoDataFrame(
stops_df.drop(columns=["geometry_wkt"]),
geometry=gpd.GeoSeries.from_wkt(stops_df["geometry_wkt"]),
crs="EPSG:4326",
)
print("Transit Stops (with Spatial Coordinates):")
print("Notice how city2graph stores stop geometry in DuckDB and we can convert to GeoPandas")
print(f"Coordinate Reference System: {stops_preview.crs}")
print(f"Geometry type: {stops_preview.geometry.geom_type.iloc[0]}")
stops_preview.head()
Transit Stops (with Spatial Coordinates): Notice how city2graph stores stop geometry in DuckDB and we can convert to GeoPandas Coordinate Reference System: EPSG:4326 Geometry type: Point
| stop_id | stop_code | stop_name | stop_lat | stop_lon | wheelchair_boarding | location_type | parent_station | platform_code | geometry | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 490014597S | 48536 | White Hart Ln Grt Cambridge Rd | 51.6049 | -0.08595 | 0 | 0 | None | None | POINT (-0.08595 51.6049) |
| 1 | 490007372S | 74106 | Granville Place | 51.5965 | -0.38728 | 0 | 0 | None | None | POINT (-0.38728 51.5965) |
| 2 | 490013521E | 52358 | The Ravensbury | 51.39799 | -0.15787 | 0 | 0 | None | None | POINT (-0.15787 51.39799) |
| 3 | 240G006160A | NaN | Bus Station | 51.27127 | 0.1933035 | 0 | 1 | None | None | POINT (0.1933 51.27127) |
| 4 | 490007476V | 56036 | Palmers Green / Green Lanes | 51.61252 | -0.1071 | 0 | 0 | None | None | POINT (-0.1071 51.61252) |
gtfs_con.execute("SELECT * FROM stop_times LIMIT 5").df()
| trip_id | arrival_time | departure_time | stop_id | stop_sequence | stop_headsign | pickup_type | drop_off_type | shape_dist_traveled | timepoint | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | VJ000015f02fdb2ffc0444ac5453798bd8befdca76 | 05:05:00 | 05:05:00 | 490006587C | 1 | None | 0 | 0 | None | 0 |
| 1 | VJ000015f02fdb2ffc0444ac5453798bd8befdca76 | 05:05:00 | 05:05:00 | 490009222A | 0 | None | 0 | 1 | None | 0 |
| 2 | VJ000015f02fdb2ffc0444ac5453798bd8befdca76 | 05:06:00 | 05:06:00 | 490006588R | 2 | None | 0 | 0 | None | 0 |
| 3 | VJ000015f02fdb2ffc0444ac5453798bd8befdca76 | 05:06:00 | 05:06:00 | 490009169S | 3 | None | 0 | 0 | None | 0 |
| 4 | VJ000015f02fdb2ffc0444ac5453798bd8befdca76 | 05:07:00 | 05:07:00 | 490004650S | 5 | None | 0 | 0 | None | 0 |
gtfs_con.execute("SELECT * FROM trips LIMIT 5").df()
| route_id | service_id | trip_id | trip_headsign | direction_id | block_id | shape_id | wheelchair_accessible | vehicle_journey_code | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 58 | 32443 | VJ0b7453c953d79096488dd30c8d67da29644842ed | Brighton - Victoria, London | 1 | None | None | 0 | VJ99 |
| 1 | 58 | 32443 | VJ0873eade6dfa11f222109beac2b7504007554ccd | Belgravia, Victoria - Brighton | 0 | None | None | 0 | VJ109 |
| 2 | 58 | 32443 | VJ01dbdf44f6d74ff8027d089fb3db5ae022559673 | Worthing - Victoria, London | 1 | None | None | 0 | VJ57 |
| 3 | 58 | 32445 | VJ1218298147f26d820d17ed4344a4dc9dc5f24cb6 | Brighton - Victoria, London | 1 | None | None | 0 | VJ29 |
| 4 | 58 | 32444 | VJ18b4d838cbdac9a9ab9de1d2f0ca0426719ba0a8 | Belgravia, Victoria - Brighton | 0 | None | None | 0 | VJ142 |
# Build a full stops GeoDataFrame from DuckDB for quick map preview
stops_all_df = gtfs_con.execute(
"""
SELECT * EXCLUDE (geometry), ST_AsText(geometry) AS geometry_wkt
FROM stops
"""
).df()
stops_gdf = gpd.GeoDataFrame(
stops_all_df.drop(columns=["geometry_wkt"]),
geometry=gpd.GeoSeries.from_wkt(stops_all_df["geometry_wkt"]),
crs="EPSG:4326",
)
# Reproject to British National Grid for accurate map display
stops_gdf_bng = stops_gdf.to_crs(epsg=27700)
print("Visualizing London's Transit Stops")
ax = c2g.plot_graph(nodes=stops_gdf_bng)
ctx.add_basemap(ax, crs=stops_gdf_bng.crs, source=ctx.providers.CartoDB.DarkMatter)
plt.show()
Visualizing London's Transit Stops
3. Creating Transit Graphs with City2Graph¶
The Power of Graph Representation¶
Raw GTFS data contains thousands of individual trips and stop times, but what we really want to understand are the connections and flow patterns in the network. This is where City2Graph shines - it transforms complex scheduling data into clean graph representations.
After loading the GTFS, travel_summary_graph() can summarize the trips between stops. The output contains the origin and destination of stops, with average travel times in seconds and frequency in the specified time intervals.
What it does:
- Processes thousands of individual trips into meaningful connections between stops
- Calculates average travel times between consecutive stops
- Counts service frequency (how often services run between stop pairs)
- Creates spatial geometries for each connection
- Handles complex scheduling including service calendars and time-of-day filtering
Why this matters:
- Transforms scheduling complexity into simple origin-destination relationships
- Enables network analysis (shortest paths, centrality, accessibility)
- Perfect input for graph neural networks and machine learning
- Ready-to-use format for visualization and spatial analysis
Let's see this powerful transformation in action:
# Transform GTFS schedules into a travel network graph
travel_summary_nodes, travel_summary_edges = c2g.travel_summary_graph(
gtfs_con,
calendar_start="20250601", # Analyze services for June 1, 2025
calendar_end="20250601", # Single day analysis for demonstration
)
travel_summary_nodes = travel_summary_nodes.to_crs(epsg=27700)
travel_summary_edges = travel_summary_edges.to_crs(epsg=27700)
print("Created graph with:")
print(f" • {len(travel_summary_nodes):,} nodes (stops with connections)")
print(f" • {len(travel_summary_edges):,} edges (stop-to-stop connections)")
print(" • Each edge contains travel time (seconds) and frequency data")
FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))
Created graph with: • 24,745 nodes (stops with connections) • 26,449 edges (stop-to-stop connections) • Each edge contains travel time (seconds) and frequency data
travel_summary_nodes.head()
| stop_code | stop_name | stop_lat | stop_lon | wheelchair_boarding | location_type | parent_station | platform_code | geometry | |
|---|---|---|---|---|---|---|---|---|---|
| stop_id | |||||||||
| 01000053216 | bstgjpt | Bus Station | 51.45906 | -2.59298 | 0 | 0 | 010G0005 | 8 | POINT (358898.332 173509.501) |
| 01000053218 | bstgjmw | Bus Station | 51.45901 | -2.59314 | 0 | 0 | 010G0005 | 6 | POINT (358887.171 173504.03) |
| 01000053221 | bstgjgm | Bus Station | 51.45895 | -2.59334 | 0 | 0 | 010G0005 | 3 | POINT (358873.221 173497.47) |
| 0100BRP90233 | bstdmgp | Triangle West | 51.45652 | -2.60802 | 0 | 0 | NaN | NaN | POINT (357851.066 173235.586) |
| 0100BRP90237 | bstdmwa | Queen's Road | 51.45651 | -2.60658 | 0 | 0 | NaN | NaN | POINT (357951.108 173233.644) |
# Examine the edges (connections) in our travel network
print("Network Edges (Transit Connections):")
print("Each row represents a direct connection between two stops")
print("\nKey metrics city2graph calculated:")
print(" • travel_time_sec: Average time to travel between stops (seconds)")
print(" • frequency: Number of services per day on this connection")
print(" • geometry: LineString for mapping and spatial analysis")
print(f"\nPerformance insight:")
print(f" Fastest connection: {travel_summary_edges['travel_time_sec'].min():.0f} seconds")
print(f" Busiest connection: {travel_summary_edges['frequency'].max():.0f} services/day")
print(f" Average travel time: {travel_summary_edges['travel_time_sec'].mean():.0f} seconds")
travel_summary_edges.head()
Network Edges (Transit Connections): Each row represents a direct connection between two stops Key metrics city2graph calculated: • travel_time_sec: Average time to travel between stops (seconds) • frequency: Number of services per day on this connection • geometry: LineString for mapping and spatial analysis Performance insight: Fastest connection: 1 seconds Busiest connection: 1066 services/day Average travel time: 219 seconds
| frequency | geometry | travel_time_sec | ||
|---|---|---|---|---|
| from_stop_id | to_stop_id | |||
| 01000053216 | 0170SGP90689 | 179 | LINESTRING (358898.332 173509.501, 362270.014 ... | 821.22905 |
| 0190NSZ01231 | 14 | LINESTRING (358898.332 173509.501, 336803.818 ... | 3000.00000 | |
| 035059860001 | 28 | LINESTRING (358898.332 173509.501, 471026.138 ... | 5625.00000 | |
| 1100DEA57098 | 12 | LINESTRING (358898.332 173509.501, 292592.045 ... | 6450.00000 | |
| 360000174 | 46 | LINESTRING (358898.332 173509.501, 325332.697 ... | 3750.00000 |
# Examine the edges (connections) in our travel network
print("Network Edges (Transit Connections):")
print("Each row represents a direct connection between two stops")
print("\nKey metrics city2graph calculated:")
print(" • travel_time_sec: Average time to travel between stops (seconds)")
print(" • frequency: Number of services per day on this connection")
print(" • geometry: LineString for mapping and spatial analysis")
print(f"\nPerformance insight:")
print(f" Fastest connection: {travel_summary_edges['travel_time_sec'].min():.0f} seconds")
print(f" Busiest connection: {travel_summary_edges['frequency'].max():.0f} services/day")
print(f" Average travel time: {travel_summary_edges['travel_time_sec'].mean():.0f} seconds")
travel_summary_edges.head()
Network Edges (Transit Connections): Each row represents a direct connection between two stops Key metrics city2graph calculated: • travel_time_sec: Average time to travel between stops (seconds) • frequency: Number of services per day on this connection • geometry: LineString for mapping and spatial analysis Performance insight: Fastest connection: 1 seconds Busiest connection: 1066 services/day Average travel time: 219 seconds
| frequency | geometry | travel_time_sec | ||
|---|---|---|---|---|
| from_stop_id | to_stop_id | |||
| 01000053216 | 0170SGP90689 | 179 | LINESTRING (358898.332 173509.501, 362270.014 ... | 821.22905 |
| 0190NSZ01231 | 14 | LINESTRING (358898.332 173509.501, 336803.818 ... | 3000.00000 | |
| 035059860001 | 28 | LINESTRING (358898.332 173509.501, 471026.138 ... | 5625.00000 | |
| 1100DEA57098 | 12 | LINESTRING (358898.332 173509.501, 292592.045 ... | 6450.00000 | |
| 360000174 | 46 | LINESTRING (358898.332 173509.501, 325332.697 ... | 3750.00000 |
Now we will clean up the dataset to focus on the Greater London extent. We clip the transit network to the London administrative boundary using City2Graph's boundary utility, then continue network analysis (Sections 4–5) on this London-wide graph.
# Get Greater London boundary using city2graph
london_boundary = c2g.get_boundaries("Greater London, UK").to_crs(epsg=27700)
# Ensure projected CRS for spatial operations
travel_summary_nodes = travel_summary_nodes.to_crs(epsg=27700)
travel_summary_edges = travel_summary_edges.to_crs(epsg=27700)
# Clip graph to Greater London extent
print("Clipping nodes and edges to Greater London boundary...")
travel_summary_nodes, travel_summary_edges = c2g.clip_graph(
(travel_summary_nodes, travel_summary_edges),
london_boundary,
keep_outer_neighbors=False,
)
print("Boundary clipping complete:")
print(f" Nodes within London: {len(travel_summary_nodes):,}")
print(f" Edges within London: {len(travel_summary_edges):,}")
Clipping nodes and edges to Greater London boundary... Boundary clipping complete: Nodes within London: 19,233 Edges within London: 23,133
# Quick check of London-wide network
print("Greater London network summary")
print(f"Nodes: {len(travel_summary_nodes):,}")
print(f"Edges: {len(travel_summary_edges):,}")
travel_summary_nodes.head()
Greater London network summary Nodes: 19,233 Edges: 23,133
| stop_code | stop_name | stop_lat | stop_lon | wheelchair_boarding | location_type | parent_station | platform_code | geometry | |
|---|---|---|---|---|---|---|---|---|---|
| stop_id | |||||||||
| 210021803340 | hrtajatj | Mount Vernon Hospital | 51.61460025309353 | -0.450657771262085 | 0 | 0 | NaN | NaN | POINT (507371.001 191778) |
| 490000002A | 57685 | Acton Town | 51.50273 | -0.28101 | 0 | 0 | NaN | NaN | POINT (519408.799 179599.927) |
| 490000002B | 50980 | Acton Town | 51.50402 | -0.27986 | 0 | 0 | NaN | NaN | POINT (519485.242 179745.257) |
| 490000003B | 47908 | St Botolph Street | 51.51495 | -0.07514 | 0 | 0 | NaN | NaN | POINT (533661.004 181314.47) |
| 490000003R | 58621 | Aldgate | 51.51408 | -0.07493 | 0 | 0 | NaN | NaN | POINT (533678.12 181218.107) |
As a result, we can see the overview by plot_graph().
# Plot the network with dual encoding (color + width) using city2graph
c2g.plot_graph(
nodes=travel_summary_nodes,
edges=travel_summary_edges,
edge_color='travel_time_sec',
edge_linewidth=travel_summary_edges['frequency'] / 250,
edge_alpha=0.8,
)
plt.show()
4. Network Centrality Analysis¶
We can now visualize the network structure using the calculated betweenness centrality. Nodes with higher centrality (key hubs) will be highlighted with brighter colors, while edge thickness represents service frequency. This visualization helps identify the most critical stops and connections that serve as bridges in the London transit network.
For the calcluation of network centralities, networkx is not the fastest solution. city2graph provides conversion functionalies to rustworkx for better compuation efficiency by rust.
travel_summary_graph = c2g.gdf_to_nx(
travel_summary_nodes,
travel_summary_edges,
)
travel_rx_graph = c2g.nx_to_rx(travel_summary_graph)
betweenness_centrality = rx.betweenness_centrality(travel_rx_graph)
# Set the betweenness centrality as a node attribute
nx.set_node_attributes(
travel_summary_graph,
betweenness_centrality,
"betweenness_centrality"
)
travel_nodes, travel_edges = c2g.nx_to_gdf(travel_summary_graph)
Removed 3 invalid geometries
# Calculate quantiles for node size to handle skew in centrality values
# Drop duplicate bin edges to avoid qcut errors when many nodes share the same centrality
centrality_bins = pd.qcut(
travel_nodes['betweenness_centrality'],
q=10,
labels=False,
duplicates="drop"
)
travel_nodes['centrality_quantile'] = centrality_bins.fillna(0).astype(int) + 1
travel_nodes['node_size_visual'] = travel_nodes['centrality_quantile'] * 15
fig, ax = plt.subplots(figsize=(16, 16))
# Use city2graph to plot the network structure
# Node color represents centrality (skews handled by quantiles)
# Edge width represents frequency
c2g.plot_graph(
nodes=travel_nodes,
edges=travel_edges,
markersize=8,
node_color='centrality_quantile',
node_alpha=0.9,
edge_color='#bbbbbb',
edge_linewidth=travel_edges['frequency'] / 200,
edge_alpha=0.7,
figsize=(16, 16),
title="Central London Transit Network\nBetweenness Centrality of Stops",
legend=False, # remove colorbar
ax=ax
)
# Add legend for betweenness centrality ranges
cmap = plt.cm.viridis
norm = plt.Normalize(
vmin=travel_nodes['centrality_quantile'].min(),
vmax=travel_nodes['centrality_quantile'].max()
)
quantile_ranges = (
travel_nodes
.groupby('centrality_quantile')['betweenness_centrality']
.agg(['min', 'max'])
.reset_index()
)
handles = [
mlines.Line2D(
[],
[],
color=cmap(norm(row['centrality_quantile'])),
marker='o',
linestyle='',
markersize=8,
label=f"{row['min']:.6f} – {row['max']:.6f}"
)
for _, row in quantile_ranges.iterrows()
]
ax.legend(handles=handles, title='Betweenness Centrality', loc='lower right')
# Add basemap with a clean, modern style
ctx.add_basemap(ax, crs=travel_nodes.crs, source=ctx.providers.CartoDB.DarkMatter)
plt.show()
As seen in the plot, we could detect stations with higher betweenness centrality that are alongside of main streets.
To see local transportation patterns, we will filter the graph using filter_graph_by_distance. This time we filter it with a 15-minute (900-second) travel time from the center of London.
# Filter graph to show network within 15 minutes from central London
center_point = london_boundary.geometry.to_crs(27700).iloc[0].centroid
filtered_travel_summary_graph = c2g.filter_graph_by_distance(
travel_summary_graph,
center_point=center_point,
threshold=900, # 900 seconds (~15 minutes travel time)
edge_attr="travel_time_sec" # Filter by travel time, not physical distance
)
filtered_travel_rx_graph = c2g.nx_to_rx(filtered_travel_summary_graph)
betweenness_centrality_filtered = rx.betweenness_centrality(filtered_travel_rx_graph)
# Set the betweenness centrality as a node attribute
nx.set_node_attributes(
filtered_travel_summary_graph,
betweenness_centrality_filtered,
"betweenness_centrality"
)
filtered_travel_nodes, filtered_travel_edges = c2g.nx_to_gdf(
filtered_travel_summary_graph
)
# Calculate quantiles for node size to handle skew in centrality values
# We bin the data into 5 quantiles and scale the size for visibility
filtered_travel_nodes['centrality_quantile'] = pd.qcut(
filtered_travel_nodes['betweenness_centrality'],
q=5,
labels=False
) + 1
filtered_travel_nodes['node_size_visual'] = filtered_travel_nodes['centrality_quantile'] * 15
fig, ax = plt.subplots(figsize=(16, 16))
# Use city2graph to plot the network structure
# Node size represents centrality (skews handled by quantiles)
# Edge width represents frequency
c2g.plot_graph(
nodes=filtered_travel_nodes,
edges=filtered_travel_edges,
node_color='centrality_quantile',
markersize=100,
node_alpha=0.9,
edge_color='#bbbbbb',
edge_linewidth=filtered_travel_edges['frequency'] / 200,
edge_alpha=0.7,
ax=ax
)
# Add basemap with a clean, modern style
ctx.add_basemap(ax, crs=filtered_travel_nodes.crs, source=ctx.providers.CartoDB.DarkMatter)
# Add legend for betweenness centrality ranges
cmap = plt.cm.viridis
norm = plt.Normalize(
vmin=filtered_travel_nodes['centrality_quantile'].min(),
vmax=filtered_travel_nodes['centrality_quantile'].max()
)
quantile_ranges = (
filtered_travel_nodes
.groupby('centrality_quantile')['betweenness_centrality']
.agg(['min', 'max'])
.reset_index()
)
handles = [
mlines.Line2D(
[],
[],
color=cmap(norm(row['centrality_quantile'])),
marker='o',
linestyle='',
markersize=8,
label=f"{row['min']:.6f} – {row['max']:.6f}"
)
for _, row in quantile_ranges.iterrows()
]
ax.legend(handles=handles, title='Betweenness Centrality', loc='lower right')
plt.show()
6. Accessibility Analysis with Isochrones¶
In this section, we compare isochrones for two scenarios only: walk-only and walk+transit accessibility from the same origin point in London.
Understanding Isochrones¶
An isochrone is a contour line connecting points that are equally accessible (by travel time) from a given origin. In transportation networks, isochrones reveal the "reachability" from a central location - showing how far passengers can travel within a specific time limit.
Why isochrones matter:
- Accessibility assessment - Which neighborhoods are within 15 minutes of travel?
- Mode comparison - How much extra reach is gained when transit is added to walking?
- Equity analysis - Identifying underserved areas with limited access
- Urban planning - Informing decisions about multimodal infrastructure investment
City2Graph's create_isochrone() function simplifies this complex spatial analysis, automatically:
- Building travel time relationships from network edges
- Finding all reachable nodes within a travel time threshold
- Generating smooth boundary polygons around reachable areas
- Supporting multiple polygon generation methods (concave hull, alpha shape, convex hull, buffer)
- Supporting heterogeneous graphs with multimodal situations (street + transit)
Isochrones on Street Networks (Walk-Only Baseline)¶
To build a walk-only baseline, we create a walk network for a Central London analysis polygon. This keeps the OSM query tractable and gives us a direct baseline for comparison against the walk+transit scenario.
# Keep walking-network download limited for performance
analysis_radius_m = 10_000 # 10 km radius around Piccadilly Circus
# Piccadilly Circus (WGS84 input coordinate)
piccadilly_circus_lon, piccadilly_circus_lat = -0.1337, 51.5101
piccadilly_circus_point = gpd.GeoSeries.from_xy(
[piccadilly_circus_lon],
[piccadilly_circus_lat],
crs="EPSG:4326",
).to_crs(epsg=27700).iloc[0]
analysis_area = gpd.GeoDataFrame(
{"name": ["Piccadilly Circus 10km"]},
geometry=[piccadilly_circus_point.buffer(analysis_radius_m)],
crs="EPSG:27700",
)
analysis_poly_wgs84 = analysis_area.to_crs(epsg=4326).geometry.iloc[0]
# Configure OSMnx for reliable downloads
ox.settings.use_cache = True
ox.settings.cache_folder = "./cache/osmnx"
ox.settings.requests_timeout = 180
print("Fetching Central London walk network from OSM...")
print(f"Query polygon area: {analysis_area.geometry.iloc[0].area / 1e6:.1f} km^2")
street_graph = ox.graph_from_polygon(
analysis_poly_wgs84,
network_type="walk",
simplify=True,
retain_all=True,
truncate_by_edge=True,
)
if street_graph.number_of_nodes() == 0 or street_graph.number_of_edges() == 0:
raise RuntimeError(
"OSM download returned an empty graph. "
"Try a larger area or check connectivity/API availability."
)
print(
f"OSM network downloaded successfully: "
f"{street_graph.number_of_nodes():,} nodes, {street_graph.number_of_edges():,} edges"
)
# Convert to GeoDataFrames and reproject to British National Grid
street_nodes, street_edges = c2g.nx_to_gdf(street_graph)
street_nodes = street_nodes.to_crs(epsg=27700)
street_edges = street_edges.to_crs(epsg=27700)
# Snap isochrone center to the nearest street node around Piccadilly Circus
nearest_node_idx = street_nodes.geometry.distance(piccadilly_circus_point).idxmin()
center_point = street_nodes.loc[nearest_node_idx, "geometry"]
snap_distance_m = piccadilly_circus_point.distance(center_point)
# Compute walking travel time (seconds) assuming 4.5 km/h = 4500 m/h
street_edges["length"] = street_edges.geometry.length
street_edges["travel_time_sec"] = street_edges["length"] / 4500 * 3600
print(f"Isochrone origin snapped to nearest street node ({snap_distance_m:.1f} m from Piccadilly Circus)")
print(f"Street nodes (limited area): {len(street_nodes):,}")
print(f"Street edges (limited area): {len(street_edges):,}")
display(street_edges.head())
Fetching Central London street network from OSM (walk network)... Query polygon area: 313.7 km^2 OSM download attempt 1/3
Fetching Central London street network from OSM (walk network)... Query polygon area: 313.7 km^2 OSM download attempt 1/3 OSM network downloaded successfully: 173,345 nodes, 440,544 edges Isochrone origin snapped to nearest street node (9.0 m from Piccadilly Circus) Street nodes (limited area): 173,345 Street edges (limited area): 440,544
| osmid | access | highway | maxspeed | name | oneway | reversed | length | geometry | bridge | lanes | ref | service | junction | tunnel | width | est_width | area | travel_time_sec | |||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 78112 | 25508583 | 0 | 129375498 | permissive | unclassified | 20 mph | Outer Circle | False | False | 19.399136 | LINESTRING (528725 182525.235, 528726.116 1825... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 15.519308 |
| 25508584 | 0 | 129375498 | permissive | unclassified | 20 mph | Outer Circle | False | True | 63.718133 | LINESTRING (528725 182525.235, 528723.33 18258... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 50.974506 | |
| 1 | 4257258 | permissive | residential | 20 mph | Cambridge Terrace | False | False | 102.053589 | LINESTRING (528725 182525.235, 528744.582 1825... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 81.642871 | ||
| 99884 | 12378884761 | 0 | 4082681 | NaN | footway | NaN | NaN | False | False | 173.282760 | LINESTRING (528245.101 182222.468, 528259.75 1... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 138.626208 |
| 4544836450 | 0 | 5090291 | NaN | footway | NaN | NaN | False | False | 308.182017 | LINESTRING (528245.101 182222.468, 528275.674 ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 246.545613 |
walking_labels = ["5 minutes", "10 minutes", "15 minutes"]
walking_threshold_seconds = [300, 600, 900]
# Compute layered walking isochrones in one call
walking_isochrones = c2g.create_isochrone(
nodes=street_nodes,
edges=street_edges,
center_point=center_point,
threshold=walking_threshold_seconds,
edge_attr="travel_time_sec",
method="concave_hull_knn",
k=50
)
walking_isochrones["label"] = walking_labels
for _, row in walking_isochrones.iterrows():
area_km2 = row.geometry.area / 1e6
print(f"{row['label']:12} -> {area_km2:6.2f} km^2 reachable area")
5 minutes -> 0.28 km^2 reachable area 10 minutes -> 1.13 km^2 reachable area 15 minutes -> 2.66 km^2 reachable area
fig, ax = plt.subplots(figsize=(16, 14))
color_by_label = {
"5 minutes": "#ff6b6b",
"10 minutes": "#48dbfb",
"15 minutes": "#1dd1a1",
}
# Plot larger thresholds first for layering
for _, row in walking_isochrones.sort_values("threshold", ascending=False).iterrows():
row_gdf = gpd.GeoDataFrame([row], geometry="geometry", crs=walking_isochrones.crs)
row_gdf.plot(
ax=ax,
alpha=0.35,
color=color_by_label.get(row["label"], "#1dd1a1"),
edgecolor="#111111",
linewidth=1.25,
label=row["label"],
zorder=2,
)
# Origin marker
ax.plot(center_point.x, center_point.y, "r*", markersize=28, markeredgecolor="white", label="Origin", zorder=4)
# Basemap for context
ctx.add_basemap(ax, crs=street_nodes.crs, source=ctx.providers.CartoDB.Positron, alpha=0.6)
ax.set_title("Walking Accessibility Isochrones\nWalk Travel Times from Center of London", fontsize=16, color="black", pad=18)
ax.legend(loc="upper right", framealpha=0.9)
ax.axis("off")
plt.tight_layout()
plt.show()
/var/folders/_n/l2f9tkgn3g17dj7hnsjprssc0000gn/T/ipykernel_92928/2395002001.py:29: UserWarning: Legend does not support handles for PatchCollection instances. See: https://matplotlib.org/stable/tutorials/intermediate/legend_guide.html#implementing-a-custom-legend-handler ax.legend(loc="upper right", framealpha=0.9)
/var/folders/_n/l2f9tkgn3g17dj7hnsjprssc0000gn/T/ipykernel_92928/2395002001.py:29: UserWarning: Legend does not support handles for PatchCollection instances. See: https://matplotlib.org/stable/tutorials/intermediate/legend_guide.html#implementing-a-custom-legend-handler ax.legend(loc="upper right", framealpha=0.9)