Transportation
Transportation Network Analysis Module.
This module processes General Transit Feed Specification (GTFS) data and builds transportation network representations from it, converting public transit data into graph structures suitable for network analysis and accessibility studies.
All functions return pandas/GeoPandas objects or NetworkX graphs that can be used directly in analysis pipelines, notebooks, or model training workflows.
- load_gtfs(path)
Parse a GTFS zip file and enrich stops/shapes with geometry.
This function loads a GTFS (General Transit Feed Specification) zip file and converts it into a dictionary of pandas/GeoPandas DataFrames. Stop locations and route shapes are automatically converted to geometric objects for spatial analysis.
- Parameters:
path (str or pathlib.Path) – Location of the zipped GTFS feed (e.g. "./rome_gtfs.zip").
- Returns:
Keys are the original GTFS file names (without extension) and values are pandas or GeoPandas DataFrames ready for analysis.
- Return type:
dict
See also
get_od_pairs
Create origin-destination pairs from GTFS data.
travel_summary_graph
Create network representation from GTFS data.
Notes
The function never mutates the original file - everything is kept in memory.
Geometry columns are added only when the relevant coordinate columns are present and valid.
Examples
>>> from pathlib import Path
>>> gtfs = load_gtfs(Path("data/rome_gtfs.zip"))
>>> print(list(gtfs))
['agency', 'routes', 'trips', 'stops', ...]
>>> gtfs['stops'].head(3)[['stop_name', 'geometry']]
      stop_name                   geometry
0  Termini (MA)  POINT (12.50118 41.90088)
1  Colosseo(MB)  POINT (12.49224 41.89021)
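The loading behaviour described for load_gtfs (tables keyed by file name without extension, everything held in memory, geometry only for valid coordinates) can be illustrated with a standard-library sketch. This is not the module's implementation: read_gtfs_tables and stop_point are hypothetical helper names, rows are plain dicts rather than DataFrames, and a point is just a (lon, lat) tuple. The validity check (parseable numbers within longitude/latitude bounds) is also an assumption.

```python
import csv
import io
import zipfile


def read_gtfs_tables(zip_bytes):
    """Read every .txt table of a GTFS zip into a list of row dicts (hypothetical helper)."""
    tables = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith(".txt"):
                continue
            # utf-8-sig drops the byte-order mark some feeds put before the header.
            text = archive.read(name).decode("utf-8-sig")
            # Key is the file name without directory and without extension.
            key = name.rsplit("/", 1)[-1][: -len(".txt")]
            tables[key] = list(csv.DictReader(io.StringIO(text)))
    return tables


def stop_point(row):
    """Return (lon, lat) for a stop row, or None if the coordinates are unusable."""
    try:
        lon, lat = float(row["stop_lon"]), float(row["stop_lat"])
    except (KeyError, TypeError, ValueError):
        return None
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        return None
    return (lon, lat)


# Build a tiny in-memory feed so the sketch runs without any file on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "stops.txt",
        "stop_id,stop_name,stop_lat,stop_lon\n"
        "A,Alpha,41.90088,12.50118\n"
        "B,Beta,,\n",
    )
    archive.writestr("routes.txt", "route_id,route_short_name\nR1,1\n")

feed = read_gtfs_tables(buffer.getvalue())
points = {row["stop_id"]: stop_point(row) for row in feed["stops"]}
print(sorted(feed))  # ['routes', 'stops']
print(points)        # {'A': (12.50118, 41.90088), 'B': None}
```

Stop B has empty coordinate fields, so it receives no point. This mirrors the note that geometry is added only when the coordinate columns are present and valid.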
- get_od_pairs(gtfs, start_date=None, end_date=None, include_geometry=True)
Materialise origin-destination pairs for every trip and service day.
This function creates a comprehensive dataset of all origin-destination pairs for transit trips within the specified date range, optionally including geometric information for spatial analysis.
- Parameters:
gtfs (dict) – Dictionary returned by load_gtfs().
start_date (str or None, optional) – First day of the closed interval [start_date, end_date] to which the calendar expansion is restricted (format YYYYMMDD). When None the period is inferred from calendar.txt.
end_date (str or None, optional) – Last day of the closed interval [start_date, end_date] to which the calendar expansion is restricted (format YYYYMMDD). When None the period is inferred from calendar.txt.
include_geometry (bool, default True) – If True the result is a GeoDataFrame whose geometry is a straight LineString connecting the two stops.
- Returns:
One row per trip-day-leg with departure / arrival timestamps, travel time in seconds and, optionally, geometry.
- Return type:
pandas.DataFrame or geopandas.GeoDataFrame
See also
load_gtfs
Load GTFS data from zip file.
travel_summary_graph
Create network representation from GTFS data.
Examples
>>> gtfs = load_gtfs("data/rome_gtfs.zip")
>>> od = get_od_pairs(gtfs, start_date="20230101", end_date="20230107")
>>> od.head(3)[['orig_stop_id', 'dest_stop_id', 'travel_time_sec']]
  orig_stop_id dest_stop_id  travel_time_sec
0      7045490      7045491            120.0
1      7045491      7045492            180.0
2      7045492      7045493            240.0
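The calendar expansion that get_od_pairs performs over the closed interval [start_date, end_date] can be sketched roughly as follows. This is a standard-library illustration under stated assumptions, not the module's code: expand_service_days and parse_yyyymmdd are hypothetical names, the rows follow the standard calendar.txt columns, and exceptions from calendar_dates.txt are ignored.

```python
from datetime import datetime, timedelta

# Column names of calendar.txt, ordered so the index matches date.weekday().
WEEKDAYS = ("monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday")


def parse_yyyymmdd(value):
    """Parse a GTFS date string such as '20230101'."""
    return datetime.strptime(value, "%Y%m%d").date()


def expand_service_days(calendar_rows, start_date=None, end_date=None):
    """Map each service_id to the dates it runs within [start_date, end_date]."""
    # With no explicit bounds, fall back to the range found in the rows.
    lo = parse_yyyymmdd(start_date) if start_date else min(
        parse_yyyymmdd(row["start_date"]) for row in calendar_rows)
    hi = parse_yyyymmdd(end_date) if end_date else max(
        parse_yyyymmdd(row["end_date"]) for row in calendar_rows)

    days = {}
    for row in calendar_rows:
        first = max(lo, parse_yyyymmdd(row["start_date"]))
        last = min(hi, parse_yyyymmdd(row["end_date"]))
        active = []
        current = first
        while current <= last:  # closed interval: both ends are included
            if row[WEEKDAYS[current.weekday()]] == "1":
                active.append(current)
            current += timedelta(days=1)
        days[row["service_id"]] = active
    return days


calendar_rows = [
    {"service_id": "WK", "monday": "1", "tuesday": "1", "wednesday": "1",
     "thursday": "1", "friday": "1", "saturday": "0", "sunday": "0",
     "start_date": "20230101", "end_date": "20231231"},
]
service_days = expand_service_days(calendar_rows, "20230101", "20230107")
print([day.isoformat() for day in service_days["WK"]])
# ['2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-06']
```

1 January 2023 is a Sunday and 7 January a Saturday, so a weekday-only service yields five service days in that week. Each trip of the service would then contribute one set of legs per day.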
- travel_summary_graph(gtfs, start_time=None, end_time=None, calendar_start=None, calendar_end=None, as_nx=False)
Aggregate stop-to-stop travel time & frequency into an edge list.
This function analyzes GTFS data to create a network representation of transit connections, computing average travel times and service frequencies between consecutive stops.
- Parameters:
gtfs (dict) – A dictionary produced by load_gtfs() - must contain at least stop_times and stops.
start_time (str or None, optional) – Lower bound of the window [start_time, end_time] (format HH:MM:SS); only trips whose departure falls inside the window are considered. When None the whole service day is used.
end_time (str or None, optional) – Upper bound of the window [start_time, end_time] (format HH:MM:SS); only trips whose departure falls inside the window are considered. When None the whole service day is used.
calendar_start (str or None, optional) – Start of the period over which service days are counted (format YYYYMMDD). If omitted, the period spans the native range in calendar.txt.
calendar_end (str or None, optional) – End of the period over which service days are counted (format YYYYMMDD). If omitted, the period spans the native range in calendar.txt.
as_nx (bool, default False) – If True return a NetworkX graph, otherwise two GeoDataFrames (nodes_gdf, edges_gdf). The latter follow the convention used in utils.py.
- Returns:
Nodes - every stop with a valid geometry.
Edges - columns = from_stop_id, to_stop_id, mean_travel_time, frequency, geometry.
- Return type:
tuple[geopandas.GeoDataFrame, geopandas.GeoDataFrame] or networkx.Graph
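The aggregation of consecutive-stop legs into mean travel time and frequency can be sketched with the standard library as below. Several points are assumptions rather than the module's documented behaviour: summarise_legs and to_seconds are hypothetical names, the time window is applied to each leg's departure, and frequency is simply the number of observed legs in the input (no multiplication by service days).

```python
from collections import defaultdict
from statistics import mean


def to_seconds(hhmmss):
    """Convert a GTFS HH:MM:SS string to seconds; hours may exceed 23."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def summarise_legs(stop_times, start_time=None, end_time=None):
    """Return {(from_stop_id, to_stop_id): (mean_travel_time, frequency)}."""
    lo = to_seconds(start_time) if start_time else None
    hi = to_seconds(end_time) if end_time else None

    # Group stop events per trip.
    trips = defaultdict(list)
    for row in stop_times:
        trips[row["trip_id"]].append(row)

    durations = defaultdict(list)
    for rows in trips.values():
        # Order the stops of each trip, then pair every stop with the next one.
        rows.sort(key=lambda r: int(r["stop_sequence"]))
        for here, there in zip(rows, rows[1:]):
            departure = to_seconds(here["departure_time"])
            if lo is not None and departure < lo:
                continue
            if hi is not None and departure > hi:
                continue
            travel = to_seconds(there["arrival_time"]) - departure
            durations[(here["stop_id"], there["stop_id"])].append(travel)

    return {pair: (mean(values), len(values)) for pair, values in durations.items()}


def stop_event(trip_id, sequence, stop_id, arrival, departure):
    """Build one stop_times row as a dict (illustrative data only)."""
    return {"trip_id": trip_id, "stop_sequence": sequence, "stop_id": stop_id,
            "arrival_time": arrival, "departure_time": departure}


stop_times = [
    stop_event("T1", "1", "A", "07:00:00", "07:00:00"),
    stop_event("T1", "2", "B", "07:02:00", "07:02:30"),
    stop_event("T2", "1", "A", "08:00:00", "08:00:00"),
    stop_event("T2", "2", "B", "08:03:00", "08:03:00"),
    stop_event("T3", "1", "A", "12:00:00", "12:00:00"),
    stop_event("T3", "2", "B", "12:05:00", "12:05:00"),
]
edges = summarise_legs(stop_times, start_time="07:00:00", end_time="10:00:00")
print(edges)
```

With the 07:00 to 10:00 window, trips T1 and T2 contribute travel times of 120 and 180 seconds, while T3 departs at noon and is excluded. The single edge A to B therefore has a mean travel time of 150 seconds and a frequency of 2.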
See also
get_od_pairs
Create origin-destination pairs from GTFS data.
load_gtfs
Load GTFS data from zip file.
Examples
>>> gtfs = load_gtfs("data/rome_gtfs.zip")
>>> nodes, edges = travel_summary_graph(
...     gtfs,
...     start_time="07:00:00",
...     end_time="10:00:00",
... )
>>> print(edges.head(3)[['travel_time', 'frequency']])
                         travel_time  frequency
from_stop_id to_stop_id
7045490      7045491           120.0         42
7045491      7045492           180.0         42
7045492      7045493           240.0         42
You can directly obtain a NetworkX object too:
>>> G = travel_summary_graph(gtfs, as_nx=True)
>>> print(G.number_of_nodes(), G.number_of_edges())
2564 3178
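As an illustration of the accessibility analysis such an edge list supports, the sketch below runs Dijkstra's algorithm over (from_stop_id, to_stop_id, travel time) triples using only the standard library. travel_times_from is a hypothetical helper, not part of this module. The three edges reuse the travel times from the example above, and waiting and transfer times are ignored.

```python
import heapq
from collections import defaultdict


def travel_times_from(origin, edges):
    """Dijkstra over directed (from_stop, to_stop, seconds) edges."""
    adjacency = defaultdict(list)
    for from_stop, to_stop, seconds in edges:
        adjacency[from_stop].append((to_stop, seconds))

    best = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        elapsed, stop = heapq.heappop(queue)
        if elapsed > best.get(stop, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbour, seconds in adjacency[stop]:
            candidate = elapsed + seconds
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return best


edges = [
    ("7045490", "7045491", 120.0),
    ("7045491", "7045492", 180.0),
    ("7045492", "7045493", 240.0),
]
reachable = travel_times_from("7045490", edges)
print(reachable)
# {'7045490': 0.0, '7045491': 120.0, '7045492': 300.0, '7045493': 540.0}
```

When as_nx=True is used, the equivalent computation would normally be delegated to NetworkX's shortest-path routines, with the travel-time attribute as the edge weight.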