Utils
Core Utilities Module.
This module provides essential utilities for graph conversion, data validation, and spatial analysis operations. It serves as the foundation for the city2graph package, offering a standardized data format for handling geospatial relations across GeoPandas, NetworkX objects, and eventually PyTorch Geometric objects. The module enables seamless integration between different graph representations and geospatial data formats through robust data structures and conversion functions.
- gdf_to_nx(nodes=None, edges=None, keep_geom=True, multigraph=False, directed=False)
Convert GeoDataFrames of nodes and edges to a NetworkX graph.
This function provides a high-level interface to convert geospatial data, represented as GeoDataFrames, into a NetworkX graph. It supports both homogeneous and heterogeneous graphs.
For homogeneous graphs, provide one GeoDataFrame for nodes and one for edges. For heterogeneous graphs, provide dictionaries mapping type names to GeoDataFrames.
- Parameters:
nodes (geopandas.GeoDataFrame or dict[str, geopandas.GeoDataFrame], optional) – Node data. For homogeneous graphs, a single GeoDataFrame. For heterogeneous graphs, a dictionary mapping node type names to GeoDataFrames. Node IDs are taken from the GeoDataFrame index.
edges (geopandas.GeoDataFrame or dict, optional) – Edge data. For homogeneous graphs, a single GeoDataFrame. For heterogeneous graphs, a dictionary mapping edge type tuples (source_type, relation_type, target_type) to GeoDataFrames. Edge relationships are defined by a MultiIndex on the edge GeoDataFrame (source ID, target ID). For MultiGraphs, a third level in the index can be used for edge keys.
keep_geom (bool, default True) – If True, the geometry of the nodes and edges GeoDataFrames will be preserved as attributes in the NetworkX graph.
multigraph (bool, default False) – If True, a networkx.MultiGraph is created, which can store multiple edges between the same two nodes.
directed (bool, default False) – If True, a directed graph (networkx.DiGraph or networkx.MultiDiGraph) is created. Otherwise, an undirected graph is created.
- Returns:
A NetworkX graph object representing the spatial network. Graph-level metadata, such as CRS and heterogeneity information, is stored in graph.graph.
- Return type:
networkx.Graph or networkx.MultiGraph or networkx.DiGraph or networkx.MultiDiGraph
See also
nx_to_gdf
Convert a NetworkX graph back to GeoDataFrames.
Examples
>>> # Homogeneous graph
>>> import geopandas as gpd
>>> import pandas as pd
>>> from shapely.geometry import Point, LineString
>>> nodes_gdf = gpd.GeoDataFrame(
...     geometry=[Point(0, 0), Point(1, 1)],
...     index=pd.Index([10, 20], name="node_id")
... )
>>> edges_gdf = gpd.GeoDataFrame(
...     {"length": [1.414]},
...     geometry=[LineString([(0, 0), (1, 1)])],
...     index=pd.MultiIndex.from_tuples([(10, 20)], names=["u", "v"])
... )
>>> G = gdf_to_nx(nodes=nodes_gdf, edges=edges_gdf)
>>> print(G.nodes(data=True))
[(0, {'geometry': <POINT (0 0)>,
      '_original_index': 10,
      'pos': (0.0, 0.0)}),
 (1, {'geometry': <POINT (1 1)>,
      '_original_index': 20,
      'pos': (1.0, 1.0)})]
>>> print(G.edges(data=True))
[(0, 1, {'length': 1.414,
         'geometry': <LINESTRING (0 0, 1 1)>,
         '_original_edge_index': (10, 20)})]
>>> # Heterogeneous graph
>>> buildings_gdf = gpd.GeoDataFrame(geometry=[Point(0, 0)], index=pd.Index(['b1'], name="b_id"))
>>> streets_gdf = gpd.GeoDataFrame(geometry=[Point(1, 1)], index=pd.Index(['s1'], name="s_id"))
>>> connections_gdf = gpd.GeoDataFrame(
...     geometry=[LineString([(0, 0), (1, 1)])],
...     index=pd.MultiIndex.from_tuples([('b1', 's1')])
... )
>>> nodes_dict = {"building": buildings_gdf, "street": streets_gdf}
>>> edges_dict = {("building", "connects_to", "street"): connections_gdf}
>>> H = gdf_to_nx(nodes=nodes_dict, edges=edges_dict)
>>> print(H.nodes(data=True))
[(0, {'geometry': <POINT (0 0)>,
      'node_type': 'building',
      '_original_index': 'b1',
      'pos': (0.0, 0.0)}),
 (1, {'geometry': <POINT (1 1)>,
      'node_type': 'street',
      '_original_index': 's1',
      'pos': (1.0, 1.0)})]
>>> print(H.edges(data=True))
[(0, 1, {'geometry': <LINESTRING (0 0, 1 1)>,
         'full_edge_type': ('building', 'connects_to', 'street'),
         '_original_edge_index': ('b1', 's1')})]
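The example outputs above show integer node IDs, with the original index values kept under `_original_index` and `_original_edge_index`. The relabelling idea can be sketched in plain Python, using lists and tuples as stand-ins for GeoDataFrames and the NetworkX graph. The helper name `relabel` is hypothetical, and this is an illustration under those assumptions rather than the library's implementation.

```python
def relabel(node_index, edge_index):
    """Map original node labels to consecutive integers.

    node_index: list of original node labels (stand-in for a GeoDataFrame index).
    edge_index: list of (source, target) label pairs (stand-in for a MultiIndex).
    """
    # Assign 0, 1, 2, ... in the order the labels appear.
    new_id = {label: i for i, label in enumerate(node_index)}

    # Keep the original label as an attribute so the mapping can be undone.
    nodes = [(new_id[label], {"_original_index": label}) for label in node_index]
    edges = [
        (new_id[u], new_id[v], {"_original_edge_index": (u, v)})
        for u, v in edge_index
    ]
    return nodes, edges


nodes, edges = relabel([10, 20], [(10, 20)])
print(nodes)  # [(0, {'_original_index': 10}), (1, {'_original_index': 20})]
print(edges)  # [(0, 1, {'_original_edge_index': (10, 20)})]
```

Storing the original labels as attributes is what makes a later round trip back to the original index possible.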
- nx_to_gdf(G, nodes=True, edges=True)
Convert a NetworkX graph to GeoDataFrames for nodes and/or edges.
This function reconstructs GeoDataFrames from a NetworkX graph that was created by gdf_to_nx or follows a similar structure. It can handle both homogeneous and heterogeneous graphs, extracting node and edge attributes and reconstructing geometries from position data.
- Parameters:
G (networkx.Graph or networkx.MultiGraph) – The NetworkX graph to convert. It is expected to have metadata stored in G.graph to guide the conversion, including CRS and heterogeneity information. Node positions are expected in a ‘pos’ attribute.
nodes (bool, default True) – If True, a GeoDataFrame for nodes will be created and returned.
edges (bool, default True) – If True, a GeoDataFrame for edges will be created and returned.
- Returns:
The returned type depends on the graph type and input parameters:
- Homogeneous graph:
  - (nodes_gdf, edges_gdf) if nodes and edges are both True.
  - nodes_gdf if only nodes is True.
  - edges_gdf if only edges is True.
- Heterogeneous graph:
  - (nodes_dict, edges_dict), where each dict maps types to GeoDataFrames.
- Return type:
- Raises:
ValueError – If both nodes and edges are False.
See also
gdf_to_nx
Convert GeoDataFrames to a NetworkX graph.
Examples
>>> import networkx as nx
>>> from shapely.geometry import Point, LineString
>>> # Create a simple graph with spatial attributes
>>> G = nx.Graph(is_hetero=False, crs="EPSG:4326")
>>> G.add_node(0, pos=(0, 0), population=100, geometry=Point(0, 0))
>>> G.add_node(1, pos=(1, 1), population=200, geometry=Point(1, 1))
>>> G.add_edge(0, 1, weight=1.5, geometry=LineString([(0, 0), (1, 1)]))
>>> # Convert back to GeoDataFrames
>>> nodes_gdf, edges_gdf = nx_to_gdf(G)
>>> print(nodes_gdf)
   population     geometry
0         100  POINT (0 0)
1         200  POINT (1 1)
>>> print(edges_gdf)
     weight               geometry
0 1     1.5  LINESTRING (0 0, 1 1)
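The return shapes documented for the homogeneous case, together with the ValueError raised when both flags are False, can be sketched as a small dispatch function. The name `select_outputs` is hypothetical and plain strings stand in for GeoDataFrames; the heterogeneous case is not covered here.

```python
def select_outputs(nodes_gdf, edges_gdf, nodes=True, edges=True):
    """Return the objects the caller asked for, mirroring the documented shapes."""
    if not nodes and not edges:
        # Nothing was requested, so there is nothing sensible to return.
        raise ValueError("At least one of 'nodes' or 'edges' must be True.")
    if nodes and edges:
        return nodes_gdf, edges_gdf
    return nodes_gdf if nodes else edges_gdf


# Strings stand in for GeoDataFrames in this sketch.
both = select_outputs("N", "E")
only_edges = select_outputs("N", "E", nodes=False)
print(both)        # ('N', 'E')
print(only_edges)  # E
```

Because the result is a tuple in one case and a single object in the others, callers that toggle the flags need to unpack accordingly.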
- validate_gdf(nodes_gdf=None, edges_gdf=None, allow_empty=True)
Validate node and edge GeoDataFrames with type detection.
This function validates both homogeneous and heterogeneous GeoDataFrame inputs, performs type checking, and determines whether the input represents a heterogeneous graph structure.
- Parameters:
nodes_gdf (geopandas.GeoDataFrame or dict[str, geopandas.GeoDataFrame], optional) – The GeoDataFrame containing node data to validate, or a dictionary mapping node type names to GeoDataFrames for heterogeneous graphs.
edges_gdf (geopandas.GeoDataFrame or dict[tuple[str, str, str], geopandas.GeoDataFrame], optional) – The GeoDataFrame containing edge data to validate, or a dictionary mapping edge type tuples to GeoDataFrames for heterogeneous graphs.
allow_empty (bool, default True) – If True, empty GeoDataFrames are accepted. If False, an empty GeoDataFrame raises an error.
- Returns:
A tuple containing:
- validated nodes_gdf (same type as input)
- validated edges_gdf (same type as input)
- is_hetero: boolean indicating whether this is a heterogeneous graph
- Return type:
tuple[geopandas.GeoDataFrame | dict[str, geopandas.GeoDataFrame] | None, geopandas.GeoDataFrame | dict[tuple[str, str, str], geopandas.GeoDataFrame] | None, bool]
- Raises:
TypeError – If an input is not a GeoDataFrame or appropriate dictionary type.
ValueError – If the input types are inconsistent or invalid.
See also
validate_nx
Validate a NetworkX graph.
Examples
>>> import geopandas as gpd
>>> from shapely.geometry import Point, LineString
>>> nodes = gpd.GeoDataFrame(geometry=[Point(0, 0)])
>>> edges = gpd.GeoDataFrame(geometry=[LineString([(0, 0), (1, 1)])])
>>> try:
...     validated_nodes, validated_edges, is_hetero = validate_gdf(nodes, edges)
...     print(f"Validation successful. Heterogeneous: {is_hetero}")
... except (TypeError, ValueError) as e:
...     print(f"Validation failed: {e}")
Validation successful. Heterogeneous: False
- validate_nx(graph)
Validate a NetworkX graph with comprehensive type checking.
Checks if the input is a NetworkX graph, ensures it is not empty (i.e., it has both nodes and edges), and verifies that it contains the necessary metadata for conversion back to GeoDataFrames or PyG objects.
- Parameters:
graph (networkx.Graph or networkx.MultiGraph) – The NetworkX graph to validate.
- Raises:
TypeError – If the input is not a NetworkX graph.
ValueError – If the graph has no nodes, no edges, or is missing essential metadata.
- Return type:
None
See also
validate_gdf
Validate GeoDataFrames for graph conversion.
Examples
>>> import networkx as nx
>>> from shapely.geometry import Point
>>> G = nx.Graph(is_hetero=False, crs="EPSG:4326")
>>> G.add_node(0, pos=(0, 0))
>>> G.add_node(1, pos=(1, 1))
>>> G.add_edge(0, 1)
>>> try:
...     validate_nx(G)
...     print("Validation successful.")
... except (TypeError, ValueError) as e:
...     print(f"Validation failed: {e}")
Validation successful.
- segments_to_graph(segments_gdf, multigraph=False)
Convert a GeoDataFrame of LineString segments into a graph structure.
This function takes a GeoDataFrame of LineStrings and processes it into a topologically explicit graph representation, consisting of a GeoDataFrame of unique nodes (the endpoints of the lines) and a GeoDataFrame of edges.
The resulting nodes GeoDataFrame contains unique points representing the start and end points of the input line segments. The edges GeoDataFrame is a copy of the input, but with a new MultiIndex (from_node_id, to_node_id) that references the IDs in the new nodes GeoDataFrame. If multigraph is True and there are multiple edges between the same pair of nodes, an additional index level (edge_key) is added to distinguish them.
- Parameters:
segments_gdf (geopandas.GeoDataFrame) – A GeoDataFrame where each row represents a line segment, and the ‘geometry’ column contains LineString objects.
multigraph (bool, default False) – If True, supports multiple edges between the same pair of nodes by adding an edge_key level to the MultiIndex. This is useful when the input contains duplicate node-to-node connections that should be preserved as separate edges.
- Returns:
A tuple containing two GeoDataFrames:
nodes_gdf: A GeoDataFrame of unique nodes (Points), indexed by node_id.
edges_gdf: A GeoDataFrame of edges (LineStrings), with a MultiIndex mapping to the node_id in nodes_gdf. If multigraph is True, the index includes a third level (edge_key) for duplicate connections.
- Return type:
tuple[geopandas.GeoDataFrame, geopandas.GeoDataFrame]
Examples
>>> import geopandas as gpd
>>> from shapely.geometry import LineString
>>> # Create a GeoDataFrame of line segments
>>> segments = gpd.GeoDataFrame(
...     {"road_name": ["A", "B"]},
...     geometry=[LineString([(0, 0), (1, 1)]), LineString([(1, 1), (1, 0)])],
...     crs="EPSG:32633"
... )
>>> # Convert to graph representation
>>> nodes_gdf, edges_gdf = segments_to_graph(segments)
>>> print(nodes_gdf)
node_id  geometry
0        POINT (0 0)
1        POINT (1 1)
2        POINT (1 0)
>>> print(edges_gdf)
                        road_name               geometry
from_node_id to_node_id
0            1                  A  LINESTRING (0 0, 1 1)
1            2                  B  LINESTRING (1 1, 1 0)
>>> # Example with duplicate connections (multigraph)
>>> segments_with_duplicates = gpd.GeoDataFrame(
...     {"road_name": ["A", "B", "C"]},
...     geometry=[LineString([(0, 0), (1, 1)]),
...               LineString([(0, 0), (1, 1)]),
...               LineString([(1, 1), (1, 0)])],
...     crs="EPSG:32633"
... )
>>> nodes_gdf, edges_gdf = segments_to_graph(segments_with_duplicates, multigraph=True)
>>> print(edges_gdf.index.names)
['from_node_id', 'to_node_id', 'edge_key']
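The endpoint-to-node mapping described for segments_to_graph can be sketched without GeoPandas, using coordinate lists in place of LineStrings. The function name `segments_to_node_edge_ids`, the order-of-first-appearance node numbering, and edge keys starting at 0 are all assumptions made for this illustration; the library may number nodes and keys differently.

```python
from collections import defaultdict


def segments_to_node_edge_ids(segments, multigraph=False):
    """Derive node IDs and an edge index from line segments.

    segments: list of coordinate sequences, e.g. [[(0, 0), (1, 1)], ...].
    Returns (node_ids, edge_index), where node_ids maps a coordinate to its ID.
    """
    node_ids = {}            # endpoint coordinate -> node ID
    edge_index = []          # one (from, to[, key]) entry per segment
    seen = defaultdict(int)  # (from, to) -> how often the pair has occurred

    for coords in segments:
        start, end = coords[0], coords[-1]
        # Reuse the ID of a known endpoint, otherwise assign the next integer.
        u = node_ids.setdefault(start, len(node_ids))
        v = node_ids.setdefault(end, len(node_ids))
        if multigraph:
            # The third level distinguishes parallel edges between u and v.
            edge_index.append((u, v, seen[(u, v)]))
            seen[(u, v)] += 1
        else:
            edge_index.append((u, v))
    return node_ids, edge_index


segments = [[(0, 0), (1, 1)], [(0, 0), (1, 1)], [(1, 1), (1, 0)]]
node_ids, edge_index = segments_to_node_edge_ids(segments, multigraph=True)
print(node_ids)    # {(0, 0): 0, (1, 1): 1, (1, 0): 2}
print(edge_index)  # [(0, 1, 0), (0, 1, 1), (1, 2, 0)]
```

Without the key level, the two parallel segments would both map to (0, 1) and could no longer be told apart by index alone.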
- dual_graph(graph, edge_id_col, keep_original_geom=False, as_nx=False)
Convert a primal graph represented by nodes and edges GeoDataFrames to its dual graph.
In the dual graph, original edges become nodes and original nodes become edges connecting adjacent original edges.
- Parameters:
graph (tuple[geopandas.GeoDataFrame, geopandas.GeoDataFrame] or networkx.Graph or networkx.MultiGraph) – The primal graph, given either as a tuple of (nodes, edges) GeoDataFrames or as a NetworkX graph.
edge_id_col (str, optional) – The name of the column in the edges GeoDataFrame to be used as unique identifiers for dual graph nodes. If None, the index of the edges GeoDataFrame is used. Default is None.
keep_original_geom (bool, default False) – If True, preserve the original geometry of the edges in a new column named ‘original_geometry’ in the dual nodes GeoDataFrame.
as_nx (bool, default False) – If True, return the dual graph as a NetworkX graph instead of GeoDataFrames.
- Returns:
A tuple containing the nodes and edges of the dual graph as GeoDataFrames, or a NetworkX graph if as_nx is True.
Dual nodes GeoDataFrame: Nodes represent original edges. The geometry is the centroid of the original edge’s geometry. The index is derived from edge_id_col or the original edge index.
Dual edges GeoDataFrame: Edges represent adjacency between original edges (i.e., they shared a node in the primal graph). The geometry is a LineString connecting the centroids of the two dual nodes. The index is a MultiIndex of the connected dual node IDs.
- Return type:
See also
segments_to_graph
Convert LineString segments to a graph structure.
Examples
>>> import geopandas as gpd
>>> import pandas as pd
>>> from shapely.geometry import Point, LineString
>>> # Primal graph nodes
>>> nodes = gpd.GeoDataFrame(
...     {"node_id": [0, 1, 2]},
...     geometry=[Point(0, 0), Point(1, 1), Point(1, 0)],
...     crs="EPSG:32633"
... ).set_index("node_id")
>>> # Primal graph edges
>>> edges = gpd.GeoDataFrame(
...     {"edge_id": ["a", "b"]},
...     geometry=[LineString([(0, 0), (1, 1)]), LineString([(1, 1), (1, 0)])],
...     crs="EPSG:32633"
... ).set_index(pd.MultiIndex.from_tuples([(0, 1), (1, 2)]))
>>> # Convert to dual graph
>>> dual_nodes, dual_edges = dual_graph(
...     graph=(nodes, edges), edge_id_col="edge_id", keep_original_geom=True
... )
>>> print(dual_nodes)
                      geometry      original_geometry    mm_len
edge_id
a        LINESTRING (0 0, 1 1)  LINESTRING (0 0, 1 1)  1.414214
b        LINESTRING (1 1, 1 0)  LINESTRING (1 1, 1 0)  1.000000
>>> print(dual_edges)
                         angle                     geometry
from_edge_id to_edge_id
a            b           135.0  LINESTRING (0.5 0.5, 1 0.5)
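The primal-to-dual rule (original edges become nodes, and two of them are linked when they share a primal node) can be sketched in plain Python. The names `dual_adjacency` and `midpoint` are hypothetical, and a dictionary of edge IDs stands in for the edges GeoDataFrame. For a straight two-point segment the centroid is simply the midpoint, which is what the second helper computes.

```python
from itertools import combinations


def dual_adjacency(primal_edges):
    """primal_edges: dict mapping edge ID -> (u, v) pair of primal node IDs.

    Returns a sorted list of (edge_a, edge_b) pairs that share a primal node.
    """
    # Collect, for every primal node, the edges that touch it.
    incident = {}
    for edge_id, (u, v) in primal_edges.items():
        incident.setdefault(u, []).append(edge_id)
        incident.setdefault(v, []).append(edge_id)

    # Every pair of edges meeting at the same node becomes one dual edge.
    dual_edges = set()
    for edge_ids in incident.values():
        for a, b in combinations(sorted(set(edge_ids)), 2):
            dual_edges.add((a, b))
    return sorted(dual_edges)


def midpoint(p, q):
    """Centroid of the straight segment from p to q."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


primal = {"a": (0, 1), "b": (1, 2), "c": (2, 3)}
dual = dual_adjacency(primal)
print(dual)                      # [('a', 'b'), ('b', 'c')]
print(midpoint((0, 0), (1, 1)))  # (0.5, 0.5)
print(midpoint((1, 1), (1, 0)))  # (1.0, 0.5)
```

The two midpoints match the endpoints of the dual edge geometry shown in the example output above.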
- filter_graph_by_distance(graph, center_point, distance, edge_attr='length', node_id_col=None)
Filter a graph to include only elements within a specified distance from a center point.
This function calculates the shortest path from a center point to all nodes in the graph and returns a subgraph containing only the nodes (and their induced edges) that are within the given distance. The input can be a NetworkX graph or an edges GeoDataFrame.
- Parameters:
graph (geopandas.GeoDataFrame or networkx.Graph or networkx.MultiGraph) – The graph to filter. If a GeoDataFrame, it represents the edges of the graph and will be converted to a NetworkX graph internally.
center_point (Point or geopandas.GeoSeries) – The origin point(s) for the distance calculation. If multiple points are provided, the filter will include nodes reachable from any of them.
distance (float) – The maximum shortest-path distance for a node to be included in the filtered graph.
edge_attr (str, default "length") – The name of the edge attribute to use as weight for shortest path calculations (e.g., ‘length’, ‘travel_time’).
node_id_col (str, optional) – The name of the node identifier column if the input graph is a GeoDataFrame. Defaults to the index.
- Returns:
The filtered subgraph. The return type matches the input graph type. If the input was a GeoDataFrame, the output is a GeoDataFrame of the filtered edges.
- Return type:
geopandas.GeoDataFrame or networkx.Graph or networkx.MultiGraph
See also
create_isochrone
Generate an isochrone polygon from a graph.
Examples
>>> import networkx as nx
>>> from shapely.geometry import Point
>>> # Create a graph
>>> G = nx.Graph()
>>> G.add_node(0, pos=(0, 0))
>>> G.add_node(1, pos=(10, 0))
>>> G.add_node(2, pos=(20, 0))
>>> G.add_edge(0, 1, length=10)
>>> G.add_edge(1, 2, length=10)
>>> # Filter the graph
>>> center = Point(1, 0)
>>> filtered_graph = filter_graph_by_distance(G, center, distance=12)
>>> print(list(filtered_graph.nodes))
[0, 1]
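The distance filter amounts to a shortest-path search that stops expanding once the cutoff is exceeded. A minimal sketch using Dijkstra's algorithm on a plain adjacency dictionary follows. It assumes the center point has already been matched to a graph node (node 0 here), and the name `nodes_within` is hypothetical; it is not the library's implementation.

```python
import heapq


def nodes_within(adjacency, source, max_dist):
    """Return {node: shortest distance} for nodes within max_dist of source.

    adjacency: {node: [(neighbor, weight), ...]} with non-negative weights.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for nbr, weight in adjacency.get(node, []):
            nd = d + weight
            # Only keep neighbors that stay inside the cutoff.
            if nd <= max_dist and nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


# Same topology as the example above: 0 -10- 1 -10- 2
adjacency = {0: [(1, 10)], 1: [(0, 10), (2, 10)], 2: [(1, 10)]}
reachable = nodes_within(adjacency, source=0, max_dist=12)
print(sorted(reachable))  # [0, 1]
```

The subgraph induced by the returned nodes is then the filtered graph; node 2 is dropped because its shortest-path distance of 20 exceeds the cutoff of 12.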
- create_tessellation(geometry, primary_barriers=None, shrink=0.4, segment=0.5, threshold=0.05, n_jobs=-1, **kwargs)
Create tessellations from given geometries, with optional barriers.
This function generates either morphological or enclosed tessellations based on the input geometries. If primary_barriers are provided, it creates an enclosed tessellation; otherwise, it generates a morphological tessellation.
- Parameters:
geometry (geopandas.GeoDataFrame or geopandas.GeoSeries) – The geometries (typically building footprints) to tessellate around.
primary_barriers (geopandas.GeoDataFrame or geopandas.GeoSeries, optional) – Geometries (typically road network) to use as barriers for enclosed tessellation. If provided, momepy.enclosed_tessellation is used. Default is None.
shrink (float, default 0.4) – The distance to shrink the geometry for the skeleton endpoint generation. Passed to momepy.morphological_tessellation or momepy.enclosed_tessellation.
segment (float, default 0.5) – The segment length for discretizing the geometry. Passed to momepy.morphological_tessellation or momepy.enclosed_tessellation.
threshold (float, default 0.05) – The threshold for snapping skeleton endpoints to the boundary. Only used for enclosed tessellation.
n_jobs (int, default -1) – The number of jobs to use for parallel processing. -1 means using all available processors. Only used for enclosed tessellation.
**kwargs (object, optional) – Additional keyword arguments passed to the underlying momepy tessellation function.
- Returns:
A GeoDataFrame containing the tessellation cells as polygons. Each cell has a unique tess_id.
- Return type:
geopandas.GeoDataFrame
- Raises:
ValueError – If primary_barriers are not provided and the geometry is in a geographic CRS (e.g., EPSG:4326), as morphological tessellation requires a projected CRS.
See also
momepy.morphological_tessellation
Generate morphological tessellation.
momepy.enclosed_tessellation
Generate enclosed tessellation.
Examples
>>> import geopandas as gpd
>>> from shapely.geometry import Polygon
>>> # Create some building footprints
>>> buildings = gpd.GeoDataFrame(
...     geometry=[Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
...               Polygon([(2, 2), (3, 2), (3, 3), (2, 3)])],
...     crs="EPSG:32633"
... )
>>> # Generate morphological tessellation
>>> tessellation = create_tessellation(buildings)
>>> print(tessellation.head())
>>> # Generate enclosed tessellation with roads as barriers
>>> from shapely.geometry import LineString
>>> roads = gpd.GeoDataFrame(
...     geometry=[LineString([(0, -1), (3, -1)]), LineString([(1.5, -1), (1.5, 4)])],
...     crs="EPSG:32633"
... )
>>> enclosed_tess = create_tessellation(buildings, primary_barriers=roads)
>>> print(enclosed_tess.head())
- create_isochrone(graph, center_point, distance, edge_attr='length')
Generate an isochrone polygon from a graph.
An isochrone represents the area reachable from a center point within a given travel distance or time. This function computes the set of reachable edges and nodes in a network and generates a polygon (the convex hull) that encloses this reachable area.
- Parameters:
graph (geopandas.GeoDataFrame or networkx.Graph or networkx.MultiGraph) – The network graph. If a GeoDataFrame, it represents the edges of the graph.
center_point (Point or geopandas.GeoSeries or geopandas.GeoDataFrame) – The origin point(s) for the isochrone calculation.
distance (float) – The maximum travel distance (or time) that defines the boundary of the isochrone.
edge_attr (str, default "length") – The edge attribute to use as the cost of travel (e.g., ‘length’, ‘travel_time’).
- Returns:
A GeoDataFrame containing a single Polygon geometry that represents the isochrone.
- Return type:
geopandas.GeoDataFrame
See also
filter_graph_by_distance
Filter a graph by distance from a center point.
Examples
>>> import networkx as nx
>>> from shapely.geometry import Point
>>> # Create a graph
>>> G = nx.Graph(crs="EPSG:32633")
>>> G.add_node(0, pos=(0, 0))
>>> G.add_node(1, pos=(10, 0))
>>> G.add_node(2, pos=(0, 10))
>>> G.add_edge(0, 1, length=10)
>>> G.add_edge(0, 2, length=10)
>>> # Create an isochrone
>>> center = Point(0, 0)
>>> isochrone = create_isochrone(G, center, distance=12)
>>> print(isochrone.geometry.iloc[0].wkt)
POLYGON ((0 0, 10 0, 0 10, 0 0))
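The final step of the isochrone, enclosing the reachable locations in a convex hull, can be illustrated with Andrew's monotone chain algorithm on plain coordinate tuples. This is a sketch of the hull step only; the function name `convex_hull` is hypothetical, and the library presumably delegates this to its geometry backend rather than using this code.

```python
def convex_hull(points):
    """Return the convex hull of 2D points in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b turns counter-clockwise.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    # Build the lower hull from left to right.
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    # Build the upper hull from right to left.
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]


# Reachable node positions from the example above, plus one interior point.
hull = convex_hull([(0, 0), (10, 0), (0, 10), (2, 2)])
print(hull)  # [(0, 0), (10, 0), (0, 10)]
```

The interior point (2, 2) is discarded, leaving the same three corners as the polygon printed in the example above.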