topologicpy.PyG module

class topologicpy.PyG.PyG(path: str, config: _RunConfig)

Bases: object

A clean PyTorch Geometric interface for TopologicPy-exported CSV datasets.

You can control medium-level hyperparameters by passing keyword arguments to ByCSVPath, for example:

pyg = PyG.ByCSVPath(
    path="C:/dataset",
    level="graph",
    task="classification",
    graphLabelType="categorical",
    cv="kfold",
    k_folds=5,
    conv="gatv2",
    hidden_dims=(128, 128, 64),
    activation="gelu",
    batch_norm=True,
    residual=True,
    dropout=0.2,
    lr=1e-3,
    optimizer="adamw",
    early_stopping=True,
    early_stopping_patience=10,
    gradient_clip_norm=1.0,
)

Methods

`ByCSVPath`(path[, level, task, ...])	Creates a `PyG` instance from a TopologicPy-exported CSV dataset folder.
`CrossValidate`([k_folds, epochs, batch_size])	Perform k-fold cross-validation for graph-level tasks.
`LoadModel`(path[, strict, ...])	Load model weights from disk.
`MetadataByGraphID`(graphID)	Returns preserved metadata for one graph id.
`OntologyMetadata`()	Returns preserved ontology and semantic metadata for the loaded dataset.
`PlotConfusionMatrix`([split, normalize, ...])	Returns a Plotly Figure of the confusion matrix of the inference.
`PlotCrossValidationSummary`([cv_report, ...])
`PlotHistory`()
`PlotParity`([split, title, xTitle, yTitle, ...])	Plot a parity / correlation plot for regression tasks by delegating to `Plotly.FigureByCorrelation`.
`Predict`([split, threshold, return_logits, ...])	Run inference (prediction) using the current model on the loaded dataset.
`SaveModel`(path[, include_config])	Save the model to disk.
`SetHyperparameters`(**kwargs)	Set one or more configuration values (hyperparameters) on this instance.
`Summary`()	Return a compact summary of the current configuration and dataset size.
`Test`()	Compute metrics on the test split.
`Train`([epochs, batch_size])	Train the model using the current configuration.
`Validate`()	Compute metrics on the validation split.

static ByCSVPath(path: str, level: Literal['graph', 'node', 'edge', 'link'] = 'graph', task: Literal['classification', 'regression', 'link_prediction'] = 'classification', graphLabelType: Literal['categorical', 'continuous'] = 'categorical', nodeLabelType: Literal['categorical', 'continuous'] = 'categorical', edgeLabelType: Literal['categorical', 'continuous'] = 'categorical', ontology: bool = True, **kwargs) → PyG

Creates a PyG instance from a TopologicPy-exported CSV dataset folder.

The dataset folder is expected to contain three files:

graphs.csv : one row per graph (graph-level labels/features)
nodes.csv : one row per node (node-level labels/features/masks)
edges.csv : one row per edge (edge-level labels/features/masks)

The created instance immediately loads the CSVs, builds a list of torch_geometric.data.Data objects, performs an initial holdout split (for graph-level tasks), and builds a default model according to the provided configuration.
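The expected folder layout can be illustrated with a small standard-library sketch that writes a toy dataset and checks that the three required files are present, mirroring the missing-file condition listed under Raises. The column names used here (graph_id, label, node_id, feat_0, source, target) are hypothetical placeholders and are not necessarily the schema TopologicPy exports; inspect a real exported dataset for the actual headers.

```python
import csv
import os
import tempfile
from typing import List, Tuple

# The three files ByCSVPath expects to find in the dataset folder.
REQUIRED_FILES = ("graphs.csv", "nodes.csv", "edges.csv")


def write_toy_dataset(folder: str) -> None:
    """Write a tiny two-graph dataset into `folder`.

    Column names are hypothetical placeholders, not TopologicPy's exported schema.
    """
    tables = {
        "graphs.csv": [["graph_id", "label"], [0, 0], [1, 1]],
        "nodes.csv": [
            ["graph_id", "node_id", "feat_0"],
            [0, 0, 0.5], [0, 1, 1.5],
            [1, 0, 2.0], [1, 1, 0.1],
        ],
        "edges.csv": [["graph_id", "source", "target"], [0, 0, 1], [1, 0, 1]],
    }
    for name, rows in tables.items():
        with open(os.path.join(folder, name), "w", newline="") as f:
            csv.writer(f).writerows(rows)


def missing_files(folder: str) -> List[str]:
    """Return the required CSV files that are absent from `folder`."""
    return [n for n in REQUIRED_FILES if not os.path.isfile(os.path.join(folder, n))]


def demo() -> Tuple[List[str], List[str]]:
    """Report missing files in a temporary folder before and after writing."""
    with tempfile.TemporaryDirectory() as folder:
        before = missing_files(folder)
        write_toy_dataset(folder)
        return before, missing_files(folder)
```

A folder for which missing_files() returns an empty list at least satisfies the file-presence requirement; ByCSVPath still needs usable node feature columns.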

Parameters

path : str

Path to the dataset folder that contains graphs.csv, nodes.csv, and edges.csv.

level : {"graph", "node", "edge", "link"}, optional

The prediction level:

"graph": graph-level labels in graphs.csv
"node" : node-level labels in nodes.csv
"edge" : edge-level labels in edges.csv
"link" : link prediction (binary edge existence)

task : {"classification", "regression", "link_prediction"}, optional

The learning task. For level="link" this should be "link_prediction".

graphLabelType : {"categorical", "continuous"}, optional

Label type for graph-level targets (used when level="graph").

nodeLabelType : {"categorical", "continuous"}, optional

Label type for node-level targets (used when level="node").

edgeLabelType : {"categorical", "continuous"}, optional

Label type for edge-level targets (used when level="edge").

ontology : bool, optional

If True, preserves ontology and semantic metadata columns from graphs.csv, nodes.csv, and edges.csv. These columns are stored as metadata and are not converted into numeric feature tensors. Default is True.

**kwargs : dict

Optional overrides for any field in _RunConfig. Common examples include conv, hidden_dims, activation, dropout, batch_norm, residual, pooling, epochs, batch_size, lr, weight_decay, and cross-validation options.

Returns

PyG: The created PyG instance.

Raises

ValueError: If the path does not exist, required CSV files are missing, or no node feature columns are found.

Examples

>>> pyg = PyG.ByCSVPath(path="C:/dataset", level="graph", task="classification")
>>> history = pyg.Train(epochs=50)

CrossValidate(k_folds: Optional[int] = None, epochs: Optional[int] = None, batch_size: Optional[int] = None) → Dict[str, Union[float, List[Dict[str, float]]]]

Perform k-fold cross-validation for graph-level tasks.

This method rebuilds and retrains a fresh model per fold, evaluates on the fold’s held-out set, and returns fold-wise metrics along with mean/std aggregates.

Parameters

k_folds (int, optional): Number of folds. Defaults to config.k_folds.
epochs (int, optional): Training epochs per fold. Defaults to config.epochs.
batch_size (int, optional): Batch size for the DataLoader. Defaults to config.batch_size.

Returns

dict

A dictionary of the form:

{
  "fold_metrics": [{"fold": 0, ...}, {"fold": 1, ...}, ...],
  "mean_<metric>": ...,
  "std_<metric>": ...
}
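The mean_<metric> and std_<metric> entries are aggregates over the per-fold dictionaries. The sketch below shows one way to build a report of that shape from a fold_metrics list. This page does not say whether CrossValidate uses the population or the sample standard deviation, so the use of statistics.pstdev is an assumption, and the metric name and values are made up for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List, Union

Report = Dict[str, Union[float, List[Dict[str, float]]]]


def aggregate_folds(fold_metrics: List[Dict[str, float]]) -> Report:
    """Build a dict shaped like the documented CrossValidate return value."""
    report: Report = {"fold_metrics": fold_metrics}
    # Every key except the fold index is treated as a metric.
    metric_names = [k for k in fold_metrics[0] if k != "fold"]
    for name in metric_names:
        values = [fm[name] for fm in fold_metrics]
        report[f"mean_{name}"] = mean(values)
        report[f"std_{name}"] = pstdev(values)  # population std (assumption)
    return report


# Hypothetical per-fold results for a two-fold run.
folds = [
    {"fold": 0, "accuracy": 0.80},
    {"fold": 1, "accuracy": 0.90},
]
report = aggregate_folds(folds)
```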

Raises

ValueError: If called for non-graph levels, or if k_folds < 2.

Notes

Stratified folding is available for categorical graph labels when config.k_stratify is True.
Cross-validation is intentionally limited to graph-level tasks; node/edge tasks typically rely on per-graph masks rather than splitting graphs.
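Stratified folding keeps each class's share of samples roughly equal across folds. The pure-Python sketch below deals the indices of each class round-robin into k folds after a seeded shuffle. It is a generic illustration of the technique, not CrossValidate's actual implementation of config.k_stratify; the k < 2 check mirrors the documented ValueError.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence


def stratified_folds(labels: Sequence[Hashable], k: int, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds with similar class proportions."""
    if k < 2:
        raise ValueError("k must be at least 2")
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class, key=repr):  # deterministic class order
        indices = by_class[label]
        rng.shuffle(indices)
        for i, idx in enumerate(indices):
            folds[(offset + i) % k].append(idx)
        offset += len(indices)  # continue the round-robin where the last class stopped
    return folds
```

Carrying the offset across classes keeps fold sizes balanced even when class counts are not multiples of k.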

LoadModel(path: str, strict: bool = True, rebuild_from_checkpoint: bool = True)

Load model weights from disk.

This method is backward compatible with older .pt files that contain only a raw state_dict. If the file contains a checkpoint dict produced by SaveModel() with include_config=True, the model can be rebuilt automatically to match the saved architecture.

Parameters

path (str): Path to a .pt file.
strict (bool, optional): Passed to load_state_dict. Default is True.
rebuild_from_checkpoint (bool, optional): If True and the checkpoint contains saved config fields, the model is rebuilt before the weights are loaded. Default is True.

Returns

None
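The backward-compatibility rule amounts to inspecting what the loaded object looks like: a checkpoint dict that carries a saved configuration versus a bare state_dict. The sketch below uses plain dictionaries in place of torch.load output so it runs without PyTorch. The key names "state_dict" and "config" are hypothetical and may not match what SaveModel actually writes.

```python
from typing import Any, Dict, Optional, Tuple


def split_checkpoint(obj: Dict[str, Any]) -> Tuple[Dict[str, Any], Optional[Dict[str, Any]]]:
    """Return (weights, saved_config) for either checkpoint layout.

    Assumed layouts (hypothetical key names):
      new:    {"state_dict": {...}, "config": {...}}
      legacy: {...}  (the raw state_dict itself)
    """
    if "state_dict" in obj and isinstance(obj["state_dict"], dict):
        return obj["state_dict"], obj.get("config")
    return obj, None  # legacy file: no config, so the model cannot be rebuilt


def should_rebuild(saved_config: Optional[Dict[str, Any]], rebuild_from_checkpoint: bool = True) -> bool:
    """Rebuild only when requested and a saved config is actually present."""
    return rebuild_from_checkpoint and bool(saved_config)
```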

MetadataByGraphID(graphID) → Dict[str, object]

Returns preserved metadata for one graph id.

Parameters

graphID (any): The graph id value as stored in graphs.csv.

Returns

dict: A dictionary containing graph, node, and edge metadata.

OntologyMetadata() → Dict[str, object]

Returns preserved ontology and semantic metadata for the loaded dataset.

Returns

dict: A dictionary with graphs, nodes, edges and columns sections. Metadata is keyed by graph index and graph id where possible.

PlotConfusionMatrix(split: str = 'test', normalize: bool = False, minValue: int = None, maxValue: int = None, title: str = None, xTitle: str = 'Actual Categories', yTitle: str = 'Predicted Categories', width: int = 950, height: int = 500, showScale: bool = True, colorScale: str = 'viridis', colorSamples: int = 10, backgroundColor: str = 'rgba(0,0,0,0)', marginLeft: int = 0, marginRight: int = 0, marginTop: int = 40, marginBottom: int = 0, baseFontSize: int = 16, tickFontSize: int = 14, titleFontSize: int = 22, axisTitleFontSize: int = 16, annotationFontSize: int = 18, grayScale: bool = False, mantissa: int = 6)

Returns a Plotly Figure of the confusion matrix of the model's predictions. Actual categories are displayed on the X-axis and predicted categories on the Y-axis.

Parameters

split (str, optional): Which split(s) to evaluate. Options are {"train", "val", "validate", "validation", "test", "all"}. Default is "test".
normalize (bool, optional): If True, row-normalize the confusion matrix. Default is False.
minValue (float, optional): The minimum value of the color scale. If None, the minimum value found in the data is used. Default is None.
maxValue (float, optional): The maximum value of the color scale. If None, the maximum value found in the data is used. Default is None.
title (str, optional): The title to display. Default is "Confusion Matrix".
xTitle (str, optional): The X-axis title. Default is "Actual Categories".
yTitle (str, optional): The Y-axis title. Default is "Predicted Categories".
width (int, optional): The width of the figure in pixels. Default is 950.
height (int, optional): The height of the figure in pixels. Default is 500.
showScale (bool, optional): If True, a color scale is shown on the right side of the figure. Default is True.
colorScale (str, optional): The Plotly color scale to use (e.g. "viridis", "plasma"). Default is "viridis".
colorSamples (int, optional): The number of discrete color samples used to display the data. Default is 10.
backgroundColor (list or str, optional): The background color. Default is "rgba(0,0,0,0)" (transparent).
marginLeft, marginRight, marginTop, marginBottom (int, optional): Plot margins in pixels. Defaults are 0, 0, 40, and 0 respectively.
baseFontSize (int, optional): The base font size. Default is 16.
tickFontSize (int, optional): The tick font size. Default is 14.
titleFontSize (int, optional): The title font size. Default is 22.
axisTitleFontSize (int, optional): The axis title font size. Default is 16.
annotationFontSize (int, optional): The annotation font size. Default is 18.
grayScale (bool, optional): If True, the figure is rendered in grayscale. Default is False.
mantissa (int, optional): The desired length of the mantissa (number of decimal places shown). Default is 6.

Returns

plotly.graph_objects.Figure: The created plotly figure.
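What normalize=True computes can be sketched in plain Python: count (actual, predicted) pairs, then divide each row by its total. Here rows index the actual class and columns the predicted class, which is the conventional layout; how PlotConfusionMatrix orients the matrix internally before displaying actual categories on the X-axis is an assumption this page does not confirm.

```python
from typing import List, Sequence


def confusion_matrix(actual: Sequence[int], predicted: Sequence[int], n_classes: int) -> List[List[int]]:
    """cm[a][p] counts samples of actual class a that were predicted as class p."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        cm[a][p] += 1
    return cm


def row_normalize(cm: List[List[int]]) -> List[List[float]]:
    """Divide each row by its sum; rows with no samples stay all zeros."""
    out = []
    for row in cm:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out
```

After row normalization each non-empty row sums to 1, so the diagonal entries read as per-class recall under this orientation.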

PlotCrossValidationSummary(cv_report: Optional[Dict[str, Union[float, List[Dict[str, float]]]]] = None, metrics: Optional[List[str]] = None, show_mean_std: bool = True)

PlotHistory()

PlotParity(split: str = 'test', title: str = None, xTitle: str = 'Actual Values', yTitle: str = 'Predicted Values', showIdentity: bool = True, showBestFit: bool = True, dotSize: int = 6, dotColor: str = 'blue', lineColor: str = 'red', width: int = 800, height: int = 600, theme: str = 'default', backgroundColor: str = 'rgba(0,0,0,0)', marginLeft: int = 0, marginRight: int = 0, marginTop: int = 40, marginBottom: int = 0)

Plot a parity / correlation plot for regression tasks by delegating to Plotly.FigureByCorrelation.

Parameters

split ({"train", "val", "validate", "validation", "test", "all"}, optional): Which split to evaluate. Default is "test".
title (str, optional): Custom plot title. If None, a title is generated automatically. Default is None.
xTitle (str, optional): The X-axis title. Default is "Actual Values".
yTitle (str, optional): The Y-axis title. Default is "Predicted Values".
showIdentity (bool, optional): If True, shows the 45-degree identity line. Default is True.
showBestFit (bool, optional): If True, draws the best-fit line through the data. Default is True.
dotSize (int, optional): The marker size. Default is 6.
dotColor (str, optional): Dot color passed to Plotly.FigureByCorrelation. Default is "blue".
lineColor (str, optional): Best-fit line color passed to Plotly.FigureByCorrelation. Default is "red".
width (int, optional): Figure width in pixels. Default is 800.
height (int, optional): Figure height in pixels. Default is 600.
theme (str, optional): Plotly theme. Options are "dark", "light", and "default". Default is "default".
backgroundColor (str, optional): Figure background color. Default is "rgba(0,0,0,0)" (transparent).
marginLeft (int, optional): Left margin in pixels. Default is 0.
marginRight (int, optional): Right margin in pixels. Default is 0.
marginTop (int, optional): Top margin in pixels. Default is 40.
marginBottom (int, optional): Bottom margin in pixels. Default is 0.

Returns

plotly.graph_objects.Figure: A correlation figure of actual vs predicted values.

Raises

ValueError: If called when config.task is not "regression" or when config.level is "link".
RuntimeError: If no regression labels are found for the requested split(s).

Notes

For node/edge regression, the method uses the corresponding boolean masks on each graph and aggregates across all graphs.
This method relies on _predict_graph(), _predict_node(), and _predict_edge().
show_identity, show_best_fit, and point_size are kept only for API compatibility. The delegated Plotly method always shows the best-fit line and the 45-degree line, and does not expose point size.
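The best-fit line in a parity plot is usually an ordinary least-squares fit of predicted on actual values, while the identity line is y = x. The sketch below computes the slope, intercept, and Pearson correlation with the standard library only; it illustrates the general calculation and does not reproduce whatever Plotly.FigureByCorrelation computes internally.

```python
from math import sqrt
from typing import Sequence, Tuple


def fit_line(actual: Sequence[float], predicted: Sequence[float]) -> Tuple[float, float, float]:
    """Return (slope, intercept, pearson_r) for the OLS fit predicted ~ actual."""
    n = len(actual)
    if n < 2 or n != len(predicted):
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(actual) / n
    my = sum(predicted) / n
    sxx = sum((x - mx) ** 2 for x in actual)
    syy = sum((y - my) ** 2 for y in predicted)
    sxy = sum((x - mx) * (y - my) for x, y in zip(actual, predicted))
    if sxx == 0:
        raise ValueError("actual values are constant; slope is undefined")
    slope = sxy / sxx
    intercept = my - slope * mx
    # Correlation is undefined when predictions are constant; report NaN.
    r = sxy / sqrt(sxx * syy) if syy else float("nan")
    return slope, intercept, r
```

A perfect model gives slope 1, intercept 0, and r = 1, so its best-fit line coincides with the identity line.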

Predict(split: str = 'all', threshold: float = 0.5, return_logits: bool = False, return_probs: bool = True, return_embeddings: bool = False, attach_to_data: bool = False, pred_key: str = 'pred', prob_key: str = 'prob', logits_key: str = 'logits', emb_key: str = 'emb') → Dict[str, object]

Run inference (prediction) using the current model on the loaded dataset.

This method is designed for post-training workflows, including the common pattern of train → save → reload → predict on unseen data. It performs forward passes only (no gradient computation) and returns predictions in a compact, serializable form.

Behaviour depends on level:

"graph": graph-level prediction using a mini-batched DataLoader
"node" : node-level prediction using node masks (train_mask, val_mask, test_mask)
"edge" : edge-level prediction using edge masks (edge_train_mask, edge_val_mask, edge_test_mask)
"link" : link prediction using RandomLinkSplit per graph

Parameters

split : str, optional

The subset to predict. Supported values depend on config.level:

graph-level: "train", "val", "test", "all"
node-level : "train", "val", "test", "all" ("all" returns full-length vectors)
edge-level : "train", "val", "test", "all" ("all" returns full-length vectors)
link-level : "train", "val", "test" ("all" is treated as "test")

Default is "all".

threshold : float, optional

Threshold for converting link-prediction probabilities into binary labels. Only used when config.level == "link". Default is 0.5.

return_logits : bool, optional

If True, includes raw model outputs (logits) in the returned dictionary. For regression tasks, logits are the raw predictions. Default is False.

return_probs : bool, optional

If True, includes probabilities/scores when applicable:

classification: softmax probabilities
link prediction: sigmoid probabilities
regression: ignored (no probabilities)

Default is True.

return_embeddings : bool, optional

If True, includes the node embeddings produced by the GNN backbone (the output of model["encoder"]) for each predicted batch/graph. Default is False.

attach_to_data : bool, optional

If True, attaches prediction tensors to each Data object in data_list using keys pred_key, prob_key, logits_key, and emb_key. This is useful for downstream processing (e.g., exporting to CSV or mapping back to Topologic entities). Default is False.

pred_key : str, optional

Attribute name to attach predicted labels/values to each Data object when attach_to_data is True. Default is "pred".

prob_key : str, optional

Attribute name to attach probabilities/scores to each Data object when attach_to_data is True. Default is "prob".

logits_key : str, optional

Attribute name to attach logits/raw outputs to each Data object when attach_to_data is True. Default is "logits".

emb_key : str, optional

Attribute name to attach encoder embeddings to each Data object when attach_to_data is True. Default is "emb".
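The conversions listed under return_probs and the threshold rule can be written out directly. The sketch below applies a numerically stable softmax to classification logits and a sigmoid followed by a threshold to link-prediction logits. It mirrors the documented behaviour in plain Python rather than reproducing the library's code, and whether the threshold comparison is >= or > is an assumption.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z: float) -> float:
    """Logistic function, arranged to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(logits: Sequence[float]) -> int:
    """Predicted class index = argmax of the softmax probabilities."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


def link_predictions(logits: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Binary link labels: 1 where sigmoid(logit) >= threshold (comparison assumed)."""
    return [int(sigmoid(z) >= threshold) for z in logits]
```

Raising the threshold trades recall for precision on predicted links; with threshold=0.5 a logit of exactly 0 is counted as a positive under this sketch.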

Returns

dict

A dictionary containing (at minimum) the key "pred" with predictions.

Graph-level

"pred": (N,) predicted class indices or regression values
"y_true": (N,) true labels/targets if present
"index": (N,) integer indices aligned with self.data_list order

Node/Edge-level

"pred": list of arrays (one per graph) unless split != "all"
"y_true": list of arrays (one per graph) if present
"mask": mask name used when split in {train,val,test}

Link-level

"score": sigmoid probabilities for edge_label_index samples
"pred": binary predictions derived from threshold
"y_true": binary ground truth labels for sampled links
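For graph-level output, the "pred", "y_true", and "index" arrays are aligned, so they can be zipped together. The helper below is hypothetical: it computes accuracy and lists misclassified graph indices from a dictionary with those keys. Because it only iterates, it should accept either Python lists or NumPy arrays. The sample result is made up for illustration.

```python
from typing import Dict, List, Sequence


def graph_level_report(result: Dict[str, Sequence]) -> Dict[str, object]:
    """Summarize a graph-level Predict() result (hypothetical helper)."""
    if "y_true" not in result:
        raise KeyError("result has no 'y_true'; labels were not present")
    pred, y_true, index = result["pred"], result["y_true"], result["index"]
    # Positions in self.data_list whose prediction differs from the label.
    wrong: List[int] = [int(i) for p, t, i in zip(pred, y_true, index) if p != t]
    n = len(pred)
    accuracy = (n - len(wrong)) / n if n else float("nan")
    return {"accuracy": accuracy, "misclassified": wrong}


# Made-up result for four graphs.
sample = {"pred": [0, 1, 1, 0], "y_true": [0, 1, 0, 0], "index": [0, 1, 2, 3]}
summary = graph_level_report(sample)
```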

Raises

ValueError: If split or config.level is unsupported, or if the model is not initialised.

Notes

This method assumes you have already called ByCSVPath() (or otherwise populated data_list), and that the model is loaded/initialised (e.g., via Train() or LoadModel()).
For classification tasks, the returned class indices follow the encoding present in the CSV labels.
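The train → save → reload → predict pattern might be wired together as follows, using only methods documented on this page. The folder paths and model file name are placeholders. The sketch assumes the unseen folder uses the same CSV layout as the training data, and passing the same level/task to the second ByCSVPath call is an assumption about typical usage rather than a documented requirement. The import sits inside the function so the definition loads without TopologicPy installed.

```python
from typing import Dict


def train_save_predict(train_folder: str, new_folder: str, model_path: str = "model.pt") -> Dict[str, object]:
    """Train on one dataset, save the model, reload it, and predict on another."""
    # Imported lazily; requires topologicpy and PyTorch Geometric at call time.
    from topologicpy.PyG import PyG

    # 1. Train on the labelled dataset.
    pyg = PyG.ByCSVPath(path=train_folder, level="graph", task="classification")
    pyg.Train(epochs=50)
    # include_config=True stores enough configuration to rebuild the model on load.
    pyg.SaveModel(model_path, include_config=True)

    # 2. Load the unseen dataset and restore the trained weights.
    unseen = PyG.ByCSVPath(path=new_folder, level="graph", task="classification")
    unseen.LoadModel(model_path, rebuild_from_checkpoint=True)

    # 3. Forward passes only; return probabilities alongside predictions.
    return unseen.Predict(split="all", return_probs=True)
```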

SaveModel(path: str, include_config: bool = True)

Save the model to disk.

Parameters

path (str): Output file path. If the extension is not .pt, it is appended automatically.
include_config (bool, optional): If True, saves enough configuration alongside the weights to rebuild the model on load. Default is True.

Returns

None

SetHyperparameters(**kwargs) → Dict[str, Union[str, int, float, bool, Tuple]]

Set one or more configuration values (hyperparameters) on this instance.

This method updates config fields using keyword arguments. If any model-shaping setting changes (e.g. conv, hidden_dims, activation, dropout, batch_norm, residual, pooling), the model is rebuilt automatically.

Parameters

**kwargs (dict): Key/value pairs matching fields in _RunConfig. Unknown keys are ignored.

Returns

dict: A compact configuration summary (same as Summary()).

Raises

ValueError: If an attempted setting fails validation (e.g. malformed split or empty hidden_dims).

Notes

For graph-level tasks, changing split affects holdout splitting. You may want to call ByCSVPath() again (or re-instantiate) if you need a fresh split with new ratios.
For node/edge tasks, masks are taken from CSV columns if present; otherwise they are generated using split ratios within each graph.
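When a graph has no mask columns, masks are generated from the split ratios within each graph. A minimal sketch of that idea follows, assuming the ratios are a (train, val, test) triple summing to 1 and using a seeded shuffle. The parameter names are hypothetical, and the exact rounding and randomization TopologicPy applies may differ.

```python
import random
from typing import List, Sequence, Tuple


def make_masks(n_items: int, split: Sequence[float] = (0.8, 0.1, 0.1), seed: int = 0) -> Tuple[List[bool], List[bool], List[bool]]:
    """Return mutually exclusive boolean train/val/test masks of length n_items."""
    if len(split) != 3 or abs(sum(split) - 1.0) > 1e-6:
        raise ValueError("split must be three ratios summing to 1")
    order = list(range(n_items))
    random.Random(seed).shuffle(order)
    n_train = int(round(split[0] * n_items))
    n_val = int(round(split[1] * n_items))
    train, val, test = ([False] * n_items for _ in range(3))
    for rank, idx in enumerate(order):
        if rank < n_train:
            train[idx] = True
        elif rank < n_train + n_val:
            val[idx] = True
        else:
            test[idx] = True  # the remainder goes to test
    return train, val, test
```

Applied once per graph, this yields node-level masks; the same function works for edges by passing the edge count.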

Summary() → Dict[str, Union[str, int, float, bool, Tuple]]

Return a compact summary of the current configuration and dataset size.

Returns

dict: A dictionary containing key configuration choices such as level, task, network options (conv, hidden_dims, etc.), training hyperparameters, current device, and basic dataset counts.

Notes

This is intended to be a lightweight, ReadTheDocs-friendly snapshot suitable for logging and reproducibility.

Test() → Dict[str, float]

Compute metrics on the test split.

Returns

dict

A dictionary of metric values. Keys are prefixed with "test_" at every level; for node, edge, and link tasks the prefix is applied by internal helpers.

Raises

ValueError: If the configured level is unsupported.

Train(epochs: Optional[int] = None, batch_size: Optional[int] = None) → Dict[str, List[float]]

Train the model using the current configuration.

Training behaviour depends on level:

"graph": uses the current holdout split (train/val sets)
"node" : uses in-graph boolean masks (train_mask, val_mask)
"edge" : uses in-graph boolean masks (edge_train_mask, edge_val_mask)
"link" : uses torch_geometric.transforms.RandomLinkSplit per graph

Parameters

epochsint, optional: If provided, overrides config.epochs for this run.
batch_sizeint, optional: If provided, overrides config.batch_size for this run. For node/edge/link tasks the loader uses batch_size=1 (one graph at a time).

Returns

dict: Training history dictionary with keys "train_loss" and "val_loss". Each value is a list of floats (one per epoch).

Notes

For graph-level tasks, early stopping can be enabled via config.early_stopping and config.early_stopping_patience.
For k-fold cross-validation on graph-level tasks, use CrossValidate() instead.
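Early stopping with a patience window means training halts once the validation loss has failed to improve for early_stopping_patience consecutive epochs. The loop below works on a precomputed list of validation losses so it runs standalone. It is a generic sketch of the technique; the min_delta parameter and the choice to report the best epoch are assumptions, not documented config fields.

```python
from typing import Sequence, Tuple


def early_stop_epoch(val_losses: Sequence[float], patience: int = 10, min_delta: float = 0.0) -> Tuple[int, int]:
    """Return (stop_epoch, best_epoch), both 0-based.

    stop_epoch is the epoch after which training would halt; if patience is
    never exhausted it is the last epoch.
    """
    best = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            # Improvement: remember it and reset the patience counter.
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch
```

Keeping best_epoch allows restoring the weights from the best validation epoch rather than the last one trained.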

Validate() → Dict[str, float]

Compute metrics on the validation split.

Returns

dict

A dictionary of metric values. Keys are prefixed with "val_" at every level; for node, edge, and link tasks the prefix is applied by internal helpers.

Raises

ValueError: If the configured level is unsupported.