topologicpy.PyG module

class topologicpy.PyG.PyG(path: str, config: _RunConfig)

Bases: object

A clean PyTorch Geometric interface for TopologicPy-exported CSV datasets.

You can control medium-level hyperparameters by passing keyword arguments to ByCSVPath, for example:

pyg = PyG.ByCSVPath(
    path="C:/dataset",
    level="graph",
    task="classification",
    graphLabelType="categorical",
    cv="kfold",
    k_folds=5,
    conv="gatv2",
    hidden_dims=(128, 128, 64),
    activation="gelu",
    batch_norm=True,
    residual=True,
    dropout=0.2,
    lr=1e-3,
    optimizer="adamw",
    early_stopping=True,
    early_stopping_patience=10,
    gradient_clip_norm=1.0,
)

Methods

ByCSVPath(path[, level, task, ...])

Creates a PyG instance from a TopologicPy-exported CSV dataset folder.

CrossValidate([k_folds, epochs, batch_size])

Perform k-fold cross-validation for graph-level tasks.

LoadModel(path[, strict, ...])

Load model weights from disk.

MetadataByGraphID(graphID)

Returns preserved metadata for one graph id.

OntologyMetadata()

Returns preserved ontology and semantic metadata for the loaded dataset.

PlotConfusionMatrix([split, normalize, ...])

Returns a Plotly Figure of the confusion matrix of the inference.

PlotCrossValidationSummary([cv_report, ...])

PlotHistory()

PlotParity([split, title, xTitle, yTitle, ...])

Plot a parity / correlation plot for regression tasks by delegating to Plotly.FigureByCorrelation.

Predict([split, threshold, return_logits, ...])

Run inference (prediction) using the current model on the loaded dataset.

SaveModel(path[, include_config])

Save the model to disk.

SetHyperparameters(**kwargs)

Set one or more configuration values (hyperparameters) on this instance.

Summary()

Return a compact summary of the current configuration and dataset size.

Test()

Compute metrics on the test split.

Train([epochs, batch_size])

Train the model using the current configuration.

Validate()

Compute metrics on the validation split.

static ByCSVPath(path: str, level: Literal['graph', 'node', 'edge', 'link'] = 'graph', task: Literal['classification', 'regression', 'link_prediction'] = 'classification', graphLabelType: Literal['categorical', 'continuous'] = 'categorical', nodeLabelType: Literal['categorical', 'continuous'] = 'categorical', edgeLabelType: Literal['categorical', 'continuous'] = 'categorical', ontology: bool = True, **kwargs) -> PyG

Creates a PyG instance from a TopologicPy-exported CSV dataset folder.

The dataset folder is expected to contain three files:

  • graphs.csv : one row per graph (graph-level labels/features)

  • nodes.csv : one row per node (node-level labels/features/masks)

  • edges.csv : one row per edge (edge-level labels/features/masks)

The created instance immediately loads the CSVs, builds a list of torch_geometric.data.Data objects, performs an initial holdout split (for graph-level tasks), and builds a default model according to the provided configuration.
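Because a missing file raises ValueError only once loading starts, it can help to check the folder first. The following is a minimal sketch that uses only the three file names listed above; the helper name missing_dataset_files is hypothetical and is not part of TopologicPy.

```python
from pathlib import Path

# File names taken from the dataset layout described above.
REQUIRED_FILES = ("graphs.csv", "nodes.csv", "edges.csv")


def missing_dataset_files(folder):
    """Return the required CSV file names that are absent from `folder`.

    Hypothetical helper for illustration; not part of the PyG class.
    A folder that does not exist reports all three files as missing.
    """
    root = Path(folder)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list means the folder has the expected layout; it does not verify that the CSV files contain the columns the loader needs.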

Parameters
path : str

Path to the dataset folder that contains graphs.csv, nodes.csv, and edges.csv.

level : {"graph", "node", "edge", "link"}, optional

The prediction level:

  • "graph": graph-level labels in graphs.csv

  • "node" : node-level labels in nodes.csv

  • "edge" : edge-level labels in edges.csv

  • "link" : link prediction (binary edge existence)

task : {"classification", "regression", "link_prediction"}, optional

The learning task. For level="link" this should be "link_prediction".

graphLabelType : {"categorical", "continuous"}, optional

Label type for graph-level targets (used when level="graph").

nodeLabelType : {"categorical", "continuous"}, optional

Label type for node-level targets (used when level="node").

edgeLabelType : {"categorical", "continuous"}, optional

Label type for edge-level targets (used when level="edge").

ontology : bool, optional

If True, preserves ontology and semantic metadata columns from graphs.csv, nodes.csv, and edges.csv. These columns are stored as metadata and are not converted into numeric feature tensors. Default is True.

**kwargs : dict

Optional overrides for any field in _RunConfig. Common examples include conv, hidden_dims, activation, dropout, batch_norm, residual, pooling, epochs, batch_size, lr, weight_decay, and cross-validation options.

Returns
PyG

The created PyG instance.

Raises
ValueError

If the path does not exist, required CSV files are missing, or no node feature columns are found.

Examples

pyg = PyG.ByCSVPath(path="C:/dataset", level="graph", task="classification")
history = pyg.Train(epochs=50)

CrossValidate(k_folds: Optional[int] = None, epochs: Optional[int] = None, batch_size: Optional[int] = None) -> Dict[str, Union[float, List[Dict[str, float]]]]

Perform k-fold cross-validation for graph-level tasks.

This method rebuilds and retrains a fresh model per fold, evaluates on the fold’s held-out set, and returns fold-wise metrics along with mean/std aggregates.

Parameters
k_folds : int, optional

Number of folds. Defaults to config.k_folds.

epochs : int, optional

Training epochs per fold. Defaults to config.epochs.

batch_size : int, optional

Batch size for DataLoader. Defaults to config.batch_size.

Returns
dict

A dictionary of the form:

{
  "fold_metrics": [{"fold": 0, ...}, {"fold": 1, ...}, ...],
  "mean_<metric>": ...,
  "std_<metric>": ...
}

Raises
ValueError

If called for non-graph levels, or if k_folds < 2.

Notes

  • Stratified folding is available for categorical graph labels when config.k_stratify is True.

  • Cross-validation is intentionally limited to graph-level tasks; node/edge tasks typically rely on per-graph masks rather than splitting graphs.
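The shape of the returned dictionary can be illustrated with a small aggregation sketch. It assumes that every key other than "fold" in a fold entry is a numeric metric and that the standard deviation is the population form; neither detail is confirmed by this page, and the function name aggregate_folds is hypothetical.

```python
from statistics import mean, pstdev


def aggregate_folds(fold_metrics):
    """Build mean_<metric> and std_<metric> entries from per-fold dicts.

    Illustrative only: the library may compute its aggregates differently
    (for example with the sample rather than population deviation).
    """
    report = {"fold_metrics": fold_metrics}
    # Treat every key except the fold index as a metric name.
    names = [key for key in fold_metrics[0] if key != "fold"]
    for name in names:
        values = [float(entry[name]) for entry in fold_metrics]
        report[f"mean_{name}"] = mean(values)
        report[f"std_{name}"] = pstdev(values)  # assumption: population std
    return report
```

A report built this way has the same top-level keys as the structure shown above, which makes it easy to compare runs by reading only the mean_ and std_ entries.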

LoadModel(path: str, strict: bool = True, rebuild_from_checkpoint: bool = True)

Load model weights from disk.

This method is backward compatible with older .pt files that contain only a raw state_dict. If the file contains a checkpoint dict produced by SaveModel() with include_config=True, the model can be rebuilt automatically to match the saved architecture.

Parameters
path : str

Path to a .pt file.

strict : bool, optional

Passed to load_state_dict. Default is True.

rebuild_from_checkpoint : bool, optional

If True and the checkpoint contains saved config fields, rebuilds the model before loading weights. Default is True.
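The two file layouts mentioned above (a raw state_dict versus a checkpoint dict with saved configuration) can be told apart with a simple check. This is a sketch under stated assumptions: the key names "state_dict" and "config" are hypothetical, since this page does not document the checkpoint layout, and split_checkpoint is not a TopologicPy function.

```python
def split_checkpoint(loaded):
    """Return (state_dict, config_or_None) for either file layout.

    Assumes a checkpoint dict stores weights under "state_dict" and
    settings under "config"; both key names are hypothetical.
    """
    if isinstance(loaded, dict) and "state_dict" in loaded:
        # Checkpoint layout: weights plus optional saved configuration.
        return loaded["state_dict"], loaded.get("config")
    # Legacy layout: the object is the raw state_dict itself.
    return loaded, None
```

When the second element is None there is nothing to rebuild from, so the weights have to match the architecture of the model that is already built.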

Returns
None

MetadataByGraphID(graphID) -> Dict[str, object]

Returns preserved metadata for one graph id.

Parameters
graphID : any

The graph id value as stored in graphs.csv.

Returns
dict

A dictionary containing graph, node, and edge metadata.

OntologyMetadata() -> Dict[str, object]

Returns preserved ontology and semantic metadata for the loaded dataset.

Returns
dict

A dictionary with graphs, nodes, edges and columns sections. Metadata is keyed by graph index and graph id where possible.

PlotConfusionMatrix(split: str = 'test', normalize: bool = False, minValue: int = None, maxValue: int = None, title: str = None, xTitle: str = 'Actual Categories', yTitle: str = 'Predicted Categories', width: int = 950, height: int = 500, showScale: bool = True, colorScale: str = 'viridis', colorSamples: int = 10, backgroundColor: str = 'rgba(0,0,0,0)', marginLeft: int = 0, marginRight: int = 0, marginTop: int = 40, marginBottom: int = 0, baseFontSize: int = 16, tickFontSize: int = 14, titleFontSize: int = 22, axisTitleFontSize: int = 16, annotationFontSize: int = 18, grayScale: bool = False, mantissa: int = 6)

Returns a Plotly Figure of the confusion matrix of the model's predictions. Actual categories are displayed on the X-axis and predicted categories on the Y-axis.

Parameters
split : str, optional

Which split(s) to evaluate. Options are "train", "val", "validate", "validation", "test", and "all". Default is "test".

normalize : bool, optional

If True, row-normalize the confusion matrix. Default is False.

minValue : float, optional

The desired minimum value to use for the color scale. If set to None, the minimum value found in the input data will be used.

maxValue : float, optional

The desired maximum value to use for the color scale. If set to None, the maximum value found in the input data will be used.

title : str, optional

The desired title to display. Default is "Confusion Matrix".

xTitle : str, optional

The desired X-axis title to display. Default is "Actual Categories".

yTitle : str, optional

The desired Y-axis title to display. Default is "Predicted Categories".

width : int, optional

The desired width of the figure. Default is 950.

height : int, optional

The desired height of the figure. Default is 500.

showScale : bool, optional

If set to True, a color scale is shown on the right side of the figure. Default is True.

colorScale : str, optional

The Plotly color scale to use (e.g. "viridis", "plasma"). Default is "viridis".

colorSamples : int, optional

The number of discrete color samples to use for displaying the data. Default is 10.

backgroundColor : list or str, optional

The desired background color. Default is 'rgba(0,0,0,0)' (transparent).

marginLeft, marginRight, marginTop, marginBottom : int, optional

Plot margins in pixels. Defaults are 0, 0, 40, and 0 respectively.

baseFontSize : int, optional

The base font size. Default is 16.

tickFontSize : int, optional

The tick font size. Default is 14.

titleFontSize : int, optional

The title font size. Default is 22.

axisTitleFontSize : int, optional

The axis title font size. Default is 16.

annotationFontSize : int, optional

The annotation font size. Default is 18.

grayScale : bool, optional

If set to True, the figure is rendered in grayscale. Default is False.

mantissa : int, optional

The desired length of the mantissa. Default is 6.

Returns
plotly.graph_objects.Figure

The created plotly figure.
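The effect of normalize=True can be illustrated with a short row-normalization sketch. It assumes the matrix is a list of rows of counts and that an all-zero row is left as zeros; how the library handles empty rows, and which axis its rows correspond to, is not stated on this page. The function name row_normalize is hypothetical.

```python
def row_normalize(matrix):
    """Divide every entry by the sum of its row.

    Illustrative sketch: rows that sum to zero are returned as zeros
    to avoid dividing by zero (an assumption, not documented behaviour).
    """
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([value / total if total else 0.0 for value in row])
    return normalized
```

After normalization each non-empty row sums to 1, so cells can be read as fractions of that row's total rather than raw counts.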

PlotCrossValidationSummary(cv_report: Optional[Dict[str, Union[float, List[Dict[str, float]]]]] = None, metrics: Optional[List[str]] = None, show_mean_std: bool = True)

PlotHistory()

PlotParity(split: str = 'test', title: str = None, xTitle: str = 'Actual Values', yTitle: str = 'Predicted Values', showIdentity: bool = True, showBestFit: bool = True, dotSize: int = 6, dotColor: str = 'blue', lineColor: str = 'red', width: int = 800, height: int = 600, theme: str = 'default', backgroundColor: str = 'rgba(0,0,0,0)', marginLeft: int = 0, marginRight: int = 0, marginTop: int = 40, marginBottom: int = 0)

Plot a parity / correlation plot for regression tasks by delegating to Plotly.FigureByCorrelation.

Parameters
split : {"train", "val", "validate", "validation", "test", "all"}, optional

Which split to evaluate. Default is "test".

title : str, optional

Custom plot title. If None, an automatic title is generated.

xTitle : str, optional

The X-axis title. Default is "Actual Values".

yTitle : str, optional

The Y-axis title. Default is "Predicted Values".

showIdentity : bool, optional

If True, shows the 45-degree identity line.

showBestFit : bool, optional

If set to True, draws the best fit line through the data.

dotSize : int, optional

The marker size.

dotColor : str, optional

Dot color passed to Plotly.FigureByCorrelation.

lineColor : str, optional

Best-fit line color passed to Plotly.FigureByCorrelation.

width : int, optional

Figure width in pixels.

height : int, optional

Figure height in pixels.

theme : str, optional

Plotly theme. Options are "dark", "light", "default".

backgroundColor : str, optional

Figure background color.

marginLeft : int, optional

Left margin in pixels.

marginRight : int, optional

Right margin in pixels.

marginTop : int, optional

Top margin in pixels.

marginBottom : int, optional

Bottom margin in pixels.

Returns
plotly.graph_objects.Figure

A correlation figure of actual vs predicted values.

Raises
ValueError

If called when config.task is not "regression" or when config.level is "link".

RuntimeError

If no regression labels are found for the requested split(s).

Notes

  • For node/edge regression, the method uses the corresponding boolean masks on each graph and aggregates across all graphs.

  • This method relies on _predict_graph(), _predict_node(), and _predict_edge().

  • showIdentity, showBestFit, and dotSize are kept only for API compatibility. The delegated Plotly method always shows the best-fit line and the 45-degree line, and does not expose point size.
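For readers who want to check a parity plot numerically, the best-fit line can be reproduced with an ordinary least-squares fit of predicted against actual values. This is a sketch under the assumption that a simple linear fit is what the plot shows; how Plotly.FigureByCorrelation computes its line is not documented here, and best_fit_line is a hypothetical helper.

```python
def best_fit_line(actual, predicted):
    """Least-squares slope and intercept of predicted as a function of actual.

    Illustrative only. Raises ZeroDivisionError if all actual values are
    identical, because the slope is undefined in that case.
    """
    n = len(actual)
    mean_x = sum(actual) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in actual)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(actual, predicted))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

A perfect model lies on the 45-degree identity line, which corresponds to a slope of 1 and an intercept of 0; the distance of the fitted values from those two numbers gives a quick summary of systematic bias.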

Predict(split: str = 'all', threshold: float = 0.5, return_logits: bool = False, return_probs: bool = True, return_embeddings: bool = False, attach_to_data: bool = False, pred_key: str = 'pred', prob_key: str = 'prob', logits_key: str = 'logits', emb_key: str = 'emb') -> Dict[str, object]

Run inference (prediction) using the current model on the loaded dataset.

This method is designed for post-training workflows, including the common pattern of train → save → reload → predict on unseen data. It performs forward passes only (no gradient computation) and returns predictions in a compact, serializable form.

Behaviour depends on level:

  • "graph": graph-level prediction using a mini-batched DataLoader

  • "node" : node-level prediction using node masks (train_mask, val_mask, test_mask)

  • "edge" : edge-level prediction using edge masks (edge_train_mask, edge_val_mask, edge_test_mask)

  • "link" : link prediction using RandomLinkSplit per graph

Parameters
split : str, optional

The subset to predict. Supported values depend on config.level:

  • graph-level: "train", "val", "test", "all"

  • node-level : "train", "val", "test", "all" ("all" returns full-length vectors)

  • edge-level : "train", "val", "test", "all" ("all" returns full-length vectors)

  • link-level : "train", "val", "test" ("all" is treated as "test")

Default is "all".

threshold : float, optional

Threshold for converting link-prediction probabilities into binary labels. Only used when config.level == "link". Default is 0.5.

return_logits : bool, optional

If True, includes raw model outputs (logits) in the returned dictionary. For regression tasks, logits are the raw predictions. Default is False.

return_probs : bool, optional

If True, includes probabilities/scores when applicable:

  • classification: softmax probabilities

  • link prediction: sigmoid probabilities

  • regression: ignored (no probabilities)

Default is True.

return_embeddings : bool, optional

If True, includes the node embeddings produced by the GNN backbone (the output of model["encoder"]) for each predicted batch/graph. Default is False.

attach_to_data : bool, optional

If True, attaches prediction tensors to each Data object in data_list using keys pred_key, prob_key, logits_key, and emb_key. This is useful for downstream processing (e.g., exporting to CSV or mapping back to Topologic entities). Default is False.

pred_key : str, optional

Attribute name to attach predicted labels/values to each Data object when attach_to_data is True. Default is "pred".

prob_key : str, optional

Attribute name to attach probabilities/scores to each Data object when attach_to_data is True. Default is "prob".

logits_key : str, optional

Attribute name to attach logits/raw outputs to each Data object when attach_to_data is True. Default is "logits".

emb_key : str, optional

Attribute name to attach encoder embeddings to each Data object when attach_to_data is True. Default is "emb".

Returns
dict

A dictionary containing (at minimum) the key "pred" with predictions.

Graph-level
  • "pred": (N,) predicted class indices or regression values

  • "y_true": (N,) true labels/targets if present

  • "index": (N,) integer indices aligned with self.data_list order

Node/Edge-level
  • "pred": list of arrays (one per graph) when split == "all"

  • "y_true": list of arrays (one per graph) if present

  • "mask": mask name used when split is "train", "val", or "test"

Link-level
  • "score": sigmoid probabilities for edge_label_index samples

  • "pred": binary predictions derived from threshold

  • "y_true": binary ground truth labels for sampled links

Raises
ValueError

If split or config.level is unsupported, or if the model is not initialised.

Notes

  • This method assumes you have already called ByCSVPath() (or otherwise populated data_list), and that model is loaded/initialised (e.g., via Train() or LoadModel()).

  • For classification tasks, the returned class indices follow the encoding present in the CSV labels.
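The relationship between logits, probabilities, and predicted labels described for return_logits, return_probs, and threshold can be sketched in plain Python. The helper names are hypothetical, and the use of >= when comparing a score with the threshold is an assumption; this page does not say whether the comparison is inclusive.

```python
import math


def softmax(logits):
    """Convert a list of class logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logit):
    """Numerically stable logistic function."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    e = math.exp(logit)
    return e / (1.0 + e)


def classify(logits):
    """Return (predicted class index, probabilities) for one sample."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__), probs


def link_label(logit, threshold=0.5):
    """Return (binary label, score) for one candidate link.

    Assumption: a score equal to the threshold counts as a positive.
    """
    score = sigmoid(logit)
    return int(score >= threshold), score
```

Raising threshold above 0.5 trades recall for precision in link prediction, since fewer candidate links are labelled as existing.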

SaveModel(path: str, include_config: bool = True)

Save the model to disk.

Parameters
path : str

Output file path. If the extension is not .pt, it is appended automatically.

include_config : bool, optional

If True, saves enough configuration alongside weights to rebuild the model on load. Default is True.

Returns
None

SetHyperparameters(**kwargs) -> Dict[str, Union[str, int, float, bool, Tuple]]

Set one or more configuration values (hyperparameters) on this instance.

This method updates config fields using keyword arguments. If any model-shaping setting changes (e.g. conv, hidden_dims, activation, dropout, batch_norm, residual, pooling), the model is rebuilt automatically.

Parameters
**kwargs : dict

Key/value pairs matching fields in _RunConfig. Unknown keys are ignored.

Returns
dict

A compact configuration summary (same as Summary()).

Raises
ValueError

If an attempted setting fails validation (e.g. malformed split or empty hidden_dims).
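The update rules described above (unknown keys ignored, validation errors raised, rebuild triggered only by model-shaping fields) can be sketched with a small dataclass. DemoConfig is a hypothetical stand-in for _RunConfig with only four fields, its default values are demo values rather than the library's defaults, and apply_overrides is not a TopologicPy function.

```python
from dataclasses import dataclass, fields


@dataclass
class DemoConfig:
    """Hypothetical, reduced stand-in for _RunConfig (demo values only)."""
    conv: str = "gatv2"
    hidden_dims: tuple = (128, 128, 64)
    dropout: float = 0.2
    lr: float = 1e-3


# Subset of the model-shaping settings named in the description above.
MODEL_SHAPING = {"conv", "hidden_dims", "dropout"}


def apply_overrides(config, **kwargs):
    """Update known fields, skip unknown keys, report if a rebuild is needed."""
    known = {f.name for f in fields(config)}
    needs_rebuild = False
    for key, value in kwargs.items():
        if key not in known:
            continue  # unknown keys are ignored
        if key == "hidden_dims" and len(value) == 0:
            raise ValueError("hidden_dims must not be empty")
        if key in MODEL_SHAPING and getattr(config, key) != value:
            needs_rebuild = True
        setattr(config, key, value)
    return needs_rebuild
```

In this sketch a change to lr alone returns False, because the learning rate affects training but not the shape of the network.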

Notes

  • For graph-level tasks, changing split affects holdout splitting. You may want to call ByCSVPath() again (or re-instantiate) if you need a fresh split with new ratios.

  • For node/edge tasks, masks are taken from CSV columns if present; otherwise they are generated using split ratios within each graph.
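Generating masks from split ratios within a graph can be pictured as shuffling the item indices and cutting the shuffled order into three parts. The sketch below assumes a (train, val, test) ratio tuple with example values of 0.8, 0.1, and 0.1 and a fixed seed; the library's actual ratios, rounding, and random source are not documented on this page, and make_masks is a hypothetical helper.

```python
import random


def make_masks(num_items, split=(0.8, 0.1, 0.1), seed=0):
    """Return (train_mask, val_mask, test_mask) as lists of booleans.

    Illustrative only: items left over after rounding down the train
    and validation counts are assigned to the test mask.
    """
    order = list(range(num_items))
    random.Random(seed).shuffle(order)
    n_train = int(split[0] * num_items)
    n_val = int(split[1] * num_items)
    train_mask = [False] * num_items
    val_mask = [False] * num_items
    test_mask = [False] * num_items
    for position, index in enumerate(order):
        if position < n_train:
            train_mask[index] = True
        elif position < n_train + n_val:
            val_mask[index] = True
        else:
            test_mask[index] = True
    return train_mask, val_mask, test_mask
```

Every node (or edge) ends up in exactly one mask, which is the property that keeps training, validation, and test metrics from overlapping.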

Summary() -> Dict[str, Union[str, int, float, bool, Tuple]]

Return a compact summary of the current configuration and dataset size.

Returns
dict

A dictionary containing key configuration choices such as level, task, network options (conv, hidden_dims, etc.), training hyperparameters, current device, and basic dataset counts.

Notes

This is intended to be a lightweight, ReadTheDocs-friendly snapshot suitable for logging and reproducibility.

Test() -> Dict[str, float]

Compute metrics on the test split.

Returns
dict

A dictionary of metric values. Key names are prefixed depending on level:

  • graph-level: keys are prefixed with "test_"

  • node/edge/link: keys are prefixed with "test_" via internal helpers

Raises
ValueError

If the configured level is unsupported.

Train(epochs: Optional[int] = None, batch_size: Optional[int] = None) -> Dict[str, List[float]]

Train the model using the current configuration.

Training behaviour depends on level:

  • "graph": uses the current holdout split (train/val sets)

  • "node" : uses in-graph boolean masks (train_mask, val_mask)

  • "edge" : uses in-graph boolean masks (edge_train_mask, edge_val_mask)

  • "link" : uses torch_geometric.transforms.RandomLinkSplit per graph

Parameters
epochs : int, optional

If provided, overrides config.epochs for this run.

batch_size : int, optional

If provided, overrides config.batch_size for this run. For node/edge/link tasks the loader uses batch_size=1 (one graph at a time).

Returns
dict

Training history dictionary with keys "train_loss" and "val_loss". Each value is a list of floats (one per epoch).

Notes

  • For graph-level tasks, early stopping can be enabled via config.early_stopping and config.early_stopping_patience.

  • For k-fold cross-validation on graph-level tasks, use CrossValidate() instead.
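Patience-based early stopping can be illustrated on a list of per-epoch validation losses such as history["val_loss"]. The sketch assumes the common rule of stopping once the loss has failed to improve for a given number of consecutive epochs; the exact criterion used by Train() is not specified on this page, and early_stop_epoch is a hypothetical helper.

```python
def early_stop_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None.

    Illustrative rule: stop after `patience` consecutive epochs whose
    validation loss is not strictly lower than the best loss so far.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None
```

Applying this to a recorded history after the fact is a cheap way to judge whether a smaller or larger early_stopping_patience would have changed the length of the run.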

Validate() -> Dict[str, float]

Compute metrics on the validation split.

Returns
dict

A dictionary of metric values. Key names are prefixed depending on level:

  • graph-level: keys are prefixed with "val_"

  • node/edge/link: keys are prefixed with "val_" via internal helpers

Raises
ValueError

If the configured level is unsupported.