topologicpy.PyG module
- class topologicpy.PyG.PyG(path: str, config: _RunConfig)
Bases: object
A clean PyTorch Geometric interface for TopologicPy-exported CSV datasets.
You can control medium-level hyperparameters by passing keyword arguments to ByCSVPath, for example:
pyg = PyG.ByCSVPath(
    path="C:/dataset", level="graph", task="classification", graphLabelType="categorical",
    cv="kfold", k_folds=5, conv="gatv2", hidden_dims=(128, 128, 64), activation="gelu",
    batch_norm=True, residual=True, dropout=0.2, lr=1e-3, optimizer="adamw",
    early_stopping=True, early_stopping_patience=10, gradient_clip_norm=1.0,
)
Methods
- ByCSVPath(path[, level, task, ...]): Creates a PyG instance from a TopologicPy-exported CSV dataset folder.
- CrossValidate([k_folds, epochs, batch_size]): Perform k-fold cross-validation for graph-level tasks.
- LoadModel(path[, strict, ...]): Load model weights from disk.
- MetadataByGraphID(graphID): Returns preserved metadata for one graph id.
- OntologyMetadata(): Returns preserved ontology and semantic metadata for the loaded dataset.
- PlotConfusionMatrix([split, normalize, ...]): Returns a Plotly Figure of the confusion matrix of the inference.
- PlotCrossValidationSummary([cv_report, ...])
- PlotHistory()
- PlotParity([split, title, xTitle, yTitle, ...]): Plot a parity / correlation plot for regression tasks by delegating to Plotly.FigureByCorrelation.
- Predict([split, threshold, return_logits, ...]): Run inference (prediction) using the current model on the loaded dataset.
- SaveModel(path[, include_config]): Save the model to disk.
- SetHyperparameters(**kwargs): Set one or more configuration values (hyperparameters) on this instance.
- Summary(): Return a compact summary of the current configuration and dataset size.
- Test(): Compute metrics on the test split.
- Train([epochs, batch_size]): Train the model using the current configuration.
- Validate(): Compute metrics on the validation split.
- static ByCSVPath(path: str, level: Literal['graph', 'node', 'edge', 'link'] = 'graph', task: Literal['classification', 'regression', 'link_prediction'] = 'classification', graphLabelType: Literal['categorical', 'continuous'] = 'categorical', nodeLabelType: Literal['categorical', 'continuous'] = 'categorical', edgeLabelType: Literal['categorical', 'continuous'] = 'categorical', ontology: bool = True, **kwargs) -> PyG
Creates a PyG instance from a TopologicPy-exported CSV dataset folder.
The dataset folder is expected to contain three files:
- graphs.csv: one row per graph (graph-level labels/features)
- nodes.csv: one row per node (node-level labels/features/masks)
- edges.csv: one row per edge (edge-level labels/features/masks)
The created instance immediately loads the CSVs, builds a list of torch_geometric.data.Data objects, performs an initial holdout split (for graph-level tasks), and builds a default model according to the provided configuration.
- Parameters
- path : str
Path to the dataset folder that contains graphs.csv, nodes.csv, and edges.csv.
- level : {"graph", "node", "edge", "link"}, optional
The prediction level:
  - "graph": graph-level labels in graphs.csv
  - "node": node-level labels in nodes.csv
  - "edge": edge-level labels in edges.csv
  - "link": link prediction (binary edge existence)
- task : {"classification", "regression", "link_prediction"}, optional
The learning task. For level="link" this should be "link_prediction".
- graphLabelType : {"categorical", "continuous"}, optional
Label type for graph-level targets (used when level="graph").
- nodeLabelType : {"categorical", "continuous"}, optional
Label type for node-level targets (used when level="node").
- edgeLabelType : {"categorical", "continuous"}, optional
Label type for edge-level targets (used when level="edge").
- ontology : bool, optional
If True, preserves ontology and semantic metadata columns from graphs.csv, nodes.csv, and edges.csv. These columns are stored as metadata and are not converted into numeric feature tensors. Default is True.
- **kwargs : dict
Optional overrides for any field in _RunConfig. Common examples include conv, hidden_dims, activation, dropout, batch_norm, residual, pooling, epochs, batch_size, lr, weight_decay, and cross-validation options.
- Returns
- PyG
The created PyG instance.
- Raises
- ValueError
If the path does not exist, required CSV files are missing, or no node feature columns are found.
Examples
pyg = PyG.ByCSVPath(path="C:/dataset", level="graph", task="classification")
history = pyg.Train(epochs=50)
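Because ByCSVPath raises ValueError when the folder or one of the three CSV files is missing, a small pre-flight check can report exactly which file is absent before loading. The sketch below relies only on the three file names listed in the description; the helper name missing_dataset_files is hypothetical and is not part of TopologicPy.

```python
from pathlib import Path
import tempfile

# File names taken from the ByCSVPath description.
REQUIRED_FILES = ("graphs.csv", "nodes.csv", "edges.csv")


def missing_dataset_files(folder):
    """Return the required CSV file names that are absent from `folder`.

    Hypothetical helper for illustration; it is not part of TopologicPy.
    """
    root = Path(folder)
    if not root.is_dir():
        # A missing folder means every required file is missing.
        return list(REQUIRED_FILES)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


# Demonstration on a temporary folder that holds only two of the three files.
# The files are left empty because only their presence is checked here.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "graphs.csv").write_text("")
    (Path(tmp) / "nodes.csv").write_text("")
    demo_missing = missing_dataset_files(tmp)

print(demo_missing)
```

An empty list means the folder passes this particular check; ByCSVPath may still raise ValueError for other reasons, such as missing node feature columns.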
- CrossValidate(k_folds: Optional[int] = None, epochs: Optional[int] = None, batch_size: Optional[int] = None) -> Dict[str, Union[float, List[Dict[str, float]]]]
Perform k-fold cross-validation for graph-level tasks.
This method rebuilds and retrains a fresh model per fold, evaluates on the fold’s held-out set, and returns fold-wise metrics along with mean/std aggregates.
- Parameters
- k_folds : int, optional
Number of folds. Defaults to config.k_folds.
- epochs : int, optional
Training epochs per fold. Defaults to config.epochs.
- batch_size : int, optional
Batch size for DataLoader. Defaults to config.batch_size.
- Returns
- dict
A dictionary of the form:
{
    "fold_metrics": [{"fold": 0, ...}, {"fold": 1, ...}, ...],
    "mean_<metric>": ...,
    "std_<metric>": ...
}
- Raises
- ValueError
If called for non-graph levels, or if k_folds < 2.
Notes
- Stratified folding is available for categorical graph labels when config.k_stratify is True.
- Cross-validation is intentionally limited to graph-level tasks; node/edge tasks typically rely on per-graph masks rather than splitting graphs.
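To make the shape of the returned report concrete, the following sketch shows one way fold-wise metrics could be folded into mean_<metric> and std_<metric> entries. It is an illustration only: the metric name val_acc and the numbers are made up, and the use of the population standard deviation is an assumption, since the formula used by the library is not stated here.

```python
from statistics import mean, pstdev


def aggregate_folds(fold_metrics):
    """Build mean_/std_ aggregates from a list of per-fold metric dicts.

    Assumption: population standard deviation is used; the library may
    compute the spread differently.
    """
    report = {"fold_metrics": fold_metrics}
    # Every key except the fold index is treated as a metric.
    metric_names = [key for key in fold_metrics[0] if key != "fold"]
    for name in metric_names:
        values = [float(m[name]) for m in fold_metrics]
        report[f"mean_{name}"] = mean(values)
        report[f"std_{name}"] = pstdev(values)
    return report


# Made-up numbers and a hypothetical metric name, purely to show the layout.
folds = [
    {"fold": 0, "val_acc": 0.80},
    {"fold": 1, "val_acc": 0.90},
]
report = aggregate_folds(folds)
print(sorted(report))
```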
- LoadModel(path: str, strict: bool = True, rebuild_from_checkpoint: bool = True)
Load model weights from disk.
This method is backward compatible with older .pt files that contain only a raw state_dict. If the file contains a checkpoint dict produced by SaveModel() with include_config=True, the model can be rebuilt automatically to match the saved architecture.
- Parameters
- path : str
Path to a .pt file.
- strict : bool, optional
Passed to load_state_dict. Default is True.
- rebuild_from_checkpoint : bool, optional
If True and the checkpoint contains saved config fields, rebuilds the model before loading weights. Default is True.
- Returns
- None
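The backward-compatibility rule above amounts to telling a raw state_dict apart from a checkpoint dict. A minimal sketch of that decision follows. The key names "state_dict" and "config" are assumptions made for illustration (the actual checkpoint layout written by SaveModel() is not documented here), and plain Python lists stand in for tensors so the example has no dependencies.

```python
def split_checkpoint(payload):
    """Return (state_dict, config_or_None) from a loaded .pt payload.

    Assumption: a checkpoint dict stores weights under "state_dict" and
    settings under "config". Both key names are hypothetical.
    """
    if isinstance(payload, dict) and "state_dict" in payload:
        return payload["state_dict"], payload.get("config")
    # Otherwise treat the payload as a raw state_dict (older file format).
    return payload, None


# Plain lists stand in for weight tensors in this illustration.
raw = {"encoder.weight": [0.1, 0.2]}
checkpoint = {"state_dict": raw, "config": {"conv": "gatv2"}}

weights_old, config_old = split_checkpoint(raw)
weights_new, config_new = split_checkpoint(checkpoint)
print(config_old, config_new)
```

When a configuration is recovered, the model can be rebuilt before the weights are loaded, which is the behaviour rebuild_from_checkpoint=True is described as enabling.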
- MetadataByGraphID(graphID) -> Dict[str, object]
Returns preserved metadata for one graph id.
- Parameters
- graphID : any
The graph id value as stored in graphs.csv.
- Returns
- dict
A dictionary containing graph, node, and edge metadata.
- OntologyMetadata() -> Dict[str, object]
Returns preserved ontology and semantic metadata for the loaded dataset.
- Returns
- dict
A dictionary with graphs, nodes, edges, and columns sections. Metadata is keyed by graph index and graph id where possible.
- PlotConfusionMatrix(split: str = 'test', normalize: bool = False, minValue: int = None, maxValue: int = None, title: str = None, xTitle: str = 'Actual Categories', yTitle: str = 'Predicted Categories', width: int = 950, height: int = 500, showScale: bool = True, colorScale: str = 'viridis', colorSamples: int = 10, backgroundColor: str = 'rgba(0,0,0,0)', marginLeft: int = 0, marginRight: int = 0, marginTop: int = 40, marginBottom: int = 0, baseFontSize: int = 16, tickFontSize: int = 14, titleFontSize: int = 22, axisTitleFontSize: int = 16, annotationFontSize: int = 18, grayScale: bool = False, mantissa: int = 6)
Returns a Plotly Figure of the confusion matrix of the inference. Actual categories are displayed on the X-Axis, Predicted categories are displayed on the Y-Axis.
- Parameters
- split : str, optional
Which split(s) to evaluate. Options are: {"train", "val", "validate", "validation", "test", "all"}. Default is "test".
- normalize : bool, optional
If True, row-normalize the confusion matrix. Default is False.
- minValue : float, optional
The desired minimum value to use for the color scale. If set to None, the minimum value found in the input data will be used.
- maxValue : float, optional
The desired maximum value to use for the color scale. If set to None, the maximum value found in the input data will be used.
- title : str, optional
The desired title to display. Default is "Confusion Matrix".
- xTitle : str, optional
The desired X-axis title to display. Default is "Actual Categories".
- yTitle : str, optional
The desired Y-axis title to display. Default is "Predicted Categories".
- width : int, optional
The desired width of the figure. Default is 950.
- height : int, optional
The desired height of the figure. Default is 500.
- showScale : bool, optional
If set to True, a color scale is shown on the right side of the figure. Default is True.
- colorScale : str, optional
The desired type of Plotly color scale to use (e.g. "Viridis", "Plasma"). Default is "viridis".
- colorSamples : int, optional
The number of discrete color samples to use for displaying the data. Default is 10.
- backgroundColor : list or str, optional
The desired background color. Default is "rgba(0,0,0,0)" (transparent).
- marginLeft, marginRight, marginTop, marginBottom : int, optional
Plot margins in pixels.
- baseFontSize : int, optional
The base font size. Default is 16.
- tickFontSize : int, optional
The tick font size. Default is 14.
- titleFontSize : int, optional
The title font size. Default is 22.
- axisTitleFontSize : int, optional
The axis title font size. Default is 16.
- annotationFontSize : int, optional
The annotation font size. Default is 18.
- grayScale : bool, optional
If set to True, the figure is rendered in grayscale. Default is False.
- mantissa : int, optional
The desired length of the mantissa. Default is 6.
- Returns
- plotly.graph_objects.Figure
The created plotly figure.
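The normalize option is described as row-normalizing the confusion matrix. As a rough illustration of what that means, the sketch below divides each row of a count matrix by its row sum. Which axis holds the actual classes in the library's internal matrix is not stated here, and the handling of all-zero rows is an assumption of this example.

```python
def row_normalize(matrix):
    """Divide each row by its sum; all-zero rows are left as zeros.

    The zero-row behaviour is an assumption made for this sketch.
    """
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([value / total if total else 0.0 for value in row])
    return normalized


# Made-up counts for three classes; the third class has no samples.
counts = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 0],
]
normalized = row_normalize(counts)
for row in normalized:
    print(row)
```

After normalization, every non-empty row sums to 1, so cells can be read as per-row proportions rather than raw counts.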
- PlotCrossValidationSummary(cv_report: Optional[Dict[str, Union[float, List[Dict[str, float]]]]] = None, metrics: Optional[List[str]] = None, show_mean_std: bool = True)
- PlotHistory()
- PlotParity(split: str = 'test', title: str = None, xTitle: str = 'Actual Values', yTitle: str = 'Predicted Values', showIdentity: bool = True, showBestFit: bool = True, dotSize: int = 6, dotColor: str = 'blue', lineColor: str = 'red', width: int = 800, height: int = 600, theme: str = 'default', backgroundColor: str = 'rgba(0,0,0,0)', marginLeft: int = 0, marginRight: int = 0, marginTop: int = 40, marginBottom: int = 0)
Plot a parity / correlation plot for regression tasks by delegating to Plotly.FigureByCorrelation.
- Parameters
- split : {"train", "val", "validate", "validation", "test", "all"}, optional
Which split to evaluate. Default is "test".
- title : str, optional
Custom plot title. If None, an automatic title is generated.
- xTitle : str, optional
The X-axis title. Default is "Actual Values".
- yTitle : str, optional
The Y-axis title. Default is "Predicted Values".
- showIdentity : bool, optional
If set to True, shows the 45-degree identity line.
- showBestFit : bool, optional
If set to True, draws the best-fit line through the data.
- dotSize : int, optional
The marker size.
- dotColor : str, optional
Dot color passed to Plotly.FigureByCorrelation.
- lineColor : str, optional
Best-fit line color passed to Plotly.FigureByCorrelation.
- width : int, optional
Figure width in pixels.
- height : int, optional
Figure height in pixels.
- theme : str, optional
Plotly theme. Options are "dark", "light", "default".
- backgroundColor : str, optional
Figure background color.
- marginLeft : int, optional
Left margin in pixels.
- marginRight : int, optional
Right margin in pixels.
- marginTop : int, optional
Top margin in pixels.
- marginBottom : int, optional
Bottom margin in pixels.
- Returns
- plotly.graph_objects.Figure
A correlation figure of actual vs predicted values.
- Raises
- ValueError
If called when config.task is not "regression" or when config.level is "link".
- RuntimeError
If no regression labels are found for the requested split(s).
Notes
- For node/edge regression, the method uses the corresponding boolean masks on each graph and aggregates across all graphs.
- This method relies on _predict_graph(), _predict_node(), and _predict_edge().
- showIdentity, showBestFit, and dotSize are kept only for API compatibility. The delegated Plotly method always shows the best-fit line and the 45-degree line, and does not expose point size.
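A parity plot compares predicted values against actual values: points on the 45-degree line are perfect predictions, and the best-fit line summarizes systematic bias. The sketch below computes an ordinary least-squares best-fit line for made-up data. That Plotly.FigureByCorrelation uses ordinary least squares is an assumption of this example, not something stated in this reference.

```python
def best_fit_line(actual, predicted):
    """Ordinary least-squares slope and intercept of predicted vs actual."""
    n = len(actual)
    mean_x = sum(actual) / n
    mean_y = sum(predicted) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in actual)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(actual, predicted))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up regression results: every prediction overshoots by 0.5.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
slope, intercept = best_fit_line(actual, predicted)
print(slope, intercept)
```

Here the fitted line is parallel to the identity line (slope 1) but shifted upward by the intercept, which is how a constant over-prediction appears on a parity plot.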
- Predict(split: str = 'all', threshold: float = 0.5, return_logits: bool = False, return_probs: bool = True, return_embeddings: bool = False, attach_to_data: bool = False, pred_key: str = 'pred', prob_key: str = 'prob', logits_key: str = 'logits', emb_key: str = 'emb') -> Dict[str, object]
Run inference (prediction) using the current model on the loaded dataset.
This method is designed for post-training workflows, including the common pattern of train → save → reload → predict on unseen data. It performs forward passes only (no gradient computation) and returns predictions in a compact, serializable form.
Behaviour depends on level:
- "graph": graph-level prediction using a mini-batched DataLoader
- "node": node-level prediction using node masks (train_mask, val_mask, test_mask)
- "edge": edge-level prediction using edge masks (edge_train_mask, edge_val_mask, edge_test_mask)
- "link": link prediction using RandomLinkSplit per graph
- Parameters
- split : str, optional
The subset to predict. Supported values depend on config.level:
  - graph-level: "train", "val", "test", "all"
  - node-level: "train", "val", "test", "all" ("all" returns full-length vectors)
  - edge-level: "train", "val", "test", "all" ("all" returns full-length vectors)
  - link-level: "train", "val", "test" ("all" is treated as "test")
Default is "all".
- threshold : float, optional
Threshold for converting link-prediction probabilities into binary labels. Only used when config.level == "link". Default is 0.5.
- return_logits : bool, optional
If True, includes raw model outputs (logits) in the returned dictionary. For regression tasks, logits are the raw predictions. Default is False.
- return_probs : bool, optional
If True, includes probabilities/scores when applicable:
  - classification: softmax probabilities
  - link prediction: sigmoid probabilities
  - regression: ignored (no probabilities)
Default is True.
- return_embeddings : bool, optional
If True, includes the node embeddings produced by the GNN backbone (the output of model["encoder"]) for each predicted batch/graph. Default is False.
- attach_to_data : bool, optional
If True, attaches prediction tensors to each Data object in data_list using keys pred_key, prob_key, logits_key, and emb_key. This is useful for downstream processing (e.g., exporting to CSV or mapping back to Topologic entities). Default is False.
- pred_key : str, optional
Attribute name to attach predicted labels/values to each Data object when attach_to_data is True. Default is "pred".
- prob_key : str, optional
Attribute name to attach probabilities/scores to each Data object when attach_to_data is True. Default is "prob".
- logits_key : str, optional
Attribute name to attach logits/raw outputs to each Data object when attach_to_data is True. Default is "logits".
- emb_key : str, optional
Attribute name to attach encoder embeddings to each Data object when attach_to_data is True. Default is "emb".
- Returns
- dict
A dictionary containing (at minimum) the key "pred" with predictions.
- Graph-level
  - "pred": (N,) predicted class indices or regression values
  - "y_true": (N,) true labels/targets if present
  - "index": (N,) integer indices aligned with self.data_list order
- Node/Edge-level
  - "pred": list of arrays (one per graph) unless split != "all"
  - "y_true": list of arrays (one per graph) if present
  - "mask": mask name used when split in {train, val, test}
- Link-level
  - "score": sigmoid probabilities for edge_label_index samples
  - "pred": binary predictions derived from threshold
  - "y_true": binary ground truth labels for sampled links
- Raises
- ValueError
If split or config.level is unsupported, or if the model is not initialised.
Notes
- This method assumes you have already called ByCSVPath() (or otherwise populated data_list), and that model is loaded/initialised (e.g., via Train() or LoadModel()).
- For classification tasks, the returned class indices follow the encoding present in the CSV labels.
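The relationship between logits, probabilities, and predictions described above can be sketched in plain Python. Classification takes a softmax over the logits and picks the most probable class, while link prediction applies a sigmoid and compares the score with threshold. Whether the library treats a score exactly equal to the threshold as positive is an assumption in this example (>= is used), and the logit values are made up.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def link_predictions(logits, threshold=0.5):
    """Return (scores, binary predictions) for link-prediction logits.

    Assumption: a score equal to the threshold counts as a positive.
    """
    scores = [sigmoid(value) for value in logits]
    preds = [1 if score >= threshold else 0 for score in scores]
    return scores, preds


# Classification: probabilities and the predicted class index.
probs = softmax([2.0, 1.0, 0.1])
pred_class = probs.index(max(probs))

# Link prediction: sigmoid scores turned into binary labels.
scores, preds = link_predictions([-1.2, 0.4, 3.0], threshold=0.5)
print(pred_class, preds)
```

Raising threshold makes the link predictor more conservative: fewer candidate links are labelled as existing, which typically trades recall for precision.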
- SaveModel(path: str, include_config: bool = True)
Save the model to disk.
- Parameters
- path : str
Output file path. If the extension is not .pt, it is appended automatically.
- include_config : bool, optional
If True, saves enough configuration alongside weights to rebuild the model on load. Default is True.
- Returns
- None
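The extension rule for path can be illustrated with a small helper. This sketch reads "appended" literally, so a path ending in another extension (for example .pth) gains .pt at the end rather than having its extension replaced; that reading, the case-sensitive comparison, and the helper name with_pt_extension are all assumptions for illustration.

```python
def with_pt_extension(path):
    """Return `path` unchanged if it ends with ".pt", else append ".pt".

    Hypothetical helper; the comparison is case-sensitive by assumption.
    """
    if path.endswith(".pt"):
        return path
    return path + ".pt"


print(with_pt_extension("model"))
print(with_pt_extension("model.pt"))
print(with_pt_extension("model.pth"))
```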
- SetHyperparameters(**kwargs) -> Dict[str, Union[str, int, float, bool, Tuple]]
Set one or more configuration values (hyperparameters) on this instance.
This method updates config fields using keyword arguments. If any model-shaping setting changes (e.g. conv, hidden_dims, activation, dropout, batch_norm, residual, pooling), the model is rebuilt automatically.
- Parameters
- **kwargs : dict
Key/value pairs matching fields in _RunConfig. Unknown keys are ignored.
- Returns
- dict
A compact configuration summary (same as Summary()).
- Raises
- ValueError
If an attempted setting fails validation (e.g. malformed split or empty hidden_dims).
Notes
- For graph-level tasks, changing split affects holdout splitting. You may want to call ByCSVPath() again (or re-instantiate) if you need a fresh split with new ratios.
- For node/edge tasks, masks are taken from CSV columns if present; otherwise they are generated using split ratios within each graph.
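The update rules described for SetHyperparameters (apply known fields, ignore unknown keys, validate, and rebuild only when a model-shaping setting changes) can be sketched with a small dataclass. DemoConfig is a hypothetical stand-in for _RunConfig with a handful of fields and made-up defaults; it is not the library's actual configuration class, and the function returns a rebuild flag instead of a summary to keep the example short.

```python
from dataclasses import dataclass, fields


@dataclass
class DemoConfig:
    """Hypothetical stand-in for _RunConfig; defaults are made up."""
    conv: str = "sage"
    hidden_dims: tuple = (64, 64)
    dropout: float = 0.1
    lr: float = 1e-3
    epochs: int = 50


# Subset of the model-shaping settings named above that DemoConfig contains.
MODEL_SHAPING = {"conv", "hidden_dims", "dropout"}


def set_hyperparameters(config, **kwargs):
    """Apply known keys, skip unknown ones, and report if a rebuild is needed."""
    known = {f.name for f in fields(config)}
    needs_rebuild = False
    for key, value in kwargs.items():
        if key not in known:
            continue  # unknown keys are ignored
        if key == "hidden_dims" and len(value) == 0:
            raise ValueError("hidden_dims must not be empty")
        if key in MODEL_SHAPING and getattr(config, key) != value:
            needs_rebuild = True
        setattr(config, key, value)
    return needs_rebuild


cfg = DemoConfig()
rebuild_after_lr = set_hyperparameters(cfg, lr=0.01, unknown_key=1)
rebuild_after_conv = set_hyperparameters(cfg, conv="gatv2")
print(rebuild_after_lr, rebuild_after_conv)
```

Changing only a training hyperparameter such as lr leaves the architecture untouched, so no rebuild is flagged; changing conv alters the network itself and therefore does.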
- Summary() -> Dict[str, Union[str, int, float, bool, Tuple]]
Return a compact summary of the current configuration and dataset size.
- Returns
- dict
A dictionary containing key configuration choices such as level, task, network options (conv, hidden_dims, etc.), training hyperparameters, current device, and basic dataset counts.
Notes
This is intended to be a lightweight, ReadTheDocs-friendly snapshot suitable for logging and reproducibility.
- Test() -> Dict[str, float]
Compute metrics on the test split.
- Returns
- dict
A dictionary of metric values. Key names are prefixed depending on task:
  - graph-level: keys are prefixed with "test_"
  - node/edge/link: keys are prefixed with "test_" via internal helpers
- Raises
- ValueError
If the configured level is unsupported.
- Train(epochs: Optional[int] = None, batch_size: Optional[int] = None) -> Dict[str, List[float]]
Train the model using the current configuration.
Training behaviour depends on level:
- "graph": uses the current holdout split (train/val sets)
- "node": uses in-graph boolean masks (train_mask, val_mask)
- "edge": uses in-graph boolean masks (edge_train_mask, edge_val_mask)
- "link": uses torch_geometric.transforms.RandomLinkSplit per graph
- Parameters
- epochs : int, optional
If provided, overrides config.epochs for this run.
- batch_size : int, optional
If provided, overrides config.batch_size for this run. For node/edge/link tasks the loader uses batch_size=1 (one graph at a time).
- Returns
- dict
Training history dictionary with keys "train_loss" and "val_loss". Each value is a list of floats (one per epoch).
Notes
- For graph-level tasks, early stopping can be enabled via config.early_stopping and config.early_stopping_patience.
- For k-fold cross-validation on graph-level tasks, use CrossValidate() instead.
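The returned history can be inspected directly, and the effect of a patience setting can be reasoned about from the val_loss list alone. The sketch below finds the best epoch and simulates a patience-based stopping rule on made-up losses. The exact rule used by the library is not documented here; this example assumes training stops once the validation loss has failed to improve on its best value for patience consecutive epochs.

```python
def early_stop_epoch(val_losses, patience):
    """Return the index of the epoch at which training would stop.

    Assumption: stop once the validation loss has not improved on its
    best value for `patience` consecutive epochs.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    # Patience was never exhausted, so training runs to the last epoch.
    return len(val_losses) - 1


# Made-up history in the documented shape: one float per epoch.
history = {
    "train_loss": [0.90, 0.70, 0.60, 0.55, 0.50, 0.48],
    "val_loss": [1.00, 0.80, 0.75, 0.78, 0.77, 0.79],
}
val = history["val_loss"]
best_epoch = min(range(len(val)), key=val.__getitem__)
stop_epoch = early_stop_epoch(val, patience=3)
print(best_epoch, stop_epoch)
```

In this made-up run the training loss keeps falling while the validation loss bottoms out early, the typical overfitting pattern that early stopping is meant to catch.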
- Validate() -> Dict[str, float]
Compute metrics on the validation split.
- Returns
- dict
A dictionary of metric values. Key names are prefixed depending on task:
  - graph-level: keys are prefixed with "val_"
  - node/edge/link: keys are prefixed with "val_" via internal helpers
- Raises
- ValueError
If the configured level is unsupported.