The Structural Topology Kernel (STK): a compact, interpretable, multi-task structural descriptor for materials microstructure

Classification, taxonomy alignment, measurable label ambiguity, and a fair field-governed property test

Russell Parrish · Parallax Metrology · ORCID 0009-0008-9781-7995 · draft 2026-06-11

Contribution. STK is a fixed 50-coordinate, training-free structural descriptor that reaches incumbent-tier performance across microstructure classification and 2D permeability prediction from a single representation, while making label ambiguity interpretable and measurable, one auditable interface in place of a toolbox of per-task descriptors.

The contribution is deliberately a generalist one. STK does not beat task-specialized methods on any single benchmark, and it is not meant to: its value is that one fixed, interpretable coordinate set operates at incumbent tier across tasks that the field normally addresses with separate, non-overlapping pipelines, and does two things no specialist pipeline offers at all, namely auditability across tasks and a measurable readout of label ambiguity. The question this paper answers is not "is there a descriptor that wins task X," but "how much can a single transparent representation do at once, and exactly where does it stop." We read a tier-level result across three tasks as a stronger statement about structural information content than a single benchmark win at one.

Introduction: We introduce the Structural Topology Kernel (STK), a deterministic, training-free, 50-coordinate descriptor of grayscale structural fields (geometry, orientation, void topology, a multi-island resistance graph, GLCM texture, blob microstructure, spectral bands, and tonal mass). On the UHCSDB ultrahigh-carbon-steel benchmark (961 SEM images, but only 43 parent specimens), STK reaches 0.832 macro AUC under GroupKFold by specimen, beating HOG (0.728) and Haralick (0.629) at 50 interpretable features; the advantage is intrinsic (it beats the dimensionality-matched union of S2+Haralick+HOG by +0.11), structural rather than an acquisition fingerprint (0.829 with all tonal/spectral/edge channels removed), and reproduces on a second steel set (bainite, 0.964 grouped). STK's coordinates are metallurgically interpretable, each class is discriminated by the axis a metallurgist would name (pearlite↔orientation, network↔connectivity, spheroidite↔void-topology), and its continuous structural position turns subjective microstructure labels into a measurable quantity, placing mixtures between their pure parents and flagging where the taxonomy is genuinely fuzzy. On a fair, dimensionality-matched field-governed property (2D permeability of binary porous microstructures; held-out on a 2,000/2,000 subset), STK-50 reaches the classical-descriptor tier (R²=0.939, matching porosity+specific-surface+Euler+S2 and beating porosity-alone), adding only a marginal nonlinear residual beyond a connectivity-aware incumbent (bootstrap ΔR²: 0 linear, +0.011 GBM) and sitting below a task-trained ConvNeXt ceiling (~0.995). The contribution is a single fixed interpretable coordinate set that operates at incumbent tier across classification, property prediction, and label-ambiguity quantification, a compact multi-task surrogate, not a property-prediction specialist. All findings are appropriately bounded; the label-ambiguity contribution is proxy-validated in materials, with its mechanism checked cross-domain against six pathologists.

All numbers in this paper are re-derived from raw data; code and a hashed reproducibility manifest accompany it (§8).

Consequence: one representation, many heads

Materials informatics typically fragments by task. Phase classification reaches for high-contrast texture descriptors (HOG, Haralick); property regression reaches for geometry descriptors (porosity, Euler number, skeletonization, pore-network statistics); and label-quality auditing is usually a separate, often manual, expert exercise. These are treated as different problems requiring different, non-overlapping pipelines. STK's central claim is that they are manifestations of the same topological information: because the 50 coordinates encode local texture and global structure (connectivity, percolation, void distribution) together, the same fixed vector feeds a classification head (0.83 grouped), a permeability-regression head (classical-descriptor tier), and an ambiguity readout (label fuzziness as a measured quantity, mapped back to specific coordinates). The representation is static; the task is a choice of head. This is not merely a packaging convenience. A per-task toolbox cannot, even in principle, report where a task's labels are soft or why a property estimate is uncertain, because the descriptor that classifies and the procedure that audits are different objects computed from different pixels. Because STK's classification, property, and ambiguity readouts are the same fifty coordinates, a model's uncertainty on one task is legible as structural complexity in the others, the ambiguity STK flags in classification is the same margin that predicts independent descriptors' errors (§4), measured on the identical axes. Unification is the mechanism, not the marketing: it is what makes the auditing and the ambiguity readout possible, and it is precisely what siloed specialists give up. Practically, this replaces a toolbox of siloed per-task descriptors with one transparent, auditable, reproducible interface, easier to validate and debug, and the same place a model's uncertainty can be read as structural complexity rather than discarded as noise. The boundary is part of the contribution: STK is not a magic residual-finder that beats task-trained models, but a portable structural measurement layer with a now-characterized ceiling.

1. The instrument

STK is a fixed, deterministic operator: it maps a grayscale field to 50 real-valued coordinates with no training, no tunable per-dataset parameters, and no stochasticity. The coordinates span nine measurement layers:

STK is modality-agnostic: it reads structure in any grayscale field, SEM micrographs, EBSD-derived maps, or binary masks, and makes no assumption about how the field was formed. Implementation: a single deterministic kernel (provided).

2. Classification: competitive, interpretable, intrinsic

On UHCSDB (DeCost & Holm; 7 classes), evaluation must respect specimen structure, the 961 micrographs derive from only 43 parent specimens, so per-image cross-validation leaks specimens across folds. Under GroupKFold by specimen, STK-50 scores 0.832 macro AUC versus HOG-7200 (0.728) and Haralick-196 (0.629). The win is not feature-compactness: STK also beats the dimensionality-matched PCA-50 union of S2+Haralick+HOG by +0.11, and is structural rather than an acquisition signature, removing all tonal, spectral, and edge channels leaves it at 0.829.

Why STK wins: a mechanism, not just a score. A single boundary-density coordinate collapses into the radial two-point autocorrelation S2 (pre-registered test; STK claims no spatial-organization channel beyond S2). But S2 is the autocorrelation (|FFT|², phase discarded), so it saturates the spatial axis only to second order, distinct microstructures can share one S2, which is precisely why the field also resorts to lineal-path and cluster-correlation functions. STK's connectivity/topology coordinates are higher-order spatial information S2 cannot hold by construction: conditioned on an S2 descriptor they add +0.042, comparable to the +0.038 from texture/orientation. This is the structural reason for the AUC gap, not merely a restatement of it: the instrument's edge is multichannel, partly higher-order-spatial content (texture, orientation, topology) that the incumbents, even pooled, cannot encode at second order.

3. Cross-dataset generalization

STK generalizes beyond UHCSDB across further datasets and a second task type, all leakage-controlled:

Honest: consistent positive signal across four datasets / two tasks; a second steel SEM set independently reproduces the classification win. Only UHCSDB is strongly powered, and bainite has 26 specimens, breadth rests partly on small-N sets. (A copper grain-boundary-engineering set was excluded for an acquisition confound: reference and treated maps used different detectors and step sizes.)

4. Making subjective labels measurable

Dataset	Task	Result	Power
UHCSDB	7-class classification	0.832 macro AUC (GroupKFold by specimen)	strong (961 imgs / 43 specimens)
Bainite (steel SEM)	3-class M-A-island subclass	0.964 macro AUC (GroupKFold by specimen)	moderate (386 tiles / 26 specimens)
Ti-6Al-4V (18 conditions)	rolling-temperature regression	R²=0.41, Spearman 0.59 (GroupKFold by map)	moderate (18 maps)
AlSi10Mg (8 states)	as-built vs annealed	AUC=1.0 (GroupKFold by map)	clean but small (8 maps)

Microstructure "classes" are often continua forced into categories by expert judgment. STK provides a reproducible continuous position that (1) places mixed classes between their pure parents, and (2) flags label ambiguity: ~40% of samples sit closer to another class's centroid than to their own label. The robust finding is the structure of this ambiguity, it is concentrated in the mixed classes and falls along structurally-adjacent class pairs, not at random, rather than the exact percentage, which depends on the distance metric and centroid definition. The pattern, not the point estimate, is the claim.

Interpretability (taxonomy alignment).

The coordinates that discriminate each class are the ones a metallurgist would name: pearlite (lamellar) ← orientation entropy (single-coordinate one-vs-rest AUC 0.91); network (connected grain-boundary cementite) ← cohesion/connectivity (0.80); spheroidite (spherical particles) ← void-topology (0.78); martensite (fine lath) ← fine-scale texture. The mixtures are discriminated by intermediate values of those same axes, the fuzziness result from the other side. STK is interpretable and taxonomy-aligned, not a black box.

External mechanism check (cross-domain, bounded).

The in-materials validation above uses HOG and Haralick, different algorithms on the same pixels, not independent observers, so it shows reproducibility, not observer-independence, which materials data cannot establish (no inter-observer microstructure archives exist). We therefore tested the mechanism against a dataset that has what materials lacks: Gleason 2019, six pathologists annotating prostate H&E at the pixel level. Using the identical kernel and centroid-margin ambiguity, STK's ambiguity converges with the six-reader disagreement field at core-level ρ = 0.26 (95% CI [0.12, 0.38]), after stain normalization and cluster-respecting aggregation, a modest but genuinely observer-independent convergence. This is cross-domain support for the construct, not materials evidence; the materials-native gold standard remains open.

5. Field-governed property test: 2D permeability

The sharpest test of whether STK reads property-relevant structure is a property whose mechanism lives in field/network organization. We use 2D permeability of binary porous microstructures (lattice-Boltzmann ground truth; 24,000-sample set), chosen because it is genuinely 2D-governed (STK's 2D reading is dimensionally matched, not a slice of a 3D process) and because porosity-plus-connectivity is exactly the mechanism class STK's island/resistance coordinates target. Scope on N: STK is ~0.5–2.7 s/image on these pore fields, so we did not exhaust the full set, all R² are held-out on a 2,000-train / 2,000-test subset (bootstrap on n_test=2,000), and the scale ablation uses N=500. R² is on log mean-diagonal permeability.

predictor	R² (linear)	R² (GBM)
porosity only (Kozeny-Carman lowest-order)	0.920	0.914
classical incumbent (porosity + specific-surface + Euler + S2)	0.937	0.920
strengthened incumbent (+ components/skeleton/throat/distance-transform)	0.939	0.921
STK-50	0.939	0.928
strengthened incumbent + STK	0.940	0.932

Competitive surrogate, not additive. STK-50 alone reaches the classical-descriptor tier, matching porosity+S2+Euler and a connectivity-aware strengthened incumbent (0.939), and beating porosity-alone (0.920), so a general grayscale instrument autonomously recovers the small connectivity residual past porosity (~0.002 R²; this dataset is porosity-dominated). Concatenated with the incumbent, a bootstrap (n=2,000, 2,000 resamples) shows STK adds 0 linearly (ΔR² +0.001, 95% CI [−0.001, +0.003]) and only a marginal nonlinear sliver (GBM +0.011, 95% CI [+0.005, +0.016]), small, real, not a new independent channel.

The ceiling, conceded: a task-trained ConvNeXt-Small reaches R²≈0.995. That gap is expected, the target is a noiseless deterministic function of the exact pixels, so a high-capacity network with 24k examples approximates the solver to arbitrary precision, while a fixed 50-coordinate summary compresses away the fine throat geometry that sets the last ~6%. STK's claim is the generalist one: one fixed interpretable coordinate set at classical-descriptor tier across permeability and classification and label-fuzziness, not a single-task property specialist. STK is also not computationally cheaper than the simple topology descriptors here; its edge is compactness, interpretability, and multi-task unification.

6. Scope and limits

Established	Not established
Grouped (specimen-safe) classification beyond classical descriptors and their union	Property-prediction superiority over incumbents (competitive, not additive)
Metallurgical interpretability / taxonomy alignment	Independent materials inter-observer (disagreement) validation
Classical-descriptor-tier permeability prediction, scale-robust by direct ablation	A computational-speed advantage over simple topology descriptors
Proxy-level measurable label-ambiguity (mechanism checked cross-domain)	A universal materials structure→property law; matching a task-trained CNN ceiling

Limitations, gathered. The permeability result is on a synthetic, noiseless distribution and a 2,000-train/2,000-test subset (not the full 24k), where a task-trained CNN reaches ~0.995 that no summary descriptor approaches; the label-ambiguity contribution is proxy-validated in materials with no native inter-observer gold standard, its mechanism checked only cross-domain (Gleason, ρ=0.26); STK is not computationally cheaper than simple topology descriptors; and cross-lab breadth rests on one strongly-powered dataset (UHCSDB), with the bainite replication at 26 specimens. None of these is fatal to the stated claims, but each bounds them.

7. Conclusion

STK is a portable, deterministic, interpretable structural measurement layer for microstructure: one fixed 50-coordinate representation that operates at incumbent-descriptor tier across phase classification, a field-governed physical property, and label-ambiguity quantification, each claim bounded, and each boundary mapped by direct test rather than asserted. It does not beat task-trained models on their home turf, it does not recover a single grain's size, and it does not encode the fine pore geometry that sets a transport property's last few percent. Knowing precisely where it stops is part of the scientific result: STK is not a magic residual-finder but a unified, auditable interface that replaces a toolbox of siloed, per-task descriptors with one transparent, reproducible representation, with a now-characterized ceiling. The structure of a material, read deterministically and interpretably, is enough to classify it, to estimate a connectivity-governed property at descriptor tier, and to say where its own labels are soft, from the same fifty numbers.

8. Methods and reproducibility

STK is implemented as a single deterministic kernel (vtl_materials_kernel.py, a legacy filename retained for provenance). Classification uses balanced logistic regression under GroupKFold by specimen; macro one-vs-rest AUC. Incumbent descriptors (HOG, Haralick, S2) are computed on identical images. Permeability uses linear and gradient-boosted regression on log mean-diagonal permeability, fit on train and evaluated held-out on test; ΔR² confidence intervals by test-set bootstrap (2,000 resamples). The cross-domain mechanism check (Gleason 2019) tiles TMAs, builds a six-reader disagreement field, applies Macenko stain normalization, and aggregates at the core level with a cluster bootstrap. Every headline number was re-derived from raw data during a close-out validation pass; all code, intermediate features, figures, and a hashed manifest (0 stale, 0 missing across 92 tracked artifacts) accompany this paper.

References (selected)

Draft: exploratory; statistical claims are bounded as stated. Correspondence: Russell Parrish, Parallax Metrology.

Appendix A: Transparency log

In keeping with the project's standing discipline, pre-register predictions and report them even when they invert, document discards and nulls alongside wins, trust a clean number least, and keep a hashed trail at zero stale, this appendix records the corrections the result survived, the provenance and sample sizes behind each headline, and the reproducibility state. The result is offered not as a clean arc but as one that held up under its own audits.

A.1 Corrections the work survived

A.2 Provenance and sample sizes

A.3 Reproducibility

Stage	First read	Correction, after audit
UHCSDB classification	0.92 macro AUC (per-image CV)	0.832 grouped-by-specimen, the per-image split leaked specimens (961 micrographs from 43 parents); an external review flagged it.
"Beyond S2"	+0.164 AUC beyond S2	retired as near-tautological (50-dim vs 1-channel); replaced by the incumbent-union benchmark and the second-order argument (§2).
Fatigue property (RPData)	"crystallographic, wrong channel"	falsified by feature audit → the property is grain-area-dominated (a per-object scalar); STK is a field instrument, so this was the wrong test, recorded as a documented null.
Permeability additivity	"adds nothing, ΔR²≈0"	bootstrap corrected it: 0 linearly (CI includes 0) but +0.011 GBM (CI excludes 0) → "marginal nonlinear residual," not zero.
Permeability resolution	reasoned the under-resolution was immaterial	checked directly, a pre-declared 256-px ablation confirmed the percolation coordinates were degenerate at 128 px and wake up at 256 px, with no material change in residual → scale-robust by test, not assumption.

Result	Data	N	Ground truth
Classification 0.832	UHCSDB (DeCost & Holm)	961 imgs / 43 specimens, GroupKFold-5	expert phase labels
Bainite 0.964	Figshare 19242903 (test split)	386 tiles / 26 specimens, GroupKFold-5	round-robin consensus labels
Permeability 0.939	Zenodo 17711512 (2D binary porous)	2,000 train / 2,000 test of 24k; bootstrap n=2,000	lattice-Boltzmann (simulated, noiseless)
Scale ablation	same, 128 vs 256 px	N=500	same
Ambiguity mechanism	Gleason 2019 (6 pathologists)	1,881 tiles / 244 cores	six-reader pixel annotations

Every headline number was re-derived from raw data in a close-out validation pass, classification AUCs recomputed from the feature matrix and kernel, the permeability ΔR² bootstrapped (2,000 resamples). A hashed manifest tracks 92 artifacts (code, intermediate features, figures, documents) at 0 stale / 0 missing. The deterministic kernel, batch drivers, analysis scripts, and manifest accompany this paper. The kernel file is named vtl_materials_kernel.py for provenance, the instrument was developed under the author's broader "VTL" (Visual Thinking Lens) program before the materials instrument was named STK, and the code release's README leads with this rename so the artifact is not mistaken for a different tool.

Nulls and dead ends recorded beside the wins: a per-object property (RPData fatigue protrusion) where STK correctly does not apply; a chromatic pathology kernel that did not beat the grayscale kernel on label-disagreement; and the property-prediction ceiling set by a task-trained CNN. These bound the contribution and are part of it.

(c) 2026 Russell Parrish / Parallax Metrology. All rights reserved. No part of this system, visual material, or accompanying documents may be reproduced, distributed, or transmitted in any form or by any means, including AI training datasets, without explicit written permission from the creator. www.parallaxmetrology.com | www.artistinfluencer.com | ORCID: 0009-0008-9781-79