Structural Topology Kernel — Multi-Task Structural Descriptor for Materials Microstructure

01 — The Kernel

Nine measurement layers.
Fifty coordinates.

STK is a fixed, deterministic operator: it maps a grayscale field to 50 real-valued coordinates with no training, no tunable per-dataset parameters, and no stochasticity. It reads structure in any grayscale field — SEM micrographs, EBSD-derived maps, binary masks — and makes no assumption about how the field was formed.
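As a minimal sketch of what "fixed, deterministic operator" means in practice, the toy function below maps a grayscale field to a fixed-length tuple with no fitted parameters and no randomness. The three coordinates (`void_ratio`, `centroid_offset`, `peripheral_mass`) and their definitions are hypothetical stand-ins chosen for illustration; they are not STK's actual coordinate definitions.

```python
import math


def toy_descriptor(field):
    """Map a 2D grayscale field (list of rows, values in [0, 1]) to 3 numbers.

    Hypothetical illustration only: these are not STK's coordinate definitions.
    """
    h, w = len(field), len(field[0])
    total = sum(sum(row) for row in field)

    # Void ratio: fraction of pixels darker than a fixed threshold of 0.5.
    void_ratio = sum(v < 0.5 for row in field for v in row) / (h * w)
    if total == 0:
        return (void_ratio, 0.0, 0.0)

    # Intensity-weighted centroid of the field.
    cy = sum(y * v for y, row in enumerate(field) for v in row) / total
    cx = sum(x * v for row in field for x, v in enumerate(row)) / total

    # Centroid offset from the geometric center, normalized by the half-diagonal.
    gy, gx = (h - 1) / 2, (w - 1) / 2
    half_diag = math.hypot(gy, gx) or 1.0
    centroid_offset = math.hypot(cy - gy, cx - gx) / half_diag

    # Peripheral mass: share of total intensity lying on the border pixels.
    border = sum(
        v
        for y, row in enumerate(field)
        for x, v in enumerate(row)
        if y in (0, h - 1) or x in (0, w - 1)
    )
    return (void_ratio, centroid_offset, border / total)


field = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
coords = toy_descriptor(field)
# Same input, same output: nothing is trained and nothing is sampled.
assert coords == toy_descriptor(field)
```

Because every constant is fixed in the code, the output length and meaning never depend on the dataset, which is the property the text attributes to the full 50-coordinate kernel.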

Structural Topology Kernel measurement pipeline

STK measurement pipeline: nine coordinate layers applied to a grayscale structural field, producing 50 fixed real-valued outputs. The same pipeline feeds classification, permeability regression, and ambiguity quantification heads — no per-task retraining.

Nine Measurement Layers

What the kernel
actually reads.

Layer Name What It Measures

Geometry

Centroid offset, void ratio, cohesion, radial mass distribution, peripheral mass, orientation entropy, structural diversity. The global spatial organization of mass in the field.

Radial Compliance

Center-to-edge structural change. How the field's organization shifts from its core to its periphery — a directional gradient in structural density.

Contour / VCLI-G

Boundary wander, void topology, contour curvature variation, orientation entropy + scale slope, structural coherence. Reads the geometry of phase boundaries and their complexity.

Resistance Graph

Multi-island resistance graph: disconnected structural domain count, size heterogeneity, spatial spread, percolation resistance, drain confidence, topological connectivity. The higher-order spatial information that the two-point correlation function S2 cannot hold.

GLCM Texture

Gray-level co-occurrence matrix texture at fine (cellular) and coarse (colony) scale. Reads the statistical texture of the field at two physically distinct length scales.

Blob Microstructure

Particle count, density, size statistics. Reads individual structural features — spheroidite particles, pores, grain inclusions — as discrete objects rather than continuous fields.

FFT

Spectral / Edge

FFT band power, edge density, line-edge roughness, gap field. Reads periodic structural organization and boundary sharpness at different spatial frequencies.

Tonal Mass

Intensity by tone. Reads the distribution of gray levels across the field. Flagged as acquisition-sensitive — removing it leaves classification performance at 0.829 (vs. 0.832 full kernel).
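To make the two-scale GLCM idea concrete, here is a small sketch that builds a symmetric gray-level co-occurrence matrix for a given pixel offset and returns its contrast. Using a short offset for the "fine" scale and a longer one for the "coarse" scale is an assumption made for illustration; the offsets, gray-level quantization, and texture statistics STK actually uses are not stated in the text.

```python
def glcm_contrast(img, levels, dx, dy=0):
    """Contrast of the normalized, symmetric GLCM for pixel offset (dx, dy).

    img is a list of rows of integer gray levels in range(levels).
    """
    h, w = len(img), len(img[0])
    counts = [[0] * levels for _ in range(levels)]
    pairs = 0
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                a, b = img[y][x], img[y2][x2]
                # Count the pair in both directions so the matrix is symmetric.
                counts[a][b] += 1
                counts[b][a] += 1
                pairs += 2
    if pairs == 0:
        return 0.0
    weighted = sum(
        counts[i][j] * (i - j) ** 2 for i in range(levels) for j in range(levels)
    )
    return weighted / pairs


# Vertical stripes with a period of two pixels.
stripes = [[0, 1, 0, 1] for _ in range(4)]
fine = glcm_contrast(stripes, levels=2, dx=1)    # neighbors always differ
coarse = glcm_contrast(stripes, levels=2, dx=2)  # two-pixel pairs always match
```

The same image gives different contrast at the two offsets, which is why a texture layer read at two length scales can separate structures that look identical at one of them.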

What the STK kernel reads — material field types

What STK reads: ultrahigh-carbon-steel phases (pearlite, spheroidite, network, martensite), bainite M-A-island subclasses, synthetic 2D porous media at low and high porosity, and a hematoxylin-eosin histology tile for the cross-domain mechanism check. The same fixed 50-coordinate kernel reads all of them.

02 — Validation

Four datasets.
Two task types.
Same fifty coordinates.

UHCSDB · 961 SEM images · 43 parent specimens · GroupKFold

0.832 macro AUC — beats incumbents and their dimensionality-matched union.

On the UHCSDB ultrahigh-carbon-steel benchmark, STK-50 scores 0.832 macro AUC under GroupKFold by specimen, the evaluation that respects specimen structure, since the 961 images derive from only 43 parent specimens. The win is not a matter of feature compactness: STK also beats the dimensionality-matched PCA-50 union of S2+Haralick+HOG by +0.11. Removing all tonal, spectral, and edge channels leaves STK at 0.829, so the gain is structural, not an acquisition fingerprint.
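The grouped evaluation described above can be sketched with a small splitter that keeps every image from one specimen on the same side of each split. This is an illustrative stdlib version with a greedy balancing rule chosen here for simplicity; in practice scikit-learn's `GroupKFold` is the usual tool, and the exact fold assignment used for the reported numbers is not given in the text.

```python
from collections import defaultdict


def grouped_folds(groups, n_splits):
    """Yield (train_idx, test_idx) pairs in which no group spans both sides."""
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)

    # Greedy balance: place the largest groups first into the smallest fold.
    folds = [[] for _ in range(n_splits)]
    for g in sorted(by_group, key=lambda g: (-len(by_group[g]), str(g))):
        min(folds, key=len).extend(by_group[g])

    for k in range(n_splits):
        test = sorted(folds[k])
        train = sorted(i for j, f in enumerate(folds) if j != k for i in f)
        yield train, test


# Eight images cut from four parent specimens (hypothetical labels).
specimen = ["A", "A", "A", "B", "B", "C", "D", "D"]
for train, test in grouped_folds(specimen, n_splits=2):
    train_specimens = {specimen[i] for i in train}
    test_specimens = {specimen[i] for i in test}
    assert not (train_specimens & test_specimens)  # no specimen leaks across
```

A per-image split would let tiles from the same specimen appear in both train and test, which is the leakage that the grouped protocol removes.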

STK-50 grouped AUC 0.832

HOG (7200 features) 0.728

Haralick (196 features) 0.629

PCA-50 incumbent union 0.721
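For readers unfamiliar with the metric, the sketch below computes a macro AUC as the unweighted mean of one-vs-rest AUCs, with each binary AUC taken as the probability that a positive sample outscores a negative one. Treating the reported macro AUC as one-vs-rest is an assumption; the text does not state the multi-class scheme used.

```python
def binary_auc(scores, positives):
    """AUC as P(positive outranks negative); ties count as one half."""
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(score_rows, labels, classes):
    """Unweighted mean of one-vs-rest AUCs; score_rows[i][k] scores classes[k]."""
    aucs = []
    for k, c in enumerate(classes):
        class_scores = [row[k] for row in score_rows]
        is_class = [y == c for y in labels]
        aucs.append(binary_auc(class_scores, is_class))
    return sum(aucs) / len(aucs)


# Toy example with three classes and per-class scores for three samples.
classes = ["pearlite", "spheroidite", "network"]
labels = ["pearlite", "spheroidite", "network"]
scores = [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]
result = macro_auc(scores, labels, classes)
```

Because each class contributes equally to the mean, rare classes weigh as much as common ones, which matters on an imbalanced benchmark.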

Bainite · 386 tiles · 26 specimens · 3-class M-A-island subclassification

0.964 AUC on a second steel dataset — classification win reproduces independently.

STK scores 0.964 macro AUC on bainite M-A-island subclassification under GroupKFold by specimen. The dataset was collected independently, with a different detector, a different steel system, and a different task, yet the same fixed 50 coordinates produce a strong classification result. The UHCSDB win is not dataset-specific. Breadth caveat: 26 specimens give only moderate statistical power; the result is consistent but not as strongly powered as UHCSDB.

Bainite macro AUC (grouped) 0.964

Specimens 26 (GroupKFold)

Task 3-class M-A-island subclass

Same kernel? yes — no retraining

2D Porous Media · Zenodo 17711512 · 2,000-train / 2,000-test · Lattice-Boltzmann ground truth

R²=0.939 — classical-descriptor tier on a fair field-governed property test.

2D permeability of binary porous microstructures is the sharpest test of whether STK reads property-relevant structure: it is 2D-governed (matching STK's 2D reading), and the mechanism is percolation and connectivity, exactly the coordinate class STK's island/resistance layer targets. STK-50 alone matches both the classical combination of porosity, specific surface, Euler number, and S2 and a connectivity-aware strengthened incumbent. The task-trained ConvNeXt ceiling (~0.995) reflects fine pore geometry that no 50-coordinate summary descriptor captures; that gap is expected and conceded.
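The percolation and connectivity mechanism mentioned above can be illustrated with a short connected-component pass over a binary pore mask. This sketch assumes 4-connectivity, pore pixels encoded as 1, and flow from the left edge to the right edge; none of these conventions is specified in the text, and this is not STK's resistance-graph computation.

```python
from collections import deque


def pore_clusters(mask):
    """Return the 4-connected clusters of pore pixels (mask[y][x] == 1)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    clusters = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited pore pixel.
            seen[y][x] = True
            queue, cells = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                cells.append((cy, cx))
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            clusters.append(cells)
    return clusters


def percolates_left_to_right(mask):
    """True if one pore cluster touches both the left and the right edge."""
    w = len(mask[0])
    for cells in pore_clusters(mask):
        columns = {x for _, x in cells}
        if 0 in columns and w - 1 in columns:
            return True
    return False


open_channel = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
blocked = [[1, 0, 1], [1, 0, 1], [1, 0, 1]]
```

Both masks have substantial pore area, but only `open_channel` has a spanning cluster, which is the kind of information a porosity-only descriptor cannot express.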

STK-50 R² (held-out) 0.939

Classical incumbent (porosity+S2+Euler) 0.937

Porosity-only (Kozeny-Carman) 0.920

Task-trained ConvNeXt ceiling ~0.995
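As a reminder of what the R² figures above measure, the sketch below computes the coefficient of determination on held-out predictions. Whether the reported values were computed on raw or log-transformed permeability is not stated in the text, so the inputs here are simply assumed to be in whatever target space the regression used.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the targets are constant")
    return 1 - ss_res / ss_tot


# Predicting the mean for every sample explains none of the variance.
baseline = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
# Perfect predictions explain all of it.
perfect = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

Read this way, the step from 0.939 to ~0.995 is the share of target variance that remains unexplained by the summary descriptor but is recovered by the task-trained network.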

Label Ambiguity · UHCSDB Mixed Classes · Cross-domain: Gleason 2019 (6 Pathologists)

~40% of samples measurably closer to another class — concentrated in structural mixtures.

Microstructure "classes" are often continua forced into categories. STK provides a reproducible continuous position that places mixed classes between their pure parents and flags label ambiguity. The robust finding is the structure of this ambiguity: it is concentrated in mixed classes (56–68% ambiguous) and falls along structurally-adjacent class pairs, not at random. STK's ambiguity margin predicts the misclassifications of independent descriptors (HOG-error AUC 0.604, Haralick-error AUC 0.645) — the samples STK flags are genuinely hard for other methods too. Cross-domain mechanism check: ρ=0.26 convergence with six-pathologist disagreement on H&E (Gleason 2019).

Ambiguity in mixed classes 56–68%

Ambiguity in structurally distinct classes 28–34%

HOG-error prediction AUC 0.604

Cross-domain ρ (Gleason 2019) 0.26 [0.12, 0.38]
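One simple way to express "measurably closer to another class" is a centroid margin: the distance to the nearest other class centroid minus the distance to the sample's own class centroid. The sketch below uses that definition with Euclidean distance as an assumption for illustration; the text does not define STK's ambiguity margin, and the function names are hypothetical.

```python
import math


def class_centroids(points, labels):
    """Mean coordinate vector per class label."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for d, v in enumerate(p):
            acc[d] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def ambiguity_margin(point, label, centroids):
    """Nearest-other-centroid distance minus own-centroid distance.

    A negative margin means the sample sits closer to another class.
    """
    own = math.dist(point, centroids[label])
    other = min(math.dist(point, c) for y, c in centroids.items() if y != label)
    return other - own


# Two classes in a 2D coordinate space; the last "a" sample lies near class "b".
points = [(0, 0), (0, 1), (10, 0), (10, 1), (9, 0)]
labels = ["a", "a", "b", "b", "a"]
cents = class_centroids(points, labels)
margins = [ambiguity_margin(p, y, cents) for p, y in zip(points, labels)]
ambiguous_fraction = sum(m < 0 for m in margins) / len(margins)
```

A continuous margin of this kind can then be compared against other descriptors' errors or against inter-observer disagreement, as the validation above describes.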

STK benchmark comparison and specimen-leakage correction

Left: honest grouped-by-specimen benchmark. STK-50 (0.832) beats HOG, Haralick, and their dimensionality-matched union, and holds with acquisition channels removed (0.829). Right: leakage correction. Per-image CV (grey) inflated every method by ~0.06–0.09; grouped-by-specimen (red) is the honest number.

STK structural space and label ambiguity by class

Left: STK space (2D projection) — pure classes anchor the corners; every mixed class sits between its pure parents (a measured continuum). Right: label ambiguity by class — concentrated in the mixtures (56–68%) vs. the structurally distinct classes (28–34%).

Cross-Dataset Generalization

Consistent positive signal
across four datasets.

Dataset | Task | Result | Power

UHCSDB (DeCost & Holm) | 7-class classification | 0.832 AUC (grouped) | Strong · 961 imgs / 43 specimens
Bainite SEM (Figshare 19242903) | 3-class M-A subclass | 0.964 AUC (grouped) | Moderate · 386 tiles / 26 specimens
Ti-6Al-4V (18 conditions) | Rolling-temp. regression | R²=0.41, ρ=0.59 | Small · 18 maps (grouped)
AlSi10Mg (8 states) | As-built vs. annealed | AUC = 1.0 (grouped) | Small · 8 maps (clean)
2D Porous Media (Zenodo 17711512) | Permeability regression | R² = 0.939 | Strong · 2k train / 2k test

STK permeability benchmark — all descriptor tiers and scale ablation

A: all summary descriptors cluster at the classical tier (~R²=0.94); the gap to a task-trained ConvNeXt (0.995) is fine pore geometry no summary descriptor captures. B: scale ablation — STK's percolation coordinates are degenerate at 128 px and wake up on upsampling to 256 px. C: bootstrap of STK's residual over the strengthened incumbent — zero linearly, a marginal nonlinear sliver under GBM (+0.011).
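The bootstrap in panel C can be sketched as a percentile interval on the mean of per-sample or per-fold gains of one model over another. The resampling unit, the number of resamples, and the 95% level below are assumptions made for illustration; the text does not describe how the bootstrap was set up.

```python
import random


def bootstrap_mean_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of values."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lo_idx = int(n_boot * alpha / 2)
    hi_idx = n_boot - 1 - lo_idx
    return means[lo_idx], means[hi_idx]


# Hypothetical per-fold gains of a combined model over the incumbent alone.
gains = [0.012, -0.004, 0.009, 0.015, -0.002, 0.011, 0.007, 0.003]
low, high = bootstrap_mean_ci(gains)
# If the interval contains zero, the gain is not distinguishable from none.
contains_zero = low <= 0.0 <= high
```

An interval that straddles zero corresponds to the "zero linearly" reading in the caption, while a narrow interval just above zero corresponds to a marginal gain.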

The claim boundary, stated directly.

STK does not beat task-specialized methods on any single benchmark; its value is that one fixed coordinate set operates at incumbent tier across tasks normally addressed with separate, non-overlapping pipelines. The permeability result is on a synthetic, noiseless distribution and a 2,000/2,000 subset; a task-trained ConvNeXt reaches ~0.995, a level no 50-coordinate summary approaches, because the fine pore geometry that sets the last ~6% is compressed away. The label-ambiguity contribution is proxy-validated in materials, where there is no native inter-observer gold standard; the mechanism is checked cross-domain only (Gleason ρ=0.26). Cross-lab breadth rests on one strongly-powered dataset (UHCSDB) with bainite replication at 26 specimens. STK is not computationally cheaper than simple topology descriptors. The boundary is part of the contribution: STK is a portable structural measurement layer with a now-characterized ceiling, not a magic residual-finder.

+0.11 AUC above
dimensionality-matched
incumbent union

0.829 AUC without
acquisition-sensitive
tonal channels

not a specialist
generalist tier
across tasks,
not task-winner

03 — Position

One interface
where the field uses
three toolboxes.

Siloed per-task descriptors answer

Which method wins benchmark X?

Materials informatics typically uses HOG and Haralick for phase classification, porosity and Euler number for property regression, and manual expert review for label quality auditing. Each is optimized for its task. None can, even in principle, report where its own labels are soft or why a property estimate is uncertain — because the descriptor that classifies and the procedure that audits are different objects computed from different pixels.

STK answers

What is the structural content of this field, across tasks?

Because STK's classification, property, and ambiguity readouts share the same fifty coordinates, a model's uncertainty on one task is legible as structural complexity in the others. The ambiguity STK flags in classification is the same margin that predicts other descriptors' errors — measured on the identical axes. This is not a packaging convenience: unification is what makes the auditing possible, and it is precisely what siloed specialists give up.

The practical consequence is a single transparent, auditable, reproducible interface — one fixed coordinate set that replaces a toolbox of siloed per-task descriptors with a representation that is easier to validate and debug, and where a model's uncertainty can be read as structural complexity rather than discarded as noise. STK is a portable structural measurement layer, not a task-specific winner.

One representation.
Three tasks.
No training.

Read the
complete work.
