VTL v3.1 + PTD-Z · Semiconductor Inspection
A physics-grounded, modality-agnostic structural coordinate framework for semiconductor inspection imagery: fifteen coordinates derived from image-formation first principles, with no learned parameters and no domain-specific retraining.
The coordinate system does not fit domain-specific statistics. It tracks changes in visual organization consistent with how each modality physically forms images.
VTL is a 15-coordinate image descriptor derived from image-formation first principles rather than fitted to data. Its coordinates include two established lithography metrology quantities, NILS (Normalized Image Log Slope, the standard edge-sharpness measure) and LER (Line Edge Roughness, the standard stochastic-noise metric for printed edges), extended into image-domain structural topology. The framework is predictive rather than descriptive: the math precedes the application. That makes transfer a test. A domain-fitted descriptor cannot move to a new imaging modality without domain-specific modification; a physics-derived one can. A physics-derived descriptor also makes predictions before the data arrive: LER is identically zero on coherent simulated aerial images, because a deterministic simulation contains no stochastic edge variation, and this was confirmed before any real data was examined. That is not a classification result. It is a physics prediction.
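The two metrology quantities have standard textbook forms: NILS is the feature width times the slope of ln I at the printed edge, and LER is commonly reported as 3σ of edge-position deviation along a line. The sketch below is not the VTL v3.1 code; the function names, the sinusoidal aerial profile, and the synthetic sigmoid edge are assumptions chosen so the answers are checkable. It shows why a noise-free simulation gives LER of zero: every row has the same profile, so the extracted edge never moves.

```python
import numpy as np

def nils(x, intensity, edge_x, feature_width):
    """NILS = w * |d ln I / dx| evaluated at the nominal edge position."""
    dlnI_dx = np.gradient(intensity, x) / intensity  # d(ln I)/dx = I'/I
    return feature_width * abs(np.interp(edge_x, x, dlnI_dx))

def edge_positions(image, threshold=0.5):
    """Sub-pixel column of the first rising threshold crossing in each row."""
    positions = []
    for row in image:
        above = row >= threshold
        crossings = np.flatnonzero(~above[:-1] & above[1:])
        if crossings.size == 0:
            raise ValueError("row has no rising threshold crossing")
        i = crossings[0]
        # Linear interpolation between the two samples that straddle the threshold.
        positions.append(i + (threshold - row[i]) / (row[i + 1] - row[i]))
    return np.asarray(positions)

def ler_3sigma(edges):
    """LER as 3 sigma of edge positions after removing a best-fit straight line."""
    rows = np.arange(edges.size)
    slope, intercept = np.polyfit(rows, edges, 1)
    return 3.0 * np.std(edges - (slope * rows + intercept))

def sigmoid_edge_image(n_rows=256, n_cols=64, edge_col=32.3, blur=2.0, jitter=None):
    """Hypothetical synthetic vertical edge; per-row jitter mimics stochastic roughness."""
    cols = np.arange(n_cols)[None, :]
    offsets = np.zeros((n_rows, 1)) if jitter is None else np.asarray(jitter)[:, None]
    return 1.0 / (1.0 + np.exp(-(cols - edge_col - offsets) / blur))

# NILS for a dense line/space aerial image I(x) = a + b*cos(2*pi*x/p):
# the edge sits at x = p/4, the line width is p/2, and analytically NILS = pi * b / a.
pitch, a, b = 100.0, 0.5, 0.4
x = np.linspace(0.0, pitch, 2001)
aerial = a + b * np.cos(2 * np.pi * x / pitch)
print(nils(x, aerial, edge_x=pitch / 4, feature_width=pitch / 2))  # ~2.513 (= 0.8 * pi)

# LER: identical rows (deterministic simulation) versus rows with random edge jitter.
rng = np.random.default_rng(0)
print(ler_3sigma(edge_positions(sigmoid_edge_image())))  # zero to floating-point precision
print(ler_3sigma(edge_positions(sigmoid_edge_image(jitter=rng.normal(0.0, 0.5, 256)))))
```

Only the jittered image produces nonzero LER. Because a best-fit line is removed first, a tilted but perfectly straight edge would also score zero.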
PTD-Z extends VTL into a routed structural telemetry framework. The practical inspection problem is not only whether an image is unusual; it is whether the pattern has stopped behaving like the structure it was expected to be, and whether the unusual evidence can be translated into a reviewable structural statement. PTD-Z decomposes measurement into six route families, each carrying a process-facing hypothesis and a refusal condition for claims the evidence cannot yet support. Refusal is a first-class method component, not an afterthought. PTD-Z does not replace classical descriptors or inspection tools: it reorganizes image evidence into interpretable routes, tests how much survives classical baselines, and refuses process-causal claims without process-linked data.
Validation Overview
4,579 production SEM images from one semiconductor layer, six defect classes. VTL achieves 93.4% ± 3.4% balanced accuracy (5-fold CV), the highest among the compared descriptors: Haralick GLCM, HOG, LBP, Hu moments, and Zernike moments. In the independence audit, PTD-Z adds +0.0214 macro F1 over Classical_All. image_geometry is the most defensible residual route; pitch_phase shows the strongest independence profile.
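Balanced accuracy is the mean of per-class recall, so a classifier that ignores a rare defect class is penalized even when raw accuracy looks high; macro F1 averages per-class F1 with equal weight per class. The sketch below computes both from label lists and reports the kind of hybrid-minus-baseline delta quoted above. The labels and predictions are toy values, and the function names are illustrative, not taken from the VTL or PTD-Z code.

```python
from collections import Counter

def _counts(y_true, y_pred):
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    return tp, Counter(y_true), Counter(y_pred)

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true (chance level = 1/K)."""
    tp, n_true, _ = _counts(y_true, y_pred)
    return sum(tp[c] / n_true[c] for c in n_true) / len(n_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    tp, n_true, n_pred = _counts(y_true, y_pred)
    labels = set(n_true) | set(n_pred)
    total = 0.0
    for c in labels:
        precision = tp[c] / n_pred[c] if n_pred[c] else 0.0
        recall = tp[c] / n_true[c] if n_true[c] else 0.0
        total += 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return total / len(labels)

# Toy labels: a classical-only baseline and a hybrid (classical + structural routes).
y_true = ["a", "a", "b", "b", "c", "c"]
classical = ["a", "b", "b", "b", "c", "a"]
hybrid = ["a", "a", "b", "b", "c", "a"]

print(balanced_accuracy(y_true, classical), balanced_accuracy(y_true, hybrid))
print(macro_f1(y_true, hybrid) - macro_f1(y_true, classical))  # macro-F1 gain of the hybrid
```

In an actual independence audit the two prediction lists would come from cross-validated models with and without the added routes; the toy gain here has no relation to the reported +0.0214.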
7,015 MixedWM38 wafer maps (8-class): 94.9% ± 0.2% balanced accuracy, a 7.59× lift over Hu moments, which collapse to chance (12.5% for eight classes). WM-811K (8,763 maps, 9-class): 67.5% ± 1.0%, a ceiling consistent with a 45×48 px resolution constraint rather than class imbalance. Under null-label shuffling, accuracy collapses toward chance, confirming that the coordinates carry real class signal. A constrained logistic layer over the deterministic coordinates reaches 0.8859 macro F1, vs. 0.5609 for the hand-built grammar alone.
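The null-label control can be reproduced in miniature: shuffle the labels, re-fit, and check that balanced accuracy falls to roughly 1/K for K classes. The sketch below uses a nearest-centroid classifier on synthetic 15-dimensional clusters with a single holdout split; the classifier, data, and split are stand-ins, not the models or folds used in the audits above.

```python
import numpy as np

def fit_centroids(X, y):
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def predict(model, X):
    classes, centroids = model
    sq_dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[sq_dist.argmin(axis=1)]  # nearest centroid wins

def mean_class_recall(y_true, y_pred):
    """Balanced accuracy: mean per-class recall."""
    return float(np.mean([(y_pred[y_true == c] == c).mean() for c in np.unique(y_true)]))

def holdout_balanced_accuracy(X, y, rng, train_frac=0.5):
    idx = rng.permutation(len(y))
    n_train = int(train_frac * len(y))
    train, test = idx[:n_train], idx[n_train:]
    model = fit_centroids(X[train], y[train])
    return mean_class_recall(y[test], predict(model, X[test]))

# Synthetic stand-in: 8 classes, 15 coordinates per sample.
rng = np.random.default_rng(0)
n_classes, n_per_class, n_coords = 8, 100, 15
centers = rng.normal(0.0, 3.0, size=(n_classes, n_coords))
y = np.repeat(np.arange(n_classes), n_per_class)
X = centers[y] + rng.normal(0.0, 1.0, size=(y.size, n_coords))

real = holdout_balanced_accuracy(X, y, rng)
null = holdout_balanced_accuracy(X, rng.permutation(y), rng)  # labels no longer tied to X
print(f"real labels: {real:.3f}  shuffled labels: {null:.3f}  chance: {1 / n_classes:.3f}")
```

The same chance level explains the lift figure: with eight classes, chance balanced accuracy is 1/8 = 12.5%, and 94.9% / 12.5% ≈ 7.59.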
NIST SEM degradation (3,402 images): PTD-Z envelope breach ratio of 0.9477. The envelope norm tracks degradation, with Spearman ρ = −0.9485 against SSIM and ρ = −0.9184 against U-Net evaluation Dice. image_geometry and orientation_topology show stronger Dice relationships than residual_only, which argues that topology drift is not reducible to residual differencing alone. Aerial lithography simulation (mds2-3838, 3,402 images): LER is identically zero under coherent defocus, a physics prediction confirmed before real data was examined. All 15 VTL v3.1 coordinates are contrast-immune (R² versus contrast < 0.10).
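Spearman's ρ is the Pearson correlation of rank-transformed values, so it measures monotonic agreement without assuming linearity; a value near −1 means that as the envelope norm rises, SSIM (or Dice) falls almost monotonically. A dependency-free sketch with average ranks for ties follows; the toy values are illustrative, not data from the NIST audit.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def spearman(a, b):
    """Spearman rho = Pearson correlation of the rank-transformed samples."""
    return pearson(average_ranks(a), average_ranks(b))

# Toy values: a larger envelope norm accompanies a lower SSIM.
envelope_norm = [0.1, 0.4, 0.2, 0.9, 0.7]
ssim = [0.98, 0.80, 0.95, 0.40, 0.55]
print(spearman(envelope_norm, ssim))  # -1.0: perfectly inverse ordering
```

For production use, `scipy.stats.spearmanr` computes the same statistic along with a p-value.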
No public dataset in this work provides direct fab process-cause labels such as focus, dose, etch, chamber, maintenance, or overlay. Carinthia is real semiconductor SEM defect imagery, but it is a defect-class dataset, not a known-cause process sequence. PTD-Z intentionally refuses process-causal language when that evidence is absent. The strongest current claim is residual organizational signal: PTD-Z adds information beyond Classical_All in both the semiconductor-specific Carinthia audit and the broader NFFA-Europe SEM morphology audit. PTD-Z should not replace classical descriptors; hybrid systems outperform either alone. The claim is route grammar and residual interpretable signal, not descriptor replacement or proof of fab root cause. The next decisive artifact is a physical-sequence validation packet: 30–100 through-focus SEM images from a shared nanofab, metrology lab, or process partner, with known focus/Z offsets.
A fine-tuned CNN or foundation-vision model may outperform VTL on raw Carinthia defect classification; this paper does not test that question and makes no claim of classification superiority over deep learning. Trained classifiers achieve strong accuracy but require per-domain training data, cannot explain which physical properties drove a prediction, and do not transfer to new imaging modalities without retraining. They produce a verdict, not a physical reading.
Continuous structural coordinates derived from image-formation physics. The same 15 coordinates transfer across SEM, wafer maps, and aerial imagery without retraining because they track physical image organization, not domain statistics. PTD-Z adds route decomposition: selected route, runner-up, margin, process hypothesis, and refusal condition. If a classical descriptor explains the signal better, the system reports that outcome rather than hiding it.
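One way to make that output concrete is a small record per image. The sketch below is a hypothetical illustration, not the PTD-Z implementation: the route scores, the `min_margin` threshold, the `process_linked` flag, the `decide_route` name, and the refusal wording are all assumptions. It shows the intended behavior: the runner-up and margin are always reported, and the hypothesis is either withheld or explicitly marked non-causal when the evidence does not support a stronger statement.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

@dataclass(frozen=True)
class RouteDecision:
    selected: str
    runner_up: str
    margin: float                # score gap between selected route and runner-up
    hypothesis: Optional[str]    # process-facing hypothesis, or None if withheld
    refusal: Optional[str]       # why a stronger claim is not made, or None

def decide_route(scores: Mapping[str, float], hypotheses: Mapping[str, str],
                 min_margin: float = 0.1, process_linked: bool = False) -> RouteDecision:
    """Hypothetical routing rule: rank routes, report the margin, refuse when unsupported."""
    if len(scores) < 2:
        raise ValueError("need at least two candidate routes")
    (best, s1), (second, s2) = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:2]
    margin = s1 - s2
    if margin < min_margin:
        return RouteDecision(best, second, margin, None,
                             "ambiguous: margin below threshold, no route hypothesis stated")
    if not process_linked:
        return RouteDecision(best, second, margin, hypotheses.get(best),
                             "no process-linked data: hypothesis is not a causal claim")
    return RouteDecision(best, second, margin, hypotheses.get(best), None)

# Route names follow this document; scores and hypothesis strings are placeholders.
hypotheses = {name: f"<process hypothesis for {name}>" for name in
              ("image_geometry", "pitch_phase", "orientation_topology", "signal_noise", "residual_only")}
clear = decide_route({"image_geometry": 0.62, "pitch_phase": 0.41, "signal_noise": 0.20}, hypotheses)
close = decide_route({"image_geometry": 0.50, "pitch_phase": 0.45}, hypotheses)
print(clear)
print(close)
```

In the close case the hypothesis is withheld entirely; in the clear case it is stated alongside an explicit refusal of causal language, because no process-linked data was supplied.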
The 15-coordinate VTL vector is a structured, physically interpretable feature representation. It can be concatenated with any foundation model's learned embedding. The strongest results across all audits are hybrid systems — PTD-Z plus classical descriptors outperform either alone. The intended relationship is hybrid and complementary, not competitive. The 93.4% SEM accuracy figure is evidence that the coordinate space has real structural signal. Its value as a hybrid component, as a drift monitor, and as a physics-interpretable audit layer is not bounded by that ceiling.
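A minimal sketch of the hybrid construction, assuming the VTL coordinates and a foundation-model embedding are already computed per image as NumPy arrays (the array sizes, the function names, and the block-balancing choice are assumptions): standardize each block with statistics fit on training data only, then scale each block by 1/√d so that a 15-dimensional block is not numerically swamped by a 512-dimensional embedding.

```python
import numpy as np

def fit_scaler(train_block, eps=1e-8):
    """Per-column mean and std from training rows only (avoids test leakage)."""
    return train_block.mean(axis=0), train_block.std(axis=0) + eps

def hybrid_features(vtl, embedding, vtl_scaler, emb_scaler):
    """Standardize each block, weight it by 1/sqrt(width), and concatenate."""
    blocks = []
    for block, (mean, std) in ((vtl, vtl_scaler), (embedding, emb_scaler)):
        z = (block - mean) / std
        blocks.append(z / np.sqrt(block.shape[1]))  # each block contributes ~unit squared norm per row
    return np.concatenate(blocks, axis=1)

# Placeholder arrays: 15 VTL coordinates and a 512-d embedding per image.
rng = np.random.default_rng(0)
vtl_train, emb_train = rng.normal(size=(200, 15)), rng.normal(2.0, 5.0, size=(200, 512))
features = hybrid_features(vtl_train, emb_train, fit_scaler(vtl_train), fit_scaler(emb_train))
print(features.shape)  # (200, 527)
```

Without the 1/√d weighting, the wider block would dominate Euclidean distances and L2-regularized linear models simply by width; whether to weight blocks this way is a design choice to validate, not a requirement.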
The decisive missing artifact is a through-focus SEM stack from a shared nanofab or metrology lab: 30–100 images of the same site or repeated pattern with known focus/Z offsets. This closes the gap between defect-class signal and process-cause routing.
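One hypothetical analysis for such a stack: if a sharpness-like coordinate responds roughly symmetrically to defocus near best focus, a quadratic fit against the known Z offsets should recover the best-focus position, and the fit quality indicates whether the coordinate tracks focus at all. The symmetric-response assumption, the function name, and the synthetic sweep below are illustrative; real stacks with noise or asymmetric (aberration-driven) response would fit less cleanly.

```python
import numpy as np

def best_focus_from_sweep(z_offsets, values):
    """Fit values ~ a*z^2 + b*z + c; return the vertex -b/(2a) and the fit R^2."""
    z = np.asarray(z_offsets, dtype=float)
    v = np.asarray(values, dtype=float)
    a, b, c = np.polyfit(z, v, 2)
    if a == 0.0:
        raise ValueError("no curvature: the coordinate does not respond to focus")
    residual = v - np.polyval([a, b, c], z)
    r_squared = 1.0 - (residual ** 2).sum() / ((v - v.mean()) ** 2).sum()
    return -b / (2.0 * a), r_squared

# Synthetic sweep: 31 known Z offsets (nm) with a sharpness-like response peaking at +40 nm.
z = np.linspace(-300.0, 300.0, 31)
sharpness = 1.0 - 2e-6 * (z - 40.0) ** 2
print(best_focus_from_sweep(z, sharpness))  # vertex near 40.0, R^2 near 1.0
```

A near-zero R² or a vertex far outside the sampled range would indicate that the coordinate does not track focus in that stack.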
A split-condition dose/focus matrix would test material, pitch, CD, and bridge routes simultaneously. PTD-Z is designed to route these into separate hypothesis families — the controlled experiment tests whether the decomposition is meaningful under known variation.
Proposed test: concatenate the 58-axis structural vector with a UNI, CONCH, or ResNet embedding. VTL appears to carry mesoscale organizational signal that texture and moment descriptors do not fully preserve. Whether that gap survives deep-learning embeddings is untested.
image_geometry and pitch_phase repeatedly survive independence audits; signal_noise and residual_only drift toward support roles. Whether this pattern is causal stability or a recurring artifact of the current audit design requires larger, process-linked studies.
Current benchmark is fixed at 224–480px. The hypothesis that fiber-scale organization reads more cleanly at larger field width requires tiling from source WSIs or higher-resolution SEM acquisition. The experiment was designed; data access remains the constraint.
The residual/support route requires paired image/reference data to move from support evidence to a primary route. 30–100 image/reference pairs with known overlay offsets would test geometry and phase displacement routing against a known ground truth.
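With paired image/reference data, a standard way to measure rigid displacement is phase correlation: the normalized cross-power spectrum of two translated images inverse-transforms to a peak at the shift. The sketch below recovers integer-pixel offsets only and uses a random texture as a stand-in reference; it illustrates the kind of known-offset check described, is not part of PTD-Z, and the function name is hypothetical.

```python
import numpy as np

def phase_correlation_shift(image, reference):
    """Integer (dy, dx) such that image ~= np.roll(reference, (dy, dx), axis=(0, 1))."""
    cross_power = np.fft.fft2(image) * np.conj(np.fft.fft2(reference))
    cross_power /= np.abs(cross_power) + 1e-12    # keep phase only
    correlation = np.fft.ifft2(cross_power).real  # ideally a delta at the shift
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks past the array midpoint are negative shifts wrapped around by the FFT.
    return tuple(int(p) - n if p > n // 2 else int(p) for p, n in zip(peak, correlation.shape))

rng = np.random.default_rng(0)
reference = rng.random((64, 64))
image = np.roll(reference, (3, -5), axis=(0, 1))  # known offset: 3 rows down, 5 columns left
print(phase_correlation_shift(image, reference))  # (3, -5)
```

On a strictly periodic line/space pattern, a displacement is recoverable only modulo the pitch, because shifts by a whole pitch leave the image unchanged. For sub-pixel estimates, scikit-image's `skimage.registration.phase_cross_correlation` supports an `upsample_factor` argument.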