Deterministic Structural Instrumentation

Most evaluation frameworks measure the wrong thing.

Parallax Metrology builds deterministic structural instruments for generative systems. No model in the loop. No semantic judgment. No guesswork. Three domains. One measurement philosophy.

01 — Visual
Visual Thinking Lens

Geometric fingerprinting for image generation models. Measures compositional bias, spatial priors, and structural drift — invisible to semantic evaluation.

K = [Δx,y, rᵥ, ρᵣ, μ, xₚ, θ, dₛ] · 800+ images validated
02 — Linguistic
Linguistic Kernel

Structural telemetry for LLM outputs. Eight dimensions. Deterministic. Every coordinate backed by countable evidence. No judge model required.

4,000+ responses · Cross-engine validated · Cohen's d = 2.47
03 — Pathology
Parallax Pathology

Deterministic structural measurement for H&E histology. 58 axes. Zero trained parameters. The geometry of deviation is the diagnosis.

91.5% nine-class accuracy · 0 trained parameters · HR=1.872 DSS
Deterministic · No Model In The Loop Same Input · Same Output Structure Before Semantics · Always Countable Evidence · Every Coordinate Engine-Agnostic · Cross-Domain The Ruler, Not The Critic · Parallax Metrology

Image models repeat the same geometry.

Semantic diversity explains less than 10% of observed spatial variance in text-to-image systems. Composition is not prompt-driven. It is driven by the model's prior.


400 MidJourney prompts. 8 semantic categories. One geometric attractor. 100% of outputs within 0.15 radius of center. Different prompts, different subjects, identical compositional bias.


The VTL measures the spatial signature each engine learned from its training data — the prior it applies regardless of what you ask for. That signature is stable, reproducible, and invisible to standard evaluation.

"Spatial prompt intensity explains 0–0.1% of compositional displacement variance. The model has already decided where mass goes."

VTL · Seven geometric primitives · Engine-agnostic
0.15
Attractor Radius
100% of MidJourney outputs fall within 0.15 radius of geometric center under standard prompting
6%
Semantic Variance Explained
Subject category explains 6% of spatial variance. 94% is structural prior, not prompt.
800+
Images Validated
Cross-platform: MidJourney, Sora, GPT, SDXL, Firefly, OpenArt
3–4
Steps Early Warning
Compositional metrics detect model degradation 3–4 inference steps before semantic breakdown
Δx,y
Mass Displacement
Where compositional mass sits — placement offset from center. The primary axis for detecting attractor behavior and steering.
rᵥ
Void Ratio
How much empty space surrounds the mass. Captures openness, compression, and negative space behavior.
ρᵣ
Packing Density
How compressed the marks are. Measures spatial concentration of compositional elements.
μ
Compositional Unity
How unified the composition reads as a field. Structural coherence across the image plane.
xₚ
Peripheral Pull
How hard the edges pull against center. Measures resistance to the central attractor.
θ
Orientation Stability
Directional coherence of structural elements. High values indicate trained directional priors.
dₛ
Structural Thickness
Surface depth and layering. Measures how models build spatial complexity above a base plane.
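All seven primitives are computable from pixel geometry alone, with no model in the loop. As a rough illustration of the idea (not the versioned VTL math), here is a toy sketch of two of them, mass displacement and void ratio, from a 2D intensity array. The function name `toy_kernel` and the `void_thresh` cutoff are illustrative assumptions.

```python
import numpy as np

def toy_kernel(img: np.ndarray, void_thresh: float = 0.05) -> dict:
    """Toy sketch of two of the seven primitives.

    img: 2D array of mark intensity in [0, 1]. Deterministic:
    the same array always produces the same coordinates.
    """
    h, w = img.shape
    total = img.sum()
    if total == 0:
        return {"dx": 0.0, "dy": 0.0, "void_ratio": 1.0}
    ys, xs = np.mgrid[0:h, 0:w]
    # Mass displacement: intensity centroid offset from image
    # center, normalized to roughly [-1, 1] per axis.
    cx = (xs * img).sum() / total
    cy = (ys * img).sum() / total
    dx = (cx - (w - 1) / 2) / (w / 2)
    dy = (cy - (h - 1) / 2) / (h / 2)
    # Void ratio: fraction of pixels carrying almost no mass.
    void_ratio = float((img < void_thresh).mean())
    return {"dx": float(dx), "dy": float(dy), "void_ratio": void_ratio}
```

A perfectly centered blob yields dx = dy = 0; an output population whose dx, dy cluster near one point is exactly the attractor behavior described above.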
Try the Demo · Python on GitHub · Full Documentation

Text has shape. Measure it.

Most production failures in LLM deployments are not benchmark failures. They are structural reliability failures. The model breaks an output format, drifts structurally across conversation turns, compresses under stress, or fails to recover after adversarial input. These failures are invisible to benchmark evaluation.


The Linguistic Kernel is a deterministic, behavior-only instrumentation layer. It reduces any text string to a coordinate in eight-dimensional structural space. Same input, same output, every time. No model in the loop. Every number backed by countable evidence.


It is not a scoring system. It is a location system. Two responses can be equally correct and land at completely different coordinates. That is not a flaw — it is the point.

01 Tokenize & Segment Words, sentences, paragraphs
02 Extract Evidence All countable features
03 Compute Coordinates Eight dimensions, versioned math
04 Classify Basin + rhetorical state
05 Assemble Output Full audit bundle returned
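The five stages above can be sketched end to end. Everything in this sketch (the `analyze` function, `KernelBundle`, the two toy coordinates, the basin rule) is an illustrative stand-in, not the real eight-dimensional kernel; only the shape of the pipeline is taken from the steps listed.

```python
import re
from dataclasses import dataclass

@dataclass
class KernelBundle:
    coords: dict     # structural coordinates (toy dimensions here)
    evidence: dict   # the counts that back every number
    basin: str       # coarse structural classification

def analyze(text: str) -> KernelBundle:
    # 01 Tokenize & segment: words, sentences, paragraphs.
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = re.findall(r"\b\w+\b", text)
    # 02 Extract evidence: only countable features, no judgment.
    evidence = {
        "n_words": len(words),
        "n_sentences": len(sentences),
        "n_paragraphs": len(paragraphs),
        "n_list_lines": sum(1 for l in text.splitlines()
                            if l.lstrip().startswith(("-", "*", "1."))),
    }
    # 03 Compute coordinates: deterministic ratios of the evidence.
    coords = {
        "mean_sentence_len": evidence["n_words"] / max(evidence["n_sentences"], 1),
        "list_density": evidence["n_list_lines"] / max(len(text.splitlines()), 1),
    }
    # 04 Classify: toy basin rule on the coordinates.
    basin = "list-heavy" if coords["list_density"] > 0.5 else "prose"
    # 05 Assemble output: full audit bundle, evidence included.
    return KernelBundle(coords=coords, evidence=evidence, basin=basin)
```

Same string in, same bundle out, every time: no external calls, and every coordinate traces back to a count in `evidence`.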
2.47
Cohen's d
Δy separation between normal and constrained responses. Largest effect size across all eight dimensions.
4,000+
Responses Validated
Cross-engine: Claude, GPT, Gemini. Attractor basin coherence confirmed across all three.
100%
Collapse Detection
v19 three-tier detector: TP=15, FP=0, FN=0 on GPT EMS corpus (n=400).
0
Models In The Loop
Entirely deterministic. No external calls. ~0.3ms per 1,000 tokens.
Finding 01
Structural behavior is stable across runs
For any given prompt under normal conditions, kernel coordinates cluster tightly around a consistent centroid. 89% of Gemini EMS responses fall within a single geometrically coherent attractor basin. No random structural wandering.
Finding 02
Constraint prompts induce predictable deformation
The model bends before it breaks. Under constraint: rᵥ collapsed from 0.349 → 0.196, Δy spiked from 0.248 → 1.450, θ collapsed from 0.644 → 0.141. Bootstrap CIs show no overlap with normal intervals.
Finding 03
Two failure modes are geometrically opposite
Constraint-compliance (rᵥ collapse, Δy spike) and formatting collapse (rᵥ inflation, Δy negative) occupy opposite corners of kernel space. A single detection threshold cannot catch both. Domain predicts which failure mode is likely.
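A minimal sketch of why a single threshold cannot cover both corners: the two failure modes need two one-sided tests. The function `classify_failure` and its cutoff multipliers are assumptions loosely anchored to the baseline values reported above, not the actual v19 three-tier detector.

```python
def classify_failure(r_v: float, dy: float,
                     r_v_baseline: float = 0.349,
                     dy_baseline: float = 0.248) -> str:
    """Two opposite failure modes, two one-sided tests.

    Cutoffs are illustrative: deviations are compared against
    the normal-condition baselines for r_v and delta-y.
    """
    # Constraint-compliance: void ratio collapses, delta-y spikes up.
    if r_v < 0.6 * r_v_baseline and dy > 3 * dy_baseline:
        return "constraint-compliance"
    # Formatting collapse: void ratio inflates, delta-y goes negative.
    if r_v > 1.4 * r_v_baseline and dy < 0:
        return "formatting-collapse"
    return "normal"
```

The constrained-response values reported above (rᵥ = 0.196, Δy = 1.450) land in the first corner; an inflated rᵥ with negative Δy lands in the second. Any single midpoint threshold would miss one of the two.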
Finding 04
Cohesion is invariant under structural stress
μ coefficient of variation: 0.056 across all 400 responses including constrained and truncated groups. When μ moves significantly, something more fundamental than formatting pressure has changed. Reliable stability baseline.
Try the Demo · GitHub · Full Documentation

The pathology is the absence of normal form.

Standard computational pathology trains models to learn what disease looks like. Parallax Pathology asks a different question with a different instrument.


The system encodes the structural laws of normal tissue. Deviation from those laws is the signal. A rare disease the system has never seen will register as structurally distant from all known normals. That is not a misclassification — it is the correct output.


58 deterministic axes. 16 structural theories applied simultaneously. Zero trained parameters. Every output traceable to a specific geometric property. The explanation is the measurement, not post-hoc attribution.
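A sketch of deviation-as-signal with zero trained parameters: summarize reference statistics from normal tissue, then score a patch by its distance from them. The function name, the RMS z-score choice, and the axis count here are illustrative stand-ins for the real 58-axis geometry; nothing is fitted by gradient training, only reference statistics are summarized.

```python
import numpy as np

def deviation_score(x: np.ndarray, normal_axes: np.ndarray) -> float:
    """Distance of one patch's axis vector from the normal reference.

    normal_axes: (n_patches, n_axes) measurements from normal tissue.
    x: (n_axes,) vector for the patch under test.
    """
    mu = normal_axes.mean(axis=0)
    sigma = normal_axes.std(axis=0) + 1e-9   # guard degenerate axes
    z = (x - mu) / sigma
    return float(np.sqrt((z ** 2).mean()))   # RMS z-score across axes
```

A tissue the reference has never seen simply scores far from every normal centroid, which is the intended output, not a misclassification.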

"The average tumor state does not predict outcome. What predicts outcome is how structural states are distributed across tumor regions."

TCGA-COAD · n=180 · HR=1.872 DSS · p=0.0002
91.5%
Nine-Class Accuracy
CRC-VAL-HE-7K, 5-fold cross-validation. 1,800 patches. Zero trained parameters.
16.3pp
Emergence Gap
Best single layer: 75.2%. Full 58-axis system: 91.5%. The combination is genuinely greater than any component.
1.872
Hazard Ratio (DSS)
Disease-specific survival. No outcome training. No molecular data. Stage-adjusted. p=0.0002. C-index: 0.808.
3.3×
Mortality Difference
Low vs. high composite tertile. Structural geometry predicts survival without knowing outcomes.
Validation 01 — CRC-VAL-HE-7K
Tissue classification without training
Nine tissue classes. 91.5% accuracy. Zero trained parameters. Deviant TUM patches stratify into two geometrically distinct failure modes — mucinous differentiation and desmoplastic reaction — detected from geometry alone, without pathological labels.
Validation 02 — EBHI-SEG
Dysplasia grading as continuous measurement
Six-class dysplasia grading from Normal through Adenocarcinoma (n=920). Spearman ρ = 0.716. 84.3% of predictions correct or within one adjacent grade step. The system correctly places serrated adenoma as structurally distinct from conventional low-grade neoplasia.
Validation 03 — GTEx Cross-Dataset
The frame is universal. The memory is not.
Three independent GTEx whole slide images. Different fixation chemistry from the reference data. Structural centers emerge independently within each donor without labels. The frame is invariant. The reference is swappable.
Validation 04 — TCGA-COAD Survival
The honest null, and what it revealed
Primary hypothesis (mean Ω predicts OS) was not supported. HR=0.766, p=0.132. Reported directly. What emerged: spatial distribution across tumor regions predicts outcome. HR=1.872 DSS, p=0.0002. The average state is not the signal — the field distribution is.
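The distinction between the failed hypothesis and the emergent signal can be made concrete: two tumors with identical mean state separate only on a dispersion statistic over their regions. The function `field_statistics` and its particular choice of dispersion measure are illustrative assumptions, not the published analysis.

```python
import numpy as np

def field_statistics(region_scores: np.ndarray) -> dict:
    """Summarize how a structural state is distributed across
    tumor regions, separately from its average level."""
    mean = float(region_scores.mean())
    spread = float(region_scores.std(ddof=0))
    return {
        "mean_state": mean,         # the primary hypothesis (null result)
        "field_dispersion": spread, # the distributional signal
    }
```

A uniform tumor and a heterogeneous one can share a mean exactly; only the dispersion term tells them apart, which mirrors the TCGA-COAD finding that the field distribution, not the average state, carried the survival signal.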
Try the Demo · Read the Paper · Full Documentation