VTL compositional analysis — displaced structure

01 — Visual Thinking Lens

Image models repeat
the same geometry.
We measure it.

Semantic diversity explains less than 10% of observed spatial variance in text-to-image systems. Composition is not prompt-driven. It is driven by the model's learned prior. The VTL makes that prior visible, measurable, and comparable across engines.

Compositional Bias · Measurable | Engine-Agnostic · Deterministic | 800+ Images Validated · 14+ Models | Semantic Categories Explain · 6% of Variance | Structural Degradation · 3–4 Steps Early | Same Prompt · Same Attractor

The model has already
decided where mass goes.

400 MidJourney prompts. 8 semantic categories, ranging from portraits to landscapes to architecture. One geometric attractor. 100% of outputs fall within a 0.15 radius of the geometric center. The subject changes. The compositional prior doesn't.

Spatial prompt intensity — words like "left," "edge," "corner," "peripheral" — explains 0–0.1% of compositional displacement variance. The model is not reading the spatial instruction. It is applying a learned structural prior regardless of what you ask for.

The VTL gives that prior a coordinate. A fingerprint. Something you can measure, compare, and track across model versions, prompt families, and inference conditions.
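The attractor claim above reduces to a simple count: how many outputs have a mass displacement that lands inside a given radius of center. The sketch below is a minimal illustration, not the VTL implementation. It assumes displacements are already expressed as (dx, dy) pairs in the same frame-relative units as the 0.15 radius; the function name and the sample values are hypothetical.

```python
import math

def fraction_within_radius(displacements, radius=0.15):
    """Share of (dx, dy) displacements whose distance from center is <= radius."""
    if not displacements:
        raise ValueError("need at least one displacement")
    inside = sum(1 for dx, dy in displacements if math.hypot(dx, dy) <= radius)
    return inside / len(displacements)

# Hypothetical displacements for illustration, not data from the corpus.
sample = [(0.02, -0.01), (-0.05, 0.08), (0.10, 0.10), (0.20, 0.00)]
print(fraction_within_radius(sample))
```

A result of 1.0 on a real corpus would correspond to the "100% within 0.15" figure quoted above.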

"Spatial prompt intensity explains 0–0.1% of compositional displacement variance. The model has already decided where mass goes."

0.15
Attractor Radius
100% of MidJourney outputs cluster within 0.15 radius of center
6%
Semantic Variance
Subject category explains only 6% of spatial variance
VTL geometric measurement — concentric radial analysis applied across 8 generative outputs · Same instrument, different structural signatures

Instrument Pipeline

Visual Thinking Lens measurement pipeline
800+
Images Validated
Cross-platform: MidJourney, Sora, GPT, SDXL, Firefly, OpenArt, and 8 others
3–4
Steps Early Warning
Compositional metrics detect model degradation 3–4 inference steps before semantic breakdown
<10%
Semantic Explanatory Power
Semantic diversity explains less than 10% of observed spatial variance across all tested engines
7
Geometric Primitives
Seven independent structural axes capturing the full compositional signature of any image

Seven Dimensions

The coordinate space.

Each dimension captures a distinct structural property of the image. Together they form a complete compositional fingerprint — reproducible, comparable, and engine-agnostic.

Δx,y
Mass Displacement
Where compositional mass sits relative to center. The primary axis for detecting attractor behavior and measuring how far a model pulls from neutral placement. High values indicate strong learned positional priors.
rᵥ
Void Ratio
How much empty space surrounds the mass. Captures openness, compression, and negative space behavior. Models with high void ratio tend toward isolated subject placement with large inactive regions.
ρᵣ
Packing Density
How compressed the compositional elements are. Measures spatial concentration — whether elements cluster tightly or distribute across the field. Distinguishes between sparse and dense compositional priors.
μ
Compositional Unity
How unified the composition reads as a field. Structural coherence across the image plane — whether elements form a single integrated mass or fragment into competing centers.
xₚ
Peripheral Pull
How hard the edges pull against center. Measures resistance to the central attractor — the degree to which structural energy reaches the periphery rather than collapsing to center.
θ
Orientation Stability
Directional coherence of structural elements. High values indicate strong trained directional priors — the model consistently places elements along preferred axes regardless of subject matter.
dₛ
Structural Thickness
Surface depth and layering. Measures how models build spatial complexity above a base plane — the degree of foreground/background stratification and depth-cue reliance in the compositional prior.
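To make the first two dimensions concrete, here is one plausible way to compute a mass displacement and a void ratio from a grid of per-pixel weights. This is a sketch under stated assumptions: the page does not define how "mass" is derived (luminance, saliency, or something else), so the weights, the 0.1 void threshold, and both function names are hypothetical.

```python
def mass_displacement(image):
    """Weighted-centroid offset from image center, as a fraction of width and height.

    `image` is a list of rows of non-negative weights (an assumed stand-in for "mass").
    """
    height, width = len(image), len(image[0])
    total = sum(sum(row) for row in image)
    if total == 0:
        return 0.0, 0.0
    cx = sum(x * v for row in image for x, v in enumerate(row)) / total
    cy = sum(y * v for y, row in enumerate(image) for v in row) / total
    # Cell indices run from 0 to width - 1, so the geometric center is (width - 1) / 2.
    dx = (cx - (width - 1) / 2) / width
    dy = (cy - (height - 1) / 2) / height
    return dx, dy

def void_ratio(image, threshold=0.1):
    """Fraction of cells whose weight is at or below `threshold` (assumed cutoff)."""
    cells = [v for row in image for v in row]
    return sum(1 for v in cells if v <= threshold) / len(cells)

# Toy 4x4 grid with all mass in the right-hand half of the middle rows.
grid = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
]
dx, dy = mass_displacement(grid)
print(dx, dy, void_ratio(grid))
```

In this toy grid the centroid sits to the right of center and on the vertical midline, so dx is positive and dy is zero.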

Prompt variation reveals the structural prior.

When the same compositional prompt produces structurally different outputs, the VTL makes that difference countable. These two outputs share the same prompt family. The structural coordinates diverge significantly.

Storm baseline output
Output A — Baseline
Δx: −0.04 · rᵥ: 0.41
ρᵣ: 0.52 · Basin: B0
State: centered_compact
Storm variant output
Output B — Steered
Δx: +0.18 · rᵥ: 0.28
ρᵣ: 0.67 · Basin: B1
State: right_displaced

Mass displacement Δx shifted by +0.22 between outputs · Packing density ρᵣ increased 29% · Basin classification moved from B0 (centered_compact) to B1 (right_displaced)
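The summary figures follow directly from the two coordinate sets listed for Output A and Output B. The short sketch below reproduces the arithmetic; the dictionary keys and the function name are illustrative choices, not part of the VTL.

```python
# Coordinates as listed for Output A (baseline) and Output B (steered).
output_a = {"dx": -0.04, "void_ratio": 0.41, "packing": 0.52}
output_b = {"dx": 0.18, "void_ratio": 0.28, "packing": 0.67}

def coordinate_shift(a, b):
    """Absolute shift in dx and void ratio, and percent change in packing density."""
    return {
        "dx_shift": b["dx"] - a["dx"],
        "void_ratio_shift": b["void_ratio"] - a["void_ratio"],
        "packing_pct_change": (b["packing"] - a["packing"]) / a["packing"] * 100,
    }

shift = coordinate_shift(output_a, output_b)
print(round(shift["dx_shift"], 2))       # 0.22
print(round(shift["packing_pct_change"]))  # 29
```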

Flagging structural drift
before it becomes visible.

The VTL establishes a neutral baseline for any prompt family and flags outputs that deviate beyond the 2σ detection boundary. Five flagged outputs in this MidJourney corpus — all identified from geometry alone, before content-level inspection.

MJ regression detection · Baseline: neutral prompt (n=36) · Axes: Δx (horizontal placement) vs rᵥ (void ratio) · Baseline Δx: +0.0099 ± 0.0556, rᵥ: 0.3923 ± 0.0850 · Flagged: 5 outputs outside 2σ · Alert marker: 1.7σ

The VTL establishes a neutral baseline distribution for any prompt family. Individual outputs are evaluated against the 2σ envelope. Structural outliers are flagged before content review.

Normal outputs
Within 2σ detection boundary — structurally consistent with the prompt family baseline
Baseline centroid
Δx: +0.0099, rᵥ: 0.3923 — the neutral structural center for this prompt family
Flagged outputs
5 outputs exceed 2σ boundary — structural regression detected from geometry alone, before content inspection
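A 2σ check of this kind can be sketched with the baseline statistics quoted for this prompt family. The page does not say how the 2σ envelope is defined, so the version below makes an assumption: it standardizes Δx and rᵥ separately, combines them as a Euclidean distance in σ units, and ignores any covariance between the two axes. The function names and test points are hypothetical.

```python
import math

# (mean, standard deviation) per axis, taken from the baseline figures above.
BASELINE = {"dx": (0.0099, 0.0556), "rv": (0.3923, 0.0850)}

def sigma_distance(dx, rv, baseline=BASELINE):
    """Distance from the baseline centroid in units of per-axis standard deviation.

    Assumes the two axes are independent; a covariance-aware (Mahalanobis)
    distance would be needed if they are correlated.
    """
    z_dx = (dx - baseline["dx"][0]) / baseline["dx"][1]
    z_rv = (rv - baseline["rv"][0]) / baseline["rv"][1]
    return math.hypot(z_dx, z_rv)

def is_flagged(dx, rv, k=2.0, baseline=BASELINE):
    """True when the output lies more than k sigma from the baseline centroid."""
    return sigma_distance(dx, rv, baseline) > k

# Illustrative points, not outputs from the corpus.
print(is_flagged(0.00, 0.40))  # near the centroid
print(is_flagged(0.18, 0.28))  # strongly right-displaced, low void ratio
```

The first point sits well inside the envelope; the second is more than three standard deviations out on Δx alone and is flagged.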

Measurement creates
a control surface.

Once you can measure where visual mass goes, you can redirect it. The VTL is not just a detection instrument — it is a steering interface. Structural coordinates become constraints. Constraints become prompts. Prompts produce controlled deformation.

Generative models default to anatomical coherence as a safety heuristic. Getting a model to produce a purposeful, isolated anatomical impossibility — a neck that stretches impossibly upward while the body remains grounded, lighting consistent, fabric unaffected — requires breaking that heuristic at a specific structural node without triggering global incoherence.

The framework: Intent → Anchors → Constraints → Prompts → Transforms. Each stage feeds the next. The output is not an accident of latent space. It is a specified structural state.

"Most distortion relies on accidental artifacts or post-processing. Getting AI to generate purposeful, isolated anatomical impossibilities during initial generation — while maintaining coherence everywhere else — is uncharted territory."

CONSTRAINT CHAIN
Intent → Anchors → Constraints → Prompts → Transforms
Baseline figure — default anatomical output
Baseline — Default Output
Subject centered · anatomy normal
Structural prior: unmodified
Δx: 0.02 · rᵥ: 0.38
Steered figure — controlled anatomical deformation
Steered — Controlled Deformation
Neck extended · body anchored
Lighting coherent · fabric unaffected
Δx: −0.08 · rᵥ: 0.51

Purposeful isolated anatomical distortion generated during initial inference · Structural coherence maintained throughout · No post-processing

What the corpus showed.

Finding 01
All tested engines share a central attractor
MidJourney, Sora, GPT, SDXL, Firefly — every engine tested shows a strong central compositional attractor. 100% of MidJourney outputs fall within 0.15 radius of geometric center across all semantic categories. The attractor is engine-specific in position but universal in existence.
Finding 02
Semantic categories explain about 6% of spatial variance
Subject category — portrait vs landscape vs architecture vs abstract — accounts for roughly 6% of compositional displacement variance. The remaining 94% is structural prior, not prompt. Semantic diversity does not produce compositional diversity.
Finding 03
Structural degradation precedes semantic failure
Compositional metrics detect model degradation 3–4 inference steps before visible semantic breakdown. The geometry moves first. This makes the VTL a pre-semantic early warning instrument — catching drift before a human reviewer would notice anything wrong.
Finding 04
Cross-platform linear progressions are measurable
Displacement, void ratio, and architectural ceiling metrics show cross-platform linear progressions consistent enough to enable engine comparison and fingerprinting. Different models occupy different structural territories — and those territories are stable across prompt families.
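A "variance explained by category" figure such as the 6% in Finding 02 is commonly computed as eta-squared: the between-group sum of squares divided by the total sum of squares. Whether the VTL analysis used exactly this statistic is not stated, so treat the sketch below as one standard way to arrive at such a number. The function name and the toy values are hypothetical.

```python
from statistics import fmean

def eta_squared(groups):
    """Between-group sum of squares divided by total sum of squares.

    `groups` maps a category name to a list of displacement values.
    """
    values = [v for g in groups.values() for v in g]
    grand = fmean(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    if ss_total == 0:
        return 0.0
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups.values())
    return ss_between / ss_total

# Toy displacement values per semantic category, not data from the corpus.
toy = {
    "portrait": [0.00, 0.02, 0.04],
    "landscape": [0.01, 0.03, 0.05],
}
print(eta_squared(toy))
```

A value near 0 means category membership tells you almost nothing about where mass lands; a value near 1 means category nearly determines it.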