Visual Thinking Lens — Image Model Compositional Analysis

What It Measures

The model has already decided where mass goes.

400 MidJourney prompts. 8 semantic categories ranging from portraits to landscapes to architecture. One geometric attractor. 100% of outputs fall within 0.15 radius of geometric center. The subject changes. The compositional prior doesn't.

Spatial prompt intensity — words like "left," "edge," "corner," "peripheral" — explains 0–0.1% of compositional displacement variance. The model is not reading the spatial instruction. It is applying a learned structural prior regardless of what you ask for.
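One way to read "explains 0–0.1% of variance" is as the R² of a simple linear regression of displacement on a spatial-intensity score. The sketch below is a minimal illustration under that assumption. The scoring rule (`spatial_intensity`, which counts the four spatial words quoted above) and the function names are hypothetical, not the VTL's actual method.

```python
from statistics import mean

# Word list taken from the examples quoted in the text; the real scoring rule is not specified.
SPATIAL_WORDS = {"left", "edge", "corner", "peripheral"}

def spatial_intensity(prompt: str) -> int:
    """Count spatial words in a prompt (hypothetical scoring rule)."""
    tokens = prompt.lower().replace(",", " ").split()
    return sum(1 for t in tokens if t in SPATIAL_WORDS)

def r_squared(x, y):
    """R^2 of the simple linear regression y ~ x (the squared Pearson correlation)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # no variation on one axis: nothing to explain
    return (sxy ** 2) / (sxx * syy)
```

An R² near zero means the displacement of the output does not track how strongly the prompt asks for off-center placement.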

The VTL gives that prior a coordinate. A fingerprint. Something you can measure, compare, and track across model versions, prompt families, and inference conditions.

"Spatial prompt intensity explains 0–0.1% of compositional displacement variance. The model has already decided where mass goes."

0.15

Attractor Radius

100% of MidJourney outputs cluster within 0.15 radius of center

Semantic Variance

Subject category explains only 6% of spatial variance

Read Mass Not Subject

800+

Images Validated

Cross-platform: MidJourney, Sora, GPT, SDXL, Firefly, OpenArt, and 8 others

3–4

Steps Early Warning

Compositional metrics detect model degradation 3–4 inference steps before semantic breakdown

<10%

Semantic Explanatory Power

Semantic diversity explains less than 10% of observed spatial variance across all tested engines

Geometric Primitives

Seven independent structural axes capturing the full compositional signature of any image

Seven Dimensions

The coordinate space.

Each dimension captures a distinct structural property of the image. Together they form a complete compositional fingerprint — reproducible, comparable, and engine-agnostic.

Δx,y

Mass Displacement

Where compositional mass sits relative to center. The primary axis for detecting attractor behavior and measuring how far a model pulls from neutral placement. High values indicate strong learned positional priors.
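A rough sketch of how a displacement coordinate could be computed, assuming "compositional mass" is represented as a 2D grid of non-negative weights (for example a saliency or luminance map) and that displacement is the weighted centroid's offset from the image center, expressed as a fraction of width and height. Both the representation and the normalization are assumptions; the text does not define them.

```python
import math

def mass_displacement(mass):
    """Return (dx, dy): weighted-centroid offset from center, as a fraction of width/height.

    `mass` is a list of rows of non-negative weights (assumed representation).
    """
    h, w = len(mass), len(mass[0])
    total = sum(sum(row) for row in mass)
    if total == 0:
        return 0.0, 0.0  # empty map: treat as centered
    cx = sum(x * v for row in mass for x, v in enumerate(row)) / total
    cy = sum(y * v for y, row in enumerate(mass) for v in row) / total
    return (cx - (w - 1) / 2) / w, (cy - (h - 1) / 2) / h

def displacement_radius(mass):
    """Distance of the mass centroid from center, in the same normalized units."""
    dx, dy = mass_displacement(mass)
    return math.hypot(dx, dy)
```

Under this convention a perfectly balanced image scores (0, 0), and positive dx means mass sits right of center.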

rᵥ

Void Ratio

How much empty space surrounds the mass. Captures openness, compression, and negative space behavior. Models with high void ratio tend toward isolated subject placement with large inactive regions.
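As an illustration only, void ratio can be approximated as the share of cells in a mass map that carry almost no weight. The grid representation and the `threshold` value are assumptions made for this sketch, not the VTL's definition.

```python
def void_ratio(mass, threshold=0.1):
    """Fraction of cells whose weight falls below `threshold` (assumed definition)."""
    cells = [v for row in mass for v in row]
    return sum(1 for v in cells if v < threshold) / len(cells)
```

An isolated subject on an empty field pushes this value toward 1; a fully occupied frame pushes it toward 0.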

ρᵣ

Packing Density

How compressed the compositional elements are. Measures spatial concentration — whether elements cluster tightly or distribute across the field. Distinguishes between sparse and dense compositional priors.

Compositional Unity

How unified the composition reads as a field. Structural coherence across the image plane — whether elements form a single integrated mass or fragment into competing centers.

xₚ

Peripheral Pull

How hard the edges pull against center. Measures resistance to the central attractor — the degree to which structural energy reaches the periphery rather than collapsing to center.

Orientation Stability

Directional coherence of structural elements. High values indicate strong trained directional priors — the model consistently places elements along preferred axes regardless of subject matter.

dₛ

Structural Thickness

Surface depth and layering. Measures how models build spatial complexity above a base plane — the degree of foreground/background stratification and depth-cue reliance in the compositional prior.

Before / After — Same Prompt, Different Structure

Prompt variation reveals the structural prior.

When the same compositional prompt produces structurally different outputs, the VTL makes that difference countable. These two outputs share the same prompt family. The structural coordinates diverge significantly.

Output A — Baseline

Δx: −0.04 · rᵥ: 0.41
ρᵣ: 0.52 · Basin: B0
State: centered_compact

Output B — Steered

Δx: +0.18 · rᵥ: 0.28
ρᵣ: 0.67 · Basin: B1
State: right_displaced

Weight shift Δx moved +0.22 between outputs · Packing density ρᵣ increased 29% · Basin classification shifted from centered to right-displaced
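The summary figures follow directly from the two coordinate sets. A small sketch of the arithmetic, using the values listed for Output A and Output B; the `Coords` container and `compare` helper are illustrative names, not part of the VTL.

```python
from dataclasses import dataclass

@dataclass
class Coords:
    dx: float    # mass displacement on x
    rv: float    # void ratio
    rho: float   # packing density
    basin: str   # basin label as reported

output_a = Coords(dx=-0.04, rv=0.41, rho=0.52, basin="B0")
output_b = Coords(dx=0.18, rv=0.28, rho=0.67, basin="B1")

def compare(a: Coords, b: Coords) -> dict:
    """Absolute shifts for dx and rv, relative change for rho, and basin change."""
    return {
        "dx_shift": round(b.dx - a.dx, 2),
        "rv_shift": round(b.rv - a.rv, 2),
        "rho_pct_change": round(100 * (b.rho - a.rho) / a.rho),
        "basin_changed": a.basin != b.basin,
    }

print(compare(output_a, output_b))
# {'dx_shift': 0.22, 'rv_shift': -0.13, 'rho_pct_change': 29, 'basin_changed': True}
```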

Regression Detection

Flagging structural drift before it becomes visible.

The VTL establishes a neutral baseline for any prompt family and flags outputs that deviate beyond the 2σ detection boundary. Five flagged outputs in this MidJourney corpus — all identified from geometry alone, before content-level inspection.

How the gate works

The VTL establishes a neutral baseline distribution for any prompt family. Individual outputs are evaluated against the 2σ envelope. Structural outliers are flagged before content review.

Normal outputs

Within 2σ detection boundary — structurally consistent with the prompt family baseline

Baseline centroid

Δx: +0.0099, rᵥ: 0.3923 — the neutral structural center for this prompt family

Flagged outputs

5 outputs exceed 2σ boundary — structural regression detected from geometry alone, before content inspection
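The gate can be sketched as a per-axis 2σ test against a baseline fitted on reference outputs. This is one plausible reading; whether the VTL tests each axis separately or uses a joint distance is not stated, so treat the per-axis rule, the function names, and the sample values below as assumptions.

```python
from statistics import mean, stdev

def fit_baseline(samples):
    """samples: list of (dx, rv) tuples. Returns a (mean, std) pair per axis."""
    return [(mean(axis), stdev(axis)) for axis in zip(*samples)]

def flag_outliers(samples, baseline, k=2.0):
    """Indices of samples that sit more than k standard deviations out on any axis."""
    flagged = []
    for i, sample in enumerate(samples):
        if any(sd > 0 and abs(v - m) > k * sd for v, (m, sd) in zip(sample, baseline)):
            flagged.append(i)
    return flagged

# Illustrative reference set and new outputs (made-up numbers, not corpus data).
reference = [(0.00, 0.40), (0.02, 0.38), (-0.01, 0.41), (0.01, 0.39), (0.03, 0.37)]
new_outputs = [(0.01, 0.39), (0.18, 0.28), (0.02, 0.40)]

baseline = fit_baseline(reference)
print(flag_outliers(new_outputs, baseline))  # [1]
```

Nothing in the check looks at content: an output is flagged purely because its coordinates leave the envelope.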

Recursive Steering

Measurement creates a control surface.

Once you can measure where visual mass goes, you can redirect it. The VTL is not just a detection instrument — it is a steering interface. Structural coordinates become constraints. Constraints become prompts. Prompts produce controlled deformation.

Generative models default to anatomical coherence as a safety heuristic. Getting a model to produce a purposeful, isolated anatomical impossibility — a neck that stretches impossibly upward while the body remains grounded, lighting consistent, fabric unaffected — requires breaking that heuristic at a specific structural node without triggering global incoherence.

The framework: Intent → Anchors → Constraints → Prompts → Transforms. Each stage feeds the next. The output is not an accident of latent space. It is a specified structural state.

"Most distortion relies on accidental artifacts or post-processing. Getting AI to generate purposeful, isolated anatomical impossibilities during initial generation — while maintaining coherence everywhere else — is uncharted territory."

CONSTRAINT CHAIN

Intent → Anchors → Constraints → Prompts → Transforms

Baseline figure — default anatomical output

Baseline — Default Output

Subject centered · anatomy normal
Structural prior: unmodified
Δx: 0.02 · rᵥ: 0.38

Steered figure — controlled anatomical deformation

Steered — Controlled Deformation

Neck extended · body anchored
Lighting coherent · fabric unaffected
Δx: −0.08 · rᵥ: 0.51

Deformation Playbook

Purposeful isolated anatomical distortion generated during initial inference · Structural coherence maintained throughout · No post-processing

Key Findings

What the corpus showed.

Finding 01

All tested engines share a central attractor

MidJourney, Sora, GPT, SDXL, Firefly — every engine tested shows a strong central compositional attractor. 100% of MidJourney outputs fall within 0.15 radius of geometric center across all semantic categories. The attractor is engine-specific in position but universal in existence.
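The "within 0.15 radius" claim is a simple containment check once each output has a displacement coordinate. A minimal sketch, assuming displacements are already expressed as (dx, dy) offsets from geometric center in the same normalized units as the radius; the sample points are illustrative, not corpus data.

```python
import math

def fraction_within(displacements, radius=0.15):
    """Share of (dx, dy) points whose distance from center is at most `radius`."""
    inside = sum(1 for dx, dy in displacements if math.hypot(dx, dy) <= radius)
    return inside / len(displacements)

sample = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (-0.05, 0.12)]
print(fraction_within(sample))  # 0.75
```

A result of 1.0 on a corpus corresponds to the 100% figure reported for MidJourney.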

Finding 02

Semantic categories explain only about 6% of spatial variance

Subject category — portrait vs landscape vs architecture vs abstract — accounts for roughly 6% of compositional displacement variance. The remaining 94% is structural prior, not prompt. Semantic diversity does not produce compositional diversity.
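A share of variance attributed to a categorical factor is commonly computed as eta-squared: the between-group sum of squares divided by the total sum of squares. The sketch below shows that calculation. Whether the 6% figure was obtained this way is an assumption, and `eta_squared` is an illustrative name.

```python
from statistics import mean

def eta_squared(groups):
    """groups: dict mapping category -> list of displacement values.

    Returns SS_between / SS_total, the share of variance explained by category.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    if ss_total == 0:
        return 0.0  # no variance at all, so nothing to explain
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2 for values in groups.values()
    )
    return ss_between / ss_total
```

A value of 0.06 would mean category membership accounts for 6% of the spread in displacement; the rest is left unexplained by subject.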

Finding 03

Structural degradation precedes semantic failure

Compositional metrics detect model degradation 3–4 inference steps before visible semantic breakdown. The geometry moves first. This makes the VTL a pre-semantic early warning instrument — catching drift before a human reviewer would notice anything wrong.

Finding 04

Cross-platform linear progressions are measurable

Displacement, void ratio, and architectural ceiling metrics show cross-platform linear progressions consistent enough to enable engine comparison and fingerprinting. Different models occupy different structural territories — and those territories are stable across prompt families.

Image models repeat the same geometry. We measure it.
