LLMs are trained purely on text, yet their internal representations spontaneously develop geometric structures that mirror human perception across color, pitch, emotion, and taste. These structures emerge in intermediate layers and attenuate in deeper ones.
Key Findings
By extracting layer-wise representations from open-weight language models and comparing them to human perceptual baselines, we uncovered a consistent pattern across domains and model families.
Color wheels, pitch spirals, emotion manifolds, and taste maps form spontaneously inside LLMs trained only on text.
Taste peaks early and fades fast. Emotion peaks later and persists deep. Pitch and color lie in between, each with its own curve.
Early layers are diffuse; intermediate layers crystallize human-like manifolds; later layers dissolve them as models shift to task-specific computation.
Overview
The figure below shows the central result: for color (LLaMA-3-8B), pitch (Qwen-3-4B), and emotion (Gemma-7B), we display the human perceptual baseline, the model's peak-alignment geometry, and the layer-wise alignment profile.
Figure 2. Each row is a perceptual domain. Left column: human perceptual baseline. Middle: peak-alignment model geometry via MDS. Right: RSA and GPA alignment scores across layers, showing the rise–peak–fade trajectory.
Figure 3. Human perceptual map (top left) vs. Gemma-7B peak-layer LLM representation for taste (top right). Bottom: layer-wise RSA and GPA alignment scores.
The Pattern
Layer-wise alignment profile (schematic)
Results by modality
At peak alignment layers, color representations organize into a smooth circular manifold closely resembling the human perceptual color wheel — arising from purely linguistic statistics.
The depth-wise pattern is consistent across all four architectures: a clear rise–peak–fall profile where color alignment peaks at an intermediate layer before late-layer attenuation. Qwen-3-4B shows a brief late-layer rebound before final degradation.
Fig. 6. Layer-wise emergence of 2D color geometry in LLaMA-3-8B. Human baseline (top-left), early layer (top-right), peak layer (bottom-left), final layer (bottom-right).
The peak-layer geometry reveals a smooth arc-like organization consistent with continuous, ordinal human pitch perception. No discrete clusters — pitch is represented relationally, as a continuous manifold.
Early layers show weak partial ordering; intermediate layers undergo a structural transition; later layers progressively deform this organization.
Fig. 17. Layer-wise emergence of 3D pitch geometry in Qwen-3-4B. Human baseline (top-left), early layer (top-right), peak layer (bottom-left), final layer (bottom-right).
The peak-layer representation recovers a well-organized affective manifold aligned with the human valence–arousal structure. Unlike color, this alignment remains comparatively stable across later layers.
This persistence suggests that emotional geometry is more deeply encoded, perhaps because affective concepts are more densely represented in language than sensory concepts are.
Fig. 11. Layer-wise emergence of emotion geometry in Gemma-7B. Human baseline (top-left), early layer (top-right), peak layer (bottom-left), final layer (bottom-right).
Taste representations recover a qualitatively well-formed manifold with strong geometric alignment (high GPA). The relative ordering of primary tastes and mixtures broadly matches human perceptual arrangement.
However, taste diverges from other domains: lower RSA scores and noisier layer-wise profiles indicate less precise pairwise relations, and the geometry degrades more rapidly after its peak.
Fig. 13. Layer-wise emergence of taste geometry in Gemma-7B. Human baseline (top-left), early layer (top-right), peak layer (bottom-left), final layer (bottom-right).
Methodology
No probing classifiers. No fine-tuning. We extract layer-wise residual stream representations, construct geometric maps, and compare them to human perceptual baselines using two complementary metrics.
01 · Stimuli
Each stimulus (e.g. #9B081A, afraid, 261 Hz) is embedded in a short template designed to avoid semantic bias.
02 · Extract
Last-token hidden state activations are extracted at every transformer layer from the residual stream.
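As a rough illustration of the extraction step, the sketch below selects the last real (non-padding) token at every layer from a stack of hidden states. It is not the authors' code: the function name `last_token_states`, the array layout, and the right-padding convention are all assumptions made for the example, and the hidden states are taken as already computed by whatever framework runs the model.

```python
import numpy as np

def last_token_states(hidden, attention_mask):
    """Pick the last non-padding token's vector at every layer.

    hidden:         (n_layers, batch, seq_len, d_model) array (assumed layout)
    attention_mask: (batch, seq_len) array of 0/1, right-padded (assumption)
    returns:        (n_layers, batch, d_model) array
    """
    hidden = np.asarray(hidden)
    mask = np.asarray(attention_mask).astype(int)
    # Index of the last real token in each row of the batch.
    last_idx = mask.sum(axis=1) - 1
    batch_idx = np.arange(hidden.shape[1])
    # Paired advanced indexing over (batch, seq_len) keeps layers and features.
    return hidden[:, batch_idx, last_idx, :]

# Toy usage: 3 layers, 2 stimuli, 5 token positions, 8 features.
rng = np.random.default_rng(0)
toy_hidden = rng.normal(size=(3, 2, 5, 8))
toy_mask = np.array([[1, 1, 1, 0, 0],
                     [1, 1, 1, 1, 1]])
states = last_token_states(toy_hidden, toy_mask)
print(states.shape)
```

Each slice `states[layer]` is then one stimulus-by-feature matrix, which is the input the geometry step needs.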
03 · Geometry
Pairwise cosine dissimilarities are projected into a low-dimensional space via MDS, and the result is cross-checked with Isomap to rule out projection artifacts.
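The geometry step can be sketched as follows, assuming one layer's representations are stacked as a stimulus-by-feature matrix. The page does not say which MDS variant is used, so this example substitutes classical (Torgerson) MDS because it needs only NumPy; the function names are illustrative, and the Isomap cross-check is not shown.

```python
import numpy as np

def cosine_dissimilarity(X):
    """Pairwise cosine dissimilarity (1 - cosine similarity) between rows of X."""
    X = np.asarray(X, dtype=float)
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    D = 1.0 - unit @ unit.T
    np.fill_diagonal(D, 0.0)          # remove floating-point noise on the diagonal
    return np.clip(D, 0.0, 2.0)

def classical_mds(D, n_components=2):
    """Classical (Torgerson) MDS: embed a dissimilarity matrix in a few dimensions."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Toy usage: 8 stimuli with 32-dimensional representations from one layer.
rng = np.random.default_rng(0)
layer_states = rng.normal(size=(8, 32))
coords = classical_mds(cosine_dissimilarity(layer_states), n_components=2)
print(coords.shape)
```

Running the same two functions on every layer gives one low-dimensional map per layer, which is what the layer-wise figures visualize.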
04 · Compare
Representational Similarity Analysis (RSA) and Generalized Procrustes Analysis (GPA) measure alignment against human perceptual baselines at each layer.
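A minimal sketch of the two comparison metrics is given below, under stated assumptions. The RSA score here is a Spearman-style rank correlation between the upper triangles of two dissimilarity matrices (ties are broken arbitrarily). The Procrustes score is an ordinary two-configuration Procrustes fit used as a stand-in for GPA, reported as 1 minus the residual disparity. The exact scoring conventions of the paper are not stated on this page, so both definitions and both function names are illustrative.

```python
import numpy as np

def rsa_score(D_model, D_human):
    """Rank correlation between the upper triangles of two dissimilarity matrices."""
    D_model = np.asarray(D_model, dtype=float)
    D_human = np.asarray(D_human, dtype=float)
    iu = np.triu_indices_from(D_model, k=1)
    a, b = D_model[iu], D_human[iu]
    # Ranks via double argsort; ties are broken arbitrarily (simplifying assumption).
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return float(np.corrcoef(ra, rb)[0, 1])

def procrustes_score(Y_model, Y_human):
    """1 - Procrustes disparity after optimal translation, scaling, and rotation."""
    A = np.asarray(Y_human, dtype=float)
    B = np.asarray(Y_model, dtype=float)
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    A = A / np.linalg.norm(A)          # unit Frobenius norm
    B = B / np.linalg.norm(B)
    # Singular values of the cross-covariance give the best orthogonal fit.
    s = np.linalg.svd(B.T @ A, compute_uv=False)
    return float(s.sum() ** 2)         # equals 1 - disparity, in [0, 1]

# Toy usage: a rotated, scaled, shifted copy of a map aligns perfectly.
rng = np.random.default_rng(0)
human_map = rng.normal(size=(7, 2))
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
model_map = 3.0 * human_map @ rot + 5.0
print(round(procrustes_score(model_map, human_map), 6))
```

The two scores are complementary: the rank-based RSA score depends only on the ordering of pairwise dissimilarities, while the Procrustes score compares the overall shape of the two embedded maps.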
Paper summary
LLMs spontaneously develop structured, human-like perceptual geometry within their hidden representations — with no direct perceptual supervision at any point during training. Across color, pitch, emotion, and taste, concepts self-organize into geometric manifolds that closely parallel how humans perceptually arrange the same concepts.
Perceptual domains do not emerge uniformly — each follows a distinct layer-wise trajectory shaped by its own complexity. Simpler relational structures such as taste crystallize in earlier layers and dissolve quickly, while richer domains like emotion build more gradually and remain coherent well into deeper layers.
Perceptual geometry is not distributed uniformly across a model's depth — it arises transiently. Early layers hold weak or fragmented structure; intermediate layers cohere into recognizable human-like manifolds; later layers progressively dismantle these representations as the network shifts toward task-oriented processing. This "rise → peak → fade" arc appears consistently across architectures and domains.
The entire analysis is fully intrinsic — no probing classifiers, no fine-tuning, no additional supervision. We reconstruct layer-wise geometry via MDS and Isomap, then quantify alignment against human perceptual baselines using Representational Similarity Analysis and Generalized Procrustes Analysis at every layer.
Color representations converge into smooth, circular manifolds closely resembling the human color wheel. Emotion takes a different path: its valence–arousal structure not only peaks strongly but stays comparatively stable through later layers — suggesting affective geometry is more persistently encoded than purely sensory geometry.
Pitch representations gradually organize into smooth, continuous manifolds that reflect the ordinal nature of human pitch perception, before deforming again at greater depth. Taste is the outlier — emerging earlier than any other domain but proving far noisier and less stable, degrading rapidly after its peak.
Taken together, these results suggest that text-only training is sufficient for LLMs to internalize structured approximations of human perceptual space — not as a deliberate capability, but as a natural consequence of the statistical geometry embedded in language co-occurrence.
Citation
This paper is currently under review at the ICML 2026 Mechanistic Interpretability workshop. If you use this work, please cite:
@misc{singh2026geometryhumanperceptualdomains,
title={Geometry of Human Perceptual Domains Emerges Transiently in LLM Representations},
author={Simardeep Singh and Paras Chopra},
year={2026},
eprint={2605.27970},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2605.27970},
}