Rui Yang | Research

Minimal Visual Abstraction in Computer Vision

Minimal visual abstraction in computer vision aims to investigate how complex visual content can be represented with the smallest possible amount of visual information while still preserving semantic recognizability and perceptual coherence. From ancient petroglyphs such as the Helan Mountain Rock Art, to traditional Chinese freehand landscape painting, and later to the continuous-line artworks of Pablo Picasso, the evolution of visual art consistently demonstrates a common principle: human perception does not require complete pixel-level realism to recognize objects, actions, or emotions.

Instead, humans rely primarily on sparse structural cues, geometric organization, contour continuity, and semantic topology. This observation aligns closely with several emerging directions in computer vision, including sparse representation learning, structure-aware generation, semantic compression, differentiable vector graphics, and sketch-based visual understanding. In this context, One Continuous Line Drawing (OCLD) provides an ideal computational paradigm for studying minimal abstraction because it imposes an extreme information bottleneck: objects must be represented using a single continuous stroke without redundant texture, shading, or repeated contours.

Unlike traditional edge extraction methods such as the Canny detector or Holistically-Nested Edge Detection (HED), which densely preserve local gradients and high-frequency details, continuous-line abstraction requires global semantic reasoning, topology-aware structure simplification, and trajectory-level optimization. The problem therefore extends beyond low-level image processing into high-level visual cognition and semantic organization.
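To make the gap between an edge map and a stroke trajectory concrete, the sketch below orders a set of unordered 2D points (standing in for edge pixels) into one polyline with a greedy nearest-neighbour rule. The function name `chain_into_single_stroke` and the toy points are illustrative assumptions, not a method described here. The heuristic is purely local, which is exactly why it falls short of the global reasoning the task calls for.

```python
import math

def chain_into_single_stroke(points, start=0):
    """Greedy nearest-neighbour ordering of 2D points into one polyline.

    Illustrative only: each step looks at the closest unvisited point,
    so the result ignores semantics and global topology.
    """
    remaining = list(points)
    path = [remaining.pop(start)]
    while remaining:
        last = path[-1]
        # Index of the unvisited point closest to the current stroke end.
        idx = min(range(len(remaining)),
                  key=lambda i: math.dist(last, remaining[i]))
        path.append(remaining.pop(idx))
    return path

# Toy "edge pixels" given in arbitrary order.
edge_points = [(0, 0), (5, 0), (1, 0), (4, 0), (2, 0)]
stroke = chain_into_single_stroke(edge_points)
```

An edge detector returns an unordered set of pixels. A single-stroke drawing additionally needs a visiting order, and a good order depends on what the object is, not only on pixel proximity.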

Minimal visual abstraction and structural perception

The ancient Helan Mountain Rock Art shows that early human visual expression already embodied the principle of minimal visual abstraction that persists in contemporary minimal line art. Through sparse contours, symbolic animal forms, and simplified structural representations, the petroglyphs preserve semantic recognizability while discarding unnecessary visual details, reflecting an early form of structure-aware visual communication that parallels modern research in semantic abstraction and perceptual sparsification in computer vision.

Abstraction scale and semantic sufficiency

A continuous-line flamingo drawing by Pablo Picasso exemplifies the concept of minimal visual abstraction through the use of a single uninterrupted stroke to represent the essential structure and posture of the object. Despite the absence of texture, shading, and detailed geometry, the drawing remains immediately recognizable due to its preservation of semantically critical contours and global topology. This illustrates how human visual perception relies more strongly on sparse structural organization than on pixel-level realism, providing artistic and cognitive inspiration for structure-aware abstraction, semantic sparsification, and continuous-stroke generation in modern computer vision.

From a technical perspective, minimal visual abstraction introduces several fundamental challenges for modern computer vision systems. First, the model must identify the minimal semantically sufficient set of visual structures required for object recognition. This involves selectively preserving critical components such as object silhouettes, skeletal geometry, pose-defining contours, and perceptually salient junctions, while removing redundant texture and appearance information. Such a process resembles semantic sparsification and perceptual compression rather than conventional image reconstruction.
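The first challenge can be illustrated with a deliberately simplified sketch. It assumes each candidate structure carries a scalar, additive saliency score, and it keeps the fewest structures whose scores reach a chosen fraction of the total. The function name, the scores, and the threshold are hypothetical. Real recognizability is unlikely to be additive, so this should be read as a toy model of the selection problem rather than a solution to it.

```python
def minimal_sufficient_set(candidates, threshold=0.85):
    """Return the fewest structures whose saliency reaches `threshold`
    (a fraction of the total saliency).

    `candidates` is a list of (name, saliency) pairs. Under the additive
    assumption, taking structures in descending saliency order yields a
    set of minimal size.
    """
    total = sum(score for _, score in candidates)
    if total <= 0:
        return []
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    kept, covered = [], 0.0
    for name, score in ranked:
        if covered >= threshold * total:
            break
        kept.append(name)
        covered += score
    return kept

# Hypothetical saliency scores for the structures of one object.
structures = [
    ("silhouette", 0.50),
    ("skeleton", 0.25),
    ("texture", 0.05),
    ("junctions", 0.15),
    ("shading", 0.05),
]
kept = minimal_sufficient_set(structures, threshold=0.85)
```

With these made-up scores, texture and shading are dropped while the silhouette, skeleton, and junctions are kept, which mirrors the kind of selection described above.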

Second, continuous-line abstraction requires global topological consistency. Since all visual elements must be connected through a single stroke trajectory, the system must jointly optimize semantic importance, stroke continuity, curvature smoothness, and spatial connectivity under severe structural constraints. This transforms the task into a hybrid problem involving vectorized rendering, path planning, differentiable optimization, and generative abstraction.
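One way to picture this joint optimization is as an energy over a single polyline, where connectivity holds by construction and the remaining terms trade off against each other. The sketch below is an assumed formulation for illustration. The weights, the squared turning-angle curvature term, and the vertex-only coverage term are all simplifications chosen here, not taken from any specific system.

```python
import math

def turning_angle(a, b, c):
    """Absolute angle (radians) between segments a->b and b->c."""
    v1 = (b[0] - a[0], b[1] - a[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return abs(math.atan2(cross, dot))

def stroke_energy(path, salient, w_cover=1.0, w_len=0.1, w_curv=0.5):
    """Lower is better. `path` is a list of 2D vertices forming one stroke;
    `salient` is a list of ((x, y), weight) semantic targets."""
    # Stroke length: discourages redundant detours.
    length = sum(math.dist(p, q) for p, q in zip(path, path[1:]))
    # Curvature smoothness: penalizes sharp turns at interior vertices.
    curvature = sum(turning_angle(a, b, c) ** 2
                    for a, b, c in zip(path, path[1:], path[2:]))
    # Semantic coverage: weighted distance from each salient point to the
    # nearest path vertex (vertex-only distance is a simplification).
    coverage = sum(w * min(math.dist(pt, p) for p in path)
                   for pt, w in salient)
    return w_cover * coverage + w_len * length + w_curv * curvature

salient = [((1.0, 0.0), 1.0)]
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
zigzag = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
e_straight = stroke_energy(straight, salient)
e_zigzag = stroke_energy(zigzag, salient)
```

In this toy case the straight stroke passes through the salient point with no turns, so it scores lower than the zigzag, which is longer, bends sharply, and misses the target. An optimizer over `path`, whether gradient-based or combinatorial, would be a separate design decision.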

Recent advances in diffusion-based image generation provide new opportunities for abstraction-oriented visual modeling. Whereas diffusion models generate images by progressively reconstructing detail from noise, minimal abstraction can be interpreted as moving in the opposite direction: progressively discarding non-essential information until only semantically meaningful structures remain. Unlike the forward noising process of a diffusion model, which degrades all content indiscriminately, this reduction is selective. Such a perspective suggests a research direction that may be described as reverse semantic distillation, where visual complexity is iteratively reduced while semantic fidelity is preserved. This idea is closely related to sparse attention mechanisms, token pruning strategies in vision transformers, selective state-space models, and structure-guided generation frameworks. Consequently, minimal abstraction is not merely an artistic rendering problem, but also a computational framework for efficient semantic representation and interpretable visual reasoning.
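The iterative reduction described above can be sketched as a greedy pruning loop. This is a minimal illustration under stated assumptions, not a diffusion model and not an established algorithm: `distill`, the `fidelity` callback, and the toy weights are hypothetical. In practice the fidelity score might come from something like a recognition model's confidence.

```python
def distill(elements, fidelity, min_fidelity):
    """Repeatedly drop the element whose removal hurts fidelity least,
    stopping when every further removal would fall below `min_fidelity`.

    `fidelity` is any callable mapping a list of elements to a score.
    """
    kept = list(elements)
    while len(kept) > 1:
        best = None
        for i in range(len(kept)):
            trial = kept[:i] + kept[i + 1:]
            score = fidelity(trial)
            if score >= min_fidelity and (best is None or score > best[0]):
                best = (score, trial)
        if best is None:
            break  # no element can be removed without losing too much meaning
        kept = best[1]
    return kept

# Toy fidelity: share of hypothetical semantic weight that is still present.
weights = {"outline": 0.60, "neck": 0.25, "feathers": 0.10, "shadow": 0.05}
total_weight = sum(weights.values())

def toy_fidelity(parts):
    return sum(weights[p] for p in parts) / total_weight

result = distill(list(weights), toy_fidelity, min_fidelity=0.8)
```

Here the shadow and feathers are removed in turn, and the loop stops at the outline and neck because dropping either would push the toy score below the threshold.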

More importantly, minimal visual abstraction reveals a deeper connection between computational vision and human perceptual cognition. Traditional computer vision systems have long been dominated by image-fidelity metrics such as PSNR and SSIM and by the goal of photorealistic synthesis, emphasizing detailed reconstruction and high-dimensional feature learning. However, the success of sketches, calligraphy, freehand paintings, and continuous-line drawings suggests that perceptual understanding depends more strongly on structural organization than on exhaustive visual completeness.

Human observers are capable of perceptual completion, contour inference, and semantic reconstruction even when large portions of visual information are absent. This indicates that the human visual system naturally favors structured abstraction over raw visual fidelity. Therefore, future research in minimal visual abstraction may contribute not only to sketch generation, vector graphics synthesis, and stylized rendering, but also to broader areas such as interpretable artificial intelligence, low-bandwidth visual communication, efficient multimodal representation learning, and cognitively inspired computer vision systems.

In this sense, minimal visual abstraction provides a unifying framework that connects ancient artistic expression, modern generative modeling, structural perception, and semantic intelligence, ultimately shifting the focus of computer vision from reproducing visual reality toward understanding the essential structures underlying visual cognition.

Minimal abstraction and perceptual cognition

Minimal abstraction in script evolution, from oracle bone script to regular script, illustrates a progressive structural simplification that preserves category-level recognizability while discarding low-level visual detail, aligning with the principles of contour-based shape encoding and cognitively efficient perceptual organization in computer vision.