PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design

*Equal Contribution    Corresponding Author

Abstract

Graphic design forms the cornerstone of modern visual communication, serving as a vital medium for promoting cultural and commercial events. Recent advances have explored automating this process using Large Multimodal Models (LMMs), yet existing methods often produce geometrically inaccurate layouts and lack the iterative, layer-specific editing required in professional workflows. To address these limitations, we present PosterCopilot, a framework that advances layout reasoning and controllable editing for professional graphic design. Specifically, we introduce a progressive three-stage training strategy that equips LMMs with geometric understanding and aesthetic reasoning for layout design, consisting of Perturbed Supervised Fine-Tuning, Reinforcement Learning for Visual-Reality Alignment, and Reinforcement Learning from Aesthetic Feedback. Furthermore, we develop a complete workflow that couples the trained LMM-based design model with generative models, enabling layer-controllable, iterative editing for precise element refinement while maintaining global visual consistency. Extensive experiments demonstrate that PosterCopilot achieves geometrically accurate and aesthetically superior layouts, offering unprecedented controllability for professional iterative design.

Overview

Multi-stage training strategy

Rather than formulating the training process as a simple regression task, we endow PosterCopilot with outstanding layout capabilities and human-like aesthetics through a three-stage training paradigm. This enables LMMs to reason over continuous spatial relationships while instilling design principles and human aesthetics.

Technical Framework

1. Perturbed Supervised Fine-Tuning
Perturbed Supervised Fine-Tuning (PSFT) reformulates coordinate regression into a distribution-based learning paradigm by introducing controlled perturbations to ground-truth coordinates. Compared to point-wise regression, learning a coordinate distribution allows the model to reason over continuous spatial relationships rather than memorizing discrete positions, leading to more coherent and aesthetically balanced layouts.
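
As a rough illustration of the perturbation idea above, the sketch below draws several noisy copies of one ground-truth box so that training targets form a small distribution around the annotation rather than a single point. The `(x, y, w, h)` box format, the Gaussian noise, the `sigma` scale, and the clipping rule are all assumptions made for this example; the text here does not specify how PSFT perturbs coordinates.

```python
import random
from typing import List, Optional, Tuple

# Assumed box format for this sketch: (x, y, w, h) in pixels, top-left origin.
Box = Tuple[float, float, float, float]


def perturb_box(box: Box, canvas_w: float, canvas_h: float,
                sigma: float = 0.01,
                rng: Optional[random.Random] = None) -> Box:
    """Return one noisy copy of a ground-truth box.

    Noise is Gaussian with a standard deviation of `sigma` times the canvas
    size (a hypothetical choice, not taken from the paper).
    """
    rng = rng or random.Random()
    x, y, w, h = box
    x += rng.gauss(0.0, sigma * canvas_w)
    y += rng.gauss(0.0, sigma * canvas_h)
    w += rng.gauss(0.0, sigma * canvas_w)
    h += rng.gauss(0.0, sigma * canvas_h)
    # Keep the box valid: at least 1 px in size and inside the canvas.
    w = min(max(w, 1.0), canvas_w)
    h = min(max(h, 1.0), canvas_h)
    x = min(max(x, 0.0), canvas_w - w)
    y = min(max(y, 0.0), canvas_h - h)
    return (x, y, w, h)


def sample_targets(box: Box, canvas_w: float, canvas_h: float,
                   n: int = 4, sigma: float = 0.01, seed: int = 0) -> List[Box]:
    """Draw `n` perturbed targets around one ground-truth box."""
    rng = random.Random(seed)
    return [perturb_box(box, canvas_w, canvas_h, sigma, rng) for _ in range(n)]
```

With `sigma=0` the function returns the original box unchanged, which makes it easy to compare perturbed and unperturbed training in an ablation.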

2. Reinforcement Learning for Visual-Reality Alignment
Reinforcement Learning for Visual-Reality Alignment (RL-VRA) introduces verifiable geometric reward signals to explicitly correct the spatial inaccuracies that remain after PSFT, providing reliable visual-reality alignment. This geometrically grounded stage aligns model outputs with graphic design principles, ensuring geometric accuracy and visual realism.
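
One example of a verifiable geometric signal is the intersection-over-union (IoU) between a predicted box and its reference box, since it can be computed exactly from coordinates. The sketch below averages IoU over the elements of a layout. Using IoU as the reward, the `(x, y, w, h)` box format, and the zero reward for a mismatched element count are assumptions for illustration only; the actual reward terms of RL-VRA are not given in this text.

```python
from typing import Sequence, Tuple

# Assumed box format for this sketch: (x, y, w, h), top-left origin.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def geometric_reward(pred: Sequence[Box], ref: Sequence[Box]) -> float:
    """Mean per-element IoU; 0.0 if the layout has the wrong element count.

    A hypothetical reward, chosen because it is checkable from coordinates
    alone without a learned model.
    """
    if not pred or len(pred) != len(ref):
        return 0.0
    return sum(iou(p, r) for p, r in zip(pred, ref)) / len(pred)
```

Because the reward depends only on coordinates, it can be recomputed and audited for any sampled layout.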

3. Reinforcement Learning from Aesthetic Feedback
Reinforcement Learning from Aesthetic Feedback (RLAF) employs a learned aesthetic reward model to steer the layout model toward aesthetically coherent and diverse compositions. This stage encourages the layout model to discover novel, high-appeal layouts that may surpass the ground truth.
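
To make the role of the reward model concrete, the sketch below scores a group of sampled layouts with an arbitrary scoring function and standardizes the scores within the group, so that above-average layouts receive positive weight. The group-wise standardization, the `reward_model` callable, and both function names are assumptions for this example; the text does not state which RL algorithm or reward-model interface RLAF uses.

```python
from statistics import mean, pstdev
from typing import Any, Callable, List, Sequence


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Standardize rewards within one group of sampled layouts.

    Each reward is centered on the group mean and divided by the group
    standard deviation (plus `eps` to avoid division by zero).
    """
    mu = mean(rewards)
    sd = pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]


def score_candidates(candidates: Sequence[Any],
                     reward_model: Callable[[Any], float]) -> List[float]:
    """Score each candidate with `reward_model`, then standardize.

    `reward_model` is a stand-in for a learned aesthetic scorer; any
    callable that maps a layout to a number works here.
    """
    rewards = [float(reward_model(c)) for c in candidates]
    return group_advantages(rewards)
```

Since the scores are relative to the group, a layout is rewarded for being better than its sibling samples rather than for matching the ground truth, which is consistent with the goal of finding layouts that may surpass it.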

4. Versatile Poster Layout and Precise Editing Pipeline
We present PosterCopilot, a complete workflow that couples the trained LMM-based design model with generative agents. This framework enables layer-controllable, iterative editing for precise element refinement while maintaining global visual consistency, supporting multiple user input modes and offering unprecedented controllability for professional graphic design.

PosterCopilot Dataset

PosterCopilot Dataset Pipeline

We introduce the PosterCopilot Dataset, constructed to address the long-standing challenge of over-segmentation in multi-layer poster datasets, where single visual elements are often fragmented across independent layers. By employing a novel OCR-based pipeline to merge these overly fine-grained components and filter out redundancies, we curated a high-quality dataset comprising 160K posters with a total of 2.6M layers (1.2M text and 1.4M image/decorative elements). The PosterCopilot Dataset spans more than 40 distinct domains, ranging from commercial promotions and entertainment to public announcements. This thematic diversity captures a broad spectrum of design aesthetics and layout structures.
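
The merging step can be pictured with a small sketch: layers whose centers fall inside the same OCR-detected text line are fused into one layer whose box is the union of the fragments. The center-in-region rule, the `(x, y, w, h)` box format, and the function names are assumptions for illustration; the actual pipeline, including its redundancy filtering, is not detailed in this text.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed box format for this sketch: (x, y, w, h), top-left origin.
Box = Tuple[float, float, float, float]


def union_box(boxes: Sequence[Box]) -> Box:
    """Smallest axis-aligned box that covers every box in `boxes`."""
    x0 = min(b[0] for b in boxes)
    y0 = min(b[1] for b in boxes)
    x1 = max(b[0] + b[2] for b in boxes)
    y1 = max(b[1] + b[3] for b in boxes)
    return (x0, y0, x1 - x0, y1 - y0)


def contains_center(region: Box, box: Box) -> bool:
    """True if the center of `box` lies inside `region`."""
    cx, cy = box[0] + box[2] / 2, box[1] + box[3] / 2
    rx, ry, rw, rh = region
    return rx <= cx <= rx + rw and ry <= cy <= ry + rh


def merge_layers(layer_boxes: Sequence[Box], ocr_lines: Sequence[Box]) -> List[Box]:
    """Fuse layers that share an OCR text line; keep the rest unchanged."""
    groups: Dict[int, List[Box]] = {}
    unmatched: List[Box] = []
    for box in layer_boxes:
        for i, line in enumerate(ocr_lines):
            if contains_center(line, box):
                groups.setdefault(i, []).append(box)
                break
        else:
            # No OCR line claims this layer, so it is not a text fragment.
            unmatched.append(box)
    merged = [union_box(g) for _, g in sorted(groups.items())]
    return merged + unmatched
```

For example, two adjacent glyph layers that sit on one detected text line collapse into a single text layer, while an unrelated decorative layer passes through untouched.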

Application Examples: Precise Layer-level Control and Fine-grained Editing

1. Poster Layout Generation

PosterCopilot excels at arranging a complete set of user-provided assets into an aesthetically pleasing, professional-grade design, while guaranteeing every asset is faithfully preserved without alteration.

2. Poster generation from insufficient assets

PosterCopilot's generative agent handles incomplete assets by synthesizing missing layers, such as background or foreground layers, with stylistic consistency.

3. Multi-round refinement

PosterCopilot supports multiple, varied edits on a single, fine-grained layer (e.g., modifying a camera's material or a character's pose). This high-fidelity process strictly confines the edit scope to the target layer, ensuring precise modification while preserving all other elements.

4. Poster theme switch

PosterCopilot can seamlessly switch between different thematic styles while maintaining visual coherence and layout integrity.

5. Poster Reframe

Leveraging the design model's powerful reasoning capability, PosterCopilot can intelligently reframe and regenerate appropriate layouts simply by modifying the canvas size specification in the input requirements.

BibTeX

@misc{wei2025postercopilot,
  title={PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design},
  author={Jiazhe Wei and Ken Li and Tianyu Lao and Haofan Wang and Liang Wang and Caifeng Shan and Chenyang Si},
  year={2025},
  eprint={2512.04082},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}