Conditional diffusion models can create unseen images in various settings, aiding image interpolation. Interpolation in latent spaces is well-studied, but interpolation with specific conditions, such as text or poses, is less understood. Simple approaches, such as linear interpolation in the space of conditions, often result in images that lack consistency, smoothness, and fidelity. To address this, we introduce a novel training-free technique named Attention Interpolation via Diffusion (AID). Our key contributions include 1) proposing an inner/outer interpolated attention layer; 2) fusing the interpolated attention with self-attention to boost fidelity; and 3) applying a Beta distribution to the selection of interpolation coefficients to increase smoothness. We also present a variant, Prompt-guided Attention Interpolation via Diffusion (PAID), which treats interpolation as a condition-dependent generative process. Our method enables the creation of new images with greater consistency, smoothness, and efficiency, and offers control over the exact path of interpolation. It is effective for both conceptual and spatial interpolation, and can be applied to a range of tasks, including compositional generation, image editing, image morphing, and image-controlled generation.
Interpolation between different conditions is a fundamental yet under-explored aspect of image generation, with potential applications in any task whose inputs involve two or more conditions.
We introduce inner/outer interpolated attention and fuse it with self-attention to compute the interpolation path between images generated under different conditions. We further select the specific interpolated images using a Beta distribution.
Equations for the inner/outer interpolated attention and the fusion with self-attention.
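The equation figure is not reproduced here; below is a hedged LaTeX sketch of the two interpolated-attention variants and the fusion, written from the descriptions above. The symbols ($Q_t$ for the queries of the frame at coefficient $t$; $K_1, V_1$ and $K_2, V_2$ for keys/values cached from the two endpoint generations; $\lambda$ for a fusion weight) and the convex-combination form of the fusion are our assumptions, not a verbatim copy of the figure.

```latex
% Inner interpolation: keys and values are mixed before a single attention call.
\mathrm{Attn}_{\mathrm{inner}}(t) =
  \mathrm{softmax}\!\left(\frac{Q_t \left((1-t)K_1 + tK_2\right)^{\top}}{\sqrt{d}}\right)
  \left((1-t)V_1 + tV_2\right)

% Outer interpolation: attention to each endpoint is computed separately,
% then the outputs are mixed.
\mathrm{Attn}_{\mathrm{outer}}(t) =
  (1-t)\,\mathrm{softmax}\!\left(\frac{Q_t K_1^{\top}}{\sqrt{d}}\right) V_1
  + t\,\mathrm{softmax}\!\left(\frac{Q_t K_2^{\top}}{\sqrt{d}}\right) V_2

% Fusion with the frame's own self-attention (one plausible form).
\mathrm{Attn}_{\mathrm{fused}}(t) =
  \lambda\,\mathrm{Attn}_{\mathrm{interp}}(t)
  + (1-\lambda)\,\mathrm{softmax}\!\left(\frac{Q_t K_t^{\top}}{\sqrt{d}}\right) V_t
```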
Effect of the Beta prior and the corresponding interpolation sequence generated by AID.
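To illustrate the Beta prior, the sketch below draws interpolation coefficients from a symmetric Beta distribution instead of a uniform grid. The helper name `sample_coefficients` and the default `alpha` are illustrative assumptions; which setting of `alpha` best matches the figure above is not asserted here.

```python
import numpy as np

def sample_coefficients(n: int, alpha: float = 2.0, seed: int = 0) -> np.ndarray:
    """Draw n interpolation coefficients in (0, 1) from a Beta(alpha, alpha) prior.

    alpha > 1 clusters the coefficients around t = 0.5, while alpha < 1 pushes
    them toward the two endpoints; a uniform grid corresponds to alpha = 1.
    """
    rng = np.random.default_rng(seed)
    return np.sort(rng.beta(alpha, alpha, size=n))

# Example: coefficients for 7 intermediate frames between the two endpoints.
print(sample_coefficients(7))
```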
Qualitative comparison between AID without fusion (1st row), AID with fusion (2nd row), and AID with fusion and Beta prior (3rd row).
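The rows above differ only in the attention applied to the intermediate frames. Below is a minimal PyTorch sketch of the inner/outer interpolated attention and its fusion with self-attention, assuming cached endpoint keys/values and a simple convex-combination fusion weight `lam`; these names and the fusion rule are illustrative assumptions, not the repository's implementation.

```python
import torch
import torch.nn.functional as F

def interpolated_attention(q_t, k1, v1, k2, v2, t, mode="outer"):
    """Attention for the frame at coefficient t, using keys/values of the two endpoints.

    q_t: queries of the interpolated frame, shape (batch, heads, tokens, dim).
    k1, v1 / k2, v2: keys and values cached from the two endpoint generations.
    """
    if mode == "inner":
        # Inner interpolation: mix keys/values before a single attention call.
        k = (1 - t) * k1 + t * k2
        v = (1 - t) * v1 + t * v2
        return F.scaled_dot_product_attention(q_t, k, v)
    # Outer interpolation: attend to each endpoint separately, then mix the outputs.
    out1 = F.scaled_dot_product_attention(q_t, k1, v1)
    out2 = F.scaled_dot_product_attention(q_t, k2, v2)
    return (1 - t) * out1 + t * out2

def fused_attention(q_t, k_t, v_t, k1, v1, k2, v2, t, lam=0.5, mode="outer"):
    """Fuse interpolated attention with the frame's own self-attention.

    lam is a fusion weight introduced here for illustration; the exact fusion
    rule in the paper may differ.
    """
    self_attn = F.scaled_dot_product_attention(q_t, k_t, v_t)
    interp = interpolated_attention(q_t, k1, v1, k2, v2, t, mode=mode)
    return lam * interp + (1 - lam) * self_attn
```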
Building on this, PAID rethinks interpolation as a one-to-many task and introduces prompt guidance to determine the specific interpolation sequence.
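As a hypothetical interface sketch (none of these names come from the official repository), the one-to-many view means a single pair of endpoint prompts can yield different sequences depending on the guidance prompt:

```python
from dataclasses import dataclass

@dataclass
class InterpolationRequest:
    """Hypothetical request object illustrating the one-to-many setting."""
    prompt_a: str         # condition of the first endpoint
    prompt_b: str         # condition of the second endpoint
    guidance_prompt: str  # selects which interpolation path to generate
    num_frames: int = 7   # number of intermediate images

# The same endpoints with different guidance prompts describe different paths.
requests = [
    InterpolationRequest("a photo of a lion", "a photo of a tiger",
                         guidance_prompt="a big cat lying on grass"),
    InterpolationRequest("a photo of a lion", "a photo of a tiger",
                         guidance_prompt="a big cat walking through snow"),
]
```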
| Dataset | Method | Smoothness (↑) | Consistency (↓) | Fidelity (↓) |
|---|---|---|---|---|
| CIFAR-10 | Text Embedding Interpolation | 0.7531 | 0.3645 | 118.05 |
| | Denoising Interpolation | 0.7564 | 0.4295 | 87.13 |
| | AID-O | 0.7831 | 0.2905* | 51.43* |
| | AID-I | 0.7861* | 0.3271 | 101.13 |
| LAION-Aesthetics | Text Embedding Interpolation | 0.7424 | 0.3867 | 142.38 |
| | Denoising Interpolation | 0.7511 | 0.4365 | 101.31 |
| | AID-O | 0.7643 | 0.2944* | 82.01* |
| | AID-I | 0.8152* | 0.3787 | 129.41 |
Performance on CIFAR-10 and LAION-Aesthetics; the best value in each column is marked with (*). AID-O and AID-I both improve significantly over Text Embedding Interpolation. Denoising Interpolation achieves relatively good fidelity but trades it off against poor consistency (0.4295). AID-O boosts consistency and fidelity, while AID-I boosts smoothness.
If you find our work useful, please consider citing our paper:
@article{he2024aid,
  title={AID: Attention Interpolation of Text-to-Image Diffusion},
  author={He, Qiyuan and Wang, Jinghao and Liu, Ziwei and Yao, Angela},
  journal={arXiv preprint arXiv:2403.17924},
  year={2024}
}