StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery
Proceedings of the IEEE/CVF International Conference on …, 2021
Abstract
Inspired by the ability of StyleGAN to generate highly realistic images in a variety of domains, much recent work has focused on understanding how to use the latent spaces of StyleGAN to manipulate generated and real images. However, discovering semantically meaningful latent manipulations typically involves painstaking human examination of the many degrees of freedom, or an annotated collection of images for each desired manipulation. In this work, we explore leveraging the power of recently introduced Contrastive Language-Image Pre-training (CLIP) models in order to develop a text-based interface for StyleGAN image manipulation that does not require such manual effort. We first introduce an optimization scheme that utilizes a CLIP-based loss to modify an input latent vector in response to a user-provided text prompt. Next, we describe a latent mapper that infers a text-guided latent manipulation step for a given input image, allowing faster and more stable text-based manipulation. Finally, we present a method for mapping text prompts to input-agnostic directions in StyleGAN's style space, enabling interactive text-driven image manipulation. Extensive results and comparisons demonstrate the effectiveness of our approaches.
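The optimization scheme described above can be made concrete with a short sketch. The following is a minimal illustration, not the authors' released code: it assumes a pretrained StyleGAN generator `G` (any callable mapping a W+ latent to an RGB image in [-1, 1]), a previously inverted latent `w_init` for the image being edited, and OpenAI's `clip` package, and it omits the identity-preservation term the full method uses.

```python
# Minimal sketch of CLIP-guided latent optimization (hypothetical names:
# G, w_init, edit_latent; hyperparameters are illustrative, not the paper's).
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

def clip_loss(image, text_tokens):
    # CLIP expects 224x224 inputs; resize the generator output and map it
    # from [-1, 1] to CLIP's input normalization.
    image = F.interpolate(image, size=224, mode="bilinear", align_corners=False)
    image = (image + 1) / 2  # to [0, 1]
    mean = torch.tensor([0.48145466, 0.4578275, 0.40821073], device=image.device).view(1, 3, 1, 1)
    std = torch.tensor([0.26862954, 0.26130258, 0.27577711], device=image.device).view(1, 3, 1, 1)
    image = (image - mean) / std
    img_emb = model.encode_image(image)
    txt_emb = model.encode_text(text_tokens)
    # Cosine distance between the image embedding and the text prompt embedding.
    return 1 - F.cosine_similarity(img_emb, txt_emb).mean()

def edit_latent(G, w_init, prompt, steps=200, lr=0.1, l2_lambda=0.008):
    text_tokens = clip.tokenize([prompt]).to(device)
    w = w_init.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        img = G(w)                          # synthesize from the current latent
        loss = clip_loss(img, text_tokens)  # semantic match to the prompt
        loss = loss + l2_lambda * ((w - w_init) ** 2).sum()  # stay near the source latent
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()
```

Note that each prompt requires a fresh optimization run over the input latent, which is precisely what motivates the faster latent mapper and the input-agnostic global directions presented as the second and third methods.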