Modelverse: Content-Based Search for Deep Generative Models

Daohan Lu*, Sheng-Yu Wang*, Nupur Kumari*, Rohan Agarwal*, David Bau, Jun-Yan Zhu
*equal contribution
We develop a content-based search engine for Modelverse, a model-sharing platform that contains a diverse set of deep generative models trained on subjects such as animals, landscapes, portraits, and art pieces. From left to right, our search algorithm supports queries (1st row) in four different modalities: text, images, sketches, and existing models. The 2nd and 3rd rows show the two top-ranked models for each query. The color of each model icon indicates the model type. Our method finds relevant models that share the query's semantic concepts across all modalities.
The growing proliferation of pretrained generative models has made it infeasible for a user to be fully cognizant of every model that exists. To address this problem, we introduce the problem of content-based model retrieval: given a query and a large set of generative models, find the models that best match the query. Because each generative model produces a distribution of images, we formulate the search problem as an optimization that maximizes the probability of generating a query match given a model. We develop approximations to make this problem tractable when the query is an image, a sketch, a text description, another generative model, or a combination of the above. We benchmark our methods on both accuracy and speed over a set of generative models. We further demonstrate that our model search can retrieve good models for image editing and reconstruction, few-shot transfer learning, and latent space interpolation.
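To make the formulation concrete, here is a schematic of the retrieval objective. The notation is ours, and the Gaussian approximation shown is one plausible instantiation consistent with the feature statistics described in the system overview below, not a claim about the exact method:

```latex
% Retrieval objective (notation ours): rank candidate models m by the
% probability that m generates an image matching the query q,
\[
  m^{\star} \;=\; \arg\max_{m}\; p(q \mid m).
\]
% Each model's image density is intractable, so p(q | m) must be
% approximated from generated samples; e.g., with an image encoder f,
% fit a Gaussian to the features of samples drawn from m:
\[
  p(q \mid m) \;\approx\; \mathcal{N}\!\bigl(f(q);\, \mu_m,\, \Sigma_m\bigr).
\]
```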
[Click here to try out Modelverse!]
Daohan Lu*, Sheng-Yu Wang*, Nupur Kumari*, Rohan Agarwal*, David Bau, Jun-Yan Zhu. Modelverse: Content-Based Search for Deep Generative Models. arXiv preprint, 2022. (Paper)
Our search system consists of a pre-caching stage (a, b) and an inference stage (c). Given a collection of models, (a) we first generate 50K samples for each model. (b) We then encode the images into image features and compute the 1st- and 2nd-order feature statistics for each model. The statistics are cached in our system for efficiency. (c) At inference time, we support queries of different modalities (text, image, or sketch). We encode the query into a feature vector and assess the similarity between the query feature and each model's statistics. The models with the highest similarity scores are retrieved.
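Below is a minimal sketch of this two-stage pipeline. Everything concrete in it is an assumption rather than the system's actual implementation: NumPy for the statistics, a Gaussian log-likelihood over image features as the similarity measure, and hypothetical inputs `features` (per-model sample features from an image encoder) and `query_feat` (the encoded query).

```python
import numpy as np

def cache_model_statistics(features):
    """Pre-caching: compute 1st/2nd-order statistics (mean, covariance) of a
    model's sample features; `features` has shape (num_samples, feature_dim)."""
    mu = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    cov += 1e-4 * np.eye(cov.shape[0])  # regularize for numerical stability
    cov_inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    return mu, cov_inv, logdet

def gaussian_log_likelihood(query_feat, mu, cov_inv, logdet):
    """One possible similarity measure: log-density of the query feature
    under a Gaussian fit to the model's cached statistics."""
    d = query_feat - mu
    return -0.5 * (d @ cov_inv @ d + logdet + len(mu) * np.log(2 * np.pi))

def retrieve(query_feat, model_stats, k=2):
    """Rank all cached models by similarity to the query; return the top k.

    model_stats: dict mapping model name -> (mu, cov_inv, logdet)."""
    scores = {name: gaussian_log_likelihood(query_feat, *stats)
              for name, stats in model_stats.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical usage:
#   stats = {name: cache_model_statistics(sample_feats[name]) for name in models}
#   top_models = retrieve(encode(query), stats)
```

Caching the inverse covariance and log-determinant means inference touches only small per-model statistics, never the 50K raw samples, which is what makes search over a large model collection fast.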
Acknowledgements