SuperstarGAN: Generative adversarial networks for image-to-image translation in large-scale domains

NEURAL NETWORKS (2023)

Journal

NEURAL NETWORKS

Volume 162, Issue -, Pages 330-339

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.neunet.2023.02.042

Keywords

Generative adversarial networks; Image-to-image translation; Domain translation; Face image translation; Image generation

Categories

Computer Science, Artificial Intelligence; Neurosciences

Summary

Image-to-image translation with GANs is an active research area. StarGAN stands out by performing multi-domain translation with a single generator, but it struggles to learn mappings among large-scale domains and to express small feature changes. To overcome these limitations, the authors propose an improved version called SuperstarGAN. It adopts the idea from controllable GAN (ControlGAN) of training an independent classifier, and it uses data augmentation to handle classifier overfitting. Evaluated on a face image dataset, SuperstarGAN achieves better FID and LPIPS scores than StarGAN, and it can also control the degree to which target-domain features are expressed in generated images.
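StarGAN conditions its single generator by concatenating the target-domain label, replicated over the spatial grid, to the channels of the input image; varying those label values is what makes it possible to control how strongly a target feature is expressed, as the paragraph above describes. The following is a minimal NumPy sketch of that input construction, assuming a channels-first (C, H, W) image and a real-valued label vector. The helper names `condition_on_label` and `scaled_label`, and the linear rule used to form interpolated (0 < alpha < 1) and extrapolated (alpha > 1) labels, are hypothetical illustrations, not SuperstarGAN's published code.

```python
import numpy as np


def condition_on_label(image: np.ndarray, label: np.ndarray) -> np.ndarray:
    """Tile a domain-label vector over the spatial grid and stack it onto the image.

    image: array of shape (C, H, W); label: array of shape (D,).
    Returns an array of shape (C + D, H, W) to be fed to the generator.
    """
    _, h, w = image.shape
    # Each label entry becomes one constant H x W channel.
    label_maps = np.broadcast_to(label[:, None, None], (label.shape[0], h, w))
    return np.concatenate([image, label_maps], axis=0)


def scaled_label(source: np.ndarray, target: np.ndarray, alpha: float) -> np.ndarray:
    """Hypothetical rule: move linearly from the source label toward the target label.

    alpha in [0, 1] interpolates; alpha > 1 extrapolates past the target.
    """
    return source + alpha * (target - source)


# Hypothetical example: a 3-channel 4x4 image and 5 binary domain attributes.
image = np.zeros((3, 4, 4), dtype=np.float32)
source = np.array([1, 0, 0, 1, 0], dtype=np.float32)
target = np.array([0, 1, 0, 1, 0], dtype=np.float32)

for alpha in (0.5, 1.0, 1.5):
    label = scaled_label(source, target, alpha)
    x = condition_on_label(image, label)
    print(alpha, x.shape, label)
```

Keeping the label as extra input channels means the same generator weights serve every target domain; only the label maps change between translations.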

Abstract

Image-to-image translation with generative adversarial networks (GANs) has been studied extensively in recent years. Among such models, StarGAN achieves image-to-image translation across multiple domains with a single generator, whereas conventional models require multiple generators. However, StarGAN has several limitations: it lacks the capacity to learn mappings among large-scale domains, and it can barely express small feature changes. To address these limitations, we propose an improved StarGAN, namely SuperstarGAN. We adopt the idea, first proposed in controllable GAN (ControlGAN), of training an independent classifier with data augmentation techniques to handle the overfitting problem in the classifier of the StarGAN structure. Because a generator guided by a well-trained classifier can express small features belonging to the target domain, SuperstarGAN achieves image-to-image translation in large-scale domains. Evaluated on a face image dataset, SuperstarGAN demonstrated improved performance in terms of Fréchet Inception distance (FID) and learned perceptual image patch similarity (LPIPS). Specifically, compared with StarGAN, SuperstarGAN reduced FID and LPIPS by 18.1% and 42.5%, respectively. Furthermore, an additional experiment with interpolated and extrapolated label values indicates that SuperstarGAN can control the degree to which target-domain features are expressed in generated images. Additionally, SuperstarGAN was successfully adapted to an animal face dataset and a painting dataset, where it translates animal face styles (e.g., a cat to a tiger) and painter styles (e.g., Hassam to Picasso), respectively, which demonstrates that SuperstarGAN generalizes across datasets.

© 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
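FID fits one Gaussian to Inception-network features of real images and another to features of generated images, then reports the Fréchet distance between them, ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2)); lower is better. The sketch below assumes the features have already been extracted into arrays (the Inception network itself is not included) and is not the evaluation code used in the paper. Instead of a general matrix square root (widely used implementations call `scipy.linalg.sqrtm`), it gets the trace term from the eigenvalues of Sigma_r^(1/2) Sigma_g Sigma_r^(1/2), which equal those of Sigma_r Sigma_g for covariance matrices, so everything stays in real symmetric arithmetic. The function names are hypothetical.

```python
import numpy as np


def _sqrt_psd(a: np.ndarray) -> np.ndarray:
    """Symmetric square root of a positive semi-definite matrix via eigh."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)  # drop tiny negative eigenvalues caused by round-off
    return (v * np.sqrt(w)) @ v.T


def frechet_distance(mu1, sigma1, mu2, sigma2) -> float:
    """Frechet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    s1_half = _sqrt_psd(sigma1)
    # Tr((sigma1 @ sigma2)^(1/2)) = sum of sqrt eigenvalues of s1_half @ sigma2 @ s1_half.
    eig = np.linalg.eigvalsh(s1_half @ sigma2 @ s1_half)
    trace_sqrt = np.sum(np.sqrt(np.clip(eig, 0.0, None)))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * trace_sqrt)


def fid_from_features(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """FID from pre-extracted features; each row is one image's feature vector."""
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_f = np.cov(feats_fake, rowvar=False)
    return frechet_distance(mu_r, sigma_r, mu_f, sigma_f)


# Hypothetical usage with random stand-in features (not real Inception features).
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
fake = rng.normal(loc=0.5, size=(500, 8))
print(fid_from_features(real, fake))
```

Because FID depends on the feature extractor and the number of samples, values are only comparable when both models are evaluated under the same protocol, which is how a relative figure such as the reported 18.1% reduction should be read.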

Mask guided diverse face image synthesis

Song Sun, Bo Zhao, Muhammad Mateen, Xin Chen, Junhao Wen

Summary: The authors propose an end-to-end learning framework for generating diverse, realistic, and controllable face images guided by face masks. Using a style encoder, a generator, and a discriminator, the model generates face images in different styles from an input face mask and allows fine-grained control over the generated image by editing the mask.

FRONTIERS OF COMPUTER SCIENCE (2022)