StyleGAN2 Study Notes

Analyzing and Improving the Image Quality of StyleGAN

Original paper

Analyzing and Improving the Image Quality of StyleGAN. CVPR 2020: 8107-8116

Abstract

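  The style-based GAN architecture (StyleGAN) yields state-of-the-art results in data-driven unconditional generative image modeling. We expose and analyze several of its characteristic artifacts, and propose changes in both model architecture and training methods to address them. In particular, we redesign the generator normalization, revisit progressive growing, and regularize the generator to encourage good conditioning in the mapping from latent codes to images. In addition to improving image quality, this path length regularizer yields the additional benefit that the generator becomes significantly easier to invert. This makes it possible to reliably attribute a generated image to a particular network. We furthermore visualize how well the generator utilizes its output resolution, and identify a capacity problem, motivating us to train larger models for additional quality improvements. Overall, our improved model redefines the state of the art in unconditional image modeling, both in terms of existing distribution quality metrics and in terms of perceived image quality.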

2. Removing normalization artifacts

2.1. Generator architecture revisited

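  We will first revise several details of the StyleGAN generator to better facilitate our redesigned normalization. On their own, these changes have either a neutral or a small positive effect in terms of quality metrics.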

Figure 2. We redesign the architecture of the StyleGAN synthesis network. (a) The original StyleGAN, where A denotes a learned affine transform from W that produces a style and B is a noise broadcast operation. (b) The same diagram with full detail. Here we have broken the AdaIN to explicit normalization followed by modulation, both operating on the mean and standard deviation per feature map. We have also annotated the learned weights (ω), biases (b), and constant input (c), and redrawn the gray boxes so that one style is active per box. The activation function (leaky ReLU) is always applied right after adding the bias. (c) We make several changes to the original architecture that are justified in the main text. We remove some redundant operations at the beginning, move the addition of b and B to be outside the active area of a style, and adjust only the standard deviation per feature map. (d) The revised architecture enables us to replace instance normalization with a 'demodulation' operation, which we apply to the weights associated with each convolution layer.
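The decomposition in panel (b), AdaIN as explicit normalization followed by modulation, can be sketched in a few lines. This is a minimal illustration only, assuming a single feature map flattened to a list of floats; the function names and the `style_scale` / `style_bias` parameters are hypothetical and do not come from any StyleGAN code base.

```python
import math


def instance_normalize(feature_map, eps=1e-8):
    """Normalize one feature map to zero mean and unit standard deviation."""
    n = len(feature_map)
    mean = sum(feature_map) / n
    var = sum((x - mean) ** 2 for x in feature_map) / n
    std = math.sqrt(var + eps)
    return [(x - mean) / std for x in feature_map]


def modulate(feature_map, style_scale, style_bias):
    """Apply the style: set the new standard deviation and mean of the map."""
    return [style_scale * x + style_bias for x in feature_map]


def adain(feature_map, style_scale, style_bias):
    """AdaIN written as the two explicit steps from Figure 2(b)."""
    return modulate(instance_normalize(feature_map), style_scale, style_bias)


# Illustrative values only (assumption, not taken from the paper).
fm = [1.0, 2.0, 3.0, 4.0]
out = adain(fm, style_scale=2.0, style_bias=0.5)
```

After the call, `out` has a mean close to `style_bias` and a standard deviation close to `style_scale`, regardless of the statistics of `fm`. Panel (c) then keeps only the standard-deviation part of this adjustment.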

2.2. Instance normalization revisited

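  One of the main strengths of StyleGAN is the ability to control the generated images via style mixing, i.e., by feeding a different latent $\mathrm{w}$ to different layers at inference time.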
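Style mixing amounts to choosing, per layer, which latent the layer receives. The sketch below is a hypothetical helper, not part of any released StyleGAN implementation: `mix_styles`, `num_layers`, and `crossover` are names introduced here for illustration, and the latents are tiny lists rather than real 512-dimensional vectors.

```python
def mix_styles(w_a, w_b, num_layers, crossover):
    """Return one latent per layer: w_a for layers [0, crossover), w_b afterwards."""
    if not 0 <= crossover <= num_layers:
        raise ValueError("crossover must lie between 0 and num_layers")
    return [w_a if layer < crossover else w_b for layer in range(num_layers)]


# Illustrative latents and layer count (assumptions for the example).
w_a = [0.1, -0.3]
w_b = [0.7, 0.2]
per_layer_w = mix_styles(w_a, w_b, num_layers=6, crossover=2)
```

Here the first two (coarse) layers are driven by `w_a` and the remaining four by `w_b`. A synthesis network would then consume `per_layer_w[i]` as the style input of layer `i`.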

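  If we are willing to sacrifice scale-specific controls (see video), we could simply remove the normalization, thus removing the artifacts and also improving FID slightly. We will now propose a better alternative that removes the artifacts while retaining full controllability. The main idea is to base normalization on the expected statistics of the incoming feature maps, but without explicit forcing. In practice, style modulation may amplify certain feature maps by an order of magnitude or more. For style mixing to work, we must explicitly counteract this amplification on a per-sample basis; otherwise the subsequent layers would not be able to operate on the data in a meaningful way.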
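One way to make "normalization based on expected statistics" concrete is the weight demodulation mentioned in Figure 2(d). The sketch below assumes that the style scales the convolution weights per input feature map, and that each output filter is then divided by its L2 norm. If the incoming activations are assumed to be i.i.d. with unit standard deviation, that norm is the expected standard deviation of the output, so no statistics of the actual data are needed. The nested-list weight layout, the function names, and the numeric values are all illustrative assumptions.

```python
import math


def modulate_weights(weights, styles):
    """Scale weights[j][i][k] by styles[i] (j: output map, i: input map, k: tap)."""
    return [
        [[styles[i] * w for w in taps] for i, taps in enumerate(out_filter)]
        for out_filter in weights
    ]


def filter_norm(out_filter):
    """L2 norm of one output filter over all input maps and spatial taps."""
    return math.sqrt(sum(w * w for taps in out_filter for w in taps))


def demodulate_weights(weights, eps=1e-8):
    """Rescale each output filter to (approximately) unit L2 norm."""
    result = []
    for out_filter in weights:
        norm = math.sqrt(sum(w * w for taps in out_filter for w in taps) + eps)
        result.append([[w / norm for w in taps] for taps in out_filter])
    return result


# Two output maps, two input maps, three taps each (illustrative values).
weights = [
    [[0.5, -1.0, 0.25], [1.5, 0.0, -0.5]],
    [[-0.2, 0.4, 0.1], [0.3, 0.3, -0.6]],
]
styles = [1.0, 12.0]  # the second input map is amplified by over 10x

modulated = modulate_weights(weights, styles)
demodulated = demodulate_weights(modulated)
```

The style inflates the norm of the first filter in `modulated` well beyond 10, while every filter in `demodulated` is back at unit norm. The amplification is thus cancelled for each sample's style without measuring the feature maps themselves.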