Style Transfer

First, a look at the image after style transfer.

VGG19

Below is the network structure printed at run time; it corresponds exactly to the figure above.

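LAYER GROUP 1 # convolutional layer group 1
# Two convolutional layers and one pooling layer follow; relu is the rectified linear unit, applied after every convolution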
--conv1_1 | shape=(1, 663, 1000, 64) | weights_shape=(3, 3, 3, 64)
--relu1_1 | shape=(1, 663, 1000, 64) | bias_shape=(64,)
--conv1_2 | shape=(1, 663, 1000, 64) | weights_shape=(3, 3, 64, 64)
--relu1_2 | shape=(1, 663, 1000, 64) | bias_shape=(64,)
--pool1   | shape=(1, 332, 500, 64)

LAYER GROUP 2 # convolutional layer group 2
# Two convolutional layers and one pooling layer follow
--conv2_1 | shape=(1, 332, 500, 128) | weights_shape=(3, 3, 64, 128)
--relu2_1 | shape=(1, 332, 500, 128) | bias_shape=(128,)
--conv2_2 | shape=(1, 332, 500, 128) | weights_shape=(3, 3, 128, 128)
--relu2_2 | shape=(1, 332, 500, 128) | bias_shape=(128,)
--pool2   | shape=(1, 166, 250, 128)

LAYER GROUP 3 # convolutional layer group 3
# Four convolutional layers and one pooling layer follow
--conv3_1 | shape=(1, 166, 250, 256) | weights_shape=(3, 3, 128, 256)
--relu3_1 | shape=(1, 166, 250, 256) | bias_shape=(256,)
--conv3_2 | shape=(1, 166, 250, 256) | weights_shape=(3, 3, 256, 256)
--relu3_2 | shape=(1, 166, 250, 256) | bias_shape=(256,)
--conv3_3 | shape=(1, 166, 250, 256) | weights_shape=(3, 3, 256, 256)
--relu3_3 | shape=(1, 166, 250, 256) | bias_shape=(256,)
--conv3_4 | shape=(1, 166, 250, 256) | weights_shape=(3, 3, 256, 256)
--relu3_4 | shape=(1, 166, 250, 256) | bias_shape=(256,)
--pool3   | shape=(1, 83, 125, 256)

LAYER GROUP 4 # convolutional layer group 4
# Four convolutional layers and one pooling layer follow
--conv4_1 | shape=(1, 83, 125, 512) | weights_shape=(3, 3, 256, 512)
--relu4_1 | shape=(1, 83, 125, 512) | bias_shape=(512,)
--conv4_2 | shape=(1, 83, 125, 512) | weights_shape=(3, 3, 512, 512)
--relu4_2 | shape=(1, 83, 125, 512) | bias_shape=(512,)
--conv4_3 | shape=(1, 83, 125, 512) | weights_shape=(3, 3, 512, 512)
--relu4_3 | shape=(1, 83, 125, 512) | bias_shape=(512,)
--conv4_4 | shape=(1, 83, 125, 512) | weights_shape=(3, 3, 512, 512)
--relu4_4 | shape=(1, 83, 125, 512) | bias_shape=(512,)
--pool4   | shape=(1, 42, 63, 512)

LAYER GROUP 5 # convolutional layer group 5
# Four convolutional layers and one pooling layer follow
--conv5_1 | shape=(1, 42, 63, 512) | weights_shape=(3, 3, 512, 512)
--relu5_1 | shape=(1, 42, 63, 512) | bias_shape=(512,)
--conv5_2 | shape=(1, 42, 63, 512) | weights_shape=(3, 3, 512, 512)
--relu5_2 | shape=(1, 42, 63, 512) | bias_shape=(512,)
--conv5_3 | shape=(1, 42, 63, 512) | weights_shape=(3, 3, 512, 512)
--relu5_3 | shape=(1, 42, 63, 512) | bias_shape=(512,)
--conv5_4 | shape=(1, 42, 63, 512) | weights_shape=(3, 3, 512, 512)
--relu5_4 | shape=(1, 42, 63, 512) | bias_shape=(512,)
--pool5   | shape=(1, 21, 32, 512)
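The shapes in the listing above can be reproduced with a few lines of arithmetic. The sketch below assumes that every convolution is 3x3 with stride 1 and zero padding (so height and width are unchanged) and that every pooling layer is 2x2 with stride 2 and rounds odd sizes up. These assumptions are inferred from the printed shapes (for example 663 becoming 332), not taken from the original code, and the names `VGG19_GROUPS` and `vgg19_feature_shapes` are made up for this illustration.

```python
import math

# (number of convolutions, output channels) for each of the five groups,
# read off the printed listing.
VGG19_GROUPS = [(2, 64), (2, 128), (4, 256), (4, 512), (4, 512)]


def vgg19_feature_shapes(height, width, in_channels=3):
    """Return a list of (layer_name, output_shape, weights_shape_or_None)."""
    layers = []
    h, w, c_in = height, width, in_channels
    for g, (n_convs, c_out) in enumerate(VGG19_GROUPS, start=1):
        for k in range(1, n_convs + 1):
            # Assumed: 3x3 conv, stride 1, zero padding -> spatial size unchanged.
            layers.append((f"conv{g}_{k}", (1, h, w, c_out), (3, 3, c_in, c_out)))
            c_in = c_out
        # Assumed: 2x2 pooling, stride 2, odd sizes rounded up.
        h, w = math.ceil(h / 2), math.ceil(w / 2)
        layers.append((f"pool{g}", (1, h, w, c_out), None))
    return layers


if __name__ == "__main__":
    for name, shape, weights in vgg19_feature_shapes(663, 1000):
        print(f"--{name:8s}| shape={shape} | weights_shape={weights}")
```

Called with a 663 x 1000 input, this walks through the same five groups as the listing, ending at pool5.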

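VGG is itself a convolutional neural network (CNN). A CNN is made up of an input layer, convolutional layers, activation functions, pooling layers, and fully connected layers, i.e. INPUT (input layer) - CONV (convolutional layer) - RELU (activation function) - POOL (pooling layer) - FC (fully connected layer).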
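To make the CONV - RELU - POOL part of that pipeline concrete, here is a minimal NumPy sketch of a single stage. It is a naive loop-based illustration, not how any framework implements it; the function names are invented for this example, and the zero padding and round-up pooling are assumptions chosen to match the shapes in the printed listing.

```python
import numpy as np


def conv2d_same(x, w, b):
    """Naive stride-1 convolution with zero padding.

    x: (H, W, C_in), w: (kh, kw, C_in, C_out), b: (C_out,)
    Returns an array of shape (H, W, C_out).
    """
    kh, kw, _, c_out = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))  # zero padding
    h, wd = x.shape[:2]
    out = np.empty((h, wd, c_out))
    for i in range(h):
        for j in range(wd):
            patch = xp[i:i + kh, j:j + kw, :]  # (kh, kw, C_in)
            # Sum over kernel height, kernel width and input channels.
            out[i, j] = np.tensordot(patch, w, axes=3) + b
    return out


def relu(x):
    """Rectified linear unit: negative activations become zero."""
    return np.maximum(x, 0.0)


def max_pool_2x2(x):
    """2x2 max pooling with stride 2; odd sizes are padded, so they round up."""
    h, w, c = x.shape
    xp = np.pad(x, ((0, h % 2), (0, w % 2), (0, 0)), constant_values=-np.inf)
    hh, ww = xp.shape[0] // 2, xp.shape[1] // 2
    return xp.reshape(hh, 2, ww, 2, c).max(axis=(1, 3))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.normal(size=(5, 6, 3))            # tiny stand-in for an RGB image
    weights = rng.normal(size=(3, 3, 3, 4))       # 3x3 kernel, 3 -> 4 channels
    bias = np.zeros(4)
    features = relu(conv2d_same(image, weights, bias))
    pooled = max_pool_2x2(features)
    print(features.shape, pooled.shape)           # (5, 6, 4) (3, 3, 4)
```

The convolution keeps the spatial size and changes only the channel count, while the pooling step halves height and width, which is the pattern visible in every layer group of the listing.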

VGG19 extends and modifies the convolution → pooling part: several convolutions are stacked before each pooling layer.

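From the figure and the printed structure, VGG19 has five convolutional layer groups, in which every convolution uses a 3×3 kernel, followed by three fully connected (FC) layers.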

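That makes 19 weight layers in total: 16 convolutional layers and 3 fully connected layers. The 5 pooling layers have no weights and are not counted.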
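Since every convolution uses a 3×3 kernel, it helps to check what a stack of them covers. With stride 1, each extra 3×3 layer widens the receptive field by 2, so two layers see a 5×5 window and three layers see a 7×7 window, with fewer weights than a single large kernel. The short sketch below works through that arithmetic; the channel count of 256 is an arbitrary choice for illustration, and biases are ignored.

```python
def stacked_receptive_field(n_layers, kernel=3):
    """Receptive field of n stacked stride-1 convolutions with the same kernel size."""
    return 1 + n_layers * (kernel - 1)


def conv_weights(kernel, channels):
    """Weights (biases ignored) of one kernel x kernel conv with `channels` in and out."""
    return kernel * kernel * channels * channels


if __name__ == "__main__":
    channels = 256  # arbitrary example value
    for n_layers, big_kernel in [(2, 5), (3, 7)]:
        field = stacked_receptive_field(n_layers)
        stacked = n_layers * conv_weights(3, channels)
        single = conv_weights(big_kernel, channels)
        print(f"{n_layers} x (3x3): receptive field {field}x{field}, "
              f"{stacked} weights vs {single} for one {big_kernel}x{big_kernel}")
```

For C input and output channels this is 18C² against 25C² for the 5×5 case and 27C² against 49C² for the 7×7 case, and the stack also gets an extra nonlinearity after every layer.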

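Advantages of VGG: the structure of VGGNet is very simple, using the same kernel size (3x3) and pooling size (2x2) throughout the network. A stack of several small-filter (3x3) convolutional layers works better than a single large-filter (5x5 or 7x7) convolutional layer, and the results confirm that steadily deepening the network can improve performance.
Disadvantages of VGG: VGG consumes more compute and uses more parameters (the 3x3 convolutions are not to blame here), which leads to higher memory usage.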

The vast majority of the parameters come from the first fully connected layer, and VGG has three fully connected layers.
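A rough parameter count makes this visible. The sketch below assumes the standard ImageNet configuration of VGG19, which is not stated in the text above: a 224×224 input (so pool5 is 7×7×512), two hidden FC layers of 4096 units, and 1000 output classes. The layer names `fc1` to `fc3` are labels chosen for this example.

```python
# (number of convolutions, output channels) for each of the five groups.
GROUPS = [(2, 64), (2, 128), (4, 256), (4, 512), (4, 512)]


def conv_params(in_channels=3):
    """Weights plus biases of every 3x3 convolution in VGG19."""
    params = {}
    c_in = in_channels
    for g, (n_convs, c_out) in enumerate(GROUPS, start=1):
        for k in range(1, n_convs + 1):
            params[f"conv{g}_{k}"] = 3 * 3 * c_in * c_out + c_out
            c_in = c_out
    return params


def fc_params(spatial=7, channels=512, hidden=4096, classes=1000):
    """Weights plus biases of the three fully connected layers (assumed sizes)."""
    sizes = [spatial * spatial * channels, hidden, hidden, classes]
    return {f"fc{i}": sizes[i - 1] * sizes[i] + sizes[i] for i in (1, 2, 3)}


if __name__ == "__main__":
    conv, fc = conv_params(), fc_params()
    total = sum(conv.values()) + sum(fc.values())
    print("weight layers:", len(conv) + len(fc))
    print("conv parameters:", sum(conv.values()))
    print("fc parameters:  ", sum(fc.values()))
    print("total:          ", total)
    print(f"share of fc1:    {fc['fc1'] / total:.1%}")
```

Under these assumptions the first fully connected layer alone holds roughly 70% of all parameters, while the 16 convolutional layers together hold about 14%. Style transfer uses only the convolutional features, as in the printed listing, so the FC layers are not needed there.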

Reading the paper

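Original content image, denoted $\vec{p}$: the content image given as input.
Generated image, denoted $\vec{x}$: the image produced during the transfer.
Original style image, denoted $\vec{a}$: the style image given as input.
$N_l$: the number of feature maps (filters) in layer $l$.
$M_l$: the size of each feature map in layer $l$, i.e. its height times its width.
$F^l$: the $N_l \times M_l$ matrix made up of all feature maps of an image in layer $l$.
$F^l_{ij}$: the activation of filter $i$ at position $j$ in layer $l$ for the generated image $\vec{x}$.
$P^l_{ij}$: the same as $F^l_{ij}$, but for the original content image $\vec{p}$.
Content loss: $L_{content}(\vec{p},\vec{x},l) = \frac{1}{2}\sum\limits_{i,j}(F_{ij}^l-P_{ij}^l)^2$
$G_{ij}^l = \sum\limits_{k}F_{ik}^lF_{jk}^l$: the Gram matrix of the generated image in layer $l$, i.e. the inner product between feature map $i$ and feature map $j$.
$A^l_{ij}$: the same Gram matrix, computed for the original style image $\vec{a}$.
$E_l=\frac{1}{4N_l^2M_l^2}\sum\limits_{i,j}(G_{ij}^l-A_{ij}^l)^2$: the mean squared loss between the generated image and the original style image in layer $l$.
$L_{style}(\vec{a},\vec{x})=\sum\limits_{l=0}^{L}w_lE_l$: the total loss between the generated image $\vec{x}$ and the original style image over all layers, with layer weights $w_l$.
$L_{total}(\vec{p},\vec{a},\vec{x}) = \alpha L_{content}(\vec{p},\vec{x})+\beta L_{style}(\vec{a},\vec{x})$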
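The definitions above translate almost line by line into NumPy. The following is a minimal sketch, assuming the feature maps of each layer have already been extracted and flattened into a matrix of shape (N_l, M_l). It uses random arrays in place of real VGG19 activations, and the function names and the values of alpha and beta are illustrative choices, not taken from the paper's code.

```python
import numpy as np


def content_loss(F, P):
    """L_content = 1/2 * sum_ij (F_ij - P_ij)^2 for one layer.

    F: features of the generated image, P: features of the content image,
    both of shape (N_l, M_l).
    """
    return 0.5 * np.sum((F - P) ** 2)


def gram_matrix(F):
    """G_ij = sum_k F_ik * F_jk, an (N_l, N_l) matrix of feature-map inner products."""
    return F @ F.T


def style_layer_loss(F, S):
    """E_l for one layer; F from the generated image, S from the style image."""
    n_l, m_l = F.shape
    G, A = gram_matrix(F), gram_matrix(S)
    return np.sum((G - A) ** 2) / (4.0 * n_l ** 2 * m_l ** 2)


def style_loss(gen_feats, style_feats, weights):
    """L_style = sum_l w_l * E_l over the chosen layers."""
    return sum(w * style_layer_loss(F, S)
               for w, F, S in zip(weights, gen_feats, style_feats))


def total_loss(F_content, P_content, gen_feats, style_feats, weights, alpha, beta):
    """L_total = alpha * L_content + beta * L_style."""
    return (alpha * content_loss(F_content, P_content)
            + beta * style_loss(gen_feats, style_feats, weights))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-ins for two layers with (N_l, M_l) = (4, 9) and (8, 4).
    shapes = [(4, 9), (8, 4)]
    gen = [rng.normal(size=s) for s in shapes]
    sty = [rng.normal(size=s) for s in shapes]
    content = rng.normal(size=shapes[1])
    loss = total_loss(gen[1], content, gen, sty, weights=[0.5, 0.5],
                      alpha=1.0, beta=1000.0)
    print(loss)
```

Because the Gram matrix sums over all positions k, it discards the spatial layout and keeps only how strongly the feature maps co-activate, which is why it captures style rather than content.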

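The backpropagation procedure is not written up here; I do not fully understand the paper's figure for it. To be filled in later.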