KaiSu's Homepage

1.LeNet

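Gradient-Based Learning Applied to Document Recognition, 1998

Typically used for handwriting recognition. The paper's input is 32x32x1; the MNIST dataset I used in my implementation was, as I recall, 28x28x1. The network has two conv layers (5x5, stride=1), each followed by pooling (2x2, stride=2), then fc and relu: the most basic CNN.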
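To see how the two input sizes propagate through the conv/pooling stack, here is a minimal sketch. It assumes no padding, which the note above does not state; the helper names `out_size` and `lenet_feature_sizes` are made up for illustration.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial size after a conv or pooling layer (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet_feature_sizes(input_size):
    """Spatial sizes after each of the two conv (5x5, s=1) + pool (2x2, s=2) stages."""
    sizes = [input_size]
    for _ in range(2):
        sizes.append(out_size(sizes[-1], kernel=5, stride=1))  # conv, assumed unpadded
        sizes.append(out_size(sizes[-1], kernel=2, stride=2))  # pooling
    return sizes


print(lenet_feature_sizes(32))  # [32, 28, 14, 10, 5]
print(lenet_feature_sizes(28))  # [28, 24, 12, 8, 4]
```

With a 28x28 input the final map is 4x4 instead of 5x5, so the first fc layer needs a different input size unless the image is padded to 32x32.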

2.AlexNet

ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012

Input is 224x224x3, followed by five conv layers with pooling and relu. To prevent overfitting it uses dropout and data augmentation (1. extract 224x224 patches from 256x256 images, 2. horizontal reflections, 3. alter the intensities of the RGB channels), plus weight decay.
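The first two augmentations can be sketched in a few lines of plain Python. This is only an illustration on nested lists, not the paper's implementation; `random_crop_flip` is a hypothetical helper and the 0.5 flip probability is an assumption.

```python
import random


def random_crop_flip(image, crop, rng):
    """Return a random crop x crop patch of `image` (a list of rows),
    mirrored horizontally with probability 0.5 (assumed value)."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop)    # randint includes both endpoints
    left = rng.randint(0, width - crop)
    patch = [row[left:left + crop] for row in image[top:top + crop]]
    if rng.random() < 0.5:
        patch = [row[::-1] for row in patch]  # horizontal reflection
    return patch


rng = random.Random(0)
# Dummy 256x256 "image" whose pixels record their own (row, col) position.
image = [[(r, c) for c in range(256)] for r in range(256)]
patch = random_crop_flip(image, 224, rng)
print(len(patch), len(patch[0]))  # 224 224
```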

The model did not fit on a single GPU in the experiments, so it was split into two parts and trained across two GPUs.

3.VGG

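Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR 2015

Uses 3x3 convs to build two sequence structures: VGG-16 and VGG-19.

VGG's approach is to stack conv layers one on top of another; increasing the depth further runs into problems such as training difficulty and a growing parameter count.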

4.GoogLeNet series

4.1 Inception v1

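Going Deeper with Convolutions, CVPR 2015

Designs a relatively wide Inception module. (a) Naive version: 1x1, 3x3, and 5x5 convs and a 3x3 pooling are applied in parallel and their outputs are stacked together. This increases the width of the network and also makes it more adaptable to different scales. (The 1x1, 3x3, 5x5 convs and the 3x3 pooling each do something different, so they are simply all stacked together and the model is left to choose.) (b) Dimension-reduction version: before each expensive conv, a 1x1 conv first reduces the channel dimension, which cuts the computation.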
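A rough multiply-add count shows why the 1x1 reduction in version (b) pays off. The feature-map size and channel counts below are illustrative assumptions, not figures quoted from this note, and `conv_mult_adds` is a hypothetical helper that ignores biases.

```python
def conv_mult_adds(h, w, c_in, c_out, k):
    """Multiply-adds of a k x k conv, stride 1, output size h x w (bias ignored)."""
    return h * w * c_in * c_out * k * k


# Assumed example: 28x28 feature map, 192 input channels, 32 output channels.
h = w = 28
c_in, c_out, c_reduce = 192, 32, 16

naive = conv_mult_adds(h, w, c_in, c_out, 5)
reduced = (conv_mult_adds(h, w, c_in, c_reduce, 1)      # 1x1 reduction
           + conv_mult_adds(h, w, c_reduce, c_out, 5))  # 5x5 on fewer channels

print(naive)    # 120422400
print(reduced)  # 12443648
print(round(naive / reduced, 2))
```

Under these assumptions the 5x5 branch becomes roughly ten times cheaper.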

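Network composition: the lower layers still use conventional convolutions (sequence structure), and Inception modules are stacked from the higher layers onward; auxiliary losses at intermediate layers add supervision to counter vanishing gradients; global average pooling is used.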

4.2 Inception v2

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, arXiv 2015

Uses batch normalization, which normalizes each layer's input to zero mean and unit variance, i.e. toward an N(0,1) distribution.
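As a minimal sketch of the training-time computation for a single feature over one mini-batch: subtract the batch mean, divide by the batch standard deviation, then apply a learned scale and shift. The parameter names `gamma`, `beta`, and `eps` follow common usage and are assumptions here; running statistics for inference are left out.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)  # biased variance
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


activations = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm(activations)
print(normalized)
```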

On the architecture side, it follows VGG in replacing one 5x5 conv with two 3x3 convs: fewer parameters, but the same receptive field.
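The equal-receptive-field claim is easy to check for stride-1 convs, where each k x k layer grows the receptive field by k - 1. A small sketch, with `receptive_field` as a hypothetical helper name:

```python
def receptive_field(kernel_sizes):
    """Receptive field of a stack of stride-1 convs with the given kernel sizes."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf


print(receptive_field([5]))        # 5
print(receptive_field([3, 3]))     # 5
print(receptive_field([3, 3, 3]))  # 7
```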

4.3 Inception v3

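Rethinking the Inception Architecture for Computer Vision, CVPR 2016

Factorizes convolutions further: a 5x5 is replaced by two 3x3 convs and a 7x7 by three 3x3 convs; a 3x3 kernel can in turn be replaced by a 1x3 and 3x1 pair, and a 7x7 can be factorized into two one-dimensional convs (1x7, 7x1), which reduces computation further. The benefits: computation is faster (the saved compute can be spent on making the network deeper), and splitting one conv into two increases the network's depth and adds nonlinearity.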
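The savings from each factorization can be compared by counting kernel weights per (input channel, output channel) pair. This sketch assumes the channel count stays the same through every stacked layer; `kernel_weights` is a hypothetical helper.

```python
def kernel_weights(shapes):
    """Weights per (input channel, output channel) pair for a stack of kernels."""
    return sum(kh * kw for kh, kw in shapes)


options = {
    "5x5": [(5, 5)],
    "3x3 + 3x3": [(3, 3), (3, 3)],
    "7x7": [(7, 7)],
    "3x3 + 3x3 + 3x3": [(3, 3), (3, 3), (3, 3)],
    "1x7 + 7x1": [(1, 7), (7, 1)],
    "3x3": [(3, 3)],
    "1x3 + 3x1": [(1, 3), (3, 1)],
}

for name, shapes in options.items():
    print(f"{name:>16}: {kernel_weights(shapes)}")
```

The counts come out as 25 vs 18, 49 vs 27 vs 14, and 9 vs 6, so every factorization listed above is cheaper than the kernel it replaces.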

4.4 Inception v4

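Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, AAAI 2017

Explores the effect of residual connections on Inception modules: convergence is faster, but the final accuracy seems to improve only slightly. The paper proposes several residual and non-residual Inception networks.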

4.5 Xception

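Xception: Deep Learning with Depthwise Separable Convolutions, CVPR 2017

Xception pushes the factorization idea to the extreme: it assumes that cross-channel correlations and spatial correlations can be fully decoupled, so it is better not to map them jointly. It applies pointwise + relu first and then depthwise + relu (the opposite order from MobileNet).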

5.ResNet

Deep Residual Learning for Image Recognition, CVPR 2016 Best Paper

Proposes residual learning, with two block designs: 1. 3x3xc, 3x3xc (the basic block); 2. 1x1x(c/4), 3x3x(c/4), 1x1xc (the bottleneck)
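To get a feel for the two designs, their conv weights can be counted for a block with c input and output channels. This is a back-of-the-envelope sketch that ignores biases and BN parameters; c=256 is an assumed example value and both function names are hypothetical.

```python
def basic_block_weights(c):
    """Two 3x3 convs, c -> c -> c channels."""
    return 2 * (3 * 3 * c * c)


def bottleneck_weights(c):
    """1x1 (c -> c/4), 3x3 (c/4 -> c/4), 1x1 (c/4 -> c)."""
    mid = c // 4
    return c * mid + 3 * 3 * mid * mid + mid * c


c = 256  # assumed example width
print(basic_block_weights(c))  # 1179648
print(bottleneck_weights(c))   # 69632
```

At the same outer width the bottleneck is about 17 times lighter, which is what makes very deep stacks affordable.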

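Identity Mappings in Deep Residual Networks, ECCV 2016

Runs experiments on the ordering inside the residual block: conv, bn, relu versus bn, relu, conv. The latter performs better in the experiments, but I still generally use the former; the ImageNet pretrained models released for TF and PyTorch are almost all the former.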

6.DenseNet

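Densely Connected Convolutional Networks, CVPR 2017

DenseNet pushes the residual-connection idea to the extreme: each layer's output is connected directly to all subsequent layers, which allows better feature reuse. Each layer is fairly shallow and fuses all the features from all preceding layers, so the network is easy to train. The downsides are higher GPU memory usage and a somewhat more complex backward pass.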
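The connectivity pattern can be illustrated by tracking channel counts inside one dense block: every layer receives the concatenation of all earlier feature maps and contributes a fixed number of new channels. The numbers (64 initial channels, 32 new channels per layer, 6 layers) are assumptions for illustration, and `dense_block_channels` is a hypothetical helper.

```python
def dense_block_channels(initial_channels, growth_rate, num_layers):
    """Return (input channels seen by each layer, channels leaving the block)."""
    features = [initial_channels]  # channel counts of all feature maps so far
    layer_inputs = []
    for _ in range(num_layers):
        layer_inputs.append(sum(features))  # concat of everything before this layer
        features.append(growth_rate)        # this layer adds growth_rate channels
    return layer_inputs, sum(features)


layer_inputs, block_output = dense_block_channels(64, 32, 6)
print(layer_inputs)  # [64, 96, 128, 160, 192, 224]
print(block_output)  # 256
```

The steadily growing concatenated input is also where the extra memory use mentioned above comes from.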

7.ResNeXt

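Aggregated Residual Transformations for Deep Neural Networks, CVPR 2017

Borrows Inception's idea of widening the network and uses grouped convolution. Because grouping reduces computation, the bottleneck width can be increased somewhat, which improves accuracy: 1x1x(c/2), 3x3x(c/2), 1x1xc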
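A quick weight count shows how grouping pays for the wider bottleneck. The sketch assumes c=256 and 32 groups, both illustrative values not given in this note; a grouped conv connects each output channel only to the input channels of its own group, so its weight count is divided by the number of groups.

```python
def bottleneck_weights(c, width, groups=1):
    """1x1 (c -> width), grouped 3x3 (width -> width), 1x1 (width -> c); no biases."""
    reduce = c * width
    spatial = 3 * 3 * width * width // groups  # each group sees width/groups inputs
    expand = width * c
    return reduce + spatial + expand


c = 256  # assumed example width
print(bottleneck_weights(c, width=c // 4))             # ResNet-style: 69632
print(bottleneck_weights(c, width=c // 2, groups=32))  # ResNeXt-style: 70144
```

Under these assumptions the bottleneck is twice as wide for nearly the same number of weights.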

8.DPN

Dual Path Networks, NIPS 2017

Combines ResNeXt (feature re-usage) with DenseNet (new feature exploration).

9.WRN

Wide Residual Networks, BMVC 2016

Makes ResNet wider: increasing the number of output channels widens the model, so it no longer needs to be as deep.

10.SENet

Squeeze-and-Excitation Networks, CVPR 2018

Channel-wise attention over feature maps.
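A minimal pure-Python sketch of what channel-wise attention means here: squeeze each channel to one number with global average pooling, pass the result through a small fc, relu, fc, sigmoid stack, and rescale each channel by the resulting gate. The weight matrices `w1` and `w2` and the tiny input are made-up values, and biases are omitted for brevity.

```python
import math


def se_block(feature_maps, w1, w2):
    """feature_maps: C channels, each a list of rows of floats.
    w1: (C/r) x C weights, w2: C x (C/r) weights; biases are omitted."""
    # Squeeze: global average pooling gives one number per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]
    # Excitation: fc -> relu -> fc -> sigmoid yields a gate in (0, 1) per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: reweight every channel by its gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_maps, gates)
    ]
    return scaled, gates


# Two 2x2 channels, reduced to one hidden unit (all values are illustrative).
feature_maps = [[[2.0, 2.0], [2.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]]]
w1 = [[0.5, -0.5]]
w2 = [[1.0], [-1.0]]
scaled, gates = se_block(feature_maps, w1, w2)
print(gates)
```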

11.NASNet

Learning Transferable Architectures for Scalable Image Recognition, arXiv 2017

An architecture found by automated search: Google's AutoML.

12.MobileNet

12.1 MobileNet v1

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, arXiv 2017

Replaces the standard conv with a depthwise conv + pointwise conv to reduce computation.
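The saving can be quantified by counting multiply-adds for both variants. This sketch assumes stride 1 with an h x w output and ignores biases; the layer dimensions are illustrative and the function names are hypothetical.

```python
def standard_conv_cost(h, w, c_in, c_out, k):
    """Multiply-adds of a standard k x k conv."""
    return h * w * c_in * c_out * k * k


def separable_conv_cost(h, w, c_in, c_out, k):
    """Depthwise k x k conv (one filter per input channel) plus 1x1 pointwise conv."""
    depthwise = h * w * c_in * k * k
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise


h = w = 14
c_in = c_out = 512
k = 3
ratio = separable_conv_cost(h, w, c_in, c_out, k) / standard_conv_cost(h, w, c_in, c_out, k)
print(round(ratio, 4))  # equals 1/c_out + 1/k**2
```

For 3x3 kernels the ratio is dominated by the 1/k**2 term, so the separable form costs roughly one eighth to one ninth of the standard conv.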

12.2 MobileNet v2

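MobileNetV2: Inverted Residuals and Linear Bottlenecks, arXiv 2018

The paper explores this question: how can the residual bottleneck be applied to MobileNet v1? One change is to expand first and then contract, so that the depthwise conv extracts richer features. The other is to drop the relu before the element-wise addition with the identity mapping, because the paper's experiments show that relu is only suitable for activating feature maps with many dimensions.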

13.ShuffleNet

13.1 ShuffleNet v1

ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices, CVPR 2018

Goes a step further and replaces the pointwise conv in MobileNet with group conv + channel shuffle.
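Channel shuffle is just a fixed permutation that lets information cross group boundaries: view the channels as a (groups, n) grid, transpose it, and flatten. A minimal sketch on a flat list of channel indices, with `channel_shuffle` as an illustrative helper name:

```python
def channel_shuffle(channels, groups):
    """Reorder channels so each output group mixes channels from every input group."""
    n = len(channels) // groups  # channels per group
    # Read the (groups, n) grid column by column, i.e. transpose then flatten.
    return [channels[g * n + i] for i in range(n) for g in range(groups)]


print(channel_shuffle(list(range(6)), groups=2))  # [0, 3, 1, 4, 2, 5]
print(channel_shuffle(list(range(6)), groups=3))  # [0, 2, 4, 1, 3, 5]
```

After the shuffle, the next group conv sees channels that originated in different groups, which a stack of group convs alone would never allow.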

13.2 ShuffleNet v2

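ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design, ECCV 2018

The paper points out that FLOPs do not fully capture model speed, and gives four experimental conclusions:

Memory access cost (MAC) is lowest, and the model fastest, when a conv layer's input and output channel counts are equal
Too many group operations increase MAC and slow the model down
The fewer branches in the model, the faster it runs
Element-wise operations cost far more time than their FLOPs suggest, so they should be kept to a minimum

It then redesigns the network as ShuffleNet v2 according to these four rules.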
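The first conclusion can be checked numerically for a 1x1 conv, using the paper's cost model MAC = hw(c1 + c2) + c1*c2 (reading the input, writing the output, and reading the weights). The sketch holds the FLOPs, which are proportional to hw*c1*c2, fixed and varies the channel split; the 56x56 feature-map size and channel pairs are assumed example values.

```python
def mac_1x1(h, w, c_in, c_out):
    """Memory access cost of a 1x1 conv: input + output feature maps + weights."""
    return h * w * (c_in + c_out) + c_in * c_out


h = w = 56
# Every pair has c_in * c_out == 16384, so all four layers have identical FLOPs.
pairs = [(128, 128), (64, 256), (32, 512), (16, 1024)]
costs = {pair: mac_1x1(h, w, *pair) for pair in pairs}

for pair, cost in costs.items():
    print(pair, cost)
```

The balanced (128, 128) split has the lowest MAC, and the cost grows as the ratio between input and output channels becomes more lopsided.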

14.Analysis

An Analysis Of Deep Neural Network Models For Practical Applications

Figure 2 of the paper shows that the Inception and ResNet families give the best value for their cost.


Kai Su / 2018-09-08
Published under (CC) BY-NC-SA in categories Research tagged with cnn