1.LeNet
Gradient Based Learning Applied to Document Recognition,1998
一般用于手写识别,paper中input为32x32x1,我实现时的mnist数据集好像是28x28x1,然后两层的conv(5x5,stride=1)、pooling(2x2,stride=2),之后fc、relu,最基本的cnn
2.Alxnet
ImageNet Classification with Deep Convolutional Neural Networks,NIPS 2012
输入224x224x3,五层的conv、pooling、relu,为了防过拟合使用了dropout和data augmentation(1.extract 224x224 from 256x256,2.horizontal reflections,3.alter the intensities of the RGB channels),还有weight decay
实验时一张卡放不下,把model拆成2份放到2张卡训练
3.VGG
Very Deep Convolutional Networks for Large-Scale Image Recognition,ICLR 2015
使用3x3 conv构成两种sequence结构:VGG-16和VGG-19
VGG的方法是一层一层地堆conv,继续增加深度会有训练困难、参数量增加等问题
4.GoogleNet系列
4.1 Inception v1
Going Deeper with Convolutions,CVPR 2015
设计一种较宽的Inception module,(a)naive版本:将1x1,3x3,5x5的conv和3x3的pooling,都stack在一起,一方面增加了网络的width,另一方面增加了网络对尺度的适应性. (主要是因为1x1, 3x3, 5x5, 3x3 pooling的作用都不一样,索性都stack在一起,让模型自己选)(b)降维版本:在计算量大的conv之前,先用1x1降维,减少计算量
网络的组成:在低层的时候仍用传统的卷积方式(sequence结构),高层开始堆Inception module;中间loss监督防止梯度消失;使用global average pooling
4.2 Inception v2
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,arXiv 2015
使用Batch normalization,将每层输入归一化到N(0,1)的高斯分布
网络结构方面学习VGG用2个3x3代替一个5x5,参数了变少,但感受视野一样
4.3 Inception v3
Rethinking the inception architecture for computer vision,CVPR 2016
卷积进一步分解,5x5用2个3x3卷积替换,7x7用3个3x3卷积替换,3x3卷积核可以进一步用1x3和3x1的卷积核组合来替换,7x7分解成两个一维的卷积(1x7,7x1),进一步减少计算量,好处,既可以加速计算(多余的计算能力可以用来加深网络),又可以将1个conv拆成2个conv,使得网络深度进一步增加,增加了网络的非线性
4.4 Inception v4
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning,AAAI 2017
探索residual connection对Inception module的影响:收敛加速,但是最终效果好像提升很少。paper中提出了一些residual和 non-residual Inception networks
4.5 Xception
Xception: Deep Learning with Depthwise Separable Convolutions,CVPR 2017
Xception将分解的思想推到了极致:跨通道的相关性和空间相关性是完全可分离的,最好不要联合映射它们,先pointwise + relu再depthwise + relu(和mobilenet相反)
5.ResNet
Deep Residual Learning for Image Recognition,CVPR 2016 Best Paper
提出Residual Learning,两种bottleneck:1.3x3xc,3x3xc ; 2.1x1x(c/4),3x3x(c/4),1x1xc
Identity Mappings in Deep Residual Networks,ECCV 2016
做实验探索residual bottlneck里面是conv,bn,relu还是bn,relu,conv,实验效果后者好,但是一般我还是用前者,TF和Pytorch放出来的Imagenet pretrained model基本都是前者
6.DenseNet
Densely Connected Convolutional Networks,CVPR 2017
DenseNet将residual connection思想推到极致,每一层输出都直连到后面的所有层,可以更好地复用特征,每一层都比较浅,融合了来自前面所有层的所有特征,很容易训练。缺点是显存占用更大并且反向传播计算更复杂一点
7.ResNeXt
Aggregated Residual Transformations for Deep Neural Networks,CVPR 2017
借鉴了Inception加宽的思想,使用分组卷积,所以计算量减少,bottleneck的维度可以适当增加,效果提升:1x1x(c/2),3x3x(c/2),1x1xc
8.DPN
Dual Path Networks,NIPS 2017
把ResNeXt(feature re-usage)和DenseNet(new features exploration)合并
9.WRN
Wide Residual Networks,BMVC 2017
把ResNet变宽:增加output channel的数量来使模型变得更wider,深度可以不用太深了
10.SENet
Squeeze-and-Excitation Networks,CVPR 2018
Feature map的channel-wise attention
11.NASNet
Learning Transferable Architectures for Scalable Image Recognition,arXiv 2017
Google的AutoML
12.MobileNet
12.1 MobileNet v1
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications,CVPR 2017
用depth wise conv + point wise conv替代标准conv,减少计算量
12.2 MobileNet v2
MobileNetV2: Inverted Residuals and Linear Bottlenecks,arXiv 2018
Paper在探索这样的问题:如何把residual bottleneck应用到mobile net v1?一种改进是通过先扩张在收缩的方式,让depth wise conv提取的特征更丰富些,还有就是在和indentify mapping元素相加时去掉了relu,因为paper做实验证明relu只适合用于维度多的feature map的激活
13.ShuffleNet
13.1 ShuffleNet v1
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices,CVPR 2018
进一步用group conv + channel wise替代mobilenet中的point wise conv
13.2 ShuffleNet v2
ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design,ECCV 2018
Paper指出FLOPs并不能完全衡量模型速度,并给出4个实验结论:
- 卷积层的输入和输出特征通道数相等时memory access cost最小,模型速度最快
- 过多的group操作会增大MAC,从而使模型速度变慢
- 模型中的分支数量越少,模型速度越快
- element-wise操作所带来的时间消耗远比在FLOPs上的体现的数值要多,因此要尽可能减少element-wise操作
然后根据上述4个规则重新设计了shuffle net v2结构
14.Analysis
An Analysis Of Deep Neural Network Models For Practical Applications
从paper的Figure. 2可以看出,比较划算的是Inception、Resnet系列