KaiSu's Homepage

记得是十月份，就在国庆节假的前一天，mtcnn作者在github上发布了论文的测试代码。群里当时还是炸了锅的，有人抱怨假期发布重磅消息的不人道行为。但是，trained model毕竟还是小公司的命根。还多人虽不满足于测试代码，纷纷复现mtcnn，不过效果都不咋样。

最近，Github又分布了mxnet成功复现mtcnn face detection module的代码，还是非常值得学习的。

首先是mxnet的安装，相对于caffe的安装，mxnet简单得许多。中途遇到两个问题：

git clone mxnet仓库的时候，需要加上–recursive，将第三方子仓库clone到本地
/usr/bin/ld: cannot find -lcudart的问题，在askubuntu.com上找着了解决方案

mtcnn属于cascaded model，从我拙劣的理解来看，mtcnn model赢在了数据上…所以复现mtcnn最难的莫过于如何准备training set？按照论文的说法，可以依据IoU将某块cropped image patch分为Negatives（IoU < 0.3）, Positives（IoU < 0.65）, Part（0.4 < IoU < 0.65）

一块image patch需要三个属性：size（width=height），left，top。所以为了crop patch，需要确定每个属性的范围，然后random crop就行

Negatives：可以从两个角度去制作负样本
- 在整个image中作随机crop
- 在face bbox groundtruth附近制作有overlap的负样本，那么nx1范围为[x1-size,x1+w]

size = npr.randint(12, min(width, height) / 2)
nx = npr.randint(0, width - size)
ny = npr.randint(0, height - size)

size = npr.randint(12,  min(width, height) / 2)
delta_x = npr.randint(max(-size, -x1), w)
delta_y = npr.randint(max(-size, -y1), h)
nx1 = max(0, x1 + delta_x)
ny1 = max(0, y1 + delta_y)

Positives：正样本肯定是在face bbox gt附件制作，确保IoU >= 0.65

size = npr.randint(int(min(w, h) * 0.8), np.ceil(1.25 * max(w, h)))
# delta here is the offset of box center
delta_x = npr.randint(-w * 0.2, w * 0.2)
delta_y = npr.randint(-h * 0.2, h * 0.2)
nx1 = max(x1 + w / 2 + delta_x - size / 2, 0)
ny1 = max(y1 + h / 2 + delta_y - size / 2, 0)
nx2 = nx1 + size
ny2 = ny1 + size

用于P-Net回归的bbox regression gt label可以通过下面计算：

offset_x1 = (x1 - nx1) / float(size)
offset_y1 = (y1 - ny1) / float(size)
offset_x2 = (x2 - nx2) / float(size)
offset_y2 = (y2 - ny2) / float(size)

Parts：制作样本方式和正样本一致，通过IoU区分即可

Kai Su / 2016-12-24
Published under (CC) BY-NC-SA in categories Research tagged with deep learning face detection paper