International Conference on Automatic Face and Gesture Recognition
Conference Date
2019/5/14-2019/5/18
Conference Location
Lille, France
Abstract
Ensembles of convolutional neural networks
(CNNs) have been widely used in many computer vision tasks,
including face recognition. Many existing ensembles of face
recognition CNNs apply a two-stage pipeline to improve
performance [10], [20], [22], [23], [29]: (1) multiple CNNs are
trained separately on many face patches covering
different facial areas; (2) the features derived from the different
models are aggregated off-line by various fusion methods.
The well-known face recognition work DeepID2 [20] trains
200 networks on 200 arbitrarily chosen facial areas and
selects the best 25 to achieve impressive performance.
However, it is very time-consuming to train so many networks.
In addition, the facial patches are chosen in a brute-force manner,
without knowing which face patches are complementary
and discriminative. This approach may lack generalization capability
in cross-database applications. To address this, we propose a
novel end-to-end CNN ensemble architecture which automatically
learns the complementary and discriminative patches
for face recognition. Specifically, we propose a novel Patch
Generation Engine (PGE) with a Patch Search Spatial Transformer
Network (PS-STN) and an ROI shrunk loss to perform the
patch selection process. The ROI shrunk loss enlarges the distance
between learned features in both spatial space and feature space, so that
complementary features are learned. To obtain the final aggregated
feature, we use a supervised fusion module named the Two Stage
Discriminative Fusion Module (TSDFM), which effectively
captures global and local information and further guides the
PGE to learn better patches. Extensive experiments conducted
on the LFW and YTF datasets show the effectiveness of our novel
end-to-end ensemble method.