@fanisfun 2017-06-01T12:12:57.000000Z

Face Verification

DeepLearning Face

FaceNet

training (tensorflow (99.3))
convert to caffe

dlib face

training
- Feature
- Metric
- Image Jitter (99.38/99.13)
convert to caffe

Triplet

Theory details
- Paper: CVPR 2015 Google: FaceNet: A Unified Embedding for Face Recognition and Clustering
- Blogs
  1. Triplet loss: principle and gradient derivation
  2. How to add a layer in Caffe, and the implementation of a triplet loss layer in Caffe
- StackOverFlow: What's the triplet loss back propagation gradient formula?
Caffe implementations on GitHub
- luhaofang/tripletloss python experimental 152
- tyandzx/caffe c++ cosine metric 19
- hizhangp/triplet python 17
  1. Modify sampledata.py, config.py and train.py to fit your dataset and working environment.
  2. Pre-train your model with softmax loss.
  3. Finetune triplet model based on your pre-trained model.
  4. Learn to adjust parameters.
- Caffe PR#3663 (Triplet loss)
- Caffe PR#3123 (CNN Triplet Training) Most References
  - losstype
    1. FaceNet: A Unified Embedding for Face Recognition and Clustering
      
      $\mathcal{L}_0(x_a, x_p, x_n) = max(0, m - || x_a - x_n ||_2^2 + || x_a - x_p ||_2^2)$
    2. Learning Descriptors for Object Recognition and 3D Pose Estimation
      
      $\mathcal{L}_1(x_a, x_p, x_n) = max(0, 1 - \frac{|| x_a - x_n ||_2^2}{|| x_a - x_p ||_2^2 + m})$
    3. Learning Descriptors for Object Recognition and 3D Pose Estimation
      
      $\mathcal{L}_2(x_a, x_p, x_n) = max(0, 1 - \frac{exp(|| x_a - x_n ||_2^2)}{exp(|| x_a - x_p ||_2^2) + m})$
  - partial derivatives (losstype 2 in the list above, i.e. the ratio form $\mathcal{L}_1$)
    
    $\mathcal{L}_{tri}(s_i,s_j,s_k) = max(0,1-\frac{||f(x_i)-f(x_k)||_2^2}{||f(x_i)-f(x_j)||_2^2+m})$
    where $f(x)$ is the input of the loss layer for sample $x$ and $m$ is the margin for triplet.
    Let $D_{ij}=||f(x_i)-f(x_j)||_2^2$ and $D_{ik}=||f(x_i)-f(x_k)||_2^2$. Where the loss is positive, the partial
    derivatives with respect to the inputs of the triplet loss layer are (up to a constant factor of 2):
    
    $\frac{\partial \mathcal{L}_{tri}}{\partial f(x_i)}= \frac{D_{ik}(f(x_i)-f(x_j))-(D_{ij}+m)(f(x_i)-f(x_k))}{(D_{ij}+m)^2}$
    
    $\frac{\partial \mathcal{L}_{tri}}{\partial f(x_j)}= \frac{D_{ik}(f(x_j)-f(x_i))}{(D_{ij}+m)^2}$
    
    $\frac{\partial \mathcal{L}_{tri}}{\partial f(x_k)}= \frac{f(x_i)-f(x_k)}{D_{ij}+m}$
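A minimal Python sketch for sanity-checking the derivatives above, assuming embeddings are plain lists of floats. The function names (`ratio_triplet_loss`, `ratio_triplet_grads`, `numeric_grad`) are illustrative and are not taken from PR#3123. Unlike the formulas above, the code keeps the factor of 2 that comes from differentiating the squared norm, so its output can be compared directly against finite differences.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def ratio_triplet_loss(fi, fj, fk, m):
    """max(0, 1 - D_ik / (D_ij + m)) with anchor fi, positive fj, negative fk."""
    return max(0.0, 1.0 - sq_dist(fi, fk) / (sq_dist(fi, fj) + m))


def ratio_triplet_grads(fi, fj, fk, m):
    """Gradients w.r.t. (fi, fj, fk); all zero where the hinge is inactive."""
    d_ij, d_ik = sq_dist(fi, fj), sq_dist(fi, fk)
    den = d_ij + m
    if 1.0 - d_ik / den <= 0.0:
        zero = [0.0] * len(fi)
        return zero, zero[:], zero[:]
    gi = [2.0 * (d_ik * (a - p) - den * (a - n)) / den ** 2
          for a, p, n in zip(fi, fj, fk)]
    gj = [2.0 * d_ik * (p - a) / den ** 2 for a, p in zip(fi, fj)]
    gk = [2.0 * (a - n) / den for a, n in zip(fi, fk)]
    return gi, gj, gk


def numeric_grad(fn, x, eps=1e-6):
    """Central finite differences of the scalar function fn at the list x."""
    grad = []
    for d in range(len(x)):
        hi = x[:d] + [x[d] + eps] + x[d + 1:]
        lo = x[:d] + [x[d] - eps] + x[d + 1:]
        grad.append((fn(hi) - fn(lo)) / (2 * eps))
    return grad
```

Comparing `ratio_triplet_grads` against `numeric_grad` applied to each argument in turn is a cheap way to catch sign errors before porting a backward pass to C++.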

Multi-negatives Triplet-Loss

Layer Setup in caffe
- $batch = \{triplet_1, triplet_2, ..., triplet_i,..., triplet_n\}$ where $n$ is num_triplets
- $triplet = \{f_a, f_p, f_1, f_2, ..., f_j, ..., f_m\}$ where $m$ is num_negatives
Forward
- Each negative: $l(a, p, j) = \max(0, || f_a - f_p ||_2^2 - || f_a - f_j ||_2^2 + mrg)$ where $mrg$ is the margin
- Each triplet: $\mathcal{L}_{tri} = \sum_{j=1}^{m} l(a,p,j) = \sum_{j=1}^{m} \max(0, || f_a - f_p ||_2^2 - || f_a - f_j ||_2^2 + mrg)$
- Each batch: $\mathcal{L}_{batch} = \frac{1}{2n}\sum_{i=1}^{n}\mathcal{L}_{tri} = \frac{1}{2n}\sum_{i=1}^{n}\sum_{j=1}^{m}\max(0, || f_a - f_p ||_2^2 - || f_a - f_j ||_2^2 + mrg)$
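The three levels of the forward pass can be traced on a tiny batch. This is a sketch in plain Python, assuming each triplet is given as a tuple `(f_a, f_p, negatives)`; the function name `multi_negative_forward` and the input layout are illustrative choices, not part of any Caffe layer.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def multi_negative_forward(batch, margin):
    """batch: list of (f_a, f_p, [f_1, ..., f_m]).

    Returns (batch loss, per-negative hinge values loss_ij[i][j]).
    """
    n = len(batch)
    loss_ij = []
    for f_a, f_p, negatives in batch:
        d_pos = sq_dist(f_a, f_p)
        loss_ij.append([max(0.0, d_pos - sq_dist(f_a, f_j) + margin)
                        for f_j in negatives])
    total = sum(sum(row) for row in loss_ij)
    return total / (2.0 * n), loss_ij


# n = 2 triplets, m = 2 negatives each, dim = 2
batch = [
    ([0.0, 0.0], [1.0, 0.0], [[2.0, 0.0], [0.0, 1.0]]),
    ([1.0, 1.0], [1.0, 2.0], [[1.0, 3.0], [3.0, 1.0]]),
]
loss, loss_ij = multi_negative_forward(batch, margin=1.0)
print(loss_ij)  # [[0.0, 1.0], [0.0, 0.0]]
print(loss)     # 0.25
```

Only the second negative of the first triplet violates the margin, so the summed hinge value is 1 and the batch loss is 1 / (2 * 2) = 0.25.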
Backward
- Each negative:
  
  $\frac{\partial l(a,p,j)}{\partial f_a}=\left\{ \begin{array}{lcc} 2(f_j-f_p) & & {\text{if } (l(a,p,j)>0)} \\ 0 & & \text{otherwise} \\ \end{array} \right.$
  
  $\frac{\partial l(a,p,j)}{\partial f_p}=\left\{ \begin{array}{lcc} -2(f_a-f_p) & & {\text{if } (l(a,p,j)>0)} \\ 0 & & \text{otherwise} \\ \end{array} \right.$
  
  $\frac{\partial l(a,p,j)}{\partial f_j}=\left\{ \begin{array}{lcc} 2(f_a-f_j) & & {\text{if } (l(a,p,j)>0)} \\ 0 & & \text{otherwise} \\ \end{array} \right.$
- Each triplet:
  
  $\frac{\partial \mathcal{L}_{tri}}{\partial f_a}=\sum_{j:\,l(a,p,j)>0}2(f_j-f_p)$
  
  $\frac{\partial \mathcal{L}_{tri}}{\partial f_p}=\sum_{j:\,l(a,p,j)>0}-2(f_a-f_p)$
  
  $\frac{\partial \mathcal{L}_{tri}}{\partial f_j}=\left\{ \begin{array}{lcc} 2(f_a-f_j) & & {\text{if } (l(a,p,j)>0)} \\ 0 & & \text{otherwise} \\ \end{array} \right.$
- Each batch:
  
  $\frac{\partial \mathcal{L}_{batch}}{\partial f_{ai}}=\frac1n\sum_{j:\,l(a,p,j)>0}(f_{ij}-f_{pi})$
  
  $\frac{\partial \mathcal{L}_{batch}}{\partial f_{pi}}=-\frac1n\sum_{j:\,l(a,p,j)>0}(f_{ai}-f_{pi})$
  
  $\frac{\partial \mathcal{L}_{batch}}{\partial f_{ij}}=\left\{ \begin{array}{lcc} \frac1n(f_{ai}-f_{ij}) & & {\text{if } (l(a,p,j)>0)} \\ 0 & & \text{otherwise} \\ \end{array} \right.$
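The per-triplet gradients can be checked by hand on a small case. The following is a sketch in plain Python with illustrative names (`triplet_loss`, `triplet_grads`); it covers a single triplet without the $\frac{1}{2n}$ batch scaling, and only negatives whose hinge term is positive contribute.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(f_a, f_p, negatives, margin):
    """Sum of hinge terms over all negatives of one triplet."""
    d_pos = sq_dist(f_a, f_p)
    return sum(max(0.0, d_pos - sq_dist(f_a, f_j) + margin) for f_j in negatives)


def triplet_grads(f_a, f_p, negatives, margin):
    """Gradients of one triplet's loss w.r.t. anchor, positive and each negative."""
    dim = len(f_a)
    d_pos = sq_dist(f_a, f_p)
    g_a, g_p, g_negs = [0.0] * dim, [0.0] * dim, []
    for f_j in negatives:
        g_j = [0.0] * dim
        if d_pos - sq_dist(f_a, f_j) + margin > 0.0:  # hinge is active
            for d in range(dim):
                g_a[d] += 2.0 * (f_j[d] - f_p[d])
                g_p[d] += -2.0 * (f_a[d] - f_p[d])
                g_j[d] = 2.0 * (f_a[d] - f_j[d])
        g_negs.append(g_j)
    return g_a, g_p, g_negs


f_a, f_p = [0.0, 0.0], [1.0, 0.0]
negatives = [[2.0, 0.0], [0.0, 1.0]]  # only the second one violates the margin
g_a, g_p, g_negs = triplet_grads(f_a, f_p, negatives, margin=1.0)
print(g_a)     # [-2.0, 2.0]
print(g_negs)  # [[0.0, 0.0], [0.0, -2.0]]
```

The first negative is far enough from the anchor that its hinge term is zero, so it receives no gradient and adds nothing to the anchor or positive gradients.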

Layer setup in caffe

Blob<Dtype> diff_pos;  // shape (n, 1, dim, 1), cache for backward
Blob<Dtype> diff_neg;  // shape (n, m, dim, 1), cache for backward
Blob<Dtype> loss_ij;   // shape (n, m, 1, 1), cache for backward

Forward in caffe

Diff:

for (int i = 0; i < n; ++i) {
    diff_pos[i] = fa[i] - fp[i];             // shape (n,1,dim,1), cached for backward
    for (int j = 0; j < m; ++j)
        diff_neg[i][j] = fa[i] - fn[i][j];   // shape (n,m,dim,1), cached for backward
}

Dist:

for (int i = 0; i < n; ++i) {
    dist_sq_pos[i] = dot(diff_pos[i], diff_pos[i]);               // temporary, not cached
    for (int j = 0; j < m; ++j)
        dist_sq_neg[i][j] = dot(diff_neg[i][j], diff_neg[i][j]);  // temporary, not cached
}

Loss:

Dtype loss = 0;
for (int i = 0; i < n; ++i)
    for (int j = 0; j < m; ++j) {
        loss_ij[i][j] = max(Dtype(0), dist_sq_pos[i] - dist_sq_neg[i][j] + margin); // shape (n,m,1,1), cached for backward
        loss += loss_ij[i][j];
    }
// out of loops
loss /= 2 * n;

Backward in caffe

const Dtype alpha = top[0]->cpu_diff()[0] / n;   // top_diff scaled by 1/n
Dtype* bout = bottom[0]->mutable_cpu_diff();     // bottom_diff
caffe_set(bottom[0]->count(), Dtype(0), bout);
for (int i = 0; i < n; ++i) {
    for (int j = 0; j < m; ++j) {
        if (loss_ij[i][j] > Dtype(0.0)) {
            // axpby: y = alpha x + beta y
            // bout[r] denotes row r (dim values) of bottom_diff
            // anchor: f_n - f_p = diff_pos - diff_neg
            axpby(dim, alpha, diff_pos[i], Dtype(1.0), bout[i*(m+2)+0]);
            axpby(dim, -alpha, diff_neg[i][j], Dtype(1.0), bout[i*(m+2)+0]);
            // positive: -(f_a - f_p) = -diff_pos
            axpby(dim, -alpha, diff_pos[i], Dtype(1.0), bout[i*(m+2)+1]);
            // negative: f_a - f_n = diff_neg (beta = 0: each negative row is written once)
            axpby(dim, alpha, diff_neg[i][j], Dtype(0.0), bout[i*(m+2)+(j+2)]);
        }
    }
}
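The accumulation pattern above can be mirrored in Python to check the row indexing before writing the C++ layer. This sketch assumes a flat, row-major bottom buffer in which triplet `i` occupies rows `i*(m+2)` (anchor), `i*(m+2)+1` (positive) and `i*(m+2)+2+j` (negative `j`), a layout inferred from the `bout[i*(m+2)+...]` indices. The names `backward_flat` and `axpy` are hypothetical, and the differences are recomputed here rather than read from a cache.

```python
def backward_flat(bottom, n, m, dim, margin, top_diff=1.0):
    """Return bottom_diff for a flat buffer of n*(m+2) rows, each of length dim."""
    def row(r):
        return bottom[r * dim:(r + 1) * dim]

    alpha = top_diff / n
    bout = [0.0] * len(bottom)

    def axpy(a, x, r):
        # bout[row r] += a * x
        for d in range(dim):
            bout[r * dim + d] += a * x[d]

    for i in range(n):
        base = i * (m + 2)
        f_a, f_p = row(base), row(base + 1)
        diff_pos = [a - p for a, p in zip(f_a, f_p)]
        dist_sq_pos = sum(v * v for v in diff_pos)
        for j in range(m):
            diff_neg = [a - q for a, q in zip(f_a, row(base + 2 + j))]
            loss_ij = dist_sq_pos - sum(v * v for v in diff_neg) + margin
            if loss_ij > 0.0:
                axpy(alpha, diff_pos, base)           # anchor: +(f_a - f_p)
                axpy(-alpha, diff_neg, base)          # anchor: -(f_a - f_j)
                axpy(-alpha, diff_pos, base + 1)      # positive: -(f_a - f_p)
                axpy(alpha, diff_neg, base + 2 + j)   # negative: +(f_a - f_j)
    return bout


# n = 2 triplets, m = 2 negatives, dim = 2; rows: a, p, n1, n2, a, p, n1, n2
bottom = [0.0, 0.0, 1.0, 0.0, 2.0, 0.0, 0.0, 1.0,
          1.0, 1.0, 1.0, 2.0, 1.0, 3.0, 3.0, 1.0]
grad = backward_flat(bottom, n=2, m=2, dim=2, margin=1.0)
print(grad[:8])  # [-0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0, -0.5]
```

The second triplet has no active negative, so its eight gradient entries stay at zero, which matches the `caffe_set` initialisation of `bout`.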

Face Datasets

MSR Image Recognition Challenge

Loss Function

Softmax Loss
Triplet Loss
- (Google, FaceNet: A Unified Embedding for Face Recognition and Clustering, 2015)
Contrastive Loss
- (Xiaoou Tang, Deep Learning Face Representation by Joint Identification-Verification, 2014)
Softmax Loss
Tuplet loss
- (NEC, Improved deep metric learning with multi-class n-pair loss objective, NIPS 2016)

Pairlet Loss

Layer Setup in caffe
- $batch = \{pair_1, pair_2,..., pair_n\}$ where $n$ is num_pairs
- $pair = \{f_a, f_b\}$ so $batch\_size = 2 n$
Loss Formula

$\mathcal{L}_{pairlet}=\frac1{n}\sum_{i=1}^n\log[1+\sum_{j\neq i}\exp(f_j^bf_i^a-f_i^bf_i^a)+\sum_{j\neq i}\exp(f_j^af_i^b-f_i^af_i^b)]$
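A small Python sketch of this loss, assuming `fa[i]` and `fb[i]` are the two embeddings of pair `i` and that products such as $f_j^bf_i^a$ are dot products. The helper names (`dot`, `pairlet_terms`, `pairlet_loss`) are illustrative.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def pairlet_terms(fa, fb):
    """Argument of the logarithm for each pair i."""
    n = len(fa)
    terms = []
    for i in range(n):
        s_ii = dot(fa[i], fb[i])
        t = 1.0
        for j in range(n):
            if j != i:
                t += math.exp(dot(fb[j], fa[i]) - s_ii)  # f_i^a against other b's
                t += math.exp(dot(fa[j], fb[i]) - s_ii)  # f_i^b against other a's
        terms.append(t)
    return terms


def pairlet_loss(fa, fb):
    """Mean over pairs of log(term_i)."""
    terms = pairlet_terms(fa, fb)
    return sum(math.log(t) for t in terms) / len(terms)
```

With all embeddings equal to zero and `n = 3`, every exponent is 0, so each term is 1 + 2 * 2 = 5 and the loss is `log(5)`. When matching pairs score much higher than mismatched ones, the exponentials vanish and the loss approaches 0.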
Back Propagation
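Assume that

$\mathcal{T}_i=1+\sum_{j\neq i}\exp(f_j^bf_i^a-f_i^bf_i^a)+\sum_{j\neq i}\exp(f_j^af_i^b-f_i^af_i^b)$

then we can get

$\mathcal{L}_{pairlet}=\frac1n\sum_{k=1}^n\log\mathcal{T}_k$

and

$\frac{\partial \mathcal{L}_{pairlet}}{\partial f_i^a}=\frac1n\sum_{k=1}^n\frac1{\mathcal{T}_k}\frac{\partial \mathcal{T}_k}{\partial f_i^a}$

we can extract the terms of each $\mathcal{T}_k$ that depend on $f_i^a$, written ${\mathcal{T}_k}'$: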
- if $k = i$ :
  
  ${\mathcal{T}_k}'=\sum_{j\neq i}\exp(f_j^bf_i^a-f_i^bf_i^a)+\sum_{j\neq i}\exp(f_j^af_i^b-f_i^af_i^b)$
- else ( $k\neq i$ ) :
  
  ${\mathcal{T}_k}'=\exp(f_i^af_k^b-f_k^af_k^b)$
  so
  
  $\begin{split} \frac {\partial \mathcal{L}_{pairlet}}{\partial f_i^a} &= \frac1n\{\frac1{\mathcal{T}_i}[\sum_{j\neq i}\exp(f_j^bf_i^a-f_i^bf_i^a)(f_j^b-f_i^b)+\sum_{j\neq i}\exp(f_j^af_i^b-f_i^af_i^b)(-f_i^b)]\\ &+ \sum_{j\neq i}\frac1{\mathcal{T}_j}\exp(f_i^af_j^b-f_j^af_j^b)(f_j^b)\} \end{split}$
  in a similar way:
  
  $\begin{split} \frac {\partial \mathcal{L}_{pairlet}}{\partial f_i^b} &= \frac1n\{\frac1{\mathcal{T}_i}[\sum_{j\neq i}\exp(f_j^af_i^b-f_i^af_i^b)(f_j^a-f_i^a)+\sum_{j\neq i}\exp(f_j^bf_i^a-f_i^bf_i^a)(-f_i^a)]\\ &+ \sum_{j\neq i}\frac1{\mathcal{T}_j}\exp(f_i^bf_j^a-f_j^bf_j^a)(f_j^a)\} \end{split}$
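The gradient with respect to $f_i^a$ can be checked numerically. The sketch below is plain Python under the assumption that $\mathcal{T}_k$ is the argument of the logarithm for pair $k$ and that products of embeddings are dot products; `term`, `pairlet_loss` and `pairlet_grad_a` are illustrative names.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def term(fa, fb, k):
    """T_k: the argument of the logarithm for pair k."""
    s_kk = dot(fa[k], fb[k])
    return 1.0 + sum(
        math.exp(dot(fb[j], fa[k]) - s_kk) + math.exp(dot(fa[j], fb[k]) - s_kk)
        for j in range(len(fa)) if j != k
    )


def pairlet_loss(fa, fb):
    n = len(fa)
    return sum(math.log(term(fa, fb, k)) for k in range(n)) / n


def pairlet_grad_a(fa, fb, i):
    """Analytic gradient of the loss w.r.t. fa[i]."""
    n, dim = len(fa), len(fa[i])
    s_ii = dot(fa[i], fb[i])
    t_i = term(fa, fb, i)
    grad = [0.0] * dim
    for j in range(n):
        if j == i:
            continue
        e1 = math.exp(dot(fb[j], fa[i]) - s_ii)  # first sum inside T_i
        e2 = math.exp(dot(fa[j], fb[i]) - s_ii)  # second sum inside T_i
        # the single term of T_j that contains fa[i], already divided by T_j
        e3 = math.exp(dot(fa[i], fb[j]) - dot(fa[j], fb[j])) / term(fa, fb, j)
        for d in range(dim):
            grad[d] += (e1 * (fb[j][d] - fb[i][d]) - e2 * fb[i][d]) / t_i
            grad[d] += e3 * fb[j][d]
    return [g / n for g in grad]
```

Perturbing one coordinate of `fa[i]` by a small step and taking a central difference of `pairlet_loss` should reproduce the matching entry of `pairlet_grad_a`; the gradient for $f_i^b$ follows by swapping the roles of `fa` and `fb`.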

Todo

Read Google Paper
- Figure out meaning of losstype
- How to select pairs and generate images list
- Semi-hard Selection
  python code
  OpenFace(torch): 0.1.0->0.2.0
  Ref-details: FaceNet Paper
- details and tricks of training later
Fork Official Caffe
- create triple-face branch
- modify/merge changes of PR#3123
- create examples/triplet-face
Find / design a good triplet-net
- decide input size: 96*112
- create data
Train triplet-net
- progress
  1. train loss:
  2. lfw acc:
  3. val acc:
  4. more:
- adjust parameters
  1. learning rate:
  2. weight decay:
  3. center loss:
  4. num of negs:
  5. more:
Prepare PPT

Ways to own layers:
1. Python Layer forward and backward: get it working
2. C++ CPU forward: check variables
3. Cuda GPU forward: deploy
4. C++ CPU backward: stable and convergent
5. Cuda GPU backward: Quicker Training
6. Test code
7. Format Style and Comments
8. PR to Official Caffe