@nrailgun 2016-05-14T07:07:32.000000Z, word count 2854, views 1880

Deep feature learning with relative distance comparison for person re-identification

Paper notes

ABSTRACT

Identifying the same individual across different scenes is an important yet difficult task. The main difficulty lies in preserving the similarity of the same person under appearance variation while still discriminating between different individuals. We present a scalable, distance-driven feature learning framework.

Introduction

Two contributions to the literature:

A scalable deep feature learning method for person re-identification via maximum relative distance.
An effective learning algorithm whose training cost depends mainly on the number of images rather than the number of triplets.

Model

Our objective is to use a deep convolutional network to learn an effective feature representation that satisfies the relative distance relationship under the $L_2$ distance.

Let $O_i = \langle O_i^1, O_i^2, O_i^3 \rangle$ denote the $i$-th training triplet, in which $O_i^1$ and $O_i^2$ form a matched pair (the same person) and $O_i^1$ and $O_i^3$ form a mismatched pair (different people). Let $W = \{ W_j \}$ denote the network parameters and $F_W(I)$ denote the network output for image $I$.

The desired feature should satisfy the following condition under the $L_2$ norm:


$\left\| F_W(O_i^1) - F_W(O_i^2) \right\|^2 \lt \left\| F_W(O_i^1) - F_W(O_i^3) \right\|^2$

To maximize the relative distance between matched and mismatched pairs, we minimize the following objective, where $n$ is the number of training triplets:


$f(W, O) = \sum_{i=1}^n \max \left( C, \left\| F_W(O_i^1) - F_W(O_i^2) \right\|^2 - \left\| F_W(O_i^1) - F_W(O_i^3) \right\|^2 \right)$

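The constant $C = -1$ keeps the overall value of the objective from being dominated by easily identifiable triplets: once a triplet's mismatched distance exceeds its matched distance by more than 1, its term is clamped to $C$ and no longer contributes a gradient.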
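As a rough illustration, the objective can be evaluated with NumPy once the network outputs are available. This sketch assumes the outputs $F_W(O_i^1)$, $F_W(O_i^2)$, $F_W(O_i^3)$ have already been computed and stacked row-wise into three $(n, d)$ arrays; the function name `triplet_objective` is hypothetical and not taken from the paper.

```python
import numpy as np


def triplet_objective(anchor: np.ndarray, positive: np.ndarray,
                      negative: np.ndarray, c: float = -1.0) -> float:
    """f(W, O) summed over n triplets (hypothetical helper).

    anchor, positive, negative: (n, d) arrays holding F_W(O_i^1),
    F_W(O_i^2) and F_W(O_i^3) for each triplet i.
    """
    # Squared L2 distances of the matched and mismatched pairs.
    d_match = np.sum((anchor - positive) ** 2, axis=1)
    d_mismatch = np.sum((anchor - negative) ** 2, axis=1)
    # Clamp each term from below at C so easy triplets contribute a constant.
    return float(np.sum(np.maximum(c, d_match - d_mismatch)))


anchor = np.array([[0.0, 0.0], [0.0, 0.0]])
positive = np.array([[1.0, 0.0], [2.0, 0.0]])
negative = np.array([[3.0, 0.0], [1.0, 0.0]])
print(triplet_objective(anchor, positive, negative))  # 2.0
```

Here the first triplet (matched distance 1, mismatched distance 9) is easy and is clamped to $C = -1$, while the second (distances 4 and 1) violates the desired condition and contributes 3, for a total of 2.0.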

Learning algorithm

Triplet-based gradient descent algorithm

Check the paper for details.

Image-based gradient descent algorithm

In the triplet-based gradient descent algorithm, the number of network propagations in each iteration grows with the number of training triplets, since each triplet requires three rounds of forward and backward propagation. When the same image occurs in several triplets, its forward and backward propagation can be shared, so each distinct image needs to be propagated only once per iteration.
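A back-of-the-envelope sketch of the cost difference, assuming each triplet is given as a tuple of hypothetical image identifiers: the triplet-based scheme needs $3n$ propagation rounds per iteration, whereas sharing propagation needs one round per distinct image. The function name `propagation_counts` is an illustrative assumption.

```python
from typing import Hashable, Sequence, Tuple


def propagation_counts(
    triplets: Sequence[Tuple[Hashable, Hashable, Hashable]],
) -> Tuple[int, int]:
    """Propagation rounds per iteration: (triplet-based, one per distinct image)."""
    triplet_based = 3 * len(triplets)  # three forward/backward rounds per triplet
    distinct_images = {image for triplet in triplets for image in triplet}
    return triplet_based, len(distinct_images)


# Hypothetical ids: three triplets drawn from only four images.
print(propagation_counts([("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d")]))  # (9, 4)
```

The more often images are reused across triplets, the larger the gap between the two counts.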

Let $\{ I_k' \}$ denote the set of all distinct images in the triplets:

$\{ I_k' \} = \{ O_i^1 \} \cup \{ O_i^2 \} \cup \{ O_i^3 \}$
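One way to build this set in practice, assuming images are identified by hashable keys: deduplicate the images in first-seen order and rewrite each triplet as three indices into the deduplicated list, so that later steps can refer to $F_W(I_k')$ by its index $k$. The helper `index_distinct_images` is hypothetical, not from the paper.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def index_distinct_images(
    triplets: Sequence[Tuple[Hashable, Hashable, Hashable]],
) -> Tuple[List[Hashable], List[Tuple[int, int, int]]]:
    """Return ({I'_k} in first-seen order, triplets rewritten as indices k)."""
    distinct: List[Hashable] = []
    position: Dict[Hashable, int] = {}
    remapped: List[Tuple[int, int, int]] = []
    for o1, o2, o3 in triplets:
        for image in (o1, o2, o3):
            if image not in position:  # first occurrence joins the union
                position[image] = len(distinct)
                distinct.append(image)
        remapped.append((position[o1], position[o2], position[o3]))
    return distinct, remapped


images, index_triplets = index_distinct_images(
    [("a", "b", "c"), ("a", "b", "d"), ("c", "a", "d")]
)
print(images)          # ['a', 'b', 'c', 'd']
print(index_triplets)  # [(0, 1, 2), (0, 1, 3), (2, 0, 3)]
```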

The objective function can therefore be written as a function of the network outputs of these distinct images:


$f = f \left( F_W(I_1'), F_W(I_2'), \dots, F_W(I_m') \right)$


$f = f \left( X_1^l, X_2^l, \dots, X_m^l \right)$

where $m$ is the number of distinct images in the triplets and $X_k^l$ is the output of layer $l$ for image $I_k'$. The chain rule gives the following equations:


$\frac {\partial f} {\partial W^l} = \sum_{k=1}^m \frac {\partial f} {\partial X_k^l} \frac {\partial X_k^l} {\partial W^l}$


$\frac {\partial f} {\partial X_k^l} = \frac {\partial f} {\partial X_k^{l+1}} \frac {\partial X_k^{l+1}} {\partial X_k^l}$
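To make $\frac{\partial f}{\partial W^l} = \sum_k \frac{\partial f}{\partial X_k^l} \frac{\partial X_k^l}{\partial W^l}$ concrete, consider the simplest case where layer $l$ is a single linear map $X_k^l = W^l X_k^{l-1}$ shared by all images (an assumption for illustration; the paper's network is a deep convolutional one). Then $\frac{\partial f}{\partial W^l} = \sum_k \frac{\partial f}{\partial X_k^l} (X_k^{l-1})^\top$, so the weight gradient is a sum of per-image terms, each needing only that image's own forward activation and backward signal. A NumPy sketch, with hypothetical function names and a toy objective standing in for $f$:

```python
import numpy as np


def linear_layer_weight_grad(grad_out: np.ndarray, inputs: np.ndarray) -> np.ndarray:
    """df/dW for a linear layer X_k = W x_k shared by m images (hypothetical helper).

    grad_out: (m, d_out) array; row k is df/dX_k.
    inputs:   (m, d_in) array; row k is the layer input x_k.
    """
    # grad_out.T @ inputs equals the sum over images k of outer(df/dX_k, x_k).
    return grad_out.T @ inputs


def toy_objective(W: np.ndarray, x: np.ndarray) -> float:
    """Toy stand-in for f: 0.5 * sum_k ||W x_k||^2, so that df/dX_k = X_k."""
    return 0.5 * float(np.sum((x @ W.T) ** 2))


rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
x = rng.normal(size=(5, 4))  # m = 5 distinct images, d_in = 4
grad_W = linear_layer_weight_grad(x @ W.T, x)  # shape (3, 4)
```

For this toy $f$ the backward signal of each image is simply its own output row `x @ W.T`, and the summed result should match a finite-difference estimate of the gradient of `toy_objective`.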

It is then easy to obtain the derivative with respect to the network output of each image:


$\frac {\partial f} {\partial F_W(I_k')} = \sum_{i=1}^n \frac {\partial \max \left( C, \left\| F_W(O_i^1) - F_W(O_i^2) \right\|^2 - \left\| F_W(O_i^1) - F_W(O_i^3) \right\|^2 \right)} {\partial F_W(I_k')}$
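For this clamped objective each term of the sum has a closed form. Writing $r_i = \| F_W(O_i^1) - F_W(O_i^2) \|^2 - \| F_W(O_i^1) - F_W(O_i^3) \|^2$, an active triplet ($r_i > C$) contributes $2(F_W(O_i^3) - F_W(O_i^2))$ to the image in position 1, $2(F_W(O_i^2) - F_W(O_i^1))$ to the image in position 2 and $2(F_W(O_i^1) - F_W(O_i^3))$ to the image in position 3, while a clamped triplet ($r_i < C$) contributes nothing. The NumPy sketch below accumulates these contributions per distinct image, assuming triplets are given as integer index rows into an $(m, d)$ feature array. The function name `image_gradients` is hypothetical, and using a zero subgradient at exactly $r_i = C$ is an assumption of this sketch.

```python
import numpy as np


def image_gradients(features, triplets, c: float = -1.0) -> np.ndarray:
    """Return df/dF_W(I'_k) for each of the m distinct images (hypothetical helper).

    features: (m, d) network outputs of the distinct images.
    triplets: (n, 3) integer rows giving the indices of O_i^1, O_i^2, O_i^3.
    """
    features = np.asarray(features, dtype=float)
    triplets = np.asarray(triplets, dtype=int)
    a = features[triplets[:, 0]]  # F_W(O_i^1)
    p = features[triplets[:, 1]]  # F_W(O_i^2)
    q = features[triplets[:, 2]]  # F_W(O_i^3)
    r = np.sum((a - p) ** 2, axis=1) - np.sum((a - q) ** 2, axis=1)
    # Terms clamped at C are constant, so they contribute zero gradient.
    active = (r > c)[:, None]
    grads = np.zeros_like(features)
    # np.add.at accumulates correctly when an image index repeats across triplets.
    np.add.at(grads, triplets[:, 0], np.where(active, 2 * (q - p), 0.0))
    np.add.at(grads, triplets[:, 1], np.where(active, 2 * (p - a), 0.0))
    np.add.at(grads, triplets[:, 2], np.where(active, 2 * (a - q), 0.0))
    return grads
```

Because an image that appears in several triplets accumulates all of its contributions in a single row, each distinct image then needs only one backward pass with that row as its output gradient, which is what makes the cost depend on the number of images rather than the number of triplets.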