@nrailgun 2016-05-14T15:07:32.000000Z

Deep feature learning with relative distance comparison for person re-identification

Paper notes


ABSTRACT

Identifying the same individual across different scenes is an important yet difficult task. The main difficulty lies in preserving similarity against variation while discriminating between different individuals. We present a scalable distance-driven feature learning framework.

Introduction

Two contributions to the literature:

  1. A scalable deep feature learning method for person re-identification via maximum relative distance.
  2. An effective learning algorithm whose training cost depends mainly on the number of images rather than the number of triplets.

Model

Our objective is to use a deep convolutional network to learn an effective feature representation that satisfies the relative distance relationship under the L2 distance.

Each training triplet is $O_i = \langle O_i^1, O_i^2, O_i^3 \rangle$, in which $O_i^1$ and $O_i^2$ are a matched pair and $O_i^1$ and $O_i^3$ are a mismatched pair. Let $W = \{W_j\}$ denote the network parameters and $F_W(I)$ denote the network output for image $I$.

The desired feature should satisfy the following condition under the L2 norm:

$$\|F_W(O_i^1) - F_W(O_i^2)\|^2 < \|F_W(O_i^1) - F_W(O_i^3)\|^2$$

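Maximizing the relative distance between matched and mismatched pairs is cast as minimizing the following objective, where $n$ is the number of training triplets:

$$f(W, O) = \sum_{i=1}^{n} \max\left(C,\ \|F_W(O_i^1) - F_W(O_i^2)\|^2 - \|F_W(O_i^1) - F_W(O_i^3)\|^2\right)$$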

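The constant $C = -1$ prevents the overall value of the objective function from being dominated by easily identifiable triplets: once a triplet's distance difference falls below $C$, its term is clamped to $C$ and no longer contributes to the gradient.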
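As a concrete illustration of the clamped objective, here is a minimal sketch in plain Python. The function names, the dictionary layout, and the toy feature vectors are all made up for this example, and the default `C=-1.0` is an assumption about the intended sign of the constant; in practice the features would come from the network output.

```python
def sq_dist(u, v):
    """Squared L2 distance between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_objective(features, triplets, C=-1.0):
    """Sum over triplets of max(C, d(anchor, match) - d(anchor, mismatch)).

    features: dict mapping an image id to its feature vector (hypothetical layout)
    triplets: list of (anchor_id, match_id, mismatch_id)
    C: clamp constant; the default is an assumption, not taken from code by the authors
    """
    total = 0.0
    for a, p, n in triplets:
        d = sq_dist(features[a], features[p]) - sq_dist(features[a], features[n])
        total += max(C, d)
    return total


# Toy 2-D features, invented for illustration only.
features = {
    "a": [0.0, 0.0],   # anchor
    "p": [0.1, 0.0],   # matched image, close to the anchor
    "n": [3.0, 0.0],   # mismatched image, far from the anchor
    "h": [0.5, 0.0],   # a harder mismatched image
}
easy = triplet_objective(features, [("a", "p", "n")])  # clamped at C
hard = triplet_objective(features, [("a", "p", "h")])  # not clamped
```

The easy triplet is clamped to `C`, so however far apart its mismatched pair is pushed, it cannot lower the objective further; the hard triplet still contributes its actual distance difference.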

Learning algorithm

Triplet-based gradient descent algorithm

Check the paper for details.

Image-based gradient descent algorithm

In the triplet-based gradient descent algorithm, the number of network propagations depends on the number of training triplets in each iteration, with each triplet involving three rounds of forward and backward propagation. If the same image occurs in different triplets, the forward and backward propagation of that image can be reused.

Let $\{I_k\}$ denote the set of all distinct images in the triplets:

$$\{I_k\} = \{O_i^1\} \cup \{O_i^2\} \cup \{O_i^3\}$$
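To make the saving concrete, the sketch below collects the distinct images from a small batch of triplets and compares the number of network passes under the two schemes. The image ids and the helper name `distinct_images` are hypothetical, and the counts assume one forward and backward pass per triplet slot versus one per distinct image.

```python
def distinct_images(triplets):
    """Return the distinct image ids in first-seen order."""
    seen = {}
    for triplet in triplets:
        for image_id in triplet:
            seen.setdefault(image_id, None)
    return list(seen)


# Made-up batch: each tuple is (anchor, matched, mismatched).
triplets = [
    ("a1", "a2", "b1"),
    ("a1", "a2", "c1"),
    ("a2", "a1", "b1"),
    ("b1", "b2", "a1"),
]
images = distinct_images(triplets)

triplet_based_passes = 3 * len(triplets)  # every triplet slot is propagated
image_based_passes = len(images)          # every distinct image is propagated once
```

Since the number of possible triplets grows much faster than the number of images, the gap between the two counts widens as the batch gets denser.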

The objective function can also be seen as follows:

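$$f = f(F_W(I_1), F_W(I_2), \ldots, F_W(I_m))$$

$$f = f(X_1^l, X_2^l, \ldots, X_m^l)$$

where $m$ is the number of distinct images in the triplets and $X_k^l$ denotes the feature representation of image $I_k$ at layer $l$. The chain rule gives the following equations: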

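$$\frac{\partial f}{\partial W^l} = \sum_{k=1}^{m} \frac{\partial f}{\partial X_k^l} \frac{\partial X_k^l}{\partial W^l}$$

$$\frac{\partial f}{\partial X_k^l} = \frac{\partial f}{\partial X_k^{l+1}} \frac{\partial X_k^{l+1}}{\partial X_k^l}$$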

It is easy to obtain the derivative with respect to the output of each image:

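$$\frac{\partial f}{\partial F_W(I_k)} = \sum_{i=1}^{n} \frac{\partial \max\left(C,\ \|F_W(O_i^1) - F_W(O_i^2)\|^2 - \|F_W(O_i^1) - F_W(O_i^3)\|^2\right)}{\partial F_W(I_k)}$$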
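The per-image derivative can be accumulated in a single sweep over the triplets, after which each image would need only one backward pass. The following is a minimal sketch under assumptions: features are plain Python lists, the names are hypothetical, `C=-1.0` is an assumed default, and the subgradient at the clamp boundary is taken to be zero. For an unclamped triplet with anchor $a$, match $p$, and mismatch $n$, differentiating the squared distances gives $2(n - p)$, $2(p - a)$, and $2(a - n)$ respectively.

```python
def sq_dist(u, v):
    """Squared L2 distance between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def feature_gradients(features, triplets, C=-1.0):
    """Accumulate the derivative of the objective w.r.t. each image's feature.

    For an active triplet (difference d > C) with anchor a, match p, mismatch n:
        grad_a = 2 (n - p),  grad_p = 2 (p - a),  grad_n = 2 (a - n)
    Clamped triplets (d <= C) contribute nothing (zero subgradient assumed).
    """
    grads = {k: [0.0] * len(v) for k, v in features.items()}
    for a, p, n in triplets:
        fa, fp, fn = features[a], features[p], features[n]
        d = sq_dist(fa, fp) - sq_dist(fa, fn)
        if d <= C:
            continue  # easy triplet: the max() selects the constant C
        for j in range(len(fa)):
            grads[a][j] += 2.0 * (fn[j] - fp[j])
            grads[p][j] += 2.0 * (fp[j] - fa[j])
            grads[n][j] += 2.0 * (fa[j] - fn[j])
    return grads


# Toy features, invented for illustration only.
features = {
    "a": [0.0, 0.0],
    "p": [1.0, 0.0],
    "n": [0.0, 1.0],
    "far": [5.0, 5.0],
}
triplets = [("a", "p", "n"), ("a", "p", "far")]
grads = feature_gradients(features, triplets)
```

Here the second triplet is clamped, so only the first one contributes; the image `far` ends up with a zero gradient, while `a` receives a single accumulated gradient even though it appears in both triplets.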
