@CrazyHenry
2018-04-21T18:40:12.000000Z
字数 3065
阅读 1384
hhhhfaiss
- Author:李英民 | Henry
- E-mail: li
_
yingmin@
outlookdot
com- Home: https://liyingmin.wixsite.com/henry
nlist, the number of Voronoi cells
nprobe, the number of cells (out of nlist)
index.train(xb)
The search time roughly increases linearly with the number of probes plus some constant due to the quantization.(nprobe越大,搜索时间线性增加)
The nprobe parameter is always a way of adjusting the tradeoff between speed and accuracy of the result. Setting nprobe = nlist gives the same result as the brute-force search (but slower).
(nprobe参数越大,结果越精确(同时也越慢),当nlist == nprobe时,结果与暴力搜索一样,而且更慢)
那么nlist怎么确定呢?
At search time, only the database vectors y contained in the cell the query x falls in and a few neighboring ones are compared against the query vector.(只需要把和query x落到同一个Voronoi cell的database y以及这个y周围的一些y 跟query x做比较)
The IndexIVFFlat also requires another index, the quantizer, that assigns vectors to Voronoi cells(把原始vector assign给Voronoi cells). Each (Voronoi)cell is defined by a centroid(质心), and finding the Voronoi cell a vector falls in(找到被原始vectorfall in 的Voronoi cell) consists in finding the nearest neighbor of the vector in the set of centroids(相当于在set of centroids中做IndexFlatL2的KNN). This is the task of the other index, which is typically an IndexFlatL2.
/*
* Author: yingmin.li
*
* Date: 2018/4/21
*
* Detail: add a timespan part for demo2
*
*/
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <chrono>
#include <faiss/IndexFlat.h>
#include <faiss/IndexIVFFlat.h>
//using std::
using std::endl;
using std::cout;
//note: header file do not use using
int main() {
int d = 256; // dimension
int nb = 100000; // database size
int nq = 10000; // nb of queries
float *xb = new float[d * nb];
float *xq = new float[d * nq];
for(int i = 0; i < nb; i++) {
for(int j = 0; j < d; j++)
xb[d * i + j] = drand48();
xb[d * i] += i / 1000.;
}
for(int i = 0; i < nq; i++) {
for(int j = 0; j < d; j++)
xq[d * i + j] = drand48();
xq[d * i] += i / 1000.;
}
int nlist = 100;
int k = 4;
faiss::IndexFlatL2 quantizer(d); // the other index
faiss::IndexIVFFlat index(&quantizer, d, nlist, faiss::METRIC_L2);
// here we specify METRIC_L2, by default it performs inner-product search,内积cosin
assert(!index.is_trained);
index.train(nb, xb);
assert(index.is_trained);
index.add(nb, xb);
{ // search xq
long *I = new long[k * nq];
float *D = new float[k * nq];
for(int loopi = 1; loopi <= 40; loopi += 10)
{
index.nprobe = loopi;
auto begin_time = std::chrono::steady_clock::now();
for(int i = 0; i < 10; ++i)
{
index.search(nq, xq, k, D, I);//default时,nprobe = 1
}
auto end_time = std::chrono::steady_clock::now();
auto time_span = std::chrono::duration_cast<std::chrono::duration<long,std::ratio<1,1000>>>(end_time - begin_time);
cout << endl << endl << "IVFFlat(use quantization, CPU, nlist == 100):" <<endl
<< "when the nprobe = "<<loopi<<" ,"
<< "avg_time(10 times avg) for:" << endl
<< " -vec dimension d = " << d << endl
<< " -database vec numbers nb = " << nb <<endl
<< " -query vec batchings nq = " << nq << endl
<< " -KNN algorithm's k = " << k << endl
<< " is " << time_span.count() / 10 << " milliseconds" << endl;
}
// printf("I=\n");
// for(int i = nq - 5; i < nq; i++) {
// for(int j = 0; j < k; j++)
// printf("%5ld ", I[i * k + j]);
// printf("\n");
// }
// index.nprobe = 10;
// index.search(nq, xq, k, D, I);
// printf("I=\n");
// for(int i = nq - 5; i < nq; i++) {
// for(int j = 0; j < k; j++)
// printf("%5ld ", I[i * k + j]);
// printf("\n");
// }
delete [] I;
delete [] D;
}
delete [] xb;
delete [] xq;
return 0;
}