@xxh 2015-06-20T02:51:01.000000Z 字数 1352 阅读 329

[Nearest Neighbor]LSH Algorithm

applied in high dimensional spaces; one of the method for approximate nearest neighbor searching
general idea: hash the data points using several hash functions so as to ensure that, for each function, the probability of collision is much higher for points which are close to each other than for those which are far apart. Then, one can determine near neighbors by hashing the query point and retrieving elements stored in buckets containing that point.
key points: $D (p, q) \leq r \to P r o [h (p) = h (q)] \geq p 1$ $D(p,q)≤r\rightarrow Pro[h(p)=h(q)]≥p1$ $D (p, q) > r (1 + ε) \to P r o [h (p) = h (q)] \leq p 2$ $D(p,q)>r(1+ε)\rightarrow Pro[h(p)=h(q)]≤p2$ h:= hash function
algo:
- s1: offline: build up a hash table with hash function, by the searching(training?) result find the # of hash table L and # of hash functions in each table K
  - 希望原本相邻的数据经过LSH hash后，都能够落入到相同的桶内，而不相邻的数据经过LSH hash后，都能够落入到不同的桶中。
- s2: online, for new p0, get the datas in the place: h(p0) and compare their distance
comment: 减少需要匹配的数据点个数的同时保证查找到最近邻的数据点的概率很大(不是一定找得到)。
modification： LSH hash functions (>1 hash table and >1 hash function for each table)