@nrailgun 2015-10-02T18:43:14.000000Z · Word count: 1174 · Reads: 1541

k-NN

Machine Learning


Algorithm

  1. Given a training set $T = \{(x_i, y_i) \mid x_i \in \mathcal{X} \subseteq \mathbb{R}^n,\ y_i \in \mathcal{Y} = \{c_1, c_2, \dots, c_K\}\}$, where $i = 1, 2, \dots, N$, and a distance metric.

  2. Let $N_k(x)$ denote the set of the $k$ nearest neighbors of $x$ (hence "k-NN") under the given distance metric.

  3. Decide the class $y$ of $x$ by majority vote over the points in $N_k(x)$:

     $$y = \arg\max_{c_j} \sum_{x_i \in N_k(x)} I(y_i = c_j),$$

     where $i = 1, 2, \dots, N$, $j = 1, 2, \dots, K$, and $I$ denotes the indicator function.
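The steps above can be sketched as a brute-force classifier. This is a minimal illustration assuming Euclidean distance; the names `euclidean` and `knn_predict` are mine, not from the post.

```python
import math
from collections import Counter

def euclidean(a, b):
    # The distance metric: plain Euclidean distance between two points.
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def knn_predict(train, x, k):
    # train: list of (x_i, y_i) pairs; x: query point; k: neighbor count.
    # Step 2: find N_k(x), the k points nearest to x under the metric.
    neighbors = sorted(train, key=lambda pair: euclidean(pair[0], x))[:k]
    # Step 3: majority vote, y = argmax_{c_j} sum I(y_i = c_j).
    votes = Counter(y for _, y in neighbors)
    return votes.most_common(1)[0][0]
```

Sorting all of $T$ per query is $O(N \log N)$; the kd-tree below exists precisely to avoid this scan.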


Model

Selecting a smaller k reduces the approximation error, but makes the prediction easily affected by noisy nearby points, i.e., it increases the estimation error. Setting k=N is pointless: the classifier then always predicts the overall majority class, ignoring the query entirely. In practice a relatively small k is usually a good choice.
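The k=N degenerate case can be checked with a toy sketch (the data and the name `knn` are illustrative, not from the post):

```python
import math
from collections import Counter

# Toy 1-D training set: three 'a' points near 0, two 'b' points near 5.
train = [((0.0,), 'a'), ((0.2,), 'a'), ((0.4,), 'a'), ((5.0,), 'b'), ((5.2,), 'b')]

def knn(x, k):
    # Vote among the k nearest training points.
    nb = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    return Counter(y for _, y in nb).most_common(1)[0][0]

# k = 1 respects local structure; k = N ignores the query entirely.
print(knn((5.1,), 1))           # 'b': the single nearest point decides
print(knn((5.1,), len(train)))  # 'a': the global majority, regardless of x
```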


kd tree

Linear scanning works, but it is too slow for large training sets. We need a kd-tree. Given $T = \{x_1, x_2, \dots, x_N\}$, where $x_i = (x_i^{(1)}, x_i^{(2)}, \dots, x_i^{(k)})^T$, $i = 1, 2, \dots, N$.

  1. For a node at depth $j$, choose the splitting dimension $l = (j \bmod k) + 1$ and partition the $x_i$ in the node's region into two sub-regions at the median of their $x_i^{(l)}$ values.
  2. Put the $x_i$ whose $x_i^{(l)}$ is less than the median into the left subtree, the others into the right subtree. Repeat step 1 until each region contains a single $x_i$.
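The construction above can be sketched as follows, assuming the splitting dimension cycles with depth (a common convention); `Node` and `build_kdtree` are my names, not from the post.

```python
class Node:
    def __init__(self, point, left=None, right=None):
        self.point = point   # the median point stored at this node
        self.left = left     # subtree with x^(l) less than the median
        self.right = right   # subtree with x^(l) at least the median

def build_kdtree(points, depth=0):
    # points: list of equal-length tuples; returns the root Node, or None.
    if not points:
        return None
    k = len(points[0])
    l = depth % k                          # splitting dimension at this depth
    points = sorted(points, key=lambda p: p[l])
    mid = len(points) // 2                 # median index along dimension l
    return Node(points[mid],
                build_kdtree(points[:mid], depth + 1),
                build_kdtree(points[mid + 1:], depth + 1))
```

Sorting at every level makes this $O(N \log^2 N)$; a linear-time median selection would bring it down to $O(N \log N)$.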

kd tree search

TODO: The algorithm is not hard to understand, but I don't know how to implement it. I'll deal with it later.
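Since the search is left as a TODO above, here is one possible sketch (my own, not the author's eventual implementation) of the standard nearest-neighbor search: descend toward the query, then backtrack, visiting the far subtree only when the splitting plane is closer than the best point found so far.

```python
import math

class Node:
    def __init__(self, point, left=None, right=None):
        self.point, self.left, self.right = point, left, right

def build(points, depth=0):
    # Same construction as before: split at the median, cycling dimensions.
    if not points:
        return None
    l = depth % len(points[0])
    points = sorted(points, key=lambda p: p[l])
    mid = len(points) // 2
    return Node(points[mid], build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))

def nearest(node, target, depth=0, best=None):
    # Returns the training point closest to `target` in the subtree.
    if node is None:
        return best
    if best is None or math.dist(node.point, target) < math.dist(best, target):
        best = node.point
    l = depth % len(target)
    # Descend into the subtree on the query's side of the splitting plane.
    near, far = ((node.left, node.right)
                 if target[l] < node.point[l] else (node.right, node.left))
    best = nearest(near, target, depth + 1, best)
    # Backtrack: the far side can only help if the plane is closer than best.
    if abs(target[l] - node.point[l]) < math.dist(best, target):
        best = nearest(far, target, depth + 1, best)
    return best
```

Extending this to the k nearest neighbors replaces `best` with a bounded max-heap of the k closest points seen so far.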
