[关闭]
@nrailgun 2015-10-10T21:23:07.000000Z 字数 1383 阅读 1795

PageRank

机器学习


Graph Data: Media Graph

Web as a directed graph:

PageRank

Define a rank rj for page j

rj=ijridi

Let M be a n×n matrix. If ij, Mij=1di, else Mij=0, and let ri of vector r denote rank of page i. The flow equations can be written as:

r=Mr

The Google Formulation

2 Problems

Teleports

With probability 1β jump to some random pages. This solves Spider Trap problem. For those Dead Ends, always teleports! Thus, the rank for page j is

rj=ijβridi+(1β)1N

The corresponding vectorized form is
r=(βM+(1β)[1N]N×N)r

Practical Implementation

M is too large to hold in memory.

Sparse matrix

We rearrange the pagerank equation

r=βMr+[1βN]N

Since M is a sparse matrix, there are many tricks for it. In practice, we might need to replace [1βN]N with [1jrjN]N, since we have dead ends.

Assume enough RAM to fit rnew into memory, and store rold and M on disk. 1 step of power iteration is:

  1. Initialize rnew=1βN;
  2. For each page i
    1. Read encoded sparse matrix M(1,:) into memory;
    2. For j=1di, update rnew.
添加新批注
在作者公开此批注前,只有你和作者可见。
回复批注