机器学习
Web as a directed graph:
- Nodes: Web pages
 
- Edges: Hyperlinks
 
Define a rank rj for page j 
rj=∑i→jridi
Let M be a n×n matrix. If i→j, Mij=1di, else Mij=0, and let ri of vector r denote rank of page i. The flow equations can be written as: 
r=M⋅r
2 Problems
- Dead End (some pages have no out-links)
 
- Spider Trap (all out-links are within a group)
 
Teleports
With probability 1−β jump to some random pages. This solves Spider Trap problem. For those Dead Ends, always teleports! Thus, the rank for page j is 
rj=∑i→jβridi+(1−β)1N
 
The corresponding vectorized form is 
r=(βM+(1−β)[1N]N×N)⋅r
Practical Implementation
M is too large to hold in memory.
Sparse matrix
We rearrange the pagerank equation 
r=βM⋅r+[1−βN]N
Since M is a sparse matrix, there are many tricks for it. In practice, we might need to replace [1−βN]N with [1−∑jrjN]N, since we have dead ends.
Assume enough RAM to fit rnew into memory, and store rold and M on disk. 1 step of power iteration is:
- Initialize rnew=1−βN;
 
- For each page i 
- Read encoded sparse matrix M(1,:) into memory;
 
- For j=1…di, update rnew.