@nrailgun 2015-10-10T13:23:07.000000Z

PageRank

Machine Learning

Graph Data: Media Graph

Web as a directed graph:

Nodes: Web pages
Edges: Hyperlinks

PageRank

Define a rank $r_j$ for page $j$, where $d_i$ is the number of out-links of page $i$ and the sum runs over all pages $i$ that link to $j$:

$r_j = \sum_{i \rightarrow j} \frac{r_i}{d_i}$

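Let $M$ be an $N \times N$ matrix, where $N$ is the number of pages. If $i \rightarrow j$, $M_{ji} = \frac{1}{d_i}$; otherwise $M_{ji} = 0$. Each column of $M$ that belongs to a page with out-links therefore sums to 1. Let component $r_i$ of the vector $r$ denote the rank of page $i$. The flow equations can then be written as: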

$r = M \centerdot r$
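The flow equation can be solved by repeatedly applying $M$ to a starting vector (power iteration). The following is a minimal sketch in plain Python, not a reference implementation. The three-page graph, the function names `build_matrix` and `power_iterate`, and the tolerance are illustrative assumptions; the matrix is stored so that `M[j][i]` is $\frac{1}{d_i}$ for a link $i \rightarrow j$, which makes $M \centerdot r$ reproduce the sum over in-links.

```python
def build_matrix(links, n):
    """Return M as nested lists, with M[j][i] = 1/d_i for each link i -> j."""
    M = [[0.0] * n for _ in range(n)]
    for i, dests in links.items():
        for j in dests:
            M[j][i] = 1.0 / len(dests)
    return M


def power_iterate(M, tol=1e-10, max_iter=1000):
    """Iterate r <- M r from the uniform vector until the L1 change is below tol."""
    n = len(M)
    r = [1.0 / n] * n
    for _ in range(max_iter):
        r_next = [sum(M[j][i] * r[i] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(r_next, r)) < tol:
            return r_next
        r = r_next
    return r


# Hypothetical graph: page 0 links to {0, 1}, page 1 to {0, 2}, page 2 to {1}.
links = {0: [0, 1], 1: [0, 2], 2: [1]}
r = power_iterate(build_matrix(links, 3))
print([round(x, 3) for x in r])  # approximately [0.4, 0.4, 0.2]
```

Solving the three flow equations by hand together with $\sum_j r_j = 1$ gives $r = (\frac{2}{5}, \frac{2}{5}, \frac{1}{5})$, which is what the iteration approaches.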

The Google Formulation

2 Problems

Dead End (some pages have no out-links)
Spider Trap (all out-links are within a group)
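Both problems are easy to reproduce on a toy graph. The sketch below uses assumed three-page graphs and hypothetical helper names (`step`, `iterate`), and runs the plain flow equation with no teleports to show what goes wrong in each case.

```python
def step(links, r):
    """One application of r <- M r, using adjacency lists (page i -> list of j)."""
    r_next = [0.0] * len(r)
    for i, dests in links.items():
        for j in dests:
            r_next[j] += r[i] / len(dests)
    return r_next


def iterate(links, n, steps=200):
    """Apply the flow equation a fixed number of times from the uniform vector."""
    r = [1.0 / n] * n
    for _ in range(steps):
        r = step(links, r)
    return r


dead_end = {0: [0, 1], 1: [0, 2], 2: []}      # page 2 has no out-links
spider_trap = {0: [0, 1], 1: [0, 2], 2: [2]}  # page 2 links only to itself

leaked = iterate(dead_end, 3)
trapped = iterate(spider_trap, 3)
print(sum(leaked))  # total rank drains toward 0
print(trapped)      # nearly all rank ends up on page 2
```

With the dead end, rank that reaches page 2 is never passed on, so the total shrinks every step. With the spider trap, the total stays at 1 but page 2 absorbs all of it.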

Teleports

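At each step, the random surfer follows an out-link with probability $\beta$ and, with probability $1 - \beta$, jumps to a page chosen uniformly at random. This solves the Spider Trap problem. From a Dead End, the surfer always teleports. Thus, the rank of page $j$ is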

$r_j = \sum_{i \rightarrow j} \beta \frac{r_i}{d_i} + (1 - \beta) \frac{1}{N}$

The corresponding vectorized form is

$r = (\beta M + (1 - \beta) \left[ \frac{1}{N} \right]_{N \times N}) \centerdot r$
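As a rough illustration of the vectorized form, the sketch below builds the dense matrix $\beta M + (1 - \beta) \left[ \frac{1}{N} \right]_{N \times N}$ for a small spider-trap graph and runs power iteration on it. The value $\beta = 0.8$, the graph, and the names `google_matrix` and `pagerank_dense` are assumptions for the example, and the code assumes every page has at least one out-link.

```python
def google_matrix(links, n, beta):
    """Dense A = beta * M + (1 - beta) * [1/N], with M[j][i] = 1/d_i for i -> j."""
    A = [[(1.0 - beta) / n] * n for _ in range(n)]
    for i, dests in links.items():
        for j in dests:
            A[j][i] += beta / len(dests)
    return A


def pagerank_dense(links, n, beta=0.8, tol=1e-12, max_iter=1000):
    """Power iteration r <- A r, starting from the uniform vector."""
    A = google_matrix(links, n, beta)
    r = [1.0 / n] * n
    for _ in range(max_iter):
        r_next = [sum(A[j][i] * r[i] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(r_next, r)) < tol:
            return r_next
        r = r_next
    return r


# Page 2 links only to itself, so it is a spider trap.
spider_trap = {0: [0, 1], 1: [0, 2], 2: [2]}
r = pagerank_dense(spider_trap, 3)
print([round(x, 4) for x in r])
```

Solving the three teleport equations by hand for this graph gives $r = (\frac{7}{33}, \frac{5}{33}, \frac{21}{33})$: the trap still ranks highest, but it no longer absorbs everything.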

Practical Implementation

Stored as a dense $N \times N$ matrix, $M$ is too large to hold in memory.

Sparse matrix

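We rearrange the PageRank equation so that only the sparse matrix $M$ is multiplied by $r$, where $\left[x\right]_{N}$ denotes a vector of length $N$ whose entries all equal $x$:

$r = \beta M \centerdot r + \left[\frac{1-\beta}{N}\right]_{N}$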
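The rearrangement relies on the ranks summing to one. A short derivation of the teleport term for component $j$, assuming $\sum_{i} r_i = 1$ (which holds when there are no dead ends):

```latex
\left( (1-\beta) \left[ \frac{1}{N} \right]_{N \times N} \centerdot r \right)_j
  = (1-\beta) \sum_{i=1}^{N} \frac{1}{N} \, r_i
  = \frac{1-\beta}{N} \sum_{i=1}^{N} r_i
  = \frac{1-\beta}{N}
```

Every component gets the same constant, so the dense $N \times N$ teleport matrix never has to be formed.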

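Since $M$ is sparse, it can be stored and multiplied using only its nonzero entries. In practice, because of dead ends, we might need to replace $\left[\frac{1-\beta}{N}\right]_{N}$ with $\left[\frac{1-\sum_jr_j}{N}\right]_{N}$, where the sum is taken over the entries of $\beta M \centerdot r$. The rank that leaks out through dead ends is then spread evenly over all pages, and $r$ still sums to 1.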

Assume there is enough RAM to hold $r^{new}$, while $r^{old}$ and $M$ are stored on disk. One step of power iteration is:

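Initialize every entry of $r^{new}$ to $\frac{1 - \beta}{N}$;
For each page $i$:
1. Read the encoded sparse row $M(i, :)$ (the out-degree $d_i$ and the pages that $i$ links to) into memory;
2. For $j = 1 \dots d_i$, add $\beta \frac{r^{old}(i)}{d_i}$ to the entry of $r^{new}$ for the $j$-th page that $i$ links to.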
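The steps above can be sketched as follows. This is a hedged illustration rather than a definitive implementation: the record layout `(i, d_i, destinations)`, the names `power_step` and `pagerank_sparse`, and $\beta = 0.8$ are assumptions, and an in-memory list stands in for the file on disk. To cover dead ends as well, the sketch starts $r^{new}$ at zero and adds the leaked rank at the end of the step, which gives the same result as the $\frac{1 - \beta}{N}$ initialization when no page is a dead end.

```python
def power_step(records, r_old, beta=0.8):
    """One power-iteration step over sparse records (i, d_i, destinations)."""
    n = len(r_old)
    r_new = [0.0] * n
    for i, d_i, dests in records:  # one page's record at a time
        for j in dests:
            r_new[j] += beta * r_old[i] / d_i
    # Spread the rank lost to teleports and dead ends evenly over all pages.
    leaked = 1.0 - sum(r_new)
    return [x + leaked / n for x in r_new]


def pagerank_sparse(records, n, beta=0.8, tol=1e-12, max_iter=1000):
    """Repeat power_step until the L1 change is below tol.

    `records` must be re-iterable (e.g. a list), since it is scanned each step.
    """
    r = [1.0 / n] * n
    for _ in range(max_iter):
        r_next = power_step(records, r, beta)
        if sum(abs(a - b) for a, b in zip(r_next, r)) < tol:
            return r_next
        r = r_next
    return r


# Page 2 is a dead end, so it has no record at all.
records = [(0, 2, [0, 1]), (1, 2, [0, 2])]
r = pagerank_sparse(records, 3)
print([round(x, 4) for x in r], round(sum(r), 6))
```

Only $r^{new}$ and one record need to be resident at a time, and the ranks keep summing to 1 even though page 2 never passes its rank along a link.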
