Page rank algorithm

programming

This algorithm is used to calculate the “importance” of a page on the internet, or the page rank. This a Probability distribution on the Webgraph, $\pi$.

To approximate this we use a Markov chain and approximate its stationary distribution.

Pseudocode

Build a Markov chain $P$ on the vertices of the Webgraph $G = (V, E)$. This will have

$$p_{y,x} = \begin{cases} \frac{\alpha}{\vert Out(y) \vert} + \frac{1 - \alpha}{\vert V \vert} & \mbox{if } (y,x) \in E\\ \frac{1 - \alpha}{\vert V \vert} & \mbox{if } \vert Out(y) \vert > 0\\ \frac{1}{\vert V \vert} & \mbox{otherwise.}\end{cases}$$

for some constant $\alpha \in [0,1]$ with $Out(x) = \{y \in V \vert (x,y) \in E\}$ as in the definition for page rank.

We then approximate its stationary distribution

$$\mu_0 P^t$$

for some large $t$.

There are some tricks that will help with this computation (note the Webgraph isn’t static)

  • if $\alpha$ is sufficiently small, then $t$ doesn’t have to be too large,
  • if we use an old approximation of $\pi$ for $\mu_0$ it will converge faster, and
  • matrix multiplication doesn’t need to take $O(N^2)$ instead we can get it on the order of $O(\vert E \vert)$.