Computing PageRank

Revised PageRank (1) is based on the main formula

R = (C A + E × 1) R

where A is the modified adjacency matrix, E is a "source vector", and 1 is a row vector of all 1's.

Once approach to approximate the principal eigenvector would be to use the Power Method described earlier in this talk. In reference 2, however, Lawrence Page et al propose the following variation that overlooks issues of scale. (This presentation changes the notation somewhat to have more of a programming style.)

   Let S be almost any vector over Web pages (for example E)

   R = S

        R' = AR
        d = ||R|| - ||R'||
        R' = R' + dE
        δ = ||R' - R||
   Until (δ < ε)

   Return R' as eigenvector approximation

In this modified algorithm, we continue to compute powers of the A matrix, but we also add part of the source vector at each iteration.

Reference 2 (from 1998) indicates that "on a large 22 million link database converges to a reasonable tolerance in roughly 52 iterations. The convergence on half the data takes roughly 45 iterations."

created: 10 February 2007
last revised: 15 February 2007
previous  next Valid HTML 4.01!