Application of Eigenvalues and Eigenvectors: Google Rank
Eigenvalues and Eigenvectors: intuitive idea
Eigenvectors are the vectors that keep their direction after being multiplied by a matrix. Although the direction is preserved, their length and orientation may change; the factor by which they are scaled is the associated eigenvalue.
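As a minimal numerical illustration (using a hypothetical 2x2 matrix B, not the matrix A studied below), we can check in Matlab that multiplying an eigenvector by the matrix only rescales it:
B = [2 1; 1 2];                 % hypothetical example matrix
[V, D] = eig(B);                % columns of V: eigenvectors; diagonal of D: eigenvalues
v = V(:,1); lambda = D(1,1);    % first eigenvector and its eigenvalue
B*v - lambda*v                  % practically the zero vector: B*v keeps the direction of v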
Example in 2D
Consider the matrix
To visualize the eigenvectors and eigenvalues, we plot in blue the original vector and in green its image under A. To see this, we can take unit vectors v (in blue) with the arrowhead on the circle of radius 1 and paint their transforms by the matrix (in green). As we will see, the original vectors move on a circle and the transformed ones move on an ellipse. Note that when the two directions are aligned we have found an eigendirection. What we can observe is that:
- If v and Av have the same orientation, the associated eigenvalue λ is positive.
- If v and Av have opposite orientations, the associated eigenvalue λ is negative.
- If ‖Av‖ > ‖v‖, the associated eigenvalue satisfies |λ| > 1.
- If ‖Av‖ < ‖v‖, the associated eigenvalue satisfies |λ| < 1.
h = 2*pi/500;
ang = 0:h:2*pi-h;              % we take 500 angles
v = [cos(ang); sin(ang)];      % all initial vectors of norm 1
vt = A*v;                      % all vectors transformed by A
% we paint each vector and its transform
for k = 1:2                    % we do two laps
  for i = 1:length(ang)
    clf; hold on; axis equal;
    plot(v(1,:), v(2,:), 'b');    % the circle
    plot(vt(1,:), vt(2,:), 'g');  % the ellipse
    plot([0, v(1,i)], [0, v(2,i)], 'b');
    plot([0, vt(1,i)], [0, vt(2,i)], 'g');
    plot(v(1,i), v(2,i), 'marker','o','markersize',10,'markerfacecolor','blue');
    plot(vt(1,i), vt(2,i), 'marker','o','markersize',10,'markerfacecolor','green');
    drawnow;
  end
end
The power method: the eigenvalue of maximum modulus
The simplest way to calculate an eigenvector is the power method, which consists in applying the iteration
x_{k+1} = A·x_k,
where A is a square matrix and x_k is normalized at every step. This method provides the eigenvector corresponding to the eigenvalue of maximum (dominant) modulus. Convergence is guaranteed if there exists an eigenvalue strictly larger in modulus than the rest, |λ1| > |λ2| ≥ … ≥ |λn|. The eigenvalue is then computed from the ratio between the coordinates of a vector and those of its transform by A, λ ≈ (A·x)_i / x_i.
To see how it works, we compare it on a simple example with what Matlab's eig gives us.
[veps, vaps]=eig(A)
veps =
   -0.7071   -0.6000
    0.7071   -0.8000
x = [1;1];              % initial vector
xant = x/norm(x);       % normalization
tol = 1.e-6;            % tolerance value
variation = 1000;       % big enough value
nIter = 0;              % iteration counter
while (variation > tol && nIter < 100)   % maximum 100 iterations
    xnou = A*xant;
    xnou = xnou/norm(xnou);               % normalize the new vector
    variation = max(abs(xnou-xant));      % sup norm (equivalent: norm(xnou-xant,inf))
    % we update to move on to the next iteration
    xant = xnou;
    nIter = nIter + 1;
end
eigMax = max((A*xnou)./xnou); % it should give practically the same value for all the components
fprintf('nIter = %d , vap max = %f, vep=[%.4e, %.4e] \n',nIter,eigMax,xnou); % note the change of sign of the vector with respect to eig
nIter = 15 , vap max = 1.250000, vep=[6.0000e-01, 8.0000e-01]
Application of the calculation of Eigenvalues and Eigenvectors: Google Rank
The PageRank algorithm was invented by Page and Brin (1998) and was used in the prototype of the Google search engine.
The objective is to estimate the popularity, or importance, of a page based on the interconnection of the web. The ideas behind it are:
- A page with more incoming links is more important than a page with fewer.
- A page that receives a link from a page known to be very important is itself more important.
The ranking that Google produces, showing the most relevant pages first when we make a search, is based on the study of the connectivity matrix between the different pages. This connectivity is quantified from the links between the pages. Thus, for example, we can have a set of pages related to each other by the graph of the figure (where each arrow means a link from one page to another).
We number the nodes of the graph as A = 1, B = 2, C = 3, D = 4. Each connection is represented as a coefficient of the connectivity matrix, in which each row records which nodes link to the node of that row: rows correspond to arrows that enter a node and columns to arrows that leave it.
C1 = [0,0,1,1;
      1,0,0,0;
      1,0,0,1;
      1,1,1,0]
C1 =
     0     0     1     1
     1     0     0     0
     1     0     0     1
     1     1     1     0
Standardization:
In this matrix each column indicates the arrows that 'leave' each node, so we normalize by columns to represent, in a normalized way, the weight of the connectivity of each node. Technically the resulting matrix is said to be column-stochastic: the sum of each of its columns is 1.
for i = 1:size(C1,2), A1(:,i) = C1(:,i)/sum(C1(:,i)); end   % normalize each column
A1
0 0 0.5000 0.5000
0.3333 0 0 0
0.3333 0 0 0.5000
0.3333 1.0000 0.5000 0
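A quick check of this property (the columns of a column-stochastic matrix must sum to 1):
sum(A1)    % each entry of the answer must be 1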
Iterative algorithm:
Essentially it consists of applying the matrix A successively to an initial vector, which is normally the vector of ones. In fact, this is the power method we have seen before. Here the eigenvector associated with the dominant eigenvalue has a fundamental interpretation at the component level, since the largest component corresponds to the most important node of the graph.
x = ones(4,1); x = x/norm(x);          % initial vector, normalized
xnou = A1*x; nIter = 0; tol = 1.e-6;
variation = max(abs(xnou-x));
X = [nIter, x'];                       % we store the iterations in the rows of X
while (nIter < 100 && variation > tol)
    x = xnou; xnou = A1*x; nIter = nIter + 1;
    variation = max(abs(xnou-x));
    X = [X; nIter, x'];
end
X
0 0.5000 0.5000 0.5000 0.5000
1.0000 0.5000 0.1667 0.4167 0.9167
2.0000 0.6667 0.1667 0.6250 0.5417
3.0000 0.5833 0.2222 0.4931 0.7014
4.0000 0.5972 0.1944 0.5451 0.6632
5.0000 0.6042 0.1991 0.5307 0.6661
6.0000 0.5984 0.2014 0.5344 0.6658
7.0000 0.6001 0.1995 0.5324 0.6681
8.0000 0.6002 0.2000 0.5341 0.6657
9.0000 0.5999 0.2001 0.5329 0.6671
[val,highestRank]=max(X(end,2:end)); %maximum value of the last iteration
fprintf('value =%f, number of page = %d \n',val, highestRank);
value =0.666484, number of page = 4
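As a cross-check, the same ranking can be obtained directly from the eigenvector of A1 associated with the eigenvalue 1; a minimal sketch (the variable names V, D, v are our own):
[V, D] = eig(A1);
[~, k] = max(abs(diag(D)));        % position of the eigenvalue of modulus 1
v = abs(V(:,k)); v = v/norm(v);    % associated eigenvector with non-negative entries
[val, highestRank] = max(v)        % node 4 comes out again as the most important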
A special case: one page does not cite any other (dangling node)
We now consider the case of the figure, in which node D has no outgoing arrow (it does not cite any page). This case does not allow the direct application of the algorithm, and we will have to modify the connectivity matrix.
For this case the connectivity matrix is:
C2 = [0,0,1,0;
      1,0,0,0;
      1,0,0,0;
      1,1,1,0]
% standardized by columns it would be:
for i = 1:size(C2,2), A2(:,i) = C2(:,i)/sum(C2(:,i)); end  % column 4 sums to 0, giving NaN
A2
0 0 0.5000 NaN
0.3333 0 0 NaN
0.3333 0 0 NaN
0.3333 1.0000 0.5000 NaN
This matrix cannot be used to iterate. Even if we replaced the NaN column by a column of zeros, with which the iterations could be carried out, the convergence would then be towards zero (check the calculation).
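To check the claim in parentheses, a minimal sketch (we replace the NaN column by zeros and iterate a fixed number of times):
A2z = A2; A2z(:,4) = 0;        % dangling column kept as zeros instead of NaN
x = ones(4,1)/2;               % same normalized initial vector as before
for k = 1:50, x = A2z*x; end
x                              % practically the zero vector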
To fix the problem, we substitute the column of NaN by a uniformly distributed value 1/N, where N is the number of nodes.
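A minimal sketch of this substitution (the expression isnan(sum(A2)) simply locates the columns of NaN):
N = size(A2,2);                      % number of nodes
A2(:, isnan(sum(A2))) = 1/N;         % replace every NaN column by the uniform value 1/N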
A2
0 0 0.5000 0.2500
0.3333 0 0 0.2500
0.3333 0 0 0.2500
0.3333 1.0000 0.5000 0.2500
Now we can do the iterations:
x = ones(4,1); x = x/norm(x);          % initial vector, normalized
xnou = A2*x; nIter = 0; tol = 1.e-6;
variation = max(abs(xnou-x));
X = [nIter, x'];                       % we store the iterations in the rows of X
while (nIter < 100 && variation > tol)
    x = xnou; xnou = A2*x; nIter = nIter + 1;
    variation = max(abs(xnou-x));
    X = [X; nIter, x'];
end
X
0 0.5000 0.5000 0.5000 0.5000
1.0000 0.3750 0.2917 0.2917 1.0417
2.0000 0.4062 0.3854 0.3854 0.8229
3.0000 0.3984 0.3411 0.3411 0.9193
4.0000 0.4004 0.3626 0.3626 0.8743
5.0000 0.3999 0.3521 0.3521 0.8960
6.0000 0.4000 0.3573 0.3573 0.8854
7.0000 0.4000 0.3547 0.3547 0.8906
8.0000 0.4000 0.3560 0.3560 0.8880
9.0000 0.4000 0.3553 0.3553 0.8893
[val,highestRank]=max(X(end,2:end)); %maximum value of the last iteration
fprintf('value =%f, number of page = %d \n',val, highestRank);
value =0.888672, number of page = 4
Separable networks
A different problem arises when the connections are defined so that there are blocks in the network from which it is not possible to leave (following only the links).
That is what happens with the block formed by E, F, G, H in the network of the figure.
% C3 is the connectivity matrix of the 8-node graph in the figure; we normalize it by columns
for i = 1:size(C3,2), A3(:,i) = C3(:,i)/sum(C3(:,i)); end
Irreducible graph: the magic number of Google
The previous case is that of a reducible graph. In fact, a graph is said to be irreducible if, starting from any node and following the arrows, we can reach any other node, and vice versa.
If we iterate with a reducible graph, the iterates concentrate on the absorbing block and the other components tend to zero. To fix the problem we modify the normalized matrix with the formula
M = d·A + ((1-d)/N)·ones(N,N),
where d is a constant (the damping factor), ones(N,N) is the N×N matrix of all ones, and N is the number of columns of A. Google used the damping value d = 0.85.
d = 0.85; N = size(A3,2);              % damping factor and number of nodes
M = d*A3 + ((1-d)/N)*ones(size(A3));
% Now we can do the iterations as in the previous cases:
x = ones(N,1); x = x/norm(x);          % initial vector, normalized
xnou = M*x; nIter = 0; tol = 1.e-6; X = [nIter, x'];
variation = max(abs(xnou-x));
while (nIter < 100 && variation > tol)
    x = xnou; xnou = M*x; nIter = nIter + 1;
    variation = max(abs(xnou-x)); X = [X; nIter, x'];
end
[val,highestRank]=max(X(end,2:end)); %maximum component of the last iteration
fprintf('value =%f, number of page = %d \n',val, highestRank);
value =0.802451, number of page = 6
Associated eigenvalues
The convergence of the iterations is linked to the eigenvalues of the iteration matrix. By construction, the matrices we have used are column-stochastic (the sum of each column is 1) and all their elements are non-negative. The Perron-Frobenius theorem guarantees that these matrices have an eigenvalue of modulus 1 and that, when the matrix is in addition positive (as M is), all the other eigenvalues have modulus strictly lower than 1.
We see this in our matrices:
% Modulus of eigenvalues for A1
abs(eig(A1)) % abs gives the modulus if the eigenvalue is complex
1.0000
0.4082
0.4082
0.5000
% Modulus of eigenvalues for A2
abs(eig(A2))
1.0000
0.5000
0.2500
0.0000
% Modulus of eigenvalues for M
abs(eig(M))
1.0000
0.7361
0.7361
0.4250
0.3470
0.4250
0.3470
0.0000
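As an illustration of how these moduli relate to the iterations (a standard power-method estimate, not stated explicitly above): the error is reduced at each step roughly by a factor equal to the second largest modulus.
vapsA1 = sort(abs(eig(A1)), 'descend');   % eigenvalue moduli of A1, in decreasing order
vapsM  = sort(abs(eig(M)),  'descend');   % idem for the damped matrix M
fprintf('approximate convergence factors: A1 -> %.4f, M -> %.4f \n', vapsA1(2), vapsM(2));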
N large: Generation of a random graph
We can generate graphs of larger size at random (if we do not have real data) from a matrix C filled with 0's and 1's.
N = 100;                % size of the random graph (an example value)
C4 = round(rand(N));    % we fill the matrix with 0's and 1's
for i = 1:size(C4,2), A4(:,i) = C4(:,i)/sum(C4(:,i)); end  % normalize each column
% we check whether there is any column of NaN (dangling node) and, if so, fix it as before
A4(:, isnan(sum(A4))) = 1/N;
% we add the damping term to avoid the problem of reducible graphs
M4=d*A4+((1-d)/N)*ones(size(A4));
% Now we can do the iterations as in the previous cases:
x = ones(N,1)/sqrt(N); xnou = M4*x;    % normalized initial vector and first iterate
nIter = 0; tol = 1.e-6; X = [nIter, x']; variation = max(abs(xnou-x));
while (nIter < 100 && variation > tol)
    x = xnou; xnou = M4*x; nIter = nIter + 1;
    variation = max(abs(xnou-x)); X = [X; nIter, x'];
end
[val,highestRank]=max(X(end,2:end)); %maximum value of the last iteration
fprintf('value =%f, number of page = %d \n',val, highestRank);
value =0.194661, number of page = 2
% Modulus of the eigenvalues of M4 (first 8):
modVaps = abs(eig(M4));   % modulus of all the eigenvalues
modVaps(1:8)              % we show only the first 8
1.0000
0.1450
0.1450
0.1214
0.1214
0.1165
0.1165
0.1193
Exercise 1:
The figure below shows the representation of a sphere (a soccer ball) or, equivalently, of a bidirectional graph with 60 nodes (carbon atoms) and 90 connections, known in chemistry as the fullerene C-60 (Bucky ball), together with its numbering within Matlab.
Its connectivity (adjacency) matrix can be loaded in Matlab by doing, for example, C = bucky;
(a) Apply the GoogleRank method (with a damping factor d and a tolerance tol of your choice) to classify the most important nodes.
(b) Enter a new connection from node 10 to node 50 and apply the same algorithm.
(c) Do the same, but starting from the original model and eliminating the two connections between nodes 7 and 30.
(A starter sketch for loading and modifying the matrix follows below.)
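A starter sketch for loading the matrix and preparing parts (b) and (c); the variable names C, Cb and Cc are our own choice (rows correspond to incoming links, as above):
C = bucky;                               % 60x60 adjacency matrix of the C-60 graph
Cb = C; Cb(50,10) = 1;                   % (b) new link from node 10 to node 50
Cc = C; Cc(30,7) = 0; Cc(7,30) = 0;      % (c) remove the two connections between nodes 7 and 30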