Application of Eigenvalues and Eigenvectors: Google Rank

Table of Contents

Eigenvalues and Eigenvectors: intuitive idea

Eigenvectors are those that maintain the direction after being multiplied by a matrix. Although they maintain the direction, their length and their point may change. This depends on the factor that multiplies the direction and has its eigenvalue.

Exemple en 2D

Consider the matrixto visualize the eigenvectors and eigenvalues, we plot in blue nthe original vector and in green its imatge by A. To see this in , we can take unit vectors v ( in bleu) with the arrow over the circle of radius 1 and we paint their transformed by the product with the matrix (in green). As we will see, the originals move on a circle and the transformed ones move on an ellipse. Note that when the two directions are aligned we are facing a proper direction. What we can observe is that:
A=[1, 3; 4, 2]/4;
h=2*pi/200;
ang=0:h:2*pi-h; %we take 500
v=[cos(ang); sin(ang)]; %all initial vectors of norm 1
vt=A*v; %all vectors transformed by A
% we paint each vector and its transformed
for k=1:2 %we do two laps
for i=1:size(v,2)
plot(v(1,:), v(2,:),'b'); % the circle
hold on;
plot(vt(1,:), vt(2,:),'g'); % the ellipse
axis equal
plot([0,v(1,i)],[0,v(2,i)],'b');
plot([0,vt(1,i)],[0,vt(2,i)],'g');
plot(v(1,i),v(2,i),'marker','o','markersize',10,'markerfacecolor','blue');
plot(vt(1,i),vt(2,i),'marker','o','markersize',10,'markerfacecolor','green');
hold off;
pause(0.01) %in seconds
end
end

The power method: maximum module eigenvalue

The simplest way to calculate an eigenvector is the power method, which consists in applying the iteration:
being A a linear matrix and normalizing x at every step
This method provides us with the eigenvector corresponding to the eigenvalue of the maximum (dominant) module. The convergence is guaranted if there exists an eigenvalue bigger than the rest .
The eigenvalue is computed from the ratio of the coordinates beteween a vector and its transformed by A
.
To see how it works, we will compare it with a simple example with what Matlab gives us.
A=[1, 3; 4, 2]/4;
[veps, vaps]=eig(A)
veps = 2×2
-0.7071 -0.6000 0.7071 -0.8000
vaps = 2×2
-0.5000 0 0 1.2500
x = [1;1]; % initial vector
xant = x/norm(x); % normalization
tol = 1.e-6; % tolerance value
variation = 1000; % big enough value
nIter=0;
while (variation > tol && nIter < 100 ) %maximum 100 iterations
nIter = nIter+1;
xnou = A*xant;
xnou = xnou/norm(xnou);
variation = max(abs(xnou-xant)); %norma subinfinit (equivalent: norm(xnou-xant,inf))
% actualitzem per passar a la propera iteració
xant = xnou;
end
 
eigMax = max((A*xnou)./xnou); %ha de donar pràcticament el mateix per a totes les components
fprintf('nIter = %d , vap max = %f, vep=[%.4e, %.4e] \n',nIter,eigMax,xnou); %noteu el canvi de signe del vector
nIter = 15 , vap max = 1.250000, vep=[6.0000e-01, 8.0000e-01]

Application of the calculation of Eigenvalues and Eigenvectors: Google Rank

Pagerank's algorithm was invented by Page i Brin (1998) and was used in the engine prototype near Google.
The objective is to estimate the popularity, or importance, of a page based on the interconnexion of the web. The idea behind is
The idea of the classification that Google does in the vicinity of the most relevant pages when we make a search, is based on the study of the connectivity matrix between the different pages. This connectivity is quantified from the links between the pages. Thus, for example, we can have a set of pages related to each other by the graph of the figure (where each arrow means a link from one to the other).
If we number the different nodes of the graph as A = 1, B = 2, C = 3, D = 4. Each connection is represented as a coefficient in the connectivity matrix, in which each row is the information that nodes link to that of the current row. Rows mean arrows that enter and columns that leave each node.
C1 = [0,0,1,1;
1,0,0,0;
1,0,0,1;
1,1,1,0]
C1 = 4×4
0 0 1 1 1 0 0 0 1 0 0 1 1 1 1 0

Standarization:

In this matrix each column indicates the arrows that 'leave' each node, therefore we will normalize by columns to represent in a normalized way the weight of the connectivity of each node. Technically it is said that the matrix is stochastic by columns. The matrix obtained satisfies that the sum by columns is 1.
for i=1:size(C1,2)
A1(:,i) = C1(:,i)/sum(C1(:,i));
end
A1
A1 = 4×4
0 0 0.5000 0.5000 0.3333 0 0 0 0.3333 0 0 0.5000 0.3333 1.0000 0.5000 0

Iterator algorithm:

Essentially it consists of applying matrix A succesively to an initial vector that is normally the vector onas. In fact, it is the power method as we have seen before. Here the own vector associated to the dominant own value will have a fundamental interpretation at the component level since the maximum component corresponds to the most important node of the graph.
A=[];
A=A1; %iteration matrix
tol = 1.e-3;
x=ones(size(A,1),1);
x = x/norm(x); %normalize
X=[0,x'];
xnou=A*x;
variation = max(abs(xnou-x));
nIter=1;
X=[X;nIter,xnou'];
while (nIter < 100 && variation > tol)
nIter=nIter+1;
x=xnou;
xnou=A*x;
X=[X; nIter,xnou'];
variation = max(abs(xnou-x));
end
X
X = 11×5
0 0.5000 0.5000 0.5000 0.5000 1.0000 0.5000 0.1667 0.4167 0.9167 2.0000 0.6667 0.1667 0.6250 0.5417 3.0000 0.5833 0.2222 0.4931 0.7014 4.0000 0.5972 0.1944 0.5451 0.6632 5.0000 0.6042 0.1991 0.5307 0.6661 6.0000 0.5984 0.2014 0.5344 0.6658 7.0000 0.6001 0.1995 0.5324 0.6681 8.0000 0.6002 0.2000 0.5341 0.6657 9.0000 0.5999 0.2001 0.5329 0.6671
[val,highestRank]=max(X(end,2:end)); %maximum value of the last iteration
fprintf('value =%f, number of page = %d \n',val, highestRank);
value =0.666484, number of page = 4

A special case: one page does not cite any other (dangling node)

We now consider the case of the figure, in which node D has no exit arrow (does not cite any pages). This case does not allow the application of the algorithm and we will have to modify the connectivity matrix.
For this case the connectivity matrix is:
C2=[0, 0, 1, 0;
1, 0, 0, 0;
1, 0, 0, 0;
1, 1, 1, 0];
% that standarized it would be:
for i=1:size(C2,2)
A2(:,i) = C2(:,i)/sum(C2(:,i));
end
A2
A2 = 4×4
0 0 0.5000 NaN 0.3333 0 0 NaN 0.3333 0 0 NaN 0.3333 1.0000 0.5000 NaN
This matrix could not be used to iterate. Even if we respect the last column of zeros, with which the iterations could be done, then the convergence would be towards zero (do the calculation).
To fix the problem, we substitute the column of NaN with a value divided as it would be 1 / N where N is the number of nodes.
N=size(A2,2);
A2(:,4)=1/N;
A2
A2 = 4×4
0 0 0.5000 0.2500 0.3333 0 0 0.2500 0.3333 0 0 0.2500 0.3333 1.0000 0.5000 0.2500
Now we can do the iterations:
A=[];
A=A2; %iterations matrix
tol = 1.e-3;
x=ones(size(A,1),1);
x = x/norm(x); %normalize
X=[0,x'];
xnou=A*x;
variation = max(abs(xnou-x));
nIter=1;
X=[X;nIter,xnou'];
while (nIter < 100 && variation > tol)
nIter=nIter+1;
x=xnou;
xnou=A*x;
X=[X; nIter,xnou'];
variation = max(abs(xnou-x));
end
X
X = 11×5
0 0.5000 0.5000 0.5000 0.5000 1.0000 0.3750 0.2917 0.2917 1.0417 2.0000 0.4062 0.3854 0.3854 0.8229 3.0000 0.3984 0.3411 0.3411 0.9193 4.0000 0.4004 0.3626 0.3626 0.8743 5.0000 0.3999 0.3521 0.3521 0.8960 6.0000 0.4000 0.3573 0.3573 0.8854 7.0000 0.4000 0.3547 0.3547 0.8906 8.0000 0.4000 0.3560 0.3560 0.8880 9.0000 0.4000 0.3553 0.3553 0.8893
[val,highestRank]=max(X(end,2:end)); %maximum value of the last iteration
fprintf('value =%f, number of page = %d \n',val, highestRank);
value =0.888672, number of page = 4

Separable networks

A different problem is when the connections are defined so that there are blocks in the network from which you could not leave (following only the links)
That's what happens with the blocks E, F, G, H of the network of the figure
C3=[0,0,1,0,0,0,0,0;
1,0,0,1,0,0,0,0;
1,0,0,0,0,0,0,0;
1,1,1,0,0,0,0,0;
0,1,0,0,0,1,0,0;
0,0,0,0,0,0,1,1;
0,0,0,1,1,0,0,1;
0,0,0,0,0,1,0,0;];
for i=1:size(C3,2)
A3(:,i) = C3(:,i)/sum(C3(:,i));
end

Irreducible graph: the magic number of Google

The previous case is that of a reducible graph, in fact a graph is said to be irreducible if we can leave a node and following the arrows, I can reach any other node and vice versa.
If we iterate through a reducible graph, they will tend to zero. To fix the problem you have to modify the normalized matrix with the formula:
where d is a constant (damping factor), and N the number of columns of A.
Google used the value of damping d = 0.85
d= 0.85;
N=size(A3,2);
M=d*A3+((1-d)/N)*ones(size(A3));
%
% Now we can do the iterations as in the previous cases:
%
A=[];
A=M; %iteration matrix
tol = 1.e-3;
x=ones(size(A,1),1);
x = x/norm(x); %normalize
X=[0,x'];
xnou=A*x;
variation = max(abs(xnou-x));
nIter=1;
X=[X;nIter,xnou'];
while (nIter < 100 && variation > tol)
nIter=nIter+1;
x=xnou;
xnou=A*x;
X=[X; nIter,xnou'];
variation = max(abs(xnou-x));
end
[val,highestRank]=max(X(end,2:end)); %maximum component of the last iteration
fprintf('value =%f, number of page = %d \n',val, highestRank);
value =0.802451, number of page = 6

Associated eigenvalues

The convergence of the iterations is linked to the eigenvalues of the iteration matrix. By construction the matrix we have used are called stochastic matrix by columns (the sum by columns is 1) and all their elements are positive. The Perron-Frobenius theorem guarantees that these matrices have their own value of module 1 and the others are strictly lower than 1.
We see this in our matrices:
% Modulus of eigenvalues for A1
abs(eig(A1)) %abs is the module if the vap is complex
ans = 4×1
1.0000 0.4082 0.4082 0.5000
% Modulus of eigenvalues for A2
abs(eig(A2))
ans = 4×1
1.0000 0.5000 0.2500 0.0000
% Modulus of eigenvalues for M
abs(eig(M))
ans = 8×1
1.0000 0.7361 0.7361 0.4250 0.3470 0.4250 0.3470 0.0000

N large: Generation of a random graph

We can generate larger measurement graphs at random (if we do not have data) from a matrix C generated with 0 and 1's.
N=40;
C4=round(rand(N)); %we fill the matrix of 0 and 1
for i=1:N
C4(i,i)=0;
end
spy(C4)
% standardization
for i=1:size(C4,2)
A4(:,i) = C4(:,i)/sum(C4(:,i));
end
% we see if there is any column of NaN
A4(isnan(A4)) = 1/N;
% we avoid irreducible grafs
d= 0.85;
M4=d*A4+((1-d)/N)*ones(size(A4));
%
% Now we can do the iterations as in the previous cases:
%
A=[];
A=M4; %iteration matrix
tol = 1.e-3;
x=ones(size(A,1),1);
x=x/norm(x);
X=[0,x'];
xnou=A*x;
variation = max(abs(xnou-x));
nIter=1;
X=[X;nIter,xnou'];
while (nIter < 100 && variation > tol)
nIter=nIter+1;
x=xnou;
xnou=A*x;
X=[X; nIter,xnou'];
variation = max(abs(xnou-x));
end
[val,highestRank]=max(X(end,2:end)); %maximum value of the last iteration
fprintf('value =%f, number of page = %d \n',val, highestRank);
value =0.194661, number of page = 2
% Modulus eigenvalues M4: (first 8 )
modVaps=abs(eig(M4));
modVaps(1:8) %we show only the first 8 vaps in module
ans = 8×1
1.0000 0.1450 0.1450 0.1214 0.1214 0.1165 0.1165 0.1193
Exercice 1:
The figure below corresponds to the representation of a sphere (a soccer ball) or also of a bidirectional graph of 60 nodes (carbon atoms) and 90 connection connections known in Chemistry with the name of fullerene C-60 (Bucky ) and its numbering within the Matlab.
Its connectivity or adjacency matrix can be loaded from Matlab by doing
C=bucky;
(a) Apply the GoogleRank method with and to classify the most important nodes
(b) Enter a new connection from node 10 to 50 and apply the same algorithm.
(c) Do the same but from the original model and eliminating the two connections between nodes 7 and 30.
(c) Numerical Factory 2024 (following the paper http://home.ie.cuhk.edu.hk/~wkshum/papers/pagerank.pdf)