None
import numpy as np
from numpy import linalg as la
np.set_printoptions(precision=3,suppress=True)
A1=np.array([[1,0,0,1/2,0,0],[0,0,1,1/2,1/3,1/2],[0,1,0,0,0,0],[0,0,0,0,1/3,0],[0,0,0,0,0,1/2],[0,0,0,0,1/3,0]])
print(A1)
la.det(A1)
$A$ is a stochastic matrix because all its columns sum up to 1. We apply the Power method to see where the Markov chain converges.
# Power method with Euclidean (L_2 norm) Scaling:
x_0=np.array([0,0,0,0,0,1])
x_n=x_0
for i in range(14):
x_n = np.dot(A1,x_n)
x_n = x_n/np.linalg.norm(x_n,ord=2)
print(x_n)
x_n/np.linalg.norm(x_n,1) # we mormalise to obtain again probabilities
# Power method with sum (L_1 norm) Scaling:
x_0=np.array([0,0,0,0,0,1])
x_n=x_0
for i in range(14):
x_n = np.dot(A1,x_n)
x_n = x_n/np.linalg.norm(x_n,ord=1)
print(x_n)
x_n # /np.linalg.norm(x_n,1) # we do not need to normalise
# Power method with maximum entry (L_\infty norm) Scaling:
x_0=np.array([0,0,0,0,0,1])
x_n=x_0
for i in range(14):
x_n = np.dot(A1,x_n)
x_n = x_n/np.linalg.norm(x_n,ord=np.inf)
print(x_n)
x_n/np.linalg.norm(x_n,1) # we need to normalise to obtain probabilities
Normalization is useful to avoid numbers growing large and creating numerical problems. However, since the dominant eigenvalue of a stochastic matrix is never larger than one, we can also skip the normalization throughout the iterations in our specific case.
x_0=np.array([0,0,0,0,0,1])
x_n=x_0
for i in range(14):
x_n = np.dot(A1,x_n)
print(x_n)
x_n/np.linalg.norm(x_n,1) # we need to normalise to obtain probabilities
def power_method(A, x_0, niter, scaling=2):
x_n=x_0/np.linalg.norm(x_0,scaling)
for i in range(niter):
x_n = np.dot(A,x_n)
x_n = x_n/np.linalg.norm(x_n,ord=scaling)
print(x_n)
x_n/np.linalg.norm(x_n,ord=1) # we mormalise to obtain again probabilities
return x_n
x_0=np.array([0,0,0,0,0,1])
power_method(A1,x_0,15)
x_0=np.array([0,0,0,1,0,0])
power_method(A1, x_0, 14)
x_0=np.array([0,0,0,1,0,0])
power_method(A1, x_0, 15)
So the Markov process does not converge.
Remember that: If a Markov chain converges to a steady-state vector $\vec x$, if $\lambda_1=1$ is a dominat eigenvalue of $A$.
Let's have a look at the eigenvalues of the transition matrix:
la.eig(A1)
There are three eigenvalues with absolute value equal to 1, ie, $|\lambda_i|=1, i=1,2,3$. In general, a stochastic matrix has a dominant eigenvalue equal to one but not in this case.
A stochastic matrix has the following properties. The largest absolute value of a stochastic matrix is at most 1 by Gershgorin circle theorem (not discussed in class). Additionally, every stochastic matrix has an "obvious" column eigenvector associated to the eigenvalue 1: the vector ${\boldsymbol {1}}$, whose coordinates are all equal to 1.
On the other hand, Perron theorem applied to stochastic matrices tells us that if the stochastic matrix is positive then it has a dominant eigenvalue $\lambda = 1$. More generally, Frobenius theorem tells us that if the stochastic matrix is nonnegative and irreducible then again it has a dominant eigenvalue $\lambda = 1$. However, in general stochastic matrices need not be positive or irreducible.
In the next subtask we try to modify the transition matrix such that it has all entries positive (no zeros) such that Perron's theorem applies and the corresponding process converges.
A Markov process with transition matrix $A$ is said to be regular if all the entries of some power of $A$ are positive. It can be shown that if this happens the $A$ has a dominant eigenvalue equal to 1 as well.
A2 = 1/6*np.ones((6,6))
A2
x_0=np.array([0,0,0,1,0,0])
x=power_method(A2,x_0,niter=14)
x/np.linalg.norm(x,ord=1)
ell, P = la.eig(A2)
ell, P
The matrix is positive hence it has a dominant eigenvalue equal to 1. The corresponding eigenvector is a vector of entries of 1 or normalized as in the matrix P above.
If we can calculate the eigenvalues because computationally feasibile as in these small examples, we can also find the steady-state vectors by applying the theory seen on slide 20.
P_inv = la.inv(P) # this is also computationally demanding
z_0 = P_inv @ x_0
x_n = z_0[0] * P[:,0]
print(x_n)
A = 0.85 * A1 + 0.15 * A2
x_0=np.array([0,0,0,1,0,0])
power_method(A,x_0,niter=30)
Hence page 2 has the highest ranking
ell, P = la.eig(A)
ell, P
P_inv = la.inv(P) # this is also computationally demanding
z_0 = P_inv @ x_0
x_n = z_0[0] * P[:,0]
print(x_n)
The slight discrepancies might be due to rounding erros or to an insufficient number of iterations in the power method.
with open("../assets/top250movies.txt", encoding="utf-8") as f:
lines = f.readlines()
db = {}
for line in lines:
entries = line.strip().split("/")
db[entries[0]] = entries[1:]
To handle the issue of weights due to repeated co-presences of actors in movies, we can first create a multi directed graph and then convert it in a directed graph with weights on arcs. A multi digraph is a graph that allows multiple arcs between nodes. Alternatively, as hinted by the text of the exercise we could create a adjacency dictionary where for every actor we list the actors that are reached by the first actor (ie, more expensive actors) allowing repeated entries. Then we can construct the digraph using the adjacency dictionary. However, graphs in networkx automatically construct adjacency lists and since library functions are to be preferred in Python because more efficient, we prefer to use the first alternative with multi digraph then converted in digraph.
import networkx as nx
MDG = nx.MultiDiGraph()
for k in db:
for i in range(len(db[k])):
actor = db[k][i]
MDG.add_edges_from([(cheaper_actor,actor) for cheaper_actor in db[k][(i+1):]])
MDG.number_of_nodes(), MDG.number_of_edges()
DG = nx.DiGraph()
for node, outgoing_neighbors in MDG.adjacency():
for neighbor, arc_dict in outgoing_neighbors.items():
value = len(arc_dict.values())
DG.add_edge(node, neighbor, weight = value)
DG.number_of_nodes(), DG.number_of_edges()
PR = nx.pagerank(DG, alpha=0.7)
sorted(PR, key=PR.get, reverse=True)[0:5]