# Monte Carlo Estimates of Log Determinants


In statistics, we often need to compute a log-determinant as part of a multivariate maximum likelihood problem for non-independent data. The matrix whose log-determinant we need can be very large, though more often than not it is also very sparse. Direct methods of computing the determinant for a symmetric variance-covariance matrix often use the Cholesky decomposition, or in certain cases an LU decomposition.
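For reference, the direct route is cheap once the factorization is done: for a symmetric positive-definite $A$ with Cholesky factor $L$ (so $A = LL^\prime$), $\ln|A| = 2\sum_i \ln L_{ii}$. A minimal sketch with an arbitrary SPD matrix (illustrative values only):

```python
import numpy as np

rng = np.random.RandomState(0)
B = rng.randn(5, 5)
A = B.dot(B.T) + 5 * np.eye(5)  # an arbitrary symmetric positive-definite matrix

# log|A| = 2 * sum(log(diag(L))) where A = L L'
L = np.linalg.cholesky(A)
logdet_chol = 2 * np.sum(np.log(np.diag(L)))

# agrees with the numerically stable slogdet
sign, logdet_direct = np.linalg.slogdet(A)
```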

I have recently needed to tackle this problem for the estimation of some spatial econometric models. One good thing about the variance-covariance matrices in spatial econometric models is that they're sparse. Unfortunately, all of the sparse Cholesky routines that I am aware of are not suitable for inclusion in BSD-compatible projects like statsmodels and pysal. Furthermore, sparse methods sometimes rely on particular orderings of the matrices to achieve good performance; these orderings are not always possible to achieve with spatial weighting matrices. Another good thing about working with spatial econometric models is that we often don't need exact estimates; approximations often do just fine. This led me to a 1999 paper by Barry and Pace, "A Monte Carlo Estimator of Log Determinants of Large Sparse Matrices" [pdf], which introduces a simple Monte Carlo method to approximate log-determinants of matrices. See also Chapter 4.4 in LeSage and Pace's Introduction to Spatial Econometrics.

To describe the problem in a little more detail: we need to calculate $\ln|I_n - \rho W|$ where $I_n$ is an $n\times n$ identity matrix, $\rho$ is a scalar in $(-1,1)$, and $W$ has been rescaled such that its eigenvalues are in $(-1,1)$. Direct methods would require computing this log determinant for a number of different values of $\rho$. However, we can develop an MC approximation that avoids these recomputations by first noting that

$$\ln|I_n - \rho W| = \operatorname{tr}(\ln(I_n - \rho W))$$

where $\operatorname{tr}$ denotes the trace of the matrix. Next note that the matrix logarithm has the power series expansion

$$\ln(I_n - \rho W) = -\sum_{i=1}^{\infty}\frac{\rho^i W^i}{i}$$

Since the trace is a linear operation, we can write

$$\ln|I_n - \rho W| = -\sum_{i=1}^{\infty}\frac{\rho^i \operatorname{tr}(W^i)}{i}$$

Note that this will also hold for non-symmetric $W$ with complex eigenvalues or for complex $\rho$. Assuming that the truncation error is small, we can approximate the power series above with its first $m$ terms

$$\ln|I_n - \rho W| \approx -\sum_{i=1}^{m}\frac{\rho^i \operatorname{tr}(W^i)}{i}$$
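As a quick sanity check of the truncation, here is a sketch with a small illustrative $W$ whose eigenvalues are $\pm 0.5$:

```python
import numpy as np

# small symmetric W with eigenvalues in (-1, 1), for illustration only
W = np.array([[0., .5],
              [.5, 0.]])
rho, m = 0.25, 50

# truncated power series: -sum_{i=1}^m rho^i tr(W^i) / i
Wi = np.eye(2)
series = 0.0
for i in range(1, m + 1):
    Wi = Wi.dot(W)  # now holds W^i
    series -= rho**i * np.trace(Wi) / i

# compare against the direct computation
direct = np.linalg.slogdet(np.eye(2) - rho * W)[1]
```

With $|\rho\lambda| \le 0.125$ the terms decay geometrically, so 50 terms agree with `slogdet` to machine precision.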

Computing $\operatorname{tr}(W^i)$ directly can be costly for larger values of $i$. Barry and Pace introduce the following "trick" to estimate the trace. Let the vector $\boldsymbol{x}$ come from a $N_n(0,I)$ distribution and let $D$ be some matrix. Since each component $x_i\sim N(0,1)$ gives $x_i^2\sim\chi^2_1$ with $\mathbb{E}[x_i^2]=1$, and $\mathbb{E}[x_ix_j]=0$ for $i \neq j$ by construction, the quadratic form $\boldsymbol{x}^\prime D\boldsymbol{x}$ satisfies the following, using $n=2$ for illustration

\begin{align}\boldsymbol{x}^\prime D\boldsymbol{x} &= x_1^2d_{1,1} + x_2^2d_{2,2} + x_1x_2(d_{1,2} + d_{2,1}) \cr \mathbb{E}[\boldsymbol{x}^\prime D\boldsymbol{x}] &= \mathbb{E}[x_1^2d_{1,1} + x_2^2d_{2,2} + x_1x_2(d_{1,2} + d_{2,1})] \cr &= d_{1,1} + d_{2,2} \cr &= \operatorname{tr}(D) \end{align}
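This is a Hutchinson-style stochastic trace estimator. A quick simulation (illustrative sizes and seed) shows $\boldsymbol{x}^\prime D\boldsymbol{x}$ averaging out to $\operatorname{tr}(D)$:

```python
import numpy as np

rng = np.random.RandomState(0)
D = rng.rand(50, 50)  # any square matrix will do for the expectation

# average x'Dx over many standard-normal draws, vectorized:
X = rng.randn(50, 20000)                 # each column is one draw of x
quad_forms = (X * D.dot(X)).sum(axis=0)  # x'Dx for every draw at once
est = quad_forms.mean()
```

The estimate is unbiased; its variance shrinks like $1/\text{nrepl}$, so the average over 20,000 draws lands within a fraction of a percent of the exact trace.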

This result suggests the following algorithm, which I'll implement below.

pick x_i = randn(n)
set z = x_i
for t = 1:m
    z = W z
    tr(W^t)/t ≈ n * (x_i' z) / (t * x_i' x_i)


Storing the estimates of $\operatorname{tr}(W^i)/i$ in a vector $a$, we can use them to evaluate our truncated power series as

$$-a^{\prime}b$$

where $b=[\rho, \rho^2, \dots, \rho^m]^{\prime}$. It's common to compute the estimate for a grid of values of $\rho$, so that the log determinant over all plausible values of $\rho$ can be computed in a vectorized way, and then to use interpolation to get intermediate values as needed.
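A minimal sketch of that vectorized grid evaluation (the variable names and the placeholder trace values here are made up for illustration; I fake the estimates with exact traces for a $2\times 2$ symmetric $W$):

```python
import numpy as np

m = 30
# suppose `a` holds the estimates of tr(W^i)/i for i = 1..m; here we use
# the exact values for W = [[0, .5], [.5, 0]], whose eigenvalues are +-0.5,
# so tr(W^i) = 2 * 0.5**i for even i and 0 for odd i
i = np.arange(1., m + 1)
a = np.where(i % 2 == 0, 2 * 0.5**i / i, 0.)

# evaluate -a'b for every rho on a grid with one matrix-vector product
rho_grid = np.linspace(-0.99, 0.99, 199)
powers = rho_grid[:, None] ** i  # powers[k, j] = rho_k**(j+1)
logdets = -powers.dot(a)

# interpolate for intermediate values of rho as needed
approx = np.interp(0.25, rho_grid, logdets)
```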

Notice that I used $x_i$ for the random vector above. This suggests that we can repeat the procedure numerous times and take the average to get a better estimate of the log determinant. Now let's look at the implementation.

In [43]:
import numpy as np
np.set_printoptions(suppress=True)


Take an example of a contiguity matrix

In [44]:
W = np.zeros((7,7), dtype=float)
W[0,1] = 1.
W[-1, -2] = 1.
for i in range(5):
    W[i+1,i] = .5
    W[i+1, i+2] = .5

In [45]:
W

Out[45]:
array([[ 0. ,  1. ,  0. ,  0. ,  0. ,  0. ,  0. ],
       [ 0.5,  0. ,  0.5,  0. ,  0. ,  0. ,  0. ],
       [ 0. ,  0.5,  0. ,  0.5,  0. ,  0. ,  0. ],
       [ 0. ,  0. ,  0.5,  0. ,  0.5,  0. ,  0. ],
       [ 0. ,  0. ,  0. ,  0.5,  0. ,  0.5,  0. ],
       [ 0. ,  0. ,  0. ,  0. ,  0.5,  0. ,  0.5],
       [ 0. ,  0. ,  0. ,  0. ,  0. ,  1. ,  0. ]])


This is what is called a row-stochastic weight matrix in the spatial regression literature. That is, the rows all sum to 1, which guarantees that the eigenvalues lie in $[-1, 1]$.

In [46]:
np.linalg.eig(W)[0]

Out[46]:
array([-1.       , -0.8660254, -0.5      , -0.       ,  0.5      ,
        1.       ,  0.8660254])

In [47]:
rho = .25


Direct computation of $\ln|I_n - \rho W|$ yields

In [48]:
np.log(np.linalg.det(np.eye(W.shape[0]) - rho*W))

Out[48]:
-0.12829609729207081


Or better yet, use the more numerically stable slogdet.

In [49]:
np.linalg.slogdet(np.eye(W.shape[0]) - rho*W)

Out[49]:
(1.0, -0.12829609729207084)


Below is the implementation of the algorithm of Barry and Pace. I haven't done any timings or optimizations for it. I first need to see how it performs in practice. Notice also that it only takes a single $\rho$. This is easily updated to work with a grid for $\rho$, though I'm thinking about using a memoization pattern for the trace estimates in my own code, depending on speed, so I've left it as is here.

One addition to the algorithm above is given via the exact keyword. If $W$ is symmetric with zeros on the diagonal, which it always will be for spatial econometric applications, then we can compute the first two moments exactly at small cost: $\operatorname{tr}(W)=0$ and $\operatorname{tr}(W^2)=\sum_{i,j} W_{ij}^2$, where the square is taken elementwise.
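A quick numerical check of these two identities (a sketch, with a small symmetric example):

```python
import numpy as np

# symmetric W with zeros on the diagonal, as in spatial applications
W = np.array([[0., .5, .5],
              [.5, 0., .5],
              [.5, .5, 0.]])

tr1 = np.trace(W)          # 0 by construction
tr2 = np.trace(W.dot(W))   # tr(W^2)
elementwise = np.sum(W**2) # equals tr(W^2) when W is symmetric
```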

In [50]:
from scipy import sparse

def logdet_mc(W, rho=1, seed=None, nterms=10, nrepl=100, exact=True):
    """
    Compute the approximate log-determinant of I - rho*W.

    Parameters
    ----------
    W : array or sparse
        An n x n array or a scipy.sparse matrix.
    rho : scalar
        The autoregressive coefficient.
    seed : int or None
        Seed passed to numpy.random.seed.
    nterms : int
        The number of terms for the power series estimate of the log
        determinant.
    nrepl : int
        The number of power series to estimate. The log determinant is the
        average of these.
    exact : bool
        Use exact=True if W is symmetric with zeros on the diagonal to use
        the exact first two moments.
    """
    if seed is not None:
        np.random.seed(seed)

    if hasattr(W, 'dot'):  # prefer the bound dot (sparse matrices, newer numpy)
        Wdot = W.dot
    else:
        Wdot = lambda x: np.dot(W, x)

    nobs = W.shape[0]
    if W.shape[1] != nobs:
        raise ValueError("W is not square.")

    allT = np.empty((nrepl, nterms))

    range_ = np.arange(1., nterms + 1)

    for j in range(nrepl):
        T = np.empty(nterms)
        # get the "seed" vector
        u1 = np.random.randn(nobs, 1)
        z_t = u1
        uTu = np.dot(u1.T, u1).item()  # scalar

        # compute the terms in the expansion
        for i in range(1, nterms + 1):
            z_t = Wdot(z_t)
            T[i-1] = np.dot(u1.T, z_t).item()

        T *= nobs / (range_ * uTu)
        allT[j] = T

    if exact:  # W must be symmetric with zeros on the diagonal
        if sparse.issparse(W):
            exactT = [0, np.sum(W.data**2) / 2.]
        else:
            exactT = [0, np.sum(W**2) / 2.]

        allT[:, :2] = exactT

    meanT = np.mean(allT, axis=0)

    b = -rho ** range_

    return np.dot(b, meanT)

In [51]:
rho = .75


Exact estimate

In [52]:
np.linalg.slogdet(np.eye(W.shape[0]) - rho * W)

Out[52]:
(1.0, -1.5261936420271165)


Monte Carlo estimate

In [53]:
logdet_mc(W, rho, seed=1234, nterms=30, exact=False, nrepl=1000)

Out[53]:
-1.5140194676583663


(EDIT 8/14/2013) The estimate below using CHOLMOD is incorrect since W is not symmetric; the Cholesky decomposition is only valid for symmetric positive-definite matrices. See below for a correct estimate using the LU decomposition.

Estimate using CHOLMOD from scikits.sparse.

In [54]:
sparseW = sparse.csc_matrix(W)

In [55]:
from scikits.sparse import cholmod

In [56]:
f = cholmod.cholesky((sparse.eye(7, format="csc") - rho*sparseW))
f.logdet()

Out[56]:
-2.0152902872928142


(EDIT 8/14/2013: Added this section.) We can use the LU decomposition to get the log-determinant of a non-symmetric matrix.

In [57]:
from scipy.sparse import linalg as sp_linalg

In [58]:
umfpack = sparse.linalg.umfpack.UmfpackContext()
L, U, P, Q, R, do_recip = umfpack.lu((sparse.eye(7, format="csc") - rho*sparseW))
print np.sum(np.log(U.diagonal()/R))

-1.52619364203



### Pace and Barry Data

The following example uses the example data from Pace and Barry (1997) on 3,107 US Counties for the 1980 Presidential election. You can find the data and more information here.

In [59]:
import pandas as pd
from scipy import sparse

In [60]:
dta = pd.read_table("http://spatial-econometrics.com/data/elect.dat", sep=" *", header=None)[[4,5]]
dta.rename(columns={4 : "lat", 5 : "long"}, inplace=True)


#### Nearest Neighbors

Compute the 5 nearest neighbors using Euclidean distance. We ask for 5 because each point counts itself, leaving the 4 nearest neighbors.

In [61]:
from sklearn import neighbors

In [62]:
neigh = neighbors.NearestNeighbors(n_neighbors=5)
neigh.fit(dta.values)

Out[62]:
NearestNeighbors(algorithm='auto', leaf_size=30, n_neighbors=5, p=2,

In [63]:
nobs = len(dta)
W = sparse.lil_matrix((nobs, nobs))

In [64]:
for i, row in dta.iterrows():
    closest = neigh.kneighbors(row.values, return_distance=False)
    for j in closest[0]:
        if j == i:  # don't count own observation
            continue
        W[i, j] = 1.


Make row stochastic

In [65]:
W = W / 4.
assert all(W.sum(1) == 1)


Convert to CSC format for calculations

In [66]:
W = W.tocsc()


Set $\rho$

In [67]:
rho = .1


Calculate the exact log determinant.

In [68]:
np.linalg.slogdet(np.eye(W.shape[0]) - rho*W.toarray())

Out[68]:
(1.0, -3.346557139406221)


Get the Monte Carlo approximation of the log determinant.

In [94]:
logdet_mc(W, rho, seed=124, nterms=10, exact=False, nrepl=2500)

Out[94]:
-3.2609283583388167


(EDIT 8/14/2013: This is wrong; see the note above. LU decomposition added below.) Compute the log determinant using CHOLMOD and scikits.sparse again.

In [70]:
f = cholmod.cholesky(sparse.eye(W.shape[0], format="csc") - rho * W)

In [71]:
f.logdet()

Out[71]:
-3.9400025132708647


The corrected estimate using the LU decomposition:

L, U, P, Q, R, do_recip = umfpack.lu((sparse.eye(W.shape[0], format="csc") - rho*W))
print np.sum(np.log(U.diagonal()/R))

-3.34655713941