The Gaussian tripod by Piotr Zwiernik

This short note is about a simple Gaussian latent tree model, where three jointly Gaussian random variables \(X_1\), \(X_2\), \(X_3\) are assumed to be jointly independent given a latent Gaussian random variable \(H\).

Model description

For simplicity assume that \(\mathbb{E}(X_i)=\mathbb{E}(H)=0\). Then the model is given by all \(3\times 3\) covariance matrices that satisfy the given conditional independence assumptions. Since the model is defined only by conditional independence statements, there are no constraints on the variances \({\rm var}(X_i)\), and hence the model can be analyzed by describing the possible correlation matrices. Because any two \(X_i\), \(X_j\) are independent given \(H\) we have \[ {\rm corr}(X_i,X_j)\quad=\quad {\rm corr}(X_i,H)\cdot {\rm corr}(X_j,H)\qquad\mbox{for all }i\neq j. \] It follows that the model is parametrized by the values \({\rm corr}(X_i,H)\in [-1,1]\); in particular the parametrization does not depend on \({\rm var}(H)\), which we can therefore set to \(1\). The space of all possible correlation matrices in the model has dimension \(3\) and is depicted below:

[Figure: the Gaussian tripod model]
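
To make the parametrization concrete, here is a minimal simulation sketch of my own (not part of the original note): writing \(\lambda_i={\rm corr}(X_i,H)\), taking \(H\sim N(0,1)\) and setting \(X_i=\lambda_i H+\sqrt{1-\lambda_i^2}\,\varepsilon_i\) with independent standard normal errors \(\varepsilon_i\) makes the \(X_i\) independent given \(H\), so the empirical correlations should factor as above.

set.seed(123)
# hypothetical simulation: H ~ N(0,1), X_i = lam_i*H + sqrt(1-lam_i^2)*eps_i
n <- 100000
lam <- c(0.9, 0.8, 0.6)   # the values corr(X_i, H)
H <- rnorm(n)
X <- sapply(lam, function(l) l*H + sqrt(1 - l^2)*rnorm(n))
# corr(X_1, X_2) should be close to lam_1*lam_2 = 0.9*0.8 = 0.72
c(cor(X)[1,2], lam[1]*lam[2])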

It is important to note that the boundary of this model is a union of three linear exponential families given by simple graphical models, in each of which two of the observed random variables are independent given the third. Writing \(x={\rm corr}(X_1,X_2)\), \(y={\rm corr}(X_1,X_3)\), \(z={\rm corr}(X_2,X_3)\), this boundary is the positive semidefinite part of the algebraic surface given as the union of the three surfaces \(xy=z\), \(xz=y\), \(yz=x\), depicted below:

[Figure: the boundary of the model]

Maximum likelihood

An important consequence of this nice description of the boundary is that maximizing the likelihood function over the model is simple and can be done in closed form. Up to an additive constant, the log-likelihood is \[ \ell(K)\quad=\quad \frac{n}{2}\log\det K-\frac{n}{2}{\rm Trace}(S K), \] where \(K:=\Sigma^{-1}\) is the inverse of a correlation matrix in the model and \(S\) is the sample correlation matrix. This function is strictly concave on the cone of positive definite matrices; setting its gradient \(K^{-1}-S\) to zero shows that the unique unconstrained maximizer is \(K=S^{-1}\), that is, \(\Sigma=S\). It follows that the log-likelihood function restricted to the model:

  • has a unique maximum \(\Sigma^\star=S\) if \(S\) lies in the model, or
  • attains its maximum on the boundary of the model if \(S\) does not lie in the model.

In the latter case we simply compute the (unique) MLE on each of the three pieces of the boundary and report the one with the highest value of the log-likelihood function. For example, on the piece of the boundary where \({\rm corr}(X_1,H)=1\), the maximum likelihood estimate of the correlation matrix is obtained by setting \({\rm corr}(X_1,X_2)\) and \({\rm corr}(X_1,X_3)\) to their sample values \(\hat{\rho}_{12}\), \(\hat{\rho}_{13}\); the MLE of \({\rm corr}(X_2,X_3)\) is then \(\hat{\rho}_{12}\hat{\rho}_{13}\).
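
This recipe is easy to wrap in a helper function. The following sketch is my own (the name boundary_candidates is not from the original code); it returns the three boundary candidates together with the corresponding values of \(-\log\det R-{\rm Trace}(S R^{-1})\), the rescaled log-likelihood used throughout this note.

# a sketch: closed-form MLE candidates on the three boundary pieces;
# candidate i corresponds to the piece where corr(X_i, H) = 1
boundary_candidates <- function(S) {
  cand <- vector("list", 3)
  loglik <- numeric(3)
  for (i in 1:3) {
    jk <- setdiff(1:3, i)
    M <- diag(3)
    M[i, jk] <- M[jk, i] <- S[i, jk]                  # keep sample values
    M[jk[1], jk[2]] <- M[jk[2], jk[1]] <- S[i, jk[1]]*S[i, jk[2]]  # impute the rest
    cand[[i]] <- M
    loglik[i] <- -log(det(M)) - sum(diag(S %*% solve(M)))
  }
  list(candidates = cand, loglik = loglik)
}

Applied to a sample correlation matrix, boundary_candidates reproduces the per-piece computations done by hand in the examples below.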

To check whether the sample correlation matrix lies in the model we list the defining constraints. Writing \(\rho_{ij}={\rm corr}(X_i,X_j)\), these are \[ |\rho_{12}\rho_{13}|\leq |\rho_{23}|,\qquad |\rho_{12}\rho_{23}|\leq |\rho_{13}|,\qquad |\rho_{13}\rho_{23}|\leq |\rho_{12}|, \] together with the sign constraint \(\rho_{12}\rho_{13}\rho_{23}\geq 0\); these are exactly the conditions checked in the R code below.
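
These constraints translate directly into a membership test for a single correlation matrix; the following sketch is mine (the name in_tripod is hypothetical, not from the original note).

# a sketch: check whether a 3x3 correlation matrix lies in the tripod model
in_tripod <- function(R) {
  r12 <- R[1,2]; r13 <- R[1,3]; r23 <- R[2,3]
  sign_ok <- r12*r13*r23 >= 0              # sign constraint
  mag_ok <- abs(r12*r13) <= abs(r23) &&
            abs(r12*r23) <= abs(r13) &&
            abs(r13*r23) <= abs(r12)       # magnitude constraints
  sign_ok && mag_ok
}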

As I show below, for realistic data the MLE will often lie on the boundary of the model. In consequence, the reported parameter vector corresponds to a degenerate model in which one of the observed random variables is functionally related to the latent variable, which is not even observed (sic!). In this situation the original interpretation of the model breaks down, which calls for a better understanding of this model class and its inference.

Consequences for inference

A standard way to fit this model is the EM algorithm. If the sample correlation matrix lies in the model, then the likelihood is well behaved and the unique global maximum will be easily found. Otherwise the EM algorithm may report a local maximum.

We first check the volume of the model in the space of all positive definite correlation matrices. The following R code tells us that the model takes up approximately 20% of the total volume of the set of all \(3\times 3\) correlation matrices.

set.seed(0)
# sample independently the correlation coefficients
n <- 100000
x <- runif(n, min=-1, max=1)
y <- runif(n, min=-1, max=1)
z <- runif(n, min=-1, max=1)
# construct the corresponding "correlation matrix"
S <- array(1,c(3,3,n))
S[1,2,] <- S[2,1,] <- x
S[1,3,] <- S[3,1,] <- y
S[2,3,] <- S[3,2,] <- z
# check the model constraints
dets <- apply(S, 3, det)
sgn <- S[1,2,]*S[1,3,]*S[2,3,]   # sign constraint: product of the three correlations
c1 <- abs(S[1,2,]) - abs(S[1,3,]*S[2,3,])
c2 <- abs(S[1,3,]) - abs(S[1,2,]*S[2,3,])
c3 <- abs(S[2,3,]) - abs(S[1,3,]*S[1,2,])
posdef <- (dets > 0)*1           # det > 0 suffices here by Sylvester's criterion
inmodel <- posdef*(sgn >= 0)*(c1 >= 0)*(c2 >= 0)*(c3 >= 0)
# report the proportion
sum(inmodel)/sum(posdef)
## [1] 0.2017

If we actually expect the data to be generated by a probability distribution in the model, then the probability that the sample correlation matrix lies in the model is of course higher. That probability depends on the sample size, and I omit the analysis here. The computation above already indicates, however, that the situation can get really bad for more complicated models.
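
For completeness, here is a minimal Monte Carlo sketch of my own showing how such a sample-size analysis could be set up; it reuses the in_tripod test sketched above, and its output is not reported here.

library(MASS)
# a sketch: estimate P(sample correlation lies in the model) when the data
# are generated from the model, for one sample size N and parameter vector lam
est_prob_in_model <- function(N, lam, reps = 1000) {
  R <- diag(1 - lam^2) + lam %*% t(lam)   # the model correlation matrix
  mean(replicate(reps, in_tripod(cor(mvrnorm(N, rep(0,3), R)))))
}
# e.g. est_prob_in_model(10, c(0.9, 0.8, 0.6))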

Maximum likelihood estimation

To show how maximum likelihood estimation works for this simple model, suppose that the true correlation matrix is \[ \left[\begin{array}{ccc} 1 & 0.72 & 0.54\\ \cdot & 1 & 0.48\\ \cdot & \cdot & 1 \end{array}\right], \] which lies in the model and corresponds to \({\rm corr}(X_1,H)=0.9\), \({\rm corr}(X_2,H)=0.8\), \({\rm corr}(X_3,H)=0.6\). Suppose that we have a sample of size 10 from this distribution.

library(MASS)
set.seed(1)
N <- 10 # sample size
Lam <- c(0.9,0.8,0.6)
R <- diag(1-Lam^2)+Lam%*%t(Lam)
dat <- mvrnorm(N, rep(0,3), R)
R0 <- cor(dat)
R0
##        [,1]   [,2]   [,3]
## [1,] 1.0000 0.8077 0.3709
## [2,] 0.8077 1.0000 0.1705
## [3,] 0.3709 0.1705 1.0000

This sample correlation matrix does not lie in the model, and hence the MLE lies on the boundary of the model. We check each of the three boundary pieces separately:

# boundary candidates: Ri is the MLE on the piece where corr(X_i, H) = 1
R1 <- R2 <- R3 <- array(1, c(3,3))
R1[1,2] <- R1[2,1] <- R2[1,2] <- R2[2,1] <- R0[1,2]
R1[1,3] <- R1[3,1] <- R3[1,3] <- R3[3,1] <- R0[1,3]
R3[3,2] <- R3[2,3] <- R2[3,2] <- R2[2,3] <- R0[3,2]
# the remaining entry of each candidate is a product of sample correlations
R1[2,3] <- R1[3,2] <- R0[1,2]*R0[1,3]
R2[1,3] <- R2[3,1] <- R0[1,2]*R0[2,3]
R3[1,2] <- R3[2,1] <- R0[1,3]*R0[2,3]
# rescaled log-likelihood of each candidate
l1 <- -log(det(R1)) - sum(diag(R0 %*% solve(R1)))
l2 <- -log(det(R2)) - sum(diag(R0 %*% solve(R2)))
l3 <- -log(det(R3)) - sum(diag(R0 %*% solve(R3)))
c(l1,l2,l3)
## [1] -1.795 -1.914 -2.822

The likelihood is maximized on the piece of the boundary where \({\rm corr}(X_1,H)=1\). The other two parameters are estimated as \[{\rm corr}(X_2,H)=\hat{\rho}_{12},\quad {\rm corr}(X_3,H)=\hat{\rho}_{13},\] so the estimated parameter vector is

c(1,R0[1,2],R0[1,3])
## [1] 1.0000 0.8077 0.3709
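
As a quick consistency check of my own, plugging this vector back into the parametrization \(\Sigma={\rm diag}(1-\lambda^2)+\lambda\lambda^\top\) used to construct R above reproduces the boundary matrix R1 exactly:

pars_b <- c(1, R0[1,2], R0[1,3])
Rhat_b <- diag(1 - pars_b^2) + pars_b %*% t(pars_b)
max(abs(Rhat_b - R1))   # zero up to floating-point rounding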

Now consider a much larger data set:

library(MASS)
set.seed(0)
N <- 1000 # sample size
Lam <- c(0.9,0.8,0.6)
R <- diag(1-Lam^2)+Lam%*%t(Lam)
dat <- mvrnorm(N, rep(0,3), R)
R0 <- cor(dat)
R0
##        [,1]   [,2]   [,3]
## [1,] 1.0000 0.7408 0.5318
## [2,] 0.7408 1.0000 0.4471
## [3,] 0.5318 0.4471 1.0000

In this case the sample correlation matrix lies in the model and hence it is the global maximizer. We easily check that the value of the likelihood function at this point is larger than on any of the boundary pieces:

# the same boundary candidates as before, recomputed for the new R0
R1 <- R2 <- R3 <- array(1, c(3,3))
R1[1,2] <- R1[2,1] <- R2[1,2] <- R2[2,1] <- R0[1,2]
R1[1,3] <- R1[3,1] <- R3[1,3] <- R3[3,1] <- R0[1,3]
R3[3,2] <- R3[2,3] <- R2[3,2] <- R2[2,3] <- R0[3,2]
R1[2,3] <- R1[3,2] <- R0[1,2]*R0[1,3]
R2[1,3] <- R2[3,1] <- R0[1,2]*R0[2,3]
R3[1,2] <- R3[2,1] <- R0[1,3]*R0[2,3]
# l0: log-likelihood at the interior candidate R0 (= S); l1-l3: boundary pieces
l0 <- -log(det(R0)) - sum(diag(R0 %*% solve(R0)))
l1 <- -log(det(R1)) - sum(diag(R0 %*% solve(R1)))
l2 <- -log(det(R2)) - sum(diag(R0 %*% solve(R2)))
l3 <- -log(det(R3)) - sum(diag(R0 %*% solve(R3)))
c(l0,l1,l2,l3)
## [1] -1.863 -1.872 -1.981 -2.445

The maximum likelihood estimates of the parameters are

# invert rho_ij = lam_i*lam_j to recover the parameters
r1 <- sqrt((R0[1,2]*R0[1,3])/(R0[2,3]))
r2 <- sqrt((R0[1,2]*R0[2,3])/(R0[1,3]))
r3 <- sqrt((R0[1,3]*R0[2,3])/(R0[1,2]))
c(r1,r2,r3)
## [1] 0.9387 0.7892 0.5665

and they are close to the true parameters of the data generating distribution.

Lam
## [1] 0.9 0.8 0.6
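
Since \(S\) lies in the model, the fitted correlation matrix is \(S\) itself. As a sanity check of my own, reassembling the matrix from the estimated parameters recovers the sample correlations (this works directly here because all sample correlations are positive; the square-root formulas return the non-negative solution):

rhat <- c(r1, r2, r3)
Rhat <- diag(1 - rhat^2) + rhat %*% t(rhat)
max(abs(Rhat - R0))   # numerically zero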

The EM algorithm

In this section we compare our exact results with the output of the EM algorithm, using the R package FactMixtAnalysis.

library(FactMixtAnalysis)
## Loading required package: mvtnorm
set.seed(1)
N <- 10 # sample size
dat <- mvrnorm(N, rep(0,3), R)
resFEM <- fma(dat,1,1)
# translate results into our parameters
pars <- sqrt(diag(var(dat)-resFEM$psi)/diag(var(dat)))
# the parameters
pars
## [1] 0.9926 0.8340 0.4716
Rem <- array(1,c(3,3))
Rem[1,2] <- Rem[2,1] <- pars[1]*pars[2]
Rem[1,3] <- Rem[3,1] <- pars[1]*pars[3]
Rem[3,2] <- Rem[2,3] <- pars[3]*pars[2]
lem <- -log(det(Rem))-sum(diag(R0%*%solve(Rem)))
# the likelihood
lem
## [1] -1.976

(The transformation between the parameters of FactMixtAnalysis and our edge correlations is explained at the end of this note.)

The result above is close to the exact result obtained earlier, but it is still strictly worse. This shows that the likelihood is less well behaved near the boundary. On the other hand, our second example, with the MLE in the interior of the model, is much more stable.

library(MASS)
set.seed(0)
N <- 1000 # sample size
dat <- mvrnorm(N, rep(0,3), R)
resFEM <- fma(dat,1,1)
pars <- sqrt(diag(var(dat)-resFEM$psi)/diag(var(dat)))
#parameters
pars
## [1] 0.9389 0.7894 0.5671
Rem <- array(1,c(3,3))
Rem[1,2] <- Rem[2,1] <- pars[1]*pars[2]
Rem[1,3] <- Rem[3,1] <- pars[1]*pars[3]
Rem[3,2] <- Rem[2,3] <- pars[3]*pars[2]
lem <- -log(det(Rem))-sum(diag(R0%*%solve(Rem)))
# the likelihood
lem
## [1] -1.863

This result is very close to the exact result given above.

Appendix