--- title: "Equivariant Estimation of Kronecker Structured Covariance Matrices" author: "David Gerard" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Equivariant Estimation} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # Abstract. This Vignette demonstrates using the equivariant estimation procedures in the `R` package `tensr`. We will simulate some data then explore the performance differences between the uniformly minimum risk equivariant estimator (UMREE), the maximum likelihood estimator (MLE), multiway Takemura's estimator, and a moment based estimator. Then we will demonstrate extracting posterior summaries. Finally, we will demonstrate the equivariance of the Bayes rule. # Comparison of Methods. First, we generate a data array $X$ of dimension $10 \times 10 \times 10$. Each mode of $X$ will have an AR-1(0.7) covariance structure. That is $\text{cov}(\text{vec}(X)) = \Sigma_3 \otimes \Sigma_2 \otimes \Sigma_1$, where $\text{vec}(\cdot)$ is the same think as `c` in `R` and just "vectorizes" $X$, and "$\otimes$" is the Kronecker product. Hence $\Sigma_{i,j} = 0.7^{|i - j|}$. ```{r, echo = FALSE} library(tensr) ``` ```{r} set.seed(360) p <- c(10, 10, 10) #Dimension of tensor. n <- length(p) #Order of the tensor. cov_list <- list() normalized_cov_list <- list() #Same as cov_list, but scaled down to determinant 1. sig2_total <- 1 rho <- 0.7 for (mode_index in 1:n) { cov_list[[mode_index]] <- rho ^ abs(outer(1:p[mode_index], 1:p[mode_index], "-")) scale_cov <- det(cov_list[[mode_index]]) ^ (1 / p[mode_index]) sig2_total <- sig2_total * scale_cov normalized_cov_list[[mode_index]] <- cov_list[[mode_index]] / scale_cov } ##Generate data with above covariance structure. X <- atrans(array(rnorm(prod(p)), dim = p), lapply(cov_list, mhalf)) ``` To get the UMREE, we first apply the function `equi_mcmc`. This will give us posterior draws of the inverse of the lower-triangular Cholesky square root of each component covariance matrix. This is with respect to using the right Haar measure over a product group of lower-triangular matrices as our prior. Using the output from `equi_mcmc`, we apply the function `get_equi_bayes` to obtain the Bayes rule under multiway Stein's loss. ```{r} mcmc_out <- equi_mcmc(X, 1000) bayes_rule <- get_equi_bayes(mcmc_out$Phi_inv, mcmc_out$sigma) cov_umree <- bayes_rule$Sig_hat #Estimate of the "standard deviation" form for the total variation parameter. sig_umree <- bayes_rule$b ``` We can improve on the UMREE by calculating UMREEs for different rotations of the data array, then "average" these UMREEs. The more averagings the better in terms of statistical risk. This estimator is called multiway Takemura's estimator. ```{r} tak_est <- multiway_takemura(X, ortho_max = 3, print_mcmc = TRUE) cov_takemura <- tak_est$B sig_takemura <- tak_est$b ``` We use `holq` to obtain the MLE for comparison. The output of `holq` is fed into `mle_from_holq` to obtain the MLEs of each component covariance matrix. ```{r} holq_x <- holq(X, print_diff = FALSE) mle_x <- mle_from_holq(holq_x) cov_mle <- mle_x$cov_mle #The "standard deviation" form for the total variation parameter. sig_mle <- mle_x$sig_mle ``` We also obtain a "moment-based" estimator for comparison. These are just the sample covariance matrices of each mode, scaled down to have unit determinant. The estimate of the total variation parameter is the MLE conditional on the component covariance matrices. This is not implemented in `tensr` because it performs poorly. ```{r} get_moment <- function(X) { p <- dim(X) n <- length(p) cov_moment <- list() cov_chol_inv <- list() ## used for getting scale est for (mode_index in 1:n) { cov_temp <- mat(X, mode_index) %*% t(mat(X, mode_index)) cov_moment[[mode_index]] <- cov_temp/(det(cov_temp)^(1/p[mode_index])) cov_chol_inv[[mode_index]] <- t(backsolve(chol(cov_moment[[mode_index]]), diag(p[mode_index]))) } sig_moment <- sqrt(sum(atrans(X, cov_chol_inv)^2)/prod(p)) return(list(cov_moment = cov_moment, sig_moment = sig_moment)) } moment_x <- get_moment(X) cov_moment <- moment_x$cov_moment sig_moment <- moment_x$sig_moment ``` We can now compare the multiway Stein's losses for each of the estimators using `multi_stein_loss_cov`. The risk performance should be, in decreasing order of performance, multiway Takemura's estimator, UMREE, MLE, moment. Of course, for any individual realization, the loss ordering may be different. ```{r} umree_loss <- multi_stein_loss_cov(cov_umree, normalized_cov_list, sig_umree, sqrt(sig2_total)) tak_loss <- multi_stein_loss_cov(cov_takemura, normalized_cov_list, sig_takemura, sqrt(sig2_total)) mle_loss <- multi_stein_loss_cov(cov_mle, normalized_cov_list, sig_mle, sqrt(sig2_total)) moment_loss <- multi_stein_loss_cov(cov_moment, normalized_cov_list, sig_moment, sqrt(sig2_total)) cat( "Multiway Takemura's Loss:", round(tak_loss), "\n", " UMREE Loss:", round(umree_loss), "\n", " MLE Loss:", round(mle_loss), "\n", " Moment Loss:", round(moment_loss), "\n" ) ``` # Posterior Summaries You can also get the samples from the posterior if you want to perform a fully Bayesian analysis. Just insert the output of `equi_bayes` into `convert_cov` to get these posterior samples. ```{r} cov_post <- convert_cov(mcmc_out) ``` `cov_post[[2]]` contains the posterior samples of the total variation parameter. `cov_post[[1]][[i]][,,j]` contains the $j$th posterior sample of the $i$th component covariance matrix. Here are some trace plots of the posterior samples. ```{r} par(cex.axis = 1, cex.lab = 1, cex.axis = 1, mar = c(2.4,2.6,2,0.2), mgp = c(1.5,0.5,0)) ## trace plot of total variation parameter plot(cov_post[[2]], type = "l", ylab = expression(sigma^2), xlab = "Iteration", main = "Trace Plot") abline(h = sig2_total, lty = 2, col = 2) legend("topright", "True Parameter", lty = 2, col = 2, bty = "n") ## some more trace plots random_trace <- function(cov_post, true_cov, p) { n <- length(p) mode_look <- sample(1:n, 1) x_look <- sample(1:p[mode_look], size = 1) y_look <- sample(1:p[mode_look], size = 1) plot(cov_post[[1]][[mode_look]][x_look, y_look, ], type = "l", xlab = "Iteration", ylab = bquote(Sigma[.(mode_look)]["["][.(x_look)][","][.(y_look)]["]"]), main = "Traceplot") abline(h = true_cov[[mode_look]][x_look, y_look], lty = 2, col = 2) legend("topright", "True Parameter", lty = 2, col = 2, bty = "n") } for(index in 1:4){ random_trace(cov_post, normalized_cov_list, p) } ``` Here are the 95\% posterior credible intervals of the covariance parameters, sorted in lexicographical ordering of the row and column indices. ```{r} par(cex.axis = 1, cex.lab = 1, cex.axis = 1, mar = c(2.4,2.6,2,0.2), mgp = c(1.5,0.5,0)) quantile_list <- list() for (mode_index in 1:n) { quantile_list[[mode_index]] <- apply(cov_post[[1]][[mode_index]], c(1, 2), quantile, probs = c(0.025, 0.5, 0.975)) } par(cex.main = 0.7) for (mode_index in 1:n) { upper_val <- c(quantile_list[[mode_index]][3, , ][lower.tri(diag(p[mode_index]), diag = TRUE)]) lower_val <- c(quantile_list[[mode_index]][1, , ][lower.tri(diag(p[mode_index]), diag = TRUE)]) median_val <- c(quantile_list[[mode_index]][2, , ][lower.tri(diag(p[mode_index]), diag = TRUE)]) num_cred <- p[mode_index] * (p[mode_index] + 1)/2 plot(c(), xlim = c(1, num_cred), ylim = c(min(lower_val), max(upper_val)), xaxt = "n", xlab = "Parameter", ylab = "Value", main = substitute(z~~Sigma[y], list(z = "Medians and 95% Credible Intervals for", y = mode_index))) arrows(1:num_cred, lower_val, 1:num_cred, upper_val, length = 0) points(1:num_cred, median_val, pch = 16, col = 2, cex = 0.4) points(1:num_cred, median_val, pch = 1, col = 1, cex = 0.4) abline(h = 0, lty = 2, col = 2) v_at <- cumsum(p[mode_index]:2) + 0.5 abline(v = v_at, lty = 2) } ``` # Demonstrate Equivariance We finish with a small example of how the UMREE is equivariant. First, we generate a group element of a product group of lower triangular matrices. ```{r} A_list <- list() for (mode_index in 1:n) { A_temp <- matrix(0, nrow = p[mode_index], ncol = p[mode_index]) A_temp[lower.tri(A_temp)] <- rnorm(p[mode_index] * (p[mode_index] - 1)/2) diag(A_temp) <- rgamma(p[mode_index], shape = 1, rate = 1) A_temp <- A_temp/prod(diag(A_temp))^(1/p[mode_index]) A_list[[mode_index]] <- A_temp } a_scale <- 10 ``` Now we transform X. ```{r} X_transformed <- a_scale * atrans(X, A_list) ``` Now we get the UMREEs of the original and the transformed data. ```{r} mcmc_out <- equi_mcmc(X, 10000) bayes_rule <- get_equi_bayes(mcmc_out$Phi_inv, mcmc_out$sigma) cov_x <- bayes_rule$Sig_hat sig_x <- bayes_rule$b mcmc_out <- equi_mcmc(X_transformed, 10000) bayes_rule <- get_equi_bayes(mcmc_out$Phi_inv, mcmc_out$sigma) cov_x_t <- bayes_rule$Sig_hat sig_x_t <- bayes_rule$b ``` The UMREE of transformed data should equal the transformed UMREE of original data. ```{r} #These two quantities should be close. cat(" Total Variation of transformed data:", sig_x_t, "\n", "Transformation of total variation of original data:", sig_x * a_scale, "\n") for (mode_index in 1:n) { par(ask = TRUE) x_val <- cov_x_t[[mode_index]][lower.tri(diag(p[mode_index]))] y_val <- (A_list[[mode_index]] %*% cov_x[[mode_index]] %*% t(A_list[[mode_index]]))[lower.tri(diag(p[mode_index]))] plot(x_val, y_val, xlab = "Transform of UMREE", ylab = "UMREE of Transformed Data", main = "Covariance Estimates") mtext("(Should Lie on Line if Equivariant)") abline(c(0, 1)) par(ask = FALSE) } ``` # References **Gerard, D.**, & Hoff, P. (2015). Equivariant minimax dominators of the MLE in the array normal model. _Journal of Multivariate Analysis_, 137, 32-49. [[Link to JMVA]](https://doi.org/10.1016/j.jmva.2015.01.020) [[Link to arXiv]](http://arxiv.org/pdf/1408.0424.pdf)