Cholesky Decomposition for Structural Equation Models in R

Hierarchical regression models are commonly used in linear regression to examine how much variance a variable explains beyond the variables already included in the model. Such models can also be fit as more general structural equation models, with the advantage that they can handle latent variables and multiple outcomes. One way to do this is with a Cholesky decomposition. Below, I demonstrate this using the lavaan package in R. (For more lavaan examples, see Latent Variable Modeling using R: A Step-By-Step Guide.)

The example comes from a 1999 article by de Jong. The question of interest is how much phonological Awareness (3 tests), phonological Synthesis (3 tests), serial rapid naming (SRN; 2 tests), and fluid reasoning (Gf; 3 tests) contribute to the prediction of reading (1 test).

Here is a simplified path model for the regression:

[Figure: path model for the Cholesky decomposition example]

1. Import the data.

# load lavaan package
library(lavaan)
# correlation values
jong.cor <- c(1, 0.56, 0.48, 0.6, 0.59, 0.51, 0.32, 0.41, 0.24, 0.46, 0.2, 0.38, 
    1, 0.45, 0.53, 0.54, 0.56, 0.51, 0.53, 0.35, 0.51, 0.38, 0.43, 1, 0.63, 
    0.56, 0.57, 0.29, 0.36, 0.35, 0.42, 0.21, 0.34, 1, 0.81, 0.76, 0.48, 0.58, 
    0.34, 0.47, 0.22, 0.44, 1, 0.84, 0.55, 0.58, 0.4, 0.5, 0.26, 0.49, 1, 0.59, 
    0.62, 0.43, 0.47, 0.37, 0.48, 1, 0.8, 0.39, 0.47, 0.25, 0.55, 1, 0.34, 0.45, 
    0.24, 0.51, 1, 0.53, 0.48, 0.25, 1, 0.51, 0.41, 1, 0.26, 1)
# sd values
jong.sd <- c(1.8, 4.3, 3.8, 5.5, 4, 3.1, 0.4, 0.4, 6.8, 4.7, 6.8, 4.1)
# variable names
jong.names <- c("aware.a", "aware.b", "aware.c", "synth.a", "synth.b", "synth.c", 
    "srn.a", "srn.b", "gf.a", "gf.b", "gf.c", "reading")
# make correlations into a full symmetric matrix
# (older lavaan versions exported this function as vech.reverse)
jong.cor <- lav_matrix_vech_reverse(jong.cor)
# name the rows and columns of the correlation matrix
colnames(jong.cor) <- rownames(jong.cor) <- jong.names
# convert correlation matrix to covariance matrix
jong.cov <- cor2cov(jong.cor, jong.sd)
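
As an aside, lavaan's getCov() can do the reshaping and rescaling in one step. A minimal sketch, assuming the correlations are still in their original vector form (jong.cor.vec is a hypothetical name for that raw vector, since above the vector was overwritten by the matrix): the values list the lower triangle column by column, which for a symmetric matrix equals the upper triangle row by row, so lower = FALSE is needed.

# one-step alternative: reshape and rescale in a single call
# (jong.cor.vec is the raw correlation vector, before it was reshaped)
jong.cov2 <- getCov(jong.cor.vec, lower = FALSE, sds = jong.sd, names = jong.names)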

Here is how the covariance matrix should look:

jong.cov
##         aware.a aware.b aware.c synth.a synth.b synth.c  srn.a  srn.b
## aware.a  3.2400  4.3344  3.2832   5.940   4.248  2.8458 0.2304 0.2952
## aware.b  4.3344 18.4900  7.3530  12.534   9.288  7.4648 0.8772 0.9116
## aware.c  3.2832  7.3530 14.4400  13.167   8.512  6.7146 0.4408 0.5472
## synth.a  5.9400 12.5345 13.1670  30.250  17.820 12.9580 1.0560 1.2760
## synth.b  4.2480  9.2880  8.5120  17.820  16.000 10.4160 0.8800 0.9280
## synth.c  2.8458  7.4648  6.7146  12.958  10.416  9.6100 0.7316 0.7688
## srn.a    0.2304  0.8772  0.4408   1.056   0.880  0.7316 0.1600 0.1280
## srn.b    0.2952  0.9116  0.5472   1.276   0.928  0.7688 0.1280 0.1600
## gf.a     2.9376 10.2340  9.0440  12.716  10.880  9.0644 1.0608 0.9248
## gf.b     3.8916 10.3071  7.5012  12.149   9.400  6.8479 0.8836 0.8460
## gf.c     2.4480 11.1112  5.4264   8.228   7.072  7.7996 0.6800 0.6528
## reading  2.8044  7.5809  5.2972   9.922   8.036  6.1008 0.9020 0.8364
##            gf.a    gf.b    gf.c reading
## aware.a  2.9376  3.8916  2.4480  2.8044
## aware.b 10.2340 10.3071 11.1112  7.5809
## aware.c  9.0440  7.5012  5.4264  5.2972
## synth.a 12.7160 12.1495  8.2280  9.9220
## synth.b 10.8800  9.4000  7.0720  8.0360
## synth.c  9.0644  6.8479  7.7996  6.1008
## srn.a    1.0608  0.8836  0.6800  0.9020
## srn.b    0.9248  0.8460  0.6528  0.8364
## gf.a    46.2400 16.9388 22.1952  6.9700
## gf.b    16.9388 22.0900 16.2996  7.9007
## gf.c    22.1952 16.2996 46.2400  7.2488
## reading  6.9700  7.9007  7.2488 16.8100
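
Each covariance is just the product of the two standard deviations and the correlation, so a quick spot check (not part of the original analysis) confirms the conversion:

# cov(aware.a, aware.b) should equal sd_a * sd_b * r_ab = 1.8 * 4.3 * 0.56
all.equal(jong.cov["aware.a", "aware.b"], 1.8 * 4.3 * 0.56)
## [1] TRUE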

2. Specify the fit measures of interest.

# fit indices
fit.measures <- c("chisq", "df", "pvalue", "cfi", "srmr", "rmsea")

3. Specify and fit the measurement model (i.e., all latent variables are correlated).

# specify model
measurement.model <- '
# latent variable model
awareness =~ aware.a + aware.b + aware.c
synthesis =~ synth.a + synth.b + synth.c
srn =~ srn.a + srn.b
gf =~ gf.a + gf.b + gf.c
# single-indicator latent; cfa() fixes its residual variance to zero by default
read =~ 1*reading
'
# fit model to data
measurement.fit <- cfa(measurement.model, sample.cov=jong.cov, sample.nobs=95)
# assess model fit
fitMeasures(measurement.fit, fit.measures = fit.measures)
##  chisq     df pvalue    cfi   srmr  rmsea 
## 56.727 45.000  0.113  0.982  0.047  0.052
# inspect latent covariances
inspect(measurement.fit, what = "cor.lv")
##           awrnss synths srn   gf    read 
## awareness 1.000                          
## synthesis 0.880  1.000                   
## srn       0.647  0.705  1.000            
## gf        0.734  0.620  0.585 1.000      
## read      0.545  0.525  0.591 0.456 1.000

4. Specify and fit the simultaneous regression model.

# specify regression model
regression.model <- '
# latent variable model
awareness =~ aware.a + aware.b + aware.c
synthesis =~ synth.a + synth.b + synth.c
srn =~ srn.a + srn.b
gf =~ gf.a + gf.b + gf.c

# regression
reading ~ awareness + synthesis + srn + gf 
'
# fit model to data
regression.fit <- sem(regression.model, sample.cov=jong.cov, sample.nobs=95)
# assess model fit
fitMeasures(regression.fit, fit.measures = fit.measures)
##  chisq     df pvalue    cfi   srmr  rmsea 
## 56.727 45.000  0.113  0.982  0.047  0.052
# inspect coefficients
regression.pe <- inspect(regression.fit, what = "std.coef")
regression.pe[regression.pe$op == "~", 1:4]
##       lhs op       rhs est.std
## 1 reading  ~ awareness   0.303
## 2 reading  ~ synthesis  -0.048
## 3 reading  ~       srn   0.418
## 4 reading  ~        gf   0.018
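
Before moving on, it is worth checking the total proportion of reading variance the four factors jointly explain; lavaan reports R² for the endogenous variables directly. This should come out near 0.40, matching the sum of the squared orthogonal coefficients obtained in step 5.

# R-squared for the endogenous variables (reading is among them)
inspect(regression.fit, what = "rsquare")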

5. Specify and fit the Cholesky decomposition model. For this example, the order of entry is: (a) Gf, (b) Awareness, (c) Synthesis, and (d) Serial Rapid Naming. Note that the factor numbering runs opposite to the entry order: f4, which underlies all four predictors, represents Gf (entered first), while f1, which underlies SRN alone, captures SRN's unique contribution (entered last). Because the Cholesky model is a reparameterization of the models above, it fits the data identically.

cholesky.model <- '
# latent variable model
awareness =~ aware.a + aware.b + aware.c
synthesis =~ synth.a + synth.b + synth.c
srn =~ srn.a + srn.b
gf =~ gf.a + gf.b + gf.c
# cholesky factors: f4 underlies all four predictors (Gf, entered first);
# f1 underlies srn alone (entered last)
f1 =~ srn
f2 =~ srn + synthesis
f3 =~ srn + synthesis + awareness
f4 =~ srn + synthesis + awareness + gf
# cholesky factors are mutually uncorrelated
f1 ~~ 0*f2 + 0*f3 + 0*f4
f2 ~~ 0*f3 + 0*f4
f3 ~~ 0*f4
# the predictor latents have no residual variance
gf ~~ 0*gf
awareness ~~ 0*awareness
synthesis ~~ 0*synthesis
srn ~~ 0*srn
# constraint to keep f2 variance positive
f2 ~~ a*f2
a > 0
# regression
reading ~ f1 + f2 + f3 + f4
'
# fit model to data
cholesky.fit <- sem(cholesky.model, sample.cov=jong.cov, sample.nobs=95)
# assess model fit
fitMeasures(cholesky.fit, fit.measures = fit.measures)
##  chisq     df pvalue    cfi   srmr  rmsea 
## 56.727 45.000  0.113  0.982  0.047  0.052
# inspect coefficients
cholesky.pe <- inspect(cholesky.fit, what = "std.coef")
cholesky.pe[cholesky.pe$op == "~", 1:4]
##       lhs op rhs est.std
## 1 reading  ~  f1   0.285
## 2 reading  ~  f2   0.103
## 3 reading  ~  f3   0.310
## 4 reading  ~  f4   0.456
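
Because f1 through f4 are constrained to be mutually uncorrelated, their squared standardized coefficients partition the explained variance. This can be verified with the same inspect call used in step 3:

# confirm the Cholesky factors are orthogonal
inspect(cholesky.fit, what = "cor.lv")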

In the hierarchical analysis, the standardized regression value associated with factor f4 is the correlation of reading with Gf (note that it matches the read–gf latent correlation of 0.456 from the measurement model). The square of this standardized regression value reflects the proportion of variance in reading explained by Gf: 0.456² ≈ 0.21.

The other values are semipartial correlations for the factors (variables) entered subsequently. The squares of these standardized regression values reflect the incremental proportion of variance (ΔR²) explained in reading after including each factor (variable). Thus, adding Awareness (f3) explains an additional 0.10 of reading's variance. For Synthesis (f2) and SRN (f1), the respective amounts of additional variance explained are 0.01 and 0.08.
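
Putting the pieces together, a small arithmetic check on the coefficients above reproduces the decomposition:

# squared standardized coefficients = variance decomposition of reading
round(c(gf = 0.456, awareness = 0.310, synthesis = 0.103, srn = 0.285)^2, 2)
##        gf awareness synthesis       srn 
##      0.21      0.10      0.01      0.08
# the four values sum to the total R-squared of about 0.40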

References

de Jong, P. F. (1999). Hierarchical regression analysis in structural equation modeling. Structural Equation Modeling: A Multidisciplinary Journal, 6, 198-211. doi: 10.1080/10705519909540128
