ISREL is a stand-alone
software product designed to estimate and test Structural Equation Models
(SEMs). SEMs are statistical models of linear relationships among latent
(unobserved) variables and manifest (observed) variables. Lisrel can also
carry out exploratory and confirmatory factor analysis, as well as path
analysis.
LISREL uses the correlations
or covariances among measured variables such as survey items to estimate
or infer the values of factor loadings, variances, and errors of latent
(unobserved) variables. LISREL syntax prior to version 8 relied solely on
the specification of the nature of eight matrices: LX, LY, TD, TE, PH, PS,
GA, and BE. These matrices detail the interrelationships of the manifest
variables with the latent variables in any given model. In this way it can
perform factor and path analyses. Furthermore, LISREL's flexibility allows
it to also estimate the relationships among latent variables with other
latent variables.
LISREL now features a new
programming language called SIMPLIS, which allows the LISREL user to
program in a language approximating plain English rather than explicitly
specifying matrix values, as was the case in previous versions of LISREL.
However, you may still use the older LISREL matrix syntax with this
version.
New
Features
- Structural Equation
Modeling with incomplete data: Efficient Full Information Maximum
Likelihood (FIML) for incomplete data that are missing at random.
- Multilevel Structural
Equation Modeling with complete and incomplete data.
- Nonlinear Multilevel
Modeling: Two-level nonlinear regression models.
- Exploratory Data Analysis:
Formal Inference-based Recursive Modeling (FIRM) for detecting complex
statistical relationships among categorical and continuous variables.
- Multiple Imputation:
Expected Maximization (EM) or Markov Chain Monte Carlo (MCMC) for
imputing incomplete data that are missing at random under the assumption
of an underlying multivariate normal distribution.
- PRELIS System Files:
LISREL 8.50 for Windows uses a PRELIS System File (*. psf ) to store information such as number of observations, number
of variables, variable names, type of variable, category labels, missing
value codes and the raw data. When opened the .PSF file is displayed in
spreadsheet format and a PSF toolbar appears which enables users to make
graphic displays, define and compute variables and analyze the data. A
*. psf file can now also be specified as part of the
LISREL or SIMPLIS syntax. Use of a *. psf file greatly
facilitates the ability to draw path diagrams and build syntax
interactively.
- External Data Sources:
Import data from numerous external data sources with no limitations on
the number of observations and variables, except for those imposed by
the computer resources.
- Additional Graphical
Displays: Pie charts, Box-and-Whisker plots and matrix scatterplots.
- Windows Interface: A 32
bit Windows interface for Windows 95, 98, Me, NT, 2000 and XP supporting
long path and file names. Note that Windows 3.1+ is no longer supported.
- Documentation: Revised
online Help file covering all new features and a 500 page user's guide
describing the LISREL user interface, new statistical features and
syntax.
LISREL
Historical Background
Most first, and even second,
courses in applied statistics seldom go much further than ordinary least
squares analysis of data from controlled experiments, group comparisons,
or simple prediction studies. Collectively, these procedures make up
regression analysis, and the linear mathematical functions on which they
depend are referred to as regression models. This basic method of data
analysis is quite suitable for curve-fitting problems in physical science,
where an empirical relationship between an observed dependent variable and
a manipulated independent variable must be estimated. It also serves well
the purposes of biological investigation in which organisms are assigned
randomly to treatment conditions and differences in the average responses
among the treatment groups are estimated. An essential feature of these
applications is that only the dependent variable or the observed response
is assumed to be subject to measurement error or other uncontrolled
variation. That is, there is only one random variable in the picture. The
independent variable or treatment level is assumed to be fixed by the
experimenter at known predetermined values. The only exception to this
formulation is the empirical prediction problem. For that purpose, the
investigator observes certain values of one or more predictor variables
and wishes to estimate the mean and variance of the distribution of a
criterion variable among respondents with given values of the predictors.
Because the prediction is conditional on these known values, they may be
considered fixed quantities in the regression model. An example is
predicting the height that a child will attain at maturity from his or her
current height and the known heights of the parents. Even though all of
the heights are measured subject to error, only the childs height at
maturity is considered a random variable. Where ordinary regression
methods no longer suffice, and indeed give misleading results, is in
purely observational studies in which all variables are subject to
measurement error or uncontrolled variation and the purpose of the inquiry
is to estimate relationships that account for variation among the
variables in question. This is the essential problem of data analysis in
those fields where experimentation is impossible or impractical and mere
empirical prediction is not the objective of the study. It is typical of
almost all research in fields such as sociology, economics, ecology, and
even areas of physical science such as geology and meteorology. In these
fields, the essential problem of data analysis is the estimation of
structural relationships between quantitative observed variables. When the
mathematical model that represents these relationships is linear we speak
of a linear structural relationship. The various aspects of
formulating, fitting, and testing such relationships we refer to as structural equation modeling . Although structural equation
modeling has become a prominent form of data analysis only in the last
twenty years (thanks in part to the availability of the LISREL program),
the concept was first introduced nearly eighty years ago by the population
biologist, Sewell Wright, at the University of Chicago. He showed that
linear relationships among observed variables could be represented in the
form of so-called path diagrams and associated path
coefficients . By tracing causal and associational paths on the
diagram according to simple rules, he was able to write down immediately
the linear structural relationship between the variables. Wright applied
this technique initially to calculate the correlation expected between
observed characteristics of related persons on the supposition of
Mendelian inheritance. Later, he applied it to more general types of
relationships among persons. The modern form of linear structural analysis
includes an algebraic formulation of the model in addition to the path
diagram representation. The two forms are equivalent and the
implementation of the analysis in the LISREL program permits the user to
submit the model to the computer in either representation. The path
analytic approach is excellent when the number of variables involved in
the relationship is moderate, but the diagram becomes cumbersome when the
number of variables is large. In that case, writing the relationships
symbolically is more convenient. The SIMPLIS manual presents examples of
both representations and makes clear the correspondence between the paths
and the structural equations. Notice that in the above mentioned fields in
which experimentation is hardly ever possible, psychology and education do
not appear. Controlled experiments with both animal and human subjects
have been a mainstay of psychological research for more than a century,
and in the 1920s experimental evaluations of instructional methods began
to appear in education. As empirical research developed in these fields,
however, a new type of data analytic problem became apparent that was not
encountered in other fields. In psychology, the difficulty was, and still
is, that for the most part there are no well-defined dependent variables.
The variables of interest differ widely from one area of psychological
research to another and often go in and out of favor within areas over
relatively short periods of time. Psychology has been variously described
as the science of behavior or the science of human information processing.
But the varieties of behavior and information handling are so multifarious
that no progress in research can be made until investigators identify the
variables to be studied and the method of observing them. Where headway
has been made in defining a coherent domain of observation, it has been
through the mediation of a construct-some putative latent variable that is
modified by stimuli from various sources and in turn controls or
influences various observable aspects of behavior. The archetypal example
of such a latent variable is the construct of general intelligence
introduced by Charles Spearman to account for the observed positive
correlations between successful performance on many types of
problem-solving tasks. Investigation of mathematical and statistical
methods required in validating constructs and measuring their influence
led to the development of the data analytic procedure called factor
analysis . Its modern form is due largely to the work of Truman Kelly
and L.L.Thurstone, who transformed Spearman's one-factor analysis into a
fully general multiple-factor analysis. More recently, Karl Jöreskog added
confirmatory factor analysis to the earlier exploratory form of analysis.
The two forms serve different purposes. Exploratory factor analysis is an
authentic discovery procedure: it enables one to see relationships among
variables that are not at all obvious in the original data or even in the
correlations among variables. Confirmatory factor analysis enables one to
test whether relationships expected on theoretical grounds actually appear
in the data. Derrick Lawley and Karl Jöreskog provided a statistical
procedure, based on maximum likelihood estimation, for fitting factor
models to data and testing the number of factors that can be detected and
reliably estimated in the data. Similar problems of defining variables
appear in educational research, even in experimental studies of
alternative methods of instruction. The goals of education are broad and
the outcomes of instruction are correspondingly many: an innovation in
instructional practice may lead to a gain in some measured outcomes and a
loss in others. The investigator can measure a great many such outcomes,
but unless all are favorable or all unfavorable the results become too
complex to discuss or provide any guide to educational policy. Again,
factor analysis is a great assistance in identifying the main dimensions
of variation among outcomes and suggesting parsimonious constructs for
their discussion. In the LISREL model, the linear structural relationship
and the factor structure are combined into one comprehensive model
applicable to observational studies in many fields. The model allows
- multiple latent constructs
indicated by observable explanatory (or exogenous ) variables,
- recursive and nonrecursive
relationships between constructs, and
- multiple latent constructs
indicated by observable responses (or endogenous ) variables.
The connections between the
latent constructs compose the structural equation model; the relationships
between the latent constructs and their observable indicators or outcomes
compose the factor models. All parts of the comprehensive model may be
represented in the path diagram and all factor loadings and structural
relationships appear as coefficients of the path. Nested within the
general model are simpler models that the user of the LISREL program may
choose as special cases. If some of the variables involved in the
structural relationships are observed directly, rather than indicated,
part or all of the factor model may be excluded. Conversely, if there are
no structural relationships, the model may reduce to a confirmatory factor
analysis applicable to the data in question. Finally, if the data arise
from a simple prediction problem or controlled experiment in which the
independent variable or treatment level is measured without error, the
user may specialize to a simple regression model and obtain the standard
results of ordinary least-squares analysis. These specializations may be
communicated to the LISREL computer program in three different ways. At
the most intuitive, visual level, the user may construct the path diagram
interactively on the screen, and specify paths to be included or excluded.
The corresponding verbal level is the SIMPLIS command language. It
requires only that the user name the variables and declare the
relationships among them. The third and most detailed level is the LISREL
command language. It is phrased in terms of matrices that appear in the
matrix-algebraic representation of the model. Various parameters of the
matrices may be fixed or set equal to other parameters, and linear and
non-linear constraints may be imposed among them. The terms and syntax of
the LISREL command language are explained and illustrated in the LISREL
program manuals. Most but not all of these functions are included in the
SIMPLIS language; certain advanced functions are only possible in native
LISREL commands. The essential statistical assumption of LISREL analysis
is that random quantities within the model are distributed in a form
belonging to the family of elliptical distributions, the most prominent
member of which is the multivariate normal distribution. In applications
where it is reasonable to assume multivariate normality, the maximum
likelihood method of estimating unknowns in the model is justified and
usually preferred. Where the requirements of maximum likelihood estimation
are not met, as when the data are ordinal rather than measured, the
various least squares estimation methods are available. It is important to
understand, however, except in those cases where ordinary least squares
analysis applies or the weight matrices of other least squares methods are
known, that these are large-sample estimation procedures. This is not a
serious limitation in observation studies, where samples are typically
large. Small-sample theory applies properly only to controlled experiments
and only when the model contains a single, univariate or multivariate
normal error component. The great merit of restricting the analytical
methods to elliptically distributed variation is the fact that the sample
mean and covariance matrix (or correlation matrix and standard deviations)
are sufficient statistics of the analysis. This allows all the information
in the data that bear on the choice and fitting of the model to be
compressed into the relatively small number of summary statistics. The
resulting data compression is a tremendous advantage in large-scale
sample-survey studies, where the number of observations may run to the
tens of thousands, whereas the number of sufficient statistics are of an
order of magnitude determined by the number of variables. The operation of
reducing the raw data to their sufficient statistics (while cleaning and
verifying the validity for the data) is performed by the PRELIS program
which accompanies LISREL. PRELIS also computes summary statistics for
qualitative data in the form of tetrachoric or polychoric correlation
matrices. When there are several sample groups, and the LISREL model is
defined and compared across the groups, PRELIS prepares the sufficient
statistics for each sample in turn. In many social and psychological or
educational research studies where a single sample is involved, the
variables are ususally measured on a scale with an arbitrary origin. In
that case, the overall means of the variables in the sample can be
excluded from the analysis, and the fitting of the LISREL model can be
regarded simply as an analysis of the covariance structure, in which case
the expected covariance matrix implied by the model is fitted to the
observed covariance matrix directly. Since the sample covariance matrix is
a sufficient statistic under the distribution assumption, the result is
equivalent to fitting the data. Again, the analysis is made more
manageable because one can examine the residuals from the observed
covariances, which are moderate in number, as opposed to analyzing
residuals of the original observations in a large sample. Many of these
aspects of the LISREL analysis are brought out in the examples in the PRELIS and LISREL
program manuals . In addition, the SIMPLIS manual contains exercises to help the student strengthen and expand his or
her understanding of this powerful method of data analysis. Files
containing the data of these examples are included with the program and
can be analyzed in numerous different ways to explore and test alternative
models. The PRELIS, LISREL and SIMPLIS manuals are included with the
LISREL program. A complete manual list is in the SSI bookstore . Also
see the reference
page for further reading material on structural equation modeling.
Description of the LISREL
model
The LISREL model, in its most
general form, consists of a set of linear structural equations. Variables
in the equation system may be either directly observed variables or
unmeasured latent (theoretical) variables that are not observed but relate
to observed variables. It is assumed in the model that there is a causal
structure among a set of latent variables, and that the observed variables
are indicators of the latent variables. The model consists of two parts,
the measurement model and the structural equation
model :
- The measurement model
specifies how latent variables or hypothetical constructs depend upon or
are indicated by the observed variables. It describes the measurement
properties (reliabilities and validities) of the observed variables.
- The structural equation
model specifies the causal relationships among the latent variables,
describes the causal effects, and assigns the explained and unexplained
variance.
- The LISREL method
estimates the unknown coefficients of the set of linear structural
equations. It is particularly designed to accommodate models that
include latent variables, measurement errors in both dependent and
independent variables, reciprocal causation, simultaneity, and
interdependence.
The method includes as special cases
such procedures as
- confirmatory factor
analysis,
- multiple regression
analysis,
- path analysis,
- economic models for
time-dependent data,
- recursive and
non-recursive models for cross-sectional / longitudinal data,
- covariance structure
models and
- multi-sample analysis.
The full LISREL model for
single samples is defined, for deviations about the means, by the
following three equations.
- The structural equation
model: h = B h + G x + z
- The measurement model for
x : x = L x x + d
- The measurement model for
y : y = L y h + e
The terms in these models are
defined as follows:
- h is a m x 1 random vector of latent dependent, or endogenous,
variables
- x is a n x 1 random vector of latent independent, or exogenous,
variables
- y is a p x1 vector of observed indicators of the dependent latent
variables h
- x is a q x 1 vector of observed indicators of the independent latent
variables x
- e is a p x 1 vector of measurement errors in y
- d is a q x 1 vector of measurement errors in x
- L y is a p x m matrix of coefficients of the regression of y on h
- L x is a q x n matrix of coefficients of the regression of x on x
- G is a m x n matrix of coefficients of the x - variables in
the structural relationship
- B is a m x m matrix of coefficients of the h - variables in
the structural relationship. ( B has zeros in the
diagonal, and I - B is required to be
non-singular)
- z is a m x 1 vector of equation errors (random disturbances) in the
structural relationship between h and x
The random components in the LISREL
model are assumed to satisfy the following minimal assumptions:
- e is
uncorrelated with h
- d is uncorrelated with x
- z is
uncorrelated with x
z is uncorrelated with e and d.
Covariance matrices
- Cov ( x )
= F ( n x n )
- Cov ( e )
= Q e ( p x p )
- Cov ( z )
= Y ( m x m )
- Cov ( d )
= Q d ( q x q )
Implied covariance matrix
The assumptions in
the previous section imply the following form for the covariance matrix of
the observed variables:
where A is inverse( I - B ).
Fixed, free and constrained
parameters
The general LISREL
model is specialized by fixing and constraining the parameters that
compromise the elements in L y , L x , B, G, Y, F and Q d . The elements are of three kinds:
- Fixed parameters - assigned specific values
- Constrained parameters - unknown but equal to a function of one or more other unknown
parameters
- Free parameters -
unknown and not constrained to be equal to other parameters
|