Mixed model

August 29, 2020

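Fixed effects: dummy vars for levels chosen by the PI; each level is of interest in itself, the same levels would be reused in a replication, and results are not extrapolated to other levels.

Random effects: levels are treated as a sample from a larger population, and we want to make inferences about that population rather than about the specific levels observed.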

Model selection: estat ic, cAIC4, glmmLasso (standardize vars first), gglasso (group lasso), AIC, DIC, MuMIn::dredge

Longitudinal data: i or t is level 1 (time), j is level 2 (subject)

Time can be continuous (growth curve model) or discrete, and can be unbalanced or differ among subjects. Only continuous time can have a random slope, and time should be coded the same way in the fe and re parts of the model.

A cross-level interaction should always be accompanied by a random slope for the level 1 variable involved.

Fixed effect: within-subject (micro) mean (population average in a single time period). Higher-level entities are treated as dummy vars in longitudinal or panel data, and between-subject variation is ignored: yij = b0 + b1xij + eij, estimated by least squares (BLUE). The only source of variability is the residual variance. It removes time-invariant heterogeneity. It stays consistent even when the individual-specific effects are correlated with the independent variables (i.e. unobserved confounding or endogeneity). It is equivalent to assuming that var(uj) is infinite, and to LSDV (least-squares dummy vars with a subject-specific intercept). Assumes MCAR; unmeasured time-invariant covariates may be correlated with X.
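A minimal sketch of how the within (fe) estimator removes time-invariant heterogeneity, using only the Python standard library. The data, the subject intercepts a_j, and the function names are hypothetical, made up for illustration: y is built as a_j + 2*x with no noise, and a_j is larger for the subject with larger x, so the pooled slope is confounded while the within slope recovers 2.

```python
from collections import defaultdict

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def within_slope(ids, x, y):
    """Within (fixed-effects) slope: demean x and y by subject, then OLS."""
    groups = defaultdict(list)
    for j, xi, yi in zip(ids, x, y):
        groups[j].append((xi, yi))
    xd, yd = [], []
    for rows in groups.values():
        mx = sum(r[0] for r in rows) / len(rows)
        my = sum(r[1] for r in rows) / len(rows)
        xd += [r[0] - mx for r in rows]
        yd += [r[1] - my for r in rows]
    return ols_slope(xd, yd)

# Hypothetical data: two subjects, three time points each.
ids = ["A", "A", "A", "B", "B", "B"]
x = [1, 2, 3, 4, 5, 6]
a = {"A": 0, "B": 10}  # time-invariant subject effects, correlated with x
y = [a[j] + 2 * xi for j, xi in zip(ids, x)]

pooled = ols_slope(x, y)         # ignores subjects, picks up a_j
within = within_slope(ids, x, y) # subject means removed
print("pooled slope:", pooled)
print("within slope:", within)
```

Demeaning by subject gives the same slope as including one dummy per subject (LSDV), which is why the two are described as equivalent.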

Random effect (multilevel, hierarchical linear model): between-subject (macro) variance components (the difference between each subject or cluster and the grand mean): uj + eij, both iid and homoscedastic with a mean of 0. uj is a level 2 residual (which is shrunk towards 0); eij is a level 1 residual. The sources of variability are the variance of the intercepts (uj) and the residual variance. Variance components are estimated by ML or reml and the random effects are predicted by BLUP; higher-level entities (level 2 residuals) are treated as draws from a distribution. A subject's coefficient is the sum of the fixed coef (global mean) and the random coef. It is efficient, and is compared to the fixed effects model by the Hausman test. cov(exc) specifies the cov among the higher-level random effects, and res(ar1, t(time)) the cov among the level 1 residuals. The assumption is that the individual-specific effects are uncorrelated with the independent variables. It reduces the problem of multiple comparisons. Do not report p values. Assumes MAR; unmeasured covariates must be uncorrelated with X.
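The shrinkage of uj towards 0 can be sketched for the simplest case: a random-intercept model with no covariates and variance components treated as known. Under those assumptions the predicted uj is the subject's deviation from the grand mean multiplied by a factor between 0 and 1. The function names and numbers below are hypothetical.

```python
def icc(var_u, var_e):
    """Intraclass correlation: share of total variance that is between subjects."""
    return var_u / (var_u + var_e)

def shrinkage(var_u, var_e, n_j):
    """Shrinkage factor for a subject with n_j observations (between 0 and 1)."""
    return var_u / (var_u + var_e / n_j)

def blup_intercept(ybar_j, grand_mean, var_u, var_e, n_j):
    """Predicted random intercept u_j: the raw deviation, shrunk towards 0."""
    return shrinkage(var_u, var_e, n_j) * (ybar_j - grand_mean)

# Hypothetical values: var(u) = 4, var(e) = 16, subject mean 12, grand mean 10.
print(icc(4, 16))                        # between-subject share of variance
print(blup_intercept(12, 10, 4, 16, 4))  # 4 observations: strong shrinkage
print(blup_intercept(12, 10, 4, 16, 64)) # 64 observations: little shrinkage
```

Subjects with few observations, or data with a large residual variance, are pulled more strongly towards the grand mean.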

Sigma matrix (level 1 residual var-cov): compound symmetry (all repeated measurements have the same variance and all pairs of measurements have the same cov, which implies sphericity; as in repeated-measures ANOVA), unstructured (as in MANOVA), AR1
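To make two of these structures concrete, here is a small sketch that builds them as nested lists. The function names and parameter values are hypothetical; an unstructured matrix is not shown because it has no pattern, only a free parameter for every variance and every covariance.

```python
def compound_symmetry(n, var, cov):
    """n x n matrix: same variance on the diagonal, same covariance elsewhere."""
    return [[var if i == j else cov for j in range(n)] for i in range(n)]

def ar1(n, var, rho):
    """n x n AR(1) matrix: covariance decays as rho**|i - j| with the time lag."""
    return [[var * rho ** abs(i - j) for j in range(n)] for i in range(n)]

# Hypothetical example with 3 repeated measurements.
for row in compound_symmetry(3, 2.0, 0.5):
    print(row)
for row in ar1(3, 2.0, 0.5):
    print(row)
```

Compound symmetry needs 2 parameters and AR1 needs 2, while an unstructured matrix for n time points needs n(n + 1)/2.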

Var-cov matrix for random intercept and random slope (G matrix): pdSymm (unstructured, default), pdCompSymm, pdDiag (var components)

REWB (random effects within-between, or hybrid, model: https://link.springer.com/article/10.1007/s11135-018-0802-x): generally the preferred model, since it captures both within and between effects. It should be used when time-varying vars correlate with the re (group factors).
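Written out for a single time-varying predictor, the within-between idea splits xij into a subject mean and a deviation from it. The following is a sketch in the notation used above; the coefficient labels b_W and b_B are introduced here for illustration and are not taken from the linked paper.

```latex
y_{ij} = b_0 + b_W \,(x_{ij} - \bar{x}_j) + b_B \,\bar{x}_j + u_j + e_{ij},
\qquad u_j \sim N(0, \sigma_u^2), \quad e_{ij} \sim N(0, \sigma_e^2)
```

Here b_W is the within-subject effect (what fe estimates) and b_B is the between-subject effect; a plain re model forces the two to be equal.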

Conditional mean, parametric (specify all moments: mean, var, skewness, kurtosis)

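Estimation: ML (for N > 30; variance components are biased downward in small samples; must be used to compare models with different fixed effects), reml (for N < 30; less biased variance components, but its likelihood is not comparable across models with different fixed effects)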
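The difference between the two estimators is easiest to see in the simplest special case, which is an assumption made here for illustration: an intercept-only linear model with no random effects. There the ML variance estimate divides the residual sum of squares by n, while reml divides by n minus the number of fixed-effect parameters (here 1). The data are hypothetical.

```python
import statistics

# Hypothetical sample of n = 8 observations.
y = [2, 4, 4, 4, 5, 5, 7, 9]

ml_var = statistics.pvariance(y)   # divides by n: the ML estimate
reml_var = statistics.variance(y)  # divides by n - 1: the REML estimate

print("ML variance:  ", ml_var)
print("REML variance:", reml_var)
print("ratio ML/REML:", ml_var / reml_var)  # equals (n - 1) / n
```

The ratio approaches 1 as n grows, which is why the choice matters mostly for small samples.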

GOF: Hausman test (compares the fe and re coefficient estimates; options sigmamore or sigmaless); if they differ significantly, choose fe.
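For a single coefficient the Hausman statistic reduces to a scalar formula, sketched below with the standard library only. The function name and the input numbers are hypothetical; the statistic is compared to a chi-square distribution with 1 degree of freedom, whose upper tail can be written with erfc. With several coefficients the same idea uses vectors and the inverse of the difference of the two covariance matrices.

```python
import math

def hausman_1df(b_fe, b_re, var_fe, var_re):
    """Hausman statistic and p value for one coefficient (chi-square, 1 df)."""
    diff_var = var_fe - var_re
    if diff_var <= 0:
        # fe should be less efficient than re; otherwise the statistic is undefined
        raise ValueError("var_fe must exceed var_re")
    h = (b_fe - b_re) ** 2 / diff_var
    p = math.erfc(math.sqrt(h / 2))  # P(chi2 with 1 df > h)
    return h, p

# Hypothetical estimates: fe and re slopes with their sampling variances.
h, p = hausman_1df(b_fe=2.0, b_re=1.4, var_fe=0.05, var_re=0.02)
print("H =", h, "p =", p)
```

A small p value indicates that the fe and re estimates differ by more than sampling error would explain, which points to fe.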

Assumptions: normality, linearity, between-subject homoscedasticity

Correct only if intraclass correlation correct

Clusters: panels, multiple; N should be > 30 at the highest level

Trajectories: individual

Time-invariant xs are level 2 variables, whose effects cannot be estimated by fe.

Centering (grand mean or cluster mean) of time-varying predictors helps convergence and the interpretation of the intercept and interactions, and reduces collinearity (https://link.springer.com/article/10.1007%2Fs11135-018-0802-x).
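The two kinds of centering can be sketched as follows, using only the standard library. The function names and data are hypothetical. Cluster-mean centering returns both the within-cluster deviation and the cluster mean itself, since the mean is usually kept as a separate level 2 predictor.

```python
from collections import defaultdict

def grand_mean_center(x):
    """Subtract the overall mean from every observation."""
    m = sum(x) / len(x)
    return [xi - m for xi in x]

def cluster_mean_center(ids, x):
    """Return (within, between): deviations from cluster means, and the means."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for j, xi in zip(ids, x):
        sums[j] += xi
        counts[j] += 1
    means = {j: sums[j] / counts[j] for j in sums}
    within = [xi - means[j] for j, xi in zip(ids, x)]
    between = [means[j] for j in ids]
    return within, between

# Hypothetical data: two clusters with two observations each.
ids = ["A", "A", "B", "B"]
x = [1, 3, 10, 14]
print(grand_mean_center(x))
within, between = cluster_mean_center(ids, x)
print(within)
print(between)
```

After cluster-mean centering the within part has mean 0 in every cluster, so it is uncorrelated with the cluster means.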

REWB (re within-between) is the best model, where we add x̄j as a predictor in the re model.

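xtreg y x, re fits the same random-intercept model as mixed y x || id: (xtreg uses GLS by default; xtreg y x, mle gives ML estimates that match mixed).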

Assumptions:

1. All relevant predictors are included in the model

2. All relevant random effects are included in the model

3. The covariance structure of the within-cluster residuals, R, is properly specified (when the outcome is continuous)

4. The covariance structure of the random effects, G, is properly specified (for all outcome scales)

5. The within-cluster residuals and the random effects do not covary [Cov(u, ε) = 0 ]

6. The within-cluster residuals follow a multivariate normal distribution (when the outcome is continuous).

7. The random effects follow a multivariate normal distribution (for all outcome scales).

8. The predictor variables do not covary with the residuals/random effects at any other level [Cov(X,ε)=0, Cov(X,u)=0].

9. Sample size is sufficiently large for asymptotic inference at each level

10. With or without preprocessing, missing data are assumed to be missing completely at random (MCAR) or missing at random (MAR).
