glm: assumptions

August 29, 2020

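Assumptions of linear regression (Y = a + bX + e)

The effects of the Xs are additive (no interactions)

The Xs are fixed (nonrandom)

The Xs have no measurement error

The Xs are not perfectly collinear

Y is a continuous variable (no upper or lower bound, no truncation)

The residuals are independent and identically distributed (iid): they are uncorrelated with the Xs and show no autocorrelation (serial correlation, as arises with repeated-measures or longitudinal data)

The mean of e (the residuals) is zero

The residuals are normally distributed

X has variability

The model is correct (the relationship between the dependent variable Y and the independent variable X is linear) and pre-specified, not chosen post hoc (after model selection or variable selection)

No missing data

No outliers

No unmeasured confounding X variables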

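Gauss-Markov theorem (https://en.wikipedia.org/wiki/Gauss%E2%80%93Markov_theorem): if the errors e have zero mean, constant variance (homoscedasticity), and are uncorrelated with one another, OLS is the best linear unbiased estimator (BLUE). If e is also normally distributed, the OLS estimate equals the maximum likelihood estimate (MLE).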
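The OLS/MLE link can be checked numerically. The following is a minimal sketch, assuming a single predictor and simulated data. The function names `ols_fit` and `gaussian_loglik`, the true coefficients, and the sample size are all illustrative choices, not part of the original notes. It fits the line by the closed-form OLS formulas and then confirms that nearby coefficient pairs have a lower Gaussian log-likelihood.

```python
import math
import random


def ols_fit(x, y):
    """Closed-form OLS estimates (a, b) for y = a + b*x + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def gaussian_loglik(a, b, sigma2, x, y):
    """Log-likelihood of the data when e ~ Normal(0, sigma2)."""
    n = len(x)
    ssr = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ssr / (2 * sigma2)


# Simulated data (assumed values): true a = 1, b = 2, normal errors with SD 1.5.
random.seed(0)
x = [random.uniform(0, 10) for _ in range(200)]
y = [1.0 + 2.0 * xi + random.gauss(0, 1.5) for xi in x]

a_hat, b_hat = ols_fit(x, y)
residuals = [yi - a_hat - b_hat * xi for xi, yi in zip(x, y)]
sigma2_mle = sum(r ** 2 for r in residuals) / len(x)  # MLE of the variance divides by n

best = gaussian_loglik(a_hat, b_hat, sigma2_mle, x, y)
# Perturb the coefficients a little in every direction and recompute the likelihood.
nearby = [
    gaussian_loglik(a_hat + da, b_hat + db, sigma2_mle, x, y)
    for da in (-0.1, 0.1)
    for db in (-0.05, 0.05)
]

print("OLS estimates:", a_hat, b_hat)
print("mean residual:", sum(residuals) / len(residuals))
print("OLS pair has the highest likelihood:", all(v < best for v in nearby))
```

For a fixed error variance the Gaussian log-likelihood is a decreasing function of the residual sum of squares, so the pair that minimizes the sum of squares (OLS) is also the pair that maximizes the likelihood. The in-sample residual mean is zero by construction whenever the model includes an intercept.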

Model

Pre-specified: no model or variable selection

Linearity (Rx: gamsel, krls, splines, fracpoly, svm)

No misspecification (Rx: krls)

No missing data (Rx: imputation)

No outliers (Rx: robust methods)
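To illustrate why outliers call for robust methods, here is a small sketch that compares the OLS slope with the Theil-Sen slope (the median of all pairwise slopes). Theil-Sen is only one possible robust method, picked here for illustration; the notes above do not name a specific one. The data and the function names are made up for the example.

```python
import statistics


def ols_slope(x, y):
    """OLS slope for a single predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


def theil_sen_slope(x, y):
    """Median of the slopes over all pairs of points with distinct x."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i in range(len(x))
        for j in range(i + 1, len(x))
        if x[j] != x[i]
    ]
    return statistics.median(slopes)


x = list(range(10))               # 0, 1, ..., 9
y = [2.0 * xi + 1.0 for xi in x]  # points on an exact line with slope 2
y[9] = 100.0                      # replace the last point with a gross outlier

print("OLS slope:      ", ols_slope(x, y))
print("Theil-Sen slope:", theil_sen_slope(x, y))
```

A single contaminated point at the edge of the X range pulls the OLS slope from 2 to roughly 6.4, while the Theil-Sen slope stays at 2 because most pairwise slopes are unaffected.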

Variables

Not bounded: Rx by logit

Not censored: Rx by survival analysis, tobit models

Not truncated
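For a bounded outcome, one reading of "Rx by logit" is to transform a proportion in (0, 1) to the logit scale before modelling (the other reading is logistic regression, which is not shown here). A minimal sketch of the transform and its inverse follows; the function names and example values are illustrative.

```python
import math


def logit(p):
    """Map a proportion in (0, 1) onto the whole real line."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Map any real number back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


proportions = [0.05, 0.25, 0.5, 0.75, 0.95]
transformed = [logit(p) for p in proportions]
recovered = [inv_logit(z) for z in transformed]

print("logit scale:", transformed)
print("round trip: ", recovered)
# However extreme a linear prediction is on the logit scale,
# its back-transformed value stays strictly inside (0, 1).
print(inv_logit(-8.0), inv_logit(8.0))
```

The transform is undefined at exactly 0 or 1, so outcomes that sit on the boundary need separate handling.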

Predictors

Fixed: nonrandom

Additive: no interactions, Rx by krls

No measurement errors: Rx by eivreg, SEM

No unmeasured confounders (Rx: sensitivity analysis, E values)

No collinearity: Rx by pcareg, pls, ridge

Having variance
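Collinearity among predictors can be screened with the variance inflation factor (VIF). The sketch below assumes exactly two predictors, where the VIF reduces to 1 / (1 - r^2) with r the Pearson correlation between them. The VIF is a common diagnostic but is not mentioned in the notes above, and the simulated data and function names are illustrative.

```python
import math
import random


def pearson_r(u, v):
    """Pearson correlation between two equally long sequences."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    suu = sum((a - u_bar) ** 2 for a in u)
    svv = sum((b - v_bar) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def vif_two_predictors(x1, x2):
    """With exactly two predictors, both share VIF = 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


random.seed(1)
x1 = [random.gauss(0, 1) for _ in range(500)]
x2_independent = [random.gauss(0, 1) for _ in range(500)]
x2_collinear = [xi + random.gauss(0, 0.1) for xi in x1]  # nearly a copy of x1

vif_low = vif_two_predictors(x1, x2_independent)
vif_high = vif_two_predictors(x1, x2_collinear)
print("VIF, unrelated predictors:     ", vif_low)
print("VIF, nearly collinear predictors:", vif_high)
```

A VIF near 1 means the predictors carry separate information; a very large VIF means the coefficient variances are heavily inflated. With perfect collinearity r^2 equals 1 and the division fails, which mirrors the fact that OLS has no unique solution in that case. A predictor with no variance breaks `pearson_r` the same way.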

Residuals (OLS)

Normality: as a rule of thumb, the CLT kicks in if n > 30 or there are more than 10 observations per variable. Affects CI, not bias. Rx by median/quantile regression, robust SE

Homoscedasticity: affects CI, not bias. Rx by heteroscedasticity-robust SE

iid (no autocorrelation or clustering): affects CI, not bias. Rx by mixed models or GEE (marginal models), cluster-robust SE

Not correlated with Xs

Zero mean
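The claim that heteroscedasticity affects the CI but not the bias can be seen in a small simulation. The sketch below assumes a single predictor and uses the HC0 (White) estimator as one example of a heteroscedasticity-robust SE. The error model (SD proportional to x squared), the sample size, and the function name are illustrative assumptions.

```python
import math
import random


def slope_with_standard_errors(x, y):
    """Return the OLS slope, its classical SE, and its HC0 robust SE."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d ** 2 for d in dx)
    b = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    a = y_bar - b * x_bar
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    # Classical SE: assumes one common error variance for all observations.
    s2 = sum(r ** 2 for r in resid) / (n - 2)
    se_classical = math.sqrt(s2 / sxx)
    # HC0 SE: lets every observation contribute its own squared residual.
    se_hc0 = math.sqrt(sum((d * r) ** 2 for d, r in zip(dx, resid))) / sxx
    return b, se_classical, se_hc0


random.seed(2)
x = [random.uniform(0, 10) for _ in range(5000)]
# The error SD grows with x, so the errors are heteroscedastic (true slope = 2).
y = [1.0 + 2.0 * xi + random.gauss(0, 0.1 * xi ** 2) for xi in x]

b, se_classical, se_hc0 = slope_with_standard_errors(x, y)
print("slope:       ", b)
print("classical SE:", se_classical)
print("HC0 SE:      ", se_hc0)
```

The slope estimate should stay close to the true value of 2, which is the "not bias" part. The classical SE should come out smaller than the HC0 SE in this setup, so a CI built from it would be too narrow.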
