glm: assumptions
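Assumptions of linear regression (Y = a + bX + e)
The effects of the Xs are additive (no interactions)
The Xs are fixed (nonrandom)
The Xs are measured without error
The Xs are not perfectly collinear
Y is continuous (not bounded above or below, not truncated)
The residuals are independent and identically distributed (iid) and uncorrelated with the Xs; in particular there is no autocorrelation (serial correlation, as in repeated-measures or longitudinal data)
The mean of e (the residual) is zero
The residuals are normally distributed
X has variance (it is not constant)
The model is correct (the relationship between the dependent variable Y and the independent variable X is linear) and pre-specified, not chosen post hoc (after model selection or variable selection)
No missing data
No outliers
No unmeasured confounders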
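Gauss-Markov theorem (https://en.wikipedia.org/wiki/Gauss%E2%80%93Markov_theorem): if e has zero mean, constant variance (homoscedasticity), and is uncorrelated across observations, then OLS is the best linear unbiased estimator (BLUE). If e is also normally distributed, the OLS estimate equals the maximum likelihood estimate (MLE).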
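The link between OLS and MLE noted above can be checked numerically. The following is a minimal sketch for simple regression, assuming made-up data and hypothetical helper names (ols_fit, gaussian_loglik); it is an illustration, not a reference implementation. It fits the line in closed form, confirms that the residuals average to zero, and shows that moving the coefficients away from the OLS values lowers the Gaussian log-likelihood.

```python
import math

def ols_fit(x, y):
    """Closed-form OLS for simple regression y = a + b*x + e."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def gaussian_loglik(x, y, a, b, sigma2):
    """Log-likelihood of the data under y ~ Normal(a + b*x, sigma2)."""
    n = len(x)
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return -0.5 * n * math.log(2 * math.pi * sigma2) - rss / (2 * sigma2)

# Illustrative data only (not from the original notes).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a_hat, b_hat = ols_fit(x, y)
residuals = [yi - a_hat - b_hat * xi for xi, yi in zip(x, y)]
# MLE of the error variance divides by n (not n - 2).
sigma2_hat = sum(e ** 2 for e in residuals) / len(x)

ll_ols = gaussian_loglik(x, y, a_hat, b_hat, sigma2_hat)
ll_shifted = gaussian_loglik(x, y, a_hat + 0.1, b_hat, sigma2_hat)
ll_tilted = gaussian_loglik(x, y, a_hat, b_hat - 0.05, sigma2_hat)

print("intercept, slope:", a_hat, b_hat)
print("mean residual:", sum(residuals) / len(residuals))
print("log-likelihoods (OLS, shifted, tilted):", ll_ols, ll_shifted, ll_tilted)
```

For a fixed error variance, the Gaussian log-likelihood is a decreasing function of the residual sum of squares, which is why the least-squares coefficients also maximize it.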
Model
Pre-specified: no model or variable selection
Linearity (Rx: gamsel, krls, splines, fracpoly, svm)
No misspecification (Rx: krls)
No missing data (Rx: imputation)
No outliers (Rx: robust methods)
Variables
Not bounded: Rx by logit
Not censored: Rx by survival analysis, tobit models
Not truncated
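One possible reading of "Rx by logit" is a logit transform of an outcome that is bounded in (0, 1), such as a proportion; the note could equally refer to logistic regression. The sketch below assumes the first reading, uses made-up proportions, and the helper names logit and inv_logit are hypothetical. It shows that the transform maps the bounded scale onto the whole real line and can be inverted.

```python
import math

def logit(p):
    """Map a proportion in the open interval (0, 1) to the real line."""
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Map a real number back to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative proportions only; exact 0 or 1 cannot be transformed.
proportions = [0.05, 0.25, 0.5, 0.9]
transformed = [logit(p) for p in proportions]
recovered = [inv_logit(z) for z in transformed]

for p, z, r in zip(proportions, transformed, recovered):
    print(p, "->", z, "->", r)
```

Predictions made on the logit scale can be passed through inv_logit, so they never fall outside the bounds of the original outcome.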
Predictors
Fixed: nonrandom
Additive: no interactions, Rx by krls
No measurement errors: Rx by eivreg, SEM
No unmeasured confounders (Rx: sensitivity analysis, E values)
No collinearity: Rx by pcareg, pls, ridge
Having variance
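The collinearity item in the list above is often screened with the variance inflation factor (VIF). With exactly two predictors the VIF reduces to 1 / (1 - r^2), where r is their Pearson correlation. The sketch below assumes that two-predictor case, uses made-up data, and the helper names pearson_r and vif_two_predictors are hypothetical. Note that pearson_r divides by the sum of squares of each predictor, so a predictor with no variance would raise a ZeroDivisionError, which is the "Having variance" requirement in miniature.

```python
import math

def pearson_r(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def vif_two_predictors(u, v):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(u, v)
    return 1.0 / (1.0 - r ** 2)

# Illustrative data only.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 3.9, 6.2, 7.9, 10.1, 12.0]   # roughly 2 * x1: nearly collinear
x3 = [1.0, 0.0, -1.0, -1.0, 0.0, 1.0]   # uncorrelated with x1 by construction

vif_collinear = vif_two_predictors(x1, x2)
vif_orthogonal = vif_two_predictors(x1, x3)

print("VIF for x1 with x2:", vif_collinear)
print("VIF for x1 with x3:", vif_orthogonal)
```

A common rule of thumb treats a VIF above about 10 as a sign of problematic collinearity; with more than two predictors each VIF comes from regressing one predictor on all the others.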
Residuals (OLS)
Normality: by the CLT the coefficient estimates are approximately normal in large samples (rules of thumb: n > 30, or more than 10 observations per variable). Affects CI, not bias. Rx by median/quantile regression, robust SE
Homoscedasticity: affects CI, not bias. Rx by heteroscedasticity-robust SE
iid (no autocorrelation or clustering): affects CI, not bias. Rx by mixed models or GEE (marginal models), cluster-robust SE
Not correlated with Xs
Zero mean
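The remark that heteroscedasticity affects the CI but not the bias can be illustrated by computing two standard errors for the same slope. The sketch below assumes simple regression, made-up data whose noise grows with x, and a hypothetical helper name (slope_with_ses). It uses the HC0 (White) sandwich form, one of several heteroscedasticity-robust variants. The point estimate is identical in both cases; only the standard error changes.

```python
import math

def slope_with_ses(x, y):
    """OLS slope with classical and HC0 robust standard errors (simple regression)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [xi - mean_x for xi in x]
    sxx = sum(d ** 2 for d in dx)
    b = sum(d * (yi - mean_y) for d, yi in zip(dx, y)) / sxx
    a = mean_y - b * mean_x
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]

    # Classical SE assumes a single error variance, estimated with n - 2 df.
    s2 = sum(e ** 2 for e in resid) / (n - 2)
    se_classical = math.sqrt(s2 / sxx)

    # HC0 SE lets each observation carry its own squared residual.
    se_hc0 = math.sqrt(sum((d ** 2) * (e ** 2) for d, e in zip(dx, resid)) / sxx ** 2)
    return b, se_classical, se_hc0

# Illustrative data only: noise magnitude increases with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
noise = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8]
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, noise)]

b_hat, se_classical, se_hc0 = slope_with_ses(x, y)
print("slope:", b_hat)
print("classical SE:", se_classical)
print("HC0 robust SE:", se_hc0)
```

The robust standard error can come out larger or smaller than the classical one depending on where the large residuals fall relative to x, so neither should be assumed to be the conservative choice.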