Statistics

https://rdcu.be/d41qA David Spiegelhalter

Life is uncertain. None of us know what is going to happen. We know little of what has happened in the past, or is happening now outside our immediate experience. Uncertainty has been called ‘the consciousness of ignorance’ – be it of the weather tomorrow, the next Premier League champions, the climate in 2100 or the identity of our ancient ancestors. In daily life, we generally say an event “could”, “might” or “is likely to” happen (or have happened). But uncertain words can be treacherous. When, in 1961, the newly elected US president John F. Kennedy was informed about a CIA-sponsored plan to invade communist Cuba, he commissioned an appraisal from his military top brass. They concluded that the mission had a 30% chance of success – that is, a 70% chance of failure. In the report that reached the president, this was rendered as “a fair chance”. The Bay of Pigs invasion went ahead, and was a fiasco. There are now established scales for convert...

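Explanatory linear regression: y = b0 + b1x1 + … + bixi + e, where the xi are confounders other than x1 (factors that affect both the exposure x1 and the outcome y) and e is the error (residual). Only check whether the p-value of b1 is < 0.05 and whether its 95% confidence interval excludes 0 (do not interpret the bi). Assumptions of linear regression: https://www.facebook.com/share/1E3KxEtXfZ/?mibextid=wwXIfr

Do not use stepwise regression, which holds x1 fixed and selects the remaining variables step by step using p-values, AIC, or BIC. Its drawbacks: it ignores domain knowledge; the randomness of variable selection makes the model unstable; multiple comparisons inflate the type I error (false positive) rate; it ignores nonlinearity, interactions, and collinearity; p-values are underestimated; 95% confidence intervals are too narrow; and it overfits (the model performs well on the training data but poorly on new data).

Machine learning:
• Use cross-validation (split the data into n folds, train model mi on n-1 folds, validate mi on the remaining fold, and repeat) to choose the best model m; use the bootstrap (resample the data repeatedly with replacement and estimate the mean in each resample) to estimate the 95% confidence interval for that m
• Can handle nonlinearity (random forest), interactions (random forest), collinearity (ridge regression, random forest), and confounder selection (LASSO)
• Assumed relationship with y: the b1 term is linear; the bi terms may be linear or nonlinear

Double machine learning: reduces the bias of plain machine learning https://poe.com/preview/in8Sair3cp3FQubrqZPt
• Treatment model: D = m0(Z) + V, E(V|Z) = 0
• Outcome model: y = Dθ0 + g0(Z) + U, E(U|Z, D) = 0
• D: treatment; θ0: average treatment effect; Z: confounders; E: expectation (mean); hat: estimate
• Split the data into two samples
• In sample 0, use machine learning to train m0hat⁰(Z) to estimate m0(Z), the conditional expectation of D given Z, and train g0hat⁰(Z) to estimate the conditional expectation of y given Z (written g0(Z) in these notes; under the outcome model, E(y|Z) = θ0·m0(Z) + g0(Z))
• In sample 1, compute the residual Vhat⁰ = D - m0hat⁰(Z), the part of the variation in D that is unrelated to the control variables Z
• In sample 1, compute the residual Uhat⁰ = y - g0hat⁰(Z), the part of the variation in y that is unrelated to the control variables Z
• Linear regression gives θ0hat⁰: Uhat⁰ = θ0hat⁰·Vhat⁰ + e, E(e|Vhat⁰) = 0
• In sample 1, use machine learning to train...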
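
The double machine learning steps listed above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the method from the linked notes: the data are simulated, m0(Z) = sin(Z) and g0(Z) = 3Z are made-up functions, and the "machine learning" model is a deliberately crude binned-mean predictor (a random forest or LASSO would normally take its place). The final step, swapping the roles of the two samples and averaging the two estimates, is the usual cross-fitting convention and is assumed here, because the notes are cut off before that point.

```python
import math
import random
from statistics import fmean

THETA0 = 2.0               # true effect, used only to simulate data (assumption)
N_BINS = 20                # resolution of the toy nuisance learner (assumption)
Z_MIN, Z_MAX = -2.0, 2.0   # range of the single simulated confounder


def simulate(n, rng):
    """Draw (z, d, y) with D = m0(Z) + V and y = D*theta0 + g0(Z) + U."""
    rows = []
    for _ in range(n):
        z = rng.uniform(Z_MIN, Z_MAX)
        d = math.sin(z) + rng.gauss(0, 1)            # hypothetical m0(Z) = sin(Z)
        y = THETA0 * d + 3.0 * z + rng.gauss(0, 1)   # hypothetical g0(Z) = 3Z
        rows.append((z, d, y))
    return rows


def bin_index(z):
    """Map z to one of N_BINS equal-width bins, clamping at the edges."""
    i = int((z - Z_MIN) / (Z_MAX - Z_MIN) * N_BINS)
    return min(max(i, 0), N_BINS - 1)


def fit_binned_mean(zs, targets):
    """Toy learner: predict the training mean of the target within each Z bin."""
    sums = [0.0] * N_BINS
    counts = [0] * N_BINS
    for z, t in zip(zs, targets):
        sums[bin_index(z)] += t
        counts[bin_index(z)] += 1
    overall = fmean(targets)  # fallback for an empty bin
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda z: means[bin_index(z)]


def slope_through_origin(x, y):
    """Least-squares slope of y on x without an intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def dml_one_direction(train, test):
    """Fit both nuisance models on `train`, residualize `test`, regress residuals."""
    z_train = [r[0] for r in train]
    m_hat = fit_binned_mean(z_train, [r[1] for r in train])  # estimates E(D|Z)
    g_hat = fit_binned_mean(z_train, [r[2] for r in train])  # estimates E(y|Z)
    v_hat = [d - m_hat(z) for z, d, _ in test]               # treatment residual
    u_hat = [y - g_hat(z) for z, _, y in test]               # outcome residual
    return slope_through_origin(v_hat, u_hat)


rng = random.Random(0)
data = simulate(4000, rng)
sample0, sample1 = data[:2000], data[2000:]

theta_01 = dml_one_direction(sample0, sample1)  # train on sample 0, estimate on sample 1
theta_10 = dml_one_direction(sample1, sample0)  # swapped roles (assumed cross-fitting step)
theta_dml = (theta_01 + theta_10) / 2

# For contrast: regress y on D alone, ignoring the confounder Z.
theta_naive = slope_through_origin([r[1] for r in data], [r[2] for r in data])

print(f"naive estimate: {theta_naive:.3f}")
print(f"DML estimate:   {theta_dml:.3f} (simulated truth: {THETA0})")
```

In this simulation Z raises both D and y, so the naive slope absorbs part of the effect of Z, while the residual-on-residual regression should land close to the simulated θ0. With real data the learner, the number of folds, and the standard error calculation all need more care than this sketch gives them.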

Posts

Why probability probably doesn’t exist (but it is useful to act like it does) 24

Causal analysis in observational studies