Biostatistics & Study Design — Complete
Study designs, RR vs OR, sensitivity/specificity, biases, statistical tests.
◆
Study designs (strongest to weakest)
•
Meta-analysis > Systematic review > RCT > Cohort > Case-control > Cross-sectional > Case series/report
•
RCT: gold standard for causation; randomization minimizes confounding by balancing known and unknown confounders across groups
•
Cohort: prospective; exposure → outcome; calculates incidence + RR
•
Retrospective cohort: uses past records (both exposure and outcome already occurred)
•
Case-control: starts with disease, looks back for exposure; OR; good for rare diseases; recall bias risk
•
Cross-sectional: snapshot; measures prevalence; no temporal relationship
•
Ecological: population-level data; ecological fallacy = main bias
•
Twin concordance: monozygotic vs dizygotic for heritability
◆
Clinical trial phases
•
Phase I: small group of HEALTHY volunteers — SAFETY + pharmacokinetics
•
Phase II: patients with disease — efficacy + side effects + dosing
•
Phase III: large RCT — confirms efficacy, compares to standard or placebo, FDA approval
•
Phase IV: post-marketing surveillance — long-term/rare side effects
◆
Risk measures
•
Relative Risk (RR) = Incidence in exposed / Incidence in unexposed (cohort/RCT)
•
Odds Ratio (OR) = ad/bc (case-control), where a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls
•
Relative Risk Reduction (RRR) = 1 − (Rt/Rc), where Rt = risk in the treatment group and Rc = risk in the control group
•
Absolute Risk Reduction (ARR) = Rc − Rt
•
Number Needed to Treat (NNT) = 1 / ARR (round UP)
•
Number Needed to Harm (NNH) = 1 / ARI, where ARI (absolute risk increase) = risk in exposed − risk in unexposed
•
Attributable Risk = Ie − Iu
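As a worked illustration of the risk formulas above, here is a minimal Python sketch. The function names (`risk_measures`, `treatment_measures`), the 2×2 cell layout, and all counts and risks are hypothetical, chosen only for illustration. The `round(..., 9)` step before rounding up is an added assumption to keep floating-point noise from pushing NNT one too high.

```python
import math

def risk_measures(a, b, c, d):
    """Hypothetical 2x2 layout:
    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome."""
    risk_exposed = a / (a + b)        # incidence in exposed (Ie)
    risk_unexposed = c / (c + d)      # incidence in unexposed (Iu)
    return {
        "RR": risk_exposed / risk_unexposed,   # relative risk
        "OR": (a * d) / (b * c),               # odds ratio = ad/bc
        "AR": risk_exposed - risk_unexposed,   # attributable risk = Ie - Iu
    }

def treatment_measures(risk_control, risk_treated):
    """RRR, ARR and NNT from the control-group and treatment-group risks."""
    arr = risk_control - risk_treated
    return {
        "ARR": arr,
        "RRR": 1 - risk_treated / risk_control,
        # Round UP; the inner round() only strips floating-point noise.
        "NNT": math.ceil(round(1 / arr, 9)),
    }

# Made-up cohort: 30/100 exposed and 10/100 unexposed develop the outcome.
cohort = risk_measures(a=30, b=70, c=10, d=90)

# Made-up trial: 20% event rate on control, 12% on treatment.
trial = treatment_measures(risk_control=0.20, risk_treated=0.12)
```

In this made-up trial, ARR is 0.08, so 1/ARR is 12.5 and NNT rounds up to 13.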
◆
Diagnostic test characteristics
•
Sensitivity = TP / (TP + FN) — rules OUT (SnNout); intrinsic test property
•
Specificity = TN / (TN + FP) — rules IN (SpPin); intrinsic test property
•
PPV = TP / (TP + FP) — depends on prevalence
•
NPV = TN / (TN + FN) — depends on prevalence
•
LR+ = Sn / (1 − Sp) — large value (>10) rules IN
•
LR− = (1 − Sn) / Sp — small value (<0.1) rules OUT
•
Likelihood ratios: independent of prevalence
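The definitions above can be checked on a small 2×2 table. This is a minimal Python sketch: `diagnostic_metrics` and `ppv_from_prevalence` are hypothetical helper names, the counts are invented, and the second function applies Bayes' rule (not spelled out in these notes) to show how PPV moves with prevalence while Sn and Sp stay fixed.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Test characteristics from the four cells of a 2x2 table."""
    sn = tp / (tp + fn)   # sensitivity
    sp = tn / (tn + fp)   # specificity
    return {
        "Sn": sn,
        "Sp": sp,
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "LR+": sn / (1 - sp),
        "LR-": (1 - sn) / sp,
    }

def ppv_from_prevalence(sn, sp, prevalence):
    """PPV via Bayes' rule for a given prevalence (an added illustration)."""
    true_pos = sn * prevalence
    false_pos = (1 - sp) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Invented sample: 1000 people, 100 diseased (prevalence 10%).
m = diagnostic_metrics(tp=90, fp=50, fn=10, tn=850)

# Same test, two prevalences: PPV falls as the disease gets rarer.
ppv_10pct = ppv_from_prevalence(m["Sn"], m["Sp"], 0.10)
ppv_1pct = ppv_from_prevalence(m["Sn"], m["Sp"], 0.01)
```

With these invented counts, PPV is about 0.64 at 10% prevalence and about 0.14 at 1% prevalence, even though sensitivity and specificity are unchanged.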
◆
Cutoff effects (assuming higher values indicate disease)
•
Lower cutoff: ↑ Sn, ↑ NPV, ↓ Sp, ↓ PPV — more false positives
•
Higher cutoff: ↑ Sp, ↑ PPV, ↓ Sn, ↓ NPV — more false negatives
•
Screening test: lower cutoff (want high sensitivity)
•
Confirmatory test: raise cutoff (want high specificity)
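The trade-off above can be seen on toy data. This is a minimal Python sketch, assuming a higher score points toward disease and a score at or above the cutoff counts as a positive test. The helper name `sn_sp_at_cutoff` and both score lists are invented for illustration.

```python
def sn_sp_at_cutoff(diseased, healthy, cutoff):
    """Return (sensitivity, specificity) when score >= cutoff is 'positive'."""
    true_pos = sum(score >= cutoff for score in diseased)
    true_neg = sum(score < cutoff for score in healthy)
    return true_pos / len(diseased), true_neg / len(healthy)

# Invented test scores for people with and without the disease.
diseased_scores = [4, 6, 7, 8, 9]
healthy_scores = [1, 2, 3, 5, 6]

# Low cutoff (screening style): catches every case, more false positives.
sn_low, sp_low = sn_sp_at_cutoff(diseased_scores, healthy_scores, cutoff=4)

# High cutoff (confirmatory style): no false positives, misses some cases.
sn_high, sp_high = sn_sp_at_cutoff(diseased_scores, healthy_scores, cutoff=7)
```

On these scores the low cutoff gives Sn 1.0 and Sp 0.6, and the high cutoff gives Sn 0.6 and Sp 1.0.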
◆
Biases
•
Selection bias: Berkson (hospital controls), healthy worker, attrition (loss to follow-up)
•
Recall bias: in case-control studies, cases tend to recall past exposures more thoroughly than controls
•
Observer bias: researcher expectation; prevent with double-blinding
•
Hawthorne effect: behavior change from being observed
•
Confounding: 3rd variable linked to both exposure and outcome; address with randomization, matching, stratification, regression
•
Effect modification: NOT a bias — true difference in effect across subgroups
•
Lead-time bias: earlier diagnosis makes survival look longer without actually delaying death
•
Length-time bias: screening preferentially detects indolent disease
•
ITT (intention-to-treat) analysis: analyzes participants in their assigned groups, preserving randomization; addresses attrition bias
◆
Statistical tests by data type
•
2 means, normal distribution → t-test (paired or independent)
•
≥3 means → ANOVA
•
Categorical × categorical → chi-square (or Fisher exact if any expected cell count is <5)
•
Paired categorical → McNemar
•
Continuous × continuous (linear) → Pearson correlation
•
Non-normal small samples → Mann-Whitney (2 groups), Kruskal-Wallis (≥3)
•
Survival data → Kaplan-Meier curves; log-rank test for comparison
◆
Hypothesis testing
•
Null hypothesis: NO effect/difference
•
Type I error (α, false positive): rejecting a TRUE null; α is usually set at 0.05
•
Type II error (β, false negative): failing to reject a FALSE null
•
Power = 1 − β (probability of correctly rejecting false null)
•
Increase power: ↑ sample size, ↑ effect size, ↓ variability, ↑ α
•
P-value: probability of observing data at least as extreme as the result if the null is true; p < α → reject null
•
Confidence interval: if the CI includes the null value (0 for differences, 1 for ratios), the result is NOT statistically significant
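To make the power and confidence-interval rules above concrete, here is a minimal Python sketch. It assumes a two-sided one-sample z-test with known SD, which is a simplification chosen for illustration and not something these notes specify. The function names `z_test_power` and `ci_is_significant` and all input values are hypothetical.

```python
from statistics import NormalDist

def z_test_power(effect, sd, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test (known SD assumed)."""
    z = NormalDist()                       # standard normal
    z_crit = z.inv_cdf(1 - alpha / 2)      # critical value, about 1.96 at alpha 0.05
    shift = effect * n ** 0.5 / sd         # true effect in standard-error units
    # Probability that the test statistic lands in either rejection tail.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

def ci_is_significant(lower, upper, null_value):
    """True if the confidence interval excludes the null value."""
    return not (lower <= null_value <= upper)

# Power rises with sample size (all other inputs held fixed).
power_n10 = z_test_power(effect=0.5, sd=1.0, n=10)
power_n40 = z_test_power(effect=0.5, sd=1.0, n=40)

# A ratio CI of 1.2 to 2.5 excludes 1; a difference CI of -0.3 to 0.8 includes 0.
ratio_significant = ci_is_significant(1.2, 2.5, null_value=1)
difference_significant = ci_is_significant(-0.3, 0.8, null_value=0)
```

When the true effect is zero, this power function returns α itself, which matches the definition of the Type I error rate.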
◆
Distributions & central tendency
•
Normal distribution: 68% ±1 SD, 95% ±2 SD, 99.7% ±3 SD
•
Right-skewed: mean > median > mode
•
Left-skewed: mean < median < mode
•
Outliers pull MEAN; median is robust
•
Standard error (SE) = SD / √n
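The figures above can be reproduced with Python's standard library. This is a minimal sketch on invented numbers: the sample values and the outlier of 50 are made up, and `stdev` here is the sample SD (n − 1 denominator).

```python
from statistics import NormalDist, mean, median, stdev

# Share of a normal distribution within 1, 2 and 3 SD of the mean.
z = NormalDist()
within_1sd = z.cdf(1) - z.cdf(-1)
within_2sd = z.cdf(2) - z.cdf(-2)
within_3sd = z.cdf(3) - z.cdf(-3)

# Invented sample, then the same sample with one large outlier added.
values = [2, 3, 3, 4, 5]
with_outlier = values + [50]

# The outlier drags the mean upward; the median barely moves.
mean_shift = mean(with_outlier) - mean(values)
median_shift = median(with_outlier) - median(values)

# Standard error of the mean: SE = SD / sqrt(n).
standard_error = stdev(values) / len(values) ** 0.5
```

Here the mean jumps from 3.4 to about 11.2 while the median moves only from 3 to 3.5. The 2 SD share comes out near 95.4%, which is why exactly 95% corresponds to about 1.96 SD.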
High-yield pearls
◆
Cohort → RR; Case-control → OR
◆
NNT = 1/ARR (round UP)
◆
Sn/Sp don't depend on prevalence; PPV/NPV do
◆
Power = 1 − β; increase n to increase power
◆
Lead-time bias: 🕒 earlier clock; Length-time bias: 🐢 turtle (slow) cancers