Biostatistics
Biostatistics & Study Design — Complete

Study designs, RR vs OR, sensitivity/specificity, biases, statistical tests.

Study designs (strongest to weakest)

  • Meta-analysis > Systematic review > RCT > Cohort > Case-control > Cross-sectional > Case series/report
  • RCT: gold standard for causation; randomization balances known and unknown confounders between groups
  • Prospective cohort: follows exposed and unexposed groups forward (exposure → outcome); calculates incidence + RR
  • Retrospective cohort: uses past records (both exposure and outcome already occurred); still compares exposed vs unexposed and yields RR
  • Case-control: starts with disease (cases vs controls), looks back for exposure; OR; good for rare diseases; recall bias risk
  • Cross-sectional: snapshot at one point in time; measures prevalence; cannot establish a temporal relationship
  • Ecological: population-level data; ecological fallacy = main bias
  • Twin concordance: compare concordance in monozygotic vs dizygotic twins to estimate heritability

Clinical trial phases

  • Phase I: small group of HEALTHY volunteers — SAFETY + pharmacokinetics
  • Phase II: patients with disease — efficacy + side effects + dosing
  • Phase III: large RCT — confirms efficacy, compares to standard or placebo, FDA approval
  • Phase IV: post-marketing surveillance — long-term/rare side effects

Risk measures

  • Relative Risk (RR) = Incidence in exposed / Incidence in unexposed (cohort/RCT)
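  • Odds Ratio (OR) = ad/bc (case-control), with a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls
  • Relative Risk Reduction (RRR) = 1 − (Rt/Rc) = 1 − RR, where Rt = event rate in treated, Rc = event rate in controls
  • Absolute Risk Reduction (ARR) = Rc − Rt
  • Number Needed to Treat (NNT) = 1 / ARR (round UP)
  • Number Needed to Harm (NNH) = 1 / ARI, where absolute risk increase (ARI) = Rt − Rc (round UP)
  • Attributable Risk (AR) = Ie − Iu (incidence in exposed minus incidence in unexposed)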
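
The risk formulas can be checked with a short Python sketch. It assumes the standard 2×2 layout (rows = exposure, columns = outcome), and the counts are hypothetical, chosen only for illustration; the function names are invented for this example. Exact fractions are used so that rounding NNT up is not distorted by floating-point error.

```python
import math
from fractions import Fraction

# Assumed 2x2 layout (rows = exposure, columns = outcome):
#                outcome +   outcome -
#   exposed          a           b
#   unexposed        c           d
# In a case-control study, "outcome +" = cases and "outcome -" = controls.
# Fractions keep the arithmetic exact, so rounding NNT up is not skewed by float error.

def relative_risk(a, b, c, d):
    """RR = incidence in exposed / incidence in unexposed (cohort/RCT)."""
    return Fraction(a, a + b) / Fraction(c, c + d)

def attributable_risk(a, b, c, d):
    """AR = Ie - Iu."""
    return Fraction(a, a + b) - Fraction(c, c + d)

def odds_ratio(a, b, c, d):
    """OR = ad / bc (case-control)."""
    return Fraction(a * d, b * c)

def treatment_effect(events_t, n_t, events_c, n_c):
    """ARR, RRR and NNT from event counts in the treated (t) and control (c) arms."""
    rt = Fraction(events_t, n_t)
    rc = Fraction(events_c, n_c)
    arr = rc - rt
    rrr = 1 - rt / rc              # equivalently arr / rc
    nnt = math.ceil(1 / arr)       # round UP to a whole patient
    return arr, rrr, nnt

def number_needed_to_harm(events_t, n_t, events_c, n_c):
    """NNH = 1 / ARI, where ARI = Rt - Rc for an adverse outcome (rounded up)."""
    ari = Fraction(events_t, n_t) - Fraction(events_c, n_c)
    return math.ceil(1 / ari)

# Hypothetical trial: 30/200 events with treatment vs 50/200 with control.
arr, rrr, nnt = treatment_effect(30, 200, 50, 200)
print(float(arr), float(rrr), nnt)   # 0.1 0.4 10
```

The round-up rule matters when 1/ARR is not a whole number. For example, a second hypothetical trial with 10/100 vs 13/100 events gives ARR = 3/100. That makes 1/ARR ≈ 33.3, so the NNT rounds up to 34.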

Diagnostic test characteristics

  • Sensitivity = TP / (TP + FN) — rules OUT (SnNout); intrinsic test property
  • Specificity = TN / (TN + FP) — rules IN (SpPin); intrinsic test property
  • PPV = TP / (TP + FP) — depends on prevalence
  • NPV = TN / (TN + FN) — depends on prevalence
  • LR+ = Sn / (1 − Sp) — large value (>10) rules IN
  • LR− = (1 − Sn) / Sp — small value (<0.1) rules OUT
  • Likelihood ratios: independent of prevalence
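
The test-characteristic formulas can also be expressed as a minimal Python sketch. It uses hypothetical counts (TP = 90, FN = 10, TN = 950, FP = 50). The helper `ppv_at_prevalence` is a name invented here. It applies Bayes' theorem to show that PPV shifts with prevalence while Sn and Sp stay fixed.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Diagnostic accuracy measures from 2x2 counts (TP, FP, FN, TN)."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    return {
        "sensitivity": sn,
        "specificity": sp,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sn / (1 - sp),
        "lr_neg": (1 - sn) / sp,
    }

def ppv_at_prevalence(sn, sp, prevalence):
    """Bayes' theorem: PPV at a given prevalence, with Sn and Sp held fixed."""
    true_pos = sn * prevalence
    false_pos = (1 - sp) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical counts: 100 diseased (90 TP, 10 FN), 1000 healthy (950 TN, 50 FP).
measures = accuracy_measures(tp=90, fp=50, fn=10, tn=950)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")

# Same Sn/Sp at different prevalences: PPV falls as the disease becomes rarer.
for prev in (0.5, 0.1, 0.01):
    print(prev, round(ppv_at_prevalence(0.9, 0.95, prev), 3))
```

With Sn = 0.90 and Sp = 0.95, PPV is about 0.95 at 50% prevalence but only about 0.15 at 1% prevalence.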

Cutoff effects (assuming higher test values indicate disease)

  • Lower cutoff: ↑ Sn, ↑ NPV, ↓ Sp, ↓ PPV — more false positives
  • Higher cutoff: ↑ Sp, ↑ PPV, ↓ Sn, ↓ NPV — more false negatives
  • Screening test: lower cutoff (want high sensitivity)
  • Confirmatory test: raise cutoff (want high specificity)
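
The cutoff trade-off can be shown with a small Python sketch. It sweeps a cutoff across two made-up, overlapping score distributions. It assumes a result is positive when the score is at or above the cutoff, so a higher score means disease is more likely. The data and function name are illustrative only.

```python
def sn_sp_at_cutoff(diseased, healthy, cutoff):
    """Sn and Sp when a score >= cutoff is called positive (assumes higher = more abnormal)."""
    tp = sum(score >= cutoff for score in diseased)
    fn = len(diseased) - tp
    tn = sum(score < cutoff for score in healthy)
    fp = len(healthy) - tn
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical, overlapping test scores (illustrative only).
diseased = [5, 6, 7, 7, 8, 9, 9, 10]
healthy = [1, 2, 3, 4, 4, 5, 6, 7]

# Lowering the cutoff raises Sn and lowers Sp; raising it does the reverse.
for cutoff in (4, 6, 8):
    sn, sp = sn_sp_at_cutoff(diseased, healthy, cutoff)
    print(f"cutoff {cutoff}: Sn={sn:.2f} Sp={sp:.2f}")
```

At the lowest cutoff, every diseased score is caught (Sn = 1.0), but more healthy scores fall above the line and become false positives. At the highest cutoff, the pattern reverses.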

Biases

  • Selection bias: Berkson (hospital controls), healthy worker, attrition (loss to follow-up)
  • Recall bias: cases recall past exposures differently from controls (case-control studies)
  • Observer bias: researcher expectation; prevent with double-blinding
  • Hawthorne effect: behavior change from being observed
  • Confounding: a 3rd variable associated with both exposure and outcome distorts the observed association; address with randomization, matching, stratification, regression
  • Effect modification: NOT a bias — true difference in effect across subgroups
  • Lead-time bias: earlier diagnosis makes survival from diagnosis look longer without changing time of death
  • Length-time bias: screening preferentially detects indolent disease
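  • ITT (intention-to-treat) analysis: analyze participants in the groups to which they were randomized, regardless of dropout, nonadherence, or crossover; preserves randomization and limits attrition bias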
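
Lead-time bias is easiest to see with simple arithmetic. The timeline below is hypothetical. In both scenarios, the patient dies 10 years after disease onset. The only difference is when diagnosis happens: from symptoms at year 7, or from screening at year 4.

```python
# Hypothetical timeline, in years after biological onset (illustrative only).
DEATH = 10          # patient dies at year 10 in both scenarios
DX_SYMPTOMS = 7     # diagnosis when symptoms appear
DX_SCREENING = 4    # earlier diagnosis by screening

def survival_from_diagnosis(dx_year, death_year=DEATH):
    """Survival measured from diagnosis: the quantity that lead time inflates."""
    return death_year - dx_year

lead_time = DX_SYMPTOMS - DX_SCREENING
print(survival_from_diagnosis(DX_SYMPTOMS))    # 3
print(survival_from_diagnosis(DX_SCREENING))   # 6
print(lead_time)                               # 3: the entire apparent "gain"
```

Measured from diagnosis, survival appears to double. However, the time of death has not moved, so the whole difference is lead time.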

Statistical tests by data type

  • 2 means, normal distribution → t-test (paired or independent)
  • ≥3 means → ANOVA
  • Categorical × categorical → chi-square (or Fisher exact if any expected cell count <5)
  • Paired categorical → McNemar
  • Continuous × continuous (linear) → Pearson correlation
  • Non-normal data or small samples (nonparametric) → Mann-Whitney U (2 groups), Kruskal-Wallis (≥3)
  • Survival data → Kaplan-Meier curves; log-rank test for comparison
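
The chi-square rule can be sketched in Python. The function below computes Pearson's chi-square for a 2×2 table without continuity correction. It also reports the smallest expected count, so the Fisher-exact rule can be checked. The p-value comes from the 1-degree-of-freedom identity χ² = z². The function name and example counts are illustrative only.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    Returns (chi2, p_value, min_expected). Assumes no empty row or column.
    If min_expected < 5, a Fisher exact test is the usual alternative.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    min_expected = float("inf")
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, chi2 = z**2, so the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, min_expected

# Hypothetical table: 20/100 exposed vs 10/100 unexposed have the outcome.
chi2, p, min_expected = chi_square_2x2(20, 80, 10, 90)
print(round(chi2, 2), round(p, 3), min_expected)
```

In practice, SciPy's `scipy.stats.chi2_contingency` and `scipy.stats.fisher_exact` perform these tests. For 2×2 tables, `chi2_contingency` applies Yates' continuity correction by default, so its statistic will be smaller than this uncorrected version.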

Hypothesis testing

  • Null hypothesis: NO effect/difference
  • Type I error (α, false positive): rejecting a TRUE null; α is usually set at 0.05
  • Type II error (β, false negative): failing to reject a FALSE null
  • Power = 1 − β (probability of correctly rejecting false null)
  • Increase power: ↑ sample size, ↑ effect size, ↓ variability, ↑ α
  • P-value: probability of obtaining results at least as extreme as those observed, assuming the null is true; p < α → reject null
  • Confidence interval: if the CI includes the null value (0 for differences, 1 for ratios) → NOT statistically significant
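
The power levers can be illustrated with a normal approximation for a two-sided, two-sample comparison of means. The sketch assumes equal group sizes and a known common SD. This simplifies a t-test power calculation, and `power_two_sample` is a hypothetical helper, not a library function.

```python
from statistics import NormalDist

def power_two_sample(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test for a difference in means.

    Assumes equal group sizes and a known common SD, and ignores the tiny
    chance of rejecting in the wrong direction. A sketch, not a substitute
    for a proper t-test power calculation.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    se_diff = sd * (2 / n_per_group) ** 0.5       # SE of the difference in means
    return std_normal.cdf(abs(delta) / se_diff - z_crit)

baseline = power_two_sample(0.5, 1.0, 64)
scenarios = {
    "baseline (delta=0.5, SD=1, n=64, alpha=0.05)": baseline,
    "larger n (128 per group)": power_two_sample(0.5, 1.0, 128),
    "larger effect (delta=0.8)": power_two_sample(0.8, 1.0, 64),
    "less variability (SD=0.8)": power_two_sample(0.5, 0.8, 64),
    "higher alpha (0.10)": power_two_sample(0.5, 1.0, 64, alpha=0.10),
}
for label, power in scenarios.items():
    print(f"{label}: {power:.2f}")
```

Under these assumptions, a difference of 0.5 SD with 64 participants per group gives roughly 80% power. Any of the following increases the returned value: raising n, raising the effect size, raising α, or lowering the SD.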

Distributions & central tendency

  • Normal distribution: ≈68% within ±1 SD, ≈95% within ±2 SD (exactly 95% at ±1.96 SD), ≈99.7% within ±3 SD
  • Right-skewed: mean > median > mode
  • Left-skewed: mean < median < mode
  • Outliers pull MEAN; median is robust
  • Standard error (SE) = SD / √n
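
The figures in this section can be reproduced with Python's standard `statistics` module. The skewed sample is made up for illustration.

```python
import statistics
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution within ±k SD of the mean.
for k in (1, 2, 3):
    share = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within ±{k} SD: {share:.4f}")    # ≈ 0.6827, 0.9545, 0.9973

# The z-value that bounds exactly the central 95%.
print(round(std_normal.inv_cdf(0.975), 2))   # 1.96

def standard_error(sd, n):
    """SE of the mean = SD / sqrt(n)."""
    return sd / n ** 0.5

# Hypothetical right-skewed sample with one large outlier (20): mean > median > mode.
skewed = [1, 2, 2, 2, 3, 4, 5, 20]
print(statistics.mean(skewed), statistics.median(skewed), statistics.mode(skewed))

# Replacing the outlier moves the mean but leaves the median unchanged.
trimmed = [1, 2, 2, 2, 3, 4, 5, 6]
print(statistics.mean(trimmed), statistics.median(trimmed))
```
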

High-yield pearls

  • Cohort → RR; Case-control → OR
  • NNT = 1/ARR (round UP)
  • Sn/Sp don't depend on prevalence; PPV/NPV do
  • Power = 1 − β; increase n to increase power
  • Lead-time bias: 🕒 earlier clock; Length-time bias: 🐢 turtle (slow) cancers