Learning Paths · RWEdnesdays

Track 01

Claims Data Foundations

12 steps · ~6–8 hrs · Beginner-friendly

Companion visual: Claim Forms annotated reference

01

Study Time Windows Anatomy

The temporal skeleton every claims analysis is built on — start here before touching any code field.

✦ Beginner layer included

02

ICD-10-CM Diagnosis Coding

Diagnosis codes define your outcome and comorbidities — understand hierarchy before building phenotypes.

✦ Beginner layer included

03

CPT Procedure Coding

Procedures anchor treatment exposure in professional claims; pairs directly with ICD codes on the same claim line.

✦ Beginner layer included

04

NDC — National Drug Code

Drug exposure lives in NDCs on pharmacy claims; required before tackling dispensing-based cohort entry.

✦ Beginner layer included

05

NCPDP Pharmacy Claim Fields

Maps NDCs onto the NCPDP transaction format — needed to extract days supply, quantity, and refill data.

✦ Beginner layer included

06

UB-04 Institutional Claim Fields

Inpatient and outpatient facility claims travel on UB-04 — understanding its structure is required for any hospital-based outcome.

✦ Beginner layer included

07

CMS-1500 Professional Claim Fields

Physician services ride the CMS-1500; links CPT codes to place of service and billing provider.

✦ Beginner layer included

08

Revenue Center Codes

Revenue codes on UB-04 distinguish emergency, observation, and ICU encounters — critical for site-of-care phenotyping.

✦ Beginner layer included

09

Claim Adjustments, Reversals & Denials

Failure to handle adjustment/void claim lines creates ghost utilization — must be cleaned before any analysis.

✦ Beginner layer included

10

Continuous Enrollment & Observable Time

Gaps in coverage = gaps in data; defining eligible observation time is the gatekeeper for all downstream denominators.

✦ Beginner layer included

11

Washout / Clean Lookback Period

Washout windows confirm that patients are new users (drug-naïve at cohort entry), a prerequisite for credible exposure classification.

✦ Beginner layer included

12

Diagnosis Phenotype Algorithm (1IP/2OP)

Combines every prior step into a validated cohort-entry rule — the capstone of claims-based cohort construction.

✦ Beginner layer included
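As a rough illustration of the 1IP/2OP rule named in step 12, the sketch below flags a patient with at least one inpatient claim, or at least two outpatient claims on suitably spaced dates. The `(setting, service_date)` layout, the function name `meets_1ip_2op`, and the 30-to-365-day spacing between outpatient claims are assumptions made for this example; a validated algorithm would specify its own windows and code lists.

```python
from datetime import date

def meets_1ip_2op(claims, min_gap_days=30, max_gap_days=365):
    """Return True if the claims satisfy a 1-inpatient-or-2-outpatient rule.

    claims: iterable of (setting, service_date) pairs, already filtered to the
    diagnosis codes of interest; setting is "IP" or "OP" (hypothetical layout).
    The gap limits are illustrative assumptions, not a published standard.
    """
    claims = list(claims)
    # One qualifying inpatient claim is sufficient on its own.
    if any(setting == "IP" for setting, _ in claims):
        return True
    # Otherwise look for two outpatient dates within the allowed spacing.
    op_dates = sorted({d for setting, d in claims if setting == "OP"})
    for i, first in enumerate(op_dates):
        for second in op_dates[i + 1:]:
            gap = (second - first).days
            if min_gap_days <= gap <= max_gap_days:
                return True
    return False

example = [("OP", date(2020, 1, 1)), ("OP", date(2020, 3, 1))]
print(meets_1ip_2op(example))
```

Requiring a minimum gap between outpatient claims is one common way to avoid counting rule-out visits twice, but the right spacing depends on the condition being studied.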

Track 02

Statistics On-Ramp

12 steps · ~8–10 hrs · Beginner-friendly

01

Descriptive Statistics

Mean, median, IQR: the vocabulary that every Table 1 and every subsequent test assumes you already know.

✦ Beginner layer included

02

Normal Distribution & CLT

The central limit theorem is why most parametric tests work on large RWE samples even with skewed raw data.

✦ Beginner layer included

03

Inferential Statistics Foundations

p-values, confidence intervals, and the null-hypothesis framework underpin every comparison you will run.

✦ Beginner layer included

04

Maximum Likelihood Estimation

MLE is the engine inside every regression model you will use — understanding it demystifies model fitting.

✦ Beginner layer included

05

Parametric vs. Nonparametric Tests

Knowing when normality assumptions break guides whether you reach for t-tests or rank-based alternatives.

✦ Beginner layer included

06

Two-Sample t-Test

The workhorse for comparing continuous baseline covariates between treatment arms in Table 1.

✦ Beginner layer included

07

Chi-Square Test

The analogous test for categorical variables — essential for comparing demographic and binary covariate distributions.

✦ Beginner layer included

08

Risk Ratio & Risk Difference

The two fundamental measures of comparative effect — every HEOR model uses one or both as its currency.

✦ Beginner layer included
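To make the two measures in step 08 concrete, here is a minimal sketch that computes both from the counts of a 2×2 table. The function name `risk_measures` and the argument names are hypothetical, and the counts in the usage line are made up for illustration.

```python
def risk_measures(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Return the risk ratio and risk difference from 2x2 table counts."""
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    return {
        "risk_ratio": risk_exposed / risk_unexposed,
        "risk_difference": risk_exposed - risk_unexposed,
    }

# Illustrative counts only: 20/100 events among exposed, 10/100 among unexposed.
print(risk_measures(20, 100, 10, 100))
```

The ratio is a relative measure and the difference an absolute one, so the same data can look large on one scale and small on the other.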

09

Binomial Distribution & Logit Link

Binary outcomes (hospitalisation, event) follow the binomial; the logit link maps probabilities onto the real line for regression.

✦ Beginner layer included
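The logit link in step 09 can be sketched in a few lines: it sends a probability in (0, 1) to the log-odds scale, and its inverse maps any real number back to a probability. The function names below are chosen for this example.

```python
import math

def logit(p):
    """Map a probability in (0, 1) to the log-odds scale."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Map a log-odds value back to a probability."""
    return 1 / (1 + math.exp(-x))

# A probability of 0.5 corresponds to log-odds of zero.
print(logit(0.5), inv_logit(logit(0.2)))
```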

10

OLS Linear Regression

The intuition and matrix algebra behind OLS transfer directly to GLMs and survival models.

✦ Beginner layer included

11

Generalized Linear Models

GLMs unify logistic, Poisson, and gamma regression under one framework — the workhorse of outcomes research.

✦ Beginner layer included

12

Regression Diagnostics

Residual plots, leverage, and VIF reveal whether your model assumptions hold before you trust its estimates.

✦ Beginner layer included

Track 03

Survival Analysis

10 steps · ~6–8 hrs · Prereq: Track 02

01

Censoring Mechanisms

Right censoring, left truncation, informative censoring — the defining quirks of time-to-event data in RWE.

✦ Beginner layer included

02

Person-Time Denominator Construction

How to accumulate and truncate at-risk time in claims data — the denominator every incidence rate depends on.

✦ Beginner layer included

03

Incidence Rate Calculation

Events per person-year is the universal rate measure; practice computing it before fitting any model.

✦ Beginner layer included
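As a small worked example of step 03, the sketch below turns an event count and accumulated person-days into a rate per 1,000 person-years. The function name, the 365.25-day year, and the default scaling factor are assumptions made here; studies report rates on whatever scale suits the outcome.

```python
def incidence_rate(events, person_days, per=1000):
    """Events per `per` person-years, given total at-risk time in days."""
    person_years = person_days / 365.25
    if person_years <= 0:
        raise ValueError("person-time must be positive")
    return events / person_years * per

# Illustrative numbers: 10 events over 500 person-years of follow-up.
print(incidence_rate(10, 500 * 365.25))
```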

04

Kaplan-Meier Estimator

Non-parametric survival curves — the first visual every reviewer expects and the baseline every model is judged against.

✦ Beginner layer included
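For readers who want to see the estimator in step 04 spelled out, the following is a bare-bones sketch of the product-limit calculation: at each distinct event time, the running survival probability is multiplied by one minus the fraction of at-risk patients who had the event. The function name and input layout are assumptions, and it omits confidence intervals; a dedicated survival library would be the usual choice in practice.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times: follow-up durations; events: 1 if the event occurred, 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        # Events and censored subjects both leave the risk set after time t.
        n_at_risk -= n_leaving
    return curve

print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

Censored patients contribute to the risk set up to their censoring time and then drop out without changing the curve, which is what separates this estimator from a naive proportion.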

05

Log-Rank Test

Tests equality of KM curves across groups — needed before deciding whether a Cox model is warranted.

✦ Beginner layer included

06

Hazard Ratio Interpretation

HR ≠ relative risk; understanding the instantaneous-rate framing prevents the most common HTA submission error.

✦ Beginner layer included

07

Cox PH Regression

The semi-parametric workhorse of time-to-event analyses in pharma RWE — assumes proportional hazards, which must be verified.

✦ Beginner layer included

08

Competing Risks — Cause-Specific & Fine-Gray

Death before the outcome of interest invalidates standard KM; Fine-Gray subdistribution HR is the HTA-preferred alternative.

✦ Beginner layer included

09

Accelerated Failure Time Models

Parametric alternative when proportional hazards fails — and the basis for survival extrapolation in economic models.

✦ Beginner layer included

10

Restricted Mean Survival Time (RMST)

HR-free summary of survival benefit over a defined horizon — increasingly required by HTA bodies when PH fails.

✦ Beginner layer included

Track 04

Causal Inference

12 steps · ~8–10 hrs · Prereq: Tracks 01–03

01

DAGs & Backdoor Criterion in Drug Studies

Directed acyclic graphs make confounding assumptions explicit — the foundation before any adjustment strategy.

✦ Beginner layer included

02

Confounding by Indication & Channeling

The dominant bias in pharmacoepidemiology — sicker patients get treated, which inflates or deflates effect estimates.

✦ Beginner layer included

03

New-User Design

Restricting to treatment initiators eliminates prevalent-user bias — the modern standard for drug comparisons in RWE.

✦ Beginner layer included

04

Active Comparator + New-User Design

Pairing active comparators with new-user entry balances measured confounders and reflects real prescribing decisions.

✦ Beginner layer included

05

Time-Zero / Index Date Alignment

Misaligned index dates introduce immortal time; correct anchoring is non-negotiable for valid follow-up.

✦ Beginner layer included

06

Immortal Time Bias Handling

Time between cohort entry and treatment start cannot be spent having the outcome — mis-handling this inflates efficacy.

✦ Beginner layer included

07

Propensity Score Methods (PSM / IPTW)

Matching and weighting on PS is the primary tool for reducing measured confounding in large claims databases.

✦ Beginner layer included

08

Overlap Weights & Modern PS Weighting

Overlap weights outperform ATE/ATT weights in sparse-overlap settings common in specialty-drug RWE.

✦ Beginner layer included

09

G-Computation / Parametric G-Formula

Standardisation via outcome regression — avoids PS trimming losses and handles time-varying confounders.

✦ Beginner layer included

10

Target Trial Emulation

Formalises the hypothetical RCT your observational study is trying to emulate, the FDA/EMA-preferred framing.

✦ Beginner layer included

11

E-Value Sensitivity Analysis

Quantifies how strong unmeasured confounding must be to explain away your result; commonly reported as a sensitivity analysis in regulatory-grade studies.

✦ Beginner layer included
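The E-value for a risk ratio point estimate has a closed form, E = RR + sqrt(RR × (RR − 1)), with protective estimates (RR < 1) inverted first. The sketch below implements only that point-estimate formula; the function name is hypothetical, and applying it to other effect measures requires conversions not shown here.

```python
import math

def e_value(rr):
    """E-value for a risk ratio point estimate."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    # Protective estimates are inverted so the formula applies to RR >= 1.
    if rr < 1:
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))

print(e_value(2.0), e_value(0.5))
```

An estimate of exactly 1 returns an E-value of 1, meaning no unmeasured confounding is needed to explain a null result.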

12

Quantitative Bias Analysis Toolkit

Goes beyond E-values to model misclassification and selection bias jointly — the advanced bias-analysis capstone.

✦ Beginner layer included

Track 05

HEOR & Economic Modeling

10 steps · ~6–8 hrs · Prereq: Tracks 02–03

01

Healthcare Costs — PPPM / PPPY / PMPM

Per-member cost metrics are the lingua franca of payer communications and the raw input to budget impact models.

✦ Beginner layer included
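A per-patient-per-month figure is total cost divided by total enrolled months, not an average of each patient's monthly cost. The sketch below assumes a hypothetical list of `(total_cost, enrolled_months)` pairs and made-up amounts.

```python
def pppm(patients):
    """Per-patient-per-month cost: summed cost over summed enrolled months.

    patients: list of (total_cost, enrolled_months) pairs (hypothetical layout).
    """
    total_cost = sum(cost for cost, _ in patients)
    total_months = sum(months for _, months in patients)
    if total_months == 0:
        raise ValueError("no enrolled months")
    return total_cost / total_months

cohort = [(1200.0, 12), (0.0, 6), (600.0, 6)]  # illustrative values only
monthly = pppm(cohort)
print(monthly, monthly * 12)  # PPPM and the corresponding PPPY
```

Weighting by enrolled months keeps patients with short coverage from distorting the average.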

02

Gamma Distribution

Healthcare costs among patients with any utilisation are right-skewed and strictly positive, which makes gamma GLMs a natural parametric choice.

✦ Beginner layer included

03

Two-Part Models for Semicontinuous Costs

Zero-inflated cost distributions need a logistic first part (any utilisation?) and a gamma second part (how much?).

✦ Beginner layer included

04

Bootstrap Resampling Methods

Non-parametric CIs for cost differences and NMB estimates when distributional assumptions can't be verified.

✦ Beginner layer included

05

QALY & Utility Mapping from RWE

Mapping condition-specific PROs to EQ-5D index scores is the bridge between RWE data and NICE/HTA cost-utility models.

✦ Beginner layer included

06

Markov Transition Probabilities from RWE

How to derive state-transition parameters from observed data — the central plumbing of Markov cohort models.

✦ Beginner layer included

07

Survival Extrapolation for HTA

Parametric extrapolation beyond the trial horizon is required by every NICE TA — RWE informs the long-run tail.

✦ Beginner layer included

08

ICER & Net Monetary Benefit

The decision metric for cost-effectiveness: ICER vs. threshold, and NMB as the linear transformation for regression.

✦ Beginner layer included
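Both metrics in step 08 come from the same two inputs, incremental cost and incremental effect. The sketch below computes the ICER as their ratio and the incremental NMB as willingness-to-pay times incremental effect minus incremental cost. The function names and the figures in the usage lines are illustrative assumptions.

```python
def icer(delta_cost, delta_effect):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    if delta_effect == 0:
        raise ValueError("ICER is undefined when the incremental effect is zero")
    return delta_cost / delta_effect

def incremental_nmb(delta_cost, delta_effect, wtp):
    """Incremental net monetary benefit at willingness-to-pay threshold `wtp`."""
    return wtp * delta_effect - delta_cost

# Illustrative values: 10,000 extra cost, 0.5 extra QALYs, threshold of 30,000.
print(icer(10_000, 0.5), incremental_nmb(10_000, 0.5, 30_000))
```

When the incremental effect is positive, a positive incremental NMB corresponds to an ICER below the threshold, and because NMB is linear in cost and effect it can be used as a regression outcome.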

09

Probabilistic Sensitivity Analysis

Monte Carlo PSA propagates joint parameter uncertainty to the cost-effectiveness plane — required in every HTA dossier.

✦ Beginner layer included

10

Budget Impact Analysis

Payer-facing affordability analysis; complements the cost-utility model and is required for most formulary submissions.

✦ Beginner layer included

Track 06

Prediction & AI

10 steps · ~6–8 hrs · Prereq: Track 02

01

Predictive & Causal ML Models in RWE

Frames the prediction vs. causal distinction before diving into specific algorithms — avoids conflating the two goals.

✦ Beginner layer included

02

Cross-Validation & Overfitting

k-fold CV is the standard guard against overfitting in high-dimensional claims data — required before tuning any model.

✦ Beginner layer included

03

Regularized Regression — LASSO & Ridge

Penalised regression handles thousands of ICD/CPT indicators without overfitting; LASSO gives automatic variable selection.

✦ Beginner layer included

04

Tree-Based Ensembles in RWE

Random forests and gradient boosting capture non-linear interactions that linear models miss in heterogeneous patient populations.

✦ Beginner layer included

05

ROC, AUC & Discrimination

C-statistic measures how well the model ranks patients — the primary metric for algorithm validation studies.

✦ Beginner layer included

06

Brier Score & Calibration

Calibration (does predicted probability match observed frequency?) is as important as discrimination for clinical decision support.

✦ Beginner layer included
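The Brier score in step 06 is the mean squared difference between predicted probabilities and observed binary outcomes, so lower is better and 0 is a perfect score. A minimal sketch, with a hypothetical function name and toy inputs:

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

# Always predicting 0.5 scores 0.25 regardless of the outcomes.
print(brier_score([0.5, 0.5], [1, 0]))
```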

07

Prediction Model Validation & Recalibration

External validation and recalibration ensure the model performs in a new population — required before clinical deployment.

✦ Beginner layer included

08

NLP for Clinical Text Extraction

Structured claims miss nuance captured in physician notes; NLP unlocks smoking status, severity, and disease course from EHR text.

✦ Beginner layer included

09

LLM-Assisted Abstraction in RWE

Large language models as chart-abstraction tools — covers prompt design, hallucination risk, and regulatory validation requirements.

✦ Beginner layer included

10

Agreement Statistics — Kappa, ICC, Bland-Altman

Measures inter-rater reliability between NLP/LLM output and human abstraction — the final gate before algorithm acceptance.

✦ Beginner layer included
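Of the statistics in step 10, Cohen's kappa is the simplest to write out: observed agreement corrected for the agreement expected by chance, kappa = (p_o − p_e) / (1 − p_e). The sketch below assumes two equal-length lists of categorical labels, for instance one from an NLP or LLM pipeline and one from a human abstractor; the function name and toy labels are made up.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("inputs must be non-empty and of equal length")
    # Observed agreement: share of items with identical labels.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / n ** 2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)

print(cohens_kappa([1, 1, 0, 0], [1, 1, 0, 0]))
```

Raw percent agreement can look high simply because one label dominates, which is why the chance correction matters when the extracted attribute is rare.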