Glossary · RWEdnesdays

#

10-to-11 digit normalization: The process of converting an FDA-formatted 10-digit NDC (which can come in three different segment layouts) into the standard 11-digit HIPAA format used in pharmacy claims, by inserting one leading zero into the segment that is too short.; Appears in: NDC (National Drug Code)
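The padding rule in the entry above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `ndc10_to_ndc11` is hypothetical, and it assumes the input keeps its hyphens, because an unhyphenated 10-digit string does not reveal which segment is short.

```python
def ndc10_to_ndc11(ndc10: str) -> str:
    """Convert a hyphenated 10-digit NDC to the 11-digit 5-4-2 layout.

    Hypothetical helper for illustration; assumes hyphens are present.
    """
    parts = ndc10.strip().split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected three hyphen-separated digit segments: {ndc10!r}")

    # The three 10-digit layouts: 4-4-2, 5-3-2, and 5-4-1.
    layout = tuple(len(p) for p in parts)
    if layout not in {(4, 4, 2), (5, 3, 2), (5, 4, 1)}:
        raise ValueError(f"unrecognized 10-digit layout {layout}: {ndc10!r}")

    labeler, product, package = parts
    # zfill only changes the one segment that is shorter than its target width.
    return labeler.zfill(5) + product.zfill(4) + package.zfill(2)


for example in ("1234-5678-90", "12345-678-90", "12345-6789-0"):
    print(example, "->", ndc10_to_ndc11(example))
```

The three made-up inputs cover one layout each; in every case exactly one zero is added, at the front of the short segment.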
2x2 confusion table: A four-cell table that cross-tabs the test result (positive/negative) against the truth (disease/no disease).; Appears in: Diagnostic Accuracy Study
2x2 table: A four-cell grid that cross-tabulates what the algorithm called (event or not) against what the chart review found (true event or not), producing the counts that go into PPV and sensitivity.; Appears in: Case-Control Study Design, Claims Outcome Algorithm PPV/Sensitivity Trade-off
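As a rough sketch of how the four cells feed PPV and sensitivity, the snippet below uses invented counts. The function name and the numbers are illustrative assumptions, not values from any study.

```python
def ppv_and_sensitivity(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Return (PPV, sensitivity) from the four cells of a 2x2 table.

    tp: algorithm flagged an event, chart review confirmed it
    fp: algorithm flagged an event, chart review did not confirm it
    fn: algorithm missed an event that chart review found
    tn: neither the algorithm nor chart review found an event
    """
    ppv = tp / (tp + fp)          # of flagged patients, the share that are true events
    sensitivity = tp / (tp + fn)  # of true events, the share the algorithm flagged
    return ppv, sensitivity


# Hypothetical counts, for illustration only.
ppv, sens = ppv_and_sensitivity(tp=80, fp=20, fn=40, tn=860)
print(f"PPV = {ppv:.2f}, sensitivity = {sens:.2f}")
```

Note that `tn` does not enter either formula; it is kept in the signature only so the call mirrors the full table.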

A

Aalen-Johansen estimator: A calculation method that correctly accounts for competing events (like death) when estimating cumulative incidence, ensuring the risk estimate is not inflated the way a simpler method would be.; Appears in: Cumulative Incidence and Absolute Risk Estimation, Multi-State Models
absolute dose: The total milligrams of a drug dispensed or prescribed, without adjusting for how big or small the patient is.; Appears in: Pediatric Dose Normalization
absolute risk reduction: The plain difference in event rates between two groups — for example, 12 per 100 minus 9 per 100 equals 3 per 100 — which shows how many fewer events the treatment produced; easier to communicate honestly than a relative measure like a hazard ratio.; Appears in: Plain-Language Summaries of Evidence
absolute risk reduction (ARR): The plain difference in outcome risk between the two groups, found by subtracting the treated group's risk from the comparison group's risk.; Appears in: Number Needed to Treat (and Number Needed to Harm)
absolute vs relative effect: The absolute effect (risk difference) answers "how many extra events per N patients?" while the relative effect (risk ratio) answers "how many times more likely?"; neither alone is sufficient for clinical or policy decisions.; Appears in: Risk Ratio and Risk Difference
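The 12-per-100 versus 9-per-100 example from the absolute risk reduction entry can be worked through as below. This is a sketch under stated assumptions: the function name is hypothetical, and the number needed to treat is taken as the usual reciprocal of the ARR.

```python
import math


def absolute_and_relative(risk_treated: float, risk_comparison: float) -> dict:
    """Summarize one comparison on both the absolute and the relative scale."""
    arr = risk_comparison - risk_treated        # absolute risk reduction
    risk_ratio = risk_treated / risk_comparison  # relative effect
    # NNT is the reciprocal of the ARR; it is undefined when the risks are equal.
    nnt = 1 / arr if arr != 0 else math.inf
    return {"arr": arr, "risk_ratio": risk_ratio, "nnt": nnt}


result = absolute_and_relative(risk_treated=0.09, risk_comparison=0.12)
print(f"ARR = {result['arr']:.2f} (3 per 100)")
print(f"risk ratio = {result['risk_ratio']:.2f}")
print(f"NNT = {result['nnt']:.1f}")
```

The same risk ratio of 0.75 applied to a comparison risk of 1.2 per 100 would give an ARR ten times smaller, which is why the two scales are reported together.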
absorbing state: A health state a patient can enter but never leave, most commonly death, so its self-transition probability is always 1.0.; Appears in: Markov Transition Probabilities from Real-World Data, Multi-State Models
abstraction: The process of a trained human reviewer reading medical records — radiology reports, oncologist notes — and extracting a structured data point (such as a progression date and the basis for it) that was never recorded as a coded field in the database.; Appears in: Real-World Progression and rwPFS
accelerated failure time: A model structure in which treatment multiplies the time scale rather than the event rate; the time ratio exp(coefficient) says how much longer treated patients go before the event relative to the control group.; Appears in: Weibull Distribution for Time-to-Event Data
acceleration factor: Another name for the time ratio — it describes how much slower (greater than 1) or faster (less than 1) the event occurs under the condition of interest compared to the reference group; an acceleration factor less than 1 means the event is accelerated (harmful if the event is bad).; Appears in: Accelerated Failure Time (AFT) Models
active comparator: A comparison group that takes a different drug for the same condition, so both groups had to pass the same health-seeking threshold to get treated at all — reducing the hidden health advantage between groups.; Appears in: Active Comparator, New-User Design, Comparative Effectiveness Research (CER) Methods, Confounding by Indication and Channeling Bias, Healthy User Bias, Observational Comparative Effectiveness Research, Product/Exposure Registry
ADaM ADTTE: The ADaM time-to-event dataset, with one row per patient per analysis parameter; AVAL is the number of days until the event or censoring, and CNSR (0 = event, 1 = censored) records whether the event occurred.; Appears in: CDISC Standards (SDTM/ADaM) for RWE Submissions
additive interaction: A situation where two factors together produce more (or less) risk than the sum of their individual effects, the scale most relevant for deciding who benefits most from a treatment.; Appears in: Causal Mediation and Effect Modification
additive value model: The aggregation rule that multiplies each criterion's weight by the option's score on it and adds the products into one total value.; Appears in: Multi-Criteria Decision Analysis (MCDA)
adherence threshold: A cut-off (commonly PDC >= 0.80) used to label a patient adherent or not.; Appears in: Proportion of Days Covered (PDC)
adherent patient: A patient who consistently fills and takes their medication as prescribed over a defined treatment period.; Appears in: Healthy User Bias
adjudicated outcome: An outcome (like a hospitalization or worsening event) that trained reviewers confirm using a uniform definition, rather than just accepting a billing code at face value.; Appears in: Disease Registry
adjudication committee: A small panel of clinicians who independently read patient records and vote on whether a candidate event meets the study's case definition; disagreements are resolved by a third reviewer.; Appears in: Endpoint Adjudication and Chart Review
adjusted sequence ratio (aSR): The crude sequence ratio divided by the null-effect sequence ratio — this is the final reported number after correcting for background prescribing trends; a value meaningfully above 1 is flagged as a potential safety signal.; Appears in: Prescription Sequence Symmetry Analysis (PSSA)
adjustment: Including additional variables in a regression model so that each coefficient represents the outcome difference within groups of patients who are similar on those other variables, reducing (but not eliminating) confounding.; Appears in: Ordinary Least Squares (OLS) Linear Regression
adjustment set: The specific group of variables you include as covariates in your analysis to block all backdoor paths without introducing new ones — the answer the backdoor criterion gives you.; Appears in: DAGs and the Backdoor Criterion for Drug Studies
administrative censoring: An observation endpoint driven by a calendar rule rather than the patient's health — for example, a study data cut on a specific date records all patients still alive at that date as ending follow-up on that day regardless of how sick they were.; Appears in: Censoring: Types, Mechanisms, and Informativeness
administrative claims: The dated billing and enrollment records an insurer generates to reimburse care, later reused as research data.; Appears in: Administrative Claims Analysis
admitting diagnosis: The working clinical hypothesis the admitting physician wrote down when the patient first arrived at the hospital, before any inpatient workup was complete; it represents suspicion, not confirmed disease.; Appears in: Diagnosis Position, Type, and Qualifiers on Claims
AESI: Adverse Event of Special Interest — a safety outcome (such as liver injury or a serious heart problem) that the study is specifically designed to detect and measure.; Appears in: Voluntary (Non-Imposed) Post-Authorisation Safety Study
age-adjusted CCI: A version of the Charlson Comorbidity Index that adds one point for each decade of age over 40, combining disease burden and age into a single score.; Appears in: Charlson Comorbidity Index (CCI)
age-standardization: A mathematical adjustment that makes survival estimates from different cancer registries or time periods comparable even when their patients are diagnosed at different ages, by applying a standard set of age weights (such as the ICSS weights) to each age-stratum estimate.; Appears in: Relative and Net Survival
age-standardized rate: A rate that has been mathematically adjusted so that differences in the age makeup of two populations cannot explain away a difference in rates, making comparisons between groups fair.; Appears in: Descriptive Epidemiology in RWE
aggregate data: Summary numbers reported in a paper (means, percentages, overall results) without access to the individual patients.; Appears in: Ecological (Aggregate) Study, MAIC and STC: Population-Adjusted Indirect Comparisons
aggregate-data meta-analysis: The conventional approach that combines only the published summary statistics from each study (for example, one hazard ratio and confidence interval per study) rather than the underlying patient records.; Appears in: Individual Participant Data (IPD) Meta-Analysis
aggregated outcome series: A table with one row per time period (e.g., one row per month) where each row records how many events — fills, hospitalizations, diagnoses — happened across the whole study population that period, usually expressed as a rate per 1,000 people enrolled.; Appears in: Interrupted Time Series (Segmented Regression)
AHRQ Elixhauser index: The maintained version of the Elixhauser comorbidity measures from AHRQ's HCUP program, which ships separate weight sets for predicting mortality and readmission.; Appears in: Elixhauser Comorbidity Measures / Index
AIC: The Akaike Information Criterion, equal to minus twice the maximized log-likelihood plus twice the number of parameters; lower AIC indicates a better balance of model fit and complexity when comparing models on the same dataset.; Appears in: Maximum Likelihood Estimation
AIC (Akaike Information Criterion): A model comparison score (lower is better) that penalizes complexity; a difference of more than 10 AIC units between two models is generally considered strong evidence in favor of the lower-AIC model.; Appears in: Splines and Flexible Functional Forms
AIC / BIC: Statistical scores (lower is better) that measure how well a model fits the observed data — but they say nothing about which curve is most accurate in the unobserved tail beyond the data.; Appears in: Survival Extrapolation for HTA Using RWE
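The AIC definition above (minus twice the maximized log-likelihood plus twice the number of parameters) is simple enough to compute by hand. The sketch below uses invented log-likelihoods for two models fitted to the same data; the function name and the numbers are assumptions for illustration.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """AIC = -2 * (maximized log-likelihood) + 2 * (number of parameters)."""
    return -2.0 * log_likelihood + 2.0 * n_params


# Hypothetical fits on the same dataset.
simple_model = aic(log_likelihood=-100.0, n_params=3)
flexible_model = aic(log_likelihood=-98.0, n_params=6)

print(f"simple model AIC   = {simple_model}")
print(f"flexible model AIC = {flexible_model}")
print(f"difference         = {flexible_model - simple_model}")
```

Here the flexible model fits slightly better but pays for three extra parameters, so its AIC is higher. The gap is small, far short of the 10-unit difference described above as strong evidence.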
ALCOA+: Attributable, Legible, Contemporaneous, Original, Accurate, plus Complete, Consistent, Enduring, and Available — FDA data integrity principles applied to RWE to ensure every data transformation step is traceable and that the analysis dataset matches what was submitted.; Appears in: QC, Double Programming, and Reproducible Analysis
algorithm: A written rule applied to data fields — such as diagnosis codes, procedure codes, or lab results — that labels each patient as a case or a non-case of a condition.; Appears in: Algorithm Validation, Positive and Negative Predictive Value
algorithm-positive: A patient who meets all the coded criteria in the case definition, meaning the database rule flagged them as a likely case before any human chart review.; Appears in: Safety Signal Case Definition
all-cause costs: The total amount paid for every medical and pharmacy claim a patient generated during a study period, regardless of what condition prompted each visit or prescription.; Appears in: All-Cause vs Attributable Costs
all-cause utilization: Counting every healthcare event a patient had during the observation window, regardless of which medical condition prompted it.; Appears in: Healthcare Resource Utilization (HCRU)
allowed amount: The amount the insurer agrees to pay for a service after applying the contracted rate or fee schedule, as opposed to the submitted charge, which is what the provider billed.; Appears in: CMS-1500 / 837P Professional Claim Fields, Healthcare Costs (PPPM, PPPY, PMPM)
alpha and beta parameters: The two positive shape numbers (written α and β) that control where a beta distribution is centred and how spread out it is; larger values give a narrower, more confident distribution.; Appears in: Beta Distribution for Proportions and Utilities
alpha spending: Dividing a fixed total false-alarm budget across all the planned looks so that repeatedly peeking does not inflate the chance of a false signal.; Appears in: MaxSPRT and Sequential Safety Surveillance
AMSTAR: A quality-rating checklist (currently AMSTAR-2) used to score how rigorously a systematic review was conducted, yielding a confidence label of high, moderate, low, or critically low.; Appears in: Umbrella Review (Review of Systematic Reviews)
analyte (Component): The specific substance being measured in a lab test — for example, creatinine, glucose, or hemoglobin — which is the first of the six named parts of a LOINC code.; Appears in: LOINC Laboratory and Observation Codes
analytical theme: A higher-level insight the reviewers build by comparing and interpreting descriptive themes across all included studies — a conclusion that no single study stated on its own.; Appears in: Qualitative Evidence Synthesis
analyzable cohort: The group of patients who survive every eligibility rule and whose outcomes can be validly measured; the number at the bottom of the attrition funnel.; Appears in: Database Feasibility Assessment and Attrition Funnel
anchor claim: The first qualifying record that opens an episode — for example, an inpatient hospital claim where the heart-attack diagnosis code appears as the main reason for admission.; Appears in: Acute Event Deduplication Window
anchored vs unanchored: Anchored comparisons keep a shared comparator and only balance effect modifiers; unanchored ones drop it and must balance every relevant patient factor, assuming none is missing.; Appears in: MAIC and STC: Population-Adjusted Indirect Comparisons
Andersen-Gill model: A counting-process Cox model where each time a patient is at risk for the next event becomes its own data row, allowing all repeat events to contribute to the estimate.; Appears in: Recurrent Events Analysis
annual prevalence: Period prevalence calculated over exactly one calendar year — the standard used in most public health surveillance reports.; Appears in: Prevalence (Point, Period, and Annual) in RWE
annualize: Convert a monthly rate to an annual one by multiplying by 12; PPPY = PPPM × 12, expressing the same rate as a per-year figure.; Appears in: Healthcare Costs (PPPM, PPPY, PMPM)
approximate flag: A marker in a GEM row indicating that the mapping is the best available match but not a true clinical equivalent — the majority of GEM rows carry this flag.; Appears in: Code Crosswalks and Mappings Between Coding Systems
arbitrary pattern: A missingness structure where gaps appear in no predictable order across variables on the same record, requiring a more flexible method called chained-equations imputation.; Appears in: Missing Data Pattern Table
Area Deprivation Index (ADI): A published composite score built from 17 census variables (income, housing, employment, education) that ranks every U.S. census block group from 1 (least deprived) to 100 (most deprived) on a national scale.; Appears in: Social Determinants of Health (SDoH) in RWE
area under the curve: The total surface below a survival curve when plotted over time; in a partitioned survival model it directly equals the average time a patient spends in that state, which is then multiplied by the cost or quality-of-life weight for that state.; Appears in: Partitioned Survival Model
area-level measurement: A value assigned to a geographic unit such as a census tract or ZIP code (for example, median income or an index score) rather than measured directly on an individual person.; Appears in: Social Determinants of Health (SDoH) in RWE
arithmetic mean: The ordinary average — add up all values and divide by the number of patients; it is highly sensitive to even one extremely large value.; Appears in: Cost Outlier Handling (Winsorization, Trimming, Robust/Two-Part Models)
artificial censoring: Deliberately removing a clone from its assigned strategy arm the instant the real patient's observed behavior deviates from that strategy's rule, even though the real patient is still alive and in the data.; Appears in: Clone-Censor-Weight for Per-Protocol Target-Trial Emulation
assay sensitivity: The ability of a trial to detect a meaningful difference between treatments if one truly exists; non-inferiority logic breaks down if the trial was so poorly designed or powered that it could not have distinguished an effective drug from an ineffective one.; Appears in: Equivalence and Non-Inferiority Testing
asymmetric ascertainment: When the chance of recording an outcome differs between study groups because one group is watched more closely, creating a measurement difference that does not reflect a biological difference.; Appears in: Surveillance and Detection Bias
at-risk clock: For one person, the stretch of calendar time when they were observable and still eligible to have a first event; it starts when follow-up begins and stops at the event, study end, leaving the data, or death.; Appears in: Incidence Rate Calculation
at-risk cohort: The set of patients who are still being observed at any given point in time and who have not yet had the outcome — this group shrinks as patients leave or are lost.; Appears in: Attrition and Loss to Follow-Up
at-risk window: The stretch of calendar time, after the lag and induction periods have been excluded, during which a disease event would be attributed to the drug exposure.; Appears in: Exposure Lag, Induction, and Latency Windows
ATC classification: The Anatomical Therapeutic Chemical classification system, used internationally to identify drugs by their active ingredient and therapeutic use (e.g., B01AF01 = rivaroxaban); in non-US data sources, ATC codes take the place of US National Drug Code (NDC) numbers for identifying what drug was dispensed.; Appears in: International Real-World Data Sources
ATC code: A seven-character alphanumeric code assigned by the WHO to each drug substance, organized in five levels from anatomical main group (first character) down to specific chemical substance (fifth level), for example A10BA02 for metformin.; Appears in: ATC Classification and Defined Daily Dose (DDD)
ATC version pinning: The practice of recording and fixing which annual release of the WHO ATC/DDD index was used for a study, so that results are reproducible even after the WHO updates DDD values or reclassifies drugs in subsequent years.; Appears in: ATC Classification and Defined Daily Dose (DDD)
ATE (average treatment effect): The effect of the treatment if it were given to every eligible patient in the full population — the right target for a coverage or formulary decision that applies to everyone.; Appears in: Estimands (ATE/ATT) and Intercurrent Events in RWE, Predictive and Causal ML Models
ATO (average treatment effect in the overlap population): The causal estimand targeted by overlap weights, representing the average treatment effect among patients near clinical equipoise — those whose measured characteristics place them midway between the two treatment arms rather than strongly predicted to receive one drug.; Appears in: Overlap Weights and Modern Propensity Weighting
ATT: Average Treatment effect on the Treated, the estimated causal effect of the policy specifically for the units that actually adopted it; in a difference-in-differences (DiD) analysis it is expressed as the double difference.; Appears in: Difference-in-Differences with Staggered Adoption
ATT (average treatment effect in the treated): The effect of the treatment among the patients who actually received it — answers how well the drug worked for the people who chose it, not for everyone who could have.; Appears in: Estimands (ATE/ATT) and Intercurrent Events in RWE
attributable fraction: The percentage of a patient's total spending that is captured by the disease-specific rule, calculated as disease-attributable cost divided by all-cause cost; a useful transparency check because it is often well below 100 percent in chronic disease.; Appears in: All-Cause vs Attributable Costs
attributable fraction among the exposed (AF_exposed): The proportion of disease cases in the exposed group that would not have occurred if those people had the same risk as the unexposed; equal to AR divided by the risk in the exposed group.; Appears in: Attributable Risk and Population Attributable Fraction
attributable risk (AR): The difference in risk between exposed and unexposed groups; it estimates how many extra disease events occur per 100 (or 1000) exposed persons solely because of the exposure.; Appears in: Attributable Risk and Population Attributable Fraction
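The two attributable-risk entries above reduce to a subtraction and a division. A minimal sketch follows, with invented risks and a hypothetical function name.

```python
def attributable_measures(risk_exposed: float, risk_unexposed: float) -> tuple[float, float]:
    """Return (AR, AF_exposed) as defined in the entries above."""
    ar = risk_exposed - risk_unexposed  # extra risk carried by the exposed group
    af_exposed = ar / risk_exposed      # share of exposed-group risk that is excess
    return ar, af_exposed


# Hypothetical risks: 10 per 100 among the exposed, 4 per 100 among the unexposed.
ar, af_exposed = attributable_measures(risk_exposed=0.10, risk_unexposed=0.04)
print(f"AR = {ar * 100:.0f} extra events per 100 exposed")
print(f"AF_exposed = {af_exposed:.0%}")
```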
attribute: One specific feature of a treatment that is varied across choice options in a DCE, such as the chance of a side effect, the way the drug is taken, or the monthly out-of-pocket cost.; Appears in: Patient Preference Study (DCE / BWS)
attrition: The number of patients lost between one stage and the next — the patients who were present at step N but did not show up at step N+1.; Appears in: Cascade of Care Analysis
attrition funnel: A step-by-step flow diagram showing how many patients were excluded at each filter (e.g., diagnosis requirement, enrollment length, prior drug use) on the way from the full database to the final study group.; Appears in: Database Feasibility Assessment and Attrition Funnel, Visualizations and Diagrams in Pharmacoepidemiology and RWE
AUC: Area Under the Curve — the single number summarizing discrimination, equal to the fraction of all case/non-case pairs where the case scored higher.; Appears in: ROC Curves, AUC, and the c-statistic
AUC (C-statistic): A single number between 0 and 1 summarising discrimination; it equals the probability that a randomly chosen patient who has the event gets a higher predicted risk than a randomly chosen patient who does not.; Appears in: Prediction Model Validation and Recalibration in RWE
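The pairwise reading of the AUC in the two entries above can be checked directly by enumerating every case/non-case pair. The sketch below is illustrative only: the scores are invented, and counting a tied pair as half a win is a common convention assumed here, not something the entries specify.

```python
def pairwise_auc(case_scores: list[float], noncase_scores: list[float]) -> float:
    """Fraction of case/non-case pairs in which the case has the higher score.

    Tied pairs count as half a win (an assumed convention).
    """
    wins = 0.0
    for case in case_scores:
        for noncase in noncase_scores:
            if case > noncase:
                wins += 1.0
            elif case == noncase:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


# Hypothetical predicted risks.
cases = [0.9, 0.6]
noncases = [0.7, 0.2, 0.6]
print(pairwise_auc(cases, noncases))  # 4.5 wins out of 6 pairs -> 0.75
```

The double loop is quadratic in sample size, so it suits small validation sets; rank-based formulas give the same number faster on large ones.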
audit trail: A complete, versioned record of every protocol decision, code file, and parameter setting (such as the length of a drug-free lookback window) so a reviewer can follow the path from the original question all the way to the final number without gaps.; Appears in: Regulatory and HTA Readiness for RWE
augmentation: Adding a second drug on top of the existing regimen while keeping the original drug going — this does NOT start a new line under most cancer conventions because the first treatment has not failed.; Appears in: Treatment Patterns and Lines of Therapy (LOT)
autocorrelation: The tendency for consecutive monthly (or weekly) measurements to be more similar to each other than would be expected by chance — if rates were high last month they tend to be high this month — which means standard confidence intervals are too narrow and must be corrected.; Appears in: Interrupted Time Series (Segmented Regression)
average marginal effect (AME): The average difference in predicted probability of the outcome between two treatment groups, calculated by running the model's prediction engine twice (once labeling everyone as treated, once as untreated) and taking the mean difference across all patients.; Appears in: Marginal Effects and Interpretation of Inferential Statistics
average treatment effect: The average difference in outcomes between what would have happened if everyone in the study had received the treatment versus if everyone had not, calculated across the whole study population.; Appears in: Targeted Maximum Likelihood Estimation (TMLE)

B

B2 reversal: The NCPDP pharmacy transaction code indicating a prescription fill was reversed after initial adjudication, usually because the patient never picked up the medication; these reversals must be matched and removed to avoid phantom fills inflating adherence metrics.; Appears in: Claim Adjustments, Reversals, and Denials
backdoor path: Any path in the DAG that flows from the exposure back through an arrow pointing into the exposure and then out to the outcome — a noncausal route that, if left open, makes the drug look better or worse than it really is.; Appears in: DAGs and the Backdoor Criterion for Drug Studies
backward mapping: Translating from a newer or more granular system back to an older or less granular one, for example ICD-10-CM to ICD-9-CM; often collapses multiple specific codes into one general code.; Appears in: Code Crosswalks and Mappings Between Coding Systems
base DRG: The common clinical identity shared by the two or three severity-tiered DRGs in a family, before splitting by CC/MCC status — for example, heart failure is the base DRG behind the 291/292/293 triplet.; Appears in: MS-DRG (Medicare Severity Diagnosis-Related Groups)
baseline characteristics: The traits each patient already had before treatment started (age, sex, existing conditions), measured in a window before their first dose.; Appears in: Baseline Characteristics and Covariate Balance
baseline period: The stretch of time before the index date (commonly 180 or 365 days) during which you check that a patient was not already using the study drug and measure the health conditions they had going in.; Appears in: Self-Controlled Case Series (SCCS), Study Time Windows: Baseline, Observation, and Outcome Windows
baseline risk: The risk in the comparison (unexposed) group; the same relative risk implies a large absolute difference in a high-risk population and a tiny one in a low-risk population, which is why a risk difference (RD) from one study cannot be directly transplanted to another setting.; Appears in: Disease Risk Scores, Marginal Effects and Interpretation of Inferential Statistics, Number Needed to Treat (and Number Needed to Harm), Risk Ratio and Risk Difference
Bayes' theorem: A formula that combines how good the algorithm is with how common the disease is to give the chance a flagged record is truly a case.; Appears in: Positive and Negative Predictive Value
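To make the dependence on disease frequency concrete, the sketch below applies Bayes' theorem to the same hypothetical algorithm at two prevalences. The sensitivity, specificity, and prevalence values are invented, and the function name is an assumption.

```python
def ppv_from_bayes(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Chance that a flagged record is a true case, via Bayes' theorem."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Same hypothetical algorithm, two populations.
for prevalence in (0.01, 0.20):
    ppv = ppv_from_bayes(sensitivity=0.90, specificity=0.95, prevalence=prevalence)
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.1%}")
```

With these inputs the PPV is roughly 15% when 1 in 100 patients has the condition and roughly 82% when 1 in 5 does, even though the algorithm itself is unchanged.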
BCa interval: Bias-corrected and accelerated bootstrap CI; adjusts the percentile cutoffs to correct for bias and for how the standard error changes with the parameter value, giving better coverage than the plain percentile method.; Appears in: Bootstrap and Resampling Methods
benefit-cost ratio (BCR): Total monetized benefits divided by total costs; a BCR above 1.0 means each dollar spent returns more than one dollar in benefits.; Appears in: Cost-Benefit Analysis (CBA)
between-group variance: How much the group averages differ from the overall average — the "signal" in an ANOVA that you hope is larger than the noise inside each group.; Appears in: One-Way ANOVA
between-study heterogeneity: The degree to which the treatment effect genuinely differs from study to study, measured by statistics called tau-squared and I-squared — high heterogeneity means results varied more than you would expect from chance alone.; Appears in: Individual Participant Data (IPD) Meta-Analysis
bias correction: A mathematical adjustment that uses information from the validation substudy to shift the main study's raw estimate closer to what it would have been if the missing factor had been measured for everyone.; Appears in: External Adjustment and Validation-Substudy Bias Correction
bias factor: A multiplier computed from the bias parameters that captures how much the observed estimate would move if the assumed hidden confounder were present; dividing the observed estimate by the bias factor gives the adjusted estimate.; Appears in: Unmeasured Confounding Probabilistic Bias Analysis
bias parameters: The specific numbers that describe how a bias operates — for example, how common an unmeasured risk factor is in each treatment group, or how often an outcome code in claims data correctly identifies the true event.; Appears in: Quantitative Bias Analysis Toolkit, Unmeasured Confounding Probabilistic Bias Analysis
bias-variance tradeoff: The fundamental tension in statistical estimation where adding a small amount of bias (by penalizing coefficients) can reduce variance enough that predictions on new data improve overall — the core justification for regularized regression over ordinary least squares in high-dimensional settings.; Appears in: Regularized Regression: LASSO, Ridge, and Elastic Net
BIC (Bayesian Information Criterion): A score that balances how well a model fits the data against how complex the model is, used to choose the number of classes k — lower BIC favors the better model, and analysts look for the k where the BIC stops improving.; Appears in: Group-Based Trajectory Models and Latent Class Analysis
billable code: A diagnosis code that has reached its most specific level and can legally appear on a claim; three-character category codes like I50 are header codes, not billable, and will be rejected by a payer.; Appears in: ICD-10-CM Diagnosis Codes
billing provider: The entity — usually a practice group or clinic — that submits the claim and receives payment; its NPI appears in box 33 and is distinct from the rendering provider who delivered the care.; Appears in: CMS-1500 / 837P Professional Claim Fields
billing units: The number of units a provider reports on a claim for a single drug administration; each unit equals the dose amount written in the code's official description (for example, one unit of J9305 equals 10 mg of pemetrexed, so 50 units means 500 mg was given).; Appears in: HCPCS Level II Codes and J-Codes
binary outcome: An endpoint that is recorded as yes or no for each patient — for example, 'was hospitalized within 90 days' (yes/no) or 'achieved response at 12 weeks' (yes/no).; Appears in: Logistic Regression for Binary Outcomes
biomarker: A measurable biological signal, such as a gene mutation, protein level, or lab value, that identifies a patient subgroup likely to respond differently to a treatment.; Appears in: Biomarker-Defined Cohort (RWE)
Bland-Altman limits of agreement: The range within which approximately 95% of differences between two measurement methods are expected to fall; if this range is clinically acceptable, the two methods can be used interchangeably.; Appears in: Agreement Statistics: Kappa, ICC, and Bland-Altman
blinded review: The practice of hiding each patient's drug or treatment assignment from the reviewers so their decisions cannot be influenced by knowing which group the patient was in.; Appears in: Endpoint Adjudication and Chart Review
blip function: The specific form of the treatment effect at one time interval — how much treated patients' outcomes shifted compared to what they would have been had they stopped treatment at that moment — which the analyst pre-specifies before looking at the data.; Appears in: G-Estimation of Structural Nested Models
Bonferroni correction: The simplest multiplicity adjustment — divide the target alpha by the number of tests (for example, 0.05 divided by 10 equals 0.005) and reject only hypotheses whose raw p-value falls below that stricter threshold.; Appears in: Multiplicity and Multiple Comparisons
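The Bonferroni rule above takes only a few lines to apply. In this sketch the ten p-values are invented and the function name is hypothetical.

```python
def bonferroni_reject(p_values: list[float], alpha: float = 0.05) -> tuple[float, list[bool]]:
    """Return the per-test threshold and a reject/keep decision for each p-value."""
    threshold = alpha / len(p_values)  # e.g. 0.05 / 10 = 0.005
    return threshold, [p < threshold for p in p_values]


# Ten hypothetical raw p-values.
raw = [0.001, 0.004, 0.006, 0.03, 0.2, 0.5, 0.7, 0.8, 0.9, 0.95]
threshold, decisions = bonferroni_reject(raw)
print(f"per-test threshold = {threshold:.3f}")
print(f"rejected: {[p for p, d in zip(raw, decisions) if d]}")
```

Only 0.001 and 0.004 clear the stricter threshold; 0.006 and 0.03 would have passed at an unadjusted 0.05.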
boosting: A strategy that builds models sequentially, where each new model focuses on the cases the previous models got wrong, gradually reducing errors by combining many weak learners into one strong predictor.; Appears in: Tree-Based Ensembles: Random Forests and Gradient Boosting
bootstrap aggregating: A technique that creates many different training datasets by sampling the original data with replacement, fits one model to each, then averages all predictions — reducing the jumpiness (variance) of any single model.; Appears in: Tree-Based Ensembles: Random Forests and Gradient Boosting
bootstrap replicate: One random re-sample drawn with replacement from the original data, used to compute one value of the statistic being studied.; Appears in: Bootstrap and Resampling Methods
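A bootstrap replicate, as defined above, can be generated with the standard library alone. The sketch below draws replicates of the mean and then forms a plain percentile interval, which is the simpler method the BCa entry contrasts itself with (this is not BCa). The data values, function names, and seed are illustrative assumptions.

```python
import random
import statistics


def bootstrap_means(values: list[float], n_reps: int = 1000, seed: int = 0) -> list[float]:
    """Draw n_reps re-samples with replacement and return the mean of each."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    n = len(values)
    replicates = []
    for _ in range(n_reps):
        resample = [values[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistics.mean(resample))
    return replicates


def percentile_interval(replicates: list[float], level: float = 0.95) -> tuple[float, float]:
    """Plain percentile interval: cut off an equal tail on each side."""
    ordered = sorted(replicates)
    tail = (1.0 - level) / 2.0
    lower = ordered[round(tail * len(ordered))]
    upper = ordered[round((1.0 - tail) * len(ordered)) - 1]
    return lower, upper


# Hypothetical per-patient costs.
costs = [120.0, 340.0, 95.0, 2100.0, 410.0, 150.0, 980.0, 60.0, 275.0, 530.0]
reps = bootstrap_means(costs, n_reps=1000, seed=42)
print(percentile_interval(reps))
```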
borrowing: Using summary information from a past or external group of patients to supplement the control arm of a current study, rather than collecting all that information again from scratch.; Appears in: Bayesian Borrowing from Historical / External Controls
bounding: Calculating the worst-case upper and lower limits of how much a study result could shift if the excluded patients differed from the included ones in a plausible way.; Appears in: Selection Bias Sensitivity Analysis
Bross bias formula: The ranking formula hdPS uses to score each candidate code: codes that appear more often in one treatment group AND are associated with the outcome get a higher score and are more likely to be selected.; Appears in: High-Dimensional Propensity Score (hdPS)
Business Associate Agreement (BAA): A legal contract required under HIPAA before sending patient health information to a cloud service provider; without a BAA, transmitting clinical notes to an LLM API is a compliance violation regardless of whether a data breach occurs.; Appears in: LLM-Assisted Data Abstraction and Evidence Work in RWE
buy-and-bill: The payment arrangement in which a doctor or clinic purchases a drug from a supplier, administers it to the patient, and then bills the insurer for both the drug cost and the administration service; this is how most infused oncology and biologic drugs are paid for under the medical benefit.; Appears in: HCPCS Level II Codes and J-Codes

C

calibrated p-value: A recalculated p-value that judges how surprising the real finding is relative to the spread of the negative-control estimates, rather than assuming the pipeline is perfectly unbiased.; Appears in: Empirical Calibration with Negative Controls
calibration: For models that predict probabilities (e.g., "this patient has a 30% chance of readmission"), calibration measures whether those predicted percentages actually match observed event rates in the data — a perfectly calibrated model's 30% predictions come true about 30% of the time.; Appears in: Brier Score, Prediction Model Validation and Recalibration in RWE, Regression Diagnostics and Model Checking, ROC Curves, AUC, and the c-statistic
calibration plot: A chart that groups patients into bins by predicted risk, then compares the average predicted risk in each bin to the fraction who actually had the event — points on the diagonal mean perfect calibration.; Appears in: Prediction Model Validation and Recalibration in RWE
canonical link: The default link that follows from a family's likelihood (logit for Binomial, log for Poisson, inverse for Gamma, identity for Gaussian); a non-canonical link, such as the log link often used with Gamma, is valid but can converge differently and may need closer checking.; Appears in: Generalized Linear Models (GLM)
cap-at-1 rule: A carry-over rule that prevents an early refill from double-counting days already covered — the covered days from a new fill only extend past the point where the prior supply ran out.; Appears in: Exposure Episode Construction
CARE checklist: The standard list of items a published case report must include, such as a patient timeline and a consent statement.; Appears in: Case Report
care setting: Where the medical visit took place — for example, an inpatient hospital stay, an emergency room visit, or a routine outpatient office visit — which signals how serious the encounter was.; Appears in: Outcome Algorithm Construction, Safety Signal Case Definition
carry-over bridging: The bridging rule that assumes a chronic medication continued uninterrupted during a hospital stay, marking those hospital days as 'covered' even though no outpatient fill was dispensed.; Appears in: Inpatient Bridging of Drug Exposure
carryover rule: The analyst's choice about whether to bank leftover pill-days from an early refill and push them to the end of coverage, rather than discarding them when the next fill arrives.; Appears in: Stockpiling and Carryover Rules
cascade stage: One named checkpoint on the patient journey (for example, 'received treatment') at which an analyst counts how many people have reached that point.; Appears in: Cascade of Care Analysis
case: A person who meets the definition of having the condition being measured, such as having a diagnosis recorded in their medical record or having filled a prescription for a treatment.; Appears in: Case-Control Study Design, Nested Case-Control Design, Prevalence (Point, Period, and Annual) in RWE
case definition: A written, pre-specified set of clinical criteria a candidate event must meet to be counted as a confirmed outcome — for example, the Fourth Universal Definition of Myocardial Infarction.; Appears in: Endpoint Adjudication and Chart Review
case-finding: Using a large database to flag a small handful of patients worth pulling for a detailed chart review.; Appears in: Case Report
case-finding window: The span of calendar time during which both qualifying outpatient claims must appear; if the second claim arrives after this window closes, the patient does not meet the rule.; Appears in: Diagnosis Phenotype Algorithm (1 IP / 2 OP, Time Window)
category code: The three-character root of an ICD-10-CM code (for example, M20) that groups related conditions; it is a header and is not itself billable or submittable on a claim.; Appears in: ICD-10-CM Diagnosis Codes
Category III code: A temporary CPT code (four digits plus the letter T) used for new or experimental procedures; if the procedure becomes common enough, the AMA replaces it with a permanent five-digit Category I code, which means historical claims data will show two different codes for the same procedure at different time points.; Appears in: CPT Codes (HCPCS Level I)
causal ML: A family of machine-learning methods (such as double ML and TMLE) that use flexible algorithms only to control for confounding, then apply a special estimator to isolate the treatment effect.; Appears in: Predictive and Causal ML Models
cause-specific hazard: The instantaneous rate at which the outcome of interest occurs among patients who have not yet had the outcome or any competing event; it answers an etiologic question about the biological process.; Appears in: Competing Risks (Cause-Specific Hazard, Cumulative Incidence, and Fine-Gray)
CC/MCC: Complication or Comorbidity (CC) and Major Complication or Comorbidity (MCC) — severity categories assigned to secondary diagnoses by CMS that determine whether a stay lands in the highest-payment, middle, or lowest-payment tier of a DRG family.; Appears in: MS-DRG (Medicare Severity Diagnosis-Related Groups)
CDM: Common Data Model — a fixed table structure and shared vocabulary that makes patient data look identical across different hospitals or insurers, so one analysis script runs everywhere.; Appears in: OMOP CDM Method Patterns for RWE
CEAC: The cost-effectiveness acceptability curve is a graph that shows, at each possible willingness-to-pay threshold, the percentage of PSA simulation draws in which the treatment was cost-effective.; Appears in: Probabilistic Sensitivity Analysis (PSA) for Health-Economic Models
cell suppression: The practice of withholding any count smaller than a minimum threshold (commonly fewer than 11 patients) before sharing aggregate results, to prevent re-identification of individuals from small groups.; Appears in: Federated and Distributed Network Analysis
censor: Stopping the clock on a person before the study ends because you can no longer watch them (for example, they left their insurance plan); their time stops being counted and no outcome is recorded.; Appears in: Prospective Cohort Study
censored: A patient whose outcome has not yet occurred by the end of the study period; their data still contribute information about the time they were observed without the outcome.; Appears in: Incidence Rate Calculation, Landmark Analysis
censored observation: A patient whose event was not observed because the study ended or the patient left before the event occurred; in cure models, long-term censored patients provide the evidence for the cure fraction.; Appears in: Cure Models (Mixture and Non-Mixture)
censoring: What happens when a participant's follow-up ends for a reason other than the event of interest — for example, they lose insurance coverage or the study period ends — meaning we stop counting their time but do not count them as having had the event.; Appears in: Attrition and Loss to Follow-Up, Clone-Censor-Weight for Per-Protocol Target-Trial Emulation, Competing Risks (Cause-Specific Hazard, Cumulative Incidence, and Fine-Gray), Cox Proportional Hazards Regression, Cox Regression with Time-Dependent Covariates, Cumulative Incidence and Absolute Risk Estimation, Inverse Probability of Censoring Weighting (IPCW), Kaplan-Meier Estimator, Log-Rank Test, Person-Time Denominator Construction, Real-World Progression and rwPFS, Recurrent Events Analysis, Restricted Mean Survival Time (RMST), Retrospective Cohort Study Design, Win Ratio and Generalized Pairwise Comparisons
censoring weight: A number assigned to each patient who remains in the study, equal to the inverse of their estimated probability of still being there, so patients who were unlikely to stay carry more weight in the analysis.; Appears in: Inverse Probability of Censoring Weighting (IPCW)
Central Limit Theorem: A mathematical result guaranteeing that the average of a large enough sample from any distribution with a finite mean and variance will itself follow approximately a bell-curve distribution, regardless of the shape of the individual data.; Appears in: Normal Distribution and the Central Limit Theorem
CERQual confidence rating: A label (high, moderate, low, or very low) that tells a reader how much trust to place in a synthesized finding, based on study quality, how consistent findings were, how much supporting data existed, and how relevant the studies are to the decision question.; Appears in: Qualitative Evidence Synthesis
change from baseline: The difference between a patient's outcome value at a later visit and their starting value on the day they entered the study — a common way to measure how much a treatment moved the needle.; Appears in: Mixed Model for Repeated Measures (MMRM) in RWE, Patient-Reported Outcomes in Real-World Settings
channeling: The systematic tendency for newer or more specialized drugs to be prescribed preferentially to sicker patients or those who have already failed other treatments, creating an unfair comparison in observational studies.; Appears in: Confounding by Indication and Channeling Bias, Negative Control Exposures
chart review: The process of a clinician reading the actual medical record for a patient to confirm whether a coded event (like a heart attack) truly happened — used as the gold-standard reference.; Appears in: Claims Outcome Algorithm PPV/Sensitivity Trade-off, EHR Phenotyping Algorithms, Outcome Algorithm Construction
charting: Reading each included study and recording its key features into a structured form, like filling one spreadsheet row per study, so the studies can be compared side by side.; Appears in: Scoping Review
check function (pinball loss): The loss function minimised by quantile regression; it penalises predictions that are too low more than predictions that are too high (or vice versa, depending on the target percentile), pulling the fitted line to the chosen quantile rather than the mean.; Appears in: Quantile Regression
CI E-value: The confounder strength needed to shift the confidence limit closest to the null until it touches 1.0, erasing significance; it is always smaller than the E-value for the point estimate.; Appears in: E-value Sensitivity Analysis
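As a rough sketch of the arithmetic behind this entry, the Python below assumes the commonly cited risk-ratio formula from VanderWeele and Ding, E = RR + sqrt(RR × (RR − 1)), which this glossary does not itself spell out. The function names `e_value` and `ci_e_value` and the numbers are hypothetical.

```python
import math


def e_value(rr):
    """E-value for a risk ratio (assumed formula: RR + sqrt(RR * (RR - 1)))."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr  # protective estimates are inverted first
    return rr + math.sqrt(rr * (rr - 1))


def ci_e_value(lower, upper):
    """E-value for the confidence limit closest to the null (1.0)."""
    if lower > upper:
        raise ValueError("lower limit must not exceed upper limit")
    if lower <= 1 <= upper:
        return 1.0  # interval already includes the null
    closest = lower if lower > 1 else upper
    return e_value(closest)


# Hypothetical result: RR 2.0 with 95% CI 1.5 to 2.7.
print(round(e_value(2.0), 2))          # point-estimate E-value
print(round(ci_e_value(1.5, 2.7), 2))  # CI E-value, always the smaller of the two
```

The CI E-value can never exceed the point-estimate E-value, because the limit closest to the null is always nearer to 1.0 than the estimate itself.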
claim line: A single row within a multi-line institutional or professional claim, each carrying its own service date, charge amount, revenue code (on institutional claims), and optionally a HCPCS/CPT procedure code.; Appears in: Revenue (Center) Codes
claim lineage: The chain of claim transactions linking an original submission to its subsequent voids and replacements, tracked via a claim control number or internal claim identifier.; Appears in: Claim Adjustments, Reversals, and Denials
claims: Records generated when care is billed to insurance; they show what services and drugs were paid for, but not clinical detail like disease stage or lab values.; Appears in: Disease Registry
claims completeness: How thoroughly a database captures all the healthcare services a patient actually received; a source with lower completeness will appear to show fewer services even if the patient had them.; Appears in: Medicare FFS vs Medicare Advantage vs Commercial Claims Differences
claims data: Administrative records that a health plan creates every time a member gets a service billed to insurance, containing the date, type of service, diagnosis codes, and dollar amounts.; Appears in: Healthcare Costs (PPPM, PPPY, PMPM), Observational Comparative Effectiveness Research, Retrospective Cohort Study Design
claims outcome algorithm: The set of billing codes and rules an analyst writes to identify a medical event — such as a hospitalization for heart attack — from insurance records.; Appears in: Claims Outcome Algorithm PPV/Sensitivity Trade-off
claims-observable person-time: The stretches of follow-up time during which a patient has active insurance coverage that generates medical claims records, so that outcomes can actually be detected in the data.; Appears in: Product/Exposure Registry
class imbalance: When one group (usually the healthy majority) hugely outnumbers the other, so a lazy test can look accurate just by siding with the big group.; Appears in: Diagnostic Accuracy Study
classification window: The period from the index date up to and including the landmark day, during which the analyst looks at records to decide which group each patient belongs to.; Appears in: Landmark Analysis
clinical equipoise: The state in which a clinician has genuine uncertainty about which treatment to give a patient because both options are medically reasonable given that patient's baseline characteristics, resulting in a propensity score near 0.5.; Appears in: Overlap Weights and Modern Propensity Weighting
cloning: Creating two identical copies of every patient at the study start date so that each copy can be assigned to a different treatment strategy simultaneously.; Appears in: Clone-Censor-Weight for Per-Protocol Target-Trial Emulation
cluster: A natural grouping in the data whose members share something in common — for example, all patients treated at the same hospital, or all visits from the same person.; Appears in: Cluster-Randomized Trial, Cluster-Robust Standard Errors
cluster bootstrap: A bootstrap variant that resamples whole patients (or other clusters) rather than individual rows, preserving the correlation among multiple claims or visits from the same person.; Appears in: Bootstrap and Resampling Methods
clustered or correlated data: Data where multiple observations belong to the same unit, such as several lab values from the same patient, making those observations more similar to each other than to observations from a different patient.; Appears in: GEE Population-Average (Marginal) Models
code: A short label an analyst attaches to a chunk of interview text to tag the idea in it (for example, "fear of needles").; Appears in: Qualitative Interview Study
code algorithm (crosswalk): The published list of ICD diagnosis codes (Deyo for ICD-9, Quan for ICD-10) that decides whether a condition counts as present in administrative data.; Appears in: Charlson Comorbidity Index (CCI)
code system: The shared dictionary a claim uses to label what happened, such as ICD-10 for diagnoses, CPT for procedures, and NDC for drugs.; Appears in: Administrative Claims Analysis
codebook: The agreed list of codes and what each one means, so that two analysts label the same passage the same way.; Appears in: Qualitative Interview Study
coding intensity: The tendency for Medicare Advantage plans to document more diagnoses per patient than traditional Medicare would for the same patient, because plans are paid more when their enrollees appear sicker under the government's risk-adjustment formula.; Appears in: Medicare FFS vs Medicare Advantage vs Commercial Claims Differences
coefficient of variation: The standard deviation divided by the mean (SD/mean); for a gamma distribution this ratio is constant regardless of the mean level, which matches the empirical observation that higher-cost patient groups show proportionally higher variability in costs.; Appears in: Gamma Distribution for Cost and Skewed Outcomes
cognitive debriefing: A round of patient interviews in which draft questionnaire items are read aloud and patients explain what they think each question means, so that confusing or ambiguous wording can be fixed before the tool is finalized.; Appears in: PRO Instrument Development
Cohen's kappa: A statistic ranging from -1 to 1 that measures how much two raters agree on a categorical outcome beyond what would be expected if each rater made independent decisions based only on their own overall calling rates.; Appears in: Agreement Statistics: Kappa, ICC, and Bland-Altman, Endpoint Adjudication and Chart Review
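To make the "agreement beyond chance" idea concrete, here is a minimal Python sketch of the usual kappa calculation, (observed − expected) / (1 − expected), where expected agreement comes from each rater's own overall calling rates. The function name `cohens_kappa` and the two raters' labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Share of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected if each rater labelled independently at their own rates.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical chart-review calls (1 = event confirmed, 0 = not confirmed).
rater_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
rater_b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
print(round(cohens_kappa(rater_a, rater_b), 2))
```

In this made-up example the raters agree on 8 of 10 items, but half of that agreement would be expected by chance, so kappa is about 0.6 rather than 0.8.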
cohort: A defined group of patients who are enrolled at some starting point and followed forward in time to see who develops the outcome.; Appears in: Nested Case-Control Design
cohort entry date: The calendar day a patient officially enters the study — for example, the date of a new atrial fibrillation diagnosis — which kicks off follow-up.; Appears in: Immortal Time Bias Handling
cohort exit: The day a patient stops being watched, meaning whichever comes first among the design end rule (e.g., 365 days of follow-up), the last day of insurance coverage, death, or the occurrence of the outcome.; Appears in: OMOP Time-at-Risk and Cohort Exit
collider: A variable that is caused by two other variables at once; in selection bias, being in the study is caused by both treatment and outcome, so restricting the analysis to study participants distorts the comparison.; Appears in: DAGs and the Backdoor Criterion for Drug Studies, Selection Bias Sensitivity Analysis
combination code: An entry in the GEM where one ICD-9-CM concept requires a cluster of two or more ICD-10-CM codes together to capture the full clinical meaning, because no single ICD-10-CM code covers everything the ICD-9 code described.; Appears in: Code Crosswalks and Mappings Between Coding Systems
combination product trap: The mapping problem that arises when a single dispensed product (one NDC) contains two active ingredients with different ATC codes — for example, metformin/sitagliptin — requiring either two separate DDD calculations or a deliberate single-component assignment choice.; Appears in: ATC Classification and Defined Daily Dose (DDD)
common comparator: The treatment that appears in multiple trials and serves as the anchor linking otherwise unconnected drugs in the network.; Appears in: Network Meta-Analysis
common comparator (anchor): A treatment that appears in both trials (here, drug A), used as a shared yardstick so the two trials' results can be linked.; Appears in: MAIC and STC: Population-Adjusted Indirect Comparisons
common data model: A shared blueprint that tells every participating database how to name its tables and columns so that the same analysis code can run identically at every site without custom rewriting.; Appears in: Multi-Database / Distributed Network Study
common data model (CDM): A shared blueprint that specifies how every participating database should name its tables and columns so that the same analysis code runs identically at every site without custom rewriting.; Appears in: Federated and Distributed Network Analysis
comorbidity flag: A yes/no marker for whether a patient has a given condition, set by whether qualifying diagnosis codes appear in the baseline window.; Appears in: Elixhauser Comorbidity Measures / Index
comorbidity weight: The fixed point value (1, 2, 3, or 6) the index assigns to a condition based on how strongly that condition predicted death within a year.; Appears in: Charlson Comorbidity Index (CCI)
Comparator: The alternative treatment or no-treatment group that the intervention is measured against, so that any difference in outcomes can be attributed to the treatment rather than to differences in who was treated.; Appears in: PICOTS Framework for RWE
comparator drug: The established drug patients are already taking, against which the new study drug is being compared.; Appears in: Prevalent New-User Design
comparator group: A separate group of similar patients who did not get the treatment, used to judge whether the treatment made a difference; a case series has none.; Appears in: Case Series
comparison group: Plans, states, or providers that did not receive the policy change during the study window, used to estimate what would have happened to the treated group without the policy.; Appears in: Difference-in-Differences with Staggered Adoption
competing event: An outcome that permanently prevents the event of interest from ever occurring — death is the classic example, because a patient who has died can never later have a stroke.; Appears in: Cumulative Incidence and Absolute Risk Estimation
competing risk: An event — here, death — that prevents the outcome of interest from ever occurring; in oncology, a patient who dies cannot later progress, so death must be modeled as its own event type rather than treated as a simple dropout.; Appears in: Competing Risks (Cause-Specific Hazard, Cumulative Incidence, and Fine-Gray), Therapeutic-Area-Specific RWE Challenges — Oncology
complete-case sample: The subset of patients who have no missing values on any variable the analysis uses; these are the only patients kept when complete-case analysis is applied.; Appears in: Complete-Case Analysis
complex sampling: A survey design that uses stratification and multi-stage selection (for example, first selecting counties, then households, then individuals) instead of drawing everyone at random from one big list.; Appears in: Survey Weights and Complex Sampling
component event: One of the individual events (such as heart attack, stroke, or death) that together make up the composite endpoint.; Appears in: Composite Endpoint Construction
composite endpoint: An outcome that counts whichever of two or more events happens first; for rwPFS, progression and death both qualify as the endpoint, so the patient is not counted as event-free just because they survived.; Appears in: Composite Endpoint Construction, Real-World Progression and rwPFS
computable phenotype: A rule built from billing or health-record codes that labels each patient as having a condition or not.; Appears in: Sensitivity and Specificity
concept: A single, uniquely identified clinical idea in SNOMED CT, represented by a permanent numeric code (e.g., 44054006 for Diabetes mellitus type 2) that means the same thing in every country and every version of the terminology.; Appears in: SNOMED CT Clinical Terminology
concept elicitation: A qualitative research technique in which researchers interview patients with the disease to discover, in patients' own words, which symptoms and impacts are most important to capture in the questionnaire.; Appears in: PRO Instrument Development
concept set: A saved, versioned list of standardized numeric codes (concept_ids) that together define one clinical idea, such as all recorded diagnoses of type 2 diabetes in an OMOP database.; Appears in: OMOP Concept Set Development
CONCEPT_ANCESTOR: A lookup table in the CDM that links a general drug or disease concept (e.g., the ingredient atorvastatin) to every more-specific version of it (every brand name, every dose form), so one query captures all of them.; Appears in: OMOP CDM Method Patterns for RWE
concept_id: The unique integer that the OMOP vocabulary assigns to every clinical code; databases from different hospitals or countries use the same concept_id for the same clinical idea, making queries portable.; Appears in: OMOP Concept Set Development, OMOP Standardized Vocabularies (OHDSI/Athena)
concordant pair: One case and one non-case where the case has the higher predicted score — exactly the ordering you want a good model to produce.; Appears in: ROC Curves, AUC, and the c-statistic
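Counting concordant pairs is one way to arrive at the c-statistic. The Python sketch below is illustrative only: the function name `c_statistic` is hypothetical, the scores are made up, and giving tied pairs half credit is an assumed (though common) convention that the entry above does not specify.

```python
def c_statistic(scores, labels):
    """Share of case/non-case pairs in which the case has the higher score."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = ties = 0
    for case_score in cases:
        for non_case_score in non_cases:
            if case_score > non_case_score:
                concordant += 1
            elif case_score == non_case_score:
                ties += 1  # assumption: tied pairs count as half-concordant
    return (concordant + 0.5 * ties) / (len(cases) * len(non_cases))


# Hypothetical predicted risks; the first three patients are cases.
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.4]
labels = [1, 1, 1, 0, 0, 0]
print(round(c_statistic(scores, labels), 3))
```

Here 7 of the 9 pairs are concordant and 1 is tied, giving 7.5 / 9, or about 0.833. The nested loop is quadratic in sample size, so it is meant for explanation rather than large datasets.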
concordant pairs: Patient pairs where both conditions give the same answer (both hospitalized or both not hospitalized); they are counted in the table but contribute zero information to the McNemar test, which is why a study with mostly concordant pairs has low power even with a large total sample.; Appears in: McNemar's Test for Paired Proportions
CONDITION_ERA: A derived OMOP table that collapses consecutive CONDITION_OCCURRENCE rows for the same disease into a single continuous time span, recording its start date, end date, and how many individual occurrences were folded into it.; Appears in: OMOP CONDITION_OCCURRENCE and CONDITION_ERA
CONDITION_OCCURRENCE: An OMOP table with one row for each recorded diagnosis event, storing the patient identifier, the standardized disease code, the date the condition was noted, and the type of encounter that produced it.; Appears in: OMOP CONDITION_OCCURRENCE and CONDITION_ERA
condition_type_concept_id: A code on each CONDITION_OCCURRENCE row that records how the diagnosis was captured — for example, as an inpatient primary diagnosis, an outpatient claim, or an EHR problem-list entry.; Appears in: OMOP CONDITION_OCCURRENCE and CONDITION_ERA
conditional logistic regression: A statistical method that compares exposure across matched groups — here, the hazard window versus the referent windows within a single patient — and estimates how many times more likely the event was when the exposure was present.; Appears in: Case-Crossover Design, Nested Case-Control Design
conditional Poisson regression: The statistical method used to estimate the relative incidence by counting events per day in each interval while locking in the individual as their own reference, so only the timing difference between intervals drives the result.; Appears in: Self-Controlled Case Series (SCCS), Self-Controlled Risk Interval (SCRI) Design
conditional quantile: The quantile of the outcome for patients who share a specific set of characteristics (age, treatment, comorbidity), estimated by the regression model — analogous to the conditional mean that ordinary regression estimates.; Appears in: Quantile Regression
conditional share: Each stage's count expressed as a percentage of only the immediately prior stage, showing which single transition is the steepest drop.; Appears in: Cascade of Care Analysis
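The conditional share can be computed alongside the share of the starting population, as in this small Python sketch. The function name `cascade_shares`, the stage names, and the counts are all hypothetical.

```python
def cascade_shares(stages):
    """stages: ordered list of (name, count). Returns one row per stage with
    the share of the first stage and the share of the immediately prior stage."""
    if not stages or stages[0][1] <= 0:
        raise ValueError("the first stage must have a positive count")
    first = stages[0][1]
    rows, prev = [], None
    for name, count in stages:
        overall = 100 * count / first
        # Conditional share is undefined for the first stage or an empty prior stage.
        conditional = None if not prev else 100 * count / prev
        rows.append((name, count, overall, conditional))
        prev = count
    return rows


# Hypothetical cascade counts (illustrative only).
stages = [("diagnosed", 1000), ("linked to care", 800),
          ("received treatment", 400), ("controlled", 300)]
for name, count, overall, conditional in cascade_shares(stages):
    cond = "n/a" if conditional is None else f"{conditional:.0f}%"
    print(f"{name:20s} {count:5d}  overall {overall:.0f}%  conditional {cond}")
```

In this made-up cascade the conditional shares are 80%, 50%, and 75%, which points to the step from linkage to treatment as the steepest single drop, something the overall shares (80%, 40%, 30%) do not show directly.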
conditioning on margins: The assumption that the row totals and column totals in a 2×2 table are treated as fixed constants; Fisher's test builds its probability calculation on this assumption, which makes the test valid but also more conservative than tests that let the column totals vary.; Appears in: Fisher's Exact Test
confidence interval: A range of plausible values for the true mean difference; a 95% CI means that if the study were repeated many times, about 95% of such intervals would contain the true value — wider intervals signal more uncertainty, narrower intervals signal more precision.; Appears in: Cluster-Robust Standard Errors, Empirical Calibration with Negative Controls, Inferential Statistics Foundations, Two-Sample (Student's) t-Test
confidence interval (CI): The plausible range around your estimate; for a ratio measure such as a risk ratio, if it does not cross 1.0, the result is 'statistically significant.'; Appears in: E-value Sensitivity Analysis
confidence interval half-width: The distance from the point estimate to the upper (or lower) edge of the confidence interval; a half-width of 2 percentage points means the CI spans about 4 percentage points total.; Appears in: Sample Size, Power, and Precision in RWE
confidence interval upper bound: The highest plausible value for the true difference between the two treatments, given the study data; in non-inferiority testing, non-inferiority is demonstrated when this upper limit stays below the pre-specified margin.; Appears in: Equivalence and Non-Inferiority Testing
confidence-thresholded review queue: A workflow where an LLM's extraction is automatically accepted only if its confidence score is above a pre-set threshold, and all lower-confidence outputs are routed to a human reviewer before use in the analysis.; Appears in: LLM-Assisted Data Abstraction and Evidence Work in RWE
confounder: A factor that influences both who receives a treatment and what outcome they experience, making it look like the treatment caused the outcome when it may not have — a confounder must be accounted for in the analysis.; Appears in: Social Determinants of Health (SDoH) in RWE
confounding: The distortion that arises when a variable causes both the drug use and the outcome, so patients who receive the drug already differ from those who do not in ways that also affect the outcome, which creates a misleading comparison if you ignore it.; Appears in: DAGs and the Backdoor Criterion for Drug Studies, Predictive and Causal ML Models, Targeted Maximum Likelihood Estimation (TMLE)
confounding by contraindication: The reverse of channeling: a drug is withheld from the sickest patients because it is dangerous for them, making the treated group look artificially healthier than untreated patients in a database study.; Appears in: Confounding by Indication and Channeling Bias
confounding by frailty: The bias that arises when frailer patients are systematically more or less likely to get a treatment, so frailty distorts the apparent treatment effect unless adjusted for.; Appears in: Claims-Based Frailty Index (Faurot / Kim)
confounding by indication: When the medical reason a drug is prescribed is itself a cause of the outcome being studied, making the drug appear harmful or beneficial simply because sick people get the drug, not because of what the drug actually does.; Appears in: Active Comparator, New-User Design, Comparative Effectiveness Research (CER) Methods, Confounding by Indication and Channeling Bias, High-Dimensional Propensity Score (hdPS), Instrumental Variables in Pharmacoepidemiology, Observational Comparative Effectiveness Research, Propensity Score Methods (PSM, IPTW)
confounding by unmeasured behavior: When a hidden behavior — like exercising or attending routine check-ups — simultaneously predicts who gets treated and who stays healthier, making the treatment look responsible for an outcome it did not actually cause.; Appears in: Healthy User Bias
confusion matrix: A 2-by-2 table that tallies every patient into one of four buckets: correctly flagged as sick (true positive), wrongly flagged as sick (false positive), correctly cleared as healthy (true negative), or wrongly cleared as healthy (false negative).; Appears in: F1 Score, Precision, and Recall
conjugate prior: A prior distribution family that, when combined with a specific likelihood, produces a posterior in the same family, allowing exact closed-form updating without numerical integration; the beta distribution is the conjugate prior for a binomial event rate.; Appears in: Bayesian Inference Foundations, Beta Distribution for Proportions and Utilities
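The closed-form update mentioned in this entry is short enough to sketch directly. In the Python below, a Beta(alpha, beta) prior combined with binomial data yields a Beta(alpha + events, beta + non-events) posterior; the function name `beta_binomial_update`, the flat Beta(1, 1) prior, and the counts are hypothetical choices for illustration.

```python
def beta_binomial_update(alpha, beta, events, trials):
    """Posterior Beta parameters after observing `events` out of `trials`."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("Beta parameters must be positive")
    if not 0 <= events <= trials:
        raise ValueError("events must be between 0 and trials")
    return alpha + events, beta + (trials - events)


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical data: 12 events among 100 patients, flat Beta(1, 1) prior.
post_alpha, post_beta = beta_binomial_update(1, 1, events=12, trials=100)
print(post_alpha, post_beta)
print(round(beta_mean(post_alpha, post_beta), 3))
```

With these inputs the posterior is Beta(13, 89), with a mean of 13 / 102, or roughly 0.127. No simulation or numerical integration is needed, which is the practical appeal of conjugacy.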
conservatism: The property of a statistical test whose actual chance of a false positive is below the stated threshold (e.g., truly 2% when you set alpha = 5%); conservative tests are safe but sacrifice power, meaning they miss real effects more often than they should.; Appears in: Fisher's Exact Test
content validity: Evidence that a questionnaire actually covers the symptoms and experiences that matter to patients in the target disease — established by interviewing real patients during development, not by expert opinion alone.; Appears in: PRO Instrument Development
contingency table: A grid that cross-tabulates two categorical variables — for example, rows for treatment group (A vs B) and columns for outcome (event vs no event) — so you can see all four combinations at once.; Appears in: Chi-Square Test of Independence
continuous enrollment: A requirement that a patient be uninterruptedly covered by their insurance plan across the entire lookback window, so that any absence of fills reflects true non-use rather than a gap in data coverage.; Appears in: Database Feasibility Assessment and Attrition Funnel, New-User (Incident-User) Design, Washout / Clean / Lookback Period
control: A comparison person who does not have the outcome, picked to represent the same population the cases came from.; Appears in: Case-Control Study Design
control interval: A second window in the same person, further from the exposure (for example, days 15 through 43 after the dose), used as the within-person comparison when the acute effect is assumed to have faded.; Appears in: Self-Controlled Risk Interval (SCRI) Design
corrected covered area (CCA): A number that measures how many of the same underlying studies appear in more than one of the reviews you collected; high overlap means those reviews are not truly independent evidence.; Appears in: Umbrella Review (Review of Systematic Reviews)
correlation: A number describing whether two measures tend to rise and fall together across the groups you are comparing.; Appears in: Ecological (Aggregate) Study
correlation between pairs: How strongly a patient's pre-measurement predicts their post-measurement; a high correlation (patients who start high stay relatively high) means the paired design gains a lot of power compared to treating the two measurements as if they came from unrelated people.; Appears in: Paired t-Test
cost center: In hospital accounting, a department or unit whose costs are tracked separately (e.g., pharmacy, ICU, operating room); revenue codes map directly to these accounting units on the claim.; Appears in: Revenue (Center) Codes
cost component: One labeled bucket of spending, like hospital care or lost wages, that you add to the others to reach the grand total.; Appears in: Cost-of-Illness (COI) Study
cost offset: Money the plan saves elsewhere because of the new drug, such as avoided hospital stays, which is subtracted from its cost.; Appears in: Budget Impact Analysis
cost-effectiveness: A way of asking whether the extra health benefit a new treatment provides is worth the extra money it costs, usually expressed as dollars per quality-adjusted life year (QALY) gained.; Appears in: Discrete-Event Simulation Using RWE Inputs
cost-effectiveness analysis: A related economic method that compares costs against health outcomes left in natural units (for example, dollars per hospitalization avoided) rather than converting those outcomes into dollars.; Appears in: Cost-Benefit Analysis (CBA)
cost-effectiveness plane: A simple chart with added cost on the vertical axis and added health on the horizontal axis — where a point falls (upper-right, lower-right, etc.) tells you at a glance whether the new treatment is likely worth adopting.; Appears in: ICER and Net Monetary Benefit (NMB)
cost-effectiveness threshold: A benchmark dollar-per-QALY value set by a health system or payer (for example, roughly $100,000 to $150,000 per QALY in the US) above which a treatment is considered too expensive relative to its health benefit.; Appears in: Cost-Utility Analysis (CUA)
counterfactual elimination: The hypothetical scenario underlying PAF — what would happen to disease burden if the exposure were completely and instantly removed from the entire population; it is a mathematical thought experiment, not a realistic policy target.; Appears in: Attributable Risk and Population Attributable Fraction
counterfactual outcome: The outcome a patient would have experienced under a treatment they did not actually receive; g-computation predicts these unobserved values from a fitted model and averages them to estimate the population-level treatment effect.; Appears in: G-Computation and the Parametric G-Formula
counterfactual trend: The projected path the outcome rate would have followed after the intervention if the intervention had never occurred, estimated by extending the pre-period straight line forward in time.; Appears in: Interrupted Time Series (Segmented Regression)
counting-process interval: One row in the data table representing a stretch of follow-up time during which a patient's exposure status did not change, written as a start day and a stop day.; Appears in: Cox Regression with Time-Dependent Covariates
covariance: A measure of how much two variables change together; if X tends to be above its average at the same time Y is above its average, the covariance is positive.; Appears in: Pearson and Spearman Correlation
covariance structure: A mathematical description of how closely related a patient's measurements are to each other across visits; the unstructured form makes no assumptions about the pattern of those relationships.; Appears in: Mixed Model for Repeated Measures (MMRM) in RWE
covariate: Any baseline characteristic you compare across groups, such as age or whether the patient has diabetes.; Appears in: Baseline Characteristics and Covariate Balance
covariate adjustment: Including patient characteristics — age, sex, prior diagnoses — in the model so that the treatment comparison accounts for the fact that the two groups may differ on those traits.; Appears in: Logistic Regression for Binary Outcomes
covariate balance: How similar the two treatment arms look on measured patient characteristics — good balance means the groups are comparable before you look at outcomes.; Appears in: Propensity Score Methods (PSM, IPTW)
coverage gap: A span of days when a patient's health-plan enrollment has lapsed, so any care they receive during that stretch does not appear in the claims data.; Appears in: Continuous Enrollment and Observable Time
covered day: A calendar day on which the patient had at least one fill's supply still on hand.; Appears in: Proportion of Days Covered (PDC)
covered-day interval: The calendar range from the fill date through fill date plus days_supply minus one, representing the days a patient theoretically had medication in hand.; Appears in: Pregnancy Exposure Window
covered_through: The last date a patient still has pills on hand from a single fill, calculated as fill_date plus days_supply minus one day.; Appears in: Restart, Rechallenge, and New-Episode Rules
CPT/HCPCS codes: Five-character billing codes that physicians and outpatient facilities use to report procedures they performed, such as CPT 27447 for a total knee replacement.; Appears in: Procedure Identification and Measurement in Claims and EHR
credible interval: A Bayesian uncertainty range [a, b] such that the posterior probability the true parameter falls inside it equals the stated coverage (e.g., 95%); unlike a confidence interval, this is a direct probability statement about the parameter, not about a repeated-sampling procedure.; Appears in: Bayesian Inference Foundations
criterion: One of the things that matters in the decision - for example efficacy, safety, or unmet need - that each option gets scored on.; Appears in: Multi-Criteria Decision Analysis (MCDA)
critical boundary: A threshold on the LLR, worked out in advance, that the running test must cross to declare a signal.; Appears in: MaxSPRT and Sequential Safety Surveillance
Cronbach's alpha: A number, usually between 0 and 1, that tells you how consistently the individual questions on a scale hang together; values around 0.70 to 0.90 are typically considered acceptable for a multi-item questionnaire.; Appears in: PRO Instrument Validation
cross-validated lambda: The penalty strength chosen by testing many lambda values on held-out folds and selecting the one with the lowest prediction error; lambda.1se is the parsimonious choice (largest lambda within one standard error of the minimum) preferred in RWE for interpretability.; Appears in: Regularized Regression: LASSO, Ridge, and Elastic Net
cumulative dose: The running total of drug received from the first fill up to (but not including) a given day, typically expressed in milligrams; it grows each time a new fill is taken.; Appears in: Time-Updated Exposures and Cumulative Dose
cumulative incidence: The share of the starting group that develops the outcome over the follow-up period, found by dividing the number of people with the event by the number of people you started with.; Appears in: Prospective Cohort Study
cumulative incidence function: The actual probability that a patient experiences the outcome of interest before any competing event by a given point in time; this is the honest, decision-relevant number reported in health technology assessments.; Appears in: Competing Risks (Cause-Specific Hazard, Cumulative Incidence, and Fine-Gray)
cumulative-incidence curve: An alternative to the Kaplan-Meier curve used when patients can also die or switch treatment before the event of interest occurs; it shows the true probability of the event in that real-world context.; Appears in: Visualizations and Diagrams in Pharmacoepidemiology and RWE
cure fraction (pi): The proportion of patients who will never experience the event of interest, no matter how long they are followed — typically estimated through a logistic regression model on baseline patient characteristics.; Appears in: Cure Models (Mixture and Non-Mixture)
current use: A yes/no flag for whether an active prescription supply covers the current day, based on fill date plus days of supply from the pharmacy record.; Appears in: Time-Updated Exposures and Cumulative Dose
cut (on the tree): A node of the tree taken together with everything beneath it - either one specific code (a leaf) or a whole category summing all its sub-codes (a branch).; Appears in: Tree-Based Scan Statistics (TreeScan)
cutoff: The fixed threshold on the running variable where treatment status changes — patients on one side are treated (or much more likely to be treated) and patients on the other side are not.; Appears in: Regression Discontinuity Design
cycle: The fixed unit of time the model advances in each step, for example one month or three months, after which every patient is re-assigned to a health state.; Appears in: Markov Transition Probabilities from Real-World Data
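The covered_through, covered day, and covered-day interval entries above all reduce to the same date arithmetic. The following is a minimal Python sketch, assuming each fill is just a fill date and a days_supply value; the function names and the two example fills are illustrative and not taken from any library or study. It applies no carryover rule, so supply from an early refill is not shifted forward, and overlapping days count once.

```python
from datetime import date, timedelta


def covered_through(fill_date: date, days_supply: int) -> date:
    """Last day of supply on hand from one fill: fill_date + days_supply - 1."""
    return fill_date + timedelta(days=days_supply - 1)


def covered_days(fills: list[tuple[date, int]]) -> set[date]:
    """Every calendar day covered by at least one fill (overlaps count once)."""
    days: set[date] = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            days.add(fill_date + timedelta(days=offset))
    return days


# Hypothetical fills: a 30-day fill on Jan 1 and an early refill on Jan 25.
fills = [(date(2024, 1, 1), 30), (date(2024, 1, 25), 30)]

print(covered_through(*fills[0]))  # 2024-01-30
print(len(covered_days(fills)))    # 54 distinct covered days (Jan 1 to Feb 23)
```

Because the second fill overlaps the first by six days, the two 30-day fills cover 54 distinct days rather than 60; a stockpiling or carryover rule would handle that overlap differently.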

D

DAG: Short for directed acyclic graph — a box-and-arrow diagram drawn before looking at any data that maps out which variables cause which others, so the analyst knows which ones must be adjusted for.; Appears in: Visualizations and Diagrams in Pharmacoepidemiology and RWE
data leakage: Accidentally letting information that would not be available at prediction time — or from the same patient's other records — flow into the training process, making the model look far better than it really is.; Appears in: Cross-Validation, Overfitting, and Optimism
data partner: One of the independent databases participating in the network (for example, a commercial insurance plan or a hospital system), each contributing its own patient population.; Appears in: Multi-Database / Distributed Network Study
data provenance: The documented record of where a dataset came from, how it was processed, and whether it can actually observe the events the study needs to measure — for example, can these claims data see every hospital visit, or are some visits missing because of plan-type gaps?; Appears in: Regulatory and HTA Readiness for RWE
DAW code: A one-digit Dispense-as-Written code that records why a brand or generic product was dispensed: for example, the prescriber required the brand (DAW = 1), the patient requested the brand (DAW = 2), or no product selection was indicated, the usual value when a generic is dispensed (DAW = 0).; Appears in: NCPDP Pharmacy Claim Fields
days_supply: The pharmacist-recorded number of days the dispensed quantity is intended to last (for example, 30 for a one-month supply of a once-daily tablet); the single most important field for computing medication adherence metrics.; Appears in: Active Comparator, New-User Design, Administrative Claims Analysis, As-Treated Risk Window Construction, Case-Case-Time-Control Design, Case-Time-Control Design, Dose Titration / Up-Titration to Target Dose, Drug Utilization Study, Exposure Episode Construction, Grace Period and Permissible Gap Rules, Infused Biologic Administration Capture, Inpatient Bridging of Drug Exposure, Medication Possession Ratio (MPR), NCPDP Pharmacy Claim Fields, NDC (National Drug Code), New-User (Incident-User) Design, OMOP Drug Exposure and Drug Era, Persistence Time to Discontinuation, Pregnancy Exposure Window, Prevalent User Bias, Proportion of Days Covered (PDC), Restart, Rechallenge, and New-Episode Rules, Route-of-Administration Differences in RWE, Stockpiling and Carryover Rules, Switch, Add-On, and Augmentation Rules, Time Zero (Index Date) Alignment, Washout / Clean / Lookback Period
DDD (Defined Daily Dose): The WHO's assumed average maintenance dose per day for a drug used for its main indication in adults — a measurement unit for comparing drug volumes across countries, not a recommended prescribing dose.; Appears in: ATC Classification and Defined Daily Dose (DDD)
DDD/1000 inhabitants/day: The standard population-level utilization metric: total DDDs dispensed in a period, divided by the product of the population size and the number of days in the period, then multiplied by 1000. It expresses how many of every 1000 people would be on the drug on a given day if each user took exactly one DDD per day.; Appears in: ATC Classification and Defined Daily Dose (DDD)
deactivation: When CMS removes an NPI from active status because the provider retired, died, or no longer meets enrollment criteria; deactivated NPIs are published in a separate file and should not be treated as valid active providers.; Appears in: NPI (National Provider Identifier)
Death Master File (DMF) / Social Security Death Index (SSDI): A Social Security Administration file listing people whose deaths were reported to Social Security; it covers most U.S. deaths but has missed a growing share of state-reported deaths since 2011.; Appears in: Mortality Source Hierarchy
death-index linkage: Connecting a claims or EHR database to an external death registry (such as the National Death Index) so that the cause and date of a patient's death can be confirmed — without this, fatal outcomes cannot be reliably counted.; Appears in: Fit-for-Purpose Data Assessment
dechallenge: Stopping a drug because the patient had a bad reaction or side effect suspected to be caused by that drug.; Appears in: Case Report, Restart, Rechallenge, and New-Episode Rules
decision model: A mathematical structure that simulates what happens to a group of patients over time under different treatment options, using estimated rates, costs, and quality-of-life values as inputs.; Appears in: Health Economic Modeling Methods Using RWE
decision tree: A prediction model that sorts patients into groups by answering a series of yes/no questions about their features (age over 65? prior hospitalization? taking drug X?), ending in a leaf that gives a predicted outcome.; Appears in: Tree-Based Ensembles: Random Forests and Gradient Boosting
deduplication: The step where an analyst collapses multiple billing rows that all describe the same single procedure into one counted event, so the procedure is not counted twice.; Appears in: Procedure Identification and Measurement in Claims and EHR
deduplication window: A rule that collapses multiple claims for the same acute episode (for example, a hospital stay, a transfer, and several follow-up office visits all coded as the same heart attack) into a single event so it is not counted more than once.; Appears in: Acute Event Deduplication Window, Composite Endpoint Construction
deficit-accumulation index: A way of scoring frailty as the fraction of a long list of possible health problems a person actually has, landing them on a 0-to-1 scale (the Rockwood approach Kim's index uses).; Appears in: Claims-Based Frailty Index (Faurot / Kim)
define.xml: A machine-readable file required in every FDA submission that documents every dataset name, every variable, its allowable values, and the rule used to derive it — the metadata catalog that lets a reviewer understand and verify the data without asking the sponsor.; Appears in: CDISC Standards (SDTM/ADaM) for RWE Submissions
degrees of freedom: The number of independent pieces of information in the data used to estimate the variance; for a two-sample t-test with equal group sizes of n, df = 2n - 2, and it governs how wide the t-distribution is (wider at low df, approaching normal at high df).; Appears in: Chi-Square Test of Independence, Paired t-Test, Two-Sample (Student's) t-Test
degrees of freedom (df): The number of free parameters a spline adds to a regression model; a k-knot RCS uses k minus 2 degrees of freedom beyond a linear term.; Appears in: Splines and Flexible Functional Forms
delivery claim: A billing record submitted to the insurance plan when a hospital provides labor and delivery services, used to identify the date a pregnancy ended and to anchor the search for a matching newborn record.; Appears in: Mother-Infant Linkage
denied claim: A claim the payer adjudicated and refused to pay, resulting in a paid amount of zero and a claim adjustment reason code; the service may still have occurred even though no reimbursement was made.; Appears in: Claim Adjustments, Reversals, and Denials
denominator: The total observed time or number of people at risk that sits beneath the event count in a rate calculation; getting this right is one of the most common challenges in real-world data analysis.; Appears in: Cascade of Care Analysis, Case Report, Descriptive Epidemiology in RWE, Prevalence (Point, Period, and Annual) in RWE
depletion of susceptibles: As a study continues, the people most likely to have the event fail first and leave the risk pool, so the groups being compared gradually become non-comparable — this makes the hazard ratio's meaning shift over time even in a perfectly run trial.; Appears in: Prevalent New-User Design, Prevalent User Bias, The Hazard Ratio as an Effect Measure
descendant expansion: A flag you set on an ancestor concept that automatically pulls in all more-specific child codes beneath it, so defining 'ACE inhibitor' at the drug-class level inherits every individual ACE-inhibitor product automatically.; Appears in: OMOP Concept Set Development
description: A human-readable label attached to a SNOMED CT concept — every concept has one official Fully Specified Name plus one or more synonyms and preferred terms, all of which resolve to the same concept ID.; Appears in: SNOMED CT Clinical Terminology
descriptive study: A study that summarizes and counts what is happening, without trying to prove that one thing caused another.; Appears in: Drug Utilization Study
descriptor amount: The official quantity stated in a HCPCS code's written description that defines how much drug one billing unit represents (for example, J9271 says "per 1 mg," so one billing unit equals 1 mg of pembrolizumab).; Appears in: HCPCS Level II Codes and J-Codes
design effect: The factor by which the required sample size grows because outcomes are correlated within clusters; a design effect of 2.0 means you need twice as many patients as an individually randomized trial with the same power would need.; Appears in: Cluster-Randomized Trial, Survey Weights and Complex Sampling
detection sensitivity: The probability that a person who truly has the condition will have it found and coded during a given period of observation — higher surveillance means a higher chance of detecting existing disease, not necessarily more disease.; Appears in: Surveillance and Detection Bias
deterministic linkage: Joining records by requiring an exact match on one or more identifiers (such as a Social Security number or a precise combination of date of birth, sex, and ZIP code) — fast and transparent, but any typo or changed identifier causes a true pair to be missed.; Appears in: Linked Multi-Database Study (Record Linkage), Tokenization and Privacy-Preserving Record Linkage
deviance: The GLM measure of model fit, analogous to residual sum of squares in OLS; a residual deviance divided by degrees of freedom much larger than 1.0 signals that the assumed family underestimates the real variability in the data.; Appears in: Generalized Linear Models (GLM)
diagnosis pointer: A letter (A through L) in box 24E of each service line that points to one or more of the up to 12 header diagnoses, indicating which diagnosis explains why that specific procedure was performed.; Appears in: CMS-1500 / 837P Professional Claim Fields
diagnosis position: The ranked slot a code occupies on a claim — the principal (first) position means the provider treated that condition as the main reason for the visit, while secondary positions are supporting or incidental findings.; Appears in: Outcome Algorithm Construction, Safety Signal Case Definition
diagnostic delay: The gap in time between when a disease actually begins in the body and when it is recognized and recorded as a diagnosis in the medical record; the longer this gap, the larger the protopathic window.; Appears in: Protopathic Bias and Reverse Causation
dimension: One category of claims codes that hdPS searches separately; for example, inpatient diagnoses, outpatient diagnoses, filled prescriptions, and procedures are each their own dimension.; Appears in: High-Dimensional Propensity Score (hdPS)
direct effect: The portion of a drug's total effect on the outcome that does NOT pass through the mediator, sometimes called the natural direct effect (NDE).; Appears in: Causal Mediation and Effect Modification
direct medical costs: Dollars paid for health care services — hospital stays, doctor visits, lab tests, and prescription drugs — that appear as claims in an insurance database.; Appears in: Burden of Disease and Cost-of-Illness (COI) Studies, Cost-of-Illness (COI) Study
direction-of-bias reasoning: Working out in advance whether confounding by indication would make a drug look more harmful or more beneficial than it truly is, based on whether the indication itself increases or decreases the risk of the outcome.; Appears in: Confounding by Indication and Channeling Bias
directly standardized rate (DSR): The single summary rate you get after weighting a population's own age-specific rates by the standard population's age distribution and summing the results.; Appears in: Direct Standardization
Discharge status: The two-digit code in FL17 recording where the patient went at the end of the stay (e.g., 01 = home, 02 = transferred to another acute hospital, 20 = died during stay, 30 = still hospitalized); drives in-hospital death ascertainment, transfer-chain linkage, and readmission denominators.; Appears in: Hospitalization and Transfer Collapse, UB-04 / 837I Institutional Claim Fields
discharge status code: A code on a hospital billing record that says how a patient left the hospital — including a specific code meaning the patient died during that stay.; Appears in: Mortality Source Hierarchy
discontinuation: The point at which an analyst declares a patient has stopped treatment, defined here as failing to receive the next infusion before the grace period expires.; Appears in: Grace Period and Permissible Gap Rules, Infused Biologic Administration Capture
discontinuation date: The day an analyst officially records that a patient stopped treatment: the day after their supply ran out, assigned only once the permissible gap has passed with no refill.; Appears in: Persistence Time to Discontinuation
discontinuity: The sudden jump in outcome rates at the cutoff — the size of that jump is the estimated treatment effect, because everything else about patients at the threshold is assumed to vary smoothly.; Appears in: Regression Discontinuity Design
discordance: When two or more reviews on the same question reach different conclusions, often because they used different time windows, different inclusion rules, or different statistical approaches.; Appears in: Umbrella Review (Review of Systematic Reviews)
discordant pairs: Patient pairs where the two conditions give different answers — one "yes" and one "no" — which are the only pairs the test uses; pairs where both conditions agree (both "yes" or both "no") are ignored because they reveal nothing about a proportion change.; Appears in: McNemar's Test for Paired Proportions
discount rate: The annual percentage used to shrink a future dollar (or health gain) down to what it is worth today — for example, the US reference-case rate is 3% per year.; Appears in: Discounting of Costs and Effects in Economic Evaluation
discrepancy resolution log: A formal document recording every difference found between the two programmers' outputs, the root cause traced to the data, and the agreed resolution — it is a required audit artifact for regulatory deliverables.; Appears in: QC, Double Programming, and Reproducible Analysis
discrete choice experiment (DCE): A survey method where respondents repeatedly choose their preferred option from sets of two or three treatment profiles that differ on defined features, and the pattern of choices is used to calculate how much each feature is worth to them.; Appears in: Patient Preference Study (DCE / BWS)
discrete-event simulation: A computer model that advances each virtual patient through a series of health events (e.g., progression, hospitalization, death) one at a time rather than in fixed monthly or yearly steps.; Appears in: Discrete-Event Simulation Using RWE Inputs
discrimination: How well the model separates patients who will have the outcome from those who will not — measured by the AUC (area under the ROC curve), where 0.5 means no better than a coin flip and 1.0 means perfect separation.; Appears in: Prediction Model Validation and Recalibration in RWE, ROC Curves, AUC, and the c-statistic
disease risk score: A patient's model-predicted chance of having the outcome based on their baseline characteristics, before accounting for the treatment being studied.; Appears in: Disease Risk Scores
disease-attributable costs: The portion of a patient's total spending that can be linked to a specific condition, either by identifying claims that carry that condition's diagnosis codes or by measuring how much more patients with the disease spent compared to similar patients without it.; Appears in: All-Cause vs Attributable Costs
disenrollment censoring: Removing a patient from a study's follow-up period because they left their insurance plan (e.g., changed jobs or switched insurer) — a major source of informative censoring in US commercial claims that does not occur in single-payer national registries where coverage is continuous until death.; Appears in: International Real-World Data Sources
dispensing record: The pharmacy's record that a patient actually received (was given) the drug; this is what shows up as a pharmacy claim in insurance data.; Appears in: Primary Non-Adherence and Treatment Initiation
dispersion parameter: A number (called alpha) added to the negative binomial model to capture how much more spread the counts have than a Poisson distribution would predict; when alpha equals zero the negative binomial is identical to Poisson.; Appears in: Negative Binomial Distribution for Overdispersed Counts
dispersion parameter (alpha): The extra term negative binomial adds to model how much extra spread exists in the counts beyond what Poisson allows; when alpha equals zero, negative binomial and Poisson give the same answer.; Appears in: Poisson and Negative Binomial Count Models for HCRU and Utilization
disproportionality: A reporting imbalance: the drug-event pair appears in the database more often than its individual frequencies would predict if the two were unrelated.; Appears in: Signal Detection (Disproportionality Analysis)
distributed analysis: An approach where each site runs the study code on its own data and returns only summary numbers to the coordinating team — no individual patient rows ever leave the site's own servers.; Appears in: Multi-Database / Distributed Network Study
distribution: The full pattern of values a variable takes and how frequently each value occurs; visualized with a histogram or boxplot before deciding which summary statistics are appropriate.; Appears in: Descriptive Statistics
domain: The category that determines which CDM table a concept lands in — Condition, Drug, Measurement, Procedure, or Observation — sometimes in a different table than a researcher would expect from the source code.; Appears in: OMOP Standardized Vocabularies (OHDSI/Athena)
donor pool: The set of untreated comparison regions (states, health systems, countries) whose data are used to build the synthetic control; every unit in this pool must be unaffected by the intervention.; Appears in: Synthetic Control Method
dose intensity: Cumulative dose divided by the length of time the patient was on the drug, giving the average daily dose actually received over the treatment period.; Appears in: Dose Titration / Up-Titration to Target Dose, Time-Updated Exposures and Cumulative Dose
dose normalization: Adjusting a drug dose by a child's body weight (e.g., mg per kg) so that exposure can be compared fairly across children of different sizes.; Appears in: Special Populations RWE Methods
dose-completion: A way to measure adherence for clinician-administered drugs by counting how many of the scheduled injections or infusions a patient actually received within a reasonable time window around each due date.; Appears in: Route-of-Administration Differences in RWE
dose-response curve: A graph showing the predicted outcome (e.g., log-odds ratio of death) across the full range of a continuous exposure, with one reference value set to zero; this is the correct output of a spline model, not the individual spline coefficients.; Appears in: Splines and Flexible Functional Forms
dosing interval: The number of days the drug label specifies between planned infusion appointments; for infliximab maintenance the interval is 56 days (every 8 weeks).; Appears in: Infused Biologic Administration Capture
double programming: A QC method where a second analyst independently writes code to reproduce the same results as the primary programmer, without seeing the first analyst's code, so that any discrepancy surfaces a real error or SAP ambiguity.; Appears in: QC, Double Programming, and Reproducible Analysis
doubly robust: A property of certain statistical methods meaning the final estimate is still approximately correct as long as at least one of the two required models (the outcome model or the treatment model) is a reasonable approximation of reality.; Appears in: Targeted Maximum Likelihood Estimation (TMLE)
drug era: A pre-built continuous treatment episode the CDM creates by collapsing individual dispensing records into one stretch of time, using a default 30-day gap rule to decide when one treatment episode ends and another begins.; Appears in: OMOP CDM Method Patterns for RWE
drug strength: The amount of active drug in each unit dispensed, for example 5 mg per tablet or 2 mg per mL of a liquid suspension.; Appears in: Pediatric Dose Normalization
DRUG_ERA: A derived OMOP table that merges consecutive fills of the same active ingredient into one continuous-treatment episode, provided the gap between fills does not exceed the persistence threshold.; Appears in: OMOP Drug Exposure and Drug Era
DRUG_EXPOSURE: The raw OMOP table that stores one row for every single prescription fill or drug administration, recording the drug, fill date, and how many days of supply were dispensed.; Appears in: OMOP Drug Exposure and Drug Era
dual independent screening: Two reviewers decide separately whether each study should be kept or dropped, then compare, so one person's slip is less likely to lose a relevant study.; Appears in: Scoping Review
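The DDD/1000 inhabitants/day entry above describes a single ratio. The following is a minimal Python sketch of that arithmetic; the function name and the example figures are hypothetical, chosen only to make the calculation easy to follow. The multiplication by 1000 is done before the division, which is mathematically the same as the order given in the definition.

```python
def ddd_per_1000_per_day(total_ddds: float, population: int, days: int) -> float:
    """Total DDDs divided by (population x days), scaled to 1000 inhabitants."""
    if population <= 0 or days <= 0:
        raise ValueError("population and days must be positive")
    return 1000 * total_ddds / (population * days)


# Hypothetical example: 365,000 DDDs dispensed over 365 days
# in a population of 100,000 people.
print(ddd_per_1000_per_day(365_000, 100_000, 365))  # 10.0
```

A result of 10 reads as: on an average day, about 10 of every 1000 inhabitants would be on the drug if each user took exactly one DDD per day.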

E

E code: An ICD-9-CM supplementary code beginning with the letter E (E000-E999) that records the external cause of an injury or poisoning, such as a fall or a motor vehicle accident; important for injury research and safety studies.; Appears in: ICD-9-CM Legacy Diagnosis and Procedure Codes
e-prescription (e-Rx): A medication order that a doctor sends electronically to a pharmacy; it creates a digital record that a drug was prescribed, even if the patient never picks it up.; Appears in: Primary Non-Adherence and Treatment Initiation
E-value: The minimum association strength (expressed as a risk-ratio-like number) that an unmeasured confounder would need to have with both the exposure and the outcome simultaneously to fully explain away the observed result; a higher E-value means the result is harder to dismiss.; Appears in: Quantitative Bias Analysis Toolkit, Unmeasured Confounding Probabilistic Bias Analysis
early refill: When a patient picks up the next bottle of medication before the current bottle is used up, creating a period where two fills are 'on hand' at the same time.; Appears in: Medication Possession Ratio (MPR)
ecological fallacy: The error that occurs when you assume every person in a group (such as a census tract) shares the average characteristic of that group, when in reality individuals within the same neighborhood can differ widely.; Appears in: Ecological (Aggregate) Study, Social Determinants of Health (SDoH) in RWE
effect estimate: A number summarizing how much a treatment or exposure changes the outcome in one study, such as a hazard ratio of 0.80 meaning a 20% lower rate in the treated group.; Appears in: Meta-Analysis of Observational Studies
effect modification: When the size or direction of a drug's effect differs across subgroups defined by a characteristic that was measured before treatment started, such as baseline BMI category or age group.; Appears in: Causal Mediation and Effect Modification
effect modifier: A factor that changes the size or direction of a treatment effect in different subgroups — for example, if a digital medication reminder only helps patients who have stable housing, then housing stability is an effect modifier.; Appears in: Generalizability, Transportability, and External Validity, MAIC and STC: Population-Adjusted Indirect Comparisons, Social Determinants of Health (SDoH) in RWE
effective population: All the patients whose treatment the decision will determine over the years it stays in force, counted with future years weighted slightly less.; Appears in: Value of Information Analysis (EVPI, EVPPI, EVSI)
effective sample size: The number of patients whose data actually carry statistical information after adjustments and weighting are applied; often substantially smaller than the raw patient count.; Appears in: MAIC and STC: Population-Adjusted Indirect Comparisons, Sample Size, Power, and Precision in RWE
effective sample size (ESS): A summary measure of how much statistical information the weighted analysis retains, computed as the squared sum of weights divided by the sum of squared weights; an ESS well below the actual sample size indicates that a few patients with extreme weights are dominating the analysis.; Appears in: Bayesian Borrowing from Historical / External Controls, Overlap Weights and Modern Propensity Weighting
effectiveness: How well a treatment works under ordinary real-world conditions, as opposed to efficacy, which is how well it works under ideal, tightly controlled conditions.; Appears in: Pragmatic Trial
effectiveness vs efficacy: Efficacy is how well a drug works under the strict, controlled conditions of a clinical trial; effectiveness is how well it works in routine clinical practice with real patients.; Appears in: Comparative Effectiveness Research (CER) Methods
electronic health record (EHR): The clinical record a care team builds during visits — orders, encounters, labs, vitals, and notes — rather than the bills a health plan processes.; Appears in: EHR-Based Study
eligibility criterion: A rule that a patient must meet to be included in the study — for example, having a specific diagnosis, being a certain age, or having uninterrupted insurance coverage during the observation window.; Appears in: Database Feasibility Assessment and Attrition Funnel
eligibility frame: The complete list of registry patients who meet the study entry criteria at the moment of their procedure or visit — the registry provides this list automatically, with no separate screening step.; Appears in: Registry-Based Randomized Controlled Trial (RRCT)
eligibility mirroring: The process of applying the same patient inclusion and exclusion rules used in the trial to the external database, so that the external patients are as similar as possible to the trial patients before any statistical adjustment.; Appears in: Rare Disease External Controls
eligible population: The count of covered members who could appropriately receive the new drug, based on their diagnosis and treatment history.; Appears in: Budget Impact Analysis
empirical calibration: A technique that uses results from many negative control outcomes to estimate how much systematic error a study design typically produces, then mathematically adjusts the main result to account for that error.; Appears in: Negative Control Outcomes
empirical null distribution: The bell-curve-shaped pattern of where negative-control estimates actually land across many known-null pairs; if they cluster above or below zero, the study pipeline is carrying systematic bias that can be measured and corrected.; Appears in: Empirical Calibration with Negative Controls
emulation: Using observational data (such as insurance claims or EHR records) to mimic each component of the target trial protocol as faithfully as the data allow.; Appears in: Target Trial Emulation
ENCePP: The European Network of Centres for Pharmacoepidemiology and Pharmacovigilance, a network whose public register lists PASS protocols so the study is transparent before results are known.; Appears in: Voluntary (Non-Imposed) Post-Authorisation Safety Study
encounter: A recorded contact with the health system, such as an office visit or hospital stay; the EHR only adds data when an encounter happens.; Appears in: EHR-Based Study
Encounter record: A report submitted by a Medicare Advantage plan to document a service that was provided, as opposed to a fee-for-service claim that triggers a payment.; Appears in: Medicare FFS vs Medicare Advantage vs Commercial Claims Differences
enrollment date: The day a patient joins the registry at a site after meeting the disease criteria; it serves as the patient's day-zero for follow-up.; Appears in: Disease Registry
enrollment span: A continuous block of time — with no gaps, or only very short bridged gaps — during which a patient holds active health-plan coverage with both medical and pharmacy benefits.; Appears in: Administrative Claims Analysis, Continuous Enrollment and Observable Time
entity: The individual virtual patient being tracked in the simulation, who carries their own profile of characteristics such as age, treatment history, and past side effects.; Appears in: Discrete-Event Simulation Using RWE Inputs
entropy: A summary statistic ranging from 0 to 1 that measures how cleanly the model separates patients into classes — values at or above 0.80 mean most patients are assigned to one class with high confidence, while low values mean assignments are uncertain.; Appears in: Group-Based Trajectory Models and Latent Class Analysis
entropy balancing: An alternative weighting approach that finds the set of weights closest to uniform while exactly satisfying user-specified covariate balance constraints, without estimating a propensity score model as an intermediate step.; Appears in: Overlap Weights and Modern Propensity Weighting
environment pinning: Recording the exact version numbers of every software package used in an analysis so that a re-run in the future uses identical software and produces the same numerical results.; Appears in: QC, Double Programming, and Reproducible Analysis
episode: A single occurrence of a medical event — for example, one heart attack — treated as one unit for counting purposes, even if multiple records in the data describe it.; Appears in: Acute Event Deduplication Window, As-Treated Risk Window Construction, Hospitalization and Transfer Collapse
EQ-5D: A short questionnaire that asks patients to rate five health dimensions (mobility, self-care, usual activities, pain/discomfort, anxiety/depression), then uses population survey data to convert those responses into a single utility score on a scale anchored at 0 (dead) and 1 (full health).; Appears in: Health-Related Quality of Life (HRQoL) Measurement, QALY Utility Mapping (Crosswalking to Health-State Utilities)
equally distributed equivalent (EDE): The single equal level of health that would be judged exactly as good as the actual unequal spread of health across groups - lower than the average when health is unequal.; Appears in: Distributional Cost-Effectiveness Analysis
equidispersion: The defining property of the Poisson distribution that the mean and variance are equal (both equal λ); when the actual data show variance larger than the mean (overdispersion), a negative binomial model is a common fix.; Appears in: Poisson Distribution for Counts and Rates
equity subgroup: A slice of the population defined by a fairness-relevant characteristic such as area deprivation, race or ethnicity, or rural versus urban location.; Appears in: Distributional Cost-Effectiveness Analysis
equivalence assumption: The pre-condition for this method: a prior conclusion that the two treatments produce the same health outcomes, so cost is the only thing left to compare.; Appears in: Cost-minimization Analysis (CMA)
Era-aware code dispatch: The programming practice of applying ICD-9-CM code lists to claims with service dates before 1 October 2015 and ICD-10-CM code lists to claims on or after that date, ensuring the correct coding system is used for each record.; Appears in: ICD-9-CM Legacy Diagnosis and Procedure Codes
error distribution: The statistical family assumed for the variability in log-event-time after removing covariate effects; the choice (Weibull, log-normal, log-logistic, generalized gamma) determines the shape of the hazard over time and affects how the model extrapolates beyond the observed data.; Appears in: Accelerated Failure Time (AFT) Models
estimand: The exact quantity a study is designed to measure — it names the patient population, the two treatment options being compared, the outcome, and how events like death or switching drugs are handled, all locked before analysis begins.; Appears in: Estimands (ATE/ATT) and Intercurrent Events in RWE, Imposed Post-Authorisation Safety Study (PASS), Missing Data, Trimming, and Winsorization in RWE, Regulatory and HTA Readiness for RWE, Study Protocol and SAP Elements for RWE
estimand drift: The silent substitution that occurs when the code or data rules actually implement a different question than the one the study pre-specified, without anyone noticing.; Appears in: Estimand-to-Analysis Traceability
estimating equation: A formula set equal to zero whose solution is the treatment effect estimate; in g-estimation it says that once the right treatment effect is removed from the outcome, the residual outcome should be uncorrelated with the treatment decision.; Appears in: G-Estimation of Structural Nested Models
ETL process: The Extract-Transform-Load pipeline a data vendor runs to turn raw source records (hospital systems, pharmacy claims) into the research database an analyst receives; undocumented or unstable ETL is a reliability red flag.; Appears in: Fit-for-Purpose Data Assessment
EU CTR 536/2014: The European Union Clinical Trial Regulation that makes it mandatory for sponsors to publish a plain-language lay summary of clinical trial results in the EU trial register within 12 months of the trial ending, and requires that summary to be non-promotional.; Appears in: Plain-Language Summaries of Evidence
Evaluation and Management (E/M) visit: A billed office or outpatient encounter between a clinician and patient; E/M visits make up the largest single category of CPT codes and are the primary way researchers count outpatient contact events in claims data.; Appears in: CPT Codes (HCPCS Level I)
event: A point in simulated time when something clinically meaningful happens to a patient, such as disease progression, a treatment switch, an adverse reaction, or death.; Appears in: Discrete-Event Simulation Using RWE Inputs, Restricted Mean Survival Time (RMST)
event indicator: The 1-or-0 number assigned to each patient at the end of their follow-up: 1 if the event was actually observed, and 0 if the patient left observation before the event happened.; Appears in: Censoring: Types, Mechanisms, and Informativeness
event study: A way of plotting the DiD effect at each point in time before and after the policy; pre-period estimates should be near zero (consistent with parallel trends), and post-period estimates reveal when and how large the effect was.; Appears in: Difference-in-Differences with Staggered Adoption
events per variable (EPV): The number of outcome events in the data divided by the number of free parameters in the model; the rule of thumb is at least 10 events per degree of freedom to avoid overfitting.; Appears in: Splines and Flexible Functional Forms
events-per-variable (EPV): A rough check of whether a model has enough outcome events to reliably estimate each predictor; fewer than about 10 events per predictor is the warning zone where Firth correction is often needed.; Appears in: Firth Penalized Regression
evidence map: A picture (often a grid or bubble chart) showing how many studies fall into each combination of population, design, and outcome, so you can see at a glance what's well-covered and what's blank.; Appears in: Scoping Review, Umbrella Review (Review of Systematic Reviews)
exact p-value: A p-value computed by adding up the true probabilities of every table as extreme or more extreme than the one observed, rather than using a bell-curve approximation; "exact" means the stated alpha level is never exceeded, even in tiny samples.; Appears in: Fisher's Exact Test
exact Poisson CI: A confidence interval for a rate built directly from the Poisson distribution using chi-squared quantiles rather than a normal approximation; essential when the observed event count is small (fewer than about 30) because the normal approximation is inaccurate for rare events.; Appears in: Poisson Distribution for Counts and Rates
exact test: A version of a statistical test (like Fisher's exact test) that computes the p-value by counting all possible arrangements of the data rather than using a mathematical approximation, making it reliable even in very small samples.; Appears in: Exact and Penalized-Likelihood Methods for Sparse Data, Parametric and Nonparametric Tests
excess hazard: The cancer-attributable portion of a patient's death rate at any point in time; it equals the total observed death rate minus the death rate expected from the general population at that age and sex, and is what excess hazard regression models as a function of stage, grade, and treatment.; Appears in: Relative and Net Survival
exclusion restriction: The rule that the instrument can affect the outcome only by changing treatment, with no other pathway of its own; if a policy that shifts prescribing also changes copays or monitoring, this rule is broken.; Appears in: Instrumental Variables in Pharmacoepidemiology
expected agreement by chance (Pe): The proportion of cases where two raters would agree purely by chance, computed from each rater's marginal calling rates; kappa subtracts this baseline before computing the agreement ratio.; Appears in: Agreement Statistics: Kappa, ICC, and Bland-Altman
expected count: How many events you would see so far if the product caused no extra risk - usually a background rate times the amount of exposed follow-up time accrued.; Appears in: MaxSPRT and Sequential Safety Surveillance
expected counts: How many events you would predict at each code if the drug had no effect, usually taken from a comparison group or from each person's own non-exposed time.; Appears in: Tree-Based Scan Statistics (TreeScan)
expected events: The number of events predicted to occur in the study group if the reference population's rates had applied, calculated by multiplying each stratum's person-time by the matching reference rate and summing.; Appears in: Indirect Standardization, SMR, and SIR
expected events (E): The number of events the log-rank test predicts a group would have had if both groups had identical risk at every point in time; calculated as the group's share of the risk pool multiplied by the total events at each event time.; Appears in: Log-Rank Test
expected survival: The survival probability that a patient would have had based only on their age, sex, and calendar year -- obtained from a general-population life table -- as if they had never been diagnosed with cancer.; Appears in: Relative and Net Survival
expected-cell-count rule: The practical guideline that chi-square is only a reliable approximation when every cell has an expected count of at least 5; when this fails, switch to Fisher's exact test.; Appears in: Chi-Square Test of Independence
exposure: The thing you suspect might cause the outcome, like having taken a particular drug before getting sick.; Appears in: Case-Control Study Design
exposure episode: A continuous stretch of time during which a patient is treated as being on the drug, built by stitching together consecutive fills that arrive close enough together to satisfy the grace period rule.; Appears in: Grace Period and Permissible Gap Rules, Switch, Add-On, and Augmentation Rules
exposure prevalence: The proportion of the population that is exposed to the risk factor; a higher prevalence means the same relative risk translates to a larger PAF, which is why PAFs are population-specific and cannot be transferred across populations without re-weighting.; Appears in: Attributable Risk and Population Attributable Fraction
exposure-anchored enrollment: Patients join the registry because they received the specific drug or device being studied, not because of their diagnosis or condition.; Appears in: Product/Exposure Registry
exposure-time trend: A gradual background change in how often people in general are taking a drug over months or years; if this trend exists, the more-recent hazard window will look more exposed than earlier referent windows for reasons unrelated to the event, which would bias the result.; Appears in: Case-Case-Time-Control Design, Case-Crossover Design, Case-Time-Control Design
EXSTDTC: The SDTM EX domain variable for exposure start date in ISO 8601 format (YYYY-MM-DD); in RWE studies it is typically derived from the pharmacy claims fill_date, and the derivation rule must appear in define.xml.; Appears in: CDISC Standards (SDTM/ADaM) for RWE Submissions
external adjustment: Using bias parameters — numbers that describe how strong and how unevenly distributed the missing factor is — from a separate data source to correct an effect estimate in the main study.; Appears in: External Adjustment and Validation-Substudy Bias Correction
external control: A comparison group built from patients who are NOT enrolled in the trial, drawn instead from a registry, electronic health record database, or insurance claims, to stand in for the randomized control arm that could not be recruited.; Appears in: Rare Disease External Controls, Special Populations RWE Methods
external control arm: A comparison group assembled from real-world records or prior studies rather than from patients randomized alongside the treated group, used when randomizing to placebo in advanced cancer is considered unethical.; Appears in: Single-Arm Trial with External (Historical) Control, Therapeutic-Area-Specific RWE Challenges — Oncology
external validity: A broader term for whether a study finding applies outside the original study sample, covering both generalizability and transportability.; Appears in: Generalizability, Transportability, and External Validity
extrapolation: Projecting a mathematical curve into the future beyond the range of observed data — essentially predicting what the survival curve does after the trial ended.; Appears in: Survival Extrapolation for HTA Using RWE
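
The effective sample size (ESS) entry above gives its formula in words: the squared sum of weights divided by the sum of squared weights. A minimal sketch of that calculation follows; the function name `effective_sample_size` and the example weights are illustrative assumptions, not taken from any particular package.

```python
def effective_sample_size(weights):
    """Return (sum of w)^2 / (sum of w^2) for a list of non-negative weights."""
    total = sum(weights)
    sum_of_squares = sum(w * w for w in weights)
    if sum_of_squares == 0:
        raise ValueError("at least one weight must be non-zero")
    return total * total / sum_of_squares


# Equal weights: every patient contributes fully, so ESS equals the raw count.
ess_equal = effective_sample_size([1, 1, 1, 1])

# One extreme weight dominates: ESS falls well below the five patients present.
ess_skewed = effective_sample_size([10, 1, 1, 1, 1])
```

Comparing `ess_skewed` with the raw count of five patients shows the pattern the entry warns about: a single patient with an extreme weight leaves the analysis with less than two patients' worth of information.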

F

F statistic: The ratio of between-group mean square to within-group mean square; a large F means the group averages are farther apart than random chance would expect given the scatter inside each group.; Appears in: One-Way ANOVA
facility claim: A billing record submitted by a hospital or inpatient facility that shows the patient's admission date, discharge date, and overall services — this is the data source used to identify and date a hospital stay.; Appears in: Hospitalization and Transfer Collapse, Inpatient Bridging of Drug Exposure, Procedure Identification and Measurement in Claims and EHR
false discovery rate (FDR): The expected proportion of rejected hypotheses that are actually false positives; the Benjamini-Hochberg procedure controls this at a preset level (usually 5% or 10%), allowing more discoveries than FWER methods while still limiting the fraction of errors.; Appears in: Multiplicity and Multiple Comparisons
false match: A linkage error in which two records from different people are incorrectly joined, potentially attaching the wrong outcome (such as another person's death date) to a study patient.; Appears in: Linked Multi-Database Study (Record Linkage)
false match and false miss: A false match links two different people as if they were one; a false miss leaves two records for the same person unlinked.; Appears in: Tokenization and Privacy-Preserving Record Linkage
false negative (FN): A person who truly has the condition but whom the test wrongly clears as negative, missing them.; Appears in: F1 Score, Precision, and Recall, Positive and Negative Predictive Value, Sensitivity and Specificity
false positive (FP): A person who truly does not have the condition but whom the test wrongly flags as positive, a false alarm.; Appears in: F1 Score, Precision, and Recall, Positive and Negative Predictive Value, Sensitivity and Specificity
falsification test: A check you run on a question whose answer you already know is null (zero effect), so that a wrong answer proves your method has a problem.; Appears in: Negative Control Exposures, Negative Control Outcomes
family: The assumed probability distribution for the outcome in a GLM; the family determines how variance grows with the mean (e.g., Binomial for 0/1 events, Gamma for right-skewed costs, Poisson for counts).; Appears in: Generalized Linear Models (GLM)
family or subscriber ID: A shared number that an insurance plan uses to group a parent and their dependents together under one policy, making it possible to find a newborn's enrollment record by looking for a new dependent added under the mother's account shortly after her delivery.; Appears in: Mother-Infant Linkage
familywise error rate (FWER): The probability of making at least one false positive rejection across all the hypotheses tested together; Bonferroni and Holm adjustments keep this at or below a preset level (usually 5%) regardless of how many tests are run.; Appears in: Multiplicity and Multiple Comparisons
feasibility assessment: An upfront check — done on summary counts before any detailed analysis — asking whether a database has enough patients, enough recorded drug use, and enough observable time to answer the study question.; Appears in: Database Feasibility Assessment and Attrition Funnel
federated analysis: An approach used when raw patient data cannot leave a site due to privacy rules: each site runs the identical analysis on its own data and sends back only the summary result (a number and its standard error), never the individual records.; Appears in: Federated and Distributed Network Analysis, Individual Participant Data (IPD) Meta-Analysis
FFS (fee-for-service): The part of Medicare (Parts A and B, usually analyzed together with Part D drug claims) where each individual service is billed separately, making every drug administration and procedure visible as its own claim line; Medicare Advantage plans often do not produce these detailed line-item records.; Appears in: Therapeutic-Area-Specific RWE Challenges — Oncology
FHIR bulk export ($export): A FHIR operation that downloads data for an entire patient population at once as flat files with one record per line, rather than pulling records one patient at a time — the mode used for population-scale research cohort extraction.; Appears in: FHIR and Healthcare Interoperability for RWE
FHIR resource: A self-contained JSON document representing one clinical concept — one patient, one prescription, one lab result — that can be sent over the internet using standard web tools.; Appears in: FHIR and Healthcare Interoperability for RWE
field notes: The researcher's detailed written record of what they saw, heard, and did during each observation visit, which becomes the raw data to be analyzed.; Appears in: Ethnographic / Observational Qualitative Study
fill (dispensing): The pharmacy event where the drug is actually handed to the patient — the proof of treatment that an order alone does not give you.; Appears in: EHR-Based Study, Proportion of Days Covered (PDC)
fill date: The date a prescription was dispensed to the patient at the pharmacy; this is the date recorded in insurance claims, not the date the doctor wrote the prescription.; Appears in: Medication Possession Ratio (MPR), Pregnancy Exposure Window
fill_date: The calendar date a pharmacy prescription was filled; the field in a claims table that is usually taken as the date a patient started a drug.; Appears in: Case-Case-Time-Control Design, Drug Utilization Study, Exposure Episode Construction, Grace Period and Permissible Gap Rules, Inpatient Bridging of Drug Exposure, Prospective Cohort Study, Restart, Rechallenge, and New-Episode Rules, Stockpiling and Carryover Rules, Switch, Add-On, and Augmentation Rules
final-action claim: The resolved, authoritative version of a claim after all voids, replacements, and adjustments have been applied, leaving one definitive record per service event.; Appears in: Claim Adjustments, Reversals, and Denials
first-listed diagnosis: The diagnosis a physician lists first on a professional (doctor's office or clinic) billing form; there is no statutory rule governing which condition must go first, so it is not the same as the hospital principal diagnosis.; Appears in: Diagnosis Position, Type, and Qualifiers on Claims
Firth: A specific penalty, proposed by statistician David Firth in 1993, that uses information about how uncertain the data are (the Fisher information) to stabilize estimates under separation or very rare events.; Appears in: Firth Penalized Regression
Firth penalized regression: A modified version of logistic regression that adds a small mathematical penalty to prevent estimates from running off to infinity when zero cells or near-separation cause standard regression to fail.; Appears in: Exact and Penalized-Likelihood Methods for Sparse Data
Firth-penalized regression: A statistical method that produces reliable estimates when the outcome event is very rare or the study group is very small, where standard regression models break down.; Appears in: Special Populations RWE Methods
fiscal-year code version: The annual edition of the ICD-10-CM code set, effective each October 1, that adds new codes and retires old ones; a phenotype code list must be verified against the version that was active when the claims were submitted.; Appears in: ICD-10-CM Diagnosis Codes
Fisher z transformation: A mathematical conversion that turns a Pearson r into a quantity that behaves like a normal distribution, making it possible to compute a valid confidence interval around r.; Appears in: Pearson and Spearman Correlation
fixed effect: A coefficient that applies to everyone in the study -- for example, the average drop in HbA1c across all patients on the drug.; Appears in: Mixed-Effects (Random-Effects) Models for Longitudinal RWE
fixed risk window: The defined follow-up period (for example, 12 months from a patient's index date) over which events are counted to compute a risk; without a common fixed window for all patients, the incidence proportion is undefined and a rate or hazard measure is required instead.; Appears in: Risk Ratio and Risk Difference
flat format: The way diagnosis codes are stored in most US claims files, with the decimal point removed; "I509" in the data represents the published code I50.9.; Appears in: ICD-10-CM Diagnosis Codes
follow-up: The stretch of time a patient is actively observed in a study, from their start date until they leave, have the outcome, or the study ends — whichever comes first.; Appears in: Attrition and Loss to Follow-Up, Composite Endpoint Construction
follow-up time: The number of days a patient is observed for the outcome of interest, measured here from the landmark day — not from the original index date — so no pre-landmark survival is counted twice.; Appears in: Landmark Analysis, Mortality Source Hierarchy, Time Zero (Index Date) Alignment
follow-up window: The block of time after study entry during which the researcher watches for the outcome of interest; it must end no later than the observation period end date, because events after that date are unobserved.; Appears in: Censoring: Types, Mechanisms, and Informativeness, OMOP Observation Period
forest plot: A standard chart for displaying meta-analysis results: each row shows one study's effect and confidence interval as a horizontal line with a square, and a diamond at the bottom shows the pooled estimate.; Appears in: Meta-Analysis of Observational Studies, Visualizations and Diagrams in Pharmacoepidemiology and RWE
Form Locator (FL): The numbered position on the UB-04 paper form (e.g., FL4, FL17, FL67) that corresponds to a specific data element; each FL has a defined code set and research meaning.; Appears in: UB-04 / 837I Institutional Claim Fields
forward mapping: Translating a code from an older or source system to a newer or target system, for example ICD-9-CM to ICD-10-CM; one source code may produce several target codes.; Appears in: Code Crosswalks and Mappings Between Coding Systems
fraction of missing information (FMI): A diagnostic number between 0 and 1 that says what share of the uncertainty in a final estimate is due to the missing data; an FMI of 0.35 means roughly 35 percent of the variance comes from not knowing those values.; Appears in: Multiple Imputation for Longitudinal RWE
frailty: A state of reduced physical reserve and heightened vulnerability to stressors, distinct from simply having many diagnosed diseases.; Appears in: Claims-Based Frailty Index (Faurot / Kim)
framing symmetry: The practice of reporting the same statistic in both its event framing ("3 of 100 had the event") and its survival framing ("97 of 100 remained event-free"), so that neither a positive nor a negative spin dominates the reader's impression.; Appears in: Plain-Language Summaries of Evidence
future cases: People who experienced the same outcome but at a later calendar date; their within-person comparison estimates how much drug use was drifting over time, so that drift can be subtracted out of the main result.; Appears in: Case-Case-Time-Control Design
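
The flat format entry above notes that claims files store ICD-10-CM codes without the decimal point, so "I509" stands for I50.9. A small sketch of converting between the two forms is shown below. The helper names `to_dotted` and `to_flat` are hypothetical, and the rule used (decimal after the third character) is assumed to apply to ICD-10-CM only; ICD-9-CM codes, including E codes, follow different placement rules and are out of scope here.

```python
def to_flat(code):
    """Strip the decimal point, e.g. 'I50.9' -> 'I509'."""
    return code.strip().upper().replace(".", "")


def to_dotted(code):
    """Insert the decimal after the third character, e.g. 'I509' -> 'I50.9'.

    Assumes an ICD-10-CM code; three-character codes have no decimal.
    """
    flat = to_flat(code)
    if len(flat) <= 3:
        return flat
    return flat[:3] + "." + flat[3:]


dotted = to_dotted("I509")
flat = to_flat("I50.9")
```

Normalizing both the code list and the claims field to one form before joining avoids silent mismatches between "I50.9" in a published phenotype and "I509" in the data.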

G

g-computation: The standardization procedure used to compute an AME: fit the outcome model, copy the dataset twice (set everyone to treatment A, then everyone to treatment B), predict risks both ways for every patient, and average the difference.; Appears in: Marginal Effects and Interpretation of Inferential Statistics
g-estimation: The numerical procedure that finds the treatment effect by searching for the value that, after mathematically removing the treatment effect from the outcome, makes the adjusted outcome uncorrelated with the treatment decision at each time point.; Appears in: G-Estimation of Structural Nested Models
g-null paradox: A technical limitation where the parametric g-formula is mathematically guaranteed to be misspecified when treatment truly has no effect but treatment-confounder feedback is present; g-estimation of structural nested models is robust to this issue.; Appears in: G-Computation and the Parametric G-Formula
gap: A stretch of days when a patient had no medicine on hand because the previous supply ran out before the next fill.; Appears in: Drug Utilization Study
gap rule: The decision that two consecutive facility bills belong to the same stay when the second admission date falls within 0 or 1 calendar day of the previous discharge date.; Appears in: Hospitalization and Transfer Collapse
garden of forking paths: The invisible multiplicity created by analyst decisions during the analysis — such as which outcomes to report, which subgroups to highlight, or which analytical approach to present — that are never counted as formal tests but still inflate the effective false-positive rate.; Appears in: Multiplicity and Multiple Comparisons
GEM (General Equivalence Mapping): The official CMS translation tables that map ICD-9-CM diagnosis and procedure codes to their ICD-10-CM/PCS equivalents, and vice versa, each row labeled with whether the match is exact or approximate.; Appears in: Code Crosswalks and Mappings Between Coding Systems
GEMs (General Equivalence Mappings): Translation tables published by CMS that map each ICD-9-CM code to one or more ICD-10-CM codes (and vice versa), providing a starting point for converting code lists across the 2015 transition, though they are approximate and not a substitute for re-validating a phenotype algorithm.; Appears in: ICD-10-PCS Inpatient Procedure Codes, ICD-9-CM Legacy Diagnosis and Procedure Codes
GEMs crosswalk: The General Equivalence Mappings published by CMS and NCHS that translate between ICD-9-CM and ICD-10-CM codes; the mappings are approximate and some conditions map to multiple codes in the other system.; Appears in: ICD-10-CM Diagnosis Codes
generalizability: The setting in which the study participants are a subset drawn from the larger target population, and the goal is to re-weight the study results to represent that whole population.; Appears in: Generalizability, Transportability, and External Validity
geometric mean: An average computed by multiplying n numbers together and taking the nth root; for costs of 100, 1000, and 10000, the geometric mean is 1000, which is lower than the arithmetic mean of 3700 because it dampens the pull of the expensive outlier.; Appears in: Log-Normal Distribution and the Retransformation Problem
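
The cost example in the geometric mean entry can be checked with Python's standard library. The variable names are illustrative; the second calculation shows the equivalent "exponentiate the mean of the logs" form that connects the geometric mean to log-transformed cost models.

```python
import math
import statistics

costs = [100, 1000, 10000]

arithmetic = statistics.mean(costs)           # 3700
geometric = statistics.geometric_mean(costs)  # nth root of the product

# Equivalent form: exponentiate the mean of the log costs.
via_logs = math.exp(statistics.fmean([math.log(c) for c in costs]))
```

Both `geometric` and `via_logs` come out at 1000 up to floating-point rounding.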
gestational age: How far along a pregnancy is, counted in weeks from the first day of the last menstrual period; used to define which weeks are the critical window for fetal organ development.; Appears in: Pregnancy Exposure Window, Special Populations RWE Methods
go-with-mitigations verdict: A fit-for-purpose decision that says the source can be used but only with specific pre-planned adjustments and sensitivity analyses to address bounded, measurable data gaps.; Appears in: Fit-for-Purpose Data Assessment
gold standard: The most reliable available confirmation of a patient's true status, almost always a clinician reading the actual medical chart, because the chart contains details the billing data does not capture.; Appears in: Algorithm Validation
grace period: A short buffer (often 30 days) added after a fill's supply runs out, during which the patient is still counted as on treatment even if they have not yet refilled — it forgives a brief late trip to the pharmacy.; Appears in: As-Treated Risk Window Construction, Clone-Censor-Weight for Per-Protocol Target-Trial Emulation, Exposure Episode Construction, Infused Biologic Administration Capture, Switch, Add-On, and Augmentation Rules
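
A minimal sketch of how a grace period chains pharmacy fills into continuous treatment episodes. Several conventions are assumed here and vary between studies: supply ends on fill date plus days' supply, early refills are not carried forward, and the grace period is used only to decide continuity, not added to the episode end. The function name `treatment_episodes` and the fill data are hypothetical.

```python
from datetime import date, timedelta

def treatment_episodes(fills, grace_days=30):
    """Chain (fill_date, days_supply) records into episodes. A fill continues
    the current episode if it occurs on or before the current supply end
    plus the grace period."""
    episodes = []
    for fill_date, days_supply in sorted(fills):
        supply_end = fill_date + timedelta(days=days_supply)
        if episodes and fill_date <= episodes[-1][1] + timedelta(days=grace_days):
            episodes[-1][1] = max(episodes[-1][1], supply_end)
        else:
            episodes.append([fill_date, supply_end])
    return [tuple(e) for e in episodes]

# Hypothetical fills for one patient and one drug.
fills = [
    (date(2023, 1, 1), 30),
    (date(2023, 2, 10), 30),  # 10 days late, inside the 30-day grace period
    (date(2023, 6, 1), 30),   # far outside the grace period: new episode
]
episodes = treatment_episodes(fills)
```

The late February refill is forgiven and extends the first episode; the June fill starts a second one.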
Greenwood variance: The standard formula for estimating how uncertain the Kaplan-Meier curve is at any given time point, which produces wider confidence bands as fewer patients remain in the risk set at later follow-up times.; Appears in: Kaplan-Meier Estimator
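
As an illustration of the Greenwood entry, the sketch below computes the Kaplan-Meier survival estimate and its Greenwood standard error from a small life table. The function name `km_with_greenwood` and the counts are made up, and the code assumes fewer events than patients at risk at every time point (otherwise the variance term is undefined).

```python
import math

def km_with_greenwood(table):
    """table: list of (n_at_risk, n_events) at each ordered event time.
    Returns a list of (survival, standard_error) after each event time."""
    survival, running_sum, results = 1.0, 0.0, []
    for n_at_risk, n_events in table:
        survival *= 1 - n_events / n_at_risk
        # Greenwood's formula: Var[S(t)] = S(t)^2 * sum d / (n * (n - d))
        running_sum += n_events / (n_at_risk * (n_at_risk - n_events))
        results.append((survival, survival * math.sqrt(running_sum)))
    return results

# Hypothetical risk sets that shrink over follow-up.
life_table = [(100, 5), (50, 5), (10, 2)]
km = km_with_greenwood(life_table)
standard_errors = [se for _, se in km]
```

With these counts the standard error grows at each later time point, which is the widening of the confidence band that the entry describes.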
grey literature: Useful documents that aren't peer-reviewed journal articles, such as conference abstracts, HTA agency reports, and trial registry entries.; Appears in: Scoping Review
grouper: The computer program that takes a stay's diagnosis and procedure codes as input and outputs the single MS-DRG number, following CMS's published decision rules.; Appears in: MS-DRG (Medicare Severity Diagnosis-Related Groups)
guarantee-time trap: The bias that occurs when one group of patients must survive longer just to be labeled as that group, making them look healthier than the other group for reasons unrelated to treatment.; Appears in: Landmark Analysis
GVP Module VIII: A set of rules published by the European Medicines Agency (EMA) that defines what counts as a post-authorisation safety study (PASS) and how non-interventional safety studies must be designed and registered.; Appears in: Voluntary (Non-Imposed) Post-Authorisation Safety Study

H

H statistic: The Kruskal-Wallis test statistic, which measures how unequally the rank totals are spread across groups relative to what pure chance would produce if there were no real difference between them.; Appears in: Kruskal-Wallis Test
hallucination: When an LLM confidently states a fact — such as a lab value or a date — that does not appear in the source document and was not provided to the model; the most dangerous failure mode in clinical abstraction.; Appears in: LLM-Assisted Data Abstraction and Evidence Work in RWE
harmonic mean: A way of averaging two numbers that gives a low result whenever either number is low, unlike a regular average which can stay high even if one number is near zero.; Appears in: F1 Score, Precision, and Recall
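
A short illustration of why the harmonic mean is used for the F1 score. The precision and recall values are made up for the example.

```python
import statistics

# Hypothetical algorithm with high precision but very low recall.
precision, recall = 0.9, 0.1

f1 = statistics.harmonic_mean([precision, recall])  # 2 * P * R / (P + R)
plain_average = statistics.mean([precision, recall])
```

Here the regular average is 0.5 while the harmonic mean is 0.18, so the weak recall is not hidden by the strong precision.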
harmonization: The process of redefining variables, eligibility criteria, and outcome measurements the same way across all contributing studies so that differences in results reflect biology rather than differences in how each study was run.; Appears in: Individual Participant Data (IPD) Meta-Analysis
hazard: The instantaneous rate at which an event occurs at a specific moment in time, measured only among people who have not yet had the event — think of it as "how fast is the event arriving right now for those still waiting."; Appears in: Restricted Mean Survival Time (RMST), The Hazard Ratio as an Effect Measure
hazard function: A function describing how fast the event of interest (death, progression) is occurring at each moment in time among patients who have not yet had the event; higher values mean the event is happening more rapidly at that instant.; Appears in: Weibull Distribution for Time-to-Event Data
hazard ratio: A summary number comparing the event rate in a treated group to the event rate in an untreated group at any given moment; a value below 1.0 suggests the treated group has a lower rate, but only if the comparison is fairly constructed.; Appears in: Cox Proportional Hazards Regression, Cox Regression with Time-Dependent Covariates, Empirical Calibration with Negative Controls, Immortal Time Bias Handling, Log-Rank Test
hazard ratio (HR): The ratio of the instantaneous rate of the outcome in the exposed group to that in the unexposed group among people still at risk at any given moment; the main effect estimate from the weighted Cox model in a case-cohort analysis.; Appears in: Case-Cohort Design
hazard window: The short time period immediately before the acute event — typically the 7 days before a hospitalization — where we ask whether the patient happened to be taking the drug of interest.; Appears in: Case-Case-Time-Control Design, Case-Crossover Design, Case-Time-Control Design
HCPCS J-code: A Healthcare Common Procedure Coding System Level II code beginning with the letter J that identifies a specific injectable or infusible drug administered by a provider; on institutional claims, J-codes appear alongside revenue code 0636 to identify the exact drug being billed.; Appears in: Revenue (Center) Codes
HCPCS Level I: The formal HIPAA name for CPT codes: 'Level I' means CPT codes (five digits, AMA-maintained), while 'Level II' refers to a separate set of codes maintained by CMS that cover drugs, equipment, and supplies not described by CPT.; Appears in: CPT Codes (HCPCS Level I)
health state: One of the distinct clinical situations a patient can occupy in the model — in this model the three states are progression-free, progressed (cancer has worsened but patient is alive), and dead.; Appears in: Health Economic Modeling Methods Using RWE, Markov Transition Probabilities from Real-World Data, Partitioned Survival Model
health-seeking behavior: The pattern of actions — attending preventive visits, getting screened, filling prescriptions, eating well — that marks a person as proactively engaged in their own health, independent of any single treatment.; Appears in: Healthy User Bias
healthcare utilization intensity: How often a patient uses healthcare services — measured as outpatient visit counts, specialist visits, or distinct encounter dates — used as a proxy for how thoroughly a patient's medical history is observed and coded.; Appears in: Surveillance and Detection Bias
healthcare-seeking bias: The distortion that occurs when vaccinated and unvaccinated people differ in how readily they visit a doctor; it can make a vaccine look more or less effective than it really is unless the study design controls for it.; Appears in: Test-Negative Design
healthy-user bias: A form of confounding where people who choose to take a preventive drug are systematically healthier than non-users for reasons unrelated to the drug — making the drug look more effective than it really is.; Appears in: Negative Control Outcomes
height-for-age z-score (HAZ): A z-score specifically for height that accounts for both the child's age and sex, so a 5-year-old and a 10-year-old can be compared on the same scale even though expected heights differ greatly.; Appears in: Pediatric Growth and Development Endpoints in RWE
heterogeneity: The degree to which the true effect appears to differ across studies beyond what random chance alone would produce; high heterogeneity means studies are not estimating the same underlying quantity.; Appears in: Meta-Analysis of Observational Studies, Meta-Analysis of Randomized Controlled Trials
heterogeneity of treatment effect (HTE): The phenomenon where a drug's benefit or harm is larger in some patient groups than others, often summarized as the ratio or difference of the effect estimates across groups.; Appears in: Subgroup Analysis and Heterogeneity of Treatment Effect
heteroscedasticity: A pattern where the spread of prediction errors grows or shrinks depending on the predicted value — common in cost data, where sicker patients are both harder to predict and more expensive, creating a fan shape in diagnostic plots.; Appears in: Gamma Distribution for Cost and Skewed Outcomes, Regression Diagnostics and Model Checking, Welch's t-Test (Unequal Variances)
hierarchical composite endpoint: A set of outcomes ranked from most to least important (e.g., death first, then hospitalization) so the most serious one decides a comparison whenever possible.; Appears in: Win Ratio and Generalized Pairwise Comparisons
high-dimensional propensity score: A single number (0 to 1) summarizing how likely a patient is to have received the study drug rather than the comparator, calculated from hundreds of automatically selected pre-treatment claims codes rather than a short investigator-chosen list.; Appears in: High-Dimensional Propensity Score (hdPS)
history dependence: The idea that what happens to a patient next can depend on what already happened to them — for example, having had a severe side effect on first-line therapy makes second-line therapy less effective.; Appears in: Discrete-Event Simulation Using RWE Inputs
Hodges-Lehmann estimator: A rank-based estimate of the location shift between two groups, computed as the median of all pairwise differences (one value from each group); it gives a single number summarizing how much one group tends to exceed the other, with an accompanying confidence interval.; Appears in: Mann-Whitney U Test (Wilcoxon Rank-Sum)
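
The point estimate in the Hodges-Lehmann entry can be sketched directly from its definition. The function name `hodges_lehmann_shift` and the two small samples are hypothetical, and the confidence interval is not computed here.

```python
import statistics

def hodges_lehmann_shift(group_a, group_b):
    """Median of all pairwise differences a - b, one value from each group."""
    differences = [a - b for a in group_a for b in group_b]
    return statistics.median(differences)

# Hypothetical outcome values (for example, days in hospital).
group_a = [12, 15, 21]
group_b = [10, 11, 14]
shift = hodges_lehmann_shift(group_a, group_b)
```

The nine pairwise differences have a median of 4, so group A tends to exceed group B by about 4 units.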
hurdle model: A two-part count model where a logistic regression determines whether a patient has any events at all (crosses the hurdle), and a zero-truncated count distribution describes the number of events among those who do; unlike zero-inflated models, all zeros come from the logistic part.; Appears in: Zero-Inflated and Hurdle Count Models
hypergeometric distribution: The probability distribution that describes how many successes you get when drawing a fixed number of items without replacement from a population split into two types — exactly the situation Fisher's test uses to figure out how likely your 2×2 table is under the null hypothesis.; Appears in: Fisher's Exact Test
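
A minimal sketch of the hypergeometric probability that Fisher's exact test assigns to a single 2×2 table when the row and column totals are held fixed. The function name `table_probability` and the cell counts are illustrative assumptions.

```python
from math import comb

def table_probability(a, b, c, d):
    """Probability of the 2x2 table [[a, b], [c, d]] given fixed margins."""
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)

# Hypothetical table: rows are exposure groups, columns are event / no event.
p_observed = table_probability(3, 1, 1, 3)

# Every table that shares the same margins (row totals 4 and 4, first column total 4).
p_all_tables = sum(table_probability(a, 4 - a, 4 - a, a) for a in range(5))
```

Fisher's test sums probabilities like `p_observed` over the tables that are at least as extreme as the one observed; the probabilities of all possible tables add up to 1.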
hypothesis generation: Using an early description to suggest an idea worth testing later with a stronger design, rather than to prove anything.; Appears in: Case Series
hypothetical bias: The tendency for people to express stronger preferences or higher willingness to pay in a survey than they would in a real decision with actual consequences.; Appears in: Patient Preference Study (DCE / BWS)
hypothetical strategy: An intercurrent-event rule that asks what the outcome would have been if the disruption had never occurred — requires special statistical methods because you cannot simply drop patients who switched.; Appears in: Estimands (ATE/ATT) and Intercurrent Events in RWE

I

I-squared: A 0-to-100 percent score for heterogeneity; near 0 percent means the trials largely agree, and a high value means they diverge a lot.; Appears in: Meta-Analysis of Randomized Controlled Trials
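
One common way to obtain I-squared is from Cochran's Q, as the share of Q that exceeds its degrees of freedom. The sketch below assumes study estimates on a scale where inverse-variance weights apply (for example, log hazard ratios); the function name `i_squared` and the inputs are made up.

```python
def i_squared(estimates, std_errors):
    """I-squared (percent) from Cochran's Q with inverse-variance weights."""
    weights = [1 / se**2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    if q <= 0:
        return 0.0
    # Negative values are truncated at zero by convention.
    return max(0.0, (q - df) / q) * 100

# Hypothetical log hazard ratios from three trials with equal precision.
divergent = i_squared([-0.5, -0.1, 0.3], [0.1, 0.1, 0.1])
agreeing = i_squared([-0.1, -0.1, -0.1], [0.1, 0.1, 0.1])
```

The divergent trials give an I-squared of about 94 percent; the trials that agree exactly give 0 percent.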
ICC (intraclass correlation coefficient): A number between 0 and 1 used to measure test-retest reliability — it compares a patient's score at two time points when their health has not changed; higher values (generally 0.70 or above) mean the scores are reproducible.; Appears in: PRO Instrument Validation
ICD code: A standardized alphanumeric label (e.g., I48.11 for longstanding persistent atrial fibrillation) that clinicians and coders place on claims to describe a diagnosis; it is a billing label, not a clinical test result.; Appears in: Diagnosis Phenotype Algorithm (1 IP / 2 OP, Time Window)
ICD diagnosis code: A standardized alphanumeric label (e.g., K71.6) that a coder assigns to each medical claim to record what condition was treated or evaluated.; Appears in: Safety Signal Case Definition
ICD procedure codes: Seven-character codes (ICD-10-PCS) used by hospitals to describe procedures performed during an inpatient stay, covering the same surgeries as CPT but in a completely different coding language.; Appears in: Procedure Identification and Measurement in Claims and EHR
ICD-10 code: A standardized billing code that a clinician enters to describe a diagnosis — for example, E11.9 means type 2 diabetes without complications.; Appears in: EHR Phenotyping Algorithms
ICD-10 WHO: The World Health Organization's international version of ICD-10, used in Europe and most of the world for diagnosis coding; it contains approximately 15,000 codes and lacks the US-specific extensions present in ICD-10-CM, so US phenotyping algorithms must be adapted before use in international databases.; Appears in: International Real-World Data Sources
ICD-10-CM code: A standardized billing code — like I21.0 — that hospitals and clinics submit on claims to describe a patient's diagnosis.; Appears in: Outcome Algorithm Construction
ICD-10-PCS: The separate, public-domain code system used on inpatient hospital bills to describe surgical and procedural services during a hospital stay; it is maintained by CMS, not the AMA, and its codes look nothing like CPT codes — confusing the two is one of the most common errors in claims-based procedure research.; Appears in: CPT Codes (HCPCS Level I)
ICD-9-CM diagnosis code: A 3-to-5-character code, mostly numeric but with V and E prefixes for supplementary codes (e.g., 250.00 for type 2 diabetes), stored without the decimal point in raw claims data, used to label every diagnosis billed on a US insurance claim before 1 October 2015.; Appears in: ICD-9-CM Legacy Diagnosis and Procedure Codes
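
Because raw claims store ICD-9-CM diagnosis codes without the decimal point, a small helper is often needed before matching against published code lists. The sketch below assumes diagnosis codes only (procedure codes use a different layout) and places the decimal after the third character, or after the fourth for E codes. The function name `add_icd9_dx_decimal` is hypothetical.

```python
def add_icd9_dx_decimal(raw_code):
    """Re-insert the decimal point into an undotted ICD-9-CM diagnosis code."""
    code = raw_code.strip().upper()
    # E codes have four characters before the decimal; all others have three.
    split_at = 4 if code.startswith("E") else 3
    if len(code) <= split_at:
        return code  # category-level code, no decimal part
    return code[:split_at] + "." + code[split_at:]

examples = {raw: add_icd9_dx_decimal(raw) for raw in ["25000", "4019", "486", "V5869", "E8120"]}
```

The stored value `25000` becomes `250.00`, while a three-character code such as `486` is returned unchanged.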
ICEMAN: A credibility-assessment checklist for subgroup claims that rates eight dimensions including pre-specification, use of a formal interaction test, biological plausibility, and sample size within the subgroup.; Appears in: Subgroup Analysis and Heterogeneity of Treatment Effect
ICER: Incremental cost-effectiveness ratio — the extra cost divided by the extra health benefit (measured in quality-adjusted life years) when comparing a new treatment to the current standard; health technology bodies use it to decide whether a treatment is worth funding.; Appears in: Cost-Utility Analysis (CUA), Health Economic Modeling Methods Using RWE, Partitioned Survival Model
ICER (incremental cost-effectiveness ratio): The extra cost of a new treatment divided by the extra health it produces compared to the current option — for example, $50,000 per QALY gained.; Appears in: Cost-effectiveness Analysis (CEA)
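
The ICER arithmetic can be sketched in a few lines. The function name `icer` and the cost and QALY figures are made up, chosen so that the result matches the $50,000 per QALY example in the entry.

```python
def icer(cost_new, cost_current, qalys_new, qalys_current):
    """Incremental cost divided by incremental health effect."""
    delta_cost = cost_new - cost_current
    delta_effect = qalys_new - qalys_current
    if delta_effect == 0:
        raise ValueError("ICER is undefined when the health effect is identical")
    return delta_cost / delta_effect

# Hypothetical per-patient averages: costs in dollars, effects in QALYs.
cost_per_qaly = icer(cost_new=60_000, cost_current=35_000,
                     qalys_new=4.0, qalys_current=3.5)
```

An extra $25,000 for an extra 0.5 QALYs gives $50,000 per QALY gained. The ratio is only straightforward to interpret when the new treatment both costs more and produces more health.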
identifiability: Whether the observed data contain enough information to reliably separate the cure fraction from a very slowly declining uncured tail; cure models require long, mature follow-up to be identifiable.; Appears in: Cure Models (Mixture and Non-Mixture)
identification bias: A distortion that occurs when patients are enrolled into the study after the clinic has already been told which arm it is in, so that sicker or healthier patients are selectively entered depending on the arm.; Appears in: Cluster-Randomized Trial
immortal time: Person-time that is counted toward an exposed group even though the exposure had not yet started; this artificially makes the exposed group look healthier and must be excluded by anchoring the watch window at the true start of exposure.; Appears in: OMOP Time-at-Risk and Cohort Exit, Target Trial Emulation, Time Zero (Index Date) Alignment
immortal time bias: A distortion that occurs when a researcher counts the period before a patient met the study entry rule as part of their follow-up, making the drug or group look safer or more effective than it truly is because only survivors reached that entry point.; Appears in: Biomarker-Defined Cohort (RWE), Cox Regression with Time-Dependent Covariates, Time-Updated Exposures and Cumulative Dose
imputation model: The statistical model used to predict and draw plausible replacement values for the missing data, which must include the study outcome and all covariates used in the main analysis so the imputed values are consistent with the analysis.; Appears in: Multiple Imputation for Longitudinal RWE
incidence: The rate of brand-new cases appearing over a stretch of time, which a one-day snapshot cannot measure because it never watches people across time.; Appears in: Cross-Sectional Study
incidence rate: The number of new events divided by the total time-at-risk across the study population, usually expressed per 1,000 person-years; person-time denominator construction produces the divisor of this calculation.; Appears in: Acute Event Deduplication Window, Descriptive Epidemiology in RWE, Person-Time Denominator Construction, Risk Evaluation Study (Post-Authorization Safety / Active Surveillance)
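
A minimal sketch of the incidence rate calculation, assuming follow-up is recorded in days per patient and a year is taken as 365.25 days. The function name `incidence_rate_per_1000_py` and the follow-up values are hypothetical.

```python
def incidence_rate_per_1000_py(n_events, person_days):
    """New events per 1,000 person-years of time-at-risk."""
    person_years = sum(person_days) / 365.25
    return n_events / person_years * 1000

# Hypothetical cohort: four patients contributing 2, 1, 3, and 2 years at risk.
follow_up_days = [730.5, 365.25, 1095.75, 730.5]
rate = incidence_rate_per_1000_py(n_events=2, person_days=follow_up_days)
```

Two events over 8 person-years give 250 events per 1,000 person-years.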
incidence rate ratio: A number comparing how often an event (like a side effect) occurs per year in one treatment group versus another; a value below 1.0 means the first group had fewer events per year.; Appears in: Multi-Database / Distributed Network Study, Self-Controlled Case Series (SCCS)
incidence rate ratio (IRR): The ratio of the event rate in one group to the event rate in another group, where each rate is expressed as events per unit of person-time (e.g., hospitalizations per person-year).; Appears in: Poisson and Negative Binomial Count Models for HCRU and Utilization
incident diagnosis: The first recorded occurrence of a disease for a patient within a defined look-back window that is free of any prior diagnosis for that condition.; Appears in: OMOP CONDITION_OCCURRENCE and CONDITION_ERA
incident event: A brand-new occurrence of the outcome (a first heart-failure hospitalization), as opposed to one the patient already had before follow-up started.; Appears in: Incidence Rate Calculation
incident outcome: A brand-new occurrence of the outcome that happens after time zero, not one a person already had before they entered the study.; Appears in: Prospective Cohort Study
incident user: A patient who has no record of filling a given drug in the months before the study window starts — meaning this fill is truly their first time on it, not a continuation of an old prescription.; Appears in: Prescription Sequence Symmetry Analysis (PSSA)
incidental finding: A diagnosis made during a routine check-up or monitoring visit because a test was run, not because the patient reported symptoms — these findings accumulate faster in groups that receive more frequent medical contact.; Appears in: Surveillance and Detection Bias
incremental cost: The extra spending caused by a disease, calculated by subtracting the average cost of matched controls from the average cost of patients with the disease.; Appears in: All-Cause vs Attributable Costs, Cost-effectiveness Analysis (CEA)
incremental cost (ΔC): The difference in average total spending between the new treatment group and the comparison group — a positive number means the new treatment costs more.; Appears in: ICER and Net Monetary Benefit (NMB)
incremental cost-effectiveness ratio (ICER): The extra cost of the new treatment divided by the extra health benefit it produces, expressed as cost per unit of health gained (for example, cost per QALY).; Appears in: Probabilistic Sensitivity Analysis (PSA) for Health-Economic Models
incremental effect: The difference in health outcome between the new treatment and the comparator — the extra QALYs (or life-years) gained by using the new option.; Appears in: Cost-effectiveness Analysis (CEA)
incremental effect (ΔE): The difference in average QALYs (or other health outcome) between the new treatment group and the comparison group — a positive number means the new treatment produces better health.; Appears in: ICER and Net Monetary Benefit (NMB)
independence assumption: The statistical requirement that knowing the outcome for one row tells you nothing about the outcome for another row; violated whenever rows share a cluster.; Appears in: Cluster-Robust Standard Errors
index date: The date a participant officially 'enters the clock' — for example, the date of their first prescription fill for the drug being studied — which is the starting point for counting their time-at-risk.; Appears in: Biomarker-Defined Cohort (RWE), Composite Endpoint Construction, Diagnosis Phenotype Algorithm (1 IP / 2 OP, Time Window), Exposure Lag, Induction, and Latency Windows, Healthcare Resource Utilization (HCRU), Immortal Time Bias Handling, Landmark Analysis, New-User (Incident-User) Design, Persistence Time to Discontinuation, Person-Time Denominator Construction, Prevalent New-User Design, Prevalent User Bias, Proportion of Days Covered (PDC), Protopathic Bias and Reverse Causation, Real-World Progression and rwPFS, Restart, Rechallenge, and New-Episode Rules, Study Time Windows: Baseline, Observation, and Outcome Windows, Washout / Clean / Lookback Period
index date (cohort entry): The patient's starting point in the study, typically the date of their first prescription fill for the drug being studied, serving as day zero from which all follow-up is measured.; Appears in: OMOP Time-at-Risk and Cohort Exit
index date (time zero): The patient's chosen "day zero" at which eligibility, treatment-group assignment, and the start of follow-up all happen at once.; Appears in: Active Comparator, New-User Design, Time Zero (Index Date) Alignment
index fill: The first prescription fill that starts the clock — the patient's 'day zero' when on-treatment follow-up begins.; Appears in: As-Treated Risk Window Construction
index test: The thing being checked — here, a claims or EHR rule that decides whether a patient has the disease.; Appears in: Diagnostic Accuracy Study
index-condition exclusion: The rule that you must not count the very condition you are studying (or the admission reason) as a comorbidity, so you do not adjust away your exposure or outcome.; Appears in: Elixhauser Comorbidity Measures / Index
indirect comparison: An estimate of how two treatments compare when no trial tested them head-to-head, derived by chaining their separate results through a shared comparator treatment.; Appears in: Network Meta-Analysis
indirect costs: Money lost because the disease keeps people from working, including missed workdays and reduced productivity on the job.; Appears in: Burden of Disease and Cost-of-Illness (COI) Studies, Cost-of-Illness (COI) Study
indirect effect: The portion of a drug's total effect that operates specifically through the mediator, sometimes called the natural indirect effect (NIE); together, direct and indirect effects add up to the total effect.; Appears in: Causal Mediation and Effect Modification
individual patient data: The original row-level records for each patient in a study — one row per person — as opposed to the single summary number (like an average hazard ratio) that a published paper reports.; Appears in: Individual Participant Data (IPD) Meta-Analysis
individual patient data (IPD): The row-per-patient data from a trial, as opposed to the published group averages everyone else can see.; Appears in: MAIC and STC: Population-Adjusted Indirect Comparisons
individual-level association: A within-patient link: patients who respond better on the surrogate also tend to do better on the true endpoint; this is necessary but not enough to use the surrogate as a stand-in for the true endpoint.; Appears in: Surrogate Endpoint Validation
individual-level measurement: A value recorded for a specific person — for example, a patient's own screening answer about food insecurity — as opposed to a neighborhood average.; Appears in: Social Determinants of Health (SDoH) in RWE
induction offset: A deliberate one-day (or longer) delay between the index date and the start of the time-at-risk window, used to exclude outcomes that were already present on the first day of treatment rather than caused by it.; Appears in: OMOP Time-at-Risk and Cohort Exit
induction period: The minimum biological time that must pass between taking a drug and the drug being able to cause a disease event; events that occur before this minimum delay are not attributed to the drug.; Appears in: Exposure Lag, Induction, and Latency Windows
induction window: A short post-index exclusion zone (e.g., the first 30 days after the index date) during which outcome events are not counted because the drug has not yet had time to cause them.; Appears in: Study Time Windows: Baseline, Observation, and Outcome Windows
inequality aversion (Atkinson epsilon): A dial that says how much extra a society values health gains to the worse-off; at zero it ignores the distribution, and higher values weight the worst-off groups more heavily.; Appears in: Distributional Cost-Effectiveness Analysis
inflation probability (pi): In a zero-inflated model, the estimated probability that any given patient belongs to the structural-zero class (can never have the event); the complement (1 minus pi) is the probability of being in the active-count group.; Appears in: Zero-Inflated and Hurdle Count Models
influence (Cook's distance): How much the fitted model coefficients would shift if you removed one patient from the dataset; a Cook's distance near 1 means that single patient is materially changing the estimated results.; Appears in: Regression Diagnostics and Model Checking
informative censoring: Censoring in which the reason a patient leaves the study is connected to how sick the patient is; if the sickest patients tend to drop out earliest, those still being watched are healthier than average and the event estimates become overly optimistic.; Appears in: Attrition and Loss to Follow-Up, Censoring: Types, Mechanisms, and Informativeness, Inverse Probability of Censoring Weighting (IPCW)
Ingredient (IN): The active chemical in a drug, stripped of dose and form — for example, "atorvastatin" is the ingredient-level concept that all atorvastatin products of every strength and brand share.; Appears in: RxNorm Drug Terminology
ingredient roll-up: The step that maps every specific drug product (branded name, dose, formulation) to its underlying active ingredient so that fills of atorvastatin 20 mg and atorvastatin 40 mg are counted together as one 'atorvastatin' treatment episode.; Appears in: OMOP Drug Exposure and Drug Era
inpatient claim: A billing record generated when a patient is formally admitted overnight (or longer) to a hospital; it carries discharge codes that reflect what the patient was treated for during the stay.; Appears in: Diagnosis Phenotype Algorithm (1 IP / 2 OP, Time Window)
instantaneous rate: A measure of how quickly something happens at a precise moment, expressed as events per unit of time (e.g., 0.02 events per month), not as a cumulative count or probability over a whole period.; Appears in: The Hazard Ratio as an Effect Measure
instrument: A variable that changes which treatment a patient receives but has no direct effect on the outcome and shares no hidden cause with it — for example, a physician who simply tends to prescribe Drug A more often than Drug B.; Appears in: Instrumental Variables in Pharmacoepidemiology
integration: The deliberate act of connecting the two strands so that one informs the other — for example, using numbers to decide who to interview, then placing interview themes alongside claims patterns in a single comparison table.; Appears in: Mixed-Methods Study
intention-to-treat: An analysis rule that counts every participant in the group they were randomly assigned to, regardless of whether they actually took the treatment as directed, preserving the fairness that randomization created.; Appears in: Pragmatic Trial
intention-to-treat analysis: Comparing patients by the treatment they were randomly assigned to, regardless of what they actually received, in order to preserve the protection that randomization provides against background differences between groups.; Appears in: Registry-Based Randomized Controlled Trial (RRCT)
interaction test: The statistical test that directly asks whether a drug's effect differs between two groups, by adding a product term (the interaction) to a model and testing whether that term's coefficient is nonzero.; Appears in: Subgroup Analysis and Heterogeneity of Treatment Effect
intercept: The model's predicted value of the outcome when every predictor equals zero; in most clinical models it is a mathematical anchor rather than a clinically meaningful quantity.; Appears in: Ordinary Least Squares (OLS) Linear Regression
intercurrent event: Something that happens after a patient enters a study and complicates how you count their outcome — for example, the patient switches to a different drug, stops treatment entirely, or dies before the outcome of interest occurs.; Appears in: Estimand-to-Analysis Traceability, Estimands (ATE/ATT) and Intercurrent Events in RWE
interquartile range: The spread from the 25th percentile to the 75th percentile of a distribution, capturing the middle half of the data; preferred for skewed variables like healthcare costs and length of stay.; Appears in: Descriptive Statistics
intraclass correlation (ICC): A number between 0 and 1 that measures how similar patients within the same cluster are to each other compared with patients across different clusters; even a small ICC (e.g., 0.05) meaningfully reduces the statistical information in the data.; Appears in: Cluster-Randomized Trial
intraclass correlation coefficient (ICC): A family of statistics measuring how consistently multiple raters score the same subjects on a continuous scale; the choice between absolute-agreement and consistency forms determines whether systematic rater differences count against the ICC.; Appears in: Agreement Statistics: Kappa, ICC, and Bland-Altman
inverse-odds-of-sampling weight: A number assigned to each study participant that up-weights people who look like the target population and down-weights people who are over-represented in the study, so the weighted study group resembles the target.; Appears in: Generalizability, Transportability, and External Validity
inverse-probability weighting: A technique that gives each patient a statistical weight — patients who look like they 'should' have been in the other arm are upweighted — so the weighted groups resemble a population in which treatment was assigned independently of patient characteristics.; Appears in: Propensity Score Methods (PSM, IPTW)
inverse-probability-of-censoring weight: A number multiplied onto each remaining clone-record to mathematically represent the clones that were artificially removed, so the surviving clones still reflect the full original population.; Appears in: Clone-Censor-Weight for Per-Protocol Target-Trial Emulation
inverse-probability-of-treatment weighting: A technique that gives each person-month record a numeric weight equal to the inverse of how likely that person was to receive the treatment they actually received, given their history; people who received an unexpected treatment get a high weight so their experience counts more, creating a pseudo-population where treatment looks random.; Appears in: Marginal Structural Models and G-Methods
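As a rough illustration of the weight itself, the sketch below computes an unstabilized weight for a single record from a propensity estimate. The function name and inputs are hypothetical; in a marginal structural model the per-period weights would typically be multiplied across a person's records and often stabilized, which this sketch does not do.

```python
def iptw_weight(treated: bool, propensity: float) -> float:
    """Unstabilized weight: 1 / P(receiving the treatment actually received).

    `propensity` is the estimated probability of being treated given history.
    """
    if not 0.0 < propensity < 1.0:
        raise ValueError("propensity must be strictly between 0 and 1")
    prob_of_received = propensity if treated else 1.0 - propensity
    return 1.0 / prob_of_received


# A treated record that had only a 20% chance of treatment gets a weight of 5
print(iptw_weight(True, 0.2))
# An untreated record with the same propensity gets a weight of 1.25
print(iptw_weight(False, 0.2))
```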
inverse-variance weighting: A method of combining estimates from multiple sites that gives more influence to results from sites with smaller uncertainty, producing a weighted average that reflects precision.; Appears in: Federated and Distributed Network Analysis, Meta-Analysis of Randomized Controlled Trials
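A minimal sketch of the weighted average, assuming a fixed-effect (common-effect) pooling of site estimates that are already on an additive scale such as the log hazard ratio. The function name and the two-site example are hypothetical.

```python
import math


def inverse_variance_pool(estimates, std_errors):
    """Fixed-effect pooled estimate and its standard error.

    Each site gets weight 1 / SE^2, so more precise sites count more.
    """
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Two hypothetical sites: the more precise site (SE 0.1) dominates the average
pooled, pooled_se = inverse_variance_pool([0.10, 0.30], [0.1, 0.2])
print(f"pooled = {pooled:.3f}, SE = {pooled_se:.4f}")
```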
IPPS: Inpatient Prospective Payment System — Medicare's method of paying hospitals a fixed, pre-determined amount per discharge based on the DRG, rather than reimbursing each individual service after the fact.; Appears in: MS-DRG (Medicare Severity Diagnosis-Related Groups)
is-a hierarchy: The network of parent-child links in SNOMED CT that lets you say "Diabetes mellitus type 2 IS A type of Diabetes mellitus IS A type of Disorder of glucose metabolism" — so a search for the parent automatically captures all the children below it.; Appears in: SNOMED CT Clinical Terminology
item: A single question or statement on a questionnaire that patients respond to, such as 'Rate your worst itch in the past 24 hours on a scale of 0 to 10.'; Appears in: PRO Instrument Development

J

J-code: A five-character billing code starting with the letter J, used to report a drug that was given to a patient by a provider (for example, infused IV chemotherapy or an injected biologic) rather than dispensed at a pharmacy for the patient to take at home.; Appears in: HCPCS Level II Codes and J-Codes, Infused Biologic Administration Capture, Route-of-Administration Differences in RWE, Therapeutic-Area-Specific RWE Challenges — Oncology
joint display: A side-by-side table or diagram that shows the quantitative finding and the qualitative theme for the same people or groups, making it easy to see where the two strands agree or disagree.; Appears in: Mixed-Methods Study

K

k-fold cross-validation: A method that splits the dataset into k equal groups, trains the model k times (each time leaving one group out as the test set), and averages the k test-set performance scores to get an honest estimate.; Appears in: Cross-Validation, Overfitting, and Optimism
Kaplan-Meier curve: A step-down graph tracking the fraction of each treatment group that has not yet experienced an event (e.g., hospitalization) as time passes from the start of the study.; Appears in: Visualizations and Diagrams in Pharmacoepidemiology and RWE
Kaplan-Meier plateau: A flat section at the right tail of the survival curve that persists across a long follow-up period, suggesting that some patients will never have the event and the proportion surviving has stopped decreasing.; Appears in: Cure Models (Mixture and Non-Mixture)
knot: An anchor point in a spline where two polynomial segments join; the curve changes its bending behavior at each knot while remaining smooth and continuous.; Appears in: Splines and Flexible Functional Forms

L

labeler code: The first part of an NDC (5 digits in the HIPAA format), assigned by the FDA to the specific company responsible for putting the drug on the market — manufacturer, repackager, or private-label distributor.; Appears in: NDC (National Drug Code)
labeler code reuse: The rare but documented situation in which the FDA reassigns a labeler code prefix to a different company after the original holder surrenders it, meaning the same code prefix can refer to different manufacturers at different points in time in a long dataset.; Appears in: NDC (National Drug Code)
lag exclusion period: A fixed window at the beginning of follow-up that is removed from the analysis to exclude outcomes detected during the dense initial wave of monitoring visits that often follows a new drug prescription.; Appears in: Surveillance and Detection Bias
lag period: The chunk of time immediately after a prescription start that researchers skip over when counting disease events, because events in that window were probably already developing before the drug could have caused anything.; Appears in: Exposure Lag, Induction, and Latency Windows
lag window: A defined number of months before the index date during which drug exposures are deliberately excluded from the analysis, because any prescription in that window may have been triggered by early disease symptoms rather than an unrelated indication.; Appears in: Protopathic Bias and Reverse Causation
landmark day: A pre-chosen time point measured from the start of follow-up (e.g., day 90 after diagnosis) that divides the study into a classification period before it and an outcome-tracking period after it; only patients still alive and event-free on this day are included going forward.; Appears in: Landmark Analysis
latency distribution: The parametric survival distribution (such as Weibull or exponential) that describes how quickly the at-risk, not-cured subgroup eventually experiences the event.; Appears in: Cure Models (Mixture and Non-Mixture)
latency period: The delay between when a disease actually begins in the body and when a doctor diagnoses it and it appears in the data; for slow-developing diseases like cancer, this can be years.; Appears in: Exposure Lag, Induction, and Latency Windows
latent class: A hidden subgroup that the model infers from patterns in the data but that is never directly observed — patients are not labeled by class in the raw claims file; the model assigns them based on their observed pattern.; Appears in: Group-Based Trajectory Models and Latent Class Analysis
leading zero: The zero that prefixes a revenue code to make it 4 digits (e.g., 0450 rather than 450); many databases strip this zero, so analysts must normalize the field before filtering.; Appears in: Revenue (Center) Codes
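One way the normalization might look, assuming the revenue code arrives as an integer or a digit string of at most four characters. The function name is hypothetical, and real extracts may need extra handling for blanks or non-numeric values.

```python
def normalize_revenue_code(raw) -> str:
    """Left-pad a revenue code to 4 digits (e.g., 450 -> '0450')."""
    text = str(raw).strip()
    if not (text.isascii() and text.isdigit()) or len(text) > 4:
        raise ValueError(f"not a plausible revenue code: {raw!r}")
    return text.zfill(4)


print(normalize_revenue_code(450))     # stripped zero is restored
print(normalize_revenue_code("0450"))  # already normalized, unchanged
```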
leaf vs branch: A leaf is the most specific code (like the single term myocarditis); a branch is a grouping above it (like the whole cardiac category) whose count is the sum of its leaves.; Appears in: Tree-Based Scan Statistics (TreeScan)
learning rate: In gradient boosting, a small number (typically 0.01 to 0.1) that controls how much each new tree contributes to the ensemble; a lower rate means slower but more stable learning and usually requires more trees.; Appears in: Tree-Based Ensembles: Random Forests and Gradient Boosting
least absolute deviations (LAD): Median regression by another name — the special case of quantile regression at the 50th percentile, which minimises the sum of absolute rather than squared residuals.; Appears in: Quantile Regression
left truncation: A statistical technique that tells the survival analysis to start counting a patient's at-risk time only from the date they actually entered the study, not from some earlier date, so patients who died before being tested are correctly excluded from the risk set.; Appears in: Biomarker-Defined Cohort (RWE)
level: One of the specific values an attribute can take in a choice task — for example, an efficacy attribute might have levels of 40%, 55%, and 70% response rate.; Appears in: Patient Preference Study (DCE / BWS)
level change: The sudden, one-time jump or drop in the outcome rate that happens right at the intervention date, measured by comparing what the rate actually was at that moment to what the pre-period trend predicted it would be.; Appears in: Interrupted Time Series (Segmented Regression)
leverage: A measure of how unusual a patient's predictor values are (e.g., very old or very sick relative to everyone else in the dataset); high leverage means that patient has more potential to pull the fitted line toward themselves.; Appears in: Regression Diagnostics and Model Checking
Levin's formula: The standard PAF formula: PAF = p*(RR-1) / (p*(RR-1) + 1), where p is the population exposure prevalence and RR is the risk ratio; valid only when RR is the unconfounded (causal) estimate.; Appears in: Attributable Risk and Population Attributable Fraction
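The formula above can be sketched directly in code. The function name and the example inputs (30% exposure prevalence, RR of 2) are hypothetical, and the result is only interpretable causally under the unconfounded-RR condition stated in the entry.

```python
def levin_paf(p: float, rr: float) -> float:
    """Population attributable fraction: p*(RR-1) / (p*(RR-1) + 1)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a prevalence between 0 and 1")
    if rr <= 0.0:
        raise ValueError("rr must be positive")
    excess = p * (rr - 1.0)
    return excess / (excess + 1.0)


# Hypothetical inputs: 30% of the population exposed, risk ratio of 2
print(f"PAF = {levin_paf(0.30, 2.0):.3f}")
```

With these inputs the PAF is 0.3 / 1.3, or about 23% of cases.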
lifetime horizon: The full remaining lifespan of a typical patient in the model — often 30 or more years — which health technology assessment bodies require so all future costs and benefits are captured.; Appears in: Survival Extrapolation for HTA Using RWE
likelihood: The probability of observing the actual data you collected, as a function of the unknown parameter; it tells you how much the data favor each possible parameter value.; Appears in: Bayesian Inference Foundations
likelihood function: The probability of the observed data treated as a function of an unknown model parameter, telling you how plausible each possible parameter value is given what you actually observed.; Appears in: Maximum Likelihood Estimation
likelihood ratio: A multiplier that says how many times more likely a given test result is in people with the disease than in people without it; 1 means the result tells you nothing.; Appears in: Diagnostic Likelihood Ratios
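For a binary test, the two likelihood ratios follow from sensitivity and specificity. This is a small sketch with a hypothetical function name and made-up accuracy values.

```python
def likelihood_ratios(sensitivity: float, specificity: float):
    """Return (LR+, LR-) for a binary test.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must be strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Hypothetical test with 90% sensitivity and 80% specificity
lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
```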
line of therapy: The ordered sequence of cancer treatment regimens a patient receives; first-line is the initial treatment, second-line starts when the first stops working or causes too much harm, and so on.; Appears in: Therapeutic-Area-Specific RWE Challenges — Oncology, Treatment Patterns and Lines of Therapy (LOT)
linear predictor: The weighted sum of predictor variables (β₀ + β₁X₁ + … + βₚXₚ) that sits inside a GLM; the link function connects this sum to the outcome's expected value.; Appears in: Generalized Linear Models (GLM)
linear vs monotone association: Linear association means the relationship follows a straight line; monotone association is weaker — it only requires that higher values of X tend to pair with higher (or lower) values of Y, even if the relationship curves.; Appears in: Pearson and Spearman Correlation
link function: A mathematical transformation applied to the outcome's expected value so that the transformed value can be modelled as a straight-line combination of predictors; for example, the logit link transforms a probability into a log-odds so it can range from negative to positive infinity.; Appears in: Generalized Linear Models (GLM)
linkage-selection bias: The distortion that occurs when patients who successfully link across sources differ systematically from patients who do not link — for example, younger or healthier patients may link more easily, making the linked cohort unrepresentative of the original population.; Appears in: Linked Multi-Database Study (Record Linkage)
linked death registry: A national vital-statistics or death-certificate database that is matched to registry patients after the fact, so that deaths occurring outside the hospital are still counted in the study.; Appears in: Registry-Based Randomized Controlled Trial (RRCT)
listwise deletion: Another name for complete-case analysis, in which every row (patient record) with even one missing value is deleted from the analysis list before estimation begins.; Appears in: Complete-Case Analysis
live-birth selection: The fact that a linked mother-infant study can only include pregnancies that produced a live, enrolled baby, meaning pregnancies that ended in loss or in an unenrollable baby are excluded, which can skew the results if the drug being studied also causes pregnancy loss.; Appears in: Mother-Infant Linkage
LMP (last menstrual period): The first day of the mother's last menstrual cycle before conception, used as the standard starting point from which gestational age and trimester boundaries are calculated.; Appears in: Pregnancy Exposure Window
LMS method: The mathematical formula (named for its three parameters) that turns a raw height or weight measurement into a z-score by adjusting for the child's age and sex using the reference standard's published values.; Appears in: Pediatric Growth and Development Endpoints in RWE
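The conversion can be sketched with the usual LMS expression, z = ((x / M)^L - 1) / (L x S), which reduces to ln(x / M) / S when L is zero. The function name and the L, M, S values below are hypothetical placeholders; real values come from the reference standard's published tables for the child's age and sex.

```python
import math


def lms_zscore(x: float, L: float, M: float, S: float) -> float:
    """Z-score from a measurement x and the reference L (skewness),
    M (median), and S (coefficient of variation) for the child's age and sex."""
    if x <= 0 or M <= 0 or S <= 0:
        raise ValueError("x, M, and S must be positive")
    if abs(L) < 1e-12:
        # Limiting case as L approaches zero
        return math.log(x / M) / S
    return ((x / M) ** L - 1.0) / (L * S)


# Placeholder reference values, not from any published table
print(round(lms_zscore(11.0, 1.0, 10.0, 0.1), 3))
```

A measurement equal to the reference median M always gives a z-score of zero, which is a quick sanity check on any implementation.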
local average treatment effect: The effect of treatment estimated only among the patients whose drug choice was actually changed by the instrument — it is not the average effect across all patients, and the exact group of affected patients cannot be listed from the data.; Appears in: Instrumental Variables in Pharmacoepidemiology
local code: An internal lab identifier assigned by a single hospital or laboratory system, such as "CREAT-S" or "LB-1042," which has no meaning outside that site and must be translated to LOINC before results can be shared or combined with data from other sites.; Appears in: LOINC Laboratory and Observation Codes
local randomization: The idea that patients who land just above versus just below the cutoff are so similar in everything else that their tiny difference in the running variable is essentially random, making the comparison as credible as a small randomized trial run right at the threshold.; Appears in: Regression Discontinuity Design
locked data snapshot: A frozen copy of the source dataset taken at a defined cut date, stored with a cryptographic checksum so that any later change to the file is detectable; the cut date itself is part of what the analysis estimates in a living database.; Appears in: QC, Double Programming, and Reproducible Analysis
log link: The mathematical function in the GLM that connects the regression equation (which can produce any number) to the mean cost (which must be positive) by exponentiating the linear predictor, ensuring all predicted costs are greater than zero.; Appears in: Gamma Distribution for Cost and Skewed Outcomes
log odds: The natural logarithm of the odds; also called the logit. Taking the log converts odds (which are always positive) into a number that can be negative, zero, or positive, which a regression line can use without going out of bounds.; Appears in: Binomial Distribution and the Logit Link
log odds ratio: The natural logarithm of an odds ratio, used as the working scale for combining binary outcome results across trials because log-scale effects add and subtract neatly.; Appears in: Network Meta-Analysis
log-likelihood: The natural logarithm of the likelihood function; it converts products of probabilities into sums and is always maximized at the same parameter value as the likelihood itself.; Appears in: Maximum Likelihood Estimation
log-likelihood ratio (LLR): A single number summarizing how much more the observed events look like an excess than like the no-excess expectation at a given look; here it is zero unless observed exceeds expected.; Appears in: MaxSPRT and Sequential Safety Surveillance
log-likelihood-ratio (LLR): A score for one cut that measures how surprising its observed event count is compared to its expected count; bigger means a more unusual excess.; Appears in: Tree-Based Scan Statistics (TreeScan)
log-linear model for time: The core AFT structure — instead of modeling the hazard directly, AFT models treat log(survival time) as a linear function of covariates, like ordinary linear regression applied to the logarithm of the time-to-event outcome.; Appears in: Accelerated Failure Time (AFT) Models
logit: The link function that maps a probability between 0 and 1 to the entire real line by computing the natural log of the odds; the inverse logit (expit or plogis) converts back to a probability.; Appears in: Binomial Distribution and the Logit Link
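A short sketch of the two transformations using only the standard library. The function names mirror the terms in the entry; the expit is written in a form that avoids overflow for large negative inputs.

```python
import math


def logit(p: float) -> float:
    """Natural log of the odds; defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    """Inverse logit: maps any real number back to a probability."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    exp_x = math.exp(x)
    return exp_x / (1.0 + exp_x)


log_odds = logit(0.8)          # ln(0.8 / 0.2) = ln(4)
print(round(log_odds, 4))
print(round(expit(log_odds), 4))  # round trip back to 0.8
```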
longitudinal data: Data where the same patients are measured multiple times at different points in time, so each person contributes more than one row to the dataset.; Appears in: Mixed-Effects (Random-Effects) Models for Longitudinal RWE
look-back period (washout): A stretch of record before time-zero that the analyst inspects to confirm the patient is a brand-new user and to measure their starting health.; Appears in: Retrospective Cohort Study Design
lookback (baseline) window: The fixed stretch of time before the study index date over which you search the data for qualifying diagnoses; it must be the same length for every patient.; Appears in: Charlson Comorbidity Index (CCI)
lookback window: The block of time before a patient's study entry date that a researcher searches for prior diagnoses, prior drug use, or other baseline information — it only produces valid results when it falls entirely inside an observation period.; Appears in: OMOP Observation Period
loss to follow-up: When a patient stops appearing in the data — here it can mean recovered, died, or just moved their care elsewhere, and the record can't tell you which.; Appears in: EHR-Based Study, Pregnancy Exposure Registry
Love plot: A dot plot showing how similar the two treatment groups are on each background characteristic before and after statistical adjustment — a dot past the 0.1 line means that characteristic is still imbalanced.; Appears in: Visualizations and Diagrams in Pharmacoepidemiology and RWE
Luhn check digit: The 10th digit of every NPI, computed by a standard algorithm called Luhn that lets a computer instantly detect whether a 10-digit string is a plausible NPI or a data-entry error.; Appears in: NPI (National Provider Identifier)
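A sketch of how such a check might be coded. It assumes the NPI convention of prepending the constant prefix 80840 before running the Luhn algorithm; that prefix is not stated in the entry above, and the function name is hypothetical. Passing the check only means the string is well-formed, not that the NPI has been issued to a provider.

```python
def is_valid_npi(npi: str) -> bool:
    """True if a 10-digit string passes the Luhn check used for NPIs."""
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False
    # Assumption: the check digit is computed over the prefix 80840 plus the NPI
    digits = [int(ch) for ch in "80840" + npi]
    total = 0
    # Walk from the rightmost digit; double every second digit
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(is_valid_npi("1234567893"))  # a well-formed example string
print(is_valid_npi("1234567890"))  # last digit altered, fails the check
```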

M

M1/M2: A regulatory framework for setting the non-inferiority margin: M1 is the entire effect of the reference drug over placebo (estimated from prior studies), and M2 is the largest loss of that effect that is still clinically acceptable, often set as a fraction of M1 (for example, half) so that the new drug preserves the remainder; the NI margin equals M2.; Appears in: Equivalence and Non-Inferiority Testing
machine learning nuisance models: Flexible computer-learned models (such as random forests or gradient boosting) used to handle the statistical background work of adjusting for confounders, rather than making rigid assumptions about a simple linear relationship.; Appears in: Targeted Maximum Likelihood Estimation (TMLE)
maintenance dose: The steady dose a patient stays on once the dose stops changing.; Appears in: Dose Titration / Up-Titration to Target Dose
major congenital malformation (MCM): A serious structural birth defect present at birth, such as a heart septal defect or cleft palate, that is the primary safety outcome a pregnancy registry is designed to detect.; Appears in: Pregnancy Exposure Registry
malformation prevalence ratio: The proportion of babies with a birth defect among exposed mothers divided by the same proportion among unexposed mothers; a ratio above 1 suggests the drug may raise risk.; Appears in: Pregnancy Exposure Registry
mapping (crosswalk): A published statistical formula that converts scores from a disease-specific symptom questionnaire into the 0-to-1 utility scale, used when the study collected symptom data but not a direct utility measure like the EQ-5D.; Appears in: QALY Utility Mapping (Crosswalking to Health-State Utilities)
mapping version: The specific release of a crosswalk file (e.g., FY2018 GEMs, Q1 2024 ASP crosswalk) that was applied; must be documented because crosswalks are updated on their own schedules and different versions produce different results.; Appears in: Code Crosswalks and Mappings Between Coding Systems
Maps to: The CONCEPT_RELATIONSHIP link that connects a source code to its standard concept, the central translation step the ETL uses to convert raw ICD or NDC codes into OMOP concept_ids.; Appears in: OMOP Standardized Vocabularies (OHDSI/Athena)
MAR: Missing At Random: the chance a value is missing can be fully explained by other variables you already have in your dataset, such as treatment arm or care site, but not by the missing value itself.; Appears in: Missing Data Pattern Table
marginal effect: A treatment effect averaged over the whole study population rather than estimated within a specific subgroup or by holding other patient characteristics fixed; what g-computation produces, in contrast to the conditional odds or hazard ratio a regression model reports by default.; Appears in: G-Computation and the Parametric G-Formula
marginal expected count: The population-average expected event count from a zero-inflated model, computed as (1 minus pi) times the count-component mean; this is the correct quantity for budget-impact analysis, not the count-component mean alone.; Appears in: Zero-Inflated and Hurdle Count Models
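The calculation in the entry is a one-liner; the sketch below spells it out with hypothetical names and made-up inputs, where `pi_zero` stands for the model's estimated probability of a structural zero.

```python
def marginal_expected_count(pi_zero: float, count_mean: float) -> float:
    """Population-average expected count: (1 - pi) * count-component mean."""
    if not 0.0 <= pi_zero <= 1.0:
        raise ValueError("pi_zero must be between 0 and 1")
    if count_mean < 0.0:
        raise ValueError("count_mean must be non-negative")
    return (1.0 - pi_zero) * count_mean


# Hypothetical: 40% structural zeros, count-component mean of 2.5 events
print(marginal_expected_count(0.40, 2.5))
```

Using the count-component mean alone (2.5 here) would overstate the population average (1.5) because it ignores the patients who can never have an event.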
marginal model: Another name for a population-average model: one that describes outcomes averaged across the population rather than modeled within each individual.; Appears in: GEE Population-Average (Marginal) Models
marginal proportion: The overall fraction of patients who had a "yes" outcome under one condition — computed across all pairs, not just discordant ones; McNemar's test asks whether the marginal proportion under condition 1 equals the marginal proportion under condition 2.; Appears in: McNemar's Test for Paired Proportions
marginal standardisation: Computing the average of individual predicted values across all patients in a dataset, under a specified treatment assignment, to recover the population-average expected cost rather than the cost at an "average patient."; Appears in: Two-Part and Hurdle Models for Semicontinuous Costs
marginal structural model: An outcome model fit on the reweighted pseudo-population that estimates the population-averaged effect of always following one treatment strategy versus another, without conditioning on the time-varying confounders.; Appears in: Marginal Structural Models and G-Methods
marker drug: Drug B — a medication whose newly prescribed use signals that the patient likely developed a specific condition, serving as a proxy for that condition in place of a separately recorded diagnosis.; Appears in: Prescription Sequence Symmetry Analysis (PSSA)
marketing-authorization holder (MAH): The company that legally owns the right to sell an approved drug in a given country or region — the entity the regulator holds responsible for post-approval safety monitoring.; Appears in: Imposed Post-Authorisation Safety Study (PASS)
Markov model: A type of decision model that divides patients into mutually exclusive health states and moves them between states cycle by cycle using fixed transition probabilities, like moving tokens on a board each month.; Appears in: Health Economic Modeling Methods Using RWE
Markov vs semi-Markov: Whether the rate of leaving a state depends on time since the study began (Markov, clock-forward) or time since entering the current state (semi-Markov, clock-reset).; Appears in: Multi-State Models
match rate: The fraction of records in one source that find a corresponding record in the other source — a high match rate means more patients are included, but it says nothing about whether the matched pairs are actually the same person.; Appears in: Linked Multi-Database Study (Record Linkage), Tokenization and Privacy-Preserving Record Linkage
matched control: A person without the disease who is similar in age, sex, and health complexity to someone with the disease, used as a comparison to estimate what the diseased patient would have spent if they did not have the disease.; Appears in: All-Cause vs Attributable Costs
matched set: One case paired with its sampled controls; every member of the set shares the same event-time anchor and any other matching factors like age or sex.; Appears in: Nested Case-Control Design
MCAR: Missing Completely At Random: the chance a value is missing has nothing to do with any variable in your dataset, observed or unobserved, so the missing records look just like the complete ones.; Appears in: Missing Data Pattern Table
mean: The arithmetic average — add up all values and divide by the number of observations; sensitive to extreme values because every value contributes equally.; Appears in: Descriptive Statistics
mean cumulative function: A curve showing the expected total number of events per patient by each point in time, read directly as events-per-person rather than as a probability.; Appears in: Recurrent Events Analysis
mean difference: The gap between the two group averages on the original scale of the outcome (e.g., $840 more in total costs per patient in the treated group); this is the primary effect estimate for clinical and economic interpretation.; Appears in: Two-Sample (Student's) t-Test, Welch's t-Test (Unequal Variances)
mean survival (life-years): The average total time alive across all patients in a model, calculated as the area under the survival curve; this is the key number that drives the cost-effectiveness calculation.; Appears in: Survival Extrapolation for HTA Using RWE
MedDRA: The Medical Dictionary for Regulatory Activities — the controlled vocabulary the FDA and other regulators require for coding adverse events and medical history in submissions; mapping ICD-10 billing codes to MedDRA terms requires a documented crosswalk.; Appears in: CDISC Standards (SDTM/ADaM) for RWE Submissions
median: The middle value when observations are sorted from lowest to highest; robust to extreme values because it depends only on rank, not magnitude.; Appears in: Descriptive Statistics
median survival time: The earliest time point at which the Kaplan-Meier curve falls to 50% or below, meaning that half the patients in the group are estimated to have had the event by that time; it is undefined if the curve never reaches 50%.; Appears in: Kaplan-Meier Estimator
mediator: A variable that sits on the causal chain between the drug and the outcome (the drug causes the mediator, which then causes the outcome); adjusting for it removes part of the drug's effect from your estimate.; Appears in: Causal Mediation and Effect Modification, DAGs and the Backdoor Criterion for Drug Studies
medical benefit: The part of health insurance that covers services provided by a doctor or clinic, such as infusions given in a hospital outpatient center or office; distinct from the pharmacy benefit, which covers drugs a patient picks up at a retail or specialty pharmacy.; Appears in: Infused Biologic Administration Capture, Route-of-Administration Differences in RWE
medical vs pharmacy benefit: Medical benefit covers services and drugs administered by a provider during a visit (billed with HCPCS codes on a medical claim); pharmacy benefit covers drugs dispensed at a pharmacy for the patient to take home (billed with NDC codes on a pharmacy claim). Most cancer infusions are medical-benefit; most pills are pharmacy-benefit.; Appears in: HCPCS Level II Codes and J-Codes
Medicare Advantage (MA): A private-plan alternative to traditional Medicare where the government pays the plan a fixed monthly amount per enrollee rather than a fee per service; the plan then submits encounter records to report what services were delivered.; Appears in: Medicare FFS vs Medicare Advantage vs Commercial Claims Differences
Medicare Fee-for-Service (FFS): Traditional Medicare where the government pays a separate fee for each covered service a provider delivers, generating a claim record for every billed service.; Appears in: Medicare FFS vs Medicare Advantage vs Commercial Claims Differences
member-month: One patient enrolled and observable for one calendar month; a patient followed for 5 months contributes 5 member-months.; Appears in: Healthcare Costs (PPPM, PPPY, PMPM)
meta-analysis: An optional final step that statistically combines the results of similar studies into a single pooled estimate; only valid when the studies are alike enough to be measuring the same thing.; Appears in: Systematic Review
meta-inference: The conclusion you can only reach after placing both strands together — the insight that neither numbers alone nor words alone could produce.; Appears in: Mixed-Methods Study
method of moments: A way to fit a statistical distribution using only a published mean and standard error rather than patient-level data, by matching the distribution's theoretical mean and variance to the reported values.; Appears in: Beta Distribution for Proportions and Utilities
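For a beta distribution, the moment-matching step can be sketched as follows, treating the squared standard error as the variance to match. The function name and the example utility (mean 0.80, SE 0.05) are hypothetical.

```python
def beta_from_mean_se(mean: float, se: float):
    """Beta(alpha, beta) parameters whose mean and variance match the inputs."""
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must be strictly between 0 and 1")
    variance = se ** 2
    max_variance = mean * (1.0 - mean)
    if not 0.0 < variance < max_variance:
        raise ValueError("se is too large (or zero) for a beta with this mean")
    # Common factor equals alpha + beta
    scale = max_variance / variance - 1.0
    return mean * scale, (1.0 - mean) * scale


alpha, beta = beta_from_mean_se(0.80, 0.05)
print(f"alpha = {alpha:.1f}, beta = {beta:.1f}")
```

The guard on the variance matters in practice: a reported standard error that is too large for the mean has no matching beta distribution.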
mg/kg/day: Milligrams of drug per kilogram of body weight per day, the standard unit for expressing how much drug a child receives relative to their size.; Appears in: Pediatric Dose Normalization
mid-p correction: A small adjustment to an exact p-value (such as the exact McNemar p-value) that counts only half the probability of the observed outcome table in the tail instead of all of it, making the test less conservative than the fully exact version and giving better balance between false-positive and false-negative rates than the fully exact or fully asymptotic test when the number of discordant pairs is small.; Appears in: Fisher's Exact Test, McNemar's Test for Paired Proportions
minimal important difference (MID): The smallest change in score that a patient would actually notice and consider meaningful — used to decide whether a change detected in a study represents real benefit, not just measurement noise.; Appears in: PRO Instrument Validation
minimally important difference (MID): The smallest change in a PRO score that patients would notice and consider meaningful; changes smaller than the MID may be statistically detectable but are not considered clinically important.; Appears in: Patient-Reported Outcomes in Real-World Settings
minimum detectable effect: The smallest true difference between two treatments that a study of a given size can reliably detect.; Appears in: Sample Size, Power, and Precision in RWE
misclassification: An error in how a patient is labeled — for example, a patient who truly had a stroke is coded as stroke-free, or vice versa, because the billing-code algorithm is imperfect.; Appears in: Misclassification Bias Correction
Missing At Random (MAR): An assumption that the chance a value is missing depends only on other data we can see (like age or prior test results), not on the missing value itself — for example, a lab is missing more often in healthy patients who had fewer clinic visits, and we can see how many visits each patient had.; Appears in: Complete-Case Analysis, Longitudinal Outcomes Modeling, Multiple Imputation for Longitudinal RWE
Missing Completely At Random (MCAR): Missingness is pure chance — the probability that a value is missing has nothing to do with any characteristic of the patient, whether observed or not.; Appears in: Complete-Case Analysis
Missing Not At Random (MNAR): Whether a value is missing depends on the value that is missing — for instance, very high BMI patients decline to have their BMI recorded precisely because it is high.; Appears in: Complete-Case Analysis
missing-at-random (MAR): The assumption that whether a patient's visit measurement is missing can be fully explained by the data already collected for that patient, not by the unobserved value itself.; Appears in: Mixed Model for Repeated Measures (MMRM) in RWE
MMRM: Mixed Model for Repeated Measures — a statistical model that estimates the average treatment difference at each scheduled visit using all observed data, without imputing missing visits.; Appears in: Mixed Model for Repeated Measures (MMRM) in RWE
MNAR: Missing Not At Random: the chance a value is missing depends on the value that is missing, for example a lab being skipped precisely because a patient is doing poorly, so no amount of observed information can fully account for the gap.; Appears in: Missing Data Pattern Table
modifier: A two-character code attached to a CPT code that tells the insurer something important about how the service was delivered — for example, that only one side of the body was treated, or that two separate services happened on the same day.; Appears in: CPT Codes (HCPCS Level I)
monetized benefit: A health or social outcome (such as an injury avoided) that has been converted into a dollar amount so it can be compared directly with costs.; Appears in: Cost-Benefit Analysis (CBA)
monotone hazard: A hazard that moves in only one direction over time (always increasing or always decreasing, with a constant hazard as the boundary case), which is the key structural assumption of the Weibull: it cannot describe a hazard that rises then falls.; Appears in: Weibull Distribution for Time-to-Event Data
monotone pattern: A dropout-like missingness structure where once a variable is missing for a patient, later variables are also missing in a staircase sequence, which allows simpler imputation methods.; Appears in: Missing Data Pattern Table
Monte Carlo permutation: A shuffling procedure that re-creates the data many times assuming no real effect, so you can see how big a cluster could appear just by chance across the whole tree.; Appears in: Tree-Based Scan Statistics (TreeScan)
Monte Carlo simulation: A computational method that runs a model repeatedly using random draws from input distributions to characterize the range of possible outcomes.; Appears in: Probabilistic Sensitivity Analysis (PSA) for Health-Economic Models
mother-infant dyad: The linked pair of a pregnant person and their infant, treated as a single unit of analysis when the outcome of interest is a birth defect or newborn condition rather than a maternal one.; Appears in: Special Populations RWE Methods
MS-DRG: Medicare Severity Diagnosis Related Group — the payment category a hospital discharge is assigned to based on the principal diagnosis and procedures; ICD-10-PCS codes directly drive whether a discharge is classified as a surgical DRG (which pays more) or a medical DRG.; Appears in: ICD-10-PCS Inpatient Procedure Codes
MSIS_ID: A unique identifier assigned to a person in Medicaid (the U.S. public insurance program for low-income individuals) that is sometimes carried onto the baby's record at birth, allowing the same kind of family-based matching used in commercial insurance.; Appears in: Mother-Infant Linkage
multi-axial code: A code where each character position independently encodes a different clinical dimension, so the full meaning is the combination of all seven positions rather than a single memorized label.; Appears in: ICD-10-PCS Inpatient Procedure Codes
multi-outcome reusability: The ability to use the same subcohort as the comparison group for several different outcomes, which is unique to the case-cohort design and not possible in nested case-control studies.; Appears in: Case-Cohort Design
multiple comparisons: The problem that arises when you test many hypotheses at once — if you compare 5 groups pairwise you run 10 tests, and 10 tests at alpha=0.05 each will produce roughly half a false positive by chance even if nothing is truly different.; Appears in: One-Way ANOVA
multiple imputation: A technique that fills in each missing value not with one fixed guess but with several different plausible draws, producing multiple complete datasets whose variation captures the uncertainty about what the true value was.; Appears in: Missing Data Pattern Table, Multiple Imputation for Longitudinal RWE
multiplicative effect: A change that scales the outcome by a fixed factor rather than adding a fixed amount; a regression coefficient of 0.405 on the log scale means the treated group's typical cost is exp(0.405), or approximately 1.5, times the control group's typical cost.; Appears in: Log-Normal Distribution and the Retransformation Problem
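The arithmetic in the multiple comparisons and multiplicative effect entries above can be checked with a short Python sketch. The function names are illustrative only and do not come from any library. The family-wise error calculation assumes the tests are independent, which pairwise comparisons on shared groups are not, so treat that figure as a rough guide.

```python
import math


def pairwise_test_count(n_groups: int) -> int:
    """Number of distinct pairs among n_groups (5 groups -> 10 pairs)."""
    return math.comb(n_groups, 2)


def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected count of false positives when every null hypothesis is true."""
    return n_tests * alpha


def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive, ASSUMING independent tests."""
    return 1 - (1 - alpha) ** n_tests


def multiplicative_factor(log_scale_coefficient: float) -> float:
    """Convert a log-scale regression coefficient into a ratio of typical costs."""
    return math.exp(log_scale_coefficient)


if __name__ == "__main__":
    n_tests = pairwise_test_count(5)
    print(n_tests, expected_false_positives(n_tests))
    print(round(familywise_error_rate(n_tests), 3))
    print(round(multiplicative_factor(0.405), 3))
```

Under the independence assumption, ten tests at alpha = 0.05 carry roughly a 40% chance of at least one false positive, which is another way of reading the "half a false positive" expectation.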

N

named-entity recognition (NER): A step in clinical NLP that scans text and marks specific words or phrases as belonging to a category — for example, tagging "pembrolizumab 200 mg" as a drug-dose entity or "ECOG 2" as a performance-status score.; Appears in: NLP for Clinical Text in RWE
national burden: The total economic cost of a disease across an entire country, calculated by multiplying the per-patient cost by the number of people with the disease.; Appears in: Burden of Disease and Cost-of-Illness (COI) Studies
National Death Index (NDI): A federal database, maintained by the CDC, that collects death certificates from all U.S. states and is considered the most complete and accurate source for confirming that a patient died and on what date.; Appears in: Mortality Source Hierarchy
natural frequency: A way of expressing a risk as a concrete count in a fixed group ("12 of 100 people" rather than "12%" or "a 0.12 probability"), which research consistently shows is understood more accurately by lay readers than probability fractions or percentages.; Appears in: Plain-Language Summaries of Evidence
natural history: The typical course of a disease over time in patients who receive standard care or no treatment, documented in registries or observational studies to understand what happens without the new treatment.; Appears in: Rare Disease External Controls
NCPDP D.0: The National Council for Prescription Drug Programs Telecommunication Standard (version D.0) — the electronic transaction format used by all US pharmacies to submit prescription claims for real-time adjudication.; Appears in: NCPDP Pharmacy Claim Fields
NDC (National Drug Code): A code printed on every drug package in the US (10 digits as assigned by the FDA, stored as 11 digits in the HIPAA format used on claims) that identifies the specific labeler, product, and package size; one drug can have dozens of NDCs but maps to a single RxCUI ingredient.; Appears in: RxNorm Drug Terminology
NDC strength: The milligrams per tablet/unit recorded on a pharmacy claim, used with quantity and days_supply to infer the daily dose.; Appears in: Dose Titration / Up-Titration to Target Dose
NDC-to-ATC crosswalk: The three-step translation used with US claims data: NDC (package-level US drug code) maps to RxNorm (ingredient-level) via the RxNav API, then RxNorm maps to ATC class via the RxClass service — lossy for combination products and drugs with multiple therapeutic indications.; Appears in: ATC Classification and Defined Daily Dose (DDD)
NDJSON: "Newline-Delimited JSON" — a file format where each line is one complete JSON record; FHIR bulk export delivers data this way so each line is one resource (one prescription, one diagnosis, etc.).; Appears in: FHIR and Healthcare Interoperability for RWE
near-real-time surveillance: Re-running the safety check on a fresh data feed every week or month as new exposures and events arrive, rather than waiting for one analysis at the end.; Appears in: MaxSPRT and Sequential Safety Surveillance
negation handling: A rule or model component that detects when an entity in a clinical note has been denied or qualified (for example, "no evidence of progression") so the system does not falsely record the entity as present.; Appears in: NLP for Clinical Text in RWE
negative control exposure: A second drug or treatment added to a study that cannot plausibly cause the study outcome, used only to test whether the analytic method is producing spurious associations due to residual confounding.; Appears in: Negative Control Exposures
negative control outcome: An outcome that cannot plausibly be caused or prevented by the exposure being studied, used to reveal hidden differences in surveillance intensity between groups — if that outcome also shows a group difference, the cause is the watching, not the treatment.; Appears in: Empirical Calibration with Negative Controls, Negative Control Outcomes, Surveillance and Detection Bias
negative controls: Drug-outcome pairs that researchers know have no real causal relationship, used to test whether an analysis method is producing spurious results due to bias rather than true effects.; Appears in: Federated and Distributed Network Analysis
negative-control outcome: A health event the drug being studied cannot biologically cause or prevent — such as car accidents — used to check whether an apparent benefit is really just evidence of a healthier patient group.; Appears in: Healthy User Bias
nested cross-validation: A two-layer CV design where the inner layer tunes model settings (like the regularization penalty) and the outer layer honestly evaluates the whole training pipeline; needed whenever settings are chosen using the same data.; Appears in: Cross-Validation, Overfitting, and Optimism
net benefit: The total monetized benefit of a program minus its total cost; a positive number means the program generates more value than it consumes.; Appears in: Cost-Benefit Analysis (CBA), Win Ratio and Generalized Pairwise Comparisons
net health benefit: For each subgroup, the health a program adds minus the health lost elsewhere because the budget that funded it could no longer pay for other care.; Appears in: Distributional Cost-Effectiveness Analysis
net monetary benefit (NMB): A strategy's health gain converted to money (its QALYs times what the payer will pay per QALY) minus its cost, so strategies can be compared on a single number.; Appears in: Value of Information Analysis (EVPI, EVPPI, EVSI)
net survival: The probability of surviving a cancer in a hypothetical world where the cancer is the only possible cause of death; it strips away the background mortality that the patient would have experienced regardless of the cancer diagnosis.; Appears in: Relative and Net Survival
net-of-rebate price: A drug's price after manufacturer discounts the payer actually receives, which can be much lower than the list price and is the figure that should drive a brand-versus-generic cost comparison.; Appears in: Cost-minimization Analysis (CMA)
network: The map of all treatments and the direct trial comparisons linking them; a treatment pair with no connecting path through the network cannot be compared.; Appears in: Network Meta-Analysis
new user: A patient whose first observable fill of the study drug occurs after a clean stretch with no prior fills, meaning we are watching them start the drug fresh rather than catching them mid-treatment.; Appears in: Washout / Clean / Lookback Period
new user (incident user): A patient who is starting the drug for the very first time, with no fills of it in the recent past.; Appears in: Active Comparator, New-User Design
new-user design: A study rule that counts only patients starting a drug for the first time during the study period, so researchers can observe the full treatment experience from day one.; Appears in: Comparative Effectiveness Research (CER) Methods
new-user washout: A lookback period — typically 6 to 12 months — during which a patient must have had no use of the study drug; this ensures the analysis captures people starting the drug for the first time, not those already on it.; Appears in: Database Feasibility Assessment and Attrition Funnel
NOC code: A "not otherwise classified" billing placeholder (J3490 for drugs, J3590 for biologics) used when a newly approved drug does not yet have its own permanent J-code; during this period the specific drug can only be identified by looking at the NDC (National Drug Code) field on the same claim line, which is often missing or incomplete.; Appears in: HCPCS Level II Codes and J-Codes
non-collapsibility: The mathematical property of odds ratios and hazard ratios that makes them change value when you add more covariates to the model — even with no confounding — which is why they cannot be averaged across subgroups the way a risk difference can.; Appears in: Marginal Effects and Interpretation of Inferential Statistics
non-comparative incidence rate: A count of how often an outcome (like a side effect) occurs per unit of time among people using the product, without comparing that rate to any group not using the product.; Appears in: Product/Exposure Registry
non-differential vs differential misclassification: Non-differential means labeling errors happen at the same rate in both the treated and comparison groups; differential means the error rate differs by group, which can push the estimate in either direction rather than always toward 1.0.; Appears in: Misclassification Bias Correction
non-inferiority: A study result showing a treatment is not meaningfully worse than its comparator by a margin set in advance; a strong such result is the kind of evidence that can justify assuming equal outcomes.; Appears in: Cost-minimization Analysis (CMA)
non-inferiority margin: The largest amount by which the test treatment can be worse than the comparator and still be considered clinically acceptable; it is set in advance by the study team before any data are collected.; Appears in: Equivalence and Non-Inferiority Testing
noncollapsibility: A mathematical property of odds ratios and hazard ratios where the population-level value differs from the within-stratum value even when there is no confounding; g-computation bypasses noncollapsibility by producing risk differences and risk ratios directly from the averaged predictions.; Appears in: Binomial Distribution and the Logit Link, G-Computation and the Parametric G-Formula
nonparametric test: A statistical test that converts the raw data values into ranks (1st, 2nd, 3rd...) and bases the comparison on those ranks, making fewer assumptions about the shape of the data distribution.; Appears in: Parametric and Nonparametric Tests
normal distribution: A symmetric, bell-shaped probability distribution fully described by its mean and standard deviation, where roughly 68%, 95%, and 99.7% of values fall within one, two, and three standard deviations of the mean respectively.; Appears in: Normal Distribution and the Central Limit Theorem
normality assumption: The requirement, for many parametric tests, that the data (or the sample mean, for large samples) follow a bell-curve shape; violated by very skewed data or extreme outliers, especially in small samples.; Appears in: Parametric and Nonparametric Tests
NPPES: The National Plan and Provider Enumeration System — the federal database run by CMS that issues NPIs and stores each provider's name, address, and specialty codes in a public file updated monthly.; Appears in: NPI (National Provider Identifier)
nuisance function: In causal ML, a model (for the propensity score or the outcome) that is fit only to clean up confounding, not to be interpreted on its own.; Appears in: Predictive and Causal ML Models
null distribution: The theoretical distribution of a test statistic (like a t-statistic or a rank sum) that you would expect to see if there were truly no difference between the groups being compared.; Appears in: Parametric and Nonparametric Tests
null hypothesis: The default assumption that there is no effect or no difference between groups — hypothesis testing asks whether the data are inconsistent enough with this assumption to warrant rejecting it.; Appears in: Inferential Statistics Foundations
null model: The simplest possible prediction: assign every patient the same probability equal to the overall event rate in the dataset, ignoring all individual patient information.; Appears in: Brier Score
null-effect sequence ratio (NSR): The sequence ratio you would expect to see even if drug A caused nothing, calculated from background trends in how often each drug is being newly prescribed over the same time period.; Appears in: Prescription Sequence Symmetry Analysis (PSSA)
number at risk: The count of patients still actively being followed at a given time point, displayed below the KM curve to show when the estimate becomes unreliable because too few patients remain in the study.; Appears in: Kaplan-Meier Estimator
Number Needed to Harm (NNH): The same calculation as NNT but for a bad outcome: how many patients you treat before one extra patient is harmed.; Appears in: Number Needed to Treat (and Number Needed to Harm)
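The net monetary benefit and Number Needed to Harm entries above both reduce to one-line formulas. The sketch below assumes the standard definitions (NNT = 1 / absolute risk reduction, NNH = 1 / absolute risk increase, NMB = QALYs x willingness-to-pay minus cost). The function names and the example inputs are hypothetical and chosen only for illustration.

```python
def number_needed_to_treat(comparison_risk: float, treated_risk: float) -> float:
    """NNT = 1 / ARR, where ARR = comparison risk minus treated risk."""
    arr = comparison_risk - treated_risk
    if arr <= 0:
        raise ValueError("No risk reduction: NNT is undefined for ARR <= 0.")
    return 1 / arr


def number_needed_to_harm(treated_risk: float, comparison_risk: float) -> float:
    """NNH = 1 / ARI, where ARI = treated risk minus comparison risk (bad outcome)."""
    ari = treated_risk - comparison_risk
    if ari <= 0:
        raise ValueError("No risk increase: NNH is undefined for ARI <= 0.")
    return 1 / ari


def net_monetary_benefit(qalys: float, wtp_per_qaly: float, cost: float) -> float:
    """NMB = health gain valued at the payer's threshold, minus the strategy's cost."""
    return qalys * wtp_per_qaly - cost


if __name__ == "__main__":
    # Hypothetical inputs: 12 vs 9 events per 100 for benefit, 5 vs 3 per 100 for harm.
    print(round(number_needed_to_treat(0.12, 0.09), 1))
    print(round(number_needed_to_harm(0.05, 0.03), 1))
    print(net_monetary_benefit(qalys=2.0, wtp_per_qaly=50_000, cost=60_000))
```

Both NNT and NNH are usually rounded up to a whole patient when reported, and both are undefined when the two risks are equal, which is why the sketch raises an error rather than dividing by zero.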

O

observable time: The stretches of calendar time during which a patient is enrolled and the database is actually capturing their care — the only periods where a missing record safely means a missing event.; Appears in: Continuous Enrollment and Observable Time
observation period: The span of dates on which a data source (claims or EHR) has reliable records for a patient; the time-at-risk window can never extend beyond this boundary, because records outside it simply do not exist.; Appears in: OMOP CDM Method Patterns for RWE, OMOP Observation Period, OMOP Time-at-Risk and Cohort Exit
observation window: The span of time when the data source was actually capturing the patient's healthcare — in claims, this is when the patient was enrolled in their health plan; only within this window can a missing record be read as nothing happening.; Appears in: Cumulative Incidence and Absolute Risk Estimation, Medication Possession Ratio (MPR), Proportion of Days Covered (PDC), Study Time Windows: Baseline, Observation, and Outcome Windows
observed count: How many of the events have actually happened so far in the exposed people being watched.; Appears in: MaxSPRT and Sequential Safety Surveillance
observed events: The actual count of events (deaths, diagnoses, hospitalizations) that occurred among study participants during follow-up.; Appears in: Indirect Standardization, SMR, and SIR
observed events (O): The actual number of events (deaths, hospitalizations, etc.) recorded in a group during the study.; Appears in: Log-Rank Test
observed follow-up: The actual span of time a patient was enrolled and generating claims in the dataset — it varies across patients because people join or leave a health plan at different times.; Appears in: Healthcare Costs (PPPM, PPPY, PMPM)
observed vs expected counts: The "observed" count is the number you actually found in the data for a given cell; the "expected" count is what you would predict if group membership and outcome were completely unrelated (calculated from the row and column totals).; Appears in: Chi-Square Test of Independence
Occurrence code: A paired code-and-date field (FL31–36) that records a specific event associated with the claim, such as the accident date, symptom onset, or date a home health plan was established; used in injury-mechanism and symptom-onset analyses.; Appears in: UB-04 / 837I Institutional Claim Fields
odds: The number of patients with the event divided by the number without it — for example, odds of 0.25 mean the event happens once for every four non-events, which corresponds to a 20% probability.; Appears in: Binomial Distribution and the Logit Link, Logistic Regression for Binary Outcomes
odds ratio: The odds of the event in one group divided by the odds in another; an odds ratio of 2.25 means the first group has 2.25 times the odds of the event. For common outcomes the odds ratio is farther from 1 than the risk ratio describing the same data.; Appears in: Binomial Distribution and the Logit Link, Case-Control Study Design, Exact and Penalized-Likelihood Methods for Sparse Data
odds ratio (OR): A number expressing how much more common the acute event was during windows when the patient was exposed to the drug versus windows when they were not; an OR of 4 means the odds of the event were 4 times as high on exposed days within that person.; Appears in: Case-Crossover Design, Logistic Regression for Binary Outcomes, Marginal Effects and Interpretation of Inferential Statistics, Test-Negative Design
offset: In a Poisson regression, the log of each patient's follow-up time added to the model with its coefficient locked at 1, so the model predicts a rate (events per unit time) rather than a raw count; omitting it turns a rate comparison into a biased count comparison.; Appears in: Negative Binomial Distribution for Overdispersed Counts, Poisson and Negative Binomial Count Models for HCRU and Utilization, Poisson Distribution for Counts and Rates
omnibus test: A single test covering all groups simultaneously — one-way ANOVA is omnibus because it asks "any difference anywhere?" rather than testing each pair separately; it controls the overall false-positive rate for the multi-group comparison.; Appears in: Kruskal-Wallis Test, One-Way ANOVA
OMOP: Observational Medical Outcomes Partnership — the organization that designed the CDM now maintained by OHDSI; the two names are often used interchangeably to refer to the same data standard.; Appears in: OMOP CDM Method Patterns for RWE
OMOP CDM: A standardized data format (Common Data Model) maintained by the OHDSI community that reorganizes claims, EHR, and registry data into the same table structure so analysis code can run across many different data sources.; Appears in: OMOP Observation Period
on-treatment person-time: The total count of days during which a patient is inside an open risk window and therefore counted as actively exposed to the drug.; Appears in: As-Treated Risk Window Construction
one-stage vs two-stage: Two ways to pool the data: two-stage fits a separate statistical model within each study and then combines those study-level results; one-stage stacks all patients' rows together and fits a single model that treats study membership as a built-in grouping variable — both are valid, but they can give slightly different answers.; Appears in: Individual Participant Data (IPD) Meta-Analysis
one-to-many mapping: When a single code in the source system corresponds to two or more codes in the target system, because the target system has more granularity or finer clinical distinctions.; Appears in: Code Crosswalks and Mappings Between Coding Systems
one-way sensitivity analysis: Re-running the model with a single input set to a justified low and then high value, holding all other inputs at base case, to see how much the result moves.; Appears in: Scenario Analysis and Deterministic Sensitivity Analysis (DSA)
operational rule: The exact, reproducible data instruction that turns a conceptual question attribute into a computable flag or variable in the claims or EHR dataset.; Appears in: Estimand-to-Analysis Traceability
opportunity cost: The health given up somewhere else when money is spent on this program, estimated here as the program's cost divided by the cost-per-QALY threshold.; Appears in: Distributional Cost-Effectiveness Analysis
opportunity loss: The benefit given up when the strategy chosen today turns out not to be the best one in a particular scenario.; Appears in: Value of Information Analysis (EVPI, EVPPI, EVSI)
optimism: The gap between a model's apparent performance on training data and its honest cross-validated performance; an optimism of 0.10 in c-statistic means the model looks 0.10 points better than it actually is on new patients.; Appears in: Cross-Validation, Overfitting, and Optimism
order: A prescription a clinician writes in the EHR; it shows the doctor intended a drug, not that a pharmacy ever dispensed it or the patient took it.; Appears in: EHR-Based Study
order-to-fill window: The number of days after a prescription is written during which a fill still counts as the patient starting that drug; analysts choose 30 or 90 days and test different lengths.; Appears in: Primary Non-Adherence and Treatment Initiation
organogenesis: The early-pregnancy window, roughly weeks 3 through 8 after conception, when the fetus forms its major organs and is most vulnerable to drugs that cause structural birth defects.; Appears in: Pregnancy Exposure Window
organogenesis window: Roughly the first 12 weeks (90 days) after the last menstrual period, when fetal organs are forming and exposure to a drug is most likely to cause a structural defect.; Appears in: Pregnancy Exposure Registry, Special Populations RWE Methods
out-of-bag error: A built-in accuracy estimate for random forests: because each tree is trained on a random sample, the patients left out of that sample can be used to test that specific tree, giving a free estimate of how well the forest generalizes.; Appears in: Tree-Based Ensembles: Random Forests and Gradient Boosting
outcome model: A statistical model (often logistic or survival) that predicts the study outcome from baseline characteristics.; Appears in: Disease Risk Scores
outcome window: The span after the induction zone (or directly after the index date when no induction is used) during which events are counted and attributed to the exposure, ending at disenrollment, death, or an administrative data cutoff.; Appears in: Study Time Windows: Baseline, Observation, and Outcome Windows
outlier leverage: The disproportionate influence a single extreme data point can exert on Pearson r; one outlier far from the group mean can substantially raise or lower r without reflecting the relationship in the rest of the data.; Appears in: Pearson and Spearman Correlation
outpatient claim: A billing record from a visit where the patient was not admitted — a doctor's office, clinic, or same-day procedure — that carries the code the clinician used to describe the reason for the visit.; Appears in: Diagnosis Phenotype Algorithm (1 IP / 2 OP, Time Window)
outpatient pharmacy claim: A billing record submitted when a patient fills a prescription at a retail or mail-order pharmacy — this record is absent when the same drug is given inside a hospital, because hospitals bill differently.; Appears in: Inpatient Bridging of Drug Exposure
overall survival: The length of time from the start of treatment until a patient dies from any cause; preferred as the primary outcome in external-control studies because it is less likely to be measured differently across the two data sources.; Appears in: Partitioned Survival Model, Single-Arm Trial with External (Historical) Control
overdispersion: A condition in count data where the variance (spread) of the counts is larger than the mean, which happens when a small group of high-utilizers drives a long right tail; Poisson assumes variance equals the mean, so overdispersed data require negative binomial instead.; Appears in: Negative Binomial Distribution for Overdispersed Counts, Poisson and Negative Binomial Count Models for HCRU and Utilization
overlap: The extent to which patients in both arms have similar propensity scores — good overlap means there are comparable patients on both sides; poor overlap means some patients are so unlike the other arm that no fair comparison is possible.; Appears in: Propensity Score Methods (PSM, IPTW), Switch, Add-On, and Augmentation Rules
overlap weights: A weighting formula that assigns each patient a weight equal to the probability they would have received the opposite treatment — treated patients get weight 1 minus their propensity score, controls get weight equal to their propensity score — keeping all weights between zero and one by construction.; Appears in: Overlap Weights and Modern Propensity Weighting
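Several O entries above (odds, odds ratio, overlap weights) describe small calculations that are easy to verify. The Python sketch below uses illustrative function names and hypothetical risks of 0.6 and 0.4, chosen because they reproduce the odds ratio of 2.25 mentioned in the odds ratio entry. It is a minimal check of the definitions, not a full weighting or modeling workflow.

```python
def probability_to_odds(p: float) -> float:
    """Odds = events per non-event."""
    return p / (1 - p)


def odds_to_probability(odds: float) -> float:
    """Invert the odds: 0.25 odds corresponds to a 20% probability."""
    return odds / (1 + odds)


def odds_ratio(p_group1: float, p_group0: float) -> float:
    """Odds in group 1 divided by odds in group 0."""
    return probability_to_odds(p_group1) / probability_to_odds(p_group0)


def risk_ratio(p_group1: float, p_group0: float) -> float:
    """Risk in group 1 divided by risk in group 0."""
    return p_group1 / p_group0


def overlap_weight(treated: bool, propensity_score: float) -> float:
    """Weight = probability of having received the opposite treatment."""
    return 1 - propensity_score if treated else propensity_score


if __name__ == "__main__":
    print(odds_to_probability(0.25))
    # With a common outcome, the odds ratio sits farther from 1 than the risk ratio.
    print(round(odds_ratio(0.6, 0.4), 2), round(risk_ratio(0.6, 0.4), 2))
    print(overlap_weight(True, 0.8), overlap_weight(False, 0.8))
```

With these hypothetical risks the odds ratio is 2.25 while the risk ratio is 1.5, which illustrates why the two measures should not be read interchangeably when the outcome is common.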

P

p-value: The probability of seeing data at least as extreme as what was observed if the null hypothesis (no effect) were true — a small p-value means the data are surprising under the null, but does not measure the size of the effect or the probability that the null is true.; Appears in: Inferential Statistics Foundations
package code: The last part of an NDC (2 digits in the HIPAA format), which distinguishes different package sizes or container types of the exact same drug and strength from the same labeler (for example, a 30-count bottle versus a 90-count bottle).; Appears in: NDC (National Drug Code)
paired 2x2 table: A 2-row by 2-column table where each cell counts patient pairs by their combination of outcomes under the two conditions (both yes, first yes only, second yes only, both no); the layout that McNemar's test is built for.; Appears in: McNemar's Test for Paired Proportions
pairwise comparison: Lining up one treated patient against one control patient and deciding which of the two did better.; Appears in: Win Ratio and Generalized Pairwise Comparisons
parallel trends: The assumption that, without the policy, the treated group and the comparison group would have changed by the same amount over time — meaning any difference you see after the policy must be caused by the policy, not by the two groups naturally drifting apart.; Appears in: Difference-in-Differences with Staggered Adoption
parameter uncertainty: Uncertainty about the true value of a model input (such as a hazard ratio or a cost estimate) because it was measured in a finite sample and therefore has a margin of error.; Appears in: Probabilistic Sensitivity Analysis (PSA) for Health-Economic Models
parametric survival model: A mathematical formula that describes how quickly patients in a study are dying over time; once fit to the observed data, the formula can be used to project survival into the unobserved future.; Appears in: Survival Extrapolation for HTA Using RWE
parametric test: A statistical test that assumes the data come from a specific distributional family (usually the normal/bell-curve family) and uses that assumption to derive what "random chance" should look like.; Appears in: Parametric and Nonparametric Tests
Part 1 (any-cost model): The logistic regression component of a two-part model that estimates the probability a patient has any positive cost at all, answering the question of who enters the healthcare system.; Appears in: Two-Part and Hurdle Models for Semicontinuous Costs
Part 2 (positive-cost model): The gamma or log-normal regression component fitted only among patients who had positive costs, estimating how much those users spent conditional on having any cost at all.; Appears in: Two-Part and Hurdle Models for Semicontinuous Costs
part-worth utility: The estimated value a respondent places on one level of one attribute, calculated from the pattern of their choices; higher part-worth means that level is strongly preferred.; Appears in: Patient Preference Study (DCE / BWS)
partial likelihood: The mathematical trick Cox's method uses to estimate hazard ratios by asking, at each moment an event occurs, who in the still-event-free group was most likely to have that event — this lets the model work without assuming any particular shape for how background risk changes over time.; Appears in: Cox Proportional Hazards Regression
partial value score: An option's performance on one criterion rescaled to 0-100, where 0 is the agreed worst plausible level and 100 the best.; Appears in: Multi-Criteria Decision Analysis (MCDA)
participant observation: The core method where the researcher watches, and often partly joins in, the everyday activity of a setting rather than just asking people about it afterward.; Appears in: Ethnographic / Observational Qualitative Study
participating sites: The specific clinics that agreed to enroll patients into the registry; patients seen elsewhere never enter the data.; Appears in: Disease Registry
PASS: Post-Authorization Safety Study — any research on an already-approved medicine that aims to identify, measure, or confirm a safety risk in the people actually using it.; Appears in: Imposed Post-Authorisation Safety Study (PASS), Voluntary (Non-Imposed) Post-Authorisation Safety Study
PATH statement: A framework recommending that researchers stratify patients by overall baseline risk score — rather than by single covariates — to identify who benefits most in absolute terms from a treatment, giving a more stable and clinically useful picture of HTE.; Appears in: Subgroup Analysis and Heterogeneity of Treatment Effect
patient-level splitting: Ensuring that all records belonging to one patient go entirely into training or entirely into testing, never split across both; required in claims and EHR data where one patient contributes many rows.; Appears in: Cross-Validation, Overfitting, and Optimism
patient-reported outcome (PRO): A measurement of a patient's health status — symptoms, physical function, or quality of life — reported by the patient themselves, not interpreted by a clinician.; Appears in: Patient-Reported Outcomes in Real-World Settings
payer perspective: Counting only the costs the health plan itself pays, not costs paid by patients, employers, or society.; Appears in: Budget Impact Analysis
PDC: Proportion of Days Covered — an adherence measure for pharmacy fills that divides the number of days a patient had medication on hand by the total days in the observation window; it requires days_supply and cannot be calculated from J-code claims.; Appears in: Route-of-Administration Differences in RWE
PDC (Proportion of Days Covered): A related adherence measure that counts only the unique calendar days a patient had any supply on hand — overlapping fill periods are merged so each day is counted at most once, keeping PDC at or below 1.0.; Appears in: Inpatient Bridging of Drug Exposure, Medication Possession Ratio (MPR)
penalized likelihood: A modified version of the standard model-fitting procedure that adds a small penalty term to keep the estimates from drifting toward infinite or implausibly extreme values when the data are very sparse.; Appears in: Firth Penalized Regression
penalty (lambda): A mathematical constraint added to the model fitting that pushes coefficient estimates toward zero; a larger lambda shrinks coefficients more aggressively, producing a sparser or smaller model.; Appears in: Regularized Regression: LASSO, Ridge, and Elastic Net
per-patient per-year (PPPY) cost: The average annual health care spend for one patient with the disease, calculated by dividing total costs by total patient-years of follow-up (which equals the number of patients only when each patient is observed for exactly one year).; Appears in: Burden of Disease and Cost-of-Illness (COI) Studies
per-patient-per-year (PPPY): A rate that expresses how many events occurred for every full year a single average patient was observed, calculated by dividing total events by total observed years across all patients.; Appears in: Healthcare Resource Utilization (HCRU)
per-protocol analysis: An analysis restricted to patients who actually followed the study protocol correctly; in non-inferiority trials this is a co-primary analysis because patients who drifted from protocol push results toward "no difference," which can falsely support non-inferiority.; Appears in: Equivalence and Non-Inferiority Testing
per-protocol effect: The estimated difference in outcomes between two groups assuming everyone actually followed their assigned treatment strategy exactly as specified.; Appears in: Clone-Censor-Weight for Per-Protocol Target-Trial Emulation
percentile: The percentage of the reference population a child's measurement exceeds — a child at the 10th percentile for height is taller than 10 out of every 100 children of the same age and sex.; Appears in: Cost Outlier Handling (Winsorization, Trimming, Robust/Two-Part Models), Pediatric Growth and Development Endpoints in RWE
percentile confidence interval: A bootstrap CI formed by taking the 2.5th and 97.5th percentiles of the distribution of replicate statistics — no symmetry assumption is required.; Appears in: Bootstrap and Resampling Methods
performance matrix: The table of how each option actually performs on each criterion, in natural units like months of survival gain or adverse events per 100 patients.; Appears in: Multi-Criteria Decision Analysis (MCDA)
period prevalence: The fraction of a population that is a case at any time during a defined interval, such as a calendar year; it counts anyone who had the condition during that window, not just on one day.; Appears in: Prevalence (Point, Period, and Annual) in RWE
period-specific effect: The hazard ratio calculated within a defined time window (e.g., months 0–6 vs. months 6–24); when early and late HRs differ markedly, reporting them separately is more informative than a single averaged number.; Appears in: The Hazard Ratio as an Effect Measure
period_type_concept_id: A code in the OMOP table that records how the observation period was created — for example, from insurance enrollment records in claims data versus from first-to-last clinical visit in EHR data.; Appears in: OMOP Observation Period
permissible gap: The maximum number of days allowed between the end of one fill's supply and the start of the next before an analyst calls it a stop — commonly set at 60 days, and always varied in sensitivity analyses.; Appears in: Persistence Time to Discontinuation
permutation test: A significance test that randomly shuffles group labels among observed units to build a null distribution, used for testing rather than estimation; the bootstrap's hypothesis-testing cousin.; Appears in: Bootstrap and Resampling Methods
persistence gap: The maximum number of days allowed between the end of one fill and the start of the next before the algorithm decides the patient stopped treatment and begins a new era; the OMOP default is 30 days.; Appears in: OMOP Drug Exposure and Drug Era
persistence window: The maximum allowed gap in days between two consecutive diagnosis events before they are treated as separate disease periods; the OMOP default is 30 days.; Appears in: OMOP CONDITION_OCCURRENCE and CONDITION_ERA
person-days (person-years): A unit that combines people and time: one person followed for 90 days contributes 90 person-days; summing across all participants gives the total time-at-risk that forms the denominator of an incidence rate.; Appears in: Person-Time Denominator Construction
person-time: The total amount of follow-up time added up across all patients in a study — if 500 patients are each followed for two years, person-time is 1000 person-years; it is the denominator when computing an incidence rate.; Appears in: Healthcare Resource Utilization (HCRU), Immortal Time Bias Handling, Incidence Rate Calculation, Indirect Standardization, SMR, and SIR, Poisson and Negative Binomial Count Models for HCRU and Utilization, Poisson Distribution for Counts and Rates, Recurrent Events Analysis, Risk Evaluation Study (Post-Authorization Safety / Active Surveillance)
person-time at risk: The sum of calendar days a patient is both enrolled and counted in the study's follow-up — used as the denominator when calculating how often an event occurs per unit of time.; Appears in: Continuous Enrollment and Observable Time
person-time intervals: Short time segments that together cover a patient's full follow-up period; each segment carries its own exposure label so the model can see how the exposure changed over time.; Appears in: Time-Updated Exposures and Cumulative Dose
person-years: A unit that combines the number of people observed with how long each was followed — one person watched for two years contributes two person-years, the same as two people each watched for one year.; Appears in: Descriptive Epidemiology in RWE, Incidence Rate Calculation
personal identity number: A unique, stable national identifier assigned to every resident at birth or immigration in Nordic countries, used consistently across all health and administrative records to link hospital visits, prescriptions, and death certificates for the same person across their entire life.; Appears in: International Real-World Data Sources
perspective: Whose costs you count (for example the insurer/payer, the whole health system, or society), which must be fixed before adding anything up.; Appears in: Cost-minimization Analysis (CMA), Cost-of-Illness (COI) Study
pharmacy benefit: The part of a health insurance plan that covers drugs a patient picks up at a pharmacy and takes at home, such as pills or self-injected pens.; Appears in: Route-of-Administration Differences in RWE
pharmacy benefit vs medical benefit: Drugs dispensed at a pharmacy flow through the pharmacy benefit and appear as NCPDP claims with NDC codes; drugs infused in a clinic flow through the medical benefit and appear on hospital or doctor claims with HCPCS J-codes instead.; Appears in: NCPDP Pharmacy Claim Fields
pharmacy claim: A single billing record created when a prescription is filled, listing the drug, the date, and the days_supply.; Appears in: Drug Utilization Study
phenotype: A rule that turns raw EHR signals (codes, lab values, note text) into a yes/no label for an exposure, outcome, or condition, and must be checked against a chart review before you trust it.; Appears in: Administrative Claims Analysis, EHR Phenotyping Algorithms, EHR-Based Study
PICOTS: A structured way to specify a research question: Population, Intervention (or exposure), Comparator, Outcome, Timing, and Setting; a data source's fitness for purpose cannot be judged until PICOTS is fixed.; Appears in: Fit-for-Purpose Data Assessment
place of service: A two-digit code in box 24B that records where the service was delivered (for example, 11 = doctor's office, 21 = inpatient hospital), used to distinguish ambulatory visits from hospital-based services.; Appears in: CMS-1500 / 837P Professional Claim Fields, Healthcare Resource Utilization (HCRU)
placebo test: A check in which the same SCM procedure is repeated pretending each donor was the treated unit; if the true treated region's post-intervention gap is larger than almost all of these fake gaps, that is evidence of a real effect.; Appears in: Synthetic Control Method
plausibility anchor: A real-world reference value — such as the observed event rate among completers, or the strength of a known measured risk factor — used to judge whether the tipping point represents a realistic flaw.; Appears in: Tipping Point Analysis
PMPM (per-member-per-month): The budget impact spread across every covered member and every month, so plans can compare it against premiums on a familiar scale.; Appears in: Budget Impact Analysis
POA indicator: A code attached to each diagnosis on an inpatient hospital bill that says whether the condition was Present On Admission (Y = yes, already there; N = no, new during the stay); when it is N, the condition is a potential hospital-acquired complication.; Appears in: Diagnosis Position, Type, and Qualifiers on Claims, UB-04 / 837I Institutional Claim Fields
Pohar Perme estimator: The current international-standard formula for computing net survival; it corrects for the bias that arises when older patients (who have both worse cancer prognosis and worse general-population survival) exit the cohort early, by giving them higher statistical weight at each event time.; Appears in: Relative and Net Survival
point prevalence: The fraction of a population that is a case on one specific date — a snapshot of who has the condition that day.; Appears in: Prevalence (Point, Period, and Annual) in RWE
point-estimate E-value: The confounder strength needed to drag your main estimate all the way back to 1.0 (the null).; Appears in: E-value Sensitivity Analysis
Poisson-gamma mixture: The mathematical reason the negative binomial distribution arises naturally: if each patient has their own event rate drawn from a gamma distribution, the observable counts across all patients follow a negative binomial distribution.; Appears in: Negative Binomial Distribution for Overdispersed Counts
polychoric correlation: A correlation measure designed for ordinal variables with only a few levels, such as a 5-point severity scale; it estimates the underlying continuous association rather than treating the ordinal scores as if they were continuous numbers.; Appears in: Pearson and Spearman Correlation
polyhierarchy: SNOMED CT's property of allowing a single concept to have more than one parent (for example, a concept can belong to both a disease family and an anatomical location family), unlike ICD-10-CM where each code sits in exactly one place in the tree.; Appears in: SNOMED CT Clinical Terminology
pooled estimate: A single combined number (like one overall effect size) calculated by statistically merging many studies; a scoping review never produces one.; Appears in: Meta-Analysis of Observational Studies, Meta-Analysis of Randomized Controlled Trials, Scoping Review
pooled standard deviation: A single 'typical spread' for a characteristic, combining the spread from both groups, used as the denominator of the SMD.; Appears in: Baseline Characteristics and Covariate Balance
pooled variance: A single estimate of spread computed by combining the within-group variability from both groups, weighted by sample size; Student's t-test uses this instead of separate estimates, which is valid only when both groups have similar spread.; Appears in: Two-Sample (Student's) t-Test
pooled vs unpooled variance: A pooled variance blends the spread from both groups into a single estimate (Student's t-test); an unpooled variance keeps a separate spread estimate for each group (Welch's t-test), which is safer when the groups have different amounts of variability.; Appears in: Welch's t-Test (Unequal Variances)
Population: The specific group of patients who qualify for the study, defined by their diagnosis, age, insurance coverage, and any prior treatment history required before they can be included.; Appears in: PICOTS Framework for RWE
population at risk (denominator): The full set of people who could have had the outcome, which you divide by to get a rate; a case series doesn't have one because it only collects the patients themselves.; Appears in: Case Series
population attributable fraction (PAF): The proportion of all cases in the entire population that is attributable to the exposure, accounting for what fraction of the population is actually exposed; computed via Levin's formula from exposure prevalence and the risk ratio.; Appears in: Attributable Risk and Population Attributable Fraction
population life table: A statistical table that records, for each age group, sex, and calendar year, the probability that a person from the general population survives to the next age; it is the benchmark for what "normal" survival looks like.; Appears in: Relative and Net Survival
population vs sample: The population is every patient (or event) the study aims to say something about; the sample is the subset actually observed — inference is the act of reasoning from the second to the first.; Appears in: Inferential Statistics Foundations
population-average effect: The average difference in outcome between two groups across the entire study population, as opposed to the predicted change for one specific individual.; Appears in: GEE Population-Average (Marginal) Models
population-averaged effect: An estimate of the difference in the average outcome across an entire group of patients, rather than for any one individual.; Appears in: Longitudinal Outcomes Modeling
portability: Whether an NLP model trained on notes from one hospital works accurately on notes from a different hospital; models often fail at new sites because phrasing conventions and note templates differ across institutions.; Appears in: NLP for Clinical Text in RWE
positive predictive value (PPV): The fraction of events flagged by the algorithm that turn out to be real events after chart review — for example, PPV 0.84 means 84 out of every 100 flagged events are confirmed as true.; Appears in: Algorithm Validation, Claims Outcome Algorithm PPV/Sensitivity Trade-off, Diagnosis Phenotype Algorithm (1 IP / 2 OP, Time Window), EHR Phenotyping Algorithms, Endpoint Adjudication and Chart Review, NLP for Clinical Text in RWE, Outcome Algorithm Construction, Safety Signal Case Definition
positivity: The requirement that every type of patient in the study must have at least some chance of remaining in follow-up; if certain patients are virtually guaranteed to drop out, their censoring weight becomes impossibly large and the method breaks down.; Appears in: Generalizability, Transportability, and External Validity, Inverse Probability of Censoring Weighting (IPCW)
post-hoc comparison: A pairwise test run after a significant ANOVA to determine which specific pairs of groups differ, using a multiplicity correction (such as Tukey HSD) to avoid false positives from testing many pairs at once.; Appears in: One-Way ANOVA
post-hoc pairwise comparison: A follow-up test applied after a significant omnibus result to identify which specific pairs of groups are responsible for the overall difference, with an adjustment to prevent false positives from accumulating across multiple comparisons.; Appears in: Kruskal-Wallis Test
post-treatment adjustment bias: The distortion that results when an analyst controls for a variable that was caused by the treatment itself; this changes what question is being answered and can introduce new, misleading associations.; Appears in: Causal Mediation and Effect Modification
postcoordination: A way of building a SNOMED CT expression on the fly by combining a base concept with extra qualifiers (like a finding site or severity) to describe clinical detail that no single pre-defined concept covers — rarely used in routine EHR coding but supported by the terminology standard.; Appears in: SNOMED CT Clinical Terminology
posterior: The updated probability distribution over the unknown parameter after combining the prior with the likelihood; it is the complete summary of knowledge about the parameter given both the prior belief and the observed data.; Appears in: Bayesian Inference Foundations
posterior probability: The model's estimated probability that a given patient belongs to each class, computed after fitting the model — a patient might have a 0.80 probability of being a consistent adherer and a 0.20 probability of being a moderate adherer.; Appears in: Group-Based Trajectory Models and Latent Class Analysis
power: The chance that a study will detect a real difference as statistically significant when that difference truly exists, usually set at 80% or 90%.; Appears in: Sample Size, Power, and Precision in RWE
power prior: A specific borrowing technique that discounts the historical control data by raising its likelihood to a power called a0 (between 0 and 1), where a0 = 1 means full use and a0 = 0 means ignore it entirely.; Appears in: Bayesian Borrowing from Historical / External Controls
PRAC: The Pharmacovigilance Risk Assessment Committee — the European Medicines Agency body that reviews drug safety, endorses imposed PASS protocols, and decides what regulatory action the results should trigger.; Appears in: Imposed Post-Authorisation Safety Study (PASS)
pre-intervention period: The span of time before the policy or program took effect, used to fit the donor weights; good fit during this period is the prerequisite for trusting the post-intervention gap.; Appears in: Synthetic Control Method
pre-post design: A study where the same patients are measured at two time points — before and after an event, treatment, or policy — and the question is whether anything changed on average across the group.; Appears in: Paired t-Test
pre-registered protocol: A written plan, posted in a public registry like PROSPERO before you look at any results, that fixes your question and methods so you cannot quietly change them once you see how the numbers come out.; Appears in: Systematic Review
pre-specification: Writing down every key study decision (who is included, what the exposure and outcome are, how the analysis runs) before you look at results, so those choices cannot be quietly changed after seeing the data.; Appears in: Study Protocol and SAP Elements for RWE
pre-specified: Decided and written down before the data are analyzed, so the choice cannot have been influenced by knowing what the result would be — the opposite of choosing a method after seeing which one gives the most favorable answer.; Appears in: Regulatory and HTA Readiness for RWE
pre-test vs post-test probability: Your belief that the patient has the disease before seeing the test result versus after seeing it; the likelihood ratio is the tool that moves you from one to the other.; Appears in: Diagnostic Likelihood Ratios
pre-treatment survival window: The stretch of days between cohort entry and a patient's very first treatment fill, during which the patient must stay alive (and event-free) simply to become eligible to be called 'treated'; this window is sometimes called immortal time because no outcome can happen there by the study's own rules.; Appears in: Immortal Time Bias Handling
PRECIS-2: A nine-domain scoring tool that researchers use to rate how pragmatic or explanatory a trial design is, helping them be explicit about design choices rather than just calling a trial pragmatic.; Appears in: Pragmatic Trial
precision: How narrow the confidence interval (the range of uncertainty) around an estimated effect is; more patients generally means a narrower, more precise interval.; Appears in: Sample Size, Power, and Precision in RWE
predicted probability: A number between 0 and 1 that the model assigns to each patient, representing how likely the model thinks that patient is to experience the outcome (for example, 0.8 means the model thinks there is an 80% chance of the event).; Appears in: Brier Score
predicted probability of dependency: The Faurot index's output — a model's estimate of how likely a patient is to need help with daily activities, used as a continuous frailty proxy.; Appears in: Claims-Based Frailty Index (Faurot / Kim)
prediction vs causation: Prediction asks who will have an outcome (correlations are fine); causation asks whether a treatment caused the outcome (requires ruling out other explanations).; Appears in: Predictive and Causal ML Models
preferential independence: The assumption behind adding weighted scores: how much you care about improving one criterion must not depend on the level of another.; Appears in: Multi-Criteria Decision Analysis (MCDA)
Prentice/Barlow weights: Numerical multipliers assigned to each subject's contribution to the statistical model that correct for the fact that only a fraction of the cohort was measured; without these weights the hazard-ratio estimate is biased.; Appears in: Case-Cohort Design
prescribing vs dispensing: A prescribing record (as in UK CPRD) documents that a doctor wrote a prescription, but does not confirm the patient collected or took the medication; a dispensing record (as in Nordic prescription databases) confirms the patient actually received the drug at a pharmacy.; Appears in: International Real-World Data Sources
prescription abandonment: A specific form of primary non-adherence where the patient goes as far as the pharmacy counter but leaves without the drug, often because of cost or side-effect concerns.; Appears in: Primary Non-Adherence and Treatment Initiation
present value: The current-day worth of an amount that will be paid or received in the future, after applying the discount rate for every year between now and then.; Appears in: Discounting of Costs and Effects in Economic Evaluation
present-on-admission (POA): A required flag on inpatient claims indicating whether each diagnosis was present when the patient arrived at the hospital; for CMS-designated hospital-acquired conditions, a secondary diagnosis counts as a CC or MCC toward a higher DRG severity tier only if it is flagged as present on admission.; Appears in: MS-DRG (Medicare Severity Diagnosis-Related Groups)
prespecification: Committing in writing — before the data are analyzed — exactly which hypotheses will be tested and what correction will be applied, so that choices made after seeing the data cannot silently inflate the false-positive rate beyond what was budgeted.; Appears in: Multiplicity and Multiple Comparisons
prevalence: The proportion of a population that has a condition at a specific point in time or during a defined period, reflecting how common the condition currently is rather than how fast it is arising.; Appears in: Burden of Disease and Cost-of-Illness (COI) Studies, Cross-Sectional Study, Descriptive Epidemiology in RWE, Positive and Negative Predictive Value
prevalence paradox: The phenomenon where kappa is misleadingly low even when observed agreement is high, because one outcome category is so dominant that chance agreement alone would explain most of the observed agreement.; Appears in: Agreement Statistics: Kappa, ICC, and Bland-Altman
prevalent new-user: A patient who has already been taking the comparator drug for some time and then switches to the study drug, as opposed to a brand-new starter who has never taken either drug.; Appears in: Prevalent New-User Design
prevalent user: A patient who is already taking the drug when observation begins, rather than just starting it; including these patients can make the treated group look healthier than it really is.; Appears in: New-User (Incident-User) Design
principal diagnosis: The condition a hospital coder assigns at discharge as the main reason for the entire inpatient stay, after all test results and workup are reviewed — it can differ from what the doctor suspected when the patient arrived.; Appears in: Diagnosis Position, Type, and Qualifiers on Claims
prior: A probability distribution over an unknown parameter that encodes what is believed about it before the current data are analyzed; in Bayesian updating it is combined with the data's likelihood to form the posterior.; Appears in: Bayesian Borrowing from Historical / External Controls, Bayesian Inference Foundations
prior effective sample size: The number of real observations that the prior is informationally equivalent to; a Beta(2,8) prior has effective sample size 10, meaning it carries as much weight as 10 observed data points when updating the posterior.; Appears in: Bayesian Inference Foundations
prior persistence: How many days a patient had been continuously taking the first drug before the second drug was added — used to distinguish a planned combination from an augmentation after an adequate treatment trial.; Appears in: Switch, Add-On, and Augmentation Rules
prior-data conflict: The situation where the borrowed historical control group behaves differently from the current control group, signaling that the historical patients may not be comparable and the discount should increase.; Appears in: Bayesian Borrowing from Historical / External Controls
PRISMA flow: A standard accounting diagram that tracks every record from the initial search down to the final included studies, showing how many were removed at each stage and why.; Appears in: Systematic Review
PRO instrument: A validated questionnaire used to collect PRO data, such as the PROMIS Physical Function scale or the FACT-G quality-of-life scale; each has a defined score range and a known minimal important difference (MID).; Appears in: Patient-Reported Outcomes in Real-World Settings
probabilistic bias analysis: A version of bias analysis where the bias parameters are treated as uncertain ranges rather than fixed values; the computer draws thousands of plausible combinations and returns a spread of corrected estimates instead of one number.; Appears in: Quantitative Bias Analysis Toolkit
probabilistic linkage: Scoring each candidate record pair on how well multiple fields agree, weighting fields by how discriminating they are, and accepting pairs whose total score clears a threshold — recovers matches that exact rules miss but can also link two different people if scores are calibrated poorly.; Appears in: Linked Multi-Database Study (Record Linkage), Tokenization and Privacy-Preserving Record Linkage
probabilistic phenotype: A phenotype that uses a statistical model to score each patient's likelihood of having the condition, combining many weak signals rather than hard cutoffs.; Appears in: EHR Phenotyping Algorithms
probabilistic sensitivity analysis: A technique in health-economics modelling that randomly draws plausible values for every uncertain parameter (typically 5,000 to 10,000 times) to show how much the overall cost-effectiveness conclusion might change depending on what the true values turn out to be.; Appears in: Beta Distribution for Proportions and Utilities, Probabilistic Sensitivity Analysis (PSA) for Health-Economic Models
probabilistic sensitivity analysis (PSA): Re-running a cost-effectiveness model thousands of times with different plausible values for every uncertain input, to see how often each strategy comes out on top.; Appears in: Value of Information Analysis (EVPI, EVPPI, EVSI)
problem list: The running list in a patient's EHR of their active conditions, typically coded in SNOMED CT; entries are added by clinicians and may persist for years, which is different from a claim that is generated at each encounter.; Appears in: SNOMED CT Clinical Terminology
process metric: A measurement of whether a safety program is being carried out as required — for example, the percentage of prescription fills that had a mandatory monitoring test completed beforehand — as distinct from measuring whether the harm itself occurred.; Appears in: Risk Evaluation Study (Post-Authorization Safety / Active Surveillance)
prodrome: The early set of symptoms — such as pain, bloating, or breathlessness — that appear before a disease is formally diagnosed, which can prompt a prescription for the drug under study.; Appears in: Protopathic Bias and Reverse Causation
product-limit: The mathematical approach behind the Kaplan-Meier curve: multiply together the fraction who survived each individual event time to get the cumulative probability of being event-free up to any given time.; Appears in: Kaplan-Meier Estimator
professional claim: The bill submitted by a physician, nurse practitioner, or other individual provider for their own services, filed on a CMS-1500 form (or its electronic equivalent, the 837P), as distinct from the facility bill filed by the hospital.; Appears in: CPT Codes (HCPCS Level I), Place of Service (POS) Codes, Procedure Identification and Measurement in Claims and EHR
profile likelihood confidence interval: A confidence interval built by inverting the likelihood-ratio test rather than using a symmetric normal approximation; more accurate than the Wald interval when estimates are near boundary values or sample sizes are small.; Appears in: Maximum Likelihood Estimation
profile-likelihood confidence interval: A type of confidence interval calculated by re-fitting the model many times to find where the evidence becomes too weak, rather than using a simple formula based on the standard error; this is generally the recommended CI to report after Firth estimation, because the Wald interval can be unreliable with sparse data.; Appears in: Firth Penalized Regression
prognostic score: Another name for a disease risk score: a single number summarizing baseline outcome risk (prognosis).; Appears in: Disease Risk Scores
progression-free survival: The probability that a patient is still alive AND has not yet had their cancer worsen or spread; it drops over time as patients either progress or die.; Appears in: Partitioned Survival Model
promotional drift: The gradual shift of a lay summary's language toward marketing framing — overstating benefit, downplaying harm, or implying certainty — which is explicitly prohibited in EU Clinical Trial Regulation lay summaries and can mislead patient decision-making.; Appears in: Plain-Language Summaries of Evidence
propensity score: A single summary number (between 0 and 1) giving each patient's estimated probability of belonging to the treated (or trial) group given their recorded characteristics; in an external-control study it captures how closely each external-control patient resembles the trial patients, and it is used to balance the two groups before comparing outcomes.; Appears in: Comparative Effectiveness Research (CER) Methods, Missing Data, Trimming, and Winsorization in RWE, Propensity Score Methods (PSM, IPTW), Single-Arm Trial with External (Historical) Control, Targeted Maximum Likelihood Estimation (TMLE)
propensity score weighting: A statistical technique that assigns each external control patient a weight based on how similar their characteristics are to the trial patients, balancing the two groups on measured characteristics in the way randomization would; it cannot balance characteristics that were never recorded.; Appears in: Rare Disease External Controls
proper scoring rule: A way of grading probability predictions that can only be made better by giving your honest best estimate — you cannot improve your score by inflating or deflating probabilities, so the score rewards truthful predictions.; Appears in: Brier Score
proportional hazards: The assumption that the ratio of event rates between two groups stays roughly the same throughout the entire follow-up period; when this breaks down, a single hazard ratio is an average that can mask very different early versus late effects.; Appears in: Log-Rank Test, The Hazard Ratio as an Effect Measure, Weibull Distribution for Time-to-Event Data
proportional hazards assumption: The model's core requirement that the ratio of event rates between the two groups stays roughly constant throughout follow-up — if one group's risk starts high and the other's catches up later, a single hazard ratio is misleading.; Appears in: Cox Proportional Hazards Regression
proportional hazards vs AFT: The proportional hazards (PH) model assumes the ratio of hazards between groups is constant throughout follow-up; AFT models assume the entire time axis is stretched or compressed by a constant factor — the two are equivalent only for the Weibull distribution.; Appears in: Accelerated Failure Time (AFT) Models
prospective cohort: The contrasting design where researchers enroll patients now and collect new data forward in time as events unfold.; Appears in: Retrospective Cohort Study Design
prospective enrollment: Signing a patient up for the study while the outcome is still unknown, so the researchers cannot accidentally select only the cases that ended badly.; Appears in: Pregnancy Exposure Registry
prospective follow-up: Patients are enrolled first and then observed going forward in time, so the study collects new data as events happen rather than looking backward through old records.; Appears in: Product/Exposure Registry
protopathic bias: A distortion that makes a drug look as though it caused a disease when, in reality, the early symptoms of that disease prompted the doctor to prescribe the drug before the disease was officially diagnosed.; Appears in: Exposure Lag, Induction, and Latency Windows, Protopathic Bias and Reverse Causation
provider-based billing: A billing arrangement that allows a physician practice owned by or affiliated with a hospital to file its professional claims under the hospital's provider number, which shifts the POS code from office (11) to hospital outpatient (19 or 22) and triggers an additional facility fee — even if the physical location of care has not changed.; Appears in: Place of Service (POS) Codes
proxy confounder: A code in the data (such as a frequent ophthalmology visit) that is not the unmeasured factor itself (like diabetic eye disease severity) but tracks closely enough with it to partially stand in for it when adjusting for confounding.; Appears in: High-Dimensional Propensity Score (hdPS)
PRR: Proportional Reporting Ratio: the share of reports mentioning the drug that also mention the event, divided by that same share for all other drugs combined; a value well above 1 suggests a reporting excess.; Appears in: Signal Detection (Disproportionality Analysis)
pseudo-population: The reweighted version of the study population created by IPTW, where the statistical association between a patient's health status and their treatment decision has been removed, so the treatment is effectively unconfounded.; Appears in: Inverse Probability of Censoring Weighting (IPCW), Marginal Structural Models and G-Methods
pseudomedian: The effect estimate that pairs with the Wilcoxon signed-rank test — it is the median of all possible pairwise averages of the differences, not the simple median of the differences; the two numbers are equal only when the difference distribution is symmetric.; Appears in: Wilcoxon Signed-Rank Test
psychometric testing: Statistical analysis run on questionnaire responses from a development sample to confirm the tool is consistent (reliability), measures the intended concept (validity), and can detect real change over time (responsiveness).; Appears in: PRO Instrument Development
purposive sampling: Choosing interview participants intentionally based on a specific characteristic — such as the patients who filled their prescriptions least often — rather than picking them at random.; Appears in: Mixed-Methods Study, Qualitative Interview Study
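
Worked example for the pseudomedian entry above: a minimal sketch, assuming the usual convention that the pairwise averages include each difference paired with itself. The function name `pseudomedian` and the sample differences are hypothetical and chosen only for illustration.

```python
from itertools import combinations_with_replacement
from statistics import median


def pseudomedian(differences):
    """Median of all pairwise averages (d_i + d_j) / 2 with i <= j."""
    pairwise_averages = [
        (a + b) / 2 for a, b in combinations_with_replacement(differences, 2)
    ]
    return median(pairwise_averages)


# Hypothetical paired differences; the value 10 makes the set asymmetric.
skewed = [1, 2, 3, 10]
symmetric = [1, 2, 3]

print(median(skewed), pseudomedian(skewed))        # the two differ when asymmetric
print(median(symmetric), pseudomedian(symmetric))  # the two agree when symmetric
```

In the asymmetric set the simple median and the pseudomedian diverge, which is the distinction the entry describes.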

Q

QALY: A quality-adjusted life-year is one year of life weighted by how healthy that year is, where 1.0 means perfect health and 0.0 means a state equivalent to death, so a year spent at 0.7 utility counts as 0.7 QALYs.; Appears in: Cost-Utility Analysis (CUA), Discounting of Costs and Effects in Economic Evaluation, Discrete-Event Simulation Using RWE Inputs, Distributional Cost-Effectiveness Analysis, Health Economic Modeling Methods Using RWE, Health-Related Quality of Life (HRQoL) Measurement, ICER and Net Monetary Benefit (NMB), Partitioned Survival Model, QALY Utility Mapping (Crosswalking to Health-State Utilities), Value of Information Analysis (EVPI, EVPPI, EVSI)
QALY (quality-adjusted life-year): A single number that combines how long a patient lives with how healthy they feel during that time: one year in perfect health equals 1.0 QALY, while one year with a serious disability might equal 0.5 QALY.; Appears in: Cost-effectiveness Analysis (CEA)
QQ plot: A quantile-quantile plot that compares observed data quantiles to expected normal quantiles; points falling on a straight line indicate normality, while S-shaped curves indicate heavy or light tails and banana (bowed) shapes indicate skewness.; Appears in: Normal Distribution and the Central Limit Theorem
qualitative strand: The part of the study that captures meaning through words — interviews, focus groups, or structured analysis of clinical notes — to understand why patients or clinicians behave the way they do.; Appears in: Mixed-Methods Study
qualitative study: A study that collects words rather than numbers — recorded interviews or focus-group discussions — to understand how people experience or feel about something.; Appears in: Qualitative Evidence Synthesis
quantile (percentile): A cut-point in a ranked list of values; the 50th percentile (median) is the middle value, the 90th percentile is the value below which 90% of patients fall.; Appears in: Quantile Regression
quantile crossing: A finite-sample artefact where fitted quantile lines for different percentiles (say, the 50th and 75th) cross each other at some covariate values, which is mathematically impossible in the true distribution and indicates a model problem requiring correction.; Appears in: Quantile Regression
quantitative bias analysis: A family of methods that convert a qualitative concern about bias (such as imperfect coding) into a number — a corrected estimate and an interval — so reviewers can audit the assumption rather than just accept a verbal caveat.; Appears in: Misclassification Bias Correction, Quantitative Bias Analysis Toolkit
quantitative strand: The part of the study that produces numbers — counts, rates, costs, or comparisons — usually drawn from insurance claims, electronic health records, or a registry.; Appears in: Mixed-Methods Study
quasi-likelihood: An extension of the GLM that keeps the mean model but inflates standard errors to account for extra variability (overdispersion) without committing to a fully specified distributional family; quasi-Poisson and quasi-binomial are the most common examples.; Appears in: Generalized Linear Models (GLM)
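
Worked example for the QALY entries above: a minimal sketch of the arithmetic, assuming utilities are constant within each period and ignoring discounting. The function name `total_qalys` and the patient path are hypothetical.

```python
def total_qalys(health_states):
    """Sum years * utility over a sequence of (years, utility) pairs."""
    return sum(years * utility for years, utility in health_states)


# Hypothetical patient path: 2 years at utility 0.7, then 1 year at utility 0.5.
path = [(2.0, 0.7), (1.0, 0.5)]
print(round(total_qalys(path), 2))
```

A single year at 0.7 utility contributes 0.7 QALYs, matching the definition; longer paths simply add the weighted years together.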

R

R-squared: The fraction of the total variability in the outcome that is explained by the model's predictors together; it ranges from 0 (model explains nothing) to 1 (model explains everything), but a high R-squared does not mean the coefficients are causal.; Appears in: Ordinary Least Squares (OLS) Linear Regression
R-squared (R2): A statistic from 0 to 1 that says how well one variable predicts another; an R2 of 0.90 means 90% of the variation in the true-endpoint effect across trials is explained by the surrogate-endpoint effect.; Appears in: Surrogate Endpoint Validation
random assignment: Each participant is assigned to a treatment or comparison group by chance, like a coin flip, so the groups start out roughly equal on every characteristic, even ones nobody measured.; Appears in: Pragmatic Trial
random intercept: A patient-specific offset that shifts that individual's starting value up or down from the group average baseline, capturing the fact that people differ at the start.; Appears in: Mixed-Effects (Random-Effects) Models for Longitudinal RWE
random seed: A number passed to a software random-number generator before any stochastic step — like bootstrapping or multiple imputation — that makes the randomly selected values reproducible; without a seed, the same code produces slightly different numbers each time it runs.; Appears in: QC, Double Programming, and Reproducible Analysis
random slope: A patient-specific adjustment to the rate of change over time, capturing the fact that some people improve faster or slower than the average trend.; Appears in: Mixed-Effects (Random-Effects) Models for Longitudinal RWE
randomization: Assigning each patient to a treatment group by chance (equivalent to flipping a coin), so the two groups start out comparable on everything except the treatment being tested.; Appears in: Registry-Based Randomized Controlled Trial (RRCT)
rank: The position of a value when every observation from all groups is sorted from smallest to largest together — the smallest value gets rank 1, the next gets rank 2, and so on, regardless of which group it came from.; Appears in: Kruskal-Wallis Test, Mann-Whitney U Test (Wilcoxon Rank-Sum), Parametric and Nonparametric Tests
rank correlation: A correlation computed after replacing each raw data value with its rank in the sorted list; this removes the influence of extreme values because the largest observation simply gets the top rank, regardless of how extreme it is.; Appears in: Pearson and Spearman Correlation
rank sum: The total of all rank positions assigned to one group; if group A has ranks 1, 3, 5, and 8, its rank sum is 17.; Appears in: Mann-Whitney U Test (Wilcoxon Rank-Sum)
rank-based standard error: An analytic formula for the uncertainty of a quantile regression coefficient that does not require assuming a distribution for the outcome; the default in most quantile regression software for quantiles near the median.; Appears in: Quantile Regression
rare-event approximation: When a binary outcome is very rare (probability well below 5%), the Poisson distribution closely approximates the binomial, so Poisson regression can be used on a binary endpoint to produce risk ratios (via Zou's robust-variance method) instead of odds ratios from logistic regression.; Appears in: Poisson Distribution for Counts and Rates
rate: A count of events divided by the population (or time) at risk, like "3 cases per 1,000 people"; it needs a denominator a case series cannot supply.; Appears in: Case Series
rate (λ, lambda): The average number of events per unit of observation time, the single number that fully specifies the Poisson distribution; it equals both the mean and the variance under the Poisson assumption.; Appears in: Poisson Distribution for Counts and Rates
rate per 100,000: How many events happened for every 100,000 people in the group, which lets you compare groups of different sizes fairly.; Appears in: Ecological (Aggregate) Study
rate ratio: The ratio of the event rate in one group to the event rate in another group, expressed as events per unit of person-time; a rate ratio of 0.80 means the treated group had 20% fewer events per year than the comparison group.; Appears in: Negative Binomial Distribution for Overdispersed Counts
ratio of rate ratios: The interaction estimate on the multiplicative scale — the effect in subgroup B divided by the effect in subgroup A; a value of 1.0 means no multiplicative interaction (the effects are the same in both groups).; Appears in: Subgroup Analysis and Heterogeneity of Treatment Effect
readability grade level: A number produced by formulas such as Flesch-Kincaid that estimates the school grade a reader needs to understand a text; a grade 6–8 target is standard for patient communications, but the formula is a rough screen, not a substitute for clear writing.; Appears in: Plain-Language Summaries of Evidence
readmission: A new, separate hospital admission that occurs after a prior stay has fully ended — typically measured within 30 days of the prior discharge.; Appears in: Hospitalization and Transfer Collapse
real-world progression (rwP): A disease-worsening event assigned by reading the patient's radiology reports or clinician notes and recording the date the record first indicates the cancer grew or spread — constructed by a trained abstractor rather than measured by a standardized trial protocol.; Appears in: Real-World Progression and rwPFS
reason for visit: The presenting complaint or chief reason a patient came to an outpatient or emergency department visit, recorded in a separate field on the hospital's outpatient billing form and useful for studying why patients seek care.; Appears in: Diagnosis Position, Type, and Qualifiers on Claims
recalibration: Refitting just the intercept and slope of the model on new data so that predicted probabilities agree with observed rates, while leaving the underlying ranking of patients unchanged.; Appears in: Prediction Model Validation and Recalibration in RWE
recall period: The time window a question asks patients to think back over, such as 'in the past 7 days' or 'in the past 24 hours' — the choice affects both what patients can accurately remember and how quickly the score responds to treatment.; Appears in: PRO Instrument Development
rechallenge: The patient is given the drug again and the same problem comes back — stronger evidence the drug was involved, but only documented when it happened naturally, never set up on purpose.; Appears in: Case Report, Restart, Rechallenge, and New-Episode Rules
recurrent event: An outcome that can happen more than once to the same patient over follow-up, such as hospitalizations, disease flares, or infections.; Appears in: Recurrent Events Analysis
reference population: The group of untreated (unexposed) or historical patients in which the disease risk score model is fit.; Appears in: Disease Risk Scores
reference rate: An event rate drawn from an external source (such as national vital statistics or a cancer registry) that represents what happens in the general population within a given age group, sex, and time period.; Appears in: Indirect Standardization, SMR, and SIR
reference standard: A published table of typical growth values (median and spread) for children by age and sex, such as the WHO Child Growth Standards or CDC 2000 growth charts, that z-scores are calculated against.; Appears in: Diagnostic Accuracy Study, Diagnostic Likelihood Ratios, Pediatric Growth and Development Endpoints in RWE, Sensitivity and Specificity
referent window: An earlier stretch of time in the same patient's history — for example, the 7-day period ending 30 days before the event — used as the comparison, standing in for what exposure looked like when nothing acute was happening.; Appears in: Case-Case-Time-Control Design, Case-Crossover Design, Case-Time-Control Design
refill number: A sequential counter on each fill (0 = new prescription, 1 = first refill, etc.) used to identify the index fill for new-user cohort designs.; Appears in: NCPDP Pharmacy Claim Fields
regimen: The specific drug or combination of drugs that make up a single line of therapy (for example, erlotinib alone, or carboplatin plus pemetrexed together).; Appears in: Treatment Patterns and Lines of Therapy (LOT)
registry: An ongoing database that records clinical information on every patient who meets a defined condition or undergoes a defined procedure — for example, all patients who had an emergency heart procedure at a participating hospital — as a normal part of care, not for a specific study.; Appears in: Disease Registry, Registry-Based Randomized Controlled Trial (RRCT)
regression coefficient: A number the model fits for each variable; for the treatment arm, the coefficient is the natural logarithm of the odds ratio, so taking exp(coefficient) gives you the odds ratio directly.; Appears in: Logistic Regression for Binary Outcomes
regression to the mean: The tendency for patients selected because of an extreme measurement (very high costs, very high lab values) to have a less extreme measurement at the next observation purely by chance, even without any treatment — making a simple before-after comparison look like an improvement when none occurred.; Appears in: Paired t-Test
regulatory commitment: A legally binding requirement that a regulatory agency (such as EMA or FDA) attaches to a drug's approval, requiring the company to complete a specific study or action.; Appears in: Voluntary (Non-Imposed) Post-Authorisation Safety Study
regulatory obligation: A legal requirement attached to a drug approval, meaning the company must comply or face formal regulatory consequences such as fines, label changes, or loss of the marketing authorization.; Appears in: Imposed Post-Authorisation Safety Study (PASS)
regulatory-grade: Describes a study whose documentation, data, and analysis choices meet the standards a regulator (FDA or EMA) requires before using the results to make a drug-approval or safety decision.; Appears in: Regulatory and HTA Readiness for RWE
reification trap: The mistake of treating a model's latent classes as if they are real, discovered biological entities; in fact, they are mathematical approximations that depend on the analyst's choices and will change if the data or model specification changes.; Appears in: Group-Based Trajectory Models and Latent Class Analysis
relative incidence: The ratio of the event rate during the risk interval to the event rate during the control interval; a value above 1 means the event happened more often right after the exposure.; Appears in: Self-Controlled Risk Interval (SCRI) Design
relative weight: A published number (e.g., 1.44) that represents how resource-intensive the average case in a DRG is compared with the average Medicare inpatient case; multiply it by the hospital's base dollar rate to estimate payment.; Appears in: MS-DRG (Medicare Severity Diagnosis-Related Groups)
relevance: Whether a data source actually captures the patients, drug exposure, outcome, important background characteristics, and length of follow-up that a specific study question requires.; Appears in: Fit-for-Purpose Data Assessment
reliability: A questionnaire is reliable if the same patient, in the same health state, gets roughly the same score on two separate occasions — the measure is consistent, not random.; Appears in: Fit-for-Purpose Data Assessment, PRO Instrument Validation
REMS: Risk Evaluation and Mitigation Strategy — an FDA-required safety program that may restrict who can prescribe or receive a drug, require monitoring tests before each dispensing, or mandate a patient agreement; RWE is used to measure whether the program is actually working.; Appears in: Risk Evaluation Study (Post-Authorization Safety / Active Surveillance)
rendering provider: The individual clinician who actually performed the service, identified by their NPI in box 24J; this is often different from the billing provider (box 33), which may be a practice group.; Appears in: CMS-1500 / 837P Professional Claim Fields
rendering vs billing provider: The rendering provider is the individual who actually delivered the service; the billing provider is the entity (often a group practice) that submitted the claim. Different NPI fields on the claim capture each role.; Appears in: NPI (National Provider Identifier)
repackager: A company that buys a drug from its original manufacturer and redistributes it in different quantities or containers under a new NDC — creating additional valid NDC codes for the same underlying drug that must be included in any complete drug code list.; Appears in: NDC (National Drug Code)
repeated measures: Multiple outcome values collected from the same patient across different time points, as opposed to a single measurement taken once.; Appears in: Longitudinal Outcomes Modeling
replacement claim: A corrected resubmission that supersedes an original claim, typically changing billed amounts, diagnosis codes, or provider information; for institutional claims, the Type of Bill frequency digit is 7.; Appears in: Claim Adjustments, Reversals, and Denials
RERI (relative excess risk due to interaction): The interaction measure on the additive (risk-difference) scale, equal to the joint effect of two factors minus the sum of their separate effects; RERI = 0 means no additive interaction, RERI > 0 means supra-additive (synergistic) effects.; Appears in: Subgroup Analysis and Heterogeneity of Treatment Effect
resampling with replacement: Drawing a new sample of the same size from the observed data while allowing the same observation to appear multiple times, so each draw is independent of the others.; Appears in: Bootstrap and Resampling Methods
Research question drift: The gradual, unplanned narrowing or broadening of a study question that happens when eligibility rules are adjusted after analysts have already seen the data, which can make results look stronger than they really are.; Appears in: PICOTS Framework for RWE
residual: The difference between what a patient actually experienced (e.g., 10 days in hospital) and what the model predicted (e.g., 7 days); a positive residual means the model underpredicted, a negative residual means it overpredicted.; Appears in: Ordinary Least Squares (OLS) Linear Regression, Regression Diagnostics and Model Checking
residual confounding: Systematic error that remains in an estimate even after statistical adjustment, because important factors that influence both who gets treated and who develops the outcome were not fully measured or controlled.; Appears in: Negative Control Exposures, Negative Control Outcomes
responder: A patient whose PRO score improved by at least the MID from baseline to follow-up, meaning the benefit they reported was large enough to matter.; Appears in: Patient-Reported Outcomes in Real-World Settings
responsiveness: A questionnaire is responsive if its score changes meaningfully when a patient's health truly improves or worsens, and stays stable when health is stable.; Appears in: PRO Instrument Validation
REST API: A web-based interface that lets software request data using standard internet calls, rather than requiring a direct database connection; FHIR uses REST APIs to deliver clinical data between systems.; Appears in: FHIR and Healthcare Interoperability for RWE
restricted cubic spline (RCS): A piecewise cubic curve that is constrained to be linear (straight) outside the outermost anchor points, preventing wild extrapolation in the tails; also called a natural cubic spline.; Appears in: Splines and Flexible Functional Forms
retention rate: The share of patients who remain continuously observable through a given time point, calculated as the number still being followed divided by the number who started.; Appears in: Attrition and Loss to Follow-Up
retransformation bias: The systematic underestimate of the arithmetic mean that occurs when a log-scale predicted value is naively converted back to the original scale using exp() without a smearing correction; it is a mathematical property of the log-normal distribution, not a modeling error.; Appears in: Log-Normal Distribution and the Retransformation Problem
retransformation problem: The bias that arises when you log-transform costs, fit a regression, then exponentiate the results to get back to dollars — the exponentiated fitted value gives the geometric mean, not the arithmetic mean that budget models need, unless a separate smearing correction is applied.; Appears in: Gamma Distribution for Cost and Skewed Outcomes
retrospective: The data were recorded before the research question was asked, so the whole study sits in the past even though the analyst runs it today.; Appears in: Retrospective Cohort Study Design
revenue center: A hospital department or cost center identified on each line of a UB-04 institutional claim, telling the analyst which part of the hospital generated the charge on that line.; Appears in: Revenue (Center) Codes
Revenue code line: A detail row in FL42–47 that itemizes one category of service (e.g., emergency room, pharmacy, physical therapy) by revenue code; the outpatient procedure code (CPT/HCPCS) lives at this line level in FL44, and total billed charges appear in FL47.; Appears in: UB-04 / 837I Institutional Claim Fields
reversal transaction: An NCPDP transaction type (B2) that voids a previously paid pharmacy claim — for example when a patient returns unused medication — and must be removed from research data to avoid counting a fill that was never actually kept.; Appears in: NCPDP Pharmacy Claim Fields
reverse causation: When the outcome being studied actually causes the exposure being studied, rather than the other way around, making the causal direction in data appear the opposite of reality.; Appears in: Cross-Sectional Study, Protopathic Bias and Reverse Causation
right censoring: The most common situation in real-world studies: a patient stops being observed before the event occurs, so you know only that the event — if it happened at all — occurred after observation ended.; Appears in: Censoring: Types, Mechanisms, and Informativeness
right-skewed distribution: A cost pattern where most patients have low-to-moderate costs but a small tail of patients has very high costs, pulling the average far above the amount a typical patient actually spends.; Appears in: Cost Outlier Handling (Winsorization, Trimming, Robust/Two-Part Models)
risk: The fraction of patients in a group who had the outcome during the follow-up window, for example 90 out of 1000 is a risk of 0.09.; Appears in: Number Needed to Treat (and Number Needed to Harm)
risk (incidence proportion): The fraction of patients in a group who had the outcome during a defined time window — for example, 36 out of 300 patients is a risk of 0.12; risk requires a fixed time window, and without one you are computing a rate instead.; Appears in: Risk Ratio and Risk Difference
Risk adjustment (HCC model): A government formula that sets the monthly payment to a Medicare Advantage plan based on the diagnosed conditions of its enrollees; more serious diagnoses on record mean higher payments, creating an incentive to capture every possible diagnosis.; Appears in: Medicare FFS vs Medicare Advantage vs Commercial Claims Differences
risk difference: The exposed group's risk minus the unexposed group's risk; an RD of 0.06 means 6 additional events per 100 exposed patients over the stated window, or 60 per 1,000.; Appears in: Marginal Effects and Interpretation of Inferential Statistics, Observational Comparative Effectiveness Research, Risk Ratio and Risk Difference, Tipping Point Analysis
risk interval: The short window of time immediately after the exposure (for example, days 1 through 8 after a vaccine dose) when a biological effect is considered plausible and events are counted.; Appears in: Self-Controlled Risk Interval (SCRI) Design
risk period: The short window of days immediately after an exposure (such as a vaccine dose) during which a biologically plausible adverse event might occur.; Appears in: Self-Controlled Case Series (SCCS)
risk ratio: The probability (risk) of the event in one group divided by the probability in another; for common outcomes the risk ratio is closer to 1.0 than the odds ratio, so the two numbers can look quite different even when describing the same 2x2 table.; Appears in: Binomial Distribution and the Logit Link, Observational Comparative Effectiveness Research, Risk Ratio and Risk Difference
risk ratio (RR): The risk of an outcome in the treated group divided by the risk in the control group, that is, how many times as high the treated group's risk is; RR below 1 means the treatment lowered the risk.; Appears in: E-value Sensitivity Analysis, Meta-Analysis of Randomized Controlled Trials
risk score: A number assigned to each patient by a model, meant to reflect how likely that patient is to have an outcome — higher usually means higher predicted risk.; Appears in: ROC Curves, AUC, and the c-statistic
risk set: At any given moment in a survival study, the group of patients who are still being observed and have not yet had the event — only these patients count toward the denominator in the survival calculation at that time.; Appears in: Censoring: Types, Mechanisms, and Informativeness, Kaplan-Meier Estimator, Log-Rank Test, Nested Case-Control Design
risk-minimization measure: Any intervention designed to prevent or reduce a known drug harm — examples include restricted prescriber certification, mandatory pregnancy testing, or patient enrollment in a monitoring registry.; Appears in: Risk Evaluation Study (Post-Authorization Safety / Active Surveillance)
risk-of-bias appraisal: A structured check of each included study asking whether its design or conduct could have pushed its result in a particular direction, so weak studies are flagged rather than trusted equally.; Appears in: Scoping Review, Systematic Review
risk-set sampling: The method of picking controls by drawing randomly from the risk set at the case's event time, so that controls come from the same follow-up context as the case.; Appears in: Nested Case-Control Design
RMST: Restricted Mean Survival Time: the average number of event-free months a patient experiences up to a pre-chosen horizon (e.g., 24 months); an absolute, assumption-light alternative to the HR that translates directly into clinical and economic terms.; Appears in: The Hazard Ratio as an Effect Measure
robust standard errors: Standard errors calculated using a formula (the sandwich estimator) that is valid even when the residuals do not have constant spread across patients, making confidence intervals reliable in the skewed, unequal-variance data typical of health research.; Appears in: Ordinary Least Squares (OLS) Linear Regression
robust variance: A variance calculation that accounts for the fact that the same subcohort members appear in many parts of the analysis, preventing artificially narrow confidence intervals.; Appears in: Case-Cohort Design
robustness: A property of a statistical test meaning it still works correctly even when some assumptions are violated; Welch's t-test is robust to unequal variances, while Student's t-test is not.; Appears in: Welch's t-Test (Unequal Variances)
root operation: The third character of an ICD-10-PCS code, which defines the precise objective of the procedure using a strict technical definition — for example, Resection means removing all of a body part while Excision means removing only a portion, a distinction that determines which code applies even when the clinical note uses the same word for both.; Appears in: ICD-10-PCS Inpatient Procedure Codes
rootogram: A diagnostic plot that compares the observed count frequency distribution to model-predicted frequencies for each count value (0, 1, 2, ...); a hanging rootogram from the R countreg package is the standard visual check for whether the NB underpredicts the zero bar.; Appears in: Zero-Inflated and Hurdle Count Models
ROR: Reporting Odds Ratio: the odds that a report names the event given it names the drug, divided by the odds that a report names the event given it does not name the drug; numerically close to PRR for rare events but better-behaved when the event is common.; Appears in: Signal Detection (Disproportionality Analysis)
Rubin's rules: A formula that combines the point estimates and standard errors from each separately analyzed completed dataset into one final estimate, inflating the standard error to account for the between-dataset disagreement caused by imputation uncertainty.; Appears in: Multiple Imputation for Longitudinal RWE
rule-based phenotype: A phenotype built from explicit if-then logic: a patient qualifies if they meet a fixed combination of codes, thresholds, or time requirements, with no machine learning involved.; Appears in: EHR Phenotyping Algorithms
run-out date: The first calendar day when a fill's supply is expected to be exhausted — calculated as fill date plus days supply, or shifted later if a carryover rule is applied.; Appears in: Grace Period and Permissible Gap Rules, Stockpiling and Carryover Rules
running variable: The continuous measurement (for example, LDL cholesterol in mg/dL or age in years) whose value determines whether a patient receives treatment once it crosses the cutoff.; Appears in: Regression Discontinuity Design
rwPFS: Real-world progression-free survival: the number of days from the start of a treatment line to a real-world progression event or death, whichever happens first.; Appears in: Real-World Progression and rwPFS
rwPFS proxy: A stand-in measure for cancer progression built from insurance records — typically the day a patient stopped their drug, switched to a new regimen, or died — because actual imaging-based progression is not recorded in claims.; Appears in: Therapeutic-Area-Specific RWE Challenges — Oncology
RxClass: A companion service that links RxCUIs to drug-class systems like ATC (international anatomical-therapeutic-chemical classification) and VA drug classes, so analysts can build defensible "all statins" or "all ACE inhibitors" concept lists without manually curating hundreds of individual drug codes.; Appears in: RxNorm Drug Terminology
RxCUI: The unique number RxNorm assigns to a drug concept — for example, every atorvastatin 20 mg oral tablet product from any manufacturer shares one RxCUI at the generic level.; Appears in: RxNorm Drug Terminology

S

sampling distribution: The distribution that a summary statistic (like a mean or difference) would follow if the study were repeated many times; the Central Limit Theorem says this distribution approaches a bell curve for means as sample size grows.; Appears in: Normal Distribution and the Central Limit Theorem
sampling fraction: The proportion of the full cohort included in the subcohort (subcohort size divided by cohort size); this fraction is used to weight each subcohort member's contribution so the analysis represents the full cohort.; Appears in: Case-Cohort Design
sampling weight: A number assigned to each survey respondent that equals roughly how many people in the target population that one respondent stands for — larger for undersampled groups, smaller for oversampled groups.; Appears in: Survey Weights and Complex Sampling
sampling zero: A zero-count observation from a patient who could have the event but did not in the observation window purely by chance; these zeros are handled correctly by the negative binomial distribution and do not require a zero-inflated model.; Appears in: Zero-Inflated and Hurdle Count Models
sandwich estimator: The mathematical formula behind cluster-robust standard errors; it adjusts the variance calculation to account for within-cluster correlation without changing the main effect estimate.; Appears in: Cluster-Robust Standard Errors
sandwich variance: A method for calculating standard errors and confidence intervals that remains valid even when the working correlation assumption is incorrect, which is why GEE results are considered robust.; Appears in: GEE Population-Average (Marginal) Models
SAP: Statistical Analysis Plan: a locked document that states, before any outcome data are examined, exactly which statistical method, covariates, and sensitivity checks will be used to answer the study question.; Appears in: Study Protocol and SAP Elements for RWE
Satterthwaite degrees of freedom: A formula that calculates an adjusted "effective sample size" for the test based on the sizes and variances of the two groups; it tells the test how much uncertainty to build in, and it falls somewhere between the smaller group's degrees of freedom and the combined total — reflecting which group is contributing more uncertainty.; Appears in: Welch's t-Test (Unequal Variances)
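A short sketch of the Welch-Satterthwaite calculation from two group standard deviations and sample sizes. The function and argument names are illustrative.

```python
def welch_satterthwaite_df(sd_1, n_1, sd_2, n_2):
    """Approximate degrees of freedom for Welch's unequal-variance t-test."""
    if n_1 < 2 or n_2 < 2:
        raise ValueError("each group needs at least two observations")
    v_1 = sd_1 ** 2 / n_1  # squared standard error contributed by group 1
    v_2 = sd_2 ** 2 / n_2  # squared standard error contributed by group 2
    return (v_1 + v_2) ** 2 / (v_1 ** 2 / (n_1 - 1) + v_2 ** 2 / (n_2 - 1))
```

With equal variances and equal group sizes the result equals the pooled total (n_1 + n_2 - 2); when one small group dominates the uncertainty, it falls toward that group's own degrees of freedom.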
saturation: The point where new interviews stop producing new codes, signalling you have likely heard the range of experiences and can stop interviewing.; Appears in: Ethnographic / Observational Qualitative Study, Qualitative Interview Study
scale: Whether a result is a number (quantitative), a ranked category like low/normal/high (ordinal), a name such as a microbe identification (nominal), or a block of text (narrative); the scale is one of the six axes of a LOINC code.; Appears in: LOINC Laboratory and Observation Codes
scale parameter: A number (theta) that stretches or compresses the gamma distribution along the cost axis; together with the shape parameter it sets the mean and variance of the cost distribution.; Appears in: Gamma Distribution for Cost and Skewed Outcomes, Weibull Distribution for Time-to-Event Data
scaled Brier score: A version of the Brier score that is adjusted for how common the outcome is, so you can fairly compare model performance across studies where the event rate differs.; Appears in: Brier Score
scenario analysis: Re-running the whole model under a discrete alternative assumption — such as a different time horizon, discount rate, or survival extrapolation — rather than a numeric range on one input.; Appears in: Scenario Analysis and Deterministic Sensitivity Analysis (DSA)
scoping review: A related but different review that maps what evidence exists on a broad topic without judging study quality or combining results; it answers 'what is out there?' rather than a single focused question.; Appears in: Systematic Review
SDTM domain: A standardized dataset in CDISC format that holds one category of clinical observation — for example, the EX domain holds drug exposure records and the AE domain holds adverse events — with fixed column names recognized by FDA review software.; Appears in: CDISC Standards (SDTM/ADaM) for RWE Submissions
secondary diagnosis: Any diagnosis coded on a claim below the first position — comorbid conditions, complications, or other findings that affected care during the encounter but were not the main reason for the admission or visit.; Appears in: Diagnosis Position, Type, and Qualifiers on Claims
secondary non-adherence: When a patient fills a prescription at least once to start the drug but later takes it inconsistently or stops; measures like PDC track this, not the never-filled group.; Appears in: Primary Non-Adherence and Treatment Initiation
section detection: A preprocessing step that divides a clinical note into labeled zones (family history, past medical history, assessment and plan) so that entity extraction applies the correct scope and avoids tagging a relative's diagnosis as the patient's own.; Appears in: NLP for Clinical Text in RWE
Section X (New Technology): A dedicated ICD-10-PCS section for recently developed procedures, devices, and biologics that do not fit the existing Medical and Surgical tables; analysts studying novel implants or therapies must always check Section X for relevant codes.; Appears in: ICD-10-PCS Inpatient Procedure Codes
secular trend bias: The distortion that occurs when a historical control group was treated in an earlier era when standard care was worse, making the new drug look more effective than it actually is.; Appears in: Single-Arm Trial with External (Historical) Control
segmented regression: A type of regression model that fits one straight line to the pre-intervention data and a second, differently-angled line to the post-intervention data, estimating both the level change and the slope change at the break point.; Appears in: Interrupted Time Series (Segmented Regression)
Selection bias: Distortion in a study result that occurs when who gets included or stays in the analysis depends on both the treatment being studied and the health outcome being measured.; Appears in: Complete-Case Analysis, Selection Bias Sensitivity Analysis
selection probability: The chance that a patient who was eligible for the study actually ended up in the analyzed dataset, which can differ by treatment arm and health status.; Appears in: Selection Bias Sensitivity Analysis
self-controlled: A study feature where each person serves as their own comparison group, so fixed personal characteristics like sex or genetics cannot distort the result.; Appears in: Self-Controlled Case Series (SCCS)
Semantic Clinical Drug (SCD): The generic clinical-drug level in RxNorm — ingredient plus strength plus dose form — for example, "atorvastatin 20 mg oral tablet"; it is the typical target level for mapping pharmacy claims in research databases.; Appears in: RxNorm Drug Terminology
semicontinuous distribution: A distribution with a point mass at exactly zero (representing patients with no cost) plus a continuous range of positive values; it cannot be modelled by a purely continuous distribution like the gamma, which requires all values to be strictly positive.; Appears in: Two-Part and Hurdle Models for Semicontinuous Costs
sensitivity: Of all patients who truly had the event, the fraction the algorithm correctly flagged as event-positive — a sensitivity of 0.78 means 78 of every 100 true events are captured.; Appears in: Algorithm Validation, Claims Outcome Algorithm PPV/Sensitivity Trade-off, Diagnostic Accuracy Study, Diagnostic Likelihood Ratios, Misclassification Bias Correction, Safety Signal Case Definition
sensitivity analysis: A repeat of the main analysis under a deliberately altered assumption (for example, a different follow-up length) to test whether the main result is robust; it must be named in the SAP before results are known to count as pre-specified evidence.; Appears in: Grace Period and Permissible Gap Rules, Study Protocol and SAP Elements for RWE, Tipping Point Analysis
separation: A data condition where a variable perfectly predicts the outcome in one direction, for example every single exposed patient has the event and no unexposed patient does, causing ordinary logistic regression to produce an infinite odds ratio.; Appears in: Firth Penalized Regression, Maximum Likelihood Estimation
sequence ratio (SR): The count of patients whose first fill of drug A came before their first fill of drug B, divided by the count whose first fill of B came before A — a ratio above 1 means A-before-B is the more common order.; Appears in: Prescription Sequence Symmetry Analysis (PSSA)
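As a rough illustration, the crude sequence ratio can be computed from each patient's pair of first-fill dates. The 365-day window default and the exclusion of same-day first fills are assumptions of this sketch, and no adjustment for prescribing trends is attempted.

```python
from datetime import date


def sequence_ratio(first_fills, window_days=365):
    """Crude sequence ratio from (first_fill_a, first_fill_b) date pairs."""
    a_first = 0
    b_first = 0
    for first_a, first_b in first_fills:
        gap = (first_b - first_a).days
        # assumption: drop same-day starts and pairs outside the window
        if gap == 0 or abs(gap) > window_days:
            continue
        if gap > 0:
            a_first += 1
        else:
            b_first += 1
    if b_first == 0:
        raise ValueError("no B-before-A patients; the ratio is undefined")
    return a_first / b_first
```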
sequential exchangeability: The assumption that, at every moment in follow-up, we have measured everything that predicts both the upcoming treatment decision and the outcome; it is the time-varying equivalent of the no-unmeasured-confounding assumption.; Appears in: Marginal Structural Models and G-Methods
service line: A single row on a claim representing one service, procedure, or drug administration on one date; a claim can have multiple service lines, and the POS code is recorded at the service-line level so different lines on the same claim can theoretically have different settings.; Appears in: CMS-1500 / 837P Professional Claim Fields, Place of Service (POS) Codes
Setting: The specific database or data source being used (for example, insurance claims versus hospital records), the country, the calendar years covered, and the overall study design chosen to answer the question.; Appears in: PICOTS Framework for RWE
seventh-character extension: A required letter at the end of certain ICD-10-CM codes that records whether the patient is at an initial encounter (A), a follow-up encounter (D), or experiencing a late effect or sequela (S) of the condition.; Appears in: ICD-10-CM Diagnosis Codes
severity gradient: The range from mild to severe illness within a single diagnosis category; because this gradient drives prescribing decisions but is often not recorded in claims data, it is the main unmeasured source of confounding by indication.; Appears in: Confounding by Indication and Channeling Bias
shape parameter: The parameter k in the Weibull model that controls whether risk rises, stays flat, or falls over time; k greater than 1 means increasing risk, k equal to 1 means constant risk (exponential distribution), and k less than 1 means decreasing risk.; Appears in: Gamma Distribution for Cost and Skewed Outcomes, Weibull Distribution for Time-to-Event Data
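The three regimes of k can be checked numerically with the Weibull hazard function. This sketch assumes the common shape-scale parameterization h(t) = (k/λ)(t/λ)^(k-1); the argument name `scale_lam` is illustrative.

```python
def weibull_hazard(t, shape_k, scale_lam):
    """Instantaneous hazard at time t under a shape-scale Weibull model."""
    if t <= 0 or shape_k <= 0 or scale_lam <= 0:
        raise ValueError("t, shape_k and scale_lam must all be positive")
    return (shape_k / scale_lam) * (t / scale_lam) ** (shape_k - 1)
```

For example, with `shape_k=1` the hazard is the same at every t (the exponential case), while `shape_k=2` gives a hazard that grows with t.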
shrinkage: The process by which penalized regression reduces the size of coefficient estimates relative to ordinary least squares; every coefficient moves closer to zero, and LASSO can push them all the way to exactly zero.; Appears in: Regularized Regression: LASSO, Ridge, and Elastic Net
signal generation: Spotting a possible new problem worth investigating later, without yet proving it is real.; Appears in: Case Report
signal generation vs confirmation: Generation means flagging a possible problem worth investigating; confirmation means proving it in a separate, carefully designed study. TreeScan only does the first.; Appears in: Tree-Based Scan Statistics (TreeScan)
signal of disproportionate reporting: A drug-event pair that meets a pre-specified numeric threshold, flagging it for clinical review; it is a hypothesis-generating alert, not a confirmed causal finding.; Appears in: Signal Detection (Disproportionality Analysis)
signed rank: The rank of a paired difference's absolute value, given the same sign (positive or negative) as the original difference — rank 3 becomes +3 if the change was an improvement, -3 if it was a worsening.; Appears in: Wilcoxon Signed-Rank Test
silver-standard trap: The mistake of training an NLP model using structured billing codes as its reference labels, then claiming the model adds information beyond those codes — it cannot, because it learned to reproduce the very signal it was supposed to supplement.; Appears in: NLP for Clinical Text in RWE
simulation interval: The range of bias-adjusted estimates produced by running thousands of random draws of the bias parameters; it reflects uncertainty about the hidden confounder, not just ordinary statistical noise, and must not be labeled a confidence interval.; Appears in: Quantitative Bias Analysis Toolkit, Unmeasured Confounding Probabilistic Bias Analysis
single-arm trial: A clinical trial in which all enrolled patients receive the investigational treatment with no concurrent placebo or comparator group inside the study.; Appears in: Rare Disease External Controls, Single-Arm Trial with External (Historical) Control
site of care: The physical or virtual location type where a health service is furnished, such as a physician's office, hospital outpatient department, emergency room, skilled nursing facility, or the patient's home; POS codes are the primary way this is recorded on professional claims.; Appears in: Place of Service (POS) Codes
skewness: A measure of how lopsided a distribution is; a right-skewed distribution like healthcare costs has a long tail of very large values that pull the arithmetic mean far above the median, which is why the geometric mean alone understates average spend for budget purposes.; Appears in: Descriptive Statistics, Log-Normal Distribution and the Retransformation Problem
slope change: The shift in how fast the outcome rate is rising or falling after the intervention compared to before — a positive slope change means the rate is climbing faster, a negative one means it is declining faster or reversing a prior upward trend.; Appears in: Interrupted Time Series (Segmented Regression)
slope coefficient: The estimated change in the average outcome for a one-unit increase in a predictor, holding all other variables in the model fixed at their current values.; Appears in: Ordinary Least Squares (OLS) Linear Regression
SMART on FHIR: An open login standard that lets a patient-facing or research app connect to any compatible hospital or insurer system and retrieve data with the patient's permission, without needing a custom integration built separately for each site.; Appears in: FHIR and Healthcare Interoperability for RWE
smearing factor: A correction multiplier equal to the average of the back-transformed OLS residuals; proposed by Duan (1983), it adjusts the geometric-mean estimate upward to approximate the true arithmetic mean without requiring any assumption about the residual distribution.; Appears in: Log-Normal Distribution and the Retransformation Problem
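A minimal sketch of the smearing correction, assuming the residuals come from an OLS fit on natural-log cost. Both function names are hypothetical.

```python
from math import exp


def duan_smearing_factor(log_scale_residuals):
    """Average of the exponentiated residuals from the log-scale model."""
    if not log_scale_residuals:
        raise ValueError("need at least one residual")
    return sum(exp(r) for r in log_scale_residuals) / len(log_scale_residuals)


def retransformed_mean(log_scale_prediction, log_scale_residuals):
    """Arithmetic-mean estimate on the original (for example, dollar) scale."""
    # exp(prediction) alone is the geometric-mean estimate; smearing scales it up
    return exp(log_scale_prediction) * duan_smearing_factor(log_scale_residuals)
```

Because OLS residuals average to zero, the factor is at least 1, so the corrected estimate never falls below the geometric-mean estimate.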
snapshot date: The single calendar day you freeze the population on and read every person's condition and treatment status as-of that day.; Appears in: Cross-Sectional Study
source concept: The original code from the raw data before translation — such as an ICD-10-CM diagnosis code or an NDC drug code — preserved in the CDM alongside the standard concept for audit purposes.; Appears in: OMOP Standardized Vocabularies (OHDSI/Athena)
source-to-standard mapping: The ETL step that converts raw billing codes (ICD-10-CM, NDC, CPT) into standard OMOP concept_ids; if a source code has no mapping, it becomes concept_id 0 and is invisible to a standard concept set.; Appears in: OMOP Concept Set Development
span citation: A requirement that an LLM quote the exact phrase from the source document that supports each extracted field; an extracted value with no supporting quote is flagged for human review.; Appears in: LLM-Assisted Data Abstraction and Evidence Work in RWE
sparse data: A table or dataset where one or more cells have very few observations — often zero — typically because the outcome is rare, the exposure is uncommon, or both.; Appears in: Exact and Penalized-Likelihood Methods for Sparse Data
sparsity: A property of LASSO and elastic net models where many coefficients are set exactly to zero, so only a small fraction of the original candidate predictors appear in the final model with non-zero weights.; Appears in: Regularized Regression: LASSO, Ridge, and Elastic Net
specificity: Of all patients who truly did not have the event, the fraction the algorithm correctly left unflagged — a specificity of 0.97 means 97 of every 100 true non-events are correctly cleared.; Appears in: Algorithm Validation, Diagnostic Accuracy Study, Diagnostic Likelihood Ratios, Misclassification Bias Correction
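Sensitivity and specificity both come from the same 2x2 validation counts. The sketch below uses hypothetical argument names for the four cells.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of true events that the algorithm flagged."""
    if true_pos + false_neg <= 0:
        raise ValueError("no true events in the validation sample")
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of true non-events that the algorithm left unflagged."""
    if true_neg + false_pos <= 0:
        raise ValueError("no true non-events in the validation sample")
    return true_neg / (true_neg + false_pos)
```

With 78 flagged and 22 missed true events the sensitivity is 0.78, and with 97 cleared and 3 wrongly flagged non-events the specificity is 0.97, matching the worked figures in the two definitions.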
specimen (System): The biological sample the measurement is taken from, such as serum, plasma, whole blood, or urine; different specimens for the same analyte receive different LOINC codes because the reference ranges differ.; Appears in: LOINC Laboratory and Observation Codes
spontaneous report: A voluntary submission by a patient, prescriber, or manufacturer to a safety database such as FDA FAERS or WHO VigiBase, describing a suspected drug side effect.; Appears in: Signal Detection (Disproportionality Analysis)
stabilized inverse-probability weight: A number assigned to each patient that makes the study group look like the overall population: patients who were unlikely to receive their treatment get a large weight, and patients whose treatment was expected get a small weight; stabilizing multiplies each weight by the overall (marginal) probability of the treatment the patient received, which keeps extreme weights in check.; Appears in: Missing Data, Trimming, and Winsorization in RWE
staggered adoption: When different units (plans, states, hospitals) adopt the same policy at different calendar times rather than all at once, requiring more careful methods than a simple before-and-after comparison.; Appears in: Difference-in-Differences with Staggered Adoption
standard concept: A concept marked with standard_concept = 'S' in the OMOP vocabulary, meaning it is the intended target for analysis queries; diagnoses map to SNOMED standard concepts, drugs to RxNorm, and lab tests to LOINC.; Appears in: OMOP CDM Method Patterns for RWE, OMOP Standardized Vocabularies (OHDSI/Athena)
standard deviation: A measure of spread that describes the typical distance between each individual value and the mean; for bell-shaped data roughly two-thirds of values fall within one standard deviation of the mean.; Appears in: Descriptive Statistics
standard error: The standard deviation of the sample mean's distribution across repeated studies — equal to the data's standard deviation divided by the square root of sample size; it shrinks as more patients are enrolled, while individual variability stays constant.; Appears in: Cluster-Robust Standard Errors, Inferential Statistics Foundations, Normal Distribution and the Central Limit Theorem, Two-Sample (Student's) t-Test
standard error of the mean difference: The standard deviation of the within-person differences divided by the square root of the number of pairs; it measures how precisely the sample mean difference estimates the true population mean difference.; Appears in: Paired t-Test
standard population: An external reference group — for example, the 2000 US Standard Million published by the CDC — whose age (or age-sex) distribution is used as the shared set of weights when comparing two study groups.; Appears in: Direct Standardization
standardization: Rescaling each predictor to have mean zero and unit variance before fitting the penalized model; required so the penalty treats all predictors equally regardless of their original units or range (age in decades vs a binary flag must be made comparable).; Appears in: Regularized Regression: LASSO, Ridge, and Elastic Net
standardization (model-based): The step in g-computation that averages model predictions across the observed distribution of patient characteristics, so the final estimate reflects the actual mix of patients in the study rather than an artificial reference patient.; Appears in: G-Computation and the Parametric G-Formula
standardized difference: A sample-size-independent measure of how different two groups are on a variable, preferred over chi-square p-values for assessing covariate balance in large observational studies because it does not inflate with sample size.; Appears in: Chi-Square Test of Independence
standardized incidence ratio (SIR): The same calculation as the SMR but applied to new disease diagnoses (incidence) rather than deaths.; Appears in: Indirect Standardization, SMR, and SIR
standardized mean difference (SMD): A single number measuring how far apart two groups are on one characteristic, expressed in standard-deviation units; a value below 0.1 is the conventional target for acceptable balance.; Appears in: Baseline Characteristics and Covariate Balance, Propensity Score Methods (PSM, IPTW), Single-Arm Trial with External (Historical) Control
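For a continuous covariate, one common form of the SMD divides the difference in means by the square root of the average of the two group variances. The sketch below assumes that form; other variants exist (for binary covariates, or for weighted samples).

```python
from math import sqrt


def standardized_mean_difference(mean_1, sd_1, mean_0, sd_0):
    """SMD for a continuous covariate using the average of the two variances."""
    pooled_sd = sqrt((sd_1 ** 2 + sd_0 ** 2) / 2)
    if pooled_sd == 0:
        raise ValueError("both groups have zero variance")
    return (mean_1 - mean_0) / pooled_sd
```

For instance, group means of 52 and 50 with a standard deviation of 10 in each group give an SMD of 0.2, above the conventional 0.1 target.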
standardized mortality ratio (SMR): Observed deaths divided by expected deaths; a value of 1.20 means the cohort had 20% more deaths than the reference population's rates would predict for a group with the same age and sex distribution.; Appears in: Indirect Standardization, SMR, and SIR
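A brief sketch of the SMR arithmetic, assuming the analyst has reference death rates and cohort person-years for each age-sex stratum. The function and argument names are illustrative.

```python
def standardized_mortality_ratio(observed_deaths, reference_rates, person_years):
    """Observed deaths over the deaths expected at reference-population rates.

    reference_rates and person_years are parallel lists, one entry per stratum.
    """
    if len(reference_rates) != len(person_years):
        raise ValueError("need one reference rate per stratum")
    # expected deaths: each stratum's reference rate times the cohort's person-years
    expected = sum(rate * py for rate, py in zip(reference_rates, person_years))
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return observed_deaths / expected
```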
state: One of the finite conditions a patient can be in at a point in time (e.g., Stable, Progressed, Dead); the model tracks which state each patient occupies.; Appears in: Multi-State Models
statistical power: The probability that a study will correctly detect a real effect of a given size; low power means a study might miss a true effect, producing a false-negative result.; Appears in: Inferential Statistics Foundations
step-up procedure: A testing algorithm that ranks all p-values from smallest to largest, then assigns each test a progressively less strict threshold based on its rank; the Benjamini-Hochberg method is the most widely used step-up procedure for controlling the false discovery rate.; Appears in: Multiplicity and Multiple Comparisons
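The Benjamini-Hochberg procedure can be sketched in a few lines. The function name and the default false discovery rate q = 0.05 are assumptions made for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    # indices of the tests, ordered from smallest to largest p-value
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # the threshold loosens with rank: rank * q / m
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank  # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

The "step-up" behaviour is that every test ranked at or below the largest passing rank is rejected, even if its own p-value missed its own threshold.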
stochastic dominance: The property being tested by rank-based tests such as the Kruskal-Wallis and Mann-Whitney tests: whether a randomly chosen observation from one group is more likely to be larger (or smaller) than a randomly chosen observation from another group, which is a broader statement than saying the group medians differ.; Appears in: Kruskal-Wallis Test, Mann-Whitney U Test (Wilcoxon Rank-Sum)
stockpiling: Refilling a medication before the current supply is gone, so the patient accumulates more pills than they take each day and builds a surplus.; Appears in: Stockpiling and Carryover Rules
stratification: Dividing the population into non-overlapping groups (strata) before sampling, then drawing a separate sample from each group, so every group is guaranteed to appear in the data.; Appears in: Disease Risk Scores, Survey Weights and Complex Sampling
stratum-specific rate: The event rate calculated within one slice of the population, such as people aged 65–74, computed as the number of events divided by the total time that group was observed.; Appears in: Direct Standardization
structural nested model: A mathematical description of the size of the treatment effect at each point in time, defined as the difference between what the outcome would have been if the patient continued treatment from that point forward versus if they had stopped — built up interval by interval rather than all at once.; Appears in: G-Estimation of Structural Nested Models
structural zero: A patient observation that is certainly zero because the patient cannot possibly have the event — for example, a never-user of a drug class or a patient whose anatomy makes the procedure impossible — as opposed to a patient who could have the event but happened not to.; Appears in: Two-Part and Hurdle Models for Semicontinuous Costs, Zero-Inflated and Hurdle Count Models
structured-output prompting: A technique that forces an LLM to return its answer in a fixed format (such as a JSON object with allowed values) rather than free text, which reduces the range of errors the model can make.; Appears in: LLM-Assisted Data Abstraction and Evidence Work in RWE
study package: A versioned bundle of code, variable definitions, and parameter files that is distributed to every site so each one runs exactly the same analysis on its own local data.; Appears in: Federated and Distributed Network Analysis
subcohort: A randomly chosen subset of the full cohort, selected at the very start of follow-up before any outcomes occur; the subcohort is the group on which expensive measurements are made, and it stands in for the whole cohort in the analysis.; Appears in: Case-Cohort Design
subdistribution hazard (Fine-Gray): The hazard modelled by Fine-Gray regression, whose coefficients map directly onto the cumulative incidence curve; a subdistribution hazard ratio greater than 1 means higher probability of the outcome, but it is not a biological rate.; Appears in: Competing Risks (Cause-Specific Hazard, Cumulative Incidence, and Fine-Gray)
subject-specific effect: The predicted change in outcome for a particular individual patient, holding that patient's own characteristics constant; a generalized linear mixed model targets this, not GEE.; Appears in: GEE Population-Average (Marginal) Models, Longitudinal Outcomes Modeling
summary measure: The single number used to answer the question across the whole study population — for example, the difference in two-year risk between treatment arms, or the ratio of event rates.; Appears in: Estimand-to-Analysis Traceability
supply_end: The last calendar day a given fill's medication supply covers, calculated as fill_date plus days_supply minus one day.; Appears in: Persistence Time to Discontinuation
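The date arithmetic for supply_end, and for the run-out date defined earlier in this glossary, can be sketched as follows. No carryover rule is applied, and the function names are illustrative.

```python
from datetime import date, timedelta


def supply_end(fill_date, days_supply):
    """Last calendar day covered by the fill (the fill date counts as day 1)."""
    if days_supply < 1:
        raise ValueError("days_supply must be at least 1")
    return fill_date + timedelta(days=days_supply - 1)


def run_out_date(fill_date, days_supply):
    """First calendar day on which the supply is expected to be exhausted."""
    if days_supply < 1:
        raise ValueError("days_supply must be at least 1")
    return fill_date + timedelta(days=days_supply)
```

A 30-day fill on 1 January 2024 therefore covers through 30 January, and the run-out date is 31 January, one day after supply_end.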
support [0,1]: The range of values a beta-distributed variable can take — any real number between 0 and 1, but never outside it, which makes it safe for probabilities, adherence rates, and quality-of-life utility weights.; Appears in: Beta Distribution for Proportions and Utilities
surrogate endpoint: An early or easier-to-measure outcome — such as tumor response, blood pressure, or LDL cholesterol — used in place of the clinical outcome patients actually care about, such as survival or heart attack.; Appears in: Surrogate Endpoint Validation
survival analysis: A family of statistical methods that estimates how long patients survive (or remain event-free) and compares those times across treatment groups.; Appears in: Mortality Source Hierarchy
survival curve: A step-shaped line on a graph that starts at 100% and drops each time a patient has the event, showing what fraction of the group remains event-free as time passes; RMST is the area under this line up to tau.; Appears in: Restricted Mean Survival Time (RMST)
swing weighting: A way to set weights by asking which criterion's jump from worst to best level the committee would most want, giving that 100 points, and rating the other jumps against it.; Appears in: Multi-Criteria Decision Analysis (MCDA)
switch: A substitution event where the patient stops a drug from the current regimen and starts a new, different drug — the clearest signal that one line has ended and a new one has begun.; Appears in: Drug Utilization Study, Treatment Patterns and Lines of Therapy (LOT)
symmetry assumption: The requirement that the distribution of within-patient differences is mirror-image symmetric around its center; the Wilcoxon signed-rank test can detect the wrong thing if this assumption is badly violated, such as when most patients improve a little but a few improve enormously.; Appears in: Wilcoxon Signed-Rank Test
symmetry window: The maximum number of days allowed between a patient's first fill of drug A and their first fill of drug B for the pair to count in the analysis — typically ±365 days.; Appears in: Prescription Sequence Symmetry Analysis (PSSA)
synthetic control: A stand-in outcome series for the treated region, built by blending the outcomes of untreated comparison regions using carefully chosen weights, so the blend mimics the treated region's pre-intervention history.; Appears in: Synthetic Control Method
systematic bias: A consistent shift in estimated effects caused by imperfect study design, data quirks, or residual confounding: bias that pushes all estimates in the same direction, not just random noise.; Appears in: Empirical Calibration with Negative Controls
systematic review: A study that searches all available research on a specific question, selects the studies that meet defined quality standards, and summarizes their findings in a structured way.; Appears in: Umbrella Review (Review of Systematic Reviews)

T

target dose: The dose the clinician is aiming for: either a guideline maintenance dose or whatever dose hits a clinical goal like a lab value.; Appears in: Dose Titration / Up-Titration to Target Dose
target question (estimand): The precise causal question a study is designed to answer, written down and locked before any data are touched, specifying exactly which patients, treatments, outcome, complicating events, and summary statistic are in scope.; Appears in: Estimand-to-Analysis Traceability
target trial: The hypothetical randomized controlled trial you would ideally run to answer your research question, written out as a full protocol even though it will never actually be conducted.; Appears in: Target Trial Emulation
targeted maximum likelihood estimation: A two-stage estimation approach that first fits initial models for outcomes and treatment, then applies a small mathematical correction called the targeting step so that the final answer is optimized specifically for the causal effect you care about.; Appears in: Targeted Maximum Likelihood Estimation (TMLE)
tau (τ): The fixed horizon, chosen before any analysis, that caps the window of interest — for example, 1095 days (36 months); RMST counts only event-free days accumulated before tau.; Appears in: Restricted Mean Survival Time (RMST)
taxonomy code: A 10-character code that describes a provider's type and specialty (for example, family medicine or general surgery); providers self-report these codes and they are not verified against board-certification records.; Appears in: NPI (National Provider Identifier)
telehealth: A health care visit conducted by video or audio connection rather than in person; in Medicare claims, telehealth visits are identified by POS codes 02 (at a site other than the patient's home) or 10 (in the patient's home), with POS 10 introduced in January 2022.; Appears in: Place of Service (POS) Codes
temporality: Knowing which event happened first; a cross-sectional snapshot measures everything at once, so it cannot tell you whether the exposure or the outcome came first.; Appears in: Cross-Sectional Study
teratogenicity: The potential of a drug or chemical exposure during pregnancy to cause structural defects or developmental problems in the developing baby.; Appears in: Mother-Infant Linkage
Term type (TTY): The level in the RxNorm hierarchy a concept sits at — ingredient (IN), semantic clinical drug (SCD), branded drug (SBD), brand name (BN), and others — each representing a different granularity of drug information.; Appears in: RxNorm Drug Terminology
test-negative control: A patient who came to care with the same syndrome but whose laboratory test came back negative for the pathogen — they serve as the comparison group because they cleared the same care-seeking filter.; Appears in: Test-Negative Design
test-positive case: A patient who came to care with the target syndrome (e.g., flu-like illness) and whose laboratory test confirmed the pathogen of interest — they are the 'cases' in this design.; Appears in: Test-Negative Design
testing-selection bias: The distortion that arises because patients who received a biomarker test differ systematically from those who did not, for example having better insurance, being at a larger hospital, or having more advanced disease that prompted the test.; Appears in: Biomarker-Defined Cohort (RWE)
the null: The no-effect value, which on the risk-ratio scale is exactly 1.0 (the treatment makes no difference).; Appears in: E-value Sensitivity Analysis
the say-do gap: The difference between what people report they do and what they are actually observed doing, which is exactly what watching in person reveals.; Appears in: Ethnographic / Observational Qualitative Study
thematic synthesis: The most widely used approach for qualitative evidence synthesis: reviewers label each study's findings line by line, group similar labels into descriptive themes, then interpret those themes into analytical conclusions.; Appears in: Qualitative Evidence Synthesis
theme: A recurring pattern of meaning the researcher builds up across many observations, naming something that keeps showing up (for example, "paperwork delays starting treatment").; Appears in: Ethnographic / Observational Qualitative Study, Qualitative Evidence Synthesis, Qualitative Interview Study
tied ranks: When two or more observations have the same value, each receives the average of the ranks they would have occupied; because ties reduce the spread of the ranks, a tie-correction factor is then applied to the H statistic.; Appears in: Kruskal-Wallis Test
ties: When two or more observations share the same value (e.g., three patients all with 2 days of hospital stay), each tied observation is assigned the average rank of the positions they share, rather than distinct ranks.; Appears in: Mann-Whitney U Test (Wilcoxon Rank-Sum)
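The average-rank rule described in the two entries above can be sketched as follows. This is a minimal illustration, not a library routine: the function name `average_ranks` and the sample lengths of stay are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values get the mean of the positions they share."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = ((pos + 1) + (end + 1)) / 2  # mean of positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


# Hypothetical lengths of stay in days; three patients share the value 2.
stays = [1, 2, 2, 2, 5]
ranks = average_ranks(stays)
print(ranks)  # the three tied patients each get rank 3.0 (mean of positions 2, 3, 4)
```

The ranks still sum to n(n+1)/2, which is why the rank-based test statistics remain well defined when ties are present.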
time horizon: The total span of years over which a health economic model tracks costs and health effects — often a patient's lifetime for chronic diseases.; Appears in: Cost-minimization Analysis (CMA), Discounting of Costs and Effects in Economic Evaluation, Number Needed to Treat (and Number Needed to Harm)
time ratio: The multiplicative factor by which a covariate stretches or compresses the time until an event; a time ratio of 1.5 means the event takes 1.5 times as long in the treated group as in the control group at every point in the survival distribution.; Appears in: Accelerated Failure Time (AFT) Models
time zero: The starting date for follow-up in each group -- in the trial it is the date of treatment assignment, and in the external control it must be set to an equivalent moment (such as the date the patient started standard-of-care treatment) to ensure both groups are measured from the same decision point.; Appears in: Cumulative Incidence and Absolute Risk Estimation, Prospective Cohort Study, Rare Disease External Controls, Registry-Based Randomized Controlled Trial (RRCT), Study Protocol and SAP Elements for RWE
time zero (index date): The starting clock tick for each patient — usually the day they first filled the study drug — from which all follow-up days are counted.; Appears in: Cox Proportional Hazards Regression, Restricted Mean Survival Time (RMST)
time zero alignment: The requirement that a patient's eligibility check, treatment assignment, and the start of follow-up all happen at exactly the same date so no disease-free time is silently credited to one treatment group before the study officially begins.; Appears in: Target Trial Emulation
time-at-risk: The stretch of calendar days during which a specific participant could have had the study event and the data source would have captured it — it starts when they enter the study and ends when they have the event, leave, or the data end.; Appears in: Person-Time Denominator Construction
time-at-risk window: The stretch of calendar days during which the study is actively watching a patient for the outcome of interest; only events that occur inside this window count toward the study results.; Appears in: OMOP Time-at-Risk and Cohort Exit
time-conditional matching: Pairing a switcher with a comparator-drug continuer who has been on the comparator for the same number of months, so both groups have the same amount of drug exposure history before the study clock starts.; Appears in: Prevalent New-User Design
time-control series: A separate group of people (ideally future patients who will later have the same event) who contribute the same two time windows so the study can measure and subtract the background prescribing trend.; Appears in: Case-Time-Control Design
time-dependent covariate: A variable in the model whose value is allowed to change during a patient's follow-up, such as whether they are currently on a drug, rather than being fixed at a single baseline value.; Appears in: Cox Regression with Time-Dependent Covariates
time-to-first-event: The number of days from a patient's study start date until the earliest component event occurs; this is what the statistical model uses as the outcome.; Appears in: Composite Endpoint Construction, Win Ratio and Generalized Pairwise Comparisons
time-updated exposure: An exposure variable that is re-measured at each point in a patient's follow-up rather than set once at the start, so the analysis reflects what the patient was actually doing at each moment.; Appears in: Time-Updated Exposures and Cumulative Dose
time-varying confounding: A situation where a patient characteristic (like a lab value) changes over the study period, affects whether the patient receives treatment at each point in time, and also influences the final outcome — making it both a confounder and something that can be affected by prior treatment.; Appears in: G-Estimation of Structural Nested Models, Marginal Structural Models and G-Methods
time-varying exposure: A study design where a patient's treatment status is allowed to change during follow-up — switching from 'untreated' to 'treated' at the exact date the first prescription is filled — rather than being locked in at cohort entry.; Appears in: Immortal Time Bias Handling
time-zero (index date): Each patient's 'day zero,' the single calendar day when their group is assigned and follow-up starts; here it sits in the past.; Appears in: Retrospective Cohort Study Design
Timing: All of the clock decisions in a study: how far back in history to look before a patient enters the study, when the clock starts ticking, how long patients are followed, and the rules for when they stop being counted.; Appears in: PICOTS Framework for RWE
tipping point: The smallest bias assumption large enough to flip a statistically significant result to non-significant (or to cross another pre-chosen decision line), expressed as a single number you can judge against what is clinically believable.; Appears in: Quantitative Bias Analysis Toolkit, Tipping Point Analysis
titration period: The early stretch of treatment during which the dose is being adjusted (usually raised) before it settles at a steady level.; Appears in: Dose Titration / Up-Titration to Target Dose
token: The irreversible code produced from someone's identifiers; matching tokens indicate the same person.; Appears in: Tokenization and Privacy-Preserving Record Linkage
tokenization: Scrambling a person's identifiers into a fixed, irreversible code so the same person can be recognized across datasets without sharing their name.; Appears in: Tokenization and Privacy-Preserving Record Linkage
tornado diagram: A bar chart ranking inputs by how much each one swings the model result, widest bar on top, so the most influential inputs are visible at a glance.; Appears in: Scenario Analysis and Deterministic Sensitivity Analysis (DSA)
TOST (two one-sided tests): A method that runs two separate one-sided hypothesis tests simultaneously — one ruling out being too much worse, one ruling out being too much better — and declares equivalence only when both pass.; Appears in: Equivalence and Non-Inferiority Testing
traceability chain: The documented path from a raw data field (e.g., a claims fill_date) through each transformation step (EX domain, ADaM derivation) to the final analysis number, such that any intermediate value can be independently reproduced.; Appears in: CDISC Standards (SDTM/ADaM) for RWE Submissions
traceability matrix: A table, written before programming begins, that connects each of the five parts of the target question to the specific data rule, code function, and check that implements it in the analysis.; Appears in: Estimand-to-Analysis Traceability
training error vs test error: Training error is how well the model fits the data it was built on (always too optimistic); test error is how well it performs on patients it has never seen (the honest estimate you actually need).; Appears in: Cross-Validation, Overfitting, and Optimism
trajectory: The pattern of how a patient's outcome value rises, falls, or stays flat across a sequence of visits or time points.; Appears in: Longitudinal Outcomes Modeling
trajectory group: One of the k subpopulations in a GBTM, each described by its own average time-path of medication use (for example, a group that starts high and declines steeply each month).; Appears in: Group-Based Trajectory Models and Latent Class Analysis
transcript: The full written-out text of a recorded interview, word for word, which is what the analyst actually reads and works from.; Appears in: Qualitative Interview Study
transfer: When a patient is moved from one hospital to a second hospital during the same illness; this generates two separate inpatient records in claims data but represents one clinical event.; Appears in: Acute Event Deduplication Window
transition: An allowed move from one state to another (e.g., Stable to Progressed); arrows in the state diagram.; Appears in: Multi-State Models
transition intensity: The instantaneous rate of a specific move among patients currently in the starting state - a hazard for that one arrow.; Appears in: Multi-State Models
transition probability: The chance that a patient currently in one health state will be in a specific health state at the end of the next cycle, expressed as a number between 0 and 1.; Appears in: Health Economic Modeling Methods Using RWE, Markov Transition Probabilities from Real-World Data
transitivity: The assumption that the patients in the trials informing each comparison are similar enough that the shared comparator treatment behaves the same way across all of them; if this breaks down, the indirect estimates are misleading.; Appears in: Network Meta-Analysis
transportability: Carrying an effect estimate over to a target population that is entirely separate from the study, a group that was never sampled, which requires stronger assumptions about what was measured.; Appears in: External Adjustment and Validation-Substudy Bias Correction, Generalizability, Transportability, and External Validity, Surrogate Endpoint Validation
trapezoidal rule: A simple way to estimate the area under a curve by drawing trapezoids between consecutive measurement points; used here to compute total QALYs by taking, for each interval, the average of the two consecutive utility scores multiplied by the time between them, and then summing across intervals.; Appears in: QALY Utility Mapping (Crosswalking to Health-State Utilities)
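A minimal sketch of the trapezoidal QALY calculation, assuming utilities measured at known times in years. The function name `qalys_trapezoid` and the three measurement points are hypothetical.

```python
def qalys_trapezoid(times_years, utilities):
    """Sum, over consecutive visits, of (mean of the two utilities) x (years between them)."""
    total = 0.0
    for i in range(1, len(times_years)):
        width = times_years[i] - times_years[i - 1]
        mean_utility = (utilities[i] + utilities[i - 1]) / 2
        total += width * mean_utility
    return total


# Hypothetical patient: utilities at baseline, 6 months, and 12 months.
times = [0.0, 0.5, 1.0]
utils = [0.60, 0.70, 0.80]
total_qalys = qalys_trapezoid(times, utils)
print(round(total_qalys, 3))  # 0.5 * 0.65 + 0.5 * 0.75
```

Each interval contributes one trapezoid, so unevenly spaced visits are handled without any change to the function.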
treated group: The plans, states, or providers that actually received the policy change (for example, the insurance plans that added a step-therapy rule).; Appears in: Difference-in-Differences with Staggered Adoption
treatment channeling: When doctors tend to steer a particular type of patient toward one drug, which is why the untreated comparison group starts out different.; Appears in: Baseline Characteristics and Covariate Balance
treatment episode: A single continuous stretch of time during which a patient is considered to be actively on a drug, bounded by the first fill date and the projected last day the final fill covers.; Appears in: Exposure Episode Construction
treatment policy strategy: An intercurrent-event rule that says: ignore the disruption and follow every patient to the end of the study regardless, mirroring an intention-to-treat design.; Appears in: Estimands (ATE/ATT) and Intercurrent Events in RWE
treatment-by-visit interaction: A model term that allows the difference between the two treatment arms to be a different size at each visit, rather than forcing a single average across all time points.; Appears in: Mixed Model for Repeated Measures (MMRM) in RWE
treatment-confounder feedback: A cycle where a post-baseline health measure (such as a lab value) is both changed by past treatment and predictive of future treatment decisions; the parametric g-formula handles this by simulating covariate histories forward rather than conditioning on the observed values.; Appears in: G-Computation and the Parametric G-Formula, G-Estimation of Structural Nested Models, Marginal Structural Models and G-Methods
trial-level association: A relationship measured across multiple randomized trials: treatments that produced a bigger effect on the surrogate also produced a bigger effect on the true endpoint, and this is captured as R-squared (R2) — a number from 0 to 1 where 1 means perfect prediction.; Appears in: Surrogate Endpoint Validation
trimming: Removing patients whose costs exceed a threshold from the analysis entirely, which changes which population you are studying — you are no longer describing all patients, only the non-catastrophic ones.; Appears in: Cost Outlier Handling (Winsorization, Trimming, Robust/Two-Part Models), Missing Data, Trimming, and Winsorization in RWE
true endpoint: The clinical outcome that directly matters to patients, such as overall survival or occurrence of a major cardiac event, against which the surrogate is judged.; Appears in: Surrogate Endpoint Validation
true negative (TN): A person who truly does not have the condition and whom the test correctly clears as negative.; Appears in: Sensitivity and Specificity
true positive (TP): A patient the algorithm correctly flagged as having the outcome — they were flagged and they truly had it.; Appears in: F1 Score, Precision, and Recall, Positive and Negative Predictive Value, Sensitivity and Specificity
trusted research environment: A secure computing environment in which approved analysts run code against patient-level data that never physically leaves the data holder's servers — the OpenSAFELY and DARWIN EU model — so that only analytical results (not patient records) are released.; Appears in: International Real-World Data Sources
Tweedie distribution: A one-model alternative that handles zeros and positive continuous values in a single equation using a compound Poisson-gamma structure, at the cost of constraining covariate effects to be the same for the any-cost and the conditional-cost parts.; Appears in: Two-Part and Hurdle Models for Semicontinuous Costs
Type 1 vs Type 2 NPI: A Type 1 NPI belongs to an individual clinician (a person); a Type 2 NPI belongs to a health-care organization such as a hospital or group practice. One doctor can hold both.; Appears in: NPI (National Provider Identifier)
Type of Bill (TOB): A four-digit code in FL4 that encodes the facility type, bill classification (inpatient vs outpatient), and a frequency digit indicating whether the claim covers the full stay (frequency 1), is an interim partial bill (frequency 2–4), a correction (frequency 7), or a cancellation (frequency 8).; Appears in: UB-04 / 837I Institutional Claim Fields
type-I error rate: The probability of concluding there is a difference when there actually is none; Welch's t-test maintains this at the chosen level (e.g., 5%) across a wide range of variance ratios, while Student's t-test can exceed it when variances differ.; Appears in: Welch's t-Test (Unequal Variances)

U

U statistic: The Mann-Whitney U statistic counts how many times a value from one group exceeds a value from the other group, across all possible pairs; it equals the rank sum minus a correction for the group size.; Appears in: Mann-Whitney U Test (Wilcoxon Rank-Sum)
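The two descriptions of U in the entry above (pair counting and rank sum minus a group-size term) can be checked against each other. This sketch assumes the common convention that a tied pair counts as one half; the function names and sample data are hypothetical.

```python
def u_by_pairs(group_a, group_b):
    """Count pairs where the group A value exceeds the group B value (ties count 0.5)."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def u_by_ranks(group_a, group_b):
    """Rank sum of group A in the pooled sample, minus n_a * (n_a + 1) / 2."""
    pooled = list(group_a) + list(group_b)

    def avg_rank(x):
        less = sum(v < x for v in pooled)
        equal = sum(v == x for v in pooled)
        return less + (equal + 1) / 2  # average rank across tied positions

    n_a = len(group_a)
    return sum(avg_rank(x) for x in group_a) - n_a * (n_a + 1) / 2


group_a = [3, 5, 8]      # hypothetical values
group_b = [1, 2, 5, 6]
u_pairs = u_by_pairs(group_a, group_b)
u_ranks = u_by_ranks(group_a, group_b)
print(u_pairs, u_ranks)  # both routes give the same U
```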
UB-04: The standardized claim form (its electronic equivalent is the 837I transaction) that hospitals submit to payers for inpatient and outpatient facility services; ICD-10-PCS procedure codes appear only in the procedure code fields of the UB-04 inpatient claim.; Appears in: ICD-10-PCS Inpatient Procedure Codes, Revenue (Center) Codes
UCUM: The Unified Code for Units of Measure — the companion standard to LOINC that specifies exactly how to write measurement units (mg/dL, umol/L, %) so that unit mismatches across sites can be detected and corrected before numeric thresholds are applied.; Appears in: LOINC Laboratory and Observation Codes
unconditional share: Each stage's count expressed as a percentage of the very first (largest) group, so all percentages have the same denominator and you can compare losses across stages directly.; Appears in: Cascade of Care Analysis
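A small sketch of unconditional shares, with conditional (stage-to-stage) shares alongside for contrast. The stage names and counts are hypothetical.

```python
def unconditional_shares(stages):
    """Each stage's count as a percentage of the first stage's count."""
    first_count = stages[0][1]
    return [(name, 100 * count / first_count) for name, count in stages]


def conditional_shares(stages):
    """Each stage's count as a percentage of the stage immediately before it."""
    shares = [(stages[0][0], 100.0)]
    for (_, prev), (name, count) in zip(stages, stages[1:]):
        shares.append((name, 100 * count / prev))
    return shares


# Hypothetical cascade of care.
cascade = [("diagnosed", 1000), ("linked to care", 600), ("treated", 450), ("controlled", 300)]
uncond = unconditional_shares(cascade)
cond = conditional_shares(cascade)
for (name, u), (_, c) in zip(uncond, cond):
    print(f"{name}: {u:.1f}% of all diagnosed, {c:.1f}% of previous stage")
```

Because every unconditional share uses the same denominator, the drop between any two stages can be read off by subtraction.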
union rule: A decision rule that flags a patient as having died if any source — even just one — reports a death, maximizing the chance of catching every real death.; Appears in: Mortality Source Hierarchy
unit of analysis: The 'thing' each row of your data represents; in an ecological study it is a group (a state or county), not a person.; Appears in: Ecological (Aggregate) Study
unit of randomization vs unit of analysis: In a cluster-randomized trial the cluster (e.g., clinic) is randomized, but outcomes are usually measured on individual patients — this mismatch is what creates the within-cluster correlation problem.; Appears in: Cluster-Randomized Trial
unmeasured confounder: A risk factor for the outcome that is related to which treatment patients received, but is absent from the database — if ignored, it can make one treatment look better or worse than it really is.; Appears in: E-value Sensitivity Analysis, External Adjustment and Validation-Substudy Bias Correction
unmeasured confounding: A situation where a factor that affects both who receives a treatment and what outcome they have is not recorded in the data, so its effect on the result cannot be directly removed.; Appears in: Tipping Point Analysis, Unmeasured Confounding Probabilistic Bias Analysis
unobserved exposure: Time a patient spent on the drug before the study window opened — the researcher cannot see what happened during this period, including any early bad reactions that caused the patient to stop or have an outcome.; Appears in: Prevalent User Bias
uptake: The share of eligible members expected to actually take the new drug, which usually starts small and grows over a few years rather than jumping to everyone at once.; Appears in: Budget Impact Analysis
US Core profile: A set of rules that narrows the broad FHIR standard to the specific data fields US payers and hospitals are legally required to provide under federal interoperability regulations.; Appears in: FHIR and Healthcare Interoperability for RWE
usual care: The standard treatment a patient would receive at a typical clinic if they were not in a study, including whatever the doctor normally prescribes and however the patient normally takes it.; Appears in: Pragmatic Trial
utility: A single number, usually between 0 and 1, that represents how good or bad a health state is, anchored so that 0 equals being dead and 1 equals perfect health (some value sets assign negative values to states judged worse than dead); it is the only type of quality-of-life score that can be multiplied by time to calculate a QALY.; Appears in: Health-Related Quality of Life (HRQoL) Measurement, QALY Utility Mapping (Crosswalking to Health-State Utilities)
utility score: A number between 0 and 1 that represents how desirable a given health state is, usually measured with a standardized questionnaire such as the EQ-5D; it is the weight applied to each year of life when calculating QALYs.; Appears in: Cost-Utility Analysis (CUA)

V

V code: An ICD-9-CM supplementary code beginning with the letter V (e.g., V58.x for aftercare) that records why a patient contacted healthcare when no active disease code applies; used for vaccination visits, screening exams, and follow-up care.; Appears in: ICD-9-CM Legacy Diagnosis and Procedure Codes
vaccine effectiveness (VE): The percentage reduction in risk of the target disease attributable to vaccination, estimated here as VE = (1 − OR) × 100%; a VE of 60% means vaccinated people had 60% lower odds of being a case.; Appears in: Test-Negative Design
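A worked sketch of VE = (1 − OR) × 100%, using an unadjusted odds ratio from a 2x2 table. The counts are hypothetical, and a real test-negative analysis would normally use an adjusted odds ratio from a regression model.

```python
def odds_ratio(vacc_cases, vacc_controls, unvacc_cases, unvacc_controls):
    """Odds of being a test-positive case among vaccinated vs unvaccinated patients."""
    return (vacc_cases / vacc_controls) / (unvacc_cases / unvacc_controls)


def vaccine_effectiveness(or_value):
    """VE in percent, from the formula VE = (1 - OR) x 100."""
    return (1 - or_value) * 100


# Hypothetical counts: cases are test-positive, controls are test-negative.
or_value = odds_ratio(vacc_cases=40, vacc_controls=160,
                      unvacc_cases=100, unvacc_controls=160)
ve = vaccine_effectiveness(or_value)
print(f"OR = {or_value:.2f}, VE = {ve:.0f}%")
```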
validation substudy: A small, intensive study nested inside the main analysis where researchers go beyond the database — through chart review or additional data linkage — to measure something (like smoking, BMI, or true disease status) that the main database does not capture.; Appears in: External Adjustment and Validation-Substudy Bias Correction
validity: A questionnaire is valid if its score actually captures the health concept it is supposed to measure — for example, a pain scale should correlate with other pain measures, not with unrelated outcomes like height.; Appears in: PRO Instrument Validation
value set: A published lookup table, derived from surveys of the general public, that converts a set of EQ-5D questionnaire responses into a utility number; different countries have different value sets, so the same questionnaire responses can produce different utility scores depending on which table is used.; Appears in: Health-Related Quality of Life (HRQoL) Measurement
van Walraven point score: A way to collapse the 31 flags into one number by adding published integer weights, some of which are negative, calibrated to in-hospital death.; Appears in: Elixhauser Comorbidity Measures / Index
variable importance: A score that ranks features by how much the model's accuracy drops when that feature is scrambled — a measure of predictive usefulness, NOT of whether the feature causes the outcome.; Appears in: Tree-Based Ensembles: Random Forests and Gradient Boosting
variance function: The mathematical rule that connects the mean of a count distribution to its variance; for the negative binomial (NB-2), this is Var = mean + dispersion-parameter times mean squared, so variance grows faster than the mean as counts get larger.; Appears in: Negative Binomial Distribution for Overdispersed Counts
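To make the NB-2 variance function concrete, the sketch below compares it with the Poisson variance (which equals the mean) at a few means. The dispersion value of 0.5 and the function name are hypothetical.

```python
def nb2_variance(mean, dispersion):
    """NB-2 variance function: Var = mean + dispersion * mean**2."""
    return mean + dispersion * mean ** 2


dispersion = 0.5  # hypothetical dispersion parameter
means = [1.0, 5.0, 10.0]
variances = [nb2_variance(m, dispersion) for m in means]
for m, v in zip(means, variances):
    # The ratio to the Poisson variance (= mean) grows as the mean grows.
    print(f"mean={m}: NB-2 variance={v}, ratio to Poisson={v / m}")
```

With the dispersion set to 0 the function returns the mean itself, which is the Poisson special case.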
variance inflation factor: A number that tells you how much a predictor's estimated effect is inflated in uncertainty because it is correlated with other predictors; a VIF of 4 means the standard error for that predictor is twice as wide as it would be if it were uncorrelated with the others.; Appears in: Regression Diagnostics and Model Checking
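The VIF-of-4 example above follows from the standard formula VIF = 1 / (1 − R²), where R² comes from regressing the predictor on all the other predictors. A minimal sketch with a hypothetical R² of 0.75:

```python
import math


def vif_from_r_squared(r_squared):
    """VIF for one predictor, given R-squared from regressing it on the other predictors."""
    return 1.0 / (1.0 - r_squared)


r2 = 0.75  # hypothetical: 75% of this predictor's variance is explained by the others
vif = vif_from_r_squared(r2)
se_multiplier = math.sqrt(vif)  # how much wider the standard error becomes
print(vif, se_multiplier)
```

The standard error scales with the square root of the VIF, which is why a VIF of 4 corresponds to a doubling.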
version pinning: Recording and locking the exact identifier of the LLM (model name, version, and API snapshot) used during a study so that the extraction can be audited or reproduced later and a model update does not silently change the results.; Appears in: LLM-Assisted Data Abstraction and Evidence Work in RWE
vocabulary: A standardized coding system — such as SNOMED for diagnoses, RxNorm for drugs, or LOINC for lab tests — that OMOP uses as its common language across different data sources.; Appears in: OMOP Concept Set Development
vocabulary version: A dated release identifier (e.g., v5.0 2024-02-23) stamped on each Athena download; the same concept-set expression can resolve to different concept_ids on different vocabulary versions, so every reproducible study must record which version was used.; Appears in: OMOP Standardized Vocabularies (OHDSI/Athena)
void claim: A claim transaction that cancels a previously submitted claim in its entirety, usually carrying negative dollar amounts equal to the original; for institutional claims, the Type of Bill frequency digit is 8.; Appears in: Claim Adjustments, Reversals, and Denials
Volume 3 procedure code: A 3-to-4-digit ICD-9-CM code (e.g., 81.54 for total knee replacement) that describes a surgical or therapeutic procedure performed during a hospital stay; it appeared only on inpatient hospital claims and was replaced by ICD-10-PCS in October 2015.; Appears in: ICD-9-CM Legacy Diagnosis and Procedure Codes

W

Wald confidence interval: The most common type of confidence interval, calculated by multiplying a standard error by 1.96 (for a 95% interval) and then adding the result to, and subtracting it from, the estimate; it requires a well-behaved, finite standard error to be valid.; Appears in: Exact and Penalized-Likelihood Methods for Sparse Data
Wald standard error: The uncertainty estimate for an MLE-based coefficient, derived from how steeply curved the log-likelihood is at its peak; it is the SE shown next to every coefficient in a regression output table.; Appears in: Maximum Likelihood Estimation
washout: The requirement that the study drug did not appear in the patient's records during the baseline period, which is how researchers confirm someone is a truly new user rather than a continuing one.; Appears in: Active Comparator, New-User Design, New-User (Incident-User) Design, Prevalent New-User Design, Study Time Windows: Baseline, Observation, and Outcome Windows
washout (clean lookback): A stretch of time before day zero during which the patient must have no prior fills of the study drug, so you know this is their first-ever use.; Appears in: Time Zero (Index Date) Alignment
washout period: A required lookback window before a patient's study entry date, used to confirm they had no prior diagnosis or treatment being studied — valid only if the entire lookback falls inside observable time.; Appears in: Continuous Enrollment and Observable Time, Prevalent User Bias, Washout / Clean / Lookback Period
weight: The share of the standard population that falls in one age stratum; weights for all strata sum to 1.0 (or to the total standard population count), so the weighted sum is a proper average.; Appears in: Direct Standardization, Meta-Analysis of Observational Studies, Meta-Analysis of Randomized Controlled Trials
weight normalization: Dividing the absolute dose by the patient's body weight so that exposure can be compared fairly across patients of different sizes.; Appears in: Pediatric Dose Normalization
weighted kappa: A variant of Cohen's kappa for ordinal categories that penalizes near-miss disagreements (e.g., one grade apart) less severely than large disagreements (e.g., three grades apart).; Appears in: Agreement Statistics: Kappa, ICC, and Bland-Altman
weighting: Giving some patients more or less mathematical 'vote' so the two groups end up looking alike on the measured characteristics, like rebalancing a scale.; Appears in: Baseline Characteristics and Covariate Balance
weights: Numbers assigned to each donor region, all between 0 and 1 and summing to exactly 1, that determine how much each donor contributes to the synthetic control; they are chosen so the weighted blend matches the treated region's pre-intervention outcome as closely as possible.; Appears in: Synthetic Control Method
willingness to accept (WTA): The minimum gain in a desirable attribute (such as higher efficacy) that a respondent requires before they will accept a worse level of another attribute (such as a higher rate of side effects).; Appears in: Patient Preference Study (DCE / BWS)
willingness-to-pay threshold: The maximum a health system or payer is prepared to spend to gain one additional unit of health (such as one QALY); if the ICER is below this ceiling, the treatment is judged cost-effective.; Appears in: Cost-effectiveness Analysis (CEA), Probabilistic Sensitivity Analysis (PSA) for Health-Economic Models, Scenario Analysis and Deterministic Sensitivity Analysis (DSA), Value of Information Analysis (EVPI, EVPPI, EVSI)
willingness-to-pay threshold (λ): The maximum a payer or health system is willing to spend to gain one additional QALY — in the US a common benchmark is $100,000 per QALY.; Appears in: ICER and Net Monetary Benefit (NMB)
win ratio: The number of winning pairs divided by the number of losing pairs; above 1 means the treatment came out ahead.; Appears in: Win Ratio and Generalized Pairwise Comparisons
win, loss, tie: For a pair, the treated patient either did better (win), worse (loss), or could not be told apart from the control (tie).; Appears in: Win Ratio and Generalized Pairwise Comparisons
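A simplified sketch of the pairwise counting behind the win ratio. It assumes a single outcome where a larger value is better (for example, event-free days) and no censoring; a full analysis compares pairs on a hierarchy of outcomes. The function name and data are hypothetical.

```python
def pairwise_results(treated, control):
    """Compare every treated patient with every control patient on one outcome."""
    wins = losses = ties = 0
    for t in treated:
        for c in control:
            if t > c:
                wins += 1
            elif t < c:
                losses += 1
            else:
                ties += 1
    return wins, losses, ties


# Hypothetical event-free days; higher is better.
treated = [10, 20, 30]
control = [10, 15, 25]
wins, losses, ties = pairwise_results(treated, control)
win_ratio = wins / losses  # undefined when there are no losing pairs
print(wins, losses, ties, round(win_ratio, 2))
```

Tied pairs do not enter the ratio in this sketch; they affect only how many of the pairs are informative.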
winsorization: Replacing any cost above a chosen ceiling (such as the 99th-percentile value) with that ceiling value, so the patient stays in the dataset but their cost no longer pulls the average to extremes.; Appears in: Cost Outlier Handling (Winsorization, Trimming, Robust/Two-Part Models), Missing Data, Trimming, and Winsorization in RWE
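The contrast between winsorization and trimming can be shown with a toy cost list. The ceiling of 1,000 is a hypothetical stand-in for something like a 99th-percentile value, and the function names are illustrative.

```python
def winsorize_upper(costs, ceiling):
    """Cap each cost at the ceiling; every patient stays in the dataset."""
    return [min(c, ceiling) for c in costs]


def trim_upper(costs, ceiling):
    """Drop patients whose cost exceeds the ceiling; the population changes."""
    return [c for c in costs if c <= ceiling]


costs = [100, 200, 300, 400, 50_000]  # hypothetical annual costs, one extreme patient
ceiling = 1_000                        # hypothetical ceiling

winsorized = winsorize_upper(costs, ceiling)
trimmed = trim_upper(costs, ceiling)

print(len(winsorized), sum(winsorized) / len(winsorized))  # all patients kept
print(len(trimmed), sum(trimmed) / len(trimmed))            # extreme patient removed
```

Winsorizing keeps the denominator intact, while trimming shrinks it, so the two means describe different populations.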
within-group variance: How much individual patients scatter around their own group's average — the "noise" against which the between-group signal is measured; also called the residual or error variance.; Appears in: One-Way ANOVA
within-pair difference: The value computed by subtracting each patient's "before" measurement from their "after" measurement; the paired t-test performs all its arithmetic on these individual differences, not on the original measurements.; Appears in: Paired t-Test
within-person change: How a single patient's outcome shifts from visit to visit -- as opposed to between-person differences, which compare patients to each other.; Appears in: Mixed-Effects (Random-Effects) Models for Longitudinal RWE
within-person comparison: Comparing two time windows belonging to the same individual rather than comparing one group of people against another, so that fixed traits like genetics and baseline health automatically cancel out.; Appears in: Case-Time-Control Design, Self-Controlled Risk Interval (SCRI) Design
within-person control: Using the same individual's earlier time periods as the comparison group, so that everything stable about that person (age, sex, chronic diseases, lifestyle) is held constant by design.; Appears in: Case-Crossover Design
within-person correlation: The tendency for outcome values from the same patient to resemble each other more than values from different patients, simply because many patient characteristics stay stable over time.; Appears in: Longitudinal Outcomes Modeling
within-series proportion: A fraction calculated only among the collected patients (e.g., 5 of 8 = 62.5%), which describes this specific group but is not a population rate.; Appears in: Case Series
working correlation: The assumed pattern of similarity among repeated measurements within one person that GEE uses to be more efficient; the final estimate is valid even if this assumed pattern turns out to be wrong.; Appears in: GEE Population-Average (Marginal) Models

Y

Yates continuity correction: A small adjustment to the chi-square formula for 2x2 tables that makes the result more conservative; the modern consensus is that it over-corrects, and Fisher's exact test is the preferred alternative when sample sizes are small.; Appears in: Chi-Square Test of Independence

Z

z-score: The number of standard deviations a single value is above or below a reference mean, computed as (observed value minus reference mean) divided by reference standard deviation; a z-score of 2.0 means the value is two standard deviations above the mean.; Appears in: Normal Distribution and the Central Limit Theorem, Pediatric Growth and Development Endpoints in RWE
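The z-score formula above, written as a one-line function. The reference mean and standard deviation are hypothetical values, not taken from any growth chart.

```python
def z_score(observed, reference_mean, reference_sd):
    """(observed value - reference mean) / reference standard deviation."""
    return (observed - reference_mean) / reference_sd


# Hypothetical reference: mean 14.0, SD 2.0; observed value 18.0.
z = z_score(observed=18.0, reference_mean=14.0, reference_sd=2.0)
print(z)  # two standard deviations above the reference mean
```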
zero cell: A cell in a 2×2 table that contains no events (count = 0); Fisher's test can still compute a p-value in this case, but the odds ratio becomes undefined and requires special methods like Firth logistic regression to estimate.; Appears in: Exact and Penalized-Likelihood Methods for Sparse Data, Fisher's Exact Test
zero difference handling: What to do when a patient shows no change between the two time points; the original Wilcoxon method discards those pairs entirely, while the Pratt method keeps them but assigns them a rank without a sign, affecting how the test statistic is computed.; Appears in: Wilcoxon Signed-Rank Test