Statistics for Epidemiology-Nicholas P. Jewell-1584884339-CRC-2003-352-$94


Statistics for Epidemiology

Texts in Statistical Science Series

Series Editors

Chris Chatfield, University of Bath, UK

Martin Tanner, Northwestern University, USA

Jim Zidek, University of British Columbia, Canada

Analysis of Failure and Survival Data Peter J.Smith

The Analysis and Interpretation of Multivariate Data for Social Scientists David J.Bartholomew, Fiona Steele, Irini Moustaki, and Jane Galbraith

The Analysis of Time Series—An Introduction, Sixth Edition Chris Chatfield

Applied Bayesian Forecasting and Time Series Analysis A.Pole, M.West and J.Harrison

Applied Nonparametric Statistical Methods, Third Edition P.Sprent and N.C.Smeeton

Applied Statistics—Handbook of GENSTAT Analysis E.J.Snell and H.Simpson

Applied Statistics—Principles and Examples D.R.Cox and E.J.Snell

Bayes and Empirical Bayes Methods for Data Analysis, Second Edition Bradley P.Carlin and Thomas A.Louis

Bayesian Data Analysis, Second Edition Andrew Gelman, John B.Carlin, Hal S.Stern, and Donald B.Rubin

Beyond ANOVA—Basics of Applied Statistics R.G.Miller, Jr.

Computer-Aided Multivariate Analysis, Third Edition A.A.Afifi and V.A.Clark

A Course in Categorical Data Analysis T.Leonard

A Course in Large Sample Theory T.S.Ferguson

Data Driven Statistical Methods P.Sprent

Decision Analysis—A Bayesian Approach J.Q.Smith

Elementary Applications of Probability Theory, Second Edition H.C.Tuckwell

Elements of Simulation B.J.T.Morgan

Epidemiology—Study Design and Data Analysis M.Woodward

Essential Statistics, Fourth Edition D.A.G.Rees

A First Course in Linear Model Theory Nalini Ravishanker and Dipak K.Dey

Interpreting Data—A First Course in Statistics A.J.B.Anderson

An Introduction to Generalized Linear Models, Second Edition A.J.Dobson

Introduction to Multivariate Analysis C.Chatfield and A.J.Collins

Introduction to Optimization Methods and their Applications in Statistics B.S.Everitt

Large Sample Methods in Statistics P.K.Sen and J.da Motta Singer

Markov Chain Monte Carlo—Stochastic Simulation for Bayesian Inference D.Gamerman

Mathematical Statistics K.Knight

Modeling and Analysis of Stochastic Systems V.Kulkarni

Modelling Binary Data, Second Edition D.Collett

Modelling Survival Data in Medical Research, Second Edition D.Collett

Multivariate Analysis of Variance and Repeated Measures—A Practical Approach for Behavioural Scientists D.J.Hand and C.C.Taylor

Multivariate Statistics—A Practical Approach B.Flury and H.Riedwyl

Practical Data Analysis for Designed Experiments B.S.Yandell

Practical Longitudinal Data Analysis D.J.Hand and M.Crowder

Practical Statistics for Medical Research D.G.Altman

Probability—Methods and Measurement A.O’Hagan

Problem Solving—A Statistician’s Guide, Second Edition C.Chatfield

Randomization, Bootstrap and Monte Carlo Methods in Biology, Second Edition B.F.J.Manly

Readings in Decision Analysis S.French

Sampling Methodologies with Applications Poduri S.R.S.Rao

Statistical Analysis of Reliability Data M.J.Crowder, A.C.Kimber, T.J.Sweeting, and R.L.Smith

Statistical Methods for SPC and TQM D.Bissell

Statistical Methods in Agriculture and Experimental Biology, Second Edition R.Mead, R.N.Curnow, and A.M.Hasted

Statistical Process Control—Theory and Practice, Third Edition G.B.Wetherill and D.W.Brown

Statistical Theory, Fourth Edition B.W.Lindgren

Statistics for Accountants, Fourth Edition S.Letchford

Statistics for Epidemiology Nicholas P.Jewell

Statistics for Technology—A Course in Applied Statistics, Third Edition C.Chatfield

Statistics in Engineering—A Practical Approach A.V.Metcalfe

Statistics in Research and Development, Second Edition R.Caulcutt

Survival Analysis Using S—Analysis of Time-to-Event Data Mara Tableman and Jong Sung Kim

The Theory of Linear Models B.Jørgensen

Statistics for Epidemiology Nicholas P.Jewell

A CRC Press Company Boca Raton London New York Washington, D.C.

This edition published in the Taylor & Francis e-Library, 2009.

To purchase your own copy of this or any of Taylor & Francis or Routledge’s collection of thousands of eBooks please go to

Datasets and solutions to exercises can be downloaded at downloads/.

Send correspondence to Nicholas P.Jewell, Division of Biostatistics, School of Public Health, 140 Warren Hall #7360, University of California, Berkeley, CA 94720, USA. Phone: 510-642-4627, Fax: 510-643-5163, e-mail:

Library of Congress Cataloging-in-Publication Data

Statistics for epidemiology/by Nicholas P.Jewell.—(Texts in statistical science series; 58) Includes bibliographical references and index.

ISBN 1-58488-433-9 (alk. paper) 1. Epidemiology—Statistical methods. I. Jewell, Nicholas P, 1952– I. Texts in statistical science.

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without prior permission in writing from the publisher.

The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC for such copying.

Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation, without intent to infringe.

Visit the CRC Press Web site at

© 2004 by Chapman & Hall/CRC

No claim to original U.S. Government works

ISBN 0-203-49686-8 Master e-book ISBN

ISBN 0-203-59461-4 (Adobe ebook Reader Format)

International Standard Book Number 1-58488-433-9

Library of Congress Card Number 2003051458

To Debra and Britta, my very soul of life


Contents

1 Introduction
1.1 Disease processes
1.2 Statistical approaches to epidemiological data
1.2.1 Study design
1.2.2 Binary outcome data
1.3 Causality
1.4 Overview
1.4.1 Caution: what is not covered
1.5 Comments and further reading

2 Measures of Disease Occurrence
2.1 Prevalence and incidence
2.2 Disease rates
2.2.1 The hazard function
2.3 Comments and further reading
2.4 Problems

3 The Role of Probability in Observational Studies
3.1 Simple random samples
3.2 Probability and the incidence proportion
3.3 Inference based on an estimated probability
3.4 Conditional probabilities
3.4.1 Independence of two events
3.5 Example of conditional probabilities—Berkson’s bias
3.6 Comments and further reading
3.7 Problems

4 Measures of Disease-Exposure Association
4.1 Relative risk
4.2 Odds ratio
4.3 The odds ratio as an approximation to the relative risk
4.4 Symmetry of roles of disease and exposure in the odds ratio
4.5 Relative hazard
4.6 Excess risk
4.7 Attributable risk
4.8 Comments and further reading
4.9 Problems

5 Study Designs
5.1 Population-based studies
5.1.1 Example—mother’s marital status and infant birthweight
5.2 Exposure-based sampling—cohort studies
5.3 Disease-based sampling—case-control studies
5.4 Key variants of the case-control design
5.4.1 Risk-set sampling of controls
5.4.2 Case-cohort studies
5.5 Comments and further reading
5.6 Problems

6 Assessing Significance in a 2 × 2 Table
6.1 Population-based designs
6.1.1 Role of hypothesis tests and interpretation of p-values
6.2 Cohort designs
6.3 Case-control designs
6.3.1 Comparison of the study designs
6.4 Comments and further reading
6.4.1 Alternative formulations of the χ² test statistic
6.4.2 When is the sample size too small to do a χ² test?
6.5 Problems

7 Estimation and Inference for Measures of Association
7.1 The odds ratio
7.1.1 Sampling distribution of the odds ratio
7.1.2 Confidence interval for the odds ratio
7.1.3 Example—coffee drinking and pancreatic cancer
7.1.4 Small sample adjustments for estimators of the odds ratio
7.2 The relative risk
7.2.1 Example—coronary heart disease in the Western Collaborative Group Study
7.3 The excess risk
7.4 The attributable risk
7.5 Comments and further reading
7.5.1 Measurement error or misclassification
7.6 Problems

8 Causal Inference and Extraneous Factors: Confounding and Interaction
8.1 Causal inference
8.1.1 Counterfactuals
8.1.2 Confounding variables
8.1.3 Control of confounding by stratification
8.2 Causal graphs
8.2.1 Assumptions in causal graphs
8.2.2 Causal graph associating childhood vaccination to subsequent health condition
8.2.3 Using causal graphs to infer the presence of confounding
8.3 Controlling confounding in causal graphs
8.3.1 Danger: controlling for colliders
8.3.2 Simple rules for using a causal graph to choose the crucial confounders
8.4 Collapsibility over strata
8.5 Comments and further reading
8.6 Problems

9 Control of Extraneous Factors
9.1 Summary test of association in a series of 2 × 2 tables
9.1.1 The Cochran-Mantel-Haenszel test
9.1.2 Sample size issues and a historical note
9.2 Summary estimates and confidence intervals for the odds ratio, adjusting for confounding factors
9.2.1 Woolf’s method on the logarithm scale
9.2.2 The Mantel-Haenszel method
9.2.3 Example—the Western Collaborative Group Study: part 2
9.2.4 Example—coffee drinking and pancreatic cancer: part 2
9.3 Summary estimates and confidence intervals for the relative risk, adjusting for confounding factors
9.3.1 Example—the Western Collaborative Group Study: part 3
9.4 Summary estimates and confidence intervals for the excess risk, adjusting for confounding factors
9.4.1 Example—the Western Collaborative Group Study: part 4
9.5 Further discussion of confounding
9.5.1 How do adjustments for confounding affect precision?
9.5.2 An empirical approach to confounding
9.6 Comments and further reading
9.7 Problems

10 Interaction
10.1 Multiplicative and additive interaction
10.1.1 Multiplicative interaction
10.1.2 Additive interaction
10.2 Interaction and counterfactuals
10.3 Test of consistency of association across strata
10.3.1 The Woolf method
10.3.2 Alternative tests of homogeneity
10.3.3 Example—the Western Collaborative Group Study: part 5
10.3.4 The power of the test for homogeneity
10.4 Example of extreme interaction
10.5 Comments and further reading
10.6 Problems

11 Exposures at Several Discrete Levels
11.1 Overall test of association
11.2 Example—coffee drinking and pancreatic cancer: part 3
11.3 A test for trend in risk
11.3.1 Qualitatively ordered exposure variables
11.3.2 Goodness of fit and nonlinear trends in risk
11.4 Example—the Western Collaborative Group Study: part 6
11.5 Example—coffee drinking and pancreatic cancer: part 4
11.6 Adjustment for confounding, exact tests, and interaction
11.7 Comments and further reading
11.8 Problems

12 Regression Models Relating Exposure to Disease
12.1 Some introductory regression models
12.1.1 The linear model
12.1.2 Pros and cons of the linear model
12.2 The log linear model
12.3 The probit model
12.4 The simple logistic regression model
12.4.1 Interpretation of logistic regression parameters
12.5 Simple examples of the models with a binary exposure
12.6 Multiple logistic regression model
12.6.1 The use of indicator variables for discrete exposures
12.7 Comments and further reading
12.8 Problems

13 Estimation of Logistic Regression Model Parameters
13.1 The likelihood function
13.1.1 The likelihood function based on a logistic regression model
13.1.2 Properties of the log likelihood function and the maximum likelihood estimate
13.1.3 Null hypotheses that specify more than one regression coefficient
13.2 Example—the Western Collaborative Group Study: part 7
13.3 Logistic regression with case-control data
13.4 Example—coffee drinking and pancreatic cancer: part 5
13.5 Comments and further reading
13.6 Problems

14 Confounding and Interaction within Logistic Regression Models
14.1 Assessment of confounding using logistic regression models
14.1.1 Example—the Western Collaborative Group Study: part 8
14.2 Introducing interaction into the multiple logistic regression model
14.3 Example—coffee drinking and pancreatic cancer: part 6
14.4 Example—the Western Collaborative Group Study: part 9
14.5 Collinearity and centering variables
14.5.1 Centering independent variables
14.5.2 Fitting quadratic models
14.6 Restrictions on effective use of maximum likelihood techniques
14.7 Comments and further reading
14.7.1 Measurement error
14.7.2 Missing data
14.8 Problems

15 Goodness of Fit Tests for Logistic Regression Models and Model Building
15.1 Choosing the scale of an exposure variable
15.1.1 Using ordered categories to select exposure scale
15.1.2 Alternative strategies
15.2 Model building
15.3 Goodness of fit
15.3.1 The Hosmer-Lemeshow test
15.4 Comments and further reading
15.5 Problems

16 Matched Studies
16.1 Frequency matching
16.2 Pair matching
16.2.1 Mantel-Haenszel techniques applied to pair-matched data
16.2.2 Small sample adjustment for odds ratio estimator
16.3 Example—pregnancy and spontaneous abortion in relation to coronary heart disease in women
16.4 Confounding and interaction effects
16.4.1 Assessing interaction effects of matching variables
16.4.2 Possible confounding and interactive effects due to nonmatching variables
16.5 The logistic regression model for matched data
16.5.1 Example—pregnancy and spontaneous abortion in relation to coronary heart disease in women: part 2
16.6 Example—the effect of birth order on respiratory distress syndrome in twins
16.7 Comments and further reading
16.7.1 When can we break the match?
16.7.2 Final thoughts on matching
16.8 Problems

17 Alternatives and Extensions to the Logistic Regression Model
17.1 Flexible regression model
17.2 Beyond binary outcomes and independent observations
17.3 Introducing general risk factors into formulation of the relative hazard—the Cox model
17.4 Fitting the Cox regression model
17.5 When does time at risk confound an exposure-disease relationship?
