Dictionary of Statistics (SAGE 2004)


The SAGE Dictionary of Statistics


The SAGE Dictionary of Statistics: a practical resource for students in the social sciences

Duncan Cramer and Dennis Howitt

SAGE Publications London • Thousand Oaks • New Delhi


© Duncan Cramer and Dennis Howitt 2004 First published 2004

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act, 1988, this publication may be reproduced, stored or transmitted in any form, or by any means, only with the prior permission in writing of the publishers, or in the case of reprographic reproduction, in accordance with the terms of licences issued by the Copyright Licensing Agency. Inquiries concerning reproduction outside those terms should be sent to the publishers.

SAGE Publications Ltd 1 Oliver’s Yard 5 City Road London EC1Y 1SP

SAGE Publications Inc. 2455 Teller Road Thousand Oaks, California 91320

SAGE Publications India Pvt Ltd B-42, Panchsheel Enclave Post Box 4109 New Delhi 110 017

British Library Cataloguing in Publication data

A catalogue record for this book is available from the British Library

ISBN 0 7619 4137 1 ISBN 0 7619 4138 X (pbk)

Library of Congress Control Number: 2003115348

Typeset by C&M Digitals (P) Ltd. Printed in Great Britain by The Cromwell Press Ltd, Trowbridge, Wiltshire



Preface
Some Common Statistical Notation
Some Useful Sources


To our mothers – it is not their fault that lexicography took its toll.


Writing a dictionary of statistics is not many people’s idea of fun. And it wasn’t ours. Can we say that we have changed our minds about this at all? No. Nevertheless, now the reading and writing is over and those heavy books have gone back to the library, we are glad that we wrote it. Otherwise we would have had to buy it. The dictionary provides a valuable resource for students – and anyone else with too little time on their hands to stack their shelves with scores of specialist statistics textbooks.

Writing a dictionary of statistics is one thing – writing a practical dictionary of statistics is another. The entries had to be useful, not merely accurate. Accuracy is not that useful on its own. One aspect of the practicality of this dictionary is in facilitating the learning of statistical techniques and concepts. The dictionary is not intended to stand alone as a textbook – there are plenty of those. We hope that it will be more important than that. Perhaps only the computer is more useful. Learning statistics is a complex business. Inevitably, students at some stage need to supplement their textbook. A trip to the library or the statistics lecturer’s office is daunting. Getting a statistics dictionary from the shelf is the lesser evil. And just look at the statistics textbook next to it – you probably outgrew its usefulness when you finished the first year at university.

Few readers, not even ourselves, will ever use all of the entries in this dictionary. That would be a bit like stamp collecting. Nevertheless, all of the important things are here in a compact and accessible form for when they are needed. No doubt there are omissions but even The Collected Works of Shakespeare leaves out Pygmalion! Let us know of any. And we are not so clever that we will not have made mistakes. Let us know if you spot any of these too – modern publishing methods sometimes allow corrections without a major reprint.

Many of the key terms used to describe statistical concepts are included as entries elsewhere. Where we thought it useful we have suggested other entries that are related to the entry that might be of interest by listing them at the end of the entry under ‘See’ or ‘See also’. In the main body of the entry itself we have not drawn attention to the terms that are covered elsewhere because we thought this could be too distracting to many readers. If you are unfamiliar with a term we suggest you look it up.

Many of the terms described will be found in introductory textbooks on statistics. We suggest that if you want further information on a particular concept you look it up in a textbook that is ready to hand. There are a large number of introductory statistics texts that adequately discuss these terms and we would not want you to seek out a particular text that we have selected that is not readily available to you. For the less common terms we have recommended one or more sources for additional reading. The authors and year of publication for these sources are given at the end of the entry and full details of the sources are provided at the end of the book. As we have discussed some of these terms in texts that we have written, we have sometimes recommended our own texts!

The key features of the dictionary are:

• Compact and detailed descriptions of key concepts.
• Basic mathematical concepts explained.
• Details of procedures for hand calculations if possible.
• Difficulty level matched to the nature of the entry: very fundamental concepts are the most simply explained; more advanced statistics are given a slightly more sophisticated treatment.
• Practical advice to help guide users through some of the difficulties of the application of statistics.
• Exceptionally wide coverage and varied range of concepts, issues and procedures – wider than any single textbook by far.
• Coverage of relevant research methods.
• Compatible with standard statistical packages.
• Extensive cross-referencing.
• Useful additional reading.

One good thing, we guess, is that since this statistics dictionary would be hard to distinguish from a two-author encyclopaedia of statistics, we will not need to write one ourselves.

Duncan Cramer Dennis Howitt


Some Common Statistical Notation

Roman letter symbols or abbreviations:

a         constant
df        degrees of freedom
F         F test
log n     natural or Napierian logarithm
M         arithmetic mean
MS        mean square
n or N    number of cases in a sample
p         probability
r         Pearson’s correlation coefficient
R         multiple correlation
SD        standard deviation
SS        sum of squares
t         t test

Greek letter symbols:

α    (lower case alpha) Cronbach’s alpha reliability, significance level or alpha error
β    (lower case beta) regression coefficient, beta error
γ    (lower case gamma)
δ    (lower case delta)
η    (lower case eta)
κ    (lower case kappa)
λ    (lower case lambda)
ρ    (lower case rho)
τ    (lower case tau)
φ    (lower case phi)
χ    (lower case chi)


Some common mathematical symbols:

Σ    sum of
∞    infinity
=    equal to
<    less than
≤    less than or equal to
>    greater than
≥    greater than or equal to
√    square root


a posteriori tests: see post hoc tests

a priori comparisons or tests: where there are three or more means that may be compared (e.g. analysis of variance with three groups), one strategy is to plan the analysis in advance of collecting the data (or examining them). So, in this context, a priori means before the data analysis. (Obviously this would only apply if the researcher was not the data collector, otherwise it is in advance of collecting the data.) This is important because the process of deciding what groups are to be compared should be on the basis of the hypotheses underlying the planning of the research. By definition, this implies that the researcher is generally uninterested in general or trivial aspects of the data which are not the researcher’s primary focus. As a consequence, only a few of the possible comparisons need to be made, as these contain the crucial information relative to the researcher’s interests. Table A.1 involves a simple ANOVA design in which there are four conditions – two are drug treatments and there are two control conditions. There are two control conditions because in one case the placebo tablet is for drug A and in the other case the placebo tablet is for drug B.

An appropriate a priori comparison strategy in this case would be:

• Mean_a against Mean_b
• Mean_a against Mean_c
• Mean_b against Mean_d

Notice that this is fewer than the maximum number of comparisons that could be made (a total of six). This is because the researcher has ignored issues which perhaps are of little practical concern in terms of evaluating the effectiveness of the different drugs. For example, comparing placebo control A with placebo control B answers questions about the relative effectiveness of the placebo conditions but has no bearing on which drug is the most effective overall.

The a priori approach needs to be compared with perhaps the more typical alternative research scenario – post hoc comparisons. The latter involves an unplanned analysis of the data following their collection. While this may be a perfectly adequate process, it is nevertheless far less clearly linked with the established priorities of the research than a priori comparisons. In post hoc testing, there tends to be an exhaustive examination of all of the possible pairs of means – so in the example in Table A.1 all four means would be compared with each other in pairs. This gives a total of six different comparisons.
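The counts in this example can be checked with a short sketch. It simply enumerates every pair of the four conditions in Table A.1 and sets them beside the three planned comparisons; the variable names are illustrative assumptions and not part of the dictionary.

```python
from itertools import combinations

# Condition labels taken from Table A.1
conditions = ["Drug A", "Drug B", "Placebo control A", "Placebo control B"]

# Every possible pair of means, as examined in exhaustive post hoc testing
all_pairs = list(combinations(conditions, 2))

# The three planned (a priori) comparisons from the entry
planned_pairs = [
    ("Drug A", "Drug B"),
    ("Drug A", "Placebo control A"),
    ("Drug B", "Placebo control B"),
]

print(len(all_pairs))      # 6
print(len(planned_pairs))  # 3
```

With four means there are 4 × 3 / 2 = 6 possible pairs, so the planned strategy examines half of them.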

In a priori testing, it is not necessary to carry out the overall ANOVA since this merely tests whether there are differences across the various means. In these circumstances, failure of some means to differ from the others may produce non-significant findings due to conditions which are of little or no interest to the researcher. In a priori testing, the number of comparisons to be made has been limited to a small number of key comparisons. It is generally accepted that if there are relatively few a priori comparisons to be made, no adjustment is needed for the number of comparisons made. One rule of thumb is that if the comparisons are fewer in total than the degrees of freedom for the main effect minus one, it is perfectly appropriate to compare means without adjustment for the number of comparisons.

Table A.1 A simple ANOVA design

Drug A      Drug B      Placebo control A    Placebo control B
Mean_a =    Mean_b =    Mean_c =             Mean_d =

Contrasts are examined in a priori testing. This is a system of weighting the means in order to obtain the appropriate mean difference when comparing two means. One mean is weighted (multiplied by) 1 and the other is weighted −1. The other means are weighted 0. The consequence of this is that the two key means are responsible for the mean difference. The other means (those not of interest) become zero and are always in the centre of the distribution and hence cannot influence the mean difference.
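As a minimal sketch of this weighting, the following multiplies each condition mean by its contrast weight and sums the results. The four mean values are hypothetical numbers invented purely for illustration; only the weights of +1, −1 and 0 come from the description of contrasts.

```python
# Hypothetical condition means (illustrative values, not from the source)
means = {"a": 12.0, "b": 9.0, "c": 7.0, "d": 6.5}

# Contrast weights for comparing Mean_a with Mean_b:
# +1 and -1 for the two means of interest, 0 for the others
weights = {"a": 1, "b": -1, "c": 0, "d": 0}

# Weighted sum of the means; zero-weighted means contribute nothing
contrast = sum(weights[k] * means[k] for k in means)

print(contrast)  # 3.0
```

The result equals Mean_a minus Mean_b, and the weights sum to zero, so the two conditions that are not of interest cannot influence the difference.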

There is an elegance and efficiency in the a priori comparison strategy. However, it does require an advanced level of statistical and research sophistication. Consequently, the more exhaustive procedure of the post hoc test (multiple comparisons test) is more familiar in the research literature. See also: analysis of variance; Bonferroni test; contrast; Dunn’s test; Dunnett’s C test; Dunnett’s T3 test; Dunnett’s test; Dunn–Sidak multiple comparison test; omnibus test; post hoc tests

abscissa: this is the horizontal or x axis in a graph. See x axis

absolute deviation: this is the difference between one numerical value and another numerical value. Negative values are ignored as we are simply measuring the distance between the two numbers. Most commonly, absolute deviation in statistics is the difference between a score and the mean (or sometimes median) of the set of scores. Thus, the absolute deviation of a score of 9 from the mean of 5 is 4. The absolute deviation of a score of 3 from the mean of 5 is 2 (Figure A.1). One advantage of the absolute deviation over the signed deviation is that absolute deviations from the mean do not total (or average) to 0.0 for a set of scores, and so give some indication of the variability of the scores. See also: mean deviation; mean, arithmetic

acquiescence or yea-saying response set or style: this is the tendency to agree or to say ‘yes’ to a series of questions. This tendency is the opposite of disagreeing or saying ‘no’ to a set of questions, sometimes called a nay-saying response set. If agreeing or saying ‘yes’ to a series of questions results in a high score on the variable that those questions are measuring, such as being anxious, then a high score on the questions may indicate either greater anxiety or a tendency to agree. To control or to counteract this tendency, half of the questions may be worded in the opposite or reverse way so that if a person has a tendency to agree the tendency will cancel itself out when the two sets of items are combined.

adding: see negative values


Figure A.1 Absolute deviations
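The contrast between signed and absolute deviations can be seen in a small worked sketch. The set of scores is a hypothetical one, chosen so that its mean is 5 and so that it includes the scores of 9 and 3 used in the absolute deviation entry.

```python
# Hypothetical scores, chosen so that their mean is 5
scores = [9, 3, 5, 3]
mean = sum(scores) / len(scores)

# Signed deviations keep the direction of each difference from the mean
deviations = [x - mean for x in scores]

# Absolute deviations ignore the sign
absolute_deviations = [abs(d) for d in deviations]

print(sum(deviations))                          # 0.0
print(sum(absolute_deviations))                 # 8.0
print(sum(absolute_deviations) / len(scores))   # 2.0
```

The signed deviations cancel out to zero, whereas the absolute deviations total 8 and average 2, which says something about how spread out the scores are.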

addition rule: a simple principle of probability theory is that the probability of either of two different, mutually exclusive outcomes occurring is the sum of the separate probabilities for those two outcomes (Figure A.2). So, the probability of a die landing 3 is 1 divided by 6 (i.e. 0.167) and the probability of a die landing 5 is 1 divided by 6 (i.e. 0.167 again). The probability of getting either a 3 or a 5 when tossing a die is the sum of the two separate probabilities (i.e. 0.167 + 0.167 = 0.33). Of course, the probability of getting any of the numbers from 1 to 6 spots is 1.0 (i.e. the sum of six probabilities of 0.167).
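The die example can be reproduced with exact fractions, which avoids the rounding involved in writing 1/6 as 0.167. This is only a sketch of the arithmetic for a fair six-sided die; the variable names are assumptions made for illustration.

```python
from fractions import Fraction

# Probability of each face on a fair six-sided die
p_three = Fraction(1, 6)
p_five = Fraction(1, 6)

# Addition rule: the two outcomes cannot both occur on one throw,
# so their probabilities are simply added
p_either = p_three + p_five
print(p_either)                   # 1/3
print(round(float(p_either), 3))  # 0.333

# Summing the probabilities of all six faces gives certainty
p_any = sum(Fraction(1, 6) for _ in range(6))
print(p_any)                      # 1
```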

adjusted means, analysis of covariance: see analysis of covariance

agglomeration schedule: a table that shows which variables or clusters of variables are paired together at different stages of a cluster analysis. See cluster analysis

Cramer (2003)

algebra: in algebra numbers are represented as letters and other symbols when giving equations or formulae. Algebra therefore is the basis of statistical equations. So a typical example is the formula for the mean:

$$m = \frac{\sum X}{N}$$

In this, m stands for the numerical value of the mean, X is the numerical value of a score, N is the number of scores and Σ is the symbol indicating in this case that all of the scores under consideration should be added together.

One difficulty in statistics is that there is a degree of inconsistency in the use of the symbols for different things. So generally speaking, if a formula is used it is important to indicate what you mean by the letters in a separate key.
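To show how the symbols in the formula for the mean translate into a calculation, here is a minimal sketch using a hypothetical set of scores. The comments act as the kind of separate key recommended above.

```python
# Key to the symbols:
#   X = an individual score
#   N = the number of scores
#   m = the arithmetic mean
scores = [2, 4, 6, 8]  # hypothetical scores, for illustration only

N = len(scores)        # number of scores
total = sum(scores)    # the sum of all the scores (sigma X)
m = total / N          # the mean

print(m)  # 5.0
```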

algorithm: this is a set of steps which describe the process of doing a particular calculation or solving a problem. It is a common term to use to describe the steps in a computer program to do a particular calculation. See also: heuristic

alpha error: see Type I or alpha error

alpha (α) reliability, Cronbach’s: one of a number of measures of the internal consistency of items on questionnaires, tests and other instruments. It is used when all the items on the measure (or some of the items) are intended to measure the same concept (for example, a personality trait such as neuroticism). When a measure is internally consistent, all of the individual questions or items making up that measure should correlate well with the others. One traditional way of checking this is split-half reliability in which the items making up the measure are split into two sets (odd-numbered items versus
