Statistics

Statistics is the field of inquiry concerned with the collection, analysis and presentation of quantitative empirical data. The use of statistical methods forms the cornerstone of quantitative experimental research in most scientific disciplines, not only in the social sciences such as economics, psychology and sociology, but also in biology, medicine, physics and chemistry. Statistical heuristics are employed in applied computational fields such as data mining and computational linguistics.

Scientists are generally interested in the effect that one variable (a predictor variable) has on another variable (a criterion variable). Typically an experiment involves recording measurements of the criterion variable on small subsets, or samples, taken from the larger population of entities under study, with the samples differing from each other on the predictor variable. But there is inherent variability in empirically derived data, because any one variable can be influenced by an arbitrary number of factors, and because any particular sample will not perfectly reflect the properties of the population. A central concern of statistics is to deal with this variability.
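
The point that no single sample perfectly reflects its population can be illustrated with a small simulation. The following is a minimal sketch, not part of the original summary: the population (10,000 normally distributed values with mean 100 and standard deviation 15), the sample size of 25 and the random seed are all assumptions chosen purely for illustration.

```python
import random
import statistics

# Hypothetical population: 10,000 values drawn from a normal distribution
# (mean 100, standard deviation 15). These numbers are illustrative only.
rng = random.Random(42)  # fixed seed so the run is reproducible
population = [rng.gauss(100, 15) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Draw several small random samples and compare their means.
sample_means = [
    statistics.mean(rng.sample(population, 25))
    for _ in range(5)
]

print(f"population mean: {population_mean:.2f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean:   {m:.2f} (differs by {m - population_mean:+.2f})")
```

Each sample mean lands near the population mean but not exactly on it, and the samples also differ from one another. This is the variability that statistical methods are designed to handle.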

The methods of descriptive statistics usually come into play in the early, exploratory stages of data analysis, and provide measures of, e.g., central tendency (such as mean, median or modal values) or variability (such as variance or the standard deviation).
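
As a concrete illustration of these measures, here is a minimal sketch using Python's standard `statistics` module. The seven data values are hypothetical and were chosen so that the results are easy to check by hand. The variance and standard deviation shown are the sample versions, which divide by n - 1.

```python
import statistics

# Hypothetical sample of seven measurements (illustrative values only).
data = [2, 4, 4, 5, 7, 9, 11]

# Measures of central tendency
mean = statistics.mean(data)      # arithmetic average: 42 / 7 = 6
median = statistics.median(data)  # middle value of the sorted data: 5
mode = statistics.mode(data)      # most frequent value: 4

# Measures of variability (sample versions, dividing by n - 1)
variance = statistics.variance(data)  # sum of squared deviations 60 / 6 = 10
std_dev = statistics.stdev(data)      # square root of the variance

print(mean, median, mode, variance, round(std_dev, 3))
```

Note that the three measures of central tendency disagree here (6, 5 and 4), which is typical of skewed data and one reason more than one measure is usually reported.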

The methods of inferential statistics allow us to extrapolate from the sample in order to infer properties of the larger population. This could take the form of data modelling, i.e. trying to obtain an explicit formula that describes the contribution of the various predictor variables to the value of the criterion variable. A commonly used technique is multiple linear regression. The other common use of inferential statistics is to test an experimental hypothesis that one variable influences the value of another variable. (The default, or null, hypothesis is that there is no such influence.) The researcher calculates a standard test statistic whose values, if the null hypothesis is true, are known to follow a characteristic sampling distribution. The probability p of obtaining a value at least as extreme as the one observed, assuming the null hypothesis is true, is then determined. If p falls below a significance level chosen in advance (conventionally 0.05), the null hypothesis is rejected and the experiment is interpreted as supporting the experimental hypothesis. Commonly used statistical procedures yielding test statistics include the t-test, Pearson's chi-square test, and the analysis of variance (ANOVA).
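
The logic of hypothesis testing can be sketched in a few lines of code. The example below uses a permutation test, which is not one of the procedures named above; it is substituted here because it needs only the standard library and makes the reasoning explicit. If the null hypothesis is true, the group labels are interchangeable, so the sampling distribution of the test statistic can be approximated by shuffling the labels many times. The function name `permutation_test`, the data values, the number of permutations and the significance level of 0.05 are all assumptions made for illustration.

```python
import random
import statistics

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the observed difference and an estimated p-value: the share of
    random relabellings whose mean difference is at least as extreme as the
    observed one.
    """
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling of the pooled measurements
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction: counts the observed labelling itself, so the
    # estimate is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value

# Hypothetical criterion measurements for two levels of a predictor variable.
treatment = [24.1, 26.3, 25.8, 27.0, 26.1, 25.4, 27.2, 26.6]
control = [22.9, 23.5, 24.0, 22.1, 23.8, 24.4, 23.0, 23.3]

alpha = 0.05  # significance level, chosen before looking at the data
observed_diff, p = permutation_test(treatment, control)
print(f"observed difference in means: {observed_diff:.2f}")
print(f"estimated p-value: {p:.4f}")
print("reject null hypothesis" if p < alpha else "fail to reject null hypothesis")
```

A t-test applied to the same data would follow the same pattern, except that the sampling distribution of the test statistic is known analytically rather than approximated by shuffling.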

The field of statistics is rapidly changing, with the prevalence of increasingly powerful computers allowing the development of statistical methods that have until recently been regarded as intractable. Examples include multivariate analysis of variance, discriminant analysis, factor analysis and cluster analysis.

References

Howell, D.C. (1997). Statistical Methods for Psychology, 4th Ed. Duxbury, Boston, Mass.
Kohler, H. (1988). Statistics for Business and Economics, 2nd Ed. Scott, Foresman and Co., Glenview, Ill.
Fienberg, S.E. & Kadane, J.B. (2001). Statistics: the field. In Baltes, P.B. & Smelser, N.J. (Eds.), International encyclopedia of the social & behavioral sciences, Elsevier, Amsterdam.

Recommended Textbook

Howell, D.C. (2006). Statistical Methods for Psychology, 6th Ed. Wadsworth Publishing.

Summary Written By

Richard Leibbrandt
School of Informatics and Engineering
Flinders University of South Australia