Data Mining

Data Mining (DM) is the use of algorithms to discover patterns in data and retrieve information that enhances human understanding and decision-making (Dunham, 2003, pp. 9, 44). Many of the algorithms introduced in the fields of Artificial Intelligence and Machine Learning are now used for DM tasks. Statistics is often an important part of validating the resulting models and of checking that their predictions are statistically significant.

Descriptive models aim to identify patterns or relationships in data in order to understand or express the nature of the data. Clustering is an unsupervised learning technique that groups similar data items; clusters are discovered from the similarities and differences among the items rather than from predefined labels. Bayesian classification or other statistical techniques may be useful here. Summarisation is one application of clustering and seeks to provide high-level summaries of the data. Association rules look for correlations by examining how attributes behave together across the data set. Note that the resulting rules describe observed co-occurrence and do not necessarily imply causation. Sequence discovery seeks association rules that are based on the order of events in time.
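As an illustration of how an association rule can be scored, the sketch below computes the two standard measures, support and confidence, for a rule of the form "bread implies milk". The transactions, item names, and function names are hypothetical and chosen only for illustration; they are not taken from Dunham (2003).

```python
# Hypothetical market-basket transactions (illustrative data only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of the combined itemset divided by support of the antecedent.

    This estimates how often the consequent appears in transactions
    that already contain the antecedent.
    """
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Score the rule {bread} -> {milk}.
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence)
```

A high confidence says only that the items tend to occur together in the observed data; it does not show that buying one causes the purchase of the other.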

Predictive models allow future values to be predicted, for example from historical or correlated data. Classification is a supervised learning technique in which data are assigned to predefined categories. Pattern recognition seeks to discover and define the rules that determine when a particular classification is appropriate. Time series analysis is used to determine correlations, to confirm models, and to predict future values. Regression analysis can be used for time series or other analysis: for example, in linear regression we fit a linear model to the data and minimise the error of the fit by adjusting the coefficients of the model.
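The linear regression idea can be sketched for the simplest case of one input variable. The example assumes that the error being minimised is the sum of squared residuals (ordinary least squares), which is a common choice but is not specified in the text above; the data and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points.

    Uses the closed-form solution for one input variable:
    slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_error(xs, ys, slope, intercept):
    """Sum of squared differences between observed and predicted values."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical observations that lie exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
error = sum_squared_error(xs, ys, slope, intercept)

# Use the fitted coefficients to predict a value outside the observed range.
prediction = slope * 5 + intercept
print(slope, intercept, error, prediction)
```

With real, noisy data the error would not reach zero; the fitted coefficients are simply those that make the squared error as small as possible for the chosen linear model.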

Reference Book and Recommended Text Book

Dunham, M. (2003). Data Mining: Introductory and Advanced Topics. New Jersey: Pearson Education Inc. ISBN 0-13-088892-3.

Summary written by

Megan Vazey
Department of Computing, Macquarie University
Freyatech Pty Ltd
March 2006