Book Review
Data Mining Techniques : For Marketing, Sales and Customer Support

Review

2 stars

This book gives an overview of what data mining is and of the tools available to perform it: Market Basket Analysis, Memory-Based Reasoning, Automatic Cluster Detection, Link Analysis, Decision Trees and Artificial Neural Networks. Genetic Algorithms are also included; although not a data mining tool in themselves, they are being used to train neural nets.

In each case the authors describe the principles behind the tool, its strengths and weaknesses, and the applications to which it is suited. They give tips on the data preparation each tool requires, both in terms of data massaging (which is required for neural nets) and in terms of sample selection, indicating where it is important to select training sets that have approximately equal proportions of good & bad outcomes so that the tool predicts correctly.
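
The review does not say exactly how the book balances outcomes, so the following is only a minimal Python sketch of one common approach: random down-sampling of the more frequent outcomes until every outcome appears equally often. The function name `balance_by_downsampling` and its `label_of` parameter are hypothetical, chosen for illustration.

```python
import random


def balance_by_downsampling(records, label_of, seed=0):
    """Return a training set with equal numbers of each outcome.

    Hypothetical helper: records of the more common outcomes are
    randomly discarded until each outcome matches the rarest one.
    """
    groups = {}
    for record in records:
        groups.setdefault(label_of(record), []).append(record)

    smallest = min(len(group) for group in groups.values())
    rng = random.Random(seed)  # fixed seed so the sample is repeatable

    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)  # avoid long runs of a single outcome
    return balanced


if __name__ == "__main__":
    # 90 "good" and 10 "bad" outcomes become 10 of each.
    data = [("good", i) for i in range(90)] + [("bad", i) for i in range(10)]
    train = balance_by_downsampling(data, label_of=lambda r: r[0])
    print(len(train))
```

Down-sampling throws data away; when the rare outcome is very scarce, duplicating (over-sampling) its records is an alternative.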

The descriptions include simple examples to give an overview of how each tool works. But, as the title indicates, this book is for users who are considering using data mining tools. It does not describe how to use particular applications, nor does it include code examples (pseudo or actual) if you are interested in developing your own tools.

The book is easy to read and includes many examples from the authors' experience of data mining in the real world. Further information on topics covered in the book can be found at the authors' web site www.data-miners.com.
4 Sep 2001

Brief Description of Contents

Chap. 1.

Why Data Mining?

Describes a number of the possible applications in which data mining tools can be used.
Chap. 2.

The Virtuous Cycle of Data Mining

Discusses the data mining process within the context of the business, namely: 1. identifying the problem, 2. analysing the data, 3. taking action, 4. measuring the outcome.

The first & third stages are business issues; the second & fourth relate to the data mining tools.

The virtuous cycle combines selecting the right data mining tool and data and integrating them into the business.

Chap. 3.

The Virtuous Cycle in Practice

An overview of the process used in selected cases (from the wireless communications, automobile manufacturing and banking industries) where data mining was employed effectively to support marketing activities.
Chap. 4.

What Can Data Mining Do?

Describes, with an example based on a database of moviegoers, what data mining can do, namely Classification, Estimation, Prediction, Affinity Grouping, Clustering and Description.
Chap. 5.

Data Mining Methodology

This chapter concentrates on the second stage of the virtuous cycle, the actual data mining process. It describes the process behind the two basic styles of data mining, hypothesis testing and knowledge discovery. The chapter concludes with a real-life example of each type.
Chap. 6.

Measuring the Effectiveness of Data Mining

This chapter concentrates on the fourth stage of the virtuous cycle. Data mining is expensive, so its cost must be justified. This chapter describes how to evaluate the results of a data mining exercise for a business.
Chap. 7.

Overview of Data Mining Techniques

Describes the various types of models: classification, predictive clustering and time-series. Also described are the characteristics of data mining models, namely underfitting or overfitting the data, directed & undirected data mining, the explainability of what the model is doing, and how easy the model is to apply. The chapter provides an introduction to each data mining tool.
The following seven chapters provide more detail on each of the data mining tools in turn. These chapters describe the principles behind the tool, what it can be used for, how to select the data to be analysed (and, where required, how the data can be transformed to meet the requirements of the tool) and ways to overcome the practical limits of the tool. Each chapter concludes with a list and description of the tool's strengths and weaknesses.
Chap. 8.

Market Basket Analysis

Market Basket Analysis can be applied to undirected data mining problems that consist of well-defined items that group together in interesting ways. Conversely, when applied to directed data, it can be used to find outliers. The tool can also be used with time-series problems by transforming the time-series data into a suitable form.
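
Market basket analysis is usually expressed as association rules of the form "if A then B", scored by support (how often A and B occur together), confidence (how often B occurs given A) and lift (confidence divided by how often B occurs overall). The minimal Python sketch below restricts itself to single-item rules, an assumption made for brevity; the function name `pair_rules` and the sample baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.0):
    """Score every single-item rule "if lhs then rhs" found in the baskets.

    Returns (lhs, rhs, support, confidence, lift) tuples.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), together in pair_counts.items():
        support = together / n
        if support < min_support:
            continue
        # Each co-occurring pair yields a rule in both directions.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / item_counts[lhs]
            lift = confidence / (item_counts[rhs] / n)
            rules.append((lhs, rhs, support, confidence, lift))
    return rules


if __name__ == "__main__":
    baskets = [["milk", "bread"], ["milk", "bread"],
               ["milk", "bread", "eggs"], ["eggs"]]
    for lhs, rhs, s, c, lift in sorted(pair_rules(baskets), key=lambda r: -r[4]):
        print(f"if {lhs} then {rhs}: support={s:.2f} "
              f"confidence={c:.2f} lift={lift:.2f}")
```

A lift above 1 means the items appear together more often than they would if purchases were independent; `min_support` prunes rare combinations, which matters once the number of items grows.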
Chap. 9.

Memory-Based Reasoning

MBR mimics the way people make decisions based on past experience: it identifies similar previous cases and applies the information from them. MBR uses a distance function, a combination function and a number of neighbours to find the most similar existing cases for classification or prediction. Unlike other types of data mining tools, MBR is readily applicable to analysing text.

The chapter describes how to choose the distance and combination functions and the number of neighbours to be used. Included is a description of how MBR was used to classify news stories.
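
The three choices the chapter discusses (distance function, combination function and number of neighbours) map directly onto the parameters of a k-nearest-neighbour classifier. In the minimal Python sketch below, Euclidean distance and a simple majority vote are assumptions standing in for whichever functions suit the data; `mbr_classify` and the sample cases are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Distance function: straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def majority_vote(labels):
    """Combination function: the most common label among the neighbours."""
    return Counter(labels).most_common(1)[0][0]


def mbr_classify(query, cases, k=3, distance=euclidean, combine=majority_vote):
    """Classify `query` from the k most similar (features, label) cases."""
    nearest = sorted(cases, key=lambda case: distance(query, case[0]))[:k]
    return combine([label for _, label in nearest])


if __name__ == "__main__":
    cases = [((1.0, 1.0), "stays"), ((1.2, 0.9), "stays"), ((0.8, 1.1), "stays"),
             ((5.0, 5.0), "churns"), ((5.2, 4.8), "churns")]
    print(mbr_classify((1.1, 1.0), cases, k=3))
```

Because the distance and combination functions are passed in as parameters, either can be swapped (for example, a word-overlap distance for text or a distance-weighted vote) without touching the search itself.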

Chap. 10.

Automatic Cluster Detection

Clustering performs undirected knowledge discovery, or unsupervised learning, by identifying clusters of records that are similar to each other. It is rarely used by itself, as we are generally less interested in finding the clusters than in what the items in each cluster have in common.

Clustering is a good tool to use at the start of a new data mining project when faced with a large, complex set of data that may have a lot of internal structure.
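
The review does not name a particular clustering algorithm, but k-means is one widely used method for automatic cluster detection. The following is a minimal Python sketch under that assumption; the function names are hypothetical, and `math.dist` (Euclidean distance) requires Python 3.8 or later.

```python
import math
import random


def assign(points, centroids):
    """Put each point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, iterations=100, seed=0):
    """Return (centroids, clusters) after k-means converges or stops."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen records
    for _ in range(iterations):
        clusters = assign(points, centroids)
        # Move each centroid to the mean of its members; an empty
        # cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # no centroid moved, so the assignment is stable
        centroids = new_centroids
    return centroids, assign(points, centroids)


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, clusters = kmeans(points, k=2)
    for c, members in zip(centroids, clusters):
        print(c, members)
```

As the review notes, the clusters themselves are only a starting point; the interesting work is describing what the members of each cluster have in common.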

Chap. 11.

Link Analysis

Link analysis is based on a branch of mathematics called graph theory and is able to identify relationships between data items. Most other data mining tools are unable to take advantage of this information; however, link analysis is not applicable to all types of data, nor is it able to solve all problems.

The chapter includes brief examples of how link analysis was used by telephone companies to identify who has home fax machines, and how cellular telephone customers can be segmented for the purpose of selling new services.

Chap. 12.

Decision Trees

Decision trees are powerful and popular tools for classification and prediction. Their attractiveness stems in part from the fact that the decisions are based on rules that can be represented in English or SQL.
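
To illustrate why tree rules translate so readily into English or SQL, here is a minimal Python sketch that walks a small, hand-built tree and emits one rule per leaf, joining the tests on each path with AND into an SQL-style WHERE condition. The `Leaf`/`Split` classes, column names and outcomes are hypothetical, and the sketch does not build a tree from data (that is the job of algorithms such as CART, CHAID and C4.5).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Leaf:
    outcome: str  # class predicted for records that reach this leaf


@dataclass
class Split:
    condition: str  # SQL-style predicate, e.g. "balance > 1000"
    if_true: Leaf | Split
    if_false: Leaf | Split


def to_rules(node: Leaf | Split, path: tuple = ()) -> list:
    """Return one (WHERE condition, outcome) pair per leaf of the tree."""
    if isinstance(node, Leaf):
        return [(" AND ".join(path) if path else "TRUE", node.outcome)]
    # The true branch adds the condition; the false branch adds its negation.
    return (to_rules(node.if_true, path + (node.condition,))
            + to_rules(node.if_false, path + (f"NOT ({node.condition})",)))


if __name__ == "__main__":
    tree = Split("months_since_purchase > 12",
                 Leaf("lapsed"),
                 Split("balance > 1000", Leaf("high value"), Leaf("active")))
    for where, outcome in to_rules(tree):
        # "customers" is a hypothetical table name.
        print(f"SELECT customer_id, '{outcome}' AS segment "
              f"FROM customers WHERE {where};")
```

Each leaf becomes one self-contained query, which is what makes a trained tree easy to explain and to deploy inside a database.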

The chapter includes an introduction to how decision trees work and to the CART, CHAID & C4.5 algorithms used for building them. The authors provide a case study of how they used decision trees as part of a decision support system for the credit card division of a bank, and show how decision trees can be used with time-series data, in an example where they were used to simulate a coffee roaster.

Constructing Intelligent Agents with JAVA™ (Chapter 5) also contains information on decision trees and includes code to implement one.

Chap. 13.

Artificial Neural Networks

Neural networks are popular because they have a proven track record in many data mining and decision support systems. They are very powerful, general-purpose tools capable of performing prediction, classification and clustering, and have been applied across a wide range of industries. Their drawback, though, is that they cannot explain why a solution is valid.

In addition to explaining what a neural network comprises and how it works, this chapter details how to select the training data and how to prepare a wide range of data types, including time-series, for use by a neural net.
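
As one example of the kind of preparation involved, neural nets generally work best when numeric inputs are mapped to a small, fixed range. The minimal Python sketch below assumes simple min-max scaling to the range -1 to 1; the function name `scale_to_range` is hypothetical, and the book may recommend other transformations for skewed or categorical data.

```python
def scale_to_range(values, low=-1.0, high=1.0):
    """Linearly map numeric values onto [low, high] (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to the midpoint.
        return [(low + high) / 2 for _ in values]
    return [low + (v - lo) * (high - low) / (hi - lo) for v in values]


if __name__ == "__main__":
    ages = [18, 35, 52, 70]
    print(scale_to_range(ages))
```

In practice the minimum and maximum would be taken from the training data and reused when scoring new records, so that the same input always maps to the same scaled value.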

Also covered are Self-Organising Maps (SOMs), or Kohonen (Feature) Maps. Constructing Intelligent Agents with JAVA™ (Chapter 5) also contains information on neural nets and includes code to implement a back-propagation neural net and a Kohonen map.

Chap. 14.

Genetic Algorithms

Chap. 15.

Data Mining and the Corporate Data Warehouse

Chap. 16.

Where Does OLAP Fit In?

Chap. 17.

Choosing the Right Tool for the Job

Chap. 18.

Putting Data Mining to Work
