Preface | vii

Introduction | 1 | (18)
    What Is Data Mining? | 2 | (2)
    Motivating Challenges | 4 | (2)
    The Origins of Data Mining | 6 | (1)
    Data Mining Tasks | 7 | (4)
    Scope and Organization of the Book | 11 | (2)
    Bibliographic Notes | 13 | (3)
    Exercises | 16 | (3)
|
|
Data | 19 | (78)
    Types of Data | 22 | (14)
        Attributes and Measurement | 23 | (6)
        Types of Data Sets | 29 | (7)
    Data Quality | 36 | (8)
        Measurement and Data Collection Issues | 37 | (6)
        Issues Related to Applications | 43 | (1)
    Data Preprocessing | 44 | (21)
        Aggregation | 45 | (2)
        Sampling | 47 | (3)
        Dimensionality Reduction | 50 | (2)
        Feature Subset Selection | 52 | (3)
        Feature Creation | 55 | (2)
        Discretization and Binarization | 57 | (6)
        Variable Transformation | 63 | (2)
    Measures of Similarity and Dissimilarity | 65 | (19)
        Basics | 66 | (1)
        Similarity and Dissimilarity between Simple Attributes | 67 | (2)
        Dissimilarities between Data Objects | 69 | (3)
        Similarities between Data Objects | 72 | (1)
        Examples of Proximity Measures | 73 | (7)
        Issues in Proximity Calculation | 80 | (3)
        Selecting the Right Proximity Measure | 83 | (1)
    Bibliographic Notes | 84 | (4)
    Exercises | 88 | (9)
|
|
Exploring Data | 97 | (48)
    The Iris Data Set | 98 | (1)
    Summary Statistics | 98 | (7)
        Frequencies and the Mode | 99 | (1)
        Percentiles | 100 | (1)
        Measures of Location: Mean and Median | 101 | (1)
        Measures of Spread: Range and Variance | 102 | (2)
        Multivariate Summary Statistics | 104 | (1)
        Other Ways to Summarize the Data | 105 | (1)
    Visualization | 105 | (26)
        Motivations for Visualization | 105 | (1)
        General Concepts | 106 | (4)
        Techniques | 110 | (14)
        Visualizing Higher-Dimensional Data | 124 | (6)
        Do's and Don'ts | 130 | (1)
    OLAP and Multidimensional Data Analysis | 131 | (8)
        Representing Iris Data as a Multidimensional Array | 131 | (2)
        Multidimensional Data: The General Case | 133 | (2)
        Analyzing Multidimensional Data | 135 | (4)
        Final Comments on Multidimensional Data Analysis | 139 | (1)
    Bibliographic Notes | 139 | (2)
    Exercises | 141 | (4)
|
Classification: Basic Concepts, Decision Trees, and Model Evaluation | 145 | (62)
    Preliminaries | 146 | (2)
    General Approach to Solving a Classification Problem | 148 | (2)
    Decision Tree Induction | 150 | (22)
        How a Decision Tree Works | 150 | (1)
        How to Build a Decision Tree | 151 | (4)
        Methods for Expressing Attribute Test Conditions | 155 | (3)
        Measures for Selecting the Best Split | 158 | (6)
        Algorithm for Decision Tree Induction | 164 | (2)
        An Example: Web Robot Detection | 166 | (2)
        Characteristics of Decision Tree Induction | 168 | (4)
    Model Overfitting | 172 | (14)
        Overfitting Due to Presence of Noise | 175 | (2)
        Overfitting Due to Lack of Representative Samples | 177 | (1)
        Overfitting and the Multiple Comparison Procedure | 178 | (1)
        Estimation of Generalization Errors | 179 | (5)
        Handling Overfitting in Decision Tree Induction | 184 | (2)
    Evaluating the Performance of a Classifier | 186 | (2)
        Holdout Method | 186 | (1)
        Random Subsampling | 187 | (1)
        Cross-Validation | 187 | (1)
        Bootstrap | 188 | (1)
    Methods for Comparing Classifiers | 188 | (5)
        Estimating a Confidence Interval for Accuracy | 189 | (2)
        Comparing the Performance of Two Models | 191 | (1)
        Comparing the Performance of Two Classifiers | 192 | (1)
    Bibliographic Notes | 193 | (5)
    Exercises | 198 | (9)
|
Classification: Alternative Techniques | 207 | (120)
    Rule-Based Classifier | 207 | (16)
        How a Rule-Based Classifier Works | 209 | (2)
        Rule-Ordering Schemes | 211 | (1)
        How to Build a Rule-Based Classifier | 212 | (1)
        Direct Methods for Rule Extraction | 213 | (8)
        Indirect Methods for Rule Extraction | 221 | (2)
        Characteristics of Rule-Based Classifiers | 223 | (1)
    Nearest-Neighbor Classifiers | 223 | (4)
        Algorithm | 225 | (1)
        Characteristics of Nearest-Neighbor Classifiers | 226 | (1)
    Bayesian Classifiers | 227 | (19)
        Bayes Theorem | 228 | (1)
        Using the Bayes Theorem for Classification | 229 | (2)
        Naive Bayes Classifier | 231 | (7)
        Bayes Error Rate | 238 | (2)
        Bayesian Belief Networks | 240 | (6)
    Artificial Neural Network (ANN) | 246 | (10)
        Perceptron | 247 | (4)
        Multilayer Artificial Neural Network | 251 | (4)
        Characteristics of ANN | 255 | (1)
    Support Vector Machine (SVM) | 256 | (20)
        Maximum Margin Hyperplanes | 256 | (3)
        Linear SVM: Separable Case | 259 | (7)
        Linear SVM: Nonseparable Case | 266 | (4)
        Nonlinear SVM | 270 | (6)
        Characteristics of SVM | 276 | (1)
    Ensemble Methods | 276 | (18)
        Rationale for Ensemble Method | 277 | (1)
        Methods for Constructing an Ensemble Classifier | 278 | (3)
        Bias-Variance Decomposition | 281 | (2)
        Bagging | 283 | (2)
        Boosting | 285 | (5)
        Random Forests | 290 | (4)
        Empirical Comparison among Ensemble Methods | 294 | (1)
    Class Imbalance Problem | 294 | (12)
        Alternative Metrics | 295 | (3)
        The Receiver Operating Characteristic Curve | 298 | (4)
        Cost-Sensitive Learning | 302 | (3)
        Sampling-Based Approaches | 305 | (1)
    Multiclass Problem | 306 | (3)
    Bibliographic Notes | 309 | (6)
    Exercises | 315 | (12)
|
Association Analysis: Basic Concepts and Algorithms | 327 | (88)
    Problem Definition | 328 | (4)
    Frequent Itemset Generation | 332 | (17)
        The Apriori Principle | 333 | (2)
        Frequent Itemset Generation in the Apriori Algorithm | 335 | (3)
        Candidate Generation and Pruning | 338 | (4)
        Support Counting | 342 | (3)
        Computational Complexity | 345 | (4)
    Rule Generation | 349 | (4)
        Confidence-Based Pruning | 350 | (1)
        Rule Generation in Apriori Algorithm | 350 | (2)
        An Example: Congressional Voting Records | 352 | (1)
    Compact Representation of Frequent Itemsets | 353 | (6)
        Maximal Frequent Itemsets | 354 | (1)
        Closed Frequent Itemsets | 355 | (4)
    Alternative Methods for Generating Frequent Itemsets | 359 | (4)
    FP-Growth Algorithm | 363 | (7)
        FP-Tree Representation | 363 | (3)
        Frequent Itemset Generation in FP-Growth Algorithm | 366 | (4)
    Evaluation of Association Patterns | 370 | (16)
        Objective Measures of Interestingness | 371 | (11)
        Measures beyond Pairs of Binary Variables | 382 | (2)
        Simpson's Paradox | 384 | (2)
    Effect of Skewed Support Distribution | 386 | (4)
    Bibliographic Notes | 390 | (14)
    Exercises | 404 | (11)
|
Association Analysis: Advanced Concepts | 415 | (72)
    Handling Categorical Attributes | 415 | (3)
    Handling Continuous Attributes | 418 | (8)
        Discretization-Based Methods | 418 | (4)
        Statistics-Based Methods | 422 | (2)
        Non-discretization Methods | 424 | (2)
    Handling a Concept Hierarchy | 426 | (3)
    Sequential Patterns | 429 | (13)
        Problem Formulation | 429 | (2)
        Sequential Pattern Discovery | 431 | (5)
        Timing Constraints | 436 | (3)
        Alternative Counting Schemes | 439 | (3)
    Subgraph Patterns | 442 | (15)
        Graphs and Subgraphs | 443 | (1)
        Frequent Subgraph Mining | 444 | (3)
        Apriori-like Method | 447 | (1)
        Candidate Generation | 448 | (5)
        Candidate Pruning | 453 | (4)
        Support Counting | 457 | (1)
    Infrequent Patterns | 457 | (12)
        Negative Patterns | 458 | (1)
        Negatively Correlated Patterns | 458 | (2)
        Comparisons among Infrequent Patterns, Negative Patterns, and Negatively Correlated Patterns | 460 | (1)
        Techniques for Mining Interesting Infrequent Patterns | 461 | (2)
        Techniques Based on Mining Negative Patterns | 463 | (2)
        Techniques Based on Support Expectation | 465 | (4)
    Bibliographic Notes | 469 | (4)
    Exercises | 473 | (14)
|
Cluster Analysis: Basic Concepts and Algorithms | 487 | (82)
    Overview | 490 | (6)
        What Is Cluster Analysis? | 490 | (1)
        Different Types of Clusterings | 491 | (2)
        Different Types of Clusters | 493 | (3)
    K-means | 496 | (19)
        The Basic K-means Algorithm | 497 | (9)
        K-means: Additional Issues | 506 | (2)
        Bisecting K-means | 508 | (2)
        K-means and Different Types of Clusters | 510 | (1)
        Strengths and Weaknesses | 510 | (3)
        K-means as an Optimization Problem | 513 | (2)
    Agglomerative Hierarchical Clustering | 515 | (11)
        Basic Agglomerative Hierarchical Clustering Algorithm | 516 | (2)
        Specific Techniques | 518 | (6)
        The Lance-Williams Formula for Cluster Proximity | 524 | (1)
        Key Issues in Hierarchical Clustering | 524 | (2)
        Strengths and Weaknesses | 526 | (1)
    DBSCAN | 526 | (6)
        Traditional Density: Center-Based Approach | 527 | (1)
        The DBSCAN Algorithm | 528 | (2)
        Strengths and Weaknesses | 530 | (2)
    Cluster Evaluation | 532 | (23)
        Overview | 533 | (3)
        Unsupervised Cluster Evaluation Using Cohesion and Separation | 536 | (6)
        Unsupervised Cluster Evaluation Using the Proximity Matrix | 542 | (2)
        Unsupervised Evaluation of Hierarchical Clustering | 544 | (2)
        Determining the Correct Number of Clusters | 546 | (1)
        Clustering Tendency | 547 | (1)
        Supervised Measures of Cluster Validity | 548 | (5)
        Assessing the Significance of Cluster Validity Measures | 553 | (2)
    Bibliographic Notes | 555 | (4)
    Exercises | 559 | (10)
|
Cluster Analysis: Additional Issues and Algorithms | 569 | (82)
    Characteristics of Data, Clusters, and Clustering Algorithms | 570 | (7)
        Example: Comparing K-means and DBSCAN | 570 | (1)
        Data Characteristics | 571 | (2)
        Cluster Characteristics | 573 | (2)
        General Characteristics of Clustering Algorithms | 575 | (2)
    Prototype-Based Clustering | 577 | (23)
        Fuzzy Clustering | 577 | (6)
        Clustering Using Mixture Models | 583 | (11)
        Self-Organizing Maps (SOM) | 594 | (6)
    Density-Based Clustering | 600 | (12)
        Grid-Based Clustering | 601 | (3)
        Subspace Clustering | 604 | (4)
        Denclue: A Kernel-Based Scheme for Density-Based Clustering | 608 | (4)
    Graph-Based Clustering | 612 | (18)
        Sparsification | 613 | (1)
        Minimum Spanning Tree (MST) Clustering | 614 | (2)
        Opossum: Optimal Partitioning of Sparse Similarities Using METIS | 616 | (1)
        Chameleon: Hierarchical Clustering with Dynamic Modeling | 616 | (6)
        Shared Nearest Neighbor Similarity | 622 | (3)
        The Jarvis-Patrick Clustering Algorithm | 625 | (2)
        SNN Density | 627 | (2)
        SNN Density-Based Clustering | 629 | (1)
    Scalable Clustering Algorithms | 630 | (9)
        Scalability: General Issues and Approaches | 630 | (3)
        BIRCH | 633 | (2)
        CURE | 635 | (4)
    Which Clustering Algorithm? | 639 | (4)
    Bibliographic Notes | 643 | (4)
    Exercises | 647 | (4)
|
|
651 | (34) |
|
|
653 | (5) |
|
|
653 | (1) |
|
Approaches to Anomaly Detection |
|
|
654 | (1) |
|
|
655 | (1) |
|
|
656 | (2) |
|
|
658 | (8) |
|
Detecting Outliers in a Univariate Normal Distribution |
|
|
659 | (2) |
|
Outliers in a Multivariate Normal Distribution |
|
|
661 | (1) |
|
A Mixture Model Approach for Anomaly Detection |
|
|
662 | (3) |
|
|
665 | (1) |
|
Proximity-Based Outlier Detection |
|
|
666 | (2) |
|
|
666 | (2) |
|
Density-Based Outlier Detection |
|
|
668 | (3) |
|
Detection of Outliers Using Relative Density |
|
|
669 | (1) |
|
|
670 | (1) |
|
Clustering-Based Techniques |
|
|
671 | (4) |
|
Assessing the Extent to Which an Object Belongs to a Cluster |
|
|
672 | (2) |
|
Impact of Outliers on the Initial Clustering |
|
|
674 | (1) |
|
The Number of Clusters to Use |
|
|
674 | (1) |
|
|
674 | (1) |
|
|
675 | (5) |
|
|
680 | (5) |
|
Appendix A Linear Algebra | 685 | (16)
    Vectors | 685 | (6)
        Definition | 685 | (1)
        Vector Addition and Multiplication by a Scalar | 685 | (2)
        Norms | 687 | (1)
        The Dot Product, Orthogonality, and Orthogonal Projections | 688 | (2)
        Vectors and Data Analysis | 690 | (1)
    Matrices | 691 | (9)
        Matrices: Definitions | 691 | (1)
        Matrices: Addition and Multiplication by a Scalar | 692 | (1)
        Matrices: Multiplication | 693 | (2)
        Linear Transformations and Inverse Matrices | 695 | (2)
        Eigenvalue and Singular Value Decomposition | 697 | (2)
        Matrices and Data Analysis | 699 | (1)
    Bibliographic Notes | 700 | (1)
|
Appendix B Dimensionality Reduction | 701 | (18)
    PCA and SVD | 701 | (7)
        Principal Components Analysis (PCA) | 701 | (5)
        Singular Value Decomposition (SVD) | 706 | (2)
    Other Dimensionality Reduction Techniques | 708 | (8)
        Factor Analysis | 708 | (2)
        Locally Linear Embedding (LLE) | 710 | (2)
        Multidimensional Scaling, FastMap, and ISOMAP | 712 | (3)
        Common Issues | 715 | (1)
    Bibliographic Notes | 716 | (3)
|
Appendix C Probability and Statistics | 719 | (10)
    Probability | 719 | (4)
        Expected Values | 722 | (1)
    Statistics | 723 | (3)
        Point Estimation | 724 | (1)
        Central Limit Theorem | 724 | (1)
        Interval Estimation | 725 | (1)
    Hypothesis Testing | 726 | (3)
|
|
Appendix D Regression | 729 | (10)
    Preliminaries | 729 | (1)
    Simple Linear Regression | 730 | (6)
        Least Square Method | 731 | (2)
        Analyzing Regression Errors | 733 | (2)
        Analyzing Goodness of Fit | 735 | (1)
    Multivariate Linear Regression | 736 | (1)
    Alternative Least-Square Regression Methods | 737 | (2)
|
|
Appendix E Optimization | 739 | (11)
    Unconstrained Optimization | 739 | (7)
        Numerical Methods | 742 | (4)
    Constrained Optimization | 746 | (4)
        Equality Constraints | 746 | (1)
        Inequality Constraints | 747 | (3)

Author Index | 750 | (8)
Subject Index | 758 | (11)
Copyright Permissions | 769