Foreword  xiii
Preface  xv
Acknowledgments  xvii
|
Introduction  1
    What Is Data Mining?  1
    Where Is Data Mining Used?  2
    The Origins of Data Mining  2
    The Rapid Growth of Data Mining  3
    Why Are There So Many Different Methods?  4
    Terminology and Notation  4
    Road Maps to This Book  6
|
Overview of the Data Mining Process  9
    Introduction  9
    Core Ideas in Data Mining  9
    Supervised and Unsupervised Learning  11
    The Steps in Data Mining  11
    Preliminary Steps  13
    Building a Model: Example with Linear Regression  21
    Using Excel for Data Mining  27
    Problems  31
|
Data Exploration and Dimension Reduction  35
    Introduction  35
    Practical Considerations  35
        Example 1: House Prices in Boston  36
    Data Summaries  37
    Data Visualization  38
    Correlation Analysis  40
    Reducing the Number of Categories in Categorical Variables  41
    Principal Components Analysis  41
        Example 2: Breakfast Cereals  42
        Principal Components  45
        Normalizing the Data  46
        Using Principal Components for Classification and Prediction  49
    Problems  51
|
Evaluating Classification and Predictive Performance  53
    Introduction  53
    Judging Classification Performance  53
        Accuracy Measures  53
        Cutoff for Classification  56
        Performance in Unequal Importance of Classes  60
        Asymmetric Misclassification Costs  61
        Oversampling and Asymmetric Costs  66
        Classification Using a Triage Strategy  72
    Evaluating Predictive Performance  72
    Problems  74
|
Multiple Linear Regression  75
    Introduction  75
    Explanatory vs. Predictive Modeling  76
    Estimating the Regression Equation and Prediction  76
        Example: Predicting the Price of Used Toyota Corolla Automobiles  77
    Variable Selection in Linear Regression  81
        Reducing the Number of Predictors  81
        How to Reduce the Number of Predictors  82
    Problems  86
|
Three Simple Classification Methods  91
    Introduction  91
        Example 1: Predicting Fraudulent Financial Reporting  91
        Example 2: Predicting Delayed Flights  92
    The Naive Rule  92
    Naive Bayes  93
        Conditional Probabilities and Pivot Tables  94
        A Practical Difficulty  94
        A Solution: Naive Bayes  95
        Advantages and Shortcomings of the Naive Bayes Classifier  100
    k-Nearest Neighbor (k-NN)  103
        Example 3: Riding Mowers  104
        Choosing k  105
        k-NN for a Quantitative Response  106
        Advantages and Shortcomings of k-NN Algorithms  106
    Problems  108
|
Classification and Regression Trees  111
    Introduction  111
    Classification Trees  113
        Recursive Partitioning  113
        Example 1: Riding Mowers  113
        Measures of Impurity  115
    Evaluating the Performance of a Classification Tree  120
        Example 2: Acceptance of Personal Loan  120
    Avoiding Overfitting  121
        Stopping Tree Growth: CHAID  121
        Pruning the Tree  125
    Classification Rules from Trees  130
    Regression Trees  130
        Prediction  130
        Measuring Impurity  131
        Evaluating Performance  132
    Advantages, Weaknesses, and Extensions  132
    Problems  134
Logistic Regression  137
    Introduction  137
    The Logistic Regression Model  138
        Example: Acceptance of Personal Loan  139
        Model with a Single Predictor  141
        Estimating the Logistic Model from Data: Computing Parameter Estimates  143
        Interpreting Results in Terms of Odds  144
    Why Linear Regression Is Inappropriate for a Categorical Response  146
    Evaluating Classification Performance  148
        Variable Selection  148
    Evaluating Goodness of Fit  150
    Example of Complete Analysis: Predicting Delayed Flights  153
        Data Preprocessing  154
        Model Fitting and Estimation  155
        Model Interpretation  155
        Model Performance  155
        Variable Selection  157
    Logistic Regression for More Than Two Classes  160
        Ordinal Classes  160
        Nominal Classes  161
    Problems  163
|
Neural Nets  167
    Introduction  167
    Concept and Structure of a Neural Network  168
    Fitting a Network to Data  168
        Example 1: Tiny Dataset  169
        Computing Output of Nodes  170
        Preprocessing the Data  172
        Training the Model  172
        Example 2: Classifying Accident Severity  176
        Avoiding Overfitting  177
        Using the Output for Prediction and Classification  181
    Required User Input  181
    Exploring the Relationship Between Predictors and Response  182
    Advantages and Weaknesses of Neural Networks  182
    Problems  184
|
Discriminant Analysis  187
    Introduction  187
        Example 1: Riding Mowers  187
        Example 2: Personal Loan Acceptance  188
    Distance of an Observation from a Class  188
    Fisher's Linear Classification Functions  191
    Classification Performance of Discriminant Analysis  194
    Prior Probabilities  195
    Unequal Misclassification Costs  195
    Classifying More Than Two Classes  196
        Example 3: Medical Dispatch to Accident Scenes  196
    Advantages and Weaknesses  197
    Problems  200
|
Association Rules  203
    Introduction  203
    Discovering Association Rules in Transaction Databases  203
        Example 1: Synthetic Data on Purchases of Phone Faceplates  204
    Generating Candidate Rules  204
        The Apriori Algorithm  205
    Selecting Strong Rules  206
        Support and Confidence  206
        Lift Ratio  207
        Data Format  207
        The Process of Rule Selection  209
        Interpreting the Results  210
        Statistical Significance of Rules  211
    Example 2: Rules for Similar Book Purchases  212
    Summary  212
    Problems  215
|
Cluster Analysis  219
    Introduction  219
        Example: Public Utilities  220
    Measuring Distance Between Two Records  222
        Euclidean Distance  223
        Normalizing Numerical Measurements  223
        Other Distance Measures for Numerical Data  223
        Distance Measures for Categorical Data  226
        Distance Measures for Mixed Data  226
    Measuring Distance Between Two Clusters  227
    Hierarchical (Agglomerative) Clustering  228
        Minimum Distance (Single Linkage)  229
        Maximum Distance (Complete Linkage)  229
        Group Average (Average Linkage)  230
        Dendrograms: Displaying Clustering Process and Results  230
        Validating Clusters  231
        Limitations of Hierarchical Clustering  232
    Nonhierarchical Clustering: The k-Means Algorithm  233
        Initial Partition into k Clusters  234
    Problems  237
|
Cases  241
    Charles Book Club  241
    German Credit  250
    Tayko Software Cataloger  254
    Segmenting Consumers of Bath Soap  258
    Direct-Mail Fundraising  262
    Catalog Cross-Selling  265
    Predicting Bankruptcy  267

References  271
Index  273