Series Foreword | xi
Preface | xiii
Symbols and Notation | xvii

A Pictorial Introduction to Bayesian Modelling | 3
The Standard Linear Model | 8
Projections of Inputs into Feature Space | 11
Varying the Hyperparameters | 19
Decision Theory for Regression | 21
Smoothing, Weight Functions and Equivalent Kernels | 24
Incorporating Explicit Basis Functions | 27
Decision Theory for Classification | 35
Linear Models for Classification | 37
Gaussian Process Classification | 39
The Laplace Approximation for the Binary GP Classifier | 41
Multi-class Laplace Approximation | 48
Binary Handwritten Digit Classification Example | 63
10-class Handwritten Digit Classification Example | 70
Appendix: Moment Derivations | 74
Mean Square Continuity and Differentiability | 81
Examples of Covariance Functions | 81
Stationary Covariance Functions | 82
Dot Product Covariance Functions | 89
Other Non-stationary Covariance Functions | 90
Making New Kernels from Old | 94
Eigenfunction Analysis of Kernels | 96
Numerical Approximation of Eigenfunctions | 98
Kernels for Non-vectorial Inputs | 99
Model Selection and Adaptation of Hyperparameters | 105
The Model Selection Problem | 106
Model Selection for GP Regression | 112
Model Selection for GP Classification | 124
Derivatives of the Marginal Likelihood for Laplace's Approximation | 125
Derivatives of the Marginal Likelihood for EP | 127
Relationships between GPs and Other Models | 129
Reproducing Kernel Hilbert Spaces | 129
Regularization Defined by Differential Operators | 133
Obtaining the Regularized Solution | 135
The Relationship of the Regularization View to Gaussian Process Prediction | 135
A 1-d Gaussian Process Spline Construction | 138
Support Vector Classification | 141
Support Vector Regression | 145
Least-squares Classification | 146
Probabilistic Least-squares Classification | 147
Relevance Vector Machines | 149
Some Specific Examples of Equivalent Kernels | 153
Equivalence and Orthogonality | 157
Average-case Learning Curves | 159
PAC-Bayesian Analysis of GP Classification | 164
Comparison with Other Supervised Learning Methods | 165
Appendix: Learning Curve for the Ornstein-Uhlenbeck Process | 168
Approximation Methods for Large Datasets | 171
Reduced-rank Approximations of the Gram Matrix | 171
Approximations for GPR with Fixed Hyperparameters | 175
Projected Process Approximation | 178
Bayesian Committee Machine | 180
Iterative Solution of Linear Systems | 181
Comparison of Approximate GPR Methods | 182
Approximations for GPC with Fixed Hyperparameters | 185
Approximating the Marginal Likelihood and its Derivatives | 185
Appendix: Equivalence of SR and GPR Using the Nyström Approximate Kernel | 187
Further Issues and Conclusions | 189
Noise Models with Dependencies | 190
Prediction with Uncertain Inputs | 192
Mixtures of Gaussian Processes | 192
Conclusions and Future Directions | 196

Appendix A Mathematical Background | 199
Joint, Marginal and Conditional Probability | 199
Entropy and Kullback-Leibler Divergence | 203

Appendix B Gaussian Markov Processes | 207
Sampling and Periodization | 209
Continuous-time Gaussian Markov Processes | 211
Continuous-time GMPs on R | 211
The Solution of the Corresponding SDE on the Circle | 213
Discrete-time Gaussian Markov Processes | 214
The Solution of the Corresponding Difference Equation on PN | 215
The Relationship Between Discrete-time and Sampled Continuous-time GMPs | 217
Markov Processes in Higher Dimensions | 218

Appendix C Datasets and Code | 221
Bibliography | 223
Author Index | 239
Subject Index | 245