Series Foreword | xi
Preface | xiii
Symbols and Notation | xvii

A Pictorial Introduction to Bayesian Modelling | 3
The Standard Linear Model | 8
Projections of Inputs into Feature Space | 11
Varying the Hyperparameters | 19
Decision Theory for Regression | 21
Smoothing, Weight Functions and Equivalent Kernels | 24
Incorporating Explicit Basis Functions | 27
Decision Theory for Classification | 35
Linear Models for Classification | 37
Gaussian Process Classification | 39
The Laplace Approximation for the Binary GP Classifier | 41
Multi-class Laplace Approximation | 48
Binary Handwritten Digit Classification Example | 63
10-class Handwritten Digit Classification Example | 70
Appendix: Moment Derivations | 74
Mean Square Continuity and Differentiability | 81
Examples of Covariance Functions | 81
Stationary Covariance Functions | 82
Dot Product Covariance Functions | 89
Other Non-stationary Covariance Functions | 90
Making New Kernels from Old | 94
Eigenfunction Analysis of Kernels | 96
Numerical Approximation of Eigenfunctions | 98
Kernels for Non-vectorial Inputs | 99
Model Selection and Adaptation of Hyperparameters | 105
The Model Selection Problem | 106
Model Selection for GP Regression | 112
Model Selection for GP Classification | 124
Derivatives of the Marginal Likelihood for Laplace's Approximation | 125
Derivatives of the Marginal Likelihood for EP | 127
Relationships between GPs and Other Models | 129
Reproducing Kernel Hilbert Spaces | 129
Regularization Defined by Differential Operators | 133
Obtaining the Regularized Solution | 135
The Relationship of the Regularization View to Gaussian Process Prediction | 135
A 1-d Gaussian Process Spline Construction | 138
Support Vector Classification | 141
Support Vector Regression | 145
Least-squares Classification | 146
Probabilistic Least-squares Classification | 147
Relevance Vector Machines | 149
Some Specific Examples of Equivalent Kernels | 153
Equivalence and Orthogonality | 157
Average-case Learning Curves | 159
PAC-Bayesian Analysis of GP Classification | 164
Comparison with Other Supervised Learning Methods | 165
Appendix: Learning Curve for the Ornstein-Uhlenbeck Process | 168
Approximation Methods for Large Datasets | 171
Reduced-rank Approximations of the Gram Matrix | 171
Approximations for GPR with Fixed Hyperparameters | 175
Projected Process Approximation | 178
Bayesian Committee Machine | 180
Iterative Solution of Linear Systems | 181
Comparison of Approximate GPR Methods | 182
Approximations for GPC with Fixed Hyperparameters | 185
Approximating the Marginal Likelihood and its Derivatives | 185
Appendix: Equivalence of SR and GPR Using the Nyström Approximate Kernel | 187
Further Issues and Conclusions | 189
Noise Models with Dependencies | 190
Prediction with Uncertain Inputs | 192
Mixtures of Gaussian Processes | 192
Conclusions and Future Directions | 196

Appendix A Mathematical Background | 199
Joint, Marginal and Conditional Probability | 199
Entropy and Kullback-Leibler Divergence | 203

Appendix B Gaussian Markov Processes | 207
Sampling and Periodization | 209
Continuous-time Gaussian Markov Processes | 211
Continuous-time GMPs on R | 211
The Solution of the Corresponding SDE on the Circle | 213
Discrete-time Gaussian Markov Processes | 214
The Solution of the Corresponding Difference Equation on PN | 215
The Relationship Between Discrete-time and Sampled Continuous-time GMPs | 217
Markov Processes in Higher Dimensions | 218

Appendix C Datasets and Code | 221
Bibliography | 223
Author Index | 239
Subject Index | 245