Bibliographic Information
Title: Machine learning : a probabilistic perspective
Author: Murphy, Kevin P.
Publisher: MIT Press
Pub. Year: 2012
Subjects: Machine learning; Probabilities
Call Number: Q325.5 .M87 2012
Table of Contents
Cover Page (1)
Half Title Page (2)
Title Page (4)
Copyright Page (5)
Dedication (6)
Contents (8)
Preface (28)
1 Introduction (32)
1.1 Machine learning: what and why? (32)
1.1.1 Types of machine learning (33)
1.2 Supervised learning (34)
1.2.1 Classification (34)
1.2.2 Regression (39)
1.3 Unsupervised learning (40)
1.3.1 Discovering clusters (41)
1.3.2 Discovering latent factors (42)
1.3.3 Discovering graph structure (44)
1.3.4 Matrix completion (45)
1.4 Some basic concepts in machine learning (47)
1.4.1 Parametric vs non-parametric models (47)
1.4.2 A simple non-parametric classifier: K-nearest neighbors (47)
1.4.3 The curse of dimensionality (49)
1.4.4 Parametric models for classification and regression (50)
1.4.5 Linear regression (50)
1.4.6 Logistic regression (52)
1.4.7 Overfitting (53)
1.4.8 Model selection (53)
1.4.9 No free lunch theorem (55)
2 Probability (58)
2.1 Introduction (58)
2.2 A brief review of probability theory (59)
2.2.1 Discrete random variables (59)
2.2.2 Fundamental rules (59)
2.2.3 Bayes rule (60)
2.2.4 Independence and conditional independence (61)
2.2.5 Continuous random variables (63)
2.2.6 Quantiles (64)
2.2.7 Mean and variance (64)
2.3 Some common discrete distributions (65)
2.3.1 The binomial and Bernoulli distributions (65)
2.3.2 The multinomial and multinoulli distributions (66)
2.3.3 The Poisson distribution (68)
2.3.4 The empirical distribution (68)
2.4 Some common continuous distributions (69)
2.4.1 Gaussian (normal) distribution (69)
2.4.2 Degenerate pdf (70)
2.4.3 The Laplace distribution (72)
2.4.4 The gamma distribution (72)
2.4.5 The beta distribution (73)
2.4.6 Pareto distribution (74)
2.5 Joint probability distributions (75)
2.5.1 Covariance and correlation (75)
2.5.2 The multivariate Gaussian (77)
2.5.3 Multivariate Student t distribution (77)
2.5.4 Dirichlet distribution (78)
2.6 Transformations of random variables (80)
2.6.1 Linear transformations (80)
2.6.2 General transformations (81)
2.6.3 Central limit theorem (82)
2.7 Monte Carlo approximation (83)
2.7.1 Example: change of variables, the MC way (84)
2.7.2 Example: estimating π by Monte Carlo integration (85)
2.7.3 Accuracy of Monte Carlo approximation (85)
2.8 Information theory (87)
2.8.1 Entropy (87)
2.8.2 KL divergence (88)
2.8.3 Mutual information (90)
3 Generative Models for Discrete Data (96)
3.1 Introduction (96)
3.2 Bayesian concept learning (96)
3.2.1 Likelihood (98)
3.2.2 Prior (98)
3.2.3 Posterior (99)
3.2.4 Posterior predictive distribution (102)
3.2.5 A more complex prior (103)
3.3 The beta-binomial model (103)
3.3.1 Likelihood (104)
3.3.2 Prior (105)
3.3.3 Posterior (106)
3.3.4 Posterior predictive distribution (108)
3.4 The Dirichlet-multinomial model (109)
3.4.1 Likelihood (110)
3.4.2 Prior (110)
3.4.3 Posterior (110)
3.4.4 Posterior predictive (112)
3.5 Naive Bayes classifiers (113)
3.5.1 Model fitting (114)
3.5.2 Using the model for prediction (116)
3.5.3 The log-sum-exp trick (117)
3.5.4 Feature selection using mutual information (117)
3.5.5 Classifying documents using bag of words (118)
4 Gaussian Models (128)
4.1 Introduction (128)
4.1.1 Notation (128)
4.1.2 Basics (128)
4.1.3 MLE for an MVN (130)
4.1.4 Maximum entropy derivation of the Gaussian * (132)
4.2 Gaussian discriminant analysis (132)
4.2.1 Quadratic discriminant analysis (QDA) (133)
4.2.2 Linear discriminant analysis (LDA) (134)
4.2.3 Two-class LDA (135)
4.2.4 MLE for discriminant analysis (137)
4.2.5 Strategies for preventing overfitting (137)
4.2.6 Regularized LDA * (138)
4.2.7 Diagonal LDA (139)
4.2.8 Nearest shrunken centroids classifier * (140)
4.3 Inference in jointly Gaussian distributions (141)
4.3.1 Statement of the result (142)
4.3.2 Examples (142)
4.3.3 Information form (146)
4.3.4 Proof of the result * (147)
4.4 Linear Gaussian systems (150)
4.4.1 Statement of the result (150)
4.4.2 Examples (151)
4.4.3 Proof of the result * (155)
4.5 Digression: The Wishart distribution * (156)
4.5.1 Inverse Wishart distribution (157)
4.5.2 Visualizing the Wishart distribution * (158)
4.6 Inferring the parameters of an MVN (158)
4.6.1 Posterior distribution of μ (159)
4.6.2 Posterior distribution of Σ * (159)
4.6.3 Posterior distribution of μ and Σ * (163)
4.6.4 Sensor fusion with unknown precisions * (169)
5 Bayesian Statistics (180)
5.1 Introduction (180)
5.2 Summarizing posterior distributions (180)
5.2.1 MAP estimation (180)
5.2.2 Credible intervals (183)
5.2.3 Inference for a difference in proportions (185)
5.3 Bayesian model selection (186)
5.3.1 Bayesian Occam’s razor (187)
5.3.2 Computing the marginal likelihood (evidence) (189)
5.3.3 Bayes factors (194)
5.3.4 Jeffreys-Lindley paradox * (195)
5.4 Priors (196)
5.4.1 Uninformative priors (196)
5.4.2 Jeffreys priors * (197)
5.4.3 Robust priors (199)
5.4.4 Mixtures of conjugate priors (199)
5.5 Hierarchical Bayes (202)
5.5.1 Example: modeling related cancer rates (202)
5.6 Empirical Bayes (203)
5.6.1 Example: beta-binomial model (204)
5.6.2 Example: Gaussian-Gaussian model (204)
5.7 Bayesian decision theory (207)
5.7.1 Bayes estimators for common loss functions (208)
5.7.2 The false positive vs false negative tradeoff (211)
5.7.3 Other topics * (215)
6 Frequentist Statistics (222)
6.1 Introduction (222)
6.2 Sampling distribution of an estimator (222)
6.2.1 Bootstrap (223)
6.2.2 Large sample theory for the MLE * (224)
6.3 Frequentist decision theory (225)
6.3.1 Bayes risk (226)
6.3.2 Minimax risk (227)
6.3.3 Admissible estimators (228)
6.4 Desirable properties of estimators (231)
6.4.1 Consistent estimators (231)
6.4.2 Unbiased estimators (231)
6.4.3 Minimum variance estimators (232)
6.4.4 The bias-variance tradeoff (233)
6.5 Empirical risk minimization (235)
6.5.1 Regularized risk minimization (236)
6.5.2 Structural risk minimization (237)
6.5.3 Estimating the risk using cross validation (237)
6.5.4 Upper bounding the risk using statistical learning theory * (240)
6.5.5 Surrogate loss functions (241)
6.6 Pathologies of frequentist statistics * (242)
6.6.1 Counter-intuitive behavior of confidence intervals (243)
6.6.2 p-values considered harmful (244)
6.6.3 The likelihood principle (245)
6.6.4 Why isn’t everyone a Bayesian? (246)
7 Linear Regression (248)
7.1 Introduction (248)
7.2 Model specification (248)
7.3 Maximum likelihood estimation (least squares) (248)
7.3.1 Derivation of the MLE (250)
7.3.2 Geometric interpretation (251)
7.3.3 Convexity (252)
7.4 Robust linear regression * (254)
7.5 Ridge regression (256)
7.5.1 Basic idea (256)
7.5.2 Numerically stable computation * (258)
7.5.3 Connection with PCA * (259)
7.5.4 Regularization effects of big data (261)
7.6 Bayesian linear regression (262)
7.6.1 Computing the posterior (263)
7.6.2 Computing the posterior predictive (264)
7.6.3 Bayesian inference when σ² is unknown * (265)
7.6.4 EB for linear regression (evidence procedure) (265)
8 Logistic Regression (276)
8.1 Introduction (276)
8.2 Model specification (276)
8.3 Model fitting (276)
8.3.1 MLE (277)
8.3.2 Steepest descent (278)
8.3.3 Newton’s method (280)
8.3.4 Iteratively reweighted least squares (IRLS) (281)
8.3.5 Quasi-Newton (variable metric) methods (282)
8.3.6 ℓ2 regularization (283)
8.3.7 Multi-class logistic regression (283)
8.4 Bayesian logistic regression (285)
8.4.1 Laplace approximation (286)
8.4.2 Derivation of the BIC (286)
8.4.3 Gaussian approximation for logistic regression (287)
8.4.4 Approximating the posterior predictive (287)
8.4.5 Residual analysis (outlier detection) * (291)
8.5 Online learning and stochastic optimization (292)
8.5.1 Online learning and regret minimization (293)
8.5.2 Stochastic optimization and risk minimization (293)
8.5.3 The LMS algorithm (295)
8.5.4 The perceptron algorithm (296)
8.5.5 A Bayesian view (297)
8.6 Generative vs discriminative classifiers (298)
8.6.1 Pros and cons of each approach (299)
8.6.2 Dealing with missing data (300)
8.6.3 Fisher’s linear discriminant analysis (FLDA) * (302)
9 Generalized Linear Models and the Exponential Family (312)
9.1 Introduction (312)
9.2 The exponential family (312)
9.2.1 Definition (313)
9.2.2 Examples (313)
9.2.3 Log partition function (315)
9.2.4 MLE for the exponential family (317)
9.2.5 Bayes for the exponential family * (318)
9.2.6 Maximum entropy derivation of the exponential family * (320)
9.3 Generalized linear models (GLMs) (321)
9.3.1 Basics (321)
9.3.2 ML and MAP estimation (323)
9.3.3 Bayesian inference (324)
9.4 Probit regression (324)
9.4.1 ML/MAP estimation using gradient-based optimization (325)
9.4.2 Latent variable interpretation (325)
9.4.3 Ordinal probit regression * (326)
9.4.4 Multinomial probit models * (326)
9.5 Multi-task learning (327)
9.5.1 Hierarchical Bayes for multi-task learning (327)
9.5.2 Application to personalized email spam filtering (327)
9.5.3 Application to domain adaptation (328)
9.5.4 Other kinds of prior (328)
9.6 Generalized linear mixed models * (329)
9.6.1 Example: semi-parametric GLMMs for medical data (329)
9.6.2 Computational issues (331)
9.7 Learning to rank * (331)
9.7.1 The pointwise approach (332)
9.7.2 The pairwise approach (332)
9.7.3 The listwise approach (333)
9.7.4 Loss functions for ranking (334)
10 Directed Graphical Models (Bayes Nets) (338)
10.1 Introduction (338)
10.1.1 Chain rule (338)
10.1.2 Conditional independence (339)
10.1.3 Graphical models (339)
10.1.4 Graph terminology (340)
10.1.5 Directed graphical models (341)
10.2 Examples (342)
10.2.1 Naive Bayes classifiers (342)
10.2.2 Markov and hidden Markov models (343)
10.2.3 Medical diagnosis (344)
10.2.4 Genetic linkage analysis * (346)
10.2.5 Directed Gaussian graphical models * (349)
10.3 Inference (350)
10.4 Learning (351)
10.4.1 Plate notation (351)
10.4.2 Learning from complete data (353)
10.4.3 Learning with missing and/or latent variables (354)
10.5 Conditional independence properties of DGMs (355)
10.5.1 d-separation and the Bayes Ball algorithm (global Markov properties) (355)
10.5.2 Other Markov properties of DGMs (358)
10.5.3 Markov blanket and full conditionals (358)
10.6 Influence (decision) diagrams * (359)
11 Mixture Models and the EM Algorithm (368)
11.1 Latent variable models (368)
11.2 Mixture models (368)
11.2.1 Mixtures of Gaussians (370)
11.2.2 Mixture of multinoullis (371)
11.2.3 Using mixture models for clustering (371)
11.2.4 Mixtures of experts (373)
11.3 Parameter estimation for mixture models (376)
11.3.1 Unidentifiability (377)
11.3.2 Computing a MAP estimate is non-convex (378)
11.4 The EM algorithm (379)
11.4.1 Basic idea (380)
11.4.2 EM for GMMs (381)
11.4.3 EM for mixture of experts (388)
11.4.4 EM for DGMs with hidden variables (389)
11.4.5 EM for the Student distribution * (390)
11.4.6 EM for probit regression * (393)
11.4.7 Theoretical basis for EM * (394)
11.4.8 Online EM (396)
11.4.9 Other EM variants * (398)
11.5 Model selection for latent variable models (401)
11.5.1 Model selection for probabilistic models (401)
11.5.2 Model selection for non-probabilistic methods (401)
11.6 Fitting models with missing data (403)
11.6.1 EM for the MLE of an MVN with missing data (404)
12 Latent Linear Models (412)
12.1 Factor analysis (412)
12.1.1 FA is a low rank parameterization of an MVN (412)
12.1.2 Inference of the latent factors (413)
12.1.3 Unidentifiability (414)
12.1.4 Mixtures of factor analysers (416)
12.1.5 EM for factor analysis models (417)
12.1.6 Fitting FA models with missing data (418)
12.2 Principal components analysis (PCA) (418)
12.2.1 Classical PCA: statement of the theorem (418)
12.2.2 Proof * (420)
12.2.3 Singular value decomposition (SVD) (423)
12.2.4 Probabilistic PCA (426)
12.2.5 EM algorithm for PCA (427)
12.3 Choosing the number of latent dimensions (429)
12.3.1 Model selection for FA/PPCA (429)
12.3.2 Model selection for PCA (430)
12.4 PCA for categorical data (433)
12.5 PCA for paired and multi-view data (435)
12.5.1 Supervised PCA (latent factor regression) (436)
12.5.2 Partial least squares (437)
12.5.3 Canonical correlation analysis (438)
12.6 Independent Component Analysis (ICA) (438)
12.6.1 Maximum likelihood estimation (441)
12.6.2 The FastICA algorithm (442)
12.6.3 Using EM (445)
12.6.4 Other estimation principles * (446)
13 Sparse Linear Models (452)
13.1 Introduction (452)
13.2 Bayesian variable selection (453)
13.2.1 The spike and slab model (455)
13.2.2 From the Bernoulli-Gaussian model to ℓ0 regularization (456)
13.2.3 Algorithms (457)
13.3 ℓ1 regularization: basics (460)
13.3.1 Why does ℓ1 regularization yield sparse solutions? (461)
13.3.2 Optimality conditions for lasso (462)
13.3.3 Comparison of least squares, lasso, ridge and subset selection (466)
13.3.4 Regularization path (467)
13.3.5 Model selection (470)
13.3.6 Bayesian inference for linear models with Laplace priors (471)
13.4 ℓ1 regularization: algorithms (472)
13.4.1 Coordinate descent (472)
13.4.2 LARS and other homotopy methods (472)
13.4.3 Proximal and gradient projection methods (473)
13.4.4 EM for lasso (478)
13.5 ℓ1 regularization: extensions (480)
13.5.1 Group Lasso (480)
13.5.2 Fused lasso (485)
13.5.3 Elastic net (ridge and lasso combined) (486)
13.6 Non-convex regularizers (488)
13.6.1 Bridge regression (489)
13.6.2 Hierarchical adaptive lasso (489)
13.6.3 Other hierarchical priors (493)
13.7 Automatic relevance determination (ARD)/sparse Bayesian learning (SBL) (494)
13.7.1 ARD for linear regression (494)
13.7.2 Whence sparsity? (496)
13.7.3 Connection to MAP estimation (496)
13.7.4 Algorithms for ARD * (497)
13.7.5 ARD for logistic regression (499)
13.8 Sparse coding * (499)
13.8.1 Learning a sparse coding dictionary (500)
13.8.2 Results of dictionary learning from image patches (501)
13.8.3 Compressed sensing (503)
13.8.4 Image inpainting and denoising (503)
14 Kernels (510)
14.1 Introduction (510)
14.2 Kernel functions (510)
14.2.1 RBF kernels (511)
14.2.2 Kernels for comparing documents (511)
14.2.3 Mercer (positive definite) kernels (512)
14.2.4 Linear kernels (513)
14.2.5 Matern kernels (513)
14.2.6 String kernels (514)
14.2.7 Pyramid match kernels (515)
14.2.8 Kernels derived from probabilistic generative models (516)
14.3 Using kernels inside GLMs (517)
14.3.1 Kernel machines (517)
14.3.2 L1VMs, RVMs, and other sparse vector machines (518)
14.4 The kernel trick (519)
14.4.1 Kernelized nearest neighbor classification (520)
14.4.2 Kernelized K-medoids clustering (520)
14.4.3 Kernelized ridge regression (523)
14.4.4 Kernel PCA (524)
14.5 Support vector machines (SVMs) (527)
14.5.1 SVMs for regression (528)
14.5.2 SVMs for classification (529)
14.5.3 Choosing C (535)
14.5.4 Summary of key points (535)
14.5.5 A probabilistic interpretation of SVMs (536)
14.6 Comparison of discriminative kernel methods (536)
14.7 Kernels for building generative models (538)
14.7.1 Smoothing kernels (538)
14.7.2 Kernel density estimation (KDE) (539)
14.7.3 From KDE to KNN (540)
14.7.4 Kernel regression (541)
14.7.5 Locally weighted regression (543)
15 Gaussian Processes (546)
15.1 Introduction (546)
15.2 GPs for regression (547)
15.2.1 Predictions using noise-free observations (548)
15.2.2 Predictions using noisy observations (549)
15.2.3 Effect of the kernel parameters (550)
15.2.4 Estimating the kernel parameters (552)
15.2.5 Computational and numerical issues * (555)
15.2.6 Semi-parametric GPs * (555)
15.3 GPs meet GLMs (556)
15.3.1 Binary classification (556)
15.3.2 Multi-class classification (559)
15.3.3 GPs for Poisson regression (562)
15.4 Connection with other methods (563)
15.4.1 Linear models compared to GPs (563)
15.4.2 Linear smoothers compared to GPs (564)
15.4.3 SVMs compared to GPs (565)
15.4.4 L1VM and RVMs compared to GPs (565)
15.4.5 Neural networks compared to GPs (566)
15.4.6 Smoothing splines compared to GPs * (567)
15.4.7 RKHS methods compared to GPs * (569)
15.5 GP latent variable model (571)
15.6 Approximation methods for large datasets (573)
16 Adaptive Basis Function Models (574)
16.1 Introduction (574)
16.2 Classification and regression trees (CART) (575)
16.2.1 Basics (575)
16.2.2 Growing a tree (576)
16.2.3 Pruning a tree (580)
16.2.4 Pros and cons of trees (581)
16.2.5 Random forests (581)
16.2.6 CART compared to hierarchical mixture of experts * (582)
16.3 Generalized additive models (583)
16.3.1 Backfitting (583)
16.3.2 Computational efficiency (584)
16.3.3 Multivariate adaptive regression splines (MARS) (584)
16.4 Boosting (585)
16.4.1 Forward stagewise additive modeling (586)
16.4.2 L2boosting (588)
16.4.3 AdaBoost (589)
16.4.4 LogitBoost (590)
16.4.5 Boosting as functional gradient descent (591)
16.4.6 Sparse boosting (592)
16.4.7 Multivariate adaptive regression trees (MART) (593)
16.4.8 Why does boosting work so well? (593)
16.4.9 A Bayesian view (594)
16.5 Feedforward neural networks (multilayer perceptrons) (594)
16.5.1 Convolutional neural networks (595)
16.5.2 Other kinds of neural networks (599)
16.5.3 A brief history of the field (599)
16.5.4 The backpropagation algorithm (600)
16.5.5 Identifiability (603)
16.5.6 Regularization (603)
16.5.7 Bayesian inference (607)
16.6 Ensemble learning (611)
16.6.1 Stacking (611)
16.6.2 Error-correcting output codes (612)
16.6.3 Ensemble learning is not equivalent to Bayes model averaging (612)
16.7 Experimental comparison (613)
16.7.1 Low-dimensional features (613)
16.7.2 High-dimensional features (614)
16.8 Interpreting black-box models (616)
17 Markov and Hidden Markov Models (620)
17.1 Introduction (620)
17.2 Markov models (620)
17.2.1 Transition matrix (620)
17.2.2 Application: Language modeling (622)
17.2.3 Stationary distribution of a Markov chain * (627)
17.2.4 Application: Google’s PageRank algorithm for web page ranking (631)
17.3 Hidden Markov models (634)
17.3.1 Applications of HMMs (635)
17.4 Inference in HMMs (637)
17.4.1 Types of inference problems for temporal models (637)
17.4.2 The forwards algorithm (640)
17.4.3 The forwards-backwards algorithm (641)
17.4.4 The Viterbi algorithm (643)
17.4.5 Forwards filtering, backwards sampling (647)
17.5 Learning for HMMs (648)
17.5.1 Training with fully observed data (648)
17.5.2 EM for HMMs (the Baum-Welch algorithm) (649)
17.5.3 Bayesian methods for “fitting” HMMs * (651)
17.5.4 Discriminative training (651)
17.5.5 Model selection (652)
17.6 Generalizations of HMMs (652)
17.6.1 Variable duration (semi-Markov) HMMs (653)
17.6.2 Hierarchical HMMs (655)
17.6.3 Input-output HMMs (656)
17.6.4 Auto-regressive and buried HMMs (657)
17.6.5 Factorial HMM (658)
17.6.6 Coupled HMM and the influence model (659)
17.6.7 Dynamic Bayesian networks (DBNs) (659)
18 State Space Models (662)
18.1 Introduction (662)
18.2 Applications of SSMs (663)
18.2.1 SSMs for object tracking (663)
18.2.2 Robotic SLAM (664)
18.2.3 Online parameter learning using recursive least squares (667)
18.2.4 SSM for time series forecasting * (668)
18.3 Inference in LG-SSM (671)
18.3.1 The Kalman filtering algorithm (671)
18.3.2 The Kalman smoothing algorithm (674)
18.4 Learning for LG-SSM (677)
18.4.1 Identifiability and numerical stability (677)
18.4.2 Training with fully observed data (678)
18.4.3 EM for LG-SSM (678)
18.4.4 Subspace methods (678)
18.4.5 Bayesian methods for “fitting” LG-SSMs (678)
18.5 Approximate online inference for non-linear, non-Gaussian SSMs (678)
18.5.1 Extended Kalman filter (EKF) (679)
18.5.2 Unscented Kalman filter (UKF) (681)
18.5.3 Assumed density filtering (ADF) (683)
18.6 Hybrid discrete/continuous SSMs (686)
18.6.1 Inference (687)
18.6.2 Application: data association and multi-target tracking (689)
18.6.3 Application: fault diagnosis (690)
18.6.4 Application: econometric forecasting (691)
19 Undirected Graphical Models (Markov Random Fields) (692)
19.1 Introduction (692)
19.2 Conditional independence properties of UGMs (692)
19.2.1 Key properties (692)
19.2.2 An undirected alternative to d-separation (694)
19.2.3 Comparing directed and undirected graphical models (695)
19.3 Parameterization of MRFs (696)
19.3.1 The Hammersley-Clifford theorem (696)
19.3.2 Representing potential functions (698)
19.4 Examples of MRFs (699)
19.4.1 Ising model (699)
19.4.2 Hopfield networks (700)
19.4.3 Potts model (702)
19.4.4 Gaussian MRFs (703)
19.4.5 Markov logic networks * (705)
19.5 Learning (707)
19.5.1 Training maxent models using gradient methods (707)
19.5.2 Training partially observed maxent models (708)
19.5.3 Approximate methods for computing the MLEs of MRFs (709)
19.5.4 Pseudo likelihood (709)
19.5.5 Stochastic maximum likelihood (710)
19.5.6 Feature induction for maxent models * (711)
19.5.7 Iterative proportional fitting (IPF) * (712)
19.6 Conditional random fields (CRFs) (715)
19.6.1 Chain-structured CRFs, MEMMs and the label-bias problem (715)
19.6.2 Applications of CRFs (717)
19.6.3 CRF training (723)
19.7 Structural SVMs (724)
19.7.1 SSVMs: a probabilistic view (724)
19.7.2 SSVMs: a non-probabilistic view (726)
19.7.3 Cutting plane methods for fitting SSVMs (729)
19.7.4 Online algorithms for fitting SSVMs (731)
19.7.5 Latent structural SVMs (732)
20 Exact Inference for Graphical Models (738)
20.1 Introduction (738)
20.2 Belief propagation for trees (738)
20.2.1 Serial protocol (738)
20.2.2 Parallel protocol (740)
20.2.3 Gaussian BP * (741)
20.2.4 Other BP variants * (743)
20.3 The variable elimination algorithm (745)
20.3.1 The generalized distributive law * (748)
20.3.2 Computational complexity of VE (748)
20.3.3 A weakness of VE (751)
20.4 The junction tree algorithm * (751)
20.4.1 Creating a junction tree (751)
20.4.2 Message passing on a junction tree (753)
20.4.3 Computational complexity of JTA (756)
20.4.4 JTA generalizations * (757)
20.5 Computational intractability of exact inference in the worst case (757)
20.5.1 Approximate inference (758)
21 Variational Inference (762)
21.1 Introduction (762)
21.2 Variational inference (763)
21.2.1 Alternative interpretations of the variational objective (764)
21.2.2 Forward or reverse KL? * (764)
21.3 The mean field method (766)
21.3.1 Derivation of the mean field update equations (767)
21.3.2 Example: mean field for the Ising model (768)
21.4 Structured mean field * (770)
21.4.1 Example: factorial HMM (771)
21.5 Variational Bayes (773)
21.5.1 Example: VB for a univariate Gaussian (773)
21.5.2 Example: VB for linear regression (777)
21.6 Variational Bayes EM (780)
21.6.1 Example: VBEM for mixtures of Gaussians * (781)
21.7 Variational message passing and VIBES (787)
21.8 Local variational bounds * (787)
21.8.1 Motivating applications (787)
21.8.2 Bohning’s quadratic bound to the log-sum-exp function (789)
21.8.3 Bounds for the sigmoid function (791)
21.8.4 Other bounds and approximations to the log-sum-exp function (793)
21.8.5 Variational inference based on upper bounds (794)
22 More Variational Inference (798)
22.1 Introduction (798)
22.2 Loopy belief propagation: algorithmic issues (798)
22.2.1 A brief history (798)
22.2.2 LBP on pairwise models (799)
22.2.3 LBP on a factor graph (800)
22.2.4 Convergence (802)
22.2.5 Accuracy of LBP (805)
22.2.6 Other speedup tricks for LBP * (806)
22.3 Loopy belief propagation: theoretical issues * (807)
22.3.1 UGMs represented in exponential family form (807)
22.3.2 The marginal polytope (808)
22.3.3 Exact inference as a variational optimization problem (809)
22.3.4 Mean field as a variational optimization problem (810)
22.3.5 LBP as a variational optimization problem (810)
22.3.6 Loopy BP vs mean field (814)
22.4 Extensions of belief propagation * (814)
22.4.1 Generalized belief propagation (814)
22.4.2 Convex belief propagation (816)
22.5 Expectation propagation (818)
22.5.1 EP as a variational inference problem (819)
22.5.2 Optimizing the EP objective using moment matching (820)
22.5.3 EP for the clutter problem (822)
22.5.4 LBP is a special case of EP (823)
22.5.5 Ranking players using TrueSkill (824)
22.5.6 Other applications of EP (830)
22.6 MAP state estimation (830)
22.6.1 Linear programming relaxation (830)
22.6.2 Max-product belief propagation (831)
22.6.3 Graphcuts (832)
22.6.4 Experimental comparison of graphcuts and BP (835)
22.6.5 Dual decomposition (837)
23 Monte Carlo Inference (846)
23.1 Introduction (846)
23.2 Sampling from standard distributions (846)
23.2.1 Using the cdf (846)
23.2.2 Sampling from a Gaussian (Box-Muller method) (848)
23.3 Rejection sampling (848)
23.3.1 Basic idea (848)
23.3.2 Example (849)
23.3.3 Application to Bayesian statistics (850)
23.3.4 Adaptive rejection sampling (850)
23.3.5 Rejection sampling in high dimensions (851)
23.4 Importance sampling (851)
23.4.1 Basic idea (851)
23.4.2 Handling unnormalized distributions (852)
23.4.3 Importance sampling for a DGM: likelihood weighting (853)
23.4.4 Sampling importance resampling (SIR) (853)
23.5 Particle filtering (854)
23.5.1 Sequential importance sampling (855)
23.5.2 The degeneracy problem (856)
23.5.3 The resampling step (856)
23.5.4 The proposal distribution (858)
23.5.5 Application: robot localization (859)
23.5.6 Application: visual object tracking (859)
23.5.7 Application: time series forecasting (862)
23.6 Rao-Blackwellised particle filtering (RBPF) (862)
23.6.1 RBPF for switching LG-SSMs (862)
23.6.2 Application: tracking a maneuvering target (863)
23.6.3 Application: Fast SLAM (865)
24 Markov Chain Monte Carlo (MCMC) Inference (868)
24.1 Introduction (868)
24.2 Gibbs sampling (869)
24.2.1 Basic idea (869)
24.2.2 Example: Gibbs sampling for the Ising model (869)
24.2.3 Example: Gibbs sampling for inferring the parameters of a GMM (871)
24.2.4 Collapsed Gibbs sampling * (872)
24.2.5 Gibbs sampling for hierarchical GLMs (875)
24.2.6 BUGS and JAGS (877)
24.2.7 The Imputation Posterior (IP) algorithm (878)
24.2.8 Blocking Gibbs sampling (878)
24.3 Metropolis Hastings algorithm (879)
24.3.1 Basic idea (879)
24.3.2 Gibbs sampling is a special case of MH (880)
24.3.3 Proposal distributions (881)
24.3.4 Adaptive MCMC (884)
24.3.5 Initialization and mode hopping (885)
24.3.6 Why MH works * (885)
24.3.7 Reversible jump (trans-dimensional) MCMC * (886)
24.4 Speed and accuracy of MCMC (887)
24.4.1 The burn-in phase (887)
24.4.2 Mixing rates of Markov chains * (888)
24.4.3 Practical convergence diagnostics (889)
24.4.4 Accuracy of MCMC (891)
24.4.5 How many chains? (893)
24.5 Auxiliary variable MCMC * (894)
24.5.1 Auxiliary variable sampling for logistic regression (894)
24.5.2 Slice sampling (895)
24.5.3 Swendsen Wang (897)
24.5.4 Hybrid/Hamiltonian MCMC * (899)
24.6 Annealing methods (899)
24.6.1 Simulated annealing (900)
24.6.2 Annealed importance sampling (902)
24.6.3 Parallel tempering (902)
24.7 Approximating the marginal likelihood (903)
24.7.1 The candidate method (903)
24.7.2 Harmonic mean estimate (903)
24.7.3 Annealed importance sampling (904)
25 Clustering (906)
25.1 Introduction (906)
25.1.1 Measuring (dis)similarity (906)
25.1.2 Evaluating the output of clustering methods * (907)
25.2 Dirichlet process mixture models (910)
25.2.1 From finite to infinite mixture models (910)
25.2.2 The Dirichlet process (913)
25.2.3 Applying Dirichlet processes to mixture modeling (916)
25.2.4 Fitting a DP mixture model (917)
25.3 Affinity propagation (918)
25.4 Spectral clustering (921)
25.4.1 Graph Laplacian (922)
25.4.2 Normalized graph Laplacian (923)
25.4.3 Example (924)
25.5 Hierarchical clustering (924)
25.5.1 Agglomerative clustering (926)
25.5.2 Divisive clustering (929)
25.5.3 Choosing the number of clusters (930)
25.5.4 Bayesian hierarchical clustering (930)
25.6 Clustering datapoints and features (932)
25.6.1 Biclustering (934)
25.6.2 Multi-view clustering (934)
26 Graphical Model Structure Learning (938)
26.1 Introduction (938)
26.2 Structure learning for knowledge discovery (939)
26.2.1 Relevance networks (939)
26.2.2 Dependency networks (940)
26.3 Learning tree structures (941)
26.3.1 Directed or undirected tree? (942)
26.3.2 Chow-Liu algorithm for finding the ML tree structure (943)
26.3.3 Finding the MAP forest (943)
26.3.4 Mixtures of trees (945)
26.4 Learning DAG structures (945)
26.4.1 Markov equivalence (945)
26.4.2 Exact structural inference (947)
26.4.3 Scaling up to larger graphs (951)
26.5 Learning DAG structure with latent variables (953)
26.5.1 Approximating the marginal likelihood when we have missing data (953)
26.5.2 Structural EM (956)
26.5.3 Discovering hidden variables (957)
26.5.4 Case study: Google’s Rephil (959)
26.5.5 Structural equation models * (960)
26.6 Learning causal DAGs (962)
26.6.1 Causal interpretation of DAGs (962)
26.6.2 Using causal DAGs to resolve Simpson’s paradox (964)
26.6.3 Learning causal DAG structures (966)
26.7 Learning undirected Gaussian graphical models (969)
26.7.1 MLE for a GGM (969)
26.7.2 Graphical lasso (970)
26.7.3 Bayesian inference for GGM structure * (972)
26.7.4 Handling non-Gaussian data using copulas * (973)
26.8 Learning undirected discrete graphical models (973)
26.8.1 Graphical lasso for MRFs/CRFs (973)
26.8.2 Thin junction trees (975)
27 Latent Variable Models for Discrete Data (976)
27.1 Introduction (976)
27.2 Distributed state LVMs for discrete data (977)
27.2.1 Mixture models (977)
27.2.2 Exponential family PCA (978)
27.2.3 LDA and mPCA (979)
27.2.4 GaP model and non-negative matrix factorization (980)
27.3 Latent Dirichlet allocation (LDA) (981)
27.3.1 Basics (981)
27.3.2 Unsupervised discovery of topics (984)
27.3.3 Quantitatively evaluating LDA as a language model (984)
27.3.4 Fitting using (collapsed) Gibbs sampling (986)
27.3.5 Example (987)
27.3.6 Fitting using batch variational inference (988)
27.3.7 Fitting using online variational inference (990)
27.3.8 Determining the number of topics (991)
27.4 Extensions of LDA (992)
27.4.1 Correlated topic model (992)
27.4.2 Dynamic topic model (993)
27.4.3 LDA-HMM (994)
27.4.4 Supervised LDA (998)
27.5 LVMs for graph-structured data (1001)
27.5.1 Stochastic block model (1002)
27.5.2 Mixed membership stochastic block model (1004)
27.5.3 Relational topic model (1005)
27.6 LVMs for relational data (1006)
27.6.1 Infinite relational model (1007)
27.6.2 Probabilistic matrix factorization for collaborative filtering (1010)
27.7 Restricted Boltzmann machines (RBMs) (1014)
27.7.1 Varieties of RBMs (1016)
27.7.2 Learning RBMs (1018)
27.7.3 Applications of RBMs (1022)
28 Deep Learning (1026)
28.1 Introduction (1026)
28.2 Deep generative models (1026)
28.2.1 Deep directed networks (1027)
28.2.2 Deep Boltzmann machines (1027)
28.2.3 Deep belief networks (1028)
28.2.4 Greedy layer-wise learning of DBNs (1029)
28.3 Deep neural networks (1030)
28.3.1 Deep multi-layer perceptrons (1030)
28.3.2 Deep auto-encoders (1031)
28.3.3 Stacked denoising auto-encoders (1032)
28.4 Applications of deep networks (1032)
28.4.1 Handwritten digit classification using DBNs (1032)
28.4.2 Data visualization and feature discovery using deep auto-encoders (1033)
28.4.3 Information retrieval using deep auto-encoders (semantic hashing) (1034)
28.4.4 Learning audio features using 1d convolutional DBNs (1035)
28.4.5 Learning image features using 2d convolutional DBNs (1036)
28.5 Discussion (1036)
Notation (1040)
Bibliography (1046)
Index to Code (1078)
Index to Keywords (1081)