Class 08: Large Scale Learning by Sketching
Instructor: Lorenzo Rosasco
Description
We discuss computational strategies for learning with large-scale kernel methods, that is, strategies that remain memory efficient when dealing with large datasets. We focus on subsampling methods that replace the empirical kernel matrix with a smaller matrix obtained by (column) subsampling.
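As a concrete illustration (not part of the class materials), the following minimal NumPy sketch shows the column-subsampling idea in its Nyström form: instead of storing the full n x n kernel matrix K, one keeps only the n x m block K_nm built from m uniformly subsampled landmark points, together with the small m x m block K_mm, so that K is implicitly approximated by K_nm K_mm^+ K_nm^T. The Gaussian kernel, uniform sampling, and the function names `gaussian_kernel` and `nystrom_approximation` are illustrative assumptions, not notation from the course notes.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-sq_dists / (2.0 * sigma**2))

def nystrom_approximation(X, m, sigma=1.0, seed=0):
    """Rank-m Nyström sketch of the kernel matrix of X (illustrative helper).

    Returns K_nm (n x m) and K_mm (m x m); the implicit approximation is
    K ~ K_nm @ pinv(K_mm) @ K_nm.T, stored in O(n*m) memory instead of O(n^2).
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=m, replace=False)  # uniform column subsampling
    landmarks = X[idx]
    K_nm = gaussian_kernel(X, landmarks, sigma)
    K_mm = gaussian_kernel(landmarks, landmarks, sigma)
    return K_nm, K_mm

# Toy usage: compare the sketch against the full kernel matrix.
X = np.random.default_rng(1).normal(size=(500, 5))
K_nm, K_mm = nystrom_approximation(X, m=50)
K_approx = K_nm @ np.linalg.pinv(K_mm) @ K_nm.T
K_full = gaussian_kernel(X, X)
print("relative error:", np.linalg.norm(K_full - K_approx) / np.linalg.norm(K_full))
```

In a learning pipeline one would never form `K_approx` explicitly; the factors K_nm and K_mm are used directly in the downstream solver, which is what makes the approach memory efficient.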
Slides
Slides for this lecture: PDF
Class Reference Material
L. Rosasco, T. Poggio, Machine Learning: a Regularization Approach, MIT-9.520 Lecture Notes, Manuscript, Dec. 2017
Chapter 4 - Regularization Networks
Note: The course notes, in the form of the circulated book draft, are the reference material for this class. Related and older material can be accessed through previous years' offerings of the course.
Further Reading
- A. Smola and B. Schölkopf, Sparse Greedy Matrix Approximation for Machine Learning, Proc. International Conference on Machine Learning (ICML), 2000.
- C. Williams and M. Seeger, Using the Nyström Method to Speed Up Kernel Machines, Advances in Neural Information Processing Systems (NIPS), 2000.
- A. Rahimi and B. Recht, Random Features for Large-Scale Kernel Machines, Advances in Neural Information Processing Systems (NIPS), 2007.
- Q. Le, T. Sarlós, and A. Smola, Fastfood - Approximating Kernel Expansions in Loglinear Time, Proc. International Conference on Machine Learning (ICML), 2013.
- A. Rudi, R. Camoriano, and L. Rosasco, Less is More: Nyström Computational Regularization, Advances in Neural Information Processing Systems (NIPS), 2015.
- A. Rudi, R. Camoriano, and L. Rosasco, Generalization Properties of Learning with Random Features, arXiv:1602.04474, 2016.