Home · JohnLangford/vowpal_wabbit Wiki · GitHub
Vowpal Wabbit
TheVowpal Wabbit(VW) project is a fast out-of-core learning system sponsored byMicrosoft Researchand (previously)Yahoo! Research. Support is available through themailing list.
There are two ways to have a fast learning algorithm: (a) start with a slow algorithm and speed it up, or (b) build an intrinsically fast learning algorithm. This project is about approach (b), and it's reached a state where it may be useful to others as a platform for research and experimentation.
There are several optimization algorithms available with the baseline being sparse gradient descent (GD) on a loss function (several are available), The code should be easily usable. Its only external dependence is on theboost library, which is often installed by default.
To build vw from source, in various environments, please follow the instructions in theREADME.mdfile.
- Download
- Tutorial
- Command line arguments
- Algorithm details(e.g.,input format,loss functions)
- Examples
- Goals
- Feature Hashing and Extraction
- Multiclass Classification
- Learning to Search Sub-System
- Python interface for learning to search
There are several features that (in combination) can be powerful. Features
- Input Format. Theinput formatfor the learning algorithm is substantially more flexible than might be expected. Examples can have features consisting of free form text, which is interpreted in a bag-of-words way. There can even be multiple sets of free form text in different namespaces.
- Speed. The learning algorithm is pretty fast---similar to the few other online algorithm implementations out there. As one datapoint, it can be effectively applied on learning problems with a sparse terafeature (i.e. 1012sparse features). As another example, it's about a factor of 3 faster thanLeon Bottou'ssvmsgdon theRCV1 examplein wall clock execution time.
- Scalability. This is not the same as fast. Instead, the important characteristic here is that the memory footprint of the program is bounded independent of data. This means the training set is not loaded into main memory before learning starts. In addition, the size of the set of features is bounded independent of the amount of training data using thehashing trick.
- Feature Pairing. Subsets of features can be internally paired so that the algorithm is linear in the cross-product of the subsets. This is useful for ranking problems.David Grangierseems to have a similar trick in thePAMIR code. The alternative of explicitly expanding the features before feeding them into the learning algorithm can be both computation and space intensive, depending on how it's handled.
Many people have contributed to the project at this point.John Langford,Alekh Agarwal,Miroslav Dudik,Daniel Hsu,Nikos Karampatziakis,Olivier Chapelle,Paul Mineiro,Matt Hoffman,Jake Hofman,Sudarshan Lamkhede, Shubham Chopra, Ariel Faigon,Lihong Li, Gordon Rios, andAlex Strehlhave all worked on VW. Many others have contributed via feature requests, bug reports, or bug patches.
VW is also a vehicle for advanced research. The Researchfirst public versioncontaining hashing, caching, and true online learning was released in 2007. Since then, many different algorithms and results have influenced its design, including:
- Kai-Wei Chang,He He,Hal Daumé III,John Langford,Stephane Ross,A Credit Assignment Compiler for Joint Prediction, NIPS 2016.
- Kai-Wei Chang,Akshay Krishnamurthy,Alekh Agarwal,Hal Daumé III,John LangfordLearning to Search Better Than Your Teacher, ICML 2015.
- Alekh Agarwal,Olivier Chapelle,Miroslav Dudik,John Langford,A Reliable Effective Terascale Linear Learning System, 2011.
- M. Hoffman,D. Blei,F. Bach,Online Learning for Latent Dirichlet Allocation, in Neural Information Processing Systems (NIPS) 2010.
- Alina Beygelzimer,Daniel Hsu,John Langford, andTong ZhangAgnostic Active Learning Without ConstraintsNIPS 2010.
- John Duchi,Elad Hazan, andYoram Singer,Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, JMLR 2011 & COLT 2010.
- H. Brendan McMahan,Matthew Streeter,Adaptive Bound Optimization for Online Convex Optimization, COLT 2010.
- Nikos KarampatziakisandJohn Langford,Importance Weight Aware Gradient UpdatesUAI 2010.
- Kilian Weinberger,Anirban Dasgupta,John Langford,Alex Smola,Josh Attenberg,Feature Hashing for Large Scale Multitask Learning, ICML 2009.
- Qinfeng Shi,James Petterson,Gideon Dror,John Langford,Alex Smola, andSVN Vishwanathan,Hash Kernels for Structured Data, AISTAT 2009.
- John Langford,Lihong Li, andTong Zhang,Sparse Online Learning via Truncated Gradient, NIPS 2008.
- Leon Bottou,Stochastic Gradient Descent, 2007.
- Avrim Blum,Adam Kalai, andJohn LangfordBeating the Holdout: Bounds for KFold and Progressive Cross-Validation. COLT99 pages 203-208.
- Nocedal, J.(1980). "Updating Quasi-Newton Matrices with Limited Storage". Mathematics of Computation 35: 773–782.