Big Learning Workshop: Algorithms, Systems, and Tools for Learning at Scale at NIPS 2011
Invited Talk: High-Performance Computing Needs Machine Learning...and Vice Versa by Nicolas Pinto
Abstract: Large-scale parallelism is a common feature of many neuro-inspired algorithms. In this short paper, we present a practical tutorial on ways that metaprogramming techniques -- dynamically generating specialized code at runtime and compiling it just-in-time -- can be used to greatly accelerate a large data-parallel algorithm. We use filter-bank convolution, a key component of many neural networks for vision, as a case study to illustrate these techniques. We present an overview of several key themes in template metaprogramming, and culminate in a full example of GPU auto-tuning in which an instrumented GPU kernel template is built and the space of all possible instantiations of this kernel is automatically grid-searched to find the best implementation on various hardware/software platforms. We show that this method can, in concert with traditional hand-tuning techniques, achieve significant speed-ups, particularly when a kernel will be run on a variety of hardware platforms.
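To make the auto-tuning idea concrete, here is a minimal sketch of the workflow the abstract describes: a parameterized GPU kernel is instantiated at runtime, compiled just-in-time, timed, and the candidate configurations are grid-searched for the fastest one. It assumes PyCUDA and uses plain string substitution in place of a full template engine; the toy element-wise kernel and all names (`KERNEL_TEMPLATE`, `benchmark`, `BLOCK_SIZE`) are illustrative stand-ins, not code from the talk, which tuned a far richer filter-bank convolution kernel.

```python
# Sketch: runtime code generation + JIT compilation + grid search,
# assuming PyCUDA is installed and a CUDA device is available.
import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
from pycuda.compiler import SourceModule

# A kernel "template": BLOCK_SIZE is baked into the generated source
# as a compile-time constant rather than read from blockDim at runtime.
KERNEL_TEMPLATE = """
__global__ void scale(float *out, const float *in, int n)
{
    int i = blockIdx.x * %(BLOCK_SIZE)d + threadIdx.x;
    if (i < n)
        out[i] = 2.0f * in[i];
}
"""

def benchmark(block_size, n=1 << 20):
    # Instantiate the template for this candidate configuration
    # and compile it just-in-time.
    source = KERNEL_TEMPLATE % {"BLOCK_SIZE": block_size}
    kernel = SourceModule(source).get_function("scale")

    x = np.random.randn(n).astype(np.float32)
    y = np.empty_like(x)

    # Time the instantiation with CUDA events (milliseconds).
    start, end = drv.Event(), drv.Event()
    start.record()
    kernel(drv.Out(y), drv.In(x), np.int32(n),
           block=(block_size, 1, 1),
           grid=((n + block_size - 1) // block_size, 1))
    end.record()
    end.synchronize()
    return start.time_till(end)

# Grid-search the (tiny) space of instantiations; the real search in the
# talk ranges over many more template parameters (tiling, unrolling, etc.).
best = min([32, 64, 128, 256, 512], key=benchmark)
print("best block size:", best)
```

Because the winning configuration depends on the GPU at hand, rerunning this search on each target platform is what lets one kernel template adapt across hardware, which is the portability argument the abstract makes.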