NIPS 2011 Big Learning - Algorithms, Systems, & Tools Workshop: Spark: In-Memory Cluster...

Video Link: https://www.youtube.com/watch?v=qLvLg-sqxKc



Duration: 40:52


Big Learning Workshop: Algorithms, Systems, and Tools for Learning at Scale at NIPS 2011
Invited Talk: Spark: In-Memory Cluster Computing for Iterative and Interactive Applications by Matei Zaharia

Matei Zaharia is a fifth-year graduate student at UC Berkeley, working with Scott Shenker and Ion Stoica on topics in cloud computing, operating systems, and networking. He is also a committer on Apache Hadoop and is funded by a Google PhD fellowship. Before joining Berkeley, Matei earned his undergraduate degree at the University of Waterloo in Canada.

Abstract: MapReduce and its variants have been highly successful in supporting large-scale data-intensive cluster applications. However, these systems are inefficient for applications that share data among multiple computation stages, including many machine learning algorithms, because they are based on an acyclic data flow model. We present Spark, a new cluster computing framework that extends the data flow model with a set of in-memory storage abstractions to efficiently support these applications. Spark outperforms Hadoop by up to 30x in iterative machine learning algorithms while retaining MapReduce's scalability and fault tolerance. In addition, Spark makes programming jobs easy by integrating into the Scala programming language. Finally, Spark's ability to load a dataset into memory and query it repeatedly makes it especially suitable for interactive analysis of big data. We have modified the Scala interpreter to make it possible to use Spark interactively as a highly responsive data analytics tool.

At Berkeley, we have used Spark to implement several large-scale machine learning applications, including a Twitter spam classifier and a real-time automobile traffic estimation system based on expectation maximization. We will present lessons learned from these applications and optimizations we added to Spark as a result.
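
The abstract's central point, that iterative algorithms benefit when a dataset is parsed once and kept in memory instead of being reloaded at every stage, can be illustrated with a small sketch. This is plain Python, not Spark's API: the records in `RAW_LINES`, the helper names (`parse`, `train`, `run_reloading`, `run_cached`), and the tiny logistic-regression loop are all assumptions made for illustration, and the parse counter merely stands in for the cost of rereading input from stable storage.

```python
import math

# Hypothetical raw records standing in for lines of a file on cluster storage.
RAW_LINES = ["1.0,2.0,1", "2.0,1.0,1", "-1.0,-1.5,0", "-2.0,-0.5,0"]

parse_calls = 0  # counts how often a raw line is parsed


def parse(line):
    """Turn one 'x1,x2,label' line into (features, label)."""
    global parse_calls
    parse_calls += 1
    *features, label = line.split(",")
    return [float(v) for v in features], int(label)


def gradient(points, w):
    """Sum of logistic-loss gradients over all points."""
    grad = [0.0] * len(w)
    for x, y in points:
        z = sum(wi * xi for wi, xi in zip(w, x))
        p = 1.0 / (1.0 + math.exp(-z))
        for i, xi in enumerate(x):
            grad[i] += (p - y) * xi
    return grad


def train(load_points, iterations=10, step=0.1):
    """Gradient descent; load_points() is called once per iteration."""
    w = [0.0, 0.0]
    for _ in range(iterations):
        g = gradient(load_points(), w)
        w = [wi - step * gi for wi, gi in zip(w, g)]
    return w


def run_reloading():
    """Re-parse the input on every pass, as a chain of independent jobs would."""
    global parse_calls
    parse_calls = 0
    w = train(lambda: [parse(line) for line in RAW_LINES])
    return w, parse_calls


def run_cached():
    """Parse once, keep the result in memory, and reuse it on every pass."""
    global parse_calls
    parse_calls = 0
    cached = [parse(line) for line in RAW_LINES]
    w = train(lambda: cached)
    return w, parse_calls
```

Both variants compute the same weights; they differ only in how many times the raw input is parsed, which is the cost an in-memory abstraction is meant to avoid across iterations.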




Other Videos By Google TechTalks


2012-02-23 NIPS 2011 Big Learning - Algorithms, Systems, & Tools Workshop: Vowpal Wabbit Tutorial
2012-02-23 NIPS 2011 Sparse Representation & Low-rank Approximation Workshop: Group Sparse Hidden Markov...
2012-02-23 NIPS 2011 Big Learning - Algorithms, Systems, & Tools Workshop: A Common GPU...
2012-02-23 The Relative Happiness Index (RHI)
2012-02-23 A Chinese Typewriter in Silicon Valley
2012-02-23 3D Computer Vision: Past, Present, and Future
2012-02-20 Knowledge is... Love
2012-02-16 Meditate with Father Laurence Freeman
2012-02-14 Agile C++ with Supporting Eclipse CDT Plug-ins
2012-02-14 Santa Tracker - 1.6 Million Requests per Second
2012-02-13 NIPS 2011 Big Learning - Algorithms, Systems, & Tools Workshop: Spark: In-Memory Cluster...
2012-02-13 NIPS 2011 Big Learning - Algorithms, Systems, & Tools Workshop: Real time data...
2012-02-13 NIPS 2011 Big Learning - Algorithms, Systems, & Tools Workshop: Hazy - Making Data-driven...
2012-02-13 NIPS 2011 Big Learning - Algorithms, Systems, & Tools Workshop: Block splitting for...
2012-02-13 NIPS 2011 Big Learning - Algorithms, Systems, & Tools Workshop: No-U-Turn Sampler...
2012-02-13 NIPS 2011 Big Learning - Algorithms, Systems, & Tools Workshop: Graphlab 2...
2012-02-13 NIPS 2011 Big Learning - Algorithms, Systems, & Tools Workshop: Graphlab 2 Tutorial
2012-02-13 NIPS 2011 Big Learning - Algorithms, Systems, & Tools Workshop: Large-Scale Matrix...
2012-02-13 NIPS 2011 Big Learning - Algorithms, Systems, & Tools Workshop: Randomized Smoothing for...
2012-02-13 NIPS 2011 Big Learning - Algorithms, Systems, & Tools Workshop: Machine Learning's Role...
2012-02-13 NIPS 2011 Big Learning - Algorithms, Systems, & Tools Workshop: Fast Cross-Validation...



Tags:
new
bigml
d2
zaharia