Near-Optimal Parallel Join Processing in MapReduce

Channel:

Google TechTalks

Published on May 19, 2011 12:34:03 AM ● Video Link: https://www.youtube.com/watch?v=kiuUGXWRzPA

Duration: 57:00

Google Tech Talk (more info below)
May 5, 2011

Presented by Dr. Mirek Riedewald, Associate Professor, College of Computer and Information Science, Northeastern University, http://www.ccs.neu.edu/home/mirek/

ABSTRACT

As the amount and complexity of data in many fields increase rapidly, new tools are needed for exploratory analysis and scientific discovery. The goal of our Scolopax system is to address these challenges with novel techniques for large-scale parallel data management. In this talk, we will present an overview of Scolopax and then focus on parallel processing of joins. Joins combine information across data sets, e.g., to discover correlations. Our proposed join model simplifies reasoning about how to assign computation tasks to processors in MapReduce and other parallel environments. Using this model, we derive a surprisingly simple randomized algorithm, called 1-Bucket-Theta, for implementing arbitrary joins in a single MapReduce job. This algorithm requires only minimal statistics (input cardinality), and we provide proofs and strong evidence that, for a variety of join problems, its latency is either close to optimal or the best realizable option. For some popular joins, we show how to improve over 1-Bucket-Theta by exploiting additional input statistics. Most of these results will appear at SIGMOD 2011.
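The abstract only names 1-Bucket-Theta, so the following is a minimal single-process sketch of our reading of the idea, not the authors' implementation. It assumes the join is viewed as an |S| x |T| matrix of candidate pairs, covered by at most r rectangular regions (one per reducer). Each S tuple picks a random row and is sent to every region crossing that row. Each T tuple picks a random column and is sent to every region crossing that column. The grid-shaped layout in `make_regions`, the function names, and the in-memory simulation of the shuffle are all hypothetical choices made for illustration.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

S = TypeVar("S")
T = TypeVar("T")
# A region of the join matrix as half-open ranges: (row_lo, row_hi, col_lo, col_hi).
Region = Tuple[int, int, int, int]


def make_regions(n_s: int, n_t: int, r: int) -> List[Region]:
    """Cover the n_s x n_t join matrix with a grid of at most r rectangles.

    Hypothetical layout: a plain grid whose row/column split tracks the
    input sizes so regions come out roughly square. Assumes n_s, n_t, r >= 1.
    """
    rows = max(1, min(r, round(math.sqrt(r * n_s / n_t))))
    cols = max(1, r // rows)  # rows * cols <= r
    row_cuts = [n_s * i // rows for i in range(rows + 1)]
    col_cuts = [n_t * j // cols for j in range(cols + 1)]
    return [(row_cuts[i], row_cuts[i + 1], col_cuts[j], col_cuts[j + 1])
            for i in range(rows) for j in range(cols)]


def one_bucket_theta(s_rel: Sequence[S], t_rel: Sequence[T],
                     theta: Callable[[S, T], bool], r: int,
                     seed: int = 0) -> List[Tuple[S, T]]:
    """Simulate one MapReduce job that evaluates an arbitrary theta-join."""
    rng = random.Random(seed)
    n_s, n_t = len(s_rel), len(t_rel)  # the only statistics used
    regions = make_regions(n_s, n_t, r)
    bucket_s: List[List[S]] = [[] for _ in regions]
    bucket_t: List[List[T]] = [[] for _ in regions]

    # "Map" phase: route each S tuple along a random row of the matrix.
    for s in s_rel:
        row = rng.randrange(n_s)
        for k, (r0, r1, _, _) in enumerate(regions):
            if r0 <= row < r1:
                bucket_s[k].append(s)
    # "Map" phase: route each T tuple along a random column of the matrix.
    for t in t_rel:
        col = rng.randrange(n_t)
        for k, (_, _, c0, c1) in enumerate(regions):
            if c0 <= col < c1:
                bucket_t[k].append(t)

    # "Reduce" phase: each region runs a local nested-loop join with theta.
    out: List[Tuple[S, T]] = []
    for k in range(len(regions)):
        for s in bucket_s[k]:
            for t in bucket_t[k]:
                if theta(s, t):
                    out.append((s, t))
    return out


if __name__ == "__main__":
    # Example: an inequality join, which has no equi-join key to hash on.
    pairs = one_bucket_theta(range(20), range(0, 30, 3), lambda a, b: a < b, r=6)
    print(len(pairs))
```

Because each tuple's random row (or column) falls in exactly one row band (or column band) of the grid, every (s, t) pair meets in exactly one region, so the output has no duplicates and no missing pairs regardless of the predicate. Choosing random positions needs only |S| and |T|, which mirrors the abstract's claim that input cardinality suffices. The cost is replication: each tuple is copied to every region along its row or column.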