Sampling Techniques for Massive Data

Channel:

Google TechTalks

Subscribers:

349,000

Published on October 9, 2007 2:53:41 AM ● Video Link: https://www.youtube.com/watch?v=pU9QC75uUMY

Duration: 49:51

6,153 views

Google Tech Talks
March 27, 2007

ABSTRACT

Consider a giant data matrix A with N rows and D columns. At Web scale, both N and D can be on the order of billions. In applications including duplicate (document) detection, word associations, databases, nearest neighbors, and kernels (e.g., for SVM), it is often desirable to store only a very small fraction (a sample) of the data, small enough to fit in physical memory, so that summary statistics (e.g., L1 or L2 distances) can be computed quickly. Because the data are often highly sparse, conventional sampling methods (e.g., randomly selecting a few columns from the data matrix) do not work well. Two sampling methods, conditional random sampling (CRS) and stable random projections (SRP),...
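
The abstract is cut off before it describes either method, so the following is only a minimal sketch of the general idea behind random projections for L2 distances, not the speaker's implementation. It assumes the Gaussian (2-stable) case: each sparse row is multiplied by a random matrix with i.i.d. N(0, 1) entries, and the squared L2 distance between two rows is estimated from their short sketches. The names `project` and `estimate_l2_squared`, the sketch size `k`, and the per-column seeding scheme are all hypothetical choices made for illustration.

```python
import random


def project(sparse_row, k, seed=0):
    """Project a sparse row {column_index: value} onto k Gaussian directions.

    The k weights of each column are regenerated from (seed, column), so the
    D x k projection matrix is never stored. This seeding scheme is an
    illustrative assumption, not something taken from the talk.
    """
    sketch = [0.0] * k
    for col, value in sparse_row.items():
        rng = random.Random(seed * 1_000_003 + col)
        for j in range(k):
            sketch[j] += value * rng.gauss(0.0, 1.0)
    return sketch


def estimate_l2_squared(sketch_u, sketch_v):
    """Estimate the squared L2 distance between two rows from their sketches."""
    k = len(sketch_u)
    return sum((a - b) ** 2 for a, b in zip(sketch_u, sketch_v)) / k


# Two sparse rows of a very wide matrix; only nonzero entries are stored.
u = {3: 1.0, 10: 2.0}
v = {3: 1.0, 500: -1.0}

exact = 2.0 ** 2 + 1.0 ** 2  # the rows differ only in columns 10 and 500
estimate = estimate_l2_squared(project(u, k=2000), project(v, k=2000))
print(exact, estimate)
```

Both rows must be projected with the same `seed` so that they share one projection matrix. Under these assumptions the estimator is unbiased, and its relative standard deviation is roughly sqrt(2/k), so a larger `k` trades memory for accuracy.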

Tags:

google

howto

sampling

techniques

massive

data
