Sampling Techniques for Massive Data
Video Link: https://www.youtube.com/watch?v=pU9QC75uUMY
Google Tech Talks
March 27, 2007
ABSTRACT
Consider a giant data matrix A of N rows and D columns. At Web scale, both N and D can be on the order of billions. In applications including duplicate document detection, word associations, databases, nearest neighbors, and kernels (e.g., for SVM), it is often desirable to store a very small fraction (a sample) of the data that fits in physical memory, so that summary statistics (e.g., L1 or L2 distances) can be computed quickly. Because the data are often highly sparse, conventional sampling methods (i.e., randomly selecting a few columns from the data matrix) would not work well. Two sampling methods, conditional random sampling (CRS) and stable random projections (SRP),...
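As a rough illustration of the random-projection idea named in the abstract, the sketch below estimates a squared L2 distance between two sparse rows from short sketches. It assumes the Gaussian (2-stable) case only and is not taken from the talk; the function names (`make_projection`, `project`, `estimate_l2_squared`) and the sizes `d` and `k` are hypothetical choices for this example.

```python
import random

def make_projection(d, k, seed=0):
    """Build a d x k matrix of i.i.d. standard normal entries (assumed 2-stable case)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(d)]

def project(vec, proj):
    """Project a sparse row {column_index: value} down to k numbers."""
    k = len(proj[0])
    out = [0.0] * k
    for j, value in vec.items():
        row = proj[j]
        for t in range(k):
            out[t] += value * row[t]
    return out

def estimate_l2_squared(sketch_u, sketch_v):
    """Average of squared sketch differences; unbiased for the squared L2 distance."""
    k = len(sketch_u)
    return sum((a - b) ** 2 for a, b in zip(sketch_u, sketch_v)) / k

# Hypothetical sizes: d original columns, k sketch entries per row.
d, k = 500, 400
rng = random.Random(1)
u = {j: rng.uniform(1.0, 5.0) for j in rng.sample(range(d), 20)}  # sparse row
v = {j: rng.uniform(1.0, 5.0) for j in rng.sample(range(d), 20)}  # sparse row

R = make_projection(d, k)
exact = sum((u.get(j, 0.0) - v.get(j, 0.0)) ** 2 for j in set(u) | set(v))
approx = estimate_l2_squared(project(u, R), project(v, R))
print(f"exact squared L2: {exact:.2f}, estimate from sketches: {approx:.2f}")
```

Only the k-entry sketches need to stay in memory, and the relative standard error of this estimator shrinks roughly as sqrt(2/k), so k trades memory against accuracy.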
Tags:
google
howto
sampling
techniques
massive
data