Using Compression Models to Filter Spam; Exploiting Structural Information for Categorization

Video Link: https://www.youtube.com/watch?v=5mroGBzu1O8



Duration: 1:14:18


In the first part of this talk, I will present a spam filtering method based on statistical data compression models. These models can be used as Bayesian text classifiers that operate on character sequences. They are fast to construct and can be updated incrementally. I will present experimental results indicating that the method performs well in comparison to established spam filters and that it is extremely robust to noise, which should make it difficult for spammers to defeat. I will also give some examples showing that the method is capable of picking up interesting, non-trivial patterns that are indicative of spam or ham.

The second part of this talk describes how to exploit structural information for document categorization. Classifier stacking can be used to exploit the structure of semi-structured documents for improved text categorization performance. In this approach, a meta-classifier combines predictions based on different structural elements. It will be shown that this approach consistently outperforms a flat-text linear SVM on a number of standard text categorization datasets, often by a wide margin. I will present selected nomograms that visualize the resulting meta-classifier and give interesting insight into the characteristics of the datasets.
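
To make the first part of the abstract more concrete, here is a minimal sketch of classifying text by character-level code length. It is not the method from the talk: the `CharModel` class, the fixed context order, the additive smoothing, and the toy messages are all assumptions made for illustration, standing in for a real statistical compression model. One model is trained per class, and a message is assigned to the class whose model would encode it in the fewest bits, which corresponds to a Bayesian decision under equal class priors.

```python
import math
from collections import defaultdict


class CharModel:
    """Order-k character model with additive smoothing.

    Hypothetical stand-in for a statistical compression model; the
    names and parameters here are illustrative assumptions.
    """

    def __init__(self, order=3, alphabet_size=256, alpha=1.0):
        self.order = order
        self.alphabet_size = alphabet_size
        self.alpha = alpha
        self.context_counts = defaultdict(int)  # context -> total count
        self.symbol_counts = defaultdict(int)   # (context, char) -> count

    def update(self, text):
        """Incrementally add one training message to the model."""
        padded = " " * self.order + text
        for i in range(self.order, len(padded)):
            ctx = padded[i - self.order:i]
            self.context_counts[ctx] += 1
            self.symbol_counts[(ctx, padded[i])] += 1

    def code_length(self, text):
        """Approximate number of bits needed to encode text under this model."""
        padded = " " * self.order + text
        bits = 0.0
        for i in range(self.order, len(padded)):
            ctx = padded[i - self.order:i]
            num = self.symbol_counts.get((ctx, padded[i]), 0) + self.alpha
            den = self.context_counts.get(ctx, 0) + self.alpha * self.alphabet_size
            bits -= math.log2(num / den)
        return bits


def classify(text, models):
    """Return the label whose model gives the shortest code length."""
    return min(models, key=lambda label: models[label].code_length(text))


# Toy training data (made up for this sketch).
models = {"spam": CharModel(), "ham": CharModel()}
models["spam"].update("Buy cheap v1agra now!!! Limited offer, click here")
models["ham"].update("Hi Anna, the meeting is moved to Thursday at ten")

print(classify("cheap offer, click now", models))
```

Because `update` only increments counts, new messages can be folded into a model one at a time without retraining, which mirrors the incremental-update property mentioned in the abstract. A real filter would use far more training data and a stronger compression model than this fixed-order sketch.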







Tags:
microsoft research