Using Compression Models to Filter Spam; Exploiting Structural Information for Categorization

Channel: Microsoft Research

Subscribers: 349,000

Published on September 6, 2016 6:06:08 AM ● Video Link: https://www.youtube.com/watch?v=5mroGBzu1O8

Category: Guide

Duration: 1:14:18

72 views

In the first part of this talk, I will present a spam filtering method based on statistical data compression models. The nature of these models allows them to be employed as Bayesian text classifiers based on character sequences. The models are fast to construct and incrementally updateable. I will present experimental results indicating that this method performs well in comparison to established spam filters, and that the method is extremely robust to noise, which should make it difficult for spammers to defeat. I will also give some examples, which show that the method is capable of picking up interesting, non-trivial patterns that are indicative of spam/ham.

The second part of this talk describes how to exploit structural information for document categorization. Classifier stacking can be used to exploit the structure of semi-structured documents for improved text categorization performance. In this approach, a meta-classifier is used to combine predictions based on different structural elements. It will be shown that this approach consistently outperforms a flat-text linear SVM on a number of standard text categorization datasets, often by a wide margin. I will present selected nomograms that visualize the resulting meta-classifier and give interesting insight into the characteristics of the datasets.
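The idea from the first part of the talk, using a compression model as a character-level classifier, can be illustrated with a minimal sketch. The abstract does not say which compression models the speaker uses, so the code below assumes a simple fixed-order character model with add-one smoothing as a stand-in. The names `CharModel` and `classify`, the context order, and the toy training messages are all hypothetical. One model is trained per class, and a message is assigned to the class whose model would encode it in the fewest bits.

```python
import math


class CharModel:
    """Order-k character model with add-one smoothing.

    Assumption: a simplified stand-in for a real statistical
    compression model; not the method used in the talk.
    """

    def __init__(self, order=3, alphabet_size=256):
        self.order = order
        self.alphabet_size = alphabet_size
        self.counts = {}  # context -> {next character -> count}
        self.totals = {}  # context -> total count of following characters

    def _pairs(self, text):
        # Pad with spaces so the first characters also have a full context.
        padded = " " * self.order + text
        for i in range(self.order, len(padded)):
            yield padded[i - self.order:i], padded[i]

    def update(self, text):
        """Incrementally add one training message to the model."""
        for ctx, ch in self._pairs(text):
            followers = self.counts.setdefault(ctx, {})
            followers[ch] = followers.get(ch, 0) + 1
            self.totals[ctx] = self.totals.get(ctx, 0) + 1

    def code_length(self, text):
        """Bits an ideal coder driven by this model would need for text."""
        bits = 0.0
        for ctx, ch in self._pairs(text):
            seen = self.counts.get(ctx, {}).get(ch, 0)
            total = self.totals.get(ctx, 0)
            p = (seen + 1) / (total + self.alphabet_size)
            bits -= math.log2(p)
        return bits


def classify(text, models):
    """Return the label whose model compresses text best."""
    return min(models, key=lambda label: models[label].code_length(text))


# Toy training data (hypothetical, for illustration only).
models = {"spam": CharModel(), "ham": CharModel()}
for message in ["buy cheap pills now", "cheap pills online buy now"]:
    models["spam"].update(message)
for message in ["meeting at noon tomorrow", "see you at the meeting"]:
    models["ham"].update(message)

print(classify("cheap pills", models))
print(classify("meeting tomorrow", models))
```

Because `update` only increments counts, new labeled messages can be folded in one at a time without retraining, which matches the incremental property mentioned in the abstract. Working on character contexts rather than word tokens is also what would let such a model tolerate obfuscated spellings, although this toy version is far too small to demonstrate that.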

Other Videos By Microsoft Research

2016-09-06	Efficient Actions in Dynamic Auction Environment
2016-09-06	Two Network Coding Talks for the price of one: Security, Low Complexity
2016-09-06	Some recent results in camera calibration and shape reconstruction
2016-09-06	Implicit Feedback: Techniques for Deployment and Evaluation
2016-09-06	Better k-best Parsing, Hypergraphs, and Dynamic Programming
2016-09-06	Rock 'n Roll : Earthquake & Disaster Preparedness
2016-09-06	Understanding Customers: Shaping Our Future through Understanding Social Change
2016-09-06	Fast Database and Data Streaming Operations using Graphics Processors
2016-09-06	Hyperparameter and Kernel Learning for Graph Based Semi-Supervised Classification
2016-09-06	Multi-Engine Machine Translation Guided by Explicit Word Matching
2016-09-06	Using Compression Models to Filter Spam; Exploiting Structural Information for Categorization
2016-09-06	The Man Who Knew Too Much: Alan Turing and the Invention of the Computer [1/4]
2016-09-06	Estimation of intrinsic dimensionality using high-rate vector quantization
2016-09-06	Abducted: How People Come to Believe They Were Kidnapped by Aliens [1/11]
2016-09-06	Spontaneous Speech: Challenges and Opportunities for Parsing
2016-09-06	Some Recent Advances in Gaussian Mixture Modeling for Speech Recognition
2016-09-06	How to Survive a Robot Uprising: Tips to Defend Yourself Against The Coming Rebellion
2016-09-06	Body for Life for Women
2016-09-06	A Low-level Approach to Reuse for Programming-Language Infrastructure
2016-09-06	Sensor Networks Workshop 05 - Short Talks (See Abstract)
2016-09-06	Sensor Networks Workshop 05 - Keynote

Tags: microsoft research
