Words, links, and patterns: novel representations for Web-scale text mining

Channel: Microsoft Research

Published on September 6, 2016 5:01:20 AM ● Video Link: https://www.youtube.com/watch?v=c_k4sQMhuqU

Duration: 1:11:44

Textual data is everywhere: in email and scientific papers, in online newspapers, and on e-commerce sites. The Web contains more than 200 terabytes of text, not even counting the contents of dynamic textual databases. This enormous source of knowledge is seriously underexploited. Textual documents on the Web are very hard to model computationally: they are mostly unstructured, time-dependent, collectively authored, multilingual, and of uneven importance. Traditional grammar-based techniques don't scale up to address such problems; novel representations and analytical tools are needed. I will discuss several recent contributions related to text mining from a variety of genres. More specifically, these include (a) lexical models of the growth of the Web, (b) graph-based entity classification, (c) evolving news summarization, and (d) mining protein interactions from scientific papers. As it turns out, the right representations, when complemented with traditional NLP techniques, turn each of these into an instance of a better-studied problem in areas such as social networks, statistical mechanics, sequence analysis, and computational phylogenetics.

Tags:

microsoft research