Key Phrase Indexing With Controlled Vocabularies

Video Link: https://www.youtube.com/watch?v=wXtsgUx9QAg



Duration: 44:11


Google TechTalks
June 21, 2006

Olena Medelyan is a grad student who has just started on a Google-funded PhD scholarship, looking at keyphrase extraction using lexical and linguistic techniques.

ABSTRACT
Keyphrases are widely used in information retrieval as a brief but precise summary of documents. They are usually selected by professional human indexers. The more consistent the indexers are with each other, the higher the retrieval efficiency.

1. We describe an experiment in which six professionals assigned keyphrases from a controlled vocabulary to the same documents, and we evaluate their indexing consistency. Interesting patterns discovered in this experiment helped in developing an automatic approach for this task.
2. The keyphrase extraction algorithm KEA++ extracts phrases from the documents and maps them onto index terms from a domain-specific thesaurus. A machine learning scheme determines the most significant phrases based on their statistical, syntactic and semantic properties. The evaluation reveals that KEA++ is almost as consistent with the indexers as they are with each other.
3. It is important that a keyphrase set covers all main topics of a document.
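The abstract does not say how indexing consistency was measured. As a minimal sketch, assuming Rolling's measure (2C / (A + B), where C is the number of terms two indexers share and A and B are the sizes of their term sets), the consistency among several indexers on one document could be averaged over all pairs as follows. The function names and the sample index terms are hypothetical and are not taken from the talk.

```python
from itertools import combinations


def rolling_consistency(terms_a, terms_b):
    """Rolling's inter-indexer consistency: 2C / (A + B).

    Assumption: this measure is chosen for illustration only; the
    abstract does not name the measure used in the experiment.
    """
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 1.0  # two empty sets are treated as fully consistent
    return 2 * len(a & b) / (len(a) + len(b))


def mean_pairwise_consistency(term_sets):
    """Average Rolling's measure over every pair of indexers."""
    pairs = list(combinations(term_sets, 2))
    if not pairs:
        raise ValueError("need at least two term sets")
    return sum(rolling_consistency(x, y) for x, y in pairs) / len(pairs)


# Hypothetical index terms assigned by three indexers to one document.
indexers = [
    {"forestry", "soil erosion", "land use"},
    {"forestry", "soil erosion", "deforestation"},
    {"forestry", "land use", "watersheds"},
]

if __name__ == "__main__":
    print(round(mean_pairwise_consistency(indexers), 3))
```

The same pairwise function can compare an automatic system's term set against each human indexer's set, which is one way to read the claim that a system is "almost as consistent with the indexers as they are with each other".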

Google engEDU






