Key Phrase Indexing With Controlled Vocabularies
Google TechTalks
June 21, 2006
Olena Medelyan is a graduate student who has just started a Google-funded PhD scholarship. Her research looks at keyphrase extraction using lexical and linguistic techniques.
ABSTRACT
Keyphrases are widely used in information retrieval as a brief but precise summary of documents. They are usually selected by professional human indexers. The more consistent the indexers are with each other, the higher the retrieval efficiency.
1. We describe an experiment in which six professionals assigned keyphrases from a controlled vocabulary to the same documents, and we evaluate their indexing consistency. Interesting patterns discovered in this experiment helped in developing an automatic approach to this task.
2. The keyphrase extraction algorithm KEA++ extracts phrases from the documents and maps them onto index terms from a domain-specific thesaurus. A machine learning scheme determines the most significant phrases based on their statistical, syntactic and semantic properties. The evaluation reveals that KEA++ is almost as consistent with the indexers as they are with each other.
3. It is important that a keyphrase set covers all main topics of a document.
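The indexing consistency mentioned in point 1 can be made concrete with a small sketch. The abstract does not say which measure was used, so the code below assumes Rolling's measure, 2C / (A + B), where A and B are the numbers of terms assigned by two indexers and C is the number of terms they share. The function names and the example terms are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def rolling_consistency(terms_a, terms_b):
    """Rolling's inter-indexer consistency: 2C / (A + B).

    Assumption: this measure is one common choice, not necessarily
    the one used in the experiment described in the abstract.
    """
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        # No terms on either side; treated as 0.0 here by convention.
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def mean_pairwise_consistency(term_sets):
    """Average Rolling consistency over every pair of indexers."""
    pairs = list(combinations(term_sets, 2))
    if not pairs:
        raise ValueError("need at least two indexers")
    return sum(rolling_consistency(a, b) for a, b in pairs) / len(pairs)


# Hypothetical term sets assigned by three indexers to one document.
indexers = [
    {"soil erosion", "land use"},
    {"land use", "forestry"},
    {"soil erosion", "land use"},
]
score = mean_pairwise_consistency(indexers)
```

The same pairwise function could be used to compare an automatic system's output with each human indexer, which is the sense in which an algorithm can be "almost as consistent with the indexers" as they are with each other.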
Google engEDU