Resource acquisition via an unsupervised WSD system
In the field of WSD, it is well established that supervised systems (systems that rely on sense-labeled data) perform significantly better than unsupervised WSD systems. Unfortunately, acquiring manually sense-annotated data has proven to be a tedious and expensive task. In this talk, I will show how I use my unsupervised system, SALAAM, to acquire high-quality sense-annotated data that supervised WSD systems can use for training. The approach can reduce manual annotation by at least 40%. In this portion of my talk, I will characterize some crucial features for identifying which automatically annotated data are useful and which still require hand labeling. Another use for SALAAM lies in its ability to bootstrap resources for different languages. I will illustrate how I build an Arabic WordNet with relatively high accuracy, as measured by human ratings and judgments. The results obtained agree closely with those reported by the builders of EuroWordNet.