From the Information Extraction Pipeline to Global Models, and Back
Decisions in information extraction (IE), such as determining the types of entities mentioned in text and the relations between them, depend on each other. To remain efficient, most systems make these decisions sequentially, in a pipeline, even though later decisions could inform earlier ones. In this talk I will show how we used Conditional Random Fields to make these decisions jointly, substantially outperforming less global approaches and ranking first in several international IE competitions. I will then present relaxation methods we developed and applied to scale up (exact) inference in such models. In the final part of the talk I will argue why we should not dismiss the pipeline, and I will present an exact beam-search algorithm, based on column generation, that overcomes the pipeline's greedy nature.
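For context, and independent of the specific models in the talk, a conditional random field over an input sentence x and a joint assignment y of entity types and relations is usually written in the standard log-linear form below; the factor set C, feature functions f_c, and weights theta here are generic placeholders, not the talk's actual parameterization.

\[
p(y \mid x) \;=\; \frac{1}{Z(x)} \exp\!\Big( \sum_{c \in \mathcal{C}} \theta^{\top} f_c(y_c, x) \Big),
\qquad
Z(x) \;=\; \sum_{y'} \exp\!\Big( \sum_{c \in \mathcal{C}} \theta^{\top} f_c(y'_c, x) \Big).
\]

Exact inference means maximizing or marginalizing this distribution over all joint assignments y, which is why scaling it up is the challenge addressed in the second part of the talk.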