Improving the discoverability of zooarchaeological data using Natural Language Processing
Leontien Talboom
The amount of digital archaeological data has grown rapidly in recent years. Much of it is textual data contained in unpublished fieldwork reports produced by contractor-led research, which makes it harder for users to discover relevant data. This paper discusses research exploring the use of Natural Language Processing (NLP) to improve the discoverability of zooarchaeological data within textual documents. This includes the creation of a Named Entity Recognition (NER) tool using deep neural networks, which has shown promising results: the model outperforms previous classifiers on all evaluated datasets, is fast to train, and is suitable for smaller datasets. Data preparation, controlled vocabularies and the involvement of domain experts were important factors in creating a reliable and useful tool. To show the utility of automated metadata extraction systems, a search tool and pipeline were developed, allowing users to search and filter archaeological reports according to the animal remains mentioned in these documents.
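The idea of filtering reports by extracted animal remains can be illustrated with a minimal sketch. It is not the NER model described above: it assumes a small, hypothetical controlled vocabulary (`VOCAB`) and substitutes simple whole-word dictionary lookup for the neural classifier. The function names `extract_animals`, `build_index` and `search` are also hypothetical. The sketch shows the general pipeline shape: extract entities from each report, normalise them to preferred terms, build an inverted index, then filter reports by one or more terms.

```python
import re
from collections import defaultdict

# Hypothetical controlled vocabulary mapping surface forms to preferred terms.
# A stand-in for a trained NER model; plurals and spelling variants are not handled.
VOCAB = {
    "cattle": "cattle",
    "cow": "cattle",
    "bos taurus": "cattle",
    "sheep": "sheep/goat",
    "goat": "sheep/goat",
    "ovicaprid": "sheep/goat",
    "pig": "pig",
    "sus scrofa": "pig",
}


def extract_animals(text: str) -> set[str]:
    """Return the preferred animal terms mentioned in text (case-insensitive, whole words)."""
    lowered = text.lower()
    found = set()
    for surface, preferred in VOCAB.items():
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered):
            found.add(preferred)
    return found


def build_index(reports: dict[str, str]) -> dict[str, set[str]]:
    """Map each preferred animal term to the IDs of the reports that mention it."""
    index = defaultdict(set)
    for report_id, text in reports.items():
        for term in extract_animals(text):
            index[term].add(report_id)
    return dict(index)


def search(index: dict[str, set[str]], *terms: str) -> set[str]:
    """Return the IDs of reports mentioning all requested terms (AND filter)."""
    matches = [index.get(term, set()) for term in terms]
    return set.intersection(*matches) if matches else set()


# Usage with made-up example reports.
reports = {
    "R1": "Bones of cattle and sheep were recovered from the ditch.",
    "R2": "A pig mandible and Bos taurus metapodials.",
    "R3": "No faunal remains were recovered.",
}
index = build_index(reports)
print(sorted(search(index, "cattle")))
print(sorted(search(index, "cattle", "pig")))
```

Normalising surface forms such as "Bos taurus" to a preferred term is where a controlled vocabulary helps: a search for "cattle" also retrieves reports that use only the Latin name.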