Semi-automatic WordNet Linking using Synset Embeddings

Published in LREC-Coling 2024, 2023

Currently under review

Abstract

WordNet is a lexical database of English that groups words into sets of synonyms called synsets and records the relationships between these synsets. Multilingual WordNets extend the WordNet concept to multiple languages and are organized in a similar way, with synsets and relationships between them. Linking multilingual WordNets is important for several reasons: it can aid cross-lingual information retrieval, machine translation, and language learning, and it helps to identify relationships between concepts in different languages and facilitates the development of multilingual NLP systems. In this paper, we present our efforts to automate the process of linking IndoWordNet (the WordNet for Indian languages) with the English WordNet. The main purpose of this automatic linking approach is to reduce the manual labour required to link Indian-language synsets with the English WordNet. We propose two approaches based on synset embeddings: finding the most similar synset by one-to-all comparison, and filtering candidates via a translation method. Cosine similarity is used as the similarity metric between synset embeddings. Additionally, we introduce a novel methodology for linking transitive words. We evaluate and discuss our results to showcase the effectiveness of the proposed methods.
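
The one-to-all comparison described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names, the synset identifiers, the three-dimensional toy vectors, and the `allowed_ids` parameter (standing in for a candidate set produced by a translation-based filter) are all assumptions made for illustration. It only shows how a source synset embedding might be scored against every target synset embedding by cosine similarity, with the ranked list left for a human to confirm.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_u * norm_v)


def rank_candidates(source_vec, target_vecs, allowed_ids=None, top_k=5):
    """Score one source synset against all target synsets.

    target_vecs maps a (hypothetical) target synset id to its embedding.
    allowed_ids, if given, restricts scoring to a pre-filtered candidate
    set, e.g. one obtained by translating the source synset's words.
    """
    scored = []
    for synset_id, vec in target_vecs.items():
        if allowed_ids is not None and synset_id not in allowed_ids:
            continue
        scored.append((synset_id, cosine_similarity(source_vec, vec)))
    # Highest similarity first; the top entries are the link suggestions.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy data: made-up ids and vectors, not real WordNet embeddings.
source = [1.0, 0.0, 0.0]
targets = {
    "en-A": [0.9, 0.1, 0.0],
    "en-B": [0.0, 1.0, 0.0],
    "en-C": [0.5, 0.5, 0.0],
}

all_ranked = rank_candidates(source, targets)
filtered_ranked = rank_candidates(source, targets, allowed_ids={"en-B", "en-C"})
```

In this toy setup the unfiltered ranking places `en-A` first, while restricting the candidates to `en-B` and `en-C` makes `en-C` the top suggestion. Returning the top few candidates rather than a single best match reflects the semi-automatic setting, where a lexicographer would make the final choice.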