WordNet

Spanish WordNet in a readable Prolog format

With the aim of integrating the Spanish WordNet databases into Bousi~Prolog (BPL), a fuzzy logic extension of Prolog, we translated the Multilingual Central Repository (MCR) version of the Spanish WordNet into a Prolog-compatible format, as described in:

Pascual Julián Iranzo, Germán Rigau, Fernando Sáenz-Pérez, Pablo Velasco-Crespo. Conversion of the Spanish WordNet databases into a Prolog-readable format. Language Resources and Evaluation 59(2): 1631–1657 (2025).

The programs that perform this translation are available at:
https://github.com/PabloVelascoCrespo/MCR_to_Prolog

This translation produces a set of Spanish lexical databases that enable access to WordNet information through declarative techniques and the deductive capabilities of the Prolog language.

However, the MCR databases lack important information about word relevance, so this information is also missing from the Prolog version. To address this limitation, in recent work entitled "Computing Word Relevance in the Spanish WordNet", submitted to the journal Language Resources and Evaluation, we extracted word usage information from several annotated corpora to fill this gap in the Prolog version of the MCR Spanish WordNet. The same information also makes it possible to compute the ordering of the words belonging to each synset.

We implemented a series of programs that compute word relevance and generate the updated wn_s.pl file from both the corpus data and the MCR WordNet information. These programs are organized as a sequence of steps forming a processing pipeline.

First step: computation of word occurrences in the corpora

1-procesar_ancora.py
1-procesar_SenSem.py
1-procesar_wikicorpus_parallel.py
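Judging by their names, these three scripts process the AnCora, SenSem and Wikicorpus corpora. As a rough illustration of what this step computes, the sketch below counts how often each (lemma, synset) pair is annotated and adds up the counts of several corpora. It assumes each corpus has already been reduced to a list of (lemma, synset_id) tokens; the real scripts read the native corpus formats, which are not reproduced here, and the lower-casing of lemmas and the synset ids in the example are our own assumptions.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical token format: (lemma, synset_id) for each sense-annotated word.
Token = Tuple[str, str]


def count_occurrences(tokens: Iterable[Token]) -> Counter:
    """Count how many times each (lemma, synset_id) pair is annotated."""
    counts: Counter = Counter()
    for lemma, synset_id in tokens:
        # Assumption: lemmas are lower-cased so "Banco" and "banco" are merged.
        counts[(lemma.lower(), synset_id)] += 1
    return counts


def merge_counts(*per_corpus: Counter) -> Counter:
    """Add up the counts obtained from several corpora."""
    total: Counter = Counter()
    for corpus_counts in per_corpus:
        total.update(corpus_counts)  # Counter.update adds counts, not replaces
    return total


# Illustrative use with made-up synset ids "s1" and "s2".
corpus_a = count_occurrences([("banco", "s1"), ("Banco", "s1"), ("banco", "s2")])
corpus_b = count_occurrences([("banco", "s1")])
totals = merge_counts(corpus_a, corpus_b)
```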

Second step: corpus alignment

2-corpus_wn16_to_wn30.py
2-1-resolve_synset_not_found_by_context.py
2-2-resolve_label_subset.py
2-3-resolve_labels_jaccard_soft.py
2-resolve-single-synsets.py
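The name of 2-3-resolve_labels_jaccard_soft.py suggests that some ambiguous mappings are resolved by comparing sets of labels with a Jaccard-style similarity. The sketch below shows the plain Jaccard coefficient, |A ∩ B| / |A ∪ B|, used to choose among several candidate WordNet 3.0 synsets the one whose lemmas best overlap the lemmas of the original synset. The data structures, the threshold and the tie handling are assumptions for illustration; the "soft" variant used by the actual script may differ.

```python
from typing import Dict, Optional, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |A & B| / |A | B| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def resolve_by_jaccard(source_lemmas: Set[str],
                       candidates: Dict[str, Set[str]],
                       threshold: float = 0.0) -> Optional[str]:
    """Return the candidate synset id with the highest Jaccard score.

    Returns None when no candidate scores above `threshold` or when the
    best score is tied, leaving the case to another resolution method.
    """
    scored = sorted(
        ((jaccard(source_lemmas, lemmas), sid) for sid, lemmas in candidates.items()),
        reverse=True,
    )
    if not scored or scored[0][0] <= threshold:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # tie between the two best candidates: still ambiguous
    return scored[0][1]
```

With this design a case is only resolved when one candidate is strictly better than the others, so uncertain alignments are not silently forced.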

Third step: computation of the Tag_Count, ordering of words within a synset, and generation of wn_s.csv

3-actualiza_tagcount.py
4-ordenar_synsets_W_num.py
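In the Prolog release of the Princeton WordNet, wn_s.pl holds facts of the form s(synset_id, w_num, 'word', ss_type, sense_number, tag_count), where w_num is the position of the word within its synset. Assuming the Spanish wn_s.csv uses the same six fields, the sketch below updates tag counts from corpus counts and then renumbers w_num so that the most frequent words come first. Ranking by descending tag count with alphabetical tie-breaking, and setting unseen words to 0, are our assumptions and not necessarily the rules used by 3-actualiza_tagcount.py and 4-ordenar_synsets_W_num.py.

```python
from typing import Dict, List, Tuple

# Assumed row layout mirroring the s/6 fields:
# (synset_id, w_num, word, ss_type, sense_number, tag_count)
Row = Tuple[str, int, str, str, int, int]


def update_tag_counts(rows: List[Row], counts: Dict[Tuple[str, str], int]) -> List[Row]:
    """Replace tag_count with the corpus count of (word, synset_id), 0 if unseen."""
    return [
        (sid, w_num, word, ss_type, sense, counts.get((word, sid), 0))
        for sid, w_num, word, ss_type, sense, _old in rows
    ]


def renumber_w_num(rows: List[Row]) -> List[Row]:
    """Reassign w_num inside each synset by descending tag_count, ties alphabetical."""
    by_synset: Dict[str, List[Row]] = {}
    for row in rows:
        by_synset.setdefault(row[0], []).append(row)
    result: List[Row] = []
    for members in by_synset.values():
        members.sort(key=lambda r: (-r[5], r[2]))
        for new_w, (sid, _old_w, word, ss_type, sense, tag) in enumerate(members, start=1):
            result.append((sid, new_w, word, ss_type, sense, tag))
    return result


# Illustrative data with a made-up synset id "s1".
rows = [("s1", 1, "auto", "n", 1, 0), ("s1", 2, "coche", "n", 1, 0), ("s1", 3, "carro", "n", 1, 0)]
counts = {("coche", "s1"): 5, ("carro", "s1"): 5, ("auto", "s1"): 1}
ordered = renumber_w_num(update_tag_counts(rows, counts))
```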

Fourth step: conversion to Prolog

5-csv_a_prolog.py
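A minimal sketch of this conversion, assuming that wn_s.csv has a header row with the six field names synset_id, w_num, word, ss_type, sense_number and tag_count (the real column layout may differ). Words are written as quoted Prolog atoms, doubling embedded single quotes and escaping backslashes, and ss_type is assumed to be a bare atom such as n or v.

```python
import csv
import io
from typing import TextIO


def prolog_atom(text: str) -> str:
    """Quote text as a Prolog atom, escaping backslashes and single quotes."""
    escaped = text.replace("\\", "\\\\").replace("'", "''")
    return f"'{escaped}'"


def prolog_term(value: str) -> str:
    """Write digit-only values as integers; anything else as a quoted atom."""
    return value if value.isdigit() else prolog_atom(value)


def csv_to_prolog(src: TextIO, dst: TextIO) -> None:
    """Emit one s/6 fact per CSV row (column names are an assumption)."""
    for row in csv.DictReader(src):
        fact = "s({},{},{},{},{},{}).".format(
            prolog_term(row["synset_id"]),
            int(row["w_num"]),
            prolog_atom(row["word"]),
            row["ss_type"],  # assumed bare atom: n, v, a, s or r
            int(row["sense_number"]),
            int(row["tag_count"]),
        )
        dst.write(fact + "\n")


# Illustrative input with invented values.
src = io.StringIO("synset_id,w_num,word,ss_type,sense_number,tag_count\n"
                  "100001740,1,entidad,n,1,3\n")
out = io.StringIO()
csv_to_prolog(src, out)
print(out.getvalue(), end="")  # s(100001740,1,'entidad',n,1,3).
```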

These programs, together with a script that facilitates their execution, can be downloaded here.
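The helper script itself is not reproduced here. As an illustration only, a runner could execute the programs with subprocess in the order in which they are listed above, stopping at the first failure. The sketch assumes that the listed order is the execution order, that all scripts live in one directory and that they take no command-line arguments; none of this is confirmed for the real programs, and run_pipeline is a hypothetical name.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Assumed execution order: the order in which the scripts are listed.
PIPELINE = [
    "1-procesar_ancora.py",
    "1-procesar_SenSem.py",
    "1-procesar_wikicorpus_parallel.py",
    "2-corpus_wn16_to_wn30.py",
    "2-1-resolve_synset_not_found_by_context.py",
    "2-2-resolve_label_subset.py",
    "2-3-resolve_labels_jaccard_soft.py",
    "2-resolve-single-synsets.py",
    "3-actualiza_tagcount.py",
    "4-ordenar_synsets_W_num.py",
    "5-csv_a_prolog.py",
]


def run_pipeline(script_dir: Path, dry_run: bool = False) -> List[List[str]]:
    """Run every script in order; with dry_run=True only build the commands."""
    commands = [[sys.executable, str(script_dir / name)] for name in PIPELINE]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError on a non-zero exit status,
            # so a failing step stops the rest of the pipeline.
            subprocess.run(cmd, check=True)
    return commands
```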

Since some of the corpus annotations use synset identifiers corresponding to WordNet version 1.6, we developed several alignment methods to map these identifiers to WordNet version 3.0 in order to support the second step of the pipeline. The programs that perform this task can be downloaded here.
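In its simplest form, such an alignment replaces each WordNet 1.6 synset identifier with its WordNet 3.0 counterpart taken from a mapping table, and sets aside the identifiers that have no target or several targets for more elaborate resolution methods. The sketch below assumes a hypothetical in-memory table from 1.6 identifiers to lists of 3.0 identifiers; the identifiers in the example are invented.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping table: WordNet 1.6 id -> candidate WordNet 3.0 ids.
MappingTable = Dict[str, List[str]]


def map_synsets(ids16: List[str],
                table: MappingTable) -> Tuple[Dict[str, str], List[str], List[str]]:
    """Split WordNet 1.6 ids into resolved, ambiguous and unmapped groups."""
    resolved: Dict[str, str] = {}
    ambiguous: List[str] = []
    unmapped: List[str] = []
    for sid in ids16:
        targets = table.get(sid, [])
        if not targets:
            unmapped.append(sid)  # no 3.0 counterpart in the table
        elif len(targets) == 1:
            resolved[sid] = targets[0]  # unique target: map directly
        else:
            ambiguous.append(sid)  # several targets: needs disambiguation
    return resolved, ambiguous, unmapped


table = {"a16": ["a30"], "b16": ["b30", "b30x"]}
resolved, ambiguous, unmapped = map_synsets(["a16", "b16", "c16"], table)
```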
