Using comparable corpora to filter bilingual dictionaries generated by transitivity
Keywords:
natural language processing, information extraction, comparable corpora, bilingual dictionaries
Abstract
This article proposes a method for building new bilingual dictionaries from existing ones with the help of comparable corpora. More precisely, a new bilingual dictionary with pairs in two target languages is built in two steps. First, a noisy dictionary is generated by transitivity, by crossing two existing dictionaries, each of which contains translation pairs between one of the two target languages and an intermediary language. The crossing yields a noisy resource because words in the intermediary language are ambiguous. Second, odd translation pairs are filtered out using a set of bilingual lexicons automatically extracted from comparable corpora. The quality of the filtered dictionary is very high, close to that of dictionaries built by lexicographers. We also report a case study in which a new, non-noisy English-Portuguese dictionary with more than 7,000 bilingual entries was automatically generated.
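The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the article's implementation: the function names, the toy word lists, the choice of Spanish as the intermediary language, and the use of exact pair membership as the filtering criterion are all assumptions made for illustration.

```python
def cross_by_transitivity(src_to_pivot, pivot_to_tgt):
    """Step 1: pair each source word with every target word that shares
    a pivot translation. Ambiguous pivot words produce noisy pairs."""
    candidates = set()
    for src, pivots in src_to_pivot.items():
        for pivot in pivots:
            for tgt in pivot_to_tgt.get(pivot, ()):
                candidates.add((src, tgt))
    return candidates


def filter_with_corpus_lexicons(candidates, corpus_lexicons):
    """Step 2 (assumed criterion): keep a candidate pair only if it also
    occurs in at least one lexicon extracted from comparable corpora."""
    attested = set().union(*corpus_lexicons) if corpus_lexicons else set()
    return candidates & attested


# Toy data, invented for illustration. Spanish "muñeca" means both
# "wrist" and "doll", so crossing through it creates wrong pairs.
en_es = {"wrist": ["muñeca"], "doll": ["muñeca"]}
es_pt = {"muñeca": ["pulso", "boneca"]}

# Hypothetical lexicons standing in for corpus-extracted ones.
lexicons = [
    {("wrist", "pulso"), ("house", "casa")},
    {("doll", "boneca")},
]

noisy = cross_by_transitivity(en_es, es_pt)
clean = filter_with_corpus_lexicons(noisy, lexicons)

print(sorted(noisy))
print(sorted(clean))
```

In this toy run the crossing step produces four English-Portuguese candidates, two of them wrong ("wrist"/"boneca" and "doll"/"pulso"), and the filtering step keeps only the two pairs attested in the stand-in lexicons.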
Published
2014-09-24
How to Cite
Gamallo, P. (2014). Using comparable corpora to filter bilingual dictionaries generated by transitivity. DELTA: Documentação E Estudos Em Linguística Teórica E Aplicada, 30(2). Retrieved from https://revistas.pucsp.br/index.php/delta/article/view/6268
Section
Articles