Using comparable corpora to filter bilingual dictionaries generated by transitivity

Authors

  • Pablo Gamallo Universidade de Santiago de Compostela

Keywords:

natural language processing, information extraction, comparable corpora, bilingual dictionaries

Abstract

This article proposes a method for building new bilingual dictionaries from existing ones with the help of comparable corpora. More precisely, a new bilingual dictionary with pairs in two target languages is built in two steps. First, a noisy dictionary is generated by transitivity, by crossing two existing dictionaries, each of which pairs one of the two target languages with an intermediary language. The resulting resource is noisy because words in the intermediary language are ambiguous. Second, spurious translation pairs are filtered out using a set of bilingual lexicons automatically extracted from comparable corpora. The quality of the filtered dictionary is very high, close to that of dictionaries built by lexicographers. We also report a case study in which a new, non-noisy English-Portuguese dictionary with more than 7,000 bilingual entries was automatically generated.
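
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the toy word pairs, the choice of Spanish as the intermediary language, and the filtering criterion (keep a pair only if it appears in at least one corpus-derived lexicon) are all assumptions made for illustration.

```python
from collections import defaultdict


def cross_by_transitivity(src_to_pivot, pivot_to_tgt):
    """Pair each source word with every target word reachable via a shared pivot word."""
    candidates = defaultdict(set)
    for src, pivots in src_to_pivot.items():
        for pivot in pivots:
            for tgt in pivot_to_tgt.get(pivot, ()):
                candidates[src].add(tgt)
    return dict(candidates)


def filter_with_corpus_lexicons(candidates, corpus_lexicons):
    """Keep only candidate pairs attested in at least one corpus-derived lexicon.

    Assumption: each lexicon is a set of (source, target) tuples, and exact
    membership is the filtering criterion. The article may use a different test.
    """
    attested = set().union(*corpus_lexicons)
    filtered = {}
    for src, tgts in candidates.items():
        kept = {tgt for tgt in tgts if (src, tgt) in attested}
        if kept:
            filtered[src] = kept
    return filtered


# Hypothetical toy data: the pivot word "muñeca" is ambiguous (doll / wrist).
english_to_pivot = {"doll": {"muñeca"}, "wrist": {"muñeca"}}
pivot_to_portuguese = {"muñeca": {"boneca", "pulso"}}

# Step 1: crossing yields a noisy dictionary with wrong pairs such as doll-pulso.
noisy = cross_by_transitivity(english_to_pivot, pivot_to_portuguese)

# Step 2: lexicons extracted from comparable corpora (here hand-written) act as a filter.
corpus_lexicons = [{("doll", "boneca")}, {("wrist", "pulso")}]
filtered = filter_with_corpus_lexicons(noisy, corpus_lexicons)
```

In this toy case the ambiguous pivot word produces four candidate pairs, and only the two pairs attested in the corpus-derived lexicons survive the filter.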

Author Biography

Pablo Gamallo, Universidade de Santiago de Compostela

Born 27 July 1969 in Vigo, Galiza, Spain.

Current Positions: “Ramón y Cajal” Researcher, University of Santiago de Compostela, Spain, Departamento de Língua Espanhola, Área de Linguística Computacional. Promoter and founding member of Cilenis, a spin-off on language technologies qualified as IEBT by the Galician government.

EDUCATION

  • March 1998: Ph.D. in Linguistics, Blaise Pascal University, France.
  • October 1993: Master's in Linguistics, Logic and Computing, Blaise Pascal University, France.
  • July 1992: Graduated in Hispanic Languages, University of Santiago de Compostela, Galiza, Spain.

PREVIOUS POSITIONS

  • 2004 - 2007: “Parga Pondal” Researcher, University of Santiago de Compostela, Spain.
  • 2002 - 2004: Post-Doc supported by Fundação da Ciência e a Tecnologia (FCT), Ref: SFRG / BDP / 1189 / 2002, Group CITI, Faculdade de Ciência e Tecnologia, Universidade Nova de Lisboa, Portugal.
  • 2000 - 2002: Post-Doc supported by Fundação da Ciência e a Tecnologia (FCT), PRAXIS XXI / BDP / 2213 / 99, Group CENTRIA, Faculdade de Ciência e Tecnologia, Universidade Nova de Lisboa, Portugal.
  • 1999 - 2000: Auxiliar Professor (Asociado P3) at University of Vigo, Spain.
  • 1998 - 1999: Auxiliar Professor (ATER) at Blaise Pascal University, France.

Published

2014-09-24

How to Cite

Gamallo, P. (2014). Using comparable corpora to filter bilingual dictionaries generated by transitivity. DELTA: Documentação E Estudos Em Linguística Teórica E Aplicada, 30(2). Retrieved from https://revistas.pucsp.br/index.php/delta/article/view/6268

Section

Articles