History and compilation of a large register-diversified corpus of Portuguese at CEPRIL

Authors

  • Tony Berber Sardinha Pontifical Catholic University of São Paulo (PUC-SP), São Paulo, Brazil

Keywords:

corpora, DIRECT, CEPRIL, Corpus Linguistics

Abstract

In this paper I describe the Bank of Portuguese, a large register-diversified corpus of Brazilian Portuguese held at CEPRIL (Center for Language Research, Information and Resources) at Pontifícia Universidade Católica de São Paulo (Pontifical Catholic University of São Paulo, Brazil). The aim is to provide details of its nature, history, and current state, as well as of issues related to its planning, development and future prospects. With nearly 230 million words, it is currently one of the largest corpora of Portuguese. The corpus started off as a collection of texts in hard copy and then became an electronic collection built around smaller corpora collected by individual researchers. Later on, other large subcorpora were added, such as a newspaper collection. The corpus has several problems: register imbalance (the newspaper section is much larger than the others), lack of access to its full contents outside the university, and the need to update its contents.

Author biography

Tony Berber Sardinha, Pontifical Catholic University of São Paulo (PUC-SP), São Paulo, Brazil

Tony Berber Sardinha received a BA in English from the Catholic University of São Paulo, Brazil, an MA in Applied Linguistics from the same university, and a PhD from the English Department of the University of Liverpool (UK). He is a researcher with CNPq (Brazilian National Research Council) and CEPRIL (Center for Research, Resources and Information on Language), and an Adjunct Professor with both the Linguistics Department and the Graduate Program in Applied Linguistics at the Catholic University of São Paulo. He was recently a visiting scholar in Corpus Linguistics at Northern Arizona University (USA). His research interests include Corpus Linguistics, Applied Linguistics, Language Teaching, Business Discourse, Metaphor, Forensic Linguistics, Computer Programming, and Web Design and Tools Development.

How to cite

Sardinha, T. B. (2011). History and compilation of a large register-diversified corpus of Portuguese at CEPRIL. The ESPecialist, 28(2). Retrieved from https://revistas.pucsp.br/index.php/esp/article/view/6175

Section

Papers