The learner corpus path: a worthwhile methodological challenge



Keywords:

learner corpus, academic writing, EAP, Corpus Linguistics


Corpus compilation is a challenging research endeavor that many researchers decide to pursue. Few learner corpora, however, are easily accessible (e.g., the International Corpus of Learner English), and none of them contains texts from a variety of registers written by learners of English at different proficiency levels studying at Brazilian universities. The aim of this paper is therefore to present the compilation of a learner corpus much needed in our research and teaching context, and to point out the advantages of building this type of corpus both for understanding learners’ needs and for making pedagogical decisions grounded in sound data. By presenting a detailed rationale for the compilation, the article sets out the design decisions taken to ensure that fair comparisons can be made. To illustrate the value of a carefully designed corpus, the results of previous studies are compared. Among the conclusions reached are the need for discipline-specific tasks to propel writing proficiency and the need to develop authorship skills in English for Academic Purposes classes in order to foster academic success.

Author biographies

Deise Prina Dutra, Universidade Federal de Minas Gerais

Full Professor (Professora Titular) at the Faculdade de Letras, UFMG.

She teaches English at the undergraduate level and in the Graduate Program in Linguistic Studies (Programa de Pós-Graduação em Estudos Linguísticos), in the research lines of Corpus-Based Linguistic Studies and Foreign Language Teaching/Learning. Her research interests are learner and specialized corpora, academic literacy, and the internationalization of higher education. She coordinates the Study Group on Specialized and Learner Corpora (Grupo de Estudos em Corpora Especializados e de Aprendizes, GECEA/CNPq) and the Language Proficiency Sector of UFMG’s Office of International Relations (Diretoria de Relações Internacionais).

Bárbara Malveira Orfanò, Universidade Federal de Minas Gerais

Adjunct Professor (Professora Adjunta) at the Faculdade de Letras, UFMG. She teaches English for Academic Purposes at the undergraduate level and in the Graduate Program in Linguistic Studies, in the research line of Foreign Language Teaching/Learning. Her research interests are learner corpora, spoken and written academic discourse in English, and English teacher education. Since 2018 she has been director of the Confucius Institute at UFMG, and she coordinates the Center for East Asian Studies (Centro de Estudos da Ásia Oriental).

Annallena de Souza Guedes, Instituto Federal de Educação, Ciência e Tecnologia da Bahia (IFBA)

English teacher at the Instituto Federal de Educação, Ciência e Tecnologia da Bahia (IFBA), Ilhéus campus. Member of the Study Group on Specialized and Learner Corpora (GECEA). Her work is in Applied Linguistics, with an emphasis on corpora of learners of English, academic writing, academic literacies, and phraseology.

Jessica Ceritello Alves, Universidade Federal de Minas Gerais

BA in Translation/English from UFMG. Master’s student in the Graduate Program in Linguistic Studies at UFMG, with a CNPq scholarship. Her interests include learner corpus research and longitudinal studies.

João Gabriel Fekete, Universidade Federal de Minas Gerais

Undergraduate research scholarship holder (PIBIC/CNPq) at the Faculdade de Letras, UFMG. His interests include language teaching, statistics, programming, and corpus linguistics.

References


Almeida, V. C., Orfanò, B. M., & Dutra, D. P. (in press). Is there a better choice? Raising learners’ awareness of academic collocations. In V. Viana (Ed.), Teaching English with corpora: A resource book. Routledge.

Altenberg, B., & Tapper, M. (1998). The use of adverbial connectors in advanced Swedish learners’ written English. In S. Granger (Ed.), Learner English on computer (pp. 80-93). Pearson Education.

Alves, A. L. L., & Pinto, P. T. (2018). A utilização de that-clauses em abstracts escritos por alunos-pesquisadores brasileiros. Entrepalavras, 8(2), 288-303.

Anthony, L. (2016). AntConc (Version 3.4.3) [Computer Software]. Waseda University.

Biber, D. (1993). Representativeness in Corpus Design. Literary and Linguistic Computing, 8(4), 243-257.

Biber, D., Reppen, R., Staples, S., & Egbert, J. (2020). Exploring the longitudinal development of grammatical complexity in the disciplinary writing of L2-English university students. International Journal of Learner Corpus Research, 6(1), 38-71.

Biber, D., & Conrad, S. (2009). Register, genre and style. Cambridge University Press.

Biber, D., & Finegan, E. (1994). Intra-textual variation within medical research articles. In N. Oostdijk, & P. de Haan (Eds.), Corpus-based research into language (pp. 201-221). Rodopi.

Biber, D., & Gray, B. (2016). Grammatical complexity in academic English: Linguistic change in writing. Cambridge University Press.

Biber, D., Grieve, J., & Iberri-Shea, G. (2009). Noun phrase modification. In G. Rohdenburg, & J. Schlüter (Eds.), One language, two grammars? Differences between British and American English (pp. 182-193). Cambridge University Press.

Bohórquez, C. G. (2015). Eliminação de pacotes lexicais relacionados ao tópico e de pacotes lexicais em contexto de sobreposição: uma proposta metodológica para os estudos da linguística de corpus [Unpublished Master’s thesis]. Universidade Federal de Minas Gerais.

Carter, R., & McCarthy, M. (2006). Cambridge Grammar of English: Spoken and written English grammar and usage. Cambridge University Press.

Chen, C. W. (2006). The use of conjunctive adverbials in the academic papers of advanced Taiwanese EFL learners. International Journal of Corpus Linguistics, 11(1), 113-130.

Common European Framework of Reference for Languages: Learning, teaching, assessment. (2001). Council of Europe. (accessed December 7, 2021).

Crawford, W. J., & Csomay, E. (2016). Doing corpus linguistics. Routledge.

De Cock, S., Granger, S., Leech, G., & McEnery, T. (1998). An automated approach to the phrasicon of EFL learners. In S. Granger (Ed.), Learner English on computer (pp. 67-79). Pearson Education.

Dutra, D., Queiroz, J. M. S., & Alves, J. C. (2017). Adding information in argumentative texts: a learner corpus-based study of additive linking adverbials. Estudos Anglo-Americanos, 46(1), 9-32.

Dutra, D. P., Orfanò, B. M., & Almeida, V. C. (2019). Result linking adverbials in learner corpora. Domínios de Lingu@gem, 13(1), 400-431.

Dutra, D. P., Queiroz, J. M. S., Macedo, L. D. de, Costa, D. D., & Mattos, E. (2020). Adjectives as nominal premodifiers in Chemistry and Applied Linguistics Corpora. In U. Römer, V. Cortes, & E. Friginal (Eds.), Advances in corpus-based research on academic writing: Effects of discipline, register, and writer expertise (pp. 205-226). John Benjamins Publishing Company.

Fernández, E. M., Souza, R. A., & Carando, A. (2017). Bilingual innovations: Experimental evidence offers clues regarding the psycholinguistics of language change. Bilingualism: Language and Cognition, 20(2), 251-268.

Gilquin, G. (2015). From design to collection of learner corpora. In S. Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge Handbook of Learner Corpus Research (pp. 9-34). Cambridge University Press.

Gilquin, G., De Cock, S., & Granger, S. (2010). Louvain international database of spoken English interlanguage. Presses Universitaires de Louvain.

Goutéraux, P. (2013). Learners of English and Conversational Proficiency. In S. Granger, G. Gilquin, & F. Meunier (Eds.), Twenty years of Learner Corpus Research: Looking Back, Moving Ahead (pp. 197-210). Presses Universitaires de Louvain.

Granger, S. (1998). The computerized learner corpus: a versatile new source of data for SLA research. In S. Granger (Ed.), Learner English on Computer. Longman.

Granger, S. (2015a). The contribution of learner corpora to reference and instructional materials design. In S. Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge handbook of learner corpus research (pp. 486-510). Cambridge University Press.

Granger, S. (2015b). Contrastive interlanguage analysis: A reappraisal. International Journal of Learner Corpus Research, 1(1), 7-24.

Granger, S., Dupont, M., Meunier, F., Naets, H., & Paquot, M. (2020). International Corpus of Learner English: Version 3. Presses Universitaires de Louvain.

Gray, B. (2015). Linguistic variation in research articles: When discipline tells only part of the story. John Benjamins Publishing Company.

Guedes, A. de S. (2017). Verbos do inglês acadêmico escrito e suas colocações: um estudo baseado em um corpus de aprendizes brasileiros de inglês [Unpublished PhD Dissertation]. Universidade Federal de Minas Gerais.

Halliday, M. A. K., & Hasan, R. (1976). Cohesion in English. Longman.

Johns, T. (1991). Should You Be Persuaded: Two Examples of Data-driven Learning. In T. Johns, & P. King (Eds.), Classroom Concordancing. ELR Journal, 4, 1-16.

Johns, T. (1994). From printout to handout: Grammar and vocabulary teaching in the context of data-driven learning. In T. Odlin (Ed.), Perspectives on pedagogical grammar (pp. 293-313). Cambridge University Press.

Johns, T. (1997). Contexts: The background, development and trialling of a concordance-based CALL program. In A. Wichmann, S. Fligelstone, T. McEnery, & G. Knowles (Eds.), Teaching and language corpora (pp. 100-115). Routledge.

Lee, H. (2011). In defense of concordancing: An application of data-driven learning in Taiwan. Procedia-Social and Behavioral Sciences, 12, 399-408.

Leffa, V. J., & Irala, V. B. (2014). O ensino de outra(s) língua(s) na contemporaneidade. In V. Leffa, & V. B. Irala (Eds.), Uma espiadinha na sala de aula: ensinando línguas adicionais no Brasil (pp. 21-48). Educat.

Littré, D. (2015). Combining Experimental Data and Corpus Data: Intermediate French-speaking Learners and the English Present. Corpus Linguistics and Linguistic Theory, 11(1), 89-126.

Liu, D. (2008). Linking adverbials: An across-register corpus study and its implications. International Journal of Corpus Linguistics, 13(4), 491-518.

McEnery, T., Xiao, R., & Tono, Y. (2006). Corpus-based language studies: an advanced resource book. Routledge.

Meunier, F., & Littré, D. (2013). Tracking Learners’ Progress. Adopting a Dual ‘Corpus Cum Experimental Data’ Approach. The Modern Language Journal, 97(1), 61-76.

Mitchell, R., Myles, F., & Marsden, E. (2013). Second language learning theories. Routledge.

Nesselhauf, N. (2004). Learner corpora and their potential for language teaching. How to use corpora in language teaching, 12, 125-156.

Nunes, L. P., & Orfanò, B. (2020). Investigating the system of transitivity in passive that-clauses of research abstracts. In N. Kenny, & L. Escobar (Eds.), The changing face of ESP in today’s classroom and workplace (pp. 63-177). Vernon Press.

O’Keeffe, A., McCarthy, M., & Carter, R. (2007). From corpus to classroom: Language use and language teaching. Cambridge University Press.

Oliveira, S. B., Fonseca, M. M. S., & Marques, D. S. (2017). Collaborative writing: the English language learners gaze on Art. Fórum Linguístico, 14, 21-52. (accessed June 23, 2020).

Parkinson, J., & Musgrave, J. (2014). Development of noun phrase complexity in the writing of English for Academic Purposes students. Journal of English for Academic Purposes, 14, 48-59.

Queiroz, J. M. S. (2019). The Grammatical Complexity of English Noun Phrases in Brazilian Learner’s Academic Writing: A Corpus-based Study [Unpublished Master’s thesis]. Universidade Federal de Minas Gerais. (accessed December 15, 2021).

Reppen, R. (2010). Building a corpus. In A. O’Keeffe, & M. McCarthy (Eds.), The Routledge handbook of corpus linguistics. Routledge.

Santos, M. A. (2018). Descrição do uso das conjunções but e however em redações acadêmicas em língua inglesa de nível B1 com base em corpus [Unpublished Master’s thesis]. Universidade Estadual Paulista Júlio de Mesquita Filho. (accessed December 15, 2021).

Scott, M., & Tribble, C. (2006). Textual patterns: Key words and corpus analysis in language education. John Benjamins Publishing.

Selinker, L. (1972). Interlanguage. International Review of Applied Linguistics, 10(3), 209-231.

Sinclair, J. (Ed.). (1993). Collins COBUILD English Grammar. Collins.

Sinclair, J. (2005). Corpus and text: Basic principles. In M. Wynne (Ed.), Developing linguistic corpora: A guide to good practice (pp. 1-16). Oxbow Books.

Staples, S., & Reppen, R. (2016). Understanding first-year L2 writing: A lexico-grammatical analysis across L1s, genres, and language ratings. Journal of Second Language Writing, 32, 17-35.

Swales, J., & Feak, C. (2009). Abstracts and the writing of abstracts. University of Michigan Press.

Tono, Y. (2003). Learner corpora: design, development and applications. Proceedings of the Corpus Linguistics 2003 Conference (pp. 800-809).

Xavier, A. D., Oliveira, S. B., & Souza, E. L. M. (2019). A construção de memes como ferramenta de ensino da língua inglesa. Periferia, 11, 140-161.

Zihan, Y. (2014). Linking adverbials in English. [Unpublished PhD Dissertation]. Victoria University of Wellington.



How to cite

Dutra, D. P., Orfanò, B. M., Guedes, A. de S., Alves, J. C., & Fekete, J. G. (2023). The learner corpus path: a worthwhile methodological challenge. DELTA: Documentação e Estudos em Linguística Teórica e Aplicada, 38(2).