

Linguistic Corpora

Linguistic corpora are collections of linguistic data. This data may consist of:
  • Speech and text databases
  • Lexicons
  • Text corpora
  • Other textual resources annotated with metadata and used for language and linguistic research
Some uses of text corpora are:
  • Publishing: Dictionaries, grammar books, teaching materials, usage guides, thesauri. Publishers increasingly cite their use of corpus resources, so it is important to know how well their corpora are planned and constructed.
  • Linguistic Research: Raw data for studying lexis, syntax, morphology, semantics, discourse analysis, stylistics, sociolinguistics, etc.
  • Artificial Intelligence: A data test bed for program development.
  • Natural Language Processing: Taggers, parsers, natural language understanding programs, spell-checking word lists, etc.
  • Language Teaching: Syllabus and materials design, classroom reference, independent learner research.

English Corpora

Multilingual Corpora