Linguistic Corpora

Simply put, linguistic corpora are collections of linguistic data. Here are some of the most popular.
This data may comprise:
  • Speech and text databases
  • Lexicons
  • Text corpora
  • Other metadata-added textual resources used for language and linguistic research
Some uses of text corpora are:
  • Publishing: Dictionaries, grammar books, teaching materials, usage guides, thesauri. Increasingly, publishers are referring to the use they make of corpus facilities: it's important to know how well their corpora are planned and constructed.
  • Linguistic Research: Raw data for studying lexis, syntax, morphology, semantics, discourse analysis, stylistics, sociolinguistics, etc.
  • Artificial Intelligence: Data test bed for program development.
  • Natural Language Processing: Taggers, parsers, natural language understanding programs, spell-checking word lists, etc.
  • Language Teaching: Syllabus and materials design, classroom reference, independent learner research.
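
As a rough illustration of the research and language-processing uses listed above, the sketch below builds a word-frequency table and an alphabetical word list from a tiny text corpus. The three sample sentences, the regular-expression tokenizer, and the function names are all assumptions made up for this example; real corpora are far larger and usually come with their own tokenization and annotation.

```python
import re
from collections import Counter

# Hypothetical three-sentence corpus, invented purely for illustration.
SAMPLE_CORPUS = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
    "A cat and a dog met.",
]


def tokenize(text):
    """Lowercase the text and return its word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(documents):
    """Count how often each word form occurs across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts


freqs = word_frequencies(SAMPLE_CORPUS)

# An alphabetical list of distinct word forms, the kind of raw material
# a simple spell-checking word list could start from.
word_list = sorted(freqs)

print(freqs["the"])   # how many times "the" occurs in the sample
print(word_list)
```

A frequency table like this is only a starting point: lexical studies typically go on to normalize counts by corpus size and to compare them across genres or time periods.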

List of Corpora

University of Alberta Libraries - University of Alberta, Edmonton, AB, Canada T6G 2R3 - We are located on Treaty 6 / Métis Territory.