Simply put, linguistic corpora are collections of linguistic data. Here are some of the most popular.
This data may consist of:
- Speech and text databases
- Text corpora
- Other metadata-added textual resources used for language and linguistic research
Some uses of text corpora are:
- Publishing: Dictionaries, grammar books, teaching materials, usage guides, thesauri. Publishers increasingly refer to the use they make of corpus facilities, so it is important to know how well their corpora are planned and constructed.
- Linguistic Research: Raw data for studying lexis, syntax, morphology, semantics, discourse analysis, stylistics, sociolinguistics, etc.
- Artificial Intelligence: Data test bed for program development.
- Natural Language Processing: Taggers, parsers, natural language understanding programs, spell-checking word lists, etc.
- Language Teaching: Syllabus and materials design, classroom reference, independent learner research.