The Linguistic Data Consortium (LDC) is an open consortium of research organizations, hosted by the University of Pennsylvania. It creates, collects and distributes speech and text databases, lexicons, and other resources for research and development purposes.
The LDC Catalog includes:
- Hundreds of corpora of language data
- Indexed collections of Arabic, Chinese and English newswire text
- Millions of words of English telephone speech from the Switchboard and Fisher collections
The University of Alberta is an institutional member of the LDC. All language corpora listed in the LDC Catalog are available on request.