
Digital Humanities

Data and Text Mining

   

What is text and data mining?

Text mining and data mining (together, TDM) are often used together, although they are slightly different. Both rely on computational analysis of vast quantities of digital information. However, text mining focuses on natural language text, while data mining focuses on structured data.

Researchers use specialized tools to extract data, identify trends, look for patterns, and better understand the relationships of terms within and between documents. Analysis might focus on word frequency, words that frequently appear near each other, contextual information for keywords, common phrases, and other patterns.
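As a rough illustration of the kinds of analysis described above, the following minimal Python sketch counts word frequencies, counts words that appear near each other, and shows keywords in context. It uses only the Python standard library. The function names, the sample sentence, and the window sizes are hypothetical choices made for this example; they are not taken from any particular TDM tool, and real projects typically use dedicated software and far larger text collections.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(tokens):
    """Count how often each word occurs."""
    return Counter(tokens)


def cooccurrences(tokens, window=2):
    """Count unordered pairs of different words within `window` tokens of each other."""
    pairs = Counter()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                pairs[tuple(sorted((word, other)))] += 1
    return pairs


def keyword_in_context(tokens, keyword, width=2):
    """Return each occurrence of `keyword` with up to `width` words on either side."""
    return [
        " ".join(tokens[max(0, i - width) : i + width + 1])
        for i, token in enumerate(tokens)
        if token == keyword
    ]


# Hypothetical sample text, invented for this example only.
sample = "The old newspaper praised the railway. The railway changed the old town."
tokens = tokenize(sample)

print(word_frequencies(tokens).most_common(3))
print(cooccurrences(tokens).most_common(3))
for line in keyword_in_context(tokens, "railway"):
    print(line)
```

In practice, researchers usually remove very common words (such as "the") before counting, since they dominate raw frequency lists without saying much about the content.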

Analyses can draw on sources ranging from websites (such as publicly available Facebook posts) to 16th-century manuscripts, DNA sequences, and old newspapers.

UAlberta Library databases that allow some mining

These resources allow some form of mining. Please contact a subject librarian for more details and to investigate options; they can contact our licensing department if needed. Note: many databases do not allow TDM, or offer only limited options, without an individual agreement with the company and possibly extra fees.

Electronic Text Centers

Free Sources: Open access databases and repositories used for text and data mining

Text Collections