Centre de recherche informatique de Montréal (CRIM)
Humanity is awash in a constant flood of data, and research often requires poring over thousands of data points to look for evidence and gather statistics. While computers are excellent at some text processing tasks, such as searching for keywords, they are easily confused by human language: they struggle with typos, shorthand, idiomatic phrases, archaic spellings, and, most importantly, the intent behind words. Without natural language training, computers cannot sift through large bodies of text except in the simplest of cases.
This prompted researchers at the Centre de recherche informatique de Montréal (CRIM) to create PACTE, a research software platform for collaborative text annotation and analysis. Training computers to understand text requires the use of annotations: small tags inserted into a text that explain to the computer what the text represents. These annotations point out how the text’s grammar is structured, what meaning the text may contain, and what specific constructs are worthy of attention. PACTE is designed to simplify the entire machine learning process: managing huge text databases, manually annotating texts, training learning algorithms, computer-annotating text, and analyzing the results.
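To make the idea of an annotation concrete, here is a minimal Python sketch of tags attached to spans of a text. It is an illustration only and does not reflect PACTE's actual data model or API; the names `Annotation`, `annotate`, and `render_inline`, the layer names, and the labels are all hypothetical, and the sketch assumes annotations do not overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A hypothetical tag attached to a span of text (not PACTE's real schema)."""
    start: int  # offset of the first annotated character (inclusive)
    end: int    # offset just past the last annotated character (exclusive)
    layer: str  # what kind of information this is, e.g. "grammar" or "meaning"
    label: str  # the tag itself, e.g. "VERB" or "ORG"


def annotate(text: str, phrase: str, layer: str, label: str) -> Annotation:
    """Tag the first occurrence of `phrase` in `text`."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in text")
    return Annotation(start, start + len(phrase), layer, label)


def render_inline(text: str, annotations: list[Annotation]) -> str:
    """Insert the tags into the text, assuming the spans do not overlap."""
    out = text
    # Work from right to left so earlier offsets stay valid after each insertion.
    for a in sorted(annotations, key=lambda a: a.start, reverse=True):
        out = out[:a.start] + f"[{a.label}: {out[a.start:a.end]}]" + out[a.end:]
    return out


sample = "CRIM created PACTE in Montreal."
annotations = [
    annotate(sample, "CRIM", "meaning", "ORG"),
    annotate(sample, "created", "grammar", "VERB"),
    annotate(sample, "PACTE", "meaning", "SOFTWARE"),
    annotate(sample, "Montreal", "meaning", "PLACE"),
]

print(render_inline(sample, annotations))
# prints: [ORG: CRIM] [VERB: created] [SOFTWARE: PACTE] in [PLACE: Montreal].
```

Storing each tag as a pair of character offsets, rather than editing the text directly, leaves the original text untouched, so several annotators or algorithms could in principle tag the same document independently.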