Analysis of Media Coverage on Refugees

Hannah Yan Han is running a #100dayprojects series on data science and visual storytelling, and I recommend taking a look yourself. Below is her text analysis in R (#41) of UNHCR speeches and TV coverage on refugees.

Unsurprisingly, nouns like asylum, repatriation, displacement, persecution, plight, and crisis appear significantly more often in UNHCR speeches on refugees than in general English texts. The first visualization below shows the action-oriented verbs most commonly used in combination with these nouns.
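The comparison with general English described above can be sketched as a smoothed log-ratio of relative word frequencies. This is only an illustration in Python with toy data: Hannah's analysis was done in R, and her post summary here does not say which statistic she used. The function name log_ratio, the smoothing value, and both token lists are assumptions made for this example. A log-ratio is an effect size, not a significance test.

```python
import math
from collections import Counter

def log_ratio(target_tokens, reference_tokens, smoothing=0.5):
    """Log2 ratio of each target word's relative frequency to its
    relative frequency in a reference corpus (additive smoothing)."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    vocab = set(target) | set(reference)
    # Smoothing keeps words absent from the reference from dividing by zero.
    target_total = sum(target.values()) + smoothing * len(vocab)
    reference_total = sum(reference.values()) + smoothing * len(vocab)
    scores = {}
    for word in target:
        target_rate = (target[word] + smoothing) / target_total
        reference_rate = (reference[word] + smoothing) / reference_total
        scores[word] = math.log2(target_rate / reference_rate)
    return scores

# Toy corpora (hypothetical), standing in for the speeches and general English.
speech = "refugees seek asylum and asylum seekers flee persecution".split()
general = "people seek work and people read the news and the weather".split()

scores = log_ratio(speech, general)
for word in sorted(scores, key=scores.get, reverse=True):
    print(f"{word:12s} {scores[word]:+.2f}")
```

Positive scores mark words that are over-represented in the target text; words that are equally common in both corpora score near zero.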

The second visualization shows the most frequent verb-noun pairs.
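Counting verb-noun pairs can be sketched as follows, assuming the text has already been split into sentences and part-of-speech tagged. The pairing rule used here (each verb is paired with the first noun within a few tokens after it) is a simple heuristic chosen for this example, not necessarily the method Hannah used. The function name verb_noun_pairs, the window size, and the tagged sentences are hypothetical.

```python
from collections import Counter

def verb_noun_pairs(tagged_sentence, window=3):
    """Pair each verb with the first noun found within `window` tokens after it."""
    pairs = []
    for i, (word, tag) in enumerate(tagged_sentence):
        if tag != "VERB":
            continue
        for next_word, next_tag in tagged_sentence[i + 1 : i + 1 + window]:
            if next_tag == "NOUN":
                pairs.append((word, next_word))
                break  # keep only the nearest noun for this verb
    return pairs

# Hypothetical pre-tagged sentences as (token, part-of-speech) tuples.
sentences = [
    [("we", "PRON"), ("protect", "VERB"), ("refugees", "NOUN")],
    [("states", "NOUN"), ("must", "AUX"), ("protect", "VERB"),
     ("the", "DET"), ("refugees", "NOUN")],
    [("they", "PRON"), ("seek", "VERB"), ("asylum", "NOUN")],
]

pair_counts = Counter(
    pair for sentence in sentences for pair in verb_noun_pairs(sentence)
)
print(pair_counts.most_common())
```

A dependency parser would give more reliable pairs than this window heuristic, at the cost of a heavier toolchain.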

Hannah used newsflash to retrieve the GDELT data on US TV news. Some channels seem to cover refugees more than others. I would have loved to see which topics occurred on each channel, but unfortunately she did not report on this.

TACIT: An open-source Text Analysis, Crawling, and Interpretation Tool

Original PDF: TACIT 2017


The first programs for (scientific) text mining are already over 50 years old. More recent efforts, such as Linguistic Inquiry and Word Count (LIWC; Tausczik & Pennebaker, 2010), have greatly improved our text-analytical capabilities. Moreover, several single-purpose programs have been developed that also consider syntactic text structures (e.g., Syntactic Complexity Analyzer [Lu, 2010], TAALES [Kyle & Crossley, 2015]). However, the widespread use of many of these programs has been hampered by two major barriers.

First, considerable technical expertise is required, which is an obstacle for researchers without a statistical background. For example, packages such as tm in R (Meyer et al., 2015) have been developed to conduct natural-language processing, but their steep learning curve is a challenge. Additionally, the constant increase in computational processing power and the proliferation of new algorithms make it difficult for researchers to maintain a working knowledge of state-of-the-art methods.

Second, most of the existing user-friendly NLP programs (and packages), such as RapidMiner (Akthar & Hahne, 2012), SAS Text Miner (Abell, 2014), or SPSS Modeler (IBM Corp., 2011), charge either a large up-front software fee or a subscription fee. The cost of these programs can be prohibitive for junior researchers and for researchers looking to integrate new techniques into their research toolbox.

The attached article introduces TACIT, the Text Analysis, Crawling, and Interpretation Tool. TACIT is an open-source architecture that establishes a pipeline between the various stages of text-based research by integrating tools for text mining, data cleaning, and analysis under a single user-friendly architecture. In addition to being prepackaged with a range of easily applied, cutting-edge methods, TACIT’s design also allows other researchers to write their own plugins.

The authors hope that TACIT can facilitate the integration and use of advancements in computational linguistics in psychological research, and thereby help researchers make use of the ever-growing body of documents of our social discourse in ways that were not previously possible.