Voyant & Wordle

I became interested in using Voyant and Wordle to analyze the clergy members' reasons for arrest. The original CSV file recorded a reason for arrest for only a fraction of the clergy, and those entries were neither standardized nor even in a single language (they were in both English and German), which made quantitative analysis difficult. Nevertheless, the reasons for arrest are fascinating in their own right, so I used Voyant and Wordle to learn more about them and to produce word clouds.

Voyant is an online tool that, given a corpus of texts, produces summary statistics such as word frequencies and the distribution of each word throughout the text.
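To make the word-frequency idea concrete, here is a rough sketch using Python's `collections.Counter`. It is a simplification and not how Voyant works internally: it lowercases the text, treats any run of word characters as a word, and skips an optional, caller-supplied set of stopwords. The function name `word_frequencies` is hypothetical.

```python
import re
from collections import Counter


def word_frequencies(text, stopwords=frozenset()):
    """Count lowercase word tokens in text, skipping any stopwords.

    Stopwords are compared after lowercasing, so supply them in lowercase.
    """
    # \w+ matches runs of word characters (letters, digits, underscore);
    # in Python 3 this includes German letters such as ä, ö, ü and ß.
    tokens = re.findall(r"\w+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)
```

For example, `word_frequencies(text).most_common(10)` returns the ten most frequent words as `(word, count)` pairs, which is roughly the kind of list Voyant displays.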

I began exploring with Voyant by writing a Python script that pulled all of the reasons for arrest from the CSV file and wrote them to a text file.
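A minimal sketch of such a script follows; it is not necessarily the script used for this project. It assumes the CSV is UTF-8 encoded and that the reasons sit in a column headed `reason_for_arrest` (a hypothetical name; the real header may differ). Because only a fraction of the clergy have a recorded reason, blank cells are skipped.

```python
import csv

# Hypothetical column header; change it to match the real CSV file.
REASON_COLUMN = "reason_for_arrest"


def extract_reasons(lines, column=REASON_COLUMN):
    """Return the non-empty reasons for arrest found in CSV text.

    `lines` can be an open file object or any iterable of CSV lines.
    """
    reasons = []
    for row in csv.DictReader(lines):
        reason = (row.get(column) or "").strip()
        # Only some clergy have a recorded reason, so skip blank cells.
        if reason:
            reasons.append(reason)
    return reasons


def write_reasons(csv_path, txt_path, column=REASON_COLUMN):
    """Read csv_path and write one reason for arrest per line to txt_path."""
    # UTF-8 (an assumption) keeps German characters such as ä and ß intact.
    with open(csv_path, newline="", encoding="utf-8") as src:
        reasons = extract_reasons(src, column)
    with open(txt_path, "w", encoding="utf-8") as dst:
        dst.writelines(reason + "\n" for reason in reasons)
    return len(reasons)
```

Using `csv.DictReader` rather than splitting on commas matters here, since free-text reasons can themselves contain quoted commas.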

After I uploaded the text, Voyant produced the following interface:

voyant.png

Setting aside the common German words, the most frequent remaining words are quite revealing: propaganda, reichsfeindlich ("hostile to the Reich"), enemy, and so on. (Voyant automatically filters out common English words.) I therefore wanted a word cloud that excluded both common English and common German words. I first tried Wordle, an online tool that specializes in word clouds and offers more flexibility than Voyant's word cloud feature; in particular, it can filter out common words in many languages. When I pasted my text into Wordle, it offered the following options for filtering out common words:

wordle_interface.png

Unfortunately, Wordle does not allow English and German to be selected at the same time. I therefore used the text editor Atom to find and remove the common German words that appeared most frequently in the text: er, der, die, und, hatte, des, war, den, mit, von, das, dem, eine, im, ein, seine, wurde, zu, dass, einer, sich, als, sie, auf, einem, es, nach, aus, ihm, vor, eines, aber, bei, seiner, man, diese, weil, zur, ist, zum, sei, vom, de, gab, damit. This did not remove every common German word, but it did remove many of the most frequent ones. I then pasted the edited text into Wordle, used its feature to filter out common English words, and obtained a more accurate word cloud. After adjusting the font, colors, and layout, I arrived at the following word cloud:

wordle_4.png
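The find-and-remove step done by hand in Atom can also be scripted. The sketch below is an illustration, not the method used here: it removes whole-word, case-insensitive matches of the German words listed earlier, then tidies the leftover whitespace on each line. The names `GERMAN_STOPWORDS` and `remove_words` are hypothetical.

```python
import re

# The common German words listed in the text, in the same order.
GERMAN_STOPWORDS = [
    "er", "der", "die", "und", "hatte", "des", "war", "den", "mit", "von",
    "das", "dem", "eine", "im", "ein", "seine", "wurde", "zu", "dass",
    "einer", "sich", "als", "sie", "auf", "einem", "es", "nach", "aus",
    "ihm", "vor", "eines", "aber", "bei", "seiner", "man", "diese", "weil",
    "zur", "ist", "zum", "sei", "vom", "de", "gab", "damit",
]


def remove_words(text, words):
    """Delete whole-word, case-insensitive occurrences of each word."""
    # \b on both sides keeps "er" from being cut out of "der" or "Kinder".
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE
    )
    cleaned = pattern.sub("", text)
    # Collapse the runs of spaces left behind and trim each line.
    return "\n".join(" ".join(line.split()) for line in cleaned.splitlines())
```

Whole-word matching is the key design choice: a plain substring replace of "er" would mangle most German words. Note that a few of these words (die, war, man) are also English words, so this filter removes their English occurrences as well.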

For a more detailed description of the word cloud itself, see the page "Word Cloud of Reasons for Arrest Using Wordle" in the Presentation of Results section.

I also created a list of the names of the 2,679 clergy incarcerated at Dachau to serve as the banner of this exhibit, using a Python script that wrote all of the names to a text file.
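A minimal sketch of how the names could be pulled from the CSV follows; it is not necessarily the code used for the banner. It assumes a UTF-8 file with the names in a column headed `name` (hypothetical), and it joins them with `", "` so that the result forms one continuous block of text that wraps to fill a text box; that separator is an arbitrary choice.

```python
import csv

# Hypothetical column header; change it to match the real CSV file.
NAME_COLUMN = "name"


def extract_names(lines, column=NAME_COLUMN):
    """Return every non-empty name in the CSV text, in file order."""
    names = []
    for row in csv.DictReader(lines):
        name = (row.get(column) or "").strip()
        if name:
            names.append(name)
    return names


def write_names(csv_path, txt_path, column=NAME_COLUMN, sep=", "):
    """Write all names to txt_path as a single block of text."""
    with open(csv_path, newline="", encoding="utf-8") as src:
        names = extract_names(src, column)
    with open(txt_path, "w", encoding="utf-8") as dst:
        dst.write(sep.join(names))
    return len(names)
```

Since `write_names` returns the number of names written, comparing it against the 2,679 clergy mentioned above is a quick sanity check that no rows were skipped.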

I then created the banner in an image editor by pasting the text into a PNG file with a white background and resizing the text box to the correct dimensions. The banner appears as follows:

compressed copy-min.png

Note that in the live banner a few of the bottom rows are cut off, because it is very difficult to find dimensions that fit the banner window exactly (since the banner's size depends on the screen size, a perfect fit may not even be possible). Regardless, the sheer volume of text in the banner is meant to convey just how many clergy were incarcerated at Dachau Concentration Camp.