Linking the Datasets

The first step in linking the datasets is to locate, for each of the 66 biographies, the corresponding priest in the main CSV file, which contains information on the 2679 clergy members. I accomplished this by first importing both CSV files into FileMaker Pro and adding a unique ID field to each, thereby reducing the task to establishing a series of ID pairs between the two files. I then exported the ID and name fields from both the Dachau incarceration data CSV file and the new biographies CSV file in order to perform the string matching in Python.

For the biographies, I stored each name as one continuous string (to avoid having to handle the cases in which a priest had two, three, or four middle names). This string was of the form "last name, first name, middle name(s)". By contrast, the Dachau incarceration data file stores the first name and last name of each priest separately. The approach I adopted was therefore to iterate over the names in both CSV files with a double 'for' loop and check whether the biography name string started with the last name and also contained the first name from the Dachau incarceration data file. This handled the overwhelming majority of cases; however, there were still edge cases, such as the name "Hans" appearing in one dataset but "Johann" appearing in the other. Fortunately, because there were only 66 biographies to target, I could handle these on a case-by-case basis. Similarly, I had to eliminate false matches, but these, too, were straightforward to identify and handle. In total, the code below allowed me to match 63 of the biographies; I was unable to match the remaining 3. The code below also adds the corresponding Dachau ID to the biographies CSV file, allowing the two CSV files to be related in a relational database in FileMaker Pro, which is ideal for the methodology portion of this project.

Running this code produced a CSV file with the following columns:  Dachau ID, Biographies ID, Last Name, First Name, Birth Day, Birth Month, Birth Year, Rest of Biography.
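
For readers who want the matching rule in isolation, here is a minimal sketch of the double 'for' loop described above. It is not the notebook's actual code: the function name match_names, the tuple layouts, and the sample names and IDs are all assumptions made up for illustration.

```python
def match_names(dachau_rows, biography_rows):
    """Pair biography IDs with Dachau IDs using the rule described above.

    dachau_rows:    list of (dachau_id, first_name, last_name) tuples
    biography_rows: list of (bio_id, full_name) tuples, where full_name
                    has the form "last name, first name, middle name(s)"
    (Both layouts are assumptions for this sketch.)
    """
    pairs = []
    for bio_id, full_name in biography_rows:
        for dachau_id, first_name, last_name in dachau_rows:
            # A match requires the biography string to start with the last
            # name and to contain the first name somewhere after it.
            if full_name.startswith(last_name) and first_name in full_name:
                pairs.append((dachau_id, bio_id))
    return pairs


# Invented sample data, not taken from either dataset.
dachau = [(1, "Karl", "Schmidt"), (2, "Hans", "Meier")]
biographies = [(101, "Schmidt, Karl, Josef"), (102, "Meier, Johann, Peter")]

print(match_names(dachau, biographies))
```

In this invented example only the first biography is paired. The second illustrates the "Hans" versus "Johann" edge case, which a rule like this cannot catch and which has to be resolved by hand. Because the test is a plain substring check, it can also produce false matches (for instance, when one last name is a prefix of another), so the resulting pairs still need to be reviewed.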

I then used regular expressions to separate the biography from the references, using the following Find and Replace patterns:

. (Lit)

.","$1

This file was then ready to be imported into FileMaker Pro.
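
The same substitution could also be done in Python rather than in a text editor. The sketch below is only an approximation under stated assumptions: the function name split_references and the sample biography text are invented, the "$1" backreference is written as "\1" as Python's re module requires, and the leading dot is escaped so that it matches only a literal period, which is my guess at the intent of the pattern.

```python
import re


def split_references(text):
    """Insert a CSV field break between the biography and its references.

    Looks for a period, a space, and the marker "Lit", then rewrites that
    spot as '.","Lit' so the references land in their own quoted column.
    """
    # r"\. (Lit)" captures the marker; \1 puts it back after the field break.
    return re.sub(r"\. (Lit)", r'.","\1', text)


# Invented sample text, not taken from the biographies file.
sample = "Born in 1900. Lit.: Some source"
print(split_references(sample))
```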

Note: I embedded the Jupyter Notebook above by saving my Python script as a Jupyter notebook, then running nbconvert on the command line to convert it to an HTML file:

[Screenshot: nbconvert.png]

With Jeremy's help, I then used GitHub Pages to host the HTML file. Displaying the Jupyter Notebook was then as simple as creating an iframe with the appropriate URL.