#2. Identifying Datasets

The Challenges of Finding Good Data

Finding a dataset was the biggest challenge in this project. Because there wasn't a clear set of data waiting to be mined, I had to go looking for it, with varying results. In this section, I'll talk about my initial ideas for potential datasets, the challenges that arose, my decision to change approaches, and the new challenges that followed.

Data That Didn't Work: The Archives

My initial idea for a dataset was influenced by my desire to use this project as an opportunity to identify primary source material in the Harvard archives. I had multiple meetings with archivists and created a list of a wide range of places where interesting information might be. The problem was that this class moved too quickly for me to spend much time identifying where my data could be. I needed it NOW.

I decided a simple solution would be to analyze multiple years of Harvard Yearbooks. The first book that I looked at, for the year 1900, had at least one group photo posed in formalwear, plus an interesting collection of tailoring advertisements. I felt confident that looking at the following 10 years of yearbooks could prove instructive on how ideas of formalwear might have evolved during the years that the semi-formal "tuxedo" rose in popularity. I thought perhaps if I could find similar group photos posed in dinner jackets instead of tailcoats, I could identify when this fashion reached student culture. I also wanted to test my hypothesis that tailoring advertisements, used by tailors to promote this new item of dress, were one of the reasons that tuxedos rose in popularity.

Unfortunately, this idea turned out to be a bigger challenge than I'd anticipated. When I went back to digitize the following years, I found that many of them didn't even include advertisements. Such a disappointment! I realized that I could spend hours (days!) chasing down potential datasets in the archives without even knowing what I might find in them.

A New Dataset: The Family Tree

I decided that, in the interest of time, I needed a dataset (ANY dataset!) ASAP, and that getting one was more important than its relative value to my actual project. I decided to revisit the Family Tree Project (see: Research Goals #3) in order to take advantage of its easily identifiable data and its clear objective for what to do with that information in the software programs under consideration.

This is the background for the new dataset, which I'll talk more about on later pages.

Author: Chloe Chapin