Methodology Part III - Topic Modeling

I have worked through the Programming Historian's tutorial on topic modeling and familiarized myself with the basics of using MALLET. Topic modeling, however, required a particular way of preparing my dataset: breaking the current corpus into separate files that each contain one entry. I implemented this with the following script in FileMaker Pro.


Script in FileMaker to export text into separate text files.
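For readers working outside FileMaker, the Python sketch below gives a rough, hypothetical equivalent of the export step. It assumes the corpus has first been exported to a UTF-8 CSV file with one row per entry and with hypothetical columns named `id` and `text`, and that every `id` is safe to use as a filename. Each entry is written to its own .txt file, the one-document-per-file layout that MALLET's `import-dir` command reads.

```python
import csv
from pathlib import Path


def export_entries(csv_path, out_dir, id_field="id", text_field="text"):
    """Write each CSV row's text to its own UTF-8 .txt file named after its id.

    The CSV layout and the field names are hypothetical placeholders for
    whatever the FileMaker export actually contains.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # One file per entry, e.g. entries/17.txt for the row with id 17.
            target = out / f"{row[id_field]}.txt"
            target.write_text(row[text_field], encoding="utf-8")
            written.append(target)
    return written
```

Calling `export_entries("corpus.csv", "entries")` would fill the `entries` folder with one file per record, ready to be pointed at by MALLET.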

Although my dataset is now prepared, I have yet to run MALLET on it. Another obstacle still needs to be resolved: procuring a list of stop words for Classical Chinese. MALLET removes stop words from the texts when it imports them; without a suitable list, the resulting topics would be dominated by frequent function words and would not be useful.
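One possible starting point for a Classical Chinese stop list, offered as a sketch rather than a tested method, is to count character frequencies across the exported entry files and review the most frequent characters by hand. The code assumes that each character is one token, skips whitespace and a small, hand-picked set of punctuation marks, and writes one word per line (the layout of MALLET's bundled English stop list) so that the file could be passed to `import-dir` through `--stoplist-file`. The function names, the `SKIP` set, and the `top_n` cutoff are hypothetical.

```python
from collections import Counter
from pathlib import Path

# Whitespace and a few common full-width punctuation marks to ignore when
# counting; this set is an illustrative assumption, not a complete list.
SKIP = set("，。、；：？！「」『』（）《》 \t\r\n")


def candidate_stopwords(entry_dir, top_n=20):
    """Return the top_n most frequent characters across all .txt files in entry_dir.

    Assumes one character = one token, a simplification for Classical Chinese.
    """
    counts = Counter()
    for path in sorted(Path(entry_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        counts.update(ch for ch in text if ch not in SKIP)
    # most_common() sorts by count; ties keep first-seen order.
    return [ch for ch, _ in counts.most_common(top_n)]


def write_stoplist(words, out_path):
    """Write one stop word per line as UTF-8."""
    Path(out_path).write_text("\n".join(words) + "\n", encoding="utf-8")
```

A character-level list likely only takes effect if MALLET also tokenizes the text one character at a time (for instance through a custom `--token-regex`); MALLET's default token pattern would otherwise read a run of characters as a single token. The output still needs manual review, since frequent content characters can rank alongside function words.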