OCR/Step Two

This dataset continued to be extremely efficient when it came to the OCR process. Once again, there was no need to run OCR, because Google had already processed the digitized book. Google Books allowed me to download the EPUB file of the book onto my desktop. With the EPUB file already created, I could then check whether the text had been converted into an editable file.

For the next several steps, I used, and will keep using, both Atom and Sublime. These applications let me read the data I have and make any necessary changes. When I opened the EPUB file of David Barnes's book in Atom, it contained several XML and JPG files. The JPG files were the pages that did not contain text, while the XML files contained all of the data I was looking to use. The image below shows all of the contents of the EPUB file. This dataset remains workable, since most of the digitization steps needed for my research have already been executed.

Screen Shot 2016-10-23 at 9.07.53 PM.png

Contents found within the EPUB file of the book
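An EPUB file is a ZIP archive, so the same inventory of XML and JPG files seen in Atom can also be produced with a short script. The sketch below is only an illustration and not part of my workflow: it uses Python's standard `zipfile` module, and the function name `summarize_epub` is hypothetical. It groups every file in the archive by extension, which makes it easy to separate the image pages from the text files. Note that in many EPUBs the text files end in `.xhtml` or `.html` rather than `.xml`.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def summarize_epub(path):
    """Group the files inside an EPUB (a ZIP archive) by file extension.

    Hypothetical helper for illustration; returns {extension: [member names]}.
    """
    groups = defaultdict(list)
    with zipfile.ZipFile(path) as epub:
        for name in epub.namelist():
            if name.endswith("/"):
                continue  # skip directory entries, keep only files
            # Lower-case the extension so ".JPG" and ".jpg" land together.
            ext = PurePosixPath(name).suffix.lower() or "(none)"
            groups[ext].append(name)
    return dict(groups)
```

For example, `summarize_epub("barnes.epub")` (a placeholder file name) would return a dictionary whose `".jpg"` entry lists the image-only pages and whose XML or XHTML entries list the files that hold the text.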