This past week BlueGreen Labs provided expert input at the HERMES Bring-Your-Own-Data (BYOD) workshop on text recognition and text analysis at the Leibniz-Institute for European History (IEG). The HERMES program provides resources to teach topics in humanities education in research, data, and methods.
The workshop, organized by Prof. Dr. Monika Barget, brought together participants from the digital humanities to talk about data workflows, the issues they encountered, and the role of Machine Learning (ML) and Artificial Intelligence (AI) in this context. Participants shared their own experiences, and both experts provided strategic feedback and real-world examples.
We touched upon topics such as computer-vision-based methods to clean up data, the use of these methods for registration, ML-based approaches for text segmentation, and the use of Large Language Models (LLMs) and Natural Language Processing (NLP) methods such as Named Entity Recognition (NER) in post-processing. Discussions around cost and effort were also common. Here, strategic advice tailored to the size, scope and budget of a project provided key guidance for participants.
Reference material
The HERMES website and blog:
Workshop notes by Prof. Dr. Monika Barget:
https://monikabarget.github.io/atr-historical-research/
HTR/OCR project management advice by BlueGreen Labs:
https://bluegreen-labs.github.io/text_recognition_and_analysis/