HERMES Bring-Your-Own-Data workshop

Teaching OCR/HTR machine learning techniques

This past week BlueGreen Labs provided expert input at the HERMES Bring-Your-Own-Data (BYOD) workshop on text recognition and text analysis at the Leibniz Institute for European History (IEG). The HERMES program provides resources to teach topics in humanities education in research, data, and methods.

The workshop, organized by Prof. Dr. Monika Barget, brought together participants from the digital humanities to talk about data workflows, the issues they encountered, and the role of Machine Learning (ML) and Artificial Intelligence (AI) within this context. Participants shared their own experiences, and both experts provided strategic feedback and real-world examples.

We touched upon topics such as computer-vision-based methods to clean up data and the use of these methods for registration, ML-based approaches for text segmentation, and the use of Large Language Models (LLMs) and Natural Language Processing (NLP) techniques such as Named Entity Recognition (NER) in post-processing. Discussions around cost and effort were also common; here, strategic advice tailored to the size, scope and budget of a project provided key guidance for participants.

Reference material

The HERMES website and blog:

https://hermes-hub.de/

Workshop notes by Prof. Dr. Monika Barget:

https://monikabarget.github.io/atr-historical-research/

HTR/OCR project management advice by BlueGreen Labs:

https://bluegreen-labs.github.io/text_recognition_and_analysis/

Koen Hufkens, PhD
Founder, Researcher

As an earth system scientist and ecologist I model ecosystem processes.
