Handwritten text recognition of 'Jungle Weather' data

During the COBECORE project, citizen scientists transcribed large volumes of data. However, the transcribed dataset covers only 4% of the total volume of data digitized during the project. This was by design: the transcribed data serves only as training material for in-house handwritten text recognition solutions.

Commercial solutions are available, but transcribing all data with them would still require a significant budget, with no defined upper bound because quality is not guaranteed. A bespoke solution was therefore needed to transcribe the whole dataset while limiting costs and privacy liabilities.

Recent experiments with the Keras framework, using vanilla machine learning models trained on the Jungle Weather data, show significant progress. Training our own custom model allows full control of the workflow and should provide handwritten text recognition that is open and free to all scientists.

Recent additional recovery work by PhD candidate Derrick Muheki (under the supervision of Dr. Wim Thiery at VUB) at Yangambi (DR Congo) will benefit from this work. Derrick will be involved in the further development of the pipeline and in the final processing.

Koen Hufkens, PhD
Founder, Researcher

As an earth system scientist and ecologist I model ecosystem processes.
