AI meets colonialism: Germany develops new research tool
Germany's Federal Archives own an important collection of documents from the colonial era. To untrained eyes, they are undecipherable. Now AI can help researchers.
Anyone aiming to do serious research into Germany's pre-WWII archives needs a particular skill: They should be able to read forms of handwriting that have since completely disappeared from everyday use in the German language.
There is Kurrent, a form of cursive writing that developed in the late medieval era, as well as different variant forms, most notably the short-lived Sütterlin. This cursive script was developed in 1911 and taught in German schools from 1915 until 1941, when it was banned by the Nazis. Afterwards, schoolchildren instead learned a handwriting similar to present-day English cursive.
Even though German speakers who grew up with Sütterlin continued to use it well into the post-war period, most Germans cannot read the letters written by their grandparents.
But now, an AI program can do just that.
A new tool has been developed by the German Federal Archives (Bundesarchiv) to help decode the different types of writing that can be found in documents from the colonial era.
Important collection that must still be worked through
Documents from this era in particular were interesting for such a project, since the German Federal Archives own a collection of around 10,000 files from the Reich Colonial Office, which was the central authority for the German Empire's colonial policy.
They "were selected because a major part of them were handwritten," the archives' press spokesperson, Elmar Kramer, told DW. This collection was also selected for the pilot program because the files from the Reich Colonial Office have already been fully digitized and are no longer subject to any user restrictions, explains project manager Inger Banse.
But most importantly, as she points out, "coming to terms with the colonial era is a focus of our entire society, and we can make a good contribution to this with this collection."
"For too long, the crimes of the German colonial era have been a blind spot in our culture of remembrance," said German Commissioner for Culture and Media, Claudia Roth, welcoming the Federal Archives' project of using specially developed AI technology "to help strengthen knowledge about this dark chapter of German history. In doing so, it is making an important contribution to coming to terms with the past."
First genocide of the 20th century
Colonization by the German Empire began at the end of the 19th century and focused mainly on taking possession of territories and establishing colonies in Africa, the South Seas and China.
Germany's colonial empire only lasted 30 years — from 1884 until the end of the First World War — but shortly after it was established, it became the third-largest colonial empire after the United Kingdom and France. And its colonial rule was particularly brutal.
Documented in the Federal Archives' collection are dark chapters that include the Sokehs rebellion of 1910/1911, which started on Sokehs Island, off the main island of Pohnpei in the Eastern Caroline Islands, in what is now the Federated States of Micronesia. In response, the German colonial rulers applied a scorched-earth policy to hunt down the rebels and had the tribe deported from their own island in the South Seas.
Another prominent case of colonial injustice is how King Rudolf Douala Manga Bell and Adolf Ngoso Din were executed in 1914 for peacefully campaigning against the German colonial administration's measures to remove and relocate the Douala people from their homes in the littoral and southwest region of Cameroon.
Most infamously, the German colonial empire was responsible for the Herero and Nama genocide, known as the first genocide of the 20th century. It took place from 1904 to 1908, after the Herero and the Nama people rebelled against their German colonial rulers.
It was only in 2021 that Germany officially acknowledged committing genocide during its colonial occupation of present-day Namibia.
Early adopters of AI
That same year, the Federal Archives started developing an AI tool to make their colonial-era records more accessible. That was before the so-called new AI era began, when ChatGPT and other large language models were publicly released, turning artificial intelligence into an omnipresent topic of discussion.
"We find it important to always be part of the latest developments," explains Elmar Kramer, about the Federal Archives' pioneering role in the domain. "That's why AI has been a topic of interest for us for a few years already. In this case, we can say that we are now bringing together one of our oldest holdings and one of the newest technologies, if you will: AI meets colonialism."
One needs to keep in mind that the AI not only needs to be able to decode Sütterlin, but also sometimes "quite sloppy, scribbled writing," points out Kramer. And beyond "the different handwriting in general, we also have printed and typewritten material. There is a lot of crossing out, but there are also very clean pages," adds Inger Banse, which is why they separated the documents into three categories, according to the complexity of the material on the page.
"We looked at how the model behaves in these different categories," explains Banse. They trained the model by manually checking and improving, line by line, the AI's transcription results of about 170 pages of varied material.
Banse says that they have now reached a point where the AI model provides an acceptable rate of accuracy in its transcriptions of even the most complex material.
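The article does not say how the archives measure accuracy. One widely used yardstick for handwriting transcription is the character error rate: the number of single-character edits needed to turn the machine's output into the manually corrected line, divided by the length of the corrected line. The sketch below is only an illustration of that idea; the function names and the sample strings are assumptions, not the Federal Archives' method.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edits needed per reference character (hypothetical helper)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Made-up example: the model misreads "m" as "rn".
corrected_line = "Kolonialamt"
model_output = "Kolonialarnt"
print(character_error_rate(corrected_line, model_output))
```

A lower value means fewer corrections per character. What counts as "acceptable" would depend on the material category and on how forgiving the search layer is.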
Achieving perfection in the transcriptions would have required a disproportionate time investment, says Banse, citing the Pareto principle, according to which the hardest 20% of the optimization process requires 80% of the effort. "So at some point, we had to draw the line," she explains. Instead, they developed a more lenient search engine that allows a broader range of results to be obtained.
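How the archives' search engine works internally is not described. As a rough illustration of what "lenient" search can mean, the following sketch accepts any transcribed word that is sufficiently similar to the query, so that a slightly misread word is still found. It uses Python's standard-library `difflib.SequenceMatcher`; the function name, the 0.8 threshold and the sample lines are assumptions made for this example.

```python
from difflib import SequenceMatcher


def fuzzy_search(query: str, lines: list[str], threshold: float = 0.8):
    """Return (line_index, matched_word, score) for words similar to the query.

    Hypothetical helper: the threshold is an arbitrary illustrative value.
    """
    query_norm = query.casefold()
    hits = []
    for line_no, line in enumerate(lines):
        for word in line.split():
            # Ignore surrounding punctuation and letter case when comparing.
            token = word.strip(".,;:!?\"'()").casefold()
            score = SequenceMatcher(None, query_norm, token).ratio()
            if score >= threshold:
                hits.append((line_no, word, score))
    return hits


# Made-up transcription with one misread word ("rn" instead of "m").
transcribed = [
    "Bericht an das Reichskolonialarnt in Berlin",
    "Schreiben vom Gouverneur",
]
for hit in fuzzy_search("Reichskolonialamt", transcribed):
    print(hit)
```

Lowering the threshold returns more candidates at the cost of more false matches, which is the trade-off a lenient search accepts in exchange for imperfect transcriptions.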
And now that the Federal Archives' AI model has been trained to decode Kurrent, it opens a whole field of possibilities for other German-language archives. At the moment, however, it is still a pilot project specifically designed for this collection. It can be consulted on site, in the archives' research hall in Berlin-Lichterfelde, and it will soon be made available online.
DW