BEERSHEBA, Israel (Reuters) - Researchers in Israel say they have developed a computer program that can decipher previously unreadable ancient texts and possibly lead the way to a Google-like search engine for historical documents.
The program uses a pattern recognition algorithm similar to those law enforcement agencies have adopted to identify and compare fingerprints.
But in this case, the program identifies letters, words and even handwriting styles, saving historians and liturgists hours of sitting and studying each manuscript.
By recognizing such patterns, the computer can recreate with high accuracy portions of texts that faded over time or even those written over by later scribes, said Itay Bar-Yosef, one of the researchers from Ben-Gurion University of the Negev.
“The more texts the program analyses, the smarter and more accurate it gets,” Bar-Yosef said.
The computer works with digital copies of the texts, assigning number values to each pixel of writing depending on how dark it is. It separates the writing from the background and then identifies individual lines, letters and words.
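The pixel-level steps described above can be illustrated with a minimal sketch. The article does not name the algorithms the researchers use, so this example substitutes two textbook techniques as assumptions: Otsu's method to pick a darkness threshold that separates ink from background, and a row-by-row scan for ink to find lines of text. The function names `otsu_threshold` and `find_text_lines` and the tiny synthetic page are hypothetical, not taken from the Ben-Gurion program.

```python
import numpy as np


def otsu_threshold(gray):
    """Pick the gray level that best splits pixels into dark and light groups.

    Assumption: Otsu's method stands in for whatever the researchers use.
    `gray` is a 2-D uint8 array where 0 is black and 255 is white.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    levels = np.arange(256)
    w0 = np.cumsum(hist)              # pixel count at or below each level
    w1 = hist.sum() - w0              # pixel count above each level
    sum0 = np.cumsum(hist * levels)   # intensity sum at or below each level
    with np.errstate(divide="ignore", invalid="ignore"):
        mu0 = sum0 / w0               # mean of the dark group
        mu1 = (sum0[-1] - sum0) / w1  # mean of the light group
        between = w0 * w1 * (mu0 - mu1) ** 2
    # Levels that leave one group empty produce NaN; treat them as 0.
    return int(np.argmax(np.nan_to_num(between)))


def find_text_lines(ink):
    """Return (top, bottom) row ranges that contain ink, bottom exclusive."""
    row_has_ink = ink.any(axis=1)
    lines, start = [], None
    for y, has_ink in enumerate(row_has_ink):
        if has_ink and start is None:
            start = y                 # a new line of writing begins
        elif not has_ink and start is not None:
            lines.append((start, y))  # the line ended on the previous row
            start = None
    if start is not None:
        lines.append((start, len(row_has_ink)))
    return lines


# Synthetic "page": light background with two darker bands of writing.
page = np.full((12, 20), 220, dtype=np.uint8)
page[2:4, 3:15] = 30
page[7:10, 2:18] = 40

ink = page <= otsu_threshold(page)    # True where a pixel counts as writing
lines = find_text_lines(ink)
print(lines)
```

On a real manuscript scan, faded ink and stained parchment would make a single global threshold unreliable, which is presumably where the handwriting analysis described in the article comes in.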
It also analyses the handwriting and writing style, so it can “fill in the blanks” of smeared or faded characters that are otherwise indiscernible, Bar-Yosef said.
The team has focused its work on ancient Hebrew texts, but says the program can be used with other languages as well.
The team's work, which is still being developed, was most recently published in the academic journal Pattern Recognition, in an issue due out in December but already available online.
A program for all academics could be ready in two years, Bar-Yosef said.
And as libraries across the world move to digitize their collections, the researchers say the program can drive an engine to instantaneously search any digital database of handwritten documents.
Uri Ehrlich, an expert in ancient prayer texts who works with Bar-Yosef’s team of computer scientists, said that with the help of the program, years of research could be done within a matter of minutes.
“When enough texts have been digitized, it will manage to combine fragments of books that have been scattered all over the world,” Ehrlich said.
Editing by Jon Hemming