LONDON (Reuters) - A good place to see the true frontline in Microsoft’s battle with Google is deep in the bowels of the British Library in London.
In a room under the library’s red brick building near St Pancras railway station, a Microsoft-funded team is working 14 hours a day to scan shelf upon shelf of books.
Launched a year ago, the project will scan 25 million carefully preserved pages of the British Library’s 19th-century archive, around 100,000 books, over the next two years. Together with collections from other libraries including Yale and Cornell University, the pages are destined for Live Search Books, Microsoft’s answer to Google Book Search.
It’s a field where Microsoft is playing catch-up with Google, whose mass digitization project already has around one million books online, 10,000 publishers and almost 30 major world libraries on board.
If the recent bid for Yahoo by the inventor of the PC operating system is an effort to glean more online advertising, this painstaking copying points to a deeper flaw behind its weakness in advertising: search.
“When we are able to do a better job of answering people’s questions we are going to build loyalty and then ultimately increase the size of our user community,” said Cliff Guren, Microsoft Director of Publisher Evangelism, a title the company hopes will attract libraries and publishers to its scheme.
“By doing this we increase our query share which helps us increase our advertising rates and that’s how our business makes money,” he said. Query share is the percentage of individual consumer Web search requests attracted by services like Google and Microsoft.
Internet audience measurement firm comScore estimates only 4 percent of Internet searches worldwide use Microsoft’s engine, against 77 percent through Google’s. Yahoo, the second-largest Web search provider, has a 16 percent share.
But Microsoft’s problem is not just that Google is bigger. As search technology advances, the real headache for the company whose software currently drives most of the world’s computers is that Google has its eye on a more sophisticated prize.
THE WORLD’S INFORMATION
Google’s mission is “to organize the world’s information and make it universally accessible and useful” and to do that it is hoovering up not only books, but any data it can grab. An example is how anyone using a Google e-mail account is invited never to delete anything: that’s data Google can use.
Its initiative in mass digitization stems from a vision of Internet search as a tool to hunt not only the words we type in, but also the things we might have meant to type in, said Colin Gillis, an Internet analyst at brokerage Canaccord Adams.
Like many concepts in the Internet industry, this idea can go with an arcane title: the Semantic Web. It’s a concept already partly realized on some Web sites, for instance when search engines recognize a common typing error.
Achieving this requires data.
“The important component of the Semantic Web is mass digitization,” Gillis said. “You have to have all the little bits of data, all the little pieces. From comprehensive data sets come deeper insights.”
Even though the list of libraries and publishers on both sides in the Google-Microsoft digitization race is growing fast, the challenge goes far beyond the world’s books. Jason Hanley, who manages Google’s Book Search partnerships in the UK, said there was no competition for exclusivity between his company and Microsoft.
“I wouldn’t say there is any arms race in terms of picking off certain people to work with and excluding others,” Hanley said. “That wouldn’t make any sense as we’re trying to be comprehensive. These things are massive undertakings.”
Neither company will say how much they are investing, but Microsoft’s Guren said it was “a very substantial financial commitment”. The projects are strategic, said Danny Sullivan, Editor-in-Chief of SearchEngineLand.com.
Sullivan said Google sets the tone by spending large sums of money to develop new businesses without rushing to make money back. Books is one example. It undertakes many “pie-in-the-sky” projects betting some will become big money-spinners once they are popular, allowing Google to sell advertising alongside them.
“Microsoft and Google are both building libraries and the way you get the books off the shelves at these digital libraries is through their search engine. Their search engine is an electronic librarian,” Sullivan said.
“The battle shouldn’t be over getting the books, the battle should be over who is building the best librarian.”
FLOTSAM AND JETSAM
Latecomer Microsoft says it is taking a selective approach. At the British Library, books destined for the scanning machines include editions of classics such as Charles Dickens’ Bleak House and Daniel Defoe’s Robinson Crusoe.
“Google’s general mission has been to organize the world’s information. With that they bring in the flotsam and the jetsam, the good with the bad because they’re casting their net far and wide,” Microsoft’s Cliff Guren said.
“We are taking a much more focused approach to figuring out what content we need to drive user satisfaction.”
But while Microsoft may be pickier, Google’s Hanley said choice is not relevant: “Something you class as useless may not be something that somebody else may class as useless,” Hanley said. “One man’s meat is another man’s poison.”
Canaccord Adams’ Gillis said that is the point of the project.
On the Semantic Web, data would be ‘understood’ and could be linked up, automated and combined by meaning, regardless of its format, as search engines make connections and associations similar to the way the human brain does, and more.
Quantity of data is crucial to that goal, he said. The more data there is, the more connections can be made and the better and more interesting they are.
“Mass digitization will help represent what I call the truth -- a more complete picture of who we are, and who we are as a society,” Gillis said.
Editing by Eric Auchard and Sara Ledwith