Thursday, April 07, 2011

Bar-Ilan Responsa Project upgrade

THE BAR-ILAN RESPONSA PROJECT is getting a technological upgrade:
NovoDynamics'® VERUS™ OCR Supports Responsa Project
World's Most Comprehensive Searchable Digital Library of Judaic Studies


By: Marketwire
Apr. 6, 2011 08:30 AM

ANN ARBOR, MI -- (Marketwire) -- 04/06/11 -- NovoDynamics®, Inc., a recognized leader in providing advanced information capture and analysis solutions, today announced that VERUS™, the company's advanced global language optical character recognition (OCR) solution, is being utilized by Israel's Bar-Ilan University Responsa Project to expand its library of Judaic studies.

Bar-Ilan has assumed a leadership role in bringing Jewish studies into the 21st century. Its Responsa Project is the most comprehensive and easily searchable digital Jewish library in the world with the largest electronic collection of keyed-in and proofread texts of Jewish literature. The texts cover thousands of years of writing, from the Bible and the Talmud to current volumes. There are more than one billion bits of data now available on CD, flash drive and the internet, which is widely available to students, researchers and the public.

The Responsa Project began 40 years ago and remains active today as researchers have embarked on a 10-year work plan to add another 40,000 books and 10 million bits of data to the library. Relying heavily on optical character recognition technology (OCR), actual letters of texts will be copied from the page to the computer memory, enabling users easy access to information. NovoDynamics' VERUS™ OCR for Middle Eastern languages is helping researchers overcome some of the difficulties in recognizing the ancient Hebrew texts that are an integral part of the Responsa Project.

[...]
I'm going to hazard a conjectural emendation and suggest that we read 10 billion bits instead of 10 million in the third paragraph. Ten million bits is only 1.25 million bytes, about 1/6 of one of those old 8-megabyte floppy disks, which is not very impressive for a 10-year project and can't be right. With 40,000 books, 10 billion bits would average out to 31,250 bytes per book (roughly 31,250 letters, at one letter per byte), which makes more sense. It's a little embarrassing for NovoDynamics that such an elementary error ended up in their press release.
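
For anyone who wants to check the arithmetic, here is a quick sketch in Python. It assumes 8 bits per byte, one byte per letter, an 8-megabyte disk counted as 8,000,000 bytes, and data spread evenly across the 40,000 books. The function and variable names are mine, purely for illustration.

```python
BITS_PER_BYTE = 8          # assumption: plain 8-bit bytes, one letter per byte
FLOPPY_BYTES = 8_000_000   # assumption: "8-megabyte" taken as 8,000,000 bytes
BOOKS = 40_000             # number of books in the 10-year work plan


def bytes_per_book(total_bits: int, books: int) -> float:
    """Average bytes per book if the data is spread evenly over the books."""
    return total_bits / BITS_PER_BYTE / books


# Figure as printed in the press release: 10 million bits.
as_printed = bytes_per_book(10_000_000, BOOKS)

# Conjectured figure: 10 billion bits.
as_emended = bytes_per_book(10_000_000_000, BOOKS)

# Fraction of the 8-megabyte disk that 10 million bits would fill.
floppy_fraction = (10_000_000 / BITS_PER_BYTE) / FLOPPY_BYTES

print(as_printed)       # 31.25
print(as_emended)       # 31250.0
print(floppy_fraction)  # 0.15625
```

As printed, the figure works out to about 31 bytes per book, which is barely a sentence. The emended figure gives the 31,250 bytes per book mentioned above.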

Some background on the Bar-Ilan Responsa Project is here.