[Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

Brian Brian.Mingus at colorado.edu
Tue Jun 23 17:09:19 UTC 2009

2009/6/23 Samuel Klein <meta.sj at gmail.com>

> Yes, but my understanding is that while Google provided part of the MBP data
> and scans, its continued updates to OCR since then are not being shared.  I
> would be glad to learn this was not the case...

The dataset you need to train an OCR system to be as good as theirs is the
raw images and the plain text. They aren't making it easy to get either of
those things :( They have presumably improved the software in other ways as

