[Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

Anthony wikimail at inbox.org
Tue Jun 23 19:52:21 UTC 2009

On Tue, Jun 23, 2009 at 1:09 PM, Brian <Brian.Mingus at colorado.edu> wrote:

> 2009/6/23 Samuel Klein <meta.sj at gmail.com>
> > Yes, but my understanding is that while google provided part of the mbp
> > data and scans, its continued updates to ocr since then are not being
> > shared. I would be glad to learn this was not the case...
> >
> The dataset you need to train an OCR system to be as good as theirs is the
> raw images and the plain text. They aren't making it easy to get either of
> those things :( They have presumably improved the software in other ways as
> well..

It's almost like they're trying to run a business or something.
