[Foundation-l] Info/Law blog: Using Wikisource as an Alternative Open Access Repository for Legal Scholarship

Michael Snow wikipedia at verizon.net
Tue Jun 23 17:44:56 UTC 2009

Brian wrote:
> 2009/6/23 Samuel Klein <meta.sj at gmail.com>
>> Yes, but my understanding is that while google provided part of the mbp data
>> and scans, its continued updates to ocr since then are not being shared.  I
>> would be glad to learn this was not the case...
> The dataset you need to train an OCR system to be as good as theirs is the
> raw images and the plain text. They aren't making it easy to get either of
> those things :( They have presumably improved the software in other ways as
> well..
Well, when your shorthand uses their stock ticker symbol, your argument
has already been co-opted.

--Michael Snow
