On Tue, Jun 23, 2009 at 1:09 PM, Brian <Brian.Mingus(a)colorado.edu> wrote:
2009/6/23 Samuel Klein <meta.sj(a)gmail.com>
Yes, but my understanding is that while Google provided part of the MBP
data and scans, its continued updates to OCR since then are not being
shared. I would be glad to learn this was not the case...
The dataset you need to train an OCR system to be as good as theirs is the
raw images and the plain text. They aren't making it easy to get either of
those things :( They have presumably improved the software in other ways as
well.
WTF GOOG?
It's almost like they're trying to run a business or something.