On Tue, Jul 14, 2009 at 7:36 PM, dan nessett <dnessett(a)yahoo.com> wrote:
> Hm. Sounds like an opportunity. How about MediaWiki
> issuing a grand challenge. Create a well-documented/structured (open source) parser that
> produces the same results as the current parser on 98% of Wikipedia pages. The prize is
> bragging rights and a letter of commendation from someone or other. I suspect there are a
> bunch of graduate students out there who would find the challenge interesting.

I suspect nobody's going to stand a chance without funding.

$ cat includes/parser/*.php | wc -l
11064

That's not the kind of thing most people write for an interesting challenge.
Also, you realize that 2% of pages would mean 350,000 pages on the
English Wikipedia alone? Probably a million pages across all
Wikimedia wikis? And who knows how many if you include third-party
wikis?
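
To make the arithmetic behind the 2% figure explicit, here is a minimal sketch. The 2% and 350,000 values come from the message above; the total page count is only what those two numbers imply, not an independently checked statistic, and the function names are made up for illustration.

```python
# Figures from the message: a 98% match target leaves 2% of pages
# mismatched, estimated at 350,000 pages on the English Wikipedia.
MISMATCH_PERCENT = 2
ENWIKI_MISMATCHES = 350_000


def implied_total_pages(mismatches: int, percent: int) -> int:
    """Total page count implied by a mismatch count and a mismatch percentage."""
    return mismatches * 100 // percent


def allowed_mismatches(total_pages: int, percent: int) -> int:
    """Number of pages that may differ while still meeting the target."""
    return total_pages * percent // 100


# Inferred from the two figures above (an assumption, not a measured count).
enwiki_total = implied_total_pages(ENWIKI_MISMATCHES, MISMATCH_PERCENT)
print(f"Implied English Wikipedia page count: {enwiki_total:,}")
print(f"Pages allowed to differ at 98%: {allowed_mismatches(enwiki_total, MISMATCH_PERCENT):,}")
```

The same helper can be pointed at any other wiki's page count to see how many pages a "98% compatible" parser would still render differently.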