On Tue, Jul 14, 2009 at 7:36 PM, dan nessett <dnessett@yahoo.com> wrote:
> Hm. Sounds like an opportunity. How about MediaWiki issuing a grand challenge? Create a well-documented, well-structured (open source) parser that produces the same results as the current parser on 98% of Wikipedia pages. The prize is bragging rights and a letter of commendation from someone or other. I suspect there are a bunch of graduate students out there who would find the challenge interesting.
I suspect nobody's going to stand a chance without funding.
$ cat includes/parser/*.php | wc -l
11064
That's not the kind of thing most people write for an interesting challenge.
Also, you realize that 2% of pages would mean 350,000 pages on the English Wikipedia alone? Probably a million pages across all Wikimedia wikis? And who knows how many if you include third-party wikis?