Hm. Sounds like an opportunity. How about MediaWiki issuing a grand challenge: create a well-documented, well-structured (open source) parser that produces the same results as the current parser on 98% of Wikipedia pages. The prize is bragging rights and a letter of commendation from someone or other. I suspect there are a bunch of graduate students out there who would find the challenge interesting.
Rationalizing the parser would help the development process. For the 2% of pages that fail, challenge others to fix them. The key is not getting stuck in the "we need a formal syntax" debate. If the challengers want to create a formal syntax, that is up to them. MediaWiki should only be interested in the final results.
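To make the 98% criterion concrete, here is a rough sketch of how agreement between two parsers might be scored. Everything in it is an assumption made for illustration: `agreement_rate` and `meets_challenge` are hypothetical names, the two "parsers" are toy stand-ins (any callables mapping wikitext to output), and exact string equality of the output is only one possible definition of "the same results".

```python
from typing import Callable, Iterable, List, Tuple

Parser = Callable[[str], str]  # hypothetical: wikitext in, rendered output out


def agreement_rate(
    pages: Iterable[str], reference: Parser, candidate: Parser
) -> Tuple[float, List[str]]:
    """Return the fraction of pages where both parsers agree, plus the failures."""
    total = 0
    failures: List[str] = []
    for wikitext in pages:
        total += 1
        # Assumption: "same results" means byte-for-byte identical output.
        if reference(wikitext) != candidate(wikitext):
            failures.append(wikitext)
    rate = (total - len(failures)) / total if total else 0.0
    return rate, failures


def meets_challenge(rate: float, threshold: float = 0.98) -> bool:
    """True if the agreement rate reaches the proposed 98% bar."""
    return rate >= threshold


# Toy stand-ins, not real parsers: the candidate gives up on templates.
def toy_reference(text: str) -> str:
    return text.upper()


def toy_candidate(text: str) -> str:
    return text if "{{" in text else text.upper()


sample_pages = ["plain text", "''italic''", "{{tmpl}}", "[[link]]"]
rate, failures = agreement_rate(sample_pages, toy_reference, toy_candidate)
```

The list of failing pages is returned alongside the rate because the pages that fail are exactly the ones that would be handed to others to fix.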
--- On Tue, 7/14/09, Aryeh Gregor <Simetrical+wikilist@gmail.com> wrote:
They're supposed to pass, in theory, but never have. Someone wrote the tests and the expected output at some point as a sort of to-do list. I don't know why we keep them, since they just confuse everything and make life difficult. (Using the --record and --compare options helps, but they're not that convenient.) All of them would require monkeying around with the parser that nobody's willing to do, since the parser is a hideous mess that no one understands or wants to deal with unless absolutely necessary.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On Tue, Jul 14, 2009 at 7:36 PM, dan nessett <dnessett@yahoo.com> wrote:
Hm. Sounds like an opportunity. How about MediaWiki issuing a grand challenge: create a well-documented, well-structured (open source) parser that produces the same results as the current parser on 98% of Wikipedia pages. The prize is bragging rights and a letter of commendation from someone or other. I suspect there are a bunch of graduate students out there who would find the challenge interesting.
I suspect nobody's going to stand a chance without funding.
$ cat includes/parser/*.php | wc -l
11064
That's not the kind of thing most people write for an interesting challenge.
Also, you realize that 2% of pages would mean 350,000 pages on the English Wikipedia alone? Probably a million pages across all Wikimedia wikis? And who knows how many if you include third-party wikis?
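For anyone who wants to check the arithmetic, a quick sketch follows. The total page count is not stated anywhere in this thread; the 17.5 million figure below is only what the quoted 2% and 350,000 numbers imply, and `pages_failing` is a hypothetical helper name.

```python
def pages_failing(total_pages: int, failure_percent: int) -> int:
    """Number of pages that fail, given a whole-number failure percentage."""
    return total_pages * failure_percent // 100


# Working backwards from the figures in the message above:
# 350,000 failing pages at a 2% failure rate implies this many pages in total.
implied_total_pages = 350_000 * 100 // 2

# Sanity check in the forward direction.
failing = pages_failing(implied_total_pages, 2)
```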