On 2/21/08, Jay R. Ashworth <jra@baylink.com> wrote:
> On Thu, Feb 21, 2008 at 01:16:22AM +1100, Steve Bennett wrote:
> > Time to take this grammar and do something with it.
>
> Build a parser with it, run it against the corpus, and see how often each individual rule pukes?
Ok. I've actually done a bit of that, but I guess I should ramp up the scale. It can be hard to detect pukage without actually generating XHTML and comparing it, though.
Generally, though, the answer is "not often". Flip through some random wikitext and you'll find that a very small number of rules account for the vast majority of actual use, though that may change once I have to contend with the body of templates. People don't use tables much. They don't use HTML tags or entities much. They almost never use magic links (especially PMID - wtf is that about?). They almost never use horizontal rules or HTML comments, and rarely use even extensions like <ref>.
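If you want a rough number for "how often is each construct used" before a full parser is ready, a crude regex tally over a pile of pages gets you in the ballpark. This is a minimal sketch: the patterns in the hypothetical `CONSTRUCT_PATTERNS` table are my own approximations of a few constructs mentioned above, not the actual grammar rules, and it counts pages that use a construct at least once rather than total occurrences.

```python
import re
from collections import Counter

# Hypothetical, approximate patterns for a few wikitext constructs.
# These are NOT the grammar's rules, just quick heuristics for tallying.
CONSTRUCT_PATTERNS = {
    "table": re.compile(r"^\{\|", re.MULTILINE),             # table start at line start
    "horizontal_rule": re.compile(r"^-{4,}", re.MULTILINE),  # four or more dashes
    "html_comment": re.compile(r"<!--.*?-->", re.DOTALL),
    "ref_extension": re.compile(r"<ref[\s>/]"),              # <ref>, <ref name=...>, <ref/>
    "magic_link_pmid": re.compile(r"\bPMID \d+"),
    "html_entity": re.compile(r"&(?:[a-zA-Z]+|#\d+);"),
    "internal_link": re.compile(r"\[\[[^\]]+\]\]"),
}


def tally_constructs(pages):
    """Return a Counter of how many pages use each construct at least once."""
    counts = Counter()
    for text in pages:
        for name, pattern in CONSTRUCT_PATTERNS.items():
            if pattern.search(text):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    sample_pages = [
        "Some '''bold''' text with a [[Link]] and another [[Link|label]].",
        "{|\n| cell\n|}\nSee [[Other page]].",
        "Cited claim.<ref>Source</ref> PMID 12345\n----\n<!-- note -->",
    ]
    # Print constructs from most to least used across the sample.
    for name, n in tally_constructs(sample_pages).most_common():
        print(name, n)
```

Run against a real dump instead of `sample_pages`, the `most_common()` ordering is what would show whether a handful of rules really dominate. Anything this flags as common is worth checking properly against generated XHTML, since regexes will both over- and under-match real wikitext.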
Steve