Steve Bennett wrote:
Ok, it's still backwards from how I would picture it:
- Come up with a solution (i.e., a new parser).
- See how many pages that solution fits; call it X%.
- If X% is too small, either extend the parser by adding more rules or update the pages.
But this is probably just philosophy at this point: I'd rather focus on the grammar we want to implement than on the grammar we don't.
Actually, I think that's a good place to start. Here's my idea, which I invite you to critique:
We agree on a threshold X% of articles parsed correctly, such that if more than X% are correct, we say the parser is good and do some amount of hand-editing on the remaining (100-X)%. Then:
- Someone* knocks up a very rough "new wave" parser and runs a copy of, say, en: on top of it.
- We all try it out and see for ourselves how much stuff breaks.
- If too much breaks, refine the parser and repeat.
Hopefully, we eventually reach X% correctness; then we are happy and can think about how to roll it out more widely. If, however, we just can't reach X% using our optimistic two-pass approach, then we debate whether a more complex parser is necessary. If it is, the two-pass version will likely make a good basis for it, and the work is not wasted.
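To make the measurement step concrete, here is a minimal sketch of the sort of harness I have in mind. Everything in it is made up for illustration: "new_parser" is a hypothetical module wrapping the experimental parser, and I'm assuming each page's wikitext has been dumped next to the current parser's rendered output for comparison.

    # Minimal sketch: run the candidate parser over a dump and report
    # what fraction of pages come out identical to the current output.
    import glob

    import new_parser  # hypothetical wrapper around the experimental parser

    def measure_correctness(pages_dir):
        total = correct = 0
        for path in glob.glob(pages_dir + "/*.wikitext"):
            with open(path, encoding="utf-8") as f:
                wikitext = f.read()
            with open(path.replace(".wikitext", ".expected.html"),
                      encoding="utf-8") as f:
                expected = f.read()
            total += 1
            try:
                # A page "fits" if the new parser reproduces the
                # current parser's rendering exactly.
                if new_parser.parse(wikitext) == expected:
                    correct += 1
            except Exception:
                pass  # a crash counts as a broken page
        return 100.0 * correct / total if total else 0.0

    print("X = %.1f%%" % measure_correctness("dump/en"))

Exact string comparison is deliberately crude; in practice we'd probably want to normalise whitespace before comparing, but it shows the shape of the loop: one number out, refine the parser, run it again.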
This plan allows us to actually do something, which is probably preferable to arguing about whether something hypothetical is doable.
Thoughts?
Soo Reams
* I would volunteer, but I probably lack both the skill and the time.