On 11/10/07, Jay R. Ashworth <jra@baylink.com> wrote:
> Specifically, I was proposing defining the combinations of the current parser tokens which are difficult to interpret (primarily, combinations of bold, italics, and apostrophes), and determining how frequently they appear in the live corpus.
>
> This will delimit the *actual* size of the Installed Base problem, in both meanings I gave it earlier. If, in 2 megapages, there are only 100 occurrences, you fix them by hand. If 1,000, you grind a robot. If 500K, then you take a different approach to the overall problem.
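Measuring that frequency could be a single pass over a dump. A minimal sketch in Python, assuming pages arrive as (title, wikitext) pairs (e.g. from a dump reader such as mwxml); the run-length rule below is illustrative, not MediaWiki's actual tokenizer:

import re
from collections import Counter

# '' = italic, ''' = bold, ''''' = bold+italic; runs of 4, or of 6 and
# up, have no single obvious reading, so count them as "difficult".
APOSTROPHE_RUN = re.compile(r"'{2,}")

def count_difficult_runs(pages):
    tally = Counter()  # run length -> number of occurrences
    for title, text in pages:
        for match in APOSTROPHE_RUN.finditer(text):
            n = len(match.group())
            if n == 4 or n > 5:
                tally[n] += 1
    return tally

# ''''x'''' is ambiguous (bold plus a stray quote? italics plus two?),
# while '', ''', and ''''' all parse cleanly.
print(count_difficult_runs([("Test", "''''x'''' and '''bold'''")]))

Summing that tally against the corpus size gives the 100 / 1,000 / 500K decision point directly.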
Ok, it's still backwards from how I would picture it:

1) Come up with a solution (i.e., a new parser).
2) See how many pages that solution fits; call it X%.
3) If X% is too small, either extend the parser by adding more rules, or update pages.
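Step 2 is mechanical once a candidate parser exists. A hedged sketch of that measurement, where new_parser and reference_render are hypothetical stand-ins and a page "fits" if the new grammar accepts it and agrees with the current renderer:

def coverage(pages, new_parser, reference_render):
    handled = total = 0
    for title, text in pages:
        total += 1
        try:
            if new_parser(text) == reference_render(text):
                handled += 1  # parses and round-trips identically
        except SyntaxError:  # hypothetical: page falls outside the grammar
            pass
    return 100.0 * handled / total if total else 0.0

The pages that fail the check become the worklist for step 3, whether the fix lands in the grammar or in the pages themselves.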
But this is probably just philosophy at this point: I'd rather be focusing on the grammar that we want to implement than the grammar that we don't want to implement.
Steve