Re: [Wikitech-l] A Modest Proposal on grammar and parsers

10 Nov 2007


On Sat, Nov 10, 2007 at 05:30:53PM +1100, Steve Bennett wrote:
> ...
> On 11/10/07, Jay R. Ashworth <jra@baylink.com> wrote:
> > ...
> > They certainly are, if no one ever examines the corpus.  I've just
> > banged up a new server in the office, if no one else who already *has*
> > a mirror of, say, en.wp set up steps up, I may do the testing myself,
> > in my Copious Free Time.
>
> What are you proposing, autobotically replacing ''' with **?

Specifically, I was proposing defining the combinations of the current
parser tokens which are difficult to interpret (primarily, combinations
of bold, italics, and apostrophes), and determining how frequently they
appear in the live corpus.
This will delimit the *actual* size of the Installed Base problem, in
both meanings I gave it earlier.  If in 2 megapages, there are only 100
occurrences, you fix them by hand.  If 1000, you grind a robot.  If
500K, then you take a different approach to the overall problem.
(To USAdians, this is referred to as "Dropping back 10, and punting".)
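
A rough sketch of what such a survey might look like, in Python.  None
of this is existing tooling: the function names, the definition of
"ambiguous" (runs of exactly 4, or of 6 or more, apostrophes), and the
cutoffs between the three cases are all assumptions made for
illustration, and a real pass would read pages from a dump or mirror
rather than from an in-memory list.

```python
import re
from collections import Counter

# Any run of two or more apostrophes. In current wikitext a run of
# 2 is italic, 3 is bold, and 5 is bold italic.
APOSTROPHE_RUN = re.compile(r"'{2,}")


def is_ambiguous(run_length):
    # Assumption for illustration: runs of 4, or of 6 and up, have no
    # single obvious reading and are the "difficult" combinations.
    return run_length == 4 or run_length > 5


def count_ambiguous_runs(pages):
    """Survey an iterable of (title, wikitext) pairs.

    Returns (total occurrences, Counter keyed by run length,
    set of titles containing at least one ambiguous run).
    """
    by_length = Counter()
    affected = set()
    for title, text in pages:
        for match in APOSTROPHE_RUN.finditer(text):
            n = len(match.group())
            if is_ambiguous(n):
                by_length[n] += 1
                affected.add(title)
    return sum(by_length.values()), by_length, affected


def suggest_approach(total, hand_limit=100, rethink_limit=500_000):
    # Hypothetical cutoffs: the message only names 100, 1000 and 500K
    # as example points, so the boundaries between them are a guess.
    if total <= hand_limit:
        return "fix by hand"
    if total < rethink_limit:
        return "write a robot"
    return "rethink the overall approach"


# Tiny stand-in corpus; page "A" uses only plain italic and bold.
sample_pages = [
    ("A", "''italic'' and '''bold'''"),
    ("B", "''''odd'''' run and ''''''six''''''"),
]
total, by_length, affected = count_ambiguous_runs(sample_pages)
print(total, dict(by_length), sorted(affected), suggest_approach(total))
```

The per-length breakdown is kept separate from the total because the
answer may differ by case: a handful of 6-apostrophe runs could be
fixed by hand even if 4-apostrophe runs turn out to be common.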
Cheers,
-- jra
-- 
Jay R. Ashworth                   Baylink                      jra@baylink.com
Designer                     The Things I Think                       RFC 2100
Ashworth & Associates     http://baylink.pitas.com                     '87 e24
St Petersburg FL USA      http://photo.imageinc.us             +1 727 647 1274
