Re: [Wikitext-l] Cunningham's exploratory parsing

12 Jul 2011

I'm probably mis-remembering that... I probably was the one disappointed
in it being a translation to HTML. Still I understand why you did it 
that way.
It's kind of amazing how we all have these projects we call parsers, and 
then they all do completely different things. :)
On 7/11/11 11:01 PM, Karl Matthias wrote:
...
I'm surprised, Neil, that you think Ward was disappointed with this, as he
was always supportive of our efforts and indeed introduced us to Peg and
spent some time helping us get into writing grammars and understanding the
pitfalls.  I'm sorry it doesn't solve the problem you guys have off the
shelf, but hopefully it helps open some doors, or at least serves as a
model of how a grammar can be written.
If I can be of help, please just give me a shout.
Cheers,
Karl
On Tue, Jul 12, 2011 at 4:35 AM, Neil Kandalgaonkar <neilk@wikimedia.org> wrote:
Trevor & I talked with him extensively about this. BTW, around here,
he's just Ward. :)

He too was disappointed that his team wrote rules to directly transform
wikitext into HTML.

The parse-everything-in-Wikipedia thing isn't quite what it sounds like.
If I recall correctly it works like this:

As part of his job at About.us, he was really looking for patterns of
Wikitext that he could use to snag business information. One target was
the Infobox on Wikipedia. So, the tool was a way of cataloging the
various ways that people structure an Infobox template.

Because he wrote this in C, he added rules to the grammar to discard
information in favor of keeping a data structure of constant size.
That's mostly what the <<< >>> in the grammar mean. Anyway, this
then serves as a sampling of the majority of the structures one is
interested in. The more rules you write, the more "unknown" stuff falls
into the fixed size of structures that are unparsed. IIRC he agreed it
might not be so useful if you were writing a grammar for PHP or JS (I
assume the same is true for Python).
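
A rough illustration of the fixed-size idea described above. This is not
Ward's C code or his grammar; it is a sketch in Python under stated
assumptions. Reservoir sampling is swapped in here as one way to keep a
constant-size sample of whatever the rules don't recognize, and a single
regex stands in for the grammar rules. The names KNOWN_PARAM, BoundedSample
and explore are made up for the example.

```python
import random
import re

# Hypothetical "known" rule: a simple `| key = value` infobox parameter line.
# A real grammar would have many rules; this one regex is only a stand-in.
KNOWN_PARAM = re.compile(r"^\|\s*(\w[\w ]*?)\s*=\s*(.*)$")


class BoundedSample:
    """Keep at most `capacity` items from a stream (reservoir sampling)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self._rng = random.Random(seed)

    def offer(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
            return
        # Replace an existing item with probability capacity / seen,
        # so memory use stays constant however long the input is.
        slot = self._rng.randrange(self.seen)
        if slot < self.capacity:
            self.items[slot] = item


def explore(lines, capacity=3):
    """Count lines matched by the known rule; sample the unmatched rest."""
    matched = 0
    unknown = BoundedSample(capacity)
    for line in lines:
        stripped = line.strip()
        if KNOWN_PARAM.match(stripped):
            matched += 1
        else:
            unknown.offer(stripped)
    return matched, unknown
```

Under these assumptions, each rule added to the "known" side means fewer
lines reach the sample, so the same fixed-size sample covers a larger share
of what is still unparsed.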

On 7/11/11 5:24 PM, Erik Rose wrote:
>  On Jul 11, 2011, at 5:17 PM, Brion Vibber wrote:
> > We are however producing a different sort of intermediate
> > structure rather than going straight to HTML output, so things won't
> > be an exact match (especially where we do template stuff).
>
>  Nor are we going straight to HTML, which is one reason we didn't
>  steal this stuff. :-)
>  _______________________________________________
>  Wikitext-l mailing list
>  Wikitext-l@lists.wikimedia.org
>  https://lists.wikimedia.org/mailman/listinfo/wikitext-l

--
Neil Kandalgaonkar  |) <neilk@wikimedia.org>

