Re: [WikiEN-l] Templates/taxoboxes, or: why a converter isn't a parser

30 Aug 2004

I already got it in your reply here:
Maybe you are underestimating the vast differences in implementation
between the current not-really-a-parser and what I am working on.
There is nothing wrong with using a group of templates together, but
there *is* something majorly wrong with patching together one object (a
table, in this case) using pieces from different places. It works with
the current not-really-a-parser because it takes the wiki source texts
from the templates, sticks them together somehow, and then converts them
to HTML. This kind of practice is exactly what leads to all the problems
with our current not-really-a-parser. A proper parser should parse each
template individually, and then use its parse tree in the processing of
the page that uses it.
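A minimal sketch of that idea, assuming a toy grammar with only plain text, ''italics'' and {{name}} inclusions. The names Text, Italic and parse are made up for illustration and are not taken from any existing parser. Each template is parsed on its own, and only its finished nodes are spliced into the page that uses it:

```python
import re
from dataclasses import dataclass


@dataclass
class Text:
    value: str


@dataclass
class Italic:
    value: str


def parse(source, templates):
    """Parse `source` into a flat node list.

    Every {{name}} is parsed separately from `templates[name]`, and the
    resulting nodes are spliced in. Wiki text is never glued together first.
    (Assumption: no recursion guard; a template that includes itself would
    loop forever in this sketch.)
    """
    nodes = []
    # The capturing group makes re.split keep the matched tokens.
    for token in re.split(r"(\{\{\w+\}\}|''.+?'')", source):
        if not token:
            continue
        if token.startswith("{{") and token.endswith("}}"):
            nodes.extend(parse(templates[token[2:-2]], templates))
        elif token.startswith("''") and token.endswith("''") and len(token) > 4:
            nodes.append(Italic(token[2:-2]))
        else:
            # Anything else, including an unmatched '', stays plain text.
            nodes.append(Text(token))
    return nodes


templates = {"c": "I ''like''"}
print(parse("{{c}} hamburgers", templates))
# [Text(value='I '), Italic(value='like'), Text(value=' hamburgers')]
```

With this approach a template has to be a complete fragment in its own right; markup opened in one template cannot be closed by another.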
It's great that you're working on a different way to do it that's not
just dumb text includes.
On Mon, 30 Aug 2004 16:55:01 +0100, Timwi <timwi@gmx.net> wrote:
...
Ævar Arnfjörð Bjarmason wrote:
...
Why would it ever break? I can see it getting slow because it cannot
be optimized, but not breaking. All it's doing is including one
thing after the other:
{{a}} gets Template:A, which contains "foo", and {{b}} gets Template:B,
which contains "bar", hence
{{a}}{{b}} = foobar
Of course, this simple example would still work. But picture this:
Template:A contains:         I ''li
Template:B contains:         ke'' hamburgers
Currently, {{a}}{{b}} would yield "I <em>like</em> hamburgers", but only
because it sticks the pieces together and then tries to make sense of them.
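To make the difference concrete, here is a small sketch. It is not the real MediaWiki code; convert_italics and the two helper functions are hypothetical names, and the regex is a stand-in for the real italics handling. It compares "paste the texts together, then convert" with "handle each template separately":

```python
import re

# Hypothetical template store for the example above.
templates = {"a": "I ''li", "b": "ke'' hamburgers"}


def convert_italics(text):
    """Converter-style pass: turn ''...'' pairs into <em>...</em>."""
    return re.sub(r"''(.+?)''", r"<em>\1</em>", text)


def expand_then_convert(names):
    """Glue the raw template texts together first, then convert."""
    return convert_italics("".join(templates[n] for n in names))


def convert_each_then_join(names):
    """Convert every template on its own, then join the results."""
    return "".join(convert_italics(templates[n]) for n in names)


print(expand_then_convert(["a", "b"]))     # I <em>like</em> hamburgers
print(convert_each_then_join(["a", "b"]))  # I ''like'' hamburgers
```

The italics only appear in the first case, because the opening and closing '' only meet after the texts have been pasted together.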
Why is this bad? Picture this:
Template:A contains:
        {|
        | nowrap
Template:B contains:
        | Text
        |}
Is the "nowrap" a table cell attribute or text in a separate cell? Does
this change depending on whether there is a newline after "nowrap"? ...
And this is just a simple example.
...
Why would this break in whatever parser you plan to implement?
Because a parser is not a converter. The current not-really-a-parser is
actually a converter: It looks out for particular syntax elements like
''these'' and turns them into <em>HTML tags</em>. This is bad because it
means that several of these conversions can interfere with each other:
    I ''like [[hamburger|hamburgers'']]

produces invalid HTML. It gets even worse when it tries to locate
{{template inclusions}} and replaces them with some other text, not
knowing what it is or how it fits into the document structure.
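A rough sketch of how two independent conversion passes trip over each other. The function name naive_convert and both regexes are my own illustration of the converter style, not the actual code:

```python
import re


def naive_convert(text):
    """Two unrelated search-and-replace passes, converter style."""
    # Pass 1: ''...'' becomes <em>...</em>
    text = re.sub(r"''(.+?)''", r"<em>\1</em>", text)
    # Pass 2: [[target|label]] becomes <a href="target">label</a>
    text = re.sub(r"\[\[([^|\]]+)\|(.+?)\]\]", r'<a href="\1">\2</a>', text)
    return text


print(naive_convert("I ''like [[hamburger|hamburgers'']]"))
# I <em>like <a href="hamburger">hamburgers</em></a>
```

The </em> ends up inside the <a> element although the <em> was opened outside it, so the tags are misnested. Neither pass knows that the other one exists.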
A real parser analyses the document's structure. It turns the wiki text
into a data structure in memory that actually bears resemblance to the
structure of the document. It creates a "heading" element where there is
a heading, instead of turning some strategically-placed equals signs
into <h#> tags.
...
The only reason I can see why that would happen is if you were to
implement some auto-completion of the table syntax, sort of like
tidy (HTML) for wiki syntax, and did it before things get fetched from
Template: rather than after everything has been included.
Your terminology "auto-completion" reveals that you are thinking in
terms of conversion. Don't think of it as auto-completion; for example,
if a '' has no matching '', I can tell the parser what to do
independently of what it does when there *is* a matching ''. There are
several possibilities: make an italics element (what you would probably
call auto-completion); make a text element (i.e. pretend the "''" was
actually text); or bail out saying "syntax error". Of course, we don't
want the latter. My parser currently does the second: It turns the ''
into text. I did that because this is also how the current
not-really-a-parser functions. However, I can easily change that.
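A sketch of those three choices, not my actual parser code. The names parse_inline, Text, Italic and WikiSyntaxError, and the `unmatched` parameter, are invented for this example, and bold (''') is ignored:

```python
from dataclasses import dataclass


@dataclass
class Text:
    value: str


@dataclass
class Italic:
    value: str


class WikiSyntaxError(ValueError):
    pass


def parse_inline(source, unmatched="text"):
    """Parse ''italics''; `unmatched` picks the rule for a '' with no partner."""
    parts = source.split("''")
    nodes = []
    for i, part in enumerate(parts):
        opened = i % 2 == 1          # odd pieces follow an opening ''
        closed = i < len(parts) - 1  # another '' comes after this piece
        if not opened:
            if part:
                nodes.append(Text(part))
        elif closed:
            nodes.append(Italic(part))
        elif unmatched == "italic":
            nodes.append(Italic(part))       # make an italics element anyway
        elif unmatched == "text":
            nodes.append(Text("''" + part))  # pretend the '' was text
        elif unmatched == "error":
            raise WikiSyntaxError("'' has no matching ''")
        else:
            raise ValueError(f"unknown policy: {unmatched!r}")
    return nodes


print(parse_inline("I ''li", unmatched="text"))
# [Text(value='I '), Text(value="''li")]
print(parse_inline("I ''li", unmatched="italic"))
# [Text(value='I '), Italic(value='li')]
```

The matched case goes through its own branch, so changing the rule for an unmatched '' does not affect it.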
In our specific case, there would be a document (a template) that has a
{| with no matching |}. What should it do? Unfortunately, none of the
three options make it work the way you have come to expect from the
current not-really-a-parser.
Timwi

WikiEN-l mailing list
WikiEN-l@Wikipedia.org
http://mail.wikipedia.org/mailman/listinfo/wikien-l
