On Wed, 29 Sep 2004 22:36:36 +0200, Timwi <timwi(a)gmx.net> wrote:
> Sorry for the late reply.
No probs: it's about 3 months quicker than my replies to messages from
friends seem to end up being... :-/
> Not really. We can still recognise redirects with a regexp (or
> anything else in PHP) before passing the page to the parser.
But why make that a special case? Why say "before using the nice
efficient real parser, use a not-a-parser to check for the #REDIRECT
directive, and have it do some voodoo"? Far better to just have the
parser recognise "#REDIRECT" (and any variants anyone wants) and
output a parse tree with a special redirect node.
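For concreteness, the not-a-parser check would be something like
this (the pattern is my guess, not whatever rule the current code
actually uses):

    function isRedirectText( $text ) {
        # crude "#REDIRECT [[Target]]" check; the exact pattern is
        # a guess, not the real rule set
        if ( preg_match( '/^\s*#REDIRECT\s*\[\[([^|\]]+)/i', $text, $m ) ) {
            return trim( $m[1] );  # the target title
        }
        return false;
    }

That's the voodoo; the parser-based alternative just means the tree
that comes back has a redirect node at its root.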
> Why is that "better"? I prefer my suggestion because:
Well, why is the new parser in general better? Because it separates
parsing out into a proper parser rather than mingling it in as
special-case checks everywhere, and thus makes the software easier
to maintain. This is parsing too, so let's put it in the parser, not
in special-case checks everywhere.
> * it might be more efficient because it means that we don't have
> to invoke the external parser just to find out whether what the
> user just submitted is a redirect or not
We wouldn't invoke the parser *just* to do that. We'll still need a
parse-on-save for things like updating link tables; if it comes back
saying "this is a redirect", we use that information to set the
is_redirect flag.
> * it means the parser needn't be programmed to recognise redirects
> (makes the code simpler)
OTOH, the code *outside* the parser will have to, and more: if you
want to avoid putting redirects through the parser at all, that code
has to do the following:
1] spot that the page is a redirect
2] determine what it is a redirect to
3] find all the article's entries in the links tables (links,
brokenlinks, categorylinks) and delete all those except the target of
the redirect.
1] has to be done one way or another anyway; 2] and 3] would be
natural side-effects of parsing *any* page before save (unless I've
completely misunderstood the current code structure, or you have
some magic way of avoiding this), so why duplicate them in
special-case code just for redirects?
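In other words, the save path already has to look something like
this (method names invented, since the new parser's API isn't
settled yet):

    # names below are invented, not the real API
    $tree = $parser->parse( $wikitext );
    # 1] falls out of the parse for free
    $article->setIsRedirect( $tree->isRedirect() );
    # 2] and 3]: a redirect's tree holds exactly one link (its
    # target), so the ordinary link-table update covers it too
    $article->updateLinkTables( $tree->getLinks() );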
> * it means we can assume that parse trees will be articles.
> Otherwise all output code would have to consider this special
> case. How should a class that is supposed to output LaTeX code
> react when you give it a redirect?
No, all output code will need a way of rendering a <redirect> element
- that's no more a special case than any other element. If you had an
output system that was guaranteed *never* to deal with an
"&redirect=no" request, you could simply leave this output undefined;
otherwise, the parser and output will have to do *something* with the
result, and I've always hated the 'misinterpret it as a numbered list'
approach.
I imagine an outputter to something static like LaTeX would want to either:
* always follow redirects to their destination (so it won't see any
actual redirect pages anyway; except for double-redirects, but they're
broken anyway); note that this has nothing to do with the parser
whatsoever, since it is a navigation issue, and shouldn't require
access to the page's content at all (currently, the page's text *is*
accessed, somewhere in Title.php I believe, but it needn't be)
or:
* render redirects as cross-references [e.g. "Such and such: See So
and so"] (in which case having the parser output some explanation
that this is a redirect would be very helpful indeed; see the sketch
below).
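For the latter, it's just one more render method in the outputter
(class and node names invented for the sake of the example):

    class LatexWriter {
        # ...one method per node type, plus (names invented):
        function renderRedirect( $node ) {
            # "Such and such: See So and so"
            return '\emph{See} '
                . $this->escapeLatex( $node->getTargetTitle() ) . "\n";
        }
    }

Hardly more of a special case than rendering a heading or a table.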
>> Actually, I have to admit I had no idea how difficult it would
>> be, but I assumed it would mean having at least a compiler, if not
>> a compiler-compiler and a whole load of other tools. Editing PHP
>> doesn't need that kind of thing, and the way it's designed now,
>> you needn't notice you're editing code.
> That is also true. But I really don't see why it's so hard to have
> a compiler?
No, it's not very hard; it's just harder than not needing one.
Remember that most web-hosting is accessed by FTP, not SSH; any
compilation has to happen on a different system, usually a home PC;
with any luck, things will end up binary compatible with the server.
So: plain-text options: great; needs-a-c-compiler options: slightly
awkward, but perhaps a necessary evil; needs-a-compiler-compiler
options: you're no longer an administrator, you're a developer.
>> If it were possible to only require a C compiler, it would
>> certainly be a favour to other admins running MediaWiki. It's
>> going to be annoying enough for some of them to have to deal with
>> a binary part as well as PHP.
> As I mentioned before, it is *not* necessary for anyone to "deal
> with" anything. People can continue to use the old not-a-parser if
> they want!
For how long? As new features come along, what are the chances that
they will be back-ported to the not-a-parser? How many versions will
be released before the not-a-parser is completely incompatible with
large parts of the codebase? I'm not saying this whole thing is a bad
idea - I think it's extremely sensible - just that we do need to
minimise the hassle for other MediaWiki admins who want to use the
latest version, but don't want to play the role of developer.
BTW, did anyone ever find a compiler-compiler that could output PHP?
--
Rowan Collins BSc
[IMSoP]