[Wikitech-l] Re: Proposal for deprecation of syntax features for 1.5

13 May 2005


On 5/13/05, Timwi <timwi@gmx.net> wrote:
...
> Andrew Rodland wrote:
...
> > I had guessed that it might find some use in fr -- It's too bad to
> > hear that it's "widely" used. However, I should note that it's not
> > _required_.
> Of *course* it's required. By saying it isn't, you're thinking too
> technically. Humans aren't like that, humans just want to write their
> text and not ugly tags and syntax elements just for a single apostrophe.

I didn't know that it was too technical of me to think that "required"
should mean "required".
...
...
> > <nowiki> resolves the ambiguity nicely.
> Again -- "nicely" only in the technical sense, but not in the human
> usability sense.

It's nice in the human-usability sense that you can say exactly what
you mean, instead of having to guess how it's going to be interpreted
(speaking of which, can you show me any document, preferably in
English, which explains this behavior?). I agree that <nowiki> is
rather unwieldy, but that in itself doesn't make the existing solution
a good one.
...
...
> > The workaround, on the other hand, does bad things to the language,
> > and makes the implementation of a more advanced parser exceedingly
> > difficult.
> You are making two assumptions here that are both false.
> Firstly, you are assuming that the language becomes more ambiguous this
> way. This is false, because by handling this case explicitly, I have
> actually made it *less* ambiguous. Previously, it was only a side-effect
> of the way regular expressions match text that three apostrophes were
> rendered as <i> followed by an apostrophe. Now I have specifically
> written code to define three apostrophes to mean "an apostrophe followed
> by open-italics, unless there is another triple-apostrophe in the line,
> in which case it's open-bold". No ambiguity there.

How does "a side-effect of the way regular expressions match text"
turn the markup for bold into an apostrophe and the markup for italic?
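
For illustration only, the rule as worded in the quoted text above can be sketched in a few lines of Python. This is not the doQuotes() code; the function name and labels are made up, and the sketch only looks at runs of exactly three apostrophes. What it shows is that the meaning of the first ''' depends on whether another one appears anywhere later in the line.

```python
import re

# A run of exactly three apostrophes (not part of a longer run).
TRIPLE = re.compile(r"(?<!')'''(?!')")

def interpret_triples(line):
    """Label each ''' run in a line, following the rule as quoted.

    Hypothetical helper: a lone ''' is "apostrophe + open-italic";
    if the line holds more than one ''', each is a bold toggle.
    """
    count = len(TRIPLE.findall(line))
    if count == 1:
        return ["apostrophe + open-italic"]
    return ["bold toggle"] * count

# The same prefix is read differently depending on the rest of the line.
print(interpret_triples("l'''homme'' est"))
print(interpret_triples("l'''homme''' est"))
```

Note that the whole line has to be counted before the first label can be produced.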
...
> The second assumption you are making (explicitly, even) is that it is
> more difficult to implement, when in fact you really just mean that you
> found it harder because it is not the way regular expressions normally
> work (and because you find the behaviour confusing because you don't
> normally think of French). I didn't find this particularly difficult to
> do -- neither in the current parser, nor in flexbisonparse.

If you had read my messages, you might have noticed that my reasoning
was based neither on anything to do with regexes at all, nor on
linguistic prejudice, but on a simple consideration. It is impossible,
at the time that the parser sees a ''', to resolve what type of token
it is, without looking ahead to the end of the line (an unbounded and
unknown distance away). _That's_ what I called ambiguity.

The alternative is that '' means '', and ''' means '''. My current
feeling is that the "cleanest" solution to the problem would be to
introduce a separator which produces no output, but breaks up tokens;
then you could write (with ∙ as sequence operator) ' ∙ '', '' ∙ ',
' ∙ ''', ''' ∙ ', and even '' ∙ '' all you want, with no ambiguity to
the parser and no considerable hassle to the user. "Otherwise how is
the computer supposed to know what you mean?" is an argument anyone can
understand. The existing code in doQuotes() simply operates by
logically _separating_ the consecutive quotes, so automatic conversion
wouldn't be overly taxing, nor time-critical.

I haven't seen flexbisonparse, but the reason it's "easy" in the
current parser is, as I'm sure you know, that it makes N passes over
the entire string, with the benefit of unlimited lookahead. You're
right that it _can_ be done -- I think I've got it down. But it's still
not pretty. And it's still, I think, a violation of expectations.
Nonetheless, I'll shut up about it.
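
A minimal sketch of the separator idea, under stated assumptions: the message does not name an actual separator character, so the ∙ used here (SEP) is purely a placeholder, and the handling of runs longer than three apostrophes (greedy: ''' first, then '', then a literal ') is an assumption of this sketch, not something the proposal specifies. The point is that a single left-to-right pass never needs to look more than two characters ahead.

```python
SEP = "\u2219"  # hypothetical separator character; the real choice is open

def tokenize(text, sep=SEP):
    """Single-pass tokenizer: ''' is BOLD, '' is ITALIC, ' is literal text.

    The separator emits nothing; it only stops adjacent apostrophes
    from being read as one run.
    """
    tokens = []
    buf = []  # pending literal text
    i = 0
    while i < len(text):
        if text[i] == sep:
            i += 1                      # no output, just a token boundary
        elif text.startswith("'''", i):
            if buf:
                tokens.append(("TEXT", "".join(buf)))
                buf = []
            tokens.append(("BOLD", "'''"))
            i += 3
        elif text.startswith("''", i):
            if buf:
                tokens.append(("TEXT", "".join(buf)))
                buf = []
            tokens.append(("ITALIC", "''"))
            i += 2
        else:
            buf.append(text[i])         # ordinary character or lone apostrophe
            i += 1
    if buf:
        tokens.append(("TEXT", "".join(buf)))
    return tokens

# An apostrophe followed by italics, written explicitly with the separator.
print(tokenize("l'" + SEP + "''homme''"))
```

Without the separator, the same input would be read as ''' followed by text, which is exactly the case the explicit form is meant to disambiguate.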
Andrew
