This thread is quickly becoming quite tl;dr, but I think the discussions are valid and useful.

Maybe we could break some of these disparate topics up a bit?

- Trevor

On Tue, Feb 7, 2012 at 8:35 AM, Pavel Tkachenko <proger.xp@gmail.com> wrote:
Platonides,

2012/2/7 Platonides <platonides@gmail.com>:
> they are not imposed by the wikitext in any way, and could be
> removed today if wished:
Then why are they still there?

As I said in my previous message, I am ready to break down any
piece of markup that you want. Templates are just the craziest part of
current wikitext.

>> 2. What is "pp-semi" why it "move-indef"?
> Names given by the users.
It's funny that users give names that others don't understand. Even
those who are "technically proficient" but not part of "the elite".

> The pipe in {{About|the planet}} can look odd, but the pipe
> at the beginning of the line looks natural. It seems like some kind of
> continuation of the {{.
Apart from the "looking natural" argument I would also ask about
"crucial need". I think quotes would look natural after template
parameter names, too - but they have no use and would duplicate existing
functionality (a parameter cannot extend past the beginning of the next
parameter, for instance).

In other words, why do we need a pipe if we assume that a template can
have two modes: inline and block? Inline templates contain no line
feeds; block templates contain a line feed before each of their
parameters.

But first let's consider whether templates should have a 'mixed' mode.
It would make things more complex without any need, and it would make
the source look messy, because it would be up to each user whether to
put a particular parameter on a separate line or not (imagine two guys:
one on a "VT100" terminal with 80 characters per line and one with the
latest 30" plasma display). It would also require the markup to provide
additional means of separating parameters if the line feed can no
longer be trusted.

It's time to sharpen our Occam's Razor, or life will do it for us.

> Space separated arguments are more readable for the casual editor, but
> normal editors would have a harder time to find out what's the template
> and what the parameters.
You're thinking in terms of parameters and my point was to discard all
of this stuff and think in terms of human writing.

What looks more natural - {{About|Earth|the planet}} or {{About Earth,
the planet}}? Since when does our handwriting produce pipes instead of
spaces (and not even all of them)?

> Also, there are colons as parameters. How would you write as the parameter
> the article [[Gypsy: A Musical Fable]] or [[Batman: Year One]] ?
> By banning ':' in titles?
Have I said something about colons and links? Links are fine with
colons or any other symbols.

But since we're touching on this, pipes in links are not that intuitive
either. Pipes are actually not present on many keyboard layouts, but
even apart from that it's more natural to use an equality sign. Or a
double one, for the purpose of text markup.

In fact, doubling a symbol is a great way of both distinguishing it
from accidental use and making it easily recognizable by the human eye.
A space can be used for the same purpose (as a delimiter).

Let's be concrete:
1. Links are wrapped in [[ and ]].
2. Links may optionally have titles. A title is separated from the link
URL (or local page name) with a space.
3. Links which contain spaces in their address can be given a
title after a double equality sign.
4. Finally, in the very rare cases when both a space and an equality
sign are necessary, a special markup-wise (!) escape symbol can be used.

Note how this ties together the markup as a whole and its
particular tokens:
1. Markup-wise, tokens (links, headings, formatting, etc.) are created
using double symbols. Say, [[ and ]] for links, == and == for
headings, **bold**, __underline__, etc. This is the cornerstone.
2. Markup-wise, there is a single way of escaping markup. Escaping
must be particularly well thought out because it will be the trickiest
part even for experienced editors.

Currently wikitext uses the terrible "<nowiki>stuff</nowiki>", but it
doesn't always work, and HTMLTidy comes in handy with its &lt; and
&gt;. And some places (such as link titles) cannot be escaped
at all.
Of course, given this approach we are forced to use an inconvenient
symbol - the pipe - which is unlikely to occur in normal text and thus
needs no escaping in 99% of cases.

But let's turn this from bottom-up to top-down. Think about a
good-looking, rarely-used symbol that we will use for escaping... I'm
sure that with the amount of text Wikipedia has it's easy to conduct
research to determine that symbol, but in this example I'll pick the
tilde (~); I'm actually using it for that purpose in my home-made
markups.

What follows is the complete list of cases:
1. [[Link]]
2. [[Link Title]] = [[Link|Title]] now
3. [[Space link==Title]] = [[Space link|Title]] now (the pipe needs a
layout change while '==' doesn't)
4. A URL containing '==' in its query part:
[[http://weird.url/?k~==v]]. I've put a tilde before '==' to prevent it
from being treated as an address/title separator. Since there's no
other separator, the link has no caption. How does wikitext handle
links with pipes? Are they banned?
5. A local page name containing a tilde: [[~ (Tilde)==Title]] - nothing
breaks here because the tilde is only special when it escapes
something... and a space isn't something to be escaped, so the tilde is
treated as normal text.
6. Extreme case: a URL containing both a tilde and a double equality
sign, which originally (in the browser address bar) looks like
http://url?k~==v. In our link we simply triple the tilde - the first
tilde escapes the second and the last tilde escapes the separator:
[[http://url?k~~~==v]].
7. For the sake of completeness, the most extreme case: a URL with ~==
for which we've got to specify a title. We don't have to reinvent the
wheel - simply put the title after a space and properly escape the URL
as in case #6: [[http://url?k~~~==v Title]].

Now you may think "wow, that's a real mess, nobody is going to
understand and use this syntax". But note that only the
first 3 cases are standard - the others are exceptions whose
complexity increases gracefully with the task.
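
To make the seven cases above concrete, here is a minimal Python sketch
of how a parser might split the text between [[ and ]] into an address
and a title. The name parse_link is made up for illustration, and
treating any '==' after the first separator as literal text is an
assumption; the scheme above does not say what should happen there.

```python
from typing import Optional, Tuple


def parse_link(body: str) -> Tuple[str, Optional[str]]:
    """Split the inside of [[...]] into (address, title or None)."""
    parts = [[]]  # parts[0] = address, parts[1] = title after an unescaped '=='
    i = 0
    while i < len(body):
        if body.startswith("~~", i):        # escaped tilde -> literal '~'
            parts[-1].append("~")
            i += 2
        elif body.startswith("~==", i):     # escaped separator -> literal '=='
            parts[-1].append("==")
            i += 3
        elif body.startswith("==", i) and len(parts) == 1:
            parts.append([])                # first unescaped '==' separates the title
            i += 2
        else:                               # anything else, including a lone '~'
            parts[-1].append(body[i])
            i += 1
    address = "".join(parts[0])
    if len(parts) == 2:
        return address, "".join(parts[1])
    # No '==' separator: the first space (if any) separates address and title.
    head, space, tail = address.partition(" ")
    return (head, tail) if space else (address, None)


print(parse_link("Space link==Title"))
print(parse_link("http://url?k~~~==v Title"))
```

Only the three sequences '~~', '~==' and '==' get special treatment, so
the common cases #1 to #3 never touch the escaping code path at all.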

For example, how will wikitext handle links containing pipes? Hard to
say, probably [[&#x7c;]]. <nowiki>? [[&lt;nowiki&gt;]].
&lt;nowiki&gt;? [[&amp;lt;nowiki&amp;gt;]].

As demonstrated, the above system scales well, and thus there is
room for improving wikitext, even if not using exactly my scheme.

The best thing about this is that once you've memorized the above 2
fundamental rules you can apply them anywhere. Markup can be escaped
using a tilde: ~[[no more a link]]. Associative (aka definition) lists
can be made uniform:
 = Definition Value
 = Definition with spaces == Value
 = Definition of ~==, now its == Value

And so on. Scalable.

> No. We have "Earth, the planet"!
You mean that a template cannot put "the" in front because some
planets have their name without "the"? If so, they are not planets
(nebulas, satellites, etc.) and a different template must be used. And
if it's used the machine can handle language peculiarities.

> You will need the proper name in the infobox, such as "Felis silvestris
> catus", even if the article is just called "Cat".
Yes: {{About Felis silvestris catus, cat}}

>> What kind of false positives are we talking about? Will any sane
>> individual spend his precious time not editing but preparing to edit
>> this mess?
> I think they copy and paste, then fill the fields. Which is a good way
> of learning as they encounter it.
And the best way is to create/write things from scratch on our own.
Where did you learn programming: in class, copying lines from the
blackboard into your notebook, or in the office, actually hitting the
keys?

Many things can be hidden under the copy-paste approach; there's even
the notion of a "code monkey". I believe that if wikitext continues
down the path of STL there will be "wiki monkeys", provided that WMF
goes commercial (which is the last thing I want to see).

But only syntax that hides nothing and is crisp to its bones can be
called fair. Only syntax that doesn't require any "templates" that you
just "copy" and "feel"... sorry, "fill in".
And of course, there are cases when templates (I mean wikitext {{ and
}}) with parameters are of great help and there are cases when
parameters with their pipes and whistles are redundant. Just like
life.

> Well, just by looking at it I have no idea what those temperatures are :)
You mean that "| max_temp_1  = 331&nbsp;K<ref name=asu_highest_temp/>"
gives you more ideas?

> What if they were in a different order?
So the machine cannot sort them out and determine which is "max"?

> 184 and 331 are probably some kind of limits, but what's that 287.2?
> Some kind of boilding point?
Hm, you might be right on this one, and it may be better to have two
template parameters: "temperature" and "temperature mean". This is
what discussion is for - finding rough edges, right?

> The goal of wikitext is to make html editing easy.
HTML editing? I thought wikitext was about text editing. Why not edit
HTML using HTML?

> HTML only needs a few special characters: <>&;=" but it's bothersome.
And 4 of them are absent on my native layout.

> We define that * is a bullet and serves to make lists:
> It's easier to type, and looks good.
Completely agree, this is a human-readable markup.

> We define # as the equivalent for numbered lists. Note that there's no
> usage of # for numbers in many cultures, so that's less 'visual' there.
And this should be refactored, adding features along the way. Let's
write ordered lists using digits!
1. One
2. Two
3. Three

The Japanese have their own kanji for numbers but they use Arabic
digits sometimes as well.

Oh, I hear your thoughts - "items are moved/removed and the order is
gone". Sure, but the machine can help us out:
1. One
1. Two
1. Three

We use a little trick here: since ordered items with identical marker
values are useless in human texts, the markup can use them to represent
automatically ordered markers.
But even if someday we need two identical markers, this can be fixed
using some clean syntax, such as:
 1. First
 1#1 First
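
The "identical markers mean auto-numbering" trick could be sketched in
Python roughly as follows. This is only an illustration: number_items is
a hypothetical name, only digit markers are handled, the explicit "1#1"
form is ignored, and starting the count from the repeated value is an
assumption.

```python
import re

# One list item per line: digits, a dot, whitespace, then the item text.
ITEM = re.compile(r"^(\d+)\.\s+(.*)$")


def number_items(lines):
    """Return (number, text) pairs, renumbering when all markers are identical."""
    items = []
    for line in lines:
        match = ITEM.match(line.strip())
        if match is None:
            raise ValueError(f"not an ordered list item: {line!r}")
        items.append((int(match.group(1)), match.group(2)))
    markers = [number for number, _ in items]
    if len(items) > 1 and len(set(markers)) == 1:
        # All markers are the same: count upwards from that value.
        start = markers[0]
        return [(start + offset, text) for offset, (_, text) in enumerate(items)]
    # Otherwise the author's explicit numbers are kept as written.
    return items


print(number_items(["1. One", "1. Two", "1. Three"]))
print(number_items(["3. Third", "2. Second", "1. First"]))
```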

Just as with links and [[eq==signs]] above our syntax gradually
increases complexity. The person gets crazy so does the markup - but
not before the user.

The added features of this approach are:
1. Lists now support, say, 6 marker types: 1. (digits), 01.
(zero-padded digits), a. (lower-alpha), A. (upper-alpha), i.
(lower-Roman), I. (upper-Roman)
2. Lists now can have markers of any value:
 3. Third
 2. Second
 1. First
3. List types can be mixed:
 1. Digit
 i. Roman
 a. Alpha
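
A rough Python sketch of how the six marker types might be told apart.
The function name marker_type is hypothetical. Note one open question
the list above does not settle: letters such as "i", "v", "x" or "c" are
both alphabetic and Roman; this sketch simply assumes Roman wins.

```python
import re


def marker_type(marker: str) -> str:
    """Classify a list marker such as '1.', '01.', 'a.' or 'IV.'."""
    value = marker.rstrip(".")
    if re.fullmatch(r"0\d+", value):
        return "zero-padded digits"
    if re.fullmatch(r"\d+", value):
        return "digits"
    # Assumption: anything made only of Roman numeral letters is Roman.
    if re.fullmatch(r"[ivxlcdm]+", value):
        return "lower-roman"
    if re.fullmatch(r"[IVXLCDM]+", value):
        return "upper-roman"
    if re.fullmatch(r"[a-z]", value):
        return "lower-alpha"
    if re.fullmatch(r"[A-Z]", value):
        return "upper-alpha"
    raise ValueError(f"unknown marker: {marker!r}")


for marker in ["1.", "01.", "a.", "A.", "i.", "I."]:
    print(marker, marker_type(marker))
```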

> But each feature requires new symbols, and when you
> look at those available on every layout, you get *very* limited...
This is far from the truth. Over the last few years I have developed
for my projects a markup that generally surpasses MediaWiki's and uses
just common symbols. I will briefly summarize it at the end of this
message.

> For example, I could decide to list imagemaps as `Image1? `Image2?...
> (grave and acute), but oh, many keyboards don't have both accents.
Obviously, you cannot get a symbol for each and every particular
markup case. But you don't have to. Compare how often you will use
imagemaps and, say, highlighted PHP code. Right now both are pretty
lengthy and angular in wikitext, and the former (less used) is even
shorter:

<imagemap>
...
</imagemap>
10 + 11 symbols.

<source lang="php">
...
</source>
19 + 9 symbols.

Do you see my point?
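
The counts above are easy to check, and the same arithmetic can be
applied to the %%(php) ... %% block form proposed later in this message.
A tiny Python check (overhead is just an illustrative helper name):

```python
def overhead(opening: str, closing: str) -> int:
    """Number of characters spent on the delimiters alone."""
    return len(opening) + len(closing)


print(overhead("<imagemap>", "</imagemap>"))           # 10 + 11
print(overhead('<source lang="php">', "</source>"))    # 19 + 9
print(overhead("%%(php)", "%%"))                       # 7 + 2
```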

> And obviously, you can't use something that would easily appear in a
> normal text (or you start defining escape codes which are uglier, too).
Nah, escape codes come from C, don't forget it's not a very friendly
language. Instead of escape codes one can use "quoting" (in Pascal
style) - I have already touched on the tilde symbol above.

But I agree that markup that relies on escapes/quoting of any kind is
not fair. Escapes by definition are exceptions and cannot overwhelm
the common rule.

> How do you type the *content* of the references?
As a title: [[*http:// My reference goes here]]. Or, better: we can
use the same syntax for footnotes: [[*My footnote]] and references
will be removed. So two types of footnotes: inline [[*text with
''markup'']], and block which start with [[*, then line break, then
any kind of markup (including line breaks, more footnotes, links, even
headings and lists), then line break, closing ]] and another line
break. Uniform and consistent yet powerful and flexible.

> But when you want to "go further", you start being limited.
I believe we will not if we start thinking in terms of simplicity, not
features - the former will give us features, as demonstrated above, but
if we focus on the latter it will force simplicity out.

> Just because that's what the underlying html uses.
This thinking is the problem - it is attached to a particular use case.
"We will use HTML 5 DOM just because we won't need to transform it
when rendering". But what about PDF? XML? FB2? RTF? DOC? ODT?

The machine can handle everything; its time is much less precious than
a human's. Once written, a framework will perform wikitext <-> HTML
transformations in an instant; the same goes for some intermediate
(completely detached from the target, notice this) tree serialization
format - even if it's binary (personally I think binary is the only
choice here).

Then why care whether we will be rendering Wikipedia for someone's
browser or Kindle? Because if we do, we will need to invent adaptors
and switches for all but the format we have chosen as primary. And
things change; even that format may change, and the framework will be
left with a DOM format theoretically based on some old HTML3 with
patches here and there. It will no longer use the "underlying HTML".

> The reason being that use of underlining is discouraged.
I agree, but this is again target-thinking. Looking to the future,
markup does not necessarily define presentation, even basics like bold
and italic. You certainly know that <b> is discouraged in HTML in favor of
<strong> - why? Because it's semantics, not presentation. Similarly,
<u> is presentation but __ is semantics. We can define __terms__ like
this. Does it look good? Can you define <u>terms</u> like that? No,
you will need a new entity - of course soon there will be no symbols
left on any keyboard, even Japanese!

Occam's razor.

> *And that also turned out to have issues, ever tried to write wikitext
> in piedmontese?
I have already said that a research of modern keyboard layouts with
their approximate user count is necessary if someone is going to
define an ideal keyboard layout. I am sure that even "=" can be absent
in some layouts (Japanese again). But there are general symbols thanks
to IBM.

> #REDIRECT and __TOC__ are a sad effect of
> separate building of contents.
Half of current wikitext is a sad effect. Most of the C++ standard is a
sad effect. Come on, does this prevent us from producing something
positive?

This is what the next generation of software is meant to bring - a core
reworked to the last screw based on previous use experience.

> You can't wrap #REDIRECT in a template, though, because the redirect applies
> to the template itself (unless you use some odd escaping?)
Once again target-thinking. Why limit {{...}} to templates? I have
mentioned this in my message:

2012/2/6 Pavel Tkachenko <proger.xp@gmail.com>:
> can be uniformized in a way similar to template
> insertions: {{redir New page}}, ... Templates can be
> called as {{tpl template arg arg arg}}

Note that templates are a subset of {{construct}} features. "Redir" and
"TOC" are actually not templates but "extensions". Or "actions"; the
name doesn't change the meaning. The point is to have a uniform syntax
for custom constructs, in other words extensions. It's obvious that no
matter how well thought out a standard is, there will always be missing
features once it hits reality. It must be prepared for this, and
this construct is one of the ways.
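
As a minimal sketch of the uniform {{action ...}} idea, a parser could
take the first word as the action name and hand the rest to that action
untouched. The name parse_action is hypothetical, and leaving the
arguments unsplit is a deliberate assumption: how an action such as
"tpl" divides its own arguments is not defined here.

```python
from typing import Tuple


def parse_action(call: str) -> Tuple[str, str]:
    """Split '{{name rest of the call}}' into (name, raw argument string)."""
    if not (call.startswith("{{") and call.endswith("}}")):
        raise ValueError(f"not a {{{{...}}}} construct: {call!r}")
    inner = call[2:-2].strip()
    # The first space ends the action name; the action interprets the rest.
    name, _, rest = inner.partition(" ")
    return name, rest.strip()


print(parse_action("{{redir New page}}"))
print(parse_action("{{tpl template arg arg arg}}"))
```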

Now, to be concrete, what I see as a better syntax for text markup:
* formatting: **bold** //italic// __underline__ (or other semantic
meaning) --strikethru-- ++small++ (semantics) ^^superscript^^
* styling text - replacement for <span class="cls">: !!(cls)text!!
* code: %%unformatted%%
* highlighting: %%(php)unformatted%%
* lists (ordered, unordered, definition) - already covered above
* blocks in different language (ISO 639): @@ru text@@
* footnotes: [[*footnote text]]
* quotes: >inline ( >>older etc. ) and <[block]>
* terms - replacement for <span title="desc">: (?term desc?) or
(?space term==desc?)
* headings - just like current wikitext, ==heading== and so on
* misc markup: ??comment?? (invisible in resulting document), HTML
&&entities; (double ampersand)
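
To show how little machinery the doubled-symbol formatting tokens need,
here is a rough Python sketch that turns six of them into HTML. The
mapping to HTML tags is an assumption made only for this example (as
argued above, __ is meant to be semantic, not necessarily <u>), and the
sketch ignores links, tilde escapes and the other tokens in the list.

```python
import re

# Assumed token -> HTML tag mapping; only for illustration.
TAGS = {
    "**": "strong",
    "//": "em",
    "__": "u",
    "--": "s",
    "++": "small",
    "^^": "sup",
}


def render_inline(text: str) -> str:
    """Replace each doubled-symbol pair with the corresponding HTML tag."""
    for delim, tag in TAGS.items():
        # Non-greedy match of the shortest text between two identical delimiters.
        pattern = re.escape(delim) + r"(.+?)" + re.escape(delim)
        text = re.sub(pattern, rf"<{tag}>\1</{tag}>", text)
    return text


print(render_inline("**bold** //italic// x^^2^^"))
```

A real implementation would have to tokenize links first, otherwise the
"//" in http:// addresses could be mistaken for italics.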

The actual markup is almost twice as large as the above but you
already know it: most tokens have block form as opposed to the inline
(above).
* styling blocks:
!!(cls)
content
!!
* code highlighting:
%%(php)
echo 'Hello, world!';
%%
* language blocks:
@@ja
some Japanese text
@@
* footnotes:
[[*
Footnote. [[Link]]
More text.
]]
* comments:
??
author's comment.
can be shown in "draft output mode".
??
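
The block forms above are also easy to pick out mechanically. A minimal
Python sketch for the highlighted code block only, assuming the opening
%%(lang) and the closing %% each sit alone on their own line
(code_blocks is a hypothetical name):

```python
import re

# Opening line '%%(lang)', then the body, then a line containing only '%%'.
BLOCK = re.compile(r"^%%\((\w+)\)\n(.*?)\n%%$", re.MULTILINE | re.DOTALL)


def code_blocks(text: str):
    """Return a list of (language, source) pairs found in the text."""
    return [(m.group(1), m.group(2)) for m in BLOCK.finditer(text)]


sample = "intro\n%%(php)\necho 'Hello, world!';\n%%\noutro\n"
print(code_blocks(sample))
```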

As demonstrated, it uses no HTML, BB-codes or any other tag-driven
markup. Symbols used are (ordered by rough use count):
* / % ( ) ! = - ? & > [ ] _ + ^

The first 9 symbols are quite common, so are _ and +.
I have put [ ] at the end because if the alternative ((link)) syntax is
allowed then those two are only used by <[blockquote]>.

If wikitext syntax improvement is to be considered by the community, I
am ready to give more details. The above listing misses several
important points which require more explanation (in particular about
%%code and {{action}} calls).

Signed,
P. Tkachenko

_______________________________________________
Wikitext-l mailing list
Wikitext-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitext-l