This thread is quickly becoming quite tl;dr, but I think the discussions are valid and useful.
Maybe we could break some of these disparate topics up a bit?
- Trevor
On Tue, Feb 7, 2012 at 8:35 AM, Pavel Tkachenko proger.xp@gmail.com wrote:
Platonides,
2012/2/7 Platonides platonides@gmail.com:
they are not imposed by the wikitext in any way, and could be removed today if wished:
Then why are they still there?
As I said in my previous message, I am ready to break down any piece of markup that you want. Templates are just the craziest part of current wikitext.
- What is "pp-semi" why it "move-indef"?
Names given by the users.
It's funny that users give names that others don't understand. Even those who are "technically proficient" but not part of "the elite".
The pipe in {{About|the planet}} can look odd, but the pipe at the beginning of the line looks natural. It seems like some kind of continuation of the {{.
Apart from the "looking natural" argument I would put "crucial need". I think quotes look natural after template parameter names, but they have no use and would duplicate existing functionality (a parameter cannot last beyond the beginning of the next parameter, for instance).
In other words, why do we need a pipe if we assume that a template can have two modes, inline and block? Inline templates contain no line feeds; block templates contain a line feed before each of their parameters.
But first let's think about whether templates should have a 'mixed' mode. This would make things more complex without any need, and it would make the source look messy because it would depend on the user whether he puts a particular parameter on a separate line or not (imagine two guys: one on a "VT100" terminal with 80 characters per line and one with the latest 30" plasma display). It would also require the markup to provide additional means of separating parameters if the line feed can no longer be trusted.
It's time to sharpen our Occam's Razor, or life will do it for us.
Space separated arguments are more readable for the casual editor, but normal editors would have a harder time to find out what's the template and what the parameters.
You're thinking in terms of parameters and my point was to discard all of this stuff and think in terms of human writing.
What looks more natural: {{About|Earth|the planet}} or {{About Earth, the planet}}? Since when does our handwriting produce pipes instead of spaces (and not even in place of all of them)?
Also, there are colons as parameters. How would you write as the parameter the article [[Gypsy: A Musical Fable]] or [[Batman: Year One]]? By banning ':' in titles?
Have I said something about colons and links? Links are fine with colons or any other symbols.
But since we're touching on this, pipes in links are not that intuitive either. Pipes are actually not present on many keyboard layouts, but even apart from that it's more natural to use an equality sign. Or a double one, for the purpose of text markup.
In fact, doubling a symbol is a great way of both distinguishing it from accidental input and making it easily recognizable by the human eye. A space can be used for the same purpose (as a delimiter).
Let's be concrete:
1. Links are wrapped in [[ and ]].
2. Links may optionally have titles. A title is separated from the link URL (or local page name) with a space.
3. Links which contain spaces in their address can be given a title after a double equality sign.
4. Finally, in very rare cases when both a space and an equality symbol are necessary, a special markup-wise (!) escape symbol can be used.
Note how we tie together the markup as a whole and its particular tokens:
1. Markup-wise, tokens (links, headings, formatting, etc.) are created using double symbols. Say, [[ and ]] for links, == and == for headings, **bold**, __underline__, etc. This is the cornerstone.
2. Markup-wise, there is a single way of escaping markup. Escaping must be particularly well thought out because it will be the trickiest part even for experienced editors.
Currently wikitext uses the terrible "<nowiki>stuff</nowiki>", but it doesn't always work, and HTMLTidy comes in handy with its &lt; and &gt;. And some places (such as link titles) cannot be escaped at all. Of course, given this approach we are forced to use an inconvenient symbol, the pipe, which is unlikely to occur in normal text and thus needs no escaping in 99% of cases.
But let's turn this from bottom-up to top-down. Think about a good-looking, rarely used symbol that we will use for escaping... I'm sure that with the amount of text Wikipedia has it's easy to conduct research to determine that symbol, but in this example I'll pick the tilde (~); I'm actually using it for that purpose in my home-made markups.
What follows is the complete list of cases:
1. [[Link]]
2. [[Link Title]] = [[Link|Title]] now
3. [[Space link==Title]] = [[Space link|Title]] now (pipe needs a layout change while '==' doesn't)
4. A URL containing '==' in its query part: [[http://weird.url/?k~==v]]. I've put a tilde before '==' to prevent it from being treated as an address/title separator. Since there's no other separator, the link has no caption. How does wikitext handle links with pipes? Are they banned?
5. A local page name containing a tilde: [[~ (Tilde)==Title]]. Nothing breaks here because the tilde is only special when it escapes something... and a space isn't something to be escaped, so the tilde is treated as normal text.
6. Extreme case: a URL containing both a tilde and a double equality sign, which originally (in the browser address bar) looks like http://url?k~==v. In our link we simply triple the tilde, making the first tilde escape the second and the last tilde escape the separator: [[http://url?k~~~==v]].
7. For the sake of completeness, the most extreme case: a URL with ~== for which we've got to specify a title. We don't have to reinvent the wheel: simply put the title after a space and properly escape the URL as in case #6: [[http://url?k~~~==v Title]].
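To check that the seven cases above are mutually consistent, here is a minimal Python sketch of a parser for them. The function name parse_link and the exact precedence (an unescaped '==' wins over a space) are my assumptions for illustration; the message only gives the examples, not an algorithm.

```python
ESC = "~"    # escape symbol picked in the message
SEP = "=="   # explicit address/title separator

def parse_link(markup):
    """Return (address, title) for a [[...]] link; title is None if absent."""
    if not (markup.startswith("[[") and markup.endswith("]]")):
        raise ValueError("not a [[link]]")
    body = markup[2:-2]
    out = []        # decoded pieces of text
    sep_at = None   # index into `out` of the first unescaped '=='
    i = 0
    while i < len(body):
        if body[i] == ESC:
            rest = body[i + 1:]
            if rest.startswith(SEP):      # ~== is a literal '=='
                out.append(SEP)
                i += 1 + len(SEP)
                continue
            if rest.startswith(ESC):      # ~~ is a literal '~'
                out.append(ESC)
                i += 2
                continue
            # A tilde before anything else is ordinary text (case 5).
        if sep_at is None and body.startswith(SEP, i):
            sep_at = len(out)             # assumed: first unescaped '==' splits
            i += len(SEP)
            continue
        out.append(body[i])
        i += 1
    if sep_at is not None:
        address, title = "".join(out[:sep_at]), "".join(out[sep_at:])
    else:
        # No explicit separator: the first space splits address and title.
        address, _, title = "".join(out).partition(" ")
    return address, (title or None)
```

For instance, parse_link("[[Space link==Title]]") gives ("Space link", "Title"), while the triple tilde in case 6 decodes back to the address http://url?k~==v with no title.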
Now you may think: "wow, that's a particular mess, nobody is going to understand and use this syntax". But before that, note that only the first 3 cases are standard; the others are exceptions, and their complexity increases gracefully with the task.
For example, how will wikitext handle links containing pipes? Hard to say, probably [[|]]. <nowiki>? [[<nowiki>l]]. <nowiki>? [[&lt;nowiki&gtl]].
As demonstrated, the above system scales well, so there is room for improving wikitext, even if not with exactly my scheme.
The best thing about this is that once you've memorized the above 2 fundamental rules you can apply them anywhere. Markup can be escaped using a tilde: ~[[no more a link]]. Associative (aka definition) lists can be made uniform:
= Definition Value
= Definition with spaces == Value
= Definition of ~==, now its == Value
And so on. Scalable.
No. We have "Earth, the planet"!
You mean that a template cannot put "the" in front because some planets have their name without "the"? If so, they are not planets (nebulas, satellites, etc.) and a different template must be used. And if it's used the machine can handle language peculiarities.
You will need the proper name in the infobox, such as "Felis silvestris catus", even if the article is just called "Cat".
Yes: {{About Felis silvestris catus, cat}}
What kind of false positives are we talking about? Will any sane individual spend his precious time not editing but preparing to edit this mess?
I think they copy and paste, then fill the fields. Which is a good way of learning as they encounter it.
And the best way is to create/write things from scratch on our own. Where have you learned programming, in the class copy-pasting lines from blackboard to your notebook or in the office actually hitting the keys?
Many things can be hidden under the copy-paste approach; there's even the notion of a "code monkey". I believe if wikitext continues down the path of STL there will be "wiki monkeys", provided that WMF goes commercial (which is the last thing I want to see).
But only syntax that hides nothing and is crisp to its bones can be called fair. Only syntax that doesn't require any "templates" that you just "copy" and "feel"... sorry, "fill in". And of course, there are cases when templates (I mean wikitext {{ and }}) with parameters are of great help and there are cases when parameters with their pipes and whistles are redundant. Just like life.
Well, just by looking at it I have no idea what those temperatures are :)
You mean that "| max_temp_1 = 331 K<ref name=asu_highest_temp/>" gives you more ideas?
What if they were in a different order?
So the machine cannot sort them out and determine which is "max"?
184 and 331 are probably some kind of limits, but what's that 287.2? Some kind of boiling point?
Hm, you might be right on this one and that it's better to have two template parameters: "temperature" and "temperature mean". This is what discussion is for - finding rough edges, right.
The goal of wikitext is to make html editing easy.
HTML editing? I thought wikitext was about text editing. Why not edit HTML using HTML?
HTML only needs a few special characters: <>&;=" but it's bothersome.
And 4 of them are absent on my native layout.
We define that * is a bullet and serves to make lists: It's easier to type, and looks good.
Completely agree, this is a human-readable markup.
We define # as the equivalent for numbered lists. Note that there's no usage of # for numbers in many cultures, so that's less 'visual' there.
And this is to be refactored adding features along the way. Let's write ordered lists using digits!
1. One
2. Two
3. Three
The Japanese have their own kanji for numbers but they use Arabic digits sometimes as well.
Oh, I hear your thoughts - "items are moved/removed and the order is gone". Sure but machine can help us out:
1. One
1. Two
1. Three
We use a little trick here: since ordered items with identical marker value are useless in human texts the markup can use them to represent automatically ordered markers. But even if someday we need two identical markers this can be fixed using some clean syntax, such as:
1. First
1#1 First
Just as with links and [[eq==signs]] above our syntax gradually increases complexity. The person gets crazy so does the markup - but not before the user.
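The trick with identical markers could be sketched in Python roughly like this. It is only an illustration under my own assumptions: every item is a line of the form "N. text" with a digit marker, and the function name number_items is made up. The 1#1 form is not handled.

```python
def number_items(lines):
    """Renumber a digit-marked list if all of its markers are identical."""
    parsed = [line.split(". ", 1) for line in lines]   # [marker, text] pairs
    markers = [int(marker) for marker, _ in parsed]
    # Identical markers carry no information for a human reader, so they are
    # taken as a request for automatic numbering starting at that value.
    if len(markers) > 1 and len(set(markers)) == 1:
        start = markers[0]
        markers = list(range(start, start + len(markers)))
    return [f"{n}. {text}" for n, (_, text) in zip(markers, parsed)]
```

A list written with explicit, differing markers (say 3, 2, 1) is left exactly as the author typed it.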
The added features of this approach are:
1. Lists now support, say, 6 marker types:
   1. (digits)
   01. (zero-padded digits)
   a. (lower-alpha)
   A. (upper-alpha)
   i. (lower-Roman)
   I. (upper-Roman)
2. Lists now can have markers of any value:
   3. Third
   2. Second
   1. First
3. List types can be mixed:
   1. Digit
   i. Roman
   a. Alpha
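Telling the six marker types apart is mostly mechanical, as this hedged Python sketch shows. The function name marker_type is hypothetical, and one rule is purely my assumption: letters that are valid Roman digits (i, v, x, l, c, d, m) are read as Roman, since the message does not say how such ambiguous markers should be resolved.

```python
import re

ROMAN_LETTERS = re.compile(r"^[ivxlcdm]+$", re.IGNORECASE)

def marker_type(marker):
    """Classify a list marker such as '1.', '01.', 'a.', 'A.', 'i.' or 'I.'."""
    body = marker.rstrip(".")
    if body.isdigit():
        padded = len(body) > 1 and body.startswith("0")
        return "zero-padded digits" if padded else "digits"
    if body.isalpha():
        case = "lower" if body.islower() else "upper"
        # Assumption: anything spelled only with Roman digits counts as Roman.
        kind = "Roman" if ROMAN_LETTERS.match(body) else "alpha"
        return f"{case}-{kind}"
    raise ValueError(f"unrecognized marker: {marker!r}")
```

Under that assumption a marker like "c." would be read as Roman 100 rather than the third letter, so a real implementation would probably need context from neighbouring items.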
But each feature requires new symbols, and when you look at those available on every layout, you get *very* limited...
This is far from truth. During last years I have developed for my projects a markup that generally surpasses MediaWiki's and uses just common symbols. I will briefly summarize it in the end of this message.
For example, I could decide to list imagemaps as `Image1´ `Image2´... (grave and acute), but oh, many keyboards don't have both accents.
Obviously, you cannot get a symbol for each and every particular markup case. But you don't have to. Compare how often will you use imagemaps and, say, highlighted PHP code. Right now both are pretty lengthy and angular in wikitext and the former (less used) is even shorter:
<imagemap> ... </imagemap> 10 + 11 symbols.
<source lang="php"> ... </source> 19 + 9 symbols.
Do you see my point?
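The symbol counts above are easy to verify; a few lines of Python are enough. The dictionary and function names are only for illustration.

```python
# Opening and closing tags exactly as quoted in the comparison above.
wrappers = {
    "imagemap": ("<imagemap>", "</imagemap>"),
    "source": ('<source lang="php">', "</source>"),
}

def overhead(name):
    """Return (opening length, closing length) of a wrapper, in characters."""
    opening, closing = wrappers[name]
    return len(opening), len(closing)

for name in wrappers:
    opening, closing = overhead(name)
    print(f"{name}: {opening} + {closing} = {opening + closing} symbols")
```

So the rarely used imagemap costs 21 characters of wrapping while the frequently used PHP highlighting costs 28.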
And obviously, you can't use something that would easily appear in a normal text (or you start defining escape codes which are uglier, too).
Nah, escape codes come from C, don't forget it's not a very friendly language. Instead of escape codes one can use "quoting" (in Pascal style) - I have already touched tilde symbol above.
But I agree that markup that relies on escapes/quoting of any kind is not fair. Escapes by definition are exceptions and cannot overwhelm the common rule.
How do you type the *content* of the references?
As a title: [[*http:// My reference goes here]]. Or, better: we can use the same syntax for footnotes: [[*My footnote]] and references will be removed. So two types of footnotes: inline [[*text with ''markup'']], and block which start with [[*, then line break, then any kind of markup (including line breaks, more footnotes, links, even headings and lists), then line break, closing ]] and another line break. Uniform and consistent yet powerful and flexible.
But when you want to "go further", you start being limited.
I believe we will not be, if we start thinking in terms of simplicity, not features. The former will give us features, as demonstrated above, but if we focus on the latter it will force simplicity out.
Just because that's what the underlying html uses.
This thinking is the problem - it is attached to particular use case. "We will use HTML 5 DOM just because we won't need to transform it when rendering". But what about PDF? XML? FB2? RTF? DOC? ODT?
Machine can handle everything, its time is much less precious than human's. Once written, a framework will perform wikitext <-> HTML transformations in an instant; so goes for some intermediate (completely detached from target, notice this) tree serialization format - even if it's binary (personally I think binary is the only choice here).
Then why care about if we will be rendering Wikipedia for someone's browser or Kindle? Because if we do we will need to invent adaptors and switches for all but the format we have chosen as primary. And things change, even that format may change and the framework will be left with DOM format theoretically based on some old HTML3 with patches here and there. It will no more use "underlying HTML".
The reason being that use of underlining is discouraged.
I agree, but this is again target-thinking. Looking to the future, markup does not necessarily define presentation, even basic presentation like bold and italic. You certainly know that <b> is discouraged in HTML in favor of <strong>. Why? Because it's semantics, not presentation. Similarly, <u> is presentation but __ is semantics. We can define __terms__ like this. Does it look good? Can you define <u>terms</u> like that? No, you will need a new entity, and of course soon there will be no symbols left on any keyboard, even Japanese!
Occam's razor.
*And that also turned out to have issues, ever tried to write wikitext in piedmontese?
I have already said that a research of modern keyboard layouts with their approximate user count is necessary if someone is going to define an ideal keyboard layout. I am sure that even "=" can be absent in some layouts (Japanese again). But there are general symbols thanks to IBM.
#REDIRECT and __TOC__ are a sad effect of separate building of contents.
Half of current wikitext is a sad effect. Most of the C++ standard is a sad effect. Come on, does this prevent us from producing something positive?
This is what next generation of a software is meant to bring - core reworked to the last screw based on previous use experience.
You can't wrap #REDIRECT in a template, though, because the redirect applies to the template itself (unless you use some odd escaping?)
Once again target-thinking. Why limit {{...}} to templates? I have mentioned this in my message:
2012/2/6 Pavel Tkachenko proger.xp@gmail.com:
can be uniformized in a way similar to template insertions: {{redir New page}}, ... Templates can be called as {{tpl template arg arg arg}}
Note that templates are subset of {{construct}} features. "Redir" and "TOC" are actually not templates but "extensions". Or "actions", name doesn't change the meaning. The point is to have a uniform syntax for custom constructs, in other words extensions. It's obvious that no matter how well-thought a standard is there will always be missing features once it hits the reality. It must be prepared for this and this construct is one of the ways.
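To make the idea of a uniform {{construct}} less abstract, here is a small Python sketch of a dispatcher. Everything beyond the two examples quoted above is my assumption: the handler table, the returned dictionaries, and the rule that template arguments are split on spaces are hypothetical and only there to show the shape.

```python
def parse_construct(markup):
    """Split '{{name rest of text}}' into (name, rest)."""
    if not (markup.startswith("{{") and markup.endswith("}}")):
        raise ValueError("not a {{construct}}")
    name, _, rest = markup[2:-2].strip().partition(" ")
    return name, rest

# Hypothetical registry: each action name maps to a handler for its argument text.
HANDLERS = {
    "redir": lambda rest: {"action": "redirect", "target": rest},
    "tpl": lambda rest: {
        "action": "template",
        "name": rest.split()[0],
        "args": rest.split()[1:],   # assumed: arguments are space separated
    },
}

def run_construct(markup):
    """Look up the action named by the first word and hand it the rest."""
    name, rest = parse_construct(markup)
    if name not in HANDLERS:
        raise KeyError(f"unknown action: {name}")
    return HANDLERS[name](rest)
```

The point of such a table is that adding a new action (a TOC, say) means registering one more handler, not inventing one more piece of syntax.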
Now, to be concrete, what I see as a better syntax for text markup:
- formatting: **bold** //italic// __underline__ (or other semantic meaning) --strikethru-- ++small++ (semantics) ^^superscript^^
- styling text - replacement for <span class="cls">: !!(cls)text!!
- code: %%unformatted%%
- highlighting: %%(php)unformatted%%
- lists (ordered, unordered, definition) - already covered above
- blocks in different language (ISO 639): @@ru text@@
- footnotes: [[*footnote text]]
- quotes: >inline ( >>older etc. ) and <[block]>
- terms - replacement for <span title="desc">: (?term desc?) or (?space term==desc?)
- headings - just like current wikitext, ==heading== and so on
- misc markup: ??comment?? (invisible in resulting document), HTML &&entities; (double ampersand)
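Because every inline formatting token in the list above is a doubled symbol, a renderer for them can be almost table-driven. The Python sketch below covers only the formatting line; the HTML tags chosen for each token are my assumption (the message deliberately leaves the semantics of __ and ++ open), and it ignores tilde escaping, nesting rules, and the '//' inside URLs, all of which a real parser would have to deal with.

```python
import html
import re

# Doubled-symbol token -> HTML tag (the tag choice is an assumption).
TOKENS = {
    "**": "strong",
    "//": "em",
    "__": "u",
    "--": "s",
    "++": "small",
    "^^": "sup",
}

def render_inline(text):
    """Convert doubled-symbol formatting to HTML, escaping everything else."""
    out = html.escape(text, quote=False)
    for mark, tag in TOKENS.items():
        m = re.escape(mark)
        # Non-greedy match so that two separate spans on one line stay separate.
        out = re.sub(f"{m}(.+?){m}", rf"<{tag}>\1</{tag}>", out)
    return out
```

One rule (find the doubled symbol, find its twin) handles all six tokens, which is the uniformity the list is arguing for.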
The actual markup is almost twice as large as the above but you already know it: most tokens have block form as opposed to the inline (above).
- styling blocks:
  !!(cls)
  content
  !!
- code highlighting:
  %%(php)
  echo 'Hello, world!';
  %%
- language blocks:
  @@ja
  some Japanese text
  @@
- footnotes:
  [[*
  Footnote. [[Link]] More text.
  ]]
- comments:
  ??
  author's comment. can be shown in "draft output mode".
  ??
As demonstrated, it uses no HTML, BB-codes or any other tag-driven markup. Symbols used are (ordered by rough use count):
* / % ( ) ! = - ? & > [ ] _ + ^
The first 9 symbols are quite common, so are _ and +. I have put [ ] in the end because if alternative ((link)) syntax is allowed then those two are only used by <[blockquote]>.
If wikitext syntax improvement is to be considered by the community I am ready to give it more details. The above listing misses several important points which require more explanations (in particular about %%code and {{action}} calls).
Signed, P. Tkachenko
Wikitext-l mailing list Wikitext-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitext-l