(Continuing the crunching. Huh? But this message is only 4 pages long.)
Forked into a new thread.
I'm afraid the latest mailing list messages were considered TL;DR by most of the subscribers, so I will put this at the beginning: is there any point in continuing our discussion on the subject? Platonides is a constructive interlocutor, but he seems to be the only one participating.
Is the community truly interested in reworking the markup?
I have some knowledge and code assets that I will be happy to contribute; I will gladly take part in discussions or help improve the situation in some other way. But if the Wikimedia team has different views on the markup's evolution, it's fruitless to spend so much time talking behind closed doors.
My reply follows.
On 08.02.2012 2:27, Platonides wrote:
Nobody proposed to change the template in that way? :)
You mean that nobody has actually studied markup usability?
If you start creating inline, block and mixed template modes, I suspect the syntax will end up being chaotic (I'm thinking of concrete cases in MW syntax).
True, that's why I propose only two modes: block and inline, both with clear distinctions and features.
That assumes that there's a non-ambiguous way to express that in natural language (plus that it is easily parseable by a machine).
Yes - with a few simple rules added, an unambiguous language can be created. I'm sure most business e-mails and official documents could be processed by a machine without much effort, and we're talking about an even more formalized language here - text markup.
So, how do you split {{About Bijection, injection and surjection}} ?
If that is supposed to be a long caption (4 words and a comma), then just use quotes - as in natural handwriting: {{About "Bijection, injection and surjection"}}
The point of using an additional character not used in normal language is precisely for working with the metalanguage.
I disagree, it only means that this subject has not yet been researched enough.
Also, there are colons in parameters. How would you pass as a parameter the article [[Gypsy: A Musical Fable]] or [[Batman: Year One]]? By banning ':' from titles?
Have I said something about colons and links? Links are fine with colons or any other symbols.
You mentioned colons for template arguments; I'm acting as the devil's advocate, asking you how to provide those titles as parameters to a template.
Uh, I mistyped "colon" when I meant "comma". Let me correct this:
1. {{About Something}}
2. {{About Something, of kind}}
3. {{About "Something, something and something", of kind}}
4. {{About "Something, something and something", "of kind, kind and kind"}}
As you can see, no character is banned from the title, while in the current pipe-centric approach I don't think it's possible to have pipes there without a headache.
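To make this concrete, here is a minimal PHP sketch of the splitting rule - my illustration only, not code from any existing parser: a comma separates arguments only when it stands outside quotes.

<?php
// Split 'About "Bijection, injection and surjection", of kind' into
// arguments; commas inside quoted captions are literal.
function splitTemplateArgs($body) {
    $args = array();
    $current = '';
    $inQuotes = false;
    for ($i = 0, $len = strlen($body); $i < $len; $i++) {
        $ch = $body[$i];
        if ($ch === '"') {
            $inQuotes = !$inQuotes; // quotes toggle "literal" mode
        } elseif ($ch === ',' && !$inQuotes) {
            $args[] = trim($current); // a comma outside quotes ends an argument
            $current = '';
            continue;
        }
        $current .= $ch;
    }
    $args[] = trim($current);
    return $args;
}
// splitTemplateArgs('About "Bijection, injection and surjection", of kind')
// returns: array('About "Bijection, injection and surjection"', 'of kind')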
But since we're touching on this: pipes in links are not that intuitive either. Pipes are actually absent from many keyboard layouts, but even apart from that, it's more natural to use an equality sign - or a double one, for the purposes of text markup.
It's consistent with the use of pipes in templates (which do use the equals sign in that way, to name parameters). Although the link syntax probably came earlier.
Right, and pipes should not appear in templates either. It's too special a symbol.
So is [[Batman Forever]] your syntax for [[Batman Forever|Batman Forever]] or [[Batman|Forever]]? So many cases are bad; KISS.
I do not see your point. The processing is straightforward:
1. The link contains == - it separates the address from the title.
2. The link contains no == but contains a space - the first space separates the address from the title.
3. There is neither == nor ' ' - the link is titleless.
This means that:
* local links get their titles from the page name, not the page address (this is important, and improves on the current MediaWiki implementation)
* remote links could also get their title from <title> after fetching the first 4 KiB of that page, or something like that
Use cases:
* "[[http://google/search?q=%61%62%63 Google it]]" - for external links the == delimiter won't be used at all
* "See this [[page]]" - same as current wikitext
* "See [[page that page]]" vs. the current [[page|that page]]. Looks cleaner and is easier to type (the space bar is present on all keyboards and is quite large). This covers no less than half of all local links.
* "See [[Some page==this page]]" vs. the current [[Some page|this page]]. This case differs less drastically than the previous three, but a pipe is still both special to English layouts and less noticeable to the human eye than a double equality sign.
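Here is the same logic as a minimal PHP sketch - an illustration under my reading of the rules above; real code would also special-case external links, where == is not a delimiter at all:

<?php
// Rules: '==' splits address from title; otherwise the first space does;
// otherwise the link is titleless and takes its title from the page name.
function parseLink($inner) {
    $pos = strpos($inner, '==');
    if ($pos !== false) {
        return array(substr($inner, 0, $pos), substr($inner, $pos + 2));
    }
    $pos = strpos($inner, ' ');
    if ($pos !== false) {
        return array(substr($inner, 0, $pos), substr($inner, $pos + 1));
    }
    return array($inner, null); // titleless
}
// parseLink('Some page==this page') -> array('Some page', 'this page')
// parseLink('page that page')       -> array('page', 'that page')
// parseLink('page')                 -> array('page', null)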
Does "KISS" mean that every use case should be created with uniform but because of this equally inconvenient syntax? I agree that more complex cases should have correspondingly more complex syntax but this scaling must be adequate. By placing pipe everywhere not only cross-language usability is reduced but the fact that it's redundant in some cases (#1 and #3 items above) is ignored.
- Finally, in the very rare cases when both a space and an equality symbol are necessary, a special markup-level (!) escape symbol can be used. As an example: [[2 + 2 = 5]]
Your example contains no double equality symbol and is treated as a space-separated title: [[2| + 2 = 5]] in current wikitext.
Would you remove === headings?
No - headings remain consistent, because the first heading level starts with a double equality sign.
Currently wikitext uses the terrible "<nowiki>stuff</nowiki>", but it doesn't always work, and HTMLTidy comes in handy with its &lt; and &gt;. And some places (such as link titles) cannot be escaped at all.
Really? I think you can.
Give some examples and we will examine their adequacy.
Your proposal of forcing people to edit the URLs is very bad. You can't just paste; you need to go through it changing every = (which is a frequent character) to ~==.
No, no, no, you have got a completely wrong idea. You don't have to escape a SINGLE = because it is not special. You only need to escape a double ==. How many double == sequences have you seen in links? I have seen them used on my local bookstore's site, but that's surely an exception.
Pipes are banned from titles.
Great, let's make the machine's life easier.
I'm not sure this is a good analogy. Copy-pasting chunks of code looks like copying phrases from other articles to make your own. That should be original. OTOH, reusing the existing LaTeX template is much more appropriate than writing your own from scratch trying to copy the style of the provided one.
For such things, templates must be created that reduce to a minimum the number of entities identical across all of their use cases. In MediaWiki this is done using {{templates and=parameters}}, and this is good. If you were talking about copy-pasting these templates, their parameters and empty values - this is fine. But if it was about copy-pasting the same code with all the rendering tricks (&nbsp;, {{iejrhgy}} and other cryptic things) - this is bad.
Even if I write a program from scratch, I should make it consistent with other tools. That means an appropriate set of arguments would be sort -r --ignore-case --sort=month ./myfile instead of sort<- !case (sort as month) \\./myfile\\
Standardizing is fine unless it starts looking unnatural. The following example might be argued with, but I can't think of another one quickly: tar -czf file.tar.gz .
While this uses standard CLI syntax and is true to *nix ideology, this is what (among other things) separates POSIX from Windows. For instance, I could write: tar file.tar.gz .
...and the program would detect the -czf arguments on its own:
* -f is simply implied, because there are 2 unnamed arguments (without a leading -X)
* -c - the target file doesn't exist
* -z - the target file has the extension .gz
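As a hypothetical PHP sketch of that inference (nothing like this exists in the real tar, of course - the three rules are exactly the ones above):

<?php
// Given the bare arguments of "tar file.tar.gz .", infer -f, -c and -z.
function inferTarFlags(array $args) {
    $bare = array();
    foreach ($args as $a) {
        if ($a !== '' && $a[0] !== '-') {
            $bare[] = $a; // unnamed argument: no leading -X
        }
    }
    $flags = array();
    if (count($bare) === 2) {
        $flags[] = '-f'; // two unnamed arguments imply -f; the first is the archive
        $archive = $bare[0];
        if (!file_exists($archive)) {
            $flags[] = '-c'; // the target file doesn't exist: create it
        }
        if (substr($archive, -3) === '.gz') {
            $flags[] = '-z'; // the .gz extension implies gzip compression
        }
    }
    return $flags;
}
// inferTarFlags(array('file.tar.gz', '.')) -> array('-f', '-c', '-z')
// (assuming file.tar.gz does not exist yet)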
It's the same with templates or other markup: while {{About page=Earth kind=planet}} or something similar is fine, {{About Earth, planet}} or some other form is more appropriate in this particular use case.
You are attributing a lot to the machine. Personally, I would spit out an error, just in case they were e.g. in different units.
Yes, this is one of the ways and I would opt for it if we want to have a strict syntax.
But you are making up your own syntax, and then requiring the system to adapt to you.
Can you elaborate more on this point?
The goal of wikitext is to make html editing easy.
HTML editing? I thought wikitext was about text editing. Why not edit HTML using HTML?
Because it's considered cumbersome. (Actually, it's presentational editing, but as the presentation is obtained by using HTML as an intermediate language...)
Indeed, HTML is cumbersome, that's why wikitext and all other text markups have been invented. But they don't have to copy HTML syntax - just the opposite.
And you have complicated the originally clean syntax of 1, 2, 3
Clean syntax for whom? For Englishmen? And are hashes actually clean? If so, why don't we use them in our e-mail messages?
Would html links become italic? (that was a problem of wikicreole, it was defined as 'italic unless in links')
Not at all, because we are talking about a context-specific grammar. Addresses in links can hold no formatting, and thus everything but the context-ending tokens (]], space and ==) is ignored there.
And yes, a context-specific grammar is more than regular expressions can handle. Regexps are good, but this doesn't mean that anything incompatible with sed is automatically "too complex".
As already mentioned, I use my own markup processor, written in PHP, on my projects, and it implements all the markup already described, including the [[http://italic]] (context-specific grammar) case. And its parsing loop is under 350 lines of code.
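To illustrate the idea (a toy, not the actual loop, and it assumes // toggles italics, as in WikiCreole):

<?php
// Inside a link, '//' is not special: only the context-ending tokens matter.
function scan($text) {
    $out = array();
    $inLink = false;
    for ($i = 0, $len = strlen($text); $i < $len; ) {
        $pair = substr($text, $i, 2);
        if (!$inLink && $pair === '[[') {
            $out[] = 'link-start'; $inLink = true; $i += 2;
        } elseif ($inLink && $pair === ']]') {
            $out[] = 'link-end'; $inLink = false; $i += 2;
        } elseif (!$inLink && $pair === '//') {
            $out[] = 'italic-toggle'; $i += 2; // special only outside links
        } else {
            $out[] = $text[$i]; $i += 1; // everything else is a literal character
        }
    }
    return $out;
}
// scan('[[http://italic]]') never emits 'italic-toggle':
// the '//' inside the link address stays literal.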
Well, I have to say it seems well thought out - it "doesn't look bad".
Thank you. I have given it a lot of thought and practice, but I'm sure there are still things to improve. I would be ecstatic if my experience could help the world's largest free knowledge community.
Thanks again for your mail, Platonides.
Signed, P. Tkachenko
(...)
Is the community truly interested in reworking the markup?
The discussion is certainly valid.
Honestly, if I'm allowed to speak out my crazy optimistic utopian dream, then: <crazy-optimistic-utopian-dream>I want the current-style wiki markup to disappear completely. I'm referring to *, '''''', {{}}, [[]] etc. It was very beneficial in the beginning, because it was for the most part more intuitive to type than <ul><li></li></ul>, <strong></strong> and <a href=""></a>; but for people who want ease, the Visual Editor is supposed to provide it, and after that most of them should never look back at the markup.
For people who will want text-based markup, it should be mostly XHTML. So <section>, <poem>, <source> and <nowiki> are kinda XHTML, so they can stay. *, '''''' and [[]] are not XHTML, though, and they can and should be replaced by XHTML. And {{}} needs its own markup, but it should be XHTML-like: <template name="citation needed" />.
So there. My idea of a bright wikifuture is fewer home-grown parsers and more standards. It's easier for the developers and works organically with the browsers. It's not necessarily easier for people who want to write articles in plain text with markup, but hey, they asked for it.</crazy-optimistic-utopian-dream>
-- Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי http://aharoni.wordpress.com “We're living in pieces, I want to live in peace.” – T. Moore
2012/2/8 Pavel Tkachenko proger.xp@gmail.com:
(...)
Amir,
Your idea doesn't sound that utopian or crazy to me, but, IMO, it has its weak points.
First, it's a superstition that XML is the only standard way of representing information. The fact that languages like YAML still appear, even after XML's heavy lobbying by the-company-we-all-know-about, means that not all people are happy with XML. Similarly, textile/markdown/bb-codes/wikitext and a dozen others, including LaTeX, *nix man pages, etc., keep appearing even though HTML has been around for decades.
What is a standard? It is a set of rules. Strict ABNF schemes. UML, if you please. Can you call Windows INI files "standard"? Yes, albeit they have just a few entities. And YAML? TeX? Yes. And PDF? EPS? Yes, and those are not even readable by humans.
Similarly, wiki markup can be standardized. Creole is meant to be a standard, but it's limited; still, the direction is right and can be voted for. I am ready to personally standardize and unify wiki markup, if only to prove my point.
Second, by dividing people into those who "can write texts using a visual editor" and those who "have to write texts using a storage format" you're making the same discrimination towards "geeks" that "geeks" are currently making towards "common folk" by providing nothing but a text field for writing articles.
Let's put it plainly: XML and, mostly, (X)HTML (SGML as a whole) are storage formats. This is why they have namespaces, DTDs and other such features. But they are generic, and while this is an advantage (even binary data can be stored there in some form), when they come into contact with humans, things break or just don't move.
This is because XML and friends are not problem-oriented solutions. While I have to agree that editing texts might be easier for some people using a rich editor, I cannot agree that editing them in plain text form must be limited to storage formats. Have you tried hex-editing an article, having to perform codepage conversions (read: layout changes) in your mind at the same time? This is the same.
Going further into this starts to look like arguing about personal taste in colors and forms, so I will just sum it up: let's leave everyone with their own tool. We have three groups of "users": machines, which process the text - they're fine with XML, BAML and the like; users, who need a visual editor to "parse markup", as was said in the neighboring thread; and someone in between - "geeks" - who are human enough to dislike XML and technical enough to despise WYSIWYG.
This seems fair, and it is not that big a deal to implement, because the first and last "markups" must exist by definition for a working parser (something to store trees in and something to input them with), and the middle one (the visual editor) will come naturally given the other two.
Signed, P. Tkachenko
2012/2/8 Amir E. Aharoni amir.aharoni@mail.huji.ac.il:
(...)
Hello,
I think this debate might be happening on the wrong level. If wikitext is going to be replaced, the new language should be designed on an abstract level first. If I understand some messages on this list correctly, the current parser work may provide a DOM scheme which will be able to represent just about all wiki pages. This is necessary for the Visual Editor. If this scheme is consistent and feature-complete, then we can say that we have _the_ "data model" for wikis. If you have a good DOM data model with enough documentation, a single Summer of Code student can write you a parser for the syntax of your desire. And I suspect (I need to think a bit more about it) that if you come up with a DOM solution for representing wikitext, then that can also serve as a context-free grammar.
So the real question is whether a new-gen wiki syntax will be compatible with a consensual data model we might have in the future.
Best, Mihály
On 8 February 2012 13:31, Pavel Tkachenko proger.xp@gmail.com wrote:
(...)
Hello. I'm new here and I wish to help out as well. I don't have the full picture of what is being discussed, seeing that the e-mail thread I tried to read is too long. From what I understand, the wiki markup editor is being replaced with a graphic editor with a completely new markup processor. My opinion is that this is a bad idea, because there are millions of experienced users who are used to working with wiki markup codes. My suggestion is to have the graphic editor available side by side with the markup editor, so people can choose at will which editor they would like to edit with. Default preferences can be set. The graphic editor would translate the page into wiki markup and save it. This method will be more efficient for the programmers, saving time and resources. Any thoughts?
Gesendet von Maximilian's iPhone. (Sent from Maximilian's iPhone.)
Am Feb 8, 2012 um 9:20 schrieb Mihály Héder hedermisi@gmail.com:
(...)
On 02/08/2012 09:33 AM, Maximilian Doerr wrote:
(...)
Hi, Maximilian! Thanks for your interest in our plans.
https://www.mediawiki.org/wiki/Future is the best place to read what we have planned. The very short answer is -- yes! You have described basically what we are planning to do. :-)
You might especially be interested in
https://www.mediawiki.org/wiki/Visual_editor/Software_design
and
https://www.mediawiki.org/wiki/Visual_editor/Phase_2
Gabriel Wicke and Trevor Parscal can help you find ways to help out -- they read this list, and Gabriel is often on IRC during the German daytime as gwicke in the #mediawiki channel on Freenode.
Thanks!
On Wed, Feb 8, 2012 at 3:33 PM, Maximilian Doerr cybernet678@yahoo.com wrote:
(...)
That's exactly what we're already doing. Trevor said [1] that "we have no plan or intention to (...) disable markup (plain text) editing".
Roan
[1] http://lists.wikimedia.org/pipermail/wikitext-l/2012-February/000543.html
I'm a "geek" and do not "dislike" or "despise" XML/[X]HTML or WYSIWYG or wikimarkup. They all have their uses for different users and even the same user in different situations for different purposes. And I would rather chew my own feet off than have to type '<template name="citation needed" />'. Meanwhile, a WYSIWYG-only editing environment leads to problems (most often overlapping markup, and markup that is not quite constrained to where it should be; anyone who has used something like Microsoft Word more than a handful of times knows exactly what I mean, because you've time and again italicized or boldfaced something and found it difficult to get this effect to stop on subsequent text, so often that it's usually easier to write the text with no markup, then go back in and specifically add it to the bits that need emphasis.
Or to put it in wikimarkup terms, what you want is: Tolkien's masterpiece, ''The Lord of the Rings'', was written between... but what WYSIWYG editors often give you is: Tolkien's masterpiece, ''The Lord of the Rings, ''was written between... because it's sometimes virtually impossible to tell when WYSIWYG italics are including more than they should.
Just one example. It's a major issue for metadata and machine readability, especially when templates are applying classes: Tolkien's masterpiece, {{titletext|The Lord of the Rings, }}was written between... applying a 'class="work-title"' would produce factually incorrect output (i.e., in one context at least, outright *corrupt data*) saying that a comma and a space were part of the title of the work.
It's important that we be able to access and edit the source without it looking like almost-human-unreadable TeX or RTF, and that geeky editors be able to edit all the time in source mode, without ever seeing WYSIWYG, if that's their preference. (It would be mine; I still hand-code a lot when building websites too, even if I let whatever the platform is (WordPress, whatever) generate a lot of the basic structure automatically.) It's crucial that I be able to tweak stuff at the character-by-character level and alter the markup around that content in any way I need to.
But for actual article drafting, in prose sentences and paragraphs, as opposed to tweaking, I vastly prefer WYSIWYG. I seriously doubt I'm alone in any of this, even in the combination of preferences I've outlined.
-- Stanton McCandlish McCandlish Consulting 9505 Tanoan Dr NE Albuquerque NM 87111-5836
505 715-7650
I'm all for a modern WYSIWYG editor; however, it would still require an underlying syntax.
I disagree that XHTML is a geek-only storage format, or that the current wiki syntax has a lower learning curve. Hacking templates to overcome parser bugs is one of the worst experiences I've had as an editor.
I think that an XML subset is the ideal underlying format. It's the best-known technology and has mature development tools. It could be parsed and written most efficiently by the browser, and even the editor could be simplified by using it.
A well-designed format would be easily transformed to and from other formats (XSLT for the "to others" direction, a DOM parser for the "from others" one). This could provide interoperability with other wikis' formats, and a friendlier variant of the existing wiki markup.
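For the "to others" direction, PHP's stock XSL extension would already suffice. A minimal sketch, where wiki.xml and to-html.xsl are hypothetical files:

<?php
// Transform a stored XML page to HTML with a stylesheet ("XSLT to others").
$xml = new DOMDocument();
$xml->load('wiki.xml');       // hypothetical stored page
$xsl = new DOMDocument();
$xsl->load('to-html.xsl');    // hypothetical stylesheet
$proc = new XSLTProcessor();
$proc->importStylesheet($xsl);
echo $proc->transformToXML($xml);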
A well-designed format should:
* be easy to parse (read: unambiguous, requiring no context or semantics to parse)
* make auto-completion possible
* permit graceful error recovery, without bothering the editor unless required
* pinpoint syntax errors and advise on corrections
* be fully learnable in a couple of hours...
If we put our heads together and come up with something like that, we will make some real progress. I think a time-out is needed, because the future - https://www.mediawiki.org/wiki/Future - is unclear, and developing the new editor without design documents is just a way to perpetuate the problems of the current syntax.
Operation Manager E-mail: oren@romai-horizon.com Mobil: +36 30 866 6706
Római Horizon Kft. H-1039 Budapest Királyok útja 291. D. ép. fszt. 2. Tel: +36 1 492 1492 Fax: +36 1 266 5529
-----Original Message----- From: wikitext-l-bounces@lists.wikimedia.org [mailto:wikitext-l-bounces@lists.wikimedia.org] On Behalf Of Pavel Tkachenko Sent: Wednesday, February 08, 2012 1:32 PM To: Wikitext-l Subject: Re: [Wikitext-l] Markup syntax
(...)
Hello,
I cannot speak for the MW parser dev team, but I don't think your suggestion is workable. There is no shortage of well-designed, even wikitext-like, languages that one could just pick and build a good editor for. And I'm sure we could come up with an even better one after the rational, collaborative work you suggest.
But there are millions of pages already written in legacy wikitext, and those must be editable with the new editor. So right now, instead of the rational approach, an empirical one has to be taken - they have to ''find'' rather than invent a good-enough model for those old articles, and also store everything in the old format. Not an enviable task; I feel I have to keep my fingers crossed for them all the time.
Best Mihály
On 8 February 2012 16:06, Oren Bochman orenbochman@gmail.com wrote:
(...)
On 02/08/2012 04:06 PM, Oren Bochman wrote:
(...)
I think that an XML subset is the ideal underlying format. It's the best-known technology and has mature development tools. It could be parsed and written most efficiently by the browser, and even the editor could be simplified by using it.
This is actually what we are doing, so I take that as an encouragement ;)
We are building an HTML5 DOM, and plan to represent higher-level functionality as microdata. We discussed this last week on this list in the following thread:
http://thread.gmane.org/gmane.science.linguistics.wikipedia.wikitext/512
More information about the parser and DOM can be found at https://www.mediawiki.org/wiki/Future/Parser_development.
Also note that the HTML5 DOM can be serialized to XML. Links to relevant docs can be found in the notes section of https://www.mediawiki.org/wiki/Future/HTML5_DOM_with_microdata.
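As a rough sketch of the idea in PHP - the type and attribute names below are made up for illustration; the documents above describe the real representation:

<?php
// Build a placeholder node for a template call as HTML5 + microdata.
$doc = new DOMDocument('1.0', 'UTF-8');
$span = $doc->createElement('span');
$span->setAttribute('itemscope', 'itemscope');
$span->setAttribute('itemtype', 'mw:template');       // invented type name
$span->setAttribute('data-name', 'citation needed');  // invented attribute
$doc->appendChild($span);
echo $doc->saveXML($span);
// <span itemscope="itemscope" itemtype="mw:template" data-name="citation needed"/>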
Cheers,
Gabriel
On 8 February 2012 15:06, Oren Bochman orenbochman@gmail.com wrote:
I think that an XML subset is the ideal underlying format. It's the best-known technology and has mature development tools. It could be parsed and written most efficiently by the browser, and even the editor could be simplified by using it.
This is likely. But the magical key to the issue is: it doesn't actually matter.
The important thing about the eventual intermediate format is that it will be *properly defined* and can be manipulated. It will be things that are processed in a particular way.
This format could be stored in whatever representation is convenient - XML, bytecode, cat GIFs, a version of the existing wikitext format. With a properly defined syntax, we can arbitrarily transform any version into any other version as needed.
Hopefully this is possible ...
- d.
On 08/02/12 11:20, Pavel Tkachenko wrote:
Is the community truly interested in reworking the markup?
I doubt it.
On 08.02.2012 2:27, Platonides wrote:
Nobody proposed to change the template in that way? :)
You mean that nobody has actually studied markup usability?
I just mean that no one stood up and proposed "Hey, this would look better this different way". Anyone can create new templates, with any name and parameters he wishes. So despite their good intentions, they may not be the best way to do it. And templates are rarely refactored.
(...)
Also, there are colons as parameters. How would you write as the parameter the article [[Gypsy: A Musical Fable]] or [[Batman: Year One]] ? By banning ':' in titles?
Have I said something about colons and links? Links are fine with colons or any other symbols.
You mentioned colons for template arguments; I'm acting as the devil's advocate, asking you how to provide those titles as parameters to a template.
Uh, I have mistyped "comma" instead of "colon". Let me correct this:
- {{About Something}}
- {{About Something, of kind}}
- {{About "Something, something and something", of kind}}
- {{About "Something, something and something", "of kind, kind and kind"}}
As you can see, no character is banned from the title (...)
What about the separator? Eg. [[The character "]]
But if we're touching this pipes in links are not that intuitive either. Pipes are actually not present on many keyboard layouts but even apart from that it's more natural to use an equality sign. Or double, for the purpose of text markup.
It's consistent with the use of pipes in templates (which do use equal in that way to name parameters). Although link syntax was probably earlier.
Right, and pipes should not appear in templates either. It's too special a symbol.
Why so? So far the only reason you gave is that it's not on all keyboard layouts.
So is [[Batman Forever]] your syntax for [[Batman Forever|Batman Forever]] or [[Batman|Forever]] ? So many cases are bad, KISS.
I do not see your point. The processing is straightforward (a code sketch of these rules follows below):
1. The link contains == - it separates the address from the title.
2. The link contains no == but contains a space - the first space separates the address from the title.
3. There is neither == nor ' ' - the link is titleless.
This means that:
- local links get titles from the page name, not the page address (this is important and differs from the current MediaWiki implementation in a better way)
- remote links can also get their title from <title> after fetching the first 4 KiB of that page or something
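A minimal sketch of these three rules in Python (illustrative only - this is not my actual processor, just the decision order):

    def split_link(body):
        """Split the inner text of [[...]] into (address, title).

        Rule 1: '==' separates address from title.
        Rule 2: otherwise the first space does.
        Rule 3: otherwise the link is titleless (title is None).
        """
        if '==' in body:
            address, title = body.split('==', 1)
        elif ' ' in body:
            address, title = body.split(' ', 1)
        else:
            address, title = body, None
        return address, title

    # split_link('page')                 -> ('page', None)
    # split_link('page that page')       -> ('page', 'that page')
    # split_link('Some page==this page') -> ('Some page', 'this page')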
No way. That can be good for preloading a title on link insertion and storing it indefinitely, but not for doing it every time.
Use cases:
- "[[http://google/search?q=%61%62%63 Google it]]" - for external links the == delimiter won't be used at all
- "See this [[page]]" - current wikitext is the same
- "See [[page that page]]" vs. current [[page|that page]]. Looks cleaner and is easier to type (space is present on all keyboards and is quite large). This covers no less than half of local links.
Only if pages with no spaces are more common than pages with spaces in the name. Taking enwiki articles as a sample:
- 7746101 articles with space.
- 1416235 articles without space.
So more often than not that wouldn't be useful. However, you could take advantage of the space-is-underscore, and use [[Some_page this page]] (but still not 'clean')
- "See [[Some page==this page]]" vs. current [[Some page|this page]].
This case has less drastic differences than previous 3 but a pipe is still both special to English layouts and less noticeable to human eye than double equality sign.
Does "KISS" mean that every use case should be created with uniform but because of this equally inconvenient syntax? I agree that more complex cases should have correspondingly more complex syntax but this scaling must be adequate. By placing pipe everywhere not only cross-language usability is reduced but the fact that it's redundant in some cases (#1 and #3 items above) is ignored.
- Finally, in the very rare cases where both a space and an equals symbol are necessary, a special markup-wise (!) escape symbol can be used. As an example: [[2 + 2 = 5]]
Your example contains no double equality symbol and is treated as a space-separated title: [[2| + 2 = 5]] in current wikitext.
Would you remove === headings?
No, headings are consistent because the first heading starts with a double equals sign.
Nitpicking, first heading has just one equal sign [at each side] :)
I'm not sure this is a good analogy. Copy-pasting chunks of code looks like copying phrases from other articles to make your own. That should be original. OTOH, reusing an existing LaTeX template is much more appropriate than writing your own from scratch trying to copy the style of the provided one.
For such things, templates must be created that reduce to a minimum the number of entities identical across all of their use cases. In MediaWiki this is done using {{templates and=parameters}} and this is good. If you were talking about copy-pasting these templates, their parameters and empty values - this is fine. But if it was about copy-pasting the same code with all the rendering tricks ( , {{iejrhgy}} and other cryptic things) - this is bad.
Yes, I meant copy-pasting templates, and then the users changing the parameters to more suitable values. You can't expect people to just start writing and think "I'll write {{Infobox Planet}}" :). Copying the article headings can also be useful.
Even if I write a program from scratch, I should make it consistent with other tools. That means appropriate arguments would be sort -r --ignore-case --sort=month ./myfile instead of sort<- !case (sort as month) \\./myfile\\
Standardizing is fine unless it starts looking unnatural. The following example might be argued but I can't think of another one quickly: tar -czf file.tar.gz .
Not a bad example, as that's one of those utilities with odd parameters: "There are three styles for writing operations and options to the command line invoking `tar'. The different styles were developed at different times during the history of `tar'..." (extract of the GNU tar info page)
While this uses standard CLI syntax and is true to *nix ideology, this is what (among other things) separates POSIX from Windows. For instance, I could write: tar file.tar.gz .
...and the program would detect the -czf arguments on its own:
- -f is simply implied because there are 2 unnamed arguments (without a leading -X)
- -c because the target file doesn't exist
- -z because the target file has the extension .gz
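Purely to illustrate the kind of inference I mean (a toy sketch, not a proposal for tar's actual code):

    import os

    def infer_tar_flags(args):
        """Toy inference of -czf for 'tar file.tar.gz .' (two positional args)."""
        flags = set()
        if len(args) == 2:                 # two unnamed arguments imply -f
            flags.add('f')
        archive = args[0]
        if not os.path.exists(archive):    # target doesn't exist yet -> create
            flags.add('c')
        if archive.endswith('.gz'):        # .gz extension -> gzip
            flags.add('z')
        return flags

    # infer_tar_flags(['file.tar.gz', '.']) -> {'c', 'z', 'f'}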
That's a source of problems. It's fine having dumb programs that you need to walk through. When the programs are smart, if they don't go all the way up to that level, that's an issue. http://arstechnica.com/apple/reviews/2011/07/mac-os-x-10-7.ars/5#address-boo...
In this case "tar file.tar.gz /mydata" sometimes could create a backup called file.tar.gz, and sometimes could put file.tar.gz and /mydata inside a newtar.out file. (I'm assy¡uming it's refusing to implicitely send it to stdout when it's a tty).
It's the same with templates or other markup: while {{About page=Earth kind=planet}} or something similar is fine, {{About Earth, planet}} or some other form is more appropriate in this particular use case.
You are crediting the machine with a lot. Personally, I would spit out an error, just in case they were, e.g., in different units.
Yes, this is one of the ways and I would opt for it if we want to have a strict syntax.
But you are making up your syntax, then requiring the system to adapt for you.
Can you elaborate more on this point?
The goal of wikitext is to make html editing easy.
HTML editing? I thought wikitext was about text editing. Why not edit HTML using HTML?
Because it's considered cumbersome. (Actually, it's presentational editing, but as the presentation is obtained by using HTML as an intermediate language...)
Indeed, HTML is cumbersome, that's why wikitext and all other text markups have been invented. But they don't have to copy HTML syntax - just the opposite.
And you have complicated the originally clean syntax of 1, 2, 3
Clean syntax for whom? For Englishmen? And are hashes actually clean? If so, why don't we use them in our e-mail messages?
No. Clean syntax of
1. Foo
2. Bar
3. Baz
Would html links become italic? (that was a problem of wikicreole, it was defined as 'italic unless in links')
Not at all because we are talking about context-specific grammar. Addresses in links can hold no formatting and thus all but context ending tokens (]], space and ==) are ignored there.
Oh, you're not autolinking urls.
And yes, context-specific grammar is more than regular expressions can handle. Regexps are good but this doesn't mean anything incompatible with sed is beyond "too complex".
As already mentioned, I am using my own markup processor written in PHP on my projects and it implements all markup already described including the [[http://italic]] (context-specific grammar) case. And its parsing loop is under 350 lines of code.
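To show what "context-specific" buys us, here is a minimal sketch (illustrative only, not my actual processor) where '' is markup in prose but inert inside a link address - which is exactly the [[http://italic]] case:

    def render(text):
        """Toy context-specific scanner. Assumes well-formed input."""
        out, i, italic = [], 0, False
        while i < len(text):
            if text.startswith('[[', i):
                end = text.find(']]', i + 2)   # only ]] ends the link context
                body = text[i + 2:end]         # '' in here stays literal
                out.append('<a href="%s">%s</a>' % (body, body))
                i = end + 2
            elif text.startswith("''", i):     # in prose, '' toggles italics
                out.append('</i>' if italic else '<i>')
                italic = not italic
                i += 2
            else:
                out.append(text[i])
                i += 1
        return ''.join(out)

    # render("''x'' and [[http://italic]]") ->
    #   '<i>x</i> and <a href="http://italic">http://italic</a>'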
Well, I have to say it seems well thought out; it "doesn't look bad".
Thank you. I have given it a lot of thought and practice, but I'm sure there are still things to improve. I would be ecstatic if my experience can help the world's largest free knowledge community.
Thanks again for your mail, Platonides.
Signed, P. Tkachenko
On Wed, 8 Feb 2012 15:20:41 +0100, Mihaly Heder hedermisi@gmail.com
If wikitext is going to be replaced the new language should be designed on an abstract level first.
This is correct but if we're talking about a universal DOM that could represent all potential syntax and has space for extensions (nodes of the new type can be safely added in future) then new markup can be discussed in general terms before the DOM itself.
It doesn't really matter unless we start correlating DOM and markup - then it will be time for BNFs.
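For illustration, such a universal DOM can be as small as a generic typed tree, where a new construct is just a new type name rather than a schema change (a sketch, not anybody's actual design):

    class Node:
        """Generic document-tree node: a type name, attributes, children.
        A new construct ('poem', 'template'...) is just a new type name."""
        def __init__(self, ntype, attrs=None, children=None):
            self.ntype = ntype
            self.attrs = attrs or {}
            self.children = children or []

    # A hypothetical page fragment:
    doc = Node('document', children=[
        Node('heading', {'level': 1}, [Node('text', {'value': 'Earth'})]),
        Node('template', {'name': 'About'},
             [Node('arg', {'value': 'planet'})]),
    ])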
So the real question is whether a new-gen wiki syntax will be compatible with a consensual data model we might have in the future.
I don't think it's a good idea to design wiki DOM and new wiki syntax separately, otherwise it'll be the same trouble current wikitext is stuck in.
The real problem is whether the core devs are interested in new markup at all or not. I don't see anything difficult in designing a new DOM except a few tricky places (templates, inclusions), and it should not take another year to complete, definitely not.
On Wed, 8 Feb 2012 07:42:33 -0700, Stanton McCandlish smccandlish@gmail.com wrote:
I'm a "geek" and do not "dislike" or "despise" XML/[X]HTML or WYSIWYG or wikimarkup. They all have their uses for different users and even the same user in different situations for different purposes.
Indeed, but my point was that XML is hardly more usable in text editing environments than some convenient wiki markup. You seem to agree with this later in your text.
applying a 'class="work-title"', would produce factually incorrect output (i.e., in one context at least, outright *corrupt data*) that said that a comma-space was part of the title of the work.
This is because wikitext allows dealing with the underlying/resulting HTML at a low level. A proper markup must abstract the user away from all of that, so he can't just insert a tag wherever he feels it's pertinent. If a user does need to insert a tag, then the markup is not well planned and must be corrected.
This will increase security (XSS prevention, etc.), uniformity (one person writes <b>, another <strong>) and portability - the last in particular because the low-level HTML stuff is exactly what makes current wikitext so problematic to transform on upgrade.
It's crucial that I be able to tweak stuff at the character-by-character level, and alter the markup around that content in any way I need to.
Good point. Also, in text-only environments, text tools like Search & Replace can be used, and not only to edit the text itself but its markup as well.
But for actual article drafting, in prose sentences and paragraphs, as opposed to tweaking, I vastly prefer WYSIWYG. I seriously doubt I'm alone in any of this, even in the combination of preferences I've outlined.
This might be true. This only seconds the point everybody seems to agree on - to have both markup- and WYSIWYG editors in one place.
On Wed, 8 Feb 2012 16:06:55 +0100, "Oren Bochman" orenbochman@gmail.com
I disagree that XHTML is a geek-only storage format or that the current Wikisyntax has a lower learning curve.
This is exactly the problem of current wikitext. I would compare it with C++ and "ideal" wiki markup - with Pascal or even BASIC.
I think that an XML subset would be the ideal underlying format.
An underlying format is NOT MEANT for direct human interaction - not by non-geeks, anyway. This is what I meant by "storage format".
This could provide interoperability with other wikis format and a friendlier variant of the existing wiki markup.
Good point.
easy to parse (unambiguous, won't require context or semantics to parse)
This definition should be extended to "context-specific", because some items might be ambiguous but used in different places. For example, anything inside a code block is unprocessed and can be as ambiguous as the editor desires - this is the point of a code block. It only needs a proper end token.
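In code, the whole trick reduces to searching for nothing but the end token (a sketch; the {{{ }}} delimiters are invented here just for illustration):

    def scan_code_block(text, pos):
        """Inside an (invented) {{{ ... }}} code block nothing is parsed:
        the scanner only looks for the end token. Assumes it is present."""
        end = text.find('}}}', pos)
        return text[pos:end], end + 3  # verbatim body, resume position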
Would be fully learnable in a couple of hours...
A starting editor should be able to learn the new markup in 5 minutes, or have all of its basic formatting listed in a small help box.
If we put our heads together and come up with something like that we will make some real progress.
This is what I'm trying to push here. One thing that keeps me from starting on this myself and presenting the results is whether this research will actually be used by the MediaWiki team - Gabriel says it's "all planned", which I read as "things just won't get worse".
On Wed, 8 Feb 2012 16:27:57 +0100, Mihaly Heder hedermisi@gmail.com
But then there are millions of pages already written in legacy wikitext, and those must be editable with the new editor. So right now, instead of the rational approach, an empirical one should be taken - they have to rather ''find'' than invent a good enough model for those old articles, and also store everything in the old format.
This is bad practice. I agree that the number of pages written in "outdated" markup is overwhelming; however, this only means that the migration layer must be well tested and thoroughly written, nothing else. If you just "find" a "good enough model", you will end up with the same millions of pages (or more by that time) that will hopefully use slightly better markup.
After all, even if some hundreds of wiki pages cannot be converted in a completely automatic mode, Wikipedia/WMF has enough staff to fix them in a sane period.
At this point a cardinal action must be taken to eliminate the old syntax completely, once and for all. Otherwise the same discussion will arise several years later (after several more years of "searching for a model").
I just mean that no one stood up and proposed "Hey, this would look better this different way".
All those millions of people who edited pages didn't think this actually looks wrong? I doubt that very much; perhaps there was just no place to voice their thoughts, or no one to listen because "this is fine for amateurs".
Anyone can create new templates, with any name and parameters he wishes.
Templates are a powerful but widely abused feature, since they can be used to hide parser/markup bugs. I even think templates should only be created by devs after discussion, otherwise it results in what we see now.
- {{About "Something, something and something", of kind}}
As you can see, no character is banned from the title (...)
What about the separator? Eg. [[The character "]]
Nothing, it's fine. Two options exist for the parser (sketched in code below):
1. Either it treats every " as starting a new context, and thus [[The character "]]"]] actually creates a link with the caption <The character "]]">.
2. Or it treats ]] as the ultimate token, and a standalone " is output as-is.
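To pin down the difference, a rough sketch of the two behaviours (illustrative code only, not my actual processor; assumes well-formed input):

    def caption_opt1(text, pos):
        """Option 1: " opens a quoted context in which ]] is inert."""
        out, i = [], pos
        while not text.startswith(']]', i):
            if text[i] == '"':
                close = text.index('"', i + 1)  # ]] inside quotes is literal
                out.append(text[i:close + 1])
                i = close + 1
            else:
                out.append(text[i])
                i += 1
        return ''.join(out)

    def caption_opt2(text, pos):
        """Option 2: ]] is the ultimate end token; a lone " is plain text."""
        return text[pos:text.find(']]', pos)]

    # caption_opt1('The character "]]"]]', 0) -> 'The character "]]"'
    # caption_opt2('The character "]]', 0)    -> 'The character "'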
Right, and pipes should not appear in templates either. It's too special a symbol.
Why so? So far the only reason you gave is that it's not on all keyboard layouts.
And it is not used in most languages, yes. Is that not reason enough? Why choose it for an international project like MediaWiki if there are alternatives?
- remote links can also get their title from <title> after fetching the first 4 KiB of that page or something
No way. That can be good for preloading a title on link insertion and storing it indefinitely, but not for doing it every time.
Of course not every time; the engine might maintain a cache of remote link titles or alleviate the traffic in some other way. And the feature can be disabled, in which case the parser will use some other means of generating titles for titleless external links.
Only if pages with no spaces are more common than pages with spaces in the name. Taking enwiki articles as a sample:
- 7746101 articles with space.
- 1416235 articles without space.
Thanks for the statistics. Well, then my point about "half of the cases" isn't fair; however, this doesn't change the fact that the pipe isn't as universal as a double equals sign, which can still be typed at the same speed and is less prone to misoperations, because it's doubled and has less chance of appearing in-text.
However, you could take advantage of the space-is-underscore, and use [[Some_page this page]] (but still not 'clean')
Yes, this is not very clean and relies on parser/engine behavior. It should be fine to have "Some page" and "Some_page" as two different pages.
Nitpicking, first heading has just one equal sign [at each side] :)
And this is the problem. Even DokuWiki uses no fewer than two "=" for headings. A first-level heading appears so rarely in a document that it can afford "==". Actually, since a document has just one first-level heading, all the others (level 2+) can use two "==" as well, because there's no sense in creating a second-level heading before the document title (first-level).
I think MediaWiki currently lets the user create even a 6th-level heading before the doc title?
Standardizing is fine unless it starts looking unnatural. The following example might be argued but I can't think of another one quickly: tar -czf file.tar.gz .
Not a bad example, as that's one of those utilities with odd parameters "... The different styles were developed at different times ..."
Yes, you've got my idea.
That's a source of problems. It's fine having dumb programs that you need to walk through. When the programs are smart, if they don't go all the way up to that level, that's an issue.
This is true, but it just requires a more conscientious developer. Nobody will deny that it's harder to write smart programs than dumb programs following certain preexisting conventions (e.g. the cryptic *nix CLI that can explain everything, or nearly so).
When designing a text markup why should we follow bad guidelines?
No. Clean syntax of
1. Foo
2. Bar
3. Baz
This is the syntax I have suggested for ordered lists earlier. "1. 1. 1." only complements it.
Not at all because we are talking about context-specific grammar. Addresses in links can hold no formatting and thus all but context ending tokens (]], space and ==) are ignored there.
Oh, you're not autolinking urls.
I didn't really understand that.
Well, it seems like the thread ends here.
Signed, P. Tkachenko
On 02/12/2012 12:34 PM, Pavel Tkachenko wrote:
On Wed, 8 Feb 2012 15:20:41 +0100, Mihaly Heder hedermisi@gmail.com
If wikitext is going to be replaced the new language should be designed on an abstract level first.
This is correct but if we're talking about a universal DOM that could represent all potential syntax and has space for extensions (nodes of the new type can be safely added in future) then new markup can be discussed in general terms before the DOM itself.
HTML DOM provides a lot of extension mechanisms, but that should not really be needed if all you want to do is to provide a different text-based user interface (syntax) to edit the same data.
If you would however like to change or extend the data model itself, then these extension mechanisms will become more important. I am quite optimistic that HTML DOM as a tree with attributes can support a wide range of new features. If you have an example of a useful data model extension that does not fit into HTML DOM, then I would be very interested in hearing about it.
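To make "microdata" concrete for readers of this thread: it is plain HTML5 attributes, so a template invocation could in principle round-trip as something like the following (an illustrative shape only - the itemtype URL is invented, and this is not a serialization we have settled on):

    <span itemscope itemtype="http://example.org/mw/Template">
      <meta itemprop="name" content="citation needed">
    </span>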
Gabriel
Hello,
Um, I see that you disagree, but I don't think that you proposed arguments that actually support your position.
On 12 February 2012 12:34, Pavel Tkachenko proger.xp@gmail.com wrote:
On Wed, 8 Feb 2012 15:20:41 +0100, Mihaly Heder hedermisi@gmail.com
If wikitext is going to be replaced the new language should be designed on an abstract level first.
This is correct but if we're talking about a universal DOM that could represent all potential syntax and has space for extensions (nodes of the new type can be safely added in future) then new markup can be discussed in general terms before the DOM itself.
I don't think so. They are not talking about DOM in general, which in itself is not even context free. They have to design a language that can be represented in DOM, and have a fixed set of language constructs and therefore it is context free. Without that they cannot make a new parser work.
It doesn't really matter unless we start correlating DOM and markup - then it will be time for BNFs.
If they don't correlate the DOM with the markup then what is the point of the DOM language? Also, in the case of the old grammar BNF won't be the way to go. The correlation will happen in custom parser code, and this is unavoidable.
So the real question is whether a new-gen wiki syntax will be compatible with a consensual data model we might have in the future.
I don't think it's a good idea to design wiki DOM and new wiki syntax separately, otherwise it'll be the same trouble current wikitext is stuck in.
I don't think that this remark is relevant, as they are not designing a new wiki syntax. They have to keep the old one.
The real problem is whether the core devs are interested in new markup at all or not. I don't see anything difficult in designing a new DOM except a few tricky places (templates, inclusions), and it should not take another year to complete, definitely not.
The muscle is in the parser that can completely parse the old syntax into the new DOM language. BNFs won't solve that. I don't know this team in person, so I cannot judge their capabilities. But I can tell you that here in Budapest we had a really talented MSc student working on such a thing for about a year and we could not get even close to 100% compatibility (not even 90%...)
On Wed, 8 Feb 2012 16:27:57 +0100, Mihaly Heder hedermisi@gmail.com
But then there are millions of pages already written in legacy wikitext, and those must be editable with the new editor. So right now, instead of the rational approach, an empirical one should be taken - they have to rather ''find'' than invent a good enough model for those old articles, and also store everything in the old format.
This is bad practice. I agree that the number of pages written in "outdated" markup is overwhelming; however, this only means that the migration layer must be well tested and thoroughly written, nothing else. If you just "find" a "good enough model", you will end up with the same millions of pages (or more by that time) that will hopefully use slightly better markup.
It's not only the documents, it's also the user base, which wants to work with the usual syntax. You cannot just migrate them.
After all, even if some hundreds of wiki pages cannot be converted in a completely automatic mode, Wikipedia/WMF has enough staff to fix them in a sane period.
At this point a cardinal action must be taken to eliminate the old syntax completely, once and for all. Otherwise the same discussion will arise several years later (after several more years of "searching for a model").
I don't see why it would surface years later if they replace the parser now. Once the old parser is replaced with the new one, we can work with the DOM model and even create new markup with BNFs, as you suggest. And the old will work, too.
Mihály
Okay, okay, guys, I give up. Either I'm misunderstanding some crucial bit or I'm misunderstanding the concept of wiki and programming as a whole. But in any case this is unimportant.
One thing I can agree with, though, is David Gerard's latest message.
Signed, P. Tkachenko
On Sun, Feb 12, 2012 at 09:34, Pavel Tkachenko proger.xp@gmail.com wrote:
Starting editor should be able to learn the new markup in 5 minutes. Or have all of its basic formatting listed under a small help box.
For the record: this is already provided by the WikiEditor extension for the current markup. Take a look at this screenshot: http://commons.wikimedia.org/wiki/File:Toolbar_with_help_section_of_vector_s...
On 12/02/12 12:34, Pavel Tkachenko wrote:
I just mean that no one stood up and proposed "Hey, this would look better this different way".
All those millions of people who edited pages didn't think this actually looks wrong? I doubt that very much; perhaps there was just no place to voice their thoughts, or no one to listen because "this is fine for amateurs".
I don't think so. Some may have thought "How odd", but probably few thought "this is wrong", as that requires deeper knowledge, and when you start out you don't usually come in challenging everything. For instance, the traffic sign showing a man inside a circle marks paths where pedestrians SHALL NOT go. Does everybody get confused by that when learning how to drive? No. You just accept it and move on.
Anyone can create new templates, with any name and parameters he wishes.
Templates are a powerful but widely abused feature, since they can be used to hide parser/markup bugs. I even think templates should only be created by devs after discussion, otherwise it results in what we see now.
The concept of a template is transcluding content that is stored once and used many times. It is wrong to need a dev to create a template for listing the stars of galaxy XYZ, or the emperors of the Roman Empire. Yes, they are abused, many times simplifying complex constructs. In their defense, they weren't expected to be used for conditional programming; that was a technique discovered by the wikipedians themselves.
- {{About "Something, something and something", of kind}}
As you can see, no character is banned from the title (...)
What about the separator? Eg. [[The character "]]
Nothing, it's fine. Two options exist for the parser:
1. Either it treats every " as starting a new context, and thus [[The character "]]"]] actually creates a link with the caption <The character "]]">.
2. Or it treats ]] as the ultimate token, and a standalone " is output as-is.
Would you write {{About "The character "]]", page 2", of kind}} ?
Right, and pipes should not appear in templates either. It's too special a symbol.
Why so? So far the only reason you gave is that it's not on all keyboard layouts.
And it is not used in most languages, yes. Is that not reason enough? Why choose it for an international project like MediaWiki if there are alternatives?
That it's not used in most languages is a feature. And it's nonetheless present on many keyboards, due to its usage in C, the shell...
Looking at the ~70 layouts at http://en.wikipedia.org/wiki/Keyboard_layout it seems present on most of them, with the exceptions being Bulgarian, Russian, Serbian, Ukrainian, Sinhalese, Thai, Dzongkha
10% is higher than desired, but it seems fit enough. Also note that Sinhalese, Thai and Dzongkha don't contain symbols like ", which you are proposing as alternatives (yes, typing wikitext there must be a nightmare).
- remote links can also get their title from <title> after fetching the first 4 KiB of that page or something
No way. That can be good for preloading a title on link insertion and storing it indefinitely, but not for doing it every time.
Of course not every time; the engine might maintain a cache of remote link titles or alleviate the traffic in some other way. And the feature can be disabled, in which case the parser will use some other means of generating titles for titleless external links.
It's not just an issue of adding a caching layer: you can't show content stored elsewhere. Suppose you added to an article about a queen a link titled "Interview with X". Then later the title changes to "X is a bitch". The change is the fault of the target document's author, but you have failed miserably. That's why, while a title can be suggested when inserting the link, it should be stored with the link.
Only if pages with no spaces are more common than pages with spaces in the name. Taking enwiki articles as a sample:
- 7746101 articles with space.
- 1416235 articles without space.
Thanks for the statistics. Well, then my point about "half of the cases" isn't fair; however, this doesn't change the fact that the pipe isn't as universal as a double equals sign, which can still be typed at the same speed and is less prone to misoperations, because it's doubled and has less chance of appearing in-text.
Double equality sign is only popular among programmers.
I think MediaWiki currently lets the user create even a 6th-level heading before the doc title?
Yes, the header hierarchy is not enforced by the software, although I don't think I have seen it misused.
Not at all because we are talking about context-specific grammar. Addresses in links can hold no formatting and thus all but context ending tokens (]], space and ==) are ignored there.
Oh, you're not autolinking urls.
I didn't really understand that.
If you write http://www.google.com in your syntax, that isn't a link, but just text. By autolinking I mean that urls in the text are automatically converted into links.
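(For comparison, autolinking is usually just a pre-pass over prose text; a naive sketch - real-world URL regexes are considerably hairier:)

    import re

    URL = re.compile(r'https?://[^\s<>"]+')

    def autolink(text):
        """Wrap bare URLs found in prose in <a> tags."""
        return URL.sub(
            lambda m: '<a href="%s">%s</a>' % (m.group(0), m.group(0)),
            text)

    # autolink('see http://www.google.com') ->
    #   'see <a href="http://www.google.com">http://www.google.com</a>'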
On Sun, Feb 12, 2012 at 4:34 AM, Pavel Tkachenko proger.xp@gmail.com wrote:
I even think templates should only be created by devs after discussion, otherwise it results in what we see now.
You can't be serious. Half the useful work I do at en.wikipedia is template coding, to do all sorts of useful things. If I had to go through some anal process of developer supplication to get something like {{Shamos 1999}} or {{Rp}} or {{Cue sports nav}} set up, I would simply quit the project. I don't mean that in a histrionic WP:DIVA way; I mean it would just frustrate me too much to not be able to fix things, expediently, that need to be fixed and which have to do with usage of the tools and writing a wiki, not with the underlying coding of the tools themselves. It's exceedingly presumptuous of developers of tools to suggest they should be in control of how people use the tools and work internally around their limitations. Which are often severe (see {{bq}} and {{gloss}}, especially the latter's subpage of test cases showing how awful MW's parsing of definition lists is when not overridden with explicit HTML). Note: all of what I'm referring to is at en.wikipedia.org.
-- Stanton McCandlish McCandlish Consulting 9505 Tanoan Dr NE Albuquerque NM 87111-5836
505 715-7650
On 13 February 2012 22:07, Stanton McCandlish smccandlish@gmail.com wrote:
On Sun, Feb 12, 2012 at 4:34 AM, Pavel Tkachenko proger.xp@gmail.com wrote:
I even think templates should only be created by devs after discussion, otherwise it results in what we see now.
You can't be serious. Half the useful work I do at en.wikipedia is template coding, to do all sorts of useful things. If I had to go through some anal process of developer supplication to get something like {{Shamos 1999}} or {{Rp}} or {{Cue sports nav}} set up, I would simply quit the project. I don't mean that in a histrionic WP:DIVA way; I mean it would just frustrate me too much to not be able to fix things, expediently, that need to be fixed and which have to do with usage of the tools and writing a wiki, not with the underlying coding of the tools themselves. It's exceedingly presumptuous of developers of tools to suggest they should be in control of how people use the tools and work internally around their limitations. Which are often severe (see {{bq}} and {{gloss}}, especially the latter's subpage of test cases showing how awful MW's parsing of definition lists is when not overridden with explicit HTML). Note: all of what I'm referring to is at en.wikipedia.org.
Yes. The fundamental reason people do unspeakable things with ParserFunctions is precisely so they don't need to go through a gatekeeper. This is a feature, even if the ParserFunctions syntax isn't.
(Every domain-specific language that is allowed to become Turing-complete turns into brainfuck.)
- d.
On 13 February 2012 22:07, Stanton McCandlish smccandlish@gmail.com wrote:
On Sun, Feb 12, 2012 at 4:34 AM, Pavel Tkachenko proger.xp@gmail.com wrote:
I even think templates should only be created by devs after discussion, otherwise it results in what we see now.
You can't be serious. Half the useful work I do at en.wikipedia is template coding, to do all sorts of useful things. If I had to go through some anal process of developer supplication to get something like {{Shamos 1999}} or {{Rp}} or {{Cue sports nav}} set up, I would simply quit the project.
[snip]
On Mon, Feb 13, 2012 at 2:15 PM, David Gerard dgerard@gmail.com wrote:
Yes. The fundamental reason people do unspeakable things with ParserFunctions is precisely so they don't need to go through a gatekeeper. This is a feature, even if the ParserFunctions syntax isn't.
This is one of the reasons we're starting on Lua support for templating; being able to use a real scripting language when it's called for, while keeping it sandboxed to avoid gatekeeping bottlenecks is going to be a big step up from (ab)using what was originally meant as a very simple substitution-based templating language to do the same things but really really hard.
-- brion
On 12-02-13 02:34 PM, Brion Vibber wrote:
Yes. The fundamental reason people do unspeakable things with ParserFunctions is precisely so they don't need to go through a gatekeeper. This is a feature, even if the ParserFunctions syntax isn't.
This is one of the reasons we're starting on Lua support for templating; being able to use a real scripting language when it's called for, while keeping it sandboxed to avoid gatekeeping bottlenecks is going to be a big step up from (ab)using what was originally meant as a very simple substitution-based templating language to do the same things but really really hard.
-- brion
{{qif}} happens when devs are unresponsive to the community, IIRC. Everyone involved had the best of intentions, but with different visions and priorities... friction tends to result.
I am sure with hindsight and experience we will avoid some of the issues faced (and largely overcome) in the past.
Amgine