Hello,
I see that you disagree, but I don't think the arguments you proposed actually support your position.
On 12 February 2012 12:34, Pavel Tkachenko <proger.xp@gmail.com> wrote:
On Wed, 8 Feb 2012 15:20:41 +0100, Mihaly Heder <hedermisi@gmail.com> wrote:
If wikitext is going to be replaced the new language should be designed on an abstract level first.
This is correct, but if we're talking about a universal DOM that could represent all potential syntax and has room for extensions (nodes of a new type can be safely added in the future), then the new markup can be discussed in general terms before the DOM itself.
I don't think so. They are not talking about the DOM in general, which in itself is not even context-free. They have to design a language that can be represented in a DOM and has a fixed set of language constructs, and is therefore context-free. Without that they cannot make a new parser work.
It doesn't really matter unless we start correlating DOM and markup - then it will be time for BNFs.
If they don't correlate the DOM with the markup, then what is the point of the DOM language? Also, in the case of the old grammar, BNF won't be the way to go. The correlation will happen in custom parser code, and this is unavoidable.
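For illustration only, a rough Python sketch of what "correlation in custom parser code" means (the node shapes are made up for the example; this is not the real parser):

    def to_dom(src):
        """Map a wikitext fragment to a flat node list of
        ('text', s) and ('bold', s), where bold is the ''' ... ''' rule."""
        nodes, i = [], 0
        while i < len(src):
            start = src.find("'''", i)
            if start == -1:
                nodes.append(('text', src[i:]))
                break
            end = src.find("'''", start + 3)
            if end == -1:                       # unbalanced: keep as text
                nodes.append(('text', src[i:]))
                break
            if start > i:
                nodes.append(('text', src[i:start]))
            nodes.append(('bold', src[start + 3:end]))
            i = end + 3
        return nodes

    # to_dom("a '''b''' c") -> [('text', 'a '), ('bold', 'b'), ('text', ' c')]

No grammar file in sight: every such rule is a few lines of hand-written code, and for the old syntax there would be hundreds of them.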
So the real question is whether a new-gen wiki syntax will be compatible with a consensual data model we might have in the future.
I don't think it's a good idea to design wiki DOM and new wiki syntax separately, otherwise it'll be the same trouble current wikitext is stuck in.
I don't think that this remark is relevant, as they are not designing a new wiki syntax. They have to keep the old one.
The real problem is whether the core devs are interested in new markup at all. I don't see anything difficult in designing a new DOM except a few tricky places (templates, inclusions), but it should not take another year to complete, definitely not.
The muscle is in the parser that can completely parse the old syntax into the new DOM language. BNFs won't solve that. I don't know this team in person, so I cannot judge their capabilities. But I can tell you that here in Budapest we had a really talented MSc student working on such a thing for about a year and we could not get even close to 100% compatibility (not even 90%...)
On Wed, 8 Feb 2012 07:42:33 -0700, Stanton McCandlish <smccandlish@gmail.com> wrote:
I'm a "geek" and do not "dislike" or "despise" XML/[X]HTML or WYSIWYG or wikimarkup. They all have their uses for different users and even the same user in different situations for different purposes.
Indeed, but my point was that XML is hardly more usable in text editing environments than some convenient wiki markup. You seem to agree with this later in your text.
applying a 'class="work-title"', would produce factually incorrect output (i.e., in one context at least, outright *corrupt data*) that said that a comma-space was part of the title of the work.
This is because wikitext allows dealing with the underlying/resulting HTML at a low level. A proper markup must abstract all of that away from the user so he can't just insert a tag wherever he feels like it. If a user does need to insert a tag, then the markup is not well planned and must be corrected.
This will increase security (XSS prevention, etc.), uniformity (one person writes <b>, another <strong>) and portability - portability in particular, because all that low-level HTML stuff is exactly why current wikitext is so problematic: it must be transformed on upgrade.
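To show the uniformity point concretely, a rough Python sketch (the tag whitelist is invented for the example): presentational tags are canonicalized, unknown tags and all attributes are dropped, which also removes the usual XSS vectors:

    from html.parser import HTMLParser

    CANON = {'b': 'strong', 'i': 'em'}  # someone writes <b>, we store <strong>
    ALLOWED = {'strong', 'em', 'code'}  # hypothetical whitelist

    class Normalizer(HTMLParser):
        def __init__(self):
            super().__init__()
            self.out = []
        def handle_starttag(self, tag, attrs):
            tag = CANON.get(tag, tag)
            if tag in ALLOWED:          # attributes are dropped entirely
                self.out.append('<%s>' % tag)
        def handle_endtag(self, tag):
            tag = CANON.get(tag, tag)
            if tag in ALLOWED:
                self.out.append('</%s>' % tag)
        def handle_data(self, data):    # text content is always kept
            self.out.append(data)

    def normalize(html):
        n = Normalizer()
        n.feed(html)
        return ''.join(n.out)

    # normalize('<b onclick="evil()">hi</b>') -> '<strong>hi</strong>'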
It's crucial that I be able to tweak stuff at the character-by-character level, and alter the markup around that content in any way I need to.
Good point. Also, in text-only environments, text tools like Search & Replace can be used, and not only to edit the text itself but its markup as well.
But for actual article drafting, in prose sentences and paragraphs, as opposed to tweaking, I vastly prefer WYSIWYG. I seriously doubt I'm alone in any of this, even in the combination of preferences I've outlined.
This might be true. It only seconds the point everybody seems to agree on: having both a markup editor and a WYSIWYG editor in one place.
On Wed, 8 Feb 2012 16:06:55 +0100, "Oren Bochman" <orenbochman@gmail.com> wrote:
I disagree that xhtml is a geek-only storage format or that the current Wikisyntax has a lower learning curve.
This is exactly the problem of current wikitext. I would compare it with C++, and an "ideal" wiki markup with Pascal or even BASIC.
I think that an xml subset is the ideal underlying format.
An underlying format is NOT MEANT for direct human interaction - not by non-geeks, at least. This is what I meant by "storage format".
This could provide interoperability with other wiki formats and a friendlier variant of the existing wiki markup.
Good point.
easy to parse (unambiguous, won't require context or semantics to parse)
This definition should be extended to "context-specific", because some items might be ambiguous yet used in different places. For example, anything inside a code block is unprocessed and can be as ambiguous as the editor desires - this is the point of a code block. It only needs a proper end token.
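In parser terms this is just a lexer mode switch; a rough Python sketch, with {{{ }}} as assumed code-block delimiters (the delimiters are my invention for the example):

    def lex(src, code_open='{{{', code_close='}}}'):
        """Split src into ('text', s) and ('code', raw) tokens. Inside a
        code block nothing is interpreted until the end token, so any
        ambiguity in there is harmless."""
        tokens, i = [], 0
        while i < len(src):
            start = src.find(code_open, i)
            if start == -1:
                tokens.append(('text', src[i:]))
                break
            if start > i:
                tokens.append(('text', src[i:start]))
            end = src.find(code_close, start + len(code_open))
            if end == -1:
                raise ValueError('code block has no end token')
            tokens.append(('code', src[start + len(code_open):end]))
            i = end + len(code_close)
        return tokens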
Would be fully learnable in a couple of hours...
A starting editor should be able to learn the new markup in 5 minutes, or have all of its basic formatting listed in a small help box.
If we put our heads together and come up with something like that we will make some real progress.
This is what I'm trying to push here. The one thing that keeps me from starting to do this myself and presenting the results is whether this research will actually be used by the MediaWiki team - Gabriel says it's "all planned", which I read as "things just won't get worse".
On Wed, 8 Feb 2012 16:27:57 +0100, Mihaly Heder <hedermisi@gmail.com> wrote:
But then there are millions of pages already written in legacy wikitext, and those must be editable with the new editor. So right now, instead of the rational approach, an empirical one should be taken - they have to ''find'' rather than invent a good-enough model for those old articles, and also store everything in the old format.
This is bad practice. I agree that the number of pages written in "outdated" markup is overwhelming; however, this only means that the migration layer must be well tested and thoroughly written, nothing else. If you "find" a "good enough model", you will end up with the same millions of pages (or more by that time) that will, hopefully, use a slightly better markup.
It's not only the documents, it's also the user base, which wants to work with the usual syntax. You cannot just migrate them.
After all, even if some hundreds of wiki pages cannot be converted fully automatically, Wikipedia/WMF has enough staff to fix them in a sane period of time.
At this time a cardinal action must be taken to eliminate the old syntax completely, once and for all. Otherwise the same discussion will arise several years later (after several more years of "searching for a model").
I don't see why it would surface years later if they replace the parser now. Once the old parser is replaced with the new one, we can work with the DOM model and even create a new markup with BNFs, as you suggest. And the old one will keep working, too.
Mihály
I just mean that no one stood up and proposed: "Hey, this would look better in this different way".
All those millions of people who edited pages didn't think this actually looks wrong? I doubt it very much; perhaps there was just no place to voice their thoughts, or no one to listen because "this is fine for amateurs".
Anyone can create new templates, with any name and parameters he wishes.
Templates are a powerful but widely abused feature, since they can be used to hide parser/markup bugs. I even think templates should only be created by devs after a discussion; otherwise it results in what we see now.
- {{About "Something, something and something", of kind}}
As you can see, no character is banned from the title (...)
What about the separator? E.g. [[The character "]]
Nothing, it's fine. Two options exist for the parser:
1. Either it treats every " as starting a new context, and thus [[The character "]]"]] actually creates a link with the caption <The character "]]">.
2. Or it treats ]] as the ultimate token, and a standalone " is output as-is.
Right, and pipes should not appear in templates either. It's too special a symbol.
Why so? So far the only reason you gave is that it's not on all keyboard layouts.
And it is not used in most languages, yes. Isn't that reason enough? Why choose it for an international project like MediaWiki if there are alternatives?
* remote links can also get their title from <title> after fetching the first 4 KiB of that page or something
No way. That can be good for preloading a title on link insertion and storing it indefinitely, but not for doing it every time.
Of course not every time; the engine might maintain a cache of remote links or otherwise alleviate the traffic. And it can be disabled, in which case the parser will use some other means of generating titles for titleless external links.
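A rough Python sketch of what I mean (the TTL and the fall-back to the bare URL are arbitrary choices for the example):

    import re, time
    from urllib.request import urlopen

    _cache = {}            # url -> (fetched_at, title)
    CACHE_TTL = 24 * 3600  # arbitrary: refresh remote titles once a day

    def remote_title(url):
        """Title of a remote page, reading at most the first 4 KiB."""
        hit = _cache.get(url)
        if hit and time.time() - hit[0] < CACHE_TTL:
            return hit[1]
        head = urlopen(url, timeout=5).read(4096).decode('utf-8', 'replace')
        m = re.search(r'<title[^>]*>(.*?)</title>', head, re.I | re.S)
        title = m.group(1).strip() if m else url   # fall back to the bare URL
        _cache[url] = (time.time(), title)
        return title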
Only if pages with no spaces are more common than pages with spaces in the name. Taking enwiki articles as a sample:
- 7,746,101 articles with a space.
- 1,416,235 articles without a space.
Thanks for the statistics. Well, then my point about "half of the cases" isn't fair; however, this doesn't change the fact that the pipe isn't as universal as a double equals sign, which can still be typed at the same speed and is less prone to accidental use because it's doubled and has less chance of appearing in ordinary text.
However, you could take advantage of the space-is-underscore, and use [[Some_page this page]] (but still not 'clean')
Yes, this is not very clean and relies on parser/engine behavior. It should be fine to have "Some page" and "Some_page" as two different pages.
Nitpicking, first heading has just one equal sign [at each side] :)
And this is the problem. Even DokuWiki uses no less than two "=" for headings. A first-level heading appears so rarely in a document that it can afford "==". Actually, since a document has just one first-level heading, all the others (level 2+) can start from two "=" as well, because there's no sense in creating a second-level heading before the document title (the first-level one).
I think MediaWiki currently lets the user create even a 6th-level heading before the doc title?
Standardizing is fine unless it starts looking unnatural. The following example might be argued but I can't think of another one quickly: tar -czf file.tar.gz .
Not a bad example, as that's one of those utilities with odd parameters: "... The different styles were developed at different times ..."
Yes, you've got my idea.
That's a source of problems. It's fine having dumb programs that you need to walk through. When the programs are smart, if they don't live up to that level, that's an issue.
This is true, but it just requires a more conscientious developer. Nobody will argue that it's harder to write smart programs than dumb programs that follow certain preexisting conventions (e.g. the cryptic *nix command-line interface that can explain everything, or nearly so).
When designing a text markup, why should we follow bad guidelines?
No. Clean syntax of
- Foo
- Bar
- Baz
This is the syntax I have suggested for ordered lists earlier. "1. 1. 1." only complements it, as the sketch below shows.
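A rough Python sketch of the renumbering idea (assuming "-" items are auto-numbered and explicit "N." markers are accepted but renumbered; both assumptions are mine for the example):

    import re

    def renumber(lines):
        """Both '- item' and 'N. item' become sequential numbers, so a
        writer can repeat '1.' (or use '-') and never re-count by hand."""
        out, n = [], 0
        for line in lines:
            m = re.match(r'(?:- |\d+\. )(.*)', line)
            if m:
                n += 1
                out.append('%d. %s' % (n, m.group(1)))
            else:
                out.append(line)
                n = 0                  # a non-item line ends the list
        return out

    # renumber(['- Foo', '1. Bar', '1. Baz'])
    #   -> ['1. Foo', '2. Bar', '3. Baz']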
Not at all, because we are talking about a context-specific grammar. Addresses in links can hold no formatting, and thus everything but the context-ending tokens (]], space and ==) is ignored there.
Oh, you're not autolinking urls.
I didn't really understand that.
Well, it seems like the thread ends here.
Signed, P. Tkachenko
Wikitext-l mailing list
Wikitext-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitext-l