I'm working on writing out an EBNF description of Wikitext at http://meta.wikimedia.org/wikiWikitext_Metasyntax , which I hear is much needed, but I have encountered a snag: I don't think EBNF has the power to describe Wikitext. If anyone here can work out how EBNF can describe Wikitext's system for bullet points, I'd like to see it. The problem is that bullet points can build on each other, except that each new level has to retain the markup from the previous level, plus a new symbol. e.g. **#* then **#* * then **#** * then **#*** # etc.
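A possible way around the prefix problem, offered as a sketch and not as something MediaWiki is confirmed to do in this form: let the lexer, not the grammar, deal with the prefixes. Each line's marker prefix is compared with the previous line's, and the markers past their common prefix become explicit close and open tokens. The grammar then only has to describe properly nested open/close pairs, which EBNF can do. The function name `list_events`, the `(kind, value)` event tuples, and the marker set `*#:;` are assumptions for illustration.

```python
import re

# Assumed marker set; ';' and ':' are treated as plain list markers here
# (no splitting of "; term : definition" lines).
PREFIX_RE = re.compile(r"^([*#:;]*)\s*(.*)$")


def list_events(lines):
    """Turn prefixed wikitext lines into (kind, value) events.

    kind is 'open', 'close', 'item' (a list line) or 'text' (no prefix).
    Only the part of each prefix that differs from the previous line's
    prefix opens or closes a list, so the event stream is always nested.
    """
    events = []
    prev = ""
    for line in lines:
        prefix, text = PREFIX_RE.match(line).groups()
        # Length of the leading part shared by the old and new prefixes.
        common = 0
        while (common < min(len(prev), len(prefix))
               and prev[common] == prefix[common]):
            common += 1
        # Close lists the new line no longer continues, innermost first.
        for marker in reversed(prev[common:]):
            events.append(("close", marker))
        # Open lists the new line adds, outermost first.
        for marker in prefix[common:]:
            events.append(("open", marker))
        events.append(("item" if prefix else "text", text))
        prev = prefix
    # Close anything still open at the end of the input.
    for marker in reversed(prev):
        events.append(("close", marker))
    return events
```

For example, `list_events(["* a", "** b", "**# c", "* d"])` opens `*`, `*` and `#` for the first three lines, then closes `#` and `*` before item `d`, and closes the remaining `*` at the end.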
On the topic of "things that need to be done": if writing an EBNF of Wikitext isn't going to be beneficial for MediaWiki's long-term success, I won't bother, but is there anything like this that needs to be done to create a standard?
Please forward to any other mailing lists in which this might be appropriate. Thanks!
Hoi, What the f* is an EBNF? It does not help that your URL is wrong. Being clever (ahum), I found Extended Backus–Naur form. A question: do you really think that an average Wiki editor will NOT get hopelessly confused and get it hopelessly wrong as well?
Personally, I find it horribly ugly as well.
Thanks, GerardM
http://meta.wikimedia.org/wiki/Wikitext_Metasyntax
Despite how ugly (E)BNF may be, it's the standard format for defining the syntax of most prominent computer languages. It has the additional benefit of automatic conversion (via Bison or a similar program) into a native parser.
foundation-l mailing list foundation-l@lists.wikimedia.org http://lists.wikimedia.org/mailman/listinfo/foundation-l
Hoi, When this provides a better result than our current wikisyntax when it comes to internationalisation, there might be something to it. As you may know, the use of two quotes to indicate italics breaks the usage of these same two quotes in languages like Neapolitan.
When it provides this better result, there might be something to it. Otherwise to me it is a hopeless "see how clever we are" exercise never mind how "standard" it is. A standard that does not take internationalisation seriously is useless in an international environment like the Wikimedia projects.
Thanks, GerardM
David Strauss wrote:
Despite how ugly (E)BNF may be, it's the standard format for defining the syntax of most prominent computer languages. It had the additional benefit of automatic conversion (via Bison or a similar program) into a native parser.
On 1/22/07, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, When this provides a better result than our current wikisyntax when it comes to internationalisation there might be something to it. As you may know the use of two quotes to indicate italics breaks the usage of these same two quotes in languages like Neapolitan.
When it provides this better result, there might be something to it. Otherwise to me it is a hopeless "see how clever we are" exercise never mind how "standard" it is. A standard that does not take internationalisation seriously is useless in an international environment like the Wikimedia projects.
EBNF is not a replacement for or variation of wikisyntax. It's a different approach for a parser that understands the /current/ wikisyntax. That would:
- clean up the mess that is the current parser
- allow the parser to be generated from EBNF automatically in C/C++/C#/Perl/whatever
- ease generation of different output formats, such as XML (or PDF or docbook or...)
- avoid most of those little implementation bugs that come with manual parser writing
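To make the "different output formats" point concrete: once a parser has produced a tree, each output format is just another walk over that same tree. The tree shape used here, `('list', marker, items)` with items `('item', text, sublists)`, is a hypothetical one chosen for this sketch, as is the XML vocabulary; neither reflects MediaWiki's actual code.

```python
from html import escape

# Assumed marker mappings; only bullet (*) and numbered (#) lists.
HTML_TAGS = {"*": "ul", "#": "ol"}
XML_TYPES = {"*": "bullet", "#": "numbered"}  # hypothetical XML vocabulary


def render_html(node):
    """Render a ('list', marker, items) node as nested HTML lists."""
    _, marker, items = node
    tag = HTML_TAGS[marker]
    parts = []
    for _, text, subs in items:
        # Sub-lists are placed inside the <li> they belong to.
        nested = "".join(render_html(sub) for sub in subs)
        parts.append(f"<li>{escape(text)}{nested}</li>")
    return f"<{tag}>{''.join(parts)}</{tag}>"


def render_xml(node):
    """Render the same tree in a made-up XML format."""
    _, marker, items = node
    parts = []
    for _, text, subs in items:
        nested = "".join(render_xml(sub) for sub in subs)
        parts.append(f"<item><text>{escape(text)}</text>{nested}</item>")
    return f'<list type="{XML_TYPES[marker]}">{"".join(parts)}</list>'
```

For the tree `("list", "*", [("item", "a", [("list", "#", [("item", "b", [])])]), ("item", "c", [])])`, `render_html` produces `<ul><li>a<ol><li>b</li></ol></li><li>c</li></ul>`; another output format would be one more walker of the same kind.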
Magnus
Hoi, You do not address my main question: is this thing internationalisation-proof? I care little for something that brings new and/or other incompatibilities. Bringing technical advantages does not necessarily make it work well in the real world, and our real world is multilingual.
PS I do understand why something like EBNF is useful from a TECHNICAL point of view.
Thanks, GerardM
To address the issue of the two quotes: EBNF does not directly help there. EBNF does not dictate which tokens you use or what they mean; it only helps you verify raw text and convert it into a structured tree of tokens. If one of your tokens happens to be a pair of single quotes (''), the problem remains.
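A small sketch of the point that the tokens, not the grammar, decide what two apostrophes mean. The `tokenize` function and its `italic_marker` parameter are hypothetical; a grammar consuming its output would look the same whichever marker is chosen, but only the marker choice decides whether text containing `''` gets split into italics. The input `x''y` is an arbitrary string, not a real Neapolitan example.

```python
import re


def tokenize(text, italic_marker="''"):
    """Split text into ('text', s) runs and ('toggle', marker) tokens.

    The grammar that consumes these tokens never sees raw characters, so
    whether two adjacent apostrophes mean "italics" is decided here.
    """
    tokens = []
    # The capturing group makes re.split keep the markers in its output.
    for part in re.split(f"({re.escape(italic_marker)})", text):
        if part == italic_marker:
            tokens.append(("toggle", part))
        elif part:  # skip empty strings between adjacent markers
            tokens.append(("text", part))
    return tokens
```

With the default marker, `x''y` becomes text, toggle, text; with a hypothetical marker such as `%%`, the same string stays a single text token.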
EBNF would help, however, to extend MediaWiki markup and update it. With the parse trees that an EBNF-based parser produces, we could actually update existing versions of articles to newer syntactic structures with relatively little risk. (That depends on having a perfect characterization of existing MediaWiki markup in EBNF, which might be impossible.)
So, theoretically, we could create an EBNF grammar for the current MediaWiki markup, devise a replacement for the two quotes, and have the parser output an equivalent in a newer MediaWiki markup form.
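A rough sketch of that round trip, restricted to italics only (no bold, no unbalanced markers) to keep it short: parse the old `''` markup into a list of nodes, then serialize the same nodes with a different markup. The `<i>...</i>` replacement is chosen only for illustration, and the function names are hypothetical.

```python
import re

OLD_ITALIC = "''"             # current MediaWiki italics markup
NEW_ITALIC = ("<i>", "</i>")  # replacement chosen only for this sketch


def parse_italics(text):
    """Parse text using only '' italics into ('text'|'italic', s) nodes."""
    nodes, italic = [], False
    for part in re.split(f"({OLD_ITALIC})", text):
        if part == OLD_ITALIC:
            italic = not italic  # each marker toggles italics on or off
        elif part:
            nodes.append(("italic" if italic else "text", part))
    return nodes


def serialize(nodes, markers=NEW_ITALIC):
    """Write the nodes back out using the given open/close markup."""
    open_m, close_m = markers
    return "".join(
        f"{open_m}{s}{close_m}" if kind == "italic" else s
        for kind, s in nodes
    )
```

`serialize(parse_italics("a ''b'' c"))` gives `a <i>b</i> c`, and passing `("''", "''")` as the markers reproduces the original text, which is the kind of check that makes such a conversion low-risk.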
On 1/22/07, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, What the f* is an EBNF .. It does not help that you URL is wrong. Being clever (ahum) I found Extended Backus–Naur form.. A question, do you really think that an average Wiki editor will NOT get hopelessly confused and get it hopelessly wrong as well?
Personally I find it horribly ugly as well
Virgil did not even hint that this is to be used by editors. You assumed that, and it's most probably a wrong assumption.
Maybe it's for developers? Maybe it's for research and comparison with other similar markups? Maybe it's just for the fun of it?
Personally, I'm intrigued. Virgil, could you elaborate on the purpose of this project? In what ways can it help us (and who exactly is the 'us' in this case :))?
Indeed. Having an (E)BNF grammar is invaluable for creating efficient, reliable parsers. (E)BNF grammars have an enormous number of tools available, including automatic detection of ambiguity and (as I mentioned before) automatic conversion into efficient parsers.
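As a small illustration of how (E)BNF rules map onto parser code, here is a hand-written recursive-descent parser (not Bison output) for a hypothetical grammar over already-lexed list tokens. Each grammar rule becomes one function; the grammar and the `(kind, value)` token format are assumptions for this sketch.

```python
def parse(tokens):
    """Recursive-descent parser for the hypothetical grammar

        document = { node } ;
        node     = item | list ;
        list     = open, { node }, close ;  (close repeats open's marker)
        item     = "item" ;

    Tokens are (kind, value) pairs. Raises SyntaxError on bad input.
    """
    pos = 0

    def node():
        nonlocal pos
        kind, value = tokens[pos]
        if kind == "item":
            pos += 1
            return ("item", value)
        if kind == "open":
            return list_rule()
        raise SyntaxError(f"unexpected {kind!r} token at position {pos}")

    def list_rule():
        nonlocal pos
        marker = tokens[pos][1]
        pos += 1  # consume "open"
        children = []
        while pos < len(tokens) and tokens[pos][0] != "close":
            children.append(node())
        if pos == len(tokens) or tokens[pos][1] != marker:
            raise SyntaxError(f"list opened with {marker!r} is not closed")
        pos += 1  # consume "close"
        return ("list", marker, children)

    document = []
    while pos < len(tokens):
        document.append(node())
    return document
```

A generator such as Bison would build tables from the grammar instead, but the correspondence between rules and parsing steps is the same.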
MediaWiki's parser is currently a product of years of subtle additions and compatibility fixes. Instead of being a proper parser, it relies on regular expressions.
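To illustrate why nested constructs strain a regex-based approach (an illustration only, not MediaWiki's actual code): image captions may contain links, so `[[...]]` can nest. A lazy regex such as the hypothetical `NAIVE_LINK` below stops at the first `]]`, while a scanner that tracks bracket depth, as a grammar-based parser effectively does, finds the whole span.

```python
import re

# A lazy "[[...]]" pattern of the kind a regex-based parser might use.
NAIVE_LINK = re.compile(r"\[\[(.*?)\]\]")


def outer_links(text):
    """Return the contents of top-level [[...]] spans, honouring nesting."""
    spans, depth, start, i = [], 0, 0, 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "[[":
            if depth == 0:
                start = i + 2  # content begins after the opening brackets
            depth += 1
            i += 2
        elif pair == "]]" and depth > 0:
            depth -= 1
            if depth == 0:
                spans.append(text[start:i])
            i += 2
        else:
            i += 1
    return spans
```

On `[[File:X.png|thumb|A [[link]] in a caption]]`, `NAIVE_LINK.findall` returns only `File:X.png|thumb|A [[link`, while `outer_links` returns the full caption span.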
This is not to malign Brion, Erik, and others' work. It's amazing everything works as well as it does and that they've been able to keep improving Wiki syntax without significantly changing existing semantics or syntax.
However, I'm not sure MediaWiki syntax, as it is now, can be modeled unambiguously in BNF.
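A tiny illustration of the ambiguity problem, using a deliberately naive grammar fragment that is an assumption for this sketch and not MediaWiki's real rules: if a run of apostrophes is described as `quotes = { "''" | "'''" } ;`, then five apostrophes have two derivations (italic then bold, or bold then italic). That is the kind of conflict a grammar tool reports, and a real grammar would need extra disambiguation rules to settle it.

```python
def parses(n):
    """List every way the naive rule can split n apostrophes.

    Each parse is a list of token lengths: 2 for '' and 3 for '''.
    """
    if n == 0:
        return [[]]
    result = []
    for size in (2, 3):
        if n >= size:
            result += [[size] + rest for rest in parses(n - size)]
    return result
```

`parses(5)` returns `[[2, 3], [3, 2]]`, i.e. two distinct parse trees for the same input, while four apostrophes have exactly one.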
Łukasz Garczewski wrote:
Virgil did not even hint at the fact that this is to be used by editors. You assumed that and it's most probably a wrong assumption.
Maybe it's for developers? Maybe it's for research and comparison with other similar markups? Maybe it's just for the fun of it?
Personally, I'm intrigued. Virgil, could you elaborate on the purpose of this project? In what ways can it help us (and who exactly is the 'us' in this case :))?
The current wikitext parser is basically a pile of regular-expression hacks deeply embedded in the MediaWiki codebase, which makes it hard to extend and also hard to use for other projects (like building PDFs of Wikipedia articles). So there have been occasional suggestions to build a "real" parser in a more standard way, for example by defining a grammar for the language. As far as I know, none have actually been completed, partly because Wikitext wasn't really designed with ease of formal parsing in mind.
-Mark
Virgil Ierubino wrote: [fixed URL]
I'm working on writing out an EBNF description of Wikitext at http://meta.wikimedia.org/wiki/Wikitext_Metasyntax , which I hear is much needed, but have encountered a snag. I don't think EBNF has the power to describe Wikitext. If anyone here can work out how EBNF can describe Wikitext's system for bullet points, I'd like to see it. The problem is that bullet points can build on each other, except each new level has to retain the markup from the old level, plus a new symbol. e.g. **#* then **#* * then **#** * then **#*** # etc.
If you can find an EBNF description of HTML, it would be a good place to start; in my experience, MediaWiki's Wikitext syntax is (mostly) directly translatable to HTML (e.g. links turn into anchors, bullets turn into list items).
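As a small example of that direct mapping for links, here is a sketch that turns `[[Target]]` and `[[Target|label]]` into HTML anchors. The `/wiki/` base path, the space-to-underscore rule, and the function name are assumptions for illustration; other text is passed through untouched and no other wikitext features are handled.

```python
import re
from html import escape
from urllib.parse import quote

# [[Target]] or [[Target|label]], with no nested brackets.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def links_to_html(text, base="/wiki/"):
    """Replace simple wikitext links with <a> elements (sketch only)."""
    def repl(match):
        target = match.group(1).strip()
        label = match.group(2) if match.group(2) is not None else target
        href = base + quote(target.replace(" ", "_"))
        return f'<a href="{escape(href)}">{escape(label)}</a>'

    return LINK_RE.sub(repl, text)
```

`links_to_html("See [[Main Page|home]] and [[EBNF]].")` gives `See <a href="/wiki/Main_Page">home</a> and <a href="/wiki/EBNF">EBNF</a>.`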
On the topic of "things that need to be done" if writing an EBNF of Wikitext isn't going to be beneficial for MediaWiki's longstanding success, I'll not bother, but is there anything that needs to be done like this to create a standard?
It /might/ be useful if you wanted to import/export between different wiki engines that use different syntax. IIRC, the MediaWiki syntax is (fairly) similar to that of UseModWiki (Phase 1), but vastly different from that of, e.g., DokuWiki. However, I wouldn't rate it very high on the priority list.
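For the import/export case, a minimal sketch of converting MediaWiki bold and italics to DokuWiki-style markup, assuming DokuWiki's `**bold**` and `//italic//` syntax. The function name is hypothetical, and combined bold-italic (`'''''x'''''`) and nesting are deliberately not handled.

```python
import re

# MediaWiki: '''bold''' and ''italic''; assumed DokuWiki: **bold**, //italic//.
BOLD_RE = re.compile(r"'''(.+?)'''")
ITALIC_RE = re.compile(r"''(.+?)''")


def mediawiki_to_dokuwiki(text):
    """Convert simple MediaWiki bold/italic markup to DokuWiki style.

    Bold is converted first so that ''' is not read as '' plus a stray
    apostrophe.
    """
    text = BOLD_RE.sub(r"**\1**", text)
    return ITALIC_RE.sub(r"//\1//", text)
```

`mediawiki_to_dokuwiki("''a'' and '''b'''")` gives `//a// and **b**`.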
It seems unrealistic to me to try to get the full MediaWiki syntax represented in EBNF: there are too many grammar issues outside its abilities.
If I were you, I would start with an agreed-upon subset. We had a session on this at WikiSym 2006 which led to Wiki Creole, see www.wikicreole.org (please note Brion was there).
I think you might want to sync up with the Wiki Creole people, who are C. Sauer and C. Smith together with several leading wiki engine implementers.
Also, wikitech-l/research-l or wiki-standards/wiki-research seem better places for discussing this.
Finally, I hope you'll submit your results to WikiSym (www.wikisym.org). We are all in desperate need of a good grammar and semantics definition.
Dirk
On 1/22/07, Alphax (Wikipedia email) alphasigmax@gmail.com wrote:
-- Alphax - http://en.wikipedia.org/wiki/User:Alphax Contributor to Wikipedia, the Free Encyclopedia "We make the internet not suck" - Jimbo Wales Public key: http://en.wikipedia.org/wiki/User:Alphax/OpenPGP
wikimedia-l@lists.wikimedia.org