[crossposted to foundation-l and wikitech-l]
"There has to be a vision though, of something better. Maybe something that is an actual wiki, quick and easy, rather than the template coding hell Wikipedia's turned into." - something Fred Bauder just said on wikien-l.
Our current markup is one of our biggest barriers to participation.
AIUI, edit rates are about half what they were in 2005, even as our fame has gone from "popular" through "famous" to "part of the structure of the world." I submit that this is not a good or healthy thing in any way and needs fixing.
People who can handle wikitext really just do not understand how off-putting the computer guacamole is to people who can cope with text they can see.
We know this is a problem; WYSIWYG that works is something that's been wanted here forever. There are various hideous technical nightmares in its way, that make this a big and hairy problem, of the sort where the hair has hair.
However, I submit that it's important enough we need to attack it with actual resources anyway.
This is just one data point, where a Canadian government office got *EIGHT TIMES* the participation in their intranet wiki by putting in a (heavily locally patched) copy of FCKeditor:
http://lists.wikimedia.org/pipermail/mediawiki-l/2010-May/034062.html
"I have to disagree with you given my experience. In one government department where MediaWiki was installed we saw the active user base spike from about 1000 users to about 8000 users within a month of having enabled FCKeditor. FCKeditor definitely has it's warts, but it very closely matches the experience non-technical people have gotten used to while using Word or WordPerfect. Leveraging skills people already have cuts down on training costs and allows them to be productive almost immediately."
http://lists.wikimedia.org/pipermail/mediawiki-l/2010-May/034071.html
"Since a plethora of intelligent people with no desire to learn WikiCode can now add content, the quality of posts has been in line with the adoption of wiki use by these people. Thus one would say it has gone up.
"In the beginning there were some hard core users that learned WikiCode, for the most part they have indicated that when the WYSIWYG fails, they are able to switch to WikiCode mode to address the problem. This usually occurs with complex table nesting which is something that few of the users do anyways. Most document layouts are kept simple. Additionally, we have a multilingual english/french wiki. As a result the browser spell-check is insufficient for the most part (not to mention it has issues with WikiCode). To address this a second spellcheck button was added to the interface so that both english and french spellcheck could be available within the same interface (via aspell backend)."
So, the payoffs could be ridiculously huge: eight times the number of smart and knowledgeable people even being able to *fix typos* on material they care about.
Here are some problems. (Off the top of my head; please do add more, all you can think of.)
- The problem:
* Fidelity with the existing body of wikitext. No conversion flag day. The current body exploits every possible edge case in the regular expression guacamole we call a "parser". Tim said a few years ago that any solution has to account for the existing body of text.
* Two-way fidelity. Those who know wikitext will demand to keep it and will bitterly resist any attempt to take it away from them.
* FCKeditor (now CKeditor) in MediaWiki is all but unmaintained.
* There is no specification for wikitext. Well, there almost is - compiled as C, it runs a bit slower than the existing PHP compiler. But it's a start! http://lists.wikimedia.org/pipermail/wikitext-l/2010-August/000318.html
- Attempting to solve it:
* The best brains around Wikipedia, MediaWiki and WMF have dashed their foreheads against this problem for at least the past five years and have got *nowhere*. Tim has a whole section in the SVN repository for "new parser attempts". Sheer brilliance isn't going to solve this one.
* Tim doesn't scale. Most of our other technical people don't scale. *We have no resources and still run on almost nothing*.
($14m might sound like enough money to run a popular website, but for comparison, I work as a sysadmin at a tiny, tiny publishing company with more money and staff just in our department than that to do *almost nothing* compared to what WMF achieves. WMF is an INCREDIBLY efficient organisation.)
- Other attempts:
* Starting from a clear field makes it ridiculously easy. The government example quoted above is one. Wikia wrote a good WYSIWYG that works really nicely on new wikis (I'm speaking here as an experienced wikitext user who happily fixes random typos on Wikia). Of course, I noted that we can't start from a clear field - we have an existing body of wikitext.
So, specification of the problem:
* We need good WYSIWYG. The government example suggests that a simple word-processor-like interface would be enough to give tremendous results.
* It needs two-way fidelity with almost all existing wikitext.
* We can't throw away existing wikitext, much as we'd love to.
* It's going to cost money in programming the WYSIWYG.
* It's going to cost money in rationalising existing wikitext so that the most unfeasible formations can be shunted off to legacy for chewing on.
* It's going to cost money in usability testing and so on.
* It's going to cost money for all sorts of things I haven't even thought of yet.
This is a problem that would pay off hugely to solve, and that will take actual money thrown at it.
How would you attack this problem, given actual resources for grunt work?
- d.
There are some things that we know:
1) as Brion says, MediaWiki currently only presents content in one way: as wikitext run through the parser. He may well be right that there is a bigger fish which could be caught than WYSIWYG editing by saying that MW should present data in other new and exciting ways, but that's actually a separate question. *If* you wish to solve WYSIWYG editing, your baseline is wikitext and the parser.
2) "guacamole" is one of the more unusual descriptors I've heard for the parser, but it's far from the worst. We all agree that it's horribly messy and most developers treat it like either a sleeping dragon or a *very* grumpy neighbour. I'd say that the two biggest problems with it are that a) it's buried so deep in the codebase that literally the only way to get your wikitext parsed is to fire up the whole of the rest of MediaWiki around it to give it somewhere comfy to live in, and b) there is as David says no way of explaining what it's supposed to be doing except saying "follow the code; whatever it does is what it's supposed to do". It seems to be generally accepted that it is *impossible* to represent everything the parser does in any standard grammar.
Those are all standard gripes, and nothing new or exciting. There are also, to quote a much-abused former world leader, some known unknowns:
1) we don't know how to explain What You See when you parse wikitext except by prodding an exceedingly grumpy hundred thousand lines of PHP and *asking What it thinks* You Get.
2) We don't know how to create a WYSIWYG editor for wikitext.
Now, I'd say we have some unknown unknowns.
1) *is* it because of wikitext's idiosyncrasies that WYSIWYG is so difficult? Is wikitext *by its nature* not amenable to WYSIWYG editing?
2) would a wikitext which *was* representable in a standard grammar be amenable to WYSIWYG editing?
3) would a wikitext which had an alternative parser, one that was not buried in the depths of MW (perhaps a full JS library that could be called in real-time on the client), be amenable to WYSIWYG editing?
4) are questions 2 and 3 synonymous?
--HM
"David Gerard" dgerard@gmail.com wrote in message news:AANLkTimthUx-UndO1CTnexcRqbPP89t2M-PVhA6FkFp8@mail.gmail.com...
[crossposted to foundation-l and wikitech-l]
"There has to be a vision though, of something better. Maybe something that is an actual wiki, quick and easy, rather than the template coding hell Wikipedia's turned into." - something Fred Bauder just said on wikien-l.
Our current markup is one of our biggest barriers to participation.
AIUI, edit rates are about half what they were in 2005, even as our fame has gone from "popular" through "famous" to "part of the structure of the world." I submit that this is not a good or healthy thing in any way and needs fixing.
People who can handle wikitext really just do not understand how offputting the computer guacamole is to people who can cope with text they can see.
We know this is a problem; WYSIWYG that works is something that's been wanted here forever. There are various hideous technical nightmares in its way, that make this a big and hairy problem, of the sort where the hair has hair.
However, I submit that it's important enough we need to attack it with actual resources anyway.
This is just one data point, where a Canadian government office got *EIGHT TIMES* the participation in their intranet wiki by putting in a (heavily locally patched) copy of FCKeditor:
http://lists.wikimedia.org/pipermail/mediawiki-l/2010-May/034062.html
"I have to disagree with you given my experience. In one government department where MediaWiki was installed we saw the active user base spike from about 1000 users to about 8000 users within a month of having enabled FCKeditor. FCKeditor definitely has it's warts, but it very closely matches the experience non-technical people have gotten used to while using Word or WordPerfect. Leveraging skills people already have cuts down on training costs and allows them to be productive almost immediately."
http://lists.wikimedia.org/pipermail/mediawiki-l/2010-May/034071.html
"Since a plethora of intelligent people with no desire to learn WikiCode can now add content, the quality of posts has been in line with the adoption of wiki use by these people. Thus one would say it has gone up.
"In the beginning there were some hard core users that learned WikiCode, for the most part they have indicated that when the WYSIWYG fails, they are able to switch to WikiCode mode to address the problem. This usually occurs with complex table nesting which is something that few of the users do anyways. Most document layouts are kept simple. Additionally, we have a multilingual english/french wiki. As a result the browser spell-check is insufficient for the most part (not to mention it has issues with WikiCode). To address this a second spellcheck button was added to the interface so that both english and french spellcheck could be available within the same interface (via aspell backend)."
So, the payoffs could be ridiculously huge: eight times the number of smart and knowledgeable people even being able to *fix typos* on material they care about.
Here are some problems. (Off the top of my head; please do add more, all you can think of.)
- The problem:
- Fidelity with the existing body of wikitext. No conversion flag day.
The current body exploits every possible edge case in the regular expression guacamole we call a "parser". Tim said a few years ago that any solution has to account for the existing body of text.
- Two-way fidelity. Those who know wikitext will demand to keep it and
will bitterly resist any attempt to take it away from them.
FCKeditor (now CKeditor) in MediaWiki is all but unmaintained.
There is no specification for wikitext. Well, there almost is -
compiled as C, it runs a bit slower than the existing PHP compiler. But it's a start! http://lists.wikimedia.org/pipermail/wikitext-l/2010-August/000318.html
- Attempting to solve it:
- The best brains around Wikipedia, MediaWiki and WMF have dashed
their foreheads against this problem for at least the past five years and have got *nowhere*. Tim has a whole section in the SVN repository for "new parser attempts". Sheer brilliance isn't going to solve this one.
- Tim doesn't scale. Most of our other technical people don't scale.
*We have no resources and still run on almost nothing*.
($14m might sound like enough money to run a popular website, but for comparison, I work as a sysadmin at a tiny, tiny publishing company with more money and staff just in our department than that to do *almost nothing* compared to what WMF achieves. WMF is an INCREDIBLY efficient organisation.)
- Other attempts:
- Starting from a clear field makes it ridiculously easy. The
government example quoted above is one. Wikia wrote a good WYSIWYG that works really nicely on new wikis (I'm speaking here as an experienced wikitext user who happily fixes random typos on Wikia). Of course, I noted that we can't start from a clear field - we have an existing body of wikitext.
So, specification of the problem:
- We need good WYSIWYG. The government example suggests that a simple
word-processor-like interface would be enough to give tremendous results.
- It needs two-way fidelity with almost all existing wikitext.
- We can't throw away existing wikitext, much as we'd love to.
- It's going to cost money in programming the WYSIWYG.
- It's going to cost money in rationalising existing wikitext so that
the most unfeasible formations can be shunted off to legacy for chewing on.
- It's going to cost money in usability testing and so on.
- It's going to cost money for all sorts of things I haven't even
thought of yet.
This is a problem that would pay off hugely to solve, and that will take actual money thrown at it.
How would you attack this problem, given actual resources for grunt work?
- d.
foundation-l mailing list foundation-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/foundation-l
Hi,
When this topic was raised a few years ago (I don't remember which time; it's been discussed continually over the years), I found an idea especially interesting, but it got buried in the mass.
From memory and imagination:
The idea is to write a new parser that is not buried deep in MediaWiki, can therefore be used apart from MediaWiki, and is fairly easy to translate to, for example, JavaScript.
This parser accepts input similar to what we have now (i.e. '''bold''', {{template}}, [[link|text]], etc.) but is totally rewritten with more logical behaviour. Call it a 2.0 parser, with no worries about compatibility with old wikitext that (ab)uses the edge cases of the current parser.
This would become the default in MediaWiki for newly created pages, indicated by an int in the revision table (i.e. rev_pv, for parser version). A WYSIWYG editor can be written for this in JavaScript, and it's great.
So what about articles with the old parser (i.e. rev_pv=NULL / rev_pv=1)? No problem: the old parser sticks around for a while, and such articles simply don't have a WYSIWYG editor.
Editing articles with the old parser will show a small notice on top (like the one for pages larger than x bytes due to old browser limits) offering an option to 'switch'. That would preview the page's wikitext through the new parser. The user can then make adjustments as needed to make it look good again (if necessary at all) and save the page (which saves the new revision with rev_pv=2, as it would for new articles).
Since lots of articles will likely produce the same HTML output and require no modification whatsoever, a script could be written (either as a user bot or as a maintenance script) that checks all pages whose last revision has the old rev_pv, compares them to the output of the new parser, and automatically updates the rev_pv field if the output matches. All others would be listed on a special page for "pages whose last revision used an older parser version", with a link to an MW.org page giving an overview of a few things that regulars may want to know (i.e. the most common differences).
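A rough sketch of what such a maintenance script might look like (the rev_pv column, the simplified schema, and the parse_old/parse_new hooks are all hypothetical, just to make the idea concrete; real MediaWiki stores revision text in a separate table):

# Hypothetical sketch of the bulk rev_pv upgrade pass described above.
import sqlite3

def upgrade_parser_version(db_path, parse_old, parse_new):
    """parse_old/parse_new are stand-ins for the current and the 2.0 parser."""
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    # Only look at latest revisions still marked as old-parser (rev_pv NULL or 1).
    cur.execute("SELECT rev_id, rev_text FROM revision "
                "WHERE rev_pv IS NULL OR rev_pv = 1")
    for rev_id, wikitext in cur.fetchall():
        if parse_old(wikitext) == parse_new(wikitext):
            # Identical rendering: safe to flag as parser version 2 automatically.
            cur.execute("UPDATE revision SET rev_pv = 2 WHERE rev_id = ?", (rev_id,))
        # Otherwise leave it for the special page and a human to review.
    conn.commit()
    conn.close()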
Just an idea :) -- Krinkle
On 29 December 2010 02:07, Happy-melon happy-melon@live.com wrote:
There are some things that we know:
- as Brion says, MediaWiki currently only presents content in one way: as
wikitext run through the parser. He may well be right that there is a bigger fish which could be caught than WYSIWYG editing by saying that MW should present data in other new and exciting ways, but that's actually a separate question. *If* you wish to solve WYSIWYG editing, your baseline is wikitext and the parser.
Specifically, it only presents content as HTML. It's not really a parser because it doesn't create an AST (Abstract Syntax Tree). It's a wikitext to HTML converter. The flavour of the HTML can be somewhat modulated by the skin but it could never output directly to something totally different like RTF or PDF.
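To make the distinction concrete, a toy sketch (handling only '''bold''', nothing like real wikitext coverage): a converter gives you one output format and nothing else, while a parser that builds a tree lets you hang any number of renderers off the same structure.

# Toy illustration of "converter" vs "parser"; only handles '''bold'''.
import re

# Converter: wikitext goes straight to HTML, nothing else is recoverable.
def convert_to_html(wikitext):
    return re.sub(r"'''(.*?)'''", r"<b>\1</b>", wikitext)

# Parser: build a tiny AST first...
def parse(wikitext):
    return [("bold" if i % 2 else "text", chunk)
            for i, chunk in enumerate(re.split(r"'''", wikitext))]

# ...then any renderer can walk the same tree: HTML, plain text, RTF, PDF.
def render_html(ast):
    return "".join(f"<b>{t}</b>" if kind == "bold" else t for kind, t in ast)

def render_plain(ast):
    return "".join(t.upper() if kind == "bold" else t for kind, t in ast)

ast = parse("some '''bold''' text")
print(render_html(ast))   # some <b>bold</b> text
print(render_plain(ast))  # some BOLD text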
- "guacamole" is one of the more unusual descriptors I've heard for the
parser, but it's far from the worst. We all agree that it's horribly messy and most developers treat it like either a sleeping dragon or a *very* grumpy neighbour. I'd say that the two biggest problems with it are that a) it's buried so deep in the codebase that literally the only way to get your wikitext parsed is to fire up the whole of the rest of MediaWiki around it to give it somewhere comfy to live in,
I have started to advocate the isolation of the parser from the rest of the innards of MediaWiki for just this reason: https://bugzilla.wikimedia.org/show_bug.cgi?id=25984
Free it up so that anybody can embed it in their code and get exactly the same rendering that Wikipedia et al get, guaranteed.
We have to find all the edges where the parser calls other parts of MediaWiki and all the edges where other parts of MediaWiki call the parser. We then define these edges as interfaces so that we can drop an alternative parser into MediaWiki and drop the current parser into say an offline viewer or whatever.
With a freed up parser more people will hack on it, more people will come to grok it and come up with strategies to address some of its problems. It should also be a boon for unit testing.
(I have a very rough prototype working by the way with lots of stub classes)
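The shape of that isolation, very roughly (all names here are invented for illustration; the prototype and bug 25984 will define the real edges): the services the parser needs from its host become an explicit interface, and MediaWiki, an offline viewer, or a test harness each supply their own implementation.

# Sketch of the "edges as interfaces" idea, with invented names: the parser
# only ever talks to a ParserEnvironment, never to the rest of MediaWiki.
from abc import ABC, abstractmethod

class ParserEnvironment(ABC):
    """Everything the parser needs from its host, and nothing more."""

    @abstractmethod
    def get_template(self, title):
        """Return raw wikitext for {{title}}, however the host stores it."""

    @abstractmethod
    def link_exists(self, title):
        """Used to decide red vs blue links."""

class StandaloneEnvironment(ParserEnvironment):
    """A host-free environment, e.g. for unit tests or an offline viewer."""
    def __init__(self, templates=None, pages=None):
        self.templates = templates or {}
        self.pages = set(pages or [])

    def get_template(self, title):
        return self.templates.get(title, "")

    def link_exists(self, title):
        return title in self.pages

def parse(wikitext, env):
    # The real parser body goes here; it may only call methods on env.
    raise NotImplementedError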
and b) there is as David says no way of explaining what it's supposed to be doing except saying "follow the code; whatever it does is what it's supposed to do". It seems to be generally accepted that it is *impossible* to represent everything the parser does in any standard grammar.
I've thought a lot about this too. It certainly is not any type of standard grammar. But on the other hand it is a pretty common kind of nonstandard grammar. I call it a "recursive text replacement grammar".
Perhaps this type of grammar has some useful characteristics we can discover and document. It may be possible to follow the code flow and document each text replacement in sequence as a kind of parser spec rather than trying and failing again to shoehorn it into a standard LALR grammar.
If it is possible to extract such a spec it would then be possible to implement it in other languages.
Some research may even find that it is possible to transform such a grammar deterministically into an LALR grammar...
But even if not, I'm certain it would demystify what happens in the parser so that problems and edge cases would be easier to locate.
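As a toy example of what "documenting each text replacement in sequence" could look like (three invented passes, nowhere near the real parser's list):

# A miniature "recursive text replacement grammar": an ordered list of
# passes, each a text replacement, re-applied until a fixed point because
# one replacement (e.g. template expansion) can feed another. The passes
# are invented examples; a spec would enumerate the real parser's passes.
import re

PASSES = [
    ("templates", r"\{\{(\w+)\}\}",    r"<template name='\1'/>"),
    ("bold",      r"'''(.+?)'''",      r"<b>\1</b>"),
    ("links",     r"\[\[([^]|]+)\]\]", r"<a href='\1'>\1</a>"),
]

def expand(text):
    while True:
        before = text
        for _name, pattern, replacement in PASSES:
            text = re.sub(pattern, replacement, text)
        if text == before:
            return text

print(expand("'''hi''' [[Foo]] {{stub}}"))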
Andrew Dunbar (hippietrail)
2010-12-29 08:33, Andrew Dunbar wrote:
I've thought a lot about this too. It certainly is not any type of standard grammar. But on the other hand it is a pretty common kind of nonstandard grammar. I call it a "recursive text replacement grammar".
Perhaps this type of grammar has some useful characteristics we can discover and document. It may be possible to follow the code flow and document each text replacement in sequence as a kind of parser spec rather than trying and failing again to shoehorn it into a standard LALR grammar.
If it is possible to extract such a spec it would then be possible to implement it in other languages.
Some research may even find that it is possible to transform such a grammar deterministically into an LALR grammar...
But even if not, I'm certain it would demystify what happens in the parser so that problems and edge cases would be easier to locate.
From my experience of implementing a wikitext parser, I would say that it might be possible to transform wikitext to a token stream that can be parsed with an LALR parser. My implementation (http://svn.wikimedia.org/svnroot/mediawiki/trunk/parsers/libmwparser) uses Antlr (which is an LL parser generator) and only relies on context-sensitive parsing (Antlr's semantic predicates) for parsing apostrophes (bold and italics), and this might be possible to solve in a different way. The rest of the complex cases are handled by the lexical analyser, which produces a well-behaved token stream that can be parsed relatively straightforwardly.
My implementation is not 100% compatible, but I think that a 100% compatible parser is not desirable since the most exotic border cases would probably be characterized as "bugs" anyway (e.g. [[Link|<table class="]]">). But I think that the basic idea can be used to produce a sufficiently compatible parser.
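For anyone who hasn't looked at the code, the general shape of the approach is roughly this (a much-simplified sketch, not libmwparser itself, which uses Antlr): the lexer absorbs the ugly, context-sensitive decisions and hands the parser a clean token stream.

# Much-simplified sketch of "the lexer does the hard work, the parser stays
# simple". Runs of apostrophes are emitted as a single token, and the parser
# decides later whether they mean bold, italics, or literal quotes.
import re

TOKEN_SPEC = [
    ("APOSTROPHES", r"'{2,}"),
    ("LINK_OPEN",   r"\[\["),
    ("LINK_CLOSE",  r"\]\]"),
    ("PIPE",        r"\|"),
    ("TEXT",        r"[^'\[\]|]+|."),   # anything else, one chunk at a time
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(wikitext):
    """Yield (kind, value) pairs; every character lands in exactly one token."""
    for m in MASTER.finditer(wikitext):
        yield m.lastgroup, m.group()

print(list(tokenize("'''bold''' and [[Link|text]]")))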
Best Regards,
/Andreas
On 3 January 2011 21:54, Andreas Jonsson andreas.jonsson@kreablo.se wrote:
In that case what is needed is to hook your parser into our current code and get it create output if you have not done that already. Then you will want to run the existing parser tests on it. Then you will want to run both parsers over a large sample of existing Wikipedia articles (make sure you use the same revisions on both parsers!) and run them through diff. Then we'll have a decent idea of whether there are any edge cases you didn't spot or whether any of them are exploited in template magic.
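Something along these lines would do as a first harness (the helper names are invented; the real thing would pull revisions from a dump or the API):

# Sketch of the two-parser comparison run: render the same revisions with
# both parsers and collect diffs. render_old/render_new stand in for the
# existing parser and the new one; get_sample_revisions would read a dump.
import difflib

def compare_parsers(get_sample_revisions, render_old, render_new, limit=1000):
    mismatches = []
    for title, rev_id, wikitext in get_sample_revisions(limit):
        old_html = render_old(wikitext).splitlines()
        new_html = render_new(wikitext).splitlines()
        if old_html != new_html:
            diff = "\n".join(difflib.unified_diff(
                old_html, new_html,
                fromfile=f"{title}@{rev_id} (old)",
                tofile=f"{title}@{rev_id} (new)",
                lineterm=""))
            mismatches.append((title, rev_id, diff))
    return mismatches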
Let us know the results!
Andrew Dunbar (hippietrail)
I've been inspired by the discussion David Gerard and Brion Vibber kicked off, and I think they are headed in the right direction.
But I just want to ask a separate, but related question.
Let's imagine you wanted to start a rival to Wikipedia. Assume that you are motivated by money, and that venture capitalists promise you can be paid gazillions of dollars if you can do one, or many, of the following:
1 - Become a more attractive home to the WP editors. Get them to work on your content.
2 - Take the free content from WP, and use it in this new system. But make it much better, in a way Wikipedia can't match.
3 - Attract even more readers, or perhaps a niche group of super-passionate readers that you can use to build a new community.
In other words, if you had no legacy, and just wanted to build something from zero, how would you go about creating an innovation that was disruptive to Wikipedia, in fact something that made Wikipedia look like Friendster or Myspace compared to Facebook?
And there's a followup question to this -- but you're all smart people and can guess what it is.
Neil Kandalgaonkar wrote:
Let's imagine you wanted to start a rival to Wikipedia. Assume that you are motivated by money, and that venture capitalists promise you can be paid gazillions of dollars if you can do one, or many, of the following:
1 - Become a more attractive home to the WP editors. Get them to work on your content.
2 - Take the free content from WP, and use it in this new system. But make it much better, in a way Wikipedia can't match.
3 - Attract even more readers, or perhaps a niche group of super-passionate readers that you can use to build a new community.
In other words, if you had no legacy, and just wanted to build something from zero, how would you go about creating an innovation that was disruptive to Wikipedia, in fact something that made Wikipedia look like Friendster or Myspace compared to Facebook?
And there's a followup question to this -- but you're all smart people and can guess what it is.
[quote] The "Viable alternative to Wikipedia" isn't going to be another Mediawiki site, in any event - it's going to be something that someone puts some real effort into developing the software for, not to mention the user experience... [/quote]
http://wikipediareview.com/index.php?showtopic=31808&view=findpost&p...
I largely agree with that.
You can't make more than broad generalizations about what a "Wikipedia killer" would be. If there were a concrete answer or set of answers, Wikipedia would be dead already. A number of organizations and companies have tried to replicate Wikipedia's success (e.g. Wikia) with varying degrees of success. The most common factor to past Wikipedia competitors has been MediaWiki (though if someone can refute this, please do). To me (and others), that leaves the question of what would happen if you wrote some software that was actually built for making an encyclopedia, rather than the jack of all trades product that MediaWiki is.
As for follow-up questions, be explicit.
MZMcBride
2010/12/29 MZMcBride z@mzmcbride.com
Neil Kandalgaonkar wrote:
Let's imagine you wanted to start a rival to Wikipedia. Assume that you are motivated by money, and that venture capitalists promise you can be paid gazillions of dollars if you can do one, or many, of the following:
1 - Become a more attractive home to the WP editors. Get them to work on your content.
2 - Take the free content from WP, and use it in this new system. But make it much better, in a way Wikipedia can't match.
3 - Attract even more readers, or perhaps a niche group of super-passionate readers that you can use to build a new community.
In other words, if you had no legacy, and just wanted to build something from zero, how would you go about creating an innovation that was disruptive to Wikipedia, in fact something that made Wikipedia look like Friendster or Myspace compared to Facebook?
And there's a followup question to this -- but you're all smart people and can guess what it is.
It's simply the rule of evolution! The day this happens - when something appears that collects all the best of the wiki, adds something better on top, and succeeds - the wiki will slowly disappear. But all the best of the wiki will survive into the emerging "species"... Where's the problem, if you consider the wiki not in terms of competition but in terms of utility? I'm actively working for wiki principles, not at all for the wiki project! I hope this will not be considered offensive to the wiki community.
Alex
On 29 December 2010 08:24, MZMcBride z@mzmcbride.com wrote:
To me (and others), that leaves the question of what would happen if you wrote some software that was actually built for making an encyclopedia, rather than the jack of all trades product that MediaWiki is.
MediaWiki is precisely that software. And there's any number of specialist wikis using it that are basically Wikipedia in a specialist area.
This sounds like "software that looks to me on the surface like it was actually built for making an encyclopedia". This is, of course, not at all the same as success.
Note the number of competitors, forks and complements that have already beached, having assumed "I can do a better Wikipedia if it fits my idea of how an encyclopedia *should* look" and been dead wrong. You're reasoning from the assumption of no knowledge, rather than one of considerable knowledge.
- d.
David Gerard wrote:
On 29 December 2010 08:24, MZMcBride z@mzmcbride.com wrote:
To me (and others), that leaves the question of what would happen if you wrote some software that was actually built for making an encyclopedia, rather than the jack of all trades product that MediaWiki is.
MediaWiki is precisely that software. And there's any number of specialist wikis using it that are basically Wikipedia in a specialist area.
No, I don't think MediaWiki is precisely that software. MediaWiki is a wiki engine that can be used for a variety of purposes. It may have started out as a tool to make an encyclopedia, but very shortly after its mission drifted.
Since Wikipedia's creation, there have been countless debates about what an "encyclopedia" is. However, at a most basic level, we can say that an encyclopedia is its content. As an exercise, try retrieving the first sentence of every article on the English Wikipedia. You'll quickly discover it's a real pain in the ass. Or try extracting the birth year from every living person's article on the English Wikipedia that uses an infobox. Even more of a difficult task, if not an impossible one.
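To see why, try it: about the best you can do today is regex-scrape the raw template call, and that breaks on every formatting variation editors actually use (a deliberately naive sketch):

# Deliberately naive sketch of pulling a birth year out of infobox wikitext.
# It only copes with one common formatting of the date, which is exactly the
# problem: the data is trapped inside presentation markup.
import re

def birth_year(wikitext):
    m = re.search(
        r"\|\s*birth_date\s*=\s*\{\{[Bb]irth date(?: and age)?\s*\|(\d{4})",
        wikitext)
    return int(m.group(1)) if m else None

sample = ("{{Infobox person\n"
          "| name = Albert Einstein\n"
          "| birth_date = {{Birth date|1879|3|14}}\n"
          "}}")
print(birth_year(sample))  # 1879 -- until someone writes the date differently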
MediaWiki was designed to fit a number of ideas: free dictionary, free encyclopedia, free news site, free media repo, etc. And thus its design has been held back in many areas in order to ensure that any change doesn't break its various use-cases.
How do you build a better Wikipedia? By building software designed to make an encyclopedia. That leaves two options: abandon MediaWiki or re-focus MediaWiki. The current MediaWiki will never lead to a "Wikipedia killer." I firmly believe that.
Assuming you focused on only building a better encyclopedia, a MediaWiki 2.0 would put meta-content in a separate area, so that clicking edit doesn't stab the user in the eye with nasty infobox content. MW2.0 would use actual input forms for data, instead of the completely hackish hellhole that is "[[Category:]]" and "{{Infobox |param}}". MW2.0 would standardize and normalize template parameters to something more sane and would allow categories to be added, removed, and moved without divine intervention (and a working knowledge of Python). MW2.0 would have the ability to edit pages without knowing an esoteric, confusing, and non-standardized markup.
All of this (and much more) is possible, but it requires killing the one-size-fits-all model that allows MediaWiki to work (ehhh, function) as a dictionary, media repository, news site, etc. For an encyclopedia, you want to use categories and make a category interface as nice as possible, for example. For a media repository, the categories are in[s]ane and would be replaced by tags. And we won't begin to discuss the changes needed to make Wiktionary not the scrambled, hacked-up mess that it currently is.
You make a "Wikipedia killer" by building software that's actually designed to kill Wikipedia, not kill Wikipedia, Wiktionary, Wikimedia Commons, Wikinews, and whatever else.
This sounds like "software that looks to me on the surface like it was actually built for making an encyclopedia". This is, of course, not at all the same as success.
I'm not sure what "this" is. Can you clarify?
MZMcBride
On 29 December 2010 11:21, MZMcBride z@mzmcbride.com wrote:
David Gerard wrote:
MediaWiki is precisely that software. And there's any number of specialist wikis using it that are basically Wikipedia in a specialist area.
No, I don't think MediaWiki is precisely that software. MediaWiki is a wiki engine that can be used for a variety of purposes. It may have started out as a tool to make an encyclopedia, but very shortly after its mission drifted. MediaWiki was designed to fit a number of ideas: free dictionary, free encyclopedia, free news site, free media repo, etc. And thus its design has been held back in many areas in order to ensure that any change doesn't break its various use-cases.
No, it's pretty much a simple free-encyclopedia engine. Ask people on the other projects about how hard it is to get anyone interested in what they need.
This sounds like "software that looks to me on the surface like it was actually built for making an encyclopedia". This is, of course, not at all the same as success.
I'm not sure what "this" is. Can you clarify?
Your original statement of what you thought was needed. I'm not convinced that putting more policy into the engine will make a Wikipedia killer. There's already rather a lot of encyclopedia-directed policy in the WMF deploy.
- d.
On 12/29/10 3:21 AM, MZMcBride wrote:
David Gerard wrote:
On 29 December 2010 08:24, MZMcBride z@mzmcbride.com wrote:
To me (and others), that leaves the question of what would happen if you wrote some software that was actually built for making an encyclopedia, rather than the jack of all trades product that MediaWiki is.
MediaWiki is precisely that software. And there's any number of specialist wikis using it that are basically Wikipedia in a specialist area.
No, I don't think MediaWiki is precisely that software. MediaWiki is a wiki engine that can be used for a variety of purposes. It may have started out as a tool to make an encyclopedia, but very shortly after its mission drifted.
I agree. If MediaWiki is software for creating an encyclopedia then why are the tools for creating references and footnotes optional extras? It's rather difficult to set up MediaWiki so that it works in a manner similar to Wikipedia or Wikimedia Commons (and I know this from experience).
Scale matters too. Right now, any new feature for MediaWiki has to be considered in the light of some tiny organization running it on an aging Windows PC, as well as running a top ten website. It's amazing that MediaWiki has managed to bridge this gap at all, but it's come at a noticeable cost.
Question: assuming that our primary interest is creating software for Wikipedia and similar WMF projects, do we actually get anything from the Windows PC intranet users that offsets the cost of keeping MediaWiki friendly to both environments? In other words, do we get contributions from them that help us do Wikipedia et al.?
MW2.0 would use actual input forms for data, instead of the completely hackish hellhole that is "[[Category:]]" and "{{Infobox |param}}". MW2.0 would standardize and normalize template parameters to something more sane and would allow categories to be added, removed, and moved without divine intervention (and a working knowledge of Python). MW2.0 would have the ability to edit pages without knowing an esoteric, confusing, and non-standardized markup.
+1
I think it is vital to keep templates -- it's a whole new layer of creativity and MediaWiki's shown that many powerful features can come about that way. That said, we also want a template to have some guarantee of sane inputs and outputs, and maybe a template could also suggest how its data should be indexed in search engines or for internal search, or how to create a friendly GUI interface. Perhaps that would be impossible for all cases, but if XML made it easy in 95% of cases, I'd take that tradeoff.
As a programmer I am somewhat dismayed at the atrocities that have been perpetrated with MediaWiki. (Wikimedia's hacking language variants to show licensing options is my favorite). As someone who believes that given freedom, users will create amazing things, I'm blown away by the creativity. I think those show the need for a *more* powerful template system, that is hopefully easier to use.
Maybe this is anathema here, but XML seems like a logical choice to me. While inefficient to type in raw code, it is widely understood, and we can use existing tools to make WYSIWYG editors easily. So perhaps an infobox could be edited with a sort of form that was autogenerated from some metadata that described the possible contents of an infobox.
Also, XML can encapsulate data and even other templates to an infinite degree. A few months ago somebody asked how they could implement a third layer of "quoting" in the geocoding template syntax and it just seemed to me like this problem shouldn't have to exist.
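Roughly what I have in mind, just to make it concrete (all names invented, nothing here exists in MediaWiki): a small XML descriptor for a template's parameters, from which a form, a search index hint, or documentation could be generated.

# Invented example of a template descriptor and the tooling it enables:
# the XML declares an infobox's parameters, and a trivial generator turns
# that into form fields for a WYSIWYG editor.
import xml.etree.ElementTree as ET

DESCRIPTOR = """
<template name="Infobox person">
  <param name="name"       type="string" required="true" label="Name"/>
  <param name="birth_date" type="date"                   label="Date of birth"/>
  <param name="occupation" type="string"                 label="Occupation"/>
</template>
"""

def form_fields(descriptor_xml):
    """Yield (label, input type, required?) for a form builder."""
    root = ET.fromstring(descriptor_xml)
    for param in root.findall("param"):
        yield (param.get("label"),
               param.get("type"),
               param.get("required") == "true")

for label, kind, required in form_fields(DESCRIPTOR):
    print(f"{label}: <{kind} input>{' *' if required else ''}")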
Question: assuming that our primary interest is creating software for Wikipedia and similar WMF projects, do we actually get anything from the Windows PC intranet users that offsets the cost of keeping MediaWiki friendly to both environments? In other words, do we get contributions from them that help us do Wikipedia et al.?
As someone who originally started contributing from maintaining a small MediaWiki instance, I kind of dislike this question. I also don't think we should be mixing "we" when discussing WMF and MediaWiki.
But to answer your question: yes. We get contributions, we get employees, and we get a larger, more vibrant community. A number of contributors come from enterprises and small shops, but they often don't contribute directly to Wikimedia projects. However, their contributions often allow other people to use the software in environments they couldn't be used in otherwise (LDAP authentication is a perfect example of this). The people who then get to use the software may turn into contributors that do benefit WMF.
MediaWiki is created primarily for WMF use, but a lot of other people depend on it. I advocate the use of the software by everyone, and emphasize in talks that we want contributions from everyone, even if they don't benefit WMF. I don't think we should discourage this. We should really try harder to embrace enterprise users to get *more* non-WMF specific extensions and features.
It doesn't take that much effort to keep core small, and maintain extensions for WMF use. I honestly don't think this is a limiting factor to the usability of WMF projects, either.
Respectfully,
Ryan Lane
On Wed, Dec 29, 2010 at 4:55 PM, Ryan Lane rlane32@gmail.com wrote:
As someone who originally started contributing from maintaining a small MediaWiki instance, I kind of dislike this question. I also don't think we should be mixing "we" when discussing WMF and MediaWiki.
But to answer your question: yes. We get contributions, we get employees, and we get a larger, more vibrant community. A number of contributors come from enterprises and small shops, but they often don't contribute directly to Wikimedia projects. However, their contributions often allow other people to use the software in environments they couldn't be used in otherwise (LDAP authentication is a perfect example of this). The people who then get to use the software may turn into contributors that do benefit WMF.
QFT.
-Chad
Thanks... I know this is a provocative question but I meant it just as it was stated, nothing more, nothing less. For better or worse my history with the foundation is too short to know the answers to these questions.
All the assumptions in my question are up for grabs, including the assumption that we're even primarily developing MediaWiki for WMF projects. Maybe we think it's just a good thing for the world and that's that.
Anyway, I would question that it doesn't take a lot of effort to keep the core small -- it seems to me that more and more of the things we use to power the big WMF projects are being pushed into extensions and templates and difficult-to-reproduce configuration and even data entered directly into the wiki, commingled indistinguishably with documents. (As you are aware, it takes a lot of knowledge to recreate Wikipedia for a testing environment. ;)
Meanwhile, MediaWiki is perhaps too powerful and too complex to administer for the small organization. I work with a small group of artists that run a MediaWiki instance and whenever online collaboration has to happen, nobody in this group says "Let's make a wiki page!" That used to happen, but nowadays they go straight to Google Docs. And that has a lot of downsides; no version history, complex to auth credentials, lack of formatting power, can't easily transition to a doc published on a website, etc.
I'm not saying MediaWiki has to be the weapon of choice for lightweight collaboration. Maybe that suggests we should narrow the focus of what we're doing. Or get more serious about going after those use cases.
* Neil Kandalgaonkar neilk@wikimedia.org [Wed, 29 Dec 2010 14:40:13 -0800]:
MediaWiki wasn't always so complex. The first version I used, in 2007 (1.9.3), was noticeably simpler than the current 1.17 / 1.18 revisions. And one might learn it gradually, step by step, over many months or even years. Besides writing extensions for various clients, I use it for my own small memo / blog, where I put code samples, useful links (bookmarking) and a lot of various texts (quotations and articles to read later).
To me, a standalone MediaWiki on a flash drive sounds like a good idea. However, there are many limitations, although SQLite support has become much better and there is a Nanoweb HTTP server; some computers might already be listening on 127.0.0.1:80. I wish it were possible to run a kind of web server with system sockets, or even no sockets at all, but browsers probably do not support this :-( Otherwise, one would have to pre-run a port scanner (not a very good thing). Dmitriy
----- Original Message -----
From: "Neil Kandalgaonkar" neilk@wikimedia.org
Meanwhile, MediaWiki is perhaps too powerful and too complex to administer for the small organization. I work with a small group of artists that run a MediaWiki instance and whenever online collaboration has to happen, nobody in this group says "Let's make a wiki page!"
Why not?
That used to happen, but nowadays they go straight to Google Docs.
Oh.
Well, that's bad. But people will choose the wrong tools; I don't think that's evidence that MediaWiki's Broken As Designed.
"Too powerful and complex to administer"?
It needs administration? In a small organization?
I set one up at my previous employers, and used it to take all my notes, which required exactly zero administration: I just slapped it on a box, and I was done.
And my successor is *very* happy about it. :-)
Cheers, -- jra
On this note, MTV Networks (my previous job) switched from using MediaWiki to Confluence a couple of years ago. They mainly cited ease of use and Microsoft Office integration as the reasons. Personally I hated it, except for the dashboard interface, which was pretty slick. Some Wikipedia power-users have similar dashboard-style interfaces that they have custom-built on their User Pages, but I think it would be cool if we let people add this sort of interface without having to be a template-hacker.
The sort of interface I'm talking about would include stuff like community and WikiProject notices and various real-time stats. If you were a vandal fighter, you would get a vandalism thermometer, streaming incident notices, a recent changes feed, etc. If you were a content reviewer, you would get lists of the latest Featured Article and Good Article candidates, as well as the latest images nominated for Featured Picture Status, and announcements from the Guild of Copyeditors. The possibilities are endless.
Ryan Kaldari
On 1 January 2011 03:03, Ryan Kaldari rkaldari@wikimedia.org wrote:
So, what stops people from writing a "dashboard wizard" that lets people select a predefined one?
On 1 January 2011 02:03, Ryan Kaldari rkaldari@wikimedia.org wrote:
On this note, MTV Networks (my previous job) switched from using Mediawiki to Confluence a couple years ago.
There's a certain large media organisation in the UK that uses Confluence for WYSIWYG and access control lists. And not MediaWiki. I could have talked them past the ACLs, but not the lack of WYSIWYG. That's one of the reasons I'm so very gung-ho on the stuff.
They mainly cited ease of use and Microsoft Office integration as the reasons.
It doesn't have ease of use at all. What it has is a features list and a sales team.
In terms of ease of use, my current workplace has an official Plone-based intranet and a few less-official MediaWiki installations. Our office wiki is ridiculously easier to actually use than the Plone site, despite the lack of WYSIWYG (FCK was pretty good, but not quite good enough). The Plone site is a write-only
On 1 January 2011 15:03, David Gerard dgerard@gmail.com wrote:
It doesn't have ease of use at all. What it has is a features list and a sales team.
In terms of ease of use, my current workplace has an official Plone-based intranet and a few less-official MediaWiki installations. Our office wiki is ridiculously easier to actually use than the Plone site, despite the lack of WYSIWYG (FCK was pretty good, but not quite good enough). The Plone site is a write-only
... document graveyard. It's where documentation goes to die, unloved and unnoticed. The wiki is what people actually read and update.
But I do think WYSIWYG could give it about eight times the participation.
So, yeah. I'm picturing a happy world of bunnies and flowers where the MediaWiki tarball includes WYSIWYG right there and people use an office wiki as the massively multiplayer office whiteboard it should be, and the sysadmin gets treated like a hero with very little work. Because MediaWiki is very little work. And we like to be treated like heroes every now and then.
- d.
On 01/01/11 16:06, David Gerard wrote:
Because MediaWiki is very little work. And we like to be treated like heroes every now and then.
This is my exact experience. And I have been a "hero" for 4 years in my current company. Almost all departments now have a MediaWiki installation and nobody has complained about the lack of ACLs or WYSIWTF :b
The main issues users encountered were:
- installing the ParserFunctions extension
- getting the Wikipedia look and feel (just add some CSS)
- single sign-on (install Ryan Lane's LDAP authentication extension)
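For reference, those three items come down to only a few lines of LocalSettings.php. The sketch below uses the configuration style of that era; the $wgLDAP* values are placeholders to be replaced from the LdapAuthentication extension's documentation, not working settings.

    # LocalSettings.php - a sketch of the three items above.
    # ParserFunctions ({{#if:}}, {{#switch:}}, ...)
    require_once( "$IP/extensions/ParserFunctions/ParserFunctions.php" );

    # Wikipedia look and feel: same default skin; site-wide CSS tweaks
    # then go on the wiki page [[MediaWiki:Common.css]].
    $wgDefaultSkin = 'vector';

    # Single sign-on via Ryan Lane's LdapAuthentication extension.
    # The $wgLDAP* values below are placeholders for your directory.
    require_once( "$IP/extensions/LdapAuthentication/LdapAuthentication.php" );
    $wgAuth = new LdapAuthenticationPlugin();
    $wgLDAPDomainNames   = array( 'example' );
    $wgLDAPServerNames   = array( 'example' => 'ldap.example.org' );
    $wgLDAPSearchStrings = array( 'example' => 'uid=USER-NAME,ou=people,dc=example,dc=org' );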
* Ashar Voultoiz hashar+wmf@free.fr [Sat, 08 Jan 2011 23:08:23 +0100]:
On 01/01/11 16:06, David Gerard wrote:
Because MediaWiki is very little work. And we like to be treated like heroes every now and then.
MediaWiki is not a little work. Not everybody can set up a farm with its own shared "commons" repository (not WMF's) - I am especially speaking of the pre-InstantCommons era, where you had to alter many global settings. Not everybody can set up a path-based farm instead of a DNS-based one. Even memorizing all the wg* globals is a lot of work. 99% of users do not even know that one can add JS scripts to the MediaWiki namespace.
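To give a feel for the farm point: a bare-bones path-based farm can be sketched in a shared LocalSettings.php in a dozen lines, though everything past this sketch (shared uploads, caching, interwiki setup) is where the real work starts. The wiki keys and paths below are invented for illustration.

    # Shared LocalSettings.php for a path-based farm (rough sketch only).
    $farmWiki = 'main';
    if ( preg_match( '!^/(docs|sales|research)(/|$)!', $_SERVER['REQUEST_URI'], $m ) ) {
        $farmWiki = $m[1];
    }
    $wgSitename        = ucfirst( $farmWiki ) . ' Wiki';
    $wgDBname          = 'wiki_' . $farmWiki;        # one database per wiki
    $wgScriptPath      = '/' . $farmWiki;
    $wgUploadDirectory = "$IP/images/$farmWiki";     # keep uploads separate per wiki
    $wgUploadPath      = "$wgScriptPath/images";
    # A shared "commons" for the farm can then be wired up with
    # $wgForeignFileRepos (or InstantCommons, on recent versions).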
Everything has been done at my primary workplace to undermine my MediaWiki deployment efforts - "it can easily be installed from the Linux package, so why is he installing it manually", "the markup is primitive", "inflexible", "PHP is an inferior language, use ASP.NET instead" and so on.
This is my exact experience. And I have been a "hero" for 4 years in my current company. Almost all departments now have a MediaWiki installation and nobody has complained about the lack of ACLs or WYSIWTF :b
BTW, there's HaloACL nowadays, although I haven't deployed it yet. Unfortunately my own experience with earning money from MediaWiki work is not so bright - perhaps because this is a third-world country.
The main issues users encountered were:
- installing the ParserFunctions extension
- getting the Wikipedia look and feel (just add some CSS)
- single sign-on (install Ryan Lane's LDAP authentication extension)
Yes, that is simple. However, not everything is simple, and sometimes you have to write your own extension. For example, there were no flexible poll extensions some years ago. Dmitriy
On Mon, Jan 10, 2011 at 7:25 PM, Dmitriy Sintsov questpc@rambler.ru wrote:
Everything has been done at my primary workplace to undermine my MediaWiki deployment efforts - "it can easily be installed from the Linux package, so why is he installing it manually", "the markup is primitive", "inflexible", "PHP is an inferior language, use ASP.NET instead" and so on.
ASP.NET? Only if you want all your source code exposed.
Marco
Neil Kandalgaonkar wrote:
Question: assuming that our primary interest is creating software for Wikipedia and similar WMF projects, do we actually get anything from the Windows PC intranet users that offsets the cost of keeping MediaWiki friendly to both environments? In other words, do we get contributions from them that help us do Wikipedia et al.?
Not generally, no.
MediaWiki is just one of Wikimedia's projects, something that I think is sometimes overlooked or forgotten. Probably as it's the current base upon which all the other projects are built. To me, that appears to be the fundamental problem here. I've said this in a roundabout way a few times now, but the horse is still whimpering, so let's try once more.
I don't think the software that a dictionary or quote database needs is ever going to be the same as the software that an encyclopedia or news site needs. And I don't think the software options that fit those four use-cases will ever work (well!) for a media repository. I don't think it's a lack of creativity. Given the hacks put in place on sites like the English Wiktionary, it's clearly not. But at some point there has to be a recognition that using a screwdriver to put nails in the wall is a bad idea. You need a hammer.
Tim wrote a blog on techblog.wikimedia.org in July 2010 about MediaWiki version statistics. Someone commented that it was ironic that Wikimedia was using WordPress instead of MediaWiki as a blogging platform. Tim's response: they do different things.[1]
This isn't a matter of not knowing what the problem is. The problem is recognized by the leading MediaWiki developers and it's an old software principle (cf. Unix's philosophy[2] of doing one thing and doing it well). The full phrase quoted earlier is "jack of all trades, master of none." I think MediaWiki fits this perfectly.
MediaWiki needs to re-focus or fork. Ultimately, however, there are not enough resources to maintain every current Wikimedia project and nobody is willing to make the necessary cuts. So we end up with a lot of mediocre projects/products rather than one or two great ones.
MZMcBride
[1] http://techblog.wikimedia.org/?p=970#comment-819 [2] http://en.wikipedia.org/wiki/Unix_philosophy
On 12/29/2010 5:14 PM, MZMcBride wrote:
This isn't a matter of not knowing what the problem is. The problem is recognized by the leading MediaWiki developers and it's an old software principle (cf. Unix's philosophy[2] of doing one thing and doing it well). The full phrase quoted earlier is "jack of all trades, master of none." I think MediaWiki fits this perfectly.
I don't think using a general-purpose wiki engine for every project is inherently a poor idea. MediaWiki is highly extensible. We just, for some reason, haven't really taken advantage of that where it could really matter. Most of the extensions we use just kind of work in the background. I don't know if it's due to a lack of resources, or whether the WMF wants all the projects to look and work the same.
Wiktionary is probably the easiest example. All of the entries follow a fairly rigid layout that lends itself rather easily to a form, yet we're still inputting them using a single big textarea.
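To make that concrete: a hypothetical form-backed editor could assemble the standard entry skeleton from a handful of fields and only then drop the user into wikitext. The helper below is a sketch of that idea, not an existing extension, and the layout it emits is only an approximation of the en.wiktionary house style.

    # Hypothetical sketch: assemble a bare dictionary-style entry from
    # structured form fields instead of starting from an empty textarea.
    # The function and field names are invented for illustration.
    function buildDictionaryEntry( $language, $partOfSpeech, $headword, $definition ) {
        return "=={$language}==\n\n"
            . "==={$partOfSpeech}===\n"
            . "'''{$headword}'''\n\n"
            . "# {$definition}\n";
    }

    # buildDictionaryEntry( 'English', 'Noun', 'wiki',
    #     'A collaborative website that can be edited directly by its users.' );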
Though that's not to say we couldn't still do better than we are with a general purpose wiki engine. I still stand by my earlier suggestion that we drop the requirement that everything WMF uses has to be able to work for others right out of the box using only PHP. We should use PHP when possible, but it shouldn't be a limitation.
On Wed, Dec 29, 2010 at 9:58 PM, Neil Kandalgaonkar neilk@wikimedia.org wrote:
Question: assuming that our primary interest is creating software for Wikipedia and similar WMF projects, do we actually get anything from the Windows PC intranet users that offsets the cost of keeping MediaWiki friendly to both environments? In other words, do we get contributions from them that help us do Wikipedia et al.?
Why would I contribute to software that I can't even run?
On Wed, Dec 29, 2010 at 2:26 AM, David Gerard dgerard@gmail.com wrote:
On 29 December 2010 08:24, MZMcBride z@mzmcbride.com wrote:
To me (and others), that leaves the question of what would happen if you wrote some software that was actually built for making an encyclopedia, rather than the jack of all trades product that MediaWiki is.
MediaWiki is precisely that software.
However - see also the other threads on other lists recently, about MW core failings.
MW was designed to build an encyclopedia with Web 1.5 technology. It was a major step forwards compared to its contemporaries, but sites like Gmail, Facebook, Twitter are massive user experience advances over where we are and can credibly go with MediaWiki.
So, to modify David's comment -
MediaWiki *was* precisely that software.
----- Original Message -----
From: "George Herbert" george.herbert@gmail.com
MW was designed to build an encyclopedia with Web 1.5 technology. It was a major step forwards compared to its contemporaries, but sites like Gmail, Facebook, Twitter are massive user experience advances over where we are and can credibly go with MediaWiki.
MediaWiki is nearly perfectly usable from my Blackberry with CSS, images, and JavaScript disabled; please don't break that.
Cheers, -- jra
2010-12-29 08:31, Neil Kandalgaonkar:
I've been inspired by the discussion David Gerard and Brion Vibber kicked off, and I think they are headed in the right direction.
But I just want to ask a separate, but related question.
Let's imagine you wanted to start a rival to Wikipedia. Assume that you are motivated by money, and that venture capitalists promise you can be paid gazillions of dollars if you can do one, or many, of the following:
1 - Become a more attractive home to the WP editors. Get them to work on your content.
2 - Take the free content from WP, and use it in this new system. But make it much better, in a way Wikipedia can't match.
3 - Attract even more readers, or perhaps a niche group of super-passionate readers that you can use to build a new community.
In other words, if you had no legacy, and just wanted to build something from zero, how would you go about creating an innovation that was disruptive to Wikipedia, in fact something that made Wikipedia look like Friendster or Myspace compared to Facebook?
And there's a followup question to this -- but you're all smart people and can guess what it is.
If one had a budget of gazillions of dollars then it would be quite easy ;-). The problem is: what would be the point of investing such money if you wouldn't get it back from this investment?
If you didn't have such money (mostly to pay users for creating content), then the most problematic part would be convincing the community you are OK. IMHO this has nothing to do with usability or any such thing; it's rather a matter of gaining trust. Apart from that, you would have to make all (or almost all) the things that work now keep working. If you made brand new software you would have to rewrite at least the most popular user scripts, which alone would be a lot of work. You would probably also have to make a nice WYSIWYG to make your site worth moving to - to make it worth changing at least some user habits. Not to mention your site would need to be built on a quickly scalable infrastructure to guarantee high availability (at least as high as Wikipedia's).
In general, you have to remember that even if something is technically better it's not guaranteed to be successful. For example, I think that DP (pgdp.net) and Rastko are technically better equipped for proofreading than Wikisource, but I guess for those already familiar with MediaWiki it's easier to create texts for Wikisource.
Regards, Nux.
On Wed, Dec 29, 2010 at 12:36 PM, Maciej Jaros egil@wp.pl wrote:
If one had a budget of gazillions of dollars then it would be quite easy ;-). The problem is: what would be the point of investing such money if you wouldn't get it back from this investment?
While money can fix a lot of things, I don't think the current bottleneck is money. To break stuff you need to find community consensus, developer consensus, somebody willing to implement it and somebody to review it. Of course for a gazillion dollars you could perhaps eliminate a few of these steps, but in general they are not really easy to solve with money, I think.
Bryan Tong Minh (2010-12-29 13:05):
On Wed, Dec 29, 2010 at 12:36 PM, Maciej Jaros egil@wp.pl wrote:
If one had a budget of gazillions of dollars then it would be quite easy ;-). The problem is: what would be the point of investing such money if you wouldn't get it back from this investment?
While money can fix a lot of things, I don't think the current bottleneck is money. To break stuff you need to find community consensus, developer consensus, somebody willing to implement it and somebody to review it. Of course for a gazillion dollars you could perhaps eliminate a few of these steps, but in general they are not really easy to solve with money, I think.
Well, if you paid users for editing you could attract more users (at least those who are not willing to work for free). But I guess that would only work if you had practically unlimited resources... Having said that, I just remembered that YouTube works like that - you can get money if you get a lot of viewers on your videos.
Regards, Nux.
On 12/29/2010 7:05 AM, Bryan Tong Minh wrote:
On Wed, Dec 29, 2010 at 12:36 PM, Maciej Jaros egil@wp.pl wrote:
If one had a budget of gazillions of dollars then it would be quite easy ;-). The problem is: what would be the point of investing such money if you wouldn't get it back from this investment?
While money can fix a lot of things, I don't think the current bottleneck is money. To break stuff you need to find community consensus, developer consensus, somebody willing to implement it and somebody to review it. Of course for a gazillion dollars you could perhaps eliminate a few of these steps, but in general they are not really easy to solve with money, I think.
I think one of the biggest obstacles to improving the Wikipedia user experience is the requirement that the content has to be not only reusable, but reusable with a minimum amount of effort - i.e. on a free shared hosting environment with neither shell access nor the ability to install or compile programs. With only a couple of exceptions,[1] any software that's required to display Wikipedia content has to be PHP, have a PHP implementation available, or be done client-side (and of course, we can't use Flash). We're hamstrung by the limitations of what can reasonably be done in pure PHP, even in cases where we would otherwise use a C extension or shell out to an executable.
The recently revived discussion on StringFunctions is a good example of this. Tim and others don't want to install StringFunctions because it will just increase the complexity of wikitext and, like ParserFunctions, will only be a temporary fix until template coders write new templates that run into whatever new limits are created. A real solution to the issue is to use a real programming language in place of wikitext for complex templates. But until the aforementioned limitation is relaxed, that's likely never going to happen. We would have to either implement an existing language like Lua in PHP, or write our own language and maintain two implementations of it (the compiled version for WMF and the pure-PHP version).
[1] LaTeX and EasyTimeline
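For context on how low the pure-PHP bar is: a string function can be registered through the ordinary parser function hook in a handful of lines. The snippet below is a generic sketch of that hook mechanism, not the actual StringFunctions code, and a matching magic word entry would also be needed in the extension's i18n/magic file. The hard part is not writing it; it's that every such function becomes one more brick in the wikitext "programming language" that templates then push to its limits.

    # Generic sketch of a parser function, registered the standard way.
    $wgHooks['ParserFirstCallInit'][] = 'wfExampleStringFuncsSetup';

    function wfExampleStringFuncsSetup( $parser ) {
        $parser->setFunctionHook( 'len', 'wfExampleStringLen' );
        return true;
    }

    function wfExampleStringLen( $parser, $text = '' ) {
        # {{#len:foo}} -> 3
        return mb_strlen( $text );
    }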
On 12/29/10 4:05 AM, Bryan Tong Minh wrote:
On Wed, Dec 29, 2010 at 12:36 PM, Maciej Jaros egil@wp.pl wrote:
If one had a budget of gazillions of dollars then it would be quite easy ;-). The problem is: what would be the point of investing such money if you wouldn't get it back from this investment?
While money can fix a lot of things, I don't think the current bottleneck is money.
I apologize for sending this discussion in a direction I hadn't intended. The money was purely to imply that you had to be motivated, not that you had a vast budget.
Let me be more explicit. The "innovator's dilemma" problem, already referred to in this discussion, occurs because the successful innovator can't see past the goal of defending their earlier successes, and working with their existing assets.
The thought experiment of working for a competitor was meant to suggest this: what would you do if you wanted to make Wikipedia's earlier successes *obsolete*? The point is to then try to look at some of our greatest assets and see if, in the current environment, they could be potential liabilities.
And the followup question was "if a competitor can do this, why don't WE do this?"
Brion already suggested something like this, where we would end up with a transition regime between old and new.
P.S. All due respect to RobLa, but "Microsoft tried this and failed" doesn't exactly convince me it's impossible. ;)
Neil Kandalgaonkar (2010-12-29 21:40):
I apologize for sending this discussion in a direction I hadn't intended. The money was purely to imply that you had to be motivated, not that you had a vast budget.
Let me be more explicit. The "innovator's dilemma" problem, already referred to in this discussion, occurs because the successful innovator can't see past the goal of defending their earlier successes, and working with their existing assets.
The thought experiment of working for a competitor was meant to suggest this: what would you do if you wanted to make Wikipedia's earlier successes *obsolete*? The point is to then try to look at some of our greatest assets and see if, in the current environment, they could be potential liabilities.
My original point was that the community is the power of the WMF sites, and that alone is IMHO hard to beat. To be more exact, this is a community that I believe is loyal and needs to trust the corporation/founder/foundation behind the site (I've seen a community-driven project fall apart after losing this trust).
And the followup question was "if a competitor can do this, why don't WE do this?"
We don't because it would probably be more reasonable for a competitor to do something completely different to gather a different community, or they would have to make a gigantic effort to steal the current community (both in technical and PR terms). I think the effort would simply be inefficient.
In any case - the next killer functionality (if that's what you're asking) is well known and already mentioned - WYSIWYG. WYSIWYG that makes edits easy for new users and doesn't let them break existing markup. And yes, I believe the present markup needs to be preserved. Not because it's good, but because it is well known to many current users - because the community is accustomed to it. Losing users after changing the markup drastically would certainly not be a good idea. You have to remember how much disappointment a simple change of the default skin brought - something that can be changed back in three clicks. And so any new markup (if one were used) would have to at least be parseable back to wikitext.
Regards, Nux.
On 12/29/2010 08:31 AM, Neil Kandalgaonkar wrote:
Let's imagine you wanted to start a rival to Wikipedia. Assume that you are motivated by money, and that venture capitalists promise you can be paid gazillions of dollars if you can do one, or many, of the following:
This is a foolish discussion for two reasons. First, wikitech-l is a technical list, and not suited for talk on organizational change. Second, innovation doesn't come from within. Encyclopaedia Britannica didn't invent Wikipedia, AT&T didn't invent the Internet, Gorbachev didn't succeed in implementing competition within the communist party (although he tried), and dinosaurs weren't invited to the design committee for the surviving mammals. If there is a new alternative to Wikipedia, we, the subscribers to wikitech-l, are not invited to design it.
What we can easily achieve is to make bureaucracy so slow and rigid (and our discussions derailed, like now) that more people will leave WMF projects. But this is not enough to create a working alternative.
On Tue, Dec 28, 2010 at 11:31 PM, Neil Kandalgaonkar neilk@wikimedia.org wrote:
Let's imagine you wanted to start a rival to Wikipedia. Assume that you are motivated by money, and that venture capitalists promise you can be paid gazillions of dollars if you can do one, or many, of the following:
1 - Become a more attractive home to the WP editors. Get them to work on your content.
2 - Take the free content from WP, and use it in this new system. But make it much better, in a way Wikipedia can't match.
3 - Attract even more readers, or perhaps a niche group of super-passionate readers that you can use to build a new community.
I'll start off by saying that I have no idea how anyone would do it, realistically. I'm pretty sure it's possible, but I think a big reason that it hasn't happened yet is because the economics of creating a competitor are really difficult. There are very few markets that Microsoft completely gave up on (especially markets in which they've had success), yet that's exactly what they did with Encarta. Good luck getting VC money to take on a market that Microsoft abandons. ;-)
I suspect if I had to choose, though, I'd go with #2. I'd probably bootstrap by creating tools *for* Wikipedia editors rather than trying right off the bat to create a wholly separate site. For example, it'd probably be possible to scrape our data to create a really fantastic citation database, which then could be used to build tools that make creating citations much easier. The goal would be to make it easier for editors to keep *my* database up-to-date, and push a copy to Wikipedia, rather than having to constantly suck things out of Wikipedia.
That's such a small part of the overall editing problem that I'm not sure how I'd bootstrap that into something much larger (and in the case of a citation database, it wouldn't be necessary for Wikipedia to lose in order to have a modest ad-supported business).
As to the implied question, I think we need to figure out ways of making things like this easier for third parties to tackle. If we can make it easier for third parties to create tools for editing Wikipedia (regardless of their motivations), we'll probably accidentally make it easier for us to make it easier to edit Wikipedia.
Rob
I would steal some of the better ideas from Wikia like the "hot article" lists, user polls, user avatars, and throw in some real-time collaboration software a la Etherpad.
Ryan Kaldari
Of course, you have to remember that Wikipedia is a top 10 website. Wikia is a top 200 website. "Hot articles" just don't scale that well to a wiki like Wikipedia. It's fundamentally flawed.
On the flip side, an Etherpad-like feature would be nice.
-X!
On Dec 29, 2010, at 6:41 PM, Ryan Kaldari wrote:
I would steal some of the better ideas from Wikia like the "hot article" lists, user polls, user avatars, and throw in some real-time collaboration software a la Etherpad.
Ryan Kaldari
Actually, I would implement "hot articles" per WikiProject. So, for example, you could see the 5 articles under WikiProject Arthropods that had been edited the most in the past week. That should scale well. In fact, I would probably redesign Wikipedia to be WikiProject-based from the ground up, rather than as an afterthought. Like when you first sign up for an account it asks you which WikiProjects you want to join, etc. and there are cool extensions for earning points and awards within WikiProjects (that don't require learning how to use templates).
Ryan Kaldari
On 12/29/10 3:49 PM, Soxred93 wrote:
Of course, you have to remember that Wikipedia is a top 10 website. Wikia is a top 200 website. "Hot articles" just don't scale that well to a wiki like Wikipedia. It's fundamentally flawed.
On the flip side, an Etherpad-like feature would be nice.
-X!
Ryan Kaldari wrote:
Actually, I would implement "hot articles" per WikiProject. So, for example, you could see the 5 articles under WikiProject Arthropods that had been edited the most in the past week. That should scale well. In fact, I would probably redesign Wikipedia to be WikiProject-based from the ground up, rather than as an afterthought. Like when you first sign up for an account it asks you which WikiProjects you want to join, etc. and there are cool extensions for earning points and awards within WikiProjects (that don't require learning how to use templates).
Ryan Kaldari
Well, that's an interesting point. People ask for things like a "chat per article" without realising what that would mean. Grouping communication into bigger "WikiProject" channels could work, although some tree-like structure would be needed to manually split / magically join channels depending on the number of people there.
On Wed, Dec 29, 2010 at 2:31 AM, Neil Kandalgaonkar neilk@wikimedia.org wrote:
In other words, if you had no legacy, and just wanted to build something from zero, how would you go about creating an innovation that was disruptive to Wikipedia, in fact something that made Wikipedia look like Friendster or Myspace compared to Facebook?
By having content that's consistently better. It doesn't matter how easy your site is to edit. Even if your site is so easy to edit that you get 10% of viewers editing, 10% of your few million (at best) viewers is still going to get you vastly worse content than a small fraction of a percent of Wikipedia's billions. Wikipedia survives off network effects; it's not even remotely a level playing field. People who are focusing on things like WYSIWYG or better-quality editing software are missing the point. You need to have better *content* to attract viewers, before you even stand a chance of edits through your site being meaningful.
If you somehow manage to have content that's consistently better than Wikipedia's, though, people will figure out over time, as long as you can maintain the quality advantage. One obvious strategy would be to mirror Wikipedia in real time and send viewers to Wikipedia proper to edit it, but to have more useful features or a better experience. Maybe a better mobile site, maybe faster page load times, maybe easier navigation or search. Maybe more content, letting people put up vanity bios or articles about obscure webcomics that integrate more or less seamlessly with the Wikipedia corpus. You could even compete by putting up a better editing interface, conceivably, although auth would be tricky to work out. If you ever got a majority of viewers coming to your site, you could fork transparently.
On Wed, Dec 29, 2010 at 6:59 PM, Brion Vibber brion@pobox.com wrote:
I think this isn't as useful a question as it might be; defining a project in terms of competing with something else leads to stagnation, not innovation.
I agree. The correct strategy to take down Wikipedia would involve overcoming the network effect that locks it into its current position of dominance, and that's not something that would be useful for Wikipedia itself to do. To fend off attacks of this sort, what you'd want is to make your content harder to reuse, which we explicitly *don't* want to do. Better to ask: how can we enable more people to contribute who want to but can't be bothered?
On 30 December 2010 00:27, Aryeh Gregor Simetrical+wikilist@gmail.com wrote:
You could even compete by putting up a better editing interface, conceivably, although auth would be tricky to work out.
You know, this is something that would be extremely easy to experiment with right now,
On Wed, Dec 29, 2010 at 6:59 PM, Brion Vibber brion@pobox.com wrote:
I think this isn't as useful a question as it might be; defining a project in terms of competing with something else leads to stagnation, not innovation.
I agree. The correct strategy to take down Wikipedia would involve overcoming the network effect that locks it into its current position of dominance, and that's not something that would be useful for Wikipedia itself to do. To fend off attacks of this sort, what you'd want is to make your content harder to reuse, which we explicitly *don't* want to do. Better to ask: how can we enable more people to contribute who want to but can't be bothered?
Making Wikipedia easy to mirror and fork is the best protection I can think of for the content itself. It also keeps the support structures (Foundation) and community good and honest. Comparison: People keep giving Red Hat money; Debian continues despite a prominent and successful fork (Ubuntu), and quite a bit goes back from the fork (both pull and push).
- d.
On 29/12/10 18:31, Neil Kandalgaonkar wrote:
I've been inspired by the discussion David Gerard and Brion Vibber kicked off, and I think they are headed in the right direction.
But I just want to ask a separate, but related question.
Let's imagine you wanted to start a rival to Wikipedia. Assume that you are motivated by money, and that venture capitalists promise you can be paid gazillions of dollars if you can do one, or many, of the following:
1 - Become a more attractive home to the WP editors. Get them to work on your content.
2 - Take the free content from WP, and use it in this new system. But make it much better, in a way Wikipedia can't match.
This has been done before: Wikinfo, Citizendium, etc.
3 - Attract even more readers, or perhaps a niche group of super-passionate readers that you can use to build a new community.
This is basically Wikia's business model. I think you need to think outside the box.
I would make it more like World of Warcraft. We should incentivise people to set up wiki sweatshops in Indonesia, paying local people to "grind" all day, cleaning up articles, in order to build up a level 10 admin character that can then be sold for thousands of dollars on the open market. Also it should have cool graphics.
OK, if you want a real answer: I think if you could convince admins to be nicer to people, then that would make a bigger impact to Wikipedia's long-term viability than any ease-of-editing feature. Making editing easier will give you a one-off jump in editing statistics, it won't address the trend.
We know from interviews and departure messages that the editing interface creates an initial barrier for entry, but for people who get past that barrier, various social factors, such as incivility and bureaucracy, limit the time they spend contributing.
Once you burn someone out, they don't come back for a long time, maybe not ever. So you introduce a downwards trend which extends over decades, until the rate at which we burn people out meets the rate at which new editors are born.
Active, established editors have a battlefront mentality. They feel as if they are fighting for the survival of Wikipedia against a constant stream of newbies who don't understand or don't care about our policies. As the stream of newbies increases, they become more desperate, and resort to more desperate (and less civil) measures for controlling the flood.
Making editing easier could actually be counterproductive. If we let more people past the editing interface barrier before we fix our social problems, then we could burn out the majority of the Internet population before we figure out what's going on. Increasing the number of new editors by a large factor will increase the anxiety level of admins, and thus accelerate this process.
I think there are things we can do in software to help de-escalate this conflict between established editors and new editors.
One thing we can do is to reduce the sense of urgency. Further deployment of FlaggedRevs (pending changes) is the obvious way to do this. By hiding recent edits, admins can deal with bad edits in their own time, rather than reacting in the heat of the moment.
Another thing we could do is to improve the means of communication. Better communication often helps to de-escalate a conflict.
We could replace the terrible user talk page interface with an easy-to-use real-time messaging framework. We could integrate polite template responses with the UI. And we could provide a centralised forum-like view of such messages, to encourage mediators to review and de-escalate emotion-charged conversations.
-- Tim Starling
On 12/29/10 7:26 PM, Tim Starling wrote:
OK, if you want a real answer: I think if you could convince admins to be nicer to people, then that would make a bigger impact to Wikipedia's long-term viability than any ease-of-editing feature. Making editing easier will give you a one-off jump in editing statistics, it won't address the trend.
We know from interviews and departure messages that the editing interface creates an initial barrier for entry, but for people who get past that barrier, various social factors, such as incivility and bureaucracy, limit the time they spend contributing.
For me the usability projects always had the unstated intent of broadening the pool of good editors. More hands to ease the burdens of the beleaguered admins, and also fresher blood that wasn't quite as ensconced in wikipolitics.
But overall I agree.
Making editing easier could actually be counterproductive. If we let more people past the editing interface barrier before we fix our social problems, [...]
This is an interesting insight!
I have been thinking along these lines too, although in a more haphazard way.
At some point, if we believe our community is our greatest asset, we have to think of Wikipedia as infrastructure not only for creating high quality articles, but also for generating and sustaining a high quality editing community. My sense is that the Wiki* communities are down with goal #1, but goal #2 is not on their radar at all.
So we probably need an employee dedicated to this. (I think? Arguments?)
When the Usability Project closed down, the team was also unhappy with the narrow focus paid to editing. Research showed the most serious problems were elsewhere. We then said we were going to address UX issues in a very broad way, which included social issues. Unfortunately the person in charge of that left the Foundation soon after and in the kerfuffle I'm not sure if we now have anybody whose primary job it is to think about the experience of the user in such broad terms.
2010/12/30 Neil Kandalgaonkar neilk@wikimedia.org
On 12/29/10 7:26 PM, Tim Starling wrote:
Making editing easier could actually be counterproductive. If we let more people past the editing interface barrier before we fix our social problems, [...]
This is an interesting insight!
Yes, it's really interesting and enlightening!
I'm following another discussion about StringFunctions, and I recently got an account on the Toolserver (I only hope that my skill is sufficient!). In both cases, there's an issue of "security by obscurity". I hated it at the beginning, but perhaps such an approach is necessary; it's the simplest way to get a very difficult result.
So, what's important is the balance between simplicity and complexity, since this turns into a "contributor filter". At the beginning, wiki markup was designed to be very simple, and a very important feature of markup was sacrificed: the code is not "well formed". There are lots of simple but ambiguous tags (for bold and italic characters, for lists); tags don't need to be closed; text content and tags/attributes are mixed freely into template code. This makes them simpler to use, but it creates terrible puzzles for advanced users facing unusual cases, trying to parse wikitext with scripts, or converting wikitext into a formally well-formed markup. My question is: can we imagine moving that balance a little, accepting a little more complexity, and thinking about a well-formed wiki markup?
Alex
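To make the trade-off concrete, here is roughly what the contrast looks like; the well-formed syntax on the second line is invented purely for illustration, not a proposal for a specific grammar.

    Today's wikitext (ambiguous, unclosed):
        '''bold''', ''italic'', '''''both?''''', * list item
    A well-formed equivalent (illustrative syntax only):
        <b>bold</b>, <i>italic</i>, <b><i>both</i></b>,
        <list><item>list item</item></list>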
Neil Kandalgaonkar wrote:
I have been thinking along these lines too, although in a more haphazard way.
At some point, if we believe our community is our greatest asset, we have to think of Wikipedia as infrastructure not only for creating high quality articles, but also for generating and sustaining a high quality editing community. My sense is that the Wiki* communities are down with goal #1, but goal #2 is not on their radar at all.
So we probably need an employee dedicated to this. (I think? Arguments?)
He would have to be quite busy (and polyglot!) to keep an eye on the communities of 800+ projects.
On 12/30/10 10:24 AM, Platonides wrote:
Neil Kandalgaonkar wrote:
At some point, if we believe our community is our greatest asset, we have to think of Wikipedia as infrastructure not only for creating high quality articles, but also for generating and sustaining a high quality editing community.
So we probably need an employee dedicated to this. (I think? Arguments?)
He would have to be quite busy (and polyglot!) to keep an eye on the communities of 800+ projects.
Why is this a requirement?
If you think about the sum total of user-hours spent on Wikipedia, the vast majority of them are spent in just three or four interface flows.
But you're right; they can't be everywhere, so maybe there should be a guidelines page on design principles. We have WP:CIVILITY, do we have similar guidelines for software developers, on how to make it easy for the community to be civil?
Frankly I don't think I'm qualified to do this. I know a few people who are brilliant at this, and who do this sort of thing for a living, but they are consultants. Fostering community on the web is generally considered a sort of black art... does anybody know of any less mystified way of dealing with the problem?
Neil Kandalgaonkar wrote:
On 12/30/10 10:24 AM, Platonides wrote:
Neil Kandalgaonkar wrote:
At some point, if we believe our community is our greatest asset, we have to think of Wikipedia as infrastructure not only for creating high quality articles, but also for generating and sustaining a high quality editing community.
So we probably need an employee dedicated to this. (I think? Arguments?)
He would have to be quite busy (and polyglot!) to keep an eye on the communities of 800+ projects.
Why is this a requirement?
The point is, there's no "one community" to "watch". Most people think of enwiki, since it is the biggest project and most probably their home project.
But one must not forget that there are many WMF projects out there. It doesn't end with enwp. They have similar problems, but cannot be generalised either. There's a risk that hiring someone would be seen as interference in the project (it seems like a role for a facilitator, but I'd only place people who were already in the community - the OTRS folks seem a good fishing pool - if doing such a thing). Plus, there's the question of how it may be perceived (the WMF trying to impose its views on the community, the WMF really having power over the project and thus being liable...).
If you think about the sum total of user-hours spent on Wikipedia, the vast majority of them are spent in just three or four interface flows.
What are you thinking about? Things such as talk page messages? There are shortcuts for those interfaces. Several gadgets/scripts provide a tab for adding a template to a page and leaving a predefined message on the author's talk page. That's good in a sense, as the users *get* messages (e.g. when listing images for deletion), and they are also quite complete and translated (relevant mostly for Commons). But it also means that it's a generic message, so not as appropriate for everyone.
We can make the flow faster, but we lose precision.
But you're right; they can't be everywhere, so maybe there should be a guidelines page on design principles. We have WP:CIVILITY, do we have similar guidelines for software developers, on how to make it easy for the community to be civil?
I'm lost here. Are you calling the developer community uncivil for this thread? Do you mean that WP:CIVILITY should be enforced by MediaWiki? That developers should be more helpful when dealing with bug reports? What do you mean?
Frankly I don't think I'm qualified to do this. I know a few people who are brilliant at this, and who do this sort of thing for a living, but they are consultants. Fostering community on the web is generally considered a sort of black art... does anybody know of any less mystified way of dealing with the problem?
On 12/30/10 3:33 PM, Platonides wrote:
But you're right; they can't be everywhere, so maybe there should be a guidelines page on design principles. We have WP:CIVILITY, do we have similar guidelines for software developers, on how to make it easy for the community to be civil?
I'm lost here. Are you calling the developer community uncivil for this thread? Do you mean that WP:CIVILITY should be enforced by MediaWiki? That developers should be more helpful when dealing with bug reports? What do you mean?
I guess I have not been clear... I was picking up on what Tim said, that we have to work on making WP and other projects into places where people feel more welcome.
Telling people to be nicer may help, but I actually think that people are more shaped by their environment. If you go from a party at a friend's warm apartment to an anonymous street your mood and receptiveness to others changes instantly.
The point is to make MediaWiki more like the friend's apartment, and less like the anonymous street. If we have interfaces that make it easy for admins to be rude to new editors, they will be more rude. If we make it easy to be nice, then maybe they'll also be nicer. This isn't a radical new idea.
Tim already noted that he hopes Pending Changes (nee FlaggedRevs) would help people be less brusque with one another. Polite template responses, things like that.
Users are influenced by very subtle cues. Understanding how they work is a very rare ability. So I was suggesting we collect rules of thumb for people who are making interfaces. Not policies to bash each other with.
Tim Starling wrote:
OK, if you want a real answer: I think if you could convince admins to be nicer to people, then that would make a bigger impact to Wikipedia's long-term viability than any ease-of-editing feature. Making editing easier will give you a one-off jump in editing statistics, it won't address the trend.
We know from interviews and departure messages that the editing interface creates an initial barrier for entry, but for people who get past that barrier, various social factors, such as incivility and bureaucracy, limit the time they spend contributing.
Is there any evidence to support these claims? From what I understand, a lot of Wikipedia's best new content is added by anonymous users.[1] Thousands more editors are capable of registering and editing without much interaction with the broader Wikimedia community at all. If there's evidence that mean admins are a credible threat to long-term viability, I'd be interested to see it.
Given that there are about 770 active administrators[2] on the English Wikipedia, and I think you could reasonably say that a good portion are not mean, is it really just a few people who are having the far-reaching impact that you're suggesting exists? That seems unlikely.
Making editing easier could actually be counterproductive. If we let more people past the editing interface barrier before we fix our social problems, then we could burn out the majority of the Internet population before we figure out what's going on. Increasing the number of new editors by a large factor will increase the anxiety level of admins, and thus accelerate this process.
I think the growth should be organic. With a better interface in place, a project has a much higher likelihood of successful, healthy growth.
One thing we can do is to reduce the sense of urgency. Further deployment of FlaggedRevs (pending changes) is the obvious way to do this. By hiding recent edits, admins can deal with bad edits in their own time, rather reacting in the heat of the moment.
Endless backlogs are going to draw people in? Delayed gratification is going to keep people contributing? This proposal seems anti-wiki in a literal and philosophical sense.
MZMcBride
[1] http://www.cs.dartmouth.edu/reports/abstracts/TR2007-606/ [2] http://en.wikipedia.org/wiki/Wikipedia:List_of_administrators
On 30 December 2010 11:06, MZMcBride z@mzmcbride.com wrote:
Tim Starling wrote:
OK, if you want a real answer: I think if you could convince admins to be nicer to people, then that would make a bigger impact to Wikipedia's long-term viability than any ease-of-editing feature. Making editing easier will give you a one-off jump in editing statistics, it won't address the trend.
Given that there are about 770 active administrators[2] on the English Wikipedia, and I think you could reasonably say that a good portion are not mean, is it really just a few people who are having the far-reaching impact that you're suggesting exists? That seems unlikely.
There is some discussion of how the community and ArbCom enable grossly antisocial behaviour on internal-l at present. Admin behaviour is enforced by the ArbCom, and the AC member on internal-l has mostly been evasive. It's not clear what approach would work at this stage; it would probably have to get worse before the Foundation could reasonably step in.
- d.
On 30 December 2010 09:07, David Gerard dgerard@gmail.com wrote:
There is some discussion of how the community and ArbCom enable grossly antisocial behaviour on internal-l at present. Admin behaviour is enforced by the ArbCom, and the AC member on internal-l has mostly been evasive. It's not clear what approach would work at this stage; it would probably have to get worse before the Foundation could reasonably step in.
Perhaps if communication actually took place with Arbcom itself, rather than on a list in which there is no Arbcom representative, there might be a better understanding of the concerns you have mentioned. There's no "Arbcom representative" on internal-L, and in fact this is something of a bone of contention.
Nonetheless, I think the most useful post in this entire thread has been Tim Starling's, and I thank him for it.
Risker (who is coincidentally an enwp Arbitration Committee member but is in no way an Arbcom representative on this list)
On Fri, Dec 31, 2010 at 1:07 AM, David Gerard dgerard@gmail.com wrote:
There is some discussion of how the community and ArbCom enable grossly antisocial behaviour on internal-l at present. Admin behaviour is enforced by the ArbCom, and the AC member on internal-l has mostly been evasive.
Wtf? ArbCom members are expected to be responsive to discussions about English Wikipedia occurring on internal-l? Could you please clarify who you're obliquely attacking here?
-- John Vandenberg
Blog post on this topic:
http://davidgerard.co.uk/notes/2010/12/30/how-does-a-project-bite-only-the-p...
- d.
With open source software, there are people who think “that’s dumb,” there are people who think “I want to see it fixed” and there are people who think “I can do something about it.” The people at the intersection of all three power open source.
A lot of people in open source project Y will not see a problem with X, even when X is a huge usability problem that stops a lot of people from using Y.
So what you have is a lot of people saying "I don't see the problem with that" (realistically, a lot of people who will talk about a lot of things, but not about X), and maybe some of the people who do have problems with X don't know how to communicate their problem, or don't care enough.
Any open source project works like a club. The club works for the people who are part of the club, and does the things that the people of the club enjoy. If you like chess, you will not join the basketball club, and the basketball club will probably never run a chess competition. Or the chess club a basketball competition.
If anything, the "problem" with open source is that any change is incremental, and there's a lot of endogamy.
User suggestions are not much better, either. Users often ask for things that are too hard, or for incremental enhancements that will result in bloat in the long term.
So really, what you may need is one person who can see the problems of the newbies, of the devs, and of the people with a huge investment in the project, make long-term decisions, and have a lot of influence on people, while working in the shadows towards that goal.
On Wed, Dec 29, 2010 at 10:26 PM, Tim Starling tstarling@wikimedia.org wrote:
I think there are things we can do in software to help de-escalate this conflict between established editors and new editors.
One thing we can do is to reduce the sense of urgency. Further deployment of FlaggedRevs (pending changes) is the obvious way to do this. By hiding recent edits, admins can deal with bad edits in their own time, rather than reacting in the heat of the moment.
Another thing we could do is to improve the means of communication. Better communication often helps to de-escalate a conflict.
We could replace the terrible user talk page interface with an easy-to-use real-time messaging framework. We could integrate polite template responses with the UI. And we could provide a centralised forum-like view of such messages, to encourage mediators to review and de-escalate emotion-charged conversations.
We could also try to work out ways to make adminship less important. If protection, blocking, and deletion could be made less necessary and important in day-to-day editing, that would reduce the importance of admins and reduce the difference between established and new contributors. You could often make do with much "softer" versions of these three things, which could be given out much more liberally.
For instance, to replace blocking, you could have a system whereby any reasonably established editor (> X edits/Y days) can place another editor or IP address in moderation, so that their edits have to be approved before going live, in Flagged Revs style. As with blocking, any established editor could also reverse such a block. Abuse would thus be easily reversed and fairly harmless (since the edits could go through automatically when it's lifted, barring conflicts). Sysops would only be necessary if people with established accounts abuse their rights.
Likewise, most deletion doesn't really need to make anything private. Reasonably established editors could be given the right to soft-delete a page such that any other such editor could read or undelete it. This would be fine for the vast majority of deletions, like vanity pages and spam. Sysops would only have to get involved for copyright infringement, privacy issues, and so on.
As for protection, we already have Flagged Revs. Lower levels of flagging should be imposable by people other than sysops, and since those largely supersede semiprotection, sysops would again only be needed to adjudicate disputes between established editors (like full-protecting an edit-warred page). Obviously, all these rights would be revocable by sysops in the event of abuse.
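Purely as illustration, softer rights like these might be handed out in configuration, assuming an "established" group that is granted automatically; $wgAutopromote and $wgGroupPermissions are real MediaWiki settings, but the three custom right names below are invented and would still need extension code to actually enforce anything:

  # LocalSettings.php sketch -- the three custom rights are hypothetical, not stock MediaWiki.
  $wgAutopromote['established'] = array( '&',
      array( APCOND_EDITCOUNT, 500 ),    # more than X edits
      array( APCOND_AGE, 90 * 86400 ),   # account older than Y days (in seconds)
  );
  $wgGroupPermissions['established']['place-in-moderation'] = true; # soft "block": edits need review
  $wgGroupPermissions['established']['soft-delete']         = true; # reversible, readable deletion
  $wgGroupPermissions['established']['flag-protect']        = true; # lower levels of flagging
  # Sysops keep the hard tools for abuse by established accounts.
  $wgGroupPermissions['sysop']['block']  = true;
  $wgGroupPermissions['sysop']['delete'] = true;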
Unfortunately, I don't think that technical solutions are going to fix the problem on enwiki. I think the only thing that will do it is if Wikimedia adopts more explicit policies about creating a friendly editing environment, and enforces them in the same vein as it does copyright policies. But that's easier said than done for a number of reasons.
Aryeh Gregor wrote:
We could also try to work out ways to make adminship less important. If protection, blocking, and deletion could be made less necessary and important in day-to-day editing, that would reduce the importance of admins and reduce the difference between established and new contributors. You could often make do with much "softer" versions of these three things, which could be given out much more liberally.
For instance, to replace blocking, you could have a system whereby any reasonably established editor (> X edits/Y days) can place another editor or IP address in moderation, so that their edits have to be approved before going live, in Flagged Revs style. As with blocking, any established editor could also reverse such a block. Abuse would thus be easily reversed and fairly harmless (since the edits could go through automatically when it's lifted, barring conflicts). Sysops would only be necessary if people with established accounts abuse their rights.
Likewise, most deletion doesn't really need to make anything private. Reasonably established editors could be given the right to soft-delete a page such that any other such editor could read or undelete it. This would be fine for the vast majority of deletions, like vanity pages and spam. Sysops would only have to get involved for copyright infringement, privacy issues, and so on.
As for protection, we already have Flagged Revs. Lower levels of flagging should be imposable by people other than sysops, and since those largely supersede semiprotection, sysops would again only be needed to adjudicate disputes between established editors (like full-protecting an edit-warred page). Obviously, all these rights would be revocable by sysops in the event of abuse.
There's an extension to 'delete' pages by blanking. I find that approach much more wiki. We should also work on allowing more protection levels, and on fixing problems with the "if you can protect, you can edit anything" behavior and such.
On 31 December 2010 00:02, Platonides Platonides@gmail.com wrote:
There's an extension to 'delete' pages by blanking. I find that approach much more wiki.
"Pure wiki deletion" is a perennial proposal. One problem is that there doesn't appear to be a wiki anywhere that actually uses it, or ever have been one. (I've asked for examples before - does anyone have any?) This suggests that the biggest wiki in the world might not be the greatest place to be the very first.
- d.
On 31 December 2010 00:08, David Gerard dgerard@gmail.com wrote:
On 31 December 2010 00:02, Platonides Platonides@gmail.com wrote:
There's an extension to 'delete' pages by blanking. I find that approach much more wiki.
"Pure wiki deletion" is a perennial proposal. One problem is that there doesn't appear to be a wiki anywhere that actually uses it, or ever have been one. (I've asked for examples before - does anyone have any?) This suggests that the biggest wiki in the world might not be the greatest place to be the very first.
If you want being the biggest wiki in the world to mean anything, you need to innovate. Wikipedia will continue to stagnate if everyone is too scared to try out new stuff. This is, in my mind, the biggest problem facing Wikimedia — it's suffering from complete feature-freeze because everyone is so scared of making a mistake. On all fronts, encyclopedic, social, technical, nothing has really moved forward at all for the last year or two. Sure, we've optimized a few workflows, tightened a few procedures, and added some content — but there's no innovation, nothing exciting and new.
Evolution is the best model we have for how to build something, the way to keep progress going is to continually try new things; if they fail, "meh", if they succeed — "yay"! There are no planning meetings, no months of deliberation about exactly what shape a finger should be. Sure, nothing built by evolution is "perfect", but that's fine, it will continue to get better in ways not even imaginable from this point in time (everyone knows you can't see into the future, so stop wasting time trying). One reason that wikis are such a good way of creating content is that they use the same process — anyone can make a random change. If it is good, it is kept; if not it isn't. The same model is appearing in other places too. Github allows random people to change software, and only the good stuff gets merged. Google does the same: Wave was a fun idea, it turns out it was also useless — oh well, lesson learnt, move on.
There is no Wikipedia-killer in a concrete sense. The world will continue to evolve. Wikipedia has a simple choice: evolve or get left behind.
Conrad
2010/12/31 Conrad Irwin conrad.irwin@gmail.com
Evolution is the best model we have for how to build something, the way to keep progress going is to continually try new things; if they fail, "meh", if they succeed — "yay"!
Just to add a little bit of "pure theory" to the discussion: the wiki project is simply one of the most interesting, and successful, models in "adaptive complex systems theory". I encourage anyone to take a deeper look into it. It's interesting both for wiki users/sysops/high-level managers and for complex systems researchers.
I guess complex systems theory would suggest some policies too. Just an example: as in evolution, the best environment for something new to appear is not the wider environment but the small ones, the "islands", just like the Galapagos in evolution! This would suggest paying great attention to what happens in the smaller wiki projects. I guess the most interesting things can be found there, while not much evolution can be expected in the "mammoth" projects. ;-)
Alex (from it.source)
masti wrote:
On 12/31/2010 01:02 AM, Platonides wrote:
There's an extension to 'delete' pages by blanking. I find that approach much more wiki.
if you like to be blocked for blanking ...
masti
If blanking were the right way of deleting, it would actually be the way specified by policy... assuming that page really deserves to be deleted.
On 12/29/2010 10:26 PM, Tim Starling wrote:
On 29/12/10 18:31, Neil Kandalgaonkar wrote:
I've been inspired by the discussion David Gerard and Brion Vibber kicked off, and I think they are headed in the right direction.
But I just want to ask a separate, but related question.
Let's imagine you wanted to start a rival to Wikipedia. Assume that you are motivated by money, and that venture capitalists promise you can be paid gazillions of dollars if you can do one, or many, of the following:
1 - Become a more attractive home to the WP editors. Get them to work on your content.
2 - Take the free content from WP, and use it in this new system. But make it much better, in a way Wikipedia can't match.
This has been done before: Wikinfo, Citizendium, etc.
3 - Attract even more readers, or perhaps a niche group of super-passionate readers that you can use to build a new community.
This is basically Wikia's business model. I think you need to think outside the box.
I would make it more like World of Warcraft. We should incentivise people to set up wiki sweatshops in Indonesia, paying local people to "grind" all day, cleaning up articles, in order to build up a level 10 admin character that can then be sold for thousands of dollars on the open market. Also it should have cool graphics.
OK, if you want a real answer: I think if you could convince admins to be nicer to people, then that would make a bigger impact on Wikipedia's long-term viability than any ease-of-editing feature. Making editing easier will give you a one-off jump in editing statistics, but it won't address the trend.
We know from interviews and departure messages that the editing interface creates an initial barrier for entry, but for people who get past that barrier, various social factors, such as incivility and bureaucracy, limit the time they spend contributing.
Once you burn someone out, they don't come back for a long time, maybe not ever. So you introduce a downwards trend which extends over decades, until the rate at which we burn people out meets the rate at which new editors are born.
Active, established editors have a battlefront mentality. They feel as if they are fighting for the survival of Wikipedia against a constant stream of newbies who don't understand or don't care about our policies. As the stream of newbies increases, they become more desperate, and resort to more desperate (and less civil) measures for controlling the flood.
Making editing easier could actually be counterproductive. If we let more people past the editing interface barrier before we fix our social problems, then we could burn out the majority of the Internet population before we figure out what's going on. Increasing the number of new editors by a large factor will increase the anxiety level of admins, and thus accelerate this process.
One thing that I think could help, at least on the English Wikipedia, would be to further restrict new article creation. Right now, any registered user can create a new article, and according to some statistics I gathered a few months ago[1], almost 25% of new users make their first edit creating an article. 81% of those users had their article deleted and <0.1% of them were still editing a few (6-7) months later, compared to 4% for the 19% whose articles were kept, giving a total retention rate of 1.3%.
However, for the 75% of users who started by editing an existing article, the overall retention rate was 2.5%. Still a small number, but almost double the rate for the article creation route.
The English Wikipedia, with 3.5 million articles, has been scraping the bottom of the notability barrel for a while. Creating a proper new article is not an especially easy task in terms of editing, yet the project practically encourages new users to do it. We're dropping new users into the deep end of the pool, then getting angry at them when they start to drown. What we should be doing instead is suggesting that users add their information to an existing article somewhere (with various tools to help them find it). And if they can't find anything remotely related in 3.5 million articles, ask themselves whether they still think it's an appropriate topic.
This is an area where the foundation potentially could step in to change things. It's never going to happen through the community, since there are too many people (or at least too many loud people) with a "more is better" mentality. (Part of the reason I gathered the stats was to prove that most new users don't start by creating an article.) They'll scream and moan for a while about how we're being anti-wiki, but in the end, most probably won't really care that much.
Alex wrote:
One thing that I think could help, at least on the English Wikipedia, would be to further restrict new article creation. Right now, any registered user can create a new article, and according to some statistics I gathered a few months ago[1], almost 25% of new users make their first edit creating an article. 81% of those users had their article deleted and <0.1% of them were still editing a few (6-7) months later, compared to 4% for the 19% whose articles were kept, giving a total retention rate of 1.3%.
However, for the 75% of users who started by editing an existing article, the overall retention rate was 2.5%. Still a small number, but almost double the rate for the article creation route.
This is significant, but I'm not convinced about the reason.
There is surely a biting factor at work. We make them jump through hoops, having to register an account, and then destroy their work. It's normal that some potentially good contributors leave. But many of those are single-purpose accounts that would only ever be interested in adding their MySpace band. We should support the first type of user, but we don't even want the second type to register.
The English Wikipedia, with 3.5 million articles, has been scraping the bottom of the notability barrel for a while. Creating a proper new article is not an especially easy task in terms of editing, yet the project practically encourages new users to do it. We're dropping new users into the deep end of the pool, then getting angry at them when they start to drown.
Completely. This mentality should be changed.
What we should be doing instead is suggesting that users add their information to an existing article somewhere (with various tools to help them find it). And if they can't find anything remotely related in 3.5 million articles, ask themselves whether they still think it's an appropriate topic.
That's a good point, but it's not suitable for all topics. If I want to create an article on a topic that would be considered relevant, you shouldn't make me wander in circles. Some people shouldn't be treated as babies, while others should.
2010/12/29 Tim Starling tstarling@wikimedia.org:
One thing we can do is to reduce the sense of urgency. Further deployment of FlaggedRevs (pending changes) is the obvious way to do this. By hiding recent edits, admins can deal with bad edits in their own time, rather than reacting in the heat of the moment.
The actual effect of FlaggedRevs on revert behavior appears to be, if anything, to accelerate reverts. See Felipe Ortega's presentation at Wikimania 2010, page 18 and following: http://en.wikipedia.org/wiki/File:Felipe_Ortega,_Flagged_revisions_study_res...
Performing review actions as quickly as possible is generally seen by FlaggedRevs-using communities as one of the key performance indicators connected with the feature. The moment of performing the review action also tends to be the moment of reverting. I see no evidence, on the other hand, that FlaggedRevs has contributed to a decreased sense of urgency anywhere it's been employed.
It's important to note that FlaggedRevs edits aren't like patches awaiting review. They must be processed in order for anyone's subsequent edits to be reader-visible. Logged-in users, on the other hand, always see the latest version by default. These factors and others may contribute to a sense that edits must be processed as quickly as possible.
I do fully agree with the rest of your note. We have sufficient data to show not only that the resistance against new edits as indicated by the revert ratio towards new users has increased significantly in the last few years, but also that only very few of the thousands of new users who complete their first 10 edits in any given month stick around. Our former contributors survey showed that among people with more than 10 edits/month who had stopped editing, 40% did so because of unpleasant experiences with other editors.
http://strategy.wikimedia.org/wiki/File:Former_contributors_survey_presentat...
While fixing the editing UI is absolutely essential, I strongly agree with your hypothesis that doing so without regard for the problematic social dynamics is likely to only accelerate people's negative experiences. Useful technology changes in the area of new user interaction are a lot harder to anticipate, however, and the only way we're going to learn is through lots of small experiments. We can follow in the footsteps of the GroupLens researchers and others who have experimented with interface changes such as changes to the revert process, and how these affect new user retention:
http://en.wikipedia.org/wiki/User:EpochFail/NICE (See their publications to-date at http://www.grouplens.org/biblio )
Once we've identified paths that are clearly fruitful (e.g. if we find that an experiment with real-time chat yields useful results), we can throw more resources at them to implement proper functionality.
Over the holidays, my mother shared her own "newbie biting" story. She's 64 years old and a professional adult educator. Her clearly constructive good faith edit in the FlaggedRevs-using German Wikipedia [1] was reverted within the minute it was made, without a comment of any kind. She explained that she doesn't have enough frustration tolerance to deal with this kind of behavior.
It's quite likely that we won't be able to make Wikipedia frustration-free enough to retain someone like my mother as an editor, but we should be able to make it a significantly more pleasant experience than it is today.
[1] http://de.wikipedia.org/w/index.php?title=Transaktionsanalyse&diff=76794...
Can I suggest a really simple trick to inject something new into "stagnating" wikipedia?
Simply install Labeled Section Transclusion on a large pedia project; don't ask, simply install it. If you asked, typical pedian boldness would surely produce the comment "Thanks, we don't need such a thing". They need it... but they don't know it, nor can they admit that a small sister project like Wikisource is currently using something very useful.
Let them discover #lst's surprising power.
Alex
2011/1/4 Alex Brollo alex.brollo@gmail.com:
Simply install Labeled Section Transclusion on a large pedia project;
Just from looking at the LST code, I can tell that it has at least one performance problem: it initializes the parser on every request. This is easy to fix, so I'll fix it today. I can also imagine that there would be other performance concerns with LST preventing its deployment to large wikis, but I'm not sure of that.
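For illustration, the general fix is just to construct the Parser lazily and reuse it across calls instead of creating a new one per invocation; a minimal sketch of that pattern (invented helper name, not the actual LST code):

  # Minimal sketch of lazy parser reuse -- hypothetical helper, not the actual LST code.
  class HypotheticalLstHelper {
      /** @var Parser|null */
      private static $parser = null;

      public static function getParser() {
          if ( self::$parser === null ) {
              self::$parser = new Parser(); # construct once...
          }
          return self::$parser;             # ...and reuse on every later call
      }
  }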
Roan Kattouw (Catrope)
2011/1/4 Roan Kattouw roan.kattouw@gmail.com
Just from looking at the LST code, I can tell that it has at least one performance problem: it initializes the parser on every request. This is easy to fix, so I'll fix it today. I can also imagine that there would be other performance concerns with LST preventing its deployment to large wikis, but I'm not sure of that.
Excellent, I'm a passionate user of the #lst extension, and I like that its code can be optimized (so I feel comfortable using it more and more). I can't read PHP, and I take this opportunity to ask you:
1. Is the #lsth option compatible with default #lst use?
2. I imagine that #lst simply runs as a "substring finder", and that substring search is a really efficient, fast and resource-sparing server routine. Am I right?
3. When I ask for a section of a page, is that page saved into a cache, so that subsequent calls for other sections of the same page are fast and resource-sparing?
What a "creative" use of #lst allows, if it really is an efficient, light routine, is building named variables and arrays of named variables within one page; I can't imagine what a good programmer could do with such a powerful tool. I am, as you can imagine, far from a good programmer, yet I easily built routines with unbelievable results. Perhaps, coming back to the topic..... could a good programmer disrupt Wikipedia using #lst? :-)
Alex
2011/1/4 Alex Brollo alex.brollo@gmail.com:
Excellent, I'm a passionate user of the #lst extension, and I like that its code can be optimized (so I feel comfortable using it more and more). I can't read PHP, and I take this opportunity to ask you:
I haven't read the code in detail, and I can't really answer these questions until I have. I'll look at them later today; I have some other things to do first.
- Is the #lsth option compatible with default #lst use?
No idea what #lsth even is or does, nor what you mean by 'compatible' in this case.
- I imagine that #lst simply runs as a "substring finder", and that substring search is a really efficient, fast and resource-sparing server routine. Am I right?
It does seem to load the entire page text (wikitext I think, not sure) and look for the section somehow, but I haven't looked at how it does this in detail.
- When I ask for a section of a page, is that page saved into a cache, so that subsequent calls for other sections of the same page are fast and resource-sparing?
I'm not sure whether LST is caching as much as it should. I can tell you though that the "fetch the wikitext of revision Y of page Z" operation is already cached in MW core. Whether the "fetch the wikitext of section X of revision Y of page Z" operation is cached (and whether it makes sense to do so), I don't know.
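If it turns out the per-section result is not cached, a per-request memoization layer would be a small addition; a minimal sketch (invented class name, not the actual LST code):

  # Hypothetical per-request cache of extracted sections -- not the actual LST code.
  class SectionTextCache {
      private static $cache = array();

      public static function get( $revId, $sectionName, $extractCallback ) {
          $key = $revId . ':' . $sectionName;
          if ( !array_key_exists( $key, self::$cache ) ) {
              # Only do the (possibly expensive) extraction once per request.
              self::$cache[$key] = call_user_func( $extractCallback, $revId, $sectionName );
          }
          return self::$cache[$key];
      }
  }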
What a "creative" use of #lst allows, if it is really an efficient, light routine, is to build named variables and arrays of named variables into one page; I can't imagine what a good programmer could do with such a powerful tool. I'm, as you can imagine, far from a good programmer, nevertheless I built easily routines for unbeliavable results. Perhaps, coming back to the topic..... a good programmer would disrupt wikipedia using #lst? :-)
Using #lst to implement variables in wikitext sounds like a terrible hack, similar to how using {{padleft:}} to implement string functions in wikitext is a terrible hack.
Roan Kattouw (Catrope)
2011/1/4 Roan Kattouw roan.kattouw@gmail.com
What a "creative" use of #lst allows, if it is really an efficient, light routine, is to build named variables and arrays of named variables into
one
page; I can't imagine what a good programmer could do with such a
powerful
tool. I'm, as you can imagine, far from a good programmer, nevertheless I built easily routines for unbeliavable results. Perhaps, coming back to
the
topic..... a good programmer would disrupt wikipedia using #lst? :-)
Using #lst to implement variables in wikitext sounds like a terrible hack, similar to how using {{padleft:}} to implement string functions in wikitext is a terrible hack.
Thanks Roan, your statement sounds very alarming to me; I'll open a specific thread about it on wikisource-l quoting this discussion. I'm making every effort to avoid server/history overload, since I know that I am using a free service (I just fixed the {{loop}} template on it.source to optimize it, as best I could...), and if you are right, I'll have to deeply change my approach to #lst.
:-(
Alex
Alex Brollo wrote:
Thanks Roan, your statement sounds very alarming to me; I'll open a specific thread about it on wikisource-l quoting this discussion. I'm making every effort to avoid server/history overload, since I know that I am using a free service (I just fixed the {{loop}} template on it.source to optimize it, as best I could...), and if you are right, I'll have to deeply change my approach to #lst.
:-(
Alex
The reason that labelled section transclusion is only enabled on the Wikisources, some Wiktionaries... is that it is inefficient. Thus your proposal of enabling it everywhere would be a bad idea. However, I am just remembering things said in the past. I haven't reviewed it myself.
Don't be over-paranoid about not using the features you have available. You can of course ask for advice on whether something you have done is sane or not. An interesting point is that labelled section transclusion is enabled on the French Wikipedia. It's strange that someone got tricked into enabling it on that 'big' project. I wonder how it is being used there.
2011/1/4 Platonides Platonides@gmail.com
Don't be over-paranoid about not using the features you have available. You can of course ask for advice on whether something you have done is sane or not. An interesting point is that labelled section transclusion is enabled on the French Wikipedia. It's strange that someone got tricked into enabling it on that 'big' project. I wonder how it is being used there.
The French Wikisource is working hard to convert "naked texts" into "texts with scans", which means that they use #lst very heavily. Their work is excellent. There are many contributors. I guess that many of them also work on Wikipedia. Perhaps they imported their useful tool there.
See http://toolserver.org/~thomasv/graphs/Wikisource_-_texts_fr.png to see how heavily they use proofreading.
Perhaps #lst is inefficient.... but I'd like to compare #lst and template efficiency. Sometimes I see complex, very complex templates used for such banal aims.... ;-)
Alex
Alex Brollo wrote:
Perhaps #lst is inefficient.... but I'd like to compare #lst and template efficiency. Sometimes I see complex, very complex templates used for such banal aims.... ;-)
Alex
That's an excellent case! It's better to use a single "inefficient" tool than one hundred templates which (standalone) are considered "efficient".
Platonides wrote:
That's an excellent case! It's better to use a single "inefficient" tool than one hundred templates which (standalone) are considered "efficient".
But the efficient template may be used more and increase total "cost", compare http://en.wikipedia.org/wiki/Efficient_energy_use#Rebound_effect
Nemo
On 05/01/11 11:08, Federico Leva (Nemo) wrote:
Platonides wrote:
That's an excellent case! It's better to use a single "inefficient" tool than one hundred templates which (standalone) are considered "efficient".
But the efficient template may be used more and increase total "cost", compare http://en.wikipedia.org/wiki/Efficient_energy_use#Rebound_effect
Hit [[Special:Randompage]] ? :-b
On 4 January 2011 16:00, Alex Brollo alex.brollo@gmail.com wrote:
2011/1/4 Roan Kattouw roan.kattouw@gmail.com
...
What a "creative" use of #lst allows, if it is really an efficient, light routine, is to build named variables and arrays of named variables into one page; I can't imagine what a good programmer could do with such a powerful tool. I'm, as you can imagine, far from a good programmer, nevertheless I built easily routines for unbeliavable results. Perhaps, coming back to the topic..... a good programmer would disrupt wikipedia using #lst? :-)
Don't use the words "good programmers"; it sounds like mythic creatures that never add bugs and can work 24 hours without getting tired. Haha...
What it seems you may need is a special type of person, maybe in academia, or a student, or someone already working on something that demands a lot of performance. One interested in the intricate details of optimizing. The last time I tried to search for something specialized about PHP (how to force garbage collection in old versions of PHP) there were very few hits on Google, or none.
Tei wrote:
The last time I tried to search for something specialized about PHP (how to force garbage collection in old versions of PHP) there were very few hits on Google, or none.
Maybe that was because PHP has only had garbage collection since 5.3 :)
For reference: http://php.net/manual/features.gc.php
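For what it's worth, in PHP 5.3+ the cycle collector can also be toggled and triggered explicitly; a minimal example using the standard functions:

  <?php
  // PHP 5.3+ only: the circular-reference collector can be controlled directly.
  gc_enable();                        // make sure the collector is enabled
  $collected = gc_collect_cycles();   // force a collection run now
  echo "Collected $collected cycles\n";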
On 12/29/2010 2:31 AM, Neil Kandalgaonkar wrote:
Let's imagine you wanted to start a rival to Wikipedia. Assume that you are motivated by money, and that venture capitalists promise you can be paid gazillions of dollars if you can do one, or many, of the following:
Ok, first of all you need a pot of gold at the end of the rainbow. Let's assume it's a real business model and not that you know a few folks who have $1B burning a hole in their pocket. Let's also assume that it's a business model based on getting a lot of traffic...
Secondly, if you want to go up against 'Wikipedia as a whole', that's a very difficult problem. Wikipedia is one of the strongest sites on the internet in terms of S.E.O., not because of any nasty stuff, but because so many people link to Wikipedia articles from all over the web. Wikipedia ranks highly for many terms and that's a situation that Google & Bing don't mind, since Wikipedia has something halfway decent to say about most topics... It makes search engines seem smart.
To overturn Wikipedia on the conventional web, you'd really need to beat it at S.E.O. Sneaky-peet tricks won't help you that much when you're working at this scale, because if you're able to make enough phony links to challenge one of the most-linked sites on Earth, you're probably going to set off alarm bells up and down the West coast. Thus, the challenge of a two-sided market faces anybody who wants to 'beat' Wikipedia, and I think it's just too hard a nut to crack, even if you've got software that's way better and if you've got a monster marketing budget.
I think there are three ways you can 'beat' Wikipedia in a smaller sense. (i) in another medium, (ii) by targeting very specific verticals, or (iii) by creating derivative products that add a very specific kind of value (that is, targeting a horizontal)
In (i) I think of companies like Foursquare and Fotopedia that follow a mobile-first strategy. If mobile apps got really big and eclipsed the 'web as we know it', I can see a space for a Wikipedia successor. This could entirely bypass the S.E.O. problem, but couldn't Wikipedia fight back with a mobile app of its own? On the other hand, this might not be so plausible: the better mobile devices do an O.K. job with 'HTML 5', and with improvements in hardware, networking and HTML-related specifications, there might be no real advantage in having 'an app for that'. Already people are complaining that a collection of apps on your device creates a number of 'walled gardens' that can't be searched in aggregate, and these kinds of pressures may erode the progress of apps.
For (ii) I think of Wikia, which hosts things like
http://mario.wikia.com/wiki/MarioWiki
Stuff like this drives deletionists nuts on Wikipedia, but having a place for them to live in Wikia makes everybody happy. Here's a place where the Notability policy means that Wikipedia isn't competitive. Now, in general, Wikia is trying to do this for thousands of subjects (which might compete with Wikipedia overall) and they've had some success, but not an overwhelming amount.
Speaking of notability, another direction is to make something that's more comprehensive than Wikipedia. Consider Freebase, which accepts Person records for any non-fictional person and has detailed records of millions of TV episodes, music tracks, books, etc. If Wikipedia refuses to go someplace, they create opportunities.
As for (iii) you're more likely to have a complementary relationship with Wikipedia. You can take advantage of Wikipedia's success and get some income to pay for people and machines. There wouldn't be any possibility of 'replacing' Wikipedia except in a crazy long-term scenario where, say, we can convert Wikipedia into a knowledge base that can grow and update itself with limited human intervention. (Personally I think this is 10-25 years off)
Anyhow, I could talk your ear off about (iii) but I'd make you sign an N.D.A. first. ;-)
Looking over the thread, there are lots of good ideas. It's really important to have some plan for cleaning up the abstractions between "structured data", "procedures in representation", "visual representation" and "tools for participation".
But I think it's correct to identify the social aspects of the projects as more critical than purity of abstractions within wikitext. Tools, bots and scripts and clever UI components can abstract away some of the pain of the underlying platform, as long as people are willing to accept a bit of abstraction leakage / lack of coverage in some areas as part of moving to something better.
One area that I did not see much mention of in this thread is automated systems for reputation. Reputation systems would be useful both for user interactions and for gauging expertise within particular knowledge domains.
Social capital within Wikimedia projects is presently stored in incredibly unstructured ways and has little bearing on user privileges, or on how the actions of others are represented to you and how your actions are represented to others. It's presently based on the traditional small-scale capacity of individuals to gauge social standing within their social networks, or to read user pages.
We can see automatic reputation systems emerging anywhere you want to share anything online, be it making a small loan or trading used DVDs. Sharing information should adopt some similar principles.
There has been some good work done in this area with the WikiTrust system (and other user moderation / karma systems). Tying that data into smart interface flows that reward positive social behaviour and productive contributions should make it "more fun" to participate in the projects and result in more fluid, higher quality information sharing.
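As a toy illustration of the kind of signal such a system might track (an invented scoring rule, not WikiTrust's actual algorithm): credit edits that survive, weighted by how long they survived, and debit edits that get reverted.

  # Toy reputation update -- invented for illustration, not the WikiTrust algorithm.
  function updateReputation( $score, $editSurvived, $daysSurvived ) {
      if ( $editSurvived ) {
          # Longer-surviving edits earn more, with diminishing returns.
          $score += min( 5, 1 + log( 1 + $daysSurvived ) );
      } else {
          # A reverted edit costs a fixed amount.
          $score -= 3;
      }
      return max( 0, $score );
  }

  $score = 0;
  $score = updateReputation( $score, true, 30 );  # an edit that stuck around for a month
  $score = updateReputation( $score, false, 0 );  # an edit that was reverted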
peace, --michael
On 12/29/2010 01:31 AM, Neil Kandalgaonkar wrote:
I've been inspired by the discussion David Gerard and Brion Vibber kicked off, and I think they are headed in the right direction.
But I just want to ask a separate, but related question.
Let's imagine you wanted to start a rival to Wikipedia. Assume that you are motivated by money, and that venture capitalists promise you can be paid gazillions of dollars if you can do one, or many, of the following:
1 - Become a more attractive home to the WP editors. Get them to work on your content.
2 - Take the free content from WP, and use it in this new system. But make it much better, in a way Wikipedia can't match.
3 - Attract even more readers, or perhaps a niche group of super-passionate readers that you can use to build a new community.
In other words, if you had no legacy, and just wanted to build something from zero, how would you go about creating an innovation that was disruptive to Wikipedia, in fact something that made Wikipedia look like Friendster or Myspace compared to Facebook?
And there's a followup question to this -- but you're all smart people and can guess what it is.
"Michael Dale" mdale@wikimedia.org wrote in message news:4D1D31D7.9010507@wikimedia.org...
One area that I did not see much mention of in this thread is automated systems for reputation. Reputation systems would be useful both for user interactions and for gauging expertise within particular knowledge domains.
Social capital within Wikimedia projects is presently stored in incredibly unstructured ways and has little bearing on user privileges, or on how the actions of others are represented to you and how your actions are represented to others. It's presently based on the traditional small-scale capacity of individuals to gauge social standing within their social networks, or to read user pages.
We can see automatic reputation systems emerging anywhere you want to share anything online, be it making a small loan or trading used DVDs. Sharing information should adopt some similar principles.
There has been some good work done in this area with the WikiTrust system (and other user moderation / karma systems). Tying that data into smart interface flows that reward positive social behaviour and productive contributions should make it "more fun" to participate in the projects and result in more fluid, higher quality information sharing.
peace, --michael
I think this is a fascinating idea, and one that I think meets a very valuable criterion: being more useful to newcomers, who are used to seeing such things on other sites, than to established editors (who will inevitably hate it). I can see a deployment path along the lines of the Foundation saying "we are going to enable this extension, whether or not you ask for it. You do not have to use it, but you may not disable it.", and watching what happens. It could well be months or years before people get over complaining about it and it starts to bed down. Of course in that time (and generally) it needs to be immune to various forms of credit farming, which could lead to some interesting metrics to try to ensure that sockmasters cannot earn huge reputations by passing credit amongst their socks, while ordinary users can be rewarded.
With ideas like this, and more generally, I think the Foundation has an increasing role to play in fighting the growing inertia in the projects. It's easy to say that any intervention will damage the community and so should be avoided; but let's not forget that a (mildly) torn muscle will heal stronger, and that the alternative is, as mentioned, complete stagnation. It's important that the community keep evolving and innovating as well; and if that means that some of its more inflexible members cannot keep up and leave, so be it, as long as they are replaced by new and excited members such that the community as a whole remains vibrant. Of course that's a horribly difficult balance to strike, and it would be easy to kill the golden goose. But the goose is now getting rather grey and arthritic, and we could really do with some golden goslings right now.
--HM
I've been following the discussion and as I can see it's already become rather unproductive*. So I hope my cutting in will not be very much out of place (even if I don't really know what I'm talking about).
Many people here have stated that the main reason a WYSIWYG editor is not feasible is the current wikitext syntax.
What's actually wrong with it?
The main thing I can think of is the fact that one template may include the opening of a table etc. and another one the closing (e.g. {{col-begin}}, {{col-end}}). It makes it impossible to isolate the template from the rest of the article - to draw a frame around it and say "this box here is a template".
It could be fixed by forbidding leaving unclosed tags in templates. As a replacement, a kind of foreach loop could be introduced to iterate through an unspecified number of arguments.
Lack of standardisation has also been mentioned. Something else?
I've tried to think how a perfect parser should work. Most of this has already been mentioned. I think it should work in two steps: first tokenise the code and transform it into an intermediate tree structure like

  * paragraph
    title:
      * plain text: "Section 1"
    content:
      * plain text: "foo"
      * bold text:
        * plain text: "bar"
      * template
        name: "Infobox"
        * argument
          name: "last name"
          value:
            * plain text: "Shakespear"

and so on. Then this structure could be transformed into a) HTML for display, b) JSON for the WYSIWYG editor. Thanks to this, you wouldn't need to write a whole new JS parser. The editor would get a half-ready product. The JS code would need to be able to: a) transform this structure into HTML, b) modify the structure, c) transform this structure back into wikitext.
But I guess it's more realistic to write a new JS parser than to write a new PHP parser. The former can start as a stub, the latter would need to be fully operational from the beginning.
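To make the idea concrete, here is one possible shape for such an intermediate structure as a nested PHP array (the node names and nesting are invented for illustration; json_encode() on it would give the JSON form the WYSIWYG editor could consume):

  // Hypothetical intermediate tree for roughly: "Section 1" / foo '''bar''' {{Infobox|last name=Shakespear}}
  $tree = array(
      'type'    => 'paragraph',
      'title'   => array( array( 'type' => 'text', 'value' => 'Section 1' ) ),
      'content' => array(
          array( 'type' => 'text', 'value' => 'foo' ),
          array( 'type' => 'bold', 'content' => array(
              array( 'type' => 'text', 'value' => 'bar' ),
          ) ),
          array( 'type' => 'template', 'name' => 'Infobox', 'args' => array(
              'last name' => array( array( 'type' => 'text', 'value' => 'Shakespear' ) ),
          ) ),
      ),
  );
  $json = json_encode( $tree ); // hand this to the editor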
Stephanie's suggestions are also interesting.
lampak
* (except the WYSIWTF, of course)
2011/1/1 lampak llampak@gmail.com:
It could be fixed by forbidding leaving unclosed tags in templates.
[...]
I've tried to think how a perfect parser should work. Most of this has been already mentioned. I think it should work in two steps: first tokenise the code and transform it into an intermediate tree structure like
[...]
and so on. Then this structure could be transformed into a) HTML for display, b) JSON for the WYSIWYG editor. Thanks for this you wouldn't need to write a whole new JS parser. The editor would get a half-ready product. The JS code would need to be able to: a) transform this structure into HTML, b) modify the structure, c) transform this structure back into wikitext.
Trevor Parscal already has a proof-of-concept parser that follows this philosophy pretty much to the letter. I don't think it's in our SVN repository yet (he said he would commit it some time ago) and I haven't succeeded in convincing him to reply on this list (holidays, I guess), but he's been playing around with it for about nine months now, on and off, and from what I've heard and seen it's promising and entirely in the spirit of your post.
Roan Kattouw (Catrope)
----- Original Message -----
From: "lampak" llampak@gmail.com
I've been following the discussion and as I can see it's already become rather unproductive*. So I hope my cutting in will not be very much out of place (even if I don't really know what I'm talking about).
Many people here have stated that the main reason a WYSIWYG editor is not feasible is the current wikitext syntax.
What's actually wrong with it?
Oh god! *Run*!!!!!
:-)
This has been done a dozen times in the last 5 years, lampak. The short version, as much as *I* am displeased with the fact that we'll never have *bold*, /italic/ and _underscore_, is that the installed base, both of articles and editors, means that Mediawikitext will never change.
It *might* be possible to *extend* it, but that would require at least one of the 94 projects to write a formally defined parser for it, in something resembling yacc, to actually complete -- and to my knowledge, none has done so.
Cheers, -- jra
On Sun, Jan 2, 2011 at 6:28 AM, Jay Ashworth jra@baylink.com wrote:
[...] This has been done a dozen times in the last 5 years, lampak. The short version, as much as *I* am displeased with the fact that we'll never have *bold*, /italic/ and _underscore_, is that the installed base, both of articles and editors, means that Mediawikitext will never change.
That we've multiply concluded that it will never change doesn't mean it won't; as a thought exercise, as I suggested in OtherThread, we should consider negating that conclusion and seeing what happens.
On Mon, Jan 3, 2011 at 4:59 PM, George Herbert george.herbert@gmail.com wrote:
That we've multiply concluded that it will never change doesn't mean it won't; as a thought exercise, as I suggested in OtherThread, we should consider negating that conclusion and seeing what happens.
Agreed. I think part of the problem in the past is that the conversation generally focused on the actual syntax, and not enough on the incremental changes that we can make to MediaWiki to make this happen.
If, for example, we can build some sort of per-revision indicator of markup language (sort of similar to mime type) which would let us support multiple parsers on the same wiki, then it would be possible to build alternate parsers that people could try out on a per-article basis (and more importantly, revert if it doesn't pan out). The thousands of MediaWiki installs could try out different syntax options, and maybe a clear winner would emerge.
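As a rough sketch of the mechanics (nothing like this exists in MediaWiki today; the registry class and the stored markup identifier are invented for illustration), a per-revision markup id could pick the parser much like a MIME type picks a handler:

  # Hypothetical sketch only -- not an existing MediaWiki API.
  class ParserRegistry {
      private static $parserClasses = array();

      public static function register( $markupId, $className ) {
          self::$parserClasses[$markupId] = $className;
      }

      public static function getParserFor( $markupId ) {
          if ( !isset( self::$parserClasses[$markupId] ) ) {
              $markupId = 'wikitext-1.0'; # assumed site default
          }
          $class = self::$parserClasses[$markupId];
          return new $class();
      }
  }

  # Each revision stores its markup id alongside its text, roughly like a MIME type.
  ParserRegistry::register( 'wikitext-1.0', 'Parser' );           # the existing core parser
  ParserRegistry::register( 'newsyntax-0.1', 'NewSyntaxParser' ); # some experimental parser
  $parser = ParserRegistry::getParserFor( $revisionMarkupId );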
Rob
On Mon, Jan 3, 2011 at 8:41 PM, Rob Lanphier robla@wikimedia.org wrote:
If, for example, we can build some sort of per-revision indicator of markup language (sort of similar to mime type) which would let us support multiple parsers on the same wiki, then it would be possible to build alternate parsers that people could try out on a per-article basis (and more importantly, revert if it doesn't pan out). The thousands of MediaWiki installs could try out different syntax options, and maybe a clear winner would emerge.
Or you end up supporting 5 different parsers that people like for slightly different reasons :)
-Chad
On Mon, Jan 3, 2011 at 5:54 PM, Chad innocentkiller@gmail.com wrote:
On Mon, Jan 3, 2011 at 8:41 PM, Rob Lanphier robla@wikimedia.org wrote:
If, for example, we can build some sort of per-revision indicator of markup language (sort of similar to mime type) which would let us support multiple parsers on the same wiki, then it would be possible to build alternate parsers that people could try out on a per-article basis (and more importantly, revert if it doesn't pan out). The thousands of MediaWiki installs could try out different syntax options, and maybe a clear winner would emerge.
Or you end up supporting 5 different parsers that people like for slightly different reasons :)
Yup, that would definitely be a strong possibility without a disciplined approach. However, done correctly, killing off fringe parsers on a particular wiki would be fairly easy to do. Just because the underlying wiki engine allows for 5 different parsers, doesn't mean a particular wiki would need to allow the creation of new pages or new revisions using any of the 5. If we build the tools that allow admins some ability to constrain the choices, it doesn't have to get too out of hand on a particular wiki.
If we were to go down this development path, we'd need to commit ahead of time to be pretty stingy about what we bless as a "supported" parser, and brutal about killing off support for outdated parsers.
Rob
2011/1/4 Rob Lanphier robla@robla.net
On Mon, Jan 3, 2011 at 5:54 PM, Chad innocentkiller@gmail.com wrote:
On Mon, Jan 3, 2011 at 8:41 PM, Rob Lanphier robla@wikimedia.org
wrote:
If, for example, we can build some sort of per-revision indicator of markup language (sort of similar to mime type) which would let us support multiple parsers on the same wiki, then it would be possible to build alternate parsers that people could try out on a per-article basis (and more importantly, revert if it doesn't pan out). The thousands of MediaWiki installs could try out different syntax options, and maybe a clear winner would emerge.
Or you end up supporting 5 different parsers that people like for slightly different reasons :)
Yup, that would definitely be a strong possibility without a disciplined approach. However, done correctly, killing off fringe parsers on a particular wiki would be fairly easy to do. Just because the underlying wiki engine allows for 5 different parsers, doesn't mean a particular wiki would need to allow the creation of new pages or new revisions using any of the 5. If we build the tools that allow admins some ability to constrain the choices, it doesn't have to get too out of hand on a particular wiki.
If we were to go down this development path, we'd need to commit ahead of time to be pretty stingy about what we bless as a "supported" parser, and brutal about killing off support for outdated parsers.
Rob
I apologize, I sent an empty reply. :-(
Just a brief comment: there's no need to search for "a perfect wiki syntax", since it already exists: it's the present model of well-formed markup, i.e. XML.
While digging into the subtler troubles of wiki syntax - i.e. difficulties in parsing it with scripts, or understanding the fuzzy behavior of the code - I always find a problem coming from the simple fact that wiki markup isn't intrinsically well formed: it doesn't respect the simple, basic rules of a well-formed syntax - strict and evident rules about the beginning and end of a modifier, and no mixing of attributes and content inside its "tags", i.e. templates.
In part, wiki markup can be hacked to take a step forward; I'm using more and more "well-formed templates", split into two parts, a "starting template" and an "ending template". Just a banal example: it.source users are encouraged to use the {{Centrato|l=20em}}.... text ...</div> syntax, where the text - as you see - is outside the template, while the usual {{Centrato|.... text ... |l=20em}} syntax mixes tags and contents (Centrato is the Italian name for "center", and the l attribute states the width of the centered div). I find such a trick extremely useful when parsing text, since - as follows from the use of a well-formed markup - I can retrieve the whole text simply by removing any template code and any HTML tag; an impossible task with the common "not well-formed" syntax, where nothing tells you about the nature of the parameters: they can only be classified by "human understanding" of the template code.... or by the whole body of the wiki parser.
Alex
Alex Brollo alex.brollo@gmail.com writes:
Just a brief comment: there's no need to search for "a perfect wiki syntax", since it already exists: it's the present model of well-formed markup, i.e. XML.
And, from your answer, we can see that you mean "perfectly understandable to parsers", which sacrifices human usability. XML is notoriously difficult to produce by hand.
Suppose there was some mythical “perfect” markup.
We wouldn't want to sacrifice the usability of simple Wiki markup — it would need to be something that could be picked up quickly (wiki-ly) by people. After all, if your perfect markup started barfing up XML parser errors whenever someone created not-so-well-formed XML, well, that wouldn't feel very "wiki", would it?
From what I've seen of this iteration of this conversation, it looks like people are most concerned with markup that is easy and unambiguous to parse.
While I understand the importance of unambiguous markup or syntax for machines, I think human-centered attributes such as “learn-ability” are paramount.
Perhaps this is where we can cooperate more with other Wiki writers to develop a common Wiki markup. From my brief perusal of efforts, it looks like there is a community of developers involved in http://www.wikicreole.org/ but MediaWiki involvement is lacking (http://bit.ly/hYoki3 — for a email from 2007(!!) quoting Tim Starling).
(Note that I think any conversation about parser changes should consider the GoodPractices page from http://www.wikicreole.org/wiki/GoodPractices.)
If nothing else, perhaps there would be some use for the EBNF grammar that was developed for WikiCreole. http://dirkriehle.com/2008/01/09/an-ebnf-grammar-for-wiki-creole-10/
Mark A. Hershberger wrote:
Perhaps this is where we can cooperate more with other Wiki writers to develop a common Wiki markup. From my brief perusal of efforts, it looks like there is a community of developers involved in http://www.wikicreole.org/ but MediaWiki involvement is lacking (http://bit.ly/hYoki3 — for a email from 2007(!!) quoting Tim Starling).
(Note that I think any conversation about parser changes should consider the GoodPractices page from http://www.wikicreole.org/wiki/GoodPractices.)
If nothing else, perhaps there would be some use for the EBNF grammar that was developed for WikiCreole. http://dirkriehle.com/2008/01/09/an-ebnf-grammar-for-wiki-creole-10/
WikiCreole used to not be parsable by a grammar, either. And it has inconsistencies like "italic is // unless it appears in a url". Good to see they improved.
(Note that I think any conversation about parser changes should consider the GoodPractices page from http://www.wikicreole.org/wiki/GoodPractices.)
If nothing else, perhaps there would be some use for the EBNF grammar that was developed for WikiCreole. http://dirkriehle.com/2008/01/09/an-ebnf-grammar-for-wiki-creole-10/
WikiCreole used to not be parsable by a grammar, either. And it has inconsistencies like "italic is // unless it appears in a url". Good to see they improved.
WikiCreole only had a prose specification, hence it was ambiguous. Our syntax definition improved that so that in theory (and practice) you could now have multiple competing parser implementations. The issue with WikiCreole now is that it is simply too small---lots of stuff that it can't do but that any wiki engine will want.
The real reason to care about a precise specification (one that is not, as in the case of MediaWiki, simply the implementation) is the option to evolve faster. The relevant paper for this is http://dirkriehle.com/2008/07/19/a-grammar-for-standardized-wiki-markup/ - wouldn't it be nice if we could be innovating on a wiki platform?
Cheers, Dirk
On Tue, Jan 4, 2011 at 12:03 PM, Mark A. Hershberger mah@everybody.orgwrote:
Perhaps this is where we can cooperate more with other Wiki writers to develop a common Wiki markup. From my brief perusal of efforts, it looks like there is a community of developers involved in http://www.wikicreole.org/ but MediaWiki involvement is lacking (http://bit.ly/hYoki3 — for a email from 2007(!!) quoting Tim Starling).
We poked a bit at the early days of the WikiCreole project, but never really saw it as something that would solve any of the problems that MediaWiki had. I was at the meeting at WikiSym 2006 in Denmark where some of the creole syntax bits got hammered out, and if anything that helped convince me that it wasn't going to do us good to continue on that path.
As long as we're hung up on details of the markup syntax, it's going to be very very hard to make useful forward motion on things that are actually going to enhance the capabilities of the system and put creative power in the hands of the users.
Forget about syntax -- what do we want to *accomplish*?
Requiring people to do all their document creation at this level is like asking people to punch binary ASCII codes into cards by hand -- it's low-level grunt work that computers can handle for us. We have keyboards and monitors to replace punchcards; not only has this let most people stop worrying about memorizing ASCII code points, it's let us go beyond fixed-width ASCII text (a monitor emulating a teletype, which was really a friendlier version of punch cards) to have things like _graphics_. Text can be in different sizes, different styles, and different languages. We can see pictures; we can draw pictures; we can use colors and shapes to create a far richer, more creative experience for the user.
GUIs didn't come about from a better, more universal way of encoding text -- Unicode came years after GUI conventions were largely standardized in practice.
-- brion
As long as we're hung up on details of the markup syntax, it's going to be very very hard to make useful forward motion on things that are actually going to enhance the capabilities of the system and put creative power in the hands of the users.
Forget about syntax -- what do we want to *accomplish*?
I think you got this sideways. The concrete syntax doesn't matter, but the abstract syntax does. Without a clear specification there can be no competing parsers, no interoperability, no decoupled APIs, no independently evolving components.
(Abstract syntax here means an "XML representation", a structured representation, or a DOM tree, i.e. an abstract syntax tree. But for that you need a language, i.e. a wikitext specification, and an implementation of a parser, which is all we have today, doesn't do the job.)
worrying about memorizing ASCII code points, it's let us go beyond fixed-width ASCII text (a monitor emulating a teletype, which was really a friendlier version of punch cards) to have things like _graphics_. Text can be in different sizes, different styles, and different languages. We can see pictures; we can draw pictures; we can use colors and shapes to create a far richer, more creative experience for the user.
GUIs didn't come about from a better, more universal way of encoding text -- Unicode came years after GUI conventions were largely standardized in practice.
In order to have a visual editor or three, combined with a plain text editor, combined with some fancy other editor we have yet to invent, you will still need that specification that tells you what a valid wiki instance is. This is the core data; only if you have a clear spec of that can you have tool and UI innovation on top of that.
Cheers, Dirk
On Tue, Jan 4, 2011 at 1:27 PM, Dirk Riehle dirk@riehle.org wrote:
As long as we're hung up on details of the markup syntax, it's going to be very very hard to make useful forward motion on things that are actually going to enhance the capabilities of the system and put creative power in the hands of the users.
Forget about syntax -- what do we want to *accomplish*?
I think you got this sideways. The concrete syntax doesn't matter, but the abstract syntax does. Without a clear specification there can be no competing parsers, no interoperability, no decoupled APIs, no independently evolving components.
(By abstract syntax I mean an "XML representation", a structured representation, a DOM tree, i.e. an abstract syntax tree. But for that you need a language, i.e. a wikitext specification; an implementation of a parser, which is all we have today, doesn't do that job.)
[snip]
In order to have a visual editor or three, combined with a plain text editor, combined with some fancy other editor we have yet to invent, you will still need that specification that tells you what a valid wiki instance is. This is the core data; only if you have a clear spec of that can you have tool and UI innovation on top of that.
Exactly my point -- spending time tinkering with sortof-human-readable-but-not-powerful-enough syntax distracts from thinking about what needs to be *described* in the data... which is the important thing needed when devising an actual storage or interchange format.
Wikis started out as *very* lightly formatted plaintext. The point was to be fast and easy -- in the context of web browsers which only offered plaintext editing, lightweight markup for bold/italics and a standard convention for link naming was about as close as you could get to WYSIWYG / WYSIWYM.
As browsers have modernised and now offer pretty decent rich-text editing in native HTML, web apps can actually make use of that to provide formatting & embedding of images and other structural elements. In this context, why should we spend more than 10 seconds thinking about how to devise a syntax for links or tables? We already have a perfectly good language for this stuff, which is machine-parseable: HTML. (Serialize it as XML to make it even more machine-friendly!)
If the web browsers of 1995 had had native HTML editing, I rather suspect there would never have been series-of-single-quotes to represent italics and bold...
-- brion
On 4 January 2011 21:39, Brion Vibber brion@pobox.com wrote:
If the web browsers of 1995 had had native HTML editing, I rather suspect there would never have been series-of-single-quotes to represent italics and bold...
... They did. Netscape Gold was the version *most* people used, and it even had a WYSIWYG HTML editor built in.
- d.
On Jan 4, 2011 1:54 PM, "David Gerard" dgerard@gmail.com wrote:
On 4 January 2011 21:39, Brion Vibber brion@pobox.com wrote:
If the web browsers of 1995 had had native HTML editing, I rather suspect there would never have been series-of-single-quotes to represent italics and bold...
... They did. Netscape Gold was the version *most* people used, and it even had a WYSIWYG HTML editor built in.
As a separate tool to edit standalone HTML files yes. As a widget integrated into web pages and controllable via scripting, no.
-- brion
On 04.01.2011 22:39, Brion Vibber wrote:
In order to have a visual editor or three, combined with a plain text editor, combined with some fancy other editor we have yet to invent, you will still need that specification that tells you what a valid wiki instance is. This is the core data; only if you have a clear spec of that can you have tool and UI innovation on top of that.
Exactly my point -- spending time tinkering with sortof-human-readable-but-not-powerful-enough syntax distracts from thinking about what needs to be *described* in the data... which is the important thing needed when devising an actual storage or interchange format.
Perhaps we should stop thinking about "formats" and start thinking about the document model. Spec an (extensible) WikiDOM, let people knock themselves out with different syntaxes to describe/create it. The "native" format could be serialized php objects for all I care.
-- daniel
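As a rough sketch of daniel's "serialized php objects" suggestion (the WikiDomNode class below is made up, not an actual MediaWiki class):

<?php
// Invented illustration of a "WikiDOM" node; not an actual MediaWiki class.
class WikiDomNode {
    public function __construct(
        public string $type,
        public array $attrs = [],
        public array $children = []
    ) {}
}

$para = new WikiDomNode( 'paragraph', [], [
    new WikiDomNode( 'text', [ 'content' => 'Hello ' ] ),
    new WikiDomNode( 'bold', [], [ new WikiDomNode( 'text', [ 'content' => 'world' ] ) ] ),
] );

// The "native" storage format could literally be PHP serialization...
$blob = serialize( $para );

// ...and any number of syntaxes (wikitext, XML, JSON, a WYSIWYG widget) could be
// round-tripped through the same object tree.
$restored = unserialize( $blob );
var_dump( $restored == $para );   // bool(true): structurally identical after the round trip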
On Tue, Jan 4, 2011 at 1:39 PM, Brion Vibber brion@pobox.com wrote:
Exactly my point -- spending time tinkering with sortof-human-readable-but-not-powerful-enough syntax distracts from thinking about what needs to be *described* in the data... which is the important thing needed when devising an actual storage or interchange format.
Below is an outline, which I've also posted to mediawiki.org[1] for further iteration. There are a lot of different moving parts, and I think one thing that's been difficult about this conversation is that different people are interested in different parts. I know a lot of people on this list are already overwhelmed or just sick of this conversation, so maybe if some of us break off into an on-wiki discussion, we might actually be able to make some progress without driving everyone else nuts. Optimistically, we'll get somewhere; the worst case is that we'll at least have documented many of the issues, so that we don't have to start from zero the next time the topic comes up.
Here are the pieces of the conversation that I'm seeing:
1. Goals: what are we trying to achieve?
* Tool interoperability
** Alternative parsers
** GUIs
** Real-time editing (ala Etherpad)
* Ease of editing raw text
* Ease of structuring the data
* Template language with fewer squirrelly brackets
* Performance
* Security
* What else?
2. Abstract format: regardless of syntax, what are we trying to express?
* Currently, we don't have an abstract format; markup just maps to a subset of HTML (so perhaps the HTML DOM is our abstract format)
* What subset of HTML do we use?
* What subset of HTML do we need?
* What parts of HTML do we *not* want to allow in any form?
* What parts of HTML do we only want to allow in limited form (e.g. only safely generated from some abstract format)?
* Is the HTML DOM sufficiently abstract, or do we want/need some intermediate conceptual format?
* Is browser support for XML sufficiently useful to try to rely on that?
* Will it be helpful to expose the abstract format in any way?
3. Syntax: what syntax should we store (and expose to users)?
* Should we store some serialization of the abstract format instead of markup?
* Is hand editing of markup a viable long-term strategy?
* How important is having something expressible with BNF?
* Is XML viable as an editing format? JSON? YAML?
4. Tools (e.g. WYSIWYG)
* Do our tool options get better if we fix up the abstract format and syntax?
* Tools:
** Wikia WYSIWYG editor
** Magnus Manske's new thing
** Line-by-line editing
** ...list goes on...
5. Infrastructure: how would one support mucking around with the data?
* Support for per-wiki data formats?
* Support for per-page data formats?
* Support for per-revision data formats?
* Evolve existing syntax with no infrastructure changes?
[1] http://www.mediawiki.org/wiki/User:RobLa-WMF/2011-01_format_discussion
* Brion Vibber brion@pobox.com [Tue, 4 Jan 2011 13:39:28 -0800]:
On Tue, Jan 4, 2011 at 1:27 PM, Dirk Riehle dirk@riehle.org wrote: Wikis started out as *very* lightly formatted plaintext. The point was to be fast and easy -- in the context of web browsers which only offered plaintext editing, lightweight markup for bold/italics and a standard convention for link naming was about as close as you could get to WYSIWYG / WYSIYM.
It is still faster to type a link address in square brackets than to click an "add link" icon and then type the link name or select it from a drop-down list. Even '' is a bit faster than Ctrl+I (italics via the mouse would be even slower than that).
As browsers have modernised and now offer pretty decent rich-text editing in native HTML, web apps can actually make use of that to provide formatting & embedding of images and other structural elements. In this context, why should we spend more than 10 seconds thinking about how to devise a syntax for links or tables? We already have a perfectly good language for this stuff, which is machine-parseable: HTML. (Serialize it as XML to make it even more machine-friendly!)
If the web browsers of 1995 had had native HTML editing, I rather suspect there would never have been series-of-single-quotes to represent italics and bold...
Native HTML is usually a horrible bloat of tags, attributes and CSS styles; not really a well-readable or easily processable thing. Even XML processed via XSLT would be much more compact and more readable. HTML is poor at separating semantics from presentation. HTML also invites the page editor to abuse all of these features, while wikitext encourages the editor to concentrate on the quality of the content. Let's hope that wikitext won't be completely abandoned in MediaWiki 2.0. Dmitriy
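For what it's worth, the XML-plus-XSLT route Dmitriy mentions is easy to sketch with PHP's bundled XSLTProcessor; the toy element names below are invented, not a proposed format.

<?php
// Toy "wiki XML" document; element names are invented for illustration.
$xml = new DOMDocument();
$xml->loadXML( '<page><para>Hello <bold>world</bold></para></page>' );

// A stylesheet mapping that XML to presentational HTML.
$xslSource = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="page"><html><body><xsl:apply-templates/></body></html></xsl:template>
  <xsl:template match="para"><p><xsl:apply-templates/></p></xsl:template>
  <xsl:template match="bold"><b><xsl:apply-templates/></b></xsl:template>
</xsl:stylesheet>
XSL;

$xsl = new DOMDocument();
$xsl->loadXML( $xslSource );

$proc = new XSLTProcessor();      // requires PHP's standard "xsl" extension
$proc->importStylesheet( $xsl );
echo $proc->transformToXML( $xml );
// <html><body><p>Hello <b>world</b></p></body></html>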
2011/1/5 Dmitriy Sintsov questpc@rambler.ru
Let's hope that wikitext won't be completely abandoned in MediaWiki 2.0. Dmitriy
There's plenty of JS running already (some of it not so useful, IMHO); an optional JS routine that converts the underlying, machine-oriented, well-formed code (XML?) into "old" wikitext on the fly when editing pages could be built, IMHO. This will run into a long series of cases... in specific situations/syntaxes hard troubles will be found: well, those cases are precisely where wikitext is not well formed and where some work is needed to think and think again, since they are THE troubles of wikitext.
Alex
On Tue, Jan 4, 2011 at 11:42 PM, Dmitriy Sintsov questpc@rambler.ru wrote:
- Brion Vibber brion@pobox.com [Tue, 4 Jan 2011 13:39:28 -0800]:
On Tue, Jan 4, 2011 at 1:27 PM, Dirk Riehle dirk@riehle.org wrote: Wikis started out as *very* lightly formatted plaintext. The point was to be fast and easy -- in the context of web browsers which only offered plaintext editing, lightweight markup for bold/italics and a standard convention for link naming was about as close as you could get to WYSIWYG / WYSIYM.
It is still faster to type a link address in square brackets than to click an "add link" icon and then type the link name or select it from a drop-down list. Even '' is a bit faster than Ctrl+I (italics via the mouse would be even slower than that).
This exercise is not about making it easier for us, those who have already run smack out onto the long statistical tail in terms of mastery of MediaWiki editing.
Yes, it would be nice if we preserve a geezers-mode for those of us who know the shortcut methods. Adoption rate of new tools among the existing userbase is a major issue with any change; if we lose too many of the existing crowd due to a change, then it takes us many years' worth of newbie-friendliness to regain the losses from the conversion. But we should not hamstring thinking about how non-technical people would or could use the project by thinking in terms of geezers-mode.
Brion Vibber brion@pobox.com writes:
Forget about syntax -- what do we want to *accomplish*?
One thing *I* would like to accomplish is a fruitful *end* to parser discussions. A way to make any further discussion a moot point.
From the current discussion, it looks like people want to make it easier to work with WikiText (e.g. enable tool creation, like WYSIWYG editors) while still supporting the old Wiki markup (aka the Wikipedia installed base).
The problem naturally falls back on the parser: As I understand it, the only reliable way of creating XHTML from MW markup is the parser that is built into MediaWiki and is fairly hard to separate (something I learned when I tried to put the parser tests into a PHPUnit test harness.)
I think the first step for creating a reliable, independent parser for MW markup would be to write some sort of specification (http://www.mediawiki.org/wiki/Markup_spec) and then to make sure our parser tests have good coverage.
Then we could begin to move from 7-bit ASCII to 16-bit Unicode, so to speak, because we would have a standard against which independent programmers could verify that their parser or wikitext generator was working acceptably and reliably. Once you have the ability to create interoperable WikiText parsers/generators, it seems easy (to me) to build more tools on top of that.
Mark.
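A sketch of the kind of coverage Mark is describing, phrased as a PHPUnit-style table of input/expected-output pairs. The WikitextParser class below is a deliberately dumb regex stand-in so the sketch runs on its own; it is not the real MediaWiki parser or its API, and the expected HTML is just what this toy produces.

<?php
use PHPUnit\Framework\TestCase;

// Toy stand-in parser so the sketch is self-contained; the point is the harness shape.
class WikitextParser {
    public function parse( string $text ): string {
        $html = htmlspecialchars( $text, ENT_NOQUOTES );
        $html = preg_replace( "/'''(.+?)'''/", '<b>$1</b>', $html );
        $html = preg_replace( "/''(.+?)''/", '<i>$1</i>', $html );
        $html = preg_replace_callback( '/\[\[(.+?)\]\]/', function ( $m ) {
            return '<a href="/wiki/' . rawurlencode( $m[1] ) . '">' . $m[1] . '</a>';
        }, $html );
        return "<p>$html</p>";
    }
}

class MarkupCoverageTest extends TestCase {
    public static function provideCases(): array {
        return [
            'italics' => [ "''foo''", '<p><i>foo</i></p>' ],
            'bold'    => [ "'''foo'''", '<p><b>foo</b></p>' ],
            'link'    => [ '[[Foo]]', '<p><a href="/wiki/Foo">Foo</a></p>' ],
        ];
    }

    /** @dataProvider provideCases */
    public function testParse( string $wikitext, string $expectedHtml ): void {
        $parser = new WikitextParser();
        $this->assertSame( $expectedHtml, trim( $parser->parse( $wikitext ) ) );
    }
}

Given a written spec, the same table of cases could be run unchanged against any independent implementation.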
----- Original Message -----
From: "Mark A. Hershberger" mah@everybody.org
The problem naturally falls back on the parser: As I understand it, the only reliable way of creating XHTML from MW markup is the parser that is built into MediaWiki and is fairly hard to separate (something I learned when I tried to put the parser tests into a PHPUnit test harness.)
I think The first step for creating a reliable, independent parser for MW markup would be to write some sort of specification (http://www.mediawiki.org/wiki/Markup_spec) and then to make sure our parser tests have good coverage.
The last time I spent any appreciable time on wikitech (which was 4 or 5 years ago), *someone* had a grammar and parser about 85-90% working. I don't have that email archive due to a crash, so I can't pin a name to it or comment on whether it's someone in this thread...
or, alas, comment on what happened later. But he seemed pretty excited and happy, as I recall.
Cheers, -- jra
On 5 January 2011 04:58, Jay Ashworth jra@baylink.com wrote:
The last time I spent any appreciable time on wikitech (which was 4 or 5 years ago), *someone* had a grammar and parser about 85-90% working. I don't have that email archive due to a crash, so I can't pin a name to it or comment on whether it's someone in this thread... or, alas, comment on what happened later. But he seemed pretty excited and happy, as I recall.
Many, many bright people have dashed their foreheads against the problem.
Andreas Jonsson thinks he's largely cracked it:
http://davidgerard.co.uk/notes/2010/08/22/staring-into-the-eye-of-cthulhu/
- and even that required custom patches to ANTLR. The result runs in C and is of comparable speed to PHP.
It isn't quite a specification as I understand it, but it's on the way.
- d.
----- Original Message -----
From: "David Gerard" dgerard@gmail.com
Many, many bright people have dashed their foreheads against the problem.
Andreas Jonsson thinks he's largely cracked it:
http://davidgerard.co.uk/notes/2010/08/22/staring-into-the-eye-of-cthulhu/
- and even that required custom patches to ANTLR. The result runs in C and is of comparable speed to PHP.
I suspect it was Steve Bennett's attack run I was remembering.
Did anyone ever pull statistics about exactly how many instances of that Last Five Percent there really were, as I suspect I suggested at the time?
Cheers, -- jra
On Wed, Jan 5, 2011 at 7:35 PM, Jay Ashworth jra@baylink.com wrote:
----- Original Message -----
From: "David Gerard" dgerard@gmail.com
Many, many bright people have dashed their foreheads against the problem.
Andreas Jonsson thinks he's largely cracked it:
http://davidgerard.co.uk/notes/2010/08/22/staring-into-the-eye-of-cthulhu/
- and even that required custom patches to ANTLR. The result runs in C and is of comparable speed to PHP.
I suspect it was Steve Bennett's attack run I was remembering.
Did anyone ever pull statistics about exactly how many instances of that Last Five Percent there really were, as I suspect I suggested at the time?
Cheers, -- jra
Expansion off "how many instances..?" -
At some point in the corner cases, the fix is to change the templates and pages to match a more sane parser's capabilities or a more standard specification for the markup, rather than make the parser match the insanity that's already out there.
If we know what we're looking at, we can assign corner cases to an on-wiki cleanup "hit squad". Who knows how many of the corners we can outright assassinate that way, but it's worth a go... The less a corner case is used and the harder it is to code for, the easier it is for us to justify taking it out.
----- Original Message -----
From: "George Herbert" george.herbert@gmail.com
On Wed, Jan 5, 2011 at 7:35 PM, Jay Ashworth jra@baylink.com wrote:
Did anyone ever pull statistics about exactly how many instances of that Last Five Percent there really were, as I suspect I suggested at the time?
Expansion off "how many instances..?" -
The thing you want expanded, George, is "Last Five Percent"; I refer there to (I think it was) David Gerard's comment earlier that the first 95% of wikisyntax fits reasonably well into current parser building frameworks, and the last 5% causes well adjusted programmers to consider heroin... or something like that. :-)
At some point in the corner cases, the fix is to change the templates and pages to match a more sane parser's capabilities or a more standard specification for the markup, rather than make the parser match the insanity that's already out there.
If we know what we're looking at, we can assign corner cases to an on-wiki cleanup "hit squad". Who knows how many of the corners we can outright assassinate that way, but it's worth a go... The less a corner case is used and the harder it is to code for, the easier it is for us to justify taking it out.
Yup; that's the point I was making.
The argument advanced was always "there's too much usage of that ugly stuff to consider Just Not Supporting It" and I always asked whether anyone with larger computers than me had ever extracted actual statistics, and no one ever answered.
Cheers, -- jra
"Jay Ashworth" jra@baylink.com wrote in message news:32162150.4910.1294292017738.JavaMail.root@benjamin.baylink.com...
----- Original Message -----
The thing you want expanded, George, is "Last Five Percent"; I refer there to (I think it was) David Gerard's comment earlier that the first 95% of wikisyntax fits reasonably well into current parser building frameworks, and the last 5% causes well adjusted programmers to consider heroin... or something like that. :-)
The argument advanced was always "there's too much usage of that ugly stuff to consider Just Not Supporting It" and I always asked whether anyone with larger computers than me had ever extracted actual statistics, and no one ever answered.
This is a key point. Every other parser discussion has floundered *before* the stage of saying "here is a working parser which does *something* interesting, now we can see how it behaves". Everyone before has got to that last 5% and said "I can't make this work; I can do *this* which is kinda similar, but when you combine it with *this* and *that* and *the other* we're now in a totally different set of edge cases". And stopped there. Obviously it's impossible to quantify all the edge cases of the current parser *because of* the lack of a schema, but until we actually get a new parser churning through real wikitext, we're blind in the dark to say whether those edge cases make up 5%, 0.5% or 50% of the corpus that's out there.
--HM
On 07/01/11 00:49, Happy-melon wrote:
"Jay Ashworth"jra@baylink.com wrote in message news:32162150.4910.1294292017738.JavaMail.root@benjamin.baylink.com...
----- Original Message -----
The thing you want expanded, George, is "Last Five Percent"; I refer there to (I think it was) David Gerard's comment earlier that the first 95% of wikisyntax fits reasonably well into current parser building frameworks, and the last 5% causes well adjusted programmers to consider heroin... or something like that. :-)
The argument advanced was always "there's too much usage of that ugly stuff to consider Just Not Supporting It" and I always asked whether anyone with larger computers than me had ever extracted actual statistics, and no one ever answered.
This is a key point. Every other parser discussion has floundered *before* the stage of saying "here is a working parser which does *something* interesting, now we can see how it behaves". Everyone before has got to that last 5% and said "I can't make this work; I can do *this* which is kinda similar, but when you combine it with *this* and *that* and *the other* we're now in a totally different set of edge cases". And stopped there. Obviously it's impossible to quantify all the edge cases of the current parser *because of* the lack of a schema, but until we actually get a new parser churning through real wikitext, we're blind in the dark to say whether those edge cases make up 5%, 0.5% or 50% of the corpus that's out there.
--HM
Am I right in assuming that "working" means in this case:
(a) being able to parse an article as a valid production of its grammar, and then
(b) being able to complete the round trip by generating character-for-character identical wikitext output from that parse tree?
If so, what would count as a statistically useful sample of articles to test? 1000? 10,000? 100,000? Or, if someone has access to serious computing resources, and a recent dump, is it worth just trying all of them? In any case, it would be interesting to have a list of failed revisions, so developers can study the problems involved.
Given the generality of wikimarkup, and that user-editability means editors can provide absolutely any string as an input to it, it might also make sense to try it on random garbage inputs and "fuzzed" versions of articles, as well as real articles.
Flexbisonparser looks like the most plausible candidate for testing. Does anyone know if it is currently buildable?
-- Neil
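A sketch of Neil's (a)+(b) round-trip check. parseToTree() and treeToWikitext() are hypothetical stand-ins for whatever candidate parser is under test (stubbed out here so the sketch runs), and the sample directory name is likewise made up.

<?php
// Stub implementations so the sketch runs; swap in the candidate parser being evaluated.
function parseToTree( string $wikitext ): array {
    return [ 'raw' => $wikitext ];     // placeholder "tree"
}
function treeToWikitext( array $tree ): string {
    return $tree['raw'];
}

// Neil's criteria: (a) parse, (b) regenerate character-for-character identical wikitext.
function roundTripOk( string $wikitext ): bool {
    $tree = parseToTree( $wikitext );
    return treeToWikitext( $tree ) === $wikitext;
}

$failures = [];
$files = glob( 'sample-revisions/*.wiki' );   // hypothetical directory of sampled revision texts
foreach ( $files as $path ) {
    if ( !roundTripOk( file_get_contents( $path ) ) ) {
        $failures[] = $path;                  // keep the list so developers can study the failures
    }
}
printf( "%d of %d revisions failed the round trip\n", count( $failures ), count( $files ) );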
Thinking about this question from the other day and the apparently deep conviction that XML is the magic elixir, I had to wonder: what about the existing Preprocessor_DOM class?
I'm asking out of ignorance. I realize the preprocessor is not the parser, but it does turn the WikiText into a DOM (right?) and that could, conceivably, be used to create different parsers.
What am I missing?
Mark.
On 11-01-06 10:15 AM, Mark A. Hershberger wrote:
Thinking about this question from the other day and the apparently deep conviction that XML is the magic elixir, I had to wonder: what about the existing Preprocessor_DOM class?
I'm asking out of ignorance. I realize the preprocessor is not the parser, but it does turn the WikiText into a DOM (right?) and that could, conceivably, be used to create different parsers.
What am I missing?
Mark.
The preprocessor only handles a minimal subset of WikiText... its only function is things like the template hierarchy, parser functions, and perhaps tags. It doesn't do any of the pile of other WikiText syntax. It's also not really anything special to do with XML; we have a Preprocessor_Hash too, which IIRC uses PHP arrays instead of a DOM.
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
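To make that concrete, here is roughly the kind of tree the preprocessor gives you, written out by hand from memory -- the real element names and structure may well differ -- next to everything it leaves as plain text:

<?php
// Hand-written approximation of a preprocessor-style tree for:
//   {{Infobox|name=Foo}} and some ''ordinary'' [[markup]]
// Element names are a from-memory approximation, not guaranteed to match Preprocessor_DOM.
$xml = <<<'XML'
<root>
  <template>
    <title>Infobox</title>
    <part><name>name</name>=<value>Foo</value></part>
  </template>
 and some ''ordinary'' [[markup]]
</root>
XML;

$doc = new DOMDocument();
$doc->loadXML( $xml );

// Only the template structure is represented as nodes; the quotes and brackets are still
// plain text, which is why a full parser is still needed downstream.
foreach ( $doc->getElementsByTagName( 'template' ) as $tpl ) {
    echo 'Template: ', $tpl->getElementsByTagName( 'title' )->item( 0 )->textContent, "\n";
    foreach ( $tpl->getElementsByTagName( 'part' ) as $part ) {
        echo '  param ', $part->getElementsByTagName( 'name' )->item( 0 )->textContent,
             ' = ', $part->getElementsByTagName( 'value' )->item( 0 )->textContent, "\n";
    }
}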
----- Original Message -----
From: "Brion Vibber" brion@pobox.com
Requiring people to do all their document creation at this level is like asking people to punch binary ASCII codes into cards by hand -- it's low-level grunt work that computers can handle for us. We have keyboards and monitors to replace punchcards; not only has this let most people stop worrying about memorizing ASCII code points, it's let us go beyond fixed-width ASCII text (a monitor emulating a teletype, which was really a friendlier version of punch cards) to have things like _graphics_. Text can be in different sizes, different styles, and different languages. We can see pictures; we can draw pictures; we can use colors and shapes to create a far richer, more creative experience for the user.
None of which will be visible on phones from my Blackberry on down, which, IIRC, make up more than 50% of the Internet access points on the planet.
Minimalism is your friend; I can presently *edit* wikipedia on that BB, with no CSS, JS, or images. That's A Good Thing.
Cheers, -- jra
On Tue, Jan 4, 2011 at 8:53 PM, Jay Ashworth jra@baylink.com wrote:
----- Original Message -----
From: "Brion Vibber" brion@pobox.com
Requiring people to do all their document creation at this level is like asking people to punch binary ASCII codes into cards by hand -- it's low-level grunt work that computers can handle for us. We have keyboards and monitors to replace punchcards; not only has this let most people stop worrying about memorizing ASCII code points, it's let us go beyond fixed-width ASCII text (a monitor emulating a teletype, which was really a friendlier version of punch cards) to have things like _graphics_. Text can be in different sizes, different styles, and different languages. We can see pictures; we can draw pictures; we can use colors and shapes to create a far richer, more creative experience for the user.
None of which will be visible on phones from my Blackberry on down, which, IIRC, make up more than 50% of the Internet access points on the planet.
Minimalism is your friend; I can presently *edit* wikipedia on that BB, with no CSS, JS, or images. That's A Good Thing.
A good document structure would allow useful editing for both simple paragraphs and complex features like tables and templates even on such primitive devices, by giving a dedicated editing interface the information it needs to address individual paragraphs, template parameters, table cells, etc.
I would go so far as to say that this sort of fallback interface would in fact be far superior to editing a big blob of wikitext on a small cell phone screen -- finding the bit you want to edit in a huge paragraph full of references and image thumbnails is pretty dreadful at the best of times.
-- brion
2011/1/4 Brion Vibber brion@pobox.com:
A good document structure would allow useful editing for both simple paragraphs and complex features like tables and templates even on such primitive devices, by giving a dedicated editing interface the information it needs to address individual paragraphs, template parameters, table cells, etc.
Indeed, Google Docs has an optimized editing UI for Android and iOS that focuses precisely on making it easy to make a quick change to a paragraph in a document or a cell in a spreadsheet (with concurrent editing).
http://www.intomobile.com/2010/11/17/mobile-edit-google-docs-android-iphone-...
2011/1/4 Brion Vibber brion@pobox.com:
Indeed, Google Docs has an optimized editing UI for Android and iOS that focuses precisely on making it easy to make a quick change to a paragraph in a document or a cell in a spreadsheet (with concurrent editing).
http://www.intomobile.com/2010/11/17/mobile-edit-google-docs-android-iphone-...
A little bit of OT: try the new vector image editor in Google Docs; it exports images to SVG format, and I found it excellent for building such images and uploading them to Commons.
Now a "free roaming thought" about templates, just to share an exotic idea. The main issue of template syntax, is casual, free, unpredictable mixture of attributes and contents into template parameters. It's necessary, IMHO, to convert them into somehow "well formed structures" so that content could pulled out from the template code. This abstract structure could be this one: {{template name begin|param1|param2|...}} {{optional content 1 begin}} text 1.... {{optional content 1 end}} {{optional content 2 begin}} text 2.... {{optional content 2 end}} ..... {{template name end}}
Alex
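One way to read Alex's point: with explicit begin/end markers, even plain pattern matching can pull the content back out of a page. The marker names follow his sketch above; the extraction code is only an illustration.

<?php
// Extract the "optional content" blocks from the begin/end structure sketched above.
$page = "{{template name begin|param1|param2}}\n"
      . "{{optional content 1 begin}}\ntext 1....\n{{optional content 1 end}}\n"
      . "{{optional content 2 begin}}\ntext 2....\n{{optional content 2 end}}\n"
      . "{{template name end}}\n";

preg_match_all(
    '/\{\{optional content (\d+) begin\}\}(.*?)\{\{optional content \1 end\}\}/s',
    $page,
    $matches,
    PREG_SET_ORDER
);

foreach ( $matches as $m ) {
    echo 'content block ', $m[1], ': ', trim( $m[2] ), "\n";
}
// content block 1: text 1....
// content block 2: text 2....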
----- Original Message -----
From: "Brion Vibber" brion@pobox.com
A good document structure would allow useful editing for both simple paragraphs and complex features like tables and templates even on such primitive devices, by giving a dedicated editing interface the information it needs to address individual paragraphs, template parameters, table cells, etc.
A 'dedicated editing interface' is the canonical counter example to my #1 fundamental tenet of program and systems design: "Get The Glue Right".
The Right Glue, in this case, is bare HTML, which can be run nearly everywhere these days.
I would go so far as to say that this sort of fallback interface would in fact be far superior to editing a big blob of wikitext on a small cell phone screen -- finding the bit you want to edit in a huge paragraph full of references and image thumbnails is pretty dreadful at the best of times.
Of course it would.
But the target audience here isn't people who *have* anything else; it's people in the Sudan. Well, the target audience I see from up here at 43,000 feet.
Cheers, -- jra
----- Original Message -----
From: "Alex Brollo" alex.brollo@gmail.com
Just a brief comment: there's no need of searching for "a perfect wiki syntax", since it already exists: it's the present model of well-formed markup, i.e. XML.
I believe the snap reaction here is "you haven't tried to diff XML, have you?"
My personal snap reaction is that the increase in cycles necessary to process XML in both directions, *multiplied by the number of machines in WMF data center* will make XML impractical, but I'm not a WMF engineer.
Cheers, -- jra
On 11-01-05 02:09 AM, Daniel Kinzler wrote:
On 05.01.2011 05:25, Jay Ashworth wrote:
I believe the snap reaction here is "you haven't tried to diff XML, have you?
A text-based diff of XML sucks, but how about a DOM based (structural) diff?
-- daniel
I don't think a discussion on diff comparison of XML has much point.
I believe the idea floating around here (or at least the idea I'm thinking of based on these discussions) is that we would store page text in an XML format, a serialized PHP format, or something else where contents are semantically noted with things like '<template title="Template:Foo"><param name="1">...</param><param name="foo">bar</param></template><i>This is italic</i><link internal="true" title="FooBar">FooBar</link>'. To actually edit this page content we would provide the data in multiple formats:
- Fully parsed output for page viewing
- A semantically marked up version of the HTML that is compatible with the use of a WYSIWYG editor and can be converted back to the XML format and then saved
- A WikiText-like format similar to the WikiText we already have that users can edit in plaintext; we convert the XML into that format, and then when the user saves, parse it back into the XML format.
Naturally, if we're doing things like this, then rather than diffing the ugly xml, the natural thing would most likely be to take the xml format of both pages, convert it into that WikiText-like plaintext format and show the user a diff of that so they know what meaningful changes were made to the page. If you really wanted to, you could also show them a diff of the end html as an option, but that's fairly pointless.
As an extra bonus, besides enabling WYSIWYG, having that xml format also has a good chance of making efforts of giving users an in-page diff marking up what was actually changed in the contents itself much easier.
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
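A sketch of the "convert the stored XML into a WikiText-like plaintext form" step, reusing the element names from Daniel's example (plus an invented <page> wrapper so it parses as one document); the conversion rules themselves are made up here.

<?php
// Element names follow Daniel's example above; the conversion itself is invented.
$stored = '<page><template title="Template:Foo"><param name="1">...</param>'
        . '<param name="foo">bar</param></template><i>This is italic</i>'
        . '<link internal="true" title="FooBar">FooBar</link></page>';

function toEditableText( DOMNode $node ): string {
    if ( $node instanceof DOMText ) {
        return $node->textContent;
    }
    $inner = '';
    foreach ( $node->childNodes as $child ) {
        $inner .= toEditableText( $child );
    }
    switch ( $node->nodeName ) {
        case 'template':
            $params = '';
            foreach ( $node->childNodes as $p ) {
                if ( $p->nodeName === 'param' ) {
                    $params .= '|' . $p->attributes->getNamedItem( 'name' )->nodeValue
                             . '=' . $p->textContent;
                }
            }
            return '{{' . $node->attributes->getNamedItem( 'title' )->nodeValue . $params . '}}';
        case 'i':    return "''" . $inner . "''";
        case 'link': return '[[' . $node->attributes->getNamedItem( 'title' )->nodeValue . ']]';
        default:     return $inner;    // <page>, <param>, anything unrecognized
    }
}

$doc = new DOMDocument();
$doc->loadXML( $stored );
echo toEditableText( $doc->documentElement ), "\n";
// {{Template:Foo|1=...|foo=bar}}''This is italic''[[FooBar]]

The same walk run in reverse (plaintext back to XML) is the part that needs the real parser, which is where all the earlier caveats apply.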
Having XML-based content would also enable a wide variety of new re-uses of Wikimedia content. People could build all sorts of custom apps, games, feeds, etc., without having to worry about broken syntax or resorting to screen scraping (like we do for our mobile site). It would also make implementing semantic features easier and thus could improve our search capabilities. Plus it makes a great Bloody Mary!
Ryan Kaldari
On 5 January 2011 22:16, Ryan Kaldari rkaldari@wikimedia.org wrote:
Having XML-based content would also enable a wide variety of new re-uses of Wikimedia content. People could build all sorts of custom apps, games, feeds, etc., without having to worry about broken syntax or resorting to screen scraping (like we do for our mobile site). It would also make implementing semantic features easier and thus could improve our search capabilities. Plus it makes a great Bloody Mary!
Before we go haring off - what would be *really* nice would be getting Magnus' WYSIFTW developed to a stage where it's fit to put in front of nontechnical users and do some decent usability testing:
http://meta.wikimedia.org/wiki/WYSIFTW
Magnus does this stuff in his spare time and has to get back to actual work - but there's a list of needed features (which of course anyone can add to) and I know he very much welcomes other people hacking on it.
It's not ready for prime time yet, but it's one of the most promising approaches I've seen in a while.
(And the nice thing about WYSIFTW is that it requires *no* action on server side - the only thing it needs right now is to be developed to a state where it can be usability-tested.)
- d.
I just started testing WYSIWTF; I would like to encourage as many other people on this list to do so as well.
On 5 January 2011 22:47, George Herbert george.herbert@gmail.com wrote:
I just started testing WYSIWTF; I would like to encourage as many other people on this list to do so as well.
It's not even close to finished - but the more features we can add and the more bugs we can find, the closer to a proper usability test we are.
Devs! Please give it a go! Please report problems!
- d.
----- Original Message -----
From: "Daniel Kinzler" daniel@brightbyte.de
On 05.01.2011 05:25, Jay Ashworth wrote:
I believe the snap reaction here is "you haven't tried to diff XML, have you?
A text-based diff of XML sucks, but how about a DOM based (structural) diff?
Sure, but how much more processor horsepower is that going to take.
Scale is a driver in Mediawiki, for obvious reasons.
Cheers, -- jra
On Wed, Jan 5, 2011 at 7:37 PM, Jay Ashworth jra@baylink.com wrote:
----- Original Message -----
From: "Daniel Kinzler" daniel@brightbyte.de
On 05.01.2011 05:25, Jay Ashworth wrote:
I believe the snap reaction here is "you haven't tried to diff XML, have you?
A text-based diff of XML sucks, but how about a DOM based (structural) diff?
Sure, but how much more processor horsepower is that going to take.
Scale is a driver in Mediawiki, for obvious reasons.
I suspect that diffs are relatively rare events in the day to day WMF processing, though non-trivial.
That said, and as much of a fan as I am of some sort of conceptually object-oriented page data approach... DOM? Really??
We're not trying to do 99% of what that does; we just need object / element contents, style and perhaps minimal other attributes, and order within a page.
* George Herbert george.herbert@gmail.com [Wed, 5 Jan 2011 19:52:18 -0800]:
On Wed, Jan 5, 2011 at 7:37 PM, Jay Ashworth jra@baylink.com wrote:
---- Original Message -----
From: "Daniel Kinzler" daniel@brightbyte.de
On 05.01.2011 05:25, Jay Ashworth wrote:
I believe the snap reaction here is "you haven't tried to diff XML, have you?"
A text-based diff of XML sucks, but how about a DOM based (structural) diff?
Sure, but how much more processor horsepower is that going to take.
Scale is a driver in Mediawiki, for obvious reasons.
I suspect that diffs are relatively rare events in the day to day WMF processing, though non-trivial.
That said, and as much of a fan as I am of some sort of conceptually object-oriented page data approach... DOM? Really??
We're not trying to do 99% of what that does; we just need object / element contents, style and perhaps minimal other attributes, and order within a page.
DOM manipulation at the template level is not a bad thing. That could also be partially unified with parsing, because trees are used there as well. I just hope there is a chance to have an XML-to-wikitext mapping (at least partially compatible for basic markup). Dmitriy
----- Original Message -----
From: "George Herbert" george.herbert@gmail.com
A text-based diff of XML sucks, but how about a DOM based (structural) diff?
Sure, but how much more processor horsepower is that going to take.
Scale is a driver in Mediawiki, for obvious reasons.
I suspect that diffs are relatively rare events in the day to day WMF processing, though non-trivial.
Every single time you make an edit, unless I badly misunderstand the current architecture; that's how it's possible for multiple people editing the same article not to collide unless their edits actually collide at the paragraph level.
Not to mention pulling old versions.
Can someone who knows the current code better than me confirm or deny?
Cheers, -- jra
On Thu, Jan 6, 2011 at 11:01 AM, Jay Ashworth jra@baylink.com wrote:
----- Original Message -----
From: "George Herbert" george.herbert@gmail.com
A text-based diff of XML sucks, but how about a DOM based (structural) diff?
Sure, but how much more processor horsepower is that going to take.
Scale is a driver in Mediawiki, for obvious reasons.
I suspect that diffs are relatively rare events in the day to day WMF processing, though non-trivial.
Every single time you make an edit, unless I badly misunderstand the current architecture; that's how it's possible for multiple people editing the same article not to collide unless their edits actually collide at the paragraph level.
Not to mention pulling old versions.
Can someone who knows the current code better than me confirm or deny?
There's a few separate issues mixed up here, I think.
First: diffs for viewing and the external diff3 merging for resolving edit conflicts are actually unrelated code paths and use separate diff engines. (Nor does diff3 get used at all unless there actually is a conflict to resolve -- if nobody else edited since your change, it's not called.)
Second: the notion that diffing a structured document must inherently be very slow is, I think, not right.
A well-structured document should be pretty diff-friendly actually; our diffs are already working on two separate levels (paragraphs as a whole, then words within matched paragraphs). In the most common cases, the diffing might actually work pretty much the same -- look for nodes that match, then move on to nodes that don't; within changed nodes, look for sub-nodes that can be highlighted. Comparisons between nodes may be slower than straight strings, but the basic algorithms don't need to be hugely different, and the implementation can be in heavily-optimized C++ just like our text diffs are today.
Third: the most common diff view cases are likely adjacent revisions of recent edits, which smells like cache. :) Heck, these could be made once and then simply *stored*, never needing to be recalculated again.
Fourth: the notion that diffing structured documents would be overwhelming for the entire Wikimedia infrastructure... even if we assume such diffs are much slower, I think this is not really an issue compared to the huge CPU savings that it could bring elsewhere.
The biggest user of CPU has long been parsing and re-parsing of wikitext. Every time someone comes along with different view preferences, we have to parse again. Every time a template or image changes, we have to parse again. Every time there's an edit, we have to parse again. Every time something fell out of cache, we have to parse again.
And that parsing is *really expensive* on large, complex pages. Much of the history of MediaWiki's parser development has been in figuring out how to avoid parsing quite as much, or setting limits to keep the worst corner cases from bringing down the server farm.
We parse *way*, *wayyyyy* more than we diff.
Part of what makes these things slow is that we have to do a lot of work from scratch every time, and we have to do it in slow PHP code, and we have to keep going back and fetching more stuff halfway through. Expanding templates can change the document structure at the next parsing level, so referenced files and templates have to be fetched or recalculated, often one at a time because it's hard to batch up a list of everything we need at once.
I think there would be some very valuable savings to using a document model that can be stored in a machine-readable way up front. A data structure that can be described as JSON or XML (for examples) allows leaving the low-level "how do I turn a string into a structure" details to highly-tuned native C code. A document model that is easily traversed and mapped to/from hierarchical HTML allows code to process just the parts of the document it needs at any given time, and would make it easier to share intermediate data between variants if that's still needed.
In some cases, work that is today done in the 'parser' could even be done by client-side JavaScript (on supporting user-agents), moving little bits of work from the server farm (where CPU time is vast but sharply limited) to end-user browsers (where there's often a local surplus -- CPU's not doing much while it's waiting on the network to transfer big JPEG images).
It may be easier to prototype a lot of this outside of MediaWiki, though, or in specific areas such as media or interactive extensions, before we all go trying to redo the full core.
-- brion
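A toy version of the node-level diffing Brion describes (compare whole nodes first, descend only into the ones that changed). The node arrays are invented, and a real implementation would need proper sequence alignment rather than the position-by-position comparison used here.

<?php
// Toy structural diff: compare top-level nodes; only descend into nodes that differ.
// A real differ needs LCS-style alignment, as the existing word/paragraph differ does.
function diffNodes( array $old, array $new, string $path = '' ): array {
    $changes = [];
    $max = max( count( $old ), count( $new ) );
    for ( $i = 0; $i < $max; $i++ ) {
        $a = $old[$i] ?? null;
        $b = $new[$i] ?? null;
        if ( $a === $b ) {
            continue;                              // identical subtree: nothing to report
        }
        $here = "$path/$i";
        if ( is_array( $a ) && is_array( $b )
            && ( $a['type'] ?? null ) === ( $b['type'] ?? null )
            && isset( $a['children'], $b['children'] ) ) {
            // Same kind of node: recurse instead of reporting the whole subtree as changed.
            $changes = array_merge( $changes, diffNodes( $a['children'], $b['children'], $here ) );
        } else {
            $changes[] = [ 'at' => $here, 'old' => $a, 'new' => $b ];
        }
    }
    return $changes;
}

$old = [ [ 'type' => 'p', 'children' => [ 'Hello', 'world' ] ],
         [ 'type' => 'p', 'children' => [ 'Unchanged paragraph' ] ] ];
$new = [ [ 'type' => 'p', 'children' => [ 'Hello', 'there' ] ],
         [ 'type' => 'p', 'children' => [ 'Unchanged paragraph' ] ] ];

print_r( diffNodes( $old, $new ) );   // reports only /0/1: 'world' -> 'there'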
On Thu, Jan 6, 2011 at 11:38 AM, Brion Vibber brion@pobox.com wrote:
On Thu, Jan 6, 2011 at 11:01 AM, Jay Ashworth jra@baylink.com wrote:
From: "George Herbert" george.herbert@gmail.com I suspect that diffs are relatively rare events in the day to day WMF processing, though non-trivial.
Every single time you make an edit, unless I badly misunderstand the current architecture; that's how it's possible for multiple people editing the same article not to collide unless their edits actually collide at the paragraph level.
Not to mention pulling old versions.
Can someone who knows the current code better than me confirm or deny?
There's a few separate issues mixed up here, I think.
First: diffs for viewing and the external diff3 merging for resolving edit conflicts are actually unrelated code paths and use separate diff engines. (Nor does diff3 get used at all unless there actually is a conflict to resolve -- if nobody else edited since your change, it's not called.)
Second: the notion that diffing a structured document must inherently be very slow is, I think, not right.
A well-structured document should be pretty diff-friendly actually; our diffs are already working on two separate levels (paragraphs as a whole, then words within matched paragraphs). In the most common cases, the diffing might actually work pretty much the same -- look for nodes that match, then move on to nodes that don't; within changed nodes, look for sub-nodes that can be highlighted. Comparisons between nodes may be slower than straight strings, but the basic algorithms don't need to be hugely different, and the implementation can be in heavily-optimized C++ just like our text diffs are today.
Third: the most common diff view cases are likely adjacent revisions of recent edits, which smells like cache. :) Heck, these could be made once and then simply *stored*, never needing to be recalculated again.
Fourth: the notion that diffing structured documents would be overwhelming for the entire Wikimedia infrastructure... even if we assume such diffs are much slower, I think this is not really an issue compared to the huge CPU savings that it could bring elsewhere.
The biggest user of CPU has long been parsing and re-parsing of wikitext. Every time someone comes along with different view preferences, we have to parse again. Every time a template or image changes, we have to parse again. Every time there's an edit, we have to parse again. Every time something fell out of cache, we have to parse again.
And that parsing is *really expensive* on large, complex pages. Much of the history of MediaWiki's parser development has been in figuring out how to avoid parsing quite as much, or setting limits to keep the worst corner cases from bringing down the server farm.
We parse *way*, *wayyyyy* more than we diff. [...]
Even if we diff on average 2-3x per edit, we're only doing order ten edits a second across the projects, right? Not going to dig up the current stats, but that's what I remember from last time I looked.
So: the priority remains the parser and cleanup of the syntax actually in use, from a sanity point of view (being able to describe the syntax usefully, and in a way that allows multiple parsers to be written), with diff management as a distant, low-impact priority...
2011/1/6 Brion Vibber brion@pobox.com:
Third: the most common diff view cases are likely adjacent revisions of recent edits, which smells like cache. :) Heck, these could be made once and then simply *stored*, never needing to be recalculated again.
We already do this for text diffs between revisions, we cache them in memcached.
Roan Kattouw (Catrope)
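Not MediaWiki's actual code path, but the same get-or-compute idea sketched with the stock PECL Memcached client; computeDiffHtml() is a stand-in for the expensive diff rendering.

<?php
// Generic get-or-compute cache for a rendered diff, keyed on the two revision IDs.
// Not MediaWiki's implementation; computeDiffHtml() is a stand-in.
function computeDiffHtml( int $oldRev, int $newRev ): string {
    return "<table class=\"diff\"><!-- diff of r$oldRev vs r$newRev --></table>";
}

function getDiffHtml( Memcached $cache, int $oldRev, int $newRev ): string {
    $key = "diff:$oldRev:$newRev";
    $html = $cache->get( $key );
    if ( $html === false && $cache->getResultCode() === Memcached::RES_NOTFOUND ) {
        $html = computeDiffHtml( $oldRev, $newRev );
        $cache->set( $key, $html, 7 * 24 * 3600 );   // adjacent-revision diffs rarely change
    }
    return $html;
}

$cache = new Memcached();
$cache->addServer( '127.0.0.1', 11211 );
echo getDiffHtml( $cache, 1001, 1002 );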
The perfect wiki syntax would be XML (at least behind the scenes). Then people could use whatever syntax they want and have it easily translated via XSLT.
Ryan Kaldari
On 1/1/11 9:51 AM, lampak wrote:
I've been following the discussion and as I can see it's already become rather unproductive*. So I hope my cutting in will not be very much out of place (even if I don't really know what I'm talking about).
Many people here have stated that the main reason a WYSIWYG editor is not feasible is the current wikitext syntax.
What's actually wrong with it?
The main thing I can think of is the fact that one template may include the opening of a table etc. and another one the closing (e.g. {{col-begin}}, {{col-end}}). It makes it impossible to isolate the template from the rest of the article - to draw a frame around it and say "this box here is a template".
It could be fixed by forbidding templates to leave unclosed tags. As a replacement, a kind of foreach loop could be introduced to iterate through an unspecified number of arguments.
Lack of standardisation has also been mentioned. Something else?
I've tried to think how a perfect parser should work. Most of this has already been mentioned. I think it should work in two steps: first tokenise the code and transform it into an intermediate tree structure like
* paragraph
  title:
  * plain text: "Section 1"
  content:
  * plain text: "foo"
  * bold text:
    * plain text: "bar"
  * template
    name: "Infobox"
    * argument
      name: "last name"
      value:
      * plain text: "Shakespear"
and so on. Then this structure could be transformed into a) HTML for display, b) JSON for the WYSIWYG editor. Thanks to this, you wouldn't need to write a whole new JS parser. The editor would get a half-ready product. The JS code would need to be able to: a) transform this structure into HTML, b) modify the structure, c) transform this structure back into wikitext.
But I guess it's more realistic to write a new JS parser than to write a new PHP parser. The former can start as a stub, the latter would need to be fully operational from the beginning.
Stephanie's suggestions are also interesting.
lampak
- (except the WYSIWTF, of course)
Have a look at the xwiki/2.0 syntax (http://platform.xwiki.org/xwiki/bin/view/Main/XWikiSyntax) for an example of a wikitext syntax that works with WYSIWYG editing.
Best regards,
Andreas Jonsson