TL;DR: How can we collaboratively put together a list of non-spammy sites that wikis may want to add to their interwiki tables for whitelisting purposes; and how can we arrange for the list to be efficiently distributed and imported?
Nemo bis points out that Interwiki (https://www.mediawiki.org/wiki/Extension:Interwiki) is "the easiest way to manage whitelisting", since nofollow isn't applied to interwiki links. Should we, then, encourage wikis to make more use of interwiki links? Usually, MediaWiki installations are configured so that only sysops can add, remove, or modify interwiki prefixes and URLs. If a user wants to link to another wiki that isn't on the list, he will often just use an external link rather than ask a sysop to add the prefix, since it's a hassle for both parties and people often don't want to bother the sysops too much in case they need their help with something else later. This defeats much of the point of having the interwiki table available as a potential whitelist, unless the sysops are pretty on top of their game when it comes to figuring out which new prefixes should be added. In most cases, they probably aren't; the experience of Nupedia shows that elitist, top-down systems tend not to work as well as egalitarian, bottom-up systems.
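One way to lower that barrier, at least for prefixes a wiki already has, would be a helper that spots external links which could be written as interwiki links instead. The Python sketch below assumes an in-memory map from prefix to URL template using MediaWiki's $1 page-name placeholder; the map entries and the to_interwiki function are hypothetical, not part of MediaWiki or the Interwiki extension.

```python
import re
from typing import Dict, Optional

# Illustrative excerpt of an interwiki map: prefix -> URL template.
# As in MediaWiki's interwiki table, "$1" stands for the target page name.
INTERWIKI_MAP: Dict[str, str] = {
    "wikipedia": "https://en.wikipedia.org/wiki/$1",
    "mw": "https://www.mediawiki.org/wiki/$1",
}


def to_interwiki(url: str, iw_map: Dict[str, str]) -> Optional[str]:
    """Return '[[prefix:Page]]' if url fits a known URL template, else None.

    Hypothetical helper for illustration; not part of MediaWiki.
    """
    for prefix, template in iw_map.items():
        if "$1" not in template:
            continue  # a template without a placeholder can't map page names
        head, _, tail = template.partition("$1")
        # Literal text before and after $1, with the page name captured.
        pattern = re.escape(head) + r"(.+)" + re.escape(tail)
        match = re.fullmatch(pattern, url)
        if match:
            return f"[[{prefix}:{match.group(1)}]]"
    return None


print(to_interwiki("https://www.mediawiki.org/wiki/Extension:Interwiki", INTERWIKI_MAP))
# prints [[mw:Extension:Interwiki]]
```

A real version would also need to handle percent-encoded titles and alternative URL forms (e.g. index.php?title=), which this sketch ignores.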
Currently, interwiki.sql (https://git.wikimedia.org/blob/mediawiki%2Fcore//maintenance%2Finterwiki.sql) has 100 wikis, and there doesn't seem to be much rhyme or reason to which ones are included (e.g. Seattlewiki?). I wrote InterwikiMap (https://www.mediawiki.org/wiki/Extension:InterwikiMap), which dumps the contents of the wiki's interwiki table into a backup page and substitutes in its place the interwiki table of some other wiki (I usually use Wikimedia's), with whatever modifications the sysops see fit to make. The extension lets sysops add, remove, and modify interwiki prefixes and URLs in bulk rather than one by one through Special:Interwiki, which is a pretty tedious endeavor. Unfortunately, as written it is not a very scalable solution: it can't accommodate many thousands of wiki prefixes before the backup wikitables it generates exceed the maximum size of a wiki page, or it breaks for other reasons.
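A more scalable bulk import might skip wiki pages entirely and generate SQL in batches, along the lines of core's interwiki.sql. This is a minimal Python sketch, not InterwikiMap's actual code: the column list (iw_prefix, iw_url, iw_local, iw_api, iw_wikiid) is assumed from interwiki.sql and should be checked against your MediaWiki version, no $wgDBprefix table prefix is assumed, and interwiki_sql and sql_quote are hypothetical names.

```python
from typing import Iterable, List, Tuple


def sql_quote(value: str) -> str:
    """Quote a string as an SQL literal by doubling single quotes.

    Does not handle backslashes, which MySQL treats as escapes by default;
    a real importer should use the database driver's parameter binding.
    """
    return "'" + value.replace("'", "''") + "'"


def _replace_statement(values: List[str]) -> str:
    # Column list modeled on core's interwiki.sql (assumed; check your schema).
    return ("REPLACE INTO interwiki (iw_prefix,iw_url,iw_local,iw_api,iw_wikiid) VALUES\n"
            + ",\n".join(values) + ";")


def interwiki_sql(rows: Iterable[Tuple[str, str]], batch_size: int = 500) -> List[str]:
    """Turn (prefix, url) pairs into batched REPLACE INTO statements."""
    statements: List[str] = []
    batch: List[str] = []
    for prefix, url in rows:
        # Lowercase the prefix (assumed case-insensitive); iw_local is 0 and
        # iw_api / iw_wikiid are left empty.
        batch.append(f"({sql_quote(prefix.lower())},{sql_quote(url)},0,'','')")
        if len(batch) == batch_size:
            statements.append(_replace_statement(batch))
            batch = []
    if batch:
        statements.append(_replace_statement(batch))
    return statements
```

Capping batch_size keeps each statement modest (for example, under MySQL's max_allowed_packet), so tens of thousands of prefixes become a series of small statements rather than one enormous wiki page or query.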
I was thinking of developing a tool that WikiIndex (or some other wiki about wikis) could use to manage its own interwiki table via edits to pages. Users would add interwiki prefixes to the table by adding a parameter to a template, which would in turn call a parser function that adds the prefix to the interwiki table when the page is saved. InterwikiMap could be modified to do incremental updates, polling the API to find out what changes have recently been made to the interwiki table, rather than fetching the whole table each time. WikiIndex (or whatever other site were used) could then become the wikisphere's central repository of canonical interwiki prefixes. See http://wikiindex.org/index.php?title=User_talk%3AMarkDilley&diff=172654&...
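The incremental update could look roughly like this on the client side. The change feed is hypothetical (no existing MediaWiki API module is assumed to report interwiki table changes), so Change and apply_changes are illustrative names for a Python sketch that applies only the changes newer than the last poll.

```python
from typing import Dict, List, Tuple, TypedDict


# One entry of a hypothetical change feed for the central table.
class Change(TypedDict):
    timestamp: str  # ISO 8601 UTC, e.g. "2013-11-14T00:00:00Z"
    action: str     # "set" (add or modify) or "delete"
    prefix: str
    url: str        # ignored for "delete"


def apply_changes(table: Dict[str, str], changes: List[Change],
                  since: str) -> Tuple[Dict[str, str], str]:
    """Apply changes newer than `since`; return the new table and high-water mark."""
    updated = dict(table)  # leave the caller's copy untouched
    latest = since
    # Apply in chronological order so that the most recent edit wins.
    # Same-format ISO 8601 UTC strings sort correctly as plain strings.
    for change in sorted(changes, key=lambda c: c["timestamp"]):
        if change["timestamp"] <= since:
            continue  # already applied during an earlier poll
        prefix = change["prefix"].lower()
        if change["action"] == "set":
            updated[prefix] = change["url"]
        elif change["action"] == "delete":
            updated.pop(prefix, None)
        latest = change["timestamp"]
    return updated, latest
```

Timestamps serve as the high-water mark only to keep the sketch short; two changes in the same second could straddle a poll, and the <= check would then skip the later one. A monotonically increasing change ID would avoid that.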
But there's been some question as to whether there would be much demand for a 200,000-prefix interwiki table, or whether it would be desirable. It could also provide an incentive for spammers to try to add their sites to WikiIndex. See https://www.mediawiki.org/wiki/Talk:Canonical_interwiki_prefixes
It's hard to get stuff added to Meta-Wiki's interwiki map (https://meta.wikimedia.org/wiki/Interwiki_map) because one of the criteria is that the prefix has to be one that would be used a lot on Wikimedia sites. How can we put together a list of non-spammy sites that wikis would be likely to want to have as prefixes for nofollow whitelisting purposes, and distribute that list efficiently? I notice that people are more likely to put together lists of spammy sites than of non-spammy ones; see e.g. Freakipedia's list (http://freakipedia.net/index.php5?title=Spam_Site_List). (Hmm, I think I'll pimp my websites to that wiki when I get a chance; the fact that the spam isn't just removed but put on permanent record in a public denunciation means it's a potential opportunity to gain exposure for my content. They say there's no such thing as bad publicity. ;) )
On 11/13/2013 05:44 AM, Nathan Larson wrote:
> TL;DR: How can we collaboratively put together a list of non-spammy sites that wikis may want to add to their interwiki tables for whitelisting purposes; and how can we arrange for the list to be efficiently distributed and imported?
I like the idea. Unless I'm mistaken, it seems like most of this idea could be implemented and improved on as an extension.
While the use of Meta has the advantage of a large number of possible reviewers, I wonder if it might get better review elsewhere.
Alternatively, we could work with WikiApiary to tag spammy wikis that his bot finds.
Also, since we're talking about spam and MediaWiki, another good site to check out would be http://spamwiki.org/mediawiki/.
Mark.
On 11/13/2013 08:49 AM, Mark A. Hershberger wrote:
> On 11/13/2013 05:44 AM, Nathan Larson wrote:
>> TL;DR: How can we collaboratively put together a list of non-spammy sites that wikis may want to add to their interwiki tables for whitelisting purposes; and how can we arrange for the list to be efficiently distributed and imported?
> I like the idea. Unless I'm mistaken, it seems like most of this idea could be implemented and improved on as an extension.
> Alternatively, we could work with WikiApiary to tag spammy wikis that his bot finds.
> Also, since we're talking about spam and MediaWiki, another good site to check out would be http://spamwiki.org/mediawiki/.
Wearing my third-party wiki admin hat, I personally agree with all of this.
Another possibility, further down the roadmap:
Imagine a world in which you could transclude content into your wiki from a subset of this interwiki table of wikis, filtered by license compatibility and whatever other criteria. To avoid performance problems, the sources could be checked periodically by a cron job, etc.
The most interesting part of this proposal is starting an interwiki table of friendly and compatible wikis willing to ease the task of linking and sharing content among themselves. The table's attributes, and a decentralized system to curate and maintain the data, could open up many possibilities for collaboration between MediaWiki sites.
On Thu, Nov 14, 2013 at 12:18 PM, Quim Gil qgil@wikimedia.org wrote:
> Wearing my third-party wiki admin hat, I personally agree with all of this.
> Another possibility, further down the roadmap:
> Imagine a world in which you could transclude content into your wiki from a subset of this interwiki table of wikis, filtered by license compatibility and whatever other criteria. To avoid performance problems, the sources could be checked periodically by a cron job, etc.
> The most interesting part of this proposal is starting an interwiki table of friendly and compatible wikis willing to ease the task of linking and sharing content among themselves. The table's attributes, and a decentralized system to curate and maintain the data, could open up many possibilities for collaboration between MediaWiki sites.
Is there reason to think that a decentralized system would be likely to evolve, or that it would be optimal? It seems to me that most stuff in the wikisphere is centered around WMF; e.g. people usually borrow templates, the spam blacklist, MediaWiki extensions, and so on, from WMF sites. Most wikis that attempted to duplicate what WMF does have failed to catch on; e.g. no encyclopedia that tried to copy Wikipedia's approach (an allegedly neutral point of view and a serious, rather than humorous, style of writing) came close to Wikipedia's size and popularity, and no wiki software caught on as much as MediaWiki. It's just usually more efficient to have a centralized repository and widely-applied standards so that people aren't duplicating their labor too much.
But if one were to pursue centralization of interwiki data, what would be the central repository? Would WMF be likely to be interested? Hardly anything at https://meta.wikimedia.org/wiki/Proposals_for_new_projects has been approved for creation, so I'm not sure how one would go about getting something like this established through WMF.
Some advantages of WMF are that we can be pretty confident its projects will be around for a while, and none of them are clogged up with the kind of advertising we see at, say, Wikia. Non-WMF wikis come and go all the time; one never knows when the owner will get hit by a bus, lose interest, etc., and then the users are left high and dry. That could be a problem if the wiki in question is a central repository that thousands of wikis have come to rely upon.
Perhaps the MediaWiki Foundation could spearhead this? Aside from its nonexistence, I think that organization could be a pretty good venue for getting this done. I'll have to bring this up with some of my imaginary friends who sit on the MWF board of trustees.
On 11/14/2013 09:53 AM, Nathan Larson wrote:
> Is there reason to think that a decentralized system would be likely to evolve, or that it would be optimal? It seems to me that most stuff in the wikisphere is centered around WMF; e.g. people usually borrow templates, the spam blacklist, MediaWiki extensions, and so on, from WMF sites. Most wikis that attempted to duplicate what WMF does have failed to catch on;
As mentioned by Mark and quoted in my email, http://wikiapiary.com/ could be a good starting point.
Just improvising a hypothetical starting point for a process to maintain the decentralized interwiki table:
To become a candidate, a wiki would need to have the extension installed and a quantifiable score based on age, size, license, and lack of reports as a spammer.
The extension could perhaps check how many wikis, and with which characteristics, link to a given wiki, calculating a popularity index of sorts. Maybe you could even have a classification of topics, filtering by the language and type of content that matter to your wiki. By default, only wikis above some popularity index would be included in your local interwiki table. The admins could fine-tune it locally.
The master interwiki table could be hosted on WikiApiary or wherever. It would be mirrored in some way by the wikis that have the extension installed and are willing to do so.
Maintaining the table itself doesn't even look like a big deal, compared to developing the extension and adding new interwiki features. It would rely on the userbase of wikis installing the extension.
Whether Wikimedia projects join the interwiki party or not would depend on the extension being ready for Wikimedia adoption and on a decision to deploy it. But that would be a Wikimedia discussion, not an Interwiki project discussion.
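As a rough illustration of the popularity index, here is a Python sketch that counts how many distinct other wikis link to each candidate and builds a default table from those above a threshold. It covers only three of the criteria mentioned (license, spam reports, inbound links); the Wiki fields, license strings, and threshold are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Wiki:
    prefix: str
    license: str
    spam_reports: int
    links_to: Set[str] = field(default_factory=set)  # prefixes this wiki links to


def popularity(wikis: List[Wiki]) -> Dict[str, int]:
    """Count how many distinct other wikis link to each wiki."""
    counts = {w.prefix: 0 for w in wikis}
    for w in wikis:
        for target in w.links_to:  # a set, so each linking wiki counts once
            if target != w.prefix and target in counts:
                counts[target] += 1
    return counts


def default_table(wikis: List[Wiki], allowed_licenses: Set[str],
                  min_popularity: int) -> List[str]:
    """Prefixes included by default; local admins could fine-tune from here."""
    scores = popularity(wikis)
    return sorted(
        w.prefix for w in wikis
        if w.spam_reports == 0
        and w.license in allowed_licenses
        and scores[w.prefix] >= min_popularity
    )
```

Self-links are ignored so a wiki can't raise its own score, though a ring of cooperating spam wikis linking to each other still could; the spam-report criterion is meant to catch that case.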
As said, all of the above is improvised and hypothetical. Sorry in advance for any planning flaws. :)
On Thu, Nov 14, 2013 at 1:08 PM, Quim Gil qgil@wikimedia.org wrote:
> As mentioned by Mark and quoted in my email, http://wikiapiary.com/ could be a good starting point.
> Just improvising a hypothetical starting point for a process to maintain the decentralized interwiki table:
> To become a candidate, a wiki would need to have the extension installed and a quantifiable score based on age, size, license, and lack of reports as a spammer.
> The extension could perhaps check how many wikis, and with which characteristics, link to a given wiki, calculating a popularity index of sorts. Maybe you could even have a classification of topics, filtering by the language and type of content that matter to your wiki. By default, only wikis above some popularity index would be included in your local interwiki table. The admins could fine-tune it locally.
> The master interwiki table could be hosted on WikiApiary or wherever. It would be mirrored in some way by the wikis that have the extension installed and are willing to do so.
> Maintaining the table itself doesn't even look like a big deal, compared to developing the extension and adding new interwiki features. It would rely on the userbase of wikis installing the extension.
> Whether Wikimedia projects join the interwiki party or not would depend on the extension being ready for Wikimedia adoption and on a decision to deploy it. But that would be a Wikimedia discussion, not an Interwiki project discussion.
> As said, all of the above is improvised and hypothetical. Sorry in advance for any planning flaws. :)
Do we really need to set criteria for a wiki being a candidate? Are we trying to keep the number of approved interwiki links down? I had in mind just letting everyone have a prefix and a URL that we would distribute.
Some might get the more preferable prefixes, though; e.g. we would need some sort of tiebreaker if, say, two wikis wanted to call themselves Foowiki and get the Foowiki: prefix. I guess we should continue this discussion on WikiApiary. I want to try to get someone (Jamie Thingelstad?) with authority to install extensions to give the go-ahead on the specs for the extension; once they're approved, there will be no reason not to start writing it. I was originally planning to work with WikiIndex, but WikiApiary seems like a better idea. By the way, I like how they note that they're monitoring over 9,300 (https://encyclopediadramatica.es/Over_9000) wikis.
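One possible tiebreaker, first come, first served, is sketched below in Python. The rule, the request format (prefix, wiki URL, ISO 8601 UTC request time), and assign_prefixes are all hypothetical; prefixes are assumed to be case-insensitive, so Foowiki and foowiki count as the same prefix.

```python
from typing import Dict, List, Tuple


def assign_prefixes(
    requests: List[Tuple[str, str, str]],
) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """First-come, first-served prefix assignment (hypothetical policy).

    requests: (prefix, wiki_url, requested_at) with ISO 8601 UTC timestamps.
    Returns (prefix -> wiki_url, list of (prefix, wiki_url) requests that lost).
    """
    assigned: Dict[str, str] = {}
    rejected: List[Tuple[str, str]] = []
    # Earliest request wins; the URL breaks exact timestamp ties deterministically.
    for prefix, url, _when in sorted(requests, key=lambda r: (r[2], r[1])):
        key = prefix.lower()  # "Foowiki" and "foowiki" are the same prefix
        if key in assigned:
            rejected.append((key, url))
        else:
            assigned[key] = url
    return assigned, rejected
```

Losing requests are returned rather than silently dropped, so the wiki that lost the tiebreak can be asked to pick another prefix.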
It might help that WikiApiary runs SMW. Maybe the extension could be written to interact with it in some way that would be more elegant than using tag or parser functions to insert metadata into the database à la the {{setbpdprop: }} of Extension:BedellPenDragon (https://www.mediawiki.org/wiki/Extension:BedellPenDragon).
On Sat, Nov 16, 2013 at 3:19 AM, Nathan Larson nathanlarson3141@gmail.com wrote:
> Do we really need to set criteria for a wiki being a candidate? Are we trying to keep the number of approved interwiki links down? I had in mind just letting everyone have a prefix and a URL that we would distribute.
I meant to say, "approved interwiki prefixes". Anyway, this has been filed as bug 57131 https://bugzilla.wikimedia.org/show_bug.cgi?id=57131.
On 11/14/2013 12:53 PM, Nathan Larson wrote:
> Is there reason to think that a decentralized system would be likely to evolve, or that it would be optimal?
Yes. Wikimedians are motivated to maintain Wikimedia sites. I don't think it is likely that they'll have an interest in maintaining a list of non-spammy, non-Wikimedia sites. Using Meta to host the list implies that there is an interest in the wiki-world outside of Wikimedia.
> It seems to me that most stuff in the wikisphere is centered around WMF; e.g. people usually borrow templates, the spam blacklist, MediaWiki extensions, and so on, from WMF sites.
Right. But here you have people re-using the work that the Wikimedia community has made available -- work they are already doing. It happens to work elsewhere, but it is focused on the Wikimedia sites.
> It's just usually more efficient to have a centralized repository and widely-applied standards so that people aren't duplicating their labor too much.
True, but I don't think centralizing on Meta gives you the efficiency benefits you want. You're not re-using work that already happens on Meta. Instead, you're asking them to do work that they haven't (yet) shown an interest in.
And, while MediaWiki extensions are available from MediaWiki.org, a good number of those extensions have nothing to do with the WMF -- they're just hosted here. Several extensions are hosted only on GitHub. Some aren't even mentioned on MW.o.
This is a problem of community-building, really, and the work of WikiApiary is a good step in that direction. I have discussed plans for MW 1.23 (https://bugzilla.wikimedia.org/54425, https://www.mediawiki.org/wiki/Requests_for_comment/Opt-in_site_registration...) for a way to really get this community-building effort going.
Thanks,
Mark.
wikitech-l@lists.wikimedia.org