Hey,
You mean site_config?
You're suggesting the interwiki system should look for a site by site_local_key, when it finds one parse out the site_config, check if it's disabled and if so ignore the fact it found a site with that local key? Instead of just not having a site_local_key for that row in the first place?
No. Since the interwiki system is not specific to any type of site, this approach would make it needlessly hard. The site_link_inline field determines whether the site should be usable as an interwiki link, as you can see in the patchset:
-- If the site should be linkable inline as an "interwiki link" using
-- [[site_local_key:pageTitle]].
site_link_inline bool NOT NULL,
So queries would be _very_ simple.
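A minimal sketch of such a lookup, in Python against an in-memory SQLite table. Only site_local_key and site_link_inline come from the patchset; the table name `sites`, the `site_url` column, and the sample rows are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical, cut-down version of the proposed table. Only
# site_local_key and site_link_inline are taken from the patchset.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sites (
        site_local_key   TEXT NOT NULL,
        site_url         TEXT NOT NULL,  -- assumed column, for illustration
        site_link_inline BOOL NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO sites VALUES (?, ?, ?)",
    [
        ("enwiki", "https://en.wikipedia.org/wiki/$1", 1),
        ("internal", "https://example.org/$1", 0),  # known site, not linkable inline
    ],
)


def find_inline_site(conn, local_key):
    """Return the URL for an inline-linkable site, or None if there is none."""
    row = conn.execute(
        "SELECT site_url FROM sites "
        "WHERE site_local_key = ? AND site_link_inline = 1",
        (local_key,),
    ).fetchone()
    return row[0] if row else None
```

The filter lives in the WHERE clause, so the caller never has to fetch a row and then inspect a config blob to decide whether to ignore it.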
So data duplication simply because one wiki needs a second local name
will mean that one url now has two different global ids. This sounds precisely like something that is going to get in the way of the whole reason you wanted this rewrite.
- It does not get in our way at all, and is completely disjunct from why we
want the rewrite
- It's currently done like this
- The changes we do need and are proposing to make will make such a rewrite
at a later point easier than it is now
Doing it this way frees us from creating any restrictions on whatever
source we get sites from that we shouldn't be placing on them.
- We don't need this for Wikidata
- It's a new feature that might or might not be nice to have that currently
does not exist
- The changes we do need and are proposing to make will make such a rewrite
at a later point easier than it is now
So you might as well drop the 3 url related columns and just use the data
blob that you already have.
I don't see what this would gain us at all. It would just make things more complicated.
The $1 pattern may not even work for some sites.
- We don't need this for Wikidata
- It's a new feature that might or might not be nice to have that currently
does not exist
- The changes we do need and are proposing to make will make such a rewrite
at a later point easier than it is now
And in fact we are making this more flexible by having the type system. The MediaWiki site type could for instance be able to form both "nice" urls and index.php ones. Or a gerrit type could have the logic to distinguish between the gerrit commit number and a sha1 hash.
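As a rough illustration of what per-type logic could look like, here is a sketch in Python. The class names, method names, and URL layouts are all hypothetical and are not taken from the patchset; the point is only that each site type owns its own rules for turning a link target into a URL.

```python
import re
from urllib.parse import quote, urlencode


class MediaWikiSite:
    """Hypothetical site type that can build both URL styles."""

    def __init__(self, article_path, script_path):
        self.article_path = article_path  # e.g. "https://example.org/wiki/$1"
        self.script_path = script_path    # e.g. "https://example.org/w/index.php"

    @staticmethod
    def _normalize(title):
        # MediaWiki-style title handling: spaces become underscores.
        return title.strip().replace(" ", "_")

    def nice_url(self, title):
        return self.article_path.replace(
            "$1", quote(self._normalize(title), safe=":/")
        )

    def index_url(self, title, **params):
        query = {"title": self._normalize(title), **params}
        return self.script_path + "?" + urlencode(query)


def classify_gerrit_ref(ref):
    """Guess whether a link target is a change number or a sha1 hash.

    Assumption: an all-digit value is treated as a change number, even
    though a short all-digit string could also be an abbreviated hash.
    """
    if re.fullmatch(r"[0-9]+", ref):
        return "change_number"
    if re.fullmatch(r"[0-9a-f]{7,40}", ref.lower()):
        return "sha1"
    raise ValueError("not a recognisable gerrit reference: %r" % ref)
```

For example, `MediaWikiSite("https://example.org/wiki/$1", "https://example.org/w/index.php").nice_url("Main Page")` gives `https://example.org/wiki/Main_Page`, while a site type without title conventions would simply skip the normalization step.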
Cheers
[Just to clarify, I'm doing inline replies to things various people said, not just Jeroen]
First and foremost, I'm a little confused as to what the actual use cases here are. Could we get a short summary, for those who aren't entirely following how wikidata will work, of why the current interwiki situation is insufficient? I've read I0a96e585 and http://lists.wikimedia.org/pipermail/wikitech-l/2012-June/060992.html, but everything seems very vague: "It doesn't work for our situation", without any detailed explanation of what that situation is. At most the messages kind of hint at wanting to be able to access the list of interwiki types of the wikidata "server" from a wikidata "client" (and keep them in sync, or at least have them replicated from server->client). But there's no explanation given as to why one needs to do that. (Are we doing some form of interwiki transclusion and need to render foreign interwiki links correctly? Do we want to be able to do a global whatlinkshere and need unique global ids for various wikis? Something else?)
- Site definitions can exist that are not used as "interlanguage link" and
not used as "interwiki link"
And if we put one of those on a talk page, what would happen? Or, if foo were one such site, what would [[:foo:some page]] do? (The current behaviour is that it becomes an interwiki link.)
Although to be fair, I do see how the current way we distinguish between interwiki and interlang links is a bit hacky.
And in fact we are making this more flexible by having the type system. The MediaWiki site type could for instance be able to form both "nice" urls and index.php ones. Or a gerrit type could have the logic to distinguish between the gerrit commit number and a sha1 hash.
I must admit I do like this idea. In particular, the current situation where we treat the value of an interwiki link as a title (i.e. spaces -> underscores etc.), even for sites that do not use such conventions, has always bothered me. Having interwikis that support url rewriting based on the value does sound cool, but I certainly wouldn't want said code in a db blob (and just using an integer site_type identifier is quite far away from giving us that, but it's still a step in a positive direction), which raises the question of where such rewriting code would go.
The issue I was trying to deal with was storage. Currently we 100% assume that the interwiki list is a table and there will only ever be one of them.
Do we really assume that? Certainly that's the default config, but I don't think that is the config used on WMF. As far as I'm aware, Wikimedia uses a cdb database file (via $wgInterwikiCache), which contains all the interwikis for all sites. From what I understand, it supports doing various "scope" levels of interwikis, including per db, per site (Wikipedia, Wiktionary, etc), or global interwikis that act on all sites.
The feature is a bit WMF-specific, but it does seem to support different levels of interwiki lists.
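To make the "scope levels" idea concrete, here is a small Python sketch of a layered lookup. It does not reproduce the real $wgInterwikiCache key format; the tuple keys, the sample entries, and the "most specific scope wins" ordering are assumptions for illustration only.

```python
# Hypothetical cache: (scope kind, scope name, prefix) -> URL pattern.
SAMPLE_CACHE = {
    ("global", "", "meta"): "https://meta.example.org/wiki/$1",
    ("site", "wiktionary", "w"): "https://en.wikipedia.example.org/wiki/$1",
    ("db", "enwiki", "meta"): "https://override.example.org/wiki/$1",
}


def lookup_interwiki(cache, prefix, db_name, site_name):
    """Resolve a prefix, trying per-db, then per-site, then global entries."""
    scopes = [("db", db_name), ("site", site_name), ("global", "")]
    for kind, name in scopes:
        url = cache.get((kind, name, prefix))
        if url is not None:
            return url
    return None
```

With this layout, `lookup_interwiki(SAMPLE_CACHE, "meta", "enwiki", "wikipedia")` returns the per-db override, while the same prefix on any other wiki falls through to the global entry.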
Furthermore, I imagine (but don't know, so let's see how fast I get corrected ;) that the cdb database was introduced not just as a convenience measure for easier administration of the interwiki tables, but also for better performance. If so, one should also take into account any performance hit that may come with switching to the proposed "sites" facility.
Cheers, -bawolff