On Thu, 09 Aug 2012 09:12:16 -0700, Jeroen De Dauw jeroendedauw@gmail.com wrote:
> Hey,
>
>> So if this is something hoping to replace the interwiki system, I'd like to look over what the plan and overall idea is with this to make sure we don't repeat the same mistakes.
>
> Please have a look at the patch on gerrit then. Feedback is much appreciated :) https://gerrit.wikimedia.org/r/#/c/14295/
>
> Cheers
>
> --
> Jeroen De Dauw
> http://www.bn2vs.com
> Don't panic. Don't be evil.
> --
Looking over the code, it does seem we're repeating the same issues that exist with the current interwiki system. I was planning to eliminate those when I moved includes/Interwiki.php to includes/interwiki/Interwiki.php, and put this on my endless to-do list.
The issue I was trying to deal with was storage. Currently we 100% assume that the interwiki list is a table and that there will only ever be one of them. But this runs counter to several facts about interwikis in practice:

- We have a default set of interwiki links. Because we use a database instead of flat files, we end up inserting them on installation. As a result, when something changes (e.g. Wikimedia now supports https:// and all links are supposed to be protocol-relative), we have hundreds of wikis still using outdated interwiki rules even after they upgrade MediaWiki, because the software only inserts interwiki links on installation; they are not read directly from the map in the software.
- In practice we don't want one interwiki map. In projects like Wikimedia we usually want two or three. We want a global shared list of interwikis so that [[Wikipedia:]], [[commons:]], etc. work on every project. We want a shared list of interwikis for each project (i.e. Wikipedias, Wiktionaries, etc.), primarily so that [[en:]], [[es:]], etc. language links are not duplicated, since these can't be global; there may also be some interwiki links that apply to some projects but not others. And sometimes we also want a wiki-local interwiki list, because some communities want to add links to sites that other wikis don't, or we may want to localize a link. Instead, we end up writing absolutely horrible hacks we shouldn't have to, because the implementation is ignorant of reality.
I had planned to do a few primary things to the system:

- Drop the notion of the interwiki list simply being a database table. Multiple class implementations were going to make it possible to have database-backed interwiki lists, file-backed interwiki lists (in multiple formats), etc.
- Drop the single-list handling and instead allow a list of multiple interwiki sources to be configured from a wg variable.

Together, this would mean that our default list of interwiki links would no longer be stored in the interwiki table; instead it would be read directly from our source code, where cleaning up the URLs would nicely update all wikis when they upgrade. It would also make it easy to set up multiple interwiki list sources for a wiki, such as a global interwiki database, a per-project one, and a local one. And it would be possible to use simple text-based, file-backed interwiki lists so that people don't need to mess with SQL.
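To make the layered-sources idea concrete, here is a minimal sketch in Python (MediaWiki itself is PHP). Every class name here is hypothetical: InterwikiSource, DictInterwikiSource, TextInterwikiSource and ChainedInterwikiLookup are illustrations, not existing MediaWiki classes. The one-"prefix url"-per-line text format is made up. The ordered list passed to the lookup stands in for the configured wg variable. It also assumes that a more local list overrides a more global one, and it uses the $1 placeholder that interwiki URLs already use for the target page.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Optional


class InterwikiSource(ABC):
    """One backend holding interwiki prefix -> URL template mappings (hypothetical)."""

    @abstractmethod
    def fetch(self, prefix: str) -> Optional[str]:
        """Return the URL template for a prefix, or None if this source lacks it."""


class DictInterwikiSource(InterwikiSource):
    """In-memory source; stands in for a database table or defaults shipped in code."""

    def __init__(self, mapping: Dict[str, str]):
        # Treat prefixes case-insensitively by normalizing keys to lower case.
        self._map = {k.lower(): v for k, v in mapping.items()}

    def fetch(self, prefix: str) -> Optional[str]:
        return self._map.get(prefix.lower())


class TextInterwikiSource(DictInterwikiSource):
    """File-backed source in a made-up format: one 'prefix url-template' per line."""

    @classmethod
    def from_lines(cls, lines: Iterable[str]) -> "TextInterwikiSource":
        mapping = {}
        for line in lines:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            prefix, url = line.split(None, 1)
            mapping[prefix] = url.strip()
        return cls(mapping)


class ChainedInterwikiLookup:
    """Consults sources in configured order; the first one that knows the prefix wins."""

    def __init__(self, sources: Iterable[InterwikiSource]):
        self._sources = list(sources)

    def url_for(self, prefix: str, target: str) -> Optional[str]:
        for source in self._sources:
            template = source.fetch(prefix)
            if template is not None:
                return template.replace("$1", target)
        return None


# Hypothetical configuration: local list first, then project, global, and shipped defaults.
defaults = DictInterwikiSource({"commons": "//commons.wikimedia.org/wiki/$1"})
global_list = DictInterwikiSource({"wikipedia": "//en.wikipedia.org/wiki/$1"})
project_list = TextInterwikiSource.from_lines([
    "# language links shared by every wiki in the project",
    "es //es.wikipedia.org/wiki/$1",
])
local_list = DictInterwikiSource({"wikipedia": "//de.wikipedia.org/wiki/$1"})  # a localized link

lookup = ChainedInterwikiLookup([local_list, project_list, global_list, defaults])
print(lookup.url_for("es", "Madrid"))          # //es.wikipedia.org/wiki/Madrid
print(lookup.url_for("wikipedia", "Berlin"))   # //de.wikipedia.org/wiki/Berlin
print(lookup.url_for("commons", "Main_Page"))  # //commons.wikimedia.org/wiki/Main_Page
```

Because the defaults live in code rather than in rows inserted at install time, changing them (say, to protocol-relative URLs) would reach every wiki on upgrade. The first-match-wins ordering is one possible precedence rule; a real design might instead merge lists or let some sources forbid overrides.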
----
But it looks like the new sites code is also focused around a single list of database-backed sites.
((Also, while there are a number of really interesting ideas, I'm sorry to say it, but some of the code already triggers that "Must rewrite!" mood rather than a "small incremental tweaks" one.))
Also, anything in this area really needs to take into account our lack of a user interface. If we rewrite this, then we absolutely must include a UI in core to view and edit it. If we rewrite it without one, we throw away every hack that tries to make the interwiki list easy to control and only make the problem worse. The notes on synchronizing with Wikidata look interesting. But this kind of thing absolutely has to be user-friendly and multi-wiki friendly at a core level, not only for wikis using Wikidata.
----
I think some of this stuff is a bit too large to discuss in code review or email. I'd like to do this RfC style, listing everything we need from different perspectives, so we can come up with something that doesn't need to be redone yet again.
Originally I was focused on taking the interwiki dependence out of the database. But the talk of synchronization and other things in the code has me thinking about other ideas, like a database table as a final index (like pagelinks, etc.), fetching lists, siteinfo, etc. from other sites, and more. So I have a feeling that the best thing we come up with will probably look different from what either of us started with.
First, though, I probably won't be able to come up with a good idea without a good understanding of Wikidata's role in all this:

- I would like to understand what Wikidata needs out of interwiki/sites and what it's going to do with the data.
- I'd also like to know whether Wikidata plans to add any interface that will add/remove sites.
If we do this hastily, I think we may also miss a very good chance to make bug 11 and bug 10237 much saner to fix.
Bug 39199 also covers an idea about linking in pages that I've been thinking about.
[bug 11] https://bugzilla.wikimedia.org/show_bug.cgi?id=11
[bug 10237] https://bugzilla.wikimedia.org/show_bug.cgi?id=10237
[bug 39199] https://bugzilla.wikimedia.org/show_bug.cgi?id=39199