On Thu, 09 Aug 2012 09:12:16 -0700, Jeroen De Dauw
<jeroendedauw(a)gmail.com> wrote:
Hey,
So if this is something that hopes to replace the interwiki system, I'd like
to look over what the plan and overall idea are, to make sure we don't
repeat the same mistakes.
Please have a look at the patch on Gerrit then. Feedback is much
appreciated :)
https://gerrit.wikimedia.org/r/#/c/14295/
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil.
--
Looking over the code, it does seem we're repeating the same issues that
exist in the current interwiki system, issues I was planning to eliminate
when I moved includes/Interwiki.php to includes/interwiki/Interwiki.php and
put this on my endless to-do list.
The issue I was trying to deal with was storage. Currently we assume 100%
that the interwiki list is a table and that there will only ever be one of
them. But this contradicts several facts about interwikis in practice:
- We have a default set of interwiki links. Because we use a database
instead of flat files, we end up inserting them on installation. As a
result, when something changes (e.g. Wikimedia starts supporting https://
and all links are now supposed to be protocol-relative), we have hundreds
of wikis still using outdated interwiki rules even after they upgrade
MediaWiki, because interwiki links are only inserted by the software on
installation; they are not taken directly from the software's map.
- In practice we don't want one interwiki map. In projects like Wikimedia
we usually want two or three. We want a global shared list of interwikis
so that [[Wikipedia:]], [[commons:]], etc. work on every project. We want a
shared list of interwikis for each project (i.e. Wikipedias, Wiktionaries,
etc.), primarily so that the [[en:]], [[es:]], etc. language links, which
can't be global, are not duplicated, but also because some interwiki links
may apply to some projects and not others. And sometimes we also want a
wiki-local interwiki list, because some communities want to add links to
sites that other wikis don't, or want to localize a link. As it is, we end
up writing absolutely horrible hacks we shouldn't have to, because the
implementation is ignorant of reality.
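To make the layering concrete, here is a minimal Python sketch (not
MediaWiki code) of how a global, a per-project, and a wiki-local map could
be consulted in order, with the more specific list winning. The map
contents and the names GLOBAL_MAP, PROJECT_MAP, LOCAL_MAP and expand are
illustrative assumptions; the $1 placeholder follows the convention used in
MediaWiki interwiki URLs, and collections.ChainMap is the standard-library
class that searches several dicts left to right.

```python
from collections import ChainMap
from typing import Optional

# Illustrative maps only; these are not real Wikimedia configuration.
GLOBAL_MAP = {
    "commons": "//commons.wikimedia.org/wiki/$1",
    "wikipedia": "//en.wikipedia.org/wiki/$1",
}
PROJECT_MAP = {  # e.g. shared by all Wikipedias: language links
    "en": "//en.wikipedia.org/wiki/$1",
    "es": "//es.wikipedia.org/wiki/$1",
}
LOCAL_MAP = {  # e.g. a French wiki localizing [[wikipedia:]]
    "wikipedia": "//fr.wikipedia.org/wiki/$1",
}

# ChainMap searches its maps left to right, so local entries shadow
# project entries, which in turn shadow global entries.
INTERWIKI = ChainMap(LOCAL_MAP, PROJECT_MAP, GLOBAL_MAP)


def expand(prefix: str, title: str) -> Optional[str]:
    """Return the URL for [[prefix:title]], or None for an unknown prefix."""
    pattern = INTERWIKI.get(prefix)
    if pattern is None:
        return None
    return pattern.replace("$1", title)
```

Because ChainMap only searches the underlying dicts rather than copying
them, a change to any one layer is visible immediately. Title
normalization (spaces to underscores, URL-encoding) is deliberately left
out of this sketch.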
I had planned to make a few primary changes to the system:
- Drop the notion of the interwiki list simply being a database table.
Multiple class implementations would make it possible to have
database-backed interwiki lists, file-backed interwiki lists (in multiple
formats), etc.
- Drop the single-list handling and instead allow a list of multiple
interwiki sources to be configured from a $wg variable.
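A minimal sketch of what "multiple class implementations" plus a
configured list of sources might look like, again in Python purely for
illustration. The interface name InterwikiSource, the two backends and
resolve are hypothetical; the iw_prefix/iw_url column names are modelled
on the existing interwiki table, but the rest of the schema is assumed.
The in-code backend stands in for defaults shipped with the software, so
an upgrade changes them without touching the database.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class InterwikiSource(ABC):
    """Hypothetical interface: anything that can map a prefix to a URL pattern."""

    @abstractmethod
    def lookup(self, prefix: str) -> Optional[str]:
        """Return the URL pattern for prefix, or None if this source lacks it."""


class StaticInterwikiSource(InterwikiSource):
    """Defaults that ship in the source code, so upgrading updates them."""

    def __init__(self, mapping: Dict[str, str]) -> None:
        self._mapping = dict(mapping)

    def lookup(self, prefix: str) -> Optional[str]:
        return self._mapping.get(prefix)


class SqlInterwikiSource(InterwikiSource):
    """Database-backed source reading an interwiki(iw_prefix, iw_url) table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def lookup(self, prefix: str) -> Optional[str]:
        row = self._conn.execute(
            "SELECT iw_url FROM interwiki WHERE iw_prefix = ?", (prefix,)
        ).fetchone()
        return row[0] if row else None


def resolve(sources: List[InterwikiSource], prefix: str) -> Optional[str]:
    """Ask each configured source in order; the first that knows the prefix wins."""
    for source in sources:
        url = source.lookup(prefix)
        if url is not None:
            return url
    return None
```

In MediaWiki terms the sources list would come from the configuration
variable; its order decides which source wins when two of them define the
same prefix.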
Together, these would mean that our default list of interwiki links would
no longer be stored in the interwiki table and would instead be read
directly from our source code, where cleaning up the URLs would nicely
update all wikis when they upgrade. It would also make it easy to set up
multiple interwiki list sources for a wiki, such as a global interwiki
database, a project one, and a local one. And it would be possible to use
simple text-based, file-backed interwiki lists so that people don't need
to mess with SQL.
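For the "simple text-based" option, one possible format is a prefix and a
URL pattern per line. This format, its comment syntax, and the function
names below are assumptions for illustration only, not an existing
MediaWiki format.

```python
from typing import Dict


def parse_interwiki_text(text: str) -> Dict[str, str]:
    """Parse a hypothetical plain-text interwiki list.

    Assumed format: one "prefix URL-pattern" pair per line, separated by
    whitespace; blank lines and lines starting with '#' are ignored.
    This sketch lowercases prefixes, and a later line for the same prefix
    overrides an earlier one.
    """
    mapping: Dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'prefix url', got {raw!r}")
        mapping[parts[0].lower()] = parts[1].strip()
    return mapping


def load_interwiki_file(path: str) -> Dict[str, str]:
    """Read and parse an interwiki list file (UTF-8 assumed)."""
    with open(path, encoding="utf-8") as fh:
        return parse_interwiki_text(fh.read())
```

Rejecting malformed lines loudly, with a line number, matters more here
than with SQL, since the whole point is that people edit these files by
hand.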
----
But it looks like the new sites code is also focused around a single list
of database-backed sites.
((Also, while there are a number of really interesting ideas, I'm sorry to
say that some of the code already puts me in a "Must rewrite!" mood rather
than a "small incremental tweaks" one.))
Also, anything in this area really needs to take our lack of a user
interface into account. If we rewrite this, then we absolutely must include
a UI in core to view and edit it. If we rewrite it without one, we ditch
every hack people use to make the interwiki list easy to control and only
make the problem worse.
The notes on synchronizing with Wikidata look interesting. But this kind
of thing absolutely has to be user-friendly and multi-wiki-friendly at the
core level, not only for wikis using Wikidata.
----
I think some of this is a bit too large to discuss in code review or
email. I'd like to do this RfC-style, listing everything we need from
different perspectives, so we can come up with something that doesn't need
to be redone yet again.
Originally I was focused on taking the interwiki dependence out of the
database. But the talk of synchronization and other things in the code has
me thinking about other possibilities, like a database table as a final
index (like pagelinks, etc.), and fetching lists, siteinfo, etc. from other
sites. So I have a feeling that the best thing we come up with will
probably look different from what either of us started with.
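One reading of "a database table as a final index" is to keep the
configured sources as the canonical data and periodically flatten them
into a single table that lookups query, much as pagelinks is derived data.
A minimal sketch using Python's built-in sqlite3 module; the table name
interwiki_index and both functions are hypothetical.

```python
import sqlite3
from typing import Dict, List, Optional


def rebuild_index(conn: sqlite3.Connection, layers: List[Dict[str, str]]) -> None:
    """Flatten ordered interwiki layers into one lookup table.

    layers are ordered from lowest to highest precedence (e.g. global,
    project, local), so later layers overwrite earlier ones.
    """
    merged: Dict[str, str] = {}
    for layer in layers:
        merged.update(layer)
    with conn:  # commits on success, rolls back if anything raises
        conn.execute(
            "CREATE TABLE IF NOT EXISTS interwiki_index "
            "(prefix TEXT PRIMARY KEY, url TEXT NOT NULL)"
        )
        conn.execute("DELETE FROM interwiki_index")
        conn.executemany(
            "INSERT INTO interwiki_index (prefix, url) VALUES (?, ?)",
            sorted(merged.items()),
        )


def lookup(conn: sqlite3.Connection, prefix: str) -> Optional[str]:
    """Query the flattened index; the sources themselves are not consulted."""
    row = conn.execute(
        "SELECT url FROM interwiki_index WHERE prefix = ?", (prefix,)
    ).fetchone()
    return row[0] if row else None
```

The trade-off is the usual one for derived tables: lookups become a single
indexed query, but the index has to be rebuilt whenever any source changes.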
Firstly though, I probably won't be able to come up with a good idea
without a good understanding of Wikidata's role in all this:
- I would like to understand what Wikidata needs out of interwiki/sites
and what it's going to do with the data
- I'd also like to know if Wikidata plans to add any interface that will
add/remove sites
If we do this hastily, I think we may also miss a very good chance to make
bug 11 and bug 10237 much saner to fix.
Bug 39199 also covers an idea about linking in pages that I've been
thinking about.
[bug 11]
https://bugzilla.wikimedia.org/show_bug.cgi?id=11
[bug 10237]
https://bugzilla.wikimedia.org/show_bug.cgi?id=10237
[bug 39199]
https://bugzilla.wikimedia.org/show_bug.cgi?id=39199
--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]