Hey,
You bring up some good points.
> I think we're going to need to have some of this and the synchronization
> stuff in core.
> Right now the code has nothing but the one sites table. No repo code, so
> presumably the only implementation of that for a while will be Wikidata.
> And if parts of this table are supposed to be editable in some cases
> where there is no repo, but non-editable in others, then I don't see any
> way for an edit UI to tell the difference.
We indeed need some configuration setting(s) for wikis to distinguish
between the two cases. That seems to be all the "synchronization code" we'll
need in core. It might or might not be useful to have more logic in core,
or in some dedicated extension. Personally I think having the actual
synchronization code in a separate extension would be nice, as a lot of it
won't be Wikidata-specific. This is however not a requirement for Wikidata,
so the current plan is to just have it in the extension, always keeping in
mind that it should be easy to split it off later on. I'd love to discuss
this point further, but it should be clear this is not much of a blocker
for the current code, as it seems unlikely to affect it much, if at all.
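To make that concrete, here is a rough Python sketch of the kind of check I mean. All names here (WikiConfig, site_data_is_editable, is_repo) are made up for illustration; the actual MediaWiki configuration is PHP and will look different:

```python
# Hypothetical sketch: a wiki decides whether its site data is locally
# editable based on whether it acts as a repository. Names are
# illustrative only, not the real MediaWiki config.

class WikiConfig:
    def __init__(self, is_repo: bool):
        self.is_repo = is_repo

def site_data_is_editable(config: WikiConfig) -> bool:
    # Clients receive site data from the repo and must not edit it
    # locally; only the repository may modify the canonical copy.
    return config.is_repo

repo = WikiConfig(is_repo=True)
client = WikiConfig(is_repo=False)
print(site_data_is_editable(repo))    # True
print(site_data_is_editable(client))  # False
```

An edit UI could then consult a single flag like this instead of guessing from the presence or absence of repo code.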
On that note, consider that we're initially creating the new system in
parallel with the old one, which enables us to just try out changes and
alter them later on if it turns out there is a better way to do them. Then
once we're confident the new system is what we want to stick with, and know
it works because of its usage by Wikidata, we can replace the current code
with the new system. This ought to allow us to work a lot faster by not
blocking on discussions and details for too long.
> I'm also not sure how this synchronization, which sounds like one-way,
> will play with individual wikis wanting to add new interwiki links.
For our case we only need it to work one way, from the Wikidata repo to
its clients. More discussion would need to happen to decide on an
alternative approach. I already indicated I think this is not a blocker for
the current set of changes, so I'd prefer this to happen after the current
code has been merged.
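The one-way direction is easy to state as a sketch. This is just a toy model of the flow, not actual sync code (dicts standing in for the sites tables):

```python
# Illustrative one-way push: the repo holds the canonical sites list
# and overwrites each client's copy. There is no client-to-repo
# direction, and local configuration is untouched.

def push_sites(repo_sites, clients):
    for client in clients:
        # Replace the client's sites wholesale with a copy of the
        # repo's canonical data.
        client["sites"] = dict(repo_sites)

repo_sites = {"enwiki": {"url": "https://en.wikipedia.org"}}
clients = [{"sites": {}}, {"sites": {"stale_entry": {}}}]
push_sites(repo_sites, clients)
# Both clients now hold the repo's copy; the stale entry is gone.
```

Anything a client wants to do differently (including hypothetical locally-added links) has to live in its local configuration, not in the synced data.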
> I'm talking about things like the interwiki extensions and scripts that
> turn wiki tables into interwiki lists. All these things are written
> against the interwiki table. So by rewriting and using a new table we
> implicitly break all the working tricks and throw the user back into SQL.
I am aware of this. As noted already, the current new code does not yet
replace the old code, so this is not a blocker yet, but it will be for
replacing the old code with the new system. Having looked at the existing
code using the old system, I think migration should not be too hard, since
the new system can do everything the old one can do, and there is not that
much code using it. The new system also has clear interfaces, preventing
scripts from needing to know about the database table at all. That ought to
facilitate the "do not depend on a single db table" goal a lot, obviously :)
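As a sketch of what I mean by "clear interfaces" (the class and method names below are invented for illustration; the real code is PHP and the interface names differ):

```python
# Callers program against a lookup abstraction, never the backing
# table. Names are illustrative, not the actual MediaWiki API.

from abc import ABC, abstractmethod
from typing import Optional

class SiteLookup(ABC):
    @abstractmethod
    def get_site(self, global_key: str) -> Optional[dict]:
        ...

class DatabaseSiteLookup(SiteLookup):
    # Stands in for an implementation backed by the sites table.
    def __init__(self, rows: dict):
        self._rows = rows

    def get_site(self, global_key: str) -> Optional[dict]:
        return self._rows.get(global_key)

def make_interwiki_list(lookup: SiteLookup, keys: list) -> list:
    # A script like the interwiki-list generators only ever sees the
    # interface, so the storage can change underneath it.
    found = []
    for key in keys:
        site = lookup.get_site(key)
        if site is not None:
            found.append(site)
    return found

lookup = DatabaseSiteLookup({"enwiki": {"url": "https://en.wikipedia.org"}})
print(make_interwiki_list(lookup, ["enwiki", "missing"]))
```

Scripts written against such an interface keep working even if the table layout changes again later.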
> I like the idea of table entries without actual interwikis. The idea of
> some interface listing user-selectable sites came to mind, and perhaps
> sites being added trivially, even automatically.
> Though if you plan to support this I think you'll need to drop the NOT
> NULL from site_local_key.
I don't think the field needs to allow for null - right now the local keys
on the repo will by default be the same as the global keys, so none of them
will be null. On your client wiki you will then have these values by
default as well. If you don't want a particular site to be usable as
"languagelink" or "interwikilink", then simply set this in your local
configuration. No need to set the local id to null. Depending on how we
actually end up handling the defaulting process, having null might or
might not turn out to be useful. This is a detail though, so I'd suggest
sticking with NOT NULL for now, and then, if it turns out it'd be more
convenient to allow null when writing the sync code, just change it then.
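The defaulting idea in miniature (again just an illustrative Python sketch, with invented names; "languagelink" is from the discussion above):

```python
# Local keys default to the global keys, so site_local_key is never
# null. Disabling a site is local *configuration*, not a NULL key.

def default_local_keys(global_keys):
    # Every site gets a non-null local key identical to its global key.
    return {key: key for key in global_keys}

local_keys = default_local_keys(["enwiki", "dewiki"])

# Hypothetical local config: enwiki is not usable for language links,
# but its local key stays populated.
site_config = {"enwiki": {"languagelink": False}}

print(local_keys["enwiki"])  # enwiki
```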
> Actually, another thought makes me think the schema should be a little
> different.
> site_local_key probably shouldn't be a column, it should probably be
> another table.
> Something like site_local_key (slc_key, slc_site) which would map things
> like en:, Wikipedia:, etc... to a specific site.
Denny and I discussed this at some length, now already more than a month
ago (man, this is taking long...). Our conclusion was that we do not need
it and would not benefit from it much in Wikidata. In fact, it'd introduce
additional complexity, which is a good argument for not including it in our
already huge project. I do agree that conceptually it's nicer to not
duplicate such info, but if you consider the extra complexity you'd need to
get rid of it, and the little gain you'd have (removal of some minor
duplication which we've had since forever and is not bothering anyone), I'm
sceptical we ought to go with this approach, even outside of Wikidata.
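For anyone following along, the two shapes under discussion, as plain data (dicts standing in for tables; the column names are from your proposal, the values are made-up examples):

```python
# Current schema: one site_local_key column per site row.
sites_with_column = {
    "enwiki": {"site_local_key": "en",
               "site_url": "https://en.wikipedia.org"},
}

# Proposed separate site_local_key table: (slc_key -> slc_site) pairs,
# which would allow several local prefixes per site.
site_local_key_table = {
    "en": "enwiki",
    "wikipedia": "enwiki",  # two prefixes pointing at one site
}

print(site_local_key_table["wikipedia"])  # enwiki
```

The second shape is more normalized, but every lookup and every sync step then has to join through the extra table, which is the complexity I'm referring to.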
> I think I need to understand the plans you have for synchronization a
> bit more.
> - Where does Wikidata get the sites?
The repository wiki holds the canonical copy of the sites, which gets sent
to all clients. Modification of the site data can only happen on the
repository. All wikis (repo and clients) have their own local config, so
they can choose to enable all sites for all functionality, completely hide
them, or anything in between.
> - What synchronizes the data?
The repo. As already mentioned, it might be nicer to split this off into
its own extension at some point. But before we get to that, we first need
to have the current changes merged.
> Btw if you really want to make this an abstract list of sites, dropping
> site_url and the other two related columns might be an idea.
> At first glance the url looks like something standard that every site
> would have. But once you throw something like MediaWiki into the mix with
> short urls, long urls, and an API, the url really becomes type-specific
> data that should probably go in the blob. Especially when you start
> thinking about other custom types.
The patch sitting on Gerrit already includes this. (Did you really look at
it already? The fields are documented quite well, I'd think.) Every site
has a url (that's not specific to the type of site), but we have a type
system with currently the default (general) site type and a MediaWikiSite
type. The type system works with two blob fields, one for type-specific
data and one for type-specific configuration.
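Roughly, the type system looks like this (a Python model of the idea only; the real code is PHP, and the blob field contents like "page_path" are invented here for illustration):

```python
# Every site, regardless of type, has a url. Type-specific behaviour
# reads from two blobs: one for data, one for configuration.

class Site:
    type_name = "general"

    def __init__(self, url, type_data=None, type_config=None):
        self.url = url
        self.type_data = type_data or {}      # type-specific data blob
        self.type_config = type_config or {}  # type-specific config blob

class MediaWikiSite(Site):
    type_name = "mediawiki"

    def page_url(self, title):
        # Assume (illustratively) that the data blob holds an
        # article-path pattern with a $1 placeholder.
        path = self.type_data.get("page_path", "/wiki/$1")
        return self.url + path.replace("$1", title)

site = MediaWikiSite("https://en.wikipedia.org",
                     type_data={"page_path": "/wiki/$1"})
print(site.page_url("Berlin"))  # https://en.wikipedia.org/wiki/Berlin
```

So the url stays a first-class field shared by all types, while the short/long/API url variations you mention live in the type-specific blobs, which is exactly the split the patch makes.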
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil.