~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
On 12-08-09 12:00 PM, Jeroen De Dauw wrote:
Hey,
Daniel, thanks for your input.
TL;DR at the bottom :)
The issue I was trying to deal with was storage. Currently we 100% assume that the interwiki list is a table and there will only ever be one of them.
Yes, we are not changing this. Having a more flexible system might or
might not be something we'd want in MediaWiki. We do not need it in
Wikidata though. The changes we're making here do not seem to affect
this issue at all, so you can just as well implement it later on.
In practice we don't want one interwiki map. In projects like Wikimedia we actually usually want two or three.
..
And sometimes we also want a wiki-local interwiki list, because some communities want to add links to sites that other wikis don't.
This we are actually tackling, although in a different fashion than you propose. Rather than having many different lists of sites to maintain, we have split sites from their configuration. The list of sites is global and shared by all clients. Their configuration, however, is local. So if wiki A wants to use site X as an interwiki link with prefix foobar, wiki B wants to use it with prefix baz, and wiki C does not want to use it as an interwiki link at all, this is perfectly possible.
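The global-list/local-configuration split described above could be sketched as two tables. All table and column names here are illustrative, not the actual Wikidata or MediaWiki schema:

```python
import sqlite3

# Hypothetical schema sketch of "sites are global, configuration is local".
db = sqlite3.connect(":memory:")
db.executescript("""
-- Global, shared list of sites (identical on every client wiki).
CREATE TABLE sites (
    site_id         INTEGER PRIMARY KEY,
    site_global_key TEXT NOT NULL UNIQUE    -- e.g. 'site-x'
);

-- Per-wiki configuration: each client picks its own prefix,
-- or opts out of interwiki linking entirely by having no row.
CREATE TABLE site_config (
    sc_wiki   TEXT NOT NULL,                -- the local wiki making the choice
    sc_site   INTEGER NOT NULL REFERENCES sites(site_id),
    sc_prefix TEXT NOT NULL,                -- local interwiki prefix
    PRIMARY KEY (sc_wiki, sc_site)
);
""")

db.execute("INSERT INTO sites VALUES (1, 'site-x')")
# Wiki A uses prefix 'foobar', wiki B uses 'baz', wiki C has no row at all.
db.executemany("INSERT INTO site_config VALUES (?, 1, ?)",
               [("wiki-a", "foobar"), ("wiki-b", "baz")])

def prefix_for(wiki, site_id):
    """Return the local interwiki prefix, or None if the wiki opted out."""
    row = db.execute(
        "SELECT sc_prefix FROM site_config WHERE sc_wiki = ? AND sc_site = ?",
        (wiki, site_id)).fetchone()
    return row[0] if row else None

print(prefix_for("wiki-a", 1))  # foobar
print(prefix_for("wiki-c", 1))  # None: site exists globally, unused locally
```

The key point the sketch shows: deleting or changing a row in site_config only affects one wiki, while the sites row stays shared.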
This split, and the associated generalization our changes bring, adds a lot of flexibility compared to the current system and removes bad assumptions currently baked in.
I think we're going to need to have some of this and the synchronization stuff in core.
Right now the code has nothing but the one sites table. There is no repo code, so presumably the only implementation of that for a while will be Wikidata. And if parts of this table are supposed to be editable in some cases (where there is no repo) but non-editable in others, then I don't see any way for an edit UI to tell the difference.
I'm also not sure how this synchronization, which sounds like it's one-way, will play with individual wikis wanting to add new interwiki links.
Also, anything in this area really needs to account for our lack of a user interface. If we rewrite this then we absolutely must include a UI to view and edit this in core.
Again, this is not something we're touching at all, or want to touch, as we don't need it. Personally I think it'd be great to have such facilities, and it makes sense to add these after the backend has been fixed. I'd be happy to work with you on this (or leave it entirely up to you) once we've got the relevant rewrite work done.
By rewriting it we ditch every hack trying to make it easy to control the interwiki list, and only make the problem worse.
Our change will not drop any existing functionality. I will make sure there are tools/facilities at least as good as (and probably better than) the current ones.
I'm talking about things like the interwiki extensions and scripts that turn wiki tables into interwiki lists. All these things are written against the interwiki table. So by rewriting and using a new table we implicitly break all the working tricks and throw the user back into SQL.
I would like to understand what Wikidata needs out of interwiki/sites and what it's going to do with the data.
We need this for our "equivalent links", which consist of a global site id and a page. Right now we do not have consistent global ids; in fact we don't have global ids at all. We just have local ids that happen to be similar everywhere (one might not want this, but is pretty much forced into it right now), which must be language codes in order to be "languagelinks" or (better named) "equivalent links". Also, right now, all languagelinks are interwikilinks (wtf) - we want to be able to have "equivalent links" without them also being interwiki links!
I like the idea of table entries without actual interwikis. The idea of some interface listing user-selectable sites came to mind, and perhaps sites being added trivially, even automatically.
Though if you plan to support this, I think you'll need to drop the NOT NULL constraint from site_local_key.
Actually, another thought makes me think the schema should be a little different. site_local_key probably shouldn't be a column; it should probably be another table. Something like site_local_key (slc_key, slc_site), which would map things like en:, Wikipedia:, etc. to a specific site.
I can see wikis wanting to use multiple interwiki names for the same
site. In fact I'm pretty sure this already happens with the existing
interwiki table. We just create duplicate rows.
But you want global ids so I really don't think you want data
duplication like that to happen.
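The separate-key-table idea above could be sketched like this, with hypothetical names: several local prefixes point at the same site row, so no site data is duplicated:

```python
import sqlite3

# Sketch of site_local_key as its own table rather than a column.
# Table/column names are illustrative, not an actual schema.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE sites (
    site_id         INTEGER PRIMARY KEY,
    site_global_key TEXT NOT NULL UNIQUE   -- one canonical global id per site
);
CREATE TABLE site_local_key (
    slc_key  TEXT PRIMARY KEY,             -- local interwiki prefix
    slc_site INTEGER NOT NULL REFERENCES sites(site_id)
);
""")

db.execute("INSERT INTO sites VALUES (1, 'enwiki')")
# Two prefixes, 'en' and 'wikipedia', both resolve to the one site row.
db.executemany("INSERT INTO site_local_key VALUES (?, 1)",
               [("en",), ("wikipedia",)])

rows = db.execute("""
    SELECT slc_key, site_global_key
    FROM site_local_key JOIN sites ON slc_site = site_id
    ORDER BY slc_key
""").fetchall()
print(rows)  # [('en', 'enwiki'), ('wikipedia', 'enwiki')]
```

Contrast this with the current interwiki table, where the same situation requires duplicate rows that can drift apart.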
I'd also like to know if Wikidata plans to add any interface that will add/remove sites.
The backend will have an interface to do this, but we're not planning
on any API modules or UIs. The backend will be written keeping in mind
people will want those though, so it ought to be easy to add them
later on.
So to wrap up: I don't think there is any conflict between what we want to do (if you disagree, please provide some pointers). You can make your changes later on, and will have a much more solid base to work on than you do now.
I think I need to understand the plans you have for synchronization a bit more.
- Where does Wikidata get the sites?
- What synchronizes the data?
- What is the repo like? Also, what is it based off of? Is this wikis syncing from another wiki's sites table, or does Wikidata have a real set of data the sites table gets based off of?
- Is this one-way synchronization or multi-way?
Synchronization, treatment of the table (whether it's an index of something else or first-class data), and editing/UIs for editing are a set of things where, if you don't think of them all up front, any one can get in the way of the ability to do the others later.
Our old interwiki table was treated as first-class data and was simple data that was easy to create an edit interface for. As a result it's hard to do any synchronization, since we didn't plan for it.
Likewise, if we design a sites table focused on synchronizing data while treating the table as simultaneously first-class data with some of it treated like an index, we can easily come up with something that is going to get in the way of the consistency needed for a UI.
One of our options might be to treat sites like an index of data built from other sources, just like pagelinks. Wikidata can act as a repo, the sites code can build from multiple sources with Wikidata being the first, and when a UI comes into play the UI can create its own list of sites that can be used as a source for the building of the sites table.
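The rebuild-from-sources idea could look something like this. The source names and record shape are invented for illustration; the point is only the merge order:

```python
# Sketch: the sites table as a rebuildable index, like pagelinks.
# Each source contributes site records; later sources override earlier
# ones, so a local UI-managed list can shadow entries from the repo.

def rebuild_sites(sources):
    """Merge site records from ordered source callables into one dict,
    keyed by global site key. Later sources win on conflicts."""
    sites = {}
    for source in sources:
        for record in source():
            sites[record["global_key"]] = record
    return sites

def wikidata_repo():
    # Hypothetical first source: sites synced from the Wikidata repo.
    return [{"global_key": "enwiki", "url": "https://en.wikipedia.org"}]

def local_ui_list():
    # Hypothetical later source: a community-added site, plus a local
    # override of the repo's entry for enwiki.
    return [{"global_key": "fanwiki", "url": "https://fan.example.org"},
            {"global_key": "enwiki",  "url": "https://en.wikipedia.org/wiki/$1"}]

sites = rebuild_sites([wikidata_repo, local_ui_list])
print(sorted(sites))           # ['enwiki', 'fanwiki']
print(sites["enwiki"]["url"])  # https://en.wikipedia.org/wiki/$1
```

Because the table is derived, it can be dropped and rebuilt at any time without losing first-class data, which is exactly the property pagelinks has.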
----
Heh, it probably doesn't help that this is bringing my abstract revision idea back up and making me want to have the UI depend on that.
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil.
--
Btw, if you really want to make this an abstract list of sites, dropping site_url and the other two related columns might be an idea.
At first glance the url looks like something standard that every site would have. But once you throw something like MediaWiki into the mix, with short urls, long urls, and an API, the url really becomes type-specific data that should probably go in the blob. Especially when you start thinking about other custom types.
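A minimal sketch of that postscript, with invented field names: the site row carries only a type and a serialized blob, and the blob's shape depends on the type:

```python
import json

# Hypothetical "type-specific data in a blob" layout: instead of fixed
# site_url columns, each site type decides what URL-ish data it stores.

def make_site(global_key, site_type, type_data):
    """Build a site record; type_data becomes the serialized blob column."""
    return {"global_key": global_key,
            "type": site_type,
            "data": json.dumps(type_data)}

# A MediaWiki-type site needs several paths; another type might need none.
mw = make_site("enwiki", "mediawiki", {
    "page_path":   "https://en.wikipedia.org/wiki/$1",
    "script_path": "https://en.wikipedia.org/w/index.php?title=$1",
    "api_path":    "https://en.wikipedia.org/w/api.php",
})

def page_url(site, title):
    """Resolve a title against the type-specific page path in the blob."""
    data = json.loads(site["data"])
    return data["page_path"].replace("$1", title)

print(page_url(mw, "Interwiki"))  # https://en.wikipedia.org/wiki/Interwiki
```

With this shape, adding a custom site type means defining a new blob layout, not altering the table's columns.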