Hi all,
Thanks to Daniel (F.) for structuring the discussion, which is
currently ongoing here:
<https://www.mediawiki.org/wiki/Requests_for_comment/New_sites_system>
I hope that the requirements and use cases section is complete. If
not, please tune in now. We will build on the use cases and their
discussion there.
I also created a first draft of a schema, which was very quickly and
completely ripped apart and replaced by a much better one on the
discussion page. There are also other discussions going on there.
Please tune in if you are interested in the Sites table, in order to
achieve consensus on the topic.
<https://www.mediawiki.org/wiki/Talk:Requests_for_comment/New_sites_system#Database_schema_proposal_18334>
Furthermore, I want to address the unanswered questions Rob raised:
* Re Tim's July 18th comment and Rob's following comment: where is the
calling code?
The code calling the sites table is in the Wikibase library, basically
all the files starting with Site*:
<https://gerrit.wikimedia.org/r/gitweb?p=mediawiki/extensions/Wikibase.git;a=tree;f=lib/includes;h=7debe083be74ad42028f37e17e26ce9a419bf7ab;hb=HEAD>
But since they are part of the patchset, you have probably seen them
already. The Sites info is being used in:
* most importantly Wikibase/lib/includes/SiteLink.php, where the site
link (e.g. the link from a Wikidata item to a Wikipedia article) is
defined using the Sites data. The site links are the most prominent
objects depending on the data and are used basically everywhere on the
repository. Wikibase/repo/includes/api/ApiSetSiteLink.php offers a
good example of that.
* some utils in Wikibase/lib/includes/Utils.php
* further, a few places on the client, like LangLinkHandler and the hooks
* Questions by Bawolff that I omitted from my earlier answer (because
I was focusing on other stuff):
> First and foremost, I'm a little confused as to what the actual use
> cases here are. Could we get a short summary, for those who aren't
> entirely following how Wikidata will work, of why the current interwiki
> situation is insufficient?
Most of all, we need global identifiers for the different wikis. We
could add a table which only contains a mapping of the local prefixes
to global identifiers, but we think that the current interwiki table
could use some love anyway, and thus we decided to restructure it as a
whole. This has now led to the above-mentioned RFC, but the original
blocker is: for providing language links from a central source --
Wikidata -- we need to have global wiki identifiers.
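To illustrate the blocker (with made-up identifiers and structures, not the RFC schema): each wiki has its own local prefixes, so the central repository cannot address target wikis with them and needs one global identifier per site instead. A minimal sketch:

```python
# Hypothetical sketch, not the RFC schema. Local interwiki prefixes differ
# per wiki, so centrally stored language links must use global identifiers.

# Each wiki's local prefix table (made-up data) ...
LOCAL_PREFIXES = {
    "dewiki": {"en": "enwiki", "w": "enwiki"},
    "enwiki": {"de": "dewiki"},
}

# ... while the central repository stores links against global IDs only.
CENTRAL_LANGLINKS = {
    "Q64": {"enwiki": "Berlin", "dewiki": "Berlin"},  # made-up item data
}

def resolve(local_wiki: str, prefix: str) -> str:
    """Translate a wiki-local prefix into the global site identifier."""
    return LOCAL_PREFIXES[local_wiki][prefix]

def langlinks_for(item: str) -> dict:
    """Fetch the centrally stored language links for an item."""
    return CENTRAL_LANGLINKS[item]
```

Note that "en" and "w" on dewiki resolve to the same global id, which is exactly the many-to-one situation a bare prefix cannot express centrally.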
>> * Site definitions can exist that are not used as "interlanguage link" and
>> not used as "interwiki link"

> And if we put one of those on a talk page, what would happen? Or if
> foo was one such link, doing [[:foo:some page]]? (Current behaviour is
> that it becomes an interwiki.)
I probably misunderstand. If something is currently set up neither as
an interlanguage link nor as an interwiki link, it will become a
normal link, not an interwiki link (i.e. it will point to the local
page foo:some page in the main namespace). Did you mean something
else?
> Although to be fair, I do see how the current way we distinguish
> between interwiki and interlang links is a bit hacky.
Agreed, the way it is currently done in core is a bit hacky.
>> And in fact we are making this more flexible by having the type system. The
>> MediaWiki site type could for instance be able to form both "nice" urls and
>> index.php ones. Or a gerrit type could have the logic to distinguish
>> between the gerrit commit number and a sha1 hash.
> I must admit I do like this idea. In particular the current
> situation, where we treat the value of an interwiki link as a title
> (aka spaces -> underscores etc.) even for sites that do not use such
> conventions, has always bothered me. Having interwikis that support
> url re-writing based on the value does sound cool, but I certainly
> wouldn't want said code in a db blob (and just using an integer
> site_type identifier is quite far away from giving us that, but it's
> still a step in a positive direction), which raises the question of
> where such rewriting code would go.
In a handler class for each type of site, which would construct links
to that type of site based on the data about the site.
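As a rough sketch of that idea (the class and field names here are made up for illustration and do not reflect the actual Wikibase code):

```python
# Hypothetical sketch of per-type site handlers. Names and URL schemes
# are assumptions, not the actual Wikibase classes.

class SiteHandler:
    """Base handler: turns a page identifier into a URL for its site."""
    def __init__(self, base_url: str):
        self.base_url = base_url

    def page_url(self, page: str) -> str:
        return self.base_url + page

class MediaWikiSiteHandler(SiteHandler):
    """A MediaWiki type applies title conventions (spaces -> underscores)
    and can form both "nice" URLs and index.php ones."""
    def __init__(self, base_url: str, script_url: str):
        super().__init__(base_url)
        self.script_url = script_url

    def page_url(self, page: str, nice: bool = True) -> str:
        title = page.replace(" ", "_")
        if nice:
            return self.base_url + title
        return self.script_url + "?title=" + title

class GerritSiteHandler(SiteHandler):
    """A gerrit type could distinguish a change number from a sha1 hash."""
    def page_url(self, value: str) -> str:
        if value.isdigit():                       # looks like a change number
            return self.base_url + "#/c/" + value
        return self.base_url + "#/q/" + value     # treat as sha1 / query
```

The point of the sketch is that the rewriting logic lives in code selected by the site's type, not in a db blob; the database only needs to store which type a site has plus its base data.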
>> The issue I was trying to deal with was storage. Currently we 100% assume
>> that the interwiki list is a table and there will only ever be one of them.
> Do we really assume that? Certainly that's the default config, but I
> don't think that is the config used on WMF. As far as I'm aware,
> Wikimedia uses a cdb database file (via $wgInterwikiCache), which
> contains all the interwikis for all sites. From what I understand, it
> supports doing various "scope" levels of interwikis, including per db,
> per site (Wikipedia, Wiktionary, etc.), or global interwikis that act
> on all sites.
We did not know about that database. Who can tell us more about it?
That would be very interesting for optimizing our synching code.
It still wouldn't help us with the global identifiers, though, but it
would be good to know more about it.
Cheers,
Denny
--
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 |
http://wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
2012/8/14 Daniel Friesen <lists(a)nadir-seen-fire.com>:
On Tue, 14 Aug 2012 07:32:07 -0700, Jeroen De Dauw
<jeroendedauw(a)gmail.com> wrote:
Hey,
You mention using a global id to refer to sites for making links, and
synchronization of the sites table.
So you're saying that this part of Wikidata only works within
Wikimedia projects, right?
Does Wikidata overall only function within Wikimedia projects? Or is
there a different mechanism to deal with clients from external wikis?
The software we're writing is completely Wikimedia-agnostic, and the
actual Wikidata project will obviously be usable outside of Wikimedia
projects. We will allow for links to non-Wikimedia sites (although we
have not agreed on how open this will be), and for non-Wikimedia sites
to access all data stored within Wikidata (including our "equivalent
links" using the sites table). Does that answer your question, or am I
missing something?
Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil.
--
Ok, so the data is available to 3rd party wikis.
I was asking how you planned to handle sites in 3rd party wikis.
Do you have a separate mechanism to handle links from 3rd party clients? Or
are they supposed to sync their sites from Wikimedia's Wikidata?
--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l