Hi all,
here is our update on last week's blocker email. The list got considerably shorter, but some long-standing issues are still there. No new blockers came up.
== Ongoing ==
* Merging the Wikidata branch (ContentHandler) is still open, see https://bugzilla.wikimedia.org/show_bug.cgi?id=38622. There has been no feedback in the last few weeks. Daniel is waiting for input.
* Changeset https://gerrit.wikimedia.org/r/#/c/14295/, bug https://bugzilla.wikimedia.org/show_bug.cgi?id=38705 about handling sites. The idea is to migrate from the "interwiki" table to the new "Sites" facility. RobLa mentioned two weeks ago that Chad seems to be working in a similar direction, but we haven't seen comments yet. Here as well, there is no ongoing discussion and no substantial feedback has been received, so it seems somewhat stuck.
== New in the list ==
Nothing.
== Merges ==
* https://gerrit.wikimedia.org/r/#/c/14301/ (got merged. Yay!)
== Abandoned changesets or not-blocking anymore ==
* https://gerrit.wikimedia.org/r/#/c/14084/ (abandoned)
* https://gerrit.wikimedia.org/r/#/c/8924/ (not blocking anymore but could use some reviewing love)
* https://gerrit.wikimedia.org/r/#/c/14303/ (review in progress; no longer blocking if we drop the STTL extension in favour of the ULS extension, which we are currently investigating)
* https://gerrit.wikimedia.org/r/#/c/17073/ (a change to the skin, which we abandoned; we are resolving it differently)
I hope this helps, Cheers, Denny
Hi Denny,
Thanks for the update. Comments inline:
On Thu, Aug 9, 2012 at 6:54 AM, Denny Vrandečić denny.vrandecic@wikimedia.de wrote:
- Merging the Wikidata branch (ContentHandler) is still open, see
https://bugzilla.wikimedia.org/show_bug.cgi?id=38622. There has been no feedback in the last few weeks. Daniel is waiting for input.
Per discussion on the bug, there's an unresolved issue with the code as stored in Gerrit. Tim tried cloning this, and wasn't able to find some of the revisions that Daniel referred to. Last week, you mentioned that Daniel was going to send mail to the list about the Gerrit stuff, but I don't recall seeing that.
If you can get the code somewhere Tim can review it, he's ready to look at it.
- Changeset https://gerrit.wikimedia.org/r/#/c/14295/, bug
https://bugzilla.wikimedia.org/show_bug.cgi?id=38705 about handling sites. The idea is to migrate from the "interwiki" table to the new "Sites" facility. RobLa mentioned two weeks ago that Chad seems to be working in a similar direction, but we haven't seen comments yet. Here as well, there is no ongoing discussion and no substantial feedback has been received, so it seems somewhat stuck.
I'd strongly suggest starting a separate thread on this mailing list about this (please change the subject line if you reply to this message). In short, this is a controversial approach, and it is unclear why you're letting it block your work.
It looks like this page needs an update as well: http://www.mediawiki.org/wiki/Wikidata_deployment
One thing that was tacked on the wiki page without mention here or a bug created was the "Stick to that language" extension. Is that a hard requirement, or nice to have?
Rob
Hey,
... is unclear why you're letting it block your work.
The current "interwiki" related code in core has many assumptions baked in that prevent us from doing what we need to do in phase 1. For instance, "language links" can only be made to sites with an id that is a language code. Since we're properly identifying sites across our clients, we're using global identifiers, which will be "enwiki" rather than "en", so this cannot work. That's only one of the many evil things in the current code.
this is a controversial approach
How so?
Is anyone suggesting building on top of the pile of crap we currently have would be better?
Cheers
-- Jeroen De Dauw http://www.bn2vs.com Don't panic. Don't be evil. --
Hi Rob,
thanks for the answers.
2012/8/9 Rob Lanphier robla@wikimedia.org:
It looks like this page needs an update as well: http://www.mediawiki.org/wiki/Wikidata_deployment
Thanks, I updated the page.
One thing that was tacked on the wiki page without mention here or a bug created was the "Stick to that language" extension. Is that a hard requirement, or nice to have?
We are currently investigating using the "Universal Language Selector" instead of "Stick to that language", and at first glance it looks good. If that holds up, we will drop "Stick to that language". That is why I didn't list the corresponding open issues there. We'd be happy to go for ULS instead. We expect to have a resolution on that next week.
Cheers, Denny
Denny,
We are currently investigating using the "Universal Language Selector" instead of "Stick to that language", and at first glance it looks good. If that holds up, we will drop "Stick to that language". That is why I didn't list the corresponding open issues there. We'd be happy to go for ULS instead. We expect to have a resolution on that next week.
Look forward to discussing ULS in more detail.
Best, Alolita
On Thu, Aug 9, 2012 at 8:48 AM, Denny Vrandečić denny.vrandecic@wikimedia.de wrote:
[...]
-- Project director Wikidata Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin Tel. +49-30-219 158 26-0 | http://wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
On 09.08.2012 16:49, Rob Lanphier wrote:
On Thu, Aug 9, 2012 at 6:54 AM, Denny Vrandečić denny.vrandecic@wikimedia.de wrote:
- Merging the Wikidata branch (ContentHandler) is still open, see
https://bugzilla.wikimedia.org/show_bug.cgi?id=38622. There has been no feedback in the last few weeks. Daniel is waiting for input.
Per discussion on the bug, there's an unresolved issue with the code as stored in Gerrit. Tim tried cloning this, and wasn't able to find some of the revisions that Daniel referred to.
That would be strange: a fresh clone works fine for me, and the revisions are in the log.
Tim, please confirm that you are unable to see the changes I mentioned when you just switch to the Wikidata branch on an up to date working copy of core, ignoring Gerrit.
Also, dennyb added direct links to the respective commits on gitweb. They are there. Gerrit just doesn't know about it. And the shortlogs on gitweb are strange.
Last week, you mentioned that Daniel was going to send mail to the list about the Gerrit stuff, but I don't recall seeing that.
I investigated the problem and reported my findings on bugzilla. There isn't much to say except "gerrit doesn't know about direct pushes" and "gitweb is confusing".
If you can get the code somewhere Tim can review it, he's ready to look at it.
Well, it's in the git repo. Everyone in the team is using that branch for development and testing; they'd notice if important changes were missing. So I'm confident that it really *is* there.
-- daniel
On Thu, 09 Aug 2012 06:54:03 -0700, Denny Vrandečić denny.vrandecic@wikimedia.de wrote:
Hi all,
[...]
- Changeset https://gerrit.wikimedia.org/r/#/c/14295/, bug
https://bugzilla.wikimedia.org/show_bug.cgi?id=38705 about handling sites. The idea is to migrate from the "interwiki" table to the new "Sites" facility. RobLa mentioned two weeks ago that Chad seems to be working in a similar direction, but we haven't seen comments yet. Here as well, there is no ongoing discussion and no substantial feedback has been received, so it seems somewhat stuck. [...]
I hope this helps, Cheers, Denny
I would like some more information on this. The bug doesn't appear to even have the correct link for a discussion on this.
Redoing our interwiki code to deal with some mistakes we made in storage was something I was hoping to do. So if this is something hoping to replace the interwiki system I'd like to look over what the plan and overall idea is with this to make sure we don't repeat the same mistakes.
Hey,
So if this is something hoping to replace the interwiki system I'd like
to look over what the plan and overall idea is with this to make sure we don't repeat the same mistakes.
Please have a look at the patch on gerrit then. Feedback is much appreciated :) https://gerrit.wikimedia.org/r/#/c/14295/
Cheers
-- Jeroen De Dauw http://www.bn2vs.com Don't panic. Don't be evil. --
On Thu, 09 Aug 2012 09:12:16 -0700, Jeroen De Dauw jeroendedauw@gmail.com wrote:
[...]
Looking over the code, it does seem we're repeating the same issues that exist with the current interwiki system, which I was planning to eliminate when I moved includes/Interwiki.php to includes/interwiki/Interwiki.php and put this on my endless to-do list.
The issue I was trying to deal with was storage. Currently we 100% assume that the interwiki list is a table and there will only ever be one of them. But this runs counter to several facts about interwikis in practice:
- We have a default set of interwiki links. Because we use a database instead of flat files, we end up inserting them on installation. As a result, when something changes (e.g. Wikimedia now supports https:// and all links are supposed to be protocol-relative), hundreds of wikis keep using outdated interwiki rules even after they upgrade MediaWiki, because interwiki links are only inserted by the software on installation; they are not taken directly from the software's map.
- In practice we don't want one interwiki map. In projects like Wikimedia we actually usually want two or three. We want a global shared list of interwikis so that [[Wikipedia:]], [[commons:]], etc. work on every project. We want a shared list of interwikis for each project (i.e. Wikipedias, Wiktionaries, etc.), primarily so that [[en:]], [[es:]], etc. language links are not duplicated, since these can't be global, but also because some interwiki links may apply to some projects but not others. And sometimes we also want a wiki-local interwiki list because some communities want to add links to sites that other wikis don't. Or we may want to localize a link.
And we end up writing absolutely horrible hacks we shouldn't have to, because the implementation is ignorant of reality.
I had planned to do a few primary things to the system:
- Drop the notion of the interwiki list simply being a database table. Multiple class implementations were going to make it possible to have database-backed interwiki lists, file-backed interwiki lists (multiple formats), etc.
- Drop the single-list handling and allow a list of multiple interwiki sources to be configured from a wg variable.
Together it would mean that our default list of interwiki links would no longer be stored in the interwiki table and would instead be read directly from our source code, where cleaning up the URLs would nicely update all wikis when they upgrade. It would also make it easy to set up multiple interwiki list sources for wikis, such as a global interwiki database, a project one, and a local one. And it would be possible to use simple text-based, file-backed interwiki lists so that people don't need to mess with SQL.
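A rough sketch of what such a chain of interwiki sources might look like, written in Python purely for illustration (MediaWiki itself is PHP). Every name here (InterwikiSource, DictSource, TextSource, resolve, SOURCES) and every URL is an assumption made up for the example; none of it comes from the existing code or from the patch under discussion.

```python
from typing import Dict, Iterable, Optional, Protocol


class InterwikiSource(Protocol):
    """Hypothetical interface: anything that can map a prefix to a URL pattern."""

    def lookup(self, prefix: str) -> Optional[str]: ...


class DictSource:
    """In-memory source, standing in for defaults shipped with the software."""

    def __init__(self, entries: Dict[str, str]) -> None:
        # Prefixes are compared case-insensitively in this sketch.
        self._entries = {key.lower(): url for key, url in entries.items()}

    def lookup(self, prefix: str) -> Optional[str]:
        return self._entries.get(prefix.lower())


class TextSource(DictSource):
    """Source parsed from simple 'prefix url' lines, standing in for a flat file."""

    def __init__(self, text: str) -> None:
        entries: Dict[str, str] = {}
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            prefix, url = line.split(None, 1)
            entries[prefix] = url
        super().__init__(entries)


def resolve(prefix: str, sources: Iterable[InterwikiSource]) -> Optional[str]:
    """Return the first match, so sources listed earlier take precedence."""
    for source in sources:
        url = source.lookup(prefix)
        if url is not None:
            return url
    return None


# Assumed ordering: wiki-local list first, then the per-project list,
# then the global defaults read from the software itself.
local = TextSource("# local additions\nfanwiki //fan.example.org/wiki/$1\n")
project = DictSource({"en": "//en.wikipedia.org/wiki/$1"})
defaults = DictSource({"commons": "//commons.wikimedia.org/wiki/$1"})
SOURCES = [local, project, defaults]
```

With this shape, fixing a URL in the shipped defaults takes effect on upgrade without touching any database row, and a wiki can add its own prefixes by editing a text file.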
----
But it looks like the new sites code is also focused around a single list of database-backed sites.
((Also, while there are a number of really interesting ideas, sorry to say it but some of the code already triggers that "Must rewrite!" mood rather than thinking of small incremental tweaks))
Also anything in this area really needs to think of our lack of user interface. If we rewrite this then we absolutely must include a UI to view and edit this in core. By rewriting it we ditch every hack trying to make it easy to control the interwiki list and only make the problem worse. The notes on synchronizing with wikidata look interesting. But this kind of thing absolutely has to be user-friendly and multi-wiki friendly at a core level, not only for wikis using wikidata.
----
I think some of this stuff is a bit large to discuss in code review or email. I'd like to do this RfC style, listing everything we need from different perspectives so we can come up with something that doesn't need to be redone yet again.
Originally I was focused around taking interwiki dependence out-of the database. But the talk of synchronization and other things in the code has me thinking of other things like a database table as a final index (like pagelinks, etc...), fetching lists, siteinfo, etc... from other sites, and other things. So I have a feeling that the best thing we come up with will probably look different than what either of us started with.
Firstly though, I probably won't be able to come up with a good idea without a good understanding of Wikidata's role in all this:
- I would like to understand what Wikidata needs out of interwiki/sites and what it's going to do with the data
- I'd also like to know if Wikidata plans to add any interface that will add/remove sites
If we do this hastily I think we may also miss a very good chance to make fixing bug 11 and bug 10237 much more sanely possible.
bug 39199 also covers a thought on linking in pages I've been thinking about.
[bug 11] https://bugzilla.wikimedia.org/show_bug.cgi?id=11
[bug 10237] https://bugzilla.wikimedia.org/show_bug.cgi?id=10237
[bug 39199] https://bugzilla.wikimedia.org/show_bug.cgi?id=39199
Hey,
Daniel, thanks for your input.
TL;DR at the bottom :)
The issue I was trying to deal with was storage. Currently we 100% assume
that the interwiki list is a table and there will only ever be one of them.
Yes, we are not changing this. Having a more flexible system might or might not be something we'd want in MediaWiki. We do not need it in Wikidata though. The changes we're making here do not seem to affect this issue at all, so you can just as well implement it later on.
In practice we don't want one interwiki map. In projects like Wikimedia
we actually usually want two or three.
.. And sometimes we also want a wiki-local interwiki list because some
communities want to add links to sites that other wikis don't.
This we are actually tackling, although in a different fashion than you propose. Rather than having many different lists of sites to maintain, we have split sites from their configuration. The list of sites is global and shared by all clients. Their configuration, however, is local. So if wiki a wants to use site x as interwikilink with prefix foobar, wiki b wants to use it with prefix baz and wiki c does not want to use it as interwikilink at all, this is perfectly possible. This split, and the associated generalization our changes bring, adds a lot of flexibility compared to the current system and removes bad assumptions currently baked in.
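To make the split concrete, here is a minimal sketch of a shared site list with per-wiki configuration, in Python for illustration only. The class and function names (Site, LocalSiteConfig, interwiki_target), the dictionaries and the URL are all hypothetical; they are not taken from the changeset.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Site:
    """One entry in the global list of sites shared by all clients."""
    global_id: str
    url: str


@dataclass(frozen=True)
class LocalSiteConfig:
    """How one particular wiki chooses to use a global site."""
    interwiki_prefix: Optional[str] = None  # None: not used as an interwiki link


# Global and shared: every wiki sees the same site under the same id.
GLOBAL_SITES: Dict[str, Site] = {
    "sitex": Site("sitex", "//x.example.org/wiki/$1"),
}

# Local: each wiki decides whether, and under which prefix, to link to it.
LOCAL_CONFIG: Dict[str, Dict[str, LocalSiteConfig]] = {
    "wiki_a": {"sitex": LocalSiteConfig("foobar")},
    "wiki_b": {"sitex": LocalSiteConfig("baz")},
    "wiki_c": {"sitex": LocalSiteConfig(None)},
}


def interwiki_target(wiki: str, prefix: str) -> Optional[Site]:
    """Resolve an interwiki prefix as configured on one specific wiki."""
    for global_id, config in LOCAL_CONFIG.get(wiki, {}).items():
        if config.interwiki_prefix == prefix:
            return GLOBAL_SITES[global_id]
    return None
```

In this sketch the prefix foobar only resolves on wiki a, baz only on wiki b, and wiki c still knows the site by its global id without exposing it as an interwiki link.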
Also anything in this area really needs to think of our lack of user
interface. If we rewrite this then we absolutely must include a UI to view and edit this in core.
Again, this is not something we're touching at all, or want to touch, as we don't need it. Personally I think it'd be great to have such facilities, and it makes sense to add these after the backend has been fixed. I'd be happy to work with you on this (or leave it entirely up to you) once we've got the relevant rewrite work done.
By rewriting it we ditch every hack trying to make it easy to control the
interwiki list and only make the problem worse.
Our change will not drop any existing functionality. I will make sure there are tools/facilities at least as good as (and probably better than) the current ones.
I would like to understand what Wikidata needs out of interwiki/sites and
what it's going to do with the data
We need this for our "equivalent links", which consist of a global site id and a page. Right now we do not have consistent global ids; in fact, we don't have global ids at all. We just have local ids that happen to be similar everywhere (one might not want this, but is pretty much forced into it right now), which must be language codes in order to be "languagelinks" or (better named) "equivalent links". Also, right now, all languagelinks are interwikilinks (wtf) - we want to be able to have "equivalent links" without them also being interwiki links!
I'd also like to know if Wikidata plans to add any interface that will
add/remove sites
The backend will have an interface to do this, but we're not planning on any API modules or UIs. The backend will be written keeping in mind people will want those though, so it ought to be easy to add them later on.
So to wrap up: I don't think there is any conflict between what we want to do (if you disagree, please provide some pointers). You can make your changes later on, and will have a much more solid base to work on than now.
Cheers
-- Jeroen De Dauw http://www.bn2vs.com Don't panic. Don't be evil. --
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]
On 12-08-09 12:00 PM, Jeroen De Dauw wrote:
Hey,
Daniel, thanks for your input.
TL;DR at the bottom :)
The issue I was trying to deal with was storage. Currently we 100%
assume that the interwiki list is a table and there will only ever be one of them.
Yes, we are not changing this. Having a more flexible system might or might not be something we'd want in MediaWiki. We do not need it in Wikidata though. The changes we're making here do not seem to affect this issue at all, so you can just as well implement it later on.
In practice we don't want one interwiki map. In projects like
Wikimedia we actually usually want two or three.
.. And sometimes we also want a wiki-local interwiki list because some
communities want to add links to sites that other wikis don't.
This we are actually tackling, although in a different fashion than you propose. Rather than having many different lists of sites to maintain, we have split sites from their configuration. The list of sites is global and shared by all clients. Their configuration, however, is local. So if wiki a wants to use site x as interwikilink with prefix foobar, wiki b wants to use it with prefix baz and wiki c does not want to use it as interwikilink at all, this is perfectly possible. This split, and the associated generalization our changes bring, adds a lot of flexibility compared to the current system and removes bad assumptions currently baked in.
I think we're going to need to have some of this and the synchronization stuff in core. Right now the code has nothing but the one sites table. There is no repo code, so presumably the only implementation of that for a while will be Wikidata. And if parts of this table are supposed to be editable in some cases (where there is no repo) but non-editable in others, then I don't see any way for an edit UI to tell the difference.
I'm also not sure how this synchronization, which sounds one-way, will play with individual wikis wanting to add new interwiki links.
Also anything in this area really needs to think of our lack of user
interface. If we rewrite this then we absolutely must include a UI to view and edit this in core.
Again, this is not something we're touching at all, or want to touch, as we don't need it. Personally I think it'd be great to have such facilities, and it makes sense to add these after the backend has been fixed. I'd be happy to work with you on this (or leave it entirely up to you) once we've got the relevant rewrite work done.
By rewriting it we ditch every hack trying to make it easy to
control the interwiki list and only make the problem worse.
Our change will not drop any existing functionality. I will make sure there are tools/facilities at least as good as (and probably better than) the current ones.
I'm talking about things like the interwiki extensions and scripts that turn wiki tables into interwiki lists. All these things are written against the interwiki table. So by rewriting and using a new table we implicitly break all the working tricks and throw the user back into SQL.
I would like to understand what Wikidata needs out of
interwiki/sites and what it's going to do with the data
We need this for our "equivalent links", which consist of a global site id and a page. Right now we do not have consistent global ids; in fact, we don't have global ids at all. We just have local ids that happen to be similar everywhere (one might not want this, but is pretty much forced into it right now), which must be language codes in order to be "languagelinks" or (better named) "equivalent links". Also, right now, all languagelinks are interwikilinks (wtf) - we want to be able to have "equivalent links" without them also being interwiki links!
I like the idea of table entries without actual interwikis. The idea of some interface listing user selectable sites came to mind and perhaps sites being added trivially even automatically. Though if you plan to support this I think you'll need to drop the NOT NULL from site_local_key.
Actually, another thought makes me think the schema should be a little different. site_local_key probably shouldn't be a column, it should probably be another table. Something like site_local_key (slc_key, slc_site) which would map things like en:, Wikipedia:, etc... to a specific site. I can see wikis wanting to use multiple interwiki names for the same site. In fact I'm pretty sure this already happens with the existing interwiki table. We just create duplicate rows. But you want global ids so I really don't think you want data duplication like that to happen.
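The separate-table idea might look roughly like the following, sketched with Python's sqlite3 module so it can be run standalone. The table name site_local_key and the columns slc_key and slc_site follow the suggestion above; the sites table layout, the site_global_key column, and the helper site_for_prefix are assumptions made for the example and are not the schema from the patch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sites (
    site_id         INTEGER PRIMARY KEY,
    site_global_key TEXT NOT NULL UNIQUE   -- assumed column name for the global id
);
CREATE TABLE site_local_key (
    slc_key  TEXT PRIMARY KEY,             -- local prefix such as 'en' or 'wikipedia'
    slc_site INTEGER NOT NULL REFERENCES sites(site_id)
);
""")

# Two sites in the list; only one of them has any local prefix at all.
conn.executemany(
    "INSERT INTO sites (site_id, site_global_key) VALUES (?, ?)",
    [(1, "enwiki"), (2, "commonswiki")],
)
# Two local prefixes pointing at the same site, without duplicating the site row.
conn.executemany(
    "INSERT INTO site_local_key (slc_key, slc_site) VALUES (?, ?)",
    [("en", 1), ("wikipedia", 1)],
)


def site_for_prefix(prefix):
    """Return the global id behind a local prefix, or None if there is none."""
    row = conn.execute(
        "SELECT s.site_global_key FROM site_local_key k "
        "JOIN sites s ON s.site_id = k.slc_site WHERE k.slc_key = ?",
        (prefix.lower(),),
    ).fetchone()
    return row[0] if row else None
```

A site with no row in site_local_key simply has no interwiki prefix, which avoids the need for a nullable column on the sites table.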
I'd also like to know if Wikidata plans to add any interface that
will add/remove sites
The backend will have an interface to do this, but we're not planning on any API modules or UIs. The backend will be written keeping in mind people will want those though, so it ought to be easy to add them later on.
So to wrap up: I don't think there is any conflict between what we want to do (if you disagree, please provide some pointers). You can make your changes later on, and will have a much more solid base to work on then now.
I think I need to understand the plans you have for synchronization a bit more:
- Where does Wikidata get the sites?
- What synchronizes the data?
- What is the repo like, and what is it based on? Is this wikis syncing from another wiki's sites table, or does Wikidata have a real set of data the sites table gets based on?
- Is this one-way synchronization or multiway?
Synchronization, treatment of the table (whether it's an index of something else or first-class data), and editing/UIs for editing are a set of things where you can get in the way of the ability to do the others later if you don't think of them all up front.
Our old interwiki table was treated as first-class data and was simple data that was easy to create an edit interface for. As a result, it's hard to do any synchronization for it, since we didn't plan for it. Likewise, if we design a sites table focused on synchronizing data, and treat the table as first-class data while also treating some of it like an index, we can easily come up with something that is going to get in the way of the consistency needed for a UI.
One of our options might be to treat sites like an index of data built from other sources, just like pagelinks. Wikidata can act as a repo, the sites code can build from multiple sources with Wikidata being the first, and when a UI comes into play the UI can create its own list of sites, which can be used as a source for building the sites table.
----
Heh, it probably doesn't help that this is making my abstract revision idea come up and making me want to have the UI depend on that.
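The "sites as an index" option could be sketched along these lines, again in Python for illustration. The function rebuild_sites_index, the two provider functions and their data are hypothetical, and letting the first source win on conflicts is an assumption of this sketch, not something decided in the thread.

```python
from typing import Callable, Dict, List, Tuple

SiteRow = Tuple[str, str]                  # (global id, base URL)
SiteProvider = Callable[[], List[SiteRow]]


def rebuild_sites_index(
    providers: List[Tuple[str, SiteProvider]],
) -> Dict[str, Dict[str, str]]:
    """Rebuild the derived sites index from scratch out of all sources.

    Sources are consulted in order, and the first one to define a global id
    wins. The index records which source each row came from, so an edit UI
    could tell rows it owns from rows that were synchronized.
    """
    index: Dict[str, Dict[str, str]] = {}
    for origin, provider in providers:
        for global_id, url in provider():
            index.setdefault(global_id, {"url": url, "origin": origin})
    return index


def repo_sites() -> List[SiteRow]:
    # Stand-in for whatever a repo such as Wikidata would publish.
    return [("enwiki", "//en.wikipedia.org"), ("dewiki", "//de.wikipedia.org")]


def ui_sites() -> List[SiteRow]:
    # Stand-in for a list maintained through a local edit interface.
    return [("fanwiki", "//fan.example.org"), ("enwiki", "//mirror.example.org")]


INDEX = rebuild_sites_index([("repo", repo_sites), ("local-ui", ui_sites)])
```

Because the index is always regenerated from its sources, nothing edits it directly; the repo and the local UI each keep their own first-class data.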
Cheers
-- Jeroen De Dauw http://www.bn2vs.com Don't panic. Don't be evil. --
Btw, if you really want to make this an abstract list of sites, dropping site_url and the other two related columns might be an idea. At first glance the URL looks like something standard that every site would have. But once you throw something like MediaWiki into the mix, with short URLs, long URLs, and an API, the URL really becomes type-specific data that should probably go in the blob. Especially when you start thinking about other custom types.