On 25.04.2013 21:43, Yuri Astrakhan wrote:
> CCing wikidata.
> I don't think this is a good approach. We shouldn't be breaking the API
> just because there is a new under-the-hood feature (wikibase).
This is not a breaking change to the MediaWiki API at all. The hook did not exist before. Things not using the hook keep working exactly as before.
Only once Wikidata starts using the hook does the behavior of *Wikipedia's* API change (from including external links to not including them by default).
One could actually see this as fixing a bug: currently, "external" language links are mis-reported as being "local" language links. This is being fixed.
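
To illustrate: a handler for the new hook could look roughly like this (just a sketch; MyRepoLookup and the 'external' flag are made up for illustration, and I'm assuming the usual hook pattern of a Title plus by-reference arrays):

    // Rough sketch of a LanguageLinks hook handler. MyRepoLookup and
    // the 'external' flag are invented for illustration.
    function onLanguageLinks( Title $title, array &$links, array &$linkFlags ) {
        // Hypothetical lookup of sitelinks stored in an external repo.
        $external = MyRepoLookup::getSiteLinks( $title );

        foreach ( $external as $lang => $pageName ) {
            $link = "$lang:$pageName";
            $links[] = $link;
            // Mark the link as externally defined, so clients can tell
            // it apart from links stored in the local wikitext.
            $linkFlags[$link][] = 'external';
        }

        return true; // let other handlers run
    }

Registration would be the usual $wgHooks['LanguageLinks'][] = 'onLanguageLinks'; line in the extension setup.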
> From the API client's perspective, it should work as before, plus there
> should be an extra flag indicating whether the sitelink is stored in
> wikidata or locally. Sitelinks might be the first such change, but not
> the last - e.g. categories, etc.
The "external" links could be included per default by ApiQueryLangLinks; I did not do this for performance reasons (considering the hook makes paging a lot more difficult, and may result in a lot more database queries).
Anomie said he'd think about making this less costly.
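
To illustrate why paging gets harder: the module currently pages through the langlinks table with one indexed query, along these lines (a simplified sketch, not the actual code; $pageIds and $limit are made-up names):

    // Simplified sketch of the existing paging query. Continuation
    // relies on the unique (ll_from, ll_lang) index.
    $dbr = wfGetDB( DB_SLAVE );
    $res = $dbr->select(
        'langlinks',
        array( 'll_from', 'll_lang', 'll_title' ),
        array( 'll_from' => $pageIds ),
        __METHOD__,
        array(
            'ORDER BY' => 'll_from, ll_lang',
            'LIMIT' => $limit + 1, // one extra row to detect continuation
        )
    );
    // Hook-injected links have no row in this table, so they cannot take
    // part in the (ll_from, ll_lang) continuation, and the hook would
    // have to be invoked once per page on top of the query.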
> As for the implementation, it seems the hook approach might not satisfy
> all the usage scenarios:
> - Given a set of pages (pageset), give all the sitelinks (possibly
> filtered with a set of wanted languages). Rendering a page for the UI
> would use this approach with just one page.
You want the hook to work on a more complex structure, changing the link sets for multiple pages?
Possible, but I don't think it's helpful. For any non-trivial set of pages, we'd be in danger of running out of memory, and some kind of chunking would be needed, complicating things even more. Also, implementing a handler for a hook that takes such a complex structure is quite painful and error-prone. Assembling the result from multiple calls to a simple per-page hook makes more sense to me; that is what I implemented in Idfcdc53af.
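
Roughly, the per-page assembly I have in mind (illustrative only; $localLinks and the result structure are invented, wfRunHooks is the current hook dispatcher):

    // Illustrative only: combine locally stored links with links added
    // by hook handlers, one page at a time.
    $allLinks = array();
    foreach ( $titles as $title ) {
        // Links from the local langlinks table for this page (made-up
        // variable; in practice these come from a batched query).
        $links = $localLinks[$title->getArticleID()];
        $linkFlags = array();
        wfRunHooks( 'LanguageLinks', array( $title, &$links, &$linkFlags ) );
        $allLinks[$title->getPrefixedText()] = $links;
    }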
> - langbacklinks - get a list of pages linking to a site.
Yes, that would only consider locally defined links. As I understand it, this query is mainly used to find and fix broken links, so it makes sense to only include the ones that are actually defined (and fixable) locally.
> - filtering based on having/not having a specific langlink for other
> modules. E.g. list all pages that have/don't have a link to a site X.
Same as above.
> - alllanglinks (not yet implemented, but might be, to match the
> corresponding allcategories, ...) - list all existing langlinks on the
> site.
Same as above. I believe the sensible semantics is "list all langlinks *defined* on the site", at least by default.
For alllanglinks, I can imagine how to do this efficiently for the wikibase case, but not for a generic hook that can manipulate sitelinks.
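
To make that concrete: with the "defined locally" semantics, a hypothetical alllanglinks module boils down to a single indexed scan of the langlinks table (a sketch only, since the module doesn't exist yet):

    // Hypothetical core query of an alllanglinks module, assuming the
    // "defined locally" semantics: one indexed scan, no hook calls.
    $dbr = wfGetDB( DB_SLAVE );
    $res = $dbr->select(
        'langlinks',
        array( 'll_lang', 'll_title', 'll_from' ),
        array(), // plus continuation conditions on (ll_lang, ll_title)
        __METHOD__,
        array( 'ORDER BY' => 'll_lang, ll_title', 'LIMIT' => $limit + 1 )
    );
    // A generic hook that can add or remove links would instead have to
    // be consulted for every page on the wiki - far too expensive.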
> We could debate the need of some of these scenarios, but I feel that we
> shouldn't be breaking the existing API.
Again: it doesn't. The API reports what is defined and stored locally, just as before. Wikidata starting to use the new hook may break expectations about the data returned by Wikipedia's API, but that's a separate issue, I think.
-- daniel