[Pywikipedia-l] Wikidata and Pywikipedia

2 Mar 2013

Hi everyone,

As you might know, phase 1 of Wikidata (interwiki links) is live on a lot 
of Wikipedias and will soon be turned on for all Wikipedias. Phase 2 is 
next; that's basically about infobox data. We are going to need a lot of 
clever bots to fill Wikidata. To make that possible, Pywikipedia should 
(properly) implement Wikidata. That way bot authors don't have to worry 
or care about the inner workings of the Wikidata API; they just talk to 
the framework. At the moment trunk has a first implementation that isn't 
very clean, and in the rewrite it's still missing.

Legoktm and I talked about this on IRC. We need to have a proper data 
model in Pywikipedia. Based on 
https://meta.wikimedia.org/wiki/Wikidata/Notes/Data_model_primer :
* WikibasePage is a subclass of Page and has some basic shared functions 
for labels, descriptions and aliases
* ItemPage is a subclass of WikibasePage with some item-specific 
functions like claims and sitelinks (example 
https://www.wikidata.org/wiki/Q256638)
* PropertyPage is a subclass of WikibasePage with some property-specific 
functions for the datatype (example 
https://www.wikidata.org/wiki/Property:P22)
* QueryPage is a subclass of WikibasePage for the future query type
* Claim is a subclass of object for claims. Simplified: it's a property 
(P22, father) attached to an item (Q256638, the princess) linking to 
another item (Q380949, Willem IV)
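
To make the hierarchy above concrete, here is a rough sketch of how the classes could hang together. Everything in it is an assumption for illustration: the stand-in Page class, the attribute names (labels, claims, sitelinks, datatype) and the add_claim method are hypothetical, not existing Pywikipedia code.

```python
class Page:
    """Stand-in for the framework's Page class (site object + title)."""

    def __init__(self, site, title):
        self.site = site
        self.title = title


class WikibasePage(Page):
    """Shared parts of every Wikibase entity, keyed by language code."""

    def __init__(self, site, title):
        super().__init__(site, title)
        self.labels = {}        # language code -> label
        self.descriptions = {}  # language code -> description
        self.aliases = {}       # language code -> list of aliases


class Claim:
    """A property attached to an item, pointing at a target value."""

    def __init__(self, property_id, target):
        self.property_id = property_id
        self.target = target


class ItemPage(WikibasePage):
    """An item (Q...) with claims and sitelinks."""

    def __init__(self, site, title):
        super().__init__(site, title)
        self.claims = {}     # property id -> list of Claim
        self.sitelinks = {}  # site id such as "enwiki" -> page title

    def add_claim(self, claim):
        # Hypothetical helper: group claims by their property id.
        self.claims.setdefault(claim.property_id, []).append(claim)


class PropertyPage(WikibasePage):
    """A property (P...) with a datatype."""

    def __init__(self, site, title, datatype=None):
        super().__init__(site, title)
        self.datatype = datatype


class QueryPage(WikibasePage):
    """Placeholder for the future query type."""


# The example from the list: P22 (father) on Q256638 points at Q380949.
princess = ItemPage("wikidata", "Q256638")
father = ItemPage("wikidata", "Q380949")
princess.add_claim(Claim("P22", father))
```

The site is passed as a plain string here only to keep the sketch self-contained; in the framework it would be a site object.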

You can get these pages like a normal page (site object + title), but 
you probably also want to get them based on a Wikipedia page. For that 
there is 
https://www.wikidata.org/wiki/Special:ItemByTitle/enwiki/Princess%20Carolin…
. We should have a staticmethod itemByPage(Page): given the Page for 
https://en.wikipedia.org/wiki/Princess_Carolina_of_Orange-Nassau, it 
returns the ItemPage object for 
https://www.wikidata.org/wiki/Q256638. Currently in trunk the DataPage 
object has a constructor that takes a page object and gives you 
the corresponding DataPage. I don't think that's the way to do it, 
because it violates the data model and will get us into a lot of trouble 
later on when other sites (like Commons) might implement the Wikibase 
extension.
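
A minimal sketch of what such a staticmethod could look like, assuming the lookup goes through wbgetentities with its sites and titles parameters. The stand-in Page and ItemPage classes, the site_id attribute, the repo argument and the injected fetch callable are all hypothetical, and the response shape used by fake_fetch is an assumption about what the API returns, not captured output.

```python
class Page:
    """Stand-in for a normal page; site_id is a hypothetical attribute."""

    def __init__(self, site_id, title):
        self.site_id = site_id  # e.g. "enwiki"
        self.title = title


class ItemPage:
    """Stripped-down item page: just the repository and the item id."""

    def __init__(self, repo, item_id):
        self.repo = repo
        self.id = item_id

    @staticmethod
    def itemByPage(page, repo, fetch):
        """Return the ItemPage linked to `page`, or None if there is none.

        `fetch` is a callable that takes API parameters and returns the
        decoded JSON response; it is injected so no network is needed here.
        """
        params = {
            "action": "wbgetentities",
            "sites": page.site_id,
            "titles": page.title,
            "format": "json",
        }
        data = fetch(params)
        for entity in data.get("entities", {}).values():
            # Assumed: entries without an item carry a "missing" key.
            if "missing" not in entity and "id" in entity:
                return ItemPage(repo, entity["id"])
        return None


def fake_fetch(params):
    """Canned responses standing in for the real API."""
    if params["titles"] == "Princess Carolina of Orange-Nassau":
        return {"entities": {"Q256638": {"id": "Q256638", "type": "item"}}}
    return {"entities": {"-1": {"site": params["sites"],
                                "title": params["titles"],
                                "missing": ""}}}


page = Page("enwiki", "Princess Carolina of Orange-Nassau")
item = ItemPage.itemByPage(page, "wikidata", fake_fetch)
```

Because the repository is passed in explicitly, the same method would keep working if another site (like Commons) became a Wikibase repository.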

A WikibasePage should work the same as a normal page when it comes to 
fetching data. It should start out in its initial state (just a title, no 
content), and once you use a function that needs data (or you force it), 
it fetches all the data from Wikibase and caches it.
* For an item the data looks like 
https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q256638&…
* For a property the data looks like 
https://www.wikidata.org/w/api.php?action=wbgetentities&ids=P22&for…
Parts of the data (description, aliases and labels) should be processed 
in the get function of WikibasePage, other parts in ItemPage / PropertyPage.
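
The lazy fetching and the split between WikibasePage and ItemPage could be sketched roughly like this. The loader callable, the _parse hook and the force argument are hypothetical names, and the sample data in fake_loader is made up to resemble the entity JSON rather than copied from the API.

```python
class WikibasePage:
    def __init__(self, entity_id, loader):
        self.id = entity_id
        self._loader = loader  # callable: entity id -> entity dict
        self._content = None   # nothing fetched yet

    def get(self, force=False):
        """Fetch the entity on first use (or when forced) and cache it."""
        if self._content is None or force:
            self._content = self._loader(self.id)
            self._parse(self._content)
        return self._content

    def _parse(self, data):
        # Shared parts: labels, descriptions and aliases per language.
        self.labels = {lang: entry["value"]
                       for lang, entry in data.get("labels", {}).items()}
        self.descriptions = {lang: entry["value"]
                             for lang, entry in data.get("descriptions", {}).items()}
        self.aliases = {lang: [entry["value"] for entry in entries]
                        for lang, entries in data.get("aliases", {}).items()}


class ItemPage(WikibasePage):
    def _parse(self, data):
        super()._parse(data)
        # Item-specific parts: sitelinks and claims.
        self.sitelinks = {site: link["title"]
                          for site, link in data.get("sitelinks", {}).items()}
        self.claims = data.get("claims", {})


calls = []


def fake_loader(entity_id):
    """Stands in for a wbgetentities request; records every call."""
    calls.append(entity_id)
    return {
        "id": entity_id,
        "labels": {"en": {"language": "en",
                          "value": "Princess Carolina of Orange-Nassau"}},
        "sitelinks": {"enwiki": {"site": "enwiki",
                                 "title": "Princess Carolina of Orange-Nassau"}},
    }


item = ItemPage("Q256638", fake_loader)
calls_before_get = list(calls)  # still empty: creating the page fetches nothing
item.get()
item.get()                      # served from the cache, no second request
```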

Based on the API we should probably have some generators:
* One or more generators that use wbgetentities to (pre-)fetch objects
* A search generator that uses wbsearchentities
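
A pre-fetching generator could batch the ids so that each wbgetentities request covers several entities at once. This is only a sketch: prefetch_entities, batched_ids and the injected fetch callable are hypothetical names, and the default batch size of 50 is an assumption about the per-request limit.

```python
from itertools import islice


def batched_ids(ids, size):
    """Yield lists of at most `size` ids from any iterable."""
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def prefetch_entities(ids, fetch, batch_size=50):
    """Yield entity dicts for `ids`, one request per batch.

    `fetch` takes API parameters and returns the decoded JSON response.
    """
    for batch in batched_ids(ids, batch_size):
        data = fetch({
            "action": "wbgetentities",
            "ids": "|".join(batch),
            "format": "json",
        })
        entities = data.get("entities", {})
        for entity_id in batch:
            # Keep the caller's order; skip ids the response lacks.
            if entity_id in entities:
                yield entities[entity_id]


requests = []


def fake_fetch(params):
    """Canned API: echoes back a bare entity for every requested id."""
    requests.append(params["ids"])
    return {"entities": {i: {"id": i} for i in params["ids"].split("|")}}


ids = ["Q%d" % n for n in range(1, 6)]
fetched = [entity["id"]
           for entity in prefetch_entities(ids, fake_fetch, batch_size=2)]
```

Since it accepts any iterable of ids, a generator like this could be chained behind other generators, including a search generator.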

WikibasePage:
* Set/add/delete label (@property?)
* Set/add/delete description (@property?)
* Set/add/delete alias (@property?)

ItemPage
* Set/add/delete sitelink (@property?)
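
On the @property question: since labels, descriptions and aliases are all per language, plain methods that take a language code might fit better than properties. A rough sketch of that alternative, with every name hypothetical and the edits only recorded locally (saving them through the API is left out on purpose):

```python
class WikibasePage:
    def __init__(self, entity_id):
        self.id = entity_id
        self.labels = {}    # language code -> label
        self.aliases = {}   # language code -> list of aliases
        self.pending = []   # (operation, language, value) not yet saved

    def set_label(self, language, value):
        self.labels[language] = value
        self.pending.append(("set_label", language, value))

    def delete_label(self, language):
        # Only record a change if there was a label to delete.
        if self.labels.pop(language, None) is not None:
            self.pending.append(("delete_label", language, None))

    def add_alias(self, language, value):
        values = self.aliases.setdefault(language, [])
        if value not in values:
            values.append(value)
            self.pending.append(("add_alias", language, value))

    def delete_alias(self, language, value):
        values = self.aliases.get(language, [])
        if value in values:
            values.remove(value)
            self.pending.append(("delete_alias", language, value))


prop = WikibasePage("P22")
prop.set_label("en", "father")
prop.add_alias("en", "male parent")  # made-up sample alias
prop.add_alias("en", "male parent")  # duplicate, ignored
prop.delete_alias("en", "male parent")
```

Descriptions and sitelinks would follow the same pattern; a read-only @property could still expose the current values.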

Claim logic

Not sure how we can use wbeditentity and wblinktitles

We took some notes on 
https://www.mediawiki.org/wiki/Manual:Pywikipediabot/Wikidata/Rewrite_propo… 
.

What do you think? Is this the right direction? Feedback is appreciated.

Maarten
