So while writing some methods for Wikidata, I discovered that locked wikis,
deleted wikis, and renamed wikis all go into the same family.obsolete dict.
In fact, it's impossible to create a site object for a closed wiki (both in
trunk and rewrite):
>>> wikipedia.getSite('aa','wikipedia')
pywikibot.exceptions.NoSuchSite: Language aa in family wikipedia is obsolete
This isn't desirable behavior, since it prevents read access as well as
write access for those who have it (stewards).
I wrote a quick patch for rewrite which separated this into three different
groups: locked (readable but not writable), deleted (not readable), and
"obsolete" (renamed wikis and backwards compatibility).
Creating a site for a locked wiki works fine, but throws an error when you
try to fetch an edit token. Creating a site for a deleted or obsolete wiki
throws the same error as before, just with a slightly more specific error
message.
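To make the three groups concrete, here is a minimal sketch of the behaviour described above. It is not the actual patch: Family, Site, LockedSite and edit_token are toy stand-ins written for this illustration, the codes 'xx-deleted', 'xx-old' and 'xx-new' are placeholders, and steward write access to locked wikis is not modelled.

```python
class NoSuchSite(Exception):
    """Stand-in for pywikibot.exceptions.NoSuchSite."""


class LockedSite(Exception):
    """Hypothetical error raised when writing to a locked wiki."""


class Family:
    """Toy family that keeps the three groups apart."""

    def __init__(self, name, locked=(), deleted=(), obsolete=None):
        self.name = name
        self.locked = set(locked)              # readable, not writable
        self.deleted = set(deleted)            # not readable at all
        self.obsolete = dict(obsolete or {})   # old code -> new code


class Site:
    """Toy site object; only the checks discussed above are modelled."""

    def __init__(self, code, family):
        if code in family.deleted:
            raise NoSuchSite("Language %s in family %s has been deleted"
                             % (code, family.name))
        if code in family.obsolete:
            raise NoSuchSite("Language %s in family %s is obsolete; use %s"
                             % (code, family.name, family.obsolete[code]))
        self.code = code
        self.family = family

    def edit_token(self):
        # Locked wikis can be read, so the failure is deferred until a
        # write is attempted.
        if self.code in self.family.locked:
            raise LockedSite("Wiki %s is locked; no edit token available"
                             % self.code)
        return "dummy-token"


family = Family("wikipedia", locked=["aa"], deleted=["xx-deleted"],
                obsolete={"xx-old": "xx-new"})
site = Site("aa", family)   # works: locked wikis stay readable
```

With this split, Site('aa', family) succeeds and only edit_token() fails, while deleted and renamed codes still raise NoSuchSite at construction time.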
What do people think? This is a bit urgent since Wikidata support in trunk
is broken because of it:
>>> wikipedia.DataPage(wikipedia.getSite('en','wikipedia'), 'Main Page').interwiki()
pywikibot.exceptions.NoSuchSite: Language ii in family wikipedia is obsolete
-- Legoktm
Hi everyone,
As you might know, phase 1 of Wikidata (interwiki links) is live on a lot
of Wikipedias and will soon be turned on for all Wikipedias. Phase 2 is
next; that's basically about infobox data. We are going to need a lot of
clever bots to fill Wikidata. To make that possible, Pywikipedia should
(properly) implement Wikidata. That way bot authors don't have to worry
or care about the inner workings of the Wikidata api, they just talk to
the framework. At the moment trunk has a first implementation that isn't
very clean and in the rewrite it's still missing.
Legoktm and I talked about this on IRC. We need to have a proper data
model in Pywikipedia. Based on
https://meta.wikimedia.org/wiki/Wikidata/Notes/Data_model_primer :
* WikibasePage is a subclass of Page and has some basic shared functions
for labels, descriptions and aliases
* ItemPage is a subclass of WikibasePage with some item specific
functions like claims and sitelinks (example
https://www.wikidata.org/wiki/Q256638)
* PropertyPage is a subclass of WikibasePage with some property specific
functions for the datatype (example
https://www.wikidata.org/wiki/Property:P22)
* QueryPage is a subclass of WikibasePage for the future query type
* Claim is a subclass of object for claims. Simplified: it's a property
(P22, father) attached to an item (Q256638, the princess) linking to
another item (Q380949, Willem IV)
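The hierarchy in the list above could be sketched roughly like this. It is only a skeleton to show how the classes relate; Page here is a minimal stand-in for the real pywikibot Page, and all attribute names (labels, sitelinks, datatype, and so on) are assumptions, not an existing API.

```python
class Page:
    """Minimal stand-in for pywikibot's Page: a site plus a title."""

    def __init__(self, site, title):
        self.site = site
        self.title = title


class WikibasePage(Page):
    """Shared parts of every Wikibase entity."""

    def __init__(self, site, title):
        super().__init__(site, title)
        self.labels = {}         # language code -> label
        self.descriptions = {}   # language code -> description
        self.aliases = {}        # language code -> list of aliases


class ItemPage(WikibasePage):
    """An item such as Q256638, with claims and sitelinks."""

    def __init__(self, site, title):
        super().__init__(site, title)
        self.claims = []         # list of Claim objects
        self.sitelinks = {}      # site identifier -> page title


class PropertyPage(WikibasePage):
    """A property such as P22, which additionally has a datatype."""

    def __init__(self, site, title, datatype=None):
        super().__init__(site, title)
        self.datatype = datatype


class QueryPage(WikibasePage):
    """Placeholder for the future query type."""


class Claim(object):
    """A property attached to an item, pointing at another item."""

    def __init__(self, item_id, property_id, target_id):
        self.item_id = item_id
        self.property_id = property_id
        self.target_id = target_id


# The example from the list: Q256638 has father (P22) Q380949.
claim = Claim("Q256638", "P22", "Q380949")
item = ItemPage("wikidata", "Q256638")
item.claims.append(claim)
```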
You can get these pages like a normal page (site object + title), but
you probably also want to get them based on a Wikipedia page. For that
there is
https://www.wikidata.org/wiki/Special:ItemByTitle/enwiki/Princess%20Carolin…
. We should have a staticmethod itemByPage(Page) in which Page is
https://en.wikipedia.org/wiki/Princess_Carolina_of_Orange-Nassau and it
will give you the ItemPage object for
https://www.wikidata.org/wiki/Q256638. Currently in trunk the DataPage
object has a constructor where you can give a page object and you'll get
the corresponding DataPage. I don't think that's the way to do it
because it violates the data model and will get us in a lot of trouble
later on when other sites (like Commons) might implement the Wikibase
extension.
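A rough sketch of what such a static method could look like, keeping the lookup separate from the constructor. Everything here is an assumption made for illustration: Page is a namedtuple stand-in, dbname() assumes the "language code + wiki" naming used for Wikipedias, and the lookup argument is a hypothetical callable standing in for whatever site-and-title query the framework ends up using.

```python
from collections import namedtuple

# Minimal stand-in for a pywikibot Page: language code, family name, title.
Page = namedtuple("Page", "lang family title")


def dbname(page):
    """Assumed mapping from a Wikipedia page to its database name."""
    if page.family != "wikipedia":
        raise ValueError("only Wikipedia is handled in this sketch")
    return page.lang.replace("-", "_") + "wiki"


class ItemPage:
    def __init__(self, title):
        self.title = title

    @staticmethod
    def itemByPage(page, lookup):
        """Return the ItemPage linked to page, or None if there is none.

        lookup is a hypothetical callable (dbname, title) -> item id or None.
        """
        title = page.title.replace("_", " ")
        item_id = lookup(dbname(page), title)
        return ItemPage(item_id) if item_id else None


# Fake lookup table standing in for the real site-and-title query.
KNOWN = {("enwiki", "Princess Carolina of Orange-Nassau"): "Q256638"}

page = Page("en", "wikipedia", "Princess_Carolina_of_Orange-Nassau")
item = ItemPage.itemByPage(page, lambda db, title: KNOWN.get((db, title)))
```

Because the lookup lives in a static method rather than in the constructor, an ItemPage can still be created directly from its own title, and other Wikibase repositories could plug in a different lookup.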
A WikibasePage should work the same as a normal page when it comes to
fetching data. It should start out in an initial state (just a title, no
content), and once you use a function that needs data (or you force it),
it should fetch all the data from Wikibase and cache it.
* For an item the data looks like
https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q256638&format=…
* For a property the data looks like
https://www.wikidata.org/w/api.php?action=wbgetentities&ids=P22&format=json
Parts of the data (descriptions, aliases and labels) should be processed
in the get function of WikibasePage, other parts in ItemPage / PropertyPage.
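The lazy loading and the split in processing described above might look roughly like this. It is a sketch under assumptions: fetch is a hypothetical callable that returns the data for one entity, SAMPLE is invented sample data shaped like a single entity from a wbgetentities response, and the method names are not an existing API.

```python
class WikibasePage:
    def __init__(self, entity_id, fetch):
        self.id = entity_id
        self._fetch = fetch      # hypothetical callable: entity id -> dict
        self._content = None     # nothing loaded yet
        self.labels = {}
        self.descriptions = {}
        self.aliases = {}

    def get(self, force=False):
        """Fetch the entity on first use (or when forced) and cache it."""
        if self._content is None or force:
            self._content = self._fetch(self.id)
            self._parse(self._content)
        return self._content

    def _parse(self, data):
        # Parts shared by items and properties.
        self.labels = {lang: entry["value"]
                       for lang, entry in data.get("labels", {}).items()}
        self.descriptions = {lang: entry["value"]
                             for lang, entry in data.get("descriptions", {}).items()}
        self.aliases = {lang: [entry["value"] for entry in entries]
                        for lang, entries in data.get("aliases", {}).items()}


class ItemPage(WikibasePage):
    def _parse(self, data):
        super()._parse(data)
        # Item-specific parts.
        self.sitelinks = {db: link["title"]
                          for db, link in data.get("sitelinks", {}).items()}
        self.claims = data.get("claims", {})


# Invented sample data; the real response may contain more fields.
SAMPLE = {
    "id": "Q256638",
    "labels": {"en": {"language": "en", "value": "Princess Carolina"}},
    "descriptions": {},
    "aliases": {"en": [{"language": "en", "value": "Carolina"}]},
    "sitelinks": {"enwiki": {"site": "enwiki",
                             "title": "Princess Carolina of Orange-Nassau"}},
    "claims": {},
}

calls = []


def fake_fetch(entity_id):
    calls.append(entity_id)   # record every simulated API request
    return SAMPLE


item = ItemPage("Q256638", fake_fetch)
item.get()
item.get()   # served from the cache, no second request
```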
Based on the api we should probably have some generators:
* One or more generators that use wbgetentities to (pre-)fetch objects
* A search generator that uses wbsearchentities
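A pre-fetching generator could batch the ids so that several entities are requested per call. The sketch below assumes that ids are passed joined with '|' and uses a batch size of 50 as a guess at a sensible limit; fetch_batch is a hypothetical callable, not an existing framework function.

```python
def batches(ids, size):
    """Yield lists of at most size ids."""
    batch = []
    for entity_id in ids:
        batch.append(entity_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def entity_generator(ids, fetch_batch, batch_size=50):
    """Yield (id, data) pairs, requesting several entities per call.

    fetch_batch is a hypothetical callable that takes the value for the
    'ids' parameter (ids joined with '|') and returns a dict id -> data.
    """
    for batch in batches(ids, batch_size):
        entities = fetch_batch("|".join(batch))
        for entity_id in batch:
            # Skip ids the repository did not return.
            if entity_id in entities:
                yield entity_id, entities[entity_id]


requests = []


def fake_fetch_batch(ids_param):
    requests.append(ids_param)   # record every simulated request
    return {i: {"id": i} for i in ids_param.split("|")}


result = list(entity_generator(["Q1", "Q2", "Q3"], fake_fetch_batch,
                               batch_size=2))
```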
WikibasePage:
* Set/add/delete label (@property?)
* Set/add/delete description (@property?)
* Set/add/delete alias (@property?)
ItemPage
* Set/add/delete sitelink (@property?)
Claim logic
Not sure how we can use wbeditentity and wblinktitles
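One way the "@property?" idea for labels could work is to expose a dict-like object that remembers what was changed, so a later save step knows which edits to send. This is only a sketch: TrackedDict and pending_label_edits are names invented for this example, and how the changes map onto API calls is left open.

```python
class TrackedDict(dict):
    """dict that records item assignments and deletions.

    Only d[key] = value and del d[key] are tracked; update() and friends
    bypass the bookkeeping in this sketch.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.changed = {}    # key -> new value, or None if deleted

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.changed[key] = value

    def __delitem__(self, key):
        super().__delitem__(key)
        self.changed[key] = None


class WikibasePage:
    def __init__(self, labels=None):
        self._labels = TrackedDict(labels or {})

    @property
    def labels(self):
        """Mapping of language code -> label; edit it like a dict."""
        return self._labels

    def pending_label_edits(self):
        """Return (language, value) pairs not yet saved; None means delete."""
        return sorted(self._labels.changed.items())


page = WikibasePage({"en": "old label", "nl": "oud label"})
page.labels["en"] = "new label"   # set
page.labels["de"] = "Bezeichnung"  # add
del page.labels["nl"]             # delete
```

The same pattern would carry over to descriptions, aliases and sitelinks.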
We took some notes on
https://www.mediawiki.org/wiki/Manual:Pywikipediabot/Wikidata/Rewrite_propo…
.
What do you think? Is this the right direction? Feedback is appreciated.
Maarten
Yes please with all these cases.
Best
Xqt
----- Original Message -----
From: Morten Wang
Sent: 01.03.2013 17:32
To: Pywikipedia discussion list
Subject: Re: [Pywikipedia-l] Rewrite branch, issue with user pages in aliased namespace
Found another bug related to this issue:
page = pywikibot.Page(site, u"Usuário:Vitorvicentevalente")
page.exists()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pywikibot/page.py", line 417, in exists
return self.site.page_exists(self)
File "pywikibot/site.py", line 1180, in page_exists
return page._pageid > 0
AttributeError: 'Page' object has no attribute '_pageid'
Let me know if it's easier to simply open a bug on SourceForge for this instead.
Regards,
Morten
On 28 February 2013 09:55, Morten Wang <nettrom(a)gmail.com> wrote:
[inadvertently only replied to xqt, sorry about double emails, thought I'd post to the list too]
Awesome work, thanks for the speedy bugfix!
I noticed that it appears to also affect PreloadingGenerator, I get these warnings:
WARNING: preloadpages: Query returned unexpected title 'Usuário:Vitorvicentevalente'
WARNING: preloadpages: Query returned unexpected title 'Usuário:Vitorvicentevalente/Etiquetas'
Is that something that's easy to fix too, or should I perhaps switch to not preloading the pages? Looks like the warning's thrown in site.py, preloadpages, line 1291.
Regards,
Morten
On 28 February 2013 04:31, <info(a)gno.de> wrote:
Fixes in r11139
best
xqt
----- Original Message -----
From: Morten Wang <nettrom(a)gmail.com>
To: Pywikipedia discussion list <pywikipedia-l(a)lists.wikimedia.org>
Date: 27.02.2013 20:46
Subject: Re: [Pywikipedia-l] Rewrite branch, issue with user pages in aliased namespace
> Deleted the API cache and the cookie file, still get the same error.
> Making sure the site reloads the namespace info (by calling
> site._getsiteinfo(force=True)) didn't help either.
>
> It's line 1711 in site.py (method loadrevisions) that's throwing the
> exception, in case that wasn't already obvious.
>
>
> Regards,
> Morten
You should delete the API cache first.
xqt
----- Original Message -----
From: Morten Wang
Sent: 27.02.2013 16:39
To: Pywikipedia discussion list
Subject: [Pywikipedia-l] Rewrite branch, issue with user pages in aliased namespace
Hi all,
I've run into an interesting issue on Portuguese Wikipedia, with a user page that's in the aliased user namespace:
import pywikibot
site = pywikibot.getSite('pt')
page = pywikibot.Page(site, u"Usuário:Vitorvicentevalente")
page.title()
u'Usu\xe1rio(a):Vitorvicentevalente'
page.get()
[NOTE: callback trace removed for brevity]
pywikibot.exceptions.Error: loadrevisions: Query on [[pt:Usuário(a):Vitorvicentevalente]] returned data on 'Usuário:Vitorvicentevalente'
According to the API "Usuário" is a valid namespace alias[1]. Is there an easy workaround or fix here?
Also, I've noticed this is not an issue in trunk, it's just the rewrite branch that produces this error.
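One possible workaround is to normalize namespace aliases on both titles before comparing them. The sketch below is not necessarily how the framework handles it: normalize_title and same_page are names made up for this example, the alias table is assumed to come from the siteinfo namespacealiases data, and real title normalization (case, underscores, and so on) is ignored.

```python
def normalize_title(title, aliases):
    """Replace a namespace alias with its canonical name.

    aliases is a dict alias -> canonical namespace name (assumed to be
    built from the siteinfo namespacealiases data).
    """
    namespace, sep, rest = title.partition(":")
    if sep and namespace in aliases:
        return aliases[namespace] + ":" + rest
    return title


def same_page(requested, returned, aliases):
    """Compare two titles after resolving namespace aliases."""
    return (normalize_title(requested, aliases)
            == normalize_title(returned, aliases))


# On pt.wikipedia, "Usuário" is an alias of "Usuário(a)".
PT_ALIASES = {u"Usuário": u"Usuário(a)"}

match = same_page(u"Usuário(a):Vitorvicentevalente",
                  u"Usuário:Vitorvicentevalente", PT_ALIASES)
```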
Footnotes:
1: ref: http://pt.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=general|namespaces|namespacealiases
Regards,
Morten