https://bugzilla.wikimedia.org/show_bug.cgi?id=60206
Web browser: ---
Bug ID: 60206
Summary: site.preloadpages does not preload all links and templates
Product: Pywikibot
Version: core (2.0)
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: General
Assignee: Pywikipedia-bugs@lists.wikimedia.org
Reporter: mpaa.wiki@gmail.com
Classification: Unclassified
Mobile Platform: ---
When templates=True and langlinks=True are passed to def preloadpages(self, pagelist, groupsize=50, templates=False, langlinks=False), not all links/templates are returned.
import pywikibot

site = pywikibot.Site('en', 'wikipedia')
page = pywikibot.Page(site, 'Main Page')

# preloadpages updates the Page objects it is given, so iterating is enough
for p in site.preloadpages([page], templates=True, langlinks=True):
    pass

print('p._templates %d' % len(page._templates))
print('p._langlinks %d' % len(page._langlinks))
There are actually more; see https://en.wikipedia.org/w/api.php?maxlag=5&format=jsonfm&rvprop=ids...
--- Comment #1 from Mpaa mpaa.wiki@gmail.com ---
Retrieving 1 pages from wikipedia:en.
p._templates 10
p._langlinks 10
--- Comment #2 from Merlijn van Deen valhallasw@arctus.nl --- The actual query used is https://en.wikipedia.org/w/api.php?maxlag=5&format=json&rvprop=ids%7...
i.e.
  maxlag: 5
  format: json
  rvprop: ids|flags|timestamp|user|comment|content
  prop: revisions|info|categoryinfo|templates|langlinks
  titles: Main Page
  meta: userinfo
  indexpageids:
  action: query
  uiprop: blockinfo|hasmsg
It's clear that not all results are returned (see the continue header), BUT according to Yuri, the continue header used here is broken (this is https://www.mediawiki.org/wiki/API:Legacy_Query_Continue instead of https://www.mediawiki.org/wiki/API:Query#Continuing_queries).
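For reference, the two continuation styles differ in where the parameters sit in the decoded JSON response. The following is only a minimal sketch of that difference: next_params is a hypothetical helper (not part of Pywikibot), the continuation values are made up, and merging every legacy sub-dict is assumed to be acceptable only because this query does not use a generator.

```python
def next_params(response):
    """Return extra request parameters for the next chunk, or None if done."""
    if 'continue' in response:
        # New style: a flat dict that is merged into the next request.
        return dict(response['continue'])
    if 'query-continue' in response:
        # Legacy style: one sub-dict per module (templates, langlinks, ...).
        merged = {}
        for module_params in response['query-continue'].values():
            merged.update(module_params)
        return merged
    return None


# Made-up example responses, trimmed to the continuation part only.
legacy = {'query-continue': {'templates': {'tlcontinue': '1|10|Foo'},
                             'langlinks': {'llcontinue': '1|ar'}}}
modern = {'continue': {'tlcontinue': '1|10|Foo', 'continue': '||'}}
```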
--- Comment #3 from Mpaa mpaa.wiki@gmail.com --- Is it an option to migrate to https://www.mediawiki.org/wiki/API:Query#Continuing_queries? This is supported only in MediaWiki version ≥ 1.21.
--- Comment #4 from Merlijn van Deen valhallasw@arctus.nl --- After re-reading the Legacy Query Continue page, I think supporting that in this case is not a huge hassle - we don't use a generator, so there is no need to separate the different query-continue parameters...
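The idea of not separating the query-continue parameters could look roughly like the loop below. This is a sketch under assumptions, not the Pywikibot implementation: query_all and fetch are hypothetical names, and fetch stands for any callable that sends the request dict to the API and returns the decoded JSON. It also does not handle one module finishing before another; in that case the API may return that module's items again, so a consumer should tolerate duplicates.

```python
def query_all(fetch, params):
    """Yield every response chunk, following legacy query-continue."""
    request = dict(params)
    while True:
        response = fetch(request)
        yield response
        cont = response.get('query-continue')
        if not cont:
            break
        # No generator is involved, so all module parameters are merged
        # into the original request without telling them apart.
        request = dict(params)
        for module_params in cont.values():
            request.update(module_params)
```

Because fetch is injected, the loop can be exercised against canned responses without network access.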
--- Comment #5 from Mpaa mpaa.wiki@gmail.com ---
There are 2 issues.
1) The query does not query-continue because self.continuekey is not recognized (see https://bugzilla.wikimedia.org/show_bug.cgi?id=55193).
2) Even if it did, there would be multiple chunks yielded for each page, and api.update_page() just records the last one returned.
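The second issue amounts to accumulating the chunks per page instead of overwriting them. A minimal sketch of that idea follows; merge_page_chunks is a hypothetical helper, it is not how api.update_page() works and not necessarily how the eventual patch solved it, and the response layout (query, then pages, then per-page templates and langlinks lists) is assumed from the query shown in comment #2.

```python
def merge_page_chunks(chunks, keys=('templates', 'langlinks')):
    """Combine the per-page lists from several response chunks."""
    pages = {}
    for chunk in chunks:
        chunk_pages = chunk.get('query', {}).get('pages', {})
        for pageid, data in chunk_pages.items():
            merged = pages.setdefault(pageid, {key: [] for key in keys})
            for key in keys:
                for item in data.get(key, []):
                    # Skip items already seen in an earlier chunk.
                    if item not in merged[key]:
                        merged[key].append(item)
    return pages
```

Keeping plain lists preserves the order in which the API returned the items; the membership check is linear, which is acceptable for the small per-page lists involved here.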
--- Comment #6 from Gerrit Notification Bot gerritadmin@wikimedia.org --- Change 110067 had a related patch set uploaded by Mpaa: Bug 60206 - site.preloadpages does not preload all links and templates
https://gerrit.wikimedia.org/r/110067
Gerrit Notification Bot gerritadmin@wikimedia.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |PATCH_TO_REVIEW
xqt info@gno.de changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|PATCH_TO_REVIEW             |RESOLVED
                 CC|                            |info@gno.de
         Resolution|---                         |FIXED
--- Comment #7 from Gerrit Notification Bot gerritadmin@wikimedia.org --- Change 110067 merged by jenkins-bot: Bug 60206 - site.preloadpages does not preload all links and templates
https://gerrit.wikimedia.org/r/110067
pywikipedia-bugs@lists.wikimedia.org