https://bugzilla.wikimedia.org/show_bug.cgi?id=55160
Web browser: ---
Bug ID: 55160
Summary: Page._getVersionHistory returns only a part of a history
Product: Pywikibot
Version: unspecified
Hardware: All
OS: All
Status: ASSIGNED
Severity: normal
Priority: Unprioritized
Component: General
Assignee: Pywikipedia-bugs@lists.wikimedia.org
Reporter: legoktm.wikipedia@gmail.com
Classification: Unclassified
Mobile Platform: ---
Originally from: http://sourceforge.net/p/pywikipediabot/bugs/1546/
Reported by: dixond
Created on: 2012-11-28 13:00:50
Subject: Page._getVersionHistory returns only a part of a history
Assigned to: xqt
Original description: There is a bug in Page._getVersionHistory. It doesn't load the whole history if it is large. The problem is here (wikipedia.py):
    if len(result['query']['pages'].values()[0]['revisions']) < revCount:
        thisHistoryDone = True
I believe it should be as follows:
    if not getAll and len(result['query']['pages'].values()[0]['revisions']) >= revCount:
        thisHistoryDone = True
Version.py:
Pywikipedia trunk/pywikipedia/ (r10745, 2012/11/20, 13:03:05)
Python 2.7.3 (default, Apr 10 2012, 23:31:26) [MSC v.1500 32 bit (Intel)]
config-settings:
use_api = True
use_api_login = True
unicode test: ok
--- Comment #1 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- - **priority**: 5 --> 8
--- Comment #2 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- - **priority**: 8 --> 5
--- Comment #3 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- Are you sure that you have set getAll=True while invoking that method?
--- Comment #4 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- - **assigned_to**: nobody --> xqt
--- Comment #5 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- Yes, of course. It is quite obvious that the following code prevents the rest of the revisions from being loaded, because it sets thisHistoryDone to True:
    if len(result['query']['pages'].values()[0]['revisions']) < revCount:
        thisHistoryDone = True
Am I missing anything?
--- Comment #6 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- First of all, _getVersionHistory() is an internal method and you shouldn't use it directly. Use getVersionHistory() instead. Then, the condition is quite right. Try the following statements:
    import pywikibot as pwb
    p = pwb.Page('de', 'user talk:xqt')
    h = p.getVersionHistory(getAll=True)
    len(h)
which gives 4250 entries (so far).
Changing the condition will return 500 entries only.
--- Comment #7 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- Changing the condition still returns 4250 entries for me (have you missed the "not getAll and " part in my code?)
But if I use fullVersionHistory instead of getVersionHistory, it returns only 192 entries for me. I.e. try the following code:
    import wikipedia as pywikibot
    p = pywikibot.Page('de', 'user talk:xqt')
    h = p.fullVersionHistory(getAll=True)
    print len(h)
--- Comment #8 from Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com --- Any updates? Are you able to reproduce this issue?
Kunal Mehta (Legoktm) legoktm.wikipedia@gmail.com changed:
What     |Removed |Added
----------------------------------------------------------------------------
See Also |        |https://sourceforge.net/p/pywikipediabot/bugs/1546
xqt info@gno.de changed:
What   |Removed  |Added
----------------------------------------------------------------------------
Status |ASSIGNED |NEW
CC     |         |info@gno.de
--- Comment #9 from Gerrit Notification Bot gerritadmin@wikimedia.org --- Change 105619 had a related patch set uploaded by Mpaa: (bug 55160) Page._getVersionHistory returns only a part of a history
https://gerrit.wikimedia.org/r/105619
Gerrit Notification Bot gerritadmin@wikimedia.org changed:
What   |Removed |Added
----------------------------------------------------------------------------
Status |NEW     |PATCH_TO_REVIEW
Mpaa mpaa.wiki@gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC   |        |mpaa.wiki@gmail.com
--- Comment #10 from Mpaa mpaa.wiki@gmail.com --- (In reply to comment #9)
> Change 105619 had a related patch set uploaded by Mpaa: (bug 55160) Page._getVersionHistory returns only a part of a history
h = p.getVersionHistory(getAll=True) returns the full history.
h = p.fullVersionHistory(getAll=True) returns 192 entries (now more ...). The reason is that a result can be shorter than 'revCount' even when 'query-continue' is returned, because the API truncates responses that exceed its size limit: {u'result':{u'*': u'This result was truncated because it would otherwise be larger than the limit of 12582912 bytes'}}
So it is not enough to check only that len() < revCount to declare that thisHistoryDone = True.
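To illustrate the point, here is a minimal sketch (not the merged patch) that compares the two stopping rules against a simulated API. The helper names fetch_until_short_batch, fetch_all_revisions and make_fake_api are hypothetical, the 'rvstartid' continuation key is an assumption about the response shape, and the code is written for Python 3 rather than the Python 2 used by the trunk code. The fake API truncates its first batch to 192 entries while still returning 'query-continue'.

```python
def make_fake_api(total, truncated_size):
    """Hypothetical stand-in for the API: serves revision ids total..1,
    newest first, and truncates the first batch to truncated_size."""
    def fetch_batch(start_id, limit):
        start = total if start_id is None else start_id
        size = truncated_size if start_id is None else limit
        ids = list(range(start, max(start - size, 0), -1))
        result = {'query': {'pages': {'1': {
            'revisions': [{'revid': i} for i in ids]}}}}
        next_id = start - len(ids)
        if next_id >= 1:
            # More revisions remain, so a continuation marker is returned.
            result['query-continue'] = {'revisions': {'rvstartid': next_id}}
        return result
    return fetch_batch


def fetch_until_short_batch(fetch_batch, rev_limit=500):
    """Length-based rule: a batch shorter than rev_limit ends the loop."""
    revisions, start_id = [], None
    while True:
        result = fetch_batch(start_id, rev_limit)
        batch = list(result['query']['pages'].values())[0]['revisions']
        revisions.extend(batch)
        if len(batch) < rev_limit or 'query-continue' not in result:
            return revisions
        start_id = result['query-continue']['revisions']['rvstartid']


def fetch_all_revisions(fetch_batch, rev_limit=500):
    """Continuation-based rule: only a missing 'query-continue' ends the loop."""
    revisions, start_id = [], None
    while True:
        result = fetch_batch(start_id, rev_limit)
        batch = list(result['query']['pages'].values())[0]['revisions']
        revisions.extend(batch)
        if 'query-continue' not in result:
            return revisions
        start_id = result['query-continue']['revisions']['rvstartid']


api = make_fake_api(total=1200, truncated_size=192)
print(len(fetch_until_short_batch(api)))  # stops after the truncated batch
print(len(fetch_all_revisions(api)))      # follows every continuation
```

With these simulated numbers the length-based rule stops at the truncated first batch, while the continuation-based rule keeps going until the API no longer offers 'query-continue'.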
--- Comment #11 from Gerrit Notification Bot gerritadmin@wikimedia.org --- Change 105619 merged by jenkins-bot: (bug 55160) Page._getVersionHistory returns only a part of a history
https://gerrit.wikimedia.org/r/105619
xqt info@gno.de changed:
What       |Removed         |Added
----------------------------------------------------------------------------
Status     |PATCH_TO_REVIEW |RESOLVED
Resolution |---             |FIXED