jayvdb created this task. jayvdb added subscribers: Evanontario, jayvdb, pywikipedia-bugs, Jsalsman, Halfak. jayvdb added a project: pywikibot-core. Restricted Application added a subscriber: Aklapper.
TASK DESCRIPTION wikiwho currently depends on https://bitbucket.org/halfak/wikimedia-utilities, which is great at XML dump processing but has limited API support.
It would be useful to integrate wikiwho with pywikibot to work on live revisions from the wiki.
TASK DETAIL https://phabricator.wikimedia.org/T89763
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: jayvdb Cc: Halfak, Jsalsman, jayvdb, Aklapper, Evanontario, pywikipedia-bugs
Halfak added a comment.
Eek! And that's an old unmaintained library! See https://github.com/halfak/MediaWiki-Utilities for the current version.
Halfak added a comment.
I also implement authorship tracking (http://pythonhosted.org/mediawiki-utilities/lib/persistence.html#mw-lib-pers...) and WikiWho's diffing strategy (http://pythonhosted.org/deltas/detection.html#module-deltas.detection.segmen...).
I can promise to respond to bugs and feature requests quickly. :)
jayvdb added a comment.
@halfak, wikiwho imports from 'wmf', which appears not to exist in the current version. I can also see a few other dependencies on the old version. The current version (packaged at https://pypi.python.org/pypi/mediawiki-utilities) does look better, but it appears not to be backwards compatible. Sounds like the first step should be to upgrade the wikiwho code to work with the current mediawiki-utilities...? If so, we can create a new task for that. ;-)
Halfak added a comment.
@jayvdb agreed. mediawiki-utilities is a compatibility-breaking change with substantial improvements in performance.
Note also that mediawiki-utilities is only compatible with Python 3.x.
Evanontario added a comment.
If you give me some microtasks on this or the other thread (Accuracy Review) I'll be happy to do/attempt them @jayvdb. You told me to give you a poke if there were none created by now. I did notice that it looks like the primary mentor for Accuracy Review is concerned about whether the project is a 2-3 week contribution.
jayvdb closed blocking task T89764: Implement wikimedia-utilities Revision methods as "Declined".
Jsalsman added a comment.
@halfak, can you show a code outline of how to use http://pythonhosted.org/mediawiki-utilities/lib/persistence.html#mw-lib-pers... to obtain the age of a given unique word? For the purposes of an example, let's say the article is titled "Economy of Jakarta" and the string in question is "GRDP (Gross Regional Domestic Product) was Rp. 566 trillion". For starters, does it begin with dumps and annotate them, or does it do a WikiBlame-style binary search on revisions?
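For reference, the WikiBlame-style binary search mentioned above could be sketched roughly as follows. This is only an illustration, not code from wikiwho, pywikibot, or mediawiki-utilities: the function name first_revision_containing and the list of (timestamp, text) pairs are hypothetical, and it assumes that once the string appears it stays in every later revision, which reverts and rewrites can violate.

```python
def first_revision_containing(revisions, needle):
    """Return the timestamp of the earliest revision containing `needle`.

    `revisions` is a hypothetical list of (timestamp, text) pairs, oldest
    first. Assumes the string, once added, is present in every later
    revision; otherwise binary search may land on the wrong revision.
    """
    if not revisions or needle not in revisions[-1][1]:
        return None  # the string is not in the latest revision at all

    lo, hi = 0, len(revisions) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if needle in revisions[mid][1]:
            hi = mid       # present here, so the first hit is at mid or earlier
        else:
            lo = mid + 1   # absent here, so the first hit is later
    return revisions[lo][0]


history = [
    ("t1", "Jakarta is a city."),
    ("t2", "Jakarta is a large city."),
    ("t3", "Jakarta is a large city. GRDP was Rp. 566 trillion."),
    ("t4", "Jakarta is a large city. GRDP was Rp. 566 trillion in 2010."),
]
print(first_revision_containing(history, "GRDP was Rp. 566 trillion"))
```

The search inspects about log2(n) revisions instead of all n, at the cost of the monotonicity assumption stated above.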
Halfak added a comment.
@Jsalsman
The libraries I linked to are at a higher level of abstraction than I think you are imagining. The implementation details (like using binary search and indexes) are up to you. But since you asked for some code, here's a simple strategy that would generate the answer on demand. Note that I reworked the example to use a real snippet from an article so I could run it to make sure it worked too.
from mw import api
from mw.lib import persistence

# Initialize api session and page state
session = api.Session("https://en.wikipedia.org/w/api.php")
page_state = persistence.State()

# Query for the page's revisions
rev_docs = session.revisions.query(titles={"Antoine Beauvilliers"},
                                   properties={"content", "user", "timestamp", "sha1"},
                                   direction="newer")

# Use the page_state to process the revisions (and store the revision's timestamps)
last_tokens = None
for rev_doc in rev_docs:
    tokens, _, _ = page_state.process(rev_doc.get("*", ""),
                                      rev_doc['timestamp'],
                                      checksum=rev_doc['sha1'])
    last_tokens = tokens

# This gnarly bit of code is just used to find the specific tokens we are looking for
expected = "Of humble parentage, Beauvilliers worked his way up from kitchen boy"
len_expected = len(persistence.tokenization.wikitext_split(expected))
match_ranges = [(i, i+len_expected) for i in range(len(last_tokens))
                if "".join(t.text for t in last_tokens[i:i+len_expected]) == expected]

# Print out the tokens and the first revision they appeared in
for start, end in match_ranges:
    for token in last_tokens[start:end]:
        if len(token.text.strip()) == 0:
            continue
        print("'{0}' was added {1}".format(token.text, token.revisions[0]))
The output looks like this:
'Of' was added 2013-05-24T20:07:27Z
'humble' was added 2013-06-01T05:39:29Z
'parentage' was added 2013-05-24T20:07:27Z
',' was added 2013-05-24T20:07:27Z
'Beauvilliers' was added 2013-05-24T20:07:27Z
'worked' was added 2014-08-29T08:26:55Z
'his' was added 2014-08-29T08:26:55Z
'way' was added 2014-08-29T08:26:55Z
'up' was added 2014-08-29T08:26:55Z
'from' was added 2014-08-29T08:26:55Z
'kitchen' was added 2014-08-29T08:26:55Z
'boy' was added 2014-08-29T08:26:55Z
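To make the idea of per-token "first seen" tracking concrete without needing the library or network access, here is a minimal standalone sketch. It is not how mediawiki-utilities implements persistence.State: it swaps in the standard library's difflib.SequenceMatcher for the diffing step, uses a made-up regex tokenizer, and ignores reverts and moved text. The names tokenize and first_seen are hypothetical, and the revision texts and short dates are invented for illustration.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    # Hypothetical tokenizer: words, whitespace runs, single punctuation marks.
    return re.findall(r"\w+|\s+|[^\w\s]", text)


def first_seen(revisions):
    """Map each token of the last revision to the revision that introduced it.

    `revisions` is an iterable of (timestamp, text) pairs, oldest first.
    Returns a list of (token_text, timestamp) pairs for the final revision.
    """
    current = []  # (token_text, timestamp_first_seen) for the latest text
    for timestamp, text in revisions:
        new_tokens = tokenize(text)
        old_tokens = [tok for tok, _ in current]
        matcher = SequenceMatcher(None, old_tokens, new_tokens, autojunk=False)
        updated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged tokens keep the timestamp they already carry.
                updated.extend(current[i1:i2])
            elif tag in ("insert", "replace"):
                # Newly introduced tokens are stamped with this revision.
                updated.extend((tok, timestamp) for tok in new_tokens[j1:j2])
            # "delete" contributes nothing to the new token list.
        current = updated
    return current


revs = [
    ("2013-05-24", "Of parentage, Beauvilliers"),
    ("2013-06-01", "Of humble parentage, Beauvilliers"),
]
for token, added in first_seen(revs):
    if token.strip():
        print("'{0}' was added {1}".format(token, added))
```

Because this sketch only compares each revision with the one before it, text that is removed and later restored would be stamped with the restoring revision, which is one reason the real library also takes a checksum.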
Jsalsman closed this task as "Resolved". Jsalsman claimed this task. Jsalsman added a comment.
@halfak, thank you so much for this; what a tremendous help!
@jayvdb, since that clearly works for our purposes, I'm resolving this. And since it clearly means there is no need to touch the dumps, I will take all the dump-related requirements and mentions out of https://phabricator.wikimedia.org/T89416
jayvdb reopened this task as "Open". jayvdb added a comment.
As far as I know, wikiwho functionality (whether by wikiwho or wikimedia-utilities) has not been integrated into pywikibot, which is what this task is about. Reopening.
jayvdb removed a blocked task: T89416: Out-of-date fact and statistics identification and review (was "accuracy review").
Jsalsman placed this task up for grabs. Jsalsman set Security to None.
Jsalsman added a comment.
Is it better to do this in pywikibot or mediawiki-utilities?
Ricordisamoa added a subscriber: Ricordisamoa.