Benjavalero created this task.
Benjavalero added a subscriber: Benjavalero.
Benjavalero added a project: pywikibot-core.
Benjavalero changed Security from none to none.
TASK DESCRIPTION
In this issue I propose a patch that improves the algorithm for replacing text with exceptions. I have found that the current algorithm processes the exception regexes many times, which can be avoided. In my tests on pages with long text, the patch gives significant performance improvements.
I understand that this patch touches a critical part of the pywikibot code, so unit tests should be provided, but I am a Python newbie, and I am afraid that testing in Python is a little beyond my knowledge without some guidance.
I hope you can test the patch and confirm my results.
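As a rough illustration of the idea described above (scanning each exception regex only once), here is a minimal sketch. It is not the attached patch and not pywikibot's actual code: the function name `replace_outside_exceptions` and its parameters are hypothetical, and it assumes the exceptions are plain regex strings and that matches of the search pattern are non-empty.

```python
import re


def replace_outside_exceptions(text, old, new, exceptions):
    """Replace regex `old` with `new`, skipping matches that overlap an exception.

    Hypothetical helper for illustration only; not part of pywikibot.
    """
    pattern = re.compile(old)
    # Scan every exception regex exactly once and record the spans it covers.
    protected = sorted(
        m.span() for exc in exceptions for m in re.finditer(exc, text)
    )
    pieces = []
    last = 0  # end of the text already copied into `pieces`
    idx = 0   # first protected span that can still overlap a later match
    for match in pattern.finditer(text):
        start, end = match.span()
        # Drop protected spans that end before this match begins.
        while idx < len(protected) and protected[idx][1] <= start:
            idx += 1
        # If the next protected span starts before the match ends, they overlap.
        if idx < len(protected) and protected[idx][0] < end:
            continue
        pieces.append(text[last:start])
        pieces.append(match.expand(new))  # honours group references such as \1
        last = end
    pieces.append(text[last:])
    return "".join(pieces)


sample = "keep <nowiki>foo</nowiki> but change foo"
result = replace_outside_exceptions(
    sample, r"foo", "bar", [r"<nowiki>.*?</nowiki>"]
)
```

Because both the matches and the protected spans are visited in increasing order, the text is walked once per regex rather than once per replacement candidate.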
TASK DETAIL
https://phabricator.wikimedia.org/T85037
REPLY HANDLER ACTIONS
Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: Benjavalero
Cc: Aklapper, Benjavalero, jayvdb, pywikipedia-bugs
jayvdb added a comment.
Well, ideally neither is 'moved' as such. Instead, -yesterday and -recentchanges are replaced with standard pagegenerator arguments (e.g. "-uploadlog") where those achieve the same result, and generic pagegenerators / filters are used where they do not.
To provide backwards compatibility for '-yesterday', you can do something like:
    if arg == '-yesterday':
        gen.handleArg('-uploadlog')
Then you need to restrict the date range of the upload log to 'yesterday'. To restrict the date, add an EdittimeFilterPageGenerator after calling getCombinedGenerator.
However, EdittimeFilterPageGenerator currently keeps consuming entries even once they fall outside of the date range (which could take forever); we need an option to tell it to stop when it first encounters a date outside of the range.
Replacing the custom -recentchanges might be a bit harder to do, so tackle that after you've done -yesterday; otherwise I was planning to create it as another task. We could replace it with the standard arguments "-recentchanges -ns:6", but the delay=120 means the current implementation is not fetching a lot of records from the start of the recentchanges log.
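The "stop at the first date outside the range" behaviour could look roughly like the sketch below. It deliberately does not use pywikibot: `entries_in_range` and `yesterday_bounds` are hypothetical names, entries are modelled as plain (title, timestamp) tuples, and the input is assumed to be ordered newest first, as a log listing usually is.

```python
from datetime import datetime, timedelta


def yesterday_bounds(now):
    """Return (start, end) datetimes covering the calendar day before `now`."""
    today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return today - timedelta(days=1), today


def entries_in_range(entries, start, end):
    """Yield (title, timestamp) pairs with start <= timestamp < end.

    Assumes `entries` is ordered newest first, so iteration can stop at the
    first entry older than `start` instead of consuming the whole log.
    """
    for title, timestamp in entries:
        if timestamp >= end:
            continue  # too new: the range has not been reached yet
        if timestamp < start:
            break  # too old: every later entry is older still
        yield title, timestamp


log = iter([
    ("File:A.jpg", datetime(2014, 12, 20, 9, 0)),
    ("File:B.jpg", datetime(2014, 12, 19, 23, 0)),
    ("File:C.jpg", datetime(2014, 12, 19, 1, 0)),
    ("File:D.jpg", datetime(2014, 12, 18, 22, 0)),
    ("File:E.jpg", datetime(2014, 12, 17, 8, 0)),
])
start, end = yesterday_bounds(datetime(2014, 12, 20, 15, 30))
titles = [title for title, _ in entries_in_range(log, start, end)]
remaining = [title for title, _ in log]  # entries never examined by the filter
```

Only one out-of-range entry (File:D.jpg) is read before the filter stops; the rest of the log is left untouched.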
(and give it a decent module docstring)
TASK DETAIL
https://phabricator.wikimedia.org/T67192
To: jayvdb
Cc: pywikipedia-bugs, valhallasw, Multichill, Ricordisamoa, jayvdb, Liuxinyu970226, Daviskr
Daviskr added a comment.
Should `recentChanges` be moved as well or just `uploadedYesterday`?
Also, should `uploadedYesterday` be renamed to `UploadedYesterdayPageGenerator` or something more generic that would allow the user to specify a range?
TASK DETAIL
https://phabricator.wikimedia.org/T67192
To: Daviskr
Cc: pywikipedia-bugs, valhallasw, Multichill, Ricordisamoa, jayvdb, Liuxinyu970226, Daviskr