Multichill created this task. Multichill assigned this task to Pywikibugs. Multichill added a subscriber: Multichill. Multichill added projects: pywikibot-core, Pywikibot-Wikidata. Multichill changed Security from none to none.
TASK DESCRIPTION At the moment HarvestRobot.treat() in harvest_template.py always loads the item (item.get()). This is quite inefficient if there is no data to import. The bot should first check whether there is (valid) data to import and load the item only if there is. If there is nothing to import, the bot shouldn't load the item. This should dramatically increase the speed of the bot.
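A minimal sketch of the ordering proposed above, not the actual harvest_template.py code. FakeItem, extract_candidates and this treat() are hypothetical stand-ins for illustration, and the sketch assumes the template parameters have already been read from the page:

```python
class FakeItem:
    """Hypothetical stand-in for a Wikidata item; counts expensive loads."""

    def __init__(self, claims):
        self._claims = claims
        self.loads = 0

    def get(self):
        # Simulates the expensive item.get() round trip.
        self.loads += 1
        self.claims = dict(self._claims)
        return self.claims


def extract_candidates(template_params, field_map):
    """Return {property_id: value} for non-empty, mapped template fields."""
    candidates = {}
    for field, value in template_params.items():
        value = value.strip()
        if field in field_map and value:
            candidates[field_map[field]] = value
    return candidates


def treat(item, template_params, field_map):
    """Load the item only when the template offers something to import."""
    candidates = extract_candidates(template_params, field_map)
    if not candidates:
        return {}  # nothing to import, so item.get() is never called
    item.get()
    # Keep only properties the item does not already have a claim for.
    return {p: v for p, v in candidates.items() if p not in item.claims}
```

With this ordering, a page whose template has no usable fields costs no item load at all; the item is fetched only once there is at least one candidate value.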
TASK DETAIL https://phabricator.wikimedia.org/T76391
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
To: Pywikibugs, Multichill Cc: pywikipedia-bugs, Multichill, jayvdb
Multichill added a project: performance.
jayvdb added a subscriber: jayvdb. jayvdb added a comment.
item.get is needed before item.claims can be accessed (on the next line).
We could:
1) replace item.claims with a different API call that only gets the list of properties used on the item
2) extend option 1 to be a generic approach to lazy load item data
3) move "has claims for all properties" check further down in the process.
The problem with option 3 is that immediately after this check, harvest_template does a page.get(), and page.get() is probably more expensive than item.get(), at least on English Wikipedia, where article text size exceeds typical Wikidata item JSON size. This may not be true for smaller wikis where the average article text size is smaller (but I would expect it is true for most of the top 10 Wikipedias).
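Options 1 and 2 could look roughly like the sketch below. LazyItem and the two fetch callables are hypothetical names used only for illustration; they are not part of pywikibot, and the cheap "property list only" call is assumed to exist:

```python
class LazyItem:
    """Hypothetical item wrapper that defers loading until data is needed."""

    def __init__(self, fetch_property_ids, fetch_full):
        self._fetch_property_ids = fetch_property_ids  # cheap call (option 1)
        self._fetch_full = fetch_full                  # expensive full load
        self._property_ids = None
        self._claims = None

    @property
    def property_ids(self):
        # Fetched once, on first access, without loading the full item.
        if self._property_ids is None:
            if self._claims is not None:
                self._property_ids = set(self._claims)
            else:
                self._property_ids = set(self._fetch_property_ids())
        return self._property_ids

    @property
    def claims(self):
        # Full data is loaded only when a caller actually reads it.
        if self._claims is None:
            self._claims = self._fetch_full()
        return self._claims

    def has_all(self, wanted):
        """True if every wanted property already has a claim."""
        return set(wanted) <= self.property_ids
```

The "has claims for all properties" check would then go through has_all(), which needs only the cheap call, while code that reads item.claims triggers the full load on first access.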
To: Pywikibugs, jayvdb Cc: pywikipedia-bugs, Multichill, jayvdb
jayvdb moved this task to Design discussions on the Pywikibot-Wikidata workboard.
Multichill added a project: Wikidata.
To: Pywikibugs, Multichill Cc: pywikipedia-bugs, Multichill, jayvdb, Wikidata-bugs, aude, GWicke
pywikipedia-bugs@lists.wikimedia.org