jayvdb created this task. jayvdb added a subscriber: jayvdb. jayvdb added a project: pywikibot-core. Restricted Application added subscribers: Aklapper, pywikipedia-bugs.
TASK DESCRIPTION linktrail has only been exposed in the API since v1.21. We need to determine how we'll support versions before v1.21.
TASK DETAIL https://phabricator.wikimedia.org/T97630
REPLY HANDLER ACTIONS Reply to comment or attach files, or !close, !claim, !unsubscribe or !assign <username>.
EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: jayvdb Cc: pywikipedia-bugs, jayvdb, Aklapper
jayvdb added a comment.
Copying a comment from https://gerrit.wikimedia.org/r/#/c/207179/3/pywikibot/family.py,cm
If we dig up the history of linktrails, we may be able to deprecate the family definitions without _much_ loss of functionality for older versions, and *increase* our support for older versions at the same time.
We'll need to look at any changes to regex in family.py to see if the commit messages give clues for specific choices made by previous pywikibot contributors.
The values are defined in the language files, but could be overridden by MediaWiki messages (see https://www.mediawiki.org/wiki/Manual:MediaWiki_architecture#Localizing_mess... ). *However*, I believe that overriding linktrail using a MediaWiki: message was disabled for performance reasons.
Some wikis still have the MediaWiki: message, even though it is not used, so that could be a fallback:
- https://fr.wikipedia.org/wiki/MediaWiki:Linktrail
- fr.wikipedia.org/w/api.php?action=query&meta=allmessages&ammessages=linktrail
On WMF wikis, these messages have often been deleted:
- https://de.wikipedia.org/wiki/MediaWiki:Linktrail
- https://de.wikipedia.org/w/api.php?action=query&meta=allmessages&amm...
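A rough sketch of how the MediaWiki: message fallback might be read, assuming the allmessages response uses the older JSON layout (message text under the '*' key, a 'missing' key for deleted messages) and that the message holds a PHP-style pattern such as /^([a-z]+)(.*)$/sD. The function names are hypothetical and are not part of pywikibot:

```python
import re
from urllib.parse import urlencode


def linktrail_query_url(api_url):
    """Build the allmessages query URL for the linktrail message."""
    params = {
        'action': 'query',
        'meta': 'allmessages',
        'ammessages': 'linktrail',
        'format': 'json',
    }
    return api_url + '?' + urlencode(params)


# Assumed shape of the stored message: /^(LETTERS)(.*)$/FLAGS
_PHP_TRAIL = re.compile(r'^/\^\((?P<letters>.+)\)\(\.\*\)\$/[A-Za-z]*$')


def linktrail_from_response(data):
    """Return the letter part of the linktrail message, or None.

    `data` is the decoded JSON response; None is returned when the
    message is missing or does not have the assumed shape.
    """
    for msg in data.get('query', {}).get('allmessages', []):
        if msg.get('name') != 'linktrail' or 'missing' in msg:
            continue
        match = _PHP_TRAIL.match(msg.get('*', ''))
        if match:
            return match.group('letters')
    return None
```

A caller would fetch linktrail_query_url('https://fr.wikipedia.org/w/api.php') itself and fall through to another strategy whenever linktrail_from_response returns None, which would be the case on wikis where the message has been deleted.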
As the value in those language files changed over time, our static hard-wired linktrail definitions in the Family class will be wrong on some older sites. So, what we have is not perfect, and we may be able to build an alternative which is also not perfect, but requires less maintenance.
The link trail was previously always quite close to 'unicode word'; however, there were a lot of problems with using PCRE's 'unicode' functionality, which is why custom sets of permitted letters were added to the link trail per language.
If Python's re unicode word matching is similar to the custom sets of letters in the MediaWiki language files, it could be good enough as a generic fallback for pre 1.21.
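A minimal sketch of such a generic fallback, assuming Python 3 (where str patterns are Unicode-aware by default) and assuming that 'letters only, no digits or underscore' is close enough to the per-language letter sets. The names FALLBACK_LINKTRAIL and split_trail are made up for illustration:

```python
import re

# [^\W\d_] = any Unicode word character except digits and underscore,
# i.e. roughly "a letter in any script".
FALLBACK_LINKTRAIL = re.compile(r'[^\W\d_]*')


def split_trail(text_after_link):
    """Split the text following ']]' into (trail, remainder)."""
    match = FALLBACK_LINKTRAIL.match(text_after_link)
    trail = match.group(0)
    return trail, text_after_link[len(trail):]
```

For example, split_trail('s are fun') gives ('s', ' are fun'). This is only approximate: Python's \w does not cover combining marks, so scripts that rely on them may be split too early.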
jayvdb added a comment.
Another fallback strategy would be to store MW-version- and language-specific information (such as linktrail) on translatewiki, and fetch it from there. That way it is maintainable and reusable. We'd need to talk to the site maintainers about a page naming convention.
jayvdb added a comment.
The change to using a unicode regex was made in https://www.mediawiki.org/wiki/Special:Code/MediaWiki/36253, brought about by https://phabricator.wikimedia.org/T16512.
It would be interesting to see if that regex is 'close' in effect to the previous linktrail regexes, as it might be usable as a default.
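One way to put a number on 'close' is to run both patterns over sample link trails and count how often they extract the same trail. The sketch below is only illustrative: trail_agreement is a hypothetical helper, and the two patterns in the usage note are stand-ins, not the actual regex from r36253 or from any language file.

```python
import re


def _trail(pattern, text):
    """Return the trail that `pattern` matches at the start of `text`."""
    match = pattern.match(text)
    return match.group(0) if match else ''


def trail_agreement(candidate, reference, samples):
    """Fraction of samples on which both regexes extract the same trail."""
    if not samples:
        raise ValueError('samples must not be empty')
    cand = re.compile(candidate)
    ref = re.compile(reference)
    same = sum(1 for s in samples if _trail(cand, s) == _trail(ref, s))
    return same / len(samples)
```

With samples ['s', 'ing.', 'äuser', '2nd', ''], comparing r'[^\W\d_]*' against r'[a-z]*' disagrees only on 'äuser'. Real sample text would have to come from actual wiki pages in each language.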
We will never get perfect parsing of old revisions unless we load the regex from the PHP source code of the MW version in use at the time of the revision, which is an insane problem to solve, and it is unlikely that anyone cares about accuracy that much.