The getReferences() function needs to be rewritten because of a recent change to the What links here page: it now adds '(inclusion)' to mark pages that use the target as a template. The current regex counts that marker as a redirect.
I am currently rewriting this function, but in case anyone wants to beat me to it, I suggest the following all-encompassing regular expression:
re.compile(r'<li><a href=".*?" title=".*?">(.*?)</a> *\(*(inclusion|redirect page)*\)*.*?</li>')
group(1) will give you the title, and group(2) of the search will be one of: None (no marker), 'inclusion', 'redirect page'
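As a rough illustration of how the two groups behave, here is a minimal sketch that runs the suggested pattern (with the literal parentheses escaped) over a few hand-written list items. The sample HTML and the classify() helper are assumptions made up for this example; the markup actually served by Special:Whatlinkshere may differ. Note that Python reports an unmatched group as None, so the sketch normalizes it to an empty string.

```python
import re

# The suggested pattern, written as a raw string with escaped parentheses.
REF_RE = re.compile(
    r'<li><a href=".*?" title=".*?">(.*?)</a>'
    r' *\(*(inclusion|redirect page)*\)*.*?</li>')

# Hypothetical sample lines; the real Whatlinkshere HTML may differ.
SAMPLE = '''<li><a href="/wiki/Alpha" title="Alpha">Alpha</a></li>
<li><a href="/wiki/Beta" title="Beta">Beta</a> (inclusion)</li>
<li><a href="/wiki/Gamma" title="Gamma">Gamma</a> (redirect page)</li>'''


def classify(html):
    """Return (title, marker) pairs; marker is '' when no marker is present."""
    # group(2) is None when the optional marker did not match at all.
    return [(m.group(1), m.group(2) or '') for m in REF_RE.finditer(html)]


if __name__ == '__main__':
    for title, marker in classify(SAMPLE):
        print(title, repr(marker))
```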
"Jason Y. Lee" jylee@cs.ucr.edu wrote:
> The getReferences() function needs to be rewritten because of a recent change to the What links here page: it now adds '(inclusion)' to mark pages that use the target as a template.
I checked in a revision to getReferences() last night, but it only deals with a different recent change in the software (Special:Whatlinkshere now uses a "from" parameter in place of the formerly used "offset" parameter), not this one. So anyone who wants to deal with the issue raised by Jason ought to start with the version checked in on 03-Jan-06.
Russ
"Jason Y. Lee" jylee@cs.ucr.edu wrote:
> I am currently rewriting this function, but in case anyone wants to beat me to it, I suggest the following all-encompassing regular expression:
> re.compile(r'<li><a href=".*?" title=".*?">(.*?)</a> *\(*(inclusion|redirect page)*\)*.*?</li>')
> group(1) will give you the title, and group(2) of the search will be one of: None (no marker), 'inclusion', 'redirect page'
This will only work in the English Wikipedia. In es.wikipedia.org, for example, group(2) would be "página redirigida" for a redirect page, although it is still "inclusion" (no accented character, interestingly) for a template inclusion.
On Fri, Jan 06, 2006 at 10:22:46AM -0500, Russell Blau wrote:
> This will only work in the English Wikipedia. In es.wikipedia.org, for example, group(2) would be "página redirigida" for a redirect page, although it is still "inclusion" (no accented character, interestingly) for a template inclusion.
> Wikibots-l mailing list Wikibots-l@wikimedia.org http://mail.wikipedia.org/mailman/listinfo/wikibots-l
That's because it relies on the system message pages:
[[MediaWiki:Isredirect]] for 'redirect page'
[[MediaWiki:Istemplate]] for 'inclusion'
-- Jason Y. Lee AKA AllyUnion
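Since the marker text comes from per-wiki system messages, one way to keep the pattern language-independent would be to build it from those labels instead of hard-coding the English strings. The sketch below is only an illustration: build_ref_regex() is a hypothetical helper, fetching the contents of MediaWiki:Isredirect and MediaWiki:Istemplate is not shown, and the pattern is a slightly tightened variant of the one suggested earlier in the thread rather than the code that was checked in.

```python
import re


def build_ref_regex(redirect_label, inclusion_label):
    """Build a Whatlinkshere pattern from a wiki's own marker labels.

    The labels are assumed to be the text of MediaWiki:Isredirect and
    MediaWiki:Istemplate for the wiki being parsed.
    """
    # Escape the labels so that any regex metacharacters are taken literally.
    labels = '|'.join(re.escape(label)
                      for label in (inclusion_label, redirect_label))
    return re.compile(
        r'<li><a href=".*?" title=".*?">(.*?)</a>'
        r' *(?:\((' + labels + r')\))?.*?</li>')


# Labels reported in this thread for es.wikipedia.org.
ES_RE = build_ref_regex('página redirigida', 'inclusion')

# Hypothetical sample line; the real page markup may differ.
ES_LINE = '<li><a href="/wiki/X" title="X">X</a> (página redirigida)</li>'

if __name__ == '__main__':
    match = ES_RE.search(ES_LINE)
    print(match.group(1), match.group(2))
```

With this approach group(2) is still None for a plain link, so callers need the same None check as with the English-only pattern.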
"Jason Y. Lee" jylee@cs.ucr.edu wrote:
> The getReferences() function needs to be rewritten because of a recent change to the What links here page: it now adds '(inclusion)' to mark pages that use the target as a template. The current regex counts that marker as a redirect.
OK, I've checked in a patch that seems to work in my testing. The key is that a (redirect) link is followed by a <ul> tag, while an (inclusion) link is not. I've also updated the code to check for false-positive redirects; this deals with things like redirect templates (#REDIRECT [[Fubar]] {{R from misspelling}}), which MediaWiki incorrectly treats as a redirect to *every* page linked from the included template.
In fact, if you had a page that read "#REDIRECT [[Foo]] [[Bar]]", it would show up on [[Special:Whatlinkshere/Bar]] as a redirect, even though the link to [[Bar]] is invisible to users unless they try to edit the redirect page. Weird. This patch hopefully will ignore such items on the Whatlinkshere page, although at the cost of slower performance (more page loads).
If you find any bugs, please let me know (here or on en:User talk:RussBlau).
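To make the two checks described above concrete, here is a small sketch. It is not the checked-in patch: the function names, the sample strings, and the title normalization are assumptions for illustration only. The first function applies the "followed by a <ul>" test to a single list entry; the second treats only the first link after #REDIRECT as the real target, so later links such as [[Bar]] in "#REDIRECT [[Foo]] [[Bar]]" are ignored.

```python
import re

# Only the first [[...]] after #REDIRECT is taken as the real target.
REDIRECT_RE = re.compile(r'^\s*#REDIRECT\s*:?\s*\[\[([^\]|#]+)', re.IGNORECASE)


def entry_is_redirect(entry_html):
    """Return True if a <li> entry opens a nested <ul> before it closes.

    Assumption: a redirect entry nests the pages linking to the redirect in
    a <ul> inside its own <li>, while an inclusion entry does not.
    """
    nested = entry_html.find('<ul')
    close = entry_html.find('</li>')
    return nested != -1 and (close == -1 or nested < close)


def normalize(title):
    """Simplified title normalization: underscores and first-letter case."""
    title = title.replace('_', ' ').strip()
    return title[:1].upper() + title[1:]


def true_redirect_target(wikitext):
    """Return the title a redirect page really points to, or None."""
    match = REDIRECT_RE.match(wikitext)
    return normalize(match.group(1)) if match else None


def is_real_redirect_to(wikitext, title):
    """Return True only if the page text redirects to the given title."""
    target = true_redirect_target(wikitext)
    return target is not None and target == normalize(title)


if __name__ == '__main__':
    text = '#REDIRECT [[Foo]] [[Bar]]'
    print(is_real_redirect_to(text, 'Foo'), is_real_redirect_to(text, 'Bar'))
```

Checking the target this way requires the wikitext of each candidate redirect, which is where the extra page loads mentioned above come from.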
On Fri, Jan 06, 2006 at 12:33:52PM -0500, Russell Blau wrote:
> OK, I've checked in a patch that seems to work in my testing. The key is that a (redirect) link is followed by a <ul> tag, while an (inclusion) link is not. I've also updated the code to check for false-positive redirects; this deals with things like redirect templates (#REDIRECT [[Fubar]] {{R from misspelling}}), which MediaWiki incorrectly treats as a redirect to *every* page linked from the included template.
> In fact, if you had a page that read "#REDIRECT [[Foo]] [[Bar]]", it would show up on [[Special:Whatlinkshere/Bar]] as a redirect, even though the link to [[Bar]] is invisible to users unless they try to edit the redirect page. Weird. This patch hopefully will ignore such items on the Whatlinkshere page, although at the cost of slower performance (more page loads).
> If you find any bugs, please let me know (here or on en:User talk:RussBlau).
Although, it's interesting to note it counts both as a redirect page and inclusion...
A vandal blanked out a page, and the bot ended up raising the page as a page that didn't exist... I don't remember, but was the functionality of the bot always like that?
Also, avar in IRC mentioned it might be better to determine non-existing pages via Special:Export.
For example: http://en.wikipedia.org/wiki/Special:Export/Non-existing_page
Has XML identical to: http://en.wikipedia.org/wiki/Special:Export/Doesnotexist
And does not match: http://en.wikipedia.org/wiki/Special:Export/User:AllyUnion
Furthermore, this appears to hold for all languages and sites, at least for the several languages and meta that I've tested.
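A minimal sketch of how the Special:Export idea could be checked in code follows. It assumes that the export for a missing title contains no <page> element while the export for an existing title contains one; that assumption is consistent with the comparison above but is not confirmed in this thread. The function name and the sample XML strings are hand-written for illustration, and fetching the export over HTTP is left out.

```python
import xml.etree.ElementTree as ET


def export_has_page(export_xml):
    """Return True if the export XML contains at least one <page> element."""
    root = ET.fromstring(export_xml)
    # Export XML is namespaced, so compare local tag names only.
    return any(el.tag.rsplit('}', 1)[-1] == 'page' for el in root.iter())


# Hand-written samples; real Special:Export output carries more detail.
MISSING_XML = (
    '<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">'
    '<siteinfo><sitename>Wikipedia</sitename></siteinfo>'
    '</mediawiki>')

EXISTING_XML = (
    '<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">'
    '<siteinfo><sitename>Wikipedia</sitename></siteinfo>'
    '<page><title>User:AllyUnion</title></page>'
    '</mediawiki>')

if __name__ == '__main__':
    print(export_has_page(MISSING_XML), export_has_page(EXISTING_XML))
```

Under the same assumption, a blanked page would presumably still export a <page> element with empty text, which would let a bot tell "blanked" apart from "does not exist".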
On Sat, Jan 07, 2006 at 04:11:44AM -0800, Jason Y. Lee wrote:
> A vandal blanked out a page, and the bot ended up raising the page as a page that didn't exist... I don't remember, but was the functionality of the bot always like that?
-- Jason Y. Lee AKA AllyUnion
I'm in the first stage of converting {{Infobox Country}} templates, and a bad conversion created a page. I erased the contents of my bad page, and now I'm getting an EditConflict on that page, 4 times in a row so far. I'm testing with "if ( not newpage.exists() ) or newpage.isEmpty():" before "newpage.put( newpage_text )". I wonder if there is a relationship between the Page.exists() behavior discussed here and the EditConflict. (User:SEWilco)