[Mediawiki-l] Bot for rewriting external links as wiki links?

Thu May 24 20:20:05 UTC 2012

On 21.05.2012 18:55, Platonides wrote:
> On 21/05/12 18:41, David J. Biesack wrote:
>> Is there a bot which will repair external URL coded links to the same wiki and rewrite them as wiki links?
>> For example, if MediaWiki were installed in www.example.com/wiki and pages contained
>> links using external link notation
>>    blah http://www.example.com/wiki/Some_page blah
>>    blah [http://www.example.com/wiki/Some_page some text] blah
>>    blah [http://www.example.com/wiki/Some_page#some_section some text] blah
>> then the "What links here" feature would not work because such links are not recognized.
>> The bot would replace those with wiki links:
>>   blah [[Some page]] blah
>>   blah [[Some page | some text]] blah
>>   blah [[Some page#some_section | some text]] blah or  blah {{there|some section|Some page | some text}} blah
>> In addition, if the link is to a section of the current page, i.e. on [[Some page]],
>> the second link would be updated as
>>   {{here|some section|some text}}
>> There may need to be some way to disable this (i.e. in help or project pages
>> that discuss link notation).
> You could perform it using pywikipedia and some ugly regex.
It's http://www.mediawiki.org/wiki/Manual:Pywikipediabot
and the regex shouldn't be that ugly, at least if you don't try to do
everything in one pass. I'd start with
"\[http://www\.example\.com/wiki/([^ #\]]+)#([^ \]]+) ([^\]]+)\]" "[[\1#\2 | \3]]"
(or "{{there|\2|\1 | \3}}" if you prefer. I don't know of any way to check
whether the link points to a section of the current page, though; maybe with
Extension:ParserFunctions, {{#ifeq: {{FULLPAGENAMEE}} | \1 | {{here|\2|\3}} | {{there|\2|\1 | \3}} }},
plus some kind of substitution (safesubst?). One of the
guys on our wiki could probably do this.)
"\[http://www\.example\.com/wiki/([^ \]]+) ([^\]]+)\]" "[[\1 | \2]]"
and last:
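"http://www\.example\.com/wiki/([^ \]]+)" "[[\1]]"
(the title has to stop at a space or "]"; a lazy (.*?) at the very end of a
pattern matches the empty string, so the page name would be lost)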
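To try the three replacements locally before letting the bot loose, here is a minimal Python sketch. It is not part of pywikipedia; it assumes the article path http://www.example.com/wiki/, uses slightly tightened patterns (escaped dots, no `.*?` that can run across several links), and applies them in the same order as above. Because a standalone script knows which page it is editing, it also shows one possible way to choose between {{here}} and {{there}}. The names BASE, normalize and rewrite are made up for illustration, and {{here}}/{{there}} are assumed to be local templates taking the parameters described in the question.

```python
import re

# Assumption: the wiki's article path; change to match your installation.
BASE = re.escape("http://www.example.com/wiki/")

# [http://.../Page#section text]
SECTION_LINK = re.compile(r"\[" + BASE + r"([^\s#\]]+)#([^\s\]]+) ([^\]]+)\]")
# [http://.../Page text]
LABELLED_LINK = re.compile(r"\[" + BASE + r"([^\s#\]]+) ([^\]]+)\]")
# bare http://.../Page (optionally with #section)
BARE_URL = re.compile(BASE + r"([^\s\]]+)")


def normalize(title):
    """Rough MediaWiki title normalization: underscores become spaces and
    the first letter is upper-cased (assumes the default $wgCapitalLinks)."""
    title = title.replace("_", " ").strip()
    return title[:1].upper() + title[1:]


def rewrite(text, current_title=None):
    """Apply the three substitutions in order.

    If current_title is given, section links to that page become
    {{here|section|text}} and other section links {{there|section|Page|text}};
    otherwise section links become [[Page#section | text]].
    """
    def section_repl(m):
        page, section, label = m.group(1), m.group(2), m.group(3)
        if current_title is None:
            return "[[%s#%s | %s]]" % (page, section, label)
        section = section.replace("_", " ")
        if normalize(page) == normalize(current_title):
            return "{{here|%s|%s}}" % (section, label)
        return "{{there|%s|%s|%s}}" % (section, normalize(page), label)

    # Order matters: the bracketed forms go first, otherwise BARE_URL
    # would rewrite the URL inside the brackets and leave stray text.
    text = SECTION_LINK.sub(section_repl, text)
    text = LABELLED_LINK.sub(r"[[\1 | \2]]", text)
    text = BARE_URL.sub(r"[[\1]]", text)
    return text
```

For example, rewrite("blah [http://www.example.com/wiki/Some_page some text] blah") gives "blah [[Some_page | some text]] blah". The sketch does not handle bracketed links without a label, percent-encoded titles, index.php?title= URLs, or text inside <nowiki>/<pre>, so check its output on a few pages first.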

Since you can tell the bot which pages to work on, you can exclude pages 
in help and project namespaces by simply telling it to work on the 
article namespace:

replace.py -start:! -regex "\[http://www\.example\.com/wiki/([^ #\]]+)#([^ \]]+) ([^\]]+)\]" "[[\1#\2 | \3]]" -summary:"internal links format"

Maybe http://www.mediawiki.org/wiki/Extension:Replace_Text could do the
trick too, but I haven't tried that extension yet.

