Did anybody implement in MediaWiki a link rot checker for external
links? Here is a theory for how it could work:
When a new page is saved and the parser detects an "http:" pattern,
the external link is stored in a separate database table, with a
pointer to the wiki page where it was harvested. At regular
intervals, all external links are tried (HTTP GET) by a background
process and the success or failure rate is recorded. If a link
becomes unavailable (HTTP error) on three consecutive fetch
attempts, it gets listed on a special page of possibly broken
external links. Broken links from the same website could be
grouped together. Maybe the whole site is broken, has moved or
has been internally reorganized.
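The bookkeeping described above can be sketched in a few lines. This is a minimal illustration, not MediaWiki code: the table name, fields, and the three-failure threshold are the ones proposed in this message, and the actual HTTP GET would be issued by the background process.

```python
FAILURE_THRESHOLD = 3  # three consecutive failed fetches -> "possibly broken"

class LinkTable:
    """Stand-in for the proposed external-links database table."""

    def __init__(self):
        # url -> {"pages": wiki pages that link to it, "failures": count}
        self.links = {}

    def harvest(self, url, page):
        """Called when the parser finds an external link on save."""
        entry = self.links.setdefault(url, {"pages": set(), "failures": 0})
        entry["pages"].add(page)

    def record_fetch(self, url, ok):
        """Called by the background process after each HTTP GET attempt."""
        entry = self.links[url]
        entry["failures"] = 0 if ok else entry["failures"] + 1

    def possibly_broken(self):
        """Group broken links by host, so a whole dead site is easy to spot."""
        by_site = {}
        for url, entry in self.links.items():
            if entry["failures"] >= FAILURE_THRESHOLD:
                host = url.split("/")[2]
                by_site.setdefault(host, []).append(url)
        return by_site
```

A single transient failure resets nothing permanent: the counter only grows on consecutive misses, so a flaky server does not land on the special page.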
Lars Aronsson (lars(a)aronsson.se)
Aronsson Datateknik - http://aronsson.se
I have added an option to Special:Export that will add a list of all
contributors of a page to the XML output. The list is distinct (each
contributor mentioned only once). I also expanded the XML converter to
use this. (Brion et al.: please put the new Special:Export on the live site.)
For my local test site, where the new Special:Export is running, it will
now add a list of all contributors to the output (IPs are omitted). It
will even add people who worked on a template that is used in the page.
Next stop: OpenOffice .ODT output...
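For comparison, the same distinct-contributor list can be derived on the client side from a plain Special:Export dump. This is a hedged sketch: it assumes the standard export XML layout (`<revision><contributor><username>...</username></contributor>`), ignores the XML namespace that real exports carry, and skips anonymous edits, which appear as `<ip>` rather than `<username>`.

```python
import xml.etree.ElementTree as ET

def distinct_contributors(xml_text):
    """Return each named contributor once, in order of first edit.

    Anonymous (IP) contributors have no <username> element and are
    therefore omitted, matching the behaviour described above.
    """
    root = ET.fromstring(xml_text)
    seen = []
    for username in root.iter("username"):
        if username.text not in seen:
            seen.append(username.text)
    return seen
```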
At svwiki, there is a user account that has made no edits. However, the user
account sends wikimail - nasty ones. I have so far received three of them,
and this is in no way unusual. IP-check is being performed to see if the
account is a sockpuppet of another registered user.
Is it possible to block the user's ability to send wikimail? A block doesn't
do that; blocked users can still send wikimail.
[repost, sorry if it ends duplicated]
it seems to me that there are some inconsistencies between at least the
page and revision tables, in the 20060303 enwiki dump.
The first problematic page would be page_id 12, Anarchism (sorry for the
raw mysql formatting):
| page_id | page_namespace | page_title | page_restrictions | page_counter | page_is_redirect | page_is_new | page_random | page_touched | page_latest | page_len |
| 12 | 0 | Anarchism | | 5252 | 0 | 0 | 0.786172332974311 | 20060303031540 | 41982999 | 67537 |
which indicates a revision # 41982999.
But there is no line with rev_id=41982999 in the revision table.
(these can be verified by grepping for 41982999 directly in
enwiki-20060303-pages-articles.xml.bz2 and in
- am I missing something here ?
- it might be that the revision has changed between the dumps of those 2
tables (page has been edited)
- the inconsistency results in empty pages (i.e. with the usual stub text)
for ~ 5% of the pages (that seems huge, but I don't see where the problem lies)
- is it a temporary problem (I don't recall getting so many empty
articles with earlier dumps) ?
- is there a simple way to fix it ? (if no better idea emerges, I will
try to fix the page_latest column in the page table by doing a lookup on
rev_page in the revision table - is it right ?)
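The repair proposed in the last bullet can be sketched as follows. This is an illustration with in-memory dicts standing in for the page and revision tables, not a script for the real database: for each page whose stored page_latest points at a missing revision, the value is replaced by the highest rev_id whose rev_page refers to that page.

```python
def repair_page_latest(pages, revisions):
    """pages: {page_id: page_latest}; revisions: list of (rev_id, rev_page).

    Returns a corrected {page_id: page_latest} mapping. Pages whose
    stored revision exists are left alone; dangling ones are re-pointed
    at the newest revision found for that page (None if there is none).
    """
    known = {rev_id for rev_id, _ in revisions}
    latest = {}
    for rev_id, rev_page in revisions:
        if rev_id > latest.get(rev_page, 0):
            latest[rev_page] = rev_id
    return {
        page_id: plat if plat in known else latest.get(page_id)
        for page_id, plat in pages.items()
    }
```

Note the caveat from the earlier bullet still applies: if the two tables were dumped at different times, the "repaired" page_latest may simply be one edit behind, not wrong.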