Heise just took down the H Online archive (the English-language version of Heise.de, a computer news site). This has broken a *huge* pile of reference links.
[[Special:LinkSearch]] only shows links in the wikitext - not links inside reference citation templates.
https://www.google.co.uk/search?q=site:en.wikipedia.org+link:h-online.com shows hundreds of links. Argh.
What I need to do is (a) find all the links (b) add archiveurl= (something on archive.org, which seems to have captured the whole site) and archivedate= .
Are there tools that do any of this job?
- d.
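A rough sketch of how step (a) could be scripted against the MediaWiki API: list=exturlusage is the API counterpart of Special:LinkSearch and returns page titles plus the matching URLs. The wildcard pattern, the limit, and the requests-based wrapper below are assumptions, not a tested tool.

    # Sketch: list en.wikipedia pages whose external links match *.h-online.com,
    # via the MediaWiki API's exturlusage module (the API analogue of
    # Special:LinkSearch). Pattern, limit and endpoint are assumptions.
    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def pages_linking_to(domain):
        params = {
            "action": "query",
            "list": "exturlusage",
            "euquery": "*." + domain,  # wildcard host match, as on LinkSearch
            "euprotocol": "http",
            "eulimit": "500",
            "format": "json",
            "continue": "",
        }
        while True:
            data = requests.get(API, params=params).json()
            for hit in data["query"]["exturlusage"]:
                yield hit["title"], hit["url"]
            if "continue" not in data:
                break
            params.update(data["continue"])  # follow API continuation

    for title, url in pages_linking_to("h-online.com"):
        print(title, url)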
Hi David -
Funny you ask... there are not currently any solid ones afaik, but I've been talking with the Internet Archive about building out a bot and trying to achieve community consensus on ENWP to autoreplace deadlinks with archive.org ones. The IA has been crawling all new external links on all Wikimedia projects at least once every couple of hours for months, and has a strong interest in killing off literally all of our dead links. Unless something falls through, I should be bringing a more detailed plan up within maybe five or six weeks.
Best, Kevin Gorman
On Sun, Jan 26, 2014 at 4:10 PM, David Gerard dgerard@gmail.com wrote:
Heise just took down the H Online archive (the English-language version of Heise.de, a computer news site). This has broken a *huge* pile of reference links.
[[Special:LinkSearch]] only shows links in the wikitext - not links inside reference citation templates.
https://www.google.co.uk/search?q=site:en.wikipedia.org+link:h-online.com shows hundreds of links. Argh.
What I need to do is (a) find all the links (b) add archiveurl= (something on archive.org, which seems to have captured the whole site) and archivedate= .
Are there tools that do any of this job?
- d.
On 27 January 2014 00:17, Kevin Gorman kgorman@gmail.com wrote:
Funny you ask... there are not currently any solid ones afaik, but I've been talking with the Internet Archive about building out a bot and trying to achieve community consensus on ENWP to autoreplace deadlinks with archive.org ones. The IA has been crawling all new external links on all Wikimedia projects at least once every couple of hours for months, and has a strong interest in killing off literally all of our dead links. Unless something falls through, I should be bringing a more detailed plan up within maybe five or six weeks.
Yes, I knew you were cooking up something :-) I was just surprised it wasn't the sort of task that people had already automated, or written a nice toolserver bot for, or something.
The ones that use {{cite web}} and variants are pretty simple: you just whack in archiveurl= and archivedate= (preferably as close as possible to any cited accessdate=) ... then double-check by eye, of course. It just gets very tedious and error-prone doing it by hand, cut'n'pasting URLs into the middle of the computer guacamole we lovingly euphemise as "wikitext". VE isn't a much happier method.
- d.
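For the {{cite web}} fix-up itself, a minimal sketch of looking up the closest Wayback capture and producing the archiveurl=/archivedate= pair to paste in; the archive.org availability endpoint and the date handling here are assumptions, and the result still wants double-checking by eye, as above.

    # Sketch: ask archive.org's Wayback availability API for the capture closest
    # to the cited accessdate, and print an archiveurl=/archivedate= pair ready
    # to paste into {{cite web}}. Endpoint behaviour and date formatting are
    # assumptions; double-check the result by eye.
    import requests

    def wayback_params(url, accessdate=None):
        """accessdate as YYYYMMDD steers the lookup towards the cited date."""
        query = {"url": url}
        if accessdate:
            query["timestamp"] = accessdate
        data = requests.get("https://archive.org/wayback/available",
                            params=query).json()
        snap = data.get("archived_snapshots", {}).get("closest")
        if not snap or not snap.get("available"):
            return None  # nothing captured; a human has to find another source
        ts = snap["timestamp"]  # e.g. "20130514120000"
        return snap["url"], "%s-%s-%s" % (ts[:4], ts[4:6], ts[6:8])

    # hypothetical dead reference URL, for illustration only
    found = wayback_params("http://www.h-online.com/open/news/example.html",
                           accessdate="20130601")
    if found:
        print("|archiveurl=%s |archivedate=%s" % found)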
On 26 January 2014 19:38, David Gerard dgerard@gmail.com wrote:
On 27 January 2014 00:17, Kevin Gorman kgorman@gmail.com wrote:
Funny you ask... there are not currently any solid ones afaik, but I've been talking with the Internet Archive about building out a bot and trying to achieve community consensus on ENWP to autoreplace deadlinks with archive.org ones. The IA has been crawling all new external links on all Wikimedia projects at least once every couple of hours for months, and has a strong interest in killing off literally all of our dead links. Unless something falls through, I should be bringing a more detailed plan up within maybe five or six weeks.
Yes, I knew you were cooking up something :-) I was just surprised it wasn't the sort of task that people had already automated, or written a nice toolserver bot for, or something.
The ones that use {{cite web}} and variants are pretty simple: you just whack in archiveurl= and archivedate= (preferably as close as possible to any cited accessdate=) ... then double-check by eye, of course. It just gets very tedious and error-prone doing it by hand, cut'n'pasting URLs into the middle of the computer guacamole we lovingly euphemise as "wikitext". VE isn't a much happier method.
Concur that it's a great idea... but perhaps a WMF Tool Labs tool, instead of Toolserver? Running battle, I know - but so many of the tools I have greatly valued over the years are now pretty much useless, or at least unreliable.
In any case - it would be great to have a bot that did a fair bit of that, but it should probably be manually run to ensure proper matching, kind of like AWB.
Risker/Anne
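A sketch of what a manually run, AWB-style pass might look like on top of the two pieces above, using mwparserfromhell to edit {{cite web}} templates; the helper names and the confirmation/save step are assumptions, not an existing bot.

    # Sketch of a manually confirmed pass, AWB-style: add archive parameters to
    # matching {{cite web}} templates and only keep a change after explicit
    # confirmation. pages_linking_to() and wayback_params() are the hypothetical
    # helpers sketched earlier; fetching and saving wikitext are left out.
    import mwparserfromhell

    def propose_fixes(wikitext, domain="h-online.com"):
        code = mwparserfromhell.parse(wikitext)
        changed = False
        for tpl in code.filter_templates():
            if not tpl.name.matches("cite web") or not tpl.has("url"):
                continue
            url = str(tpl.get("url").value).strip()
            if domain not in url or tpl.has("archiveurl"):
                continue
            found = wayback_params(url)  # mapping any cited accessdate= onto a
            if found is None:            # Wayback timestamp is left out here
                continue
            archiveurl, archivedate = found
            tpl.add("archiveurl", archiveurl)
            tpl.add("archivedate", archivedate)
            changed = True
        return str(code) if changed else None

    # Driver, roughly: for each (title, url) from pages_linking_to(), fetch the
    # wikitext, run propose_fixes(), show a diff, and only save via the API
    # after a human answers yes - the "manually run, like AWB" part.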
On 1/27/14, 1:10 AM, David Gerard wrote:
What I need to do is (a) find all the links (b) add archiveurl= (something on archive.org, which seems to have captured the whole site) and archivedate= .
This bot used to do something along those lines on en.wiki, but hasn't been active in some months: https://en.wikipedia.org/wiki/User:DASHBot/Dead_Links
Perhaps it or something similar could be revived?
-Mark
On 3 February 2014 14:31, Delirium delirium@hackish.org wrote:
On 1/27/14, 1:10 AM, David Gerard wrote:
What I need to do is (a) find all the links (b) add archiveurl= (something on archive.org, which seems to have captured the whole site) and archivedate= .
This bot used to do something along those lines on en.wiki, but hasn't been active in some months: https://en.wikipedia.org/wiki/User:DASHBot/Dead_Links
Perhaps it or something similar could be revived?
That looks like pretty much what I was after.
Though I ended up fixing a hundred-odd pages by hand for the case of h-online.com :-)
- d.