The Internet Archive particularly wants to make sure it archives the pages that Wikipedians use as citations. A GSoC project last year got most of the way to that goal but never quite finished building the feed of new links for the Archive to use. Would anyone else like to take this up?
More information:
https://www.mediawiki.org/wiki/User:Kevin_Brown/ArchiveLinks
http://toolserver.org/~nn123645/toolserver-feed/cronscript.php (You could ask Kevin to make his Toolserver project an MMP, i.e. a multi-maintainer project, or you could just write your own script.)
https://www.mediawiki.org/wiki/Extension:ArchiveLinks - would have to be moved into Git from Subversion.
http://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_(policy)&am... - there is a real hunger for this!
This is a really essential extension to finish and bring into production! Unfortunately, I have no time to work on it :(
Emmanuel
On 18/11/12 12:36, Sumana Harihareswara wrote:
The Internet Archive particularly wants to make sure it archives the pages that Wikipedians use as citations. A GSoC project last year got most of the way to that goal but never quite finished building the feed of new links for the Archive to use. Would anyone else like to take this up?
More information:
https://www.mediawiki.org/wiki/User:Kevin_Brown/ArchiveLinks
http://toolserver.org/~nn123645/toolserver-feed/cronscript.php (You could ask Kevin to make his Toolserver project an MMP or you could just write your own script.)
https://www.mediawiki.org/wiki/Extension:ArchiveLinks - would have to be moved into Git from Subversion.
http://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_(policy)&am... - there is a real hunger for this!
Hi -- instead of the implementation suggested above, which seems to combine link discovery with its own archiving engine, how about just generating an RSS feed of external links present (or possibly just those newly inserted) in pages edited in the last (say) five minutes, for other entities such as the Internet Archive to consume?
This would require only soft state. It would not require the WMF to fetch or store any external web content, which avoids the problems that come with web archiving (retries, security, copyright, legality...), and it would not require the WMF to keep track of which resources had been archived: each external archive could do that for itself.
The guts of something like this could be written using only the http://www.mediawiki.org/wiki/API:Recentchanges and http://www.mediawiki.org/wiki/API:Exturlusage APIs.
It looks like Kevin's "cronscript" link above does something just like this already -- adapting its output to generate RSS, and caching its output to prevent massive CPU overhead on repeated calls, would surely be trivial.
Neil
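As a rough illustration of the feed Neil describes, here is a minimal Python sketch. It is not Kevin's cronscript and not an agreed design. Several things in it are assumptions made for the example: the en.wikipedia.org endpoint, the User-Agent string, every function name, and the use of the API's prop=extlinks query (to list the links on one page) in place of the Exturlusage list mentioned above. The "soft state" is just an in-memory set of links already seen, and API result continuation is left out for brevity.

```python
import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from email.utils import formatdate

# Assumed endpoint; any MediaWiki api.php URL could be substituted.
API_URL = "https://en.wikipedia.org/w/api.php"


def api_get(params):
    """Call the MediaWiki API and return the decoded JSON response."""
    query = urllib.parse.urlencode({**params, "format": "json", "formatversion": 2})
    request = urllib.request.Request(
        API_URL + "?" + query,
        headers={"User-Agent": "extlink-feed-sketch/0.1"},  # hypothetical name
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def recently_changed_titles(limit=50):
    """Titles of article-namespace pages among the most recent edits."""
    data = api_get({
        "action": "query", "list": "recentchanges", "rcnamespace": 0,
        "rctype": "edit|new", "rcprop": "title", "rclimit": limit,
    })
    return sorted({change["title"] for change in data["query"]["recentchanges"]})


def external_links(title):
    """External URLs currently on one page (first batch only, no continuation)."""
    data = api_get({"action": "query", "prop": "extlinks",
                    "titles": title, "ellimit": "max"})
    urls = set()
    for page in data["query"]["pages"]:
        for link in page.get("extlinks", []):
            # The key is "url" with formatversion=2 and "*" in the older format.
            urls.add(link.get("url") or link.get("*"))
    return urls


def new_links(seen, current):
    """Soft state: keep only the URLs that were not in the previously seen set."""
    return set(current) - set(seen)


def build_rss(links_by_page, feed_title="New external links"):
    """Render {page title: set of URLs} as an RSS 2.0 document string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = API_URL
    ET.SubElement(channel, "description").text = "External links on recently edited pages"
    ET.SubElement(channel, "lastBuildDate").text = formatdate(usegmt=True)
    for page, urls in sorted(links_by_page.items()):
        for url in sorted(urls):
            item = ET.SubElement(channel, "item")
            ET.SubElement(item, "title").text = page
            ET.SubElement(item, "link").text = url
            ET.SubElement(item, "guid").text = url
    return ET.tostring(rss, encoding="unicode")
```

A cron job could call recently_changed_titles() every five minutes, pass each title to external_links(), filter the result through new_links() against the set kept from the previous run, and write the output of build_rss() to a static file. Serving that file would give the caching Neil mentions, since repeated fetches by an archive would never reach the API.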
On 18/11/12 13:36, Sumana Harihareswara wrote:
The Internet Archive particularly wants to make sure it archives the pages that Wikipedians use as citations. A GSoC project last year got most of the way to that goal but never quite finished building the feed of new links for the Archive to use. Would anyone else like to take this up?
More information:
https://www.mediawiki.org/wiki/User:Kevin_Brown/ArchiveLinks
http://toolserver.org/~nn123645/toolserver-feed/cronscript.php (You could ask Kevin to make his Toolserver project an MMP or you could just write your own script.)
This is quite straightforward.
https://www.mediawiki.org/wiki/Extension:ArchiveLinks - would have to be moved into Git from Subversion.
This is the longer-term plan, which is harder to do right, although I see the code going in the right direction.
http://en.wikipedia.org/w/index.php?title=Wikipedia:Village_pump_(policy)&am... - there is a real hunger for this!
I'd solve the problem with the Toolserver MMP first. It can be improved later.
I see a potential problem, though: new content added to a page could be missed. I'm not sure how Kevin expected to handle that. It's possible that the archiver recrawls pages automatically, in which case this isn't needed (e.g. IA vs WebCite).
wikitech-l@lists.wikimedia.org