You knew this was coming...
IBM today released the first public alpha of its HistoryFlow application. It comes with a plugin to suck content from a MoinMoin wiki and view its history; they had a similar plugin for MediaWiki, and are happy to put it back in once we confirm that a few thousand people using this to suck out WP histories won't thrash the servers.
http://www.alphaworks.ibm.com/tech/historyflow
http://www.alphaworks.ibm.com/tech/historyflow/download
You have to sign up for an IBM ID in order to download and run the program, but it's totally worth it; a *beautiful* bit of design work.
Mad props to Martin et al. for getting this out the door.
Even more impressive: http://researchweb.watson.ibm.com/history
Follow the links, especially ----> Gallery.
Remark: no sign-in required for this.

Tom
On Fri, 25 Mar 2005 19:35:26 -0500, Sj 2.718281828@gmail.com wrote:
[...] happy to put it back in once we confirm that a few thousand people using this to suck out WP histories won't thrash the servers.
Assuming it sets its own user-agent string, we can surely block that in mod_rewrite (if they tell us in advance what the user-agent string is)?
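For concreteness, a minimal httpd.conf sketch of that idea (the "HistoryFlow" token below is a placeholder for whatever user-agent string the tool actually sends, which we don't know yet):

    # Refuse requests whose user-agent matches the (assumed) HistoryFlow client.
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} HistoryFlow [NC]
    RewriteRule .* - [F]

The [F] flag answers with a plain 403; a friendlier variant could redirect matching requests to the database download page instead.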
It seems apt to do this and to recommend that those who want to use the tool download the old/cur table dumps and run it against a local copy (if changing the URL is supported). We have more bandwidth than CPU, after all. That would likely speed the tool up severalfold as well, resulting in a better user experience.
/brian
My impression is that their method of extracting history (they ask for the last 100 revisions of each article) would not be easy on the database; it's not a bandwidth issue. (SJ)
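To make the load concrete, here is my guess at the request pattern against MediaWiki's index.php; these URL shapes are an assumption about how the plugin works, not a confirmed description:

    http://en.wikipedia.org/w/index.php?title=Foo&action=history&limit=100
    http://en.wikipedia.org/w/index.php?title=Foo&oldid=12345&action=raw

That is one history-page fetch plus up to a hundred raw-text fetches per article, and each oldid lookup is work the caches mostly can't absorb.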
On Fri, 25 Mar 2005 19:23:46 -0700, Brian reflection@gmail.com wrote:
[...] It seems apt to do this and to recommend that those who want to use the tool download the old/cur table dumps and run it against a local copy [...]
Or IBM could implement a simple blacklist of sites. If someone tries to point it at en.wikipedia.org, it explains that the site doesn't want the burden and bounces them to the database download page instead.
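A minimal sketch of what that client-side check could look like, written in Java on the assumption that HistoryFlow is a Java application (the class name, blocked-host list, and dump URL below are illustrative guesses, not IBM's actual code):

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;

    // Hypothetical client-side blacklist: before crawling a wiki's history,
    // the tool checks whether the site has asked to be left alone.
    public class SiteBlacklist {
        // Hosts whose operators have asked not to be crawled directly.
        private static final Set<String> BLOCKED = new HashSet<String>(
                Arrays.asList("en.wikipedia.org"));

        /** Returns an explanation if the host is off-limits, or null if it is fine. */
        public static String check(String host) {
            if (BLOCKED.contains(host.toLowerCase())) {
                return "This site doesn't want the crawling burden; "
                     + "please work from a database dump instead: "
                     + "http://download.wikimedia.org/";
            }
            return null;
        }
    }

The list would need to ship with the tool (or be fetched at startup), but it keeps the politeness logic entirely on IBM's side, with no server configuration needed from us.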