Folks,
First, I have to apologize if this has been dealt with on the list before. Just point me in the right direction. =)
I am doing academic research into Wikipedia and am at a standstill.
I need:
- The complete history of three articles as some sort of text or flat file.
- A way to access more history for later research
I tried to download the data dump (pages-meta-history.xml.7z, http://download.wikimedia.org/enwiki/20060518/enwiki-20060518-pages-meta-history.xml.7z) in the following browsers, with the following results:
- Firefox: the download stops at 4 GB
- IE: the dump is seen only as a corrupt 1.6 GB file
- Opera: the dump is seen only as a corrupt 1.6 GB file
Is there a way to ftp the file?
I can't get either single-article histories or the whole dump, and I am at a loss. This shouldn't be this hard, so I think I am missing something.
Any help would be appreciated. Thanks!!!
Mark
Mark Bell http://www.storygeek.com "The future is here...it's just not widely distributed." - Tim O'Reilly
On 6/8/06, Mark Bell typewritermark@gmail.com wrote:
I need:
- The complete history of three articles as some sort of text or flat file.
- A way to access more history for later research
What do you mean "more history"? Once you have the complete history of the three articles, what more is there? :)
Is there a way to ftp the file?
Depending on what the articles are and how many revisions they have, you could try something like viewing the first 500 entries of the history in a browser, then telling a program like Getright to download all the links. It depends exactly what format you want the diffs (well, complete articles) in. With a bit of fiddling you could probably generate all the URLs you need...
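Something along these lines is what I mean — a rough, untested Python sketch. The article title is only an example, and the regex depends on the current HTML of the history page, so treat it as a starting point rather than a finished tool:

    import re
    import urllib.parse
    import urllib.request

    TITLE = "Albert Einstein"  # example article, substitute your own
    BASE = "http://en.wikipedia.org/w/index.php"

    # Fetch the first 500 entries of the article's history page.
    hist = BASE + "?" + urllib.parse.urlencode(
        {"title": TITLE, "action": "history", "limit": "500"})
    html = urllib.request.urlopen(hist).read().decode("utf-8", "replace")

    # Scrape the revision ids out of the oldid= links
    # (brittle by nature: it depends on the page's markup).
    oldids = sorted({int(m) for m in re.findall(r"oldid=(\d+)", html)})

    # Print a raw-wikitext URL for every revision, oldest first,
    # ready to feed to a batch downloader like Getright.
    for oldid in oldids:
        print(BASE + "?" + urllib.parse.urlencode(
            {"title": TITLE, "oldid": str(oldid), "action": "raw"}))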
I was thinking about making a GreaseMonkey script to make this easier, but never did it. Sorry!
I can't get either single article histories or the whole thing and I am at a loss. This should be much easier than this, so I think I am missing something.
Don't think so.
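As for the big dump itself: I don't know of an FTP mirror, but any HTTP client that can resume should get you past the browser limits. A minimal Python sketch of the idea — untested, and it assumes the server honours Range requests:

    import os
    import urllib.request

    URL = ("http://download.wikimedia.org/enwiki/20060518/"
           "enwiki-20060518-pages-meta-history.xml.7z")
    DEST = "enwiki-20060518-pages-meta-history.xml.7z"

    def resume_download(url, dest, chunk=1 << 20):
        # Pick up from however many bytes are already on disk.
        have = os.path.getsize(dest) if os.path.exists(dest) else 0
        req = urllib.request.Request(
            url, headers={"Range": "bytes=%d-" % have})
        with urllib.request.urlopen(req) as resp:
            # 206 means the server is resuming; anything else would
            # restart from zero and corrupt the partial file.
            if have and resp.status != 206:
                raise RuntimeError("server ignored the Range request")
            with open(dest, "ab") as out:
                while True:
                    block = resp.read(chunk)
                    if not block:
                        break
                    out.write(block)

    resume_download(URL, DEST)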
Steve
Sorry, to clarify:
"More history" means the complete history of more articles.
I am using a tool called "historyflow" that can read HTML, so let me try the HTML solution.
Thanks so much.
M
On 6/8/06, Mark Bell typewritermark@gmail.com wrote:
Sorry, to clarify:
"More history" means the complete history of more articles.
I am using a tool called "historyflow" that can read HTML, so let me try the HTML solution.
You might be interested in this: http://tools.wikimedia.de/~magnus/wiki2xml/w2x.php
You can paste in a list of articles and it gives you back various output formats.
The article data will be out of date, since that's pulling from the toolserver, which isn't in sync with production. Sorry, I don't know when the last import was.
On 08/06/06, Jeremy Dunck jdunck@gmail.com wrote:
The article data will be out of date, since that's pulling from the toolserver which isn't in sync with production. Sorry, I don't know when the last import was.
The database for the English Wikipedia is being reimported as we speak, after which it'll be replicated again. So, while it'll take time to get it up to date, it will happen soon. :)
Rob Church
Jeremy Dunck wrote:
On 6/8/06, Mark Bell typewritermark@gmail.com wrote:
I am using a tool called "historyflow" that can read HTML, so let me try the HTML solution.
You might be interested in this: http://tools.wikimedia.de/~magnus/wiki2xml/w2x.php You can paste in a list of articles and it gives you back various output formats.
I have installed the History-Flow tool at home but couldn't find any documentation on how to obtain articles from Wikipedia in an appropriate format... there was apparently going to be some sort of plug-in, which was dropped because of denial-of-service worries.
Which is the most appropriate format to choose from those available on Magnus' tool, and is there a HOW-TO which will show me how to load that up into History-Flow?
TIA HAND
Phil
Folks,
I have had success. I need to do a few more tests, but then I should be able to post a whole historyflow tutorial on my site.
The pictures are amazing!
M
On 6/9/06, Mark Bell typewritermark@gmail.com wrote:
Folks,
I have had success. I need to do a few more tests, but then I should be able to post a whole historyflow tutorial on my site.
The pictures are amazing!
Can't wait to see!
Steve
On Thu, Jun 08, 2006 at 07:17:43AM -0400, Mark Bell wrote:
Mark Bell http://www.storygeek.com "The future is here...it's just not widely distributed." - Tim O'Reilly
Kind of off-topic, but... I think William Gibson said that first.
William Gibson: "The future is already here. It's just not very evenly distributed."
Like that.