Hi, in my research I just want to do some statistical analysis based on the page history and other related metadata (I do not really need the content of the pages). The whole database dump is really too big. Is stub-meta-history.xml.gz the right file to download in this case? I tried to download some versions of the file, but they are either very small in size or cannot be opened. Can you tell me which file should be downloaded? Thank you
On 10/15/07, Jun liu junliug@yahoo.com wrote:
Hi, in my research I just want to do some statistical analysis based on the page history and other related metadata (I do not really need the content of the pages). The whole database dump is really too big. Is stub-meta-history.xml.gz the right file to download in this case? I tried to download some versions of the file, but they are either very small in size or cannot be opened. Can you tell me which file should be downloaded? Thank you
You haven't stated what information you are looking for... It is likely that the information you want only exists in the full-text and may not be easily extracted even with the full-text.
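For the kind of statistics the original question describes, the stub-meta-history dump does carry per-revision metadata (titles, timestamps, contributors) without the page text. Below is a minimal Python sketch, not anything from the thread, of streaming that file and tallying revisions per month; the filename is only an example, and the element names are the ones the MediaWiki XML export format uses.

import gzip
import xml.etree.ElementTree as ET
from collections import Counter

def local(tag):
    # Strip the XML namespace the MediaWiki export format puts on every tag.
    return tag.rsplit("}", 1)[-1]

def revision_timestamps(path):
    # Yield (page_title, revision_timestamp) pairs while streaming the dump.
    with gzip.open(path, "rb") as stream:
        context = ET.iterparse(stream, events=("start", "end"))
        _, root = next(context)          # the <mediawiki> root element
        title = None
        for event, elem in context:
            if event != "end":
                continue
            tag = local(elem.tag)
            if tag == "title":
                title = elem.text
            elif tag == "revision":
                ts = next((c.text for c in elem if local(c.tag) == "timestamp"), None)
                yield title, ts
            elif tag == "page":
                root.clear()             # drop the finished page to keep memory flat

if __name__ == "__main__":
    # Example: count revisions per month across the whole dump.
    per_month = Counter()
    for _, ts in revision_timestamps("enwiki-latest-stub-meta-history.xml.gz"):
        if ts:
            per_month[ts[:7]] += 1       # timestamps look like 2007-10-16T12:34:56Z
    for month, n in sorted(per_month.items()):
        print(month, n)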
Hi, where can I find any successful enwiki dump? None of the following dumps really work. How can I access previous dumps?
20070802/    2007-Oct-01 02:07:14    - Directory
20070908/    2007-Sep-12 18:49:51    - Directory
20071001/
Thanks
On Mon, 2007-10-15 at 21:18 -0700, Jun liu wrote:
Hi, where can I find any successful enwiki dump? None of the following dumps really work. How can I access previous dumps?
This question came up a couple of weeks ago, and I'm having trouble understanding why people can't get the enwiki dumps working.
I just pulled the full enwiki dump, extracted it, and imported it into MySQL; the import produced 5,654,236 rows of content and took 38 minutes to complete.
I wrapped the SVN MediaWiki around that and am able to use the wiki just fine within my application space without any problems.
What trouble did you have with your version of the same process?
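On the original poster's symptom that some downloads were "very small in size or cannot be opened", a quick check that the file is an intact gzip stream and actually begins with the export format's <mediawiki> root can save a failed import. This is my own sketch, not part of the process described above; the default filename is just an example.

import gzip
import sys

def looks_like_dump(path, probe_chars=4096):
    # Return True if the file decompresses and opens with a <mediawiki> element.
    try:
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
            head = f.read(probe_chars)
    except (OSError, EOFError) as exc:   # not gzip at all, or truncated mid-stream
        print(f"{path}: cannot decompress ({exc})")
        return False
    if "<mediawiki" not in head:
        print(f"{path}: decompresses, but does not look like a MediaWiki export")
        return False
    return True

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "enwiki-latest-stub-meta-history.xml.gz"
    print("ok" if looks_like_dump(path) else "bad")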
Jun liu wrote:
Hi, where can I find any successful enwiki dump? None of the following dumps really work. How can I access previous dumps?
20070802/    2007-Oct-01 02:07:14    - Directory
20070908/    2007-Sep-12 18:49:51    - Directory
20071001/
Thanks
http://download.wikimedia.org/enwiki/latest/
@Brion: Why does http://download.wikimedia.org/enwiki/20070716/enwiki-20070716-pages-meta-his... 404? It downloaded fine some weeks ago. There's no new superseding version.
On Tue, 2007-10-16 at 10:29 +0200, Platonides wrote:
@Brion: Why does http://download.wikimedia.org/enwiki/20070716/enwiki-20070716-pages-meta-his... 404? It downloaded fine some weeks ago. There's no new superseding version.
It is for precisely this reason that I wrote a Perl script that handles this for me: it queries the RSS feed for each language and project, extracts the target link referenced in it, and fetches THAT link instead.
There was a time a month or two ago when quite a few of the linked dumps were pointing to non-existent content. Brion and others fixed those in short order, though.
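A rough Python equivalent of that idea, for anyone who doesn't want to maintain a Perl script. The feed URL pattern (<wiki>/latest/<wiki>-latest-<file>-rss.xml) and the habit of putting the real file link inside the item description are assumptions about how the download site's per-file feeds are laid out; treat both as illustrative and adjust to whatever the feed you query actually contains.

import re
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE = "http://download.wikimedia.org"

def latest_dump_url(wiki="enwiki", filename="pages-meta-history.xml.7z"):
    # Follow the per-file RSS feed to the most recently published dump URL (best effort).
    feed = f"{BASE}/{wiki}/latest/{wiki}-latest-{filename}-rss.xml"
    with urllib.request.urlopen(feed) as resp:
        tree = ET.parse(resp)
    item = tree.find(".//item")
    if item is None:
        return None
    link = item.findtext("link") or ""
    desc = item.findtext("description") or ""
    # Prefer an explicit href inside the description, fall back to the item link.
    m = re.search(r'href="([^"]+)"', desc)
    return urllib.parse.urljoin(BASE + "/", m.group(1)) if m else link

if __name__ == "__main__":
    print(latest_dump_url())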
Platonides wrote:
@Brion: Why does http://download.wikimedia.org/enwiki/20070716/enwiki-20070716-pages-meta-his... 404? It downloaded fine some weeks ago. There's no new superseding version.
Older dumps get expired out as there is limited disk space available.
-- brion vibber (brion @ wikimedia.org)
"Brion Vibber" brion@wikimedia.org wrote in message news:4714BBEC.9020108@wikimedia.org...
Platonides wrote:
@Brion: Why does http://download.wikimedia.org/enwiki/20070716/enwiki-20070716-pages-meta-history.xml.7z 404? It downloaded fine some weeks ago. There's no new superseding version.
Older dumps get expired out as there is limited disk space available.
Even when there is no newer version to replace it??
- Mark Clements (HappyDog)
On 10/16/07, Brion Vibber brion@wikimedia.org wrote:
Platonides wrote:
@Brion: Why does http://download.wikimedia.org/enwiki/20070716/enwiki-20070716-pages-meta-his... 404? It downloaded fine some weeks ago. There's no new superseding version.
Older dumps get expired out as there is limited disk space available.
I'm sure Internet Archive would be willing to host the dumps if you ask them nicely. Erik? Haven't you talked to them before about providing free disk space and hosting?
That would make oversight less effective...
Disk space is cheap: space is a poor reason. :)
On 10/16/07, Anthony wikimail@inbox.org wrote:
On 10/16/07, Brion Vibber brion@wikimedia.org wrote:
Platonides wrote:
@Brion: Why does http://download.wikimedia.org/enwiki/20070716/enwiki-20070716-pages-meta-his... 404? It downloaded fine some weeks ago. There's no new superseding version.
Older dumps get expired out as there is limited disk space available.
I'm sure Internet Archive would be willing to host the dumps if you ask them nicely. Erik? Haven't you talked to them before about providing free disk space and hosting?
On 10/16/07, Anthony wikimail@inbox.org wrote:
I'm sure Internet Archive would be willing to host the dumps if you ask them nicely. Erik? Haven't you talked to them before about providing free disk space and hosting?
They probably would. I'd be happy to make introductions if the tech team is interested.
Erik Moeller wrote:
On 10/16/07, Anthony wikimail@inbox.org wrote:
I'm sure Internet Archive would be willing to host the dumps if you ask them nicely. Erik? Haven't you talked to them before about providing free disk space and hosting?
They probably would. I'd be happy to make introductions if the tech team is interested.
Not particularly; as noted elsewhere in the thread, older dumps would include materials which have been deleted for copyright or privacy reasons, so we wouldn't want to be distributing very old dumps publicly anyway.
-- brion vibber (brion @ wikimedia.org)
Anthony wrote:
On 10/16/07, Brion Vibber brion@wikimedia.org wrote:
Older dumps get expired out as there is limited disk space available.
I'm sure Internet Archive would be willing to host the dumps if you ask them nicely. Erik? Haven't you talked to them before about providing free disk space and hosting?
Disk space is cheeeeeeeeeep, right? How much more could the dump servers use? I haven't contributed anything to the Foundation in a little while; I'd be more than willing to chip in a few hundred bucks for this cause, and I bet I'm not alone.
Jun liu wrote:
Hi, in my research I just want to do some statistical analysis based on the page history and other related metadata (I do not really need the content of the pages). The whole database dump is really too big.
If you're just doing research, you can develop your methods on a dump of some smaller Wikipedia, such as Lithuanian. Then you can step up, borrow a more powerful computer, and see if the method also works on a dump of the Portuguese Wikipedia.
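If that route sounds useful, here is a minimal sketch of grabbing a small wiki's stub dump to prototype against; the ltwiki filename below assumes the usual <wiki>-latest-<file> naming in the /latest/ directory mentioned earlier in the thread, and the local path is arbitrary.

import urllib.request

URL = ("http://download.wikimedia.org/ltwiki/latest/"
       "ltwiki-latest-stub-meta-history.xml.gz")

def fetch(url=URL, dest="ltwiki-stub-meta-history.xml.gz"):
    # Stream the dump to disk in 1 MiB chunks so the same code works for bigger wikis.
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        while True:
            chunk = resp.read(1 << 20)
            if not chunk:
                break
            out.write(chunk)
    return dest

if __name__ == "__main__":
    print(fetch())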