Hi, I checked the dumps of the Italian Wikipedia at http://dumps.wikimedia.org/itwiki/ and found that there are 6 dumps available, the latest from 2010-Jun-27 and the oldest from 2010-Mar-02.
My question is: are older dumps stored somewhere? Is it possible to get them? Even if this involves running some script on old database dumps, that would be ok. (I posted this question at http://meta.wikimedia.org/wiki/Talk:Data_dumps#Oldest_Wikipedia_dump_availab... as well.) Thanks in advance!
-- Paolo Massa | Email: paolo AT gnuband DOT org | Blog: http://gnuband.org
There are older dumps that we will be getting ahold of, but we don't have them ourselves right this second. What time period are you looking for?
Ariel
I wanted to conduct a longitudinal analysis, so having data going back to the first day of Wikipedia would be totally awesome! Even time windows of one year would be enough. And it would be great to have the data for different Wikipedias (en, de, it, ...). Thanks!
P.
2010/7/19 paolo massa paolo@gnuband.org:
> I wanted to conduct a longitudinal analysis, so having data going back to the first day of Wikipedia would be totally awesome! Even time windows of one year would be enough. And it would be great to have the data for different Wikipedias (en, de, it, ...).
Why can't you just use the latest full dump? Each dump contains the full history of all articles. Do you need to gather data on deleted articles or something?
Thanks Gregor, and yes, you are right; I hadn't thought of that before, sorry. The fact is that I wrote my script against pages-meta-current.xml because it is much smaller and more manageable, but indeed I can use the revisions of the pages I'm interested in from pages-meta-history.xml instead.
Thanks for the suggestion!
P.
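For illustration only (not anyone's actual script): a minimal sketch of that approach, streaming a locally downloaded pages-meta-history dump and keeping, for each page, the last revision made on or before a cutoff date. The file name and cutoff are placeholders, and the {*} namespace wildcard needs Python 3.8 or later.

# Rough sketch: for each page in a pages-meta-history XML dump, keep the
# last revision made on or before CUTOFF (e.g. to build yearly snapshots).
import bz2
import xml.etree.ElementTree as ET

CUTOFF = "2008-01-01T00:00:00Z"             # ISO timestamps compare lexicographically
DUMP = "itwiki-pages-meta-history.xml.bz2"  # placeholder file name

def localname(tag):
    # Strip the XML namespace that MediaWiki export files carry.
    return tag.rsplit("}", 1)[-1]

with bz2.open(DUMP, "rb") as f:
    title, best_ts, best_text = None, None, None
    for _event, elem in ET.iterparse(f, events=("end",)):
        name = localname(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            ts = elem.find("./{*}timestamp")
            text = elem.find("./{*}text")
            if ts is not None and ts.text <= CUTOFF:
                if best_ts is None or ts.text > best_ts:
                    best_ts = ts.text
                    best_text = text.text if text is not None and text.text else ""
            elem.clear()                    # free memory; these dumps are huge
        elif name == "page":
            if best_ts is not None:
                print(title, best_ts, len(best_text))
            title, best_ts, best_text = None, None, None
            elem.clear()

Running a pass like this once per cutoff date would give the yearly snapshots discussed earlier in the thread.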
On 07/20/2010 09:51 AM, paolo massa wrote:
> Thanks Gregor, and yes, you are right; I hadn't thought of that before, sorry. The fact is that I wrote my script against pages-meta-current.xml because it is much smaller and more manageable, but indeed I can use the revisions of the pages I'm interested in from pages-meta-history.xml instead.
If you're only interested in a small number of pages, you can get an up-to-date "mini dump" through Special:Export. See http://meta.wikimedia.org/wiki/Help:Export and http://www.mediawiki.org/wiki/Export for details.
Alternatively, you can also fetch page histories through the API: http://www.mediawiki.org/wiki/API:Query_-_Properties#revisions_.2F_rv
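As a rough illustration of the API route (again, not code from the thread): prop=revisions lists a page's revision metadata. The wiki, page title, and User-Agent below are placeholders, and longer histories need the API's continuation mechanism.

# Rough sketch: list revision metadata for one page via the MediaWiki API.
import json
import urllib.parse
import urllib.request

API = "https://it.wikipedia.org/w/api.php"   # any wiki's api.php works the same way
params = {
    "action": "query",
    "prop": "revisions",
    "titles": "Roma",                        # placeholder page title
    "rvprop": "ids|timestamp|user|comment",
    "rvlimit": "50",                         # older revisions need continuation
    "format": "json",
}
url = API + "?" + urllib.parse.urlencode(params)
req = urllib.request.Request(url, headers={"User-Agent": "revision-history-example/0.1"})
with urllib.request.urlopen(req) as resp:
    data = json.load(resp)

for page in data["query"]["pages"].values():
    for rev in page.get("revisions", []):
        print(rev["timestamp"], rev["user"], rev["revid"])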
Yep, thanks to your email I realized that a randomly extracted "mini dump" is enough for my purposes. Thanks Ilmari!
P.
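Purely as an illustration of the "random mini dump" idea: one could sample titles with list=random and then fetch each page's current wikitext through the same API. The sample size, wiki, and User-Agent are arbitrary choices.

# Rough sketch: sample random article titles, then fetch each one's
# current wikitext, as a small randomly chosen "mini dump".
import json
import urllib.parse
import urllib.request

API = "https://it.wikipedia.org/w/api.php"   # placeholder wiki

def api_get(params):
    params = dict(params, format="json")
    url = API + "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={"User-Agent": "mini-dump-example/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# 1. Sample random main-namespace pages.
rand = api_get({"action": "query", "list": "random", "rnnamespace": "0", "rnlimit": "10"})
titles = [p["title"] for p in rand["query"]["random"]]

# 2. Fetch the current wikitext of each sampled page.
for title in titles:
    data = api_get({
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "content|timestamp",
        "rvlimit": "1",
    })
    for page in data["query"]["pages"].values():
        rev = page.get("revisions", [{}])[0]
        print(title, rev.get("timestamp"), len(rev.get("*", "")))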