Jamie Morken wrote:
Hi,
What do you mean by "opening"?
enwiki pages-meta-history is hard because of its size, not because Ariel or
Tomasz are any more stupid than a volunteer would be.
I trust them to do it at least as well as a volunteer would.
Of course, if you can do better, I'm all for giving you a shell to
fix it, and the scripts are there for improvements as well.
I wasn't aware that the dump scripts were publicly available. Where can they be
downloaded from, or are they part of MediaWiki?
They are in
http://svn.wikimedia.org/viewvc/mediawiki/trunk/backup/
although the files look a bit old, so perhaps there are some uncommitted
changes?
/me looks for offenders
What exactly do you need from the images? Which image dumps do you
want? Do you have enough terabytes to store them?
Dumps of that data, or access to it, have been given on request in the past.
If it's not there it's because:
a) Those dumps would take a lot of space.
I don't think that is a valid reason: thumbnail dumps of all the
images from enwiki would probably be a smaller file than the current
enwiki pages-meta-history bz2 file.
We have thumbs in lots of sizes. Which size do you want the thumbs in? It's
easy to tar all the images used on a wiki, since that's tracked in the
database, but it's not easy at all to know the exact size at which each of them was used.
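To make the "easy to tar" part concrete, here is a minimal sketch, not the actual dump scripts. It assumes the list of used file names has already been pulled from the database, and that the files sit in a flat directory (the real upload layout is different). The function name `tar_used_images` and its parameters are made up for illustration.

```python
import os
import tarfile


def tar_used_images(file_names, image_root, tar_path):
    """Write every listed file found under image_root into an uncompressed tar.

    file_names: names of images used on the wiki (hypothetical input,
                assumed to come from a database query done elsewhere).
    image_root: directory holding the files (flat layout is an assumption).
    Returns the names that were listed but not found on disk.
    """
    missing = []
    with tarfile.open(tar_path, "w") as tar:
        # De-duplicate: an image used in many articles is stored only once.
        for name in sorted(set(file_names)):
            src = os.path.join(image_root, name)
            if os.path.isfile(src):
                tar.add(src, arcname=name)
            else:
                missing.append(name)
    return missing
```

Note that this only handles originals. It says nothing about which thumb size each use needs, which is the hard part.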
enwiki has a total of 858979 local files, which sum to 229 GB (and there's
still Commons). 2357967 unique images (37050694 uses) appear in its
articles. Assuming 20 KB per image thumb (is that a good value?), that's
48 GB, more than the 31.9 GB of the (really compressed)
pages-meta-history.xml.7z, but we would need to agree on a size. They would tie at
about 14 KB.
Even if all thumbs were unrealistically small, 1 KB each, they would
still take more than 2 GB.
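For anyone who wants to try other thumb sizes, the estimate can be redone with a few lines. This is only a back-of-the-envelope sketch: it assumes 1 KB = 1024 bytes and 1 GB = 10^9 bytes, counts one thumb per unique image, and the function names are made up.

```python
UNIQUE_IMAGES = 2_357_967   # unique images used in enwiki articles (figure from this message)
HISTORY_7Z_BYTES = 31.9e9   # pages-meta-history.xml.7z, assuming 1 GB = 10**9 bytes


def thumb_dump_bytes(avg_thumb_bytes, images=UNIQUE_IMAGES):
    """Total size of a dump holding one thumb per unique image."""
    return avg_thumb_bytes * images


def break_even_thumb_bytes(target_bytes=HISTORY_7Z_BYTES, images=UNIQUE_IMAGES):
    """Average thumb size at which the thumb dump equals target_bytes."""
    return target_bytes / images


if __name__ == "__main__":
    for kb in (1, 14, 20):
        size = thumb_dump_bytes(kb * 1024)  # assuming 1 KB = 1024 bytes
        print(f"{kb:>2} KB per thumb -> {size / 1e9:.1f} GB")
    print(f"break-even: {break_even_thumb_bytes() / 1024:.1f} KB per thumb")
```

With those unit assumptions the 20 KB case comes out close to the 48 GB mentioned above, and the break-even lands a little under 14 KB.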
b) Nobody feels particularly interested in them.
I disagree, there has been a lot of interest in having image dumps
available for download. There was a discussion on this recently on the
xmldatadumps list, that basically concluded that subsets of images
(i.e. enwiki thumbnails) would be useful.
I am unable to find it, although a thread like that does ring a
bell.
There are wiki pages dedicated
to the topic of how to download images, precisely because there are no
image dumps available. Is the Wikimedia Foundation interested in hosting
image dumps again? If so, maybe we can start a discussion on how
to write the script and which image dumps to start with.
cheers,
Jamie