On Thu, 26 May 2005, Kate Turner wrote:
Andy Rabagliati wrote in gmane.science.linguistics.wikipedia.technical:
On Wed, 25 May 2005, Kate Turner wrote:
i've started running an image dump for en.wp using a version of trickle with large-file support (the last one died after 2GB). if this works i'll set up regular image dumps again along with the db backups.
Can you set it up so it can be rsync'ed?
do you want to rsync the tar file of all images, or the image directory as it's stored on disk?
The latter.
i don't see any benefit from the latter, and the former is not something that's feasible right now, although it may be doable in future...
I keep an identical tree this side, and rsync does the rest.
Even if you change the tree arrangement, I can write a script to re-arrange this end. But .. don't :-)
If the archive has only changed by 5%, I only download 5%. I can ignore archive trees and rescaled thumbnails.
It will greatly save your bandwidth and mine, at the expense of some CPU.
we currently have a lot more problems in terms of CPU(/disk) use than bandwidth, particularly in the area of images & the dumps service.
Bandwidth is the priority in Africa. However, the sheer volume of information means that even an out-of-date copy of Wikipedia is very usable. My worry sometimes is that a newer SQL dump refers to newer pictures, when I have some perfectly appropriate ones here :-)
So perhaps I should use an SQL dump of the same vintage as the picture archive for 'least astonishment'.
And, one day, there could be a trickle-back of edits, maybe moderated at the remote end.
Eric has offered access to other (SQL ?) update methods, but I am busy on other things and have not had time to investigate.
However, rsync to me is understandable and optimal for my needs, and takes little of my time. And I'll just download SQL dumps.
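The mirroring described above, skipping the archive tree and rescaled thumbnails, could be expressed as an rsync invocation along these lines. The host, module, and directory names below are placeholders for illustration only; the thread does not give a real dump URL, and the excluded subdirectory names assume a standard MediaWiki upload layout:

```python
import subprocess

# Placeholder source -- substitute the real dump server when one exists.
SOURCE = "rsync://dumps.example.org/images/"
DEST = "/mirror/images/"

cmd = [
    "rsync",
    "-av",                    # archive mode (preserve times), verbose
    "--partial",              # keep partial files across dropped links
    "--exclude", "archive/",  # old revisions of replaced images
    "--exclude", "thumb/",    # rescaled thumbnails, regenerable locally
    SOURCE, DEST,
]
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # run once a real mirror URL is known
```

Because rsync preserves modification times with `-a`, the next run's quick check skips everything unchanged, which is exactly the "download only the 5% that changed" behaviour.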
Cheers, Andy!