Dear Ariel,
0) WP-MIRROR
WP-MIRROR 0.6 now works with dumps from your.org. I am turning my
attention to the other mirror sites.
1) LATEST
I read with interest the thread about `latest' directories that began
with
<http://lists.wikimedia.org/pipermail/xmldatadumps-l/2012-October/000610.html>.
I have some additional questions.
The mirror sites at C3SL and Masaryk Univ. do not have a `latest'
directory in the project directories that I looked at. Compare for
example:
(shell)$ rsync dumps.wikimedia.your.org::wikimedia-dumps/enwiki/ | tail -n 2
drwxr-xr-x 242 2013/01/04 07:52:13 20130102
drwxr-xr-x 1101 2013/01/03 18:48:34 latest
(shell)$ rsync wikipedia.c3sl.ufpr.br::wikipedia/enwiki/ | tail -n 2
drwxr-xr-x 61440 2012/11/10 10:47:05 20121101
drwxr-xr-x 61440 2012/12/10 09:21:34 20121201
WP-MIRROR looks for the `latest' directory on the assumption that any
links found there point to complete files (i.e. no partials), whereas
files found in dated directories may be partials. For example,
consider the two most recent `imagelinks' files:
This file is complete:
(shell)$ rsync dumps.wikimedia.your.org::wikimedia-dumps/enwiki/20121201/ | grep imagelinks
-rw-r--r-- 356437362 2012/12/01 07:08:54 enwiki-20121201-imagelinks.sql.gz
This file is a partial:
(shell)$ rsync dumps.wikimedia.your.org::wikimedia-dumps/enwiki/20130102/ | grep imagelinks
-rw-r--r-- 20 2013/01/02 07:47:35 enwiki-20130102-imagelinks.sql.gz
The `latest' link points to the complete file:
(shell)$ rsync -a dumps.wikimedia.your.org::wikimedia-dumps/enwiki/latest/ | grep image
lrwxrwxrwx 40 2013/01/02 03:52:49 enwiki-latest-image.sql.gz -> ../20130102/enwiki-20130102-image.sql.gz
So I am wondering what algorithm I should use if I want WP-MIRROR to
pull dump files from C3SL and Masaryk U. Can you help with the
following questions?
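
One possible fallback, sketched in Python under stated assumptions: walk
the dated directories from newest to oldest and take the first one whose
file of interest is not suspiciously small. The function names, the
caller-supplied `fetch_dir_listing' callable, and the 1024-byte threshold
are all hypothetical; this is not how WP-MIRROR currently works, and a
size check cannot prove that a file is complete.

```python
import re

# One line of `rsync host::module/path/` listing output, e.g.
#   drwxr-xr-x 242 2013/01/04 07:52:13 20130102
# Some rsync versions print thousands separators in the size column.
LISTING_RE = re.compile(
    r"^(?P<perms>\S{10})\s+(?P<size>[\d,]+)\s+"
    r"\d{4}/\d{2}/\d{2}\s+\d{2}:\d{2}:\d{2}\s+(?P<name>.+)$"
)

# Assumption: anything smaller than this is treated as a partial.
# The partial shown above is only 20 bytes.
MIN_PLAUSIBLE_SIZE = 1024


def parse_listing(text):
    """Return (perms, size, name) tuples for each recognised listing line."""
    entries = []
    for line in text.splitlines():
        m = LISTING_RE.match(line.strip())
        if m:
            size = int(m["size"].replace(",", ""))
            entries.append((m["perms"], size, m["name"]))
    return entries


def dated_dirs(project_listing):
    """Return YYYYMMDD subdirectory names, newest first."""
    names = [
        name
        for perms, _size, name in parse_listing(project_listing)
        if perms.startswith("d") and re.fullmatch(r"\d{8}", name)
    ]
    return sorted(names, reverse=True)


def looks_complete(dir_listing, suffix):
    """True if a regular file ending in `suffix` exists and is not tiny."""
    for perms, size, name in parse_listing(dir_listing):
        if perms.startswith("-") and name.endswith(suffix):
            return size >= MIN_PLAUSIBLE_SIZE
    return False


def pick_complete_dir(project_listing, fetch_dir_listing, suffix):
    """Return the newest dated directory whose `suffix` file looks complete.

    `fetch_dir_listing` is a caller-supplied function (hypothetical) that
    takes a directory name and returns the rsync listing text for it.
    Returns None if no directory qualifies.
    """
    for name in dated_dirs(project_listing):
        if looks_complete(fetch_dir_listing(name), suffix):
            return name
    return None
```

Given the your.org listings quoted above, this would skip 20130102
(20-byte `imagelinks') and return 20121201 for the suffix
`-imagelinks.sql.gz'.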
2) C3SL
In the absence of a `latest' directory, can I be sure that all the
files found there are complete files (i.e. not partials)? Is the
mirroring process atomic?
3) Masaryk Univ.
Several issues: a) no `latest' directories; b) no `enwiki'; and c)
the most recent dumps date from November 2012:
(shell)$ rsync ftp.fi.muni.cz::pub/wikimedia/zuwiki/ | tail -n 2
drwxr-xr-x 4096 2012/10/23 14:04:02 20121023
drwxr-xr-x 4096 2012/11/05 15:02:33 20121105
Will they be catching up?
Sincerely Yours,
Kent