I'll slip
that into my updates; updating an RSS .xml file would not be
difficult at all.
*-rss.xml files are now being dropped into the 'latest' subdirectory as the
new dumps are generated.
Cool, there are RSS feeds now up at
http://download.wikimedia.org/dawiktionary/latest/
(for example).
Is there any chance of a summary feed (e.g.
http://download.wikimedia.org/wiki-name/index-rss.xml)
that includes the results for the individual files, including failures? (I think the
current per-file feeds only update on success, but knowing that some of the dumps
failed is useful information if you need 2 or 3 files from the same backup set.)
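For concreteness, a summary entry could be built along these lines. This is just a
rough Python sketch of the idea; the function name, the item layout, and the
"done"/"failed" statuses are all made up for illustration, not anything the dump
scripts actually do:

```python
# Hypothetical sketch of one combined index-feed entry that reports
# per-file status, including failures.  File names, statuses, and the
# summary_item() helper are invented examples, not the real data model.
from xml.sax.saxutils import escape

def summary_item(wiki, date, results):
    """Build one RSS <item> summarising a dump run.

    results: list of (filename, "done" or "failed") tuples.
    """
    lines = ["%s: %s" % (name, status) for name, status in results]
    description = escape("\n".join(lines))
    link = "http://download.wikimedia.org/%s/%s/" % (wiki, date)
    return ("<item>"
            "<title>%s dump %s</title>"
            "<link>%s</link>"
            "<description>%s</description>"
            "</item>") % (escape(wiki), escape(date), link, description)

print(summary_item("dawiktionary", "20070327",
                   [("pages-articles.xml.bz2", "done"),
                    ("pages-meta-history.xml.bz2", "failed")]))
```

The point is just that one entry can carry the whole run's results, failures
included, instead of 24 separate feeds each reporting only successes.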
For example, I count around 24 RSS feeds for dawiktionary, one for each of the 24 data
files, plus there's the one md5sum file. That's a lot of feeds, and it could be nice
to have "one feed to rule them all", that showed at a glance all the backup files
generated so far.
The trouble with a combination feed, from a reader's perspective, is that they
probably do not want a new entry added for every file (that would result in 24 new
entries per dump, effectively spamming the feed reader). Maybe something more like
http://download.wikimedia.org/dawiktionary/20070327/ (i.e. a big table, with each row
having the name of the file as a link to download the file, the description, whether
it succeeded or failed, the file size, the time it was created, and possibly the MD5sum).
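A row of that table could be generated with something like the following. Again this
is only a hypothetical sketch; the column set and the status_row() helper are my own
invention, not the existing index-page code:

```python
# Rough sketch of one row of the per-dump status table described above:
# file link, description, success/failure, size, creation time, MD5sum.
def status_row(name, description, status, size_bytes, created, md5sum):
    link = '<a href="%s">%s</a>' % (name, name)
    cells = [link, description, status,
             "%d bytes" % size_bytes, created, md5sum]
    return "<tr>" + "".join("<td>%s</td>" % c for c in cells) + "</tr>"

print(status_row("pages-articles.xml.bz2", "Current article text",
                 "done", 12345678, "2007-03-27 10:15", "d41d8cd9..."))
```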
The trouble with that is that some wikis take a long time to back up, and some are really
quick. For example, http://download.wikimedia.org/enwiki/20070206/ took around 5 weeks
from start to finish, whereas dawiktionary took 53 seconds to back up. So for enwiki
an intermediate report of which files are done would be useful, whereas for dawiktionary
there's probably no point. As a very rough possibility: since the index pages already
include an ETA, maybe if, after finishing a file, the ETA for the next file is more
than 12 hours into the future, the index feed gets a new entry showing the current
backup status? (Plus there would have to be an entry added after the last file
finishes.) That would give one entry for dawiktionary, and for enwiki's backup from
20070206, it would give:
* one entry after generating oldimage.sql.gz
* one entry 2 days later after generating all-titles-in-ns0.gz
* one entry another 2 days later after generating pages-articles.xml.bz2
* one entry 16 hours later after generating pages-meta-current.xml.bz2
* one entry 3 weeks later after generating pages-meta-history.xml.bz2
* and then one entry a week later after generating pages-meta-history.xml.7z
So that's 6 entries for enwiki, spread out over roughly 5 weeks. The first one could
maybe be dropped, and then the 5 entries correspond to the files that I suspect are
used most: all-titles, pages-articles, pages-meta-current, and the pages-meta-history
files.
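The heuristic above could be sketched roughly like this (the function name and the
way the ETA is passed in are just illustrative assumptions on my part):

```python
# Sketch of the 12-hour heuristic: after each file finishes, publish an
# index-feed entry only if the next file's ETA is far enough away, or if
# the run is over.  The threshold and interface are illustrative only.
TWELVE_HOURS = 12 * 3600  # seconds

def should_publish_entry(next_eta_seconds):
    """next_eta_seconds: estimated seconds until the next file finishes,
    or None if this was the last file in the run."""
    if next_eta_seconds is None:
        return True  # always publish after the final file
    return next_eta_seconds > TWELVE_HOURS

# dawiktionary-style run: every file finishes within seconds, so no
# intermediate entries appear -- just the one after the final file.
# enwiki-style run: multi-day gaps between files produce several entries.
```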
Then again, maybe just watching the 5 RSS feeds for those files, and ignoring
everything else, is the simplest solution :-/
I don't know: maybe a summary feed is practical, maybe it's not. It's certainly more
complicated to describe than I would like :-( Essentially, it's trying to walk a fine
line: giving people concise, complete, and current notifications as quickly as
possible, whilst trying not to overwhelm them with messages.
All the best,
Nick.