[Mediawiki-api] Revisions since certain date / wiki mirror

sl contrib sl.contrib at googlemail.com
Sun May 31 15:54:40 UTC 2009


Hi all,
First post to the list. I've got a bunch of questions, and I hope this is
the right place to ask them.

I'm interested in the idea of wiki 'mirroring': updating a second wiki ('B')
periodically with content from wiki A. (There's of course some discussion of
this on the web, so I'm aware that there's been quite a bit of thinking on
this already, but I couldn't quite find the solution I was looking for.)

A first stab at mirroring would be to do a Special:Export on the whole of A,
and then do a Special:Import on B. But this becomes impractical for larger
wikis: Ideally, I just want to update what needs updating.

The best way to do this would probably be something like list=recentchanges
(going back to the date of last transfer). Of course this doesn't work,
because recentchanges are periodically purged, so cannot be used between
arbitrary dates. The log doesn't seem to record edits (is this correct?), so
this can't be used to get a list of changes between two arbitrary dates.
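
To make this concrete, here is a rough Python sketch of the kind of query I
would like to be able to run. It only builds the URL and does not contact any
wiki. The wiki address and the function name are placeholders of mine, and
rcstart/rcend/rcdir/rcprop/rclimit are the recentchanges parameters as I
understand them. It can of course only reach back as far as recentchanges are
kept.

```python
from urllib.parse import urlencode


def recentchanges_url(api_base, start, end, limit=50):
    """Build a list=recentchanges query covering start..end, oldest first.

    start and end are timestamps such as 20090521000000. With rcdir=newer
    the listing runs forward in time, so start must be the earlier one.
    """
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcstart": start,
        "rcend": end,
        "rcdir": "newer",
        "rcprop": "title|ids|timestamp",
        "rclimit": limit,
        "format": "xml",
    }
    return api_base + "?" + urlencode(params)


# Placeholder endpoint, not a real wiki.
print(recentchanges_url("http://example.org/w/api.php",
                        "20090501000000", "20090531000000"))
```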

So, question 1: Is it possible to get a list of all changes (including
edits) between two dates (in a single query)?

If one wanted the complete version history, then another way to do this
would be to get all revisions since the last transfer made, i.e. something
like:
action=query&prop=revisions&revids=1450|1451|1452|...&rvprop=content
(then transform xml to Special:Import format, and upload). Together with a
query of the log, this would give you all changes.
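
As a sketch of what I mean (Python, building query strings only): the helper
name is made up, and the batch size of 50 is my assumption about how many
revids a normal account may pass per request. It also assumes the revision ids
since the last transfer form a contiguous range, which deleted revisions may
break.

```python
from urllib.parse import urlencode


def revision_query_strings(first_revid, last_revid, batch_size=50):
    """Yield one query string per batch of revision ids, in ascending order."""
    revids = range(first_revid, last_revid + 1)
    for offset in range(0, len(revids), batch_size):
        chunk = revids[offset:offset + batch_size]
        yield urlencode({
            "action": "query",
            "prop": "revisions",
            "revids": "|".join(str(revid) for revid in chunk),
            "rvprop": "ids|timestamp|content",
            "format": "xml",
        })


# Example: everything from revision 1450 up to 1560, in batches of 50.
for query in revision_query_strings(1450, 1560):
    print(query[:60] + "...")
```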

But suppose the wiki is very active or you don't have much bandwidth or you
simply don't want the whole version history, but just the latest versions
(since the last transfer). The only way I can see is to do something like
this:

   1. Fetch the list of namespaces.
   2. Get the list of revisions in each namespace
      (action=query&prop=revisions&generator=allpages for each namespace).
   3. See what needs updating, and then fetch all the changed pages.
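
The "see what needs updating" step could look roughly like this in Python. It
is only a sketch under my own assumptions: the mirror script remembers, for
each title, the revision id it last copied from A, and the function and
variable names are invented for illustration.

```python
def pages_needing_update(latest_on_a, copied_last_time):
    """Compare A's current latest revision ids with those copied previously.

    Both arguments map page title -> revision id on wiki A.
    Returns (changed, gone): titles that are new or have a different latest
    revision, and titles that no longer exist on A.
    """
    changed = sorted(
        title for title, revid in latest_on_a.items()
        if copied_last_time.get(title) != revid
    )
    gone = sorted(title for title in copied_last_time if title not in latest_on_a)
    return changed, gone


# Made-up data: one edited page, one new page, one deleted page.
latest_on_a = {"Main Page": 1460, "Help:Contents": 1200, "Sandbox": 1455}
copied_last_time = {"Main Page": 1449, "Help:Contents": 1200, "Old draft": 900}
print(pages_needing_update(latest_on_a, copied_last_time))
```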


Question 2: Can you see a better way of doing this? Also, why won't
generator=allpages work across namespaces? (I guess there may be a reason why
that isn't possible to do easily.)

One way would be to try something like:

action=query&prop=revisions&generator=allpages&rvstart=20090521000000
but this doesn't work.
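
The fallback I can think of is to ask only for the timestamp of each page's
latest revision (rvprop=timestamp) and filter on the client. A rough Python
sketch, assuming the XML response has the usual api/query/pages/page/revisions/rev
shape; the sample response below is made up, and the function name is mine:

```python
import xml.etree.ElementTree as ET

# Made-up response in the shape I would expect from
# action=query&prop=revisions&generator=allpages&rvprop=timestamp
SAMPLE = """<api><query><pages>
<page pageid="1" ns="0" title="Main Page"><revisions>
<rev timestamp="2009-05-25T08:30:00Z" /></revisions></page>
<page pageid="2" ns="0" title="Old page"><revisions>
<rev timestamp="2009-03-02T17:45:10Z" /></revisions></page>
</pages></query></api>"""


def titles_changed_since(api_xml, cutoff):
    """Return titles whose latest revision is at or after cutoff.

    cutoff uses the same format as the API timestamps
    (e.g. 2009-05-21T00:00:00Z), so plain string comparison is enough.
    """
    root = ET.fromstring(api_xml)
    changed = []
    for page in root.iter("page"):
        rev = page.find("revisions/rev")
        if rev is not None and rev.get("timestamp") >= cutoff:
            changed.append(page.get("title"))
    return changed


print(titles_changed_since(SAMPLE, "2009-05-21T00:00:00Z"))
```

This still lists every page in the namespace, so it saves bandwidth on content
but not on the listing itself.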

So, my question 3: Do you know why this doesn't work? I assume there isn't
an efficient MySQL query to accomplish this, or are there other reasons?

Finally, I guess I am wondering whether there are people actively interested
in discussing issues around wiki mirroring/synchronisation more. If so,
what's the best mailing list for this?

Sorry, the post got a bit longer than I expected - thanks for considering
this!

All the best,
Bjoern