2009/5/31 sl contrib sl.contrib@googlemail.com:
Hi all,

First post to the list. I've got a bunch of questions, and I hope this is the right place to ask them. I'm interested in the idea of wiki 'mirroring': updating a second wiki ('B') periodically with content from wiki A. (There's of course some discussion of this on the web, so I'm aware that there's been quite a bit of thinking on this already, but I couldn't quite find the solution I was looking for.)

A first stab at mirroring would be to do a Special:Export on the whole of A, and then do a Special:Import on B. But this becomes impractical for larger wikis: ideally, I just want to update what needs updating. The best way to do this would probably be something like list=recentchanges (going back to the date of the last transfer). Of course this doesn't work, because recentchanges entries are periodically purged, so the list cannot be used between arbitrary dates.
If you have control over wiki A, you can set $wgRCMaxAge to a higher value. You could also do the updates more often so there's never more than $wgRCMaxAge between them.
The log doesn't seem to record edits (is this correct?), so this can't be used to get a list of changes between two arbitrary dates.
Correct.
So, question 1: Is it possible to get a list of all changes (including edits) between two dates (in a single query)?
Only list=recentchanges, but you already knew that.
If one wanted the complete version history, then another way to do this would be to get all revisions since the last transfer made, i.e. something like:
action=query&prop=revisions&revids=1450|1451|1452|...&rvprop=content
(then transform the XML to Special:Import format, and upload). Together with a query of the log, this would give you all changes. But suppose the wiki is very active or you don't have much bandwidth or you simply don't want the whole version history, but just the latest versions (since the last transfer). The only way I can see is to do something like this:
1. Fetch the list of namespaces.
2. Get the list of revisions in each namespace (action=query&prop=revisions&generator=allpages for each namespace).
3. See what needs updating, and then fetch all the changed pages.
Question 2: Can you see a better way of doing this? Also, why won't generator=allpages work across namespaces? (I guess there may be a reason why that isn't possible to do easily.)
Because other parameters like apprefix don't work cross-namespace. Requests to make list=allpages work cross-namespace have been made in the past and denied because the benefits of the slight increase in convenience (there are few namespaces anyway) don't outweigh the complexity of preventing certain parameters from being used cross-namespace.
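The "see what needs updating" step in the three-step approach above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name pages_to_update and both dictionaries are hypothetical, and it assumes the mirroring script keeps its own record of which revision timestamp of A it last copied for each page, since nothing in the thread says how B would track that.

```python
def pages_to_update(latest_on_a, mirrored_on_b):
    """Return the titles whose latest revision on A is newer than the copy on B.

    Both arguments map page title -> timestamp of a revision on wiki A.
    Assumption: timestamps are UTC strings in one fixed format (for example
    "2009-05-21T00:00:00Z"), so plain string comparison is chronological.
    """
    stale = []
    for title, timestamp in latest_on_a.items():
        copied = mirrored_on_b.get(title)
        # A page is stale if B never copied it, or copied an older revision.
        if copied is None or copied < timestamp:
            stale.append(title)
    return sorted(stale)
```

Deleted or moved pages are not covered here; those would still have to come from a query of the log, as mentioned above.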
One way would be to try something like:
action=query&prop=revisions&generator=allpages&rvstart=20090521000000
but this doesn't work. So, my question 3: Do you know why this doesn't work?
This'll probably result in an error, since rvstart can't be used in multi-page mode.
I assume there isn't an efficient MySQL query to accomplish this, or are there other reasons?

Finally, I guess I am wondering whether there are people actively interested in discussing issues around wiki mirroring/synchronisation more. If so, what's the best mailing list for this?

Sorry, the post got a bit longer than I expected - thanks for considering this!

All the best,
Bjoern
I think your best bet is to use list=recentchanges and update frequently.
Roan Kattouw (Catrope)
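
A minimal sketch of the "use list=recentchanges and update frequently" suggestion, assuming a JSON response and an API endpoint URL of your own. The helper names recentchanges_url and changed_titles are hypothetical and the example URL is a placeholder; rcstart, rcdir, rcprop and rclimit are existing list=recentchanges parameters. Continuation of long result sets and the actual HTTP request are left out.

```python
from urllib.parse import urlencode


def recentchanges_url(api_base, since, limit=500):
    """Build a list=recentchanges query for changes made at or after `since`.

    `since` is a timestamp such as "20090521000000" (time of the last transfer).
    """
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcstart": since,
        "rcdir": "newer",  # enumerate from rcstart forwards in time
        "rcprop": "title|ids|timestamp",
        "rclimit": limit,
        "format": "json",
    }
    return api_base + "?" + urlencode(params)


def changed_titles(response):
    """Return each changed title once, in order of first appearance."""
    seen = []
    for change in response.get("query", {}).get("recentchanges", []):
        title = change["title"]
        if title not in seen:
            seen.append(title)
    return seen
```

For example, recentchanges_url("https://wiki-a.example/w/api.php", "20090521000000") gives the URL to fetch, and changed_titles applied to the decoded response gives the pages to re-copy to B. This only works if the interval between runs stays shorter than $wgRCMaxAge on wiki A.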