I see several paths forward on this:
1) Make an existing protocol and client work. Whether ftp or rsync or http or scp, they think they're copying a tree of files.
   a) Give people access to something that looks like a tree of folders, and just let them recurse as needed using "wget -m". Doesn't quite make me want to barf.
2) Make an existing protocol work, even if a new client is needed for optimal use. E.g. wget -m plus an extra parameter that shows the client only files changed since the date of the last sync.
3) Devise a new protocol. Call it "BCD" for "Big Copy of Data".
   a) I'm thinking that the client should be able to ask for files with timestamps in a given range.
   b) The client would then be able to keep a record of the timestamp ranges for which it is currently accurate.
   c) A file deletion event would have a timestamp. Once deleted, the file would be unavailable even if its timestamp was requested.
   d) Any change in filename becomes an edit event.
   e) The idea is that a client would never have to re-ask for a timestamp range again (see the sketch below).
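To make option 3 a bit more concrete, here is a minimal sketch of the client-side bookkeeping from (a) through (e), written in Python. Everything here is an assumption: no BCD server exists, so the server URL, the /events endpoint, the query parameters, and the JSON event shape are invented purely for illustration.

import json
import urllib.request

# Sketch of a client for the hypothetical "BCD" protocol. The server
# URL, the /events endpoint, and the JSON event shape are all invented
# here for illustration; nothing like this exists yet.
BCD_SERVER = "https://example.org/bcd"


class BcdClient:
    def __init__(self):
        # (b) Timestamp ranges this client is currently accurate for.
        self.covered = []          # list of (start, end) tuples
        # Local picture of the file set: name -> last-known timestamp.
        self.files = {}

    def sync(self, start, end):
        """(a) Ask the server for all events with timestamps in [start, end]."""
        url = f"{BCD_SERVER}/events?start={start}&end={end}"
        with urllib.request.urlopen(url) as resp:
            events = json.load(resp)
        for ev in events:
            self.apply(ev)
        # (e) Record the range so we never have to ask for it again.
        self.covered.append((start, end))

    def apply(self, ev):
        if ev["type"] == "delete":
            # (c) A deletion is itself a timestamped event; after it, the
            # file is unavailable even if its old range is requested.
            self.files.pop(ev["name"], None)
        elif ev["type"] in ("create", "edit"):
            # (d) A rename shows up as an edit event under the new name.
            self.files[ev["name"]] = ev["timestamp"]
            # Fetching the actual file bytes is omitted from this sketch.

The point of (b) and (e) is crash-friendliness: a mirror that has been down for a week asks once for the gap between its last covered range and now, applies the events, and is accurate again, without rescanning anything it already holds.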
______________________________________
From: wikitech-l-bounces@lists.wikimedia.org [wikitech-l-bounces@lists.wikimedia.org] on behalf of Brion Vibber [brion@pobox.com]
Sent: Monday, August 15, 2011 8:06 PM
To: Wikimedia developers
Subject: Re: [Wikitech-l] forking media files

A more customized protocol might end up better at that; offhand I'm not sure if rsync 3's protocol can be super convenient at that or whether something else would be needed.