On Wed, Mar 4, 2009 at 5:05 AM, Robert Ullmann rlullmann@gmail.com wrote:
For every 1000th pageid, get the earliest rev and note the date and time. (If a given ID is missing, i.e. deleted, hunt around it, +1, -1, +2, -2, etc., until you find one.)
To get the date and time for a particular pageid, interpolate between the next higher and lower 1000th. This should pretty much always get you the correct date, with some chance of it being off by one for pages created near midnight UTC.
That's a clever idea. As it turns out, using the stub dump wasn't bad; I was able to assign earliest revision dates and revision counts to the articles in WEX overnight.
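
For reference, here is a rough Python sketch of the sampling-and-interpolation scheme as I read it. It is only an illustration under assumptions: the function names (probe_order, find_sample, estimate_creation) and the first_rev_time callable are made up here, and first_rev_time stands in for whatever lookup returns a page's earliest revision timestamp (or None if the pageid is deleted). Linear interpolation between samples is also my assumption about what "interpolate" means.

```python
from bisect import bisect_left
from datetime import datetime, timezone


def probe_order(base, max_offset):
    """Yield base, base+1, base-1, base+2, base-2, ... up to max_offset."""
    yield base
    for d in range(1, max_offset + 1):
        yield base + d
        yield base - d


def find_sample(base, first_rev_time, max_offset=50):
    """Return (pageid, timestamp) for the nearest existing page around base.

    first_rev_time is a hypothetical callable: pageid -> datetime or None.
    """
    for pid in probe_order(base, max_offset):
        ts = first_rev_time(pid)
        if ts is not None:
            return pid, ts
    return None  # nothing found within max_offset of base


def estimate_creation(pageid, samples):
    """Linearly interpolate a creation time from sorted (pageid, datetime) samples."""
    ids = [pid for pid, _ in samples]
    i = bisect_left(ids, pageid)
    if i < len(ids) and ids[i] == pageid:
        return samples[i][1]  # exact hit on a sampled page
    if i == 0 or i == len(ids):
        raise ValueError("pageid lies outside the sampled range")
    (lo_id, lo_ts), (hi_id, hi_ts) = samples[i - 1], samples[i]
    frac = (pageid - lo_id) / (hi_id - lo_id)
    return lo_ts + (hi_ts - lo_ts) * frac


# Made-up sample points, purely for illustration.
samples = [
    (1000, datetime(2009, 1, 1, tzinfo=timezone.utc)),
    (2000, datetime(2009, 1, 3, tzinfo=timezone.utc)),
]
print(estimate_creation(1500, samples).date())
```

The estimate is only as good as the assumption that pageids are handed out at a roughly steady rate between two samples, which is why a page created near midnight UTC can land on the wrong day.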