I'm using data from a snapshot of the English Wikipedia and would like to run a query similar to the following:
SELECT * FROM revision WHERE rev_id > some_rev_id;
Can I be confident that all revisions returned were saved after some_rev_id?
Thanks! -Aaron
P.S. I have considered using rev_timestamp and would like to avoid that if it is possible.
On 8/10/09 9:28 AM, Aaron L Halfaker wrote:
> I'm using data from a snapshot of the English Wikipedia and would like to run a query similar to the following:
> SELECT * FROM revision WHERE rev_id > some_rev_id;
> Can I be confident that all revisions returned were saved after some_rev_id?
Yes, in the sense that they were added to the database afterwards.
No, in the sense that rev_timestamp may not always show a later date for a later rev_id:
* Page histories imported from pre-conversion UseModWiki archives
* Anything imported via Special:Import
* Anything undeleted before ar_rev_id column was added
* Anything saved on a server that had a mis-configured clock
-- brion
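To gauge how often the exceptions listed above occur in a given snapshot, one option is to scan the table in rev_id order and flag any revision whose rev_timestamp is earlier than one already seen. The following is a minimal sketch, not anything from MediaWiki itself: it uses an in-memory SQLite table as a stand-in for the real revision table, the sample rows and the function name find_misordered are made up for illustration, and it assumes timestamps are stored as 14-character YYYYMMDDHHMMSS strings so that string comparison matches chronological order.

```python
import sqlite3


def find_misordered(conn):
    """Return (rev_id, rev_timestamp) rows whose timestamp is earlier than
    the latest timestamp seen among lower rev_ids."""
    rows = conn.execute(
        "SELECT rev_id, rev_timestamp FROM revision ORDER BY rev_id"
    ).fetchall()
    misordered = []
    max_ts = None  # latest timestamp seen so far while walking up rev_id
    for rev_id, ts in rows:
        if max_ts is not None and ts < max_ts:
            # Higher rev_id but older timestamp: ordering by id and by
            # time disagree for this row.
            misordered.append((rev_id, ts))
        else:
            max_ts = ts
    return misordered


# Hypothetical sample data; a real snapshot has many more columns and rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO revision VALUES (?, ?)",
    [
        (1, "20090810120000"),
        (2, "20090810120500"),
        (3, "20020225154311"),  # e.g. old history imported later
        (4, "20090810121000"),
    ],
)

print(find_misordered(conn))  # [(3, '20020225154311')]
```

If this returns nothing for the range of rev_ids you care about, filtering on rev_id alone is probably adequate for that range.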
On Mon, Aug 10, 2009 at 12:36 PM, Brion Vibber <brion@wikimedia.org> wrote:
> Yes, in the sense that they were added to the database afterwards.
> No, in the sense that rev_timestamp may not always show a later date for a later rev_id:
> - Page histories imported from pre-conversion UseModWiki archives
> - Anything imported via Special:Import
> - Anything undeleted before ar_rev_id column was added
> - Anything saved on a server that had a mis-configured clock
Plus there's a race condition in generating the timestamps. They're generated slightly before the row is actually inserted, so if two revisions were saved at almost exactly the same time, it's possible for one to have a timestamp one second later but a lower id.
This is usually a safe-ish assumption, though, as long as occasional misordering is acceptable. Don't we generate next/previous revision links in some places based on rev_id?
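The race described above can be illustrated with a small deterministic sketch. This is not MediaWiki's save code: it replays the interleaving sequentially against an in-memory SQLite table, and the helper insert_revision, the two "requests", and the timestamp values are all hypothetical. The only point is that when the timestamp is chosen before the INSERT, the id order follows insert order, not clock order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision ("
    "rev_id INTEGER PRIMARY KEY AUTOINCREMENT, rev_timestamp TEXT)"
)


def insert_revision(conn, timestamp):
    """Insert a row using an already-generated timestamp; return its rev_id."""
    cur = conn.execute(
        "INSERT INTO revision (rev_timestamp) VALUES (?)", (timestamp,)
    )
    return cur.lastrowid


# Step 1: both requests generate their timestamps before touching the table.
ts_a = "20090810123600"  # request A reads the clock first
ts_b = "20090810123601"  # request B reads the clock one second later

# Step 2: request B happens to reach the database first, so it gets the
# lower auto-increment id even though its timestamp is later.
id_b = insert_revision(conn, ts_b)
id_a = insert_revision(conn, ts_a)

rows = conn.execute(
    "SELECT rev_id, rev_timestamp FROM revision ORDER BY rev_id"
).fetchall()
print(rows)  # [(1, '20090810123601'), (2, '20090810123600')]
```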
On Mon, Aug 10, 2009 at 12:10 PM, Aryeh Gregor <Simetrical+wikilist@gmail.com> wrote:
> This is usually a safe-ish assumption, though, as long as occasional misordering is acceptable. Don't we generate next/previous revision links in some places based on rev_id?
In the diff view, for one. Weird ones like this are very common in old articles:
http://en.wikipedia.org/w/index.php?title=Bill_Clinton&oldid=238014&...
—C.W.
wikitech-l@lists.wikimedia.org