Today I did some analysis of the latest revisions on huwiki and stumbled on something that surprised me. I believed that revids were assigned sequentially, so that a lower revision id implies an earlier date and a higher revision id implies a later date. Thus, all edits with an id greater than 6,000,000 would be no older than August 2009 on huwiki. However, the following revids are anomalies, dated 5-6 years earlier than their surrounding revids:
8764880 (2004), 8764883 (2005), 8764884 (2005), 8764885 (2005), 8764886 (2005), 8764887 (2005), 8764904 (2004), 8764905 (2004), 8764906 (2005), 8764907 (2005), 8764908 (2005)
Example: http://hu.wikipedia.org/w/index.php?title=Ornithopoda&oldid=8764883
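A check like this can be sketched in a few lines of Python. This is only a minimal sketch: the sample rows are made up for illustration, and `find_out_of_order` is a hypothetical helper, not part of any Toolserver or MediaWiki API. It assumes timestamps are sortable 'YYYYMMDDHHMMSS' strings, walks the revisions in rev_id order, and flags every revision whose timestamp is earlier than the latest timestamp already seen.

```python
def find_out_of_order(revisions):
    """Return rev_ids whose timestamp is older than that of some lower rev_id.

    `revisions` is an iterable of (rev_id, rev_timestamp) pairs, with
    timestamps given as sortable 'YYYYMMDDHHMMSS' strings.
    """
    anomalies = []
    latest_seen = ""  # highest timestamp among the lower rev_ids so far
    for rev_id, timestamp in sorted(revisions):
        if timestamp < latest_seen:
            anomalies.append(rev_id)
        else:
            latest_seen = timestamp
    return anomalies


# Made-up sample data: ids and timestamps are illustrative only.
sample = [
    (100, "20100101120000"),
    (101, "20100101120500"),
    (102, "20050614093000"),  # old revision that received a new, high id
    (103, "20100101121000"),
]
print(find_out_of_order(sample))
```

With the sample above, only the revision with id 102 is flagged.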
I don't really want to ask anything; I just hope I pointed out something interesting. However, if there are any comments on this, shoot away. :)
M
Importing and some deletion-related things (before rev_id was moved to the archive table) can cause a revision to get a higher rev_id than it should have.
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/toolserver-l Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette
On 17 Mar 2011, at 14:34, John wrote:
> On Thu, Mar 17, 2011 at 9:02 AM, Mihajlo Andjelkovic <michael.angelkovich@gmail.com> wrote:
>> Today I did some analysis of the latest revisions on huwiki and stumbled on something that surprised me. I believed that revids were assigned sequentially, so that a lower revision id implies an earlier date and a higher revision id implies a later date. Thus, all edits with an id greater than 6,000,000 would be no older than August 2009 on huwiki. However, the following revids are anomalies, dated 5-6 years earlier than their surrounding revids:
>> 8764880 (2004), 8764883 (2005), 8764884 (2005), 8764885 (2005), 8764886 (2005), 8764887 (2005), 8764904 (2004), 8764905 (2004), 8764906 (2005), 8764907 (2005), 8764908 (2005)
>> Example: http://hu.wikipedia.org/w/index.php?title=Ornithopoda&oldid=8764883
>> I don't really want to ask anything; I just hope I pointed out something interesting. However, if there are any comments on this, shoot away. :)
> Importing and some deletion-related things (before rev_id was moved to the archive table) can cause a revision to get a higher rev_id than it should have.
Although 'should' is a relative and questionable word, I just want to point out that this is valid and expected behaviour, not a bug.
Revision ids are assigned in the order in which revisions enter the database table of publicly available revisions.
If I import a page from a different wiki, it will get a fresh revision id, not the id it had on the old wiki, simply because the id it had on the old wiki is most likely already in use on the new wiki.
There is no rule nor any intention to make the ids represent a timeline; there is the rev_timestamp column for that purpose.
Another way, as John pointed out, is deletion.
If a page (or rather, its revisions) is deleted by an administrator / user with the 'sysop' right, the revisions are moved from the revision table to the archive table.
As of MediaWiki version 1.5 (released in 2005), the revision id is saved when a revision is moved to the archive table during deletion, and is re-used during undeletion / restore.
So any page deleted after June 2005 will retain the same low, old revision id when restored. However, any page deleted before that did not have its revision id saved, so when any of those pages is restored now, MediaWiki generates a new revision id, just like it does for import and just like it did for undeletion before 2005.
As we can see in the logs here: http://hu.wikipedia.org/w/index.php?title=Speci%C3%A1lis:Rendszernapl%C3%B3k...
... that page was deleted before June 2005 and undeleted in 2010.
As such, it got a new revision id.
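The difference between the two cases can be illustrated with a toy model. This is a simplified sketch and not MediaWiki's actual code: the class `ToyWiki`, its methods and the `keep_id` flag are all invented for illustration. `keep_id=False` mimics a deletion from before 1.5, where the id was not saved; `keep_id=True` mimics 1.5 and later.

```python
class ToyWiki:
    """Toy stand-in for the revision and archive tables (not MediaWiki code)."""

    def __init__(self):
        self.next_id = 1    # plays the role of the auto-increment counter
        self.revision = {}  # rev_id -> (title, timestamp)
        self.archive = []   # rows of (saved_rev_id or None, title, timestamp)

    def insert(self, title, timestamp):
        """New edit or import: always gets a fresh id."""
        rev_id = self.next_id
        self.next_id += 1
        self.revision[rev_id] = (title, timestamp)
        return rev_id

    def delete(self, title, keep_id):
        """Move all revisions of a page to the archive."""
        for rev_id in [r for r, (t, _) in self.revision.items() if t == title]:
            _, timestamp = self.revision.pop(rev_id)
            self.archive.append((rev_id if keep_id else None, title, timestamp))

    def undelete(self, title):
        """Restore a page, reusing the saved id where there is one."""
        restored = []
        for row in [a for a in self.archive if a[1] == title]:
            self.archive.remove(row)
            saved_id, _, timestamp = row
            if saved_id is None:
                restored.append(self.insert(title, timestamp))
            else:
                self.revision[saved_id] = (title, timestamp)
                restored.append(saved_id)
        return restored


wiki = ToyWiki()
wiki.insert("Old page", "20050101000000")    # id 1
wiki.insert("Newer page", "20050701000000")  # id 2
wiki.delete("Old page", keep_id=False)       # pre-1.5 style: id is lost
wiki.delete("Newer page", keep_id=True)      # 1.5+ style: id is saved
wiki.insert("Other page", "20100101000000")  # id 3
old_ids = wiki.undelete("Old page")          # gets a fresh, high id
new_ids = wiki.undelete("Newer page")        # keeps its original id
print(old_ids, new_ids)
```

In this toy run the page archived without a saved id comes back as id 4, even though its timestamp is the oldest, while the other page keeps id 2.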
Conclusion: revision.rev_id is great for counting revisions and contributions, and for developers to see whether a revision was added later on. However, it's not meant for timelines; use rev_timestamp instead.
-- Krinkle
Another example is articles imported from UseModWiki, whose history revisions will have later ids than the revisions that were 'current'.
On Fri, Mar 18, 2011 at 01:03, Krinkle <krinklemail@gmail.com> wrote:
> However it's not meant for timelines, use rev_timestamp instead.
However, revisions having the same rev_timestamp should still be ordered by rev_id. There are pages that have had more than one edit per second, and this would ensure that the ordering is really correct.
— Kalan
On Thu, Mar 17, 2011 at 6:26 PM, Kalan <kalan.001@gmail.com> wrote:
> On Fri, Mar 18, 2011 at 01:03, Krinkle <krinklemail@gmail.com> wrote:
>> However it's not meant for timelines, use rev_timestamp instead.
> However, revisions having the same rev_timestamp should still be ordered by rev_id. There are pages that have had more than one edit per second, and this would ensure that the ordering is really correct.
No, no they shouldn't. Being an autoincrement field you shouldn't be using rev_id like that.
On Thu, Mar 17, 2011 at 7:56 PM, OQ <overlordq@gmail.com> wrote:
> No, no they shouldn't. Being an autoincrement field you shouldn't be using rev_id like that.
Why not? Some kind of tie-breaker is needed when ordering, if you want the order to be well-defined (which is needed for, e.g., IndexPager in MediaWiki). rev_id is as good a tie-breaker as any. It's unique and short, to start with. Also, for InnoDB, it has the advantage of being stored in the leaf of every index, so you might be able to avoid data lookups in some cases.
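The tie-breaker idea can be sketched in a few lines of Python. This is only an illustration with made-up rows, not code from MediaWiki or IndexPager; the SQL equivalent would be ordering by rev_timestamp first and rev_id second.

```python
# Made-up (rev_id, rev_timestamp) rows; two edits share the same second.
rows = [
    (205, "20110317120001"),
    (204, "20110317120000"),
    (203, "20110317120000"),
    (900, "20050614093000"),  # old revision with a high id (import/undelete)
]

# Primary key: timestamp. Secondary key: rev_id, used only to break ties.
ordered = sorted(rows, key=lambda row: (row[1], row[0]))
print([rev_id for rev_id, _ in ordered])
```

The old revision with the high id sorts first because of its timestamp, and the two edits made in the same second come out in a stable, well-defined order (203 before 204).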
Krinkle wrote:
> So any page deleted after June 2005 will retain the same low old revision id when restored. However any page deleted before 2005 didn't have the saved revision-id, so when any of those pages are restored now MediaWiki generates a new revision-id, just like it does for Import, just like it did before 2005 for undeletion.
Part of this is wrong. When a page is undeleted, it gets a new page ID, no matter when it was created/restored. While the old page ID is stored in the archive table, it isn't used during undeletion. This is the subject of bug 26123: https://bugzilla.wikimedia.org/show_bug.cgi?id=26123.
I just tested this again on test.wikipedia.org to make sure I wasn't crazy/that the behavior hadn't changed since I last looked. It definitely does not use the old page ID upon restoration. As noted on the bug, one of the potential problems here is that you can have multiple options to choose from.
MZMcBride