For those of you who are interested in reverts: I just presented our paper on accurate revert detection at the ACM Hypertext and Social Media conference 2012. It shows a significant accuracy (and coverage) gain compared to the widely used method of detecting reverts by finding identical revisions (via MD5 hash values): the edit pairs our method detects are significantly more likely to be actual reverts, both according to editors' perception of a revert and according to the Wikipedia definition. 35% of the reverts found by the MD5 method in our sample are not assessed to be reverts by more than 80% of our survey participants (accuracy 0%). The new method finds different reverts for these 35%, plus 12% more, and these show a 70% accuracy.
Find the PDF slides, paper and results here: http://people.aifb.kit.edu/ffl/reverts/
I'll be happy to answer any questions.
In more detail: The MD5 hash method employed by many researchers to identify reverts (like some others, e.g. using edit comments) is acknowledged to produce some inaccuracies with respect to the Wikipedia definition of a revert ("reverses the actions of any editors", "undoing the actions", ...). The extent of these inaccuracies is usually judged to be minor, since most reverting edits are naturally carried out immediately after the edit to be reverted and thus are "identity reverts" (Wikipedia definition: "..normally results in the page being restored to a version that existed previously"). Still, there has been no user evaluation assessing how well the detected reverts conform to the Wikipedia definition and to what users actually perceive as a revert. We developed and evaluated an alternative to the MD5 identity-revert method and show a significant increase in accuracy (and coverage). 34% of the reverts detected by the MD5 hash method in our sample fail to be acknowledged as full reverts by more than 80% of the users in our study. Our new method performs much better: it finds different reverts for these 34% wrongly detected reverts, plus 12% more reverts, and 70% of these newly found edit pairs are reverts according to the users. The difference in accuracy between reverts detected only by the MD5 method and those detected only by our new method is highly significant, and reverts detected by both methods also perform significantly better than those detected only by the MD5 method.
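For readers who have not implemented it themselves, the identity-revert baseline we compare against is usually realized roughly like this (a minimal sketch, not the scripts from the paper; names are illustrative, and the 20-revision look-back matches the limit we also used in our evaluation):

import hashlib

def identity_reverts(revisions, window=20):
    """Detect reverts via identical content hashes (the MD5 'identity revert'
    heuristic). `revisions` is a chronologically ordered list of revision
    texts; returns (reverted_index, reverting_index) pairs."""
    hashes = [hashlib.md5(text.encode("utf-8")).hexdigest() for text in revisions]
    pairs = []
    for j, h in enumerate(hashes):
        # look back at most `window` revisions for an identical earlier revision
        # (variants differ in whether they match the oldest or the most recent one)
        for i in range(max(0, j - window), j):
            if hashes[i] == h:
                # every revision strictly between i and j is treated as reverted by j
                pairs.extend((k, j) for k in range(i + 1, j))
                break
    return pairs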
Trade-off: Although this method is much slower than the MD5 method (as it uses DIFFs between revisions), it reflects much better what users (and the Wikipedia community as a whole) see as a revert. It is thereby a valid alternative if you are interested in the antagonistic relationships between users at a more detailed and accurate level. There is quite some potential to make it faster by combining the two methods and decreasing the number of DIFFs to be performed; let's see if we can get around to doing that :)
The scripts and results listed in the paper can be found at http://people.aifb.kit.edu/ffl/reverts/
Best,
Fabian
-- Karlsruhe Institute of Technology (KIT) Institute of Applied Informatics and Formal Description Methods
Dipl.-Medwiss. Fabian Flöck Research Associate
Building 11.40, Room 222 KIT-Campus South D-76128 Karlsruhe
Phone: +49 721 608 4 6584 Skype: f.floeck_work E-Mail: fabian.floeck@kit.edu WWW: http://www.aifb.kit.edu/web/Fabian_Fl%C3%B6ck
KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association
See also the review in last month's Wikimedia Research Newsletter: https://meta.wikimedia.org/wiki/Research:Newsletter/2012-05-28#New_algorithm...
I don't understand: if 35% of the sample reverts identified by the hash method are not considered such by the human check, and the new system has a 70% accuracy, is the difference in false positives 5%? I don't understand it from the paper either. The main point seems to be the additional reverts found (as expected), right?
Nemo
@Tilman: Thanks, I was not aware of that being in the NL, didn't read it. Apologies everyone for the double posting.
@Federico: Sorry for not putting it more clearly and confusing you. So:
1. Of the reverts detected by the MD5 hash method, 37% (actually 37%, I just looked it up) were not detected by the new method; 63% were detected by the new method as well. When we asked people whether these 37% are full reverts (requiring 80%+ of people to agree for an edit pair to be labeled a "true revert"), the crowd agreed for none of them (i.e. 0% accuracy; this only goes up if you lower the agreement threshold notably, which means you can no longer be sure it is indeed a revert).
2. When we looked at the results produced by our method only (again with the 80% agreement threshold), about 70% of the found results were deemed reverts in comparison.
3. I only put these numbers in the mail (and the presentation) to exemplify the gain in accuracy. They are not in the paper in this form; there, we showed the gain in accuracy via the statistical significance of the differences in the agreement scores, which I later realized might not be as "tangible" as some accuracy numbers. It turns out this seems to be more confusing the way I put it, sorry for that.
@WereSpielChequers: That could indeed be an interesting direction to look into. Although, given the problems of the identity-revert method we discussed in the paper, I cannot yet see how these could be alleviated by looking at reverts section-wise within an article. You are certainly right to point out that in this specific situation, although there would not necessarily be an identical hash for the whole article leading to a revert detection, there could be an identical/duplicate hash for the subsection, leading to an accurate revert detection in that section. But within this section the same issues as portrayed in our paper would surface. I will look at that period of "Sarah Palin", however, to get a better picture. Thanks a lot for the input.
Best,
Fabian
Fabian,
I'm confused by your explanation.
How is it possible that these 37% of revisions that are detected as reverts via an MD5 hash are not considered reverts by (I presume) humans? Can you give a common example? By definition, identity-revert revisions represent an exact replica of a previous revision in an article and, therefore, should discard any intermediate changes. What definition of "revert" are you using that the MD5 hash method does not satisfy?
-Aaron
Hi Aaron,
This is explained in the paper and to some extent in the slides, together with examples. To summarize: not every edit "undoing the actions of another editor" necessarily leads to a revision identical to one that existed before. One example: I add new unique content to the article and, in the same edit, I remove the exact words that you just wrote in the previous edit. I am effectively reverting your edit, according to the Wikipedia definition and also to what most editors see as a revert. But no revision content (and thus no hash value) is generated that has existed before, so a method looking for identical hashes would not detect it (a false negative). That is of course only according to the Wikipedia definition, which puts a strong emphasis on "undoing". The MD5 hash method does NOT make any mistakes if you use the definition "a revert is going back to a previous revision" as a baseline. Our point in this paper is essentially that this latter definition does not really reflect what a revert in fact is, according to the Wikipedia definition and to users; it is more a definition of what normally happens when a revert is carried out (compare the Wikipedia definition --> "normally.."). That, in turn, can lead to a number of misinterpretations of the antagonistic relationships between users when you want, e.g., to model a social network between them (as we do in later work). The definition "a revert means going back to another, identical revision" is too narrow, as there are a lot more edit actions that fall under the term "revert" (hence the 12% coverage gain). Also, this definition produces a lot of false positives that are not reverts in the understanding of users or the Wikipedia definition (hence the 37%).
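To make this concrete, here is a tiny hand-made illustration of that false-negative pattern (not taken from the paper; the texts are invented):

import hashlib

rev1 = "The sky is blue."
rev2 = "The sky is blue. Bananas are vegetables."   # your edit: adds a sentence
rev3 = "The sky is blue. See also: colours."        # my edit: removes your sentence AND adds new text

# No two revisions share a hash, so an identity-hash method detects nothing (false negative) ...
hashes = {hashlib.md5(r.encode("utf-8")).hexdigest() for r in (rev1, rev2, rev3)}
print(len(hashes))  # 3

# ... yet a word-level comparison shows rev3 removes exactly the words rev2 introduced,
# which is a revert under the "undoing the actions" reading of the definition.
added_by_rev2 = set(rev2.split()) - set(rev1.split())
removed_by_rev3 = set(rev2.split()) - set(rev3.split())
print(added_by_rev2 <= removed_by_rev3)  # True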
If it is still unclear, I recommend reading the paper, as we explain it in more detail there. For any remaining questions, I will of course try to answer your emails as fast as possible.
Best,
Fabian
Fabian,
I may not have stated myself clearly. I am asking for examples of false positives detected by the MD5 checksum approach. In other words, I'm hoping that you'll show me some revisions from English Wikipedia (extra credit for links) that appear to be reverting other edits via the MD5 checksum approach, but actually are not reverting other edits.
-Aaron
Also, I can't tell from either the paper or the conversation here: are you limiting this to revisions with identical hashes that are separated by only one edit? When you do that, things become a bit more complicated.
And are you sure your human coders aren't just relying on edit summaries? Like Aaron, I'm having a hard time imagining a situation where revisions go HASH-A => HASH-B => HASH-A that shouldn't be treated as a revert, and I tend to think this sounds more like fallible coders than broken tools. Even if the user doesn't *know* or think they are reverting an edit, it seems wrong *not* to call that a revert.
Later, Mako
One example I've seen of md5 detection failing involves common short vandalism phrases. The most obvious of these are removing all text, or single vulgar words.
Here's a scenario:
- User A vandalizes an article by replacing it with "poop."
- User B restores the article.
- Some time passes....
- User C vandalizes an article by replacing it with "poop."
User C isn't really reverting B's edit. You may be able to guess that since A was reverted, C must not be reverting, but this logic can be tricky and wrong.
In practice, I've been able to catch most of these instances by a) ignoring md5 reverts outside a certain window of revisions and b) ignoring md5 reverts that replace lots of text with very little text.
Also, this situation occurs regularly, but nowhere near 37%.
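Roughly, those two filters look like this (a simplified sketch; the window size and size ratio below are placeholder values, not the exact thresholds I use):

def plausible_md5_revert(reverting_idx, matched_idx, revisions,
                         max_window=15, min_size_ratio=0.5):
    """Filter md5-detected revert candidates:
    (a) drop matches that reach back more than `max_window` revisions;
    (b) drop 'reverts' that replace lots of text with very little text
        (typical of repeated blanking or one-word vandalism)."""
    if reverting_idx - matched_idx > max_window:
        return False
    old_len = len(revisions[reverting_idx - 1])  # text being replaced
    new_len = len(revisions[reverting_idx])      # text after the candidate revert
    if old_len > 0 and new_len / old_len < min_size_ratio:
        return False
    return True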
-Shilad
First of all, thanks a lot for your questions and remarks. (btw Mako: nice panel talk yesterday at WPAC12)
tl;dr: scroll all the way down for examples
Questions by Mako:
1. Are you limiting this to revisions with identical hashes that are separated by only one edit? --> I'm not quite sure what you mean, but we do not limit this to specific edits; the only exception is that both methods were tested with a limit of going back at most 20 revisions to look for reverts.
2. And are you sure your human coders aren't just relying on edit summaries? --> They couldn't see the edit summaries, due to our experimental setup.
3. HASH-A => HASH-B => HASH-A, no revert? --> (assuming you mean HASH-B is only one revision/edit): this is ALWAYS a revert by A targeting B, in both methods, and was always evaluated as such by the users.
Before I give examples, let me just remind you that this is only a sample. So, first of all, it is of course not inferred statistically that the 37% I mentioned appear like this in general in exactly the same way. Secondly, this number results from requiring 80% of all participants to agree that an edit pair is a full revert; there were of course cases in the sample where people disagreed, and some cases where even a majority voted for an MD5-detected pair to be a full revert, just not over 79%. I chose this threshold to make the differences clear; I could have also selected some other arbitrary value. That is exactly why we did not put it in the paper: the analysis in the paper is a much better ground for making statistical inferences about the data that is the "basic population" for this analysis.
Now, let me give you some examples of false positives generated by the MD5 hash method:
1. One self-generated example (inspired by observations) is given in the paper (almost identically):
RevID # RevContent (after edit)  # Edit     # Hash
1     # Peanut                   # +Peanut  # Hash1
2     # Peanut Apple             # +Apple   # Hash2
3     # Peanut Apple Banana      # +Banana  # Hash3
4     # Peanut Banana            # -Apple   # Hash4
5     # Peanut                   # -Banana  # Hash1
MD5 assigns 5 as the reverting edit of 2, 3 and 4.
DIFF assigns 5 as the reverting edit of 3, and 4 as the reverting edit of 2.
False positives in this case (according to the Wikipedia definition) for MD5: 5 reverting 4 and 2 (as 4 is unrelated to what 5 is doing, and 2's contribution has already been removed, so it cannot be undone anymore by 5).
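For completeness, a minimal script (illustrative only, not our evaluation code) that runs the identity-hash heuristic over this five-revision toy history and reproduces exactly that assignment:

import hashlib

history = [
    "Peanut",               # 1: +Peanut
    "Peanut Apple",         # 2: +Apple
    "Peanut Apple Banana",  # 3: +Banana
    "Peanut Banana",        # 4: -Apple
    "Peanut",               # 5: -Banana
]
hashes = [hashlib.md5(t.encode("utf-8")).hexdigest() for t in history]

for j in range(len(hashes)):
    for i in range(j):
        if hashes[i] == hashes[j]:
            reverted = list(range(i + 2, j + 1))  # 1-based revision numbers between i and j
            print("revision %d flagged as reverting revisions %s" % (j + 1, reverted))
# prints: revision 5 flagged as reverting revisions [2, 3, 4]
# but only revision 3 (+Banana) is actually undone by revision 5 (-Banana)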
2. "Real-life" examples rated as false positives in the user evaluation:
When you asked me for examples, I started digging them up from the data sample that was used and realized that many false positives of the MD5 method are in fact related to self-reverts. As this is no issue for our data-extraction aims (we want self-reverts in the results as well), and it was not considered when randomly drawing edit pairs from the two methods' results, we didn't discuss it in the paper. If you do not consider self-reverts to be reverts in the sense of the Wikipedia definition, they could be filtered out by collapsing subsequent edits of one editor before running the revert analysis with the MD5 method. I assume that would reduce the number of false positives notably; I will certainly look into that. If you don't collapse these edits, however (which is not regularly done before reporting or using revert-detection results), the number of false positives will be quite high, as the edits-to-be-collapsed (and prone to being misinterpreted) appear quite often and their span can at times be considerable. And of course there are cases not related to self-reverts.
I tried to select examples representative of the sample which received very few or no votes as full reverts (as detected by MD5):
Example A
detected as reverted: http://en.wikipedia.org/w/index.php?&diff=25866415
detected as reverting: http://en.wikipedia.org/w/index.php?diff=25866579
The detected-as-reverting edit removes only "insomnia" from the detected-as-reverted edit, i.e. it is not a full revert, as some insertions from previous edits had already been deleted by the reverted editor himself. It would be a correct full revert if you collapsed the reverted editor's edits into one.
Example B
detected as reverted: http://en.wikipedia.org/w/index.php?diff=196507540
detected as reverting: http://en.wikipedia.org/w/index.php?diff=196507775
The vandalism introduced ("kirsty u tit") is self-reverted before the second editor reverts --> so the detected-as-reverted edit cannot actually be reverted anymore. This would also be remedied by collapsing the first editor's edits.
Example C (not related to self-revert): this is an example of incomplete vandalism repair, which is then subsequently completed:
detected as reverted: http://en.wikipedia.org/w/index.php?diff=162097520
detected as reverting: http://en.wikipedia.org/w/index.php?diff=162113945
Example D (not related to self-revert)
A revert is carried out by TheJazzDalek targeting the edits by 74.131.204.39 (http://en.wikipedia.org/wiki/Special:Contributions/74.131.204.39), but in the same edit something else is deleted by TheJazzDalek, leading to a new, unique revision content. As 74.131.204.39 in the next edit reverts this deletion by TheJazzDalek, but not the initial revert of his (74.131.204.39's) own edits, it is erroneously concluded that 74.131.204.39 reverts himself, which is not the case.
detected as reverted: http://en.wikipedia.org/w/index.php?diff=292533562
detected as reverting: http://en.wikipedia.org/w/index.php?diff=292760323
Example E
detected as reverted: http://en.wikipedia.org/w/index.php?diff=231824943
detected as reverting: http://en.wikipedia.org/w/index.php?diff=231960286
First, the reverting editor (Laser brain) undoes (not rolling back to / not creating a duplicate revision) some edits by another editor before deleting the result of the edits by 67.162.68.255 (http://en.wikipedia.org/wiki/Special:Contributions/67.162.68.255), one of which was detected here as reverted. The "detected as reverted" revision is partly self-reverted by 67.162.68.255. The other part, a date change in an "accessdate=", is not "undone" as such; rather, the whole "accessdate=" part (stemming from a third editor) is deleted.
Example F
Here, between the "reverted" and the "reverting" edit, there is a mixture of self-reverts, reverts and different forms of vandalism:
detected as reverted: http://en.wikipedia.org/w/index.php?diff=131372047
detected as reverting: http://en.wikipedia.org/w/index.php?diff=131658207
If I have still failed to answer any of your questions, please excuse me and ask again.
Best,
Fabian
I suspected as much. It looks like "false positive" isn't a very good way to think about the accuracy problem. It looks to me like there are three states of interest. Assuming a pair of revision_ids representing the information contained in a "revert": (reverted_id, reverting_id)
1. (reverted_id, reverting_id): The desired case. The reverted revision had its contributions discarded by the reverting revision.
2. (reverted_id, X): Suboptimal, but useful case. The reverted revision was indeed reverted, but the associated reverting revision was not the one that discarded the contributions.
3. (X, X): False positive. The reverted revision was not actually reverted.
I had thought you were referring to case #3, when you were generally referring to case #2. Is that right?
-Aaron
Aaron,
I wonder why you were surprised by the high number of what I call false positives in the sample then, if you suspected as much.
Further, false positives as I described them are indeed a very good way to think about the accuracy problem, if you put some thought into what you define as accuracy and as false positives. The definition we use (and which can be found in the paper): the approach aims at determining "who is reverted by whom", i.e. we think of reverts as antagonistic relations between editors and want to track the correct ties between them. Indicating an antagonistic relationship between two editors where there is actually none can clearly be deemed a "false positive" in this detection task. Equally important: even if you are only interested in which revision was reverted, this is often not done correctly by the MD5 hash method, as you can clearly see from the examples I gave.
Regarding your "error cases":
1. (reverted_id, reverting_id): The desired case. The reverted revision had its contributions discarded by the reverting revision.

Exactly.

2. (reverted_id, X): Suboptimal, but useful case. The reverted revision was indeed reverted, but the associated reverting revision was not the one that discarded the contributions.

Sure, if you are only interested in WHO was reverted, but not by whom, it might be useful. The thing is that the MD5 method gets even the WHO wrong in a couple of cases, though more often in the variation (X, reverting_id).

If you want to know both sides of the revert, this is definitely not "suboptimal", but simply a false positive for the relation between the two revisions.

3. (X, X): False positive. The reverted revision was not actually reverted.

Like #2, this is also a false positive for the relation between the revisions, while also wrongly detecting which revision was reverted.
I had thought you were referring to case #3, when you were generally referring to case #2. Is that right?
No, that is not right at all; I refer to both cases #2 and #3, as you can clearly see from the examples. I'll try to make it more understandable: in my hand-made example, look at edit #4 (deleting "Apple") and edit #5 (deleting "Banana") --> edit #5 didn't undo/revert edit #4, and edit #4 was not reverted by anyone (the deletion of "Apple" was not undone). If you show this pair of edits to editors, as we did, they say this is not a revert in the sense of the Wikipedia definition. Consequently, this is a #3 false positive, as edit #4 was not reverted. This case can also be found frequently in the examples.
Fabian
On Jul 6, 2012, at 2:54 AM, Aaron Halfaker wrote:
I suspected as much. It looks like "false positive" isn't a very good way to think about the accuracy problem. It looks to me like there are three states of interest. Assuming a pair of revision_ids representing the information contained in a "revert": (reverted_id, reverting_id)
1. (reverted_id, reverting_id): The desired case. The reverted revision appears to have has contributions discarded by the reverting revision 2. (reverted_id, X): Suboptimal, but useful case. The reverted revision was indeed reverted, but associated reverting revision was not the one to discard the contributions. 3. (X, X): False positive. The reverted revision was not actually reverted.
I had thought you were referring to case #3, when you were generally referring to case #2. Is that right?
-Aaron
On Mon, Jul 2, 2012 at 1:23 PM, Floeck, Fabian (AIFB) <fabian.floeck@kit.edumailto:fabian.floeck@kit.edu> wrote: First of all, thanks a lot for your questions and remarks. (btw Mako: nice panel talk yesterday at WPAC12)
tl;dr: scroll all the way down for examples
Questions by Mako: 1. Are you limiting this to edits that are separated by an revisions with identical hashes by only one edit? --> I'm not quite sure what you mean but we do not limit this to specific edits, only exception being: Both methods were tested with a limit of going back max. 20 revisions to look for reverts. 2.And are you sure your human coders aren't just relying on edit summaries? --> they couldn't see the edit summaries, via our experimental setup. 3. HASH-A => HASH-B => HASH-A no revert? --> (assuming you mean HASH B is only one revision/edit): this is ALWAYS a revert by A targeting B, in both methods and was always evaluated as such by the users.
Before I give examples, let me just remind you that this is only a sample, so first of all, of course, it is not statistically inferred that the 37% I mentioned necessarily appear like this in general in exactly the same way. Secondly, this number results when you require that 80% of all participants agreed that an edit pair is a full revert, i.e. of course there were cases in the sample where people disagreed, and some cases where even the majority voted for it being a full revert while being detected by MD5, just not over 79%. I chose this threshold to make the differences clear; I could have also selected some other arbitrary value. That is exactly why we did not put it in the paper: the analysis in the paper is a much better ground for making statistical inferences about the data that is the "basic population" for this analysis.
Now, let me give you some examples of false positives generated by the MD5 hash method:
1. One self-generated example (inspired by observations) is given in the paper (almost identically):
RevID # RevContent (after edit) # Edit # Hash
1 # Peanut # +Peanut # Hash1
2 # Peanut Apple # +Apple # Hash2
3 # Peanut Apple Banana # +Banana # Hash3
4 # Peanut Banana # -Apple # Hash4
5 # Peanut # -Banana # Hash1
MD5 assigns 5 as reverting edit of 2, 3, 4.
DIFF assigns 5 as reverting edit of 3, and 4 as reverting edit of 2.
False positives in this case (according to the Wikipedia definition) for MD5: 5 reverting 4 and 2 (as 4 is unrelated to what 5 is doing, and 2's contribution was already removed before 5, so it cannot be undone by 5 anymore).
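For illustration, the MD5 side of this toy example can be reproduced in a few lines of Python (only the hash comparison is shown; the DIFF-based assignment is the method from the paper and is not sketched here):

import hashlib

# Page text after each of the five edits, taken from the table above.
texts = ["Peanut",
         "Peanut Apple",
         "Peanut Apple Banana",
         "Peanut Banana",
         "Peanut"]
hashes = [hashlib.md5(t.encode("utf-8")).hexdigest() for t in texts]

# Revision 5 hashes identically to revision 1 ...
print(hashes[4] == hashes[0])  # True
# ... so the MD5 method marks revisions 2, 3 and 4 as reverted by 5, although
# 4 is unrelated to what 5 does and 2's "Apple" was already removed by 4.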
2. "Real-life" examples rated as false positives in the user evaluation:
When you asked me for the examples, I started digging them up from the data sample that was used and in fact realized that many false positives of the MD5 method are related to self-reverts. As this is no issue for our data extraction aims (we want to have self-reverts in the results as well) and was not considered when randomly drawing edit pairs from the two methods' results, we didn't discuss this in the paper. If you do not consider self-reverts to be reverts in the sense of the Wikipedia definition, they could be filtered out by collapsing subsequent edits of one editor before running the revert analysis with the MD5 method (a rough sketch of such a collapsing step follows below). That would reduce the number of false positives notably, I assume; I will certainly look into that. If you don't collapse these edits, however (which is not regularly done before reporting/using revert detection results), the number of false positives will be quite high, as the edits-to-be-collapsed (and prone to being misinterpreted) appear quite often and their span can at times be considerable. And of course there are also cases not related to self-reverts.
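A collapsing step of this kind could look roughly like the following sketch; it assumes each revision is given as an (editor id, resulting page text) pair, and the names are made up for illustration:

def collapse_consecutive(revisions):
    """Collapse runs of consecutive edits by the same editor into one
    pseudo-revision, keeping only the page text after the last edit of each run.
    revisions: list of (editor_id, page_text_after_edit) tuples."""
    collapsed = []
    for editor, text in revisions:
        if collapsed and collapsed[-1][0] == editor:
            collapsed[-1] = (editor, text)  # same editor again: keep only the latest state
        else:
            collapsed.append((editor, text))
    return collapsed

# Toy usage: editor B vandalises and self-reverts before editor C edits; after
# collapsing, B's two edits become one and can no longer be mistaken for a
# reverted/reverting pair by an identity-based check.
history = [("A", "Peanut"), ("B", "Peanut rubbish"), ("B", "Peanut"), ("C", "Peanut Apple")]
print(collapse_consecutive(history))
# [('A', 'Peanut'), ('B', 'Peanut'), ('C', 'Peanut Apple')]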
I tried to select examples representative of the sample that received very few or no votes as full reverts (as detected by MD5):
Example A
detected as reverted: http://en.wikipedia.org/w/index.php?&diff=25866415
detected as reverting: http://en.wikipedia.org/w/index.php?diff=25866579
The detected-as-reverting edit removes only "insomnia" from the detected-as-reverted edit, i.e. it is no full revert, as some insertions from previous edits had already been deleted by the reverted editor himself. It would be a correct full revert if you collapsed the reverted editor's edits into one.
Example B
detected as reverted: http://en.wikipedia.org/w/index.php?diff=196507540
detected as reverting: http://en.wikipedia.org/w/index.php?diff=196507775
Self-revert of the vandalism introduced ("kirsty u tit") before the second editor reverts --> the detected-as-reverted edit thus cannot actually be reverted here. This would also be remedied by collapsing the first editor's edits.
Example C (not related to self-reverts): this is an example of an incomplete vandalism repair, which is subsequently completed:
detected as reverted: http://en.wikipedia.org/w/index.php?diff=162097520
detected as reverting: http://en.wikipedia.org/w/index.php?diff=162113945
Example D (not related to self-reverts)
A revert is carried out by TheJazzDalek targeting the edits by 74.131.204.39 (http://en.wikipedia.org/wiki/Special:Contributions/74.131.204.39), but in the same edit something else is deleted by TheJazzDalek, leading to a new, unique revision content. As 74.131.204.39 in the next edit reverts this deletion by TheJazzDalek, but not the initial revert of his (74.131.204.39's) own edits, it is erroneously concluded that 74.131.204.39 reverts himself, which is not the case.
detected as reverted: http://en.wikipedia.org/w/index.php?diff=292533562
detected as reverting: http://en.wikipedia.org/w/index.php?diff=292760323
Example E
detected as reverted: http://en.wikipedia.org/w/index.php?diff=231824943
detected as reverting: http://en.wikipedia.org/w/index.php?diff=231960286
First, the reverting editor (Laser brain) undoes (not rolling back to / not creating a duplicate revision) some edits by another editor before deleting the result of the edits by 67.162.68.255 (http://en.wikipedia.org/wiki/Special:Contributions/67.162.68.255), one of which was detected here as reverted. The "detected as reverted" revision is partly self-reverted by 67.162.68.255. The other part, a date change in an "accessdate=", is not "undone" as such; rather, the whole "accessdate=" part (stemming from a third editor) is deleted.
Example F
Here, between the "reverted" and the "reverting" edit, a mixture of self-reverts, reverts and different forms of vandalism occurs:
detected as reverted: http://en.wikipedia.org/w/index.php?diff=131372047
detected as reverting: http://en.wikipedia.org/w/index.php?diff=131658207
If I have still failed to answer any of your questions, please excuse me and ask again.
Best,
Fabian
Hi Fabian,
That looks interesting, but I wondered if you were aware of some of the possible results when you are editing Wikipedia articles section by section?
If an article has multiple sections then it doesn't matter how many edits have been made to other sections: if you want to undo the most recent edit to a particular section you can just hit undo or rollback and revert it. The contents of the whole article will then form a new and potentially unique revision, as one section will have been reverted to what it was before it was vandalised, while the other sections will be as they were before the latest revert.
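To make this concrete, here is a small illustrative sketch (treating a page simply as a list of section texts, which is of course a simplification) of why a section-level undo need not reproduce any earlier whole-page hash:

import hashlib

def page_hash(sections):
    """MD5 of the whole page, i.e. what a whole-page identity check compares."""
    return hashlib.md5("\n\n".join(sections).encode("utf-8")).hexdigest()

# Purely illustrative two-section history:
r1 = ["Intro text.", "History section."]                       # base version
r2 = ["Intro text. VANDALISM", "History section."]             # section 1 vandalised
r3 = ["Intro text. VANDALISM", "History section, expanded."]   # section 2 edited meanwhile
r4 = ["Intro text.", "History section, expanded."]             # section-level undo of the vandalism

hashes = [page_hash(r) for r in (r1, r2, r3, r4)]

# r4 reverts the vandalism in section 1, but because section 2 changed in the
# meantime, r4's whole-page hash matches no earlier revision, so an
# identity-based check sees no revert here.
print(hashes[3] in hashes[:3])  # False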
You could get some interesting examples by looking at the history of the article on Sarah Palin on the night she became John McCain's running mate. The edit rate peaked at 25 edits per minute, which should make it a good example of an article where edits were only being made one section at a time, as anyone who tried to edit the whole article would have been pretty much guaranteed an edit conflict. As I remember it, there were multiple edit wars taking place simultaneously in different sections of the article; none would have taken the whole article back to a previous version, just one section.
WereSpielChequers