Hear, hear! There are no problems ever posted on this list. Only solutions! :D
Thank you for those insights! Looks like I have a plan :)
Jérôme
From: Isaac Johnson <isaac(a)wikimedia.org>
To: wiki-research-l(a)lists.wikimedia.org
Subject: [Wiki-research-l] Re: Collecting reverted edits at user level
Date: 14/04/2023 17:47:56 Europe/Paris
Hi Jérôme,
I wrote a little overview of this a while back that might be of use:
https://meta.wikimedia.org/wiki/User:Isaac_(WMF)/Analysis_gotchas#Reverts_(…
Essentially, the library that Nathan suggested (mwreverts) is great for the
shasum-based approach and you'll need to use edit tags
<https://en.wikipedia.org/wiki/Wikipedia:Tags> to check for additional
tool-based reverts like mw-undo, mw-rollback, etc. I think combining the
two approaches makes the most sense and you can see a bunch more details on
their overlap for English Wikipedia in this task:
https://phabricator.wikimedia.org/T266374
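As a rough illustration of combining the two signals for a single revision, here is a minimal Python sketch. It assumes you already have the revision's edit tags and a flag from a checksum-based detector. The function name and the returned keys are hypothetical. Only "mw-undo" and "mw-rollback" are named in this thread; "mw-manual-revert" and "mw-reverted" are assumptions that should be checked against the tag list linked above.

```python
# Tags applied to the edit that performs a revert. "mw-undo" and "mw-rollback"
# are named in the thread; "mw-manual-revert" is an assumption to verify.
REVERTING_TAGS = frozenset({"mw-undo", "mw-rollback", "mw-manual-revert"})
# Tag applied to the edit that was reverted (assumption, verify on the tag list).
REVERTED_TAG = "mw-reverted"


def combine_revert_signals(tags, sha1_says_reverted=False):
    """Merge the checksum-based and tag-based signals for one revision.

    tags: iterable of edit-tag strings attached to the revision.
    sha1_says_reverted: True if a checksum-based detector flagged the
        revision as reverted.
    Returns a dict (hypothetical layout) recording which signals fired.
    """
    tag_set = set(tags)
    tag_says_reverted = REVERTED_TAG in tag_set
    signals = (("sha1", sha1_says_reverted), ("tag", tag_says_reverted))
    return {
        # Reverted if either approach says so.
        "is_reverted": sha1_says_reverted or tag_says_reverted,
        # The revision itself undoes someone else's work.
        "is_reverting": bool(tag_set & REVERTING_TAGS),
        # Keep the provenance so the overlap can be inspected later.
        "sources": sorted(name for name, hit in signals if hit),
    }
```

Keeping the "sources" list makes it possible to check how often the two approaches agree in your own sample before deciding how to treat disagreements.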
It sounds like you're collecting specific edits so this is probably less
relevant, but I'll also highlight the excellent public dataset put together
by the Wikimedia Foundation Data Engineering team that has the full edit
history for each language edition and includes metadata such as whether the
edit was a revert based on shasums as well as the edit tags. If you were
processing many many edits, I'd suggest starting with this as it would have
all the information you need in one place.
- More details:
https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/MediaWiki_his…
- You can see an example of how to access and process these dumps on
Wikimedia-hosted Jupyter notebooks (PAWS
<https://wikitech.wikimedia.org/wiki/PAWS>) here:
https://public-paws.wmcloud.org/User:Isaac_(WMF)/Denormalized%20Edit%20Hist…
Hope that helps!
Best,
Isaac
On Fri, Apr 14, 2023 at 11:33 AM J. Nathan Matias <natematias(a)gmail.com>
wrote:
Hi Jérôme,
Have you looked at python-mwreverts
<https://github.com/mediawiki-utilities/python-mwreverts>? This library has
been used by many researchers who are studying reverted edits, and it may
be useful for your work as well.
All the best,
--Nathan
On Fri, Apr 14, 2023 at 11:15 AM <jerome.hergueux(a)mailo.com> wrote:
Sending this again from my current address. Left Gmail a long time ago --
not sure the redirect still works... My apologies if this is hitting your
inbox twice!
_ _ _ _
Dear Wikimedia research community,
I have a question for the data-savvy people on this list :)
My goal is simple: for a sample of English Wikipedia editors, I'm trying
to identify their edits which were reverted. I can see two possible ways
of doing this:
1. Identify the reverts using the SHA1 values. (A revert happens when the
edit exactly restores the page to its previous state.)
2. Identify the reverts using the "undo" button.
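For illustration, approach 1 might look roughly like the following Python sketch. It is a plain reimplementation of the idea, not the code of any particular library. It assumes one page's revisions are available in chronological order as (rev_id, user, sha1) tuples; the function names and the look-back window of 15 revisions are hypothetical choices.

```python
from collections import deque


def find_identity_reverts(revisions, radius=15):
    """Detect reverts on ONE page by matching SHA1 checksums.

    revisions: iterable of (rev_id, user, sha1), oldest first.
    radius: maximum number of revisions a single revert may undo.
    Returns a list of (reverting_id, reverted_to_id, [reverted_ids]).
    """
    window = deque(maxlen=radius + 1)  # recent (rev_id, sha1) pairs
    reverts = []
    for rev_id, _user, sha1 in revisions:
        recent = list(window)
        # Look for the most recent earlier revision with the same checksum.
        match = None
        for i in range(len(recent) - 1, -1, -1):
            if recent[i][1] == sha1:
                match = i
                break
        # A match on the immediately preceding revision undoes nothing.
        if match is not None and match < len(recent) - 1:
            reverted = [rid for rid, _ in recent[match + 1:]]
            reverts.append((rev_id, recent[match][0], reverted))
        window.append((rev_id, sha1))
    return reverts


def reverted_edits_by_user(revisions, radius=15):
    """Group reverted revision ids by author, ignoring self-reverts."""
    revs = list(revisions)
    author = {rev_id: user for rev_id, user, _ in revs}
    by_user = {}
    for reverting, _reverted_to, reverted in find_identity_reverts(revs, radius):
        for rid in reverted:
            if author[rid] != author[reverting]:
                by_user.setdefault(author[rid], set()).add(rid)
    return by_user


# Toy history: B's edit 2 is undone by A's edit 3, C then self-reverts edit 4.
history = [
    (1, "A", "aaa"),
    (2, "B", "bbb"),
    (3, "A", "aaa"),
    (4, "C", "ccc"),
    (5, "C", "aaa"),
]
print(reverted_edits_by_user(history))
```

Whether self-reverts should be excluded, and how far back to look, are analysis decisions rather than fixed rules.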
As I see it, solution 2 is less "precise" (you'll miss some reverts, e.g.,
those performed manually). However, it would also be less computationally
intensive, and I don't see that it would introduce any bias (results can
be compared across editors in a statistical model).
However, I do not see the information about whether a revision was
reverted using the “undo” button in the enwiki database:
https://www.mediawiki.org/w/index.php?title=Manual:Database_layout/diagram&…
I find this surprising. Am I missing something? (And if so, how do you
personally feel about strategy 1 vs. strategy 2?)
Thank you so much for any insight you might be willing to provide! :D
Sincerely,
Jérôme
_______________________________________________
Wiki-research-l mailing list -- wiki-research-l(a)lists.wikimedia.org
To unsubscribe send an email to
wiki-research-l-leave(a)lists.wikimedia.org
--
J. Nathan Matias <http://natematias.com/> : Center for Advanced Study in
the Behavioral Sciences : Cornell University : Citizens and Technology Lab
<https://citizensandtech.org> : social.coop/@natematias : blog
<https://natematias.com/external-posts/> : daylight time photos
<https://social.coop/@natematias/109423664679446879>
--
Isaac Johnson (he/him/his) -- Senior Research Scientist -- Wikimedia
Foundation