Hi Thorsten, thanks for the question.
I see the shape of both of our graphs is very similar, with some slight
differences in magnitude of some of the peaks. I think both your guess and
Marcel's guess contribute to the small difference. And if you'd like to
quantify it, you can always look at the API, where we have some of the info.
For example, for bots identified by the regular expression Marcel mentions,
the link is this:
https://wikimedia.org/api/rest_v1/metrics/edits/aggregate/fr.wikipedia.org/…
(you can play around with these based on the documentation of the API
<https://wikitech.wikimedia.org/wiki/Analytics/AQS/Wikistats_2>, and using
XHR logging in the browser console when looking at
stats.wikimedia.org)
For the deletion drift problem you mention, we don't have easily accessible
public data yet, but we are working on it. Right now you'd have to
download your project's mediawiki history dataset
<https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits/Mediawiki_history_dumps>.
This has every edit, whether or not it was deleted, with handy fields that
tell you its deletion status. Loading that up in a database and querying
it should let you answer your questions.
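As a rough illustration of that kind of query, here is a minimal Python sketch that counts edits per day with and without edits on deleted pages. It makes several assumptions: the rows have already been parsed into dicts, the field names (event_entity, event_type, event_timestamp, revision_is_deleted_by_page_deletion) match the dataset documentation, booleans are stored as the string "true", and timestamps start with YYYY-MM-DD. Please check all of these against the dump documentation before relying on it. The function name and the sample rows are made up for the example.

```python
from collections import Counter


def daily_edit_counts(rows, include_deleted=True):
    """Count revision-create events (edits) per day.

    rows: iterable of dicts keyed by the ASSUMED field names described above.
    include_deleted: when False, skip edits whose page was later deleted.
    """
    counts = Counter()
    for row in rows:
        # Keep only revision creations, i.e. edits.
        if row.get("event_entity") != "revision" or row.get("event_type") != "create":
            continue
        # ASSUMPTION: the flag is the string "true" when the page was deleted.
        deleted = row.get("revision_is_deleted_by_page_deletion") == "true"
        if deleted and not include_deleted:
            continue
        # ASSUMPTION: the timestamp starts with YYYY-MM-DD.
        day = row["event_timestamp"][:10]
        counts[day] += 1
    return counts


# Made-up sample rows, only to show the shape of the computation.
sample = [
    {"event_entity": "revision", "event_type": "create",
     "event_timestamp": "2020-09-11 10:00:00.0",
     "revision_is_deleted_by_page_deletion": "false"},
    {"event_entity": "revision", "event_type": "create",
     "event_timestamp": "2020-09-11 12:00:00.0",
     "revision_is_deleted_by_page_deletion": "true"},
    {"event_entity": "revision", "event_type": "create",
     "event_timestamp": "2020-09-12 08:30:00.0",
     "revision_is_deleted_by_page_deletion": "false"},
    {"event_entity": "page", "event_type": "create",
     "event_timestamp": "2020-09-12 09:00:00.0",
     "revision_is_deleted_by_page_deletion": "false"},
]

all_edits = daily_edit_counts(sample)
surviving = daily_edit_counts(sample, include_deleted=False)
# Per-day difference: edits that drop out of the totals once pages are deleted.
drift = {day: all_edits[day] - surviving[day] for day in all_edits}
```

Comparing the two counts per day gives a rough measure of how much of the gap could come from page deletions. Bot and namespace filters would have to be added on top to match the Wikistats metric.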
On Fri, Sep 11, 2020 at 10:45 AM Marcel Ruiz Forns <mforns(a)wikimedia.org>
wrote:
Hi Thorsten!
Did you just filter out the editors marked as bots via a userGroup?
We also filter out some editors by username, because some bots are not
marked as such via a userGroup. The regular expression we use is this one
(IIRC):
https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery…
Not sure that's the only source of discrepancy, but could be! Please, let
us know.
thanks!
On Fri, Sep 11, 2020 at 4:22 PM Thorsten Ruprechter <ruprechter(a)tugraz.at>
wrote:
Hello,
I have a question about the "User edits" metric presented on Wikistats,
and would be very grateful for advice regarding an issue we encountered.
We are currently computing some edit metrics for multiple Wikipedia
language versions. However, we realized there is some discrepancy between
our edit count results and the ones reported on Wikistats. It seems that
total edit counts are higher for our data, while trends for daily edits are
also different. As an example, the French Wikipedia:
Wikistats:
https://stats.wikimedia.org/#/fr.wikipedia.org/contributing/user-edits/norm…
Our results (see attachment):
We removed all users marked as bots in the database, and excluded edits
to talk pages, as is done for the Wikistats edit count metric. I just
now found this note [1]: "The original Wikistats did not count edits if the
page they were made on was deleted. We are doing the same thing in
Wikistats 2 for now, which means you may see metric totals shifting over
time (as pages are deleted)."
Could this be what is causing this rift, or are there other processing
details which we have to consider to reproduce the Wikistats numbers as
closely as possible? On a separate note - are the daily edit counts for all
pages (including deleted articles) accessible somewhere?
thanks, thorsten
[1]
https://meta.wikimedia.org/wiki/Research:Wikistats_metrics/Edits
--
Thorsten Ruprechter
Institute of Interactive Systems and Data Science (ISDS)
Graz University of Technology, Austria
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
--
*Marcel Ruiz Forns** (he/him)*
Senior Software Engineer