Dear all,
Does anyone have an estimate of the number of edits performed with Proxy Servers, or know where I could get one, ideally broken down by registered and unregistered users?
Thank you, Tom
On Sun, Feb 3, 2019 at 1:18 PM Thomas Stieve tomthirteen@email.arizona.edu wrote:
Dear all,
Does anyone have an estimate of the number of edits performed with Proxy Servers, or know where I could get one, ideally broken down by registered and unregistered users?
Can you define the "Proxy Servers" term and clarify which Wikimedia projects and what level of granularity you are interested in having this information for?
For existing statistics we have https://stats.wikimedia.org/v2/ which is maintained by the Wikimedia Analytics team. More information about that project is available at https://www.mediawiki.org/wiki/Analytics/Wikistats. More detailed information would likely require a new research project (https://meta.wikimedia.org/wiki/Research:Index) or collaboration with the Analytics team to compute it. It may be easier to find help on the Analytics mailing list (https://lists.wikimedia.org/mailman/listinfo/analytics).
Bryan
Dear Bryan,
I would like to map IP edits to articles. However, I would like to exclude IP addresses that are invalid, for example because they belong to proxy servers. I am interested in articles in the 2016 Wikipedia for 271 languages, at the country level of geolocation.
Any assistance would be greatly appreciated, Tom
On Mon, Feb 4, 2019 at 12:18 PM Bryan Davis bd808@wikimedia.org wrote:
Bryan Davis Wikimedia Foundation bd808@wikimedia.org [[m:User:BDavis_(WMF)]] Manager, Technical Engagement Boise, ID USA irc: bd808 v:415.839.6885 x6855
Wikimedia Cloud Services mailing list Cloud@lists.wikimedia.org (formerly labs-l@lists.wikimedia.org) https://lists.wikimedia.org/mailman/listinfo/cloud
Some geolocation databases have a category for proxies. For instance, MaxMind uses the A1 code for the "country" Anonymous Proxy: https://dev.maxmind.com/geoip/legacy/codes/iso3166/
(Of course, both the proxy detection and the country detection would need to be accurate to get meaningful results.)
I find it strange to consider proxy IP addresses invalid, though. Maybe considering just edits that were not reverted would be a more useful metric?
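As a rough illustration of the filtering suggested above, the sketch below drops edits whose looked-up country code is A1 before tallying edits per country. It assumes the geolocation lookup has already been done elsewhere and that each edit arrives as an (ip, country_code) pair. The function name and the sample data are hypothetical, and the result is only as good as the database's proxy and country detection.

```python
from collections import Counter

# MaxMind legacy code for the "country" Anonymous Proxy (see the link above).
ANONYMOUS_PROXY = "A1"

def count_edits_by_country(edits):
    """Tally edits per country code, skipping those tagged as anonymous proxy.

    `edits` is an iterable of (ip, country_code) pairs. Producing the
    country codes is assumed to happen elsewhere (hypothetical input).
    """
    counts = Counter()
    skipped = 0
    for _ip, country in edits:
        if country == ANONYMOUS_PROXY:
            skipped += 1
            continue
        counts[country] += 1
    return counts, skipped

# Made-up sample using documentation-range IP addresses.
sample = [
    ("192.0.2.10", "US"),
    ("198.51.100.7", "A1"),
    ("203.0.113.5", "DE"),
    ("192.0.2.44", "US"),
]
counts, skipped = count_edits_by_country(sample)
```

Returning the number of skipped edits alongside the tally makes it easy to report how much data the proxy filter removed.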
Thank you. I mean that a proxy IP could geolocate to a different country than the one the editor is actually in. I am trying to filter those cases out.
On Mon, Feb 4, 2019 at 4:27 PM Platonides platonides@gmail.com wrote:
Some geolocation databases have a category for proxies. For instance, MaxMind uses the A1 code for the "country" Anonymous Proxy: https://dev.maxmind.com/geoip/legacy/codes/iso3166/
(Of course, both the proxy detection and the country detection would need to be accurate to get meaningful results.)
I find it strange to consider proxy IP addresses invalid, though. Maybe considering just edits that were not reverted would be a more useful metric?
On Mon, Feb 4, 2019 at 12:09 PM Thomas Stieve tomthirteen@email.arizona.edu wrote:
I would like to map IP edits to articles. However, I would like to exclude IP addresses that are invalid, for example because they belong to proxy servers. I am interested in articles in the 2016 Wikipedia for 271 languages, at the country level of geolocation.
I looked for existing data sets related to your request, but did not find an exact match in the content of https://meta.wikimedia.org/wiki/Research:Data or https://meta.wikimedia.org/wiki/Statistics.
The data in the Wiki Replica databases available in the Cloud Services environment does not already include geolocation tagging. IP address information will only be available for anonymous edits, as a side effect of MediaWiki's use of surrogate usernames for anonymous edits generated from the IP address of the editor. Gathering all IP edits from 2016 across all Wikipedias via the Wiki Replica databases will require quite a large amount of processing power.
Reposting your question on the Analytics mailing list (https://lists.wikimedia.org/mailman/listinfo/analytics) may reach more Wikimedians who have experience in collecting this type of data for analysis.
Bryan
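To illustrate the point about anonymous edits being recorded under a username derived from the editor's IP address, here is a small sketch that checks whether a username parses as an IP address, using Python's standard ipaddress module. The function name and the sample usernames are hypothetical. Treating every name that parses as an IP address as an anonymous edit is an assumption of this sketch, not a statement of MediaWiki's exact rules.

```python
import ipaddress

def is_ip_username(username):
    """Return True if the username parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(username.strip())
    except ValueError:
        return False
    return True

# Made-up usernames; the IP addresses come from documentation ranges.
usernames = ["192.0.2.10", "ExampleUser", "2001:DB8:0:0:0:0:0:1"]
anonymous = [name for name in usernames if is_ip_username(name)]
```

A filter like this could be applied to usernames pulled from revision data before any geolocation step, so that only IP edits are passed on.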