Hi all, I am Basil George, a research scholar at IIIT, Hyderabad (http://iiit.ac.in/), India. I have submitted an IEG proposal to build a tool for easy visualization of revision histories of Wikipedia pages, along with their relevant statistics, by rendering them onto a map. The tool will also provide suggestions as to which geographical areas lack editors on a particular topic. Free and open source geo-spatial technologies will be used for this project, which we hope will encourage more technology developers to pitch in and contribute to developing Wikimedia.
Please go through the proposal at http://meta.wikimedia.org/wiki/Grants:IEG/Mapping_History:_Revision_History_Visualizer_and_Improvement_Tracker_using_Geo-Spatial_Technologies and do endorse it if you find it interesting.
Looking forward to a good discussion.
Thanks and regards,
-- Basil.
http://researchweb.iiit.ac.in/~basil.george/
Chance favors the prepared mind.
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
First off, it's nice to see technical IEGs being proposed.
How do you plan to geolocate logged-in users? Logged-in users' IPs are considered sensitive, and your tool will not have access to them to do geo-IP stuff on. You could maybe try to process user pages, but that sounds difficult.
-bawolff
On 02/16/2013 01:24 PM, Brian Wolff wrote:
How do you plan to geolocate logged-in users? Logged-in users' IPs are considered sensitive, and your tool will not have access to them to do geo-IP stuff on. You could maybe try to process user pages, but that sounds difficult.
Just a random thought to put out there: wouldn't it be possible to create an anonymised dataset for research purposes? That would seriously reduce the amount of sensitivity and, if restricted to fully vetted research purposes, wouldn't be a significant disclosure of private information.
Just thinking out loud, here.
-- Coren / Marc
I imagine that it would be difficult to anonymise that properly and still have enough data to present something interesting (however, I'm not particularly versed in data analytics, so maybe it is possible).
The original poster (assuming I understood correctly) wanted to make per-page maps of where people edited from. While there are certainly hot-button topics that have a very, very active edit history, the majority of articles have only a small number of editors. If you start cross-referencing user contribs with geo history maps, I'm sure one would be able to find out who is where.
-bawolff
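To make the cross-referencing concern concrete, here is a minimal Python sketch on toy data. It assumes each page's map exposes the set of regions its editors come from, that a user's contribution list is public, and that each user edits from a single region; the function name `candidate_regions` and the data layout are hypothetical, not part of the proposed tool.

```python
def candidate_regions(user_pages, page_regions):
    """Narrow down where a user could be by intersecting page maps.

    user_pages:   pages the user is publicly known to have edited
    page_regions: page title -> set of regions shown on that page's map
    """
    candidates = None
    for page in user_pages:
        regions = set(page_regions[page])
        # A user who edited this page must be in one of its regions,
        # so every additional page can only shrink the candidate set.
        candidates = regions if candidates is None else candidates & regions
    return candidates if candidates is not None else set()


# Toy data: two low-traffic pages with small, overlapping region sets.
page_regions = {
    "Page A": {"IN", "CA"},
    "Page B": {"CA", "US"},
}
guess = candidate_regions(["Page A", "Page B"], page_regions)
print(guess)
```

With only two sparsely edited pages the candidate set already collapses to a single region, which is the risk described above.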
On 02/16/2013 01:38 PM, Brian Wolff wrote:
If you start cross-referencing user contribs with geo history maps, I'm sure one would be able to find out who is where.
Yes, I think that's unsurmountable; there are very many articles with few enough distinct contributors that even anonymised the data becomes too specific.
-- Marc
On Sat, Feb 16, 2013 at 11:54 PM, Brian Wolff bawolff@gmail.com wrote:
First off, it's nice to see technical IEGs being proposed.
How do you plan to geolocate logged-in users? Logged-in users' IPs are considered sensitive, and your tool will not have access to them to do geo-IP stuff on. You could maybe try to process user pages, but that sounds difficult.
-bawolff
We have tried to address the privacy issues in obtaining IPs here:
http://meta.wikimedia.org/wiki/Grants_talk:IEG/Mapping_History:_Revision_History_Visualizer_and_Improvement_Suggester_using_Geo-Spatial_Technologies#Addressing_privacy_concerns
As we have mentioned, no personal information of individual editors other than their rough geographical locations (at state/country level) will be used and displayed. If the tool is deployed as a MediaWiki gadget, we assume it will be installed on the server where MediaWiki is hosted. Will there still be privacy issues in obtaining the IPs? In the standalone case, where the tool is used as an independent application, is there any way of obtaining the IPs without violating the privacy terms?
Thanks,
On 02/16/2013 01:50 PM, Basil George wrote:
We have tried to address the privacy issues in obtaining IPs here: http://meta.wikimedia.org/wiki/Grants_talk:IEG/Mapping_History:_Revision_History_Visualizer_and_Improvement_Suggester_using_Geo-Spatial_Technologies#Addressing_privacy_concerns
As we have mentioned, no personal information of individual editors other than their rough geographical locations (at state/country level) will be used and displayed.
Please see my detailed reply at https://meta.wikimedia.org/wiki/Grants_talk:IEG/Mapping_History:_Revision_Hi...
Bottom line, under the privacy policy this is only okay for logged out users, and logged in users who opt in.
Matt Flaschen
On Sun, Feb 17, 2013 at 2:17 AM, Matthew Flaschen mflaschen@wikimedia.org wrote:
Bottom line, under the privacy policy this is only okay for logged out users, and logged in users who opt in.
If there is no way of obtaining the IPs of registered users from the history page, then we propose a few changes to the project idea (talk: http://meta.wikimedia.org/wiki/Grants_talk:IEG/Mapping_History:_Revision_History_Visualizer_and_Improvement_Suggester_using_Geo-Spatial_Technologies#Addressing_privacy_concerns):
* Support the visualizer app only for pages with a large number of edits, so that cross-referencing users with geographical locations becomes difficult.
* Use only those IPs which are publicly available (the IPs of unregistered editors, and of registered editors who have chosen to make them public).
Though these changes may dilute the overall project idea, we still consider the implementation of this project helpful for future geo-spatial research and development of Wikimedia projects. The source code of this project may help in the implementation of many other projects that may require rendering and analysis of information on maps etc.
Thanks and regards,
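The two proposed restrictions could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: revisions are taken to be plain dictionaries with a "user" field, unregistered editors are assumed to appear under their IP address as in the public page history, and the `opted_in` set and `min_edits` cutoff are hypothetical names, since no opt-in mechanism or cutoff value is specified in this thread.

```python
import ipaddress


def is_ip(username):
    """Return True if the username is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(username)
        return True
    except ValueError:
        return False


def usable_revisions(revisions, opted_in=frozenset()):
    """Keep only revisions by unregistered (IP) editors or opted-in users."""
    return [
        rev for rev in revisions
        if is_ip(rev["user"]) or rev["user"] in opted_in
    ]


def has_enough_edits(revisions, min_edits):
    """Hypothetical cutoff: only map pages with at least min_edits edits."""
    return len(revisions) >= min_edits


# Toy history: two IP editors, one ordinary account, one opted-in account.
history = [
    {"user": "203.0.113.7"},
    {"user": "ExampleUser"},
    {"user": "2001:db8::1"},
    {"user": "OptedInUser"},
]
kept = usable_revisions(history, opted_in={"OptedInUser"})
print([rev["user"] for rev in kept])
```

Registered editors who have not opted in are dropped before any geolocation step, so their locations never enter the map data.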
Well, keep in mind it's not just the total number of editors. The requirement would have to be that the number of editors in each independent geographic region is either zero or above a threshold. Meaning, if we did geo-analysis based on countries, any country with at least one editor would have to have at least $threshold editors for the page's data to be shown publicly. This is because even on an article with 300 editors, if only two of those editors are from Canada or something, then it would still be easy to identify those two editors.
But if a restriction like the one I just mentioned were implemented, I don't think there would be any problems with the data, although the threshold value would have to be chosen very carefully, and I'm not sure the WMF would want to take such a risk (although I can't speak for the WMF since I'm just a volunteer).
--
Tyler Romeo
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | tylerromeo@gmail.com
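The per-region rule described above could be sketched like this in Python. It is a minimal illustration, not a vetted anonymisation scheme: the function name, the editor-to-region mapping, and the choice to treat "at least `threshold`" as the cutoff are assumptions made for the example.

```python
from collections import Counter


def region_counts_if_publishable(editor_regions, threshold):
    """Return per-region editor counts, or None if the page must be hidden.

    editor_regions: editor identifier -> region code (one region per editor)
    threshold:      minimum number of editors a represented region needs
    """
    counts = Counter(editor_regions.values())
    # Regions with zero editors never appear in the counter, so the only
    # way to fail is a represented region that falls below the threshold.
    if any(n < threshold for n in counts.values()):
        return None
    return dict(counts)


# 300 editors from one country are fine on their own...
editors = {f"editor{i}": "US" for i in range(300)}
print(region_counts_if_publishable(editors, threshold=10))

# ...but adding just two editors from Canada hides the whole page.
editors.update({"editor_a": "CA", "editor_b": "CA"})
print(region_counts_if_publishable(editors, threshold=10))
```

Hiding the whole page, rather than dropping only the small region, avoids revealing that a suppressed region exists at all; that is a design choice of this sketch, not something settled in the discussion.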
On 02/16/2013 12:30 PM, Basil George wrote:
Please go through the proposal at http://meta.wikimedia.org/wiki/Grants:IEG/Mapping_History:_Revision_History_Visualizer_and_Improvement_Tracker_using_Geo-Spatial_Technologies and do endorse it if you find it interesting.
I also think you should notify the affected Wikipedias themselves on the village pump (or local equivalents). If this were approved, it would affect users more than developers.
Matt Flaschen
Keep in mind we already do log IP addresses (to an extent, for CheckUser and whatnot), so the issue isn't actually capturing information, it's the use and display of that information, especially since such display would be public. Like Brian said, de-anonymizing such information might not be difficult, *especially* on articles that are edited by only a select group of users, e.g., most Wikipedia articles.
-- Tyler Romeo
On 16 February 2013 20:06, Tyler Romeo tylerromeo@gmail.com wrote:
Keep in mind we already do log IP addresses (to an extent, for CheckUser and whatnot), so the issue isn't actually capturing information, it's the use and display of that information, especially since such display would be public. Like Brian said, de-anonymizing such information might not be difficult, *especially* on articles that are edited by only a select group of users, e.g., most Wikipedia articles.
I'm assuming you've added an extra "not" there - for many articles that have a very small number of editors, it would be vanishingly easy to start geolocating people, especially with a couple of cross references.
I'll throw in for the record that geolocation is really problematic for countries with very limited numbers of IPs (which coincidentally are often countries with censorious governments), and there are huge regions where IP data cannot be considered at all accurate: for example, most of the Middle East.
Risker/Anne
Eh, English. But that's what I meant, it would be very easy.
-- Tyler Romeo