All,
these are highlights from a session the Wikimedia Foundation’s Research & Data team hosted at CSCW ’14 in Baltimore. The audience was a group of researchers either working on Wikipedia/Wikimedia-related research projects or interested in learning about opportunities to collaborate with the Foundation.
Feel free to get in touch if you have any questions/comments.
Contact
- Dario Taraborelli - dario@wikimedia.org
- Aaron Halfaker - ahalfaker@wikimedia.org
- Jonathan Morgan - jmorgan@wikimedia.org
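IRC: irc://irc.freenode.net/wikimedia-research (webclient: http://webchat.freenode.net/?channels=#wikimedia-research)
Mailing list: wiki-research-l (https://lists.wikimedia.org/mailman/listinfo/wiki-research-l)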
Resources
We gave a short overview of existing resources of potential interest to Wikipedia/Wikimedia researchers:
- OAuth allows 3rd-party software to edit Wikipedia on behalf of a Wikipedia editor, and it’s a (mostly untapped) opportunity to run experimental research or test new interfaces targeted at Wikipedians. See: https://www.mediawiki.org/wiki/Extension:OAuth#Using_OAuth
- Data portal summarizes data sources that are currently available to researchers and app developers. See: https://meta.wikimedia.org/wiki/Research:Data
- Wikimedia Research Newsletter: A monthly overview reviewing or summarizing recent research (contributions are welcome; please contact Dario if you’re interested in contributing). See: https://meta.wikimedia.org/wiki/Research:Newsletter
Subject recruitment. Aaron and Dario have managed a process (https://meta.wikimedia.org/wiki/Research:Subject_recruitment) for documenting and vetting subject recruitment occurring on Wikimedia projects. This process was set in place to help resolve the tension between researchers’ need to recruit subjects and editors’ desire to not be bothered. The process involves a public discussion and mentorship in order to ensure that proposed studies that affect editors are well documented, address original questions and do not result in unnecessary disruption of wiki work. This is a service we’ve been providing on a volunteer basis as members of the Research Committee. It’s meant to offer support to researchers, but it doesn’t eliminate the risk that an account used for recruitment purposes might be blocked by an administrator.
IRBs and minors. One of the issues that we discussed is dealing with IRB & other ethics boards’ requirements when studies may result in interaction with minors. Aaron (ahalfaker@wikimedia.org) is willing to discuss the issue with researchers and university staff upon request.
Annual survey modules. Interest was expressed in exploring strategies for expanding the annual editor/reader survey with new questions contributed by researchers. At this point (March 2014) we cannot commit to any such project, but in general there is potential for cooperation between WMF and academic researchers in this area. Interested parties should contact Tilman Bayer (tbayer at wikimedia dot org), who has been running the most recent WMF editor survey and can provide information about these surveys (methodology, results, available data etc.) and their calendar.
WikiResearch Workshop at CSCW 2015. We discussed planning a workshop for CSCW next year. Anyone who is interested in collaborating, please contact us. Details are TBD, but our general goals include:
- increase awareness of the public data resources that are available
- highlight research areas that are ripe for investigation, esp. where WMF could benefit from the results
- get a better sense of what kind of data resources (and/or what data formats) researchers would like to have
- brainstorm a (lightweight, ethical, practical) model for partnership between WMF and academic research orgs that want access to certain non-public data
Wiki Research Hackathons. On Nov. 9th, 2013, we held our first global research hackathon (announcement: https://meta.wikimedia.org/wiki/Research:Labs2/Hackathons/November_9th,_2013). We had universities and other local meetups from around the world connect via Google Hangout to share ideas, data and presentations geared toward datasets, code and other resources. We’ll be planning another hackathon in the coming months. You can help by hosting or attending your own local event. Please contact us if you’re interested.
Public listing on WMF’s strategic research questions. We discussed the potential for the Wikimedia Foundation to list out key areas of research that we are interested in. This is something we are keenly interested in and you should expect to hear from us soon through wiki-research-l and @WikiResearch.
Tweet @WikiResearch. We maintain a relatively high-visibility Twitter account from which we tweet about new research, data, and other initiatives. If you tweet about your own wiki-related work @WikiResearch, we will retweet it so long as it’s relevant. We will also experiment with using this Twitter handle to increase the visibility of libraries and analytics tools that support Wikipedia research.
Internships/grad student residencies. We talked briefly about research collaborations, internships and other forms of work opportunities at WMF. We’re actively exploring possibilities and will broadcast details through wiki-research-l and @WikiResearch when we know more.
We’re hiring. We are looking to expand the research team at WMF. If you are interested in working with us, keep an eye on wiki-research-l and @WikiResearch for job openings or contact us off-list.
Hi Dario,
Nice report. I have some questions related to the annual survey modules.
Does Analytics have any ideas to contribute to how to stabilize and increase the population of active editors and to improve editor gender diversity? There were relevant blog posts at [1] and [2]. I would like to hear how data and analysis of that survey have been used in areas outside of VE development and any other ideas Analytics has about improving population size and gender diversity.
Were there any follow-ups to the "annual" editor survey from 2011? A blog post [3] says the survey was anticipated to be annual. There is a page about a 2012 annual survey on Meta [4], but no results are posted and it appears no follow-up surveys were completed in 2012 or 2013. [5]
Thanks,
Pine
[1] https://blog.wikimedia.org/2012/06/29/editor-survey-lack-of-time-and-unpleas...
[2] https://blog.wikimedia.org/2012/04/27/nine-out-of-ten-wikipedians-continue-t...
[3] https://blog.wikimedia.org/2011/12/07/launching-the-second-annual-wikipedia-...
[4] https://meta.wikimedia.org/wiki/Research:Wikipedia_Editor_Survey_2012#Result...
[5] https://meta.wikimedia.org/wiki/Research:Projects
From: dtaraborelli@wikimedia.org Date: Wed, 5 Mar 2014 17:25:51 -0800 To: analytics-internal@lists.wikimedia.org; wiki-research-l@lists.wikimedia.org Subject: [Wiki-research-l] Notes from the wiki research session at CSCW '14
Hi Pine
On Mar 5, 2014, at 11:43 PM, ENWP Pine deyntestiss@hotmail.com wrote:
Does Analytics have any ideas to contribute to how to stabilize and increase the population of active editors and to improve editor gender diversity? There were relevant blog posts at [1] and [2]. I would like to hear how data and analysis of that survey have been used in areas outside of VE development and any other ideas Analytics has about improving population size and gender diversity.
Not to my knowledge. Other than VE, I cannot think of other areas in which survey results about diversity have driven Product design, which today is primarily focused on user acquisition and new editor activation experiments. You should look outside of Product (notably, Programs & Grantmaking) for projects like the Teahouse that are much more geared towards diversity, but I am sure you are already familiar with them.
There’s also a related (and less known) project that was piloted a few months ago to try and gauge the gender gap in specific segments of the editor population or editor lifecycle via microsurveys [1]. I’d love to hear from other parties interested in using this model, which I think is promising.
<shameless plug>I am also writing up a proposal for a Wikimania talk [2] about targeted acquisitions [3]. It’s still a stub for now, but once it’s more fleshed out I’ll post it to the list to get feedback on the possible use of offsite acquisition campaigns as a lever to increase the diversity of the Wikimedian population.</shameless plug>
[1] https://meta.wikimedia.org/wiki/Research:Gender_micro-survey
[2] https://wikimania2014.wikimedia.org/wiki/Submissions/The_missing_Wikipedia_a...
[3] https://meta.wikimedia.org/wiki/Targeted_acquisition_campaigns
Were there any follow ups to the "annual" editor survey from 2011? A blog post [3] says the survey was anticipated to be annual. There is a page about a 2012 annual survey on Meta [4] but no results are posted and it appears no follow up surveys were completed in 2012 or 2013. [5]
As Tilman noted in the section of the report about surveys, at this stage it’s not clear if there’s bandwidth to run these surveys on an annual basis.
Dario
Hoi Dario,
When you look at the statistics [1], you find that the number of page views in English is going down faster than in the other languages combined. You also find that the percentage of readers for the top ten Wikipedias in size is slowly but surely decreasing (now at 88.94%). How can we decrease this percentage even more without sacrificing the number of page views for the top 10?
Has there been any research into how we can stimulate growth in the Wikipedias that are not part of the top 10? Do we know to what extent the English Wikipedia model works for these other languages, or whether it is a hindrance? Do we know what people are looking for in the smaller Wikipedias, and do we know what they do or do not find? Do we know how people find articles in those languages, and does this work in the same way as it does for English? Is it possible that we have to cultivate contacts with the local "Googles" in order to grow attention for what we have to offer?
Do we know what the effect is of the new search engine that is much better at providing results in other scripts? Do we know to what extent interlanguage links are created, and do we know how this has changed since the move to Wikidata?
Dario, can you please tell us to what extent the other languages are studied at all? Do we know what effect they have? Do we know about the experience of these Wikipedias locally? Do we care about the typography in other scripts? Do we know about NPOV in the small projects? Do we know about gender diversity in the smaller languages? How about cultural bias, and how does this compare to the cultural bias in the big projects?
Dario, there is so much that we do not know and have not touched. Why study more of what has been studied to death?
Thanks,
GerardM
Hoi, Forgot to include the URL
[1] http://stats.wikimedia.org/EN/TablesPageViewsMonthlyCombined.htm
Sorry, Gerard
On 6 March 2014 09:17, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi Dario, When you look at the statistics [1], you find that the number of page views in English is going down faster than in the other languages combined. You also find that the percentage of readers for the top ten Wikipedias in size is slowly but surely decreasing (now at 88.94%). How can we decrease this percentage even more without sacrificing the number of page views for the top 10?
Has there been any research in how we can stimulate the growth in Wikipedias that are not part of the top 10%. Do we know to what extend the English Wikipedia model works for these other languages or is a hindrance. Do we know what people are looking for in the smaller Wikipedias and do we know what they do / do not find. Do we know how people find articles in those languages, does this work in the same way as it does for English? Is it possible that we have to cultivate contacts with the local "Googles" in order to grow attention for what we have to offer.
Do we know what the effect is of the new search engine that is much better at providing results in other scripts? Do we know to what extend inter language links are created and, do we know how this has changed since the move to Wikidata?
Dario, can you please tell us to what extend the other languages are studied at all? Do we know what effect they have? Do we know about the experience of these Wikipedias locally? Do we care about the typography in other scripts? Do we know about the NPOV in the small projects? Do we know about gender diversity in the smaller languages. How about cultural bias and how does this compare to the cultural bias in the big projects?
Dario there is so much that we do not know, have not touched. Why study more of what has been studied to death? Thanks, GerardM
On 6 March 2014 02:25, Dario Taraborelli dtaraborelli@wikimedia.orgwrote:
All,
these are highlights from a session the Wikimedia Foundation’s Research & Data team https://www.mediawiki.org/wiki/Analytics/Research_and_Datahosted at CSCW ’14 in Baltimore. The audience was a group of researchers either working on Wikipedia/Wikimedia-related research projects or interested in learning about opportunities to collaborate with the Foundation.
Feel free to get in touch if you have any questions/comments. Contact
- Dario Taraborelli - dario@wikimedia.org
- Aaron Halfaker - ahalfaker@wikimedia.org
- Jonathan Morgan - jmorgan@wikimedia.org
IRC: irc://irc.freenode.net/wikimedia-research (webclienthttp://webchat.freenode.net/?channels=#wikimedia-research )
Mailing list: wiki-research-l (mailing listhttps://lists.wikimedia.org/mailman/listinfo/wiki-research-l )
Resources We gave a short overview of existing resources of potential interest to Wikipedia/Wikimedia researchers:
- OAuth allows 3rd-party software to edit Wikipedia on behalf of a
Wikipedia editor and it’s a (mostly untapped) opportunity to run experimental research or test new interfaces targeted at Wikipedians. See: https://www.mediawiki.org/wiki/Extension:OAuth#Using_OAuth
- Data portal summarizes data sources that are currently available to
researchers and app developers. See: https://meta.wikimedia.org/wiki/Research:Data
- Wikimedia Research Newsletter: A monthly overview reviewing or
summarizing recent research (contributions are welcome, please contact Dario if you’re interested in contributing) https://meta.wikimedia.org/wiki/Research:Newsletter
Subject recruitment. Aaron and Dario have managed a processhttps://meta.wikimedia.org/wiki/Research:Subject_recruitmentfor documenting and vetting subject recruitment occurring on Wikimedia projects. This process was set in place to help resolve the tension between researchers’ need to recruit subjects and editors’ desire to not be bothered. The process involves a public discussion and mentorship in order to ensure that proposed studies that affect editors are well documented, are addressing original questions and do not result in unnecessary disruption of wiki work. This is a service we’ve been providing on a volunteer basis as members of the Research Committee, it’s meant to offer support to researchers but doesn’t eliminate the risk that an account used for recruitment purposes might be blocked by an administrator. IRBs and minors. One of the issues that we discussed is dealing with IRB & other ethics boards’ requirements when studies may result in interaction with minors. Aaron ahalfaker@wikimedia.org is willing to discuss the issue with researchers and university staff upon request. Annual survey modules. Interest was expressed in exploring strategies for expanding the annual editor/reader survey with new questions contributed by researchers. At this point (March 2014) we cannot commit to any such project, but in general there is potential for cooperations between WMF and academic researchers in this area. Interested parties should contact Tilman Bayer (tbayer at wikimedia dot org) who has been conducting the last WMF editor survey and can provide information about these surveys (methodology, results, available data etc.) and their calendar. WikiResearch Workshop at CSCW 2015. We discussed planning a workshop for CSCW next year. Anyone who is interested in collaborating, please contact us. Details are TBD, but our general goals include:
- increase awareness of the public data resources that are available
- highlight research areas that are ripe for investigation, esp. where WMF could benefit from the results
- get a better sense of what kind of data resources (and/or what data formats) researchers would like to have
- brainstorm a (lightweight, ethical, practical) model for partnership between WMF and academic research orgs that want access to certain non-public data
Wiki Research Hackathons. On Nov. 9th, 2013, we held our first global research hackathon (announcement: https://meta.wikimedia.org/wiki/Research:Labs2/Hackathons/November_9th,_2013). We had universities and other local meetups from around the world connect via Google Hangout to share ideas, data and presentations geared toward datasets, code and other resources. We’ll be planning another hackathon in the coming months. You can help by hosting or attending your own local event. Please contact us if you’re interested.
Public listing of WMF’s strategic research questions. We discussed the potential for the Wikimedia Foundation to list out key areas of research that we are interested in. This is something we are keenly interested in and you should expect to hear from us soon through wiki-research-l https://lists.wikimedia.org/mailman/listinfo/wiki-research-l and @WikiResearch https://twitter.com/WikiResearch.
Tweet @WikiResearch. We maintain a relatively high-visibility Twitter account from which we tweet about new research, data, and other initiatives. If you tweet about your own wiki-related work @WikiResearch https://twitter.com/WikiResearch, we will retweet it so long as it’s relevant. We will also experiment with the use of this Twitter handle to increase the visibility of libraries and analytics tools to support Wikipedia research.
Internships/grad student residencies. We talked briefly about research collaborations, internships and other forms of work opportunities at WMF. We’re actively exploring possibilities and will broadcast details through wiki-research-l https://lists.wikimedia.org/mailman/listinfo/wiki-research-l and @WikiResearch https://twitter.com/WikiResearch when we know more.
We’re hiring. We are looking to expand the research team at WMF. If you are interested in working with us, keep an eye on wiki-research-l and @WikiResearch for job openings or contact us off-list.
Wiki-research-l mailing list Wiki-research-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
Hoi Gerard,
thanks for the gigantic list of questions – comments inline
On Mar 6, 2014, at 12:17 AM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi Dario, When you look at the statistics [1], you find that the number of page views in English is going down faster than in the other languages combined. You also find that the percentage of readers for the top ten Wikipedias in size is slowly but surely decreasing (now at 88.94%). How can we decrease this percentage even more without sacrificing the number of page views for the top 10?
I guess you saw our report on 2013 traffic trends [1]. Page views followed a downward trend in 2013, but unique visitors as measured via comScore have been steadily growing over time and we have no evidence to date of a change in that trend, after controlling for seasonality. We are working with the analytics engineers to get more reliable traffic data so that we can answer these questions accurately, including breaking down readership trends by country, project, device and source.
[1] https://www.mediawiki.org/wiki/File:2013_Wikimedia_traffic_trends.pdf
Has there been any research into how we can stimulate the growth of Wikipedias that are not part of the top 10%? Do we know to what extent the English Wikipedia model works for these other languages or is a hindrance? Do we know what people are looking for in the smaller Wikipedias and do we know what they do / do not find? Do we know how people find articles in those languages, and does this work in the same way as it does for English? Is it possible that we have to cultivate contacts with the local "Googles" in order to grow attention for what we have to offer?
Speaking for Analytics/Research & Data, we haven’t done a lot of original research, let alone experimentation, on small Wikipedias. I expect request logs and search logs will provide useful data to understand how people find articles on these projects.
Do we know what the effect is of the new search engine that is much better at providing results in other scripts? Do we know to what extent inter language links are created and do we know how this has changed since the move to Wikidata? Dario, can you please tell us to what extent the other languages are studied at all? Do we know what effect they have? Do we know about the experience of these Wikipedias locally? Do we care about the typography in other scripts? Do we know about the NPOV in the small projects? Do we know about gender diversity in the smaller languages? How about cultural bias and how does this compare to the cultural bias in the big projects? Dario, there is so much that we do not know, have not touched.
Amen to that.
Why study more of what has been studied to death?
I am not sure I understand your question, but if you are suggesting that we need to find better ways to pitch unexplored research to the wiki research community I am down with that. It’s sad that we haven’t found a good model to create a speed dating system to match research questions and researchers, but many people on this list as well as those who served on the research committee have expressed a lot of interest in fixing this problem. Do you want to help and do you have any example of strategies that you think might be successful?
Dario
Hoi Dario,
You ask if I want to help. <grin> I do and, I have things to give and I have things to ask, so let us do a bit of both for best effect </grin>
On research data. Much of the research data has equivalent information in Wikidata. When you research gender diversity, for instance, articles are identified as being about "human" and "sex, gender". Where Wikidata does NOT have that information, it should be updated as a matter of principle. The reason is that with such an update in Wikidata, the information will, through the "inter language links", grow the gender information for other languages as well. This enables the same analysis to some extent for those other languages.
When you need to query Wikidata, WDQ, the tool that has been querying Wikidata for many, many months, was updated today and now allows you to query on the "qualifiers" as well [1]. This is why there is an argument to be made to use Wikidata for data analysis and research exclusively.
In the previous research newsletter, the research on Wikidata and interwiki links between English and Portuguese Wikipedia was largely dismissed because "Wikidata had changed the game". Wikidata does not change the game when you compare only between two languages. What I think I observe in Wikidata is that there are fewer people working on inter language links, not more. I also notice that the number of Wikipedia articles without an item in Wikidata is growing. We have had bots run on the Indian Wikipedias to add items and they took surprisingly long to run.
When you consider "gender diversity", for instance, as a subject for research, what I observe is that the same research is repeated and repeated again. For me it hardly qualifies as relevant; when using WDQ you can have up to date information whenever you want it. It starts to qualify for me when it states that the baseline had a given percentage and number of males/females, combined with a later moment where the percentage and the number of males/females identified have changed.
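The baseline comparison described above can be sketched in a few lines of Python. This is only an illustration: the `GenderSnapshot` class, the `describe_change` helper and all counts are made up for the example and do not come from WDQ or Wikidata, and the two-category count is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenderSnapshot:
    """Counts of biography items at one moment (hypothetical structure)."""
    label: str
    male: int
    female: int

    @property
    def total(self) -> int:
        return self.male + self.female

    @property
    def female_share(self) -> float:
        # Percentage of identified items that are about women.
        return 100.0 * self.female / self.total if self.total else 0.0


def describe_change(baseline: GenderSnapshot, later: GenderSnapshot) -> dict:
    """Report how counts and the female share moved between two snapshots."""
    return {
        "male_added": later.male - baseline.male,
        "female_added": later.female - baseline.female,
        "female_share_change": later.female_share - baseline.female_share,
    }


# Made-up numbers, purely for illustration.
baseline = GenderSnapshot("baseline", male=8500, female=1500)
later = GenderSnapshot("follow-up", male=9000, female=2000)
change = describe_change(baseline, later)

print(f"{baseline.label}: {baseline.female_share:.1f}% female of {baseline.total}")
print(f"{later.label}: {later.female_share:.1f}% female of {later.total}")
print(change)
```

Reporting both the absolute counts and the share matters here, because the share can rise simply because more items were identified, not because coverage changed.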
When you want to research a specific language, any language at that, all articles need to be represented with an item. It is best Wikidata practice anyway. The way to work is then to first set the baseline, get the numbers that are relevant to the research and then do the analysis on the raw data (i.e. Wikipedia). This results in updates in Wikidata, and this allows the same queries to be run again to understand what the numbers mean. Yes, I do understand that you make use of subsets of data to do research. It just happens that WDQ uses its own database that gets updated from Wikidata. It would be totally unreasonable to think that this database cannot be manipulated. Also you can have your own instances of this database and have WDQ run on that (you will be the first one to actually try this but hey, this is research). So yes, you can preserve your dataset and yes, you can compare it to what happens in the wild (i.e. outside of the chosen subset as well).
When you research the smaller languages, their needs and their coverage, you have to appreciate that English cannot be the yardstick to measure by. The rest of the world uses meters and, en.wp does not even cover 50% of the subjects that are known to Wikidata. The WMF does know what people search for and do not find. That is to say, the numbers exist but are not available for analysis. When you rank them, you learn what people are looking for. Making Wikidata items out of them is the quickest way to provide initial information for that language and on that subject when "Wikidata search" is enabled on a Wikipedia.
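Ranking the searches that found nothing could look roughly like the sketch below. It assumes the failed queries are available as a plain list of strings, which is not the case today; the `rank_missing_topics` function, the normalization by lowercasing and the sample terms are all hypothetical.

```python
from collections import Counter


def rank_missing_topics(failed_queries, top_n=3):
    """Return the most frequent failed search terms, most common first."""
    # Assumption: trimming whitespace and lowercasing is enough to merge
    # variants of the same query; real logs would need more care.
    normalized = (q.strip().lower() for q in failed_queries if q.strip())
    return Counter(normalized).most_common(top_n)


# Made-up sample of searches that returned no article.
sample_log = [
    "Kolkata metro",
    "kolkata metro ",
    "monsoon",
    "Monsoon",
    "kolkata metro",
    "jute",
]

ranking = rank_missing_topics(sample_log)
for term, count in ranking:
    print(f"{count:>3}  {term}")
```

The top of such a ranking would be the list of subjects to check against Wikidata first.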
Dario, this is actionable information that we do not have. Research that leads to actionable results is imho the most relevant research.
As to studying things to death, given that en.wp is what research is about, the numbers are only relevant to the extent that en.wp is relevant. My point is very much that its relevance is decreasing in favour of all the other languages. The consequence is that investments that are en.wp centred do not have the effect that is expected elsewhere. Investments in other languages, cultures and countries are likely to have a bigger return on investment, particularly when the investments and the research are about stimulating growth.
Thanks, GerardM
[1] http://magnusmanske.de/wordpress/?p=178
In reply to Dario Taraborelli dtaraborelli@wikimedia.org, 7 March 2014 01:46.
I agree having a dataset that indicated "gender" of an article would be useful in research. However, I am a bit unsure how this should be done. Obviously there are lots of articles about topics like hydrogen and magnetism that are gender-neutral.
Articles about individual people can be classified as male or female (although I guess there will be some transgender as well). Articles about groups of men and women (men's sporting teams, women's sporting teams) can be too. But would we try to put a gender on articles about a sport, e.g. football, cricket would be male based on the dominant gender of players, whereas cheerleading and netball and synchronised swimming would be female? Or is that now denying the reality of smaller groups of the opposite sex who participate in those sports? Similarly with occupations. Most occupations now can be pursued by both sexes but again there is a strong skew in many of them, men are plumbers, women are midwives, etc.
Would "gender" have to be some sort of sliding scale? Barack Obama (male), Cricket (mostly male), Tennis (mixed), Midwifery (mostly female), Queen Elizabeth (female)
Kerry
Dario Taraborelli, 06/03/2014 02:25:
Annual survey modules. Interest was expressed in exploring strategies for expanding the annual editor/reader survey with new questions contributed by researchers. At this point (March 2014) we cannot commit to any such project, but in general there is potential for cooperations between WMF and academic researchers in this area. Interested parties should contact Tilman Bayer (tbayer at wikimedia dot org) who has been conducting the last WMF editor survey and can provide information about these surveys (methodology, results, available data etc.) and their calendar.
Can community members also propose questions? There was a proposal some time ago by Kaldari on this list and one more recently elsewhere. I proposed to bring up such proposals in [[Talk:General User Survey]], they're a recurring topic so it would be better to have a stable and discoverable wiki page where to hold such proposals and discussions.
Nemo
+1 I see no difference here between "community members" and "researchers". Seems like any selection process should afford for both.
In reply to Federico Leva (Nemo) nemowiki@gmail.com, Sat, Mar 8, 2014 at 2:22 PM.