Pursuant to prior discussions about the need for a research
policy on Wikipedia, WikiProject Research is drafting a
policy regarding the recruitment of Wikipedia users to
participate in studies.
At this time, we have a proposed policy, and an accompanying
group that would facilitate recruitment of subjects in much
the same way that the Bot Approvals Group approves bots.
The policy proposal can be found at:
http://en.wikipedia.org/wiki/Wikipedia:Research
The Subject Recruitment Approvals Group mentioned in the proposal
is being described at:
http://en.wikipedia.org/wiki/Wikipedia:Subject_Recruitment_Approvals_Group
Before we move forward with seeking approval from the Wikipedia
community, we would like additional input about the proposal,
and would welcome additional help improving it.
Also, please consider participating in WikiProject Research at:
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Research
--
Bryan Song
GroupLens Research
University of Minnesota
Dear Wiki Community,
My name is Mackenzie Lemieux. I am a neuroscience researcher at the Salk
Institute for Biological Studies, and I am interested in exploring biases on
Wikipedia.
My research hypothesis is that gender or ethnicity mediates the rate of
flagging and deletion of pages about women in STEM. I hope to
retrospectively analyze Wikipedia's deletion history, harvest the
biographical articles about scientists that have been created over the past
n years, and then confirm the gender and ethnicity of a large sample.
It appears that we can identify deleted pages with Wikipedia's deletion log
<https://en.wikipedia.org/wiki/Wikipedia:Deletion_log>, but to actually see
the page that was deleted we need to be members of one of these Wikipedia
user groups: Administrators
<https://en.wikipedia.org/wiki/Wikipedia:Administrators>, Oversighters
<https://en.wikipedia.org/wiki/Wikipedia:Oversight>, Researchers
<https://en.wikipedia.org/wiki/Wikipedia:Researchers>, Checkusers
<https://en.wikipedia.org/wiki/Wikipedia:CheckUser>.
Does anyone have advice on how to obtain researcher status, or is there
anyone willing to collaborate who has access to the data we need?
Warmly,
Mackenzie Lemieux
--
Mackenzie Lemieux
mackenzie.lemieux(a)gmail.com
cell: 416-806-0041
220 Gilmour Avenue
Toronto, Ontario
M6P 3B4
Hi all,
Join the Research Team at the Wikimedia Foundation [1] for their monthly
office hours on Tuesday, 2020-11-03, at 17:00-18:00 UTC (9am PT/6pm CET).
To participate, join the video call via this Wikimedia-meet link [2]. There
is no set agenda. Feel free to add your item to the list of topics in the
etherpad [3] (you can also do this after you join the meeting); otherwise,
you are welcome to just hang out. More detailed information (e.g.
about how to attend) can be found here [4].
Through these office hours, we aim to make ourselves more available to
answer some of the research-related questions that you as Wikimedia
volunteer editors, organizers, affiliates, staff, and researchers face in
your projects and initiatives. Some example cases we hope to be able to
support you in:
- You have a specific research-related question that you suspect you
  should be able to answer with the publicly available data and you don’t
  know how to find an answer for it, or you just need some more help with it.
  For example, how can I compute the ratio of anonymous to registered editors
  in my wiki?
- You run into repetitive or very manual work as part of your Wikimedia
  contributions and you wish to find out if there are ways to use machines to
  improve your workflows. These types of conversations can sometimes be
  harder to find an answer for during an office hour. However, discussing
  them can help us understand your challenges better, and we may find ways to
  work with each other to support you in addressing them in the future.
- You want to learn what the Research team at the Wikimedia Foundation
  does and how we can potentially support you. Specifically for affiliates:
  if you are interested in building relationships with the academic
  institutions in your country, we would love to talk with you and learn
  more. We have a series of programs that aim to expand the network of
  Wikimedia researchers globally and we would love to collaborate with those
  of you interested more closely in this space.
- You want to talk with us about one of our existing programs [5].
Hope to see many of you,
Martin (WMF Research Team)
[1] https://research.wikimedia.org/team.html
[2] https://meet.wmcloud.org/ResearchOfficeHours
[3] https://etherpad.wikimedia.org/p/Research-Analytics-Office-hours
[4] https://www.mediawiki.org/wiki/Wikimedia_Research/Office_hours
[5] https://research.wikimedia.org/projects.html
--
Martin Gerlach
Research Scientist
Wikimedia Foundation
[Moving the WMF internal lists to Bcc. Adding wiki-research-l, the
public mailing list for research related questions.]
Hi Ritvik,
Thank you for reaching out to us and for your interest in doing research
with the Wikimedia projects' data. I'm particularly excited to read that
you are already thinking about giving back to the commons by sharing data,
knowledge, and insights. We love that. :)
Regarding the specific data that you wrote about:
* Our team, Research, is responsible for setting up Formal
Collaborations that allow the type of research that you mention in
your email. At this time, we are only able to prioritize and initiate
formal collaborations that are in line with our annual plan
commitments, and I expect that to stay the same in the coming 8 months.
I'm sorry that we can't explore together a formal collaboration at
this point.
* However, thanks to the nudges by some folks from the research
community and this list, we took steps to find a pathway to share some
of Wikipedia's COVID-19 related data with the research community.
While the decision about what to publish is not finalized, I do expect
to see a geographical dimension associated with pageviews as part of
the release (the granularity of which is to be determined).
You can read more about the details of the data that we are currently
keeping at https://meta.wikimedia.org/wiki/Data_retention_guidelines#Exceptions_to_the…
.
I am sorry that we cannot have a fast turnaround for your request. I
do believe, however, that by reserving more time to work on the
question of how to release the data publicly, we can unlock more
research and also provide a more equitable path for this highly
important dataset and many key research questions that can be answered
with it.
Best,
Leila
--
Leila Zia
Head of Research
Wikimedia Foundation
On Fri, Oct 30, 2020 at 7:34 AM Ramakrishnan, Ritvik
<ritvik.ramakrishnan(a)gatech.edu> wrote:
>
> Good Morning,
>
> Hope all is going well! My name is Ritvik Ramakrishnan and I am a Research Assistant at Harvard University. I have CC’d a Postdoctoral Researcher from Harvard, Dr. Tao Hu, in this email.
>
> Currently, we are looking at Wikipedia view counts to analyze the relationship between them and COVID-19 growth in the United States. However, the view counts available through the Wikimedia REST API are not broken down by location, which makes it difficult for us to filter them to just the United States.
>
> We talked to researchers in Italy who had conducted a similar study of the Zika virus using Wikipedia data confined to the United States, and they suggested we reach out to the Wikimedia Foundation to establish a non-disclosure agreement as part of your formal collaboration policy.
>
> We would like to be able to look at view counts per day by location. In return, we can provide cutting-edge data and information from our study, which your foundation could release once our work is accepted for publication.
>
> Let me know what next steps we can take in order to proceed. Thank you!
>
> Warm Regards,
> Ritvik Ramakrishnan
Greetings!
Quantifying effort is obviously a fraught prospect, but Geiger and Halfaker [1] used edit sessions, defined as consecutive edits by an editor without a gap longer than an hour, to quantify the total number of labor hours spent on Wikipedia. I'm familiar with other papers that use this approach to measure things like editor experience.
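To make the session definition concrete, here is a minimal Python sketch that groups one editor's edit timestamps into sessions using the one-hour cutoff described above. The function names and the sample timestamps are hypothetical, and measuring a session as the time from its first to its last edit is an assumption of this sketch, not necessarily how [1] counts labor hours.

```python
from datetime import datetime, timedelta

# One-hour inactivity cutoff, as described in the text above.
SESSION_GAP = timedelta(hours=1)


def edit_sessions(timestamps, gap=SESSION_GAP):
    """Group one editor's edit timestamps into sessions.

    A new session starts whenever the time since the previous edit
    is longer than `gap`.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)  # continue the current session
        else:
            sessions.append([ts])    # start a new session
    return sessions


def session_durations(sessions):
    """Elapsed time from the first to the last edit of each session
    (an assumption of this sketch)."""
    return [s[-1] - s[0] for s in sessions]


# Hypothetical timestamps for a single editor.
edits = [
    datetime(2020, 10, 1, 9, 0),
    datetime(2020, 10, 1, 9, 20),
    datetime(2020, 10, 1, 10, 10),
    datetime(2020, 10, 1, 14, 0),
]
sessions = edit_sessions(edits)
durations = session_durations(sessions)
```

With these sample timestamps the first three edits form one session and the last edit forms a session of its own, whose first-to-last duration is zero.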
I'm curious about the amount of effort put into each particular article. Edit sessions seem like a good approach, but there are some problems:
* How much time does an edit session of length 1 take?
* Should article edit sessions be consecutive in the same article?
* What if someone makes an edit to a related article in the middle of their session?
I wonder what folks here think about alternatives for quantifying the effort put into an article, such as:
1. Number of wikitext characters added/removed
2. Levenshtein (edit) distance (of characters or tokens)
3. Simply the number of edits
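For comparison, the three alternatives can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, the character counts come from a diff computed with the standard library's difflib (other diff algorithms may give different counts), and the Levenshtein function is a plain dynamic-programming version that works on either strings or token lists.

```python
from difflib import SequenceMatcher


def chars_added_removed(old, new):
    """Count characters added and removed between two wikitext strings,
    based on difflib's diff (an assumption of this sketch)."""
    added = removed = 0
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return added, removed


def levenshtein(a, b):
    """Edit distance between two sequences (characters or tokens)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (x != y)))   # substitution
        prev = curr
    return prev[-1]


def number_of_edits(revisions):
    """Number of edits, given the full list of revision texts
    including the initial one."""
    return max(len(revisions) - 1, 0)
```

For example, levenshtein("kitten", "sitting") is 3, and calling it on "a b c".split() and "a c".split() gives a token-level distance of 1.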
Thanks for your help!
[1] Geiger, R. S., & Halfaker, A. (2013). Using edit sessions to measure participation in Wikipedia. Proceedings of the 2013 Conference on Computer Supported Cooperative Work, 861–870. http://dl.acm.org/citation.cfm?id=2441873
--
Nathan TeBlunthuis
PhD Candidate
University of Washington
Department of Communication
Meedan, a global non-profit I work with, is hiring a software engineer. The
posting says frontend, but full-stack developers are also super welcome.
It's a distributed organization with a great mission and culture. I'm very
happy to answer questions if anyone's interested and very much appreciate
your help spreading the word.
Meedan builds Check <https://github.com/meedan/check>, a web platform for
collaborative media annotation and fact-checking. The frontends include a
React web app, a cross-browser Web Extension and a sophisticated Slack bot,
all accessing our backend services via GraphQL and REST APIs.
https://meedan.com/jobs/software-engineer-frontend/
Best wishes,
Scott
--
Dr Scott A. Hale
http://scott.hale.us
computermacgyver(a)gmail.com
The Max Planck Institute for Demographic Research (MPIDR) is recruiting highly qualified Post-Docs/Research Scientists, at various levels of seniority, to join the Lab of Digital and Computational Demography.
The MPIDR is one of the leading demographic centers in the world. It is part of the Max Planck Society, a network of more than 80 institutes that form Germany's premier basic-research organization. Max Planck Institutes have an established record of world-class, foundational research in the sciences, technology, social sciences and the humanities. They offer a unique environment that combines the best aspects of an academic setting and a research laboratory.
The Lab of Digital and Computational Demography, headed by MPIDR Director Emilio Zagheni, is looking for candidates with a background in Demography, Data Science, Computer Science, Statistics, Economics, Sociology, Psychology, Social Psychology, Geography, Applied Mathematics, Public Health, Public Policy, or related disciplines.
The Lab brings together methodologists (from areas like Statistics, Computer Science or Mathematical Demography) with experts in areas of the Social Sciences in order to enable cross-pollination of ideas, to advance methods and theory of population research, and to address pressing societal questions.
For more information about the Lab and its current projects, see: https://www.demogr.mpg.de/go/lab-dcd
The successful candidate must have obtained their PhD (or expect to have obtained it by the time the post commences, which is no later than Fall 2021), and their profile should match one of the following three:
1. A methodologist interested in producing advances in demographic methods and in the field of Digital and Computational Demography.
2. A social and behavioral scientist with strong expertise in at least one of the following substantive areas: migration and mobility; population aging and generational processes; social demography; environmental demography; (digital) health; technological change and well-being.
3. A computational social scientist interested in working on questions central to demographic research.
Across all profiles, the ability and willingness to work in interdisciplinary teams in order to conduct cutting-edge research that advances population science is key.
Applications have to be submitted online via https://www.demogr.mpg.de/go/JobAd617491 and include the following documents:
1. Curriculum Vitae
2. Letter of interest (Max 1 page): Briefly state why you are interested in joining the MPIDR, how the MPIDR could foster your professional development and career trajectory, and in which ways your interests fit the research strengths of the MPIDR.
3. Research Statement (Max 2 pages): Briefly describe your research accomplishments, as well as ongoing and future research plans. Please also describe your technical skills, areas of expertise, as well as the type of advanced training that you would like to receive as a research scientist.
4. Names and contact information for 3 academic references
5. One or two writing samples or publications
Note that incomplete submissions will not be considered.
The positions will be open until filled. In order to receive full consideration, applications should be submitted by December 1st, 2020. The starting date is flexible, but no later than Fall 2021.
Successful applicants will be offered a 3-year contract with remuneration commensurate with experience (starting from approx. 56,000 EUR gross per year for researchers who have just completed their PhD, up to approx. 70,000 EUR gross per year for more senior scientists) based on the salary structure of the German public sector (Öffentlicher Dienst, TVöD Bund). The successful candidates are expected to work locally at the MPIDR in Rostock, Germany. Relocation support is available.
The Max Planck Society offers a broad range of measures to support the reconciliation of work and family. These are complemented by the MPIDR's own initiatives. For more information, see: https://www.demogr.mpg.de/go/work-family.
In addition, there is a range of central initiatives and measures primarily geared towards helping young female researchers and mothers to advance their careers. See the link below for some examples: https://www.demogr.mpg.de/go/career-development.
Our Institute values diversity and is committed to employing individuals from minorities. The Max Planck Society has set itself the goal of employing more severely handicapped people. The Institute and the Max Planck Society also seek to increase the proportion of women in areas where they are underrepresented. As women are underrepresented in computational social sciences, we explicitly encourage them to apply.
For inquiries about the positions, please contact sekzagheni(a)demogr.mpg.de.
----------
This mail has been sent through the MPI for Demographic Research. Should you receive a mail that is apparently from a MPI user without this text displayed, then the address has most likely been faked. If you are uncertain about the validity of this message, please check the mail header or ask your system administrator for assistance.
Dear fellow wiki-researchers,
Greetings from Zurich!
Together with Jérôme (in cc), we are working on a research problem that would definitely benefit from your expert insights.
Our goal: we want to predict a measure of the average "quality" of the edits made at the user level based on a set of covariates. To achieve this, we need to compute a measure of edit quality, and then aggregate those measures at the editor level. To do so, we envision relying on Aaron Halfaker's "word persistence" method [1] by querying the Wikipedia API [2].
Our main issue: we are dealing with approximately 20 million edits in this project. If we do all these queries serially (and assuming 4-5 seconds per query), then we would need approximately 2.5 years to complete the job!
Our question for you: how do you guys typically handle such computationally intensive data processing tasks?
One option to speed this up is to run several parallel processes to query the server. Does anybody know whether there is a formal limit on the number of connections a single IP can open to the API, and for how long? We also worry that opening several hundred connections at the same time may adversely affect the availability of the server for others...
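As a rough back-of-the-envelope check of the numbers above, the following Python sketch reproduces the serial estimate and shows how the wall-clock time would shrink with parallel workers. It assumes perfectly linear scaling and ignores any rate limits or throttling; the function names and the worker count of 10 are hypothetical and are not a recommendation for an acceptable number of connections.

```python
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY


def serial_years(n_queries, seconds_per_query):
    """Wall-clock years needed if all queries run one after another."""
    return n_queries * seconds_per_query / SECONDS_PER_YEAR


def parallel_days(n_queries, seconds_per_query, workers):
    """Wall-clock days with `workers` parallel processes,
    assuming ideal linear speed-up (an assumption of this sketch)."""
    return n_queries * seconds_per_query / workers / SECONDS_PER_DAY


N_EDITS = 20_000_000  # number of edits, from the text above

low = serial_years(N_EDITS, 4)    # about 2.5 years at 4 s per query
high = serial_years(N_EDITS, 5)   # about 3.2 years at 5 s per query
ten_workers = parallel_days(N_EDITS, 4, 10)  # about 93 days
```

The 2.5-year figure corresponds to the lower bound of 4 seconds per query; at 5 seconds per query the serial estimate rises to a little over 3 years.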
We thank you in advance for your help and insights!
Sincerely,
Tarun & Jérôme @ ETH Zurich
[1] https://meta.wikimedia.org/wiki/Research:Content_persistence
[2] https://en.wikipedia.org/w/api.php
For your information.
---------- Forwarded message ---------
From: MPIDR - Career <career(a)demogr.mpg.de>
Date: Mon, Oct 12, 2020 at 7:26 AM
Subject: Vacancy at Department Zagheni, Max Planck Institute for
Demographic Research
To:
Dear colleague:
We are looking for postdocs/research scientists for the Department
Zagheni (https://www.demogr.mpg.de/en/career_6122/jobs_fellowships_1910/postdocs_res…)
at the Max Planck Institute for Demographic Research in Rostock,
Germany. Please kindly forward to interested persons at your
institutions. We would appreciate it if you could help spread the job
announcement.
With best wishes,
Antje
Antje Gosselck
Max Planck Institute for Demographic Research
Konrad-Zuse-Str. 1
D-18057 Rostock
Germany
www.demogr.mpg.de
Phone: +49 381 2081-108