(cross-posting to the research and analytics mailing lists)

I've been working with a few different groups of professors, grad students
and Wikipedians to apply for grants to develop some intelligent data
services on top of the WMF labs architecture. I'm posting to ask you to
take some time to review the proposals and leave comments or an endorsement
as you see fit. I encourage you to raise conversations about each proposal
on the respective talk pages as this helps the IEG grant committee make
decisions.

This is a project I have been trying to sell for a while. Counter-vandalism
tools all use their own strategy for detecting low-quality edits. Some of
them use simple rule-based scoring systems. Others are based on relatively
advanced machine learning strategies. The goal of this project is to solve
the revision scoring problem with advanced machine learning techniques and
to make that solution available for tool builders via a web API.

Brent & Shilad are professors @ UMN & Macalester College. They have been
working to develop a tool that collects information retrieval strategies
from the academic literature to make them easy to use for researchers and
wiki tool developers. This IEG is geared towards developing a web API on
top of the system to make it even easier for wiki tool developers to use.

Per my work on Articles for Creation, it seems that a main stumbling block
for newcomer page creators and reviewers is the notability of topics. In
this project, we propose to build a machine classifier to aid in decision
making around notability. One of the core use-cases is to flag article
drafts that are clearly notable to the machine, but would otherwise be
overlooked by their human reviewers.

In this project, we propose to extract varied datasets of different types
of editor interactions. Examples include reverts, talk page replies, and
user talk page messages. We'd then use some natural language processing
strategies to identify the nature of interactions (e.g. positive vs.
negative affect). These datasets would be published openly for others to
make use of. We'd also use the data to explore hypotheses around which
types of interactions promote retention and which types of
editors/interactions lead to others leaving Wikipedia.