(cross-posting to the research and analytics mailing lists)

Hey folks,

I've been working with a few different groups of professors, grad students and Wikipedians to apply for grants to develop some intelligent data services on top of the WMF labs architecture.  I'm posting to ask you to take some time to review the proposals and leave comments or an endorsement as you see fit.  I encourage you to start conversations about each proposal on its talk page, as this helps the IEG grant committee make decisions.

This is a project I have been trying to sell for a while. Counter-vandalism tools each use their own strategy for detecting low-quality edits. Some use simple rule-based scoring systems; others are based on relatively advanced machine learning strategies. The goal of this project is to solve the revision scoring problem with advanced machine learning techniques and to make that solution available to tool builders via a web API.

Brent & Shilad are professors @ UMN & Macalester College. They have been working to develop a tool that collects information retrieval strategies from the academic literature and makes them easy for researchers and wiki tool developers to use. This IEG is geared towards developing a web API on top of the system to make it even easier for wiki tool developers to use.

Based on my work on Articles for Creation, it seems that a main stumbling block for newcomer page creators and reviewers is the notability of topics. In this project, we propose to build a machine classifier to aid in decision making around notability. One of the core use-cases is to flag article drafts that the classifier judges to be clearly notable but that human reviewers would otherwise overlook.

In this project, we propose to extract datasets covering different types of editor interactions. Examples include reverts, talk page replies, and user talk page messages. We'd then use some natural language processing strategies to identify the nature of each interaction (e.g. positive vs. negative affect). These datasets would be published openly for others to use. We'd also use the data to explore hypotheses about which types of interactions promote retention and which types of editors/interactions lead to others leaving Wikipedia.