Brian Wolff wrote:

> Have you run this by Wikipedians? ... since it involves adding a 
> bunch of templates....

The 2009 strategy proposal linked towards the end, of which the GSoC proposal is a limited subset, received no criticism despite several high-profile opportunities for review. Both proposals are intended to be language- and keyword-neutral. It would be best to build and test the initial bot on a wiki other than the English Wikipedia, so that the Bot Approvals Group can get concrete answers to any questions it might have.

Templates should be optional; one way to make them so would be to hold them in a mirroring namespace instead of the main (article) namespace. But there is probably a better way. Thank you for raising something so interesting to think about.

>> Prepare a table of each word in article dumps indicating its age.
> This in itself is a non-trivial problem (for a gsoc student anyways),

It would be non-trivial for a large production dump, but for a small subset of articles in a given dump there are deterministic algorithms accurate enough to serve as the partial basis for the selection heuristic the proposal specifies. Building such a table is at worst O(N) in the number of revisions, and words can be hashed together with their N-gram contexts so that moved and blanked text is attributed more accurately than raw diffs would suggest. This is equivalent to the general blame problem, and I look forward to explaining the history of the problem (see e.g., ) to show why the N-gram hash solution is best.
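Here is a minimal Python sketch (Python being Pywikibot's language) of keying words by N-gram context. It rests on several hypothetical assumptions. Revisions arrive oldest first as (timestamp, text) pairs. Words are whitespace-separated tokens. Each word's key is the tuple of the n words centred on it, which Python's dict hashes for us. A word's age is the timestamp of the first revision in which its key appeared. As a result, text that is blanked and later restored, or moved as a block, keeps its earlier age instead of being credited to the revision that restored or moved it.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, ...]

def context_keys(words: List[str], n: int = 3) -> List[Key]:
    """Key each word by the n-word window centred on it (hypothetical scheme; n odd)."""
    half = n // 2
    padded = ["<s>"] * half + words + ["</s>"] * half
    # padded[i:i + n] has words[i] in the middle
    return [tuple(padded[i:i + n]) for i in range(len(words))]

def word_ages(revisions: List[Tuple[int, str]], n: int = 3) -> List[Tuple[str, int]]:
    """Return (word, timestamp first seen) for every word of the latest revision.

    `revisions` must be non-empty and ordered oldest first. There is one pass
    over all revisions, so the cost is linear in the total number of tokens.
    """
    first_seen: Dict[Key, int] = {}
    for timestamp, text in revisions:
        for key in context_keys(text.split(), n):
            first_seen.setdefault(key, timestamp)  # keep the earliest sighting
    latest = revisions[-1][1].split()
    return [(word, first_seen[key]) for word, key in zip(latest, context_keys(latest, n))]

history = [
    (2009, "The population was 500 in 2008 ."),
    (2011, ""),                                   # page blanked
    (2012, "The population was 500 in 2008 ."),   # blanking reverted
]
print(word_ages(history))  # every word keeps its 2009 age
```

The trade-off is that words at the edge of an edit get a new context and are re-aged. A larger n makes keys more specific but widens that edge.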

>> Convert flagged passages to GIFT questions for review and
>> present them to one or more subscribed reviewers
> Wouldn't you want to give the reviewers an actual form where
> they can fill out the questions?

Yes, and I want to store questions in GIFT format to allow follow-on integration with the Global Learning Xprize Meta-Team deliverables. Presenting a GIFT question means converting it to a form instead of just displaying its markup. Whether direct integration is a reasonable follow-on goal depends on the extent to which branching-scenario, interactive-fiction role-play content, such as that shown in [], can be created automatically. I think it can be, and I look forward to discussing the matter in detail with co-mentor volunteers.
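Here is a minimal Python sketch of turning a stored GIFT question into an HTML form. It handles only the simple multiple-choice shape `::title:: stem {=right ~wrong}`. Other GIFT question types, comments, and backslash escapes are not handled, and the function names are hypothetical.

```python
import html
import re
from typing import List, Tuple

# optional ::title::, then the stem, then the answer block in braces
GIFT_MC = re.compile(r"\s*(?:::(.*?)::)?\s*(.*?)\s*\{(.*)\}\s*", re.DOTALL)

def parse_gift_mc(gift: str) -> Tuple[str, str, List[Tuple[str, bool]]]:
    """Split a simple multiple-choice GIFT question into title, stem and answers."""
    m = GIFT_MC.fullmatch(gift)
    if m is None:
        raise ValueError("not a simple multiple-choice GIFT question")
    title, stem, body = (m.group(1) or "").strip(), m.group(2), m.group(3)
    # '=' marks a correct answer, '~' an incorrect one
    answers = [(text.strip(), mark == "=") for mark, text in re.findall(r"([=~])([^=~]*)", body)]
    return title, stem, answers

def gift_to_form(gift: str, name: str = "q1") -> str:
    """Render the question as an HTML form with one radio button per answer."""
    title, stem, answers = parse_gift_mc(gift)
    parts = ['<form method="post">', f"<fieldset><legend>{html.escape(title or stem)}</legend>"]
    if title:
        parts.append(f"<p>{html.escape(stem)}</p>")
    for i, (text, _is_correct) in enumerate(answers):
        # only the answer index is submitted; correctness stays on the server
        parts.append(f'<label><input type="radio" name="{html.escape(name)}" value="{i}"> '
                     f"{html.escape(text)}</label><br>")
    parts.append('<button type="submit">Submit</button></fieldset></form>')
    return "\n".join(parts)

print(gift_to_form("::Population:: Is the 2008 figure still current? {=Yes ~No, it needs updating ~Unsure}"))
```

Because the question stays stored as GIFT text, the same source can later be exported elsewhere. The form is only a rendering of it.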

On Thursday, February 12, 2015, James Salsman <> wrote:
I invite review of this preliminary proposal for a Google Summer of
Code project:

If you would like to co-mentor this project, please sign up. I've been
a GSoC mentor every year since 2010, and successfully mentored two
students in 2012 whose work has become academically relevant,
including in languages which I cannot read (i.e., ). I am most interested in
co-mentors at the WMF or Wiki Education Foundation involved with
engineering, design, or education.


Create a Pywikibot bot to find articles in given categories, category
trees, and lists. For each such article, add in-line templates to
indicate the location of passages with (1) facts and statistics which
are likely to have become out of date and have not been updated in a
given number of years, and (2) phrases which are likely unclear. Use a
customizable set of keywords and the DELPH-IN LOGON parser
[] to find such passages for review.
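The keyword step could be sketched in Python as shown below. Instead of calling the DELPH-IN parser, it uses plain regular-expression keyword matching as a stand-in. The keyword lists, the five-year threshold, and the function names are all hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical default keyword lists; the proposal leaves them customizable.
DATED_KEYWORDS = ("as of", "currently", "recently", "at present", "to date")
UNCLEAR_KEYWORDS = ("some say", "it is believed", "arguably", "various")
YEAR = re.compile(r"\b(1[89]\d\d|20\d\d)\b")

def keyword_pattern(keywords: Iterable[str]) -> "re.Pattern[str]":
    """Compile case-insensitive whole-word matching for a keyword list."""
    return re.compile(r"\b(?:" + "|".join(map(re.escape, keywords)) + r")\b", re.IGNORECASE)

def flag_passages(text: str, current_year: int, max_age: int = 5,
                  dated: Iterable[str] = DATED_KEYWORDS,
                  unclear: Iterable[str] = UNCLEAR_KEYWORDS) -> List[Tuple[str, str]]:
    """Return (reason, sentence) pairs for sentences that may need review."""
    dated_re, unclear_re = keyword_pattern(dated), keyword_pattern(unclear)
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        years = [int(y) for y in YEAR.findall(sentence)]
        # stale: the newest year mentioned is at least max_age years old
        stale = bool(years) and current_year - max(years) >= max_age
        if stale and dated_re.search(sentence):
            flagged.append(("possibly out of date", sentence))
        elif unclear_re.search(sentence):
            flagged.append(("possibly unclear", sentence))
    return flagged

article = ("As of 2008, the population was 500. The mayor is Jane Doe. "
           "Some say the town is old. As of 2023 the school has 40 pupils.")
for reason, sentence in flag_passages(article, current_year=2025):
    print(reason, "|", sentence)
```

A sentence that says "currently" but gives no year is not caught here. The word-age table would fill that gap by supplying the age of the words around the keyword.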
Prepare a table of each word in article dumps indicating its age.
Convert flagged passages to GIFT questions
[] for review and present them to one
or more subscribed reviewers. Update the source template with the
reviewer(s)' answers to the GIFT question, but keep the original text
as part of the template. When reviewers disagree, update the template
to reflect that fact, and present the question to a third reviewer to
break the tie.
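One way the update-and-tie-break rule might look, again in Python. The template name `Review passage` and its parameter names are hypothetical and do not refer to an existing template. Answers are treated as plain strings, and a tie between the leading answers triggers the request for another reviewer. Pipes inside values are escaped with MediaWiki's `{{!}}` so they do not split template parameters.

```python
from collections import Counter
from typing import List, Optional, Tuple

def resolve(answers: List[str]) -> Tuple[Optional[str], bool]:
    """Return (accepted answer or None, whether another reviewer is needed)."""
    if not answers:
        return None, True
    ranked = Counter(answers).most_common()
    top_answer, top_votes = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_votes:
        return None, True  # reviewers disagree with no majority: ask one more
    return top_answer, False

def escape_pipes(value: str) -> str:
    """A bare '|' would start a new template parameter; {{!}} renders as '|'."""
    return value.replace("|", "{{!}}")

def review_template(original: str, question: str, answers: List[str]) -> str:
    """Build hypothetical in-line template text that keeps the original passage."""
    answer, needs_reviewer = resolve(answers)
    if not needs_reviewer:
        status = "reviewed"
    elif answers:
        status = "disputed"
    else:
        status = "pending"
    params = [("original", original), ("question", question),
              ("answers", "; ".join(answers)), ("status", status)]
    if answer is not None:
        params.append(("answer", answer))
    body = "|".join(name + "=" + escape_pipes(value) for name, value in params)
    return "{{Review passage|" + body + "}}"

print(review_template("The population was 500 in 2008.", "Is this figure still current?", ["No", "Yes"]))
```

Because every answer is kept in the template alongside the original text, a later tie-breaking answer only needs to be appended before the template is regenerated.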

Possible stretch goal: integration with the Global Learning Xprize Meta-Team systems
[] (details TBD).

Best regards,
James Salsman