> Have you run this by Wikipedians? ... since it involves adding a
> bunch of templates....
The 2009 strategy proposal linked towards the end, of which the GSoC proposal is a limited subset, drew no criticism despite several high-profile opportunities for it. Both proposals are intended to be language- and keyword-neutral. It would be best to build and test the initial bot on a wiki other than the English Wikipedia, so that the Bot Approvals Group can get concrete answers to any questions they might have.
The use of templates should be optional; one way to do that would be to let a mirroring namespace, rather than the primary namespace, hold the templates. But there is probably a better way. Thank you for something so interesting to think about.
>> Prepare a table of each word in article dumps indicating its age.
>
> This in itself is a non-trivial problem (for a gsoc student anyways),
It would be non-trivial for a large production dump, but for a small subset of articles in a given dump, deterministic algorithms are accurate enough to serve as the partial basis of the selection heuristic, as specified. Creating such a table is at worst O(N) in revisions, and there are ways to hash words with N-gram contexts so that moved and blanked text is more likely to be treated correctly than raw diffs would suggest. This is equivalent to the general blame problem, and I look forward to explaining the history of the problem (see, e.g., http://wikitrust.soe.ucsc.edu/talks-and-papers) to show why the N-gram hash solution is best.
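To make the N-gram idea concrete, here is a minimal Python sketch, not the proposed implementation. It assumes revisions arrive oldest first, splits words on whitespace, and keys each word by itself plus its n-1 predecessors; the names Revision, context_keys and word_ages are hypothetical, and tuples stored in a dict stand in for an explicit hash function.

```python
from collections import namedtuple

# Hypothetical record type: any sortable timestamp plus the revision's plain text.
Revision = namedtuple("Revision", ["timestamp", "text"])


def context_keys(words, n=3):
    """Yield one key per word: the word together with its n-1 predecessors."""
    padded = [""] * (n - 1) + list(words)
    for i in range(len(words)):
        yield tuple(padded[i:i + n])


def word_ages(revisions, n=3):
    """Return (word, first_seen_timestamp) pairs for the latest revision.

    Assumes `revisions` is ordered oldest first. One pass over the
    revisions, so the work is linear in the number of revisions
    (times the words in each).
    """
    if not revisions:
        return []
    first_seen = {}
    for rev in revisions:
        for key in context_keys(rev.text.split(), n):
            # Keep the earliest timestamp at which this word-in-context appeared.
            first_seen.setdefault(key, rev.timestamp)
    words = revisions[-1].text.split()
    return [(w, first_seen[k]) for w, k in zip(words, context_keys(words, n))]


if __name__ == "__main__":
    history = [
        Revision(1, "the cat sat on the mat"),
        Revision(2, ""),  # page blanked
        Revision(3, "on the mat the cat sat quietly"),  # restored, reordered
    ]
    for word, born in word_ages(history, n=2):
        print(word, born)
```

Because the keys survive across revisions, words restored after a blanking or moved within the page keep their original age. Words at the boundary of a moved block get a new context and so look new, which is one reason this is only accurate enough for a heuristic.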
>> Convert flagged passages to GIFT questions for review and
>> present them to one or more subscribed reviewers
>
> Wouldn't you want to give the reviewers an actual form where
> they can fill out the questions?
Yes, and I want to store questions in GIFT format to allow follow-on integration with the Global Learning Xprize Meta-Team deliverables. Presenting a GIFT question means converting it to a form rather than just displaying its markup. Whether direct integration is a reasonable follow-on goal depends on the extent to which branching scenario interactive fiction role-play content, such as shown in http://www.capuano.biz/papers/EL_2014.pdf and
http://talknicer.com/GLMORS_2014.pdf