I'd suggest, James, that relying on suggestions from a six-year-old strategy document when we're about to start a new strategic session isn't the best course of action.
I'd also query what exactly the plan is for doing something with this information. Collecting lists of things that might no longer be up to date, when there is no corresponding action plan for updating that information, is probably not a good use of anyone's time or effort.
Risker/Anne
On 13 February 2015 at 07:57, James Salsman jsalsman@gmail.com wrote:
Brian Wolff wrote:
Have you run this by Wikipedians? ... since it involves adding a bunch of templates....
The 2009 strategy proposal linked towards the end, of which the GSoC proposal is a limited subset, did not get any criticism after several high-profile opportunities. Both proposals are intended to be language- and keyword-neutral. It would be best if the initial bot were built and tested on a wiki other than the English Wikipedia, so that the Bot Approvals Group will be able to get concrete answers to any questions they might have.
The use of templates should be optional; one way to do that would be to allow the use of a mirroring namespace to hold the templates, instead of the primary namespace. But there is probably a better way. Thank you for something so interesting to think about.
>> Prepare a table of each word in article dumps indicating its age.
> This in itself is a non-trivial problem (for a gsoc student anyways),
It would be non-trivial for a large production dump, but for a small subset of articles in a given dump, there are deterministic algorithms which perform with sufficient accuracy to form the specified partial basis of a selection heuristic. Creating such a table is at worst O(N) in revisions, and there are ways to hash words with N-gram contexts so that moved and blanked text is more likely to be attributed correctly than raw diffs would suggest. This is equivalent to the general blame problem, and I look forward to explaining the history of the problem (see, e.g., http://wikitrust.soe.ucsc.edu/talks-and-papers) to show why the N-gram hash solution is best.
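As a rough illustration of the N-gram hash idea above, here is a minimal Python sketch. It is not the WikiTrust algorithm or any existing implementation; the function names, the whitespace tokenizer, and the choice of one neighbouring word on each side are all assumptions made for the example. Each word is keyed by a hash of itself plus its neighbours, and the age of a word is the timestamp at which that key was first seen in any revision, so text that is blanked and later restored keeps its original age.

```python
from hashlib import sha1


def context_keys(words, n=1):
    """Yield one hash per word, covering the word and n neighbours per side."""
    padded = [""] * n + words + [""] * n
    for i in range(n, len(padded) - n):
        window = " ".join(padded[i - n:i + n + 1])
        yield sha1(window.encode("utf-8")).hexdigest()


def word_ages(revisions, n=1):
    """Return [(word, first_seen_timestamp)] for the newest revision.

    revisions: iterable of (timestamp, text) pairs, oldest first.
    Splitting on whitespace is a simplification (assumption).
    """
    first_seen = {}  # context hash -> earliest timestamp
    latest = []
    for timestamp, text in revisions:
        words = text.split()
        latest = []
        for word, key in zip(words, context_keys(words, n)):
            first_seen.setdefault(key, timestamp)
            latest.append((word, first_seen[key]))
    return latest


history = [
    (2009, "the population is 500"),
    (2010, ""),  # page blanked
    (2012, "in 2012 the population is 500"),
]
for word, age in word_ages(history):
    print(word, age)
```

This makes a single pass over the revisions. Note that a word whose neighbours change (here "the") is treated as new, which is one of the trade-offs a larger context window or a fallback to diffs would need to address.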
>> Convert flagged passages to GIFT questions for review and
>> present them to one or more subscribed reviewers
> Wouldn't you want to give the reviewers an actual form where they can fill out the questions?
Yes, and I want to store questions in GIFT format to allow follow-on integration with the Global Learning Xprize Meta-Team deliverables. Presentation of a GIFT question means converting it to a form instead of just displaying it in markup. Whether direct integration is a reasonable follow-on goal depends on the extent to which branching-scenario interactive-fiction role-play content, such as that shown in http://www.capuano.biz/papers/EL_2014.pdf and http://talknicer.com/GLMORS_2014.pdf, can be automatically created. I think it can be, and look forward to discussing the matter in detail with co-mentor volunteers.
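To make the store-as-GIFT, present-as-form distinction concrete, here is a minimal Python sketch. The question wording, the use of a true/false question with a presumed answer of T, the function names, and the HTML field names are assumptions for illustration only; nothing here is taken from the proposal page. The same title and passage are written once as a GIFT string for storage and once as an HTML form for the reviewer.

```python
import html
import re

# Characters that carry meaning in GIFT and are escaped with a backslash.
GIFT_SPECIAL = re.compile(r"([~=#{}:])")


def gift_escape(text):
    """Backslash-escape GIFT control characters in free text."""
    return GIFT_SPECIAL.sub(r"\\\1", text)


def to_gift(title, passage):
    """Store a flagged passage as a GIFT true/false question.

    {T} records the presumed answer "still accurate" (assumption).
    """
    return "::{}:: Is this passage still accurate? {} {{T}}".format(
        gift_escape(title), gift_escape(passage)
    )


def to_form(title, passage):
    """Render the same question as an HTML form for a reviewer."""
    return (
        '<form method="post">'
        f"<p>{html.escape(title)}: is this passage still accurate?</p>"
        f"<blockquote>{html.escape(passage)}</blockquote>"
        '<label><input type="radio" name="answer" value="T"> Yes</label>'
        '<label><input type="radio" name="answer" value="F"> No</label>'
        '<button type="submit">Submit</button>'
        "</form>"
    )


print(to_gift("Population", "Pop. was 500 in 2009 {est}"))
print(to_form("Population", "Pop. was 500 in 2009 {est}"))
```

A real implementation would presumably parse stored GIFT back into fields rather than regenerate both views from the source data, and would need a richer answer set than yes/no to capture reviewer corrections.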
On Thursday, February 12, 2015, James Salsman jsalsman@gmail.com wrote:
I invite review of this preliminary proposal for a Google Summer of Code project: http://www.mediawiki.org/wiki/Accuracy_review
If you would like to co-mentor this project, please sign up. I've been a GSoC mentor every year since 2010, and successfully mentored two students in 2012, resulting in work which has become academically relevant, including in languages which I cannot read (e.g., http://talknicer.com/turkish-tablet.pdf). I am most interested in co-mentors at the WMF or Wiki Education Foundation involved with engineering, design, or education.
Synopsis:
Create a Pywikibot to find articles in given categories, category trees, and lists. For each such article, add in-line templates to indicate the location of passages with (1) facts and statistics which are likely to have become out of date and have not been updated in a given number of years, and (2) phrases which are likely unclear. Use a customizable set of keywords and the DELPH-IN LOGON parser [http://erg.delph-in.net/logon] to find such passages for review. Prepare a table of each word in article dumps indicating its age. Convert flagged passages to GIFT questions [http://microformats.org/wiki/gift] for review and present them to one or more subscribed reviewers. Update the source template with the reviewer(s)' answers to the GIFT question, but keep the original text as part of the template. When reviewers disagree, update the template to reflect that fact, and present the question to a third reviewer to break the tie.
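The keyword-based flagging step in the synopsis could be sketched roughly as follows in Python. This deliberately leaves out Pywikibot, the parser, and the word-age table, and works on a plain string; the keyword list, the template name "accuracy review", and the naive sentence splitter are all hypothetical choices made for the example.

```python
import re

# Sample keywords hinting at time-sensitive claims (assumed, customizable).
STALE_KEYWORDS = ("currently", "as of", "recently", "percent", "%")


def flag_passages(wikitext, keywords=STALE_KEYWORDS, template="accuracy review"):
    """Wrap sentences containing any keyword in an in-line template.

    The original sentence is kept as the template's parameter.
    Sentences are split naively on ., ! or ? followed by whitespace,
    and whitespace between sentences is normalized to one space.
    A passage containing '|' or '=' would need escaping (not handled).
    """
    sentences = re.split(r"(?<=[.!?])\s+", wikitext)
    flagged = []
    for sentence in sentences:
        lowered = sentence.lower()
        if any(keyword in lowered for keyword in keywords):
            flagged.append("{{%s|%s}}" % (template, sentence))
        else:
            flagged.append(sentence)
    return " ".join(flagged)


text = "Paris is in France. As of 2009 the population was 2 million."
print(flag_passages(text))
```

Because the keyword set and template name are parameters, the same function could in principle be pointed at a different language or a mirroring namespace without changes to the matching logic.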
Possible stretch goals for Global Learning Xprize Meta-Team systems [http://www.wiki.xprize.org/Meta-team#Goals] integration TBD.
Best regards, James Salsman
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l