Hey folks,
Since OpenSym was in San Francisco this year, we welcomed researchers working in the Wikimedia space to join us for breakfast at the Wikimedia Foundation on the morning after the conference. During this event, we shook hands and made a few quick presentations about ongoing projects and stuff that's right around the corner.
I took some notes on what was presented and I figured that many on this list might appreciate the notes as well.
*Wikimedia research (*Collaborate with us! https://www.mediawiki.org/wiki/Wikimedia_Research/Formal_collaborations *):*
- Research and data https://www.mediawiki.org/wiki/Wikimedia_Research/Research_and_Data -- Data science and experimental systems development.
- Design Research https://www.mediawiki.org/wiki/Wikimedia_Research/Design_Research -- Generative and evaluative research support for product development.
(With much more overlap than is implied by the distinction)
*Communication channels:*
- IRC: #wikimedia-research on freenode.net (webchat http://webchat.freenode.net/?channels=wikimedia-research)
  - This is "the office" for us. It's an excellent channel for asking a quick question or discussing an idea.
- Mailing list: wiki-research-l@lists.wikimedia.org (signup https://lists.wikimedia.org/mailman/listinfo/wiki-research-l)
- WikiResearch Showcase https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase -- Monthly event covering WMF research results and invited speakers researching WMF projects.
*Projects presented:*
- Revision scoring as a service https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service -- State-of-the-art AI (vandalism & article quality prediction) as a web service. See ORES https://meta.wikimedia.org/wiki/Objective_Revision_Evaluation_Service for current system capabilities and Wiki labels https://meta.wikimedia.org/wiki/Wiki_labels, our crowdsourced data gathering system.
- Scholarly article citations https://meta.wikimedia.org/wiki/Research:Scholarly_article_citations_in_Wikipedia -- An open-licensed dataset of scholarly identifiers in Wikipedia which notes when, historically, an identifier was first added.
- Activity sessions https://meta.wikimedia.org/wiki/Research:Activity_session -- (coming soon) A dataset of sessionized editing activity. Useful for measuring labor hours or studying work patterns.
- Measuring value-added https://meta.wikimedia.org/wiki/Research:Measuring_value-added -- (coming soon) A dataset of robust measurements of editor productivity and value-added. See also Content persistence https://meta.wikimedia.org/wiki/Research:Content_persistence.
- Clickstream dataset http://ewulczyn.github.io/Wikipedia_Clickstream_Getting_Started/ -- An open-licensed dataset containing page view pair counts (as inferred by the "Referrer" header).
- Increasing article coverage https://meta.wikimedia.org/wiki/Research:Increasing_article_coverage -- This research aims to identify important content available in one language edition but missing from another and recommend the work to editors who would be most interested in translating.
- Improving link coverage https://meta.wikimedia.org/wiki/Research:Improving_link_coverage -- An approach for automatically finding useful hyperlinks to add to a website by analyzing server access logs.
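For anyone wondering what "sessionized editing activity" might look like in practice, here is a rough sketch, not the project's actual code. It assumes a session is a run of one editor's edits in which no gap between consecutive edits exceeds an inactivity cutoff. The one-hour cutoff, the `sessionize` function name, and the sample timestamps are all made up for illustration.

```python
from datetime import datetime, timedelta


def sessionize(timestamps, cutoff=timedelta(hours=1)):
    """Group edit timestamps into sessions.

    A new session starts whenever the gap since the previous edit
    is longer than `cutoff` (an assumed value, not taken from the
    Activity sessions project page).
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= cutoff:
            # Close enough to the previous edit: same session.
            sessions[-1].append(ts)
        else:
            # First edit, or the gap was too long: start a new session.
            sessions.append([ts])
    return sessions


# Hypothetical edit times for a single editor.
edits = [
    datetime(2015, 8, 20, 9, 0),
    datetime(2015, 8, 20, 9, 10),
    datetime(2015, 8, 20, 9, 45),
    datetime(2015, 8, 20, 14, 0),
    datetime(2015, 8, 20, 14, 5),
]

sessions = sessionize(edits)
for session in sessions:
    # Time from first to last edit is a crude lower bound on time spent.
    print(len(session), "edits spanning", session[-1] - session[0])
```

Summing the per-session spans gives a crude lower bound on labor hours. A real estimate would also need to account for the time spent before the first edit of each session, which this sketch ignores.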
I'm sure I missed some stuff. I invite my colleagues to supplement my notes in their replies. Thanks to all who joined us!
-Aaron