Hey folks,

Since OpenSym was in San Francisco this year, we welcomed researchers working in the Wikimedia space to join us for breakfast at the Wikimedia Foundation on the morning after the conference.  During this event, we shook hands and made a few quick presentations about ongoing projects and stuff that's right around the corner.

I took some notes on what was presented and I figured that many on this list might appreciate the notes as well. 

Wikimedia research (Collaborate with us!):
(With much more overlap than is implied by the distinction)

Communication channels:
Projects presented:
  • Revision scoring as a service -- State-of-the-art AI (vandalism & article quality prediction) as a web service.  See ORES for current system capabilities and Wiki labels, our crowdsourced data gathering system.
  • Scholarly article citations -- An open-licensed dataset of scholarly identifiers in Wikipedia which notes when, historically, an identifier was first added.
  • Activity sessions -- (coming soon) A dataset of sessionized editing activity.  Useful for measuring labor hours or studying work patterns. 
  • Measuring value-added -- (coming soon) A dataset of robust measurements of editor productivity and value-added.  See also Content persistence.
  • Clickstream dataset -- An open-licensed dataset containing page view pair counts (as inferred from the "Referer" header).
  • Increasing article coverage -- This research aims to identify important content that is available in one language edition but missing from another, and to recommend it to the editors who would be most interested in translating it.
  • Improving link coverage -- An approach for automatically finding useful hyperlinks to add to a website by analyzing server access logs.
I'm sure I missed some stuff.  I invite my colleagues to supplement my notes in their replies.  Thanks to all who joined us!  

-Aaron