Hi Everyone,

Just a reminder: the Research Showcase begins at 11:30 AM PDT (18:30 UTC) today!

Kind regards,

Sarah R.

On Sun, Jun 18, 2017 at 3:47 PM, Sarah R <srodlund@wikimedia.org> wrote:
Hi Everyone,

The next Research Showcase will be live-streamed this Wednesday, June 21, 2017, at 11:30 AM PDT (18:30 UTC).

YouTube stream: https://www.youtube.com/watch?v=i2jpKRwPT-Q

As usual, you can join the conversation on IRC at #wikimedia-research. You can also watch our past research showcases here.

This month's presentations:

Title: Problematizing and Addressing the Article-as-Concept Assumption in Wikipedia

By Allen Yilun Lin

Abstract: Wikipedia-based studies and systems frequently assume that each article describes a separate concept. However, in this paper, we show that this article-as-concept assumption is problematic due to editors’ tendency to split articles into parent articles and sub-articles when articles get too long for readers (e.g., “United States” and “American literature” in the English Wikipedia). We present evidence that this issue can have significant impacts on Wikipedia-based studies and systems, and we introduce the sub-article matching problem. The goal of the sub-article matching problem is to automatically connect sub-articles to parent articles to help Wikipedia-based studies and systems retrieve complete information about a concept. We then describe the first system to address the sub-article matching problem. We show that, using a diverse feature set and standard machine learning techniques, our system can achieve good performance on most of our ground truth datasets, significantly outperforming baseline approaches.


Title: Understanding Wikidata Queries


By Markus Kroetzsch


Abstract: Wikimedia provides a public service that lets anyone answer complex questions over the sum of all knowledge stored in Wikidata. These questions are expressed in the query language SPARQL and range from the most simple fact retrievals ("What is the birthday of Douglas Adams?") to complex analytical queries ("Average lifespan of people by occupation"). The talk presents ongoing efforts to analyse the server logs of the millions of queries that are answered each month. It is an important but difficult challenge to draw meaningful conclusions from this dataset. One might hope to learn relevant information about the usage of the service and Wikidata in general, but at the same time one has to be careful not to be misled by the data. Indeed, the dataset turned out to be highly heterogeneous and unpredictable, with strongly varying usage patterns that make it difficult to draw conclusions about "normal" usage. The talk will give a status report, present preliminary results, and discuss possible next steps.


--
Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation






“In a real sense all life is inter-related. All men are caught in an inescapable network of mutuality, tied in a single garment of destiny. Whatever affects one directly, affects all indirectly. I can never be what I ought to be until you are what you ought to be, and you can never be what you ought to be until I am what I ought to be... This is the inter-related structure of reality.”

― Martin Luther King Jr.'s Letter from Birmingham Jail and the Struggle That Changed a Nation