Hi Everyone,
The next Research Showcase will be live-streamed this Wednesday, March 21, 2018, at 11:30 AM PDT (18:30 UTC).
YouTube stream: https://www.youtube.com/watch?v=ACevHs0sMMw
As usual, you can join the conversation on IRC at #wikimedia-research. You can also watch our past research showcases here: https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#March_2018
Over the past years, the Research team at the Wikimedia Foundation and some of our formal collaborators have focused on doing research and building technologies that help editors across Wikimedia languages find tasks to contribute to. While the early effort focused heavily on recommending articles for creation (horizontal expansion), in 2016 we started a new direction of research focused on the vertical expansion of Wikipedia articles. The two talks in the March 2018 Research Showcase will share some of what we have learned from this research. More specifically, we will talk about the Wikipedia category network as a strong signal for creating templates/structures for Wikipedia articles, as well as ongoing research to learn which content (sections) is missing from Wikipedia across its many languages. The two corresponding abstracts, with more details, are below. Join us! :)
Using Wikipedia categories for research: opportunities, challenges, and solutions
By *Tiziano Piccardi, EPFL*
The category network in Wikipedia is used by editors as a way to label articles and organize them in a hierarchical structure. This manually created and curated network of 1.6 million nodes in English Wikipedia, generated by arranging the categories in a child-parent relation (e.g., Scientists-People, Cities-Human Settlement), allows researchers to infer valuable relations between concepts. A clean structure in this format would be a valuable resource for a variety of tools and applications, including automatic reasoning tools. Unfortunately, the Wikipedia category network contains some "noise", since in many cases the subcategory association does not define an is-a relation (Scientists is-a People vs. Billionaires is-a Wealth). Motivated by the goal of developing a model for recommending sections to add to existing Wikipedia articles, we developed a method to clean this network and keep only the categories that are highly likely to be associated with their children by an is-a relation. The strategy is based on the concept of "pure" categories, and the algorithm uses the types of the attached articles to determine how homogeneous the category is. The approach does not rely on any linguistic feature and is therefore suitable for all Wikipedia languages. In this talk, we will give a high-level overview of the algorithm and discuss some of the possible applications of the generated network beyond article section recommendations.
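To make the idea of a "pure" category concrete, here is a minimal sketch. It is not the algorithm from the talk: the abstract does not say how homogeneity is measured, so the purity score (share of the most common article type), the 0.8 threshold, and the toy type labels are all assumptions for illustration.

```python
from collections import Counter

def category_purity(article_types):
    """Share of a category's articles that have its most common type.

    Assumed definition of homogeneity; the talk may use a different one.
    """
    if not article_types:
        return 0.0
    counts = Counter(article_types)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(article_types)

def pure_categories(category_to_types, threshold=0.8):
    """Keep only the categories whose purity reaches the (hypothetical) threshold."""
    return {
        category
        for category, types in category_to_types.items()
        if category_purity(types) >= threshold
    }

# Toy data: type labels of the articles attached to each category (made up).
example = {
    "Scientists": ["human", "human", "human", "human"],
    "Wealth": ["human", "concept", "organization", "human"],
}

kept = pure_categories(example)
```

With this toy data, "Scientists" is fully homogeneous and is kept, while "Wealth" mixes several article types and is dropped. Note that the sketch only looks at article types and never at category names, which is in line with the abstract's point that the approach needs no linguistic features.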
Beyond Automatic Translation: Aligning Wikipedia sections across multiple languages
By *Diego Saez-Trumper*
Sections are the building blocks of Wikipedia articles. For editors, they can be used as an entry point for creating and expanding articles. For readers, they enhance the readability of Wikipedia content. In this talk, we present ongoing research to align article sections across Wikipedia languages. We show that the available technology for automatic translation is not good enough for translating section titles. We then show a complementary approach to section alignment that uses Wikidata and cross-lingual word embeddings. We will present some of the use cases of a methodology for aligning sections across languages, including improved section recommendation, especially in medium-sized and smaller languages, where the language itself may not contain enough signal about the structure of the articles and that signal can be inferred from larger Wikipedia languages.
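As a rough illustration of how section titles could be matched in a shared embedding space, here is a minimal sketch. It is not the method from the talk: the abstract gives no details on how Wikidata and the embeddings are combined, so the nearest-neighbor matching by cosine similarity and the three-dimensional toy vectors are assumptions made up for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def align_sections(source, target):
    """Map each source section title to its most similar target section title."""
    return {
        title: max(target, key=lambda candidate: cosine(vector, target[candidate]))
        for title, vector in source.items()
    }

# Toy vectors standing in for cross-lingual embeddings of section titles (made up).
english = {
    "History": [0.9, 0.1, 0.0],
    "References": [0.0, 0.2, 0.9],
}
spanish = {
    "Historia": [0.8, 0.2, 0.1],
    "Referencias": [0.1, 0.1, 0.95],
    "Geografía": [0.1, 0.9, 0.1],
}

alignment = align_sections(english, spanish)
```

In this toy setup each English title is paired with the Spanish title whose vector points in the most similar direction. A real pipeline would need real embeddings and probably a similarity cutoff, so that sections with no counterpart stay unaligned.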
Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation
srodlund@wikimedia.org
Hi Everyone,
Just a reminder -- this is beginning in a half hour. Hope to see you there!
On Mon, Mar 19, 2018 at 1:54 PM, Sarah R srodlund@wikimedia.org wrote: