Wiki-research-l October 2021

wiki-research-l@lists.wikimedia.org

9 participants
9 discussions

by song＠cs.umn.edu

Pursuant to prior discussions about the need for a research policy on Wikipedia, WikiProject Research is drafting a policy regarding the recruitment of Wikipedia users to participate in studies. At this time, we have a proposed policy, and an accompanying group that would facilitate recruitment of subjects in much the same way that the Bot Approvals Group approves bots. The policy proposal can be found at: http://en.wikipedia.org/wiki/Wikipedia:Research The Subject Recruitment Approvals Group mentioned in the proposal is being described at: http://en.wikipedia.org/wiki/Wikipedia:Subject_Recruitment_Approvals_Group Before we move forward with seeking approval from the Wikipedia community, we would like additional input about the proposal, and would welcome additional help improving it. Also, please consider participating in WikiProject Research at: http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Research -- Bryan Song GroupLens Research University of Minnesota

9 months, 2 weeks

How to access deleted Wikipedia articles

by D Z

Hello All, I am doing research investigating the role of machine translation in Wikipedia articles. I am having trouble determining whether an article has been deleted from Wikipedia. Specifically, I am getting a list of articles from the cxtranslation list and I would like to know which articles are no longer on Wikipedia. I see that there is the deletion log form <https://en.wikipedia.org/wiki/Special:Log/delete>, but is there an API or some other way to access the same information so I could check whether a large number of articles have been deleted? I have used the MediaWiki API <https://en.wikipedia.org/w/api.php> to get articles, and the API returns 'missing' for some articles, but this does not seem to be fully accurate for determining if an article has been deleted, because the API has returned 'missing' for articles that do exist. To summarize, my main question is: given an article language edition and article title, or an article pageid, is there an API to check if the article has been deleted? Any help would be greatly appreciated! Thanks, Doris Zhou
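One possible way to approach the question above, offered as a sketch rather than a confirmed answer from the list: the MediaWiki Action API exposes the deletion log through `list=logevents` with `letype=delete`, filtered by title with `letitle`. The helper names below (`deletion_log_url`, `last_deletion_action`) are hypothetical, the `https://<lang>.wikipedia.org/w/api.php` pattern is assumed for every language edition, and the sample response is hand-written for illustration, not real data.

```python
from urllib.parse import urlencode


def deletion_log_url(lang, title, limit=10):
    """Build an Action API URL listing deletion-log entries for one title.

    Hypothetical helper: only constructs the URL, it does not send a request.
    """
    params = {
        "action": "query",
        "list": "logevents",
        "letype": "delete",                   # restrict to the deletion log
        "letitle": title,                     # entries for this exact title
        "leprop": "type|timestamp|comment",   # "type" also returns the action
        "lelimit": limit,
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def last_deletion_action(api_response):
    """Return the action of the newest deletion-log entry, or None if there is none.

    Assumes the usual response shape {"query": {"logevents": [...]}} and
    ISO 8601 timestamps, which sort correctly as plain strings.
    """
    events = api_response.get("query", {}).get("logevents", [])
    if not events:
        return None
    newest = max(events, key=lambda event: event.get("timestamp", ""))
    return newest.get("action")


# Hand-written sample response (illustrative only).
sample = {
    "query": {
        "logevents": [
            {"type": "delete", "action": "restore",
             "timestamp": "2020-03-01T10:00:00Z", "comment": "restored"},
            {"type": "delete", "action": "delete",
             "timestamp": "2021-05-04T12:30:00Z", "comment": "example"},
        ]
    }
}

print(deletion_log_url("en", "Example page"))
print(last_deletion_action(sample))
```

A result of `delete` suggests the title was deleted and not restored afterwards, while `restore` or `None` suggests otherwise. A title can also be recreated without a restore entry, or moved, so this check would probably need to be combined with a normal page lookup and is not a complete test on its own.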

2 years, 5 months

Invitation to Wikimedia Research Office hours November 2, 2021

by Emily Lescak

Hi all, Join the Research Team at the Wikimedia Foundation [1] for their monthly Office hours this Tuesday, 2021-11-02, at 12:00-13:00 UTC (5am PT/8am ET/1pm CET). Please note the time change! We are experimenting with our Office hours schedules to make our sessions more globally welcoming. To participate, join the video-call via this link [2]. There is no set agenda - feel free to add your item to the list of topics in the etherpad [3]. You are welcome to add questions / items to the etherpad in advance, or when you arrive at the session. Even if you are unable to attend the session, you can leave a question that we can address asynchronously. If you do not have a specific agenda item, you are welcome to hang out and enjoy the conversation. More detailed information (e.g. about how to attend) can be found here [4]. Through these office hours, we aim to make ourselves more available to answer research related questions that you as Wikimedia volunteer editors, organizers, affiliates, staff, and researchers face in your projects and initiatives. Here are some example cases we hope to be able to support you with: - You have a specific research related question that you suspect you should be able to answer with the publicly available data and you don’t know how to find an answer for it, or you just need some more help with it. For example, how can I compute the ratio of anonymous to registered editors in my wiki? - You run into repetitive or very manual work as part of your Wikimedia contributions and you wish to find out if there are ways to use machines to improve your workflows. These types of conversations can sometimes be harder to find an answer for during an office hour. However, discussing them can help us understand your challenges better and we may find ways to work with each other to support you in addressing it in the future. - You want to learn what the Research team at the Wikimedia Foundation does and how we can potentially support you. 
Specifically for affiliates: if you are interested in building relationships with the academic institutions in your country, we would love to talk with you and learn more. We have a series of programs that aim to expand the network of Wikimedia researchers globally and we would love to collaborate with those of you interested more closely in this space. - You want to talk with us about one of our existing programs [5]. Hope to see many of you, Emily on behalf of the WMF Research Team [1] https://research.wikimedia.org [2] https://meet.jit.si/WMF-Research-Office-Hours [3] https://etherpad.wikimedia.org/p/Research-Analytics-Office-hours [4] https://www.mediawiki.org/wiki/Wikimedia_Research/Office_hours [5] https://research.wikimedia.org/projects.html -- Emily Lescak (she / her) Senior Research Community Officer The Wikimedia Foundation

2 years, 6 months

[Wikimedia Research Showcase] Bridging knowledge gaps

by Janna Layton

Hi all, The next Wikimedia Research Showcase will be on October 27, 16:30 UTC (9:30am PT/ 12:30pm ET/ 18:30 CEST). The Wikimedia Foundation Research Team will present on knowledge gaps. Livestream: https://www.youtube.com/watch?v=d0Qg98EVmuI Speaker: Wikimedia Foundation Research Team Title: Automatic approaches to bridge knowledge gaps in Wikimedia projects Abstract: In order to advance knowledge equity as part of the Wikimedia Movement’s 2030 strategic direction, the Research team at the Wikimedia Foundation has been conducting research to “Address Knowledge Gaps” as one of its main programs. One core component of this program is to develop technologies to bridge knowledge gaps. In this talk, we give an overview of how we approach this task using tools from Machine Learning in four different contexts: section alignment in content translation, link recommendation in structured editing, image recommendation in multimedia knowledge gaps, and the equity of the recommendations themselves. We will present how these models can assist contributors in addressing knowledge gaps. Finally, we will discuss the impact of these models in applications deployed across Wikimedia projects supporting different Product initiatives at the Wikimedia Foundation.
More information: * Section alignment: meta:Research:Expanding_Wikipedia_articles_across_languages/Inter_language_approach#Section_Alignment <https://meta.wikimedia.org/wiki/Research:Expanding_Wikipedia_articles_acros…> * Link recommendation: meta:Research:Link_recommendation_model_for_add-a-link_structured_task <https://meta.wikimedia.org/wiki/Research:Link_recommendation_model_for_add-…> * Image recommendation: meta:Research:Recommending_Images_to_Wikipedia_Articles <https://meta.wikimedia.org/wiki/Research:Recommending_Images_to_Wikipedia_A…> * Equity in recommendations: meta:Research:Prioritization_of_Wikipedia_Articles/Recommendation <https://meta.wikimedia.org/wiki/Research:Prioritization_of_Wikipedia_Articl…> -- Janna Layton (she/her) Administrative Associate - Product & Technology Wikimedia Foundation <https://wikimediafoundation.org/>

2 years, 6 months

Scientific greetings this Sunday, 31 Oct

by David Abián

Hi, If you're a researcher (whether from academia, industry, other sectors, or independent), you'll probably be interested in participating in the online session "Scientific greetings", which will be held this Sunday, 31 October, as part of WikidataCon 2021 <https://www.wikidata.org/wiki/Wikidata:WikidataCon_2021>. In this condensed session, each researcher will have 5 minutes to present what aspects of Wikidata they're studying or how Wikidata is useful for their research, find out what other colleagues are working on, and ask for or offer collaboration. We're sending this email to inform you that prior registration is required to present at this session, so we encourage you to follow the steps below: 1. Sign up for WikidataCon 2021 <https://www.wikidata.org/wiki/Wikidata:WikidataCon_2021> (online, 29-31 Oct) if you haven't already done so. It's free and requires no personal data. 2. Add your name or username and, optionally, other details here <https://www.wikidata.org/wiki/Wikidata:WikidataCon_2021/Program/Scientific_…> as soon as possible. Slots will be allocated on a first-come, first-served basis. If you want to prepare slides, feel free to use this template <https://docs.google.com/presentation/d/1XqQYDwfOnIAlhEjz__AJxg3lluYuNp3QOyo…>. If you're planning a pre-recorded session, please upload it to YouTube or Vimeo (unfortunately, we can't display videos from Commons). Please feel free to share this email with anyone who might find it useful, and write to us if you have any questions. We hope you enjoy the session and the conference. The organizers of the session, Tiago, Gabriel and David

2 years, 6 months

[ANN] New DBpedia Snapshot 2021-09

by DBpedia

Apologies for cross-posting. The full release description including further statistics can be found on https://www.dbpedia.org/blog/snapshot-2021-09-release/. We are pleased to announce immediate availability of a new edition of the free and publicly accessible SPARQL Query Service Endpoint and Linked Data Pages, for interacting with the new Snapshot Dataset. News since DBpedia Snapshot 2021-06 <https://www.dbpedia.org/blog/snapshot-2021-06-release/>: * Release notes are now maintained in the Databus Collection (https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2021-09) * The Image and Abstract Extractor was improved * Work in progress: smoothing community issue reporting and fixing on GitHub (https://github.com/dbpedia/extraction-framework/issues/new/choose) What is the “DBpedia Snapshot” Release? Historically, this release has been associated with many names: "DBpedia Core", "EN DBpedia", and, most confusingly, just "DBpedia". In fact, it is a combination of: * EN Wikipedia data: A small, but very useful, subset (~ 1 billion triples or 14%) of the whole DBpedia extraction <https://link.springer.com/chapter/10.1007/978-3-030-59833-4_1> using the DBpedia Information Extraction Framework <https://github.com/dbpedia/extraction-framework> (DIEF), comprising structured information extracted from the English Wikipedia plus some enrichments from other Wikipedia language editions, notably multilingual abstracts in ar, ca, cs, de, el, eo, es, eu, fr, ga, id, it, ja, ko, nl, pl, pt, sv, uk, ru, zh.
* Links: 62 million community-contributed cross-references and owl:sameAs links to other linked data sets on the Linked Open Data (LOD) Cloud that make it possible to find and retrieve further information from the largest, decentralized, change-sensitive knowledge graph on earth, which has formed around DBpedia since 2007. * Community extensions: Community-contributed extensions such as additional ontologies and taxonomies. Release Frequency & Schedule Going forward, releases will be scheduled for the 15th of February, May, July, and October (with +/- 5 days tolerance), and are named using the same date convention as the Wikipedia Dumps that served as the basis for the release. An example of the release timeline is shown below: * September 6–8: Wikipedia dumps for June 1 become available on https://dumps.wikimedia.org/ * Sep 8–20: Download and extraction with DIEF * Sep 20–Oct 10: Post-processing and quality-control period * Oct 10–20: Linked Data and SPARQL endpoint deployment Data Freshness Given the timeline above, the EN Wikipedia data of DBpedia Snapshot has a lag of 1-4 months. Further Information Growth of DBpedia, breakdown of links by domain, download instructions and some tips on how to effectively work with DBpedia are published as part of this blog post: https://www.dbpedia.org/blog/snapshot-2021-09-release/ Stay tuned and stay safe! With kind regards, The DBpedia Association

2 years, 6 months

[CfP] JWS: Community-based KBs and KGs

by Editors of CBKB

The Journal of Web Semantics (JWS) invites submissions for a special issue on Community-based Knowledge Bases and Knowledge Graphs, edited by Tim Finin, Sebastian Hellmann, David Martin, and Elena Simperl. (contact email: cbkb(a)cs.umbc.edu <mailto:cbkb@cs.umbc.edu>) Submissions are due by November 01, 2021. Please see the JWS post here: http://www.websemanticsjournal.org/2021/06/cfp-community-based-knowledge-ba… <http://www.websemanticsjournal.org/2021/06/cfp-community-based-knowledge-ba…> Introduction Community-based knowledge bases (KBs) and knowledge graphs (KGs) are critical to many domains. They contain large amounts of information, used in applications as diverse as search, question-answering systems, and conversational agents. They are the backbone of linked open data, helping connect entities from different datasets. Finally, they create rich knowledge engineering ecosystems, making significant, empirical contributions to our understanding of KB/KG science, engineering, and practices. From here forward, we use "KB" to include both knowledge bases and knowledge graphs. Also, "KB" and "knowledge" encompass both ontology/schema and data. Community-based KBs come in many shapes and sizes, but they tend to share a number of commonalities: * They are created through the efforts of a group of contributors, following a set of agreed goals, policies, practices, and quality norms. * They are available under open licenses. * They are central to knowledge-sharing networks bringing together various stakeholders. * They serve the needs of a community of users, including, but not restricted to, their contributor base. * Many draw their content from crowdsourced resources (such as Wikipedia, OpenStreetMap). Examples of community-based KBs include Wikidata, DBpedia, ConceptNet, GeoNames, FrameNet, and Yago. 
This special issue will highlight recent research, challenges, and opportunities in the field of community-based KBs and the interaction and processes between stakeholders and the KBs. We welcome papers on a wide variety of topics. Papers that focus on the participation of a community of contributors are especially encouraged. Topics of interest We are looking for studies, frameworks, methods, techniques and tools on topics such as the following: * The impact of community involvement on characteristics of KBs such as requirements, design, technology choices, policies, etc. For example, how are KB characteristics driven by the community and reflective of the community's needs? * Conversely, the impact of KB characteristics on community involvement. For example, how do changes in these characteristics affect the participation and behavior of members of the community? * Organizational challenges and solutions in developing and managing community-based KBs. * Technical challenges and solutions in community-based KBs, concerning a technical area such as: o Representation of knowledge and logical foundations o Reasoning, querying, and constraint-checking o Knowledge acquisition o Knowledge preparation (e.g., cleaning, deduplication, alignment, merging) o Maintaining consistency with external sources o Representing and managing metadata (including issues involved in adding metadata to relation instances) o Provenance o Quality assurance * User interfaces and experience, both for contributing to the KB and using it, by different user groups. * Implemented metrics and quality tests to guide the community in improving KG quality and expanding KG coverage. * Achieving and managing knowledge diversity, for instance, in the form of multilinguality, multi-cultural coverage, multiple points of view, and a diverse and inclusive contributor base. * Detecting and avoiding malicious, inappropriate, and misleading content in community-based KBs. 
* Biases in community-based KBs and their impact on downstream uses of KB content. * Community-based KBs in science, medicine, law, government, or other domains. * Handling specialized types of knowledge (such as commonsense, probabilistic, or linguistic knowledge) in a community setting. * Methods and tools to manage KB evolution, including change detection, change management, conflict resolution, visualization of change history. * Tools and affordances supporting community or collaborative activities, including discussions, feedback, decision making, task allocation, etc. * Motivations and incentives affecting community participation. * Approaches and metrics for community health, including but not restricted to community growth or diversity. * Roles and participation profiles in communities building and maintaining KBs. * Frameworks and approaches to support group decision-making and resolve conflicts. Types of Papers We invite submission of Research, Survey, Ontology, and System papers, according to the guidelines given at https://www.jws-volumes.com <https://www.jws-volumes.com/>. Submission Guidelines The Journal of Web Semantics solicits original scientific contributions of high quality. Following the overall mission of the journal, we emphasize the publication of papers that combine theories, methods and experiments from different subject areas in order to deliver innovative semantic methods and applications. The publication of large-scale experiments and their analysis is also encouraged to clearly illustrate scenarios and methods that introduce semantics into existing Web interfaces, contents and services. Submission of your manuscript is welcome provided that it, or any translation of it, has not been copyrighted or published and is not being submitted for publication elsewhere. 
Manuscripts should be prepared for publication in accordance with instructions given in the JWS guide for authors <http://www.elsevier.com/journals/journal-of-web-semantics/1570-8268/guide-f…>. The submission and review process will be carried out using Elsevier's Web-based EM system <https://www.editorialmanager.com/JOWS/default.aspx>. Please state the name of the special issue (SI) in your cover letter and, at the time of submission, select “VSI:CBKB” when reaching the Article Type selection. Upon acceptance of an article, the author(s) will be asked to transfer copyright of the article to the publisher. This transfer will ensure the widest possible dissemination of information. Elsevier's liberal preprint policy <https://www.elsevier.com/authors/journal-authors/submit-your-paper/sharing-…> permits authors and their institutions to host preprints on their web sites. Preprints of the articles will be made freely accessible via JWS First Look <https://papers.ssrn.com/sol3/JELJOUR_Results.cfm?form_name=journalbrowse&jo…>. Final copies of accepted publications will appear in print and at Elsevier's archival online server. Important Dates * Submission deadline: November 1, 2021 * Author notification: February 7, 2022 * Minor revisions due: February 21, 2022 * Major revisions due: March 14, 2022 * Papers appear on JWS preprint server: May 2, 2022 * Publication: Fall or Winter 2022 Guest Editors Tim Finin is the Willard and Lillian Hackerman Chair in Engineering and a Professor of Computer Science and Electrical Engineering at the University of Maryland, Baltimore County (UMBC). Sebastian Hellmann is the head of the “Knowledge Integration and Language Technologies (KILT)” Competence Center at InfAI, Leipzig. He is also the executive director and a board member of the non-profit DBpedia Association, with over 30 key players <https://www.dbpedia.org/members/overview/> in the knowledge graph area.
He earned a rank in AMiner’s top 10 of the most influential scholars in knowledge engineering of the last decade. David L. Martin is a Research & Development Scientist in Artificial Intelligence. He has held positions at SRI International, Siri, Inc., Apple, Nuance Communications, Samsung Research America, and the University of California at Santa Cruz. He is a Senior Member of the Association for the Advancement of Artificial Intelligence, and currently works as an independent consultant in Silicon Valley, California. Elena Simperl is professor of computer science at King’s College London, a Fellow of the British Computer Society and former Turing fellow. According to AMiner, she is in the top 100 most influential scholars in knowledge engineering of the last decade, as well as in the Women in AI 2000 ranking. Before joining King’s College, she held positions at the University of Southampton, as well as in Germany and Austria.

2 years, 6 months

Wikimedia Enterprise HTML dumps available for public download

by Ariel Glenn WMF

I am pleased to announce that Wikimedia Enterprise's HTML dumps [1] for October 17-18th are available for public download; see https://dumps.wikimedia.org/other/enterprise_html/ for more information. We expect to make updated versions of these files available around the 1st/2nd of the month and the 20th/21st of the month, following the cadence of the standard SQL/XML dumps. This is still an experimental service, so there may be hiccups from time to time. Please be patient and report issues as you find them. Thanks! Ariel "Dumps Wrangler" Glenn [1] See https://www.mediawiki.org/wiki/Wikimedia_Enterprise for much more about Wikimedia Enterprise and its API.

2 years, 6 months

Announcing the Wikipedia Image / Caption Matching Competition

by Emily Lescak

Hi all, To help bridge Wikipedia’s visual knowledge gaps, the Research team <https://research.wikimedia.org/> at the Wikimedia Foundation has launched the “Wikipedia Image/Caption Matching Competition <https://www.kaggle.com/c/wikipedia-image-caption>”. Read on for more information or check out our blog post <https://diff.wikimedia.org/2021/09/13/the-wikipedia-image-caption-matching-…>! Images are essential for knowledge sharing, learning, and understanding. However, the majority of images on Wikipedia articles lack written context (e.g., captions, alt-text), often making them inaccessible. As part of our initiatives <https://research.wikimedia.org/knowledge-gaps.html> to address Wikipedia’s knowledge gaps, the Research <https://research.wikimedia.org/> team at the Wikimedia Foundation is hosting the “Wikipedia Image/Caption Matching Competition <https://www.kaggle.com/c/wikipedia-image-caption>.” We invite the communities of volunteers, developers, data scientists, and machine learning enthusiasts to develop systems that can automatically associate images with their corresponding captions and article titles. In this competition (hosted on Kaggle <https://www.kaggle.com/>), participants are provided with content from Wikipedia articles in 100+ language editions and are asked to build systems that automatically retrieve the text (an image caption, or an article title) closest to a query image. The data is a combination of Google AI’s recently released WIT dataset <https://github.com/google-research-datasets/wit> and a new dataset of 6 million images from Wikimedia Commons that we have released <https://analytics.wikimedia.org/published/datasets/one-off/caption_competit…> for this competition. Kaggle is hosting all data needed to get started with the task, example notebooks, a forum for participants to share and collaborate, and submitted models in open-sourced formats. We encourage everyone to download our data and participate in the competition.
This challenge is an opportunity for people around the world to grow their technical skills while increasing the accessibility of Wikipedia. This competition is possible thanks to collaborations with Google Research <https://research.google/>, EPFL <https://www.epfl.ch/en/>, Naver Labs Europe <https://europe.naverlabs.com/> and Hugging Face <https://huggingface.co/>, who assisted with data preparation and competition design. Check out our blog post <https://diff.wikimedia.org/2021/09/13/the-wikipedia-image-caption-matching-…> for more information! The point of contact for this project is Miriam Redi. You're welcome to reach out with questions or comments at miriam(a)wikimedia.org. Cheers, Emily Lescak, on behalf of the Research team -- Emily Lescak (she / her) Senior Research Community Officer The Wikimedia Foundation

2 years, 6 months
