Pursuant to prior discussions about the need for a research
policy on Wikipedia, WikiProject Research is drafting a
policy regarding the recruitment of Wikipedia users to
participate in studies.
At this time, we have a proposed policy, and an accompanying
group that would facilitate recruitment of subjects in much
the same way that the Bot Approvals Group approves bots.
The policy proposal can be found at:
http://en.wikipedia.org/wiki/Wikipedia:Research
The Subject Recruitment Approvals Group mentioned in the proposal
is being described at:
http://en.wikipedia.org/wiki/Wikipedia:Subject_Recruitment_Approvals_Group
Before we move forward with seeking approval from the Wikipedia
community, we would like additional input about the proposal,
and would welcome additional help improving it.
Also, please consider participating in WikiProject Research at:
http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Research
--
Bryan Song
GroupLens Research
University of Minnesota
Hello All,
I am doing research investigating the role of machine translation in
Wikipedia articles, and I am having trouble determining whether an article
has been deleted from Wikipedia. Specifically, I am getting a list of
articles from the cxtranslation list and would like to know which of them
are no longer on Wikipedia. I see that there is the deletion log form
<https://en.wikipedia.org/wiki/Special:Log/delete>, but is there an API or
some other way to access something like this form so that I could check
whether a large number of articles have been deleted?
I have used the MediaWiki API <https://en.wikipedia.org/w/api.php> to get
articles, and the API returns 'missing' for some articles, but this does
not seem to be fully reliable for determining whether an article has been
deleted, because the API has also returned 'missing' for articles that do
exist.
To summarize, my main question is: given an article language edition and
article title, or an article pageid, is there an API to check if the
article has been deleted?
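The closest thing I have found so far is the deletion log via the
list=logevents module; here is a minimal sketch of what I have tried
(the parameter choices are mine, and I am not sure this is the intended
approach):

import requests

API = "https://en.wikipedia.org/w/api.php"

def was_deleted(title):
    """Check the deletion log for a given title via list=logevents."""
    params = {
        "action": "query",
        "list": "logevents",
        "letype": "delete",  # only deletion-log entries
        "letitle": title,
        "lelimit": 5,
        "format": "json",
    }
    events = requests.get(API, params=params).json()["query"]["logevents"]
    # An action of 'delete' means the page was deleted at some point,
    # though it may have been recreated since; loop over titles as needed.
    return any(e.get("action") == "delete" for e in events)

print(was_deleted("Some example article"))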
Any help would be greatly appreciated!
Thanks,
Doris Zhou
Hi all,
Join the Research Team at the Wikimedia Foundation [1] for their monthly
office hours this Tuesday, 2021-11-02, at 12:00-13:00 UTC (5am PT / 8am
ET / 1pm CET). Please note the time change! We are experimenting with our
office hours schedule to make our sessions more globally welcoming.
To participate, join the video-call via this link [2]. There is no set
agenda - feel free to add your item to the list of topics in the etherpad
[3]. You are welcome to add questions / items to the etherpad in advance,
or when you arrive at the session. Even if you are unable to attend the
session, you can leave a question that we can address asynchronously. If
you do not have a specific agenda item, you are welcome to hang out and
enjoy the conversation. More detailed information (e.g. about how to
attend) can be found here [4].
Through these office hours, we aim to make ourselves more available to
answer research-related questions that you, as Wikimedia volunteer
editors, organizers, affiliates, staff, and researchers, face in your
projects and initiatives. Here are some example cases we hope to be able
to support you with:
- You have a specific research-related question that you suspect you
should be able to answer with the publicly available data, and you don't
know how to find an answer for it, or you just need some more help with
it. For example, how can I compute the ratio of anonymous to registered
editors in my wiki? (See the sketch after this list.)
- You run into repetitive or very manual work as part of your Wikimedia
contributions and you wish to find out if there are ways to use machines
to improve your workflows. These types of conversations can sometimes be
harder to find an answer for during an office hour. However, discussing
them can help us understand your challenges better, and we may find ways
to work with each other to support you in addressing them in the future.
- You want to learn what the Research team at the Wikimedia Foundation
does and how we can potentially support you. Specifically for affiliates:
if you are interested in building relationships with the academic
institutions in your country, we would love to talk with you and learn
more. We have a series of programs that aim to expand the network of
Wikimedia researchers globally, and we would love to collaborate more
closely with those of you interested in this space.
- You want to talk with us about one of our existing programs [5].
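As a rough illustration of the first example above, here is one way to
estimate a closely related quantity (the share of recent edits made by
anonymous editors) from public data via the Action API; the sampling
approach and parameter choices are just one possibility:

import requests

API = "https://en.wikipedia.org/w/api.php"  # replace with your wiki's endpoint

def anon_edit_share(sample_size=500):
    """Estimate the share of recent edits made by anonymous (IP) editors."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rctype": "edit",
        "rcprop": "user",
        "rclimit": sample_size,
        "format": "json",
    }
    changes = requests.get(API, params=params).json()["query"]["recentchanges"]
    # Anonymous edits carry an "anon" flag in the default output format.
    anon = sum(1 for c in changes if "anon" in c)
    return anon / len(changes)

print(f"Anonymous share of recent edits: {anon_edit_share():.1%}")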
Hope to see many of you,
Emily on behalf of the WMF Research Team
[1] https://research.wikimedia.org
[2] https://meet.jit.si/WMF-Research-Office-Hours
[3] https://etherpad.wikimedia.org/p/Research-Analytics-Office-hours
[4] https://www.mediawiki.org/wiki/Wikimedia_Research/Office_hours
[5] https://research.wikimedia.org/projects.html
--
Emily Lescak (she / her)
Senior Research Community Officer
The Wikimedia Foundation
Hi all,
The next Wikimedia Research Showcase will be on October 27, 16:30 UTC
(9:30am PT / 12:30pm ET / 18:30 CEST). The Wikimedia Foundation Research
Team will
present on knowledge gaps.
Livestream: https://www.youtube.com/watch?v=d0Qg98EVmuI
Speaker: Wikimedia Foundation Research Team
Title: Automatic approaches to bridge knowledge gaps in Wikimedia projects
Abstract: In order to advance knowledge equity as part of the Wikimedia
Movement’s 2030 strategic direction, the Research team at the Wikimedia
Foundation has been conducting research to “Address Knowledge Gaps” as one
of its main programs. One core component of this program is to develop
technologies to bridge knowledge gaps. In this talk, we give an overview of
how we approach this task using tools from Machine Learning in four
different contexts: section alignment in content translation, link
recommendation in structured editing, image recommendation in multimedia
knowledge gaps, and the equity of the recommendations themselves. We will
present how these models can assist contributors in addressing knowledge
gaps. Finally, we will discuss the impact of these models in applications
deployed across Wikimedia projects supporting different Product initiatives
at the Wikimedia Foundation.
More information:
* Section alignment:
meta:Research:Expanding_Wikipedia_articles_across_languages/Inter_language_approach#Section_Alignment
<https://meta.wikimedia.org/wiki/Research:Expanding_Wikipedia_articles_acros…>
* Link recommendation:
meta:Research:Link_recommendation_model_for_add-a-link_structured_task
<https://meta.wikimedia.org/wiki/Research:Link_recommendation_model_for_add-…>
* Image recommendation:
meta:Research:Recommending_Images_to_Wikipedia_Articles
<https://meta.wikimedia.org/wiki/Research:Recommending_Images_to_Wikipedia_A…>
* Equity in recommendations:
meta:Research:Prioritization_of_Wikipedia_Articles/Recommendation
<https://meta.wikimedia.org/wiki/Research:Prioritization_of_Wikipedia_Articl…>
--
Janna Layton (she/her)
Administrative Associate - Product & Technology
Wikimedia Foundation <https://wikimediafoundation.org/>
Hi,
If you're a researcher (whether from academia, industry, other sectors,
or independent), you'll probably be interested in participating in the
online session "Scientific greetings", which will be held this Sunday,
31 October, as part of the WikidataCon 2021
<https://www.wikidata.org/wiki/Wikidata:WikidataCon_2021>. In this
condensed session each researcher will have 5 minutes to present what
aspects of Wikidata they're studying or how Wikidata is useful for their
research, find out what other colleagues are working on, and ask for or
offer collaboration.
We're sending this email to inform you that prior registration is
required to present at this session, so we encourage you to follow the
steps below:
1. Sign up for the WikidataCon 2021
<https://www.wikidata.org/wiki/Wikidata:WikidataCon_2021> (online,
29-31 Oct) if you haven't already done so. It's free and requires no
personal data.
2. Add your name or username and, optionally, other details here
<https://www.wikidata.org/wiki/Wikidata:WikidataCon_2021/Program/Scientific_…>
as soon as possible. Slots will be allocated on a first-come,
first-served basis.
If you want to prepare slides, feel free to use this template
<https://docs.google.com/presentation/d/1XqQYDwfOnIAlhEjz__AJxg3lluYuNp3QOyo…>.
If you're planning a pre-recorded session, please upload it to YouTube
or Vimeo (unfortunately, we can't display videos from Commons).
Please feel free to share this email with anyone who might find it
useful, and write to us if you have any questions. We hope you enjoy the
session and the conference.
The organizers of the session,
Tiago, Gabriel and David
Apologies for cross-posting. The full release description, including
further statistics, can be found at
https://www.dbpedia.org/blog/snapshot-2021-09-release/.
We are pleased to announce the immediate availability of a new edition
of the free and publicly accessible SPARQL Query Service Endpoint and
Linked Data Pages for interacting with the new Snapshot Dataset.
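For example, the public endpoint at https://dbpedia.org/sparql can be
queried programmatically; here is a minimal sketch (the example query
and result handling are illustrative):

import requests

ENDPOINT = "https://dbpedia.org/sparql"

QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?abstract WHERE {
  dbr:DBpedia dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "en")
}
"""

# The endpoint accepts the query via GET and can return SPARQL JSON results.
resp = requests.get(
    ENDPOINT,
    params={"query": QUERY, "format": "application/sparql-results+json"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["abstract"]["value"])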
News since DBpedia Snapshot 2021-06
<https://www.dbpedia.org/blog/snapshot-2021-06-release/>
* Release notes are now maintained in the Databus Collection
(https://databus.dbpedia.org/dbpedia/collections/dbpedia-snapshot-2021-09)
* Image and Abstract Extractor was improved
* Work in progress: smoothing the community issue reporting and fixing
at GitHub (https://github.com/dbpedia/extraction-framework/issues/new/choose)
What is the “DBpedia Snapshot” Release?
Historically, this release has been associated with many names: "DBpedia
Core", "EN DBpedia", and — most confusingly — just "DBpedia". In fact,
it is a combination of —
* EN Wikipedia data — A small, but very useful, subset (~1 billion
triples, or 14%) of the whole DBpedia extraction
<https://link.springer.com/chapter/10.1007/978-3-030-59833-4_1> using
the DBpedia Information Extraction Framework
<https://github.com/dbpedia/extraction-framework> (DIEF), comprising
structured information extracted from the English Wikipedia plus some
enrichments from other Wikipedia language editions, notably
multilingual abstracts in ar, ca, cs, de, el, eo, es, eu, fr, ga,
id, it, ja, ko, nl, pl, pt, sv, uk, ru, zh.
* Links — 62 million community-contributed cross-references and
owl:sameAs links to other linked data sets on the Linked Open Data
(LOD) Cloud, which make it possible to effectively find and retrieve
further information from the largest, decentralized, change-sensitive
knowledge graph on earth, formed around DBpedia since 2007.
* Community extensions — Community-contributed extensions such as
additional ontologies and taxonomies.
Release Frequency & Schedule
Going forward, releases will be scheduled for the 15th of February, May,
July, and October (with +/- 5 days tolerance), and are named using the
same date convention as the Wikipedia Dumps that served as the basis for
the release. An example of the release timeline is shown below:
September 6–8: Wikipedia dumps for September 1 become available on
https://dumps.wikimedia.org/
September 8–20: Download and extraction with DIEF
September 20 – October 10: Post-processing and quality-control period
October 10–20: Linked Data and SPARQL endpoint deployment
Data Freshness
Given the timeline above, the EN Wikipedia data of DBpedia Snapshot has
a lag of 1-4 months.
Further Information
Growth of DBpedia, breakdown of links by domain, download instructions
and some tips on how to effectively work with DBpedia are published as
part of this blog post:
https://www.dbpedia.org/blog/snapshot-2021-09-release/
Stay tuned and stay safe!
With kind regards,
The DBpedia Association
The Journal of Web Semantics (JWS) invites submissions for a special
issue on Community-based Knowledge Bases and Knowledge Graphs, edited by
Tim Finin, Sebastian Hellmann, David Martin, and Elena Simperl. (contact
email: cbkb@cs.umbc.edu). Submissions are due by November 1, 2021.
Please see the JWS post here:
http://www.websemanticsjournal.org/2021/06/cfp-community-based-knowledge-ba…
Introduction
Community-based knowledge bases (KBs) and knowledge graphs (KGs) are
critical to many domains. They contain large amounts of information,
used in applications as diverse as search, question-answering systems,
and conversational agents. They are the backbone of linked open data,
helping connect entities from different datasets. Finally, they create
rich knowledge engineering ecosystems, making significant, empirical
contributions to our understanding of KB/KG science, engineering, and
practices. From here forward, we use "KB" to include both knowledge
bases and knowledge graphs. Also, "KB" and "knowledge" encompass both
ontology/schema and data.
Community-based KBs come in many shapes and sizes, but they tend to
share a number of commonalities:
* They are created through the efforts of a group of contributors,
following a set of agreed goals, policies, practices, and quality norms.
* They are available under open licenses.
* They are central to knowledge-sharing networks bringing together
various stakeholders.
* They serve the needs of a community of users, including, but not
restricted to, their contributor base.
* Many draw their content from crowdsourced resources (such as
Wikipedia, OpenStreetMap).
Examples of community-based KBs include Wikidata, DBpedia, ConceptNet,
GeoNames, FrameNet, and Yago. This special issue will highlight recent
research, challenges, and opportunities in the field of community-based
KBs and the interaction and processes between stakeholders and the KBs.
We welcome papers on a wide variety of topics. Papers that focus on the
participation of a community of contributors are especially encouraged.
Topics of interest
We are looking for studies, frameworks, methods, techniques and tools on
topics such as the following:
* The impact of community involvement on characteristics of KBs such
as requirements, design, technology choices, policies, etc. For
example, how are KB characteristics driven by the community and
reflective of the community's needs?
* Conversely, the impact of KB characteristics on community
involvement. For example, how do changes in these characteristics
affect the participation and behavior of members of the community?
* Organizational challenges and solutions in developing and managing
community-based KBs.
* Technical challenges and solutions in community-based KBs,
concerning a technical area such as:
  o Representation of knowledge and logical foundations
  o Reasoning, querying, and constraint-checking
  o Knowledge acquisition
  o Knowledge preparation (e.g., cleaning, deduplication, alignment,
    merging)
  o Maintaining consistency with external sources
  o Representing and managing metadata (including issues involved in
    adding metadata to relation instances)
  o Provenance
  o Quality assurance
* User interfaces and experience, both for contributing to the KB and
using it, by different user groups.
* Implemented metrics and quality tests to guide the community in
improving KG quality and expanding KG coverage.
* Achieving and managing knowledge diversity, for instance, in the
form of multilinguality, multi-cultural coverage, multiple points of
view, and a diverse and inclusive contributor base.
* Detecting and avoiding malicious, inappropriate, and misleading
content in community-based KBs.
* Biases in community-based KBs and their impact on downstream uses of
KB content.
* Community-based KBs in science, medicine, law, government, or other
domains.
* Handling specialized types of knowledge (such as commonsense,
probabilistic, or linguistic knowledge) in a community setting.
* Methods and tools to manage KB evolution, including change
detection, change management, conflict resolution, and visualization of
change history.
* Tools and affordances supporting community or collaborative
activities, including discussions, feedback, decision making, task
allocation, etc.
* Motivations and incentives affecting community participation.
* Approaches and metrics for community health, including but not
restricted to community growth or diversity.
* Roles and participation profiles in communities building and
maintaining KBs.
* Frameworks and approaches to support group decision-making and
resolve conflicts.
Types of Papers
We invite submission of Research, Survey, Ontology, and System papers,
according to the guidelines given at https://www.jws-volumes.com/.
Submission Guidelines
The Journal of Web Semantics solicits original scientific contributions
of high quality. Following the overall mission of the journal, we
emphasize the publication of papers that combine theories, methods and
experiments from different subject areas in order to deliver innovative
semantic methods and applications. The publication of large-scale
experiments and their analysis is also encouraged to clearly illustrate
scenarios and methods that introduce semantics into existing Web
interfaces, contents and services.
Submission of your manuscript is welcome provided that it, or any
translation of it, has not been copyrighted or published and is not
being submitted for publication elsewhere.
Manuscripts should be prepared for publication in accordance with
instructions given in the JWS guide for authors
<http://www.elsevier.com/journals/journal-of-web-semantics/1570-8268/guide-f…>.
The submission and review process will be carried out using Elsevier's
Web-based EM system
<https://www.editorialmanager.com/JOWS/default.aspx>. Please state the
name of the SI in your cover letter and, at the time of submission,
select “VSI:CBKB” when you reach the Article Type selection.
Upon acceptance of an article, the author(s) will be asked to transfer
copyright of the article to the publisher. This transfer will ensure the
widest possible dissemination of information. Elsevier's liberal
preprint policy
<https://www.elsevier.com/authors/journal-authors/submit-your-paper/sharing-…>
permits authors and their institutions to host preprints on their web
sites. Preprints of the articles will be made freely accessible via JWS
First Look
<https://papers.ssrn.com/sol3/JELJOUR_Results.cfm?form_name=journalbrowse&jo…>.
Final copies of accepted publications will appear in print and at
Elsevier's archival online server.
Important Dates
* Submission deadline: November 1, 2021
* Author notification: February 7, 2022
* Minor revisions due: February 21, 2022
* Major revisions due: March 14, 2022
* Papers appear on JWS preprint server: May 2, 2022
* Publication: Fall or Winter 2022
Guest Editors
Tim Finin is the Willard and Lillian Hackerman Chair in Engineering and
a Professor of Computer Science and Electrical Engineering at the
University of Maryland, Baltimore County (UMBC).
Sebastian Hellmann is the head of the “Knowledge Integration and
Language Technologies (KILT)" Competence Center at InfAI, Leipzig. He
is also the executive director and a board member of the non-profit
DBpedia Association, with over 30 key players
<https://www.dbpedia.org/members/overview/> in the knowledge graph area.
He was ranked in AMiner’s top 10 most influential scholars in knowledge
engineering of the last decade.
David L. Martin is a Research & Development Scientist in Artificial
Intelligence. He has held positions at SRI International, Siri, Inc.,
Apple, Nuance Communications, Samsung Research America, and the
University of California at Santa Cruz. He is a Senior Member of the
Association for the Advancement of Artificial Intelligence and
currently works as an independent consultant in Silicon Valley, California.
Elena Simperl is a professor of computer science at King’s College
London, a Fellow of the British Computer Society, and a former Turing
Fellow. According to AMiner, she is among the top 100 most influential
scholars in knowledge engineering of the last decade, as well as in the
Women in AI 2000 ranking. Before joining King’s College, she held
positions at the University of Southampton, as well as in Germany and
Austria.
I am pleased to announce that Wikimedia Enterprise's HTML dumps [1] for
October 17–18 are available for public download; see
https://dumps.wikimedia.org/other/enterprise_html/ for more information. We
expect to make updated versions of these files available around the 1st/2nd
of the month and the 20th/21st of the month, following the cadence of the
standard SQL/XML dumps.
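If it helps anyone get started, here is a minimal sketch of reading one
of these dumps, assuming the tarred-NDJSON layout described on the
Wikimedia Enterprise page (the filename and field name below are
illustrative; check the dumps directory and the schema docs for the
real ones):

import json
import tarfile

# Illustrative filename; actual files are listed at
# https://dumps.wikimedia.org/other/enterprise_html/
DUMP = "enwiki-NS0-20211020-ENTERPRISE-HTML.json.tar.gz"

with tarfile.open(DUMP, "r:gz") as tar:
    for member in tar:
        stream = tar.extractfile(member)
        if stream is None:
            continue
        for line in stream:
            article = json.loads(line)  # one article object per NDJSON line
            print(article.get("name"))  # field name per the Enterprise schema
            break  # just peek at the first article
        break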
This is still an experimental service, so there may be hiccups from time to
time. Please be patient and report issues as you find them. Thanks!
Ariel "Dumps Wrangler" Glenn
[1] See https://www.mediawiki.org/wiki/Wikimedia_Enterprise for much more
about Wikimedia Enterprise and its API.
Hi all,
To help bridge Wikipedia’s visual knowledge gaps, the Research team
<https://research.wikimedia.org/> at the Wikimedia Foundation has launched
the “Wikipedia Image/Caption Matching Competition
<https://www.kaggle.com/c/wikipedia-image-caption>”.
Read on for more information or check out our blog post
<https://diff.wikimedia.org/2021/09/13/the-wikipedia-image-caption-matching-…>
!
Images are essential for knowledge sharing, learning, and understanding.
However, the majority of images in Wikipedia articles lack written context
(e.g., captions, alt-text), often making them inaccessible. As part of our
initiatives <https://research.wikimedia.org/knowledge-gaps.html> to address
Wikipedia’s knowledge gaps, the Research <https://research.wikimedia.org/>
team at the Wikimedia Foundation is hosting the “Wikipedia Image/Caption
Matching Competition <https://www.kaggle.com/c/wikipedia-image-caption>.”
We invite the communities of volunteers, developers, data scientists, and
machine learning enthusiasts to develop systems that can automatically
associate images with their corresponding captions and article titles.
In this competition (hosted on Kaggle <https://www.kaggle.com/>),
participants are provided with content from Wikipedia articles in 100+
language editions and are asked to build systems that automatically
retrieve the text (an image caption, or an article title) closest to a
query image. The data is a combination of Google AI’s recently released
WIT dataset <https://github.com/google-research-datasets/wit> and a new
dataset of 6 million images from Wikimedia Commons that we have released
<https://analytics.wikimedia.org/published/datasets/one-off/caption_competit…>
for this competition. Kaggle is hosting all data needed to get started with
the task, example notebooks, a forum for participants to share and
collaborate, and submitted models in open-sourced formats.
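For anyone curious what such a system might look like, a rough baseline
(not the competition's official method; the model choice and example
data are illustrative) is to embed the query image and the candidate
texts in a shared space and rank by similarity, e.g., with an open CLIP
model:

# Generic image-text retrieval baseline using an open CLIP model via
# sentence-transformers; model choice and example data are illustrative.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("clip-ViT-B-32")

# Candidate texts (captions or article titles) and a query image.
candidates = ["A cat sleeping on a sofa", "The Eiffel Tower at night"]
image_embedding = model.encode(Image.open("query.jpg"), convert_to_tensor=True)
text_embeddings = model.encode(candidates, convert_to_tensor=True)

# Rank candidates by cosine similarity to the image embedding.
scores = util.cos_sim(image_embedding, text_embeddings)[0]
best = int(scores.argmax())
print(candidates[best], float(scores[best]))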
We encourage everyone to download our data and participate in the
competition. This challenge is an opportunity for people around the world
to grow their technical skills while increasing the accessibility of
Wikipedia.
This competition is possible thanks to collaborations with Google Research
<https://research.google/>, EPFL <https://www.epfl.ch/en/>, Naver Labs
Europe <https://europe.naverlabs.com/> and Hugging Face
<https://huggingface.co/>, who assisted with data preparation and
competition design. Check out our blog post
<https://diff.wikimedia.org/2021/09/13/the-wikipedia-image-caption-matching-…>
for more information! The point of contact for this project is Miriam Redi.
You're welcome to reach out with questions or comments at
miriam@wikimedia.org.
Cheers,
Emily Lescak, on behalf of the Research team
--
Emily Lescak (she / her)
Senior Research Community Officer
The Wikimedia Foundation