Hi everyone,
A while ago we ran a survey among reusers about the different types of
ontology issues they face when building applications and other things
with data from Wikidata. The results are available now. More details
here: https://www.wikidata.org/wiki/Wikidata_talk:Ontology_issues_prioritization#…
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher - WD:Q18016466
Portfolio Lead for Wikidata
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
https://wikimedia.de
Wikimedia Deutschland – Gesellschaft zur Förderung Freien Wissens e.
V. Eingetragen im Vereinsregister des Amtsgerichts
Berlin-Charlottenburg unter der Nummer 23855 B. Als gemeinnützig
anerkannt durch das Finanzamt für Körperschaften I Berlin,
Steuernummer 27/029/42207.
Hi all,
The next Research Showcase, with the theme of *Wikimedia and LGBTQIA+*,
will be live-streamed Wednesday, June 21 at 16:30 UTC. Find your local time
here <https://zonestamp.toolforge.org/1687365012>.
YouTube stream: https://www.youtube.com/watch?v=AOD2ZdxRNfo
You can join the conversation on IRC at #wikimedia-research or on the
YouTube chat.
This month's presentations:
- *Multilingual Contextual Affective Analysis of LGBT People Portrayals
in Wikipedia*
- *Speaker*: Chan Park, Carnegie Mellon University
- *Abstract*: In this talk, I present our research on analyzing the
portrayal of LGBT individuals in their biographies on Wikipedia, with a
particular focus on subtle word connotations and cross-cultural
comparisons. We aim to address two primary research questions: 1) How can
we effectively measure the nuanced connotations of words in multilingual
texts, which reflect sentiments, power dynamics, and agency? 2) How can we
analyze the portrayal of a specific group, such as the LGBT community, and
compare these portrayals across different languages? To answer these
questions, we collect the Multilingual Contextualized Connotation Frames
dataset, comprising 2,700 examples in English, Spanish, and Russian. We
also develop a new multilingual model based on pre-trained multilingual
language models. Additionally, we devise a matching algorithm to construct
a comparison corpus for the target corpus, isolating the attribute of
interest. Finally, we showcase how our developed models and constructed
corpora enable us to conduct cross-cultural analysis of LGBT People
Portrayals on Wikipedia. Our results reveal systematic differences in how
the LGBT community is portrayed across languages, surfacing cultural
differences in narratives and signs of social biases.
- *Paper:* Park, C. Y., Yan, X., Field, A., & Tsvetkov, Y. (2021,
May). Multilingual contextual affective analysis of LGBT people portrayals
in Wikipedia. In Proceedings of the International AAAI Conference on Web
and Social Media (Vol. 15, pp. 479-490).
<https://arxiv.org/pdf/2010.10820.pdf>
- *Visual gender biases in Wikipedia: A systematic evaluation across the
ten most spoken languages*
- *Speaker*: Daniele Metilli, University College London
- *Abstract*: Wikidata Gender Diversity (WiGeDi) is a one-year
project funded through the Wikimedia Research Fund. The project is studying
gender diversity in Wikidata, focusing on marginalized gender identities
such as those of trans and non-binary people, and adopting a queer and
intersectional feminist perspective. The project is organised in three
strands — model, data, and community. First, we are looking at how the
current Wikidata ontology model represents gender, and the extent to which
this representation is inclusive of marginalized gender identities. We are
analysing the data stored in the knowledge base to gather insights and
identify possible gaps and biases. Finally, we are looking at how the
community has handled the move towards the inclusion of a wider spectrum of
gender identities by studying a corpus of user discussions through
computational linguistics methods. This presentation will report on the
current status of the Wikidata Gender Diversity project and the envisioned
outcomes. We will discuss the main challenges that we are facing and the
opportunities that our project will potentially enable, on Wikidata and
beyond.
- *Paper:* Metilli D. & Paolini C. (in press). ‘Non-binary gender
representation in Wikidata’. In: Provo A., Burlingame K. & Watson B.M.
Ethics in Linked Data. Litwin Books. <https://wigedi.com/chapter.pdf>
You can watch our past Research Showcases here:
https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase
Hope you can join us!
Warm regards,
--
*Pablo Aragón (he/him)*
Research Scientist
Wikimedia Foundation
https://research.wikimedia.org
Hello all!
The Search Platform Team usually holds an open meeting on the first
Wednesday of each month. Come talk to us about anything related to
Wikimedia search, Wikidata Query Service (WDQS), Wikimedia Commons Query
Service (WCQS), etc.!
Feel free to add your items to the Etherpad Agenda for the next meeting.
Details for our next meeting:
Date: Wednesday, May 3, 2023
Time: 15:00-16:00 UTC / 08:00 PDT / 11:00 EDT / 17:00 CEST
Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
Google Meet link: https://meet.google.com/vgj-bbeb-uyi
Join by phone: https://tel.meet/vgj-bbeb-uyi?pin=8118110806927
Have fun and see you soon!
Guillaume
--
*Guillaume Lederrey* (he/him)
Engineering Manager
Wikimedia Foundation <https://wikimediafoundation.org/>
We are very excited that we will conduct the fourth Wikidata workshop this
year and would be very happy to see a lot of submissions from the Wikidata
community! Please see the call for papers below!
The Fourth Wikidata Workshop
Call for Papers
Co-located with the 22nd International Conference on Semantic Web (ISWC
2023).
Date: November 6 or 7, 2023
The format of the workshop will be announced soon
Website: https://wikidataworkshop.github.io/2023/
== Important dates ==
Papers due: Thursday, 20 July 2023
Notification of accepted papers: Thursday, August 31, 2023
Camera-ready papers due: Thursday, September 7, 2023
Workshop date: November 06/07, 2023
== Overview ==
Wikidata is an openly available knowledge base hosted by the Wikimedia
Foundation. It can be accessed and edited by both humans and machines and
acts as a common structured data repository for several Wikimedia projects,
including Wikipedia, Wiktionary, and Wikisource. It is used in a variety of
applications by researchers and practitioners alike.
In recent years, we have seen an increase in the number of publications
around Wikidata. While there are several dedicated venues for the broader
Wikidata community to meet, none of them focuses on publishing original,
peer-reviewed research. This workshop fills this gap - we hope to provide a
forum to build this fledgling scientific community and promote novel work
and resources that support it.
The workshop primarily seeks original contributions that address the
opportunities and challenges of creating, contributing to, and using a
global, collaborative, open-domain, multilingual knowledge graph such as
Wikidata.
We encourage a range of submissions, including novel research, opinion
pieces, and descriptions of systems and resources which are naturally
linked to Wikidata and its ecosystem or enabled by it. What we are less
interested in are works that use Wikidata alongside or in lieu of other
resources to carry out some computational task - unless the work feeds back
into the Wikidata ecosystem, for instance, by improving or commenting on
some Wikidata aspect, or suggesting new design features, tools, and
practices.
This year, we again added a track for already published work. To foster
conversations around the topic of Wikidata, we invite authors of papers
published at other conferences to submit their papers to present at the
workshop. These will not be included in the proceedings but give authors a
chance to interact with the community.
We welcome interdisciplinary work, as well as interesting applications that
shed light on the benefits of Wikidata and discuss areas of improvement.
The workshop is planned as an interactive half-day event, in which most of
the time will be dedicated to discussions and exchanges rather than oral
presentations. For this reason, all accepted papers will be presented in
short talks and accompanied by a poster.
== Topics ==
Topics of submissions include, but are not limited to:
- Data quality and vandalism detection in Wikidata
- Referencing in Wikidata
- Anomaly, bias, or novelty detection in Wikidata
- Algorithms for aligning Wikidata with other knowledge graphs
- The Semantic Web and Wikidata
- Community interaction in Wikidata
- Multilingual aspects of Wikidata
- Using LLMs with Wikidata
- Innovative uses of AI and NLP applications for Wikidata
- Machine learning approaches to improve data quality in Wikidata
- Tools, bots, and datasets for improving or evaluating Wikidata
- Participation, diversity, and inclusivity aspects in the Wikidata
ecosystem
- Human-bot interaction
- Managing knowledge evolution in Wikidata
- Abstract Wikipedia
== Submission guidelines ==
We welcome the following types of contributions.
= Track 1: Novel Works =
The papers in this track will be peer-reviewed by at least three
researchers using a single-blind review process. Accepted papers will be
published as open-access papers on CEUR (authors can also waive this). We
invite the following types of papers:
- Full research paper: Novel research contributions (7-12 pages)
- Short research paper: Novel research contributions of smaller scope than
full papers (3-6 pages)
- Position paper: Well-argued ideas and opinion pieces, not yet in the
scope of a research contribution (6-8 pages)
- Resource paper: New dataset or other resources directly relevant to
Wikidata, including the publication of that resource (8-12 pages)
- Demo paper: New system critically enabled by Wikidata (6-8 pages)
Submissions must be in PDF or HTML, formatted in the style of the Springer
Publications format for Lecture Notes in Computer Science (LNCS). For
details on the LNCS style, see Springer’s Author Instructions.
Papers have to be submitted through OpenReview (please add “[NOVEL]” at the
beginning of the title on the submission page so we know that you are
submitting to this track):
https://openreview.net/group?id=swsa.semanticweb.org/ISWC/2023/Workshop/Wik…
= Track 2: Published works =
This track welcomes papers previously published at a peer-reviewed research
venue to be presented and discussed in the workshop. They do not have to
follow the formatting and page limit instructions from Track 1 and can
instead be submitted in the original format.
Previously published papers will be reviewed by the organising committee in
terms of the topical fit and prominence of the publication venue. They will
not be published as part of the proceedings. We invite the following types
of papers:
- Full research paper: Previously published research contributions
- Resource paper: Previously published datasets or other resources that are
important or interesting to the community
- Demo paper: Presenting a previously published system critically enabled
by Wikidata
Papers have to be submitted through OpenReview (please add “[PUBLISHED]” at
the beginning of the title on the submission page so we know that you are
submitting to this track):
https://openreview.net/group?id=swsa.semanticweb.org/ISWC/2023/Workshop/Wik…
== Proceedings ==
The complete set of papers from the Novel Works Track will be published
with the CEUR Workshop Proceedings (CEUR-WS.org).
== Best Paper Award ==
We will recognize the best paper with the best paper award. Reviewers will
be asked to flag papers they deem worthy of a prize. The general chairs
will set up a small panel that will read the papers, consider the
reviewers' comments and assess the talk to determine the winner. The award
comes with a 500 € prize, sponsored by Robert Bosch GmbH.
== Organizing committee ==
Lucie-Aimée Kaffee, Hasso Plattner Institute, Lucie-Aimee.Kaffee[[(a)]]hpi.de
Simon Razniewski, Bosch Center for AI, Simon.Razniewski[[(a)]]de.bosch.com
Kholoud Alghamdi, King's College London, kholoud.alghamdi[[(a)]]kcl.ac.uk
Hiba Arnaout, Max Planck Institute for Informatics,
harnaout[[(a)]]mpi-inf.mpg.de
--
Lucie-Aimée Kaffee
Hi all,
Too many items on Wikidata still lack basic statements. Perhaps we can
focus together for a short period of time on a single subject to get this
fixed.
For example: all items with instance of (P31) tourist attraction (like
museums) should also contain the country (P17) where they are located.
In the past day I worked on this and got it down from 4689 to 1190 items,
but help is welcome.
Query:
https://query.wikidata.org/#SELECT%20%3Fitem%20%3FitemLabel%20%3FitemDescri…
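For anyone who wants to run the same kind of check for another class or
property, here is a minimal sketch of how such a query and its Query Service
link could be built. It is not the exact query behind the link above, which
is truncated here. The function names are hypothetical, and Q570116 as the
item ID for "tourist attraction" is an assumption that should be verified on
Wikidata before use.

```python
from urllib.parse import quote


def build_missing_property_query(class_qid: str, missing_pid: str, limit: int = 100) -> str:
    """Return SPARQL listing items of a class that lack a given property.

    Hypothetical helper: matches direct "instance of" (P31) values only.
    """
    return (
        "SELECT ?item ?itemLabel ?itemDescription WHERE {\n"
        f"  ?item wdt:P31 wd:{class_qid} .\n"
        f"  FILTER NOT EXISTS {{ ?item wdt:{missing_pid} ?value . }}\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
        f"LIMIT {limit}"
    )


def query_service_link(sparql: str) -> str:
    """Percent-encode the query and place it after the '#' of the Query Service URL."""
    return "https://query.wikidata.org/#" + quote(sparql, safe="")


# Assumed ID for "tourist attraction"; P17 is country, as named in the text.
query = build_missing_property_query("Q570116", "P17")
print(query)
print(query_service_link(query))
```

Because this sketch only matches direct P31 values, items typed with a
subclass (such as a specific kind of museum) would need a path like
wdt:P31/wdt:P279* instead, which can be considerably slower.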
Who can help to get this number down further?
Thanks!
Romaine(or