Hi everyone,
This is to announce that over the past month we have been looking at ways to
help us all get a better understanding of the quality of Wikidata's data in
a specific area of interest. For this purpose we built two tools, an Item
Quality Evaluator and a Constraint Violation Checker, both of which are now
available here:
- Item Quality Evaluator <https://item-quality-evaluator.toolforge.org>
- Constraint Violation Checker
<https://github.com/wmde/wikidata-constraints-violation-checker>
Data quality on Wikidata has many aspects. The constraint violations and
ORES quality scores that these tools use are two useful indicators of
certain aspects of quality, and we hope they will be helpful for you.
As you may know, Wikidata’s data quality is very unevenly distributed -
some areas are very well maintained and others not so much. We currently
provide ORES quality scores only at a global and per-Item level. This has
two consequences:
- Editors taking care of a specific area of Wikidata want to improve that
area, but currently have no easy way to find the lowest-quality Items on
which to focus their time in order to raise the quality of that area.
- Re-users of Wikidata’s data are usually only interested in a subset of
Wikidata’s Items and, by extension, the quality of that subset. It is
currently hard for them to know what quality level they are getting for
their subset of interest.
To address this issue we put together two small tools. The Item Quality
Evaluator is a simple website that provides ORES quality scores for a list
of Items in Wikidata. The Constraint Violation Checker is a small
command-line script that retrieves the number of constraint violations and
ORES scores for a list of Items for further analysis.
How does the Item Quality Evaluator tool work?
You provide it with a list of Item IDs or a SPARQL query, and it fetches
the ORES score for each Item as well as the average score over all the
Items you provided, presented on a simple webpage. This way you can more
easily identify the lowest-quality Items in an area you are interested in
and improve them.
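The averaging step can be sketched in a few lines. ORES's itemquality model assigns each Item revision a probability distribution over the quality classes A (best) through E (worst); one way to collapse that into a single number is an expected-class score. This is a minimal, hypothetical sketch in Python - the A=5…E=1 weighting and the example probability distributions are assumptions for illustration, not the tool's actual implementation:

```python
# Hypothetical sketch: collapse ORES "itemquality" class probabilities
# (classes A = best ... E = worst) into one numeric score per Item, then
# average over a set of Items. The weighting A=5 ... E=1 is an assumption.

CLASS_WEIGHTS = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}

def weighted_score(probabilities):
    """Expected quality class as a number between 1 (all E) and 5 (all A)."""
    return sum(CLASS_WEIGHTS[cls] * p for cls, p in probabilities.items())

def average_score(items):
    """Average weighted score over a {item_id: probabilities} mapping."""
    return sum(weighted_score(p) for p in items.values()) / len(items)

# Made-up probability distributions for two Items:
items = {
    "Q42":   {"A": 0.70, "B": 0.20, "C": 0.05, "D": 0.03, "E": 0.02},
    "Q1234": {"A": 0.05, "B": 0.10, "C": 0.20, "D": 0.30, "E": 0.35},
}
for item_id, probs in items.items():
    print(item_id, round(weighted_score(probs), 2))
print("average:", round(average_score(items), 2))
```

A list sorted by this score ascending would surface the lowest-quality Items in a subset first, which is the workflow the tool supports.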
How does the Constraint Violation Checker script work?
When you run it, it outputs a CSV file with, for each Item: the number of
statements, the number of constraint violations at each severity level,
the number of sitelinks (to all projects and to Wikipedia), and the ORES
score.
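Because the script emits plain CSV, its output is easy to post-process with standard tools. The snippet below is a hypothetical sketch of ranking Items by violations - the column names and sample values are invented for illustration, so check them against the header the script actually writes:

```python
# Hypothetical sketch: rank Items from the checker's CSV output by
# constraint violations, mandatory-severity violations first.
# Column names below are assumptions, not the script's documented header.
import csv
import io

sample_csv = """item_id,statements,violations_mandatory,violations_normal,violations_suggestion,sitelinks,wikipedia_sitelinks,ores_score
Q42,120,0,3,5,60,50,4.2
Q7251,95,2,7,1,40,35,3.1
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

def total_violations(row):
    return sum(int(row[c]) for c in
               ("violations_mandatory", "violations_normal",
                "violations_suggestion"))

# Sort worst-first: mandatory violations break ties over the total count.
worst_first = sorted(rows,
                     key=lambda r: (int(r["violations_mandatory"]),
                                    total_violations(r)),
                     reverse=True)
for r in worst_first:
    print(r["item_id"], total_violations(r))
```

In the same way, the violation counts could be normalized per statement to compare Items of very different sizes.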
Why didn't we integrate the constraint violation data into the Item
Quality Evaluator?
We want to do that in the long term, but right now it is not possible
because the constraint violation data is not easily accessible, and
retrieving it for a large list of Items takes several hours.
Please try these tools and let us know if you encounter any issues.
General feedback is very welcome as well.
Cheers,
--
Mohammed Sadat
*Community Communications Manager for Wikidata/Wikibase*
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
https://wikimedia.de
Keep up to date! Current news and exciting stories about Wikimedia,
Wikipedia and Free Knowledge in our newsletter (in German): Subscribe now
<https://www.wikimedia.de/newsletter/>.
Imagine a world in which every single human being can freely share in the
sum of all knowledge. Help us to achieve our vision!
https://spenden.wikimedia.de
Wikimedia Deutschland – Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/029/42207.
The on-wiki version of this is here:
https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/Focus_languages
Hello all,
The Wikidata team at Wikimedia Deutschland will be working on improvements
to the lexicographic data part of Wikidata during this year. The Abstract
Wikipedia team at the Wikimedia Foundation will be working on the
generation of natural language text for baseline Wikipedia articles in the
next few years, and on functions in Wikifunctions to work with
lexicographic data. For these cases, it would be beneficial to focus on a
small specific set of languages at first. Participating communities will
hopefully find that this project leads to long-term growth in Wikipedia and
Wiktionary in and about their language.
Lydia and Denny would like to choose the same focus languages for both
teams, since having this aligned is beneficial for both projects.
We will be working closely together with the focus communities over the
next few years. This means that features will land first in these languages
and we will have particularly active feedback channels. We are looking for
communities that are open to trying out new things.
The decision about which languages become focus languages should be made
together with the wider communities. In particular, we would like to make
the decision with a promising self-selecting community. This worked very
well for Wikidata, where the focus projects were self-selected.
We will use English as a demonstration language and two or three other
languages as focus languages. English is chosen as it is easy to
demonstrate to a wide audience and is a working language for both
development teams.
For the focus languages, we want to work with an active and enthusiastic
community or seed of a community over the next few years on these projects.
In order to be fully transparent, we have compiled a number of other
detailed criteria
<https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/Focus_languages…>
that we would like to use to guide our decision, but this assumes that
there are communities to choose from. None of these criteria are set in
stone, and we are happy to discuss them, remove some if they are not good
ideas, or add others if we missed something. Regard this as a strawdog
proposal. For example, Mahir Morshed
<http://meta.wikimedia.org/wiki/User:Mahir256> came up with a complementary
set of criteria on Phabricator
<https://phabricator.wikimedia.org/T274373#6821602>, which we will consider
in the selection as well. We will have Q&A office hours for discussion, and
are open to comments via wiki
<https://www.wikidata.org/wiki/Wikidata_talk:Lexicographical_data/Focus_lang…>
or email.
We are thinking of a two-pronged approach:
- first, to call for communities to propose themselves to work with us;
- second, to look at the data and see which languages would be good
candidates.
We don’t want to set too strict a process. We would like the second prong
of the approach to go on throughout the whole process to help us come to a
good understanding of the options.
For the first prong, we would like the candidate seed groups to describe
and nominate themselves on wiki, following a short form
<https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/Focus_languages…>.
Nominations should be submitted by April 7, and the decision will be made
by the teams by April 14, taking your comments into account. If we notice
that self-nominations are not happening, we will try to engage with
language communities directly.
It is possible that the two teams will choose different candidates,
although we will try to avoid that.
We are looking forward to hearing about what you think of this proposal.
Please comment on the talk page on wiki
<https://www.wikidata.org/wiki/Wikidata_talk:Lexicographical_data/Focus_lang…>
.
Lydia and Denny
Hi everyone,
We are delighted to announce that Wiki Workshop 2021 will be held
virtually in April 2021 and as part of the Web Conference 2021 [1].
The exact day is still to be finalized, but we know it will be between
April 19 and 23.
In past years, Wiki Workshop has traveled to Oxford, Montreal,
Cologne, Perth, Lyon, and San Francisco, and (virtually) to Taipei.
Last year, we had more than 120 participants in the workshop and we
are particularly excited about this year's as we will celebrate the
20th birthday of Wikipedia.
We encourage contributions by all researchers who study the Wikimedia
projects. We specifically encourage 1-2 page submissions of
preliminary research. You will have the option to publish your work as
part of the proceedings of The Web Conference 2021.
You can read more about the call for papers and the workshop at
http://wikiworkshop.org/2021/#call. Please note that the deadline for
the submissions to be considered for proceedings is January 29. All
other submissions should be received by March 1.
If you have questions about the workshop, please let us know on this
list or at wikiworkshop(a)googlegroups.com.
Looking forward to seeing many of you in this year's edition.
Best,
Miriam Redi, Wikimedia Foundation
Bob West, EPFL
Leila Zia, Wikimedia Foundation
[1] https://www2021.thewebconf.org/
Call for Proposals: 2021 LD4 Conference on Linked Data, “Building Connections Together”
Submission deadline: Monday, April 12, 2021
Website: http://bit.ly/ld42021
The deadline to propose content for the 2021 LD4 Conference is fast approaching! All proposals are due Monday, April 12, by 11:59 pm PST.
The 2021 LD4 Conference<http://bit.ly/ld42021> will be held online from July 12-23. We are accepting proposals for content in a variety of formats and we invite you to submit your proposal<https://forms.gle/h8BVmSHrre8h4Xsy7> to be part of creating this conference! By bringing together a broad range of perspectives, the conference seeks to foster a community of practice for linked data in cultural heritage institutions.
The conference will include activities tailored to all levels of experience with linked data, with a focus on themes of:
* Linked data education
* Inclusion of diverse voices
* Practical steps toward linked data adoption
* Reliability and availability of linked data
* Incorporating linked data into day-to-day library operations
* Linked data advocacy
We are accepting proposals for content in a variety of formats and especially encourage proposals (conference or pre-conference) from participants from groups and regions that are traditionally underrepresented in conferences related to linked data in libraries and other cultural heritage organizations, as well as proposals from early career professionals. Successful proposals will focus on concrete ways that linked data impacts GLAM (Galleries, Libraries, Archives, and Museums) institutions, and will share pathways that allow others to participate in linked data. Please see the full Call for Proposals<https://sites.google.com/stanford.edu/2021ld4conf/call-for-proposals?authus…> for additional details.
Questions about the conference and this call for proposals? See our website bit.ly/ld42021<http://bit.ly/ld42021> or contact 2021_ld4conf_chairs(a)googlegroups.com<mailto:2021_ld4conf_chairs@googlegroups.com>
Thanks,
Gloria
Gloria Gonzalez
Senior Agile Product Owner
Zepheira, a division of EBSCO
https://www.ebsco.com/
1st Call for Papers:
** The Second Wikidata Workshop **
Co-located with the 20th International Semantic Web Conference (ISWC
2021).
Date: October 24 or 25, 2021
The workshop will be held online, afternoon European time.
Website: https://wikidataworkshop.github.io/2021/
== Important dates ==
Papers due: Friday, July 30, 2021
Notification of accepted papers: Friday, September 24, 2021
Camera-ready papers due: Monday, October 4, 2021
Workshop date: October 24/25, 2021
== Overview ==
Wikidata is an openly available knowledge base, hosted by the Wikimedia
Foundation. It can be accessed and edited by both humans and machines
and acts as a common structured data repository for several Wikimedia
projects, including Wikipedia, Wiktionary, and Wikisource. It is used in
a variety of applications by researchers and practitioners alike.
In recent years, we have seen an increase in the number of publications
around Wikidata. While there are several dedicated venues for the
broader Wikidata community to meet, none of them focuses on publishing
original, peer-reviewed research. This workshop fills this gap - we hope
to provide a forum to build this fledgling scientific community and
promote novel work and resources that support it.
The workshop seeks original contributions that address the opportunities
and challenges of creating, contributing to, and using a global,
collaborative, open-domain, multilingual knowledge graph such as Wikidata.
We encourage a range of submissions, including novel research, opinion
pieces, and descriptions of systems and resources, which are naturally
linked to Wikidata and its ecosystem, or enabled by it. What we’re less
interested in are works which use Wikidata alongside or in lieu of other
resources to carry out some computational task - unless the work feeds
back into the Wikidata ecosystem, for instance by improving or
commenting on some Wikidata aspect, or suggesting new design features,
tools and practices.
We also encourage submissions on the topic of Abstract Wikipedia,
particularly around collaborative code management, natural language
generation by a community, the abstract representation of knowledge, and
the interaction between Abstract Wikipedia and Wikidata on the one hand,
and Abstract Wikipedia and the language Wikipedias on the other.
We welcome interdisciplinary work, as well as interesting applications
that shed light on the benefits of Wikidata and discuss areas of
improvement.
The workshop is planned as an interactive half-day event, in which most
of the time will be dedicated to discussions and exchange rather than
oral presentations. For this reason, all accepted papers will be
presented in short talks and accompanied by a poster. All works will be
presented online.
== Topics ==
Topics of submissions include, but are not limited to:
- Data quality and vandalism detection in Wikidata
- Referencing in Wikidata
- Anomaly, bias, or novelty detection in Wikidata
- Algorithms for aligning Wikidata with other knowledge graphs
- The Semantic Web and Wikidata
- Community interaction in Wikidata
- Multilingual aspects in Wikidata
- Machine learning approaches to improve data quality in Wikidata
- Tools, bots and datasets for improving or evaluating Wikidata
- Participation, diversity and inclusivity aspects in the Wikidata ecosystem
- Human-bot interaction
- Managing knowledge evolution in Wikidata
- Abstract Wikipedia
== Submission guidelines ==
We welcome the following types of contributions.
- Full research paper: Novel research contributions (7-12 pages)
- Short research paper: Novel research contributions of smaller scope
than full papers (3-6 pages)
- Position paper: Well-argued ideas and opinion pieces, not yet in the
scope of a research contribution (6-8 pages)
- Resource paper: New dataset or other resources directly relevant to
Wikidata, including the publication of that resource (8-12 pages)
- Demo paper: New system critically enabled by Wikidata (6-8 pages)
Submissions must be in PDF or HTML, formatted in the style of the
Springer Publications format for Lecture Notes in Computer Science
(LNCS). For details on the LNCS style, see Springer’s Author Instructions.
Each paper will be peer-reviewed by at least three researchers. Accepted
papers will be published as open access papers on CEUR (we will only
publish to CEUR if the authors agree to have their papers published).
Papers must be submitted through EasyChair:
https://easychair.org/conferences/?conf=wikidata21
== Proceedings ==
The complete set of papers will be published with the CEUR Workshop
Proceedings (CEUR-WS.org).
== Organizing committee ==
Lucie-Aimée Kaffee, University of Southampton,
lucie.kaffee(a)gmail.com
Simon Razniewski, Max Planck Institute for Informatics,
srazniew(a)mpi-inf.mpg.de
Aidan Hogan, University of Chile,
ahogan(a)dcc.uchile.cl
--
Lucie-Aimée Kaffee
The Search Platform Team
<https://www.mediawiki.org/wiki/Wikimedia_Search_Platform> usually holds
office hours the first Wednesday of each month. Come talk to us about
anything related to Wikimedia search, Wikidata Query Service, Wikimedia
Commons Query Service, etc.!
Feel free to add your items to the Etherpad Agenda for the next meeting.
Details for our next meeting:
Date: Wednesday, April 7th, 2021
Time: 15:00-16:00 GMT / 08:00-09:00 PDT / 11:00-12:00 EDT / 17:00-18:00 CEST
Etherpad: https://etherpad.wikimedia.org/p/Search_Platform_Office_Hours
Google Meet link: https://meet.google.com/vyc-jvgq-dww
Join by phone in the US: +1 786-701-6904 PIN: 262 122 849#
Hope to talk to you tomorrow!
Trey Jones
Sr. Computational Linguist, Search Platform
Wikimedia Foundation
UTC–4 / EDT