Wikidata February 2020

wikidata@lists.wikimedia.org

45 participants
36 discussions

LDF server missing language tags for labels
by Osma Suominen 05 Feb '20

Hi,

I tried to use the Wikidata LDF endpoint [1] to access label data about individual entities, instead of using the SPARQL endpoint (possibly unreliable) or looking up URIs directly (which can be very slow when there are many statements about the entity). However, I noticed that the LDF endpoint does not return the language tags of labels. This makes the service rather pointless for me, because I'm interested in the labels in a specific language.

For example, the first example query in the WDQS User Manual section on the LDF endpoint [2], about the entity Q146 (cat), returns triples like these:

Q146 label "பூனை"^^http://www.w3.org/1999/02/22-rdf-syntax-ns#langString.
Q146 label "amcic"^^http://www.w3.org/1999/02/22-rdf-syntax-ns#langString.
Q146 label "chat"^^http://www.w3.org/1999/02/22-rdf-syntax-ns#langString.
Q146 label "gat"^^http://www.w3.org/1999/02/22-rdf-syntax-ns#langString.

All the label and description values have the datatype rdf:langString but no language tag. I suspect this is somehow related to RDF 1.1, which introduced the rdf:langString datatype.

Is there any chance of having this fixed, or should I just rely on the other endpoints?

Best,
Osma

[1] https://query.wikidata.org/bigdata/ldf
[2] https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual#Linked_Da…

--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 15 (Unioninkatu 36)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen(a)helsinki.fi
http://www.nationallibrary.fi
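The problem in the triples above can be checked mechanically. What follows is a minimal Python sketch, not part of the original message, that builds a triple-pattern request for the labels of Q146 and parses literal objects from N-Triples lines. It assumes two things. First, the endpoint accepts the `subject` and `predicate` query parameters of the Triple Pattern Fragments convention. Second, responses can be read as N-Triples. Both are assumptions, and the code sends no network request. The helper names (`fragment_url`, `parse_literal`, `is_well_formed`, `labels_in`) are hypothetical. The check relies on the RDF 1.1 rule that a literal has the datatype rdf:langString if and only if it carries a non-empty language tag. Under that rule, a value like "chat"^^rdf:langString with no tag is ill-formed.

```python
import re
from urllib.parse import urlencode

# Endpoint from the message; the query-parameter names used below are an
# assumption based on the Triple Pattern Fragments convention.
LDF_ENDPOINT = "https://query.wikidata.org/bigdata/ldf"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Object literal at the end of an N-Triples line: "lexical form" followed by
# either @language-tag or ^^<datatype IRI> (or neither), then the final dot.
LITERAL_RE = re.compile(
    r'"(?P<lex>(?:[^"\\]|\\.)*)"'
    r'(?:@(?P<lang>[A-Za-z]+(?:-[A-Za-z0-9]+)*)|\^\^<(?P<dt>[^>]*)>)?'
    r'\s*\.\s*$'
)


def fragment_url(subject, predicate=None):
    """Build a triple-pattern request URL (hypothetical helper; nothing is fetched)."""
    params = {"subject": subject}
    if predicate is not None:
        params["predicate"] = predicate
    return LDF_ENDPOINT + "?" + urlencode(params)


def parse_literal(line):
    """Return (lexical_form, language_tag_or_None, datatype_iri) for a literal object."""
    m = LITERAL_RE.search(line)
    if m is None:
        raise ValueError("object is not a literal: " + line)
    lang, dt = m.group("lang"), m.group("dt")
    if lang is not None:
        dt = RDF_LANGSTRING  # RDF 1.1: language-tagged strings have this datatype
    elif dt is None:
        dt = XSD_STRING  # RDF 1.1: a plain "..." literal is an xsd:string
    return m.group("lex"), lang, dt


def is_well_formed(lang, dt):
    """RDF 1.1: datatype is rdf:langString if and only if a language tag is present."""
    return (dt == RDF_LANGSTRING) == (lang is not None)


def labels_in(lines, wanted_lang):
    """Keep lexical forms whose language tag matches (tags compare case-insensitively)."""
    return [
        lex
        for lex, lang, _ in map(parse_literal, lines)
        if lang is not None and lang.lower() == wanted_lang.lower()
    ]
```

With this sketch, a line ending in `"chat"@fr .` is returned by `labels_in(lines, "fr")`. The untagged `"chat"^^<...#langString>` form from the message parses, but `is_well_formed` reports it as invalid, and language filtering can never select it.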

New Wikimedia dataset for NLP research
by Gabriel Altay 03 Feb '20

Hello Wikidata folks,

I would like to draw your attention to an open-source dataset I've been developing called the Kensho Derived Wikimedia Dataset (KDWD). It's a cleaned English subset of Wikipedia/Wikidata with 2.3B tokens, 5.3M pages, 51M nodes, and 120M edges. More details are available here: https://blog.kensho.com/announcing-the-kensho-derived-wikimedia-dataset-5d1…

Best,
Gabriel

Weekly Summary #401
by Léa Lacroix 03 Feb '20

*Here's your quick overview of what has been happening around Wikidata over the last week.*

Discussions
- Open request for adminship: Mike Peel <https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Administrat…>

Events <https://www.wikidata.org/wiki/Special:MyLanguage/Wikidata:Events>
- Past: Wikidata in Social Science Classroom - Workshop <https://www.wikidata.org/wiki/Wikidata:Events/Dubai/Wikidata_in_Social_Stud…>, Dubai, January 21st
- Upcoming: Wikibase Community User Group online meeting <https://meta.wikimedia.org/wiki/Wikibase_Community_User_Group/Meetings/Febr…> (date to be decided; you can vote here <https://framadate.org/l74qKfBWnVbLag3m>)
- Upcoming: OSM TW x Wikidata Taiwan meetup <https://meta.wikimedia.org/wiki/Wikimedia_Taiwan/Wikidata_Taiwan/2020>, February 10th, Taipei

Press, articles, blog posts <https://www.wikidata.org/wiki/Special:MyLanguage/Wikidata:Press_coverage>
- A newbie's guide to querying Wikidata <https://markhneedham.com/blog/2020/01/29/newbie-guide-querying-wikidata/>, by Mark Needham

Tool of the week
- VizQuery <https://tools.wmflabs.org/hay/vizquery/> allows you to use the Wikidata Query Service without having to know SPARQL. Simply fill in a couple of autocomplete input boxes, and you can run most basic queries.

Other Noteworthy Stuff
- Bruno <https://www.wikidata.org/wiki/User:Bruno771> and Denny <https://www.wikidata.org/wiki/User:Denny> show how to use Lexical Masks <https://www.wikidata.org/wiki/Wikidata:Lexical_Masks> in ShEx to validate lexemes, including a first set of example schemata. They also invite everyone to work on more languages, and they will keep adding more ShEx schemas over time.
- 2020 report on Property constraints <https://www.wikidata.org/wiki/Wikidata:2020_report_on_Property_constraints> by user:Abián <https://www.wikidata.org/wiki/User:Abi%C3%A1n>
- Wikimedia Hackathon in Tirana <https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2020>: scholarship requests and registration for people needing visa support are open until February 9th.
- Mismatched reference: first version to be deployed this week <https://lists.wikimedia.org/pipermail/wikidata/2020-February/013786.html>
- OpenRefine 3.3 <https://github.com/OpenRefine/OpenRefine/releases/tag/3.3> was released

Did you know?
- Newest properties <https://www.wikidata.org/wiki/Special:ListProperties>:
  - General datatypes: category for maps <https://www.wikidata.org/wiki/Property:P7867>, number of reviews/ratings <https://www.wikidata.org/wiki/Property:P7887>, merged into <https://www.wikidata.org/wiki/Property:P7888>, 8-bits.info ID <https://www.wikidata.org/wiki/Property:P7890>
  - External identifiers: CoBiS author ID <https://www.wikidata.org/wiki/Property:P7865>, marterl.at ID <https://www.wikidata.org/wiki/Property:P7866>, NPDRIM record ID <https://www.wikidata.org/wiki/Property:P7868>, Analysis & Policy Observatory node ID <https://www.wikidata.org/wiki/Property:P7869>, Analysis & Policy Observatory term ID <https://www.wikidata.org/wiki/Property:P7870>, PCBdB game ID <https://www.wikidata.org/wiki/Property:P7871>, Diccionari del cinema a Catalunya ID <https://www.wikidata.org/wiki/Property:P7872>, EFIS film festival ID <https://www.wikidata.org/wiki/Property:P7873>, EFIS person ID <https://www.wikidata.org/wiki/Property:P7874>, Eurogamer ID <https://www.wikidata.org/wiki/Property:P7875>, FlashScore.com team ID <https://www.wikidata.org/wiki/Property:P7876>, GameStar ID <https://www.wikidata.org/wiki/Property:P7877>, Soccerdonna team ID <https://www.wikidata.org/wiki/Property:P7878>, The Video Games Museum game ID <https://www.wikidata.org/wiki/Property:P7879>, Voetbal International player ID <https://www.wikidata.org/wiki/Property:P7880>, Games Database game ID <https://www.wikidata.org/wiki/Property:P7881>, ft.dk politician identifier <https://www.wikidata.org/wiki/Property:P7882>, Historical Marker Database ID <https://www.wikidata.org/wiki/Property:P7883>, Joconde Inscription ID <https://www.wikidata.org/wiki/Property:P7884>, Joconde time period ID <https://www.wikidata.org/wiki/Property:P7885>, Media Art Database ID <https://www.wikidata.org/wiki/Property:P7886>, Cambridge Encyclopedia of Anthropology ID <https://www.wikidata.org/wiki/Property:P7889>, EFIS filmfirm ID <https://www.wikidata.org/wiki/Property:P7891>, EFIS film ID <https://www.wikidata.org/wiki/Property:P7892>, Ciência ID <https://www.wikidata.org/wiki/Property:P7893>, Swedish School Registry ID <https://www.wikidata.org/wiki/Property:P7894>, Whaling History ID <https://www.wikidata.org/wiki/Property:P7895>
- New property proposals <https://www.wikidata.org/wiki/Special:MyLanguage/Wikidata:Property_proposal> to review:
  - General datatypes: hierarchy switch <https://www.wikidata.org/wiki/Wikidata:Property_proposal/hierarchy_switch>, Wikipedia infobox field <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Wikipedia_infobox_…>, ontological level of Wikidata item <https://www.wikidata.org/wiki/Wikidata:Property_proposal/ontological_level_…>, status of mortal remains <https://www.wikidata.org/wiki/Wikidata:Property_proposal/status_of_mortal_r…>, TheTVDB person ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/TheTVDB_person_ID>, fails compliance with <https://www.wikidata.org/wiki/Wikidata:Property_proposal/fails_compliance_w…>
  - External identifiers: Gamekult company ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Gamekult_company_ID>, Gamekult franchise ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Gamekult_franchise…>, CNGB project ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/CNGB_project_ID>, Gamekult platform ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Gamekult_platform_…>, Adelsvapen ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Adelsvapen_ID>, Bibliotheca Hagiographica Latina ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Bibliotheca_Hagiog…>, Clavis Clavium ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Clavis_Clavium_ID>, FEMA number <https://www.wikidata.org/wiki/Wikidata:Property_proposal/FEMA_number>, SerialStation game ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/SerialStation_game…>, GBAtemp game ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/GBAtemp_game_ID>, AnimalBase ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/AnimalBase_ID>, RPGamer ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/RPGamer_ID>, Denkmalatlas Niedersachsen Objekt-ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Denkmalatlas_Niede…>, DR music artist ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/DR_music_artist_ID>, Jurisdiction List Number <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Jurisdiction_List_…>, Médias 19 ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/M%C3%A9dias_19_ID>, ArchiWebture ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/ArchiWebture_ID>, MOCAGH ID <https://www.wikidata.org/wiki/Wikidata:Property_proposal/MOCAGH_ID>
- Query examples:
  - Number of reported coronavirus cases per country <https://w.wiki/GTg>
  - Release date of albums from before 1980 by artists of the EDM genre <https://w.wiki/GZs> (source <https://twitter.com/LearningSPARQL/status/1223669161966080001>)
  - Books related to LGBTI+ topics <https://w.wiki/GZt> (source <https://twitter.com/jsamwrites/status/1224053275458265092>)
  - Map of medical facilities in Kalimpong district, India, color-coded by type <https://w.wiki/GZw> (source <https://twitter.com/wikidataindia/status/1222406520400297984>)

Development
- Enable the first version of tainted/mismatched references on wikidata.org
- Work on adding a button to hide the notification (phab:T234789 <https://phabricator.wikimedia.org/T234789>)
- Show the icon after canceling editing if the icon was shown before (phab:T234790 <https://phabricator.wikimedia.org/T234790>)
- More work on Wikidata Bridge (restrict editing based on user rights or data types)
- More work on the wb_terms migration

You can see all open tickets related to Wikidata here <https://phabricator.wikimedia.org/maniphest/query/4RotIcw5oINo/#R>. If you want to help, you can also have a look at the tasks needing a volunteer <https://phabricator.wikimedia.org/project/board/71/query/zfiRgTnZF7zu/?filt…>.

Monthly Tasks
- Add labels, in your own language(s), for the new properties listed above.
- Comment on property proposals: all open proposals <https://www.wikidata.org/wiki/Wikidata:Property_proposal/Overview>
- Suggested and open tasks <https://www.wikidata.org/wiki/Wikidata:Contribute/Suggested_and_open_tasks>!
- Contribute to a Showcase item <https://www.wikidata.org/wiki/Special:MyLanguage/Wikidata:Showcase_items>.
- Help translate <https://www.wikidata.org/wiki/Special:LanguageStats> or proofread the interface and documentation pages, in your own language!
- Help merge identical items <https://www.wikidata.org/wiki/User:Pasleim/projectmerge> across Wikimedia projects.
- Help write the next summary! <https://www.wikidata.org/wiki/Wikidata:Status_updates/Next>

--
Léa Lacroix
Project Manager Community Communication for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Registered in the register of associations at the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized as a non-profit organization by the Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.

Knowledge Graph Conference 2020 - Workshops and Tutorials Announcement
by Violeta Ilik 03 Feb '20

Dear Wikidata community,

The Knowledge Graph Conference organizing team is pleased to announce the workshops and tutorials of the KGC 2020 program. They will take place on May 4 and 5 in Butler Library, Columbia University Libraries, in NYC.

Workshops are stand-alone sub-events of the conference, with their own calls for papers, program, and organizing committee. Tutorials are learning sessions that combine lecture-style and hands-on sessions. Each tutorial lasts half a day unless otherwise specified.

For more information about each workshop and tutorial, please visit this page: https://www.knowledgegraph.tech/the-knowledge-graph-conference-kgc/workshop…

Early Bird registration ends on *February 15, 2020*. To register, please visit this page: https://www.knowledgegraph.tech/the-knowledge-graph-conference-kgc/register/

WORKSHOPS

- KGC Workshop on Applied Knowledge Graph: Best industry/academic practices, methods and challenges between representation and reasoning
  Organizers: Vivek Khetan, AI research specialist, Accenture Labs, SF; Colin Puri, R&D Principal, Accenture Labs; Lambert Hogenhout, Chief Analytics, Partnerships and Innovation, United Nations
  Limit: 40 people
  Date: May 4, 2020
  Place: Room 203, Butler Library, Columbia University <https://goo.gl/maps/7ijLP7ze7Jw94uid9>

- Personal Health Knowledge Graphs (PHKG): Challenges and Opportunities
  Organizers: Ching-Hua Chen, PhD <https://researcher.watson.ibm.com/researcher/view.php?person=us-chinghua>, Amar Das, MD PhD <https://researcher.watson.ibm.com/researcher/view.php?person=us-amardas>, Ying Ding, PhD <https://www.ischool.utexas.edu/tags/ying-ding>, Deborah McGuinness, PhD <https://tw.rpi.edu/web/person/Deborah_L_McGuinness>, Oshani Seneviratne, PhD <https://idea.rpi.edu/people/staff/oshani-seneviratne>, and Mohammed J Zaki, PhD <http://www.cs.rpi.edu/~zaki>
  Limit: 40 people
  Date: May 5, 2020
  Place: Room 203, Butler Library, Columbia University <https://goo.gl/maps/7ijLP7ze7Jw94uid9>
TUTORIALS

- Virtualized Knowledge Graphs for Enterprise Applications
  Presenter: Eric Little, PhD, CEO LeapAnalysis
  Limit: 20 people
  Date and time: May 4, 2020, 8:30AM - 12:30PM
  Place: Studio Butler, Butler Library, Columbia University <https://goo.gl/maps/7ijLP7ze7Jw94uid9>

- Data discovery on a (free) hybrid BI/Search/Knowledge graph platform: the Siren Community Edition hands-on tutorial
  Presenter: Giovanni Tummarello, Ph.D
  Limit: 20 people
  Date and time: May 4, 2020, 8:30AM - 12:30PM
  Place: Room 523, Butler Library, Columbia University <https://goo.gl/maps/7ijLP7ze7Jw94uid9>

- Building a Knowledge Graph from schema.org annotations
  Presenters: Elias Kärle, Umutcan Simsek, and Dieter Fensel (STI Innsbruck, University of Innsbruck)
  Limit: 25 people
  Date and time: May 4, 2020, 1:30PM - 5:30PM
  Place: Room 523, Butler Library, Columbia University <https://goo.gl/maps/7ijLP7ze7Jw94uid9>

- Designing and Building Enterprise Knowledge Graphs from Relational Databases
  Presenter: Juan Sequeda, DataWorld
  Limit: 25 people
  Date and time: May 5, 2020, 8:30AM - 12:30PM
  Place: Room 523, Butler Library, Columbia University <https://goo.gl/maps/7ijLP7ze7Jw94uid9>

- Rapid Knowledge Graph development with GraphQL and RDF databases
  Presenter: Vassil Momtchev, Ontotext
  Limit: 25 people
  Date and time: May 5, 2020, 1:30PM - 5:30PM
  Place: Room 523, Butler Library, Columbia University <https://goo.gl/maps/7ijLP7ze7Jw94uid9>

- Introduction to Logic Knowledge Graphs, Succinct Data Structures and Delta Encoding for Modern Databases, and the Web Object Query Language
  Presenters: Dr. Gavin Mendel-Gleason and Cheukting Ho (DataChemist)
  Limit: 20 people
  Date and time: May 5, 2020, 8:30AM - 12:30PM
  Place: Room 306, Butler Library, Columbia University <https://goo.gl/maps/7ijLP7ze7Jw94uid9>

- Modeling Evolving Data in Graphs While Preserving Backward Compatibility: The Power of RDF Quads
  Presenters: Souripriya Das, Matthew Perry, and Eugene I. Chong (Oracle)
  Limit: 20 people
  Date and time: May 5, 2020, 1:30PM - 5:30PM
  Place: Room 306, Butler Library, Columbia University <https://goo.gl/maps/7ijLP7ze7Jw94uid9>

Violeta Ilik
KGC 2020 Workshops & Tutorials Chair

--
Violeta Ilik

Mismatched reference: first version to be deployed this week
by Léa Lacroix 03 Feb '20

Hello all,

As announced last month, we've been working on mismatched reference <https://www.wikidata.org/wiki/Wikidata:Mismatched_reference_notification_in…>, a new feature that alerts users when they edit a value without changing the reference attached to it. The feature has been tested over the past month. Based on the positive feedback we received, we are now able to move forward and enable it on wikidata.org. This will take place this week, in two steps:

- Today around 13:00 UTC, you will be able to see a notification (similar to the constraint ones) after saving an edit.
- On Thursday, February 6th, we will also enable a button that lets you hide the notification if you think that the reference is not mismatched.

Please note that for now, the notification is not persistent: the editor who made the change will see it appear when they save their edit, but if they reload the page, the notification will be gone. Other users won't be able to see it either. We are considering making the notification persistent in the future.

If you want to give feedback about the feature, feel free to use this talk page <https://www.wikidata.org/wiki/Wikidata_talk:Mismatched_reference_notificati…>. If you want to report an issue directly on Phabricator, feel free to use this form <https://phabricator.wikimedia.org/maniphest/task/edit/form/43/?projects=wik…>.

Cheers,

--
Léa Lacroix
Project Manager Community Communication for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Registered in the register of associations at the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized as a non-profit organization by the Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.
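The announcement describes the trigger for the notification only in prose. As a rough illustration, and explicitly not the Wikibase implementation, the rule can be expressed as a predicate over a simplified statement model. The `Statement` type and the `is_reference_mismatched` function are hypothetical. Treating a statement that has no references at all as never mismatched is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Statement:
    """Hypothetical, simplified statement: a main value plus its references."""
    value: str
    references: Tuple[str, ...] = ()


def is_reference_mismatched(old: Statement, new: Statement) -> bool:
    """Flag an edit that changes the value but keeps the attached references unchanged.

    Assumption: a statement without references cannot have a mismatched reference.
    """
    value_changed = old.value != new.value
    references_unchanged = old.references == new.references
    return value_changed and references_unchanged and bool(old.references)


# Usage: the value changes, but the same (illustrative) source is still attached.
before = Statement(value="1200", references=("census 2010",))
after = Statement(value="1350", references=("census 2010",))
print(is_reference_mismatched(before, after))  # True
```

Because the announced notification is not persistent, a check like this would run once, at save time, against the pre-edit and post-edit versions of the statement.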

CfP for KG-BIAS 2020 – Bias in Automatic Knowledge Graph Construction: A Workshop at AKBC 2020
by Edgar Meij 01 Feb '20

**************************************************************
KG-BIAS 2020 – Bias in Automatic Knowledge Graph Construction: A Workshop at AKBC 2020
UC Irvine, USA – Wed June 24, 2020
https://kg-bias.github.io/
kg-bias(a)googlegroups.com
**************************************************************

### Overview

Knowledge Graphs (KGs) store human knowledge about the world in a structured format, e.g., triples of facts or graphs of entities and relations, to be processed by AI systems. In the past decade, extensive research efforts have gone into constructing and utilizing knowledge graphs for tasks in natural language processing, information retrieval, recommender systems, and more. Once constructed, knowledge graphs are often considered "gold standard" data sources that safeguard the correctness of other systems. Because the biases inherent to KGs may become magnified and spread through such systems, it is crucial that we acknowledge and address various types of bias in knowledge graph construction.

Such biases may originate in the very design of the KG, in the source data from which it is created (semi-)automatically, and in the algorithms used to sample, aggregate, and process that data. Causes of bias include systematic errors due to selecting non-random items (selection bias), misremembering certain events (recall bias), and interpreting facts in a way that affirms individuals' preconceptions (confirmation bias). Biases typically appear subliminally in expressions, utterances, and text in general, and can carry over into downstream representations such as embeddings and knowledge graphs.

This workshop, to be held for the first time at AKBC 2020, addresses the questions: "How do such biases originate?", "How do we identify them?", and "What is the appropriate way to handle them, if at all?".
This topic is as yet unexplored, and the goal of our workshop is to start a meaningful, long-lasting dialogue among researchers from a wide variety of backgrounds and communities.

Topics of interest include, but are not limited to:

* Ethics, bias, and fairness
* Qualitatively and quantitatively defining types of bias
* Implicit or explicit human bias reflected in data people generate
* Algorithmic bias represented in learned models or rules
* Taxonomies and categorizations of different biases
* Empirically observing biases
* Measuring diversity of opinions
* Language, gender, geography, or interest bias
* Implications of existing bias to human end-users
* Benchmarks and datasets for bias in KGs
* Measuring or remediating bias
* De-biased KG completion methods
* Algorithms for making inferences interpretable and explainable
* De-biasing or post-processing algorithms
* Creating user awareness of cognitive biases
* Ethics of data collection for bias management
* Diversification of information sources
* Provenance and traceability

### Submission Instructions

Submission files should not exceed 8 pages, with additional pages allowed for references. Reviews are double-blind; author names and affiliations must be removed. All submissions must be written in English and submitted as PDF files formatted using the sigconf template: https://www.acm.org/publications/proceedings-template. Submissions should be made electronically through https://easychair.org/conferences/?conf=kgbias2020.

### Workshop format

We accept position papers, short papers, and full papers. Both ongoing and already published work is welcome, and we will offer authors the option of having their paper included in the workshop proceedings. More details regarding the format and schedule of the workshop will be announced closer to the workshop date.
### Important Dates

* Apr 27: KG-BIAS 2020 submission deadline
* May 18: KG-BIAS 2020 notification
* Jun 22-23: AKBC Conference
* Jun 24: KG-BIAS 2020 workshop

### Code of Conduct

Our workshop adheres to all principles and guidelines specified in the ACM Code of Ethics and Professional Conduct <https://www.acm.org/code-of-ethics>.

### Organizing committee

* Edgar Meij, Bloomberg
* Tara Safavi, University of Michigan
* Chenyan Xiong, Microsoft Research AI
* Miriam Redi, Wikimedia Foundation
* Gianluca Demartini, University of Queensland
* Fatma Özcan, IBM Research

### Contact information

You can find us at https://kg-bias.github.io/ and contact us at kg-bias(a)googlegroups.com.
