Hello,
I'm trying to retrieve via SPARQL those entries missing a label in a
certain language (e.g. "de").
Example from this query: http://tinyurl.com/zazewyx
How could this be done directly from a SPARQL query?
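Would something roughly like the sketch below be the right approach? (The wdt:P31 wd:Q5 restriction to humans is only an example selection, not necessarily what the linked query uses.)

  SELECT ?item ?itemLabel WHERE {
    ?item wdt:P31 wd:Q5 .                          # example selection: humans
    FILTER NOT EXISTS {                            # exclude items that already have a "de" label
      ?item rdfs:label ?deLabel .
      FILTER(LANG(?deLabel) = "de")
    }
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
  }
  LIMIT 100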
Thanks,
--
Toni Hermoso Pulido
http://www.cau.cat
http://www.similis.cc
Hey everyone,
We are working on making the ArticlePlaceholder extension ready for
deployment now.
We hope to deploy the extension on 2016-05-11.
Until then, it would be great if you could help us out, especially if you
speak the languages we are deploying to.
One of the most important steps before the actual deployment is to have the
properties on Wikidata translated:
https://www.wikidata.org/wiki/Wikidata:List_of_properties
https://tools.wmflabs.org/wikidata-terminator/ has a list of the most used
items with missing labels and descriptions. In order for the extension to
actually be useful, it is also necessary to translate the labels and
descriptions of items - this help would be greatly appreciated!
It would be a great help for Wikidata as a project as well as an advantage
for the ArticlePlaceholder.
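As a starting point, a query roughly like the sketch below (using Esperanto, "eo", as an example language) should list properties that still lack a label in a given language:

  SELECT ?property ?propertyLabel WHERE {
    ?property a wikibase:Property .                # all Wikidata properties
    FILTER NOT EXISTS {                            # keep only those without an "eo" label yet
      ?property rdfs:label ?label .
      FILTER(LANG(?label) = "eo")
    }
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
  }

The same query with the language code swapped works for the other deployment languages.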
Deployment is currently planned for:
- Esperanto Wikipedia
https://eo.wikipedia.org/wiki/Vikipedio:Diskutejo/Teknikejo#Anstata.C5.ADig…
- Haitian Creole
https://ht.wikipedia.org/wiki/Wikipedya:Kafe#Article_placeholder
- Oriya Wikipedia
https://or.wikipedia.org/wiki/%E0%AC%89%E0%AC%87%E0%AC%95%E0%AC%BF%E0%AC%AA…
- Gujarati Wikipedia
- Neapolitan Wikipedia
- Asturian Wikipedia
I am very much looking forward to this important step!
Thank you all very much,
Lucie (Frimelle)
--
Lucie-Aimée Kaffee
Working Student Software Development
Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
http://wikimedia.de
Imagine a world in which every single human being can freely share in
the sum of all knowledge.
That's our commitment.
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B.
Recognized as a non-profit by the Finanzamt für Körperschaften I Berlin,
tax number 27/029/42207.
This is a bad time for Wikidata Query Service. We had a few issues
last week while updating WDQS and reloading all data. During this data
reload, we are running on a single server, and now this server is
starting to behave erratically.
I honestly do not know yet what is happening, except that we see a lot
of files (pipes, actually) left open and that WDQS stops responding.
Investigation is ongoing [1] and we'll let you know what we find.
Sorry for the inconvenience and thank you all for your patience!
[1] https://phabricator.wikimedia.org/T134238
--
Guillaume Lederrey
Operations Engineer, Discovery
Wikimedia Foundation
Hi everyone,
I have lately been facing the following problem: there are many (biomedical)
resources we import data from which consist of several parts, and for each
of these parts they use either a different identifier structure, or the
same identifier structure but with different accession URLs. This is the
case for very essential resources like ChEMBL (e.g. compounds,
targets, assays), the miRNA database, IUPHAR and others.
In order to represent and link to these resources properly in Wikidata, how
should we do this? The "easy" way is to just propose properties for each of
these parts of a resource, which also allows specifying the proper
formatter URL. But this would certainly create several properties for the
same resource.
The other way would be to specify a set of formatter URLs, but this
currently fails anyway, as it has not been implemented (yet). Maybe we
could specify formatter URLs on a per-value basis, which could override the
formatter URL specified in the property? But I guess this requires
substantial dev time in Wikibase.
What are your thoughts/ideas?
Thanks!
Sebastian
Hey everyone,
Freebase has been shut down today.
Cheers
Lydia
---------- Forwarded message ---------
From: 'Jason Douglas' via Freebase Discuss <
freebase-discuss(a)googlegroups.com>
Date: Mon, May 2, 2016 at 8:08 PM
Subject: [freebase-discuss] So long and thanks for all the data!
To: Freebase Discuss <freebase-discuss(a)googlegroups.com>
Today we will shut down freebase.com and the Freebase APIs. This is later
than originally announced [1], but the launch of the KG Search API [2] also
took longer than anticipated.
The Freebase data was openly licensed and belongs to all of us, so we'll be
leaving the data dumps available indefinitely. In addition to the last
regular data dump that's been available ever since Freebase went read only
[3], we've added a complete final graphd dump that includes all of the
tuple metadata for anyone who really wants a forensic challenge [4]. If you
have no idea what graphd tuples are, you probably shouldn't bother looking
at it. ;-)
It was fun! And to keep contributing to open knowledge bases, remember to
check out Wikidata (wikidata.org)!
- Google Knowledge Graph Team
[1] https://plus.google.com/109936836907132434202/posts/bu3z2wVqcQc
[2] https://developers.google.com/knowledge-graph/
[3]
http://commondatastorage.googleapis.com/freebase-public/rdf/freebase-rdf-la…
[4]
http://commondatastorage.googleapis.com/freebase-public/rdf/graphd-archive.…
guid - The 128-bit guid of the primitive, rendered as a 32-byte hex string.
typeguid - The guid of the primitive's type if it is an edge (the predicate
in RDF terms). If empty, the primitive represents a node in the graph,
rather than an edge.
left - The guid of the primitive's left-hand side (the subject in RDF
terms).
right - The guid of the primitive's right-hand side (the object in RDF
terms).
value - The literal value of the primitive. Only a few edge types have both
right and value values (e.g., /type/object/key, /type/object/name).
datatype - The type of the value, when present. One of: string | boolean |
integer | float | timestamp | url | guid | bytestring.
scope - The guid of the object whose permission was used to write the
primitive (typically a user or attribution node).
timestamp - When the primitive was written, with 16 bits of sub-second
precision.
live - Whether the primitive was an assertion or deletion.
archival - Not used.
valuetype - Not used.
txstart - Signals the beginning of a replication transaction.
name - A few primitives are named so they can be directly accessed by
system bootstrap code, starting with HAS_KEY.
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as a non-profit by
the Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.
Hi all
I've been using the <http://wikidata.org/ontology#> namespace for datatype
properties for some time (more than a year).
Now I see only the <http://wikiba.se/ontology#> ns everywhere.
Was there some reason for the change? Are these two somehow compatible?
Will the first one be deprecated?
Thanks
Jan
Hi everyone
I am Alangi Derick, also d3r1ck on IRC. I was selected for the GSoC
2016 program and have the opportunity to work on the project titled
"Integration of IFTTT support for Wikidata", and I am very happy to be the
first African GSoCer in the Wikimedia Foundation. This is indeed a
privilege, and I wish to thank everyone who guided me in this movement and
helped me attain this level.
I especially want to thank my mentors (Stephen Laporte, Marius Hoch, Lydia
Pintscher, Benedikt Seidl and Sam Tarling) for helping me before and during
this program. I also won't forget to thank some very key org admins for
their great help when I first joined the movement (Quim Gil, Brian Wolf,
Andre Klapper) - I really thank you all for your help.
I would love to keep working with you all so that I can better shape my
future in Wikimedia, and I hope to see you all in person when the time
comes :) To all Wikimedia developers: I shall keep in touch, and together
we shall make the movement better in the future.
Cheers!!!!
Regards
Alangi Derick Ndimnain
Is there a big problem with the SPARQL servers?
I was just about to update some of the numbers at
https://en.wikipedia.org/wiki/Wikipedia_talk:GLAM/Your_paintings/header
which has the numbers for sitelinks, cross-properties, etc. for painters
with P1367 (Art UK artist ID; formerly "Your Paintings" artist ID).
SPARQL used to return 22,412 hits; and Autolist currently returns 22,458
hits (the latter includes deprecated values).
But now SPARQL is only returning 18,291 hits
http://tinyurl.com/zbqznun
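(For reference, a count of this kind would typically look something like the sketch below; the exact query is behind the short link above.)

  SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {
    ?item wdt:P1367 ?artUkId .                     # items with an Art UK artist ID
  }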
What's going on?
The number has been over 22,000 since at least October last year. (And
before that it was down at about 8,700.)
Should all SPARQL searches currently be seen as unreliable?
Should there be a warning on the search page advising of this?
-- James.
Hello,
TLDR: Vandalism detection model for Wikidata just got much more accurate.
Longer version:
ORES is designed to handle different types of classification. For example,
one of the classification types under development is "wikiclass", which
determines the type of an edit: whether it is adding content, fixing a
mistake, etc.
The most mature classification in ORES is edit quality: whether an edit is
vandalism or not. We usually have three models. The first is the "reverted"
model, whose training data is obtained automatically: we sample around 20K
edits (for Wikidata it was different) and consider an edit vandalism if it
is reverted within a certain time period after the edit (7 days for
Wikidata).
On the other hand, the "damaging" and "goodfaith" models are more accurate
because of how we sample: we take about 20K edits, pre-label the edits made
by trusted users such as admins and bots as not harmful to
Wikidata/Wikipedia, and then ask users to label the rest (for Wikidata it
was around 4K edits). Since most edits in Wikidata are made by bots and
trusted users, we altered this method a bit for Wikidata, but the overall
process is the same.
Don't forget that since these models are based on human judgement, they are
more accurate and more useful for damage detection. The ORES extension uses
the "damaging" model and not the "reverted" model, so having the "damaging"
model online is a requirement for the extension deployment.
People label each edit according to whether it is damaging to Wikidata and
whether it was made with good intentions. So we have three cases:
1- An edit that is harmful to Wikidata but made with good intentions: an
honest/newbie mistake.
2- An edit that is harmful and made with bad intentions: vandalism.
3- An edit that is productive and made with good intentions: a "good" edit.
The biggest reason to distinguish between honest mistakes and vandalism is
that anti-vandalism bots have been shown to reduce new-user retention in
wikis [1]. So future anti-vandalism bots should not revert good-faith
mistakes but report them for human review.
One of the good things about the Wikidata damage detection labeling process
is that so many people were involved (we had 38 labelers for Wikidata [2]).
Another good thing is that its fitness is very high in AI terms [3]. But
since the numbers of damaging and non-damaging edits are not the same, the
scores it gives to edits are not intuitive. Let me give you an example: in
our damaging model, if an edit is scored lower than 80%, it's probably not
vandalism. Actually, in a very large sample of human edits we had for the
reverted model, we couldn't find a bad edit with a score lower than 93%;
i.e. if an edit is scored 92% in the reverted model, you can be pretty sure
it's not vandalism. Please reach out to us if you have any questions on
using these scores, or any questions in general ;)
In terms of needed changes, the ScoredRevision gadget is automatically set
to prefer the damaging model. I just changed my bot in the
#wikidata-vandalism channel to use damaging instead of reverted.
If you want to use these models, check out our docs [4].
Sincerely,
Revision scoring team [5]
[1]: Halfaker, A.; Geiger, R. S.; Morgan, J. T.; Riedl, J. (28 December
2012). "The Rise and Decline of an Open Collaboration System: How
Wikipedia's Reaction to Popularity Is Causing Its Decline". *American
Behavioral Scientist* *57* (5): 664–688.
[2]: https://labels.wmflabs.org/campaigns/wikidatawiki/?campaigns=stats
[3]: https://ores.wmflabs.org/scores/wikidatawiki/?model_info
[4]: https://ores.wmflabs.org/v2/
[5]:
https://meta.wikimedia.org/wiki/Research:Revision_scoring_as_a_service#Team