Hi,
I have a couple of questions regarding the Wiki Page ID. Does it always
stay unique for the page, where the page itself is just a placeholder for
any kind of information that might change over time?
Consider the following cases:
1. The first time someone creates page "Moon" it is assigned ID=1. If at
some point the page is renamed to "The_Moon", the ID=1 remains intact. Is
this correct?
2. Suppose we have page "Moon" with ID=1, and someone creates a second page
"The_Moon" with ID=2. Is it possible for page "Moon" to be transformed into
a redirect, so that "Moon" redirects to page "The_Moon"?
3. Is it possible for page "Moon" to become a category "Category:Moon" with
the same ID=1?
Thanks,
Gintas
Hi!
I would like to initiate a discussion about coordinate precision in
Wikidata and the Query Service. The reason is that right now we do not
impose any limit on precision - coordinates are basically doubles - which
allows editors to specify over-precise coordinates and makes the
coordinates harder to compare, both with each other within Wikidata and
with outside services.
From the precision description in [1], we would rarely need more than the
third or fourth digit after the decimal point. However, the database
contains coordinates like Point(13.366666666 41.766666666), which purports
to specify a location with sub-millimeter accuracy - for an entity that
describes a municipality[2]!
We do have precision on values - e.g. the above has a specified precision
of "arcseconds" - so it may be just a formatting issue, but even an
arcsecond looks somewhat over-precise for a city. And it may be a bit
challenging to convert DMS precision to DD precision.
But the bigger question is whether we should store over-precise
coordinates in the database at all, or whether we should round them on
export or inside the data. The formulae used to calculate distances have,
for obvious reasons, limited precision, and direct comparisons can't take
precision into account, which can make such coordinates very hard to work
with. Should we maybe just put a limit on how precisely we store
coordinates in RDF and in the query service? Would four decimals after the
dot be enough? According to [4], that is what a commercial GPS device can
provide. If not, why, and which accuracy would be appropriate?
We do export the precision of the coordinate as wikibase:geoPrecision[3] -
and we currently have 258060 distinct values for it. This is very weird.
I am not sure precision is useful in this form. Can anybody tell me a
use case for this number now? If not, maybe we should change how we
represent it. I'm also not sure where these values come from, as we only
have 13 options in the UI. Bots?
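To illustrate the kind of rounding discussed above, here is a minimal Python sketch. The function names and the mapping from angular precision to decimal places are my own for illustration, not anything that exists in Wikibase:

```python
import math

# Hypothetical helper: how many decimal places are needed to represent
# a given angular precision (in degrees)?
def decimals_for_precision(precision_deg):
    if precision_deg >= 1:
        return 0
    return math.ceil(-math.log10(precision_deg))

# Round a coordinate pair to the number of decimals its stated precision
# actually supports.
def round_point(lon, lat, precision_deg):
    d = decimals_for_precision(precision_deg)
    return round(lon, d), round(lat, d)

ARCSECOND = 1 / 3600  # about 0.000278 degrees
print(decimals_for_precision(ARCSECOND))                        # 4
print(round_point(13.366666666, 41.766666666, ARCSECOND))       # (13.3667, 41.7667)
```

Under this scheme an arcsecond precision maps to four decimal places, which matches the "four decimals after the dot" threshold suggested above.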
[1] https://en.wikipedia.org/wiki/Decimal_degrees
[2] https://www.wikidata.org/wiki/Q116746
[3]
https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Globe_coor…
[4]
https://gis.stackexchange.com/questions/8650/measuring-accuracy-of-latitude…
--
Stas Malyshev
smalyshev(a)wikimedia.org
Hello,
I wanted to make you aware of our new paper "Doctoral Advisor or Medical
Condition: Towards Entity-specific Rankings of Knowledge Base Properties",
which deals with the problem of determining the interestingness of Wikidata
properties for individual entities.
In the paper we develop a dataset of 350 random (entity, property1,
property2) records, and use human judgments to determine the more
interesting property in each record.
We then show that state-of-the-art techniques (Wikidata Property Suggestor,
Google search) achieve 61% precision on predicting the winner in
high-agreement records, which can be lifted to 74% by using linguistic
similarity, but still remains significantly below human performance (87.5%
precision).
Paper: http://www.simonrazniewski.com/2017_ADMA.pdf (to appear at ADMA
2017).
Dataset: https://www.kaggle.com/srazniewski/wikidatapropertyranking
Best wishes,
Simon Razniewski
Hi,
We're more than halfway through mapping YSO places to Wikidata. Most of
the remaining ones are places that don't exist in Wikidata, and adding
them is quite labor-intensive, so we will have to consider our strategy.
Anyway, I did some checking of what remains unmapped and noticed a
potential problem: some mappings for places that we have mapped using
Mix'n'match have not actually been stored in Wikidata. For example Q36
Poland ("Puola" in YSO Places) is such a case. In Mix'n'match it is
shown as manually matched (see attached screenshot), but in Wikidata the
corresponding YSO ID property doesn't actually exist for the entity. I
checked the change history of the Q36 entity and couldn't find anything
relevant there, so it seems that the mapping was never stored in
Wikidata. Maybe there was a transient error of some kind?
Another such case was Q1754 Stockholm ("Tukholma" in YSO places). But
for that one we removed the existing mapping in Mix'n'match and set it
again, and now it is properly stored in Wikidata.
Mix'n'match currently reports 4228 mappings for YSO places, while a
SPARQL query for the Wikidata endpoint returns 4221 such mappings. So I
suspect that this only affects a small number of entities.
Is it possible to compare the Mix'n'match mappings with what actually
exists in Wikidata, and somehow re-sync them? Or just to get the
mappings out from Mix'n'match and compare them with what exists in
Wikidata, so that the few missing mappings may be added there manually?
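For the comparison step, here is a minimal sketch of the kind of diff I have in mind. All names and the example IDs are made up; a real run would load the Mix'n'match export and the SPARQL result set instead of the literal dicts:

```python
# Hypothetical sketch: find entries that Mix'n'match reports as matched
# but whose mapping is absent from the Wikidata SPARQL results.
# Both inputs map Wikidata QIDs to YSO IDs.
def missing_mappings(mixnmatch, wikidata):
    return {qid: yso for qid, yso in mixnmatch.items()
            if wikidata.get(qid) != yso}

mnm = {"Q36": "yso-1", "Q1754": "yso-2"}   # made-up Mix'n'match export rows
wd = {"Q1754": "yso-2"}                    # made-up SPARQL query result
print(missing_mappings(mnm, wd))           # {'Q36': 'yso-1'}
```

The handful of entries this returns would be the mappings to re-add manually.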
Thanks,
Osma
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
osma.suominen(a)helsinki.fi
http://www.nationallibrary.fi
Hello all,
we’ll be deploying a minor change to the Wikidata interface soon. If your
interface is set to a language other than English, then when you view an
entity page you may see entity labels in languages other than the one you
selected, according to language fallbacks (e. g. Austrian German users
might see German labels if there is no Austrian German label, or even
English labels if there is no German label either for an entity). The
language of the label being displayed is shown in a small indicator,
e. g. Douglas
Adams <https://www.wikidata.org/wiki/Q42> *English*. From August 29th
forward, that indicator will be hidden by default if the user language and
the language of the label are variants of the same language, e. g. if the
user interface is in Austrian German and the label is in German.
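The behaviour can be sketched roughly as follows. The fallback chain shown and the helper names are assumptions for illustration; the real MediaWiki fallback chains are configured per language:

```python
# Assumed fallback chain for Austrian German, for illustration only.
FALLBACKS = {"de-at": ["de-at", "de", "en"]}

def resolve_label(labels, user_lang):
    # Return (label, language) for the first chain entry that has a label.
    for lang in FALLBACKS.get(user_lang, [user_lang, "en"]):
        if lang in labels:
            return labels[lang], lang
    return None, None

def show_indicator(user_lang, label_lang):
    # After the change: hide the indicator when both language codes are
    # variants of the same base language (e.g. de-at and de).
    return label_lang.split("-")[0] != user_lang.split("-")[0]

labels = {"de": "Douglas Adams", "en": "Douglas Adams"}
label, lang = resolve_label(labels, "de-at")
print(label, lang, show_indicator("de-at", lang))  # Douglas Adams de False
```

So an Austrian German user seeing a German label would no longer see the indicator, while an English fallback label would still be marked.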
If you don’t like this change, you can override it by adding a small piece
of code <https://phabricator.wikimedia.org/P5929> to your common.css
<https://www.wikidata.org/wiki/Special:MyPage/common.css>. If you encounter
any problem with this change, feel free to leave a comment under this
Phabricator ticket <https://phabricator.wikimedia.org/T174318>.
Best regards,
Lucas Werkmeister
--
Lucas Werkmeister
Software Developer (Intern)
Wikimedia Deutschland e. V. | Tempelhofer Ufer 23-24 | 10963 Berlin
Phone: +49 (0)30 219 158 26-0
https://wikimedia.de
Imagine a world, in which every single human being can freely share in the
sum of all knowledge. That‘s our commitment.
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/029/42207.
Hello all,
WikiArabia 2017 is taking place in Cairo, Egypt, on October 23-25.
https://meta.wikimedia.org/wiki/WikiArabia/2017
They are looking for someone to handle a Wikidata workshop there.
If a volunteer from the area is willing to go there, please contact the
organizers for more details! You can write an e-mail to
ah.hamdi(a)wikimedia-eg.org
Please share in your local networks if relevant.
Thanks a lot,
--
Léa Lacroix
Project Manager Community Communication for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Hey all,
I wanted to follow up on the Zooniverse comments/thread from a couple of
months back, part of Pharos's comments about crowd-sourcing depicts statements (
https://lists.wikimedia.org/pipermail/wikidata/2017-June/010795.html). I
met with Zooniverse staff during the week leading up to Wikimania, and was
able to get a rough outline of how they might be connected with our
ecosystem:
- They have a project builder system that allows anyone to create a
project that uses their crowdsourcing structures to describe and identify
components of media files. The documentation is at:
https://www.zooniverse.org/lab
- They have an undocumented method for drawing media from external
repositories via a simple API -- so we could feed it content from sets on
Commons or Wikidata.
- They would be interested in exploring whether there is a good process for
connecting Commons as a source for their project builder, and would be
willing to provide some developer advisory support for either a volunteer
or a partner institution to develop an appropriate link between the two APIs.
I am talking with Pharos about potentially doing something with the Met set
on Commons, but if you think you have a set of media files already on
Commons that might be of interest to a citizen science-type crowdsourcing
project - please let me know. (Tools developed by the Wikidata community
might be more appropriate in the short term for the Met collection [1])
- A cool, if less relevant, note: Zooniverse is experimenting with using
machine learning models both to prompt citizen science actions and to sort
various sets of media in projects.
I am interested in exploring this relationship because, as Structured Data
on Commons gains the ability to store structured media file information
over the next few years, we gain an increased ability to absorb simple
descriptive and other crowdsourcing on top of media files, and the
Zooniverse community provides access to a very wide group of communities
interested in crowd contribution. Moreover, if one of the value
propositions for uploading to Commons were simple access to Zooniverse
crowdsourcing, for either scientific purposes or enriching structured
descriptive metadata, we might see many more contributions of both
scientific and cultural heritage collections to Commons.[2]
If you have a set of media on Commons that you might be interested in
testing in the Zooniverse Project Builder, and have some developer
capacity, please let me know offlist. We don't need to move quickly on the
offer of consultation, but I would like to continue talking with Zooniverse
about how to create a better relationship.
Cheers,
Alex
--
Alex Stinson
GLAM-Wiki Strategist
Wikimedia Foundation
Twitter:@glamwiki/@sadads
[1] see the update from Gordibach:
https://meta.wikimedia.org/wiki/Grants_talk:IdeaLab/Wikidata_Paintbrush
[2] I am also keenly aware that any project piloting this kind of data
enrichment will require a fair amount of Commons and/or Wikidata community
discussion about data quality and its use.
Learn more about how the communities behind Wikipedia, Wikidata and other
Wikimedia projects partner with cultural heritage organizations:
http://glamwiki.org
One of this month's WMF research showcase presentations is by Andrew Su of
the Scripps Research Institute, the coordinator of Gene Wiki
<https://en.wikipedia.org/wiki/Portal:Gene_Wiki>.
---------- Forwarded message ----------
From: Sarah R <srodlund(a)wikimedia.org>
Date: Mon, Aug 21, 2017 at 3:22 PM
Subject: [Analytics] Research Showcase Wednesday, August 23, 2017 at 11:30
AM (PST) 18:30 UTC
To: wikimedia-l(a)lists.wikimedia.org, analytics(a)lists.wikimedia.org,
wiki-research-l(a)lists.wikimedia.org
Hi Everyone,
The next Research Showcase will be live-streamed this Wednesday, August 23,
2017 at 11:30 AM (PST) 18:30 UTC.
YouTube stream: https://www.youtube.com/watch?v=Fa0Ztv2iF4w
As usual, you can join the conversation on IRC at #wikimedia-research. And,
you can watch our past research showcases here
<https://www.mediawiki.org/wiki/Wikimedia_Research/Showcase#August_2017>.
This month's presentation:
Sneha Narayan (Northwestern University)
*The Wikipedia Adventure: Field Evaluation of an Interactive Tutorial for
New Users*
Integrating new users into a community with complex norms presents a
challenge for peer production projects like Wikipedia. We present The
Wikipedia Adventure (TWA): an interactive tutorial that offers a structured
and gamified introduction to Wikipedia. In addition to describing the
design of the system, we present two empirical evaluations. First, we
report on a survey of users, who responded very positively to the tutorial.
Second, we report results from a large-scale invitation-based field
experiment that tests whether using TWA increased newcomers' subsequent
contributions to Wikipedia. We find no effect of either using the tutorial
or of being invited to do so over a period of 180 days. We conclude that
TWA produces a positive socialization experience for those who choose to
use it, but that it does not alter patterns of newcomer activity. We
reflect on the implications of these mixed results for the evaluation of
similar social computing systems.
Andrew Su (Scripps Research Institute)
*The Gene Wiki: Using Wikipedia and Wikidata to organize biomedical
knowledge*
The Gene Wiki project began in 2007 with the goal of creating a
collaboratively-written, community-reviewed, and continuously-updated
review article for every human gene within Wikipedia. In 2013, shortly
after the creation of the Wikidata project, the project expanded to include
the organization and integration of structured biomedical data. This talk
will focus on our current and future work, including efforts to encourage
contributions from biomedical domain experts, to build custom applications
that use Wikidata as the back-end knowledge base, and to promote
CC0-licensing among biomedical knowledge resources. Comments, feedback and
contributions are welcome at https://github.com/SuLab/genewikicentral and
https://www.wikidata.org/wiki/WD:MB.
Kindly,
Sarah R. Rodlund
Senior Project Coordinator-Product & Technology, Wikimedia Foundation
srodlund(a)wikimedia.org
_______________________________________________
Analytics mailing list
Analytics(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/analytics
--
Jonathan T. Morgan
Senior Design Researcher
Wikimedia Foundation
User:Jmorgan (WMF) <https://meta.wikimedia.org/wiki/User:Jmorgan_(WMF)>