Thanks @Benjamin for the pointers!
I completely agree with @Tom.
I've also been researching techniques for crowdsourcing micro-tasks,
mostly for NLP activities like frame semantics annotation:
http://www.aclweb.org/anthology/P13-2130
http://ceur-ws.org/Vol-1030/paper-03.pdf
I found that a crowd of paid workers can really make a difference,
even for such difficult and subjective tasks.
So here are my 2 cents to get the best out of it:
1. Take extreme care with quality-check mechanisms: for instance, the
CrowdFlower.com platform has a facility that automatically discards
untrusted workers (a minimal sketch of such a check follows below);
2. The micro-task must be atomic, i.e., it must not contain multiple sub-tasks;
3. UI design is always crucial: use simple words, give clear examples, and
avoid screen scrolling.
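
To make point 1 concrete, here is a minimal sketch of a gold-question check
(the data layout, the worker answers, and the 0.7 accuracy threshold are all
assumptions, not taken from any particular platform):

from collections import defaultdict

# Gold tasks are micro-tasks whose correct answer is known in advance;
# the task IDs and answers below are made up for illustration.
GOLD_ANSWERS = {"task-7": "Male", "task-21": "Other"}
MIN_ACCURACY = 0.7  # assumed threshold for keeping a worker

def trusted_workers(judgments):
    """Return the workers whose accuracy on gold tasks meets the threshold."""
    correct, seen = defaultdict(int), defaultdict(int)
    for worker, task, answer in judgments:
        if task in GOLD_ANSWERS:
            seen[worker] += 1
            correct[worker] += int(answer == GOLD_ANSWERS[task])
    return {w for w in seen if correct[w] / seen[w] >= MIN_ACCURACY}

judgments = [
    ("alice", "task-7", "Male"), ("alice", "task-21", "Other"),
    ("bob", "task-7", "Female"), ("bob", "task-21", "Skip"),
]
print(trusted_workers(judgments))  # {'alice'}: bob failed both gold questions

Judgments from workers outside that set would simply be discarded before
aggregation.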
Cheers!
On 7/18/15 2:00 PM, wikidata-request(a)lists.wikimedia.org wrote:
> Date: Fri, 17 Jul 2015 13:42:55 -0400
> From: Tom Morris<tfmorris(a)gmail.com>
> To: "Discussion list for the Wikidata project."
> <wikidata(a)lists.wikimedia.org>
> Subject: Re: [Wikidata] Freebase is dead, long live :BaseKB
> Message-ID:
> <CAE9vqEF3+xQtkbBiiV0Co2Lz8_HMvKhTh=wHyZ2UP0PK=CmQ(a)mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> 3,000 judgments per person per day sounds high to me, particularly on a
> sustained basis, but it really depends on the type of task. Some of the
> tasks were very simple, with custom high-performance, single-purpose "games"
> designed around them. For example, Genderizer presented a person's
> information and allowed choices of Male, Female, Other, and Skip. Using
> arrow key bindings for the four choices to allow quick selection without
> moving one's hand, pipelining (preloading the next topic in the background),
> and allowing votes to be undone in case of error were all features which
> allowed voters to make choices very quickly.
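
[A rough sketch of the bookkeeping such a tool needs; Genderizer's internals
are not public, so the key-to-choice mapping, prefetch depth, and example data
below are all assumed:]

import queue
import threading

# Assumed key-to-choice mapping; the real tool used arrow keys.
CHOICES = {"left": "Male", "right": "Female", "up": "Other", "down": "Skip"}

class JudgmentSession:
    """Toy model of a rapid-judgment loop: prefetching plus an undo stack."""

    def __init__(self, topics):
        self.prefetch = queue.Queue(maxsize=2)  # next topics load in the background
        self.history = []                       # (topic, choice) pairs, for undo
        threading.Thread(target=self._loader, args=(topics,), daemon=True).start()

    def _loader(self, topics):
        for topic in topics:         # stand-in for fetching topic data over the network
            self.prefetch.put(topic)

    def judge(self, key):
        topic = self.prefetch.get()  # already loaded, so no waiting between votes
        self.history.append((topic, CHOICES[key]))

    def undo(self):
        return self.history.pop() if self.history else None

session = JudgmentSession(["Q42", "Q7259", "Q1339"])
session.judge("left")
session.judge("right")
print(session.undo())    # ('Q7259', 'Female'): the mistaken vote is withdrawn
print(session.history)   # [('Q42', 'Male')]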
>
> The figures quoted in the paper below (18 seconds per judgment) work out to
> more like 1,600 judgments per eight-hour day. They collected 2.3 million
> judgments over the course of a year from 555 volunteers (1.05 million
> judgments) and 84 paid workers (1.25 million).
>
> On Fri, Jul 17, 2015 at 12:35 PM, Benjamin Good<ben.mcgee.good(a)gmail.com>
> wrote:
>
>> They wrote a really insightful paper about how their processes for
>> large-scale data curation worked. Among many other things, they
>> investigated mechanical turk 'micro tasks' versus hourly workers and
>> generally found the latter to be more cost effective.
>>
>> "The Anatomy of a Large-Scale Human Computation Engine"
>> http://wiki.freebase.com/images/e/e0/Hcomp10-anatomy.pdf
>>
> The full citation, in case someone needs to track it down, is:
>
> Kochhar, Shailesh, Stefano Mazzocchi, and Praveen Paritosh. "The anatomy of
> a large-scale human computation engine." *Proceedings of the ACM SIGKDD
> Workshop on Human Computation*. ACM, 2010.
>
> There's also a slide presentation by the same name which presents some
> additional information:
> http://www.slideshare.net/brixofglory/rabj-freebase-all-5049845
>
> Praveen Paritosh has written a number of papers on the topic of human
> computation, if you're interested in that (I am!):
> https://scholar.google.com/citations?user=_wX4sFYAAAAJ&hl=en&oi=sra
--
Marco Fossati
http://about.me/marco.fossati
Twitter: @hjfocs
Skype: hell_j
Hey folks :)
I've just added a meetup for us at Wikimania:
https://wikimania2015.wikimedia.org/wiki/Wikidata_Meetup
Hope to see many of you there.
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Society for the Promotion of Free Knowledge (e. V.)
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognised as a non-profit by the
Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
In our wiki discussions, I increasingly see people linking Wikidata
items when wishing to explain a concept (as one would do on a Wiktionary
or Wikipedia entry) without "[[d:Q169207|forcing]]" the multilingual
audience to a specific target language.
Am I the only one seeing this trend? Do we all think it's a good thing?
(Though a minor internal one.)
To be upfront, I ask because I face unexpected opposition to the
adoption of this custom elsewhere: https://phabricator.wikimedia.org/T89719
Nemo
Hi all
We are planning on holding a meetup[1
<https://wikimania2015.wikimedia.org/wiki/Wiktionary-Wikidata_Meetup>] at
Wikimania for discussing Wiktionary-Wikidata/Wikibase and its place in the
Linked Open Data world.
The recent proposal[2
<https://www.wikidata.org/wiki/Wikidata:Wiktionary/Development/Proposals/201…>]
has again made this a current topic and we believe a meetup at Wikimania
would be a great opportunity to discuss this and related issues. It would
therefore be especially interesting if people from both projects would want
to show up.
In addition to looking at how Wiktionary will be integrated into Wikidata,
we are also interested in seeing how a structured Wiktionary fits into the
greater world of open data. Our personal background for this is
a collaboration between Wikimedia Sverige and the Swedish Centre for
Terminology[3 <http://tnc.se/the-swedish-centre-for-terminology.html>] who
are looking at making their resources available as Linked Open Data and
thinking about how Wikidata could fit as a node in this, and also about
whether Wikibase might be a suitable platform. That said, we would be
surprised if that were the only example out there. If you've had similar
thoughts or know of other connections, we would love to hear about them.
The meetup will take place on Friday 17 July 17:30-20:00. Location TBD.
Sign up on the meetup page[1
<https://wikimania2015.wikimedia.org/wiki/Wiktionary-Wikidata_Meetup>] if
you are interested in attending and post a note on the discussion page if
there is something specific you have been thinking about.
Regards,
André Costa / Lokal_Profil
P.S. Sorry for the short notice
[1] https://wikimania2015.wikimedia.org/wiki/Wiktionary-Wikidata_Meetup
[2]
https://www.wikidata.org/wiki/Wikidata:Wiktionary/Development/Proposals/201…
[3] http://tnc.se/the-swedish-centre-for-terminology.html
André Costa | GLAM-tekniker, Wikimedia Sverige | Andre.Costa(a)wikimedia.se |
+46 (0)733-964574
Support free knowledge, become a member of Wikimedia Sverige.
Read more at blimedlem.wikimedia.se
>> Note that Freebase did a lot of human curation and we know they could get
>> about 3000 verifications of facts by "non-experts" a day who were paid for
>> their efforts. That scales out to almost a million facts per FTE per year.
Where can I find out more about how they were able to do such high-volume
human curation? 3000/day is a huge number.
On Thu, Jul 16, 2015 at 5:01 AM, <wikidata-request(a)lists.wikimedia.org>
wrote:
> Date: Wed, 15 Jul 2015 15:25:27 -0400
> From: Paul Houle <ontology2(a)gmail.com>
> To: "Discussion list for the Wikidata project."
> <wikidata-l(a)lists.wikimedia.org>
> Subject: [Wikidata] Freebase is dead, long live :BaseKB
> Message-ID:
> <
> CAE__kdQt55E7k7xHMeuBCu9QrwRKoMU_60NDuYgcTHNkC7DFHA(a)mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> For those who are interested in the project of getting something out of
> Freebase for use in Wikidata or somewhere else, I'd like to point out
>
> http://basekb.com/gold/
>
> this is a completely workable solution for running queries out of Freebase
> after the MQL API goes dark.
>
> I have been watching the discussion about the trouble of moving Freebase
> data to Wikidata, and I'd like to share some thoughts.
>
> First, quality is in the eye of the beholder, and if somebody defines
> quality as a matter of citing your sources, then that is their definition
> of 'quality' and they can attain it. You might have some other definition
> of quality and be appalled that Wikidata has so little to say about a topic
> that has caused much controversy and suffering:
>
> https://www.wikidata.org/wiki/Q284451
>
> There are ways to attain that too.
>
> Part of the answer is that different products are going to be used in
> different places. For instance, one person might need 100% coverage of
> books he wants to talk about, another one might want a really great
> database of ski areas, etc.
>
> Note that Freebase did a lot of human curation and we know they could get
> about 3000 verifications of facts by "non-experts" a day who were paid for
> their efforts. That scales out to almost a million facts per FTE per year.
>
>
>
> --
> Paul Houle
>
> *Applying Schemas for Natural Language Processing, Distributed Systems,
> Classification and Text Mining and Data Lakes*
>
> (607) 539 6254 paul.houle on Skype ontology2(a)gmail.com
> https://legalentityidentifier.info/lei/lookup/
> <http://legalentityidentifier.info/lei/lookup/>
>
Hi.
I recently started following mediawiki/extensions/Wikibase on Gerrit,
and was quite astonished to find that nearly all of the 100 most recently
updated changes appear to be owned by WMDE employees (the exceptions being
one change by Legoktm and some from L10n-bot). This is not the case, for
example, with mediawiki/core.
While this may be desired by the Wikidata team for corporate reasons, I
feel that encouraging code review by volunteers would empower both
Wikidata and third-party communities with new ways of contributing to
the project and raise awareness of the development team's goals in the
long term.
The messy naming conventions play a role too: for example, Extension:Wikibase
<https://www.mediawiki.org/w/index.php?title=Extension:Wikibase&redirect=no>
is supposed to host technical documentation but instead redirects to the
Wikibase <https://www.mediawiki.org/wiki/Wikibase> portal, with actual
documentation split into Extension:Wikibase Repository
<https://www.mediawiki.org/wiki/Extension:Wikibase_Repository> and
Extension:Wikibase Client
<https://www.mediawiki.org/wiki/Extension:Wikibase_Client>, apparently
ignoring the fact that the code is actually developed in a single
repository (correct me if I'm wrong). Just to add some more confusion,
there's also Extension:Wikidata build
<https://www.mediawiki.org/wiki/Extension:Wikidata_build> with no
documentation.
And what about wmde on GitHub <https://github.com/wmde> with countless
creatively-named repos? They make life even harder for potential
contributors.
Finally, the ever-changing client-side APIs make gadget development a
pain in the ass.
Sorry if this sounds like a slap in the face, but it had to be said.
For those who are interested in the project of getting something out of
Freebase for use in Wikidata or somewhere else, I'd like to point out
http://basekb.com/gold/
this is a completely workable solution for running queries out of Freebase
after the MQL API goes dark.
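
As a sketch of what such a query could look like once a :BaseKB dump has been
loaded into a local triple store (the endpoint URL, the Fuseki-style setup,
and the exact namespace prefix below are assumptions; adjust them to your
installation):

from SPARQLWrapper import SPARQLWrapper, JSON

# Assumed local SPARQL endpoint with a :BaseKB dump loaded,
# e.g. via Apache Jena Fuseki.
sparql = SPARQLWrapper("http://localhost:3030/basekb/query")
sparql.setReturnFormat(JSON)

# List the English names of a handful of topics; the Freebase-style
# namespace prefix may differ in your copy of :BaseKB.
sparql.setQuery("""
    PREFIX ns: <http://rdf.basekb.com/ns/>
    SELECT ?topic ?name WHERE {
        ?topic ns:type.object.name ?name .
        FILTER(lang(?name) = "en")
    } LIMIT 10
""")

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["topic"]["value"], row["name"]["value"])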
I have been watching the discussion about the trouble of moving Freebase data
to Wikidata, and I'd like to share some thoughts.
First, quality is in the eye of the beholder, and if somebody defines
quality as a matter of citing your sources, then that is their definition
of 'quality' and they can attain it. You might have some other definition
of quality and be appalled that Wikidata has so little to say about a topic
that has caused much controversy and suffering:
https://www.wikidata.org/wiki/Q284451
There are ways to attain that too.
Part of the answer is that different products are going to be used in
different places. For instance, one person might need 100% coverage of
books he wants to talk about, another one might want a really great
database of ski areas, etc.
Note that Freebase did a lot of human curation and we know they could get
about 3000 verifications of facts by "non-experts" a day who were paid for
their efforts. That scales out to almost a million facts per FTE per year.
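(Back-of-the-envelope, assuming roughly 300 working days a year: 3,000
verifications/day x 300 days ≈ 900,000 facts per full-time worker per year,
hence "almost a million".)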
--
Paul Houle
*Applying Schemas for Natural Language Processing, Distributed Systems,
Classification and Text Mining and Data Lakes*
(607) 539 6254 paul.houle on Skype ontology2(a)gmail.com
https://legalentityidentifier.info/lei/lookup/
<http://legalentityidentifier.info/lei/lookup/>
We currently rely on the Wikidata Query API to identify whether or not a
set of claims exists on a given property. Some of our previous bot runs have
created duplicates because recent additions hadn't made it to the WDQ API yet.
In our efforts to prevent the creation of duplicate entries, I am trying to
better understand the WDQ API.
The WDQ API documentation [1] states: "Also, the data used here is from
WikiData 'dumps', so it can be a few hours old." However, when I check the
data dumps, they are updated either weekly (JSON dumps) or daily
(incremental XML dumps) [2].
Also, sometimes the WDQ API seems to pick up newly added claims instantly,
in the sense that they are immediately available through the WDQ API.
How often is the WDQ API really updated? Is it possible to query Wikidata
live with WDQ, and if not, are there alternatives that would allow this?
Regards,
Andra Waagmeester
[1] https://wdq.wmflabs.org/api_documentation.html
[2] https://www.wikidata.org/wiki/Wikidata:Database_download
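
One possible alternative for the duplicate check above is to ask the live
wikidata.org API directly instead of WDQ, so there is no dump lag. A minimal
sketch using the wbgetclaims module (the item, property, and value are just
examples, and the exact shape of the datavalue should be double-checked):

import requests

API = "https://www.wikidata.org/w/api.php"

def has_claim(item_id, property_id, target_qid):
    """Ask the live API whether an item already carries a given item-valued claim."""
    params = {
        "action": "wbgetclaims",
        "entity": item_id,
        "property": property_id,
        "format": "json",
    }
    claims = requests.get(API, params=params).json().get("claims", {})
    for claim in claims.get(property_id, []):
        value = claim["mainsnak"].get("datavalue", {}).get("value", {})
        if not isinstance(value, dict):
            continue  # non-item datavalues (strings, times, ...) cannot match a QID
        # Item values carry a numeric-id (and, in newer responses, an "id" like "Q5").
        if value.get("id") == target_qid or value.get("numeric-id") == int(target_qid[1:]):
            return True
    return False

# Example: does Douglas Adams (Q42) already state "instance of (P31) human (Q5)"?
print(has_claim("Q42", "P31", "Q5"))  # True, straight from the live database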