Hi! I'm Felipe Hoffa at Google, and I've been playing with Wikidata's data
in BigQuery:
https://twitter.com/felipehoffa/status/705068002522304512
(thx Denny for the introduction to all things Wikidata!)
It's all very early, but I wanted to share some results, and ask for advice
on how to continue.
The best news about Wikidata in BigQuery: You can process the whole raw
JSON dump in about 7 seconds:
SELECT MIN(LENGTH(item))
FROM [fh-bigquery:wikidata.latest_raw]
WHERE LENGTH(item)>5
(the shortest element in Wikidata is 102 characters; the LENGTH(item)>5
condition filters out the first and last rows of the dump file, which are
just square brackets)
You can also parse each JSON record on the fly:
SELECT JSON_EXTRACT_SCALAR(item, '$.id')
FROM [fh-bigquery:wikidata.latest_raw]
WHERE LENGTH(item)=102
(4 seconds, the shortest element is https://www.wikidata.org/wiki/Q2307693)
Or to find cats:
SELECT JSON_EXTRACT_SCALAR(item, '$.id') id,
JSON_EXTRACT_SCALAR(item, '$.sitelinks.enwiki.title') title,
JSON_EXTRACT_SCALAR(item, '$.labels.en.value') label,
item
FROM [fh-bigquery:wikidata.latest_raw]
WHERE JSON_EXTRACT_SCALAR(item,
'$.claims.P31[0].mainsnak.datavalue.value.numeric-id')='146' #cats
AND LENGTH(item)>10
LIMIT 300
(Wikidata has 54 cats)
SQL is very limited though - how about running some JavaScript inside SQL?
Here I'm looking for Japanese and Arabic cats, and URL encoding their links:
https://github.com/fhoffa/code_snippets/blob/master/wikidata/find_cats_japa…
(25 links to the Japanese and Arabic Wikipedia)
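The gist link above is truncated, so here is a minimal sketch of what the per-row logic of such a legacy-BigQuery JavaScript UDF might look like. This is my own illustration, not the code from the gist: the function name (extractCatLinks) and output columns (id, wiki, title) are assumptions, and in BigQuery the function would additionally be registered via bigquery.defineFunction:

```javascript
// Hypothetical sketch of the core of a legacy-SQL BigQuery UDF that scans
// raw Wikidata JSON rows for cats (P31 -> Q146) and URL-encodes the titles
// of the Japanese and Arabic sitelinks. Names are illustrative only.
function extractCatLinks(row, emit) {
  var item;
  try {
    // dump lines may carry a trailing comma; strip it before parsing,
    // and skip unparseable rows (the "[" / "]" lines of the dump)
    item = JSON.parse(row.item.replace(/,\s*$/, ''));
  } catch (e) { return; }
  var claims = item.claims && item.claims.P31;
  if (!claims) return;
  var isCat = claims.some(function (c) {
    var v = c.mainsnak && c.mainsnak.datavalue && c.mainsnak.datavalue.value;
    return !!(v && v['numeric-id'] === 146);
  });
  if (!isCat) return;
  ['jawiki', 'arwiki'].forEach(function (wiki) {
    var link = item.sitelinks && item.sitelinks[wiki];
    if (link) {
      emit({ id: item.id, wiki: wiki, title: encodeURIComponent(link.title) });
    }
  });
}
```

The function body is plain JavaScript, so it can be tested locally before wiring it into a query.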
Now that I have full control of each element with JavaScript, I can create
a more traditional relational table, with nested elements, that only
contains Wikidata items that have a page in the English Wikipedia:
https://github.com/fhoffa/code_snippets/blob/master/wikidata/create_wiki_en…
(Wikidata has ~20M rows, while my "English Wikidata" has ~6M)
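For reference, the raw-to-relational mapping could be sketched roughly as below. This is an assumption-laden sketch, not the script from the gist: the column names (en_wiki, label, instance_of, gender) are inferred from the queries later in this post, and the underscore/URL-encoding of titles is my guess at what makes the pageviews JOIN work:

```javascript
// Hypothetical per-item mapper: raw Wikidata JSON -> one nested relational
// row, keeping only items that have an English Wikipedia page.
function toEnRow(rawJson) {
  var item = JSON.parse(rawJson.replace(/,\s*$/, '')); // tolerate trailing comma
  var enwiki = item.sitelinks && item.sitelinks.enwiki;
  if (!enwiki) return null; // drop items without an English Wikipedia page

  // collect the numeric-ids of a property's claims as a repeated column
  function numericIds(prop) {
    return ((item.claims && item.claims[prop]) || []).map(function (c) {
      var v = c.mainsnak && c.mainsnak.datavalue && c.mainsnak.datavalue.value;
      return v && v['numeric-id'] != null ? { numeric_id: v['numeric-id'] } : null;
    }).filter(Boolean);
  }

  return {
    id: item.id,
    // spaces -> underscores, then URL-encode, to match pageview titles
    en_wiki: encodeURIComponent(enwiki.title.replace(/ /g, '_')),
    label: item.labels && item.labels.en ? item.labels.en.value : null,
    instance_of: numericIds('P31'), // nested/repeated
    gender: numericIds('P21')       // nested/repeated
  };
}
```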
With this new table, I can write simpler queries that ask questions like
"who has female and male genders assigned on Wikidata":
SELECT en_wiki,
  GROUP_CONCAT(UNIQUE(STRING(gender.numeric_id))) WITHIN RECORD genders
FROM [fh-bigquery:wikidata.latest_en_v1]
OMIT RECORD IF (EVERY(gender.numeric_id!=6581072) OR
  EVERY(gender.numeric_id!=6581097))
(33 records, and some look like they shouldn't have been assigned both
genders)
Finally, why did I URL encode the titles to the English Wikipedia? So I can
run JOINs with the Wikipedia pageviews dataset to find out the most visited
cats (or movies?):
SELECT en_wiki, SUM(requests) requests
FROM [fh-bigquery:wikipedia.pagecounts_201602] a
JOIN (
SELECT en_wiki
FROM [fh-bigquery:wikidata.latest_en_v1]
WHERE instance_of.numeric_id=146
) b
ON a.title=b.en_wiki
WHERE language='en'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 100
(13 seconds, Grumpy Cat got 19,342 requests in February)
Or, to process far less data, a JOIN that only looks at the top 365k pages
of the English Wikipedia:
SELECT en_wiki, SUM(requests) requests
FROM [fh-bigquery:wikipedia.pagecounts_201602_en_top365k] a
JOIN (
SELECT en_wiki
FROM [fh-bigquery:wikidata.latest_en_v1]
WHERE instance_of.numeric_id=146
) b
ON a.title=b.en_wiki
GROUP BY 1
ORDER BY 2 DESC
LIMIT 100
(15 seconds, same answers, but only 14 of the 39 cats are in the top 365k
pages)
What I need help with:
- Advice, feedback?
- My "raw to table" JavaScript code is incomplete and not very pretty -
which columns would you want extracted?
https://github.com/fhoffa/code_snippets/blob/master/wikidata/create_wiki_en…
Try it out... it's free (up to a replenishing monthly limit), and I wrote
instructions to get started while at the last Wikimania in Mexico:
https://www.reddit.com/r/bigquery/comments/3dg9le/analyzing_50_billion_wiki…
Thanks, hopefully this is useful to the Wikidata community.
--Felipe Hoffa
https://twitter.com/felipehoffa
Apologies for cross-posting.
========================================================
CALL FOR WORKSHOP AND TUTORIAL PROPOSALS
20th International Conference on Knowledge Engineering and Knowledge
Management (EKAW 2016)
Workshop and Tutorial days: 19-20 November 2016, Bologna, Italy
Proposal submission: May 25, 2016
Web site: http://ekaw2016.cs.unibo.it/
========================================================
The International Conference on Knowledge Engineering and Knowledge
Management (EKAW) is concerned with all aspects of eliciting, acquiring,
modeling and managing knowledge, as well as its role in the construction of
knowledge-intensive systems and services for the semantic web, knowledge
management, e-business, natural language processing, intelligent
information integration, etc.
Besides a research track, EKAW will host a number of workshops and tutorials
on topics related to the theme of the conference. We hope our workshops will
provide an informal setting where participants have the opportunity to
discuss specific technical topics in an atmosphere that fosters the active
exchange of ideas, and that our tutorials will enable attendees to fully
appreciate current issues, main schools of thought, and possible application
areas.
== TOPICS OF INTEREST ==
In order to meet these goals, workshop/tutorial proposals should address
topics that satisfy the following criteria:
- the topic falls in the general scope of EKAW 2016
(http://ekaw2016.cs.unibo.it/?q=callforpapers),
- there is a clear focus on a specific technology, problem or
application, and
- there is a sufficiently large community interested in the topic.
== SUBMISSION GUIDELINES ==
Proposals should be submitted via EasyChair, which will be made available
on EKAW’s web site shortly (http://ekaw2016.cs.unibo.it/). Submissions
should be a single PDF file of no more than 5 pages, specifying "Workshop
Proposal" or "Tutorial Proposal", and should contain the following
information.
Workshop proposals:
- Title.
- Abstract (200 words).
- Motivation on why the topic is of particular interest at this time
and its relation to the main conference topics.
- Workshop format, discussing the mix of events such as paper
presentations, invited talks, panels, and general discussion.
- Intended audience and expected number of participants.
- List of (potential) members of the program committee (at least 25%
have to be confirmed at the time of the proposal, confirmed participants
should be marked specifically).
- Indication of whether the workshop should be considered for a
half-day or full-day.
- The tentative dates (submission, notification, camera-ready deadline,
etc.)
- Past versions of the workshop, including URLs as well as number of
submissions and acceptance rates.
- Data of the organizers (name, affiliation, email address, homepage)
and short CV.
Additionally
- we strongly advise having more than one organizer, preferably from
different institutions, bringing different perspectives to the workshop
topic.
- we welcome, and will prioritise, workshops with creative structures
and organizations that attract various types of contributions and ensure
rich interactions.
Tutorial proposals:
- Title.
- Abstract (200 words).
- Relation to the conference topics, i.e. why it will be of interest to
the conference attendees.
- If the tutorial, or a very similar tutorial, has been given
elsewhere, explanation of the benefit of presenting it again to the EKAW
community.
- Overview of content, description of the aims, presentation style,
potential/preferred prerequisite knowledge.
- Indication of whether the tutorial should be considered for a
half-day or full-day.
- Intended audience and expected number of participants.
- Audio-visual or technical requirements and any special room
requirements (for hands-on sessions, any software needed and download sites
must be provided by the tutorial presenters).
- Data of the presenters (name, affiliation, email address, homepage)
and short CV including also their expertise, experiences in teaching and in
tutorial presentation.
== WORKSHOP ORGANIZERS RESPONSIBILITIES ==
The organizers of accepted workshops are expected to:
- prepare a workshop webpage (linked to the official EKAW website)
containing the call for papers and detailed information about the workshop
organization and timelines.
- be responsible for the workshop publicity.
- be responsible for their own reviewing process, decide upon the final
program content and report the number of submissions and accepted papers to
the workshop chair.
- be responsible for publishing electronic proceedings (e.g., on the
CEUR-WS website).
- assure workshop participants are informed they have to register to
the main conference and the workshop.
- schedule, attend and coordinate their entire workshop.
== TUTORIAL ORGANIZERS RESPONSIBILITIES ==
The proposers of accepted tutorials are expected to prepare a tutorial
webpage (linked to the official EKAW website) containing detailed
information about the tutorial, and to distribute material to participants.
== SUBMISSION DATES AND DETAIL ==
Important Dates
- Proposals due: May 25, 2016
- Notifications: June 27, 2016
Suggested Timeline for Workshops
- Workshop website up and calls: July 18, 2016
- Deadline to submit Papers to Workshops: September 15, 2016
- Acceptance of Papers for Workshops: October 6, 2016
- Workshop days: November 19-20, 2016
== CHAIRS ==
- Jun Zhao (University of Oxford)
- Matthew Horridge (Stanford University)
Apologies for cross-posting
========================================================
13th ESWC 2016
http://2016.eswc-conferences.org/call-challenges
Call for Semantic Web Challenges Entries
Open Knowledge Extraction (OKE) Challenge
Challenge on Semantic Sentiment Analysis
Conference Live app Challenge
Open Challenge on Question Answering over Linked Data
Top-K Shortest Path in Large Typed RDF Graphs Challenge
Semantic Publishing Challenge
schema.org - Bonus Challenge
Wikidata - Bonus Challenge
========================================================
OVERVIEW
The 13th ESWC, to be held from May 29th to June 2nd in Heraklion, Crete,
features no less than seven challenges this year!
The purpose of the challenges is to showcase the maturity of state of
the art methods and tools on tasks common to the Semantic Web community
and adjacent disciplines, in a controlled setting involving rigorous
evaluation.
Semantic Web Challenges are an official track of the conference,
ensuring significant visibility for the challenges as well as
participants. Challenge participants are asked to present their
submissions as well as provide a paper describing their work. The
details of the submissions may vary per challenge and will be found in
the individual calls. These papers must undergo a peer-review by experts
relevant to the challenge task, and will be published in the official
ESWC2016 Satellite Events proceedings.
IMPORTANT DATES
Individual challenges may deviate from these dates but as a rule the
following dates apply:
* Training data ready and challenges Calls for Papers sent: Friday
January 15th, 2016
* Challenge papers submission deadline: Monday March 21st, 2016
* Challenge paper reviews due: Tuesday April 5th, 2016
* Notifications sent to participants and invitations to submit task
results: Friday April 8th, 2016
* Camera ready papers due: Sunday April 24th, 2016
CHALLENGES AT A GLANCE
Open Knowledge Extraction (OKE) Challenge
The OKE challenge, launched in its first edition at last year's Extended
Semantic Web Conference, ESWC 2015, has the ambition to provide a
reference framework for research on Knowledge Extraction from text for
the Semantic Web by re-defining a number of tasks (typically from
information and knowledge extraction), taking into account specific SW
requirements.
http://2016.eswc-conferences.org/eswc-16-open-knowledge-extraction-oke-chal…
Challenge on Semantic Sentiment Analysis
Social media evolution has given users an important opportunity for
expressing their thoughts and opinions online. The information thus
produced relates to many different areas, such as commerce, tourism,
education and health, and causes the size of the Social Web to expand
exponentially.
http://2016.eswc-conferences.org/eswc-16-challenge-semantic-sentiment-analy…
Conference Live app Challenge
In the past two years the Extended Semantic Web Conference (ESWC) has
provided a semantic Web application to browse conference data. The
application, called Conference Live, is a Web and mobile application
based on conference data from the Semantic Web Dog Food server, which
provides facilities to browse papers and authors at a specific conference.
http://2016.eswc-conferences.org/eswc-16-conference-live-app-challenge
6th Open Challenge on Question Answering over Linked Data (QALD-6)
The past years have seen a growing amount of research on question
answering over Semantic Web data, shaping an interaction paradigm that
allows end users to profit from the expressive power of Semantic Web
standards while at the same time hiding their complexity behind an
intuitive and easy-to-use interface. The Question Answering over Linked
Data challenge provides an up-to-date benchmark for assessing and
comparing systems that mediate between a user, expressing his or her
information need in natural language, and RDF data.
http://2016.eswc-conferences.org/6th-open-challenge-question-answering-over…
Top-K Shortest Path in Large Typed RDF Graphs Challenge
The advent of SPARQL 1.1 introduced property paths as a new graph
matching paradigm that allows the employment of the Kleene star * (and its
variant +) unary operators to build SPARQL queries that are agnostic of
the underlying RDF graph structure. The ability to express path patterns
that are agnostic of the underlying graph structure is certainly a step
forward.
http://2016.eswc-conferences.org/top-k-shortest-path-large-typed-rdf-graphs…
Semantic Publishing Challenge 2016 – Assessing the Quality of Scientific
Output in its Ecosystem
This is the next iteration of the successful Semantic Publishing
Challenge of ESWC 2014 and 2015. We continue pursuing the objective of
assessing the quality of scientific output, evolving the dataset
bootstrapped in 2014 and 2015 to take into account the wider ecosystem
of publications.
http://2016.eswc-conferences.org/assessing-quality-scientific-output-its-ec…
schema.org - Bonus Challenge
Rather than create a separate schema.org challenge, we encourage where
appropriate submissions to other ESWC2016 challenges to consider also
exploring schema.org's relationship with Linked Data and Semantic Web
tools, technologies, vocabularies and datasets.
http://2016.eswc-conferences.org/bonus-challenge
Wikidata - Bonus Challenge
Wikidata is the largest free and open general purpose knowledge base in
the world, collecting a wide variety of common and specialized knowledge
in a machine-readable form. Wikimedia projects like Wikipedia make use
of the data to enrich their articles. Anyone else is equally welcome to
use the data in Wikidata to enrich their applications or do research,
for example.
Over the past 3 years, Wikidata has grown rapidly and built a great
community around structured knowledge.
The purpose of this additional challenge is to explore ways of closing
key gaps in Wikidata or between Wikidata and the Linked Data and
Semantic Web community.
First we offer a brief background on Wikidata and its current key gaps,
then we outline how this relates to this year's set of challenges.
http://2016.eswc-conferences.org/wikidata-challenge
CONTACT
ESWC 2016 Challenge Chairs
* Stefan Dietze, L3S Research Center, Germany (dietze(a)l3s.de)
* Anna Tordai, Elsevier, Netherlands (a.tordai(a)elsevier.com)
--
Prof. Dr. Heiko Paulheim
Data and Web Science Group
University of Mannheim
Phone: +49 621 181 2661
B6, 26, Room C1.09
D-68159 Mannheim
Mail: heiko(a)informatik.uni-mannheim.de
Web: www.heikopaulheim.com
Hi,
There is a performance issue with the labelling service. Using labels
makes even simple queries time out. For example this one:
SELECT $p $pLabel
WHERE {
$p wdt:P31 _:bnode .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
} LIMIT 11
The workaround is to use subqueries. For example, the following query
returns immediately:
SELECT $p $pLabel
WHERE {
{ SELECT $p WHERE { $p wdt:P31 _:bnode . } LIMIT 11 }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
I strongly suspect that almost every use of the labelling service could
be rewritten like this (the only exception is when you apply further
query conditions on the label). BlazeGraph should recognize this.
Meanwhile, everybody who uses queries with labels in an application
should rewrite them as above to get the best performance (and reduce
load on the query service ;-).
Cheers,
Markus
--
Markus Kroetzsch
Faculty of Computer Science
Technische Universität Dresden
+49 351 463 38486
http://korrekt.org/
Glad to see an effort that integrates data from both databases!
Marco
On 3/3/16 13:00, wikidata-request(a)lists.wikimedia.org wrote:
> Date: Wed, 02 Mar 2016 22:00:03 +0000
> From: Denny Vrandečić<vrandecic(a)gmail.com>
> To: "Discussion list for the Wikidata project."
> <wikidata(a)lists.wikimedia.org>
> Subject: Re: [Wikidata] nice
> Message-ID:
> <CAJVtBfcZpTeMpaXobe-Zw4zC4SakNbNxiMKY6-e3qL60MD-GOQ(a)mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> (and to make it clear, it is unclear whether this is an error due to
> DBpedia or due to the companies extraction framework, I was not diving into
> the data)
>
> On Wed, Mar 2, 2016 at 1:59 PM Denny Vrandečić <vrandecic(a)gmail.com> wrote:
>
>> Depends how good the DBpedia data really is - as the BBC article says,
>> some 2007 football match in the UK was extracted as a "Battle"...
>>
>> On Wed, Mar 2, 2016 at 1:54 PM Daniel Kinzler <daniel.kinzler(a)wikimedia.de> wrote:
>>
>>> "They found 12,703 battles which had an exact location and date, 2,657 of them
>>> are from Wikidata, the others are from DBpedia."
>>>
>>> Maybe we can do better?
>>>
>>> On 02.03.2016 at 22:14, Lydia Pintscher wrote:
>>>> On Wed, Mar 2, 2016 at 8:14 PM Gerard Meijssen <gerard.meijssen(a)gmail.com> wrote:
>>>>
>>>>> Hoi,
>>>>> Yup I missed that one.. this [1] was my source :)
>>>>> Gerard
>>>>>
>>>>> [1] http://www.bbc.com/news/magazine-35685889
>>>>
>>>> This is really great. I am thrilled about this because this isn't coverage
>>>> about Wikidata but coverage _with_ Wikidata on major news sites for the
>>>> second time this week
>>>> (http://www.faz.net/aktuell/feuilleton/kino/academy-awards-die-oscars-von-19…
>>>> being the other one). They're using Wikidata data to do meaningful
>>>> reporting. Our data and the project as a whole got (at the very least)
>>>> good enough for this. It feels to me like we've broken through a wall.
>>>> High5 everyone! :D
>>>>
>>>> Cheers
>>>> Lydia
>>>> --
>>>> Lydia Pintscher - http://about.me/lydia.pintscher
>>>> Product Manager for Wikidata
>>>>
>>>> Wikimedia Deutschland e.V.
>>>> Tempelhofer Ufer 23-24
>>>> 10963 Berlin
>>>> www.wikimedia.de
>>>>
>>>> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>>>>
>>>> Registered in the register of associations of the Amtsgericht
>>>> Berlin-Charlottenburg under number 23855 Nz. Recognized as a non-profit
>>>> by the Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.
>>>>
>>>> _______________________________________________
>>>> Wikidata mailing list
>>>> Wikidata(a)lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>> --
>>> Daniel Kinzler
>>> Senior Software Developer
>>>
>>> Wikimedia Deutschland
>>> Gesellschaft zur Förderung Freien Wissens e.V.
>>>
>>> _______________________________________________
>>> Wikidata mailing list
>>> Wikidata(a)lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
Hello Wikidata community!

Wikidata is a great platform for collecting information, and the
high-quality work of many authors yields very reliable information. Still,
a challenge for users of Wikidata is that there is no way to see whether
*all* data on a certain topic is in Wikidata. For instance, it is easy to
see that Malia and Sasha are children of Obama, but there is no way to
state that these are all his children. More generally, Wikidata stores
many facts, but no information about the topics for which it contains all
facts.

Today we are happy to share with you a prototype that allows adding and
managing such completeness information, and we would be happy to get your
feedback on how useful you consider this tool, or where you see space for
improvements.

With our prototype, called COOL-WD (Completeness Tool for Wikidata), one can:
1. See completeness statements for Wikidata facts
2. Add, remove, aggregate and filter completeness statements
3. See how completeness statements allow conclusions about the
completeness of SPARQL queries over Wikidata

COOL-WD is available at http://cool-wd.inf.unibz.it/ and a 3-min demo
video can be found at http://cool-wd.inf.unibz.it/coolwd-hd.mp4
It employs various libraries, most importantly GWT, Apache Jena, SQLite
and the Wikidata API.

The formal background and description of the tool, including an indexing
technique for completeness statements, have been accepted as a research
paper at ICWE 2016 (http://icwe2016.inf.usi.ch/), available for download
at http://bit.ly/1VOsRCH

Below are some ideas of how completeness could be useful to users:

Use Case 1: Rido is a geographer who would like to contribute to Wikidata
about the administrative divisions of regions. He cares deeply about data
quality, especially data completeness, and is collaborating with Simon,
another geographer. However, when completing data on Wikidata, there is
currently no way to mark which data is complete. Rido and Simon must make
these notes about completeness manually in, say, a Google Doc. Worse
still, the effort by Rido and Simon to complete data cannot be appreciated
by Wikidata users, since to the users' eyes there is no difference between
complete data and incomplete data on Wikidata.
Demo: Wikidata is complete for all administrative divisions of Saxony
(http://cool-wd.inf.unibz.it/?p=Q1202)

Use Case 2: Jen is a developer of a moviegoer application. She usually
integrates data from multiple sources, including Wikidata. If some movies
on Wikidata have completeness statements, she might optimize her
application to not search other data sources for those movies.
Demo: When her app asks COOL-WD at http://cool-wd.inf.unibz.it/?p=query
for the cast and screenwriters of the movie Before Sunset
(http://cool-wd.inf.unibz.it/?p=Q652186):

SELECT * WHERE {
  wd:Q652186 wdt:P161 ?c .
  wd:Q652186 wdt:P58 ?s
}

her app gets not only the query answers but also the completeness
information of her query.

We are looking forward to your feedback!

Best,
Fariz, Simon, Rido, and Werner
Free University of Bozen-Bolzano, Italy