Wikidata September 2015

wikidata@lists.wikimedia.org

62 participants
50 discussions

[ANNOUNCEMENT] first StrepHit dataset for the primary sources tool
by Marco Fossati 07 Sep '15

[Begging pardon if you have already read this in the Wikidata project chat]

Hi everyone,

As Wikidatans, we all know how much data quality matters. We all know what high quality stands for: statements need to be validated via references to external, non-wiki, sources.

That's why the primary sources tool is being developed:
https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool
And that's why I am preparing the StrepHit IEG proposal:
https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Va…

StrepHit (pronounced "strep hit", means "Statement? repherence it!") is a Natural Language Processing pipeline that understands human language, extracts structured data from raw text and produces Wikidata statements with reference URLs.

As a demonstration to support the IEG proposal, you can find the **FBK-strephit-soccer** dataset uploaded to the primary sources tool backend. It's a small dataset serving the soccer domain use case. Please follow the instructions on the project page to activate it and start playing with the data.

What is the biggest difference that sets StrepHit datasets apart from the currently uploaded ones? At least one reference URL is always guaranteed for each statement. This means that if StrepHit finds some new statement that was not there in Wikidata before, it will always propose its external references. We do not want to manually reject all the new statements with no reference, right?

If you like the idea, please endorse the StrepHit IEG proposal!

Cheers,
--
Marco Fossati
http://about.me/marco.fossati
Twitter: @hjfocs
Skype: hell_j

Re: [Wikidata] [Wikimedia-l] LsJbot and geonames
by Gerard Meijssen 06 Sep '15

Hoi,

PLEASE reconsider. A Wikidata based solution is not superior because it started from Wikidata.

PLEASE consider collaboration. It will be so much more powerful when LSJBOT and people at Wikidata collaborate. It will get things right the first time. It does not have to be perfect from the start as long as it gets better over time. As long as we always work on improving the data.

PLEASE consider text generation based on Wikidata. They are the scripts LSJBOT uses, they can help us improve the text when more or better information becomes available.

Thanks,
GerardM

On 6 September 2015 at 08:25, Ricordisamoa <ricordisamoa(a)openmailbox.org> wrote:

> Proper data-based stubs are being worked on:
> https://phabricator.wikimedia.org/project/profile/1416/
> Lsjbot, you have no chance to survive make your time.
>
> On 06/09/2015 02:40, Anders Wennersten wrote:
>
>> Geonames [1] is a database which holds around 9 M entries of
>> geographically related items from all over the world.
>>
>> Lsjbot is now generating articles from a subset of it, after several
>> months of extensive research on its quality, Wikidata relations and
>> notability issues. While the quality in some regions is substandard (and
>> these will not be generated) it was seen as very good in most areas. In
>> the discussion I was intrigued to learn that identical Arabic names should
>> be transcribed differently depending on their geographic location. And I
>> was fascinated by the question of notability of wells in the Bahrain
>> desert (which in the end were excluded, mostly because we knew too little
>> of that reality).
>>
>> In this run Lsjbot has extended its functionality even further than when
>> it generated articles for species. It looks for relevant geographical
>> items close to the actual one: a lake close by, a mountain and where is
>> the nearest major town etc.
>>
>> Macedonia can be taken as one example. Lsjbot generated over 10000
>> articles (and 5000 disambiguation pages), making it an order of magnitude
>> more than what exists in enwp. Also for a well defined type like villages,
>> almost 50% as many have been generated as exist in enwp. One example [2]
>> where you can see what has been generated (and note the reuse of a
>> relevant figure existing in frwp). Please compare the corresponding
>> articles in other languages in this case, many having less information
>> than the bot generated one.
>>
>> The generation is still in an early stage [3] but has already got the
>> article count for svwp to pass 2 M today. But it will take many months
>> more before it is completed and perhaps more M marks will be passed before
>> it is through. If you want to give feedback you are welcome to enter it
>> at [4].
>>
>> Anders
>> (with all credits for the Lsjbot to be given to Sverker, its owner, I am
>> just one of the many supporters of him and his bot on svwp)
>>
>> [1] http://www.geonames.org/about.html
>> [2] https://sv.wikipedia.org/wiki/Polaki_%28ort_i_Makedonien%29
>> [3] https://sv.wikipedia.org/wiki/Kategori:Robotskapade_geografiartiklar
>> [4] https://sv.wikipedia.org/wiki/Anv%C3%A4ndardiskussion:Lsjbot/Projekt_alla_p…
>>
>> _______________________________________________
>> Wikimedia-l mailing list, guidelines at:
>> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
>> Wikimedia-l(a)lists.wikimedia.org
>> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
>> <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>

Re: [Wikidata] [ANNOUNCEMENT] first StrepHit dataset for the primary sources tool
by Marco Fossati 05 Sep '15

Hi Gerard,

Let me add a further reply to your comment.

On 9/5/15 2:01 PM, wikidata-request(a)lists.wikimedia.org wrote:
> Message: 3
> Date: Fri, 4 Sep 2015 19:26:38 +0200
> From: Gerard Meijssen<gerard.meijssen(a)gmail.com>
>
> No.
> Quality is not determined by sources. Sources do lie.
>
> When you want quality, you seek sources where they matter most. It is not
> by going for "all" of them

I completely agree with you that many sources can be flawed. I may have neglected the term "trustworthy" before "sources" and added it in the Wikidata project chat.

The IEG proposal will also include an investigation phase to select a set of authoritative sources, see the first task in the proposal work package:
https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Va…

I'll expand on this.

Cheers,
--
Marco Fossati
http://about.me/marco.fossati
Twitter: @hjfocs
Skype: hell_j

Re: [Wikidata] [ANNOUNCEMENT] first StrepHit dataset for the primary sources tool
by Marco Fossati 05 Sep '15

Dear all,

On 9/5/15 2:01 PM, wikidata-request(a)lists.wikimedia.org wrote:
> Message: 3
> Date: Fri, 4 Sep 2015 19:26:38 +0200
> From: Gerard Meijssen<gerard.meijssen(a)gmail.com>
>
> Quality is not determined by sources. Sources do lie.
>
> When you want quality, you seek sources where they matter most.

Thanks @Gerard for your criticism, let me reply to your concerns. The following references contrast your points. I got inspired by them when developing the idea:
https://www.wikidata.org/wiki/Wikidata:Referencing_improvements_input
http://blog.wikimedia.de/2015/01/03/scaling-wikidata-success-means-making-t…
https://tools.wmflabs.org/wikidata-todo/sourcery.html
https://phabricator.wikimedia.org/T76230
https://phabricator.wikimedia.org/T76232
https://phabricator.wikimedia.org/T76231
https://phabricator.wikimedia.org/T90881

> Message: 4
> Date: Fri, 4 Sep 2015 19:34:22 +0200
> From: Lydia Pintscher<lydia.pintscher(a)wikimedia.de>
>
> Thank you for working on this, Marco. This is a great step forward. I
> wish you good luck for the IEG proposal!

Thanks @Lydia for your encouragement!

Cheers,
--
Marco Fossati
http://about.me/marco.fossati
Twitter: @hjfocs
Skype: hell_j

Naming Conventions for URIs
by Paul Houle 04 Sep '15

Tell me if I am right or wrong about this.

If I am coining a URI for something that has an identifier in an outside system, it is straightforward to append the identifier (possibly modified a little) to a prefix, such as

    http://dbpedia.org/resource/Stellarator

Then you can write

    @prefix dbpedia: <http://dbpedia.org/resource/>

and then refer to the concept (in either Turtle or SPARQL) as dbpedia:Stellarator.

I will take one step further than this and say that for pedagogical and other coding situations, the extra length of prefix declarations is an additional cognitive load on top of all the other cognitive loads of dealing with the system, so in the name of concision you can do something like

    @base <http://dbpedia.org/resource/>
    @prefix : <http://dbpedia.org/ontology/>

and then you can write :someProperty and <Stellarator>, and your queries look very simple.

The production for a QName cannot begin with a number, so it is not correct to write something like

    dbpedia:100

or expect to have the full URI squashed to that. This kind of gotcha will drive newbies nuts, and the realization of RDF as a universal solvent requires squashing many of them. Another example is

    isbn:9971-5-0210-0

If you look at the @base declaration above, you see a way to get around this, because with the base above you can write

    <100>

which works just fine in the dbpedia case.

I like what Wikidata did with using fairly dense sequential integers for the ids, so a Wikidata resource URI looks like

    http://www.wikidata.org/entity/Q4876286

which is always a QName, so you can write

    @base <http://www.wikidata.org/entity/>
    @prefix wd: <http://www.wikidata.org/entity/>

and then you can write

    wd:Q4876286
    <Q4876286>

and it is all fine, because (i) Wikidata added the alpha prefix, (ii) started at the beginning with it, and (iii) made up a plausible explanation for why it is that way. Freebase mids have the same property, so :BaseKB has it too.

I think customers would expect to be able to give us

    isbn:0884049582

and have it just work, but because a number is never valid at the start of the QName, you can encode the URI like this:

    http://isbn.example.com/I0884049582

and then write

    isbn:I0884049582
    <I0884049582>

which is not too bad. Note, however, if you want to write

    <0884049582>

you have to encode as

    http://isbn.example.com/I0884049582

because, at least with the Jena framework, the same thing happens if you write

    @base <http://isbn.example.com/I>

or

    @base <http://isbn.example.com/>

so you can't choose a representation which supports that mode of expression and a :+prefix mode.

Now what bugs me is what to do in the case of something which "might or might not be numeric". What internal prefix would find good acceptability for end users?

--
Paul Houle
*Applying Schemas for Natural Language Processing, Distributed Systems, Classification and Text Mining and Data Lakes*
(607) 539 6254
paul.houle on Skype
ontology2(a)gmail.com

:BaseKB -- Query Freebase Data With SPARQL
http://basekb.com/gold/

Legal Entity Identifier Lookup
https://legalentityidentifier.info/lei/lookup/

Join our Data Lakes group on LinkedIn
https://www.linkedin.com/grp/home?gid=8267275
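The @base behaviour described in this message can be checked with a small sketch. It uses Python's standard `urllib.parse.urljoin` as a stand-in for RFC 3986 relative-reference resolution; it does not call Jena, so it only illustrates why the two @base declarations would resolve a bare identifier the same way. The helper names `resolve` and `qname_safe_local` and the default letter "I" are assumptions made for illustration, following the isbn.example.com example in the message.

```python
from urllib.parse import urljoin


def resolve(base: str, relative_ref: str) -> str:
    """Resolve a relative reference such as <100> against an @base URI.

    urljoin follows the RFC 3986 merge rule: everything after the last
    "/" in the base path is dropped before the reference is appended.
    """
    return urljoin(base, relative_ref)


def qname_safe_local(identifier: str, alpha_prefix: str = "I") -> str:
    """Prepend a letter when the identifier starts with a digit.

    Hypothetical helper: it mirrors the "I0884049582" encoding from the
    message so that the local part can be used after a prefix.
    """
    if identifier[:1].isdigit():
        return alpha_prefix + identifier
    return identifier


if __name__ == "__main__":
    # A numeric local part works as a relative reference against @base.
    print(resolve("http://dbpedia.org/resource/", "100"))

    # Both base declarations resolve <0884049582> to the same URI,
    # because the trailing "I" is dropped during resolution.
    print(resolve("http://isbn.example.com/I", "0884049582"))
    print(resolve("http://isbn.example.com/", "0884049582"))

    # Identifiers that already start with a letter are left alone.
    print(qname_safe_local("0884049582"), qname_safe_local("Q4876286"))
```

The letter is applied only when needed, which is one possible answer to the "might or might not be numeric" case, at the cost of a mixed identifier space.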

[Fwd] Gerrit Cleanup Day: Wed, Sep 23
by Andre Klapper 04 Sep '15

The Wikidata crew is very welcome to join!

andre

-------- Forwarded Message --------
From: Andre Klapper <aklapper(a)wikimedia.org>
To: Wikimedia developers <wikitech-l(a)lists.wikimedia.org>
Subject: Gerrit Cleanup Day: Wed, Sep 23
Date: Tue, 01 Sep 2015 00:27:12 +0200

I'm happy to announce a Gerrit Cleanup Day on Wed, September 23. It's an experiment to reduce Wikimedia's code review backlog, which hurts the growth of our long-term code contributor base.

Development/engineering teams of the Wikimedia Foundation are supposed to join and use the day primarily to review recently submitted open Gerrit changesets without a review, focussing on volunteer contributions. Developers of other organizations and individual developers are of course also very welcome to join and help! :)

https://phabricator.wikimedia.org/T88531 provides more information, steps, and links to Gerrit queries. Note it's still work in progress.

Your questions and feedback are welcome.

Thanks,
andre
--
Andre Klapper | Wikimedia Bugwrangler
http://blogs.gnome.org/aklapper/

Wikidata Toolkit 0.5.0 released
by Markus Kroetzsch 04 Sep '15

Hi all,

I am happy to announce the release of Wikidata Toolkit 0.5.0 [1], the Java library for programming with Wikidata and Wikibase.

The most prominent new feature of this release is Wikibase API support, which allows you to create Java programs that read and write data to Wikidata (or any other Wikibase site). The API write functionality checks the live data before making edits to merge statements for you and to detect edit conflicts. New example programs illustrate this functionality. Overall, we think this will make WDTK interesting for bot authors.

Other prominent features include:

* Unit support (just in time before it is enabled on Wikidata.org ;-)
* Processing of local dump files not downloaded from Wikimedia (useful for other Wikibase users)
* New builder classes to simplify construction of the rather complex data objects we have in Wikidata
* WorldMapProcessor example (the code used to build the Wikidata maps)
* Improved output file naming for examples, taking dump date into account
* Several improvements in RDF export (but the general RDF structure is as in 0.4.0; updating this to the new structure we have for the official SPARQL endpoint is planned for the next release).

Maven users can get the library directly from Maven Central (see [1]); this is the preferred method of installation. It might still take a moment until the new packages become visible in Maven Central. There is also an all-in-one JAR at github [3] and of course the sources [4] and updated JavaDocs [5].

Feedback is very welcome. Developers are also invited to contribute via github.

Cheers,
Markus

[1] https://www.mediawiki.org/wiki/Wikidata_Toolkit
[2] https://www.mediawiki.org/wiki/Wikidata_Toolkit/Client
[3] https://github.com/Wikidata/Wikidata-Toolkit/releases
[4] https://github.com/Wikidata/Wikidata-Toolkit/
[5] http://wikidata.github.io/Wikidata-Toolkit/

--
Markus Kroetzsch
Faculty of Computer Science
Technische Universität Dresden
+49 351 463 38486
http://korrekt.org/

mobile improvements going live soon
by Lydia Pintscher 03 Sep '15

Hey folks :)

As part of his internship with us Bene has been working on making Wikidata more usable on mobile devices. We'll be redirecting users on mobile devices there soon just like it is done on Wikipedia. You can go and have a look at it here: http://m.wikidata.beta.wmflabs.org/wiki/Q15905

Tracking bug: https://phabricator.wikimedia.org/T78430

Remaining issues:
* Only labels, descriptions, aliases and sitelinks editable (via special pages) https://phabricator.wikimedia.org/T95878
* Special pages don't lead to mobile version https://phabricator.wikimedia.org/T103428
* Other languages section currently hidden https://phabricator.wikimedia.org/T91397
* Search currently not working correctly but WIP https://phabricator.wikimedia.org/T85368
* Identifiers not linked as no gadgets work in MobileFrontend https://phabricator.wikimedia.org/T85365 (will be fixed with https://phabricator.wikimedia.org/T95682)
* Too many edit buttons for sitelinks, should be only one for all sitelinks https://phabricator.wikimedia.org/T110901
* Table of contents is too large https://phabricator.wikimedia.org/T110902
* Diffs do not adjust to mobile styling https://phabricator.wikimedia.org/T95883

Cheers
Lydia

--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.

First version for units is ready for testing!
by Lydia Pintscher 01 Sep '15

Hi everyone :)

We've finally done all the groundwork for unit support. I'd love for you to give the first version a try on the test system here: http://wikidata.beta.wmflabs.org/wiki/Q23950

There are a few known issues still, but since this is one of the things holding back Wikidata I made the call to release now and work on these remaining things after that. What I know is still missing:

* We're showing the label of the item of the unit. We should be showing the symbol of the unit in the future. (https://phabricator.wikimedia.org/T77983)
* We can't convert between units yet - we only have the groundwork for it so far. (https://phabricator.wikimedia.org/T77978)
* The items representing often-used units should be ranked higher in the selector. (https://phabricator.wikimedia.org/T110673)
* When editing an existing value you see the URL of the unit's item. This should be replaced by the label. (https://phabricator.wikimedia.org/T110675)
* When viewing a diff of a unit change you see the URL of the unit's item. This should be replaced by the label. (https://phabricator.wikimedia.org/T108808)
* We need to think some more about the automatic edit summaries for unit-related changes. (https://phabricator.wikimedia.org/T108807)

If you find any bugs or if you are missing other absolutely critical things please let me know here or file a ticket on phabricator.wikimedia.org. If everything goes well we can get this on Wikidata next Wednesday.

Cheers
Lydia

--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
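The label-versus-URL issues listed in this message can be pictured with a minimal sketch, assuming a quantity value carries its unit as the URL of the unit's item. The `Quantity` class, its field names, the "1" marker for unitless values, and the `render` helper are assumptions made for illustration; they are not taken from the Wikibase code. The sketch shows a label being displayed when one is known and the raw item URL leaking through when it is not.

```python
from dataclasses import dataclass

# Assumed prefix for item URLs, used only to build the example values.
ENTITY_PREFIX = "http://www.wikidata.org/entity/"


@dataclass(frozen=True)
class Quantity:
    amount: str  # signed decimal string, e.g. "+8848"
    unit: str    # URL of the unit's item, or "1" for a unitless value (assumption)


def render(quantity: Quantity, labels: dict[str, str]) -> str:
    """Render a quantity, preferring the unit item's label over its URL."""
    number = quantity.amount.lstrip("+")
    if quantity.unit == "1":
        return number
    # The item id is the last path segment of the unit URL.
    item_id = quantity.unit.rsplit("/", 1)[-1]
    # Fall back to the raw URL when no label is available.
    return f"{number} {labels.get(item_id, quantity.unit)}"


if __name__ == "__main__":
    labels = {"Q11573": "metre"}  # illustrative label table
    print(render(Quantity("+8848", ENTITY_PREFIX + "Q11573"), labels))
    print(render(Quantity("+8848", ENTITY_PREFIX + "Q11573"), {}))
    print(render(Quantity("+42", "1"), labels))
```

Showing a symbol instead of the label, as the first bullet proposes, would amount to swapping the lookup table in this sketch.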

Semantic MediaWiki Conference Fall 2015: Call for Contributions
by Lia Veja 01 Sep '15

Please feel free to redistribute!

Lia Veja

From: [[kgh] ] <mediawiki(a)kghoffmeyer.de>
Date: Tue, Sep 1, 2015 at 8:43 AM
Subject: Semantic MediaWiki Conference Fall 2015: Call for Contributions
To: Semantic MediaWiki users <semediawiki-user(a)lists.sourceforge.net>, semediawiki-devel(a)lists.sourceforge.net
Cc: Alina Mierlus <alina(a)similis.cc>, Toni Hermoso Pulido <toniher(a)cau.cat>, Lia Veja <cornelia.veja(a)gmail.com>, Karsten Hoffmeyer <karsten(a)hoffmeyer.info>

Dear users, developers and all people interested in semantic wikis,

We are very happy to announce that early bird registration to the 12th Semantic MediaWiki Conference is now open!

Important facts reminder:

Dates: October 28th to October 30th 2015 (Wednesday to Friday)
Location: Fabra i Coats, Art Factory. Carrer Sant Adrià 20 (Sant Andreu), Barcelona, Catalonia, Spain.
Conference page: https://semantic-mediawiki.org/wiki/SMWCon_Fall_2015
Participants: Everybody interested in semantic wikis, especially in Semantic MediaWiki, e.g. users, developers, consultants, business representatives and researchers.

We welcome new contributions from you:

We encourage contributions about applications and development of semantic wikis; for a list of topics, see [1]. Please propose regular talks, posters or workshops on the conference website. We will do our best to consider your proposal in the conference program. An interesting variety of talks has already been proposed, see [2]. Presentations will generally be video and audio recorded and made available for others after the conference. If you've already announced your talk it's now time to expand its description.

News on participation and tutorials:

You can now officially register for the conference [3] and benefit from early bird fees until October 5, 2015. The tutorial program has been announced and made available [4].

Organization:

Amical Wikimedia [5] and Open Semantic Data Association e. V. [6] have become the official organisers of SMWCon Fall 2015. Thanks to Institut de Cultura - Ajuntament de Barcelona [7] for providing free access to the conference location and its infrastructure.

If you have questions you can contact Lia Veja and Karsten Hoffmeyer (Program Chairs), Alina Mierluș (General Chair) or Toni Hermoso (Local Chair) per e-mail (Cc).

We will be happy to see you in Barcelona!

Lia Veja, Karsten Hoffmeyer (Program Board)

[1] <http://semantic-mediawiki.org/wiki/SMWCon_Fall_2015/Announcement>
[2] <https://semantic-mediawiki.org/wiki/SMWCon_Fall_2015#Program_proposals>
[3] <https://ti.to/wikisofia/smwcon2015-fall>
[4] <http://semantic-mediawiki.org/wiki/SMWCon_Fall_2015#Program>
[5] <https://www.wikimedia.cat/>
[6] <https://opensemanticdata.org/>
[7] <http://lameva.barcelona.cat/barcelonacultura/en/>

--
Dr. Cornelia Veja
----------------------------------------------
Deutsches Institut für Internationale Pädagogische Forschung (DIPF)
Schlossstrasse 29; Room 309
60486 Frankfurt am Main
Tel.: +49 (0)69 24708-703
E-Mail: veja(a)dipf.de
Web: www.dipf.de
-----------------------------------------------
Iris SemData Consulting Ltd.
162/76, C. Brancusi Street
400462 Cluj-Napoca (Romania)
Tel: +40-364-439-875
Mobile: +40-723-326-175
E-Mail: Cornelia.Veja(a)gmail.com
Skype ID: liaveja
