Wikidata-tech September 2013

wikidata-tech@lists.wikimedia.org

15 participants
20 discussions

by Jeroen De Dauw

Hey Steffen and Andy,

Continuing what I started on Twitter here, as some more characters might be helpful :)

It seems that both our projects (FLOW3 and Wikidata) are in a similar situation. We are using Gerrit as our CR tool, and TravisCI to run our tests. And we both want to have Travis run tests for all patchsets submitted to Gerrit, and then +1 or -1 on Verified based on the build passing or failing.

To what extent have you gotten such a thing to work on your project? Is there code available anywhere? If both projects can use the same code for this, I'd be happy to contribute to what you already have.

Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--
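
[Editor's sketch] The vote described above (build result in, +1 or -1 on Verified out) might look roughly like the following. This is only an illustration, not code from either project: the helper name and the build URL are hypothetical, and the payload shape assumes the `labels`/`message` input accepted by Gerrit's REST review endpoint. Nothing is sent over the network here.

```python
import json


def verified_vote(build_passed: bool, build_url: str) -> dict:
    """Map a CI build result to a Gerrit-style review payload.

    Hypothetical helper: the dict mirrors the "labels"/"message" shape of a
    Gerrit review input, but posting it to a server is left out on purpose.
    """
    score = 1 if build_passed else -1
    status = "passed" if build_passed else "failed"
    return {
        "message": f"Build {status}: {build_url}",
        "labels": {"Verified": score},
    }


if __name__ == "__main__":
    # example.org stands in for a real build page.
    print(json.dumps(verified_vote(False, "https://example.org/builds/1"), indent=2))
```

A bot would still have to learn about new patchsets and authenticate against Gerrit; both parts are outside this sketch.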

9 years, 10 months

Fwd: Wikidata Toolkit: call for feedback/support

by Markus Krötzsch

Dear all,

FYI: a project proposal for creating a software toolkit to work with Wikidata data in external tools [1]. Sent to wikidata-l, but maybe it's also interesting for those working with semantic data here. Feedback and support are welcome. Having email replies go to wikidata-l would be great, but don't let this hold you back if you're not on there. You can also post to the bottom of the wiki page [1].

Cheers,
Markus

[1] https://meta.wikimedia.org/wiki/Grants:IEG/Wikidata_Toolkit

-------- Original Message --------
Subject: [Wikidata-l] Wikidata Toolkit: call for feedback/support
Date: Sun, 29 Sep 2013 13:11:11 +0100
From: Markus Krötzsch <markus(a)semantic-mediawiki.org>
Reply-To: Discussion list for the Wikidata project. <wikidata-l(a)lists.wikimedia.org>
To: Discussion list for the Wikidata project. <wikidata-l(a)lists.wikimedia.org>

Dear Wikidatanions (*),

I have just drafted a little proposal for creating more tools for external people to work with Wikidata, especially to build services on top of its data [1]. Your feedback and support are needed.

Idea: Currently, this is quite hard for people, since we only have WDA for reading/analysing dumps [2] and Wikidata Query as a single web service to ask queries [3]. We should have more support for programmers who want to load, query, analyse, and otherwise use the data. The proposal is to start such a toolkit to enable more work with the data.

The plan is to kickstart this project with a small team using Wikimedia's Individual Engagement program. For this we will need your support -- feel free to add your voice to the wiki page [1]. Of course, comments of all sorts are also great -- this email thread will be linked from the page. If you would like to be involved with the project, that's great too; let me know and I can add you to the proposal.

The proposal will already be submitted tomorrow, but support should also be possible after that, I hope.

Cheers,
Markus

(*) Do we have a demonym yet? Wikipedian sounds natural, Wikidatan less so. Maybe this should be another thread ... ;-)

[1] https://meta.wikimedia.org/wiki/Grants:IEG/Wikidata_Toolkit
[2] http://github.com/mkroetzsch/wda
[3] http://208.80.153.172/wdq/

--
Markus Kroetzsch, Departmental Lecturer
Department of Computer Science, University of Oxford
Room 306, Parks Road, OX1 3QD Oxford, United Kingdom
+44 (0)1865 283529
http://korrekt.org/

10 years, 6 months

RFC: TitleValue

by Daniel Kinzler

Hi all!

As discussed at the MediaWiki Architecture session at Wikimania, I have created an RFC for the TitleValue class, which could be used to replace the heavy-weight Title class in many places. The idea is to showcase the advantages (and difficulties) of using true "value objects" as opposed to "active records". The idea being that hair should not know how to cut itself.

You can find the proposal here: <https://www.mediawiki.org/wiki/Requests_for_comment/TitleValue>

Any feedback would be greatly appreciated.

-- daniel

PS: I have included some parts of the proposal below, to give a quick impression.

------------------------------------------------------------------------------

== Motivation ==

The old Title class is huge and has many dependencies. It relies on global state for things like namespace resolution and permission checks. It requires a database connection for caching.

This makes it hard to use Title objects in a different context, such as unit tests. That in turn makes it quite difficult to write any clean unit tests (not using any global state) for MediaWiki, since Title objects are required as parameters by many classes.

In a more fundamental sense, the fact that Title has so many dependencies, and that everything that uses a Title object inherits all of these dependencies, means that the MediaWiki codebase as a whole has highly "tangled" dependencies, and it is very hard to use individual classes separately.

Instead of trying to refactor and redefine the Title class, this proposal suggests introducing an alternative class that can be used instead of a Title object to represent the title of a wiki page. The implementation of the old Title class should be changed to rely on the new code where possible, but its interface and behavior should not change.

== Architecture ==

The proposed architecture consists of three parts, initially:

# The TitleValue class itself. As a value object, this has no knowledge about namespaces, permissions, etc. It does not support normalization either, since that would require knowledge about the local configuration.
# A TitleParser service that has configuration knowledge about namespaces and normalization rules. Any class that needs to turn a string into a TitleValue should require a TitleParser service as a constructor argument (dependency injection). Should that not be possible, a TitleParser can be obtained from a global registry.
# A TitleFormatter service that has configuration knowledge about namespaces and normalization rules. Any class that needs to turn a TitleValue into a string should require a TitleFormatter service as a constructor argument (dependency injection). Should that not be possible, a TitleFormatter can be obtained from a global registry.
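
[Editor's sketch] The three-part split listed above can be illustrated in a few lines. This is a Python approximation under stated assumptions, not the RFC's PHP code: the class names follow the RFC, but the method names, the namespace table passed to the constructors, and the simplified normalization rules (underscores to spaces, first letter upper-cased) are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TitleValue:
    """Plain value object: a namespace number and a title text, nothing else."""
    namespace: int
    text: str


class TitleParser:
    """Service holding the (hypothetical) namespace configuration for parsing."""

    def __init__(self, namespaces: dict[str, int]):
        self._by_name = {name.lower(): ns for name, ns in namespaces.items()}

    def parse(self, raw: str) -> TitleValue:
        prefix, sep, rest = raw.partition(":")
        ns = self._by_name.get(prefix.strip().lower()) if sep else None
        if ns is None:
            # No known namespace prefix: treat the whole string as a main-namespace title.
            ns, rest = 0, raw
        text = rest.strip().replace("_", " ")
        if not text:
            raise ValueError(f"empty title: {raw!r}")
        return TitleValue(ns, text[0].upper() + text[1:])


class TitleFormatter:
    """Service holding the same configuration for turning values back into strings."""

    def __init__(self, namespaces: dict[str, int]):
        self._by_id = {ns: name for name, ns in namespaces.items()}

    def format(self, title: TitleValue) -> str:
        name = self._by_id.get(title.namespace, "")
        return f"{name}:{title.text}" if name else title.text
```

A class that needs to parse titles would take a TitleParser as a constructor argument, so a unit test can hand it a parser built from a two-entry namespace table instead of touching global state or a database.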

10 years, 7 months

The next problem with EntityId

by Jeroen De Dauw

Hey,

With the recent EntityId refactoring, the assumption that these IDs consist of a prefix (e.g. Q) and a numeric part (e.g. 42) can be removed from the EntityId class. Except that this assumption is still present in several other locations in the codebase. If we want to support entity ids that do not follow the prefix+numeric id schema (for instance using a filename or word as id), then we need to get rid of the occurrences of this assumption. This is not trivial to do, however.

The biggest hurdle is that we have a lot of entity ids stored in the db using 2 fields: entity type (string) and numeric id (int). The former field is mapped from the id prefix, and the latter is the numeric part. As long as this is there, we need to be able to get a numeric part from the id object, and reconstruct id objects given a type and numeric part. Removing the format assumption would mean having the same entity type field, but rather than having an int field holding just part of the id, there would be a string field that holds the whole serialization. While it is technically not that hard to change this in the software, the size of the tables of Wikidata.org makes doing a rebuild somewhat hard.

This means we essentially need to choose whether we want to be able to, at any point, have entity ids that do not follow the prefix+number format. If this is not important, we can leave everything as it is, and keep the assumption around without feeling dirty about it. In that case we also accept the fact that if later on we want an id that does not follow this format, we'll be out of luck. If on the other hand we want to support ids with other formats, we need to start working on getting rid of the remaining assumptions on the old format, and accept that we'll need to write some rebuilding code for a (few) Wikidata.org table(s).

It'd be good to have this decision sooner rather than later, as code touching places where such assumptions are located oddly needs to take both possible decisions into account, while both typically suggest a quite different approach.

Cheers
--
Jeroen De Dauw
http://www.bn2vs.com
Don't panic. Don't be evil. ~=[,,_,,]:3
--
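
[Editor's sketch] The two storage layouts discussed above can be contrasted as follows. This is an illustration only, not Wikibase code: the row types, function names, the prefix table, and the "file" entity type are hypothetical, chosen to show why the (type, int) layout cannot hold an id that lacks a numeric part.

```python
import re
from typing import NamedTuple

# Illustrative prefix table; an assumption for this sketch.
PREFIX_TO_TYPE = {"Q": "item", "P": "property"}
TYPE_TO_PREFIX = {etype: prefix for prefix, etype in PREFIX_TO_TYPE.items()}


class NumericRow(NamedTuple):
    """Current layout: entity type plus only the numeric part of the id."""
    entity_type: str
    numeric_id: int


class SerializedRow(NamedTuple):
    """Alternative layout: entity type plus the whole id serialization."""
    entity_type: str
    id_serialization: str


def to_numeric_row(serialization: str) -> NumericRow:
    """Split an id such as 'Q42'; fails for ids outside the prefix+number format."""
    match = re.fullmatch(r"([A-Za-z])([0-9]+)", serialization)
    if match is None or match.group(1).upper() not in PREFIX_TO_TYPE:
        raise ValueError(f"not a prefix+number id: {serialization!r}")
    return NumericRow(PREFIX_TO_TYPE[match.group(1).upper()], int(match.group(2)))


def from_numeric_row(row: NumericRow) -> str:
    """Rebuild the id from the two stored fields."""
    return f"{TYPE_TO_PREFIX[row.entity_type]}{row.numeric_id}"


def to_serialized_row(entity_type: str, serialization: str) -> SerializedRow:
    """Store the id as-is; no assumption about its format is needed."""
    return SerializedRow(entity_type, serialization)
```

With the first layout, `to_numeric_row("Example.jpg")` raises, while `to_serialized_row("file", "Example.jpg")` stores the id unchanged. The price is the table rebuild the message mentions.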

10 years, 7 months

FYI: Wikiquote/Wikidata proposal

by Lydia Pintscher

Heya folks :)

FYI: Sannita and Nemo wrote a proposal for Wikiquote/Wikidata integration.
https://www.wikidata.org/wiki/Wikidata:Wikiquote

Cheers
Lydia

--
Lydia Pintscher - http://about.me/lydia.pintscher
Community Communications for Technical Projects
Wikimedia Deutschland e.V.
Obentrautstr. 72
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized as charitable by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.

10 years, 7 months

Want to join the Wikidata development team? Apply now!

by Lydia Pintscher

Heya folks :)

Wikimedia Deutschland is looking for a software developer with a focus on frontend development with JavaScript to work on Wikidata, among other things. Details are here:
http://wikimedia.de/wiki/Software_Developer_(f/m)_focus_on_Frontend_Develop…

If you have any questions about the position, please contact Abraham at abraham.taherivand(a)wikimedia.de

Cheers
Lydia

--
Lydia Pintscher - http://about.me/lydia.pintscher
Community Communications for Technical Projects
Wikimedia Deutschland e.V.
Obentrautstr. 72
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized as charitable by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.

10 years, 7 months

Cirrus Search enabled on test.wikidata.org

by Katie Filbert

The new Elasticsearch backend, Cirrus, has been enabled on test.wikidata.org.

Please try it out, give feedback and report bugs!
https://bugzilla.wikimedia.org/enter_bug.cgi?product=MediaWiki%20extensions…

Cirrus currently sorts results by number of incoming links from within the wiki. You can try searching for "Athens" as an example.

Cheers,
Katie

--
Katie Filbert
Wikidata Developer
Wikimedia Germany e.V. | NEW: Obentrautstr. 72 | 10963 Berlin
Phone (030) 219 158 26-0
http://wikimedia.de

Wikimedia Germany - Society for the Promotion of Free Knowledge e.V. Entered in the register of Amtsgericht Berlin-Charlottenburg under the number 23 855. Recognized as charitable by the Inland Revenue for corporations I Berlin, tax number 27/681/51985.

10 years, 7 months

possible to edit item source on the test system

by Lydia Pintscher

Heya folks :)

Items on test.wikidata.org have a working edit button that leads to editing the source of the page. We need to fix this before the next deployment. Can anyone look into it?
https://bugzilla.wikimedia.org/show_bug.cgi?id=54114

Cheers
Lydia

--
Lydia Pintscher - http://about.me/lydia.pintscher
Community Communications for Technical Projects
Wikimedia Deutschland e.V.
Obentrautstr. 72
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 Nz. Recognized as charitable by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.

10 years, 7 months

Re: [Wikidata-tech] [Wikidata-l] [Pywikipedia-l] wbsearchentities()

by Daniel Kinzler

Hi all!

We have to impose a fixed limit on search results, since search results cannot be ordered by a unique ID, so paging is expensive.

The default for this limit is 50, but it SHOULD be 500 for bots. But the higher limit for bots is currently not applied by the wbsearchentities module - that's a bug, see <https://bugzilla.wikimedia.org/show_bug.cgi?id=54096>.

We should be able to fix this soon. Please poke us again if nothing happens for a couple of weeks.

-- daniel

On 12.09.2013 12:12, Merlijn van Deen wrote:
> On 11 September 2013 20:31, Chinmay Naik <chin.naik26(a)gmail.com> wrote:
>
>> Can I retrieve more than 100 items using this? I notice the
>> 'search-continue' returned by the search result disappears after 50 items.
>> for ex
>> https://www.wikidata.org/wiki/Special:ApiSandbox#action=wbsearchentities&fo…
>
> The api docs at https://www.wikidata.org/w/api.php explicitly state the
> highest value for 'continue' is 50:
>
>   limit    - Maximal number of results
>              The value must be between 0 and 50
>              Default: 7
>   continue - Offset where to continue a search
>              The value must be between 0 and 50
>              Default: 0
>
> which indeed suggests there is a hard limit of 100 entries. Maybe someone
> in the Wikidata dev team can explain the reason behind this?
>
> Merlijn
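
[Editor's sketch] The arithmetic behind the 100-entry ceiling in the quoted API docs (limit at most 50, continue at most 50) can be made explicit with a small generator of request parameters. The function name is hypothetical and no request is sent. The parameter names are those of the wbsearchentities module, and the two constants simply restate the quoted documentation as it stood in this thread.

```python
from typing import Iterator

MAX_LIMIT = 50     # quoted docs: "limit" must be between 0 and 50
MAX_CONTINUE = 50  # quoted docs: "continue" must be between 0 and 50


def search_pages(search: str, language: str, wanted: int) -> Iterator[dict]:
    """Yield one parameter dict per request until the documented caps are hit."""
    offset = 0
    while wanted > 0 and offset <= MAX_CONTINUE:
        limit = min(wanted, MAX_LIMIT)
        yield {
            "action": "wbsearchentities",
            "format": "json",
            "search": search,
            "language": language,
            "limit": limit,
            "continue": offset,
        }
        offset += limit
        wanted -= limit


if __name__ == "__main__":
    # Asking for 500 results still yields only two requests (offsets 0 and 50).
    for params in search_pages("Athens", "en", 500):
        print(params["continue"], params["limit"])
```

Under these caps a client can reach at most 50 + 50 = 100 results, which matches the observation in the quoted message.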

10 years, 7 months

IRI, URL, sameAs and the "identifier mess"

by David Cuenca

A couple of months ago there was a conversation about what to do with the identifiers that should be "owl:sameAs" [1].

Then there is another discussion about using a "formatter URL" property to use any catalogue/db as an id instead of creating a property [2].

Now there is another property proposal to implement sameAs as a property taking a url [3].

And this is all related to the recent thread in this mailing list about IRI-value or string-value for URLs.

So, in the end, what is the preferred approach?

Micru

[1] https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2013/07#Identif…
[2] https://www.wikidata.org/wiki/Wikidata:Property_proposal/Unsorted#Formatter…
[3] https://www.wikidata.org/wiki/Wikidata:Property_proposal/Generic#Same_as_.2…
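
[Editor's sketch] To make the "formatter URL" option concrete: the idea is that a catalogue stores a plain string identifier, and a URL pattern turns it into a link. The `$1` placeholder, the function name, and the example.org pattern below are assumptions for illustration. The thread itself does not specify a syntax.

```python
from urllib.parse import quote


def expand_formatter_url(pattern: str, identifier: str, placeholder: str = "$1") -> str:
    """Insert a percent-encoded identifier into a URL pattern (hypothetical helper)."""
    if placeholder not in pattern:
        raise ValueError(f"pattern has no {placeholder!r} placeholder: {pattern!r}")
    return pattern.replace(placeholder, quote(identifier, safe=""))


if __name__ == "__main__":
    # example.org stands in for a real catalogue.
    print(expand_formatter_url("https://example.org/catalogue/$1", "ab 12/3"))
```

Under this approach the statement value stays a string, and the URL (or IRI) is derived on demand, which is one way to see the difference from a sameAs property that stores the URL itself.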

10 years, 7 months
