On Mon, Aug 26, 2013 at 9:26 AM, Denny Vrandečić
> 2013/8/26 Rob Lanphier <robla(a)wikimedia.org>
>> On Fri, Aug 23, 2013 at 8:03 AM, Denny Vrandečić
>> <denny.vrandecic(a)wikimedia.de> wrote:
>> > for Semantic MediaWiki we had this already listed for a while here:
>> > <https://meta.wikimedia.org/wiki/Wikidata/Notes/SMW_and_Wikidata>
>> > The list is not completely up to date; it reflects the previous package
>> > structure.
>> It would be helpful if there was a wiki page that reflected the
>> current state of things. For MediaWiki core, we're trying to get into
>> the practice of having RFCs for changes of this magnitude, and
>> something of that level of rigor would be nice here too.
> The current state is described here:
> <http://www.mediawiki.org/wiki/Wikibase> , more specifically here:
> The new structure was described in Emails on Wikidata-tech, most recently
> We are indeed not yet using the RFC process, but I would prefer if we could
> agree to move to this process in the future, as this particular discussion
> has already been going on a bit longer than I would expect for a discussion
> about how to organize code.
This is not just about how we organize code. This is about how we
package our work. It's unusual for us to have extensions with hard
dependencies on other extensions, let alone the complicated hierarchy
you all have chosen. It may be ok to do that, but we should discuss
it before setting the precedent.
This gets exposed to end users via Special:Version:
Since MediaWiki administrators often use this as a means of
understanding how to configure their wikis "like Wikipedia", it would
be nice if we didn't clutter that page up with a lot of the internals
of our systems. Each of the links should point to a page that does a
good job of describing what the extension does, and the vast majority
of them do. Unfortunately, for most of the Wikidata extensions right
now, the pages are pretty much boilerplate plus a one-liner in many
Rather than get too far into the implementation details of what your
extensions are doing, I think maybe I'll hold off until someone on my
team has more time to think about this and comment on it.
After discussion with Danwe and Denny I decided to rewrite EntityId.
It was the highest risk class according to PHPUnit (290 CRAP) in the
DataModel component, which really is bad considering this is one of our
most basic classes, which a lot of other code binds itself to. The goal of
rewriting it is getting rid of needless complexity, needless bindings and
making non-problematic use of the class easier.
The rewrite currently includes:
* Dropping of id prefix configurability (as previously discussed)
* Separation of the id class and the entity id DataValue (EntityId no
longer derives from DataValue. A new EntityIdDataValue composites in a
* The EntityId class now has as fields entity type (string) and id
* New ItemId and PropertyId derivatives, which in the constructor only take
the id serialization. The long term goal is to make EntityId abstract
* Serialization format compatibility where required. The format of the
DataValue will remain the same for now
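
The structure listed above could look roughly like the following sketch. It is written in Python purely for illustration (the actual code is PHP), and everything beyond the class names from this mail is an assumption: the entity type strings "item" and "property", the attribute names, the helper _require_prefix, and the rule that an id is the fixed prefix followed by a positive integer.

```python
import re


class EntityId:
    """Plain identifier: an entity type plus the id serialization."""

    def __init__(self, entity_type: str, serialization: str):
        self.entity_type = entity_type
        self.serialization = serialization

    def __eq__(self, other):
        return (
            isinstance(other, EntityId)
            and self.entity_type == other.entity_type
            and self.serialization == other.serialization
        )

    def __hash__(self):
        return hash((self.entity_type, self.serialization))


def _require_prefix(serialization: str, prefix: str) -> str:
    # Assumed format: the fixed prefix followed by a positive integer.
    if not re.fullmatch(re.escape(prefix) + r"[1-9][0-9]*", serialization):
        raise ValueError(f"not a valid {prefix}-id: {serialization!r}")
    return serialization


class ItemId(EntityId):
    PREFIX = "Q"  # hardcoded in the class, no longer configurable

    def __init__(self, serialization: str):
        # The constructor takes only the id serialization.
        super().__init__("item", _require_prefix(serialization, self.PREFIX))


class PropertyId(EntityId):
    PREFIX = "P"  # hardcoded in the class, no longer configurable

    def __init__(self, serialization: str):
        super().__init__("property", _require_prefix(serialization, self.PREFIX))


class EntityIdDataValue:
    """Data value that holds an EntityId by composition, not inheritance."""

    def __init__(self, entity_id: EntityId):
        self.entity_id = entity_id
```

With this shape, code that only needs an identifier can depend on ItemId("Q42") alone, and only code that deals with data values has to know about EntityIdDataValue.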
Comments on the work in progress are welcome:
It is possible this change will be done by tomorrow. More likely it'll only
be merged on Monday or later.
Jeroen De Dauw
Don't panic. Don't be evil. ~=[,,_,,]:3
On 25.08.2013 19:19, Markus Krötzsch wrote:
> If we have an IRI DV, considering that URLs are special IRIs, it seems clear
> that IRI would be the best way of storing them.
The best way of storing them really depends on the storage platform. It may be a
string or something else.
I think the real issue here is that we are exposing something that is really an
internal detail (the data value type) instead of the high level information we
actually should be exposing, namely property type.
I think splitting the two was a mistake, and I think exposing the DV type while
making the property type all but inaccessible makes things a lot worse.
In my opinion, data should be self-descriptive, so the *semantic* type of the
property should be included along with the value. People expect this, and assume
that this is what the DV type is. But it's not, and should not be used or abused
for this purpose.
Ideally, it should not matter at all to any 3rd party if we use a string or IRI
DV internally. The (semantic) property type would be URL, and that's all that
I'm quite unhappy about the current situation; we are beginning to see the
backlash of the decision not to include the property type inline. If we don't do
anything about this now, I fear the confusion is going to get worse.
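
To make the point about self-descriptive data concrete, here is a small sketch in Python. It is not the Wikibase serialization format: the keys "property_type" and "datavalue", the type names "url", "string" and "iri", and both functions are hypothetical. It only shows that a consumer which dispatches on the semantic property type is unaffected when the internal data value type changes.

```python
from html import escape


def serialize_value(property_type: str, dv_type: str, value: str) -> dict:
    """Ship the semantic property type inline, next to the internal DV type."""
    return {
        "property_type": property_type,
        "datavalue": {"type": dv_type, "value": value},
    }


def render(serialized: dict) -> str:
    """A third-party consumer that looks at the property type only."""
    value = escape(serialized["datavalue"]["value"])
    if serialized["property_type"] == "url":
        return f'<a href="{value}">{value}</a>'
    return value


# Same semantic type, two different internal representations.
as_string = serialize_value("url", "string", "http://example.org/")
as_iri = serialize_value("url", "iri", "http://example.org/")
print(render(as_string) == render(as_iri))  # prints True
```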
Are there any objections against, or concerns with, dropping the
configurability of entity id prefixes? Hardcoding these in their associated
entity classes will make a lot of things easier.
To clarify, entity id prefixes are things such as the "Q" in "Q42" for item
ids or the "P" in "P1337" in property ids.
Jeroen De Dauw
I'd like to reiterate the importance of using @covers tags for
test cases rather than writing "Test for SomeClass". These tags make
coverage reporting a lot more accurate, and thus make it easier to spot
areas lacking tests.
Today I added a bunch of @covers tags in the DataModel component. The
reported coverage dropped from 85% to 57%, quite a difference.
Jeroen De Dauw
Forwarding to the Wikidata tech list in case this makes a future
Wiktionary collaboration easier.
-------- Original Message --------
Subject: [Wiki-research-l] Java-based Wiktionary Library (JWKTL) 1.0.0
released as open source software (Wiki-research-l Digest, Vol 96, Issue 22)
Date: Tue, 20 Aug 2013 14:20:56 +0000
From: Judith Eckle-Kohler <eckle-kohler(a)ukp.informatik.tu-darmstadt.de>
[Apologies for X-posting]
We are pleased to announce the release of the Java-based Wiktionary
Library (JWKTL) 1.0.0 - an application programming interface for Wiktionary.
Project homepage: http://code.google.com/p/jwktl/
== Overview ==
JWKTL (Java-based Wiktionary Library) is an application programming
interface for the free multilingual online dictionary Wiktionary
(http://www.wiktionary.org). JWKTL enables efficient and structured
access to the information encoded in the English, the German, and the
Russian Wiktionary language editions, including sense definitions, part
of speech tags, etymology, example sentences, translations, semantic
relations, and many other lexical information types. The Russian JWKTL
parser is based on Wikokit (http://code.google.com/p/wikokit/).
Prior to being available as open source software, JWKTL has been a
research project at the Ubiquitous Knowledge Processing (UKP) Lab of the
Technische Universität Darmstadt, Germany. The following people have
mainly contributed to this project: Yevgen Chebotar, Iryna Gurevych,
Christian M. Meyer, Christof Müller, Lizhen Qu, Torsten Zesch.
== Publications ==
A detailed description of Wiktionary and JWKTL is available in our
* Christian M. Meyer and Iryna Gurevych: Wiktionary: A new rival for
expert-built lexicons? Exploring the possibilities of collaborative
lexicography, Chapter 13 in S. Granger & M. Paquot (Eds.): Electronic
Lexicography, pp. 259-291, Oxford: Oxford University Press, November
* Christian M. Meyer and Iryna Gurevych: OntoWiktionary - Constructing
an Ontology from the Collaborative Online Dictionary Wiktionary, chapter
6 in M. T. Pazienza and A. Stellato (Eds.): Semi-Automatic Ontology
Development: Processes and Resources, pp. 131-161, Hershey, PA: IGI
Global, February 2012.
* Torsten Zesch, Christof Müller, and Iryna Gurevych: Extracting Lexical
Semantic Knowledge from Wikipedia and Wiktionary, in: Proceedings of the
6th International Conference on Language Resources and Evaluation
(LREC), pp. 1646-1652, May 2008. Marrakech, Morocco.
== License and Availability ==
The latest version of JWKTL is available via Maven Central. If you use
Maven as your build tool, then you can add JWKTL as a dependency in your
JWKTL is available as open source software under the Apache License 2.0
(ASL). The software thus comes "as is" without any warranty (see license
text for more details). JWKTL makes use of Berkeley DB Java Edition
5.0.73 (Sleepycat License), Apache Ant 1.7.1 (ASL), Xerces 2.9.1 (ASL),
JUnit 4.10 (CPL).
Some classes have been taken from the Wikokit project (available under
multiple licenses, redistributed under the ASL license). See NOTICE.txt
for further details.
== Contact ==
Please direct any questions or suggestions to
Group E-Mail: jwktl-users(a)googlegroups.com
Christian M. Meyer
Christian M. Meyer, M.Sc.
Ubiquitous Knowledge Processing (UKP Lab)
FB 20 Computer Science Department
Technische Universität Darmstadt
Hochschulstr. 10, D-64289 Darmstadt, Germany
Phone [+49] (0)6151 16-5386, fax -5455, room S2/02/B113
Web Research at TU Darmstadt (WeRC)
---------- Forwarded message ----------
From: "Sumana Harihareswara" <sumanah(a)wikimedia.org>
Date: Aug 22, 2013 7:06 PM
Subject: Re: [Wikitech-l] unexpected error info in HTML
To: "Wikimedia developers" <wikitech-l(a)lists.wikimedia.org>, "Jiang BIAN" <
> On 08/01/2013 03:08 AM, Jiang BIAN <bianjiang(a)google.com> wrote:
> > Hi,
> > I noticed that some pages we crawled contain an error message like this:
> > <div id="mw-content-text" lang="zh-CN" dir="ltr"
> > class="error">Failed to render property P373:
> > Wikibase\LanguageWithConversion::factory: given languages do not have
> > same parent language</p>
> > But when I open the URL in a browser, there is no such message. Using
> > index.php also returns the normal content without error messages.
> > Here are examples you can retry:
> > bad
> > $ wget 'http://zh.wikipedia.org/zh-cn/Google'
> > good
> > $ wget 'http://zh.wikipedia.org/w/index.php?title=Google'
> > It looks like something is wrong on the Wikipedia side. Does anything need to be fixed?
> > Thanks
> I checked with Jiang Bian and found out that this is still happening --
> can anyone help Google out here? :-)
> Sumana Harihareswara
> Engineering Community Manager
> Wikimedia Foundation
I have updated the doxygen documentation for Wikibase and added it for some
related extensions. (Some query-related components are still missing.)
If this is useful, I can add the remaining components, make it look nicer and
set it up to update regularly.
Wikimedia Germany e.V. | NEW: Obentrautstr. 72 | 10963 Berlin
Phone (030) 219 158 26-0
Wikimedia Germany - Society for the Promotion of Free Knowledge e.V. Entered
in the register of the Amtsgericht Berlin-Charlottenburg under the number
23855. Recognized as charitable by the tax office for corporations I
Berlin, tax number 27/681/51985.
For the unlikely case that some readers of this list aren't aware of
this yet: There are a lot of Wikidata bugs in bugzilla that could use
some help. They're marked with the keyword need-volunteer. You can see
them here: https://bugzilla.wikimedia.org/buglist.cgi?keywords=need-volunteer%2C%20&ke…
(Obviously you're welcome to work on other stuff too but if you don't
know what to put time into this is the perfect list.) If you're
working on any of those please leave a comment in the bug report so
others know you're on it. In case you're more into writing a bot then
http://www.wikidata.org/wiki/Wikidata:Bot_requests is the place to
If you need help, here and the IRC channel #wikimedia-wikidata on
freenode are good places to ask.
Lydia Pintscher - http://about.me/lydia.pintscher
Community Communications for Technical Projects
Wikimedia Deutschland e.V.
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under the number 23855 Nz. Recognized as charitable by
the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.