Hi,
I am not questioning or criticizing, just curious - why was it decided to implement lemmas as terms? I guess it is for code reuse purposes, but just wanted to ask.
Cheers, Denny
Yes, indeed. We have code for rendering, serializing, indexing, and searching Terms. We do not have any infrastructure for plain strings. We could also handle it as a monolingual-text StringValue, but that offers less re-use, in particular no search, and no batch lookup for rendering.
Also, conceptually, the lemma is rather similar to a label. And it's always *in* a language. The only question is whether we only have one, or multiple (for variants/scripts). But one will do for now.
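To make the reuse argument concrete, here is a minimal sketch in Python (the real code base is PHP). The names `Term`, `TermList`, `Lexeme` and `render_terms` and all method names are illustrative assumptions, not the actual Wikibase API. It only shows the idea that a lemma stored as a language-tagged term can go through the same lookup path as a label.

```python
from typing import Dict, List, Optional


class Term:
    """A piece of text tagged with the language it is written in."""

    def __init__(self, language_code: str, text: str) -> None:
        self.language_code = language_code
        self.text = text


class TermList:
    """Holds at most one Term per language code (hypothetical helper)."""

    def __init__(self) -> None:
        self._by_language: Dict[str, Term] = {}

    def set_term(self, term: Term) -> None:
        # A later term for the same language replaces the earlier one.
        self._by_language[term.language_code] = term

    def get_by_language(self, language_code: str) -> Optional[Term]:
        return self._by_language.get(language_code)

    def __len__(self) -> int:
        return len(self._by_language)


class Lexeme:
    """Hypothetical lexeme whose lemma reuses the Term machinery."""

    def __init__(self) -> None:
        self.lemmas = TermList()


def render_terms(terms: TermList, languages: List[str]) -> Dict[str, str]:
    """Resolve the text for several languages in one call."""
    found: Dict[str, str] = {}
    for code in languages:
        term = terms.get_by_language(code)
        if term is not None:
            found[code] = term.text
    return found


lexeme = Lexeme()
lexeme.lemmas.set_term(Term("en", "colour"))
rendered = render_terms(lexeme.lemmas, ["en", "de"])
```

Because the lemma lives in a `TermList`, moving from one lemma to several (for variants or scripts) would only mean adding more entries, not changing the data structure.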
Ah, great. Thanks for the answer! I'm glad to hear that it makes development faster :)
Will the forms be terms too? That *might* even be a solution for the multiscript issue.
On Wed, Nov 2, 2016, 14:42 Daniel Kinzler daniel.kinzler@wikimedia.de wrote:
Yes, indeed. We have code for rendering, serializing, indexing, and searching Terms. We do not have any infrastructure for plain strings. We could also handle it as a monolingual-text StringValue, but that offers less re-use, in particular no search, and no batch lookup for rendering.
Also, conceptually, the lemma is rather similar to a label. And it's always *in* a language. The only question is whether we only have one, or multiple (for variants/scripts). But one will do for now.
-- Daniel Kinzler Senior Software Developer
Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Wikidata-tech mailing list Wikidata-tech@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-tech
Another genuine question: why have both the Term and the MonolingualText data structures? Is it just for historical reasons (labels were introduced before statements, and so before the whole DataValue system), or is there an architectural reason behind it?
Cheers,
Thomas
Tpt asked:
Why have both the Term and the MonolingualText data structures? Is it just for historical reasons (labels were introduced before statements, and so before the whole DataValue system), or is there an architectural reason behind it?
That's not the only reason.
First, all data values (including monolingual text) must implement the same DataValue interface.
Term does not need to implement anything (it does implement Comparable for convenience).
All DataValues share the same abstract DataValueObject base class. The only reason for this is code sharing. No code should type hint against DataValueObject (I just checked and hurray, we are clean).
MonolingualTextValue could indeed share code with Term. But it's not possible to do "class MonolingualTextValue extends DataValueObject, Term" in PHP, which only allows a single parent class. We would need to drop the code sharing with DataValueObject and do "class MonolingualTextValue extends Term implements DataValue" instead, which means we would have to copy all the code from DataValueObject over to MonolingualTextValue. This is entirely possible, but what would be the actual advantage of such a change? Which code would benefit from being able to pass MonolingualTextValue objects to code that accepts Term objects?
Best Thiemo
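The trade-off Thiemo describes can be sketched as follows. This is an illustration in Python, not the PHP code itself, and every method name here (`get_type`, `get_array_value`, `to_array`, `describe_term`) is a simplified assumption. Python does allow multiple inheritance, so the sketch deliberately gives each class a single parent to mirror the PHP constraint, and models the DataValue interface as a structural `Protocol`.

```python
from typing import Any, Dict, Protocol, runtime_checkable


@runtime_checkable
class DataValue(Protocol):
    """Stand-in for the DataValue interface (simplified assumption)."""

    def get_type(self) -> str: ...
    def get_array_value(self) -> Any: ...
    def to_array(self) -> Dict[str, Any]: ...


class DataValueObject:
    """Stand-in for the abstract base class that exists only to share code."""

    def get_type(self) -> str:
        raise NotImplementedError

    def get_array_value(self) -> Any:
        raise NotImplementedError

    def to_array(self) -> Dict[str, Any]:
        # Shared helper that every subclass inherits.
        return {"type": self.get_type(), "value": self.get_array_value()}


class StringValue(DataValueObject):
    """An ordinary data value that inherits the shared helper."""

    def __init__(self, value: str) -> None:
        self.value = value

    def get_type(self) -> str:
        return "string"

    def get_array_value(self) -> str:
        return self.value


class Term:
    def __init__(self, language_code: str, text: str) -> None:
        self.language_code = language_code
        self.text = text


class MonolingualTextValue(Term):
    """The 'extends Term implements DataValue' variant: Term is the only parent."""

    def get_type(self) -> str:
        return "monolingualtext"

    def get_array_value(self) -> Dict[str, str]:
        return {"language": self.language_code, "text": self.text}

    def to_array(self) -> Dict[str, Any]:
        # Copied from DataValueObject, because that class is no longer a parent.
        return {"type": self.get_type(), "value": self.get_array_value()}


def describe_term(term: Term) -> str:
    """Hypothetical consumer that accepts any Term."""
    return f"{term.text} ({term.language_code})"


value = MonolingualTextValue("en", "colour")
```

The gain is that `describe_term(value)` now works; the cost is the duplicated `to_array`, which stands in for all the code that would have to be copied from DataValueObject.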
On 11.11.2016 at 14:38, Thiemo Mättig wrote:
That's not the only reason.
Besides the code perspective that Thiemo just explained, there is also the conceptual perspective: Terms are editorial information attached to an entity for search and display. DataValues such as MonolingualText represent a value within a Statement, citing an external authority. This leads to slight differences in behavior - for instance, the set of languages available for Terms is subtly different from the set of languages available for MonolingualText.
Anyway, the fact that the two are totally separate has historical reasons. One viable approach for code sharing would be to have MonolingualText contain a Term object. But that would introduce more coupling between our components. I don't think the little bit of code that could be shared is worth the effort.
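A rough sketch of the composition approach mentioned above, again in Python rather than PHP and with assumed names throughout. The two language sets are made-up placeholders chosen only to show that they can differ; they are not the real lists.

```python
class Term:
    def __init__(self, language_code: str, text: str) -> None:
        self.language_code = language_code
        self.text = text


# Placeholder sets, NOT the real configuration: they only illustrate that
# the languages allowed for terms and for monolingual text may differ.
TERM_LANGUAGES = frozenset({"en", "de", "term-only"})
MONOLINGUAL_TEXT_LANGUAGES = frozenset({"en", "de", "value-only"})


class MonolingualTextValue:
    """Data value that contains a Term instead of inheriting from it."""

    def __init__(self, language_code: str, text: str) -> None:
        # The value applies its own language rules before building the Term.
        if language_code not in MONOLINGUAL_TEXT_LANGUAGES:
            raise ValueError(f"unsupported language: {language_code}")
        self._term = Term(language_code, text)

    @property
    def language_code(self) -> str:
        # Delegates to the wrapped Term.
        return self._term.language_code

    @property
    def text(self) -> str:
        return self._term.text


value = MonolingualTextValue("value-only", "example")
```

The sketch also shows the coupling concern: `MonolingualTextValue` cannot be built or changed without the `Term` class, while the amount of code it actually borrows is small.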
Ok! Thank you very much for the answers!
Cheers,
Thomas