On 02/04/12 19:52, Gregor Hagedorn wrote:
I certainly appreciate your experience in system design, Markus! Still, I feel strongly about this so I poke another time. :-)
We should take care not to overrate this topic. There are hundreds of thousands of articles that have a unique, exact match between different language versions. Cities, countries, people, works of art, species, astronomic objects, chemical elements, car models, airports, ... I could continue forever -- all of these entities have a clear-cut agreed-upon identity that is not language dependent and that suffices for most purposes. I would go further and say that, if a concept has no such clear identity, then it is much less useful to store data about it. I am not at all concerned that different language versions of Wikipedia will use largely disjoint sets of Wikidata items due to small (but somehow essential) differences in meaning. It will happen, but usually for good reasons or as a temporary problem that can be addressed.
Please note that it does not matter if different communities have a different policy about what is written in an article about, say, a car model. Of course, two such articles will never match exactly, and one will always have a bit more or less information than the other. However, for Wikidata it is only important that they are about the same car model. This will occur in a large number of cases.
Regards,
Markus
My analysis is that one of the two doors should be closed in the first iteration. This iteration could start with a clean system that is easily analyzed and has the potential of a better learning curve (it does not delegate to the user the difficulty of deciding which door to use). If it turns out that both options of relations are needed, I believe it is possible to add them in the next iteration.
My interpretation of the options:
Solution 1 (my preference for the sake of simplicity):
A Wikidata page reflects an independent entity that may be an exact, close, narrower or broader match with several Wikipedia language versions. That is, each Wikidata page has relations to 0-n Wikipedias, Wiktionaries, or Commons (NEW PROPOSAL to extend beyond Wikipedia alone), each of which is labeled by exact/close/narrower/broader match (SKOS vocabulary).
In addition (NEW PROPOSAL) it allows expressing relations to other definitions outside of Wikipedias, Wiktionaries, Commons, esp. where the entity is complex to understand or map to Wikipedia entities. This would be a natural extension of the Wikipedias, Wiktionaries, or Commons to the entire semantic web.
An option to be researched over time would be whether a "defining link" qualifier must be present on one relation, pointing either to a Wikipedia permalink to a given version or to an external permalink.
The great advantage of this system to me is that it can deal efficiently with data where an import is desired, but where the exact mapping to Wikipedia pages is difficult to ascertain.
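To make Solution 1 concrete, here is a minimal sketch of the data structure I have in mind. It is only an illustration under my own assumptions: the class names (Item, LabeledLink), the "defining" flag and the page identifiers are hypothetical and not part of any existing Wikidata design. The four labels use the SKOS mapping properties.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Match(Enum):
    # SKOS mapping properties used as link labels
    EXACT = "skos:exactMatch"
    CLOSE = "skos:closeMatch"
    NARROWER = "skos:narrowMatch"
    BROADER = "skos:broadMatch"


@dataclass
class LabeledLink:
    target: str             # hypothetical identifier, e.g. "enwiki:Some_page" or an external URI
    match: Match            # how the target relates to the item
    defining: bool = False  # optional "defining link" qualifier (assumption)


@dataclass
class Item:
    item_id: str
    links: List[LabeledLink] = field(default_factory=list)  # 0-n labeled links

    def link(self, target: str, match: Match, defining: bool = False) -> None:
        self.links.append(LabeledLink(target, match, defining))

    def relabel(self, target: str, match: Match) -> None:
        # A page that drifts in scope only needs a new label; no item is created or deleted.
        for existing in self.links:
            if existing.target == target:
                existing.match = match
                return
        raise KeyError(target)


item = Item("Q-example")
item.link("enwiki:Some_page", Match.EXACT, defining=True)
item.link("dewiki:Eine_Seite", Match.CLOSE)
item.relabel("dewiki:Eine_Seite", Match.NARROWER)
```

The point of the sketch is that every change in how well a page matches is a single relabel on an existing link.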
Solution 2 (clear-cut version of your present proposal, perhaps cleanest solution):
A wikidata page reflects 0-1 Wikipedias, Wiktionaries, or Commons pages. Relations such as exact/close/narrower/broader between the language versions (the interlanguage links) are stated only between Wikidata pages.
Advantage to me: Clear-cut design. As Wikipedia pages develop and become broader or narrower in scope, only the relations between Wikidata objects need to be changed. However, property data of the Wikidata page may become false as the linked Wikipedia page in a given language changes.
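A rough sketch of Solution 2, again purely illustrative: Store, Item and interlanguage_pages are hypothetical names, and I assume here that "0-1" means at most one linked page per Wikidata item. Interlanguage links are derived from exact/close relations between items.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

SYMMETRIC = {"exact", "close"}
DIRECTED = {"narrower", "broader"}


@dataclass
class Item:
    item_id: str
    page: Optional[str] = None  # at most one linked page (assumption)


class Store:
    def __init__(self) -> None:
        self.items: Dict[str, Item] = {}
        self.relations: Dict[Tuple[str, str], str] = {}  # (source, target) -> match

    def add(self, item_id: str, page: Optional[str] = None) -> None:
        self.items[item_id] = Item(item_id, page)

    def relate(self, source: str, target: str, match: str) -> None:
        if match not in SYMMETRIC | DIRECTED:
            raise ValueError(match)
        self.relations[(source, target)] = match  # overwrites an earlier label

    def interlanguage_pages(self, item_id: str) -> List[str]:
        # Pages of all items related to item_id by an exact or close match.
        pages = []
        for (source, target), match in self.relations.items():
            if match in SYMMETRIC and item_id in (source, target):
                other = target if source == item_id else source
                page = self.items[other].page
                if page is not None:
                    pages.append(page)
        return pages


store = Store()
store.add("Q-en", "enwiki:Some_page")
store.add("Q-de", "dewiki:Eine_Seite")
store.relate("Q-de", "Q-en", "exact")
links_before = store.interlanguage_pages("Q-en")
store.relate("Q-de", "Q-en", "narrower")  # the German page became narrower in scope
links_after = store.interlanguage_pages("Q-en")
```

When the German page narrows, only the relation between the two items changes; both items and their page links stay as they are.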
Solution 3 (your present preference to express relations with two structurally separate means):
A wikidata page reflects 0-n Wikipedias, Wiktionaries, or Commons pages. If two pages are an exact or close match, they are stored as multiple Wikipedia-Links to a single Wikidata page. A differentiation between close and exact match is not possible.
If a given language version of Wikipedia is sufficiently different from another one, it must be linked to a newly created, independent Wikidata page. Relations between Wikidata pages can be qualified as narrower/broader (but not close or exact match; these are required to be given the same Wikidata page).
Advantages as I see them: None.
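For comparison, a sketch of how I read Solution 3. The names (Store3, split) are hypothetical and this is my interpretation, not your actual design. Exact/close matches share one item, and a diverging page forces the creation of a new item plus a narrower/broader relation.

```python
from typing import Dict, Set, Tuple


class Store3:
    def __init__(self) -> None:
        self.pages: Dict[str, Set[str]] = {}             # item -> 0-n linked pages
        self.relations: Dict[Tuple[str, str], str] = {}  # (source, target) -> match

    def create(self, item_id: str, *pages: str) -> None:
        self.pages[item_id] = set(pages)

    def relate(self, source: str, target: str, match: str) -> None:
        if match not in ("narrower", "broader"):
            # exact/close cannot be stated between items; they must share one item
            raise ValueError("exact/close matches must share a single item")
        self.relations[(source, target)] = match

    def split(self, item_id: str, page: str, new_item_id: str, match: str) -> None:
        # The second, structurally different action: create an item and move the link.
        self.pages[item_id].remove(page)
        self.create(new_item_id, page)
        self.relate(new_item_id, item_id, match)


store3 = Store3()
store3.create("Q1", "enwiki:Some_page", "dewiki:Eine_Seite")
store3.split("Q1", "dewiki:Eine_Seite", "Q2", "narrower")
```

Sharing an item and splitting off a new item are two different kinds of edit, which is the structural asymmetry I am worried about.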
Potential problems with solution 3:
- The user is expected to use two widely different actions depending on whether two language versions are sufficiently closely matching or not. The actions are structurally different, and it is unlikely that this can be hidden by the user interface (because Wikidata page object creation and deletion are involved). The burden of deciding where to draw the line is left to the user community. Revisions of this decision within the community require creating or deleting Wikidata objects, and are therefore likely to be difficult to make transparent.
- Scenario: If two Wikipedia language versions describe more or less the same abstract object, but one is later revised to be narrower and the other to be broader, a careful study of the revisions made since the creation of the Wikidata page object is required to decide which Wikidata page remains linked to a Wikipedia page, and for which revision a new one must be created. Or whether perhaps two new ones must be created?
apologies for pestering you with this...
Gregor
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l