Hi all,
We have a lot of statements saying that something is an instance of "Wikipedia disambiguation page" (Q4167410). Unfortunately, this kind of information describes a particular Wikipedia article in a particular language, and it is often not true for other languages. Moreover, even where the corresponding article in some language is marked as a disambiguation page, it is still common for that page to describe a real-world item.
Example (bad use of instance of:Wikipedia disambiguation page)
==============================================================
https://www.wikidata.org/wiki/Q247819 (VW Polo)
Enwiki (like many other language editions) has a normal article here, not a disambiguation page. It says "The Volkswagen Polo is a supermini car produced by the German manufacturer Volkswagen". That's very different from "The Volkswagen Polo is a disambiguation page."
Even the Wikipedias where the VW Polo article is marked as a disambiguation page do not claim that the thing they are talking about is a disambiguation page. For instance, frwiki has the article in Catégorie:Homonymie, yet it says:
"Volkswagen Polo est une automobile, de la gamme des polyvalentes, de la marque allemande Volkswagen" ("The Volkswagen Polo is a car in the supermini class from the German make Volkswagen")
Again, the text does not say that the VW Polo is a disambiguation page, even though the page (not the car) is marked as one.
Proper use of instance of:Wikipedia disambiguation page
=======================================================
Now, there are also many proper disambiguation pages. They share no common concept other than the ambiguous title in a particular language.
Examples: https://en.wikipedia.org/wiki/Jaguar_(disambiguation) and, entertainingly: https://en.wikipedia.org/wiki/Disambiguation_(disambiguation)
An item that is "instance of:Wikipedia disambiguation page":
* should not have sitelinks to pages that are not disambiguation pages (an item can be about a Wikipedia page or about a car, but these should be kept separate),
* should always use the exact page title as the label (because this is the real label of the page; the page "Jaguar (disambiguation)" is not called "Jaguar" by anybody),
* should have hardly any statements at all, since there is almost nothing that you can truthfully say about a group of pages in many different languages, and since we want to avoid project-specific statements (that's one reason we have badges as part of site links).
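The three rules above lend themselves to a mechanical check. What follows is a minimal Python sketch, not an existing Wikidata tool. It assumes items in the Wikibase JSON shape (`labels`, `sitelinks`, `claims` with a `mainsnak`). It also assumes a hypothetical input set `disambig_pages` of (site, title) pairs already known to be disambiguation pages, and a naive mapping from site id to label language ("frwiki" becomes "fr"). Any property other than P31 is treated as suspect. The helper names, the P176 (manufacturer) statement, and the placeholder value id in the sample data are all illustrative assumptions.

```python
from typing import Any

DISAMBIG_CLASS = "Q4167410"  # disambiguation-page class cited in the thread
INSTANCE_OF = "P31"


def instance_of_ids(item: dict[str, Any]) -> set[str]:
    """Return the ids of all classes the item is an instance of (P31 values)."""
    ids: set[str] = set()
    for claim in item.get("claims", {}).get(INSTANCE_OF, []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") == "value":
            ids.add(snak["datavalue"]["value"]["id"])
    return ids


def site_language(site: str) -> str:
    """Naive assumption: "frwiki" -> "fr"; real site ids need a proper lookup."""
    return site[: -len("wiki")] if site.endswith("wiki") else site


def check_disambiguation_item(
    item: dict[str, Any], disambig_pages: set[tuple[str, str]]
) -> list[str]:
    """List violations of the three rules for disambiguation-page items."""
    if DISAMBIG_CLASS not in instance_of_ids(item):
        return []  # the rules only apply to disambiguation-page items
    problems = []
    labels = item.get("labels", {})
    for site, link in item.get("sitelinks", {}).items():
        title = link["title"]
        # Rule 1: every linked page must itself be a disambiguation page.
        if (site, title) not in disambig_pages:
            problems.append(f"{site}:{title} is not a disambiguation page")
        # Rule 2: the label must be the exact page title.
        label = labels.get(site_language(site), {}).get("value")
        if label != title:
            problems.append(f"{site}: label {label!r} differs from title {title!r}")
    # Rule 3: hardly any statements; this sketch flags anything besides P31.
    extra = sorted(p for p in item.get("claims", {}) if p != INSTANCE_OF)
    if extra:
        problems.append("unexpected statements: " + ", ".join(extra))
    return problems


def value_claim(target: str) -> dict[str, Any]:
    """Build a minimal item-valued claim in the Wikibase JSON shape."""
    return {"mainsnak": {"snaktype": "value", "datavalue": {"value": {"id": target}}}}


# Illustrative data modelled on the VW Polo case, not a snapshot of Q247819.
polo = {
    "id": "Q247819",
    "labels": {"en": {"value": "Volkswagen Polo"}, "fr": {"value": "Volkswagen Polo"}},
    "sitelinks": {
        "enwiki": {"title": "Volkswagen Polo"},
        "frwiki": {"title": "Volkswagen Polo"},
    },
    "claims": {
        "P31": [value_claim(DISAMBIG_CLASS)],
        "P176": [value_claim("X-volkswagen")],  # placeholder manufacturer id
    },
}
# Hypothetical input: only the frwiki page is marked as a disambiguation page.
disambig_pages = {("frwiki", "Volkswagen Polo")}

for problem in check_disambiguation_item(polo, disambig_pages):
    print(problem)
```

On this illustrative data, the check reports that the enwiki page is not a disambiguation page and that a P176 statement is present. That matches the mixed page/car item described above.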
Whether disambiguation pages should have more than a single sitelink at all is another question. In my view, if we are talking about a "page", it is not the same page in French as it is in English (most properties that pages could naturally have, such as authors, language, or creation date, apply to a single page only). However, I can see that it is practical to group such pages nonetheless.
Conclusion
==========
It would be nice if somebody could analyse this problem in more detail (how many of our "disambiguation page" items have statements that are obviously not about a page but about a car make, animal, etc.). We might need some manual effort to clean this up (basically, a kind of un-merging game).
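One way to start such an analysis is a single pass over the entity data. The pass counts the disambiguation-page items that carry statements besides P31 and tallies which properties those statements use. Below is a rough Python sketch. It assumes a dump laid out as one JSON entity per line inside a top-level array, with trailing commas; that layout is an assumption to check against the actual dump. Treating P31 as the only expected property is also an assumption. The function names and the placeholder items ("X1" to "X3", "X-maker", "X-car") are made up for the example.

```python
import json
from collections import Counter
from typing import Any, Iterable, Iterator

DISAMBIG_CLASS = "Q4167410"  # disambiguation-page class cited in the thread
INSTANCE_OF = "P31"
# Assumption: P31 is the only statement expected on a disambiguation-page item.
PAGE_LEVEL_PROPERTIES = {INSTANCE_OF}


def is_disambiguation_item(item: dict[str, Any]) -> bool:
    """True if one of the item's P31 values is the disambiguation-page class."""
    for claim in item.get("claims", {}).get(INSTANCE_OF, []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue
        if snak["datavalue"]["value"].get("id") == DISAMBIG_CLASS:
            return True
    return False


def read_dump_lines(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Parse one JSON entity per line, skipping the array brackets and commas."""
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def survey(items: Iterable[dict[str, Any]]) -> tuple[int, list[str], Counter]:
    """Count disambiguation items and those carrying statements beyond P31."""
    total = 0
    suspicious: list[str] = []
    extra_props: Counter = Counter()
    for item in items:
        if not is_disambiguation_item(item):
            continue
        total += 1
        extra = set(item.get("claims", {})) - PAGE_LEVEL_PROPERTIES
        if extra:
            suspicious.append(item["id"])
            extra_props.update(extra)
    return total, suspicious, extra_props


def value_claim(target: str) -> dict[str, Any]:
    """Build a minimal item-valued claim in the Wikibase JSON shape."""
    return {"mainsnak": {"snaktype": "value", "datavalue": {"value": {"id": target}}}}


# Placeholder items standing in for real dump lines.
sample = [
    {"id": "X1", "claims": {"P31": [value_claim(DISAMBIG_CLASS)]}},
    {"id": "X2", "claims": {"P31": [value_claim(DISAMBIG_CLASS)],
                            "P176": [value_claim("X-maker")]}},
    {"id": "X3", "claims": {"P31": [value_claim("X-car")],
                            "P176": [value_claim("X-maker")]}},
]
dump = ["["] + [json.dumps(i) + "," for i in sample[:-1]] + [json.dumps(sample[-1]), "]"]

total, suspicious, extra_props = survey(read_dump_lines(dump))
print(total, suspicious, extra_props.most_common())  # 2 ['X2'] [('P176', 1)]
```

The property tally is the interesting part. It would show which kinds of non-page statements are most common on these items, and so where the manual un-merging would have to start.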
The immediate conclusion is that we need to be much more careful importing this type of information from one Wikipedia, since it is (by its very nature) not project-independent and not universal across languages.
Cheers,
Markus
Hoi, Markus, I am not surprised at all that such problems exist. The problem is inherent in the descriptions. They were added when they made sense. Now they no longer apply, because statements have since been added that make the item no longer a disambiguation page. These stale descriptions go unnoticed, because you only see the descriptions in *YOUR* language.
The first obvious remedial task would be to remove, in all languages, the disambiguation-page descriptions of items whose "instance of" is no longer "Wikimedia disambiguation page".
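A sketch of how candidates for this clean-up could be found, assuming Wikibase-style item JSON. It flags the languages in which an item's description is exactly the standard disambiguation text, even though the item is no longer an instance of Q4167410. Only the English string "Wikimedia disambiguation page" is filled in. The per-language table, the function names, and the placeholder ids ("X4", "X-car") are assumptions. The code only reports candidates; it does not edit anything.

```python
from typing import Any

DISAMBIG_CLASS = "Q4167410"
INSTANCE_OF = "P31"
# Assumption: standard disambiguation description per language. Only English is
# filled in here; other languages would need their own entries.
DISAMBIG_DESCRIPTIONS = {"en": "Wikimedia disambiguation page"}


def instance_of_ids(item: dict[str, Any]) -> set[str]:
    """Return the ids of all classes the item is an instance of (P31 values)."""
    ids: set[str] = set()
    for claim in item.get("claims", {}).get(INSTANCE_OF, []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") == "value":
            ids.add(snak["datavalue"]["value"]["id"])
    return ids


def stale_description_languages(item: dict[str, Any]) -> list[str]:
    """Languages whose description still calls a non-disambiguation item a disambiguation page."""
    if DISAMBIG_CLASS in instance_of_ids(item):
        return []  # the description is still correct for real disambiguation items
    stale = []
    for lang, desc in item.get("descriptions", {}).items():
        expected = DISAMBIG_DESCRIPTIONS.get(lang)
        # Exact (case-insensitive) match only, so hand-written descriptions are untouched.
        if expected and desc.get("value", "").strip().lower() == expected.lower():
            stale.append(lang)
    return sorted(stale)


# Placeholder item: reclassified as something else, but the old English
# description was never updated.
item = {
    "id": "X4",
    "claims": {"P31": [{"mainsnak": {"snaktype": "value",
                                     "datavalue": {"value": {"id": "X-car"}}}}]},
    "descriptions": {
        "en": {"value": "Wikimedia disambiguation page"},
        "de": {"value": "Automodell"},
    },
}
print(stale_description_languages(item))
```

Requiring an exact match keeps the sketch conservative: a description that merely mentions disambiguation is left for a human to review.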
I am really happy that you have noticed the fragility of descriptions. Automated descriptions do not suffer from this problem.
Related to what you have noticed are the list articles. There is a policy under which list articles are turned into singular articles that then describe whatever they were a list of. They are no longer list articles, and the descriptions saying that they are list articles are wrong as well.
Thanks,
GerardM
On 20 August 2014 11:40, Markus Krötzsch <markus@semantic-mediawiki.org> wrote:
Wikidata-l mailing list Wikidata-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-l
Hi Gerard,
Actually, I was not referring to Wikidata descriptions in this thread. Those seem simple to fix once one finds a way to fix the classification. The bigger problem here is that the data is inconsistent (it claims that the VW Polo is a page that is produced by Volkswagen).
Markus
On 20.08.2014 12:31, Gerard Meijssen wrote: