Hi all,
I just noticed that we have a number of "orphaned items" which were created and imported from some Wikipedia article that then got deleted. The result is an item with almost no data, no sitelinks, and all references claiming "imported from X Wikipedia".
Example: https://www.wikidata.org/wiki/Q9386774
Here is what happened: https://www.wikidata.org/w/index.php?title=Q9386774&action=history
It would be good to have a process for dealing with such cases. I am not saying that we must delete such items immediately, but it seems obvious that they need some special attention to become self-sustaining even without Wikipedia articles associated.
Things that would be important to keep such items:
- Links to other external datasets that confirm the existence of the thing.
- Links to authoritative web sites that confirm the existence of the thing.
- Proper references for all data (we always want that, but here it's even more critical: "imported from Wikipedia" is never great, but at least it leaves some hope of finding proper references if the Wikipedia page still exists).
In cases like the above, deletion seems to be the most reasonable solution (the little data that is there can easily be added again if needed in the future). It seems that one could automatically collect such candidates for deletion (pages that are not used as property values, have no site links, have no identifier properties, have not been edited for more than a month, and have fewer than, say, ten properties+labels+descriptions).
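The candidate criteria in the previous paragraph can be read as a simple filter. What follows is a minimal sketch, not an existing tool: it assumes per-item summaries (incoming-link count, sitelinks, identifier statements, last-edit time, and counts of properties, labels and descriptions) have already been extracted, for example from a dump. The `ItemSummary` type, its field names and the 30-day approximation of "a month" are hypothetical; the size threshold of ten is the one suggested above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ItemSummary:
    """Hypothetical per-item summary, e.g. extracted from a JSON dump."""
    qid: str
    incoming_links: int         # how often the item is used as a property value
    sitelinks: int              # number of sitelinks
    identifier_properties: int  # number of external-identifier statements
    last_edited: datetime
    properties: int
    labels: int
    descriptions: int


def is_deletion_candidate(item: ItemSummary, now: datetime,
                          min_age: timedelta = timedelta(days=30),
                          max_size: int = 10) -> bool:
    """True if the item meets every orphan criterion suggested above."""
    size = item.properties + item.labels + item.descriptions
    return (
        item.incoming_links == 0
        and item.sitelinks == 0
        and item.identifier_properties == 0
        and now - item.last_edited > min_age  # 30 days approximates "a month"
        and size < max_size
    )
```

Such a filter would only produce a list of candidates; whether each one is then deleted or improved is still a separate decision.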
Regards,
Markus
(in this case, it appears to be the "castle of Żagań", once located in https://en.wikipedia.org/wiki/%C5%BBaga%C5%84 )
On Fri, May 29, 2015 at 12:24 PM Markus Krötzsch < markus@semantic-mediawiki.org> wrote:
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
On 29 May 2015 at 12:39, Magnus Manske magnusmanske@googlemail.com wrote:
(in this case, it appears to be the "castle of Żagań", once located in https://en.wikipedia.org/wiki/%C5%BBaga%C5%84 )
This:
http://www.poland.travel/en/gallery/palace-in-zagan ?
Not quite sure...
On Fri, May 29, 2015 at 1:33 PM Andy Mabbett andy@pigsonthewing.org.uk wrote:
-- Andy Mabbett @pigsonthewing http://pigsonthewing.org.uk
The problem that users face is that they find merging items too difficult, or don't know that it is possible. They understand (with much annoyance) that they can only add a sitelink to one item. Therefore they delete a sitelink on one item, and add it to another item.
Personally I think that a merge afterwards would be recommended here. Would it be possible to have a bot:
1. determine what the original sitelink was that has been removed from the item,
2. see if this sitelink has been added to another item,
3. check if the statements of both items match (otherwise: put them on a list for humans/tools to check whether they are the same),
4. if they are the same: automatically merge both items?
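The four steps above could be prototyped roughly as follows. This is a minimal sketch, not an existing bot: it assumes items are plain dictionaries with an `id`, a `sitelinks` mapping from site ID to page title, and a `statements` mapping from property ID to a set of values. The function names and the returned strings are hypothetical, and "statements match" is interpreted loosely here as "no property present on both items has disjoint values"; a stricter rule may well be preferable.

```python
from typing import Optional


def find_new_holder(site: str, title: str, items: list[dict],
                    old_id: str) -> Optional[dict]:
    """Step 2: find another item that now carries the removed sitelink."""
    for item in items:
        if item["id"] != old_id and item["sitelinks"].get(site) == title:
            return item
    return None


def statements_compatible(a: dict[str, set[str]], b: dict[str, set[str]]) -> bool:
    """Step 3: every property present on both items must share a value."""
    shared = a.keys() & b.keys()
    return all(a[p] & b[p] for p in shared)


def triage_removed_sitelink(old_item: dict, site: str, title: str,
                            items: list[dict]) -> str:
    """Run steps 2-4 for one sitelink (site, title) removed from old_item (step 1)."""
    new_item = find_new_holder(site, title, items, old_item["id"])
    if new_item is None:
        return "no-action"  # the link was not re-added anywhere
    if statements_compatible(old_item["statements"], new_item["statements"]):
        return f"merge {old_item['id']} into {new_item['id']}"  # step 4
    return f"review {old_item['id']} / {new_item['id']}"  # list for humans/tools
```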
I think it would be good to automate as many things as possible.
Romaine
On 29.05.2015 13:42, Romaine Wiki wrote:
The problem that users face is that they find merging items too difficult, or don't know that it is possible. They understand (with much annoyance) that they can only add a sitelink to one item. Therefore they delete a sitelink on one item, and add it to another item.
Personally I think that a merge afterwards would be recommended here. Would it be possible to have a bot:
1. determine what the original sitelink was that has been removed from the item,
2. see if this sitelink has been added to another item,
3. check if the statements of both items match (otherwise: put them on a list for humans/tools to check whether they are the same),
4. if they are the same: automatically merge both items?
I think it would be good to automate as many things as possible.
That's an important situation too, but I think in the example I gave something else happened: the sitelink was not moved, but the Wikipedia article that it was pointing to got deleted. So it's not just the link that vanished: all information about the item that might have been found on the deleted Wikipedia page is also gone. It's therefore quite hard to find out what the item might have been about.
Regards,
Markus
I think this is a problem with the current workflow for creating articles, which starts with Wikipedia and then finishes with Wikidata, though it should probably be the other way around. This problem will eventually solve itself, given enough time, although I believe we still have lots of poorly documented images on Commons that are currently being cleaned up. They date back to the early days of Commons, when people uploaded images there because "they had to", but didn't spend any time on the metadata there and stuffed it all into the Wikipedia articles they linked the image to. Since then, lots of that metadata has found its way back to the images on Commons, where it should have been added in the first place. It may seem like double work, but it is necessary due to the lack of proper tools to automate it. Right now there is a lot of double work to be done in Wikidata as people create articles, and this can only be done by copying most of the information in the leading paragraph to various statements on Wikidata. This can be both annoying and confusing.
I think the idea of deletion with 0-2 statements is OK, but 10 statements? With 10 statements there must be something salvageable, no?
On Fri, May 29, 2015 at 3:20 PM, Markus Krötzsch < markus@semantic-mediawiki.org> wrote:
That's an important situation too, but I think in the example I gave something else happened: the sitelink was not moved, but the Wikipedia article that it was pointing to got deleted. So it's not just the link that vanished: all information about the item that might have been found on the deleted Wikipedia page is also gone. It's therefore quite hard to find out what the item might have been about.
Regards,
Markus
Hi Markus,
Indeed yes, that is also an issue. It can happen with new articles and with older articles.
Some articles get deleted because they duplicate another article, are badly written (too bad to keep), or are not an encyclopaedic subject.
Every day on nl-wiki we check whether new articles are connected to Wikidata. We ignore almost all articles that carry a template nominating them for deletion, and we do not add them to Wikidata. On nl-wiki we do this by hand, to make sure all basic statements are added, but if this is done by bots, they may not check for templates that mark articles for deletion.
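A bot connecting new articles to Wikidata could apply the same rule as the manual nl-wiki check. A minimal sketch, assuming the list of templates transcluded on each new article has already been fetched (for example through the MediaWiki API's `prop=templates` query). The template names in `DELETION_TEMPLATES` are an assumption for illustration and would need to be confirmed on each wiki.

```python
# Hypothetical template names; the actual deletion-nomination templates
# differ per wiki and would have to be checked locally.
DELETION_TEMPLATES = frozenset({"Sjabloon:Weg", "Sjabloon:Nuweg"})


def should_import(templates: list[str],
                  deletion_templates: frozenset[str] = DELETION_TEMPLATES) -> bool:
    """False if the article carries any deletion-nomination template."""
    return not any(t in deletion_templates for t in templates)


def filter_new_articles(articles: dict[str, list[str]]) -> list[str]:
    """Keep titles of new articles that are safe to connect to Wikidata.

    `articles` maps each new article title to the templates it transcludes.
    """
    return [title for title, templates in articles.items()
            if should_import(templates)]
```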
If a deleted item has statements, the question is whether this information is in itself valuable to keep, for use now and/or in the future.
Romaine
Hi Jane, hi Romaine,
I think we agree that valuable information should be kept if at all possible. My chief concern is that orphaned items do not have a clear identity. It's not useful to know that "something" is at a certain location. The first thing we must determine is what this "thing" is that we are talking about. Links to Wikipedia are a good way of doing this. Without them, we need to come up with other identity-providing sources. We certainly have the right infrastructure for this (with all the identifier properties that point to other databases and authority files).
The first goal of anyone who wants to save an orphan should be to connect it with the outside world so as to give it some grounding to build on.
A weaker way to provide basic grounding is to make internal connections. There are cases where this is strong (one can identify items as "the author of War & Peace" or "the mother of Marie Skłodowska-Curie"), but there are other cases where it is too weak ("the town in Germany" or "the part of Europe" do not identify anything). One would need to give this more thought if one wanted to determine automatically if an item receives its identity from the incoming/outgoing links to other items.
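A first automatic approximation of this distinction could look at an item's JSON as served by Wikidata's `Special:EntityData` endpoint or found in the JSON dumps, where each statement's `mainsnak` carries a `datatype` such as `external-id` or `wikibase-item`. A minimal sketch: sitelinks and external identifiers are treated as outside grounding, outgoing links to other items only as the weaker internal kind. The three labels are hypothetical, incoming links are not part of this JSON (they would need a separate query), and the sketch does not try to decide whether a particular internal link is actually identifying.

```python
def grounding(entity: dict) -> str:
    """Classify how an item is anchored: 'external', 'internal' or 'none'."""
    if entity.get("sitelinks"):
        return "external"  # linked to at least one wiki page
    datatypes = [
        claim["mainsnak"].get("datatype")
        for claims in entity.get("claims", {}).values()
        for claim in claims
    ]
    if "external-id" in datatypes:
        return "external"  # identifier pointing to another database
    if "wikibase-item" in datatypes:
        return "internal"  # weaker: identity only via links to other items
    return "none"
```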
Cheers,
Markus
On 29.05.2015 17:05, Romaine Wiki wrote:
On 2015-05-29 17:42, Markus Krötzsch wrote:
Hi Jane, hi Romaine,
I think we agree that valuable information should be kept if at all possible. My chief concern is that orphaned items do not have a clear identity. It's not useful to know that "something" is at a certain location. The first thing we must determine is what this "thing" is that we are talking about. Links to Wikipedia are a good way of doing this. Without them, we need to come up with other identity-providing sources. We certainly have the right infrastructure for this (with all the identifier properties that point to other databases and authority files).
The first goal of anyone who wants to save an orphan should be to connect it with the outside world so as to give it some grounding to build on.
A weaker way to provide basic grounding is to make internal connections. There are cases where this is strong (one can identify items as "the author of War & Peace" or "the mother of Marie Skłodowska-Curie"), but there are other cases where it is too weak ("the town in Germany" or "the part of Europe" do not identify anything). One would need to give this more thought if one wanted to determine automatically if an item receives its identity from the incoming/outgoing links to other items.
Cheers,
Markus
Actually, we already have tools designed by Pasleim to track such items:
https://www.wikidata.org/wiki/User:Pasleim/notability
https://www.wikidata.org/wiki/User:Pasleim/Items_for_deletion/Almost_empty
I usually check that there are no backlinks; provided there are none, I check the history. If it turns out the item is empty because of a non-automated merge, I merge it; if it is empty because the only interwiki link was deleted on the project, I delete it as non-notable.
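The routine just described amounts to a small decision procedure. A sketch, assuming the three inputs (number of backlinks, whether the history shows a manual rather than automated merge, and whether the last sitelink disappeared because the article was deleted) are gathered by hand or by other tooling; the function name and outcome strings are hypothetical.

```python
def triage_empty_item(backlinks: int, manual_merge_in_history: bool,
                      only_sitelink_deleted_on_wiki: bool) -> str:
    """Suggest an action for an (almost) empty item."""
    if backlinks > 0:
        return "skip"    # used by other items; not a simple orphan
    if manual_merge_in_history:
        return "merge"   # finish the merge that was done by hand
    if only_sitelink_deleted_on_wiki:
        return "delete"  # non-notable once its only article is gone
    return "review"      # e.g. items that never had any links
```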
The problems are often items which never had any links. Many of them are spam, but some of them can be used for structural needs and can be kept. It is not always easy to figure out in practice, especially if they are in non-Latin and non-Cyrillic alphabets.
Cheers Yaroslav
Yaroslav, thanks for posting; I had no idea. Thanks for your work on this too.
On Sat, May 30, 2015 at 10:05 AM, Yaroslav M. Blanter putevod@mccme.ru wrote:
Actually, we already have tools designed by Pasleim to track such items:
https://www.wikidata.org/wiki/User:Pasleim/notability
https://www.wikidata.org/wiki/User:Pasleim/Items_for_deletion/Almost_empty
I usually check that there are no backlinks; provided there are none, I check the history. If it turns out the item is empty because of a non-automated merge, I merge it; if it is empty because the only interwiki link was deleted on the project, I delete it as non-notable.
The problems are often items which never had any links. Many of them are spam, but some of them can be used for structural needs and can be kept. It is not always easy to figure out in practice, especially if they are in non-Latin and non-Cyrillic alphabets.
Cheers Yaroslav
Hi!
The problems are often items which never had any links. Many of them are spam, but some of them can be used for structural needs and can be kept.
If they are structural (like Wikidata-only classes, etc.), shouldn't they have some incoming links? After all, structure is supposed to be used for something...
Hoi, how about people who have received an award and complete the list of people who received it? How about people who held a position and complete the list of people who held that position? How about people who are the parent linking a famous grandfather and a famous grandchild...
There are so many possibilities. The thing is, Wikipedia is not really the yardstick of what is relevant in Wikidata. Thanks, GerardM
On 30 May 2015 at 20:42, Stas Malyshev smalyshev@wikimedia.org wrote:
-- Stas Malyshev smalyshev@wikimedia.org
On 30.05.2015 at 21:12, Gerard Meijssen wrote:
Hoi, how about people who have received an award and complete the list of people who received it? How about people who held a position and complete the list of people who held that position? How about people who are the parent linking a famous grandfather and a famous grandchild...
In such a case, they would either have a statement stating the award/position, or have incoming links, because they are used in statements on other items, e.g. as the father or child.
If they have no statements, and are not used in statements, I do not see how they could be structurally significant.
I think the key issue here is findability, as Yaroslav pointed out. If the incoming links should be there but are not (yet), then deletion is probably best, since anyone needing those items will probably create a duplicate anyway, as the item's findability is zero.
On Sun, May 31, 2015 at 11:28 AM, Daniel Kinzler < daniel.kinzler@wikimedia.de> wrote:
-- Daniel Kinzler Senior Software Developer
Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
Hoi, every item can easily be found. Many Wikipedias have "Wikidata search" enabled. Some have it enabled on en.wp like For many humans I have added statements like date of death and date of birth. When I cannot disambiguate properly, I often add statements to make it easy to understand who is who.
Given the vagaries of Wikipedia notability, I absolutely do not believe in giving primacy to whatever is said elsewhere. What I do know is that I prefer search in Reasonator. Thanks, GerardM
On 31 May 2015 at 13:52, Jane Darnell jane023@gmail.com wrote:
I think the key issue here is findability, as Yaroslav pointed out. If the incoming links should be there but are not (yet), then deletion is probably best, since anyone needing those items will probably create a duplicate anyway, as the item's findability is zero.
On 31.05.2015 11:28, Daniel Kinzler wrote:
On 30.05.2015 at 21:12, Gerard Meijssen wrote:
Hoi, how about people who have received an award and complete the list of people who received it? How about people who held a position and complete the list of people who held that position? How about people who are the parent linking a famous grandfather and a famous grandchild...
In such a case, they would either have a statement stating the award/position, or have incoming links, because they are used in statements on other items, e.g. as the father or child.
If they have no statements, and are not used in statements, I do not see how they could be structurally significant.
Interesting example. Just having an award (but no incoming statements or sitelinks) might not be enough. It would just tell us "somebody received the award". We need some statements/sitelinks/descriptions that tell us who exactly that person was.
Jane proposed a good benchmark question: do we have enough information about the item to detect and merge duplicates more or less automatically? Items where this is not the case should receive special attention -- and be either stabilised or deleted eventually. For persons (P31:Q5), the name (label) can go a long way to identify items. Awards are probably too weak to integrate information over (even specific things like "the 1981 Nobel prize in Chemistry" might not have a unique award winner; and the absence of a Nobel prize will not be noticed as an incompleteness for a person, so an item about the same person that misses the award statement will not be detected as duplicate).
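For persons, this benchmark could be approximated by grouping items with P31:Q5 on a normalised label. A minimal sketch, assuming each item is given as a dict with an `id`, a set of `P31` values and a single `label`; the normalisation (case-folding and collapsing whitespace) is an assumption, and a shared label only flags candidates for review, not confirmed duplicates. Consistent with the point about awards, an item missing an award statement would still be grouped here, since only the label is compared.

```python
from collections import defaultdict


def normalise(label: str) -> str:
    """Case-fold and collapse whitespace (assumed normalisation)."""
    return " ".join(label.casefold().split())


def duplicate_person_candidates(items: list[dict]) -> list[list[str]]:
    """Group human items (P31:Q5) whose normalised labels coincide."""
    groups: dict[str, list[str]] = defaultdict(list)
    for item in items:
        if "Q5" in item.get("P31", set()) and item.get("label"):
            groups[normalise(item["label"])].append(item["id"])
    return [ids for ids in groups.values() if len(ids) > 1]
```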
Regards,
Markus
Hoi, when someone or something received an award, it is needed, if only to complete the list of recipients of that award. There is no benchmark for enough information. The notion that a Nobel award winner is not relevant is poppycock. With automated descriptions, awards do show.
When you only consider the current subpar fixed descriptions, you lose out big time. Thanks, GerardM
On 31 May 2015 at 15:01, Markus Krötzsch markus@semantic-mediawiki.org wrote:
Interesting example. Just having an award (but no incoming statements or sitelinks) might not be enough. It would just tell us "somebody received the award". We need some statements/sitelinks/descriptions that tell us who exactly that person was.
Jane proposed a good benchmark question: do we have enough information about the item to detect and merge duplicates more or less automatically? Items where this is not the case should receive special attention -- and be either stabilised or deleted eventually. For persons (P31:Q5), the name (label) can go a long way to identify items. Awards are probably too weak to integrate information over (even specific things like "the 1981 Nobel prize in Chemistry" might not have a unique award winner; and the absence of a Nobel prize will not be noticed as an incompleteness for a person, so an item about the same person that misses the award statement will not be detected as duplicate).
Regards,
Markus
Wikidata mailing list Wikidata@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata
On 31.05.2015 at 15:21, Gerard Meijssen wrote:
Hoi, when someone or something received an award, the item is needed, if only to complete the list of recipients of that award. There is no benchmark for enough information. The notion that a Nobel award winner is not relevant is poppycock. With automated descriptions, awards do show.
If you have an item that says someone won a Nobel Prize, but not when or which one, and that also does *not* have a label, that item is quite useless; it's impossible to tell which person it is even referring to.
That is what Markus is talking about. For people, if there is a label, we already have pretty good info. But if there is no label, we have a problem, and if there isn't any other identifying info, the item is useless.
Hoi, typically such items were created because the article about the award mentions them. So it is all a matter of perspective: when the award is leading, the information about an award winner is in the article on the award. Having all these awardees on the article is not so great; it is not what we do.
Impossible? Certainly not. Read the damn article (on the award). Thanks, GerardM
On 31 May 2015 at 17:06, Daniel Kinzler daniel.kinzler@wikimedia.de wrote:
On 31.05.2015 at 15:21, Gerard Meijssen wrote:
Hoi, when someone or something received an award, the item is needed, if only to complete the list of recipients of that award. There is no benchmark for enough information. The notion that a Nobel award winner is not relevant is poppycock. With automated descriptions, awards do show.
If you have an item that says someone won a Nobel Prize, but not when or which one, and that also does *not* have a label, that item is quite useless; it's impossible to tell which person it is even referring to.
That is what Markus is talking about. For people, if there is a label, we already have pretty good info. But if there is no label, we have a problem, and if there isn't any other identifying info, the item is useless.
-- Daniel Kinzler Senior Software Developer
Wikimedia Deutschland Gesellschaft zur Förderung Freien Wissens e.V.
And that helps me how? Most awards have been won by more than one person. If I know that Q99999 has won the Nobel Prize in literature, and I know a fact about Patrick Modiano, should I add that fact to Q99999 or should I create a new item?
André
On Sun, May 31, 2015 at 5:23 PM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, typically such items were created because the article about the award mentions them. So it is all a matter of perspective: when the award is leading, the information about an award winner is in the article on the award. Having all these awardees on the article is not so great; it is not what we do.
Impossible? Certainly not. Read the damn article (on the award). Thanks, GerardM
On 31 May 2015 at 17:06, Daniel Kinzler daniel.kinzler@wikimedia.de wrote:
On 31.05.2015 at 15:21, Gerard Meijssen wrote:
Hoi, when someone or something received an award, the item is needed, if only to complete the list of recipients of that award. There is no benchmark for enough information. The notion that a Nobel award winner is not relevant is poppycock. With automated descriptions, awards do show.
If you have an item that says someone won a Nobel Prize, but not when or which one, and that also does *not* have a label, that item is quite useless; it's impossible to tell which person it is even referring to.
That is what Markus is talking about. For people, if there is a label, we already have pretty good info. But if there is no label, we have a problem, and if there isn't any other identifying info, the item is useless.
Hoi, not enough data. Q99999 may have a label that is "Patrick Modiano". Your first challenge is to find out that your Patrick Modiano is indeed that particular one. Given that you know which award was won, you have a start. Thanks, GerardM
On 31 May 2015 at 17:40, Andre Engels andreengels@gmail.com wrote:
And that helps me how? Most awards have been won by more than one person. If I know that Q99999 has won the Nobel Prize in literature, and I know a fact about Patrick Modiano, should I add that fact to Q99999 or should I create a new item?
André
On Sun, May 31, 2015 at 5:23 PM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, typically such items were created because the article about the award mentions them. So it is all a matter of perspective: when the award is leading, the information about an award winner is in the article on the award. Having all these awardees on the article is not so great; it is not what we do.
Impossible? Certainly not. Read the damn article (on the award). Thanks, GerardM
On 31 May 2015 at 17:06, Daniel Kinzler daniel.kinzler@wikimedia.de wrote:
On 31.05.2015 at 15:21, Gerard Meijssen wrote:
Hoi, when someone or something received an award, the item is needed, if only to complete the list of recipients of that award. There is no benchmark for enough information. The notion that a Nobel award winner is not relevant is poppycock. With automated descriptions, awards do show.
If you have an item that says someone won a Nobel Prize, but not when or which one, and that also does *not* have a label, that item is quite useless; it's impossible to tell which person it is even referring to.
That is what Markus is talking about. For people, if there is a label, we already have pretty good info. But if there is no label, we have a problem, and if there isn't any other identifying info, the item is useless.
-- André Engels, andreengels@gmail.com
And what if Q99999 does not have a label? How am I going from the information "Q99999 won the Nobel Prize in Literature" to "Q99999 is/is not Patrick Modiano"?
André
On Sun, May 31, 2015 at 6:29 PM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, not enough data. Q99999 may have a label that is "Patrick Modiano". Your first challenge is to find out that your Patrick Modiano is indeed that particular one. Given that you know which award was won, you have a start. Thanks, GerardM
On 31 May 2015 at 17:40, Andre Engels andreengels@gmail.com wrote:
And that helps me how? Most awards have been won by more than one person. If I know that Q99999 has won the Nobel Prize in literature, and I know a fact about Patrick Modiano, should I add that fact to Q99999 or should I create a new item?
André
On Sun, May 31, 2015 at 5:23 PM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, typically such items were created because the article about the award mentions them. So it is all a matter of perspective: when the award is leading, the information about an award winner is in the article on the award. Having all these awardees on the article is not so great; it is not what we do.
Impossible? Certainly not. Read the damn article (on the award). Thanks, GerardM
On 31 May 2015 at 17:06, Daniel Kinzler daniel.kinzler@wikimedia.de wrote:
On 31.05.2015 at 15:21, Gerard Meijssen wrote:
Hoi, when someone or something received an award, the item is needed, if only to complete the list of recipients of that award. There is no benchmark for enough information. The notion that a Nobel award winner is not relevant is poppycock. With automated descriptions, awards do show.
If you have an item that says someone won a Nobel Prize, but not when or which one, and that also does *not* have a label, that item is quite useless; it's impossible to tell which person it is even referring to.
That is what Markus is talking about. For people, if there is a label, we already have pretty good info. But if there is no label, we have a problem, and if there isn't any other identifying info, the item is useless.
Hoi, when you look at the statistics, you will find that we aggressively pursue the inclusion of labels. When there is no label in your language, it is tough. When you use Reasonator there is no problem; you will always see a label in whatever language is available.
My problem is that we know that the prize was won. The item is likely to have a label. For the rest... as they say in double Dutch, "search it but out". Thanks, GerardM
On 31 May 2015 at 18:40, Andre Engels andreengels@gmail.com wrote:
And what if Q99999 does not have a label? How am I going from the information "Q99999 won the Nobel Prize in Literature" to "Q99999 is/is not Patrick Modiano"?
André
On Sun, May 31, 2015 at 6:29 PM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, not enough data. Q99999 may have a label that is "Patrick Modiano". Your first challenge is to find out that your Patrick Modiano is indeed that particular one. Given that you know which award was won, you have a start. Thanks, GerardM
On 31 May 2015 at 17:40, Andre Engels andreengels@gmail.com wrote:
And that helps me how? Most awards have been won by more than one person. If I know that Q99999 has won the Nobel Prize in literature, and I know a fact about Patrick Modiano, should I add that fact to Q99999 or should I create a new item?
André
On Sun, May 31, 2015 at 5:23 PM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, typically such items were created because the article about the award mentions them. So it is all a matter of perspective: when the award is leading, the information about an award winner is in the article on the award. Having all these awardees on the article is not so great; it is not what we do.
Impossible? Certainly not. Read the damn article (on the award). Thanks, GerardM
On 31 May 2015 at 17:06, Daniel Kinzler daniel.kinzler@wikimedia.de wrote:
On 31.05.2015 at 15:21, Gerard Meijssen wrote:
Hoi, when someone or something received an award, the item is needed, if only to complete the list of recipients of that award. There is no benchmark for enough information. The notion that a Nobel award winner is not relevant is poppycock. With automated descriptions, awards do show.
If you have an item that says someone won a Nobel Prize, but not when or which one, and that also does *not* have a label, that item is quite useless; it's impossible to tell which person it is even referring to.
That is what Markus is talking about. For people, if there is a label, we already have pretty good info. But if there is no label, we have a problem, and if there isn't any other identifying info, the item is useless.
And when you look at the discussion, you see that the message you are referring to said:
"If you have an item that says someone won a Nobel Prize, but not when or which one, and that also does *not* have a label, that item is quite useless; it's impossible to tell which person it is even referring to."
The only thing I can conclude is that you are against the removal of items without a label because they probably do have a label. Which in my opinion is UTTER BOLLOCKS.
To get back to
On Sun, May 31, 2015 at 6:46 PM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, when you look at the statistics, you will find that we aggressively pursue the inclusion of labels. When there is no label in your language, it is tough. When you use Reasonator there is no problem; you will always see a label in whatever language is available.
My problem is that we know that the prize was won. The item is likely to have a label. For the rest... as they say in double Dutch, "search it but out". Thanks, GerardM
On 31 May 2015 at 18:40, Andre Engels andreengels@gmail.com wrote:
And what if Q99999 does not have a label? How am I going from the information "Q99999 won the Nobel Prize in Literature" to "Q99999 is/is not Patrick Modiano"?
André
On Sun, May 31, 2015 at 6:29 PM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, not enough data. Q99999 may have a label that is "Patrick Modiano". Your first challenge is to find out that your Patrick Modiano is indeed that particular one. Given that you know which award was won, you have a start. Thanks, GerardM
On 31 May 2015 at 17:40, Andre Engels andreengels@gmail.com wrote:
And that helps me how? Most awards have been won by more than one person. If I know that Q99999 has won the Nobel Prize in literature, and I know a fact about Patrick Modiano, should I add that fact to Q99999 or should I create a new item?
André
On Sun, May 31, 2015 at 5:23 PM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, typically such items were created because the article about the award mentions them. So it is all a matter of perspective: when the award is leading, the information about an award winner is in the article on the award. Having all these awardees on the article is not so great; it is not what we do.
Impossible? Certainly not. Read the damn article (on the award). Thanks, GerardM
On 31 May 2015 at 17:06, Daniel Kinzler daniel.kinzler@wikimedia.de wrote:
On 31.05.2015 at 15:21, Gerard Meijssen wrote:
Hoi, when someone or something received an award, the item is needed, if only to complete the list of recipients of that award. There is no benchmark for enough information. The notion that a Nobel award winner is not relevant is poppycock. With automated descriptions, awards do show.
If you have an item that says someone won a Nobel Prize, but not when or which one, and that also does *not* have a label, that item is quite useless; it's impossible to tell which person it is even referring to.
That is what Markus is talking about. For people, if there is a label, we already have pretty good info. But if there is no label, we have a problem, and if there isn't any other identifying info, the item is useless.
I hit 'send' by accident. What I wanted to say is:
To get back to a hopefully more fruitful discussion: My opinion is that an item can be deleted if it cannot be determined by a reasonably knowledgeable person whether or not the item is about a given person/place/subject/whatever. Do you or do you not agree?
André Engels
On Sun, May 31, 2015 at 7:58 PM, Andre Engels andreengels@gmail.com wrote:
And when you look at the discussion, you see that the message you are referring to said:
"If you have an item that says someone won a Nobel Prize, but not when or which one, and that also does *not* have a label, that item is quite useless; it's impossible to tell which person it is even referring to."
The only thing I can conclude is that you are against the removal of items without a label because they probably do have a label. Which in my opinion is UTTER BOLLOCKS.
To get back to
On Sun, May 31, 2015 at 6:46 PM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, when you look at the statistics, you will find that we aggressively pursue the inclusion of labels. When there is no label in your language, it is tough. When you use Reasonator there is no problem; you will always see a label in whatever language is available.
My problem is that we know that the prize was won. The item is likely to have a label. For the rest... as they say in double Dutch, "search it but out". Thanks, GerardM
On 31 May 2015 at 18:40, Andre Engels andreengels@gmail.com wrote:
And what if Q99999 does not have a label? How am I going from the information "Q99999 won the Nobel Prize in Literature" to "Q99999 is/is not Patrick Modiano"?
André
On Sun, May 31, 2015 at 6:29 PM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, not enough data. Q99999 may have a label that is "Patrick Modiano". Your first challenge is to find out that your Patrick Modiano is indeed that particular one. Given that you know which award was won, you have a start. Thanks, GerardM
On 31 May 2015 at 17:40, Andre Engels andreengels@gmail.com wrote:
And that helps me how? Most awards have been won by more than one person. If I know that Q99999 has won the Nobel Prize in literature, and I know a fact about Patrick Modiano, should I add that fact to Q99999 or should I create a new item?
André
On Sun, May 31, 2015 at 5:23 PM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, typically such items were created because the article about the award mentions them. So it is all a matter of perspective: when the award is leading, the information about an award winner is in the article on the award. Having all these awardees on the article is not so great; it is not what we do.
Impossible? Certainly not. Read the damn article (on the award). Thanks, GerardM
On 31 May 2015 at 17:06, Daniel Kinzler daniel.kinzler@wikimedia.de wrote:
On 31.05.2015 at 15:21, Gerard Meijssen wrote:
Hoi, when someone or something received an award, the item is needed, if only to complete the list of recipients of that award. There is no benchmark for enough information. The notion that a Nobel award winner is not relevant is poppycock. With automated descriptions, awards do show.
If you have an item that says someone won a Nobel Prize, but not when or which one, and that also does *not* have a label, that item is quite useless; it's impossible to tell which person it is even referring to.
That is what Markus is talking about. For people, if there is a label, we already have pretty good info. But if there is no label, we have a problem, and if there isn't any other identifying info, the item is useless.
On 31.05.2015 at 17:23, Gerard Meijssen wrote:
Hoi, typically such items were created because the article about the award mentions them. So it is all a matter of perspective: when the award is leading, the information about an award winner is in the article on the award. Having all these awardees on the article is not so great; it is not what we do.
You mean the Wikipedia article references a Wikidata Item as an in-text link? Is that done? I have never heard of that, and it seems like a violation of the "no interwiki links in the article body" rule.
On 2015-05-30 20:42, Stas Malyshev wrote:
Hi!
The problems are often items which never had any links. Many of them are spam, but some of them can be used for structural needs and can be kept.
If they are structural (like wikidata-only classes, etc.) shouldn't they have some incoming links? After all, structure is supposed to be used for something...
They should, but very often they do not. I understand that most of these items will never be found when they are actually needed; on the other hand, I am hesitant to go for blanket deletion, since users in good standing invested some time into creating these items.
Cheers Yaroslav
Hi Markus,
I think there must always be some way to make an item unique: a way to identify the item outside Wikidata. This can be a sitelink; for subjects at a fixed location on Earth, it can be the coordinates, etc. But coordinates alone, without knowing what the subject is, do not make sense either. In some way, the item must be identifiable somewhere, somehow.
This subject can be compared with what we (on nl-wiki) see as the basic statements that need to be added to identify a subject on Wikidata and to distinguish it from other subjects. (To be able to answer the question: article X is not connected to Wikidata; to which item should it be connected?) For everything: instance of. For geographically situated subjects we request country, located in the administrative territorial entity, location (for towns, etc.), and coordinates. For people: gender, birth/death date/place, occupation, country. For living creatures: taxonomic rank, scientific name, parent taxon. For creative works: author, date.
Romaine
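Romaine's nl-wiki checklist can be written down as data plus a lookup. The sketch below is hypothetical and not an existing nl-wiki or Wikidata tool: the category names and the human-readable property names are taken from the message above rather than from Wikidata property IDs, "birth/death date/place" is reduced to birth date and place (death data only applies to deceased persons), and "location (for towns, etc.)" is left out for brevity.

```python
# Hypothetical encoding of the nl-wiki "basic statements" checklist.
# Property names are human-readable labels, not Wikidata P-numbers.
BASIC_STATEMENTS = {
    "geographic": ["country", "located in the administrative territorial entity", "coordinates"],
    "person": ["gender", "date of birth", "place of birth", "occupation", "country"],
    "living creature": ["taxonomic rank", "scientific name", "parent taxon"],
    "creative work": ["author", "date"],
}


def missing_basic_statements(kind, present):
    """Return the basic statements still missing for an item of the given kind.

    "instance of" is required for everything; unknown kinds only need that.
    """
    required = ["instance of"] + BASIC_STATEMENTS.get(kind, [])
    return [prop for prop in required if prop not in present]


print(missing_basic_statements("creative work", {"instance of", "author"}))
# → ['date']
```

Keeping the checklist as plain data means each community can adjust the per-type lists without touching the lookup logic.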
2015-05-29 17:42 GMT+02:00 Markus Krötzsch markus@semantic-mediawiki.org:
Hi Jane, hi Romaine,
I think we agree that valuable information should be kept if at all possible. My chief concern is that orphaned items do not have a clear identity. It's not useful to know that "something" is at a certain location. The first thing we must determine is what this "thing" is that we are talking about. Links to Wikipedia are a good way of doing this. Without them, we need to come up with other identity-providing sources. We certainly have the right infrastructure for this (with all the identifier properties that point to other databases and authority files).
The first goal of anyone who wants to save an orphan should be to connect it with the outside world, so as to give it some grounding to build on.
A weaker way to provide basic grounding is to make internal connections. There are cases where this is strong (one can identify items as "the author of War & Peace" or "the mother of Marie Skłodowska-Curie"), but there are other cases where it is too weak ("the town in Germany" or "the part of Europe" do not identify anything). One would need to give this more thought if one wanted to determine automatically if an item receives its identity from the incoming/outgoing links to other items.
Cheers,
Markus
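The distinction between external grounding (sitelinks, identifier properties) and weaker internal grounding (links from other items) could be approximated as a triage step. This is a hypothetical sketch, not an existing tool: the `Item` fields (sitelinks, external identifier values, and a count of statements on other items that use this item as a value) are assumptions, and internally grounded items are only flagged for review, because whether an incoming link identifies the item ("the mother of Marie Skłodowska-Curie") or not ("the town in Germany") still needs human judgement.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    # Hypothetical, simplified view of an item.
    sitelinks: list = field(default_factory=list)     # e.g. ["nlwiki"]
    external_ids: dict = field(default_factory=dict)  # identifier property -> value
    incoming_statements: int = 0                      # uses as a value on other items


def grounding(item: Item) -> str:
    """Classify how an item is anchored to something outside itself."""
    if item.sitelinks or item.external_ids:
        return "external"         # grounded in the outside world
    if item.incoming_statements > 0:
        return "internal-review"  # may identify the item, may be too weak
    return "orphan"               # candidate for stabilising or deletion


print(grounding(Item()))                                   # → orphan
print(grounding(Item(incoming_statements=2)))              # → internal-review
print(grounding(Item(external_ids={"VIAF ID": "12345"})))  # → external (hypothetical value)
```

Checking external grounding first mirrors the order suggested above: connecting an item to the outside world is the strongest fix, and internal links are a fallback.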
On 29.05.2015 17:05, Romaine Wiki wrote:
Hi Markus,
Indeed yes, that is also an issue. It can happen with new articles and with older articles.
Some articles get deleted because they duplicate another article, are badly written (too bad to keep), or are not an encyclopaedic subject.
Every day on nl-wiki we check whether new articles are connected to Wikidata. We ignore almost all articles that carry a template marking them as nominated for deletion, and we do not add them to Wikidata. On nl-wiki we do this by hand, to make sure all basic statements are added; if it is done by bots, they may not check for templates that mark articles for deletion.
If the item of a deleted article has statements, the question is whether this information is in itself valuable to keep for use now and/or in the future.
Romaine
Hoi, given that 19.21% of all items have no statements whatsoever, it is a bit premature to come up with such notions. Let us first fix this and then consider what we do not need. Thanks, GerardM
https://tools.wmflabs.org/wikidata-todo/stats.php?reverse
On 31 May 2015 at 16:12, Romaine Wiki romaine.wiki@gmail.com wrote:
Hi Markus,
I think there must always be some way to make an item unique: a way to identify the item outside Wikidata. This can be a sitelink; for subjects at a fixed location on Earth, it can be the coordinates, etc. But coordinates alone, without knowing what the subject is, do not make sense either. In some way, the item must be identifiable somewhere, somehow.
This subject can be compared with what we (on nl-wiki) see as the basic statements that need to be added to identify a subject on Wikidata and to distinguish it from other subjects. (To be able to answer the question: article X is not connected to Wikidata; to which item should it be connected?) For everything: instance of. For geographically situated subjects we request country, located in the administrative territorial entity, location (for towns, etc.), and coordinates. For people: gender, birth/death date/place, occupation, country. For living creatures: taxonomic rank, scientific name, parent taxon. For creative works: author, date.
Romaine
2015-05-29 17:42 GMT+02:00 Markus Krötzsch markus@semantic-mediawiki.org:
Hi Jane, hi Romaine,
I think we agree that valuable information should be kept if at all possible. My chief concern is that orphaned items do not have a clear identity. It's not useful to know that "something" is at a certain location. The first thing we must determine is what this "thing" is that we are talking about. Links to Wikipedia are a good way of doing this. Without them, we need to come up with other identity-providing sources. We certainly have the right infrastructure for this (with all the identifier properties that point to other databases and authority files).
The first goal of anyone who wants to save an orphan should be to connect it with the outside world, so as to give it some grounding to build on.
A weaker way to provide basic grounding is to make internal connections. There are cases where this is strong (one can identify items as "the author of War & Peace" or "the mother of Marie Skłodowska-Curie"), but there are other cases where it is too weak ("the town in Germany" or "the part of Europe" do not identify anything). One would need to give this more thought if one wanted to determine automatically if an item receives its identity from the incoming/outgoing links to other items.
Cheers,
Markus
On 29.05.2015 17:05, Romaine Wiki wrote:
Hi Markus,
Indeed yes, that is also an issue. It can happen with new articles and with older articles.
Some articles get deleted because they duplicate another article, are badly written (too bad to keep), or are not an encyclopaedic subject.
Every day on nl-wiki we check whether new articles are connected to Wikidata. We ignore almost all articles that carry a template marking them as nominated for deletion, and we do not add them to Wikidata. On nl-wiki we do this by hand, to make sure all basic statements are added; if it is done by bots, they may not check for templates that mark articles for deletion.
If the item of a deleted article has statements, the question is whether this information is in itself valuable to keep for use now and/or in the future.
Romaine
I do agree with that notion, to a large degree. It is probably more important to give at least some statements to items with associated Wikipedia articles than to delete empty items that, by their very definition, are not in the way of anything.
As a practical suggestion for helping: http://tools.wmflabs.org/wikidata-todo/random_item_without_instance.php
On Sun, May 31, 2015 at 3:39 PM Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, given that 19.21% of all items have no statements whatsoever, it is a bit premature to come up with such notions. Let us first fix this and then consider what we do not need. Thanks, GerardM
https://tools.wmflabs.org/wikidata-todo/stats.php?reverse
On 31 May 2015 at 16:12, Romaine Wiki romaine.wiki@gmail.com wrote:
Hi Markus,
I think there must always be some way to make an item unique: a way to identify the item outside Wikidata. This can be a sitelink; for subjects at a fixed location on Earth, it can be the coordinates, etc. But coordinates alone, without knowing what the subject is, do not make sense either. In some way, the item must be identifiable somewhere, somehow.
This subject can be compared with what we (on nl-wiki) see as the basic statements that need to be added to identify a subject on Wikidata and to distinguish it from other subjects. (To be able to answer the question: article X is not connected to Wikidata; to which item should it be connected?) For everything: instance of. For geographically situated subjects we request country, located in the administrative territorial entity, location (for towns, etc.), and coordinates. For people: gender, birth/death date/place, occupation, country. For living creatures: taxonomic rank, scientific name, parent taxon. For creative works: author, date.
Romaine
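Romaine's two criteria can be turned into a rough checklist. This Python sketch assumes an item is a plain dict shaped loosely like Wikidata's entity JSON (`claims` keyed by property ID, `sitelinks` keyed by site). The grouping into subject types is a hypothetical simplification, and `identifiable` covers only the two examples he gives (a sitelink, or coordinates together with instance of). The property IDs are the standard Wikidata ones for the statements he names; reading "country" for people as country of citizenship (P27) and "date" for creative works as publication date (P577) is an interpretation.

```python
# Basic statements per subject type, following the nl-wiki list above.
# The type keys are hypothetical; the property IDs are Wikidata's.
BASIC_STATEMENTS = {
    "any": ["P31"],                                # instance of
    "place": ["P17", "P131", "P276", "P625"],      # country, admin. entity, location, coordinates
    # gender, birth/death date, birth/death place, occupation, citizenship
    # (death date/place apply only to people who have died)
    "person": ["P21", "P569", "P570", "P19", "P20", "P106", "P27"],
    "taxon": ["P105", "P225", "P171"],             # taxon rank, taxon name, parent taxon
    "creative_work": ["P50", "P577"],              # author, publication date
}


def missing_basic_statements(item: dict, subject_type: str) -> list:
    """List the basic properties the item has no statements for yet."""
    claims = item.get("claims", {})
    wanted = BASIC_STATEMENTS["any"] + BASIC_STATEMENTS.get(subject_type, [])
    return [pid for pid in wanted if not claims.get(pid)]


def identifiable(item: dict) -> bool:
    """A sitelink identifies an item; coordinates (P625) only help if we
    also know what the thing is (P31). Other identifying routes are ignored."""
    if item.get("sitelinks"):
        return True
    claims = item.get("claims", {})
    return bool(claims.get("P625")) and bool(claims.get("P31"))
```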
2015-05-29 17:42 GMT+02:00 Markus Krötzsch <markus@semantic-mediawiki.org>:
Hi Jane, hi Romaine,
I think we agree that valuable information should be kept if at all possible. My chief concern is that orphaned items do not have a clear identity. It's not useful to know that "something" is at a certain location. The first thing we must determine is what this "thing" is that we are talking about. Links to Wikipedia are a good way of doing this. Without them, we need to come up with other identity-providing sources. We certainly have the right infrastructure for this (with all the identifier properties that point to other databases and authority files).
The first goal of anyone who wants to save an orphan should be to connect it with the outside world, so as to give it some grounding to build on.
A weaker way to provide basic grounding is to make internal connections. There are cases where this is strong (one can identify items as "the author of War & Peace" or "the mother of Marie Skłodowska-Curie"), but there are other cases where it is too weak ("the town in Germany" or "the part of Europe" do not identify anything). One would need to give this more thought to determine automatically whether an item receives its identity from its incoming/outgoing links to other items.
Cheers,
Markus
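The distinction between external grounding and weaker internal connections could be approximated roughly as follows. This Python sketch assumes a dict shaped like Wikidata's entity JSON, where each statement's `mainsnak` carries a `datatype` such as `external-id` (identifier properties) or `wikibase-item` (links to other items). It looks only at outgoing statements; incoming links would need a separate lookup. It also cannot tell a strong internal link ("the mother of Marie Skłodowska-Curie") from a weak one ("the town in Germany"), which is the part that needs more thought.

```python
def grounding(item: dict) -> str:
    """Classify how an item gets its identity (rough sketch).

    'external': a sitelink or a statement with datatype external-id
    'internal': only item-valued statements linking to other items
    'none'    : nothing that ties it to anything else
    Incoming links from other items are not considered here.
    """
    if item.get("sitelinks"):
        return "external"
    has_item_link = False
    for statements in item.get("claims", {}).values():
        for statement in statements:
            datatype = statement.get("mainsnak", {}).get("datatype")
            if datatype == "external-id":
                return "external"
            if datatype == "wikibase-item":
                has_item_link = True
    return "internal" if has_item_link else "none"
```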
On 29.05.2015 17:05, Romaine Wiki wrote:
Hi Markus,
Indeed yes, that is also an issue. It can happen with new articles and with older articles.
On 31.05.2015 at 17:01, Magnus Manske wrote:
I do agree with that notion, to a large degree. It is probably more important to give at least some statements to items with associated Wikipedia articles than to delete empty items that, by their very definition, are not in the way of anything.
If an item has no statements, no sitelinks, and isn't used anywhere, how do you tell what it even *is*? The label only? Is that sufficient and/or useful? What would be lost by deleting it? Maybe, if it has labels in many languages with good descriptions, that gives enough info to identify the item, and it is useful to keep it. But "James Herrod" / "Person", with no extra info... what use is it?
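These questions can be phrased as a filter for deletion candidates: no statements, no sitelinks, not used anywhere, and not enough labels and descriptions to say what the item is. In this Python sketch, `used_elsewhere` (whether other items point to it) is a hypothetical input the caller must compute, and the `min_languages` threshold is an arbitrary assumption; the thread does not fix a number.

```python
def deletion_candidate(item: dict, used_elsewhere: bool, min_languages: int = 5) -> bool:
    """True if the item has nothing but thin labels/descriptions to identify it."""
    if item.get("claims") or item.get("sitelinks") or used_elsewhere:
        return False
    # Languages in which the item has both a label and a description.
    described = set(item.get("labels", {})) & set(item.get("descriptions", {}))
    return len(described) < min_languages


herrod = {
    "labels": {"en": {"language": "en", "value": "James Herrod"}},
    "descriptions": {"en": {"language": "en", "value": "Person"}},
    "claims": {},
    "sitelinks": {},
}
print(deletion_candidate(herrod, used_elsewhere=False))  # True
```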
Hi!
Unless its purpose is obvious (i.e. the label/description/talk page describes it clearly), I'd say it might be more dangerous to keep it around: if some people start using it with different meanings, and then people add independent articles on the wikis that produce different items with the same meaning in multiple languages, pretty soon we'd have quite a mess on our hands. An empty item by itself, with no links, no good labels and no or almost no data (like "John Smith, human" and that's it), is not worth much, IMHO.
Admittedly, I don't have good formal criteria for "obvious", so I imagine we'd have to take it case by case, or maybe think about some criteria.
Hi!
As a practical suggestion for helping: http://tools.wmflabs.org/wikidata-todo/random_item_without_instance.php
I would also suggest http://tools.wmflabs.org/wikidata-todo/important_blank_items.php, which lists the most-linked items from the wikis that have no connection to other items whatsoever. Some of them are tough to classify or link to anything, but some are rather obvious.
Gerard, that is a good point. I believe that percentage has been dropping, no? The question is whether what is left are new items, or items dating from last year's mass upload whose previously connected Wikipedia articles have since been deleted. If they are new, maybe we should wait. If they are in the second group, I say they are unsalvageable and should probably be deleted.
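The triage suggested here could be sketched in Python as below. Both inputs are assumptions: the creation date would come from the item's first revision and `lost_sitelinks` from its history (whether a sitelink was ever removed). The cutoff for "the mass upload of last year" is a placeholder, since the thread gives no exact date. The third outcome, manual review for old items that never lost a sitelink, is an addition not discussed in the thread.

```python
from datetime import date

# Placeholder end of the earlier mass import; not a documented date.
MASS_UPLOAD_END = date(2014, 12, 31)


def triage_empty_item(created: date, lost_sitelinks: bool) -> str:
    """Suggest what to do with an item that has no statements."""
    if created > MASS_UPLOAD_END:
        return "wait"              # new item: editors may still add statements
    if lost_sitelinks:
        return "delete-candidate"  # old import whose Wikipedia article is gone
    return "review"                # old item, cause unclear: check by hand
```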
On Sun, May 31, 2015 at 4:38 PM, Gerard Meijssen gerard.meijssen@gmail.com wrote:
Hoi, Given that 19.21% of all items have no statements whatsoever, it is a bit premature to come up with such notions. Let us first fix this and then consider what we do not need. Thanks, GerardM
https://tools.wmflabs.org/wikidata-todo/stats.php?reverse