On 2015-05-29 17:42, Markus Krötzsch wrote:
Hi Jane, hi Romaine,
I think we agree that valuable information should be kept if at all possible. My chief concern is that orphaned items do not have a clear identity. It's not useful to know that "something" is at a certain location. The first thing we must determine is what this "thing" is that we are talking about. Links to Wikipedia are a good way of doing this. Without them, we need to come up with other identity-providing sources. We certainly have the right infrastructure for this (with all the identifier properties that point to other databases and authority files).
The first goal of anyone who wants to save an orphan should be to connect it with the outside world so as to give it some grounding to build on.
A weaker way to provide basic grounding is to make internal connections. There are cases where this is strong (one can identify items as "the author of War & Peace" or "the mother of Marie Skłodowska-Curie"), but there are other cases where it is too weak ("the town in Germany" or "the part of Europe" do not identify anything). One would need to give this more thought if one wanted to determine automatically if an item receives its identity from the incoming/outgoing links to other items.
Cheers,
Markus
Actually, we already have tools designed by Pasleim to track such items:
https://www.wikidata.org/wiki/User:Pasleim/notability
https://www.wikidata.org/wiki/User:Pasleim/Items_for_deletion/Almost_empty
I usually check that there are no backlinks. If there are none, I check the history: if the item is empty because of a non-automated merge, I merge it; if it is empty because its only interwiki link was deleted on the project, I delete it as non-notable.
The problem cases are often items that never had any links. Many of them are spam, but some of them can be used for structural needs and can be kept. It is not always easy to figure this out in practice, especially if they are written in scripts other than Latin or Cyrillic.
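The checks above can be sketched roughly as a small decision function. This is only an illustration in Python, not an existing tool: the names `EmptyItem`, `Action` and `triage` and all field names are hypothetical, the flags would have to be filled in by hand from the backlinks and the page history, and sending an item with backlinks to manual review is an assumption, since that case is not spelled out above.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    MERGE = "merge"
    DELETE = "delete as non-notable"
    REVIEW = "review by hand"


@dataclass
class EmptyItem:
    # Hypothetical flags, determined by a human looking at the item.
    has_backlinks: bool
    emptied_by_manual_merge: bool
    only_link_deleted_on_project: bool
    never_had_links: bool = False


def triage(item: EmptyItem) -> Action:
    """Suggest what to do with an almost empty item."""
    if item.has_backlinks:
        # Assumption: an item that other pages link to is not touched
        # automatically and goes to a human instead.
        return Action.REVIEW
    if item.emptied_by_manual_merge:
        # Emptied by a non-automated merge: finish the merge.
        return Action.MERGE
    if item.only_link_deleted_on_project:
        # The only interwiki link was deleted on the project.
        return Action.DELETE
    # Items that never had any links may be spam or may serve a
    # structural need, so no automatic decision is made here.
    return Action.REVIEW


if __name__ == "__main__":
    example = EmptyItem(
        has_backlinks=False,
        emptied_by_manual_merge=False,
        only_link_deleted_on_project=True,
    )
    print(triage(example).value)
```

The function deliberately falls back to manual review whenever none of the clear cases applies, which matches the point that items without any links cannot be judged reliably without looking at them.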
Cheers,
Yaroslav