I think this is a problem with the current workflow for creating articles, which starts with Wikipedia and finishes with Wikidata, though it should probably be the other way around. This problem will eventually solve itself, given enough time. That said, I believe we still have lots of poorly documented images on Commons, dating back to its early days, that are currently being cleaned up. Back then people uploaded images there because "they had to" but didn't spend any time on the metadata there and stuffed it all into the Wikipedia articles they linked the image to. Since then, lots of that metadata has found its way back to the images on Commons, where it should have been added in the first place. It may seem like double work, but it is necessary due to the lack of proper tools to automate it. Right now a lot of double work is needed in Wikidata as people create articles, and it can only be done by copying most of the information in the lead paragraph to various statements on Wikidata. This can be both annoying and confusing.

I think the idea of deletion with 0-2 statements is OK, but 10 statements? With 10 statements there must be something salvageable, no?

On Fri, May 29, 2015 at 3:20 PM, Markus Krötzsch <markus@semantic-mediawiki.org> wrote:
On 29.05.2015 13:42, Romaine Wiki wrote:
The problem that users face is that they find the merging of items
too difficult or don't know that it is possible. They understand
(with much annoyance) that they can only add a sitelink to one item.
Therefore they delete a sitelink on one item, and add it to another item.

Personally I think that merging the items afterwards would be best here.
Would it be possible to have a bot 1. determine what the original
sitelink was that has been removed from the item, 2. see if this
sitelink is added on another item, 3. check if the statements of both
items match (otherwise: a list for humans/tool to check if it is the
same), 4. if the same: automatically merge both items.
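
The four steps above could be sketched roughly as follows. This is only an illustration on in-memory data, not an existing bot: the `Item` class, the function names, and the rule that statements "match" when no shared property has entirely different values are all assumptions made for this example. A real bot would also have to read the edit history to find the removed sitelink (step 1) and call the merge API itself.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical, simplified view of a Wikidata item."""
    qid: str
    sitelinks: dict = field(default_factory=dict)   # site -> page title
    statements: dict = field(default_factory=dict)  # property -> set of values


def statements_match(a, b):
    """Assumed rule: every property present on both items shares a value."""
    for prop in a.statements.keys() & b.statements.keys():
        if not (a.statements[prop] & b.statements[prop]):
            return False
    return True


def classify_moved_sitelink(source, site, title, items):
    """Decide what to do after (site, title) was removed from `source`.

    Returns ("merge", target), ("review", target) or ("none", None).
    """
    # Step 1 is the input: the sitelink that was removed from `source`.
    # Step 2: look for another item that now carries the same sitelink.
    for other in items:
        if other.qid != source.qid and other.sitelinks.get(site) == title:
            # Steps 3 and 4: merge automatically only if the statements
            # agree; otherwise put the pair on a list for human review.
            if statements_match(source, other):
                return "merge", other
            return "review", other
    return "none", None
```

Calling `classify_moved_sitelink(old_item, "nlwiki", "Some title", all_items)` would then yield either a merge target, a pair for the human checklist, or nothing if the sitelink was not re-added anywhere.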

I think it would be good to have more things being automated as much as
possible.

That's an important situation too, but I think in the example I gave something else happened: the sitelink was not moved, but the Wikipedia article that it was pointing to got deleted. So it's not just the link that vanished: all information about the item that might have been found on the deleted Wikipedia page is also gone. It's therefore quite hard to find out what the item might have been about.

Regards,

Markus


2015-05-29 13:23 GMT+02:00 Markus Krötzsch <markus@semantic-mediawiki.org>:

    Hi all,

    I just noticed that we have a number of "orphaned items" which were
    created and imported from some Wikipedia article that then got
    deleted. The result is an item with almost no data, no sitelinks,
    and all references claiming "imported from X Wikipedia".

    Example:
    https://www.wikidata.org/wiki/Q9386774

    Here is what happened:
    https://www.wikidata.org/w/index.php?title=Q9386774&action=history

    It would be good to have a process for dealing with such cases. I am
    not saying that we must delete such items immediately, but it seems
    obvious that they need some special attention to become
    self-sustaining even without Wikipedia articles associated.

    Things that would be important for keeping such items:
    * Links to other external datasets that confirm the existence of the
    thing.
    * Links to authoritative web sites that confirm the existence of the
    thing.
    * Proper references for all data (we always want that, but here it's
    even more critical: "imported from Wikipedia" is never great, but at
    least it leaves some hope of finding proper references if the
    Wikipedia page still exists).

    In cases like the above, deletion seems to be the most reasonable
    solution (the little data that is there can easily be added again if
    needed in the future). It seems that one could automatically collect
    such candidates for deletion (pages that are not used as property
    values, have no site links, have no identifier properties, have not
    been edited for more than a month, and have fewer than, say, ten
    properties+labels+descriptions).
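
As a rough sketch, the candidate criteria listed above could be written as a single filter. The `ItemSummary` fields and the function name are hypothetical, "a month" is taken to mean 30 days, and the threshold of ten is the "say, ten" from the text; how the counts would be obtained from Wikidata is left open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ItemSummary:
    """Hypothetical per-item summary; field names are assumptions."""
    qid: str
    incoming_uses: int      # times the item is used as a property value
    sitelink_count: int
    identifier_count: int   # statements using identifier properties
    last_edited: datetime
    property_count: int
    label_count: int
    description_count: int


def is_deletion_candidate(item, now, max_size=10, min_age=timedelta(days=30)):
    """Apply the criteria from the text; all of them must hold."""
    size = item.property_count + item.label_count + item.description_count
    return (
        item.incoming_uses == 0             # not used as a property value
        and item.sitelink_count == 0        # no site links
        and item.identifier_count == 0      # no identifier properties
        and now - item.last_edited > min_age  # untouched for over a month
        and size < max_size                 # fewer than ten entries in total
    )
```

Keeping `max_size` as a parameter makes it easy to compare a stricter cut-off, such as items with only a couple of statements, against the suggested ten.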

    Regards,

    Markus

    _______________________________________________
    Wikidata mailing list
    Wikidata@lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>
    https://lists.wikimedia.org/mailman/listinfo/wikidata



