Re: [Wikidata] [Spam] Re: [Spam] Re: No links, wrong data: Scotland's orphans need help

7 Jun 2015


Ah, that might be a little bit overkill. It would have to be "re-instanced"
on every subsequent edit. Not to mention the "contamination" of P31 with
maintenance items.
On Sun, Jun 7, 2015 at 6:45 PM Andrew Gray andrew.gray@dunelm.org.uk
wrote:
...
A related suggestion... I've wondered before if what we could use for
such imports is a "meta" value for the P31 property - something like
"instance of: imported unchecked item". When a person has corrected or
checked the items, added sitelinks, etc it's easy to remove this
value. This would let us easily identify ones that might still need
assistance, eg to check for duplicates or to mark them as a part of a
larger item, without continually having to go through the list.
Commons does something similar with hidden tracking categories for
bulk uploads, and it's quite useful there.
Andrew.
On 3 June 2015 at 14:48, Markus Krötzsch markus@semantic-mediawiki.org
wrote:
...
Thanks, Andrew, for the clarification. This makes perfect sense.
I don't see a problem with one bridge having two IDs in some external
database. We already have this for other ID-like properties for other
reasons. What is important though is that it still is a single bridge,
and should therefore be one item.
Your clarification is reassuring since it suggests that the problem is
not overly common after all. Maybe one can just merge these cases manually.
Once the (multiple) ids are found in the merged items, avoiding future
duplicates will be done as usual (which is still difficult with the Scottish
Heritage ids since we have many legit Wikidata items that have the same id -- but
this at least is an independent problem).
Regards,
Markus
On 03.06.2015 13:48, Andrew Gray wrote:
...
This particular case is something of a known problem - we've
encountered it with some of the other heritage-building identifier
lists as well.
Bridges often span a river which is the border for two jurisdictions
(in this case, council areas). Each local area counts it as a historic
building, and because the national lists are aggregated from local
lists, it gets two entries in the main list, one as Fife and one as
Edinburgh. A similar case in Wales is the Menai Suspension Bridge,
which is 4049 from the Gwynedd register and 18572 from the Anglesey
one (Wikidata, at Q581526, only lists one identifier).
The lack of deduplication is probably intentional rather than a bug,
and both entries are "correct". Perhaps one way to handle this for
Wikidata would be to, hmm, say something like "if the item is some
kind of a bridge, then allow two IDs" in the constraints?
I can't immediately think of any bridges which cross national borders
*and* are a heritage building in both countries, but we'd see the same
thing there, with it having identifiers from both sides.
Andrew.
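Andrew's suggestion of allowing two IDs when the item is a bridge can be sketched as a constraint check. This is a minimal Python illustration, not how Wikidata's constraint reports actually work: the `Item` record, the `BRIDGE_CLASSES` labels and the limit of two IDs are hypothetical, and a real check would also have to follow subclass-of (P279) chains. The Menai Suspension Bridge values (Q581526, 4049 and 18572) are the ones quoted above.

```python
from dataclasses import dataclass, field

# Hypothetical class labels standing in for "bridge" and its subclasses.
BRIDGE_CLASSES = {"bridge", "suspension bridge"}


@dataclass
class Item:
    qid: str
    instance_of: set[str] = field(default_factory=set)    # P31 values (labels here)
    heritage_ids: list[str] = field(default_factory=list)  # external ID values


def max_ids_allowed(item: Item) -> int:
    """Bridges on a jurisdiction border may carry one ID per side."""
    return 2 if item.instance_of & BRIDGE_CLASSES else 1


def violates_single_value(item: Item) -> bool:
    """True if the item has more distinct IDs than the rule allows."""
    return len(set(item.heritage_ids)) > max_ids_allowed(item)


menai = Item("Q581526", {"suspension bridge"}, ["4049", "18572"])
house = Item("Q_HOUSE", {"house"}, ["1001", "1002"])  # hypothetical item
print(violates_single_value(menai), violates_single_value(house))  # prints: False True
```

Under this rule the Menai bridge passes with both register numbers, while a non-bridge item holding two IDs is still flagged for manual review.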
On 2 June 2015 at 12:12, Markus Krötzsch <markus@semantic-mediawiki.org>
wrote:
...
Another interesting type of Scottish historic orphans are those that
are duplicates of items that do have site links. Even very prominent ones
are duplicated, such as
https://www.wikidata.org/wiki/Q17569486 (dup)
https://www.wikidata.org/wiki/Q933000 (real item)
Interestingly, they use different Scotland IDs, and it does indeed seem
that Historic Scotland also contains duplicates:
http://data.historic-scotland.gov.uk/pls/htmldb/f?p=2200:15:0::::BUILDING,HL...
http://data.historic-scotland.gov.uk/pls/htmldb/f?p=2200:15:0::::BUILDING,HL...
Overall, this seems to be an example of an ID that really should not be
considered "identity providing", since there seems to be a many-to-many
relationship between Wikidata and Historic Scotland. Orphans should
receive additional IDs from a better source if at all possible. Given the
large number of seemingly legitimate non-functional uses of the Scotland
IDs, they cannot be used in practice to detect duplicates.
Regards,
Markus
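The many-to-many relationship Markus describes can be checked mechanically from (item, ID) pairs: an ID shared by several items plus an item carrying several IDs means the ID cannot serve as a duplicate detector. The sketch below is a hypothetical Python helper; the two Q-ids are the church/churchyard pair from Magnus's Update 2, but the ID values `ID_A`, `ID_B`, `ID_C` and the item `Q_BRIDGE` are placeholders, since the actual Scotland IDs are not quoted in the thread.

```python
from collections import defaultdict


def classify_mapping(pairs):
    """Classify (item, external ID) pairs.

    Returns the relationship kind plus two dicts: IDs shared by several
    items, and items that carry several IDs. If both are non-empty, the
    relationship is many-to-many and the ID is not identity providing.
    """
    items_by_id = defaultdict(set)
    ids_by_item = defaultdict(set)
    for qid, ext_id in pairs:
        items_by_id[ext_id].add(qid)
        ids_by_item[qid].add(ext_id)

    shared_ids = {ext_id: qids for ext_id, qids in items_by_id.items() if len(qids) > 1}
    multi_id_items = {qid: ids for qid, ids in ids_by_item.items() if len(ids) > 1}

    if shared_ids and multi_id_items:
        kind = "many-to-many"
    elif shared_ids:
        kind = "several items per ID"
    elif multi_id_items:
        kind = "several IDs per item"
    else:
        kind = "one-to-one"
    return kind, shared_ids, multi_id_items


# Placeholder ID values; only the two Q-ids come from the thread.
pairs = [
    ("Q17847522", "ID_A"),  # church
    ("Q17847537", "ID_A"),  # churchyard, same ID
    ("Q_BRIDGE", "ID_B"),   # hypothetical border bridge ...
    ("Q_BRIDGE", "ID_C"),   # ... listed by both councils
]
kind, shared, multi = classify_mapping(pairs)
print(kind)  # prints: many-to-many
```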
On 02.06.2015 13:01, Markus Krötzsch wrote:
...
On 02.06.2015 11:30, Magnus Manske wrote:
...
Update 2:
For example,
https://www.wikidata.org/wiki/Q17847522
and
https://www.wikidata.org/wiki/Q17847537
have the same Scotland ID, but refer to different entities (church
and churchyard, respectively). They were listed as two entities in the
original dataset, sharing the same ID.
Yes, I noticed such cases too. From the information in Wikidata, it is
not clear to me why this is sometimes done and sometimes not done.
For example, these adjacent houses have the same Scotland ID but
different items that each have their own coordinates (where did the
coordinates come from?):
https://www.wikidata.org/wiki/Q17576211
https://www.wikidata.org/wiki/Q17576182
https://www.wikidata.org/wiki/Q17576185
In many other cases, adjacent houses with the same ID are combined
into one item:
https://www.wikidata.org/wiki/Q17806587
(note, however, that the house addresses given in the ID and in the
item label do not match, though they overlap on most of the houses.)
Finally, there are also cases where there are different IDs and we
have several items, but they have the same labels that merge the contents
of the two IDs:
https://www.wikidata.org/wiki/Q17810121
https://www.wikidata.org/wiki/Q17810137
It seems that the data was not taken from the Historic Sites database
but from some different source that has its own coordinate data and a
different (but seemingly arbitrary) approach to grouping sites.
However, the coordinates give Historic Scotland as their reference -- I
wonder if Historic Scotland might be changing frequently or exist in
several versions.
Regards,
Markus
...
On Tue, Jun 2, 2015 at 10:26 AM Magnus Manske
<magnusmanske@googlemail.com> wrote:
     Update: There appear to be quite a few items with duplicate Scotland
     IDs (not all of them may be erroneous!):
     http://wdq.wmflabs.org/stats?action=doublestring&prop=709
 On Tue, Jun 2, 2015 at 10:23 AM Magnus Manske
 <magnusmanske@googlemail.com> wrote:

     I created (some/most of) these items as part of the Wiki Loves
     Monuments UK 2014 drive, to run the campaign from Wikidata
     rather than from a bespoke database. This allows the community
     (TM) to maintain the data, rather than one poor sod (e.g.,
     myself) having to frantically update all of it every year ;-)

     "Consumer" tool is here:
     https://tools.wmflabs.org/wlmuk/index_wd.html

     These are based on "official" data from National Heritage,
     provided to me via Wikimedia UK. Grade A (or Grade I/II* in
     England) structures should be noteworthy by default.

     It appears (as per your examples) that some of these were
     created as duplicates/with wrong IDs. As I said, this is based
     on "official" data, so it's the best I could do at the time.
     With mass creation, there are bound to be a few strays. If you
     can find some large-scale, systemic issue I'll try to fix it,
     but the one-offs will always fall back to manual fixing. At
     least, with Wikidata, we can fix them together.

     On Tue, Jun 2, 2015 at 10:01 AM Daniel Kinzler
     <daniel.kinzler@wikimedia.de> wrote:

         On 01.06.2015 at 22:26, Markus Krötzsch wrote:
          > Finally, the technical question is: Why is this even
          > possible? I thought that, in each language, label+description
          > are a key (globally unique), yet here we have many pairs of
          > items with exactly the same label and description. Or is the
          > problem that no description was entered and so the system
          > does not apply the key?
         Indeed, the uniqueness constraint does not apply if there is
         no description.
         --
         Daniel Kinzler
         Senior Software Developer

         Wikimedia Deutschland
         Gesellschaft zur Förderung Freien Wissens e.V.
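Daniel's answer can be illustrated with a small Python model of the label+description key. This is a sketch of the behaviour described in the thread, not Wikibase's actual implementation; the tuple layout, the placeholder item IDs (`Q_A` to `Q_E`) and the sample labels are hypothetical. Items with an empty description are skipped, which is why items sharing a label but lacking a description slip through.

```python
from collections import defaultdict


def label_description_conflicts(entries):
    """Group items that share the same (language, label, description) key.

    entries: iterable of (qid, lang, label, description) tuples.
    As described in the thread, the uniqueness key is only enforced when a
    description is present, so entries with an empty description are skipped.
    """
    groups = defaultdict(list)
    for qid, lang, label, description in entries:
        if not description:
            continue  # no description: the key does not apply
        groups[(lang, label, description)].append(qid)
    # Only keys used by more than one item are conflicts.
    return {key: qids for key, qids in groups.items() if len(qids) > 1}


entries = [
    ("Q_A", "en", "Parish Church", ""),               # no description
    ("Q_B", "en", "Parish Church", ""),               # no description
    ("Q_C", "en", "Old Bridge", "listed building"),
    ("Q_D", "en", "Old Bridge", "listed building"),
    ("Q_E", "de", "Old Bridge", "listed building"),   # other language
]
print(label_description_conflicts(entries))
# prints: {('en', 'Old Bridge', 'listed building'): ['Q_C', 'Q_D']}
```

The two "Parish Church" items are never reported, matching the situation Markus found: identical labels pass unchecked when no description has been entered.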

         _______________________________________________
         Wikidata mailing list
         Wikidata@lists.wikimedia.org
         https://lists.wikimedia.org/mailman/listinfo/wikidata


--

Andrew Gray
andrew.gray@dunelm.org.uk

