@James
As you mention yourself, using ranks is a very limiting approach, and I
think that we shouldn't modify the data to help the queries, but try to
make the queries more intelligent. Once conflicting and time-dependent
statements are added to each item, the return values of simple queries will
be huge lists, or chunks of the data tree. So I think even the infoboxes
have to make some decisions on how they want to deal with the complexity,
and those decisions might not be the same in every language community. I
also think we need to communicate more clearly that something like "Mayor of
Barcelona" might return 1 result now, but relying on that is bad practice,
and in Wikidata's future it will likely return hundreds of values.
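As a rough sketch of what a "more intelligent" query could look like, the Python helper below builds a SPARQL query that walks the full statement nodes (p:/ps:) and reads each statement's rank, instead of relying on the simple wdt: form. The function name and its parameters are made up for illustration; P31 is "instance of", and the caller supplies the class item ID.

```python
def class_members_query(class_qid: str, include_deprecated: bool = False) -> str:
    """Build a SPARQL query for instances of a class at any rank.

    Hypothetical helper: it uses the full statement path (p:/ps:) so that
    normal-rank statements are still found when another value is preferred.
    """
    if not (class_qid.startswith("Q") and class_qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {class_qid!r}")
    # Deprecated statements are known to be wrong, so drop them by default.
    rank_filter = (
        "" if include_deprecated
        else "  FILTER(?rank != wikibase:DeprecatedRank)\n"
    )
    return (
        "SELECT ?item ?rank WHERE {\n"
        "  ?item p:P31 ?statement .\n"
        f"  ?statement ps:P31 wd:{class_qid} ;\n"
        "             wikibase:rank ?rank .\n"
        f"{rank_filter}"
        "}"
    )


if __name__ == "__main__":
    print(class_members_query("Q515"))
```

The query returns the rank alongside each item, so the consumer (an infobox module, say) can make its own decision about which values to show.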
-Tobias
2015-11-27 15:58 GMT+01:00 James Heald <j.heald(a)ucl.ac.uk>:
Some items have quite a lot of "instance of" statements, connecting them
to quite a few different classes.
For example, Frankfurt is currently an instance of seven different classes,
https://www.wikidata.org/wiki/Q1794
and Glasgow is currently an instance of five different classes:
https://www.wikidata.org/wiki/Q4093
This can produce quite a pile-up of descriptions in the
description/subtitle section of an infobox -- for example, as on the
Spanish page for Frankfurt at
https://es.wikipedia.org/wiki/Fr%C3%A1ncfort_del_Meno
in the section between the infobox title and the picture.
Question:
Is it an appropriate use of ranking, to choose a few of the values to
display, and set those values to be "preferred rank" ?
It would be useful to have wider input on whether this is a good thing,
and on whether it should be done widely.
Discussions are open at
https://www.wikidata.org/wiki/Wikidata:Project_chat#Preferred_and_normal_ra…
and
https://www.wikidata.org/wiki/Wikidata:Bistro#Rang_pr.C3.A9f.C3.A9r.C3.A9
-- but these have so far been inconclusive, and have got slightly taken
over by questions such as
* how well terms really do map from one language to another --
near-equivalences that may be near enough for sitelinks may be jarring or
insufficient when presented boldly up-front in an infobox.
(For example, the French translation "ville" is rather unspecific, and
perhaps inadequate in what it conveys, compared to "city" in English or
"ciudad" in Spanish; "town" in English (which might have over 100,000
inhabitants) doesn't necessarily match "bourg" in French or "Kleinstadt" in
German).
* whether different-language wikis may seek different degrees of
generalisation or specificity in such sub-title areas, depending on how
"close" the subject is to that wiki.
(For readers in some languages, some fine distinctions may be highly
relevant and familiar, whereas for other language groups that level of
detail may be undesirably obscure).
There is also the question of the effect of promoting some values to
"preferred rank" for the visibility of other values in SPARQL -- in
particular when queries are written assuming they can get away with
using just the simple "truthy" wdt:... form of properties.
However, making eg the value "city" preferred for Glasgow means that it
will no longer be returned in searches for its other values, if these have
been written using "wdt:..." -- so it will now be missed in a simple-level
query for "council areas", the current top-level administrative
subdivisions of Scotland, or for historically-based "registration counties"
-- and this problem will become more pronounced if the practice becomes
more widespread of making some values "preferred" (and so other values
invisible, at least for queries using wdt:...).
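To make the effect concrete, here is a small Python model of the "truthy" selection rule: a wdt:-style lookup sees only the statements of the best non-deprecated rank. The class labels are the ones mentioned above for Glasgow, used purely as illustrative strings; the Statement type and function name are invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    value: str
    rank: str = "normal"  # "preferred", "normal" or "deprecated"


def truthy_values(statements):
    """Values a wdt:-style lookup would see: best non-deprecated rank only."""
    preferred = [s.value for s in statements if s.rank == "preferred"]
    if preferred:
        # Any preferred statement hides every normal-rank one.
        return preferred
    return [s.value for s in statements if s.rank == "normal"]


# Illustrative "instance of" values, before and after promoting "city".
glasgow_before = [
    Statement("city"),
    Statement("council area"),
    Statement("registration county"),
]
glasgow_after = [
    Statement("city", rank="preferred"),
    Statement("council area"),
    Statement("registration county"),
]

if __name__ == "__main__":
    print(truthy_values(glasgow_before))
    print(truthy_values(glasgow_after))
```

In this model, a simple lookup for "council area" finds Glasgow in the first case but not in the second, even though the statement itself is unchanged.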
From a SPARQL point of view, what would actually be very helpful would be to
add a (new) fourth rank -- "misleading without qualifier", below "normal"
but above "deprecated" -- for statements that *are* true (with the
qualifiers), but could be misleading without them
* for example, for a town that was the county town of a shire once, but
hasn't been for two centuries
* or for an administrative area that is partly located in one higher-level
division, and partly in another -- this is very valuable information to be
able to note, but it's important to be able to exclude it from a
recursive search for all the places in one (but not the other) of
those higher-level divisions.
The statements shouldn't be marked "deprecated", because they are true
(unlike a widely-given but incorrect date of birth, for example). At the
moment one can sort of work round the issue, if one can find another
statement to make "preferred", so that the qualified statement becomes
invisible to a simple search without qualifiers. However, if "preferred"
status is going to be used just to select things to show in infoboxes, it
becomes very desirable that "wdt:..." searches should retrieve things at
normal rank as well -- creating a need for a new rank for statements which
are true, but misleading if read without qualifiers.
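A minimal Python sketch of how the proposed scheme might behave, assuming a simple search returns both preferred and normal statements and skips the new rank. The rank name "misleading_without_qualifier", the Statement type, and the example values are all hypothetical; no such rank exists in Wikidata today.

```python
from dataclasses import dataclass

# Ranks a simple (wdt:-style) search would return under the proposal.
SIMPLE_RANKS = {"preferred", "normal"}


@dataclass(frozen=True)
class Statement:
    value: str
    rank: str = "normal"
    qualifiers: tuple = ()  # e.g. ("end time: two centuries ago",)


def simple_values(statements):
    """Proposed simple view: preferred and normal ranks, nothing below."""
    return [s.value for s in statements if s.rank in SIMPLE_RANKS]


def full_statements(statements):
    """Full view: every statement that is true, qualifiers attached."""
    return [s for s in statements if s.rank != "deprecated"]


# Illustrative item: a town that used to be a county town.
town = [
    Statement("town"),
    Statement(
        "county town",
        rank="misleading_without_qualifier",  # hypothetical fourth rank
        qualifiers=("end time: two centuries ago",),
    ),
]

if __name__ == "__main__":
    print(simple_values(town))
    print([s.value for s in full_statements(town)])
```

Under this model, marking a value "preferred" for infobox purposes no longer hides normal-rank values from simple searches, while the qualified statement stays available to queries that read qualifiers.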
What *is* needed though, is a view on whether trying to tailor what is
shown in infoboxes is an appropriate reason to alter statement rankings.
It would be good to get a view on this.
The Spanish guys who started doing this have temporarily put further
rank-changes on hold, for the issue to be discussed; but so far what they
have done has only just scratched the surface of what could be done --
there are still a lot more cases of multiple values they would like to tidy.
So: is this the kind of thing that "preferred rank" is envisaged for ?
Or, should some statements not be marked as less preferred than others, if
this is the only reason ?
-- James.
_______________________________________________
Wikidata mailing list
Wikidata(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata