Some items have quite a lot of "instance of" statements, connecting them
to quite a few different classes.
For example, Frankfurt is currently an instance of seven different classes,
https://www.wikidata.org/wiki/Q1794
and Glasgow is currently an instance of five different classes:
https://www.wikidata.org/wiki/Q4093
This can produce quite a pile-up of descriptions in the
description/subtitle section of an infobox -- for example, as on the
Spanish page for Frankfurt at
https://es.wikipedia.org/wiki/Fr%C3%A1ncfort_del_Meno
in the section between the infobox title and the picture.
Question:
Is it an appropriate use of ranking, to choose a few of the values to
display, and set those values to be "preferred rank" ?
It would be useful to have wider input as to whether it is a good thing
for this to be done widely.
Discussions are open at
https://www.wikidata.org/wiki/Wikidata:Project_chat#Preferred_and_normal_ra…
and
https://www.wikidata.org/wiki/Wikidata:Bistro#Rang_pr.C3.A9f.C3.A9r.C3.A9
-- but these have so far been inconclusive, and have got slightly taken
over by questions such as
* how well terms really do map from one language to another --
near-equivalences that may be near enough for sitelinks may be jarring
or insufficient when presented boldly up-front in an infobox.
(For example, the French translation "ville" is rather unspecific, and
perhaps inadequate in what it conveys, compared to "city" in English or
"ciudad" in Spanish; "town" in English (which might have over 100,000
inhabitants) doesn't necessarily match "bourg" in French or "Kleinstadt"
in German).
* whether different-language wikis may seek different degrees of
generalisation or specificity in such sub-title areas, depending on how
"close" the subject is to that wiki.
(For readers in some languages, some fine distinctions may be highly
relevant and familiar, whereas for other language groups that level of
detail may be undesirably obscure).
There is also the question of what effect promoting some values to
"preferred rank" has on the visibility of other values in SPARQL -- in
particular when queries are written assuming they can get away with
using just the simple "truthy" wdt:... form of properties.
Making e.g. the value "city" preferred for Glasgow means that Glasgow
will no longer be returned in searches for its other values, if these
have been written using "wdt:..." -- so it will now be missed in a
simple-level query for "council areas", the current top-level
administrative subdivisions of Scotland, or for historically-based
"registration counties" -- and this problem will become more pronounced
if the practice of making some values "preferred" (and so other values
invisible, at least for queries using wdt:...) becomes more widespread.
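To make the effect concrete, here is a minimal Python sketch of the "truthy" selection rule as I understand it: for each item/property pair, wdt: exposes the preferred-rank statements if there are any, otherwise the normal-rank ones, and never the deprecated ones. The statement tuples and the helper names (truthy, with_preferred) are made up for illustration and are not part of any Wikidata tool.

```python
from collections import defaultdict

# Hypothetical, simplified statements: (item, property, value, rank).
STATEMENTS = [
    ("Glasgow", "P31", "city", "normal"),
    ("Glasgow", "P31", "council area", "normal"),
    ("Glasgow", "P31", "registration county", "normal"),
]


def truthy(statements):
    """Return the (item, property, value) triples a wdt: query would see.

    For each item/property pair only best-rank statements survive:
    preferred ones if any exist, otherwise the normal ones.
    Deprecated statements are never truthy.
    """
    groups = defaultdict(list)
    for item, prop, value, rank in statements:
        if rank != "deprecated":
            groups[(item, prop)].append((value, rank))
    result = []
    for (item, prop), values in groups.items():
        has_preferred = any(rank == "preferred" for _, rank in values)
        wanted = "preferred" if has_preferred else "normal"
        result.extend((item, prop, v) for v, rank in values if rank == wanted)
    return result


def with_preferred(statements, item, prop, value):
    """Return a copy of the statements with one value promoted to preferred."""
    return [
        (i, p, v, "preferred" if (i, p, v) == (item, prop, value) else r)
        for i, p, v, r in statements
    ]


before = truthy(STATEMENTS)
after = truthy(with_preferred(STATEMENTS, "Glasgow", "P31", "city"))
print(len(before), "truthy values before,", len(after), "after")
```

With all three values at normal rank, a wdt: query for "council area" finds Glasgow; once "city" alone is preferred, only "city" remains truthy and the other two can only be reached through the full statement form.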
From a SPARQL point of view, what would actually be very helpful would
be to add a (new) fourth rank -- "misleading without qualifier", below
"normal" but above "deprecated" -- for statements that *are* true (with
the qualifiers), but could be misleading without them:
* for example, for a town that was the county town of a shire once, but
hasn't been for two centuries
* or for an administrative area that is partly located in one
higher-level division, and partly in another -- this is very valuable
information to be able to note, but it's important to be able to keep
the whole area from being swept into a recursive search for the places
in just one of those higher-level divisions.
The statements shouldn't be marked "deprecated", because they are true
(unlike a widely-given but incorrect date of birth, for example). At
the moment one can sort of work round the issue, if one can find another
statement to make "preferred", so that the qualified statement becomes
invisible to a simple search without qualifiers. However, if
"preferred" status is going to be used just to select things to show in
infoboxes, it becomes very desirable that "wdt:..." searches should
retrieve things at normal rank as well -- creating a need for a new rank
for statements which are true, but misleading if read without qualifiers.
What *is* needed though, is a view on whether trying to tailor what is
shown in infoboxes is an appropriate reason to alter statement rankings.
It would be good to get a view on this.
The Spanish guys who started doing this have temporarily put further
rank-changes on hold, for the issue to be discussed; but so far what
they have done has only just scratched the surface of what could be done
-- there are still a lot more cases of multiple values they would like
to tidy.
So: is this the kind of thing that "preferred rank" is envisaged for ?
Or, should some statements not be marked as less preferred than others,
if this is the only reason ?
-- James.
I finally found the time to play extensively with Mix’n’match and it’s by far one of the most promising models I’ve come across for Wikidata growth. A short conversation with Magnus on Twitter got me thinking on how to best preserve the output of costly human curation.[1]
I spent most of my time manually auditing automatically matched entries from the Dizionario Biografico degli Italiani [2]. These entries are long, unstructured biographical entries and it takes quite a lot of effort to understand if the two individuals referenced by Wikidata and DBI actually are the same person. This is a great example of a task that’s still pretty hard for a machine to perform, no matter how sophisticated the algorithm.
My favorite example? Mix’n’match suggested a match between Giulio Baldigara (Q1010811 <https://www.wikidata.org/wiki/Q1010811>) and Giulio Baldigara (DBI <http://www.treccani.it/enciclopedia/giulio-baldigara_(Dizionario_Biografico…>) which looked totally legitimate: these two individuals are both Italian architects from the 16th century with the same name, they were both born around the same time in the same city, and they were both active in Hungary at the same time: a strong indication that they are the same person, right? It turns out they are brothers, and the full name of the person referenced in Wikidata is Giulio Cesare Baldigara (the least known in a family of architects). I unmatched the suggestion and flagged the DBI entry as non-existent in Wikidata.
My question at the moment is: the output of a labor-intensive review of a potential match is currently stored as a volatile flag in a tool hosted on labs, but is invisible in Wikidata. Should something happen to Mix’n’match (god forbid) the result of my work would get lost. Which got me thinking:
- shouldn’t a manually unmatched item be created directly on Wikidata? (After all, DBI is all about notable individuals who would easily pass Wikidata’s notability threshold for biographies.)
- shouldn’t the relation between Giulio (Cesare) Baldigara (Q1010811 <https://www.wikidata.org/wiki/Q1010811>) and the newly created item for Giulio Baldigara be explicitly represented via a "not the same as" property, to prevent future humans or machines from accidentally re-merging the two items based on some kind of heuristic?
Thoughts welcome,
Dario
[1] https://twitter.com/ReaderMeter/status/667214565621432320
[2] https://tools.wmflabs.org/mix-n-match/?mode=catalog&catalog=55&offset=0&sho…
When checking the quality of the latest bot edits on disease terms I am
seeing some strange results from the WDQS.
The numbers of statements with rank Normal and rank Deprecated don't add
up to the number added from the original source.
When I ran the following query:
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX wikibase: <http://wikiba.se/ontology#>
SELECT DISTINCT ?diseases ?doid WHERE {
  ?diseases p:P699 ?doid .
  ?doid wikibase:rank wikibase:NormalRank .
  ?doid wikibase:rank wikibase:DeprecatedRank .
}
I expected no results, since the query only returns statements that have
both rank normal and rank deprecated. However, I got 2041 tuples [1].
Andra
[1] http://tinyurl.com/pgre6gh
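One way to cross-check such counts offline is to export (statement, rank) pairs, for instance from a query on `?statement wikibase:rank ?rank`, and tally them locally. The Python sketch below does this on made-up rows; the rank_report helper and the sample statement IDs are hypothetical, and it only shows the bookkeeping, not the cause of the 2041 tuples.

```python
from collections import Counter, defaultdict


def rank_report(rows):
    """Summarise (statement_id, rank) pairs.

    Returns a Counter of statements per rank and a sorted list of
    statement IDs that were reported with more than one rank.
    """
    ranks_by_statement = defaultdict(set)
    for statement, rank in rows:
        ranks_by_statement[statement].add(rank)
    per_rank = Counter(
        rank for ranks in ranks_by_statement.values() for rank in ranks
    )
    conflicting = sorted(
        s for s, ranks in ranks_by_statement.items() if len(ranks) > 1
    )
    return per_rank, conflicting


# Hypothetical export: statement s3 is reported with two different ranks.
rows = [
    ("s1", "NormalRank"),
    ("s2", "DeprecatedRank"),
    ("s3", "NormalRank"),
    ("s3", "DeprecatedRank"),
]
per_rank, conflicting = rank_report(rows)
print(dict(per_rank), conflicting)
```

If the per-rank totals exceed the number of distinct statements, the surplus is exactly the set of statements listed as conflicting.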
Hey folks :)
I am looking for someone who can represent Wikidata here:
http://www.oeaw.ac.at/acdh/de/node/396
This is on the 3rd of December so unfortunately short notice. If you
are willing and able to talk about Wikidata there let me know and I'll
get you in touch with the right people.
Cheers
Lydia
--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata
Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
Hey folks :)
I know it is early but we have to make our plans for Wikimania 2016
and need your input in the next few days here:
https://www.wikidata.org/wiki/Wikidata:Wikimania_2016
Thanks! Let's rock Wikimania 2016 ;-)
Cheers
Lydia
Hi Romaine, Gerard, and Wikidatans,
Re Wikidata's Third Birthday, you'll find in this blog entry -
http://scott-macleod.blogspot.com/2015/11/roses-lightning-talk-about-cc-wor…
-
the Lightning Talk (with minute marker) that I gave at the Wikimedia
Foundation in SF yesterday about the donation of CC World University and
School (WUaS) to CC Wikidata :). Thank you!
Best regards,
Scott
On Wed, Nov 18, 2015 at 7:43 AM, Romaine Wiki <romaine.wiki(a)gmail.com>
wrote:
> Hi Scott,
>
> Sorry for the delay, I have been ill the past two weeks and not feeling
> well enough to look into your e-mail.
>
> Honestly speaking, hearing about a donation is great, but I do not have a
> clear picture of what is donated. I trust my wiki colleagues in handling
> the data as they are better in it. But if you have any questions or ideas,
> feel free to ask.
>
> Greetings,
> Romaine
>
> 2015-11-02 6:50 GMT+01:00 Scott MacLeod <
> worlduniversityandschool(a)gmail.com>:
>
>> Hi Romaine (and Gerard),
>>
>> Greetings from the SF Bay Area to you in the Netherlands I think. (Gerard
>> Meijssen and I Skyped together about World University and School/Wikidata
>> some months ago; I've also done this with Quim Gil more recently). And I've
>> been in touch some with Lydia Pintscher, Markus Kroetzsch and Magnus Manske
>> about this as well. I think WUaS's all languages and all countries as
>> online great universities' focus will be a growth story for Wikidata in
>> remarkable ways.
>>
>> https://www.wikidata.org/wiki/Wikidata:Third_Birthday/Presents
>>
>>
>> http://worlduniversityandschool.blogspot.com/2015/10/donation-of-cc-world-u…
>>
>> Having donated CC World University and School to CC Wikidata on October
>> 29th, per Lydia's suggestion, I wonder if you and I could please email
>> further about integrating WUaS into Wikidata-Wikipedia and whether I could
>> please ask for your help as a coder who knows Wikidata in beginning doing
>> this in Wikidata?
>>
>> Thank you.
>>
>> Best regards,
>> Scott
>>
>> --
>>
>> - Scott MacLeod - Founder & President
>> - World University and School
>> - http://worlduniversityandschool.org
>> - 415 480 4577
>> - World University and School - like Wikipedia with best STEM-centric
>> OpenCourseWare - incorporated as a nonprofit university and school in
>> California, and is a U.S. 501 (c) (3) tax-exempt educational organization.
>>
>>
>>
>
I would like to create a REST service in my application that has
request/response behavior shown in the following two scenarios:
http://example/claims?id=Q170790&lang=en
{
  "id": "Q170790",
  "label": "mathematician",
  "claims": [
    {
      "property": {
        "id": "P279",
        "label": "subclass of"
      },
      "value": [
        {
          "id": "Q816264",
          "label": "formal science"
        }
      ]
    },
    {
      "property": {
        "id": "P425",
        "label": "field of this profession"
      },
      "value": [
        {
          "id": "Q395",
          "label": "mathematics"
        }
      ]
    }
  ]
}
http://example/claims?id=Q170790&lang=fr
{
  "id": "Q170790",
  "label": "math\u00e9maticien",
  "claims": [
    {
      "property": {
        "id": "P279",
        "label": "sous-classe de"
      },
      "value": [
        {
          "id": "Q816264",
          "label": "science formelle"
        }
      ]
    },
    {
      "property": {
        "id": "P425",
        "label": "domaine d'occupation"
      },
      "value": [
        {
          "id": "Q395",
          "label": "math\u00e9matiques"
        }
      ]
    }
  ]
}
I'm trying to identify the best Wikidata API queries to support this
functionality, ideally only requiring one invocation. The closest one
I've found is something like the following, but it seems that I'd have
to make an additional query to get the label for each Q item returned:
https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q170790&languag…
Is there a way to somehow include the desired information in one query?
Regards,
James Weaver
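As far as I know, wbgetentities does not resolve the labels of entities referenced in claims, so a two-step approach may be the simplest: fetch the entity, collect every property and item ID mentioned in its claims, then ask for all their labels in one batched wbgetentities call with props=labels. The Python sketch below only builds the second request; the helper names are invented, the batch size of 50 is an assumed API limit, and the sample entity is a hand-trimmed imitation of the API's JSON.

```python
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"
PREFIXES = {"item": "Q", "property": "P"}


def referenced_ids(entity):
    """Collect the property and entity IDs mentioned in an entity's claims."""
    ids = set()
    for property_id, statements in entity.get("claims", {}).items():
        ids.add(property_id)
        for statement in statements:
            datavalue = statement.get("mainsnak", {}).get("datavalue")
            if not datavalue or datavalue.get("type") != "wikibase-entityid":
                continue  # skip strings, dates, "no value" snaks, etc.
            value = datavalue["value"]
            # Some responses carry only "numeric-id", others also "id".
            ids.add(value.get("id")
                    or PREFIXES[value["entity-type"]] + str(value["numeric-id"]))
    return sorted(ids)


def label_urls(ids, lang, batch_size=50):
    """Build wbgetentities URLs that fetch only labels, in batches."""
    urls = []
    for start in range(0, len(ids), batch_size):
        params = {
            "action": "wbgetentities",
            "ids": "|".join(ids[start:start + batch_size]),
            "props": "labels",
            "languages": lang,
            "format": "json",
        }
        urls.append(API_URL + "?" + urlencode(params))
    return urls


# Hand-trimmed imitation of a wbgetentities entity for Q170790.
sample = {"claims": {
    "P279": [{"mainsnak": {"datavalue": {
        "type": "wikibase-entityid",
        "value": {"entity-type": "item", "numeric-id": 816264}}}}],
    "P425": [{"mainsnak": {"datavalue": {
        "type": "wikibase-entityid",
        "value": {"entity-type": "item", "numeric-id": 395, "id": "Q395"}}}}],
}}
ids = referenced_ids(sample)
urls = label_urls(ids, "fr")
print(ids)
print(urls[0])
```

That keeps the total at two API invocations per request to the REST service, regardless of how many claims the item has, as long as the referenced IDs fit in one batch.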
I'm currently working on finding places of birth for photographers who
have birthdates but lack locations (my query is below). Ultimately I'll be
comparing WD's photographer info to my own database of photographer
biographies. This leads me to a few questions regarding dates:
How can I get dates out of a query in plain language (20 July 1982 vs.
1982-07-20T00:00:00Z)?
I need this so I can distinguish between someone actually born January 1,
1900 and someone whose birth date is given as simply 1900, or someone with
a date given simply as 19th century.
How do folks feel about giving lifedates as centuries (born: 19th century)?
I personally would not do it. In my own database, if lifedates are not
known we give "active" dates. For example I would have
Alexander Möhlen (German, active 1900s-1910s)
<https://www.wikidata.org/wiki/Q2642705>.
Is there a concept such as "Period of Activity" that could be used, or
would others find this desirable?
Is there a way to record uncertainty in dates (circa or ca.)?
Many thanks in advance,
David
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX v: <http://www.wikidata.org/prop/statement/>
PREFIX q: <http://www.wikidata.org/prop/qualifier/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX bd: <http://www.bigdata.com/rdf#>
SELECT ?photographer ?photographerLabel ?photographerDescription ?dob WHERE
{
  ?photographer wdt:P106 wd:Q33231 .
  ?photographer wdt:P569 ?dob .
  FILTER NOT EXISTS { ?photographer wdt:P19 ?pob . }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
} ORDER BY ?photographerLabel
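On telling "1 January 1900" apart from just "1900": the simple wdt: value drops the precision, but the full value node of a statement carries it. A rough Python sketch follows, assuming the current WDQS RDF mapping (psv: value nodes with wikibase:timeValue and wikibase:timePrecision, where 11 means day, 10 month, 9 year, and lower numbers decade, century and so on). The query string is untested here and format_wikidata_time is a made-up helper name.

```python
import re

# Assumed query shape for fetching the precision alongside the date.
PRECISION_QUERY = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX psv: <http://www.wikidata.org/prop/statement/value/>
PREFIX wikibase: <http://wikiba.se/ontology#>
SELECT ?photographer ?dob ?precision WHERE {
  ?photographer wdt:P106 wd:Q33231 .
  ?photographer p:P569/psv:P569 ?node .
  ?node wikibase:timeValue ?dob ;
        wikibase:timePrecision ?precision .
}
"""

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")
TIME_RE = re.compile(r"^([+-]?\d+)-(\d{2})-(\d{2})T")


def format_wikidata_time(time_value, precision):
    """Render a timestamp no more precisely than its stated precision."""
    match = TIME_RE.match(time_value)
    if match is None:
        raise ValueError("unrecognised time value: %r" % time_value)
    year, month, day = (int(group) for group in match.groups())
    if precision >= 11:   # day precision
        return "%d %s %d" % (day, MONTHS[month - 1], year)
    if precision == 10:   # month precision
        return "%s %d" % (MONTHS[month - 1], year)
    if precision == 9:    # year precision
        return str(year)
    # Decades, centuries and coarser: leave the interpretation to the caller.
    return "around %d (precision %d, coarser than a year)" % (year, precision)


print(format_wikidata_time("1982-07-20T00:00:00Z", 11))
print(format_wikidata_time("1900-01-01T00:00:00Z", 9))
print(format_wikidata_time("1900-01-01T00:00:00Z", 7))
```

The first call prints "20 July 1982" and the second prints "1900", so the same stored timestamp is shown differently depending on what the statement actually claims to know.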