Re: [Wikidata] Label gaps on Wikidata - (SPARQL help needed. SERVICE wikibase:label)

23 Feb 2017


You can specify multiple languages for the label service; it tries them in the order listed and uses the first one that has a label:
# All subclasses of a class example
# here all direct subclasses (P279, "subclass of") of organization (Q43229)
SELECT ?item ?itemLabel ?itemDescription ?itemAltLabel
WHERE
{
  ?item wdt:P279 wd:Q43229.
  # languages are tried in this order; "zh" is the code for Chinese
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,de,fr,ja,zh,ru,es,sv,pl,nl,sl,ca,it" }
}
ORDER BY ASC(LCASE(?itemLabel))
Link:
https://query.wikidata.org/#%23%20All%20subclasses%20of%20a%20class%20exampl...
I’ve also changed the query to sort the results case-insensitively.
(Note: the query seems to occasionally take a very long time for me, 180
seconds – I’m not sure if the many label languages cause the slowdown or
if it’s just my internet connection.)
Cheers,
Lucas
On 23.02.2017 02:57, Rick Labs wrote:
...
I'm running into some major label gaps, as are others.
My area of interest is the Company data project. I'm new to SPARQL and
here is my working query:
# All subclasses of a class example
# here all direct subclasses (P279, "subclass of") of organization (Q43229)
SELECT ?item ?itemLabel ?itemDescription ?itemAltLabel
WHERE
{
    ?item wdt:P279 wd:Q43229 .
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
ORDER BY ASC(?itemLabel)
Background
https://www.wikidata.org/wiki/Wikidata:WikiProject_Companies seems
most interested in https://www.wikidata.org/wiki/Q4830453 business
enterprise. So I wrote the above SPARQL to see how "business
enterprise" fits under its immediate parent (via P279, "subclass of"),
Organization. I want to learn about all "brother/sister" level objects
under "Organization."
If you run the above you will see how many "Organization" children
objects have no English label. This greatly impedes understanding what
is considered a "business enterprise" and what is not. (Yes - this
part of the ontology seems to need some serious tuning up too!)  When
we go to build out a reasonable starter ontology under the "company
data project" we want the structure sound prior to filling it in with
a considerable volume of data.
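To list only the children that lack an English label, something like the following sketch might help. It assumes the prefixes that the Wikidata Query Service predefines (wd:, wdt:, rdfs:) and checks rdfs:label directly instead of going through the label service; the variable name ?enLabel is just illustrative.

```sparql
# Sketch: direct subclasses of organization (Q43229) with no English label
SELECT ?item
WHERE
{
  ?item wdt:P279 wd:Q43229 .
  # keep the item only if no label tagged "en" exists for it
  FILTER NOT EXISTS {
    ?item rdfs:label ?enLabel .
    FILTER(LANG(?enLabel) = "en")
  }
}
```

Wrapping the same pattern in SELECT (COUNT(?item) AS ?missing) should give just the number of such items.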
For example, a key goal is that the company data needs to "add up" to
economic data. Any entity that has a proprietor, partners, or any
payroll counts in economic data. Government offices, schools,
nonprofits, etc. all produce goods or services, so all contribute to
economic output (GDP). So, much of the "company data project" is
directly relevant to entities that are more general than just
"business entities".
Is there a way I can run a SPARQL query that outputs the EN label if
available (as above), and any other label in any other language
(including a column for language code) if not? Ideally I'd like to
have only one additional language reported if EN is not available, and
I'd like to have it report according to my preference (German if
available, French if not, then Japanese, Chinese, and so on down the
line). It would also be beneficial to have a column for the longer
description, if available.
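One possible shape for such a query, as a rough sketch only: the label service does not expose which language it picked, so this version reads rdfs:label directly and picks the first available label with COALESCE. It assumes the prefixes predefined by the Wikidata Query Service; the variable names (?en, ?de, ?labelLang and so on) and the short language list are illustrative, and languages outside the list are not considered.

```sparql
# Sketch: one label per item, chosen by preference order, plus its language code
SELECT ?item ?label (LANG(?label) AS ?labelLang) ?description
WHERE
{
  ?item wdt:P279 wd:Q43229 .
  # one OPTIONAL per preferred language; an item has at most one label per language
  OPTIONAL { ?item rdfs:label ?en . FILTER(LANG(?en) = "en") }
  OPTIONAL { ?item rdfs:label ?de . FILTER(LANG(?de) = "de") }
  OPTIONAL { ?item rdfs:label ?fr . FILTER(LANG(?fr) = "fr") }
  OPTIONAL { ?item rdfs:label ?ja . FILTER(LANG(?ja) = "ja") }
  OPTIONAL { ?item rdfs:label ?zh . FILTER(LANG(?zh) = "zh") }
  # COALESCE returns the first bound value, so English wins when present
  BIND(COALESCE(?en, ?de, ?fr, ?ja, ?zh) AS ?label)
  # English description only, if there is one
  OPTIONAL { ?item schema:description ?description . FILTER(LANG(?description) = "en") }
}
ORDER BY ASC(LCASE(STR(?label)))
```

Items with no label in any of the listed languages still appear, with ?label and ?labelLang left empty. Extending the preference list means adding one OPTIONAL line and one COALESCE argument per language.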
For my analysis purposes now I'm happy to work with simple language
translations done by machine. Even if they are slightly off they are
probably good enough for my purposes of reviewing and trying to
understand the standing ontology. I don't plan on inserting the
translations back into Wikidata myself, but might try to rally
humans with those specific language skills to double-check the machine
translations and, once verified, insert the translated labels back into
Wikidata.
I'm not at all familiar with other tools that might be available
relevant to "the missing label challenge". Right now SPARQL, SERVICE
wikibase:label, and Google Translate seem like the way to go. But, all
ideas are most welcome.
Thanks!
Rick
On 2/19/2017 11:00 AM, Romaine Wiki wrote:
...
Hi all,
If you look in the recent changes, most items have labels in English
and those are shown in the recent changes and elsewhere (so we know
what the item is about without opening first). But not all items have
labels, and these items without English label are often items with
only a label in Chinese, Arabic, Cyrillic script, Hebrew, etc. This
forms a significant gap.
Is there a way to easily make a transcription from one language to
another?
Or, alternatively, is there a database that has such transcriptions?
Also the other way round might be helpful for users of Wikidata that
use/read it in Chinese, Arabic, Cyrillic script, Hebrew, etc.
Thanks!
Romaine

Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata