I'm running into some major label gaps, as are others.
My area of interest is the Company data project. I'm new to SPARQL and here is my working query:
# All subclasses of a class example
# here: all direct subclasses (P279, "subclass of") of Organization (Q43229)
SELECT ?item ?itemLabel ?itemDescription ?itemAltLabel
WHERE
{
  ?item wdt:P279 wd:Q43229 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
ORDER BY ASC(?itemLabel)
Background
https://www.wikidata.org/wiki/Wikidata:WikiProject_Companies
seems most interested in https://www.wikidata.org/wiki/Q4830453
business enterprise. So I wrote the above SPARQL to see how
"business enterprise" fits under its immediate parent (via P279,
"subclass of"), Organization. I want to learn about all
"brother/sister" level objects under "Organization."
If you run the above, you will see how many child objects of
"Organization" have no English label. This greatly impedes
understanding what is considered a "business enterprise" and what
is not. (Yes - this part of the ontology seems to need some
serious tuning up too!) When we go to build out a reasonable
starter ontology under the "company data project" we want the
structure sound prior to filling it in with a considerable volume
of data.
For example, a key goal is that the company data needs to "add up"
to economic data. Any entity that has a proprietor, partners, or
any payroll counts in economic data. Government offices, schools,
nonprofits, etc. all produce goods or services, and all contribute
to economic output (GDP). So, much of the "company data project" is
directly relevant to entities that are more general than just
"business entities".
Is there a way I can run a SPARQL query that outputs the EN label
if available (as above), and any other label in any other language
(including a column for language code) if not? Ideally I'd like to
have only one additional language reported if EN is not available,
and I'd like to have it report according to my preference (German
if available, French if not, then Japanese, Chinese, and on down
the line). It would also be beneficial to have a column for the longer
description, if available.
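One possible shape for such a query, as an untested sketch only: it
swaps the label service for plain rdfs:label lookups and standard
SPARQL 1.1 COALESCE and LANG(). The variable names (?en, ?de,
?labelLang, and so on) and the five-language list are illustrative
assumptions, and it relies on the query service's predefined prefixes.

```sparql
# Sketch only: first available label in the order en, de, fr, ja, zh,
# plus a column holding that label's language code.
# Assumes the predefined prefixes wd:, wdt:, rdfs: and schema:.
SELECT ?item ?label (LANG(?label) AS ?labelLang) ?description
WHERE
{
  ?item wdt:P279 wd:Q43229 .
  OPTIONAL { ?item rdfs:label ?en . FILTER(LANG(?en) = "en") }
  OPTIONAL { ?item rdfs:label ?de . FILTER(LANG(?de) = "de") }
  OPTIONAL { ?item rdfs:label ?fr . FILTER(LANG(?fr) = "fr") }
  OPTIONAL { ?item rdfs:label ?ja . FILTER(LANG(?ja) = "ja") }
  OPTIONAL { ?item rdfs:label ?zh . FILTER(LANG(?zh) = "zh") }
  # COALESCE returns the first bound value, so the argument order
  # is the preference order.
  BIND(COALESCE(?en, ?de, ?fr, ?ja, ?zh) AS ?label)
  # English description only, if one exists.
  OPTIONAL { ?item schema:description ?description .
             FILTER(LANG(?description) = "en") }
}
ORDER BY ASC(?label)
```

Items with no label in any of the five listed languages would still
come back with an empty label column, so this does not cover the "any
other language" case. The label service itself also accepts a
comma-separated fallback list such as "en,de,fr,ja,zh", but it does
not report which language it ended up using.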
For my analysis purposes now I'm happy to work with simple
language translations done by machine. Even if they are slightly
off they are probably good enough for my purposes of reviewing and
trying to understand the standing ontology. I don't plan on
inserting the translations back into Wikidata myself, but might
try to rally up humans with those specific language skills to
double check the machine translations and, once verified, insert
the translated labels back into Wikidata.
I'm not at all familiar with other tools that might be available
relevant to "the missing label challenge". Right now SPARQL,
SERVICE wikibase:label, and Google Translate seem like the way to
go. But all ideas are most welcome.
Thanks!
Rick

Hi all,

If you look in the recent changes, most items have labels in English
and those are shown in the recent changes and elsewhere (so we know
what the item is about without opening first). But not all items have
labels, and these items without English label are often items with
only a label in Chinese, Arabic, Cyrillic script, Hebrew, etc. This
forms a significant gap.

Is there a way to easily make a transcription from one language to
another? Or alternatively if there is a database that has such
transcriptions?

Also the other way round might be helpful for users of Wikidata that
use/read it in Chinese, Arabic, Cyrillic script, Hebrew, etc.

Thanks!
Romaine
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata