I'm running into some major label gaps, as are others.

My area of interest is the company data project. I'm new to SPARQL, and here is my working query:

# Direct subclasses of a class (example)
# here: items that are "subclass of" (P279) Organization (Q43229)
SELECT ?item ?itemLabel ?itemDescription ?itemAltLabel
WHERE
{
?item wdt:P279 wd:Q43229 .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
ORDER BY ASC(?itemLabel)

Background

https://www.wikidata.org/wiki/Wikidata:WikiProject_Companies seems most interested in business enterprise (https://www.wikidata.org/wiki/Q4830453). So I wrote the query above to see how "business enterprise" fits under its immediate parent class, Organization (Q43229). I want to learn about all the sibling-level classes under Organization.

If you run the query above, you will see how many of Organization's child classes have no English label. This greatly impedes understanding what is considered a "business enterprise" and what is not. (Yes, this part of the ontology seems to need some serious tuning up too!) Before we build out a reasonable starter ontology under the company data project, we want the structure to be sound, before filling it in with a considerable volume of data.

For example, a key goal is that the company data should "add up" to economic data. Any entity that has a proprietor, partners, or any payroll counts in economic data. Government offices, schools, nonprofits, etc. all produce goods or services, so they all contribute to economic output (GDP). Much of the company data project is therefore directly relevant to entities that are more general than just business enterprises.

Is there a way to run a SPARQL query that outputs the EN label if available (as above), and otherwise a label in some other language, with a column for the language code? Ideally I'd like only one additional language reported when EN is not available, chosen according to my preference order (German if available, French if not, then Japanese, Chinese, and on down the line). It would also be helpful to have a column for the longer description, if available.
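One possible approach, offered as a sketch under assumptions rather than a tested recipe. The label service already accepts a comma-separated fallback list in wikibase:language (for example "en,de,fr,ja,zh"), which covers the preference ordering. For an explicit language-code column, one option is to read rdfs:label once per preferred language, take the first one bound with COALESCE, and take its LANG(). The Python below only builds that query string and shows the same preference logic applied client-side. PREFERRED_LANGS, build_query, pick_label, query_url, the variable names ?name/?nameLang, and the sample labels are all hypothetical illustrations. Chinese labels may also be stored under variants such as "zh-hans", so the list may need extending.

```python
from __future__ import annotations

from urllib.parse import urlencode

# Hypothetical preference order: English, then German, French, Japanese, Chinese.
PREFERRED_LANGS = ["en", "de", "fr", "ja", "zh"]

# Public Wikidata Query Service SPARQL endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(parent_qid: str, langs: list[str]) -> str:
    """Build a SPARQL query for direct subclasses (P279) of parent_qid.

    ?name is the first label found in `langs` order and ?nameLang is its
    language code. ?itemDescription comes from the label service, using the
    same fallback list.
    """
    # SPARQL variable names cannot contain "-", so "zh-hans" becomes ?l_zh_hans.
    var_names = ["l_" + lang.replace("-", "_") for lang in langs]
    optionals = "\n".join(
        f'  OPTIONAL {{ ?item rdfs:label ?{var} . FILTER(LANG(?{var}) = "{lang}") }}'
        for var, lang in zip(var_names, langs)
    )
    coalesce = ", ".join("?" + var for var in var_names)
    lang_list = ",".join(langs)
    return f"""SELECT ?item ?name ?nameLang ?itemDescription WHERE {{
  ?item wdt:P279 wd:{parent_qid} .
{optionals}
  BIND(COALESCE({coalesce}) AS ?name)
  BIND(LANG(?name) AS ?nameLang)
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang_list}" . }}
}}
ORDER BY ?name"""


def pick_label(labels: dict[str, str], langs: list[str]) -> tuple[str, str] | None:
    """Return (label, lang) for the first preferred language present.

    If none of the preferred languages exist, fall back to any other language
    (the alphabetically first code, just to be deterministic). Return None if
    the item has no labels at all.
    """
    for lang in langs:
        if lang in labels:
            return labels[lang], lang
    if labels:
        lang = min(labels)
        return labels[lang], lang
    return None


def query_url(query: str) -> str:
    """GET URL asking the endpoint for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    print(build_query("Q43229", PREFERRED_LANGS))
```

The printed query can be pasted into the query service UI, or fetched from query_url(...). Items with no label in any listed language come back with ?name unbound. A client-side fallback like pick_label (any other language) is one way to fill those rows before sending them to machine translation.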

For my current analysis I'm happy to work with simple machine translations. Even if they are slightly off, they are probably good enough for reviewing and trying to understand the existing ontology. I don't plan to insert the translations back into Wikidata myself, but I might try to rally people with the specific language skills to double-check the machine translations and, once verified, add the translated labels to Wikidata.

I'm not familiar with other tools that might help with "the missing label challenge". Right now SPARQL, SERVICE wikibase:label, and Google Translate seem like the way to go. But all ideas are most welcome.

Thanks!

Rick

On 2/19/2017 11:00 AM, Romaine Wiki wrote:

Hi all,

If you look at the recent changes, most items have labels in English, and those labels are shown in the recent changes and elsewhere (so we know what an item is about without opening it first). But not all items have labels, and items without an English label often have a label only in Chinese, Arabic, Cyrillic script, Hebrew, etc. This forms a significant gap.

Is there an easy way to make a transcription from one language to another?

Or, alternatively, is there a database that has such transcriptions?

The other direction might also be helpful for Wikidata users who read it in Chinese, Arabic, Cyrillic script, Hebrew, etc.

Thanks!

Romaine
_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata