Hi all,
If you look in the recent changes, most items have labels in English, and those are shown in the recent changes and elsewhere (so we know what an item is about without opening it first). But not all items have labels, and the items without an English label are often items with only a label in Chinese, Arabic, Cyrillic script, Hebrew, etc. This forms a significant gap.
Is there a way to easily make a transcription from one language to another? Or alternatively, is there a database that has such transcriptions?
The other way round might also be helpful for users of Wikidata who use/read it in Chinese, Arabic, Cyrillic script, Hebrew, etc.
Thanks!
Romaine
There is in many cases; however, there are some problems associated with it. You may not know which original language to transcribe from, you might need a translation rather than a transcription, and if there are multiple labels you have no way to choose between them.
Indeed, in many cases a translation is needed, but for some languages and specific types of entities all that is needed is a transcription, if not just a copy from the original language - for example, names of humans or settlements. I guess that for some languages with the same script one can simply copy the label, e.g. for a British person from en to fr.
Konstantinos Stampoulis geraki@geraki.gr http://www.geraki.gr
---- Contribute to Wikipedia. https://el.wikipedia.org ---- The above opinions are personal and represent only me. This message is to be considered confidential only if I have explicitly asked for that; otherwise you may use it in any public discussion. I have nothing to hide. :-)
Hoi, You know, typically you are right. In the last few days I added members of the Chamber of Deputies of Haiti. I used names from the English Wikipedia, but I am not sure that the names are correct. In one instance I found that the first name had been put last; for the others I am not sure that we have them right.
The problem with rules is the exceptions, and for automated approaches you have to consider these seriously. Thanks, GerardM
Relevant: https://arxiv.org/pdf/1702.06235.pdf
Hoi, A wonderful read. The fun thing to realise is that this is all about static data. It shows that, given good data, a lot that is of value can be inferred. The thing to realise is that Wikidata is not static, and this has two components: as more statements are added, more items will have the "minimum" number of statements needed to provide an accurate biographic summary, and there is a potential for vandalism. This last part can be remedied by comparing Wikidata with existing Wikipedia articles; that will make vandalism more obvious to signal.
So all in all, it is a wonderful read. What it does not cover is the potential for using it for "other" languages. This is where it will make even more of a difference. All it takes to add valid statements is labels. For many items, the effect of one added label may impact hundreds of thousands of items. Thanks, GerardM
Hi,
User:VIGNERON had the idea of a workshop at Wikimania to work on this problem: https://www.wikidata.org/wiki/Wikidata:Wikimania_2017#Transl-a-thon
Maybe we should hold regular online sessions to reduce the gap in labels and descriptions?
Envel
Hoi, The one language with the most labels is English. There are, however, quite a lot of items with labels only in other languages. The problem with items with no labels is that quite often they have few statements, and consequently it is hard to disambiguate them. For the record, the automated descriptions are really valuable here. When you consider these items there are several problems. One of them is that the number of items that need to be merged is quite high. To really understand what is already there, use Reasonator; for any and all items it will show you any available label.
When you want to make labelling a priority, it makes sense to consider what could help. It means that you have to divide up the problematic items and find meaningful differences. One distinction is items with articles: they have their own wikilinks, and these are a pointer to what is connected to them. When there is no article connected, it does not mean that the item is not notable!
When articles exist, it helps if Google Translate knows the language. It may even give you a name in English that you can google. Consequently you may find another item on the same subject, but that is a good thing.
Transcriptions... they often do not work. Thanks, GerardM
1) Gap: I do agree it would be good to promote these backlogs, as two of the easiest ones for newcomers to work on. (Although there are guidelines and best practices, and any backlog promotion should clearly point to those documentation pages, so that newcomers have a ready reference.)
2) Translation: I also agree that a machine-translation /suggestion/ or /hint/ would be a nice option. The main concern is users who don't understand the limitations of machine translation and who must resist the urge to just copy&paste. (This goes both for language fluency and for technical-vocabulary fluency; e.g. I could not give a confident description of most chemistry or physics articles, even with numerous machine-translation-based suggestions or the article itself!)
I can't see anything specifically about this in Phabricator, so it's probably worth filing a feature request, unless someone else points out a task I missed, or raises an overwhelming concern. [Note: a semi-related task to link in the SeeAlso of the new one: T71345]
3) Tools: Is it currently possible to get a list of items without a label/description in language X? I tried a few weeks ago, and the onwiki Special pages were broken. I filed https://phabricator.wikimedia.org/T157884 "Nothing loads on Special:EntitiesWithoutDescription or Special:EntitiesWithoutLabel results" to cover this problem.
Ah, I now see https://tools.wmflabs.org/wikidata-terminator/? which works for missing descriptions. However, the "with missing labels" set of links seems to be broken for most languages. Sjoerd filed https://bitbucket.org/magnusmanske/wikidata-todo/issues/45/terminator-top-10... and I've added some example links.
The other set of links that are listed are all outdated (https://www.wikidata.org/wiki/Wikidata:WikiProject_Labels_and_descriptions#L... and below).
I wonder if we should add a link to https://tools.wmflabs.org/wikidata-game/distributed/#game=23 ("Kaspar's Persondata game: Descriptions") in that list? AFAIK it only contains English suggestions though.
Are there any other tools which help with listing or processing these particular backlogs?
Quiddity (Volunteer hat. This is just the address I use to subscribe to this list)
Citiranje "Nick Wilson (Quiddity)" nwilson@wikimedia.org:
- Translation
I also agree that a machine-translation /suggestion/ or /hint/ would be a nice option. The main concern is users who don't understand the limitations of machine-translation and whom must resist the urge to just copy&paste.
It should be possible, perhaps even preferable, to show translations of the most common descriptions, done on translatewiki. Thus all the descriptions like "Wikipedia disambiguation page", "Wikimedia category", etc. could be visible in all languages.
On Mon, Feb 20, 2017 at 9:59 PM, Smolenski Nikola smolensk@eunet.rs wrote:
Citiranje "Nick Wilson (Quiddity)" nwilson@wikimedia.org:
- Translation
I also agree that a machine-translation /suggestion/ or /hint/ would be a nice option. The main concern is users who don't understand the limitations of machine-translation and whom must resist the urge to just copy&paste.
It should be possible, perhaps even preferable, to show translations of the most common descriptions, done on translatewiki. Thus all the descriptions like "Wikipedia disambiguation page", "Wikimedia category", etc. could be visible in all languages.
I think this (good) example is for a slightly different feature, which means that there are 2 distinct feature-requests:
-----
1) For unique item descriptions (the main focus of this mailing list thread), we want to find a way to "suggest" descriptions to editors, based on machine-translations of existing descriptions in other languages.
1a) This could be a new task in Phabricator? (per discussion in this thread)
1b) (Probably a very-long-term goal?) This could also perhaps be https://phabricator.wikimedia.org/T64695 "Draft a computer-assisted translation system for Wikidata labels/descriptions" which discusses the scaling problems, and suggests that we might EVENTUALLY want semi-automated description updates, at least in some items, similar to how Reasonator works. I suspect it would be best to keep those 2 ideas separate, hence I suggest filing a new task for (1a).
------
2) A way for generic description translations to be automatically added to some items.
2a) For very common & Wikimedia-focused descriptions, this seems to be /periodically/ handled by bots. E.g. for disambiguation items, it looks like User:MilanBot currently handles this task, for example:
* https://www.wikidata.org/w/index.php?title=Q260478&action=history
* https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/MilanBot
E.g. for category items, it looks like ValterVBot currently handles this task, for example:
* https://www.wikidata.org/w/index.php?title=Q6939670&diff=198113824&o...
* https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/ValterVB...
This task, https://phabricator.wikimedia.org/T139912 seems to track the idea of properly automating it all, and it links to an onwiki discussion that has many more details. I don't understand the technical discussions, or current state of development, enough to even attempt to summarize.
2b) For other common descriptions, these translations all seem to be manually added? E.g. for items with the description "scientific journal article" or "scientific article": https://www.wikidata.org/wiki/Q28510879 and https://www.wikidata.org/wiki/Q28579322 and https://www.wikidata.org/wiki/Q28298612 and I think thousands more? However, these are probably not a best practice that we want to encourage, per https://www.wikidata.org/wiki/Help:Description and per some of the descriptions in other languages being more precise (e.g. "vedecký článok (publikovaný 2009-01)"). Therefore, this (2b) cluster probably belongs more with the (1a/1b) set of feature requests, and should not be mass-replicated across Wikidata.
I hope that's mostly accurate... Quiddity
Hoi, Labels are a priority and do need attention, but descriptions do not, even though they are a mess. Descriptions are often added by bot, based on an initial set of statements. They are typically not revisited, and as statements are added it becomes increasingly obvious how poorly they represent the item involved.
It is an old argument, but here we go again. The automated descriptions as developed by Magnus are superior. Like the bot-generated descriptions they are based on statements, but they are generated as and when they are needed, and they do allow for other languages. For me the most crucial part is that when I need disambiguation, I add statements to good effect. Yes, you may want descriptions in a dump, but when an algorithm exists, it is possible to run it at dump time as well. My point is that technical issues do not trump usefulness. As it is, a lot of time is wasted on something that is obviously below par, something that does not even work well for English. Thanks, GerardM
I'm running into some major label gaps, as are others.
My area of interest is the Company data project. I'm new to SPARQL and here is my working query:
# All subclasses of a class example
# here all subclasses of P279 Organization (Q43229)
SELECT ?item ?itemLabel ?itemDescription ?itemAltLabel WHERE {
  ?item wdt:P279 wd:Q43229 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
ORDER BY ASC(?itemLabel)
Background
https://www.wikidata.org/wiki/Wikidata:WikiProject_Companies seems most interested in https://www.wikidata.org/wiki/Q4830453 business enterprise. So I wrote the above SPARQL to see how "business enterprise" fits under its immediate parent, organization (Q43229). I want to learn about all "brother/sister"-level objects under "organization".
If you run the above you will see how many child objects of "organization" have no English label. This greatly impedes understanding what is considered a "business enterprise" and what is not. (Yes, this part of the ontology seems to need some serious tuning up too!) When we go to build out a reasonable starter ontology under the company data project, we want the structure to be sound before filling it in with a considerable volume of data.
For example, a key goal is that the company data needs to "add up" to economic data. Any entity that has a proprietor, partners, or any payroll counts in economic data. Government offices, schools, non-profits, etc. all produce goods or services - all contribute to economic output (GDP). So, much of the company data project is directly relevant to entities that are more general than just "business entities".
Is there a way I can run a SPARQL query that outputs the EN label if available (as above), and any other label in any other language (including a column for the language code) if not? Ideally I'd like to have only one additional language reported if EN is not available, and I'd like it to report according to my preference (German if available, French if not, then Japanese, Chinese, on down the line). It would also be helpful to have a column for the longer description, if available.
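One rough way to express such a preference order directly in SPARQL is sketched below (untested at this scale; it assumes descriptions are exposed via schema:description, and that the wd:, wdt:, rdfs: and schema: prefixes are predefined on query.wikidata.org):

SELECT DISTINCT ?item ?bestLabel (LANG(?bestLabel) AS ?lang) ?description WHERE {
  ?item wdt:P279 wd:Q43229 .
  # try the preferred languages in order
  OPTIONAL { ?item rdfs:label ?en . FILTER(LANG(?en) = "en") }
  OPTIONAL { ?item rdfs:label ?de . FILTER(LANG(?de) = "de") }
  OPTIONAL { ?item rdfs:label ?fr . FILTER(LANG(?fr) = "fr") }
  # fall back to any label at all (this can yield several rows per item)
  OPTIONAL { ?item rdfs:label ?anyLabel }
  BIND(COALESCE(?en, ?de, ?fr, ?anyLabel) AS ?bestLabel)
  # English description, if one exists
  OPTIONAL { ?item schema:description ?description . FILTER(LANG(?description) = "en") }
}

With a long list of fallback languages, the label-service approach shown later in this thread is usually the simpler option when only a single display label is needed.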
For my analysis purposes now I'm happy to work with simple language translations done by machine. Even if they are slightly off, they are probably good enough for my purpose of reviewing and trying to understand the standing ontology. I don't plan on inserting the translations back into Wikidata myself, but I might try to rally up humans with those specific language skills to double-check the machine translations and, once they are verified, insert the translated labels back into Wikidata.
I'm not at all familiar with other tools that might be available for "the missing label challenge". Right now SPARQL, SERVICE wikibase:label, and Google Translate seem like the way to go. But all ideas are most welcome.
Thanks!
Rick
You can specify multiple languages for the label service:
# All subclasses of a class example
# here all subclasses of P279 Organization (Q43229)
SELECT ?item ?itemLabel ?itemDescription ?itemAltLabel WHERE {
  ?item wdt:P279 wd:Q43229 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,de,fr,ja,cn,ru,es,sv,pl,nl,sl,ca,it" }
}
ORDER BY ASC(LCASE(?itemLabel))
Link: https://query.wikidata.org/#%23%20All%20subclasses%20of%20a%20class%20exampl...
I’ve also changed the query to sort the results case-insensitively.
(Note: the query seems to occasionally take a very long time for me, 180 seconds – I’m not sure if the many label languages cause the slowdown or if it’s just my internet connection.)
Cheers, Lucas
In Freebase we had a parameter %lang=all
Does the SPARQL label service have something similar?
-Thad +ThadGuidry https://www.google.com/+ThadGuidry
Hi!
On 2/23/17 7:20 AM, Thad Guidry wrote:
In Freebase we had a parameter %lang=all
Does the SPARQL label service have something similar?
Not as such, but you don't need it if you want all the labels; just do:
?item rdfs:label ?label
and you'll get all labels. There is no need to invoke the service for that; the service is for when you have a specific set of languages you're interested in.
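For example, combined with the subclass query from earlier in the thread, a minimal sketch (untested) that lists every available label together with its language code could be:

SELECT ?item ?label (LANG(?label) AS ?lang) WHERE {
  # direct subclasses of organization (Q43229)
  ?item wdt:P279 wd:Q43229 ;
        rdfs:label ?label .   # one result row per label/language
}
ORDER BY ?item

LANG() returns the language tag of each label literal, so the result has one row per item/label pair.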
Yep.
Example at: http://tinyurl.com/h2sbvhd
Ah! That wasn't made clear on the wiki. Thanks Stas!
Thanks Stas & especially Kingsley for the example:
# All subclasses of a class example
# here all subclasses of P279 Organization (Q43229)
SELECT ?item ?label ?itemDescription ?itemAltLabel WHERE {
  ?item wdt:P279 wd:Q43229 ;
        rdfs:label ?label .
  # SERVICE wikibase:label { bd:serviceParam wikibase:language "en,de,fr,ja,cn,ru,es,sv,pl,nl,sl,ca,it" }
  FILTER (LANG(?label) = "en")
}
ORDER BY ASC(LCASE(?itemLabel))
When I pull the FILTER line out of the above, I have almost what I need - "the universe" of all subclasses of organization (regardless of language). I want all subclasses in the output, not just those that currently have an English label.
In the table output, is it possible to get a column for the language code, and to have the description show up (if available for that row)? That would be very helpful prior to my manual operations.
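A small extension of the query above could add both columns - sketched here with LANG() for the language code and schema:description for the description (untested):

SELECT ?item ?label (LANG(?label) AS ?lang) ?description WHERE {
  ?item wdt:P279 wd:Q43229 ;
        rdfs:label ?label .
  # description in the same language as the label, when one exists
  OPTIONAL { ?item schema:description ?description .
             FILTER(LANG(?description) = LANG(?label)) }
}
ORDER BY ASC(LCASE(STR(?label)))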
Can I easily export the results table to CSV or Excel? I can filter and sort easily from there provided I have the hooks.
Thanks very much!
Rick
Hi Rick,
Is this what you're after? http://tinyurl.com/z7ru9yr
Once you run the query there is a download drop-down menu, just above the query results on the right-hand side of the screen - it has a range of options including CSV.
Hope that helps!
Nav
Nav,
YES!!! that's it! Your SPARQL works perfectly, exactly what I wanted.
Thanks very much. I just had to learn how to get the CSV into Excel as UTF-8 - not hard. I can finally see what objects people want immediately below "Organizations", worldwide. (Yes, what's evolved is pretty darn "chaotic".)
Very much appreciated.
Rick
An FYI, and happy to receive feedback about how to achieve the same thing directly from Wikidata:
[1] http://tinyurl.com/gohu2eg -- SPARQL-FED from a service that can return a CSV document URI
[2] http://tinyurl.com/zblqqf2 -- SPARQL query results in CSV format
Getting the CSV into Google Spreadsheet boils down to the formula =importData("http://tinyurl.com/zblqqf2"), or use the raw URI if the TinyURL trips up Google Spreadsheet.
Use curl -iLO http://tinyurl.com/zblqqf2 to make a local copy that you can import into Excel using its CSV import. In the past you used to be able to consume a URI directly in Excel, but things have gotten strange of late.
Links:
[1] https://www.linkedin.com/pulse/importing-data-google-spreadsheet-using-sparq...
[2] http://kidehen.blogspot.com/2015/06/importing-data-into-microsoft-excel-via....
[3] http://kidehen.blogspot.com/2015/06/importing-data-into-google-spreadsheet.h...
Kingsley
On 24 February 2017 at 22:00, Rick Labs tmp2004@clbcm.com wrote:
Nav,
YES!!! that's it! Your SPARQL works perfectly, exactly what I wanted.
Thanks very much. I just had to learn how to get the CSV into Excel as UTF-8 - not hard. I can finally see what objects people want immediately below "Organizations", worldwide. (Yes, what's evolved is pretty darn "chaotic".) Very much appreciated.
Rick
Excellent!! Very happy to help. Best of luck cleaning up the chaos :)
On 19.02.2017 at 17:00, Romaine Wiki wrote:
Hi all,
If you look in the recent changes, most items have labels in English and those are shown in the recent changes and elsewhere (so we know what the item is about without opening first).
Wikidata actually tries to show you the labels in your preferred interface language. And if your user language is not available, it uses a fallback mechanism to show the next-best language, which may even include automated transcriptions. When all else fails, it will show the English label. If that doesn't exist, it shows the ID.
But not all items have labels, and these items without English label are often items with only a label in Chinese, Arabic, Cyrillic script, Hebrew, etc. This forms a significant gap.
The fallback mechanism works OK, but it is not great for English-speaking users, who see a lot of items that have no English label. For English, we just don't know what to fall back to. Just anything? Or try European languages first? What should the rule be? If we can decide on a good rule, it should actually be pretty simple to add such a fallback for English.
Is there a way to easily make a transcription from one language to another?
We have such rules for some languages/variants, e.g. between the Cyrillic and the Latin representations of Kazakh or Uzbek. But transliteration rules can be complex, and covering every permutation of the 300 languages we support would mean we'd need about 45,000 rule sets (one per language pair)...
Or alternatively, is there a database that has such transcriptions?
Not yet. One of the goals of Wikidata is to be that database.
One option is to allow users to define their own ranked preferences for language beyond just first place. (I personally would enjoy having French as a fallback to English.) This has the downside of only really working for people with accounts, which I suspect might be a minority of overall traffic.
Cheers, James Hare
On 27.02.2017 at 17:01, James Hare wrote:
One option is to allow users to define their own ranked preferences for language beyond just first place. (I personally would enjoy having French as a fallback to English.)
That would badly fragment the parser cache. I don't think it's viable.
This has the downside of only really working for people with accounts, which I suspect might be a minority of overall traffic.
Currently, we only support English for anonymous visitors (yes, this is very sad; the reason is, again, caching - Varnish, this time).
Good fallback languages for English would be any of the Germanic or Romance languages. As a native American English speaker, I would also agree with this article's listing of languages that are more easily understood by my brain:
1. Afrikaans
2. Danish
3. French
4. Italian
5. Norwegian
etc.
9 easy languages for English Speakers. https://matadornetwork.com/abroad/9-easy-languages-for-english-speakers-to-l...
-Thad +ThadGuidry https://www.google.com/+ThadGuidry
Something I have been wondering is whether it is possible to get a template on e.g. Commons for a templated WDQS query to take account of the user's language (and also, ideally, preferred fallback languages, as perhaps indicated by their {{#babel}} settings).
I had hoped it might be possible to include these preferences as a parameter string in the "label service" part of the query text.
From what Daniel is saying, it seems this may not be possible, because the template expansion would then depend on the user's preferred language(s), which would not be compatible with the template caching.
Is that right? Or is there a way round this?
-- James.
On 27.02.2017 at 18:18, James Heald wrote:
From what Daniel is saying, it seems this may not be possible, because the template expansion would then depend on the user's preferred language(s), which would not be compatible with the template caching.
Is that right? Or is there a way round this?
We are currently aiming for a compromise: we render the page with the user's interface language as the target language, and apply fallback accordingly. We do not take into account secondary user languages, as defined e.g. by the Babel or Translate extensions.
This means a user with the UI language set to French will see French if available, but will not see Spanish, even if they somehow declared that they also speak Spanish.
This way, we split the parser cache once per UI language - a factor of 300, but not the exponential explosion we would get if we split on every possible permutation of languages (does anyone want to compute 300 factorial?).