Hi,
I want to auto-generate disambiguation descriptions for African politicians to be added to Wikidata. For example, for the country Mozambique (Q1029) the following descriptions should be generated:
- Mozambican politician (en)
- Mosambikanischer Politiker (de)
- politico mozambicano (it)
- ...
This could be extended to other professions. My questions:
- Can anyone point me to data sources where to best look up country adjectives such as "Mozambican"?
- Where and how should the lexical information be stored so that other renderers can reuse it?
- If I create small renderers for these short descriptions, what architecture do you prefer for best reuse?
My just-get-it-done solution would be a set of CSV files and a few lines of Perl code, but maybe this use case can be aligned with Abstract Wikipedia so that I can learn more about it.
Looking forward to collaborating, Jakob
Jakob asked:
- Can anyone point me to data sources where to best look up country adjectives such as "Mozambican"?
Are you aware of P1549 (demonym) in Wikidata? https://www.wikidata.org/wiki/Q1029#P1549 has the adjectives you are looking for, at least for some languages. It might not be ideal, but it exists. Encoding further grammatical details (word order, any other words needed) would need to be language specific, but I think you can see how a language-based template pointing to Q1029, P1549, and the Q item for the occupation would accomplish this.
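A minimal sketch of such a language-based template, in Python rather than Perl. The lookup tables are hard-coded for illustration and only stand in for what would be read from P1549 on Q1029 and from the label of the occupation item; the names DEMONYMS, OCCUPATIONS, TEMPLATES and describe are hypothetical, and Q82955 is assumed to be the item for "politician".

```python
# Illustrative data only: in practice the demonym strings would come from
# P1549 (demonym) on the country item, and the occupation strings from the
# label of the occupation item. The values below are copied from the
# examples in this thread, not fetched from Wikidata.
DEMONYMS = {
    "Q1029": {"en": "Mozambican", "de": "Mosambikanischer", "it": "mozambicano"},
}
OCCUPATIONS = {
    "Q82955": {"en": "politician", "de": "Politiker", "it": "politico"},
}

# One template per language captures the language-specific word order.
TEMPLATES = {
    "en": "{demonym} {occupation}",
    "de": "{demonym} {occupation}",
    "it": "{occupation} {demonym}",
}


def describe(country, occupation, lang):
    """Return a short description, or None if any piece is missing."""
    template = TEMPLATES.get(lang)
    demonym = DEMONYMS.get(country, {}).get(lang)
    label = OCCUPATIONS.get(occupation, {}).get(lang)
    if template is None or demonym is None or label is None:
        return None
    return template.format(demonym=demonym, occupation=label)


for lang in ("en", "de", "it"):
    print(describe("Q1029", "Q82955", lang), f"({lang})")
```

Returning None for a missing language keeps the gaps visible, so a description is only generated where the demonym, the occupation label and the template are all available.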
Arthur
On Sat, Jul 4, 2020 at 11:06 AM Jakob Voß jakob.voss@gbv.de wrote:
Hi,
I want to auto-generate disambiguation descriptions for African politicians to be added to Wikidata. For example, for the country Mozambique (Q1029) the following descriptions should be generated:
- Mozambican politician (en)
- Mosambikanischer Politiker (de)
- politico mozambicano (it)
- ...
This could be extended to other professions. My questions:
- Can anyone point me to data sources where to best look up country
adjectives such as "Mozambican"?
- Where and how should the lexical information be stored so that other
renderers can reuse it?
- If I create small renderers for these short descriptions, what
architecture do you prefer for best reuse?
My just-get-it-done solution would be a set of CSV files and a few lines of Perl code, but maybe this use case can be aligned with Abstract Wikipedia so that I can learn more about it.
Looking forward to collaborating, Jakob
Abstract-Wikipedia mailing list Abstract-Wikipedia@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia
Hi Jakob,
thank you for reaching out! You got some answers for the demonyms.
Regarding how to best learn from this for Abstract Wikipedia: as you say yourself, the code itself is just a few lines, but it comes with a lot of interesting lessons attached.
As you start doing this for an increasing number of languages, we will run into interesting issues, such as grammatical gender being expressed in some languages but not in others, how to deal with historical countries, and how to deal with cases such as Marie Curie, Tesla, etc.
I think capturing these in an essay either on your web site or on wiki can be very valuable.
Again, I think the code itself will be short and can be quickly rewritten - but the lessons you learned on the way will too easily be forgotten if not captured.
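To make the grammatical gender issue mentioned above concrete, here is a minimal Python sketch. It assumes that the inflected forms are stored per language and per gender; the FORMS table, the "m"/"f" keys and the describe function are hypothetical, and the strings are illustrative rather than taken from Wikidata.

```python
# Hypothetical lexical table: for each language, the demonym and the
# occupation are stored once per grammatical gender, plus a word-order
# template. English repeats the same form; German and Italian do not.
FORMS = {
    "en": {
        "template": "{demonym} {occupation}",
        "demonym": {"m": "Mozambican", "f": "Mozambican"},
        "occupation": {"m": "politician", "f": "politician"},
    },
    "de": {
        "template": "{demonym} {occupation}",
        "demonym": {"m": "Mosambikanischer", "f": "Mosambikanische"},
        "occupation": {"m": "Politiker", "f": "Politikerin"},
    },
    "it": {
        "template": "{occupation} {demonym}",
        "demonym": {"m": "mozambicano", "f": "mozambicana"},
        "occupation": {"m": "politico", "f": "politica"},
    },
}


def describe(lang, gender):
    """Render the description for one language and one gender key."""
    entry = FORMS[lang]
    return entry["template"].format(
        demonym=entry["demonym"][gender],
        occupation=entry["occupation"][gender],
    )


for lang in ("en", "de", "it"):
    print(lang, describe(lang, "m"), "/", describe(lang, "f"))
```

A flat table like this does not scale well to many languages and says nothing about people whose gender is unknown or not covered by the two keys, which is one of the lessons worth writing down.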
Thank you, Denny