Hello,
I have been thinking of a way to organise data in Wiktionary that would allow for words to automatically show translations to other languages with much less work than is currently required.
Currently, translations to other languages have to be added manually, meaning they are not automatically propagated across language pairs. What I mean by this is showcased in the following example:
1. I create a page for word X in language A.
2. I create a page for word Y in language B.
3. I add a translation to the page for word X, and state that it translates to word Y in language B.
4. If I want the page for word Y to show that it translates to word X in language A, I have to do this manually.
Automating this seems a bit tricky. I think that the key is acknowledging that meanings can be separated from language and used as the links between translations. In this view, words and their definitions are language-specific, but meanings are language-agnostic.
Because I may have done a bad job at explaining this context, I have created a short example in the form of an sqlite3 SQL script that creates a small dictionary database with two meanings for the word "desert"; one of the meanings has been linked to the corresponding words in Spanish and in German. The script mainly showcases how words can be linked across languages with minimal rework.
You can find the script attached. To experiment with this, simply run
.read feature_showcase.sql
within an interactive sqlite3 session. (There may be other ways of doing it but this is how I tested it.)
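For readers without the attachment, the idea can be sketched roughly as follows. This is not the attached script: the table and column names (`meaning`, `word`, `word_meaning`), the helper functions and the sample definitions are all made up for illustration, and Python's built-in `sqlite3` module stands in for the interactive session. Words and definitions carry a language, meanings do not, and translations fall out of a join on the shared meaning.

```python
import sqlite3

# Hypothetical schema for illustration only; not the attached script.
SCHEMA = """
CREATE TABLE meaning (id INTEGER PRIMARY KEY, note TEXT);
CREATE TABLE word (
    id INTEGER PRIMARY KEY,
    lang TEXT NOT NULL,
    spelling TEXT NOT NULL,
    UNIQUE (lang, spelling)
);
-- Links a word to a language-agnostic meaning; the definition text
-- is written in the word's own language.
CREATE TABLE word_meaning (
    word_id INTEGER NOT NULL REFERENCES word(id),
    meaning_id INTEGER NOT NULL REFERENCES meaning(id),
    definition TEXT,
    PRIMARY KEY (word_id, meaning_id)
);
"""


def build_demo_db():
    """Create an in-memory database with two meanings of 'desert'."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO meaning VALUES (?, ?)",
                    [(1, "arid region"), (2, "to abandon")])
    con.executemany("INSERT INTO word VALUES (?, ?, ?)",
                    [(1, "en", "desert"), (2, "es", "desierto"), (3, "de", "Wüste")])
    con.executemany("INSERT INTO word_meaning VALUES (?, ?, ?)", [
        (1, 1, "a dry, barren region with little rainfall"),
        (1, 2, "to abandon a person or a post"),
        (2, 1, "región árida con pocas lluvias"),
        (3, 1, "trockenes Gebiet mit wenig Niederschlag"),
    ])
    return con


def translations(con, lang, spelling):
    """Return (meaning_id, lang, spelling) for every other word sharing a meaning."""
    return con.execute("""
        SELECT src.meaning_id, w2.lang, w2.spelling
        FROM word w1
        JOIN word_meaning src ON src.word_id = w1.id
        JOIN word_meaning dst ON dst.meaning_id = src.meaning_id
        JOIN word w2 ON w2.id = dst.word_id
        WHERE w1.lang = ? AND w1.spelling = ? AND w2.id != w1.id
        ORDER BY src.meaning_id, w2.lang
    """, (lang, spelling)).fetchall()
```

With this layout, linking "desierto" and "Wüste" to meaning 1 is the only manual step; `translations(con, "es", "desierto")` then finds the English and German words without any Spanish-to-German or Spanish-to-English entry having been written.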
I believe this system can also be used to automate other word relations such as hyponyms and hypernyms, meronyms and holonyms, and others. It can also allow looking up words in other languages and getting definitions in the language of choice. In short, it would allow Wiktionary to more effortlessly function as a universal dictionary.
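A rough sketch of how such relations might be stored, again with made-up names (`hypernym`, `related_words`) and simplified so that each word has exactly one meaning: the relation is recorded once between two meanings, and hypernyms or hyponyms can then be listed in any language that has words attached to those meanings.

```python
import sqlite3


def build_relation_db():
    """In-memory demo: one hypernym relation between two meanings."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE meaning (id INTEGER PRIMARY KEY, note TEXT);
        -- Simplification for this sketch: one meaning per word.
        CREATE TABLE word (
            lang TEXT NOT NULL,
            spelling TEXT NOT NULL,
            meaning_id INTEGER NOT NULL REFERENCES meaning(id)
        );
        -- The relation lives between meanings, not between words.
        CREATE TABLE hypernym (
            narrower_meaning INTEGER NOT NULL REFERENCES meaning(id),
            broader_meaning INTEGER NOT NULL REFERENCES meaning(id)
        );
    """)
    con.executemany("INSERT INTO meaning VALUES (?, ?)",
                    [(1, "arid region"), (2, "area of land")])
    con.executemany("INSERT INTO word VALUES (?, ?, ?)", [
        ("en", "desert", 1), ("es", "desierto", 1), ("de", "Wüste", 1),
        ("en", "region", 2), ("es", "región", 2), ("de", "Region", 2),
    ])
    con.execute("INSERT INTO hypernym VALUES (1, 2)")
    return con


def related_words(con, lang, spelling, target_lang, inverse=False):
    """List hypernyms (or, with inverse=True, hyponyms) in target_lang."""
    # Column names come from this fixed pair, never from user input.
    if inverse:
        src, dst = "broader_meaning", "narrower_meaning"
    else:
        src, dst = "narrower_meaning", "broader_meaning"
    rows = con.execute(f"""
        SELECT w2.spelling
        FROM word w1
        JOIN hypernym h ON h.{src} = w1.meaning_id
        JOIN word w2 ON w2.meaning_id = h.{dst}
        WHERE w1.lang = ? AND w1.spelling = ? AND w2.lang = ?
        ORDER BY w2.spelling
    """, (lang, spelling, target_lang)).fetchall()
    return [row[0] for row in rows]
```

Here the single `hypernym` row is enough to answer both "what is a German hypernym of Spanish 'desierto'?" and, read in the other direction, "what is a Spanish hyponym of English 'region'?".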
Has something like this been suggested before? I would be pleased to receive feedback on this idea.
With kind regards, Wolter HV
Offhand, isn’t this something that Wikidata was set up to handle?
Hi!
On Wed, Jun 23, 2021 at 11:05 AM Wolter HV wolterhv@gmx.de wrote:
> [2021-06-20 17:43 +0100] John:
> > Offhand, isn’t this something that Wikidata was set up to handle?
> I'm not sure, but I don't see the functionality currently being there in Wiktionary. Is this something currently under development?
Yeah, for something like fifteen years, I guess… :-) See e.g. OmegaWiki (formerly known as WiktionaryZ).
The modern incarnation of a machine-readable dictionary is the Lexicographical Data project on Wikidata. It is a nice project and definitely worth a look, but it is not really an evolution or improvement of Wiktionary; it is rather a fresh start. (Among other reasons, because Wiktionary’s CC BY-SA license is incompatible with Wikidata’s CC0.) See https://www.wikidata.org/wiki/Wikidata:Lexicographical_data
-- [[cs:User:Mormegil | Petr Kadlec]]
[2021-06-23 10:56 +0100] petr:
> Hi!
Hi! Thanks for your reply.
> Yeah, for something like fifteen years, I guess… :-) See e.g. OmegaWiki (formerly known as WiktionaryZ).
OmegaWiki is, if not exactly what I was proposing, astoundingly close to it. It links words to meanings and automatically derives translations from those links, which is the main feature I was looking for. It also supports linking words to one another through relations such as hyponymy and hypernymy. I wonder why it isn't more popular.
> The modern incarnation of a machine-readable dictionary is the Lexicographical Data project on Wikidata. It is a nice project and definitely worth a look, but it is not really an evolution or improvement of Wiktionary; it is rather a fresh start. (Among other reasons, because Wiktionary’s CC BY-SA license is incompatible with Wikidata’s CC0.) See https://www.wikidata.org/wiki/Wikidata:Lexicographical_data
Thanks, this is interesting too, though this project doesn't seem to decouple meanings from words, so automatic translations don't work with it (as far as I could see from my short snoop-around).
I'll stick to OmegaWiki and hopefully add my grain of sand to it. Thanks for bringing that into the conversation!
Regards, Wolter HV
Hi,
On Sun, Jul 4, 2021 at 2:49 AM Wolter HV wolterhv@gmx.de wrote:
> Thanks, this is interesting too, though this project doesn't seem to decouple meanings from words, so automatic translations don't work with it (as far as I could see from my short snoop-around).
I’m not sure what you mean by “decouple meanings from words”. Sure, lexemes themselves do not have DefinedMeaning entries like OmegaWiki does, but note that this is part of Wikidata, and lexeme senses are linked to main-namespace Wikidata items using the “item for this sense” (P5137) property. See e.g. https://www.wikidata.org/wiki/Lexeme:L10984 linking the “point of entry to an enclosed space” sense to https://www.wikidata.org/wiki/Q53060 and also to the corresponding lexemes in other language(s), in this case https://www.wikidata.org/wiki/Lexeme:L406305#S1
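As a rough sketch of how that link could be followed programmatically, the snippet below builds a SPARQL query for the Wikidata Query Service that lists every lexeme with a sense pointing at a given item through P5137. The function names are made up for illustration, and the query is an untested-against-live-data sketch that only assumes the standard prefixes (`wd:`, `wdt:`, `ontolex:`, `wikibase:`) predefined by the query service; nothing here is taken from the Lexicographical Data documentation verbatim.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def senses_for_item_query(item_id):
    """Build a query for lexemes whose sense has 'item for this sense' = item_id."""
    # Accept only plain item IDs such as "Q53060" to keep the query well-formed.
    if not (item_id.startswith("Q") and item_id[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {item_id!r}")
    return f"""
SELECT ?lexeme ?lemma ?sense WHERE {{
  ?lexeme ontolex:sense ?sense ;
          wikibase:lemma ?lemma .
  ?sense wdt:P5137 wd:{item_id} .
}}
""".strip()


def query_url(item_id):
    """Return a GET URL that asks the endpoint for JSON results."""
    params = {"query": senses_for_item_query(item_id), "format": "json"}
    return WDQS_ENDPOINT + "?" + urlencode(params)
```

Calling `query_url("Q53060")` gives a URL that can be opened in a browser or fetched with any HTTP client; the lemmas returned are language-tagged, so lexemes in different languages that share the item act as translations of one another.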
HTH, -- [[cs:User:Mormegil | Petr Kadlec]]
[2021-06-20 17:39 +0100] Wolter HV:
> You can find the script attached. To experiment with this, simply run
> .read feature_showcase.sql
> within an interactive sqlite3 session. (There may be other ways of doing it but this is how I tested it.)
I found out, unsurprisingly, that my attachment didn't make it into the mailing list :D
Here is a pastebin link with the aforementioned sqlite3 SQL script:
https://paste.gnome.org/pca7e7y0v
Regards, Wolter HV
wikitech-l@lists.wikimedia.org