I recently read the French sentence « Je ne crois pas au développement des
sens. » (roughly: "I don’t believe senses will develop much") while
following links from a Wikidata Weekly Summary to the slides of a French
meeting about Wikidata lexicographical data. I do believe in it, regardless
of the arguments presented in the slides, and I am writing this email to try
to explain why.
I’m curious to know whether there is already some work on automatically
discovering lexicographical data / senses with the help of Wikidata
items.
There are tools for automatically tagging terms with the corresponding
Wikidata item; some have appeared on this mailing list and/or in the
Wikidata weekly summaries.
There are also methods that can discover senses in texts using only the
terms themselves, with no reference to any external « sense », like
https://towardsdatascience.com/word-embedding-with-word2vec-and-fasttext-a2…
and that can discriminate between several usages of the same word according
to the context.
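To make the idea of telling usages apart from context alone a bit more concrete, here is a minimal sketch. It is only a toy illustration: it uses plain co-occurrence counts and a greedy grouping rather than trained word2vec/fastText embeddings, and the stopword list, window size, threshold and example sentences are all assumptions of mine.

```python
from collections import Counter
from math import sqrt

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "of", "on", "at", "a", "was", "he", "she"}

def context_vector(tokens, target, window=3):
    """Count the non-stopword tokens within `window` positions of `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), i + window + 1
            counts.update(
                t for t in tokens[lo:hi] if t != target and t not in STOPWORDS
            )
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def group_usages(sentences, target, threshold=0.2):
    """Greedy grouping: a usage joins the first group whose accumulated
    context is similar enough, otherwise it starts a new group."""
    groups = []  # list of (accumulated context Counter, [sentence indices])
    for idx, sentence in enumerate(sentences):
        vec = context_vector(sentence.lower().split(), target)
        for ctx, members in groups:
            if cosine(vec, ctx) >= threshold:
                ctx.update(vec)
                members.append(idx)
                break
        else:
            groups.append((vec, [idx]))
    return [members for _, members in groups]

sentences = [
    "she sat on the bank of the river",
    "the river bank was muddy after the rain",
    "he deposited money at the bank",
    "the bank approved the money loan",
]
usage_groups = group_usages(sentences, "bank")
print(usage_groups)  # [[0, 1], [2, 3]]
```

On these four sentences the river usages and the money usages of "bank" end up in separate groups, without any reference to an external sense inventory. Real embeddings would of course generalise far better than raw counts.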
Wikidata lexicographical data and Wikibase items could close the loop
between the two methods and allow us to semi-automatically build tools that
annotate texts with Wikidata items if there is something relevant in
Wikidata and, if there is none, suggest adding data to Wikidata,
whether it’s a missing item or a missing sense for the term.
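The loop I have in mind could be sketched roughly as follows. Everything here is hypothetical: `LEXICON` is an in-memory stand-in for data that a real tool would fetch from Wikidata, the identifiers are placeholders rather than real Wikidata IDs, and the word-overlap score is just the simplest thing that works for a demonstration.

```python
from dataclasses import dataclass

@dataclass
class Sense:
    sense_id: str  # placeholder, not a real Wikidata sense ID
    item_id: str   # placeholder, not a real Wikidata item ID
    gloss: str

# Hypothetical stand-in for lexicographical data stored in Wikidata.
LEXICON = {
    "bank": [
        Sense("L-bank-S1", "Q-riverbank", "land alongside a river or lake"),
        Sense("L-bank-S2", "Q-financial-institution",
              "institution that holds money and grants loans"),
    ],
}

def annotate(term, context_words, min_overlap=1):
    """Return ("annotate", sense) when a known sense fits the context,
    otherwise a suggestion to add the missing data."""
    senses = LEXICON.get(term)
    if not senses:
        # Nothing at all is known about the term.
        return ("suggest_new_entry", None)

    context = set(context_words) - {term}

    def score(sense):
        # Number of context words that also occur in the sense gloss.
        return len(context & set(sense.gloss.split()))

    best = max(senses, key=score)
    if score(best) < min_overlap:
        # The term is known, but no stored sense matches this usage.
        return ("suggest_new_sense", None)
    return ("annotate", best)

known = annotate("bank", "he deposited money at the bank".split())
new_sense = annotate("bank", "the plane began to bank left".split())
new_entry = annotate("riverbed", "the dry riverbed".split())
```

Here `known` is annotated with the money-related sense, `new_sense` flags a usage that none of the stored senses covers, and `new_entry` flags a term that is missing altogether. The two suggestion cases are where user feedback would come in.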
It may even be possible to store word embeddings generated by word2vec-style
methods in Wikidata senses.
In conclusion, I think Wikidata senses will be used because they allow us to
close a gap. Their development does not depend only on strong involvement
from a traditional volunteer lexicographic community. If researchers from
the language community dive into this and develop algorithms and
easy-to-use tools to share their lexicographical data in Wikidata, there
could be a very positive feedback loop in which a lot of data ends up being
added to Wikidata: the stored data helps the algorithms enrich text
annotations, for example, and missing data is semi-automatically added
thanks to user feedback.
This is all just wishful thinking, but I thought it deserved to be
shared. Hopefully this will launch at least a thread of ideas/comments
here :)
Thomas