This is very interesting and definitely relevant to my question.
Finance aside, it may be worth trying to define in more depth what the
workflow and use cases would be. For people to validate data, they have to
want to … Why would they want to? It's not enough to build a random tool
and hope it will be used. I often find myself testing tools announced in
the Wikidata Weekly summary, but it rarely happens that one becomes
something I use often.
We're on a Wikimedia project, and one of the purposes shared by its
participants is to build an encyclopedia … One cool goal would be, for
example, a tool that analyses the text people produce and suggests
wikilinks. People could then give feedback by telling the tool « yes, this
is indeed the meaning of this word in my text » to validate the output, or
« yes, this is indeed a cool wikilink ». Just a thought.
In this example, those techniques would be used to analyse the produced
text and make suggestions. It could be done through an open API that
receives (wiki)text, returns annotated wikitext, and receives feedback from
users through the client UI, possibly writing positive feedback into
Wikidata when it's missing.
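A minimal sketch of what such a service's contract might look like (everything below, the toy lexicon, the `annotate`/`feedback` functions, and the payload fields, is invented for illustration; a real service would query Wikidata lexemes and items):

```python
# Sketch of a hypothetical wikitext-annotation service contract.
# The lexicon and all function/field names are invented for illustration;
# a real service would look terms up in Wikidata.

import re

# Toy term -> Wikidata item mapping (a real service would use lexemes/senses).
LEXICON = {
    "douglas adams": "Q42",
    "earth": "Q2",
}

def annotate(wikitext: str) -> dict:
    """Return the text together with wikilink suggestions for known terms."""
    suggestions = []
    for term, qid in LEXICON.items():
        for match in re.finditer(re.escape(term), wikitext, re.IGNORECASE):
            suggestions.append({
                "start": match.start(),
                "end": match.end(),
                "surface": match.group(0),
                "item": qid,
            })
    return {"wikitext": wikitext, "suggestions": suggestions}

def feedback(suggestion: dict, accepted: bool) -> dict:
    """Record a user's validation; a real service might write it to Wikidata."""
    return {"item": suggestion["item"], "accepted": accepted}

result = annotate("Douglas Adams wrote about the Earth.")
print([s["item"] for s in result["suggestions"]])
```

The client UI would render the suggestions as clickable wikilink proposals and send each accept/reject decision back through `feedback`.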
Such a service would be reusable for all kinds of use cases. Another could
be, for example, analysing the semi-structured descriptions on Commons to
suggest values for depicts claims.
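As a toy illustration of that second use case, a first depicts-suggestion pass could be as simple as matching description tokens against item labels (the label table and QIDs below are made up for the example; a real tool would use Wikidata labels and aliases, and handle multi-word labels):

```python
# Toy sketch: suggest "depicts" values from a free-text Commons description
# by matching tokens against a label -> item table. The table is invented
# for illustration; a real tool would query Wikidata labels and aliases.

LABELS = {"cat": "Q146", "bridge": "Q12280", "sunset": "Q166564"}

def suggest_depicts(description: str) -> list:
    """Return candidate item ids whose label appears in the description."""
    tokens = description.lower().replace(",", " ").split()
    return sorted({LABELS[t] for t in tokens if t in LABELS})

print(suggest_depicts("A cat sitting on a bridge at sunset"))
```

Each candidate would then go through the same human validation loop as the wikilink suggestions.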
On Fri, Sep 20, 2019 at 14:22, Houcemeddine A. Turki <
turkiabdelwaheb(a)hotmail.fr> wrote:
Dear all,
I thank you for your efforts. To learn more about word embeddings and
semantic similarity, please refer to our research group's survey on the
issue, available at
https://www.sciencedirect.com/science/article/pii/S0952197619301745. If
you would like us to work on using these techniques to enrich
Lexicographical Data on Wikidata, we would be honoured to do so. However,
we will face two main problems. The first one is, of course, funding; the
second one is that we need people to validate the information returned by
these two techniques and adjust it if needed.
Yours Sincerely,
Houcemeddine Turki (he/him)
Medical Student, Faculty of Medicine of Sfax, University of Sfax, Tunisia
Undergraduate Researcher, UR12SP36
GLAM, Research and Education Coordinator, Wikimedia TN User Group
Member, Wiki Project Med
Member, WikiIndaba Steering Committee
Member, Wikimedia and Library User Group Steering Committee
Co-Founder, WikiLingua Maghreb
____________________
+21629499418
-------- Original message --------
From: Thomas Douillard <thomas.douillard(a)gmail.com>
Date: 2019/09/20 12:08 (GMT+01:00)
To: "Discussion list for the Wikidata project." <
wikidata(a)lists.wikimedia.org>
Subject: [Wikidata] Lexical data and automated learning – a reply to « I
don't believe in the development of Wikidata senses »
I recently read the French sentence « Je ne crois pas au développement
des sens. » (translation: "I don't believe the senses will develop much"),
following links in a Wikidata Weekly summary to the slides of a French
meeting about Wikidata lexicographical data. I do believe in it, regardless
of the arguments presented in the slides, and I am writing this email to
try to explain why.
I'm curious to know whether there is already some work on the automated
discovery of lexicographical data / senses with the help of Wikidata items.
There are tools for the automated tagging of terms with the corresponding
Wikidata item, which have appeared on this mailing list and/or in the
Wikidata Weekly summaries.
There are also methods that can discover senses in texts using only the
terms themselves, with no reference to any external « sense », such as
https://towardsdatascience.com/word-embedding-with-word2vec-and-fasttext-a2…
and that can discriminate between several usages of the same word according
to the context.
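To make the context-discrimination idea concrete, here is a tiny self-contained sketch. It stands in for dense word2vec/fastText embeddings with simple bag-of-words context vectors, which is enough to show that two financial uses of « bank » sit closer together than either does to a river use (the sentences and helper functions are invented for the example):

```python
# Tiny illustration of context-based sense discrimination: occurrences of
# the same word in similar contexts get similar context vectors, so its
# senses separate. Real systems use dense embeddings (word2vec, fastText);
# sparse bag-of-words counts are used here for brevity.

from collections import Counter
import math

def context_vector(sentence: str, target: str) -> Counter:
    """Bag of surrounding words for one occurrence of `target`."""
    words = sentence.lower().split()
    return Counter(w for w in words if w != target)

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

uses = [
    "the bank approved my loan application",   # financial sense
    "the bank charged interest on the loan",   # financial sense
    "we sat on the grassy bank of the river",  # river sense
]
vectors = [context_vector(s, "bank") for s in uses]

# The two financial uses are closer to each other than to the river use.
print(cosine(vectors[0], vectors[1]) > cosine(vectors[0], vectors[2]))
```

Clustering such context vectors is what lets these methods induce senses without any external sense inventory; linking the induced clusters to Wikidata senses is the missing step discussed below.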
Wikidata lexicographical data and Wikibase items could close the loop
between the two methods and allow us to semi-automatically build tools that
annotate texts with Wikidata items if there is something relevant in
Wikidata and, if there is none, try to suggest adding data to Wikidata,
whether it's a missing item or a missing sense for the term.
It may even be possible to store word embeddings generated by
word2vec-style methods in Wikidata senses.
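Wikidata has no vector datatype today, so storing an embedding on a sense would need some serialization convention. The round-trip below is purely hypothetical and does not touch any real Wikidata API or property:

```python
# Purely hypothetical sketch: an embedding serialized as a rounded,
# comma-separated string that could be attached to a sense via some
# (invented) string-valued property. Only the round-trip is shown.

def encode_embedding(vec, precision=4) -> str:
    """Serialize a float vector as a fixed-precision string."""
    return ",".join(f"{x:.{precision}f}" for x in vec)

def decode_embedding(s: str) -> list:
    """Parse the string back into a float vector."""
    return [float(x) for x in s.split(",")]

vec = [0.25, -0.5, 1.0]
s = encode_embedding(vec)
print(s)
print(decode_embedding(s))
```

Whether such statements would be welcome on Wikidata is of course a community question, not a technical one.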
In conclusion, I think Wikidata senses will be used because they close a
gap. Their adoption does not depend only on strong involvement from a
traditional volunteer lexicographic community. If researchers from the
language community dive into this and develop algorithms and easy-to-use
tools to share their lexicographical data on Wikidata, there could be a
very positive feedback loop: a lot of data ends up being added to Wikidata,
the stored data helps the algorithms enrich text annotations, for example,
and missing data is semi-automatically added thanks to user feedback.
This is all just wishful thinking, but I thought it deserved to be shared;
hopefully it will at least launch a thread of ideas/comments here :)
Thomas