Re: [Wiki-research-l] Machine-utilizable Crowdsourced Lexicons

1 Jun 2018

Thomas,

I also wanted to briefly indicate how non-trivial some of these technical topics are:
for example, algorithmically determining which interpretation hypotheses are correct for
sentences, or whether one or more constituent elements of sentences are best interpreted
in ways not yet specified in a growing, dynamic lexicon.

The matter relates to language learning. There is the matter of encountering new lexemes,
lexemes with zero senses thus far in the lexicon, and then there is the matter of
encountering new senses of lexemes previously encountered.

My earlier comment was that software systems could signal machine-utilizable crowdsourced
lexicon services, in the case of certain events, so that users could utilize data to
prioritize collaborative work. I also theorize, as others do, that a viable concept of
sequencing work with respect to building natural language understanding systems and
lexicons is entering data in the order of reading level, from infancy to adult reading
level.

Building machine-utilizable crowdsourced lexicon software with rich, structured metadata
and with extensible storage slots for definitions in multiple knowledge representation
formats is a difficult task, one that makes possible other difficult tasks utilizing such
lexicons.
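
As a rough illustration of "extensible storage slots", a lexeme record might keep one
definition per knowledge representation format for each sense. This is only a sketch
under stated assumptions: the class names, field names and format labels ("fol", "owl")
are hypothetical and do not come from any existing lexicon software.

```python
from dataclasses import dataclass, field


@dataclass
class Sense:
    """One sense of a lexeme, with one definition slot per representation format."""
    sense_id: str
    gloss: str
    # Maps a format label (hypothetical, e.g. "fol" or "owl") to the
    # serialized definition in that format. New formats need no schema change.
    definitions: dict[str, str] = field(default_factory=dict)

    def add_definition(self, fmt: str, body: str) -> None:
        """Store or overwrite the definition held in the slot for `fmt`."""
        self.definitions[fmt] = body

    def formats(self) -> list[str]:
        """Return the format labels that currently have a definition."""
        return sorted(self.definitions)


@dataclass
class Lexeme:
    """A lexeme with free-form structured metadata and zero or more senses."""
    lexeme_id: str
    lemma: str
    language: str
    metadata: dict[str, str] = field(default_factory=dict)
    senses: list[Sense] = field(default_factory=list)

    def has_senses(self) -> bool:
        """False for a newly encountered lexeme with zero senses so far."""
        return bool(self.senses)
```

Keeping the slots in a dictionary keyed by format label means a new representation
format can be added later without changing the record layout.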

Thank you for the enjoyable brainstorming session and for indicating the state of the art
with regard to projects underway. I am interested in any of your thoughts, opinions and
ideas with respect to the future of machine-utilizable crowdsourced lexicons.

Best regards,

Adam

________________________________
From: Wiki-research-l <wiki-research-l-bounces(a)lists.wikimedia.org> on behalf of
Adam Sobieski <adamsobieski(a)hotmail.com>
Sent: Thursday, May 31, 2018 4:26:46 PM
To: Research into Wikimedia content and communities
Subject: Re: [Wiki-research-l] Machine-utilizable Crowdsourced Lexicons

Thomas,

Thank you for the exciting information with regard to the future of Wikidata lexemes. With
bulk upload and update capabilities, we might anticipate alignments and uploads from
projects on the scales of FrameNet, PropBank, VerbNet and WordNet.

With regard to crowdsourced lexicons containing machine-utilizable definitions, we can
consider a feature where, as software using the APIs for definitions finds that there
aren’t yet definitions for particular lexemes, counters can be accumulated such that
users can observe which lexemes’ definitions are in popular demand. This could be a means
of prioritizing which lexemes to rigorously define.
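
A minimal sketch of such a demand counter, in Python, might look like the following. The
class and method names are hypothetical, and a real service would persist the counts
rather than keep them in memory.

```python
from collections import Counter


class DefinitionDemand:
    """Counts API lookups that found no definition for a lexeme (hypothetical)."""

    def __init__(self) -> None:
        self._misses: Counter[str] = Counter()

    def record_miss(self, lexeme_id: str) -> None:
        """Call whenever client software asks for a definition that does not exist."""
        self._misses[lexeme_id] += 1

    def resolve(self, lexeme_id: str) -> None:
        """Drop the counter once a definition has been contributed."""
        self._misses.pop(lexeme_id, None)

    def most_wanted(self, n: int = 10) -> list[tuple[str, int]]:
        """Return the n lexemes whose definitions are most in demand."""
        return self._misses.most_common(n)
```

Contributors could then review the output of `most_wanted()` to decide which lexemes to
define first.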

We might envision natural language understanding, including semantic interpretation, of
children’s books in upcoming years.

Best regards,

Adam

________________________________
From: Wiki-research-l <wiki-research-l-bounces(a)lists.wikimedia.org> on behalf of
Thomas Pellissier Tanon <thomas(a)pellissier-tanon.fr>
Sent: Thursday, May 31, 2018 6:25:56 AM
To: Research into Wikimedia content and communities
Subject: Re: [Wiki-research-l] Machine-utilizable Crowdsourced Lexicons

...
> In addition to Web-based user interfaces for content editing, machine
> lexicons could support bulk APIs including those based on XML-RPC and
> SPARUL.

That is what is planned for Wikidata lexemes. There is already a REST API.
Example:
https://www.wikidata.org/wiki/Special:EntityData/L42.json
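
For illustration, a client could read that endpoint roughly as follows. This is a sketch
using only the Python standard library; the helper names and the User-Agent string are
hypothetical, and it assumes the response nests the lexeme under "entities" with a
"lemmas" map keyed by language code.

```python
import json
from urllib.request import Request, urlopen

ENTITY_DATA = "https://www.wikidata.org/wiki/Special:EntityData/{id}.json"


def entity_url(lexeme_id: str) -> str:
    """Build the Special:EntityData URL for a lexeme ID such as 'L42'."""
    return ENTITY_DATA.format(id=lexeme_id)


def fetch_lexeme(lexeme_id: str) -> dict:
    """Download one lexeme and return its entity record (needs network access)."""
    # A descriptive User-Agent is sent; the value here is a placeholder.
    request = Request(entity_url(lexeme_id),
                      headers={"User-Agent": "lexeme-sketch/0.1 (example)"})
    with urlopen(request) as response:
        data = json.load(response)
    # Assumed layout: {"entities": {"<id>": {...}}}
    return data["entities"][lexeme_id]


def lemmas(entity: dict) -> dict[str, str]:
    """Map each language code to its lemma string (assumed 'lemmas' layout)."""
    return {lang: item["value"] for lang, item in entity.get("lemmas", {}).items()}
```

For example, `lemmas(fetch_lexeme("L42"))` would return the lemma or lemmas recorded for
that lexeme, keyed by language code.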

We are currently working on an RDF output of the lexemes content using
Lemon/Ontolex [1]. It is planned to import this RDF representation into
https://query.wikidata.org in order to be able to execute SPARQL queries on
it.
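
Once the import is in place, a query for lexemes by lemma might be built as sketched
below. This is an assumption-laden example: the `wikibase:lemma` predicate and the
`ontolex:LexicalEntry` class are taken from the draft mapping [1] and may change, and the
function name is hypothetical.

```python
def lexemes_by_lemma_query(lemma: str, lang: str = "en", limit: int = 10) -> str:
    """Build a SPARQL query string that selects lexemes with the given lemma."""
    if not lang.replace("-", "").isalnum():
        raise ValueError("unexpected language tag: " + repr(lang))
    # Escape backslashes and double quotes so the lemma is a safe string literal.
    escaped = lemma.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>
PREFIX wikibase: <http://wikiba.se/ontology#>

SELECT ?lexeme ?lemma WHERE {{
  ?lexeme a ontolex:LexicalEntry ;
          wikibase:lemma ?lemma .
  FILTER(STR(?lemma) = "{escaped}" && LANG(?lemma) = "{lang}")
}}
LIMIT {int(limit)}
""".strip()
```

The resulting string could be pasted into the query service or sent to its SPARQL
endpoint by any HTTP client.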

Cheers,

Thomas

[1] https://mediawiki.org/wiki/Extension:WikibaseLexeme/RDF_mapping

On Thu, 31 May 2018 at 05:22, Adam Sobieski <adamsobieski(a)hotmail.com>
wrote:

> Micru,
> Finn,
>
> Thank you for the hyperlinks to the pertinent projects.
>
> I’m thinking that machine lexicon services could include URL-addressable:
> (1) headwords and lemmas, (2) conjugations and declensions, and (3)
> specific senses or definitions. Each conjugation or declension could have
> its own URL-addressable definitions. Machine-utilizable definitions are
> envisioned as existing in a number of machine-utilizable knowledge
> representation formats.
>
...
> In addition to Web-based user interfaces for content editing, machine
> lexicons could support bulk APIs including those based on XML-RPC and
> SPARUL. With regard to the use of SPARQL and SPARUL, there may already
> exist a suitable ontology. Some lexical ontologies include: Lemon (
> https://www.w3.org/2016/05/ontolex/), LexInfo (http://www.lexinfo.net/),
> LIR (http://mayor2.dia.fi.upm.es/oeg-upm/index.php/en/technologies/63-lir/),
> LMM (http://ontologydesignpatterns.org/wiki/Ontology:LMM), semiotics.owl (
> http://www.ontologydesignpatterns.org/cp/owl/semiotics.owl), and Senso
> Comune (http://www.sensocomune.it/). It should be possible to extend
> existing ontologies to include machine-utilizable definitions in a number
> of knowledge representation formats.
>
> I’m thinking about topics in knowledge representation with regard to the
> formal semantics of nouns, verbs, adjectives, adverbs, pronouns,
> prepositions and conjunctions and about how automated reasoners could make
> use of machine-utilizable definitions to obtain and compare semantic
> interpretations as software systems parse natural language.
>
>
> Best regards,
> Adam
>
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>
_______________________________________________
Wiki-research-l mailing list
Wiki-research-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l