Re: [Wikipedia-l] [Wikidata] Fwd: [Wikimedia-l] Wikipedia in an abstract language

15 Jan 2019

Cool, thanks! I read this a while ago and am rereading it now.

On Tue, Jan 15, 2019 at 3:28 AM Sebastian Hellmann <hellmann(a)informatik.uni-leipzig.de> wrote:

...
  Hi all,

 let me send you a paper from 2013 that might either help directly or at
 least provide some ideas...

 A lemon lexicon for DBpedia, Christina Unger, John McCrae, Sebastian
 Walter, Sara Winter, Philipp Cimiano, 2013, Proceedings of the 1st
 International Workshop on NLP and DBpedia, co-located with the 12th
 International Semantic Web Conference (ISWC 2013), October 21-25, Sydney,
 Australia

 https://github.com/ag-sc/lemon.dbpedia

 https://pdfs.semanticscholar.org/638e/b4959db792c94411339439013eef536fb052.…

 The mappings from DBpedia to Wikidata properties are available here:
 http://mappings.dbpedia.org/index.php?title=Special:AllPages&namespace=…
 e.g. http://mappings.dbpedia.org/index.php/OntologyProperty:BirthDate

 So you could directly use the DBpedia-lemon lexicalisation for Wikidata.

 The mappings can be downloaded with

 git clone https://github.com/dbpedia/extraction-framework ; cd extraction-framework/core ;
 ../run download-mappings
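 A minimal Python sketch of the idea, assuming a much simplified lexicon format: a Wikidata statement is routed through the DBpedia property mapping to a lexical entry keyed on the DBpedia property, and that entry supplies a sentence frame. The two mapping entries, the frames, `LexicalEntry`, and `verbalise` are hypothetical illustrations; real lemon entries are RDF and far richer (part of speech, inflected forms, syntactic frames), and the real mappings come from the download step.

```python
from dataclasses import dataclass

# Hypothetical excerpt of the DBpedia-to-Wikidata property mappings
# (assumed for illustration; real mappings come from the DBpedia mappings wiki).
WIKIDATA_TO_DBPEDIA = {
    "P569": "dbo:birthDate",
    "P19": "dbo:birthPlace",
}


@dataclass
class LexicalEntry:
    """Simplified stand-in for a lemon lexical entry: a sentence frame
    with slots for the subject and object of the property."""
    frame: str


# Hypothetical lexicon keyed on DBpedia properties.
DBPEDIA_LEXICON = {
    "dbo:birthDate": LexicalEntry("{subject} was born on {object}"),
    "dbo:birthPlace": LexicalEntry("{subject} was born in {object}"),
}


def verbalise(subject: str, wikidata_property: str, obj: str) -> str:
    """Render one Wikidata statement via the DBpedia mapping and lexicon."""
    dbpedia_property = WIKIDATA_TO_DBPEDIA.get(wikidata_property)
    if dbpedia_property is None:
        raise KeyError(f"no DBpedia mapping for {wikidata_property}")
    entry = DBPEDIA_LEXICON[dbpedia_property]
    return entry.frame.format(subject=subject, object=obj) + "."


print(verbalise("Ada Lovelace", "P569", "10 December 1815"))
print(verbalise("Ada Lovelace", "P19", "London"))
```

 The point of routing through the mapping is that the lexicon itself never has to be rewritten for Wikidata; only the property identifiers are translated.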

 All the best,

 Sebastian

 On 14.01.19 18:34, Denny Vrandečić wrote:

 Felipe,

 thanks for the kind words.

 There are a few research projects that use Wikidata to generate parts of
 Wikipedia articles; see, for example, https://arxiv.org/abs/1702.06235,
 whose output is almost as good as human-written text and beats templates
 by far, but only for the first sentence of biographies.

 Lucie Kaffee also has quite a body of research on that topic, and has
 worked very successfully and closely with some Wikipedia communities on
 these questions. Here's her bibliography:
 https://scholar.google.com/citations?user=xiuGTq0AAAAJ&hl=de

 Another project of hers is currently under review for a grant:

https://meta.wikimedia.org/wiki/Grants:Project/Scribe:_Supporting_Under-res…
 I would suggest taking a look and, if you are so inclined, expressing
 support. It is totally worth it!

 In my opinion, these projects are great starting points and should be
 done (low-hanging fruit and all that), but they won't get much further, at
 least for a while, mostly because Wikidata rarely offers more than a
 skeleton of content. A decent Wikipedia article includes much, much more
 content than what is represented in Wikidata, and if you use only that as
 input, you're limiting yourself too much.

 Here's a different approach, based on summarization over input sources:
 https://www.wired.com/story/using-artificial-intelligence-to-fix-wikipedias…
 This one looks more promising for the short to mid term.

 I still maintain that the Abstract Wikipedia approach has certain
 advantages over both learned approaches, and that it is the most closely
 aligned with Lucie's work. Machine-learned approaches always fall short on
 editability, because their solutions are black boxes.

 Also, I agree with Jeblad.

 That leaves the question of why there isn't more discussion. Maybe
 because there is nothing substantial to discuss yet :) The two white
 papers are rather high level and the idea is not yet concrete enough, so I
 wouldn't expect much on-wiki discussion yet. It was similar with Wikidata:
 the number of people who discussed Wikidata at this level of maturity was
 tiny; it increased considerably once an actual design plan was suggested,
 but still remained small, and then exploded once the system was deployed.
 I would be surprised and delighted if we managed to avoid this pattern
 this time, but I can't do more than publicly present the idea, announce
 plans once they exist, and hope for a timely discussion :)

 Cheers,
 Denny

 On Mon, Jan 14, 2019 at 2:54 AM John Erling Blad <jeblad(a)gmail.com> wrote:

  An additional note: what Wikipedia urgently needs is a way to create
 and reuse canned text (aka "templates"), and a way to adapt that text
 to data from Wikidata. That is mostly just inflection rules, but in
 some cases it involves grammar rules. Creating larger pieces of text
 is much harder, especially if the text is supposed to be readable.
 Jumbling sentences together, as various bot scripts commonly do, does
 not work very well; or rather, it does not work at all.
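 As a rough illustration of canned text adapted to data, here is a Python sketch in which a fixed sentence template is filled from a record standing in for Wikidata values, with two small English agreement rules (indefinite article and singular/plural). The template, the helper names, and the example items are invented for illustration; the article rule is a spelling-based approximation, and every other language would need its own inflection and grammar rules.

```python
def indefinite_article(noun: str) -> str:
    """Pick 'a' or 'an' from the noun's first letter.

    A spelling-based approximation: English really decides by sound
    ("an hour", "a university"), the kind of rule a per-language
    template system has to encode.
    """
    return "an" if noun[:1].lower() in "aeiou" else "a"


def count_noun(count: int, singular: str, plural: str) -> str:
    """Make a noun agree with a number, with thousands separators."""
    return f"{count:,} {singular if count == 1 else plural}"


# Canned text with slots; in practice the slot values would come from Wikidata.
TEMPLATE = "{name} is {article} {kind} in {country}. It has {population}."


def render(item: dict) -> str:
    """Fill the template, applying the agreement rules to the data."""
    return TEMPLATE.format(
        name=item["name"],
        article=indefinite_article(item["kind"]),
        kind=item["kind"],
        country=item["country"],
        population=count_noun(item["population"], "inhabitant", "inhabitants"),
    )


# Invented example items (not real Wikidata data).
print(render({"name": "Exampleholm", "kind": "island",
              "country": "Norway", "population": 1}))
print(render({"name": "Examplevik", "kind": "village",
              "country": "Norway", "population": 1234}))
```

 Each slot already needs its own agreement rule; composing many such sentences into a readable article is the much harder problem the note describes.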

 On Mon, Jan 14, 2019 at 11:44 AM John Erling Blad <jeblad(a)gmail.com> wrote:

 Using an abstract language as a basis for translation has been tried
 before, and it is almost as hard as translating between two common
 languages.

 There are two really hard problems: implied references and cultural
 context. An artificial language can get rid of the implied references,
 but it tends to create very weird and unnatural expressions. If the
 cultural context is removed, it can be extremely hard to put it back in,
 and without any cultural context it can be hard to explain anything.

 So yes, you can make an abstract language, but it won't give you
 high-quality prose.

 On Mon, Jan 14, 2019 at 8:09 AM Felipe Schenone <schenonef(a)gmail.com> wrote:
 >
 > This is quite an awesome idea. But thinking about it, wouldn't it be
 > possible to use the structured data in Wikidata to generate articles?
 > Can't we skip the need to learn an abstract language by using Wikidata?
 >
 > Also, is there any discussion about this idea in the Wikimedia wikis?
 > I haven't found any...
  >
 > On Sat, Sep 29, 2018 at 3:44 PM Pine W <wiki.pine(a)gmail.com> wrote:
 >>
 >> Forwarding because this (ambitious!) proposal may be of interest to
 >> people on other lists. I'm not endorsing the proposal at this time,
 >> but I'm curious about it.
 >>
 >> Pine
 >> ( https://meta.wikimedia.org/wiki/User:Pine )
 >>
 >>
 >> ---------- Forwarded message ---------
 >> From: Denny Vrandečić <vrandecic(a)gmail.com>
 >> Date: Sat, Sep 29, 2018 at 6:32 PM
 >> Subject: [Wikimedia-l] Wikipedia in an abstract language
 >> To: Wikimedia Mailing List <wikimedia-l(a)lists.wikimedia.org>
 >>
 >>
 >> Semantic Web languages allow us to express ontologies and knowledge
 >> bases in a way meant to be particularly amenable to the Web. Ontologies
 >> formalize the shared understanding of a domain. But the most expressive
 >> and widespread languages that we know of are human natural languages,
 >> and the largest knowledge base we have is the wealth of text written in
 >> human languages.
 >>
 >> We look for a path to bridge the gap between knowledge representation
 >> languages such as OWL and human natural languages such as English. We
 >> propose a project that simultaneously exposes that gap, allows
 >> collaboration on closing it, makes progress widely visible, and is
 >> highly attractive and valuable in its own right: a Wikipedia written in
 >> an abstract language to be rendered into any natural language on
 >> request. This would make current Wikipedia editors about 100x more
 >> productive and increase the content of Wikipedia by 10x. For billions of
 >> users this will unlock knowledge they currently do not have access to.
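The core idea of the proposal, content stored once in an abstract form and rendered into each language by language-specific code, can be sketched as a toy in Python. Everything here is a hypothetical illustration, not taken from the linked paper: the `InstanceOf` constructor, the lexicons, and the renderer functions are assumptions chosen to show why each language needs its own grammatical information (German needs noun gender to pick the article).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceOf:
    """Language-independent content: '<subject> is an instance of <cls>'."""
    subject: str  # key into the per-language lexicons below
    cls: str


# Hypothetical per-language lexicons: surface forms plus the grammatical
# information each renderer needs.
LEXICON = {
    "en": {"berlin": {"form": "Berlin"}, "city": {"form": "city"}},
    "de": {"berlin": {"form": "Berlin"},
           "city": {"form": "Stadt", "gender": "f"}},
}

# German nominative indefinite articles by grammatical gender.
GERMAN_INDEF_ARTICLE = {"m": "ein", "f": "eine", "n": "ein"}


def render_en(c: InstanceOf) -> str:
    lex = LEXICON["en"]
    noun = lex[c.cls]["form"]
    article = "an" if noun[0] in "aeiou" else "a"  # spelling-based heuristic
    return f"{lex[c.subject]['form']} is {article} {noun}."


def render_de(c: InstanceOf) -> str:
    lex = LEXICON["de"]
    entry = lex[c.cls]
    article = GERMAN_INDEF_ARTICLE[entry["gender"]]
    return f"{lex[c.subject]['form']} ist {article} {entry['form']}."


RENDERERS = {"en": render_en, "de": render_de}


def render(content: InstanceOf, language: str) -> str:
    """Render one piece of abstract content into the requested language."""
    return RENDERERS[language](content)


fact = InstanceOf("berlin", "city")
print(render(fact, "en"))
print(render(fact, "de"))
```

The design choice this illustrates: the content is written once, while everything language-specific (word forms, gender, article choice) lives in per-language lexicons and renderers.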
 >>
 >> My first talk on this topic will be on October 10, 2018, 16:45-17:00, at
 >> Asilomar in Monterey, CA, during the Blue Sky track of ISWC. My second,
 >> longer talk on the topic will be at the DL workshop in Tempe, AZ, October
 >> 27-29. Comments are very welcome as I prepare the slides and the talks.
  >>
 >> Link to the paper: http://simia.net/download/abstractwikipedia.pdf
 >>
 >> Cheers,
 >> Denny
 >> _______________________________________________
 >> Wikimedia-l mailing list, guidelines at:
 >> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
 >> https://meta.wikimedia.org/wiki/Wikimedia-l
 >> New messages to: Wikimedia-l(a)lists.wikimedia.org
 >> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
 >> <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>
 >> _______________________________________________
 >> Wikipedia-l mailing list
 >> Wikipedia-l(a)lists.wikimedia.org
 >> https://lists.wikimedia.org/mailman/listinfo/wikipedia-l
 >
 > _______________________________________________
 > Wikidata mailing list
 > Wikidata(a)lists.wikimedia.org
 > https://lists.wikimedia.org/mailman/listinfo/wikidata 

 --
 All the best,
 Sebastian Hellmann

 Director of Knowledge Integration and Linked Data Technologies (KILT)
 Competence Center
 at the Institute for Applied Informatics (InfAI) at Leipzig University
 Executive Director of the DBpedia Association
 Projects: http://dbpedia.org, http://nlp2rdf.org,
 http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
 Homepage: http://aksw.org/SebastianHellmann
 Research Group: http://aksw.org