Forwarding because this (ambitious!) proposal may be of interest to people on other lists. I'm not endorsing the proposal at this time, but I'm curious about it.
Pine ( https://meta.wikimedia.org/wiki/User:Pine )
---------- Forwarded message ---------
From: Denny Vrandečić vrandecic@gmail.com
Date: Sat, Sep 29, 2018 at 6:32 PM
Subject: [Wikimedia-l] Wikipedia in an abstract language
To: Wikimedia Mailing List wikimedia-l@lists.wikimedia.org
Semantic Web languages allow us to express ontologies and knowledge bases in a way meant to be particularly amenable to the Web. Ontologies formalize the shared understanding of a domain. But the most expressive and widespread languages that we know of are human natural languages, and the largest knowledge base we have is the wealth of text written in human languages.
We look for a path to bridge the gap between knowledge representation languages such as OWL and human natural languages such as English. We propose a project that simultaneously exposes that gap, allows collaboration on closing it, makes progress widely visible, and is highly attractive and valuable in its own right: a Wikipedia written in an abstract language to be rendered into any natural language on request. This would make current Wikipedia editors about 100x more productive and increase the content of Wikipedia by 10x. For billions of users this will unlock knowledge they currently do not have access to.
My first talk on this topic will be on October 10, 2018, 16:45-17:00, at the Asilomar in Monterey, CA during the Blue Sky track of ISWC. My second, longer talk on the topic will be at the DL workshop in Tempe, AZ, October 27-29. Comments are very welcome as I prepare the slides and the talk.
Link to the paper: http://simia.net/download/abstractwikipedia.pdf
Cheers, Denny
Wow! Please share the slides or the video! I'm interested too.
L.
Denny's project is very interesting.
We already have Wikidata and Magnus Manske's autodesc, which can create paragraph-length natural language for some types of items.
Example:
https://tools.wmflabs.org/autodesc/?q=Q18618629&lang=&mode=long&...
"""Denny Vrandečić is a Croatia researcher, programmer, and computer scientist. He was born on February 27, 1978 in Stuttgart. He studied at Karlsruhe Institute of Technology from October 2004 until June 2010, University of Stuttgart from September 1998 until February 2004, University of Stuttgart from September 1997 until February 2004, and Geschwister-Scholl-Gymnasium. He worked for Google from October 2013, for Wikimedia Deutschland from March 2012 until September 2013, and for Karlsruhe Institute of Technology from 2004 until 2012."""
Currently the tool seems to support English, French, and Dutch.
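To get a feel for the tool, something like this could fetch such a description programmatically. This is only a sketch: it relies on nothing beyond the query parameters visible in the example URL above (q, lang, mode), and it simply prints whatever page the tool renders.

# Sketch: fetch a long, English autodesc description for a Wikidata item.
import requests

def fetch_autodesc(qid, lang="en", mode="long"):
    response = requests.get(
        "https://tools.wmflabs.org/autodesc/",
        params={"q": qid, "lang": lang, "mode": mode},
    )
    response.raise_for_status()
    return response.text

print(fetch_autodesc("Q18618629"))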
I think Magnus Manske would accept pull requests for other languages at https://bitbucket.org/magnusmanske/reasonator/src/9c58fadb7b72a791142fc158ae...
So how would we go beyond Magnus? Would the Wikidata representation suffice? I have seen Q50827579 and Q28819478 for Wikidata-to-language generation, but I am not aware of running applications, and I do not know whether they are better than Magnus's hard-coded approach.
I have been experimenting a bit in the other direction. Ordia can go from natural language to Wikidata lexemes (here for a single Danish example):
>>> from ordia.base import Base
>>> base = Base()
>>> base.words_to_form_ids('der kom en soldat marcherende henad landevejen'.split(), language='da')
[['L3064-F1'], ['L3065-F3', 'L3065-F6'], ['L2022-F1', 'L3073-F3'], ['L3074-F1'], ['L3075-F5'], ['L3215-F1'], ['L3216-F2']]
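Going back the other way – from form IDs to their written representations – is also reasonably direct against the standard API. A sketch, assuming the usual wbgetentities JSON layout for lexemes (a "forms" list whose entries carry "representations" keyed by language code):

# Sketch: look up the Danish representation of a lexeme form such as "L3064-F1".
import requests

def form_representation(form_id, language="da"):
    lexeme_id = form_id.split("-")[0]
    data = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={"action": "wbgetentities", "ids": lexeme_id, "format": "json"},
    ).json()
    for form in data["entities"][lexeme_id]["forms"]:
        if form["id"] == form_id:
            return form["representations"][language]["value"]
    return None

# Should print "der", the first word of the example sentence above.
print(form_representation("L3064-F1"))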
Writing the encyclopedic text in "Wikidata-lexemesh" could perhaps ease translation, particularly after 18 October when senses are planned to be enabled.
/Finn
This is quite an awesome idea. But thinking about it, wouldn't it be possible to use structured data in Wikidata to generate articles? Can't we skip the need to learn an abstract language by using Wikidata?
Also, is there discussion about this idea anywhere in the Wikimedia wikis? I haven't found any...
Using an abstract language as a basis for translation has been tried before, and it is almost as hard as translating between two natural languages.
There are two really hard problems: implied references and cultural context. An artificial language can get rid of implied references, but it tends to create very weird and unnatural expressions. If the cultural context is removed, it can be extremely hard to put back in, and without any cultural context it can be hard to explain anything.
But yes, you can make an abstract language, but it won't give you any high-quality prose.
An additional note: what Wikipedia urgently needs is a way to create and reuse canned text (aka "templates"), and a way to adapt that text to data from Wikidata. That is mostly just inflection rules, but in some cases it involves grammar rules. Creating larger pieces of text is much harder, especially if the text is supposed to be readable. Jumbling sentences together, as is commonly done by various bot scripts, does not work very well – or rather, it does not work at all.
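As a toy illustration of what such canned text with a simple agreement rule could look like – a sketch only: the template wording, the data layout, and the item values are invented for the example, and none of this is an existing MediaWiki facility:

# Sketch: one canned-text template per language, adapted to item data with
# a minimal inflection rule (singular/plural agreement on a count).
def render_population(item, lang="en"):
    templates = {
        "en": "{label} is a {kind} in {country} with {population} {unit}.",
    }
    unit = "inhabitant" if item["population"] == 1 else "inhabitants"
    return templates[lang].format(unit=unit, **item)

# Invented example data, standing in for values that would come from Wikidata.
print(render_population({
    "label": "Exampletown",
    "kind": "village",
    "country": "Denmark",
    "population": 1234,
}))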
Felipe,
thanks for the kind words.
There are a few research projects that use Wikidata to generate parts of Wikipedia articles - see for example https://arxiv.org/abs/1702.06235, which achieves results almost as good as human text and beats templates by far, but only for the first sentence of biographies.
Lucie Kaffee also has quite a body of research on that topic, and has worked very successfully and closely with some Wikipedia communities on these questions. Here's her bibliography: https://scholar.google.com/citations?user=xiuGTq0AAAAJ&hl=de
Another project of hers is currently under review for a grant: https://meta.wikimedia.org/wiki/Grants:Project/Scribe:_Supporting_Under-reso... - I would suggest taking a look and, if you are so inclined, expressing support. It is totally worth it!
My opinion is that these projects are great for starters, and should be done (low-hanging fruits and all that), but won't get much further at least for a while, mostly because Wikidata rarely offers more than a skeleton of content. A decent Wikipedia article will include much, much more content than what is represented in Wikidata. And if you only use that for input, you're limiting yourself too much.
Here's a different approach based on summarization over input sources: https://www.wired.com/story/using-artificial-intelligence-to-fix-wikipedias-... - this one is more promising for the short to mid term.
I still maintain that the Abstract Wikipedia approach has certain advantages over both learned approaches, and it is most aligned with Lucie's work. The machine-learned approaches always fall short on the dimension of editability, due to the black-box nature of their solutions.
Also, I agree with Jeblad.
That leaves the question: why is there not more discussion? Maybe because there is nothing substantial to discuss yet :) The two white papers are rather high level and the idea is not yet concrete enough, so I wouldn't expect much discussion to be going on on-wiki. It was similar with Wikidata - the number of people who discussed Wikidata at this level of maturity was tiny; it increased considerably once an actual design plan was suggested, but still remained small - and then exploded once the system was deployed. I would be surprised and delighted if we managed to avoid this pattern this time, but I can't do more than publicly present the idea, announce plans once they are there, and hope for a timely discussion :)
Cheers, Denny
Hi all,
let me send you a paper from 2013, which might either help directly or at least give you some ideas...
A lemon lexicon for DBpedia, Christina Unger, John McCrae, Sebastian Walter, Sara Winter, Philipp Cimiano, 2013, Proceedings of 1st International Workshop on NLP and DBpedia, co-located with the 12th International Semantic Web Conference (ISWC 2013), October 21-25, Sydney, Australia
https://github.com/ag-sc/lemon.dbpedia https://pdfs.semanticscholar.org/638e/b4959db792c94411339439013eef536fb052.p...
Since the mappings from DBpedia to Wikidata properties are here: http://mappings.dbpedia.org/index.php?title=Special:AllPages&namespace=2... e.g. http://mappings.dbpedia.org/index.php/OntologyProperty:BirthDate
You could directly use the DBpedia-lemon lexicalisation for Wikidata.
The mappings can be downloaded with:

git clone https://github.com/dbpedia/extraction-framework ; cd core ; ../run download-mappings
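To make the reuse idea concrete, here is a rough sketch of carrying a DBpedia lexicalisation over to the corresponding Wikidata property via such a mapping. The in-memory layout and the template string are stand-ins invented for illustration; the real lemon lexicon is RDF, and the real property mappings live on mappings.dbpedia.org.

# Sketch: hand-written stand-ins for a lemon lexicalisation and a
# DBpedia-to-Wikidata property mapping, combined so the verbalisation can
# be reused for the Wikidata property.
dbpedia_lexicalisations = {
    "dbo:birthDate": "{subject} was born on {value}.",
}
dbpedia_to_wikidata = {
    "dbo:birthDate": "P569",  # date of birth
}
wikidata_lexicalisations = {
    dbpedia_to_wikidata[prop]: template
    for prop, template in dbpedia_lexicalisations.items()
}
print(wikidata_lexicalisations["P569"].format(
    subject="Denny Vrandečić", value="27 February 1978"))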
All the best,
Sebastian
Cool, thanks! I read this a while ago, rereading again.
I have tried a couple of times to rewrite this, but it grows out of bounds anyhow. It seems to have a life of its own.
There is a book from 2000 by Robert Dale and Ehud Reiter: Building Natural Language Generation Systems, ISBN 978-0-521-02451-8.
Wikibase items can be rebuilt as Plans from the type statement (top-down) or as Constituents from the other statements (bottom-up). The two models do not necessarily agree. This is, however, only the overall document structure and organization of the data; it leaves out the really hard part – the language-specific realization.
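A rough sketch of the two directions, with the statement layout simplified to a flat dictionary and the plan contents invented for the example (real Wikibase statements are much richer):

# Sketch: a Plan derived top-down from the "instance of" (P31) statement,
# and Constituents derived bottom-up from the remaining statements.
def plan_from_type(statements, plans_by_type):
    # top-down: the item's type selects a document plan
    return plans_by_type.get(statements.get("P31"), ["label", "description"])

def constituents_from_statements(statements):
    # bottom-up: every other statement becomes a candidate constituent
    return [prop for prop in statements if prop != "P31"]

# P31 = instance of, Q23397 = lake, P2043 = length, P2049 = width.
statements = {"P31": "Q23397", "P2043": "3.2 kilometre", "P2049": "1.1 kilometre"}
plans_by_type = {"Q23397": ["label", "location", "dimensions"]}

print(plan_from_type(statements, plans_by_type))  # ['label', 'location', 'dimensions']
print(constituents_from_statements(statements))   # ['P2043', 'P2049']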
You can probably redefine Plans and Constituents as entities (I have toyed around with them as Lua classes) and put them into Wikidata. The easiest way to reuse them locally would be to use a lookup structure for fully or partly canned text, and to define rules for agreement and inflection as part of these texts. Piecing together canned text is hard, but easier than building full prose from the bottom up. It is possible to define a very low-level realization for some languages, but that is a lot harder.
The idea for looking up canned text is to use the text that covers most of the available statements, but still such that most of the remaining statements can also be covered. That is, some canned text might not support a specific agreement rule, so some other canned text cannot reference it and less coverage is achieved. For example, if the direction to the sea cannot be expressed in a canned text for Finnish, then the distance cannot reference the direction.
To get around this I prioritized Plans and Constituents, with those having higher priority being put first. What a person is known for should go in front of their other work. I ordered the Plans and Constituents chronologically to maintain causality; this can also be called sorting. Priority tends to influence plans, and order influences constituents. Then there is grouping, which keeps some statements together. Length, width, and height are typically a group.
A lake can be described with individual canned texts for length, width, and height, but those are given low priority. Then a canned text can be made for length and height, with somewhat higher priority. An even higher priority can be given to a canned text for all three. Given that all three statements are available, the composite canned text for all of them will be used. If only some of them exist, then lower-priority canned texts will be used.
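A minimal sketch of that selection logic: pick the highest-priority canned text whose required statements are all present, and keep going with whatever statements remain uncovered. The statement keys and priorities are invented for the lake example above.

# Sketch: coverage-based selection of canned texts. Each candidate declares
# which statement keys it needs and a priority; the best applicable one is
# chosen repeatedly until no further statements can be covered.
def select_canned_texts(available, candidates):
    chosen, remaining = [], set(available)
    while True:
        usable = [c for c in candidates if c["needs"] <= remaining]
        if not usable:
            return chosen
        best = max(usable, key=lambda c: c["priority"])
        chosen.append(best["name"])
        remaining -= best["needs"]

candidates = [
    {"name": "length",              "needs": {"length"},                    "priority": 1},
    {"name": "width",               "needs": {"width"},                     "priority": 1},
    {"name": "height",              "needs": {"height"},                    "priority": 1},
    {"name": "length+height",       "needs": {"length", "height"},          "priority": 2},
    {"name": "length+width+height", "needs": {"length", "width", "height"}, "priority": 3},
]
print(select_canned_texts({"length", "width", "height"}, candidates))  # ['length+width+height']
print(select_canned_texts({"length", "width"}, candidates))            # ['length', 'width']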
Note that the book uses "canned text" a little differently.
Also note that the canned texts can be translated as ordinary message strings. They can also be defined as a kind of entity in Wikidata. As ordinary message strings they need additional data, but that comes naturally with entities in Wikidata. My doodling put them inside each Wikipedia, as that would make them easier to reuse from Lua modules. (And yes, you can then override part of the ArticlePlaceholder to show the text on the special page.)