August 2020 - Abstract-Wikipedia - lists.wikimedia.org

Re: [Abstract-wikipedia] Loose notes
by Grounder UK 05 Aug '20

Just one huge Thank You for Ordia, Finn Årup Nielsen! It's really coming along nicely now that we have so many more Lexemes. You are quite right, of course; we're not quite up to 325,000. I overlooked the possibility of a Lexeme having multiple lemmas. A few have as many as six, it seems! Sorry for that slight overstatement. I hope you didn't think you had lost some. While I'm apologizing, it seems that I got the link to your aclweb anthology paper wrong when I included it earlier! (It should be "2020.ldl", not "2020.idl", of course.) Sorry for that, too. I assume that https://www.aclweb.org/anthology/2020.ldl-1.12.pdf [corrected link] is identical to https://people.compute.dtu.dk/faan/ps/Nielsen2020Lexemes.pdf. Thank you again for your great work. I hope my mistakes did not inconvenience you too much. Best regards, Al.

On Tuesday, 4 August 2020, Andy <borucki.andrzej(a)gmail.com> wrote: > Is there any road map on https://meta.wikimedia.org/ with estimated dates for Abstract Wikipedia? > > On Mon, 3 Aug 2020 at 18:43, Grounder UK <grounderuk(a)gmail.com> wrote: > > Plenty more work to be done!

Re: [Abstract-wikipedia] Loose notes
by Grounder UK 04 Aug '20

Andrzej, yes, there are over 325,000 lexemes in Wikidata now, over 40,000 of them for English. "Abstract" definitions are a little tricky, but it is not Lexemes themselves that are defined, it is their Senses, and Senses can be linked to Wikidata Items, which connects Lexemes into the abstract graph of "knowledge". Translations are still very incomplete but, as with definitions, it is the Sense that should have the translation. The difficulty is that translation cannot imply identity, which means that you cannot assume that a Sense-to-Sense translation allows you to acquire translations from the Sense you translate into. If you think of each Sense as a set, you cannot tell whether the translated Sense is a subset or a superset. What we need for that is the concept of the intersection between the two sets, which would be part of each Sense but not necessarily the whole of either Sense. So, broadly, your example of "zamek" is not a problem; you can connect the "lock" Sense to the Sense of the English word "lock" (L1132-S1) as well as to the identifier for the encyclopedic concept Q228039 and/or Q24644118 (claimed to be a subclass of Q228039). But you should not connect it to L1132-S2 (which connects to Q105731, pl: "Śluza wodna", a canal lock) or to L1132-S3 (Q1134386, pl: "Zamek (broń)", a firearm lock, assuming that's a different Sense of "zamek" too). (I say this without knowing enough Polish to know if it makes sense; I'm living in Searle's Chiński pokój, the Chinese room!)[1] I don't know whether the lexical data is in the dumps now, but it will be pretty huge just by itself. It is also quite dependent on the main Wikidata pages. For our natural-language generation, that's a great strength, because we can move naturally from the concept to the word and related vocabulary in any language without doing any translation. The extra context we need to be able to choose the right Form of the Lexeme for the Sense...
that will need more work on the data, as will characterising thesaurus relations (hypernymy, synonymy, hyponymy, antonymy, etc.) so that good alternative Lexemes can be found. In an "abstract" context, these can be thought of as "translations" into overlapping Senses, but the extent to which we represent and consult (or navigate within) the broader compound Sense domain (the set union of the Senses) is... an interesting challenge. As for a fully "abstract" dictionary that can be read in any language... We'll be better able to think about that once we have built a few renderers for our "abstract" encyclopedic content, in my view. Machine translation and natural-language understanding are not our primary goal. I think we will make progress on both, if we remember to pay attention to inverse functions as we evolve our NLG renderers, but we have a very long way to go in all directions (and all languages). Best regards, Al. [1] https://pl.wikipedia.org/wiki/Chi%C5%84ski_pok%C3%B3j
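Al's set picture of Senses can be made concrete with a toy sketch. It assumes, purely for illustration, that a Sense is a plain set of the things it can denote; the Sense names "A", "B", "C" and members "r1" to "r4" are hypothetical placeholders, not Wikidata data. The sketch shows why two overlapping translation links do not imply a third, and what the "compound Sense domain" (the union) looks like.

```python
# Toy model (hypothetical data): a Sense is the set of referents it can denote.
SENSES = {
    "A": {"r1", "r2"},
    "B": {"r2", "r3"},
    "C": {"r3", "r4"},
}


def shared_meaning(x: str, y: str) -> set:
    """The intersection: the part of meaning two Senses actually share."""
    return SENSES[x] & SENSES[y]


def is_translation(x: str, y: str) -> bool:
    """Treat two Senses as translations if their meanings overlap at all."""
    return bool(shared_meaning(x, y))


def compound_domain(*names: str) -> set:
    """The broader compound Sense domain: the union of the given Senses."""
    return set().union(*(SENSES[n] for n in names))


if __name__ == "__main__":
    print(is_translation("A", "B"), is_translation("B", "C"))  # True True
    print(is_translation("A", "C"))  # False: translation links do not chain
    print(sorted(compound_domain("A", "B", "C")))  # ['r1', 'r2', 'r3', 'r4']
```

Modelling the overlap explicitly (rather than a yes/no translation link) is what would let a renderer tell a subset from a superset.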

Re: [Abstract-wikipedia] Loose notes
by Grounder UK 04 Aug '20

Hi, Andrzej. The assumption at the moment is, I think, that we will be using the Wikidata lexicographical data [1]. This is not yet as extensive as Wiktionary data [2], but it addresses many of the integrity issues. As far as I understand it, the modelling of Sense still suffers from the flaw that a Sense is presented as a "child" of a Lexeme. So, for example, L1883-S1 is a Sense of Lexeme L1883, representing the English verb "to be", with a gloss of "exist" and a "synonym" relationship to L2148-S1, a Sense of Lexeme L2148, representing the English verb "to exist". I could be wrong, but the simple idea of a word-free Sense to which all languages can link is implemented only through a possible link to a concrete Wikidata Item, so both L1883-S1 and L2148-S1 are linked to Q468777 (existence) and Q203872 (being). Apart from that, the intent at present seems to be a separate translation of each Sense into each corresponding Sense in each language. Wikidata also has Forms of Lexemes (but I didn't find "widziałem"). The Lexeme L185 ("see") has a Form L185-F3 ("saw"), but this has no link to Form L18498-F1, the uninflected form of the verb "to saw" (unlike Wiktionary, which supports homographs implicitly). Each Form has "grammatical features", showing that L185-F3 is the "simple past" of L185 but the same string, "saw", is the "simple present" of L18498. It does not explicitly say that this is not the case in the third person singular, but there is a different Form, L18498-F2, which is both "simple present" and "third-person singular", so there may be a presumption that the more particular overrides the more general. For "abstract" Senses, we could think of "abstract" as a new language, and then have translations between the "abstract" "language" and Senses in all natural (and synthetic) languages. This would give you your "senses dictionary" (and allow implied translations between any Senses linked to the "abstract" Sense).
When we need to generate a word in a particular language, we would need to translate the "abstract" Sense to the target-language Lexeme and then consult the Forms of that Lexeme to identify which ones are applicable, given the "grammatical features" of the context. Plenty more work to be done! Best regards, Al. [1] https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/Documentation [2] https://www.aclweb.org/anthology/2020.idl-1.12.pdf
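The two-step lookup Al describes (abstract Sense to target-language Lexeme, then Form by grammatical features) can be sketched as below. This is a hypothetical, simplified model, not the Wikidata data model: the sense key "see", the feature labels, and the lexicon contents are illustrative assumptions. It applies the presumption that "the more particular overrides the more general" by choosing the applicable Form with the most features, and shows why the abstract context needs gender for Polish ("widziałem" masculine, "widziałam" feminine) even though English ignores it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Form:
    representation: str
    features: frozenset  # simplified, hypothetical grammatical feature labels


# Hypothetical lexicon keyed by (abstract sense, language); not fetched from Wikidata.
LEXICON = {
    ("see", "en"): [
        Form("see", frozenset({"present"})),
        Form("sees", frozenset({"present", "third-person singular"})),
        Form("saw", frozenset({"past"})),
    ],
    ("see", "pl"): [
        Form("widziałem", frozenset({"past", "first-person singular", "masculine"})),
        Form("widziałam", frozenset({"past", "first-person singular", "feminine"})),
    ],
}


def choose_form(sense: str, lang: str, context: frozenset) -> str:
    """Pick the most specific Form whose features all hold in the context."""
    candidates = [f for f in LEXICON[(sense, lang)] if f.features <= context]
    if not candidates:
        raise LookupError(f"no Form of {sense!r} in {lang!r} fits {sorted(context)}")
    # "The more particular overrides the more general": most features wins.
    return max(candidates, key=lambda f: len(f.features)).representation


if __name__ == "__main__":
    i_saw = frozenset({"past", "first-person singular", "masculine"})
    print(choose_form("see", "en", i_saw))  # saw (gender is simply unused)
    print(choose_form("see", "pl", i_saw))  # widziałem
```

The same abstract context is passed to every language; each language's Forms only consume the features they need.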

Advanced user interface idea
by Andy 04 Aug '20

Note: graphs in XML form are too wordy; maybe it is better to use a form with braces, dots and @. The user opens a web editor and pastes raw English text, for example the first paragraph (perhaps with a restriction to 1000 characters?) of https://en.wikipedia.org/wiki/Linux. The text is tokenized by spaCy and divided into sentences and words. Lemmas and parts of speech are found for words and phrases. The words change color and become clickable. The user can choose a sense for each lexeme, add a sense, or add the lexeme itself. Next, the structure of the sentence graph is shown; the user can change it and add properties. In the first stage, before this editor is built, users can edit the graph code in a special language, which must not be too wordy. Best regards, Andrzej
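The first step of this editor (paste, enforce a length limit, split into sentences and clickable words) can be sketched with the standard library alone. This is a naive stand-in, assuming spaCy is not available: the regular expressions, the `Token` record and its `sense_id` field are hypothetical. With spaCy installed, one would instead load a pipeline (for example `spacy.load("en_core_web_sm")`) and read `doc.sents`, `token.lemma_` and `token.pos_` to fill the lemma and part-of-speech slots left empty here.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

MAX_CHARS = 1000  # the limit floated in the message; an assumption, not a decision

# Words (allowing internal hyphens/apostrophes) or single punctuation marks.
WORD = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")
SENTENCE_END = {".", "!", "?"}


@dataclass
class Token:
    text: str
    start: int                      # character offset, so the UI can colour the span
    lemma: Optional[str] = None     # to be filled by a real analyser such as spaCy
    pos: Optional[str] = None       # part of speech, likewise
    sense_id: Optional[str] = None  # hypothetical: set when the user clicks a sense


def tokenize(text: str) -> List[List[Token]]:
    """Split pasted text into sentences of clickable tokens (naive: no abbreviations)."""
    if len(text) > MAX_CHARS:
        raise ValueError(f"paste at most {MAX_CHARS} characters")
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for match in WORD.finditer(text):
        current.append(Token(match.group(), match.start()))
        if match.group() in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:  # trailing sentence without final punctuation
        sentences.append(current)
    return sentences
```

A naive splitter like this breaks on abbreviations such as "e.g.", which is one reason the message proposes a proper NLP library for the real editor.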

Re: [Abstract-wikipedia] Natural Language and Mathematics Generation
by Grounder UK 04 Aug '20

More good points, Adam... At this stage, I can't say that formats bother me greatly, although clearly we need to think about them. We do have to start with Wikidata but I wonder whether we should also be looking at our wiki of functions. Could we consider a mathematical expression as a symbolic representation of an executable function? I like the idea of a Wikipedia that will actually compute the result of a function it is telling you about, not least because editors could verify that the syntax is correct by testing the function. But if some expressions are executable, that broadens the question of format. To have a string that could be copied into a spreadsheet, for example, would be an interesting function for many. So I'm wondering how far you can get by "labelizing" JSON objects with computer language labels rather than natural language ones. So our "multiply" function is "labelized" "=PRODUCT" and E=PRODUCT(m,POWER(c,2))... or E=m*c^2...? Thinking only about text, I think we are bound to take a broader WMF-wide view because we should at least consider how we can meet the requirements of each and every Wikipedia, without ignoring sister projects like Wikiversity. That's not to advocate a free-for-all, but if we increasingly represent the semantics of mathematical expressions, rather than their typography, this gives us something that can be represented more meaningfully in Wikidata and, from there, expressed in natural language as well as in a variety of symbolic and even functional forms. I happen to think it will also aid reuse of functions from the wiki, but I haven't given that idea much thought. Best regards, Al. 
On Monday, 3 August 2020, Adam Sobieski <adamsobieski(a)hotmail.com> wrote: > Charles, > > There is also MathML to consider. Work is underway at the W3C with respect to a new version of MathML, MathML4 [1][2]. Work is underway with respect to adding MathML support to Chromium [3][4]. > > Instead of LaTeX, MathML could be the way to go. > > Best regards, > Adam > > [1] https://www.w3.org/community/mathml4/ > [2] https://mathml-refresh.github.io/mathml/ > [3] https://www.chromestatus.com/feature/5240822173794304 > [4] https://mathml.igalia.com/ > > From: Charles Matthews > Sent: Monday, August 3, 2020 1:53 PM > Subject: Re: [Abstract-wikipedia] Natural Language and Mathematics Generation > > > On 03 August 2020 at 16:50 Adam Sobieski <adamsobieski(a)hotmail.com> wrote: > > > By utilizing <math>LaTeX</math> elements in an XML-based intermediate output format, one could simply copy that mathematical content to the resultant output wikitext [3]. Wikitext utilizes this same convention for mathematical expressions [3]. > > > Whether or not to include mathematics in Abstract Wikipedia is an important decision to make at a future point. Choosing to include mathematics would entail discussions about representing mathematical knowledge on Wikidata. It would entail discussions about how specific senses of certain words have mathematical meaning. It would entail discussions about how algorithms should determine when to use mathematical and scientific notations and when they should, instead, use paraphrases with the semantic content expressed using natural language. These are just some of the discussion topics which would arise should we desire to include mathematical and scientific notations in Abstract Wikipedia articles. > > I'm disagreeing with much of this. > > On LaTeX: while it is "industry standard", I'd like to draw attention to a point made in https://en.wikipedia.org/wiki/Help:Displaying_a_formula#Rendering: "Latex does not have full support for Unicode characters, and not all characters render." > > It goes on to suggest that Vietnamese, for example, would not be well catered for, in terms of its diacritics. > > I appreciate that we are only talking currently about scoping, and high-level initial planning. But given AW's objectives, this is not a good sign, and I don't think we should just assume that LaTeX as an incumbent gets waved through. It is pre-Web, and something closer to HTML would be preferable, in my view. > > My background is in mathematics, and I began my Wikipedia career writing mathematics articles. There are certainly issues, such as prose/notation balance. Mathematical language is heavily overloaded, from the disambiguation aspect. But I'm not really recognising the landscape of issues set out there. > > Charles
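Al's idea of treating a formula as an executable function with several "labelizations" can be sketched as one expression tree rendered two ways and also evaluated. Everything here is an illustrative assumption: the function names "multiply" and "power", the label tables, and the tuple encoding are hypothetical and are not the Wikilambda ZObject format. Evaluating the tree is what would let an editor check that the expression is well formed by actually running it.

```python
import operator
from typing import Dict, Union

Expr = Union[str, int, float, tuple]  # tuple = (function name, left, right)

# Hypothetical label tables: one abstract function, several "languages".
SPREADSHEET = {"multiply": "PRODUCT", "power": "POWER"}
INFIX = {"multiply": "*", "power": "^"}
PRECEDENCE = {"multiply": 1, "power": 2}
EVAL = {"multiply": operator.mul, "power": operator.pow}


def to_spreadsheet(e: Expr) -> str:
    """Render as nested spreadsheet-style calls, e.g. PRODUCT(m,POWER(c,2))."""
    if isinstance(e, tuple):
        name, *args = e
        return f"{SPREADSHEET[name]}({','.join(to_spreadsheet(a) for a in args)})"
    return str(e)


def to_infix(e: Expr) -> str:
    """Render as infix text, parenthesising any operand that does not bind tighter."""
    if not isinstance(e, tuple):
        return str(e)
    name, left, right = e

    def operand(child: Expr) -> str:
        text = to_infix(child)
        if isinstance(child, tuple) and PRECEDENCE[child[0]] <= PRECEDENCE[name]:
            return f"({text})"  # extra brackets are safe, missing ones are not
        return text

    return f"{operand(left)}{INFIX[name]}{operand(right)}"


def evaluate(e: Expr, env: Dict[str, float]) -> float:
    """Actually compute the function the formula describes."""
    if isinstance(e, tuple):
        name, *args = e
        return EVAL[name](*(evaluate(a, env) for a in args))
    return env[e] if isinstance(e, str) else e


E_MC2 = ("multiply", "m", ("power", "c", 2))

if __name__ == "__main__":
    print("E=" + to_spreadsheet(E_MC2))  # E=PRODUCT(m,POWER(c,2))
    print("E=" + to_infix(E_MC2))        # E=m*c^2
    print(evaluate(E_MC2, {"m": 2, "c": 3}))  # 18
```

Because the tree stores semantics rather than typography, a LaTeX or MathML renderer would just be one more label table and walker.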

Natural Language and Mathematics Generation
by Adam Sobieski 04 Aug '20

I would like to broach, for discussion, the generation of natural language and mathematics for Abstract Wikipedia. Regardless of the eventual natural language generation approaches, it seems desirable to be able to include mathematics in automatically generated encyclopedia articles. In the thread "A Document Abstraction Layer" [1], it was mentioned that natural language generation algorithms could output, instead of text strings, a custom XML format which could then be mechanically and configurably converted into intricate wikitext. That custom XML could resemble:

<article xmlns="..." xmlns:meta="...">
  <head>...</head>
  <body>
    <section>
      <head>...</head>
      <body>
        <paragraph>
          <sentence>
            <head>
              <meta:provenance>...</meta:provenance>
              <meta:console>...</meta:console>
            </head>
            <body>Next, consider the variable <math latex="x" />.</body>
          </sentence>
        </paragraph>
      </body>
    </section>
  </body>
</article>

or:

<article xmlns="..." xmlns:meta="...">
  <head>...</head>
  <body>
    <section>
      <head>...</head>
      <body>
        <paragraph>
          <sentence>
            <head>
              <meta:provenance>...</meta:provenance>
              <meta:console>...</meta:console>
            </head>
            <body>Next, consider the variable <math>x</math>.</body>
          </sentence>
        </paragraph>
      </body>
    </section>
  </body>
</article>

A <math> element could be useful for expressing mathematical notation in natural language articles. A <math> element with LaTeX syntax could simplify the complex matter of outputting mathematics into wikitext [2]. What do you think? Best regards, Adam [1] https://lists.wikimedia.org/pipermail/abstract-wikipedia/2020-July/000151.h… [2] https://en.wikipedia.org/wiki/Wikipedia:Rendering_math
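The "mechanically converted into wikitext" step for the proposed intermediate format can be sketched with Python's standard `xml.etree.ElementTree`. For brevity this sketch assumes the elided namespaces (`xmlns="..."`, `meta:`) are dropped and ignores `<head>` content; the function names are hypothetical. It accepts both proposed variants, `<math latex="x" />` and `<math>x</math>`, and emits the wikitext `<math>x</math>` convention for either.

```python
import xml.etree.ElementTree as ET


def inline_to_wikitext(body: ET.Element) -> str:
    """Flatten a sentence <body>, turning either <math> variant into wikitext <math>."""
    parts = [body.text or ""]
    for child in body:
        if child.tag == "math":
            # <math latex="x" /> and <math>x</math> both become <math>x</math>
            parts.append("<math>" + child.get("latex", child.text or "") + "</math>")
        else:
            # Hypothetical fallback: other inline elements are reduced to their text
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts)


def article_to_wikitext(xml_text: str) -> str:
    """Join each paragraph's sentences with spaces; blank line between paragraphs."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for para in root.iter("paragraph"):
        sentences = [inline_to_wikitext(s.find("body")) for s in para.iter("sentence")]
        paragraphs.append(" ".join(sentences))
    return "\n\n".join(paragraphs)


SAMPLE = """<article><body><section><body><paragraph>
<sentence><head/><body>Next, consider the variable <math latex="x" />.</body></sentence>
</paragraph></body></section></body></article>"""

if __name__ == "__main__":
    print(article_to_wikitext(SAMPLE))  # Next, consider the variable <math>x</math>.
```

Keeping the conversion in one small walker is what makes the output "configurable": a MathML or plain-HTML target would only change the `math` branch.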

UNL
by Andy 04 Aug '20

Universal Networking Language seems to be the most advanced system on the list https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Related_and_previous_wor… Its main advantage is that it is not lemma-oriented but sense-oriented.

US patents:
- 6,704,700: active, anticipated expiration 2020-11-17
- 7,107,206: status "Expired - Fee Related", adjusted expiration 2022-07-22

Can we register at http://www.unlweb.net/unlweb/ and, for example, get some resources? See also the GitHub page: https://github.com/dikonov/Universal-Dictionary-of-Concepts

The main dictionary is a sense list; one sense can have several lemmas (synonyms) in many languages. The definitions are UNL-ized, I think. It would be good if Abstract Wikipedia were inspired by UNL.

Would it be possible for sense definitions not only to be in the abstract graph, but to be fully understood by a computer? For example, https://en.wiktionary.org/wiki/star#Noun has 11 definitions: 1 and 2 are close, 4 and 5 are close, 1 and 4 are far apart, and 4 seems to be a sub-definition of 5. Is it possible for a computer to find these properties of definitions? Also, each definition might have some context examples and even distinguishing rules (for WSD):
- def 4: person, activities in film
- def 5: person, general activities
- def 9: often in phrases such as his, her, our, their star, etc.

I think there might be several spaces:
- a universal dictionary of senses, with definitions in the abstract graph
- dictionaries for English and other languages
- the main space: not common words but encyclopedic entities

Advantages of a UNL-like approach:
- sense-oriented
- can generate text in natural languages
- a resource for understanding definitions
- data for learning word sense disambiguation

Best regards, Andrzej
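The per-definition "distinguishing rules" for WSD could start as simply as the sketch below: a phrase pattern checked first, then cue-word overlap. The sense identifiers and cue words are hypothetical illustrations modelled on the "star" example above, not taken from Wiktionary or UNL, and real WSD would need far richer context than word overlap.

```python
import re
from typing import Optional

# Hypothetical cue words per sense, in the spirit of defs 4 and 5 above.
SENSE_CUES = {
    "star/def4": {"film", "movie", "actor", "actress", "cinema"},
    "star/def5": {"famous", "sport", "music", "celebrity"},
}

# Hypothetical phrase rule for def 9: "his/her/our/their star".
PHRASE_RULES = {
    "star/def9": re.compile(r"\b(his|her|our|their)\s+star\b", re.IGNORECASE),
}


def disambiguate(sentence: str) -> Optional[str]:
    """Return the best-matching sense id for 'star', or None if nothing fires."""
    for sense, pattern in PHRASE_RULES.items():  # explicit phrase rules win
        if pattern.search(sentence):
            return sense
    words = set(re.findall(r"\w+", sentence.lower()))
    scores = {sense: len(cues & words) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(disambiguate("She became a film star."))     # star/def4
    print(disambiguate("He is a famous sport star."))  # star/def5
    print(disambiguate("Their star is rising."))       # star/def9
```

Rules like these, attached to each sense entry, would double as labelled training data for the WSD learning the message mentions.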

Re: [Abstract-wikipedia] Comprehension questions
by Grounder UK 03 Aug '20

Thanks, Charles. That's a very fine piece of work! And so relevant, not just to the quiz use case but to the whole of the NLG side of our project. I wonder whether there have been any further developments. I don't see any links to that pdf on Google. I've put the link in our to-do list[1]. There's some good stuff in there about performance as well as data quality, which are both areas we should certainly be looking into. On the question of data structure, I guess it rather depends where you are sitting. The essence of my quiZiverse idea is that the consumer handles a relatively small dataset (client side) because WMF servers are running the functions. Essentially, the result is a pretty manageable structured object (in the more collaborative mode, perhaps a ZObject) and it can be grown iteratively. Given those assumptions, it hardly matters which formats are desired as inputs to (for consumption within) an actual quiz. Since I was envisaging passing out "links" that are essentially a pre-written call back to the function with different arguments, those potential call-backs could be queued for processing server-side, so that the eventual call-back is referencing a freshly minted structured object (with a fairly limited shelf-life, unless it's a refresh of a pre-existing ZObject). So, again, lots of options to explore on the technical side. I'm inclined to disagree with you on the question of hints, though. The structure I was envisaging is very straightforward; the "distractor" is an answer to some other question, so the "hint" is just that question phrased as a statement. It could be more complicated, but it should still be a fairly simple connection to the next function call. Of course, "where next?" may depend on whether the question was answered correctly, or there might be a choice to be made, but I think that would still resolve quite simply into a single next call. 
The functionality is an interactive "crawler" [2], at the end of the day, with each "next step" deferred until required or pre-prepared if responsiveness might be an issue. Keep it simple; iterate collaboratively; make it great! Best regards, Al. [1] https://meta.wikimedia.org/wiki/Talk:Abstract_Wikipedia/Related_and_previous_work/Natural_language_generation <https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Related_and_previous_wor…> [2] https://en.wikipedia.org/wiki/Web_crawler

On Sunday, 2 August 2020, Charles Matthews <charles.r.matthews(a)ntlworld.com> wrote: > > On 02 August 2020 at 16:16 Grounder UK <grounderuk(a)gmail.com> wrote: > > But we don't just want the answer, we want a quiz! Equally, maybe we don't just want the question and the answer, we want some wrong answers and some tips. > > It goes back to 2016, just to generate questions from Wikidata: > https://pub.tik.ee.ethz.ch/students/2016-FS/BA-2016-03.pdf > > Technically the incorrect answers in multiple choice are called "distractors". Clearly this is a rather simple data structure to handle. Hints assume quite a bit more. > > At the beginning of 2017, I decided to take seriously the suggestion (from Magnus Manske) that questions should be treated as structured data. I even suggested Wikidata should have a namespace for them (this didn't go down well). A road not taken then, and just as the Comprende! tool was finished I got diverted into a Wikimedian in Residence position. So much for that. > > Anyway, one take on this is that AW output might be some kind of structured data, rather than the sectioned prose (+media files and tables and templated data) familiar from Wikipedia. > > By the way, mathematics in wikitext has traditionally been a threefold mix of approaches (HTML, PNG, LaTeX): not an elegant solution. > > Charles
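Al's quiz structure (each distractor is the answer to some other question, and its hint is that other question phrased as a statement) can be sketched as follows. The facts list is a hypothetical stand-in for Wikidata statements, and the English templates are illustrative, not a real renderer. The generator at the end mimics the "crawler" idea: each next question is only built when it is requested.

```python
import random
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional

# Hypothetical (subject, property, value) facts; a real tool would query Wikidata.
FACTS = [
    ("France", "capital", "Paris"),
    ("Germany", "capital", "Berlin"),
    ("Italy", "capital", "Rome"),
]


@dataclass
class Question:
    prompt: str
    answer: str
    options: List[str]
    hints: Dict[str, str]  # distractor -> the question it answers, as a statement


def ask(subject: str, prop: str) -> str:
    return f"What is the {prop} of {subject}?"


def state(subject: str, prop: str, value: str) -> str:
    return f"{value} is the {prop} of {subject}."


def make_question(index: int, n_distractors: int = 2,
                  rng: Optional[random.Random] = None) -> Question:
    """Distractors are answers to other questions about the same property."""
    rng = rng or random.Random(0)
    subject, prop, answer = FACTS[index]
    others = [f for i, f in enumerate(FACTS) if i != index and f[1] == prop]
    picked = rng.sample(others, min(n_distractors, len(others)))
    options = [answer] + [value for _, _, value in picked]
    rng.shuffle(options)
    hints = {value: state(s, p, value) for s, p, value in picked}
    return Question(ask(subject, prop), answer, options, hints)


def quiz(order: List[int]) -> Iterator[Question]:
    """Crawler-style: each next question is built only when it is requested."""
    for index in order:
        yield make_question(index)
```

For example, `make_question(0)` asks "What is the capital of France?" and, for the distractor Berlin, offers the hint "Berlin is the capital of Germany.", which is itself the seed of the next question.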

Loose notes
by Andy 03 Aug '20

Hi,

Abstract Wikipedia would give these benefits:
- first, it creates a multilingual corpus for machine translation learning. The big disadvantage of the existing multilingual corpora is that most of the data comes from movie subtitles, which are very inaccurate.
- second, it will provide data for Word Sense Disambiguation (WSD) learning, and WSD in many languages(!).

The abstract form should be a graph of senses. Will senses be chosen from English WordNet/UNL or from English Wiktionary? UNL is a good piece of work, but it has been inactive for years and does not evolve. Wiktionary senses have the advantage of being grouped by etymology: quite different senses are in a different etymology group. Will Abstract Wikipedia be linked with Wiktionary? Wiktionary sense numbers should then be persistent or, better, have unique identifiers. Wiktionary has the advantage that senses are translated into other languages, with the disadvantage that the translations point to words, not senses, in the other language. Alternatively, Abstract Wikipedia could have its own sense list with identifiers, but how would it be linked with Wiktionary?

Graph: it should be possible to create text in many/all languages. For example, in English it is "I saw", in Polish "widziałem/widziałam" (masculine/feminine). Polish needs gender, so the abstract form should include the gender of the verb, even though some languages do not use it.

The senses dictionary can grow gradually with the abstract text. If I edit abstract text, the editor should force me to add a word with its senses to the dictionary if it does not exist, and enable me to add a new sense if it does not exist.

What is needed:
- abstract text = corpus
- a growing dictionary of senses
- a growing mapping from senses to national-language sense dictionaries
- possibly a link with the Wiktionaries

Best regards,
Andrzej
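The "growing senses dictionary" with an editor that refuses unknown words or senses can be sketched as a small class. The design, the class and method names, and the sense IDs below are hypothetical; it simply keeps a sense list (one sense, many lemmas in many languages) plus a reverse index from (language, lemma) to sense IDs, and raises an error when abstract text uses a word-sense pair that has not been added yet.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Sense:
    sense_id: str
    gloss: str
    lemmas: Dict[str, List[str]] = field(default_factory=dict)  # language -> lemmas


class SenseDictionary:
    """A senses dictionary that grows as abstract text is edited (hypothetical design)."""

    def __init__(self) -> None:
        self.senses: Dict[str, Sense] = {}
        self.by_lemma: Dict[Tuple[str, str], List[str]] = {}  # (lang, lemma) -> sense ids

    def lookup(self, lang: str, lemma: str) -> List[Sense]:
        return [self.senses[s] for s in self.by_lemma.get((lang, lemma), [])]

    def add_sense(self, sense_id: str, gloss: str) -> Sense:
        if sense_id in self.senses:
            raise ValueError(f"sense {sense_id!r} already exists")
        self.senses[sense_id] = Sense(sense_id, gloss)
        return self.senses[sense_id]

    def add_lemma(self, sense_id: str, lang: str, lemma: str) -> None:
        lemmas = self.senses[sense_id].lemmas.setdefault(lang, [])
        if lemma not in lemmas:
            lemmas.append(lemma)
            self.by_lemma.setdefault((lang, lemma), []).append(sense_id)

    def use_in_abstract_text(self, lang: str, lemma: str, sense_id: str) -> str:
        """What the editor would enforce: the sense and word link must exist first."""
        if sense_id not in self.senses:
            raise KeyError(f"add sense {sense_id!r} before using it")
        if sense_id not in self.by_lemma.get((lang, lemma), []):
            raise KeyError(f"link {lemma!r} ({lang}) to {sense_id!r} before using it")
        return sense_id
```

Usage, with hypothetical IDs: add a sense "lock.device", link the English lemma "lock" and the Polish lemma "zamek" to it, and the editor will then accept "zamek" in that sense while rejecting it for any sense nobody has added yet.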

Re: [Abstract-wikipedia] Comprehension questions
by Grounder UK 02 Aug '20

Thanks, Charles. I can certainly see the possibility of many interesting use cases there. True-or-false questions would be an interesting game for our natural-language renderers to play, for example. Given an inferred statement supposed to be true, negate it. Test-setters might be expected to correct errors of fact or expression, but that's up to them. It would be interesting to monitor which statements they preferred to choose as True and which as False, in any event. Questions of the form "choose the best answer from the following" could also be a win-win if our renderers face difficulties selecting or expressing some combination of facts. Then there is the grading of information. Questions chosen for more basic tests might be supposed to be more generally relevant than those chosen for more advanced tests, which might feed back into the emphasis in the general Wikipedia article (now complete with a slider bar for the reader's current and/or target level of understanding, as well as competence in the language). And finally, renderer, given the pedagogue's valuable input into what is an appropriate statement of fact here, please turn it into questions in many languages! Loving it... Thank you again, Charles. Best regards, Al.
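The "negate it" game can be sketched as a renderer that turns each statement believed true into a pair of true/false items. The single English template and the sample fact are illustrative assumptions; a real renderer would need per-language negation rules, which is exactly what makes this a useful test for it.

```python
from typing import List, Tuple


def render(subject: str, prop: str, value: str, negate: bool = False) -> str:
    """Hypothetical English template; negation just switches the copula."""
    verb = "is not" if negate else "is"
    return f"{value} {verb} the {prop} of {subject}."


def true_false_items(facts: List[Tuple[str, str, str]]) -> List[Tuple[str, bool]]:
    """Each statement believed true yields itself (True) and its negation (False)."""
    items = []
    for subject, prop, value in facts:
        items.append((render(subject, prop, value), True))
        items.append((render(subject, prop, value, negate=True), False))
    return items
```

For example, the fact ("France", "capital", "Paris") yields "Paris is the capital of France." marked True and "Paris is not the capital of France." marked False.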
