Tomorrow (Friday, August 21st) at 19:00 UTC (noon Pacific, 9pm Central
Europe) I will be presenting on Abstract Wikipedia at Wikimedia Clinic #009.
If you are interested, feel free to join!
There will be time for questions.
https://meta.wikimedia.org/wiki/Wikimedia_Clinics#Upcoming_calls
I think that Abstract Wikipedia, and more specifically WikiLambda functions,
could be used for generating "References by Description" and, later, to help
with understanding and reconciling them.
One of the long term goals of Data Commons https://datacommons.org/faq is
to help resolve ambiguous entities by using "Reference by Description".
Denny, are you familiar with Guha's paper?
https://arxiv.org/pdf/1511.06341.pdf
> ‘John McCarthy, Pioneer in Artificial Intelligence...’ the term ‘John
> McCarthy’ alone is ambiguous. It could refer to a computer scientist, a
> politician or even a novel or film. In order to disambiguate the reference,
> the head-line includes the description “Pioneer in Artificial
> Intelligence”.
>
Thad
https://www.linkedin.com/in/thadguidry/
Hoi,
I am amazed by all the competing ideas and notions I have read on the mailing
list so far. It is bewildering and does not give me a clear idea of what is
to be done.
I have thought about it and for me it is simple. For every Wikipedia
article there are two Wikidata items that have no Wikipedia article. It
follows that the first item of business is to make these knowable in any
language. The best way to do this is by providing automated descriptions
that aid in disambiguation.
When a Wikipedia article exists, it links to many articles. All of them
have their own Wikidata item and all can be described either in a Wikidata
triple or in structured text.
When sufficient data is available, a text can be generated. This has been
demonstrated by LSJBOT, and it is why the Cebuano Wikipedia has so many
articles. A template like those used by LSJBOT can be adapted for every
language.
My point is that all the research in the world makes no difference when we
do not apply what we know.
Thanks,
GerardM
https://ultimategerardm.blogspot.com/2020/08/keeping-it-simple-for-abstract…
Thank you for joining in, Adam. I think I'll leave outputting mathematics
for now. Perhaps you could start a new topic on that later? I'd also like
to think about outputting functions from our anonymous wiki (aka
"wikilambda"), which I'll come onto here.
I definitely agree about reading-comprehension questions. For now, I
encourage everyone who is not already familiar with it to check out
https://www.deepl.com/translator. It does work on a tablet, but I found it
a bit fiddly to interact with. What it allows you to do is play with its
translation, choosing alternative words etc, and it reworks the rest of the
translation as you go along.
When it comes to Q&A mode, that brings us back to Charles's difficult
problem of re-use. (This seems to be missing here, but it's in
https://lists.wikimedia.org/pipermail/abstract-wikipedia/2020-July/000233.html.)
It also brings us back to "Reasoning over ontologies"
(https://lists.wikimedia.org/pipermail/abstract-wikipedia/2020-July/000206.html).
Reuse is never easy, and "all the facts in all the languages at all levels"
is rather a broad scope... So I certainly accept
the wisdom of Charles's suggestion of a pilot project. I started with a
single quiz question: "What is the atomic number of oxygen?". For many
Wikipedias, if not most, the article on oxygen gives you the answer in the
first sentence or two.
But we don't just want the answer, we want a quiz! Equally, maybe we don't
just want the question and the answer, we want some wrong answers and some
tips. We begin with a multiple-choice question, as above: "What is the
atomic number of oxygen?" Is it (a) "Z"; (b) "O"; (c) 16; (d) 6; (e) 8; (f)
...
The first thing to notice is that the wrong answers are not fictitious;
they are values you will find in Wikidata Item Q629 (well, "Z" is nearby,
in the label of the atomic number property, but no more clues!) and we can
explore this further by choosing a wrong answer and provoking a response:
(a) No, "Z" is the symbol for "atomic number", not the value for oxygen.
(b) That's not oxygen's atomic number, that's its symbol.
(c) Oxygen is in group 16 on the periodic table. See if you can find it
there.
(d) Well, oxygen is the sixth element in period 2...
(e) That's right! Try choosing a wrong answer, just for fun...
So we have a small number of somewhat connected facts to explore. Any of
them can be turned into a question, which can pull in facts connected to
that, like other elements in group 16 or period 2. This is enabled by the
"quiZiverse" function that can turn a Wikidata Statement into a related set
of statements, for any one of which a new related set can be produced. In
an interactive context, choosing the option displays the text and could
call the function again with the chosen statement in place of the original.
Or the function is embedded in a link, so that, for example, clicking
"group 16" calls the function with Q104567 as well as Q629, extending the
quiz.
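To make the idea a little more concrete, here is a minimal sketch of what a
"quiZiverse"-style function might look like. Everything in it is an
assumption for illustration: the FACTS table is a toy stand-in for a few
statements on Q629 (with plain-English property names rather than real
property IDs), and the function name, the Option type and the feedback
wording are hypothetical. It only draws wrong answers from sibling
statements on the same Item, so it does not reproduce the "Z" or "6"
options above.

```python
from dataclasses import dataclass

# Toy stand-in for a handful of Wikidata statements about oxygen (Q629).
# Property names are plain-English labels, not real property IDs.
FACTS = {
    "Q629": {
        "label": "oxygen",
        "atomic number": 8,
        "element symbol": "O",
        "group": 16,
        "period": 2,
    },
}


@dataclass
class Option:
    value: object   # the value offered as an answer
    correct: bool   # whether it answers the question asked
    feedback: str   # response shown when this option is chosen


def quiziverse(item_id, asked_property, facts=FACTS):
    """Turn one statement into a question plus options from sibling statements."""
    item = facts[item_id]
    label = item["label"]
    question = f"What is the {asked_property} of {label}?"
    options = []
    for prop, value in item.items():
        if prop == "label":
            continue
        if prop == asked_property:
            options.append(Option(value, True, "That's right!"))
        else:
            # A wrong answer is a true fact about the same Item.
            options.append(Option(
                value, False,
                f"No, {value} is the {prop} of {label}, not its {asked_property}.",
            ))
    return question, options


question, options = quiziverse("Q629", "atomic number")
print(question)
for option in options:
    print(option.value, "->", option.feedback)
```

Calling quiziverse("Q629", "group") on the same data would produce the next
related question, which is the "call the function again with the chosen
statement" step described above.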
If used to set a quiz, the teacher can suppress some answers and add
alternatives, tailoring the quiZiverse to the students' level and the
material to cover. It is here that we need to see how to capture the
teacher's thinking, the rationale for changing the "focused ontology". Was
it too advanced or too simple? Difficult to understand? Irrelevant? More
interesting connection to something else... and so on. That sort of
metadata can then be used to guide future quiZiverse instantiations, with
target values for a range of dimensions, such as "reading comprehension".
Best regards,
Al.
On Sunday, 2 August 2020, <abstract-wikipedia-request(a)lists.wikimedia.org>
wrote:
> Send Abstract-Wikipedia mailing list submissions to
> abstract-wikipedia(a)lists.wikimedia.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia
> or, via email, send a message with subject or body 'help' to
> abstract-wikipedia-request(a)lists.wikimedia.org
>
> You can reach the person managing the list at
> abstract-wikipedia-owner(a)lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Abstract-Wikipedia digest..."
>
>
> Today's Topics:
>
> 1. Re: Comprehension questions (Adam Sobieski)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sun, 2 Aug 2020 01:15:10 +0000
> From: Adam Sobieski <adamsobieski(a)hotmail.com>
> To: "General public mailing list for the discussion of Abstract
> Wikipedia (aka Wikilambda)" <abstract-wikipedia(a)lists.wikimedia.org>
> Subject: Re: [Abstract-wikipedia] Comprehension questions
> Message-ID:
> <CH2PR12MB41844F0AF42F5A53024C82D5C54C0(a)CH2PR12MB4184.namprd12.prod.outlook.com>
>
> Content-Type: text/plain; charset="utf-8"
>
> Educational technology is also interesting here.
>
> Generating reading comprehension questions while generating natural
> language articles is an interesting topic. I think that the matter would be
> one of refining the set of possible questions and selecting the best
> questions for a particular reader in a particular context. One might also
> find interesting the topics of intelligent tutoring systems [1] and
> automatic item generation [2].
>
> We can view the automatic generation of encyclopedia articles in response
> to search engine queries as a type of Q&A system. Articles and their
> related content hyperlinks sections could be generated in search result
> contexts, contexts which include the question(s) that users asked a search
> engine to find the content. Articles, when produced with this search engine
> referrer information, could, in addition to highlighting relevant content,
> recommend follow-up questions for readers to select in a related content
> section, each follow-up question a hyperlink to another article (resembling
> a hypertext-based dialogue system). Hopefully, these related content
> hyperlink sections (perhaps resembling a recommender system) would entice
> readers to further self-directed learning.
>
> I would like to also indicate that we should explore outputting
> mathematics when automatically generating encyclopedia articles. For
> wikitext, this could involve outputting LaTeX for MathJax to process.
>
>
> Best regards,
> Adam
>
> [1] https://en.wikipedia.org/wiki/Intelligent_tutoring_system
> [2] https://en.wikipedia.org/wiki/Automatic_Item_Generation
>
> From: Grounder UK<mailto:grounderuk@gmail.com>
> Sent: Friday, July 31, 2020 12:32 PM
> To: abstract-wikipedia(a)lists.wikimedia.org
> <mailto:abstract-wikipedia(a)lists.wikimedia.org>
> Subject: Re: [Abstract-wikipedia] Comprehension questions
>
> Thanks, Charles.
>
> I can certainly see the possibility of many interesting use cases there.
> True or false questions would be an interesting game for our
> natural-language renderers to play, for example. Given an inferred
> statement supposed to be true, negate it. Test-setters might be expected to
> correct errors of fact or expression, but that's up to them. It would be
> interesting to monitor which statements they preferred to choose as True
> and which as False, in any event.
>
> Questions of the form: "choose the best answer from the following" could
> also be a win-win if our renderers face difficulties selecting or
> expressing some combination of facts.
>
> Then there is the grading of information. Questions chosen for more basic
> tests might be supposed to be more generally relevant than those chosen for
> more advanced tests, which might feed back into the emphasis in the general
> Wikipedia article (now complete with a slider bar for the reader's current
> and/or target level of understanding, as well as competence in the
> language).
>
> And finally, renderer, given the pedagogue's valuable input into what is
> an appropriate statement of fact here, please turn it into questions in
> many languages!
>
> Loving it...
>
> Thank you again, Charles
>
> Best regards,
> Al.
>
> Today's Topics:
>
> 1. Re: How to store wikitext along the structured content?
> (Grounder UK)
> 2. Re: Comprehension questions (Charles Matthews)
>
Just one huge Thank You for Ordia, Finn Årup Nielsen!
It's really coming along nicely now we have so many more Lexemes.
You are quite right, of course; we're not quite up to 325,000. I overlooked
the possibility of a Lexeme having multiple lemmas. A few have as many as
six, it seems! Sorry for that slight overstatement. I hope you didn't
think you had lost some.
While I'm apologizing, it seems that I got the link to your
aclweb.anthology paper wrong when I included it earlier! (It should be
"2020.ldl" not "2020.idl", of course.) Sorry for that, too. I assume that
https://www.aclweb.org/anthology/2020.ldl-1.12.pdf [corrected link] is
identical to https://people.compute.dtu.dk/faan/ps/Nielsen2020Lexemes.pdf.
Thank you again for your great work. I hope my mistakes did not
inconvenience you too much.
Best regards,
Al.
On Tuesday, 4 August 2020, <abstract-wikipedia-request(a)lists.wikimedia.org>
wrote:
>
> Today's Topics:
>
> 1. Re: Loose notes (Andy)
> 2. Re: Loose notes (fn(a)imm.dtu.dk)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 4 Aug 2020 17:49:03 +0200
> From: Andy <borucki.andrzej(a)gmail.com>
> To: "General public mailing list for the discussion of Abstract
> Wikipedia (aka Wikilambda)" <abstract-wikipedia@lists.wikimedia.org>
> Subject: Re: [Abstract-wikipedia] Loose notes
> Message-ID:
> <CAE2KeALchD9EAY0HPZgmR9y760eVPO=O+mWiEd5+o0Ns==zbYA@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Is there a road map on https://meta.wikimedia.org/ with estimated
> timelines for Abstract Wikipedia?
>
> On Mon, 3 Aug 2020 at 18:43, Grounder UK <grounderuk(a)gmail.com> wrote:
>
> > Plenty more work to be done!
> >
>
Andrzej,
Yes, there are over 325,000 lexemes in Wikidata now, over 40,000 for
English.
"Abstract" definitions are a little tricky, but it is not Lexemes
themselves that are defined, it is their Senses, and Senses can be linked
to Wikidata Items, which connects Lexemes into the abstract graph of
"knowledge".
Translations are still very incomplete but, as with definitions, it is the
Sense that should have the translation. The difficulty is that translation
cannot imply identity, which means that you cannot assume that a Sense to
Sense translation allows you to acquire translations from the Sense you
translate into. If you think of each Sense as a set, you cannot tell
whether the translated Sense is a subset or a superset. What we need for
that is the concept of the intersection between the two sets, which would
be part of each Sense but not necessarily the whole of either Sense.
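A small sketch may help with the set view. It assumes, purely for
illustration, that each Sense can be written as a set of referents (the
labels r1..r5 are placeholders, not real data) and that a translation link
only tells us the two sets overlap. The relation() helper is hypothetical.

```python
def relation(a, b):
    """Classify how two Senses, modelled as sets of referents, relate."""
    if a == b:
        return "identical"
    if not (a & b):
        return "disjoint"
    if a < b:
        return "subset"
    if a > b:
        return "superset"
    return "overlap"


# Placeholder referents; a real Sense has no such explicit extension.
sense_a = {"r1", "r2", "r3"}   # a Sense in language A
sense_b = {"r2", "r3", "r4"}   # its translation in language B
sense_c = {"r4", "r5"}         # a translation of sense_b in language C

print(relation(sense_a, sense_b))   # the translated pair only overlaps
print(sense_a & sense_b)            # the intersection both Senses share
print(relation(sense_a, sense_c))   # A cannot inherit B's translation to C
```

In this toy case sense_a and sense_b share only r2 and r3, and sense_c
touches sense_b only at r4, so chaining the two translation links would
connect two Senses that have nothing in common. That is the reason for
wanting the intersection as a first-class concept.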
So, broadly, your example of "zamek" is not a problem; you can connect the
"lock" Sense to the Sense of the English word "lock" (L1132-S1) as well as
to the identifier for the encyclopedic concept Q228039 and/or Q24644118
(claimed to be a subclass of Q228039). But you should not connect it to
L1132-S2 (which connects to Q105731 pl:"Śluza wodna") or to L1132-S3
(Q1134386 pl:"Zamek (broń)", assuming that's a different Sense of "zamek"
too). (I say this without knowing enough Polish to know if it makes sense;
I'm living in Searle's Chiński pokój!)[1]
I don't know whether the lexical data is in the dumps now, but it will be
pretty huge just by itself. It is also quite dependent on the main Wikidata
pages. For our natural-language generation, that's a great strength,
because we can move naturally from the concept to the word and related
vocabulary in any language without doing any translation. The extra context
we need to be able to choose the right Form of the Lexeme for the Sense...
that will need more work on the data, as will characterising thesaurus
relations (hypernymy, synonymy, hyponymy, antonymy etc) so that good
alternative Lexemes can be found. In an "abstract" context, these can be
thought of as "translations" into overlapping Senses, but the extent to
which we represent and consult (or navigate within) the broader compound
Sense domain (the set union of the Senses) is... an interesting challenge.
As for a fully "abstract" dictionary that can be read in any language...
We'll be better able to think about that once we have built a few renderers
for our "abstract" encyclopedic content, in my view. Machine translation
and natural-language understanding are not our primary goal. I think we
will make progress on both, if we remember to pay attention to inverse
functions as we evolve our NLG renderers, but we have a very long way to go
in all directions (and all languages).
Best regards,
Al.
[1] https://pl.wikipedia.org/wiki/Chi%C5%84ski_pok%C3%B3j
On Monday, 3 August 2020, <abstract-wikipedia-request(a)lists.wikimedia.org>
wrote:
>
> Today's Topics:
>
> 1. Re: Natural Language and Mathematics Generation (Adam Sobieski)
> 2. Re: Loose notes (Andy)
> 3. Re: Loose notes (Arthur Smith)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 3 Aug 2020 18:23:03 +0000
> From: Adam Sobieski <adamsobieski(a)hotmail.com>
> To: Charles Matthews <charles.r.matthews(a)ntlworld.com>, "General
> public mailing list for the discussion of Abstract Wikipedia (aka
> Wikilambda)" <abstract-wikipedia(a)lists.wikimedia.org>
> Subject: Re: [Abstract-wikipedia] Natural Language and Mathematics
> Generation
> Message-ID:
> <CH2PR12MB4184F2C81E4CD533ACFE9547C54D0(a)CH2PR12MB4184.namprd12.prod.outlook.com>
>
> Content-Type: text/plain; charset="utf-8"
>
> Charles,
>
> There is also MathML to consider. Work is underway at the W3C with respect
> to a new version of MathML, MathML4 [1][2]. Work is underway with respect
> to adding MathML support to Chromium [3][4].
>
> Instead of LaTeX, MathML could be the way to go.
>
>
> Best regards,
> Adam
>
> [1] https://www.w3.org/community/mathml4/
> [2] https://mathml-refresh.github.io/mathml/
> [3] https://www.chromestatus.com/feature/5240822173794304
> [4] https://mathml.igalia.com/
>
> From: Charles Matthews via Abstract-Wikipedia
> <mailto:abstract-wikipedia(a)lists.wikimedia.org>
> Sent: Monday, August 3, 2020 1:53 PM
> To: General public mailing list for the discussion of Abstract Wikipedia
> (aka Wikilambda)<mailto:abstract-wikipedia@lists.wikimedia.org>
> Subject: Re: [Abstract-wikipedia] Natural Language and Mathematics
> Generation
>
>
>
> On 03 August 2020 at 16:50 Adam Sobieski <adamsobieski(a)hotmail.com> wrote:
>
>
>
> By utilizing <math>LaTeX</math> elements in an XML-based intermediate
> output format, one could simply copy that mathematical content to the
> resultant output wikitext [3]. Wikitext utilizes this same convention for
> mathematical expressions [3].
>
>
>
> Whether or not to include mathematics in Abstract Wikipedia is an
> important decision to make at a future point. Choosing to include
> mathematics would entail discussions about representing mathematical
> knowledge on Wikidata. It would entail discussions about how specific
> senses of certain words have mathematical meaning. It would entail
> discussions about how algorithms should determine when to use mathematical
> and scientific notations and when they should, instead, use paraphrases
> with the semantic content expressed using natural language. These are just
> some of the discussion topics which would arise should we desire to include
> mathematical and scientific notations in Abstract Wikipedia articles.
>
>
>
>
>
> I'm disagreeing with much of this.
>
> On LaTeX: while it is "industry standard", I'd like to draw attention to a
> point made in
> https://en.wikipedia.org/wiki/Help:Displaying_a_formula#Rendering:
> "Latex does not have full support for Unicode characters, and not all
> characters render."
>
> It goes on to suggest that Vietnamese, for example, would not be well
> catered for, in terms of its diacritics.
>
> I appreciate that we are only talking currently about scoping, and
> high-level initial planning. But given AW's objectives, this is not a good
> sign, and I don't think we should just assume that LaTeX as an incumbent
> gets waved through. It is pre-Web, and something closer to HTML would be
> preferable, in my view.
>
> My background is in mathematics, and I began my Wikipedia career writing
> mathematics articles. There are certainly issues, such as prose/notation
> balance. Mathematical language is heavily overloaded, from the
> disambiguation aspect. But I'm not really recognising the landscape of
> issues set out there.
>
> Charles
>
>
Hi, Andrzej
The assumption at the moment is, I think, that we will be using the
Wikidata lexicographical data [1]. This is not yet as extensive as
Wiktionary data [2], but it addresses many of the integrity issues. As far
as I understand it, the modelling of Sense still suffers from the flaw that
a Sense is presented as a "child" of a Lexeme. So, for example, L1883-S1 is
a Sense of Lexeme L1883, representing the English verb to "be" with a gloss
of "exist" and a "synonym" relationship to L2148-S1, a Sense of Lexeme
L2148, representing the English verb to "exist". I could be wrong, but the
simple idea of a word-free Sense to which all languages can link is
implemented only through a possible link to a concrete Wikidata Item, so
both L1883-S1 and L2148-S1 are linked to Q468777 (existence) and Q203872
(being). Apart from that, a separate translation of each Sense into each
corresponding Sense in each language seems to be the intent, at present.
Wikidata also has Forms of Lexemes (but I didn't find "widziałem"). The
Lexeme L185 ("see") has a Form L185-F3 ("saw") but this has no link to Form
L18498-F1, the uninflected form of the verb to "saw" (unlike Wiktionary,
which supports homographs implicitly). Each form has "grammatical
features", showing that L185-F3 is the "simple past" of L185 but the same
string, "saw", is the "simple present" of L18498. It does not explicitly
say that this is not the case in the third person singular, but there is a
different form, L18498-F2, which is both "simple present" and "third-person
singular", so there may be a presumption that the more particular overrides
the more general.
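The "more particular overrides the more general" presumption could be
sketched as follows. This is only an illustration under stated assumptions:
the dictionaries are a hand-made stand-in for the Forms of L18498, the
representation "saws" for L18498-F2 is my assumption (it is not quoted
above), and select_form() is a hypothetical helper, not a Wikidata API.

```python
# Hand-made stand-in for the Forms of Lexeme L18498 (the verb "to saw").
FORMS_L18498 = [
    {"id": "L18498-F1", "representation": "saw",
     "features": {"simple present"}},
    {"id": "L18498-F2", "representation": "saws",   # assumed representation
     "features": {"simple present", "third-person singular"}},
]


def select_form(forms, context):
    """Pick the applicable Form with the most grammatical features.

    A Form applies when all of its features are present in the context;
    among applicable Forms, the most specific one wins.
    """
    candidates = [f for f in forms if f["features"] <= context]
    if not candidates:
        return None
    best = max(candidates, key=lambda f: len(f["features"]))
    return best["representation"]


print(select_form(FORMS_L18498, {"simple present", "third-person singular"}))
print(select_form(FORMS_L18498, {"simple present", "first-person singular"}))
```

With a third-person singular context both Forms apply and the more specific
F2 is chosen; with a first-person context only the general F1 applies.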
For "abstract" Senses, we could think of "abstract" as a new language, and
then have translations between "abstract" "language" and Senses in all
natural (and synthetic) languages. This would give you your "senses
dictionary" (and allow implied translations between any Senses linked to
the "abstract" Sense). When we need to generate a word in a particular
language, we would need to translate the "abstract" Sense to the target
language Lexeme and then consult the Forms of that Lexeme to identify which
ones are applicable, given the "grammatical features" of the context.
Plenty more work to be done!
Best regards,
Al.
[1]
https://www.wikidata.org/wiki/Wikidata:Lexicographical_data/Documentation
[2] https://www.aclweb.org/anthology/2020.ldl-1.12.pdf
On Monday, 3 August 2020, <abstract-wikipedia-request(a)lists.wikimedia.org>
wrote:
>
> Today's Topics:
>
> 1. Re: Comprehension questions (Charles Matthews)
> 2. Natural Language and Mathematics Generation (Adam Sobieski)
> 3. Re: Natural Language and Mathematics Generation (Charles Matthews)
> 4. Loose notes (Andy)
>
>
> ----------------------------------------------------------------------
>
>
> ------------------------------
>
> Message: 4
> Date: Mon, 3 Aug 2020 12:29:03 +0200
> From: Andy <borucki.andrzej(a)gmail.com>
> To: abstract-wikipedia(a)lists.wikimedia.org
> Subject: [Abstract-wikipedia] Loose notes
> Message-ID:
> <CAE2KeAK00kSL=jJp8gNGPNp_N8KGH0yXXUXKSa6XLM9R-ParvA@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi,
>
> Abstract Wikipedia offers two benefits:
>
> - First, it creates a multi-language corpus for training machine
> translation. The big disadvantage of the existing multi-language corpora is
> that most of the data comes from movie subtitles, which are very
> inaccurate.
>
> - Second, it will provide data for training Word Sense Disambiguation
> (WSD), and in many languages(!).
>
> The abstract form should contain a graph of senses. Will senses be chosen
> from English WordNet/UNL or from English Wiktionary? UNL is a good piece of
> work, but it has been inactive for years and is not evolving. Wiktionary
> senses have the advantage that they are grouped by etymology: quite
> different senses fall into different etymology groups. Will Abstract
> Wikipedia be linked with Wiktionary? Wiktionary sense numbers should then
> be persistent or, better, have unique identifiers. Wiktionary has the
> advantage that senses are translated into other languages, with the
> disadvantage that the translations point to words, not senses, in the
> other language. Alternatively, Abstract Wikipedia can have its own sense
> list with identifiers, but how would it link to Wiktionary?
>
> Graph: it should be possible to create text in many/all languages. For
> example, English has “I saw” where Polish has “widziałem”/“widziałam”:
> Polish needs gender, so the abstract form should carry the gender of the
> verb, even though some languages do not use it.
>
> The senses dictionary can grow gradually with the abstract text. If I edit
> abstract text, the editor should make me add a word with its senses to the
> dictionary if it does not exist, and let me add a new sense if it does not
> exist.
>
> What is needed:
>
> abstract text = corpus
>
> a growing dictionary of senses
>
> a growing mapping from senses to national-language sense dictionaries
>
> possibly a link with Wiktionaries
>
>
> Best regards,
>
> Andrzej
>
Note: graphs in XML form are too wordy; a form using braces, dots and @ may
be better.
The user opens a web editor and pastes raw English text, for example the
first paragraph (perhaps restricted to 1000 characters?) of
https://en.wikipedia.org/wiki/Linux. The text is tokenized by spaCy and
divided into sentences and words. Lemmas and parts of speech are found for
the words and phrases. Words change color and become clickable. The user can
choose a sense for a lexeme, add a sense, or add a lexeme. Next, the
structure of the sentence graph is shown; the user can change it and add
properties.
In the first stage, before this editor is built, the user can edit the graph
code in a special language, which must not be too wordy.
Best regards,
Andrzej
More good points, Adam...
At this stage, I can't say that formats bother me greatly, although clearly
we need to think about them.
We do have to start with Wikidata but I wonder whether we should also be
looking at our wiki of functions. Could we consider a mathematical
expression as a symbolic representation of an executable function?
I like the idea of a Wikipedia that will actually compute the result of a
function it is telling you about, not least because editors could verify
that the syntax is correct by testing the function. But if some expressions
are executable, that broadens the question of format. To have a string that
could be copied into a spreadsheet, for example, would be an interesting
function for many. So I'm wondering how far you can get by "labelizing"
JSON objects with computer language labels rather than natural language
ones. So our "multiply" function is "labelized" "=PRODUCT" and
E=PRODUCT(m,POWER(c,2))... or E=m*c^2...?
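As a rough sketch of what "labelizing" with computer-language labels might
mean, the snippet below keeps one JSON-like object for the expression and
renders it with two different label tables, then evaluates it. The function
identifiers ("multiply", "power"), the object layout and the helper names
are all assumptions for illustration, not anything defined for the wiki of
functions. The infix renderer is deliberately naive and ignores operator
precedence.

```python
import math

# JSON-like object for m * c^2, using hypothetical function identifiers.
EXPR = {"call": "multiply", "args": ["m", {"call": "power", "args": ["c", 2]}]}

# Two "languages" of labels for the same functions.
SPREADSHEET_LABELS = {"multiply": "PRODUCT", "power": "POWER"}
INFIX_LABELS = {"multiply": "*", "power": "^"}

# Executable meaning of each function, taking a list of argument values.
IMPLEMENTATIONS = {
    "multiply": math.prod,
    "power": lambda args: args[0] ** args[1],
}


def to_spreadsheet(node):
    """Render the object with spreadsheet-style function labels."""
    if not isinstance(node, dict):
        return str(node)
    args = ",".join(to_spreadsheet(a) for a in node["args"])
    return f"{SPREADSHEET_LABELS[node['call']]}({args})"


def to_infix(node):
    """Render the object with infix operator labels (no precedence handling)."""
    if not isinstance(node, dict):
        return str(node)
    return INFIX_LABELS[node["call"]].join(to_infix(a) for a in node["args"])


def evaluate(node, env):
    """Compute the value of the object, looking variables up in env."""
    if isinstance(node, dict):
        return IMPLEMENTATIONS[node["call"]]([evaluate(a, env) for a in node["args"]])
    return env[node] if isinstance(node, str) else node


print("E=" + to_spreadsheet(EXPR))
print("E=" + to_infix(EXPR))
print(evaluate(EXPR, {"m": 2, "c": 3}))
```

The point of the sketch is that the typography (spreadsheet string or infix
string) and the computed result all come from the same underlying object,
which is what would let an editor test an expression before publishing it.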
Thinking only about text, I think we are bound to take a broader WMF-wide
view because we should at least consider how we can meet the requirements
of each and every Wikipedia, without ignoring sister projects like
Wikiversity. That's not to advocate a free-for-all, but if we increasingly
represent the semantics of mathematical expressions, rather than their
typography, this gives us something that can be represented more
meaningfully in Wikidata and, from there, expressed in natural language as
well as in a variety of symbolic and even functional forms.
I happen to think it will also aid reuse of functions from the wiki, but I
haven't given that idea much thought.
Best regards,
Al.