Hi Amir,
I understand the process is different from usual research. In fact, I've seen Wikipedia grow from an unknown website into the biggest encyclopedia there is. I use it daily in multiple languages and love it. I know what crowdsourcing can achieve.
> It's also possible that the mere *finding* of these stumbling blocks by such a big, diverse, open, and active community, will itself be a contribution to the scientific knowledge around this subject.
I disagree here. It would be a contribution to scientific knowledge if and only if it hadn't been discovered before. My email was precisely about that: capitalizing on the knowledge that has already been acquired, to avoid making the same mistakes again. It would not matter for a small project, but this one is really ambitious. We are speaking of 40 years of work by a horde of talented and very knowledgeable people, so this isn't to be dismissed lightly.
The thing is, my previous email was a bit abstract, because it was a review of the paper, not of the project itself. I should have given more examples to illustrate where the problem lies.
Let's start with a simple example, in English, with the corresponding Wikidata entities in parentheses. I'm also using pseudo-Turtle notation with made-up relations.
France (Q142) is a country (Q6256).
<Q142> <rel_is> <Q6256> .
Creating the English sentence is straightforward with the naive approach presented in the paper.
What is the French equivalent?
La France est un pays.
More information is required in the abstract representation: the text generator needs to know the gender of both nouns (France and pays). So we need to extend the model as follows:
<Q142> <rel_gender> <Q1775415> .
<Q6256> <rel_gender> <Q499327> .
Fine! Now what about Chinese?
法國是一個國家。
What we have in the middle of the sentence is a classifier (個). The model needs the following update:
<Q499327> <rel_use_classifier> <Q63153> .
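To make this concrete, here is a minimal sketch (made-up relation names and data, not the project's actual design) of a naive per-language renderer. Note how every language added pulls more linguistic facts into the model; for simplicity I attach the classifier directly to the noun entity here:

```python
# Illustrative only: made-up relations over a tiny fact store.
FACTS = {
    ("Q142", "rel_is"): "Q6256",
    ("Q142", "rel_gender"): "feminine",    # needed for French
    ("Q6256", "rel_gender"): "masculine",  # needed for French
    ("Q6256", "rel_classifier"): "個",     # needed for Chinese
}
LABELS = {
    "en": {"Q142": "France", "Q6256": "country"},
    "fr": {"Q142": "France", "Q6256": "pays"},
    "zh": {"Q142": "法國", "Q6256": "國家"},
}

def render_is_a(subj, obj, lang):
    """Render '<subj> is a <obj>' in one language."""
    s, o = LABELS[lang][subj], LABELS[lang][obj]
    if lang == "en":
        # English needs no extra linguistic facts for this pattern.
        return f"{s} is a {o}."
    if lang == "fr":
        # French needs the gender of both nouns to pick determiners.
        det = "La" if FACTS[(subj, "rel_gender")] == "feminine" else "Le"
        art = "un" if FACTS[(obj, "rel_gender")] == "masculine" else "une"
        return f"{det} {s} est {art} {o}."
    if lang == "zh":
        # Chinese needs a classifier between the numeral and the noun.
        return f"{s}是一{FACTS[(obj, 'rel_classifier')]}{o}。"
    raise NotImplementedError(lang)
```

Three languages, and the renderer already branches three ways and consumes three different kinds of side data.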
To handle these 3 languages, the model already needs 3 additional triples just to account for linguistic facts occurring in them. Wikipedia exists in more than 300 languages, and the world has about 6000, each with particularities that must be taken into account. Fortunately, these features recur across languages. Nonetheless, the World Atlas of Language Structures (https://wals.info/chapter/s1) counts 144 distinct language features. Some relate only to speech, but this still means something like a hundred features must be accounted for in the data model to produce valid natural-language sentences.
Note that in the Chinese example, a numeral (一, one) also shows up. This is another phenomenon that must be taken into account, and it does not always appear when using 是 (to be). How complex will the "lambda" system need to be just to deal with this issue? Hint: very. It also needs to be compatible with dozens of other phenomena.
Each of these features then requires extensive and complete data. For French, the gender of every noun entity *must* be present; otherwise there is a fifty-fifty chance of producing a wrong sentence every time a noun entity is encountered. For Chinese and Japanese, classifier information must be present for every noun, in case one must be enumerated. Where will the project get this data from? (We are speaking of millions of items, most not referenced in existing dictionaries.) How will it be encoded? These are real questions that must be answered.
Suppose now we have done the work for "renderers" in these three languages. All three use a more or less similar A X B structure, where X is a verb meaning "to be".
What would be the Japanese equivalent?
The more natural structure would be:
フランスは国(だ)。
What is at play here is a topicalization (Q63105) of France, followed by a predicate (it's a country). This is very different from the previous structure and, unsurprisingly, needs its own representation. To make the situation more difficult, the previous (A be B) structure also exists in Japanese, but using it would lead to a totally different sentence.
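As a minimal illustrative sketch (labels and structure invented for the example), the Japanese "topic + predicate" pattern cannot share a template with the European "A is/est B" pattern:

```python
# Illustrative only: Japanese topicalization (X は Y (だ)) is a
# different surface structure, so the renderer needs its own template.
JA_LABELS = {"Q142": "フランス", "Q6256": "国"}

def render_topic_predicate(subj, obj, labels=JA_LABELS):
    # Topic marker は after the topicalized entity, then the predicate
    # noun; the copula だ is optional in plain style.
    return f"{labels[subj]}は{labels[obj]}だ。"
```

A system that only knows the "A be B" template either fails here or produces unnatural Japanese.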
The paper states that Figures 1 and 2 are examples that will be more complex in real life. Yet the use of any existing formalism is dismissed, which means all the situations I illustrated in this email will need to be dealt with in an ad hoc fashion. Moreover, changing the formalism (ad hoc or not) will require changing every piece of code and data that uses it. This will happen every time a language with unsupported features is added to the project. It's not hard to see how this will waste a huge amount of time and goodwill from the people involved. The very code-focused tone of the paper, the English-centric approach of the examples, and the lack of references all show that the complexity of the task on the NLP front is not sufficiently conceptualized.
Best Regards,
Louis Lecailliez
________________________________
From: Abstract-Wikipedia <abstract-wikipedia-bounces(a)lists.wikimedia.org> on behalf of abstract-wikipedia-request(a)lists.wikimedia.org <abstract-wikipedia-request(a)lists.wikimedia.org>
Sent: Saturday, 4 July 2020 15:06
To: abstract-wikipedia(a)lists.wikimedia.org <abstract-wikipedia(a)lists.wikimedia.org>
Subject: Abstract-Wikipedia Digest, Vol 1, Issue 6
Send Abstract-Wikipedia mailing list submissions to
abstract-wikipedia(a)lists.wikimedia.org
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia
or, via email, send a message with subject or body 'help' to
abstract-wikipedia-request(a)lists.wikimedia.org
You can reach the person managing the list at
abstract-wikipedia-owner(a)lists.wikimedia.org
When replying, please edit your Subject line so it is more specific
than "Re: Contents of Abstract-Wikipedia digest..."
Today's Topics:
1. Re: NLP issues severely overlooked (Charles Matthews)
2. Use case: generation of short description (Jakob Voß)
3. Re: NLP issues severely overlooked (Amir E. Aharoni)
----------------------------------------------------------------------
Message: 1
Date: Sat, 4 Jul 2020 14:05:09 +0100 (BST)
From: Charles Matthews <charles.r.matthews(a)ntlworld.com>
To: "General public mailing list for the discussion of Abstract
Wikipedia (aka Wikilambda)" <abstract-wikipedia(a)lists.wikimedia.org>
Subject: Re: [Abstract-wikipedia] NLP issues severely overlooked
Message-ID: <2126327926.39940.1593867909152(a)mail2.virginmedia.com>
Content-Type: text/plain; charset="utf-8"
It is interesting to be on a list where one can hear about software issues, and then computational linguistic problems. I'm not an expert in either area.
I do have 17 years of varied Wikimedia experience (and I use my real name there).
> On 04 July 2020 at 12:25 Louis Lecailliez <louis.lecailliez(a)outlook.fr> wrote:
>
<snip>
> Nothing precise is said about linguistic resources in the AW paper except for "These function finally can call the lexicographic knowlegde stored in Wikidata.", which is not very convincing: first because Wiktionary projects themselves severely lacks content and structure for those who has some content at all, secondly since specialized NLP ressources are missing there too (note: I'm not familiar with Wikidata so I could be wrong, however I never saw it cited for the kind of NLP resources I'm talking about).
>
I can comment about this. Besides Wiktionary, there is the "lexeme" namespace of Wikidata. It is a relatively new part of Wikidata, dealing with verbal forms.
>To finish on a positive note, I would like to highlight the points I really like in the paper. First, its collaborative and open nature, like all Wikimedia projects, gives him a much higher chance of success than its predecessors.
It is worth saying, for context, that there is a certain style or philosophy coming from the wiki side: more precisely, from the wikis before Wikipedia. There is the slogan "what is the simplest thing that would actually work?" You might argue, plausibly, that Wikipedia, at nearly 20 years old, shows that there is a bit more to engineering than that.
On the other hand, looking at Wikidata at seven years old, there is some point to the comment. It has a rather simple approach to linked structured data, compared to the Semantic Web environment. (Really, just write a very large piece of JSON and try to cope with it!) But the number of binary relations used (8K, if you count the "external links" handling) is now quite large, and has grown organically. The data modelling is in a sense primitive, sometimes non-existent. But the range of content handled really is encyclopedic. And in an area like scientific bibliography, at a scale of tens of millions of entities, the advantages of not much ontological fussiness begin to be seen.
Wikidata started as an index of all Wikipedia articles, and is now five times the size needed for that: a very enriched "index".
I suppose the NLP required to code up, for example, 50K chemistry articles about molecules, might be a problem that could be solved, leaving aside the general problems for the moment.
In any case, there is a certain approach, neither academic nor commercial, that comes with Wikimedia and its communities, and it will be interesting to see how the issues are addressed.
Charles Matthews (in Cambridge UK)
------------------------------
Message: 2
Date: Sat, 4 Jul 2020 08:18:56 +0200
From: Jakob Voß <jakob.voss(a)gbv.de>
To: <abstract-wikipedia(a)lists.wikimedia.org>
Subject: [Abstract-wikipedia] Use case: generation of short
description
Message-ID: <4403bbda-040b-6c89-9cb6-6540139250dc(a)gbv.de>
Content-Type: text/plain; charset="utf-8"
Hi,
I want to auto-generate disambiguation descriptions for African
politicians to be added to Wikidata. E.g., from the country Mozambique
(Q1029) the following descriptions should be generated:
Mozambican politician (en)
Mosambikanischer Politiker (de)
politico mozambicano (it)
...
This could be extended to other professions. My questions:
- Can anyone point me to data sources where best to look up country
adjectives such as "Mozambican"?
- Where/how best to store the lexical information for reuse with
other renderers?
- If I create small renderers for these short descriptions, what
architecture do you prefer for best reuse?
My just-get-it-done solution would be a set of CSV files and a few lines
of Perl code, but maybe this use case can be aligned with Abstract
Wikipedia to learn more about it.
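For what it's worth, a just-get-it-done sketch (in Python rather than Perl; the lookup tables below are invented placeholders, since a real source for the adjectives is exactly what is being asked for) might look like:

```python
# Hypothetical lookup tables; real data would have to come from an
# actual source (e.g. Wikidata lexemes), which still needs identifying.
COUNTRY_ADJ = {  # (country QID, language) -> demonym adjective
    ("Q1029", "en"): "Mozambican",
    ("Q1029", "de"): "mosambikanischer",
    ("Q1029", "it"): "mozambicano",
}
PROFESSION = {
    ("politician", "en"): "politician",
    ("politician", "de"): "Politiker",
    ("politician", "it"): "politico",
}
# Even this toy case needs per-language word order and capitalization.
TEMPLATE = {
    "en": "{adj} {prof}",
    "de": "{adj} {prof}",
    "it": "{prof} {adj}",
}

def short_description(country, profession, lang):
    adj = COUNTRY_ADJ[(country, lang)]
    prof = PROFESSION[(profession, lang)]
    text = TEMPLATE[lang].format(adj=adj, prof=prof)
    # English and German descriptions start with a capital letter.
    return text[0].upper() + text[1:] if lang in ("en", "de") else text
```

The interesting question is which parts of this (the adjective table, the templates) could live as shared, reusable renderer data rather than in a one-off script.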
Looking forward to collaborating,
Jakob
------------------------------
Message: 3
Date: Sat, 4 Jul 2020 18:03:24 +0300
From: "Amir E. Aharoni" <amir.aharoni(a)mail.huji.ac.il>
To: "General public mailing list for the discussion of Abstract
Wikipedia (aka Wikilambda)" <abstract-wikipedia(a)lists.wikimedia.org>
Subject: Re: [Abstract-wikipedia] NLP issues severely overlooked
Message-ID:
<CACtNa8t6kbWe21C980h1MxiWNfUp+0eDE82vPMjDUX2UCgb2gw(a)mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
Hi,
Thanks a lot for the sources. I am not one of the people implementing
Wikilambda; I am just very curious about it as a member of the wider
Wikimedia community. But there's a good chance that the sources will be
useful to the people who do work on the implementation.
I will dare to add a little thought I have about it, however. It's possible
that the challenge of building a well-functioning natural language
generator is underestimated by the founders, and that they don't pay enough
attention to existing work (although, knowing Denny, there is a good chance
that he actually is aware of at least some of it). But there is something
that the wide Wikimedia community has that I'm not sure that the past
projects in this field did: The community itself. A big, worldwide, and
diverse group of passionate volunteers, who love the idea of spreading free
knowledge and who love their languages. Quite a lot of them also know some
programming, and in the past they proved unbelievably creative and
productive when writing code for Wikimedia projects as a community, in the
form of templates, modules, gadgets, bots, extensions, and other tools. I'm
quite sure that once the new tools become usable, this community will start
doing creative things again, and it will also start reporting bugs and
limitations.
So yes, while it's possible that along the way both the core developers and
the volunteer community will find all kinds of stumbling blocks, I'm pretty
sure that they will also have all kinds of surprising success stories. It's
also possible that the mere *finding* of these stumbling blocks by such a
big, diverse, open, and active community, will itself be a contribution to
the scientific knowledge around this subject. And don't underestimate the
"open" part—that's where we really shine. This won't be a theoretical work
in a lab, published in a paywalled and copyright-restricted academic
journal, but fully optimized for accessibility to everyone.
Yes, this whole email from me is incredibly naïve, but it's the same
attitude that got us to writing the biggest and most multilingual
encyclopedia in history, so maybe we can do something cool again :)
--
Amir Elisha Aharoni · אָמִיר אֱלִישָׁע אַהֲרוֹנִי
http://aharoni.wordpress.com
“We're living in pieces,
I want to live in peace.” – T. Moore
On Sat, 4 Jul 2020 at 14:26, Louis Lecailliez <
louis.lecailliez@outlook.fr>:
> Hello,
>
> my name is Louis Lecailliez, PhD student in educational technology at
> Kyoto University. I'm a Computer Science and NLP graduate. One thing I do
> is model language learners' knowledge as graphs.
>
> The Abstract Wikipedia project is really interesting. There are, however,
> two very concerning issues I spotted when reading the associated paper draft (
> https://arxiv.org/abs/2004.04733). The following email could be read as
> negative, but please don't take it as such: my purpose is to avoid wasting
> people's effort and money on things that can (and need to!) be fixed upfront.
>
> 1. Issues with NLP
>
> The main issue is that the difficulty of the NLP task of generating
> natural text from an abstract representation is severely overlooked. This
> stems from the other main problem: the paper is not based on the decades of
> previous work in that space.
>
> As I understand it, the main value proposition of Abstract Wikipedia (AW)
> is a computer representation of encyclopedic knowledge that can be
> projected into different existing natural languages, with the goal of
> supporting a huge number of them. Plus, an editor to make this happen
> easily.
>
> This is in fact surprisingly close to what the Universal
> Networking Language (UNL) project, started 20 years ago, aims to do.
> UNL provides a language-agnostic representation of text that uses
> hypergraphs. Software (called an EnConverter) produces UNL graphs from
> natural text in a given language. Another kind of software, called a
> DeConverter, does the reverse, producing natural text from the abstract
> representation. This is exactly the function of the "renderers" in the
> AW paper. The way of doing it is also similar: by applying successive
> transformations until the final text string is produced. In general, this
> kind of abstract meaning representation is called an interlingua, and is
> widely used in Machine Translation (MT) systems.
>
> Disregarding two decades of work, in the UNL case, on the same problem
> space (rule-based machine translation, here from an abstract language as
> the fixed source language), itself based on a few more decades of work,
> does not seem a wise way to start a new project. For a start,
> the graph representation used in AW will likely not be expressive
> enough to encode linguistic knowledge; this is why UNL uses hypergraphs
> instead of graphs.
>
> The problem is glaring when looking at the reference list: it is
> bloated with irrelevant references (such as those to programming languages
> [27, 37, 41, 77], Turing completeness being the worst offender [11, 17, 23,
> ...]) while containing only two references [7, 85] to the really hard part
> of the project: generating natural language from the abstract
> representation. There are a few more relevant references about natural
> language generation, but this isn't enough.
>
> Interestingly, [85] is a UNL paper, but not the main one. Moreover, it is
> cited in Section 9, "Opening future research". It should instead be placed
> in a "Previous work" section, which is missing from the paper.
>
> To fill a part of this section yet to be written, I propose the following
> references:
> [*1] Uchida, H., Zhu, M., & Della Senta, T. (1999). A gift for a
> millennium. IAS/UNU, Tokyo.
>
> https://www.researchgate.net/profile/Hiroshi_Uchida2/publication/239328725_…
> [*2] Wang-Ju Tsai (2004) La coédition langue-UNL pour partager la révision
> entre langues d'un document multilingue. [Language-UNL coedition to share
> revisions in a multilingual document] Thèse de doctorat. Grenoble.
>
> https://pdfs.semanticscholar.org/b030/ea4662e393657b9a134c006ca5b08e8a23b3.…
> [3*] Boitet, C., & Tsai, W. J. (2002). La coédition langue<—> UNL pour
> partager la révision entre les langues d'un document multilingue: un
> concept unificateur. Proc. TALN-02, Nancy, 22-26.
>
> http://www.afcp-parole.org/doc/Archives_JEP/2002_XXIVe_JEP_Nancy/talnrecita…
> [4*] Tomokiyo, M., Mangeot, M., & Boitet, C. (2019). Development of a
> classifiers/quantifiers dictionary towards French-Japanese MT. arXiv
> preprint arXiv:1902.08061.
> https://arxiv.org/pdf/1902.08061.pdf
> [5*] Boguslavsky, I. (2005). Some controversial issues of UNL: Linguistic
> aspects. Research on Computer Science, 12, 77-100.
>
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.212.2058&rep=rep1&…
> [6*] Boitet, C. (2002). A rationale for using UNL as an interlingua and
> more in various domains. In Proc. LREC-02 First International Workshop on
> UNL, other Interlinguas, and their Applications, Las Palmas (pp. 26-31).
> https://www.cicling.org/2005/unl-book/Papers/003.pdf
> [7*] Dhanabalan, T., & Geetha, T. V. (2003, December). UNL deconverter for
> Tamil. In International Conference on the Convergences of Knowledge,
> Culture, Language and Information Technologies.
> http://www.cfilt.iitb.ac.in/convergence03/all%20data/paper%20032-372.pdf
> [8*] Singh, S., Dalal, M., Vachhani, V., Bhattacharyya, P., & Damani, O.
> P. (2007). Hindi generation from Interlingua (UNL). Machine Translation
> Summit XI.
>
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.78.979&rep=rep1&ty…
> [9*] Banarescu, L., Bonial, C., Cai, S., Georgescu, M., Griffitt, K.,
> Hermjakob, U., ... & Schneider, N. (2013, August). Abstract meaning
> representation for sembanking. In Proceedings of the 7th linguistic
> annotation workshop and interoperability with discourse (pp. 178-186).
> https://www.aclweb.org/anthology/W13-2322.pdf
> [10*] Berment, V., & Boitet, C. (2012). Heloise—An Ariane-G5 Compatible
> Rnvironment for Developing Expert MT Systems Online. In Proceedings of
> COLING 2012: Demonstration Papers (pp. 9-16).
> https://www.aclweb.org/anthology/C12-3002.pdf
> [11*] Berment, V. (2005). Online Translation Services for the Lao
> Language. In Proceedings of the First International Conference on Lao
> Studies. De Kalb, Illinois, USA (pp. 1-11).
>
> https://www.researchgate.net/profile/Vincent_Berment/publication/242140227_…
>
> [*1] is the paper that describes UNL. [2*] is a doctoral thesis discussing
> a core problem AW is trying to address too. [3*] is a short paper done in
> the scope of [2*]; even if you don't understand French, you can have a look
> at the figures: two of them are about an editor similar in principle to what
> AW wants to incorporate.
> [5*] gives insights into UNL expressivity issues, 10 years after the
> project's start. [6*] is more on UNL, with a short history and the context in which it is used.
>
> [4*] shows how deep natural-language conversion goes: this paper addresses
> the issue of classifiers in French and Japanese. This is just one
> linguistic issue, and there are dozens if not hundreds of such. An important
> point is that both of the languages involved need to be taken into account
> when modelling the abstract encoding; otherwise too much information is
> lost to produce a correct output.
>
> [7*] [8*] are very valuable examples of real-world deconverter systems for
> UNL. As is visible in [7*]'s Figure 1 and [8*]'s Figure 2, the process is
> *way* more complicated than a single "renderers" box. Moreover, there are
> very distinct identifiable steps, informed by linguistics. The AW paper
> does not describe any such structuring of the natural-text generation
> steps; everything is supposed to happen in some unstructured "lambda"
> system. Also missing are the specialized resources (UNL–Hindi dictionary,
> Tamil word dictionary, co-occurrence dictionary, etc.) required for the
> task. Nothing precise is said about linguistic resources in the AW paper
> except for "These function finally can call the lexicographic knowlegde
> stored in Wikidata.", which is not very convincing: first because the
> Wiktionary projects themselves severely lack content and structure, for
> those that have any content at all; second because specialized NLP
> resources are missing there too (note: I'm not familiar with Wikidata, so I
> could be wrong, but I have never seen it cited for the kind of NLP
> resources I'm talking about).
>
> [10*] is a translation system built with "specialised languages for
> linguistic programming (SLLPs)", which is the service Wikilambda is supposed
> to provide for Abstract Wikipedia. [11*] gives an estimate of 2500 hours
> for the development (by a specialist) of three linguistic modules for Lao
> processing.
>
> So, with regard to the difficulty of the task and the previous work in the
> literature, the AW paper does not provide any convincing evidence that the
> technology on which it is supposed to be built can even reach the
> state of the art. Dismissing every existing formalism and software system
> on the grounds of "no consensus commiting to any specific linguistic
> theory" is not going to work: it will result in an ad hoc,
> implementation-driven formalism that will have a hard time fulfilling its goal.
> The NLP part (generating sentences from the abstract representation) is the
> hardest part of the project, yet it's by far the least convincing one. "Abstract
> Wikipedia is indeed firmly within this tradition, and in preparation for
> this project we studied numerous predecessors." I would like to believe so,
> but the lack of corresponding references, as well as of a previous-work
> section, tends to prove the contrary.
>
> While I can't advise a switch to UNL, as I'm not a specialist in it, it
> would be smart to capitalize on the work done on it by highly skilled (PhD
> level) individuals. As the UNL system is built on hypergraphs, it could
> probably be made interoperable with RDF knowledge graphs fairly easily if
> named graphs are used. With a UNL/RDF specification (yet to be written), the
> vision set out in the AW paper might be reached sooner by reusing existing
> software (we are speaking of thousands of man-years of work, as per [11*])
> and, almost as importantly, an existing formalism that has been "debugged"
> for decades. There are probably other systems I'm unaware of that are worth
> investigating too, some, like [9*], having more specialized usage. In any
> case, there is a strong need to ground the paper and the project in the
> existing (huge) literature.
>
> 2. Other issues
>
> "In order to evaluate a function call, an evaluator can choose from a
> multitude of backends: it may evaluate the function
> call in the browser, in the cloud, on the servers of the Wikimedia
> Foundation, on a distributed peer-to-peer evaluation
> platform, or natively on the user’s machine in a dedicated hosting
> runtime, which could be a mobile app or a server on the
> user’s computer."
>
> This part is serious scope creep. There is no reason to turn the project
> into a distributed heterogeneous computing platform with a dedicated
> runtime, which could be a research project on its own, when the stated goal
> is to provide abstract multilingual encyclopedic content. All the
> computation can be done on servers (the cloud is servers too) and cached.
> This is far easier to implement, test, and deliver than 10 different
> backends with varying degrees of implementation progress,
> incompatibilities, and runtime characteristics.
>
> The paper presents AW as sitting on top of WL. Both are big projects.
> Building a big project on top of another one is really risky, as it means a
> significant milestone must first be reached in the dependency (here WL),
> which would likely take some years, before work on the other project can
> even start. AW can be realised with current tools and engineering
> practices.
>
> "One obstacle in the democratization of programming has been that almost
> every programming language requires first to learn some basic English."
>
> This strong claim needs to be sourced. Programming languages, save
> for a few keywords, don't rely much on English. The general failure of
> localized versions of programming languages (such as French BASIC), as well
> as the heavy use of existing programming languages in countries that don't
> even use the Latin alphabet (China, Russia), tends to show that English is
> not at all a bottleneck for the democratization of programming. [53] is
> cited later in the paper, but it is a pop-linguistics article from an
> online newspaper, not an academic article.
>
> 3. Final words
>
> To finish on a positive note, I would like to highlight the points I
> really like in the paper. First, its collaborative and open nature, like
> all Wikimedia projects, gives it a much higher chance of success than its
> predecessors had. If UNL is not well known, it's not because it didn't
> yield research achievements, but because one selected institution per
> language works on it and keeps the resources and software within the
> lab walls. Secondly, there are some very welcome out-of-scope features:
> conversion from natural language, and the restriction to encyclopedic-style
> text. This will allow a more focused effort towards the end goal, making it
> more achievable. And finally, the choice to go with a symbolic/rule-based
> system, with a touch of other ML where useful. This is, as said in the
> paper, a big win for explainability and for using human contributions to
> build the system. It will also keep the computing cost at a saner baseline
> than what current deep-learning models require.
>
> I think the project can succeed thanks to its openness, yet there are real
> dangers visible in the paper: on the NLP side, reinventing a wheel that
> took 40 years to build; and on the technical side, losing time and effort
> on a project not required per se for AW to be built.
>
> As I spent significant time (~10 hours) gathering references and writing
> this email (which is 5 pages long in Word), I would like to be mentioned as
> a co-author in the final paper if any idea or reference presented here is
> used in it.
>
> Best regards,
> Louis Lecailliez
>
> PS:
> 4. Typos
> * "These two projects will considerably expand the capabilities of the
> Wikimedia platform to enable every single human being to freely share share
> in the sum of all knowledge." => duplicate share
> * "The content is than turned into" => The content is then turned into
> * "[26] Charles J Fillmore, Russell Lee-Goldman, and Russell Rhodes. The
> framenet constructicon. Sign-based construction grammar, pages 309–372,
> 2012." => The framenet construction
> * "These function finally can call the lexicographic knowlegde stored in
> Wikidata." => These function finally can call the lexicographic knowledge
> stored in Wikidata
> * "[102] George Kinsley Zipf. Human Behavior and the Pirnciple of Least
> Effort. Addison-Wesley, 1949." => [102] George Kinsley Zipf. Human Behavior
> and the Principle of Least Effort. Addison-Wesley, 1949.
> * "Allowing the individual language Wikipedias to call Wikilambda has an
> addtional benefit." => Allowing the individual language Wikipedias to call
> Wikilambda has an additional benefit.
> _______________________________________________
> Abstract-Wikipedia mailing list
> Abstract-Wikipedia(a)lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia
>
------------------------------
------------------------------
End of Abstract-Wikipedia Digest, Vol 1, Issue 6
************************************************
Thanks, Charles.
I can certainly see the possibility of many interesting use cases there.
True or false questions would be an interesting game for our
natural-language renderers to play, for example. Given an inferred
statement supposed to be true, negate it. Test-setters might be expected to
correct errors of fact or expression, but that's up to them. It would be
interesting to monitor which statements they preferred to choose as True
and which as False, in any event.
Questions of the form: "choose the best answer from the following" could
also be a win-win if our renderers face difficulties selecting or
expressing some combination of facts.
Then there is the grading of information. Questions chosen for more basic
tests might be supposed to be more generally relevant than those chosen for
more advanced tests, which might feed back into the emphasis in the general
Wikipedia article (now complete with a slider bar for the reader's current
and/or target level of understanding, as well as competence in the
language).
And finally, renderer, given the pedagogue's valuable input into what is an
appropriate statement of fact here, please turn it into questions in many
languages!
Loving it...
Thank you again, Charles
Best regards,
Al.
>
> Today's Topics:
>
> 1. Re: How to store wikitext along the structured content?
> (Grounder UK)
> 2. Re: Comprehension questions (Charles Matthews)
>
>
Hello all,
one early question we are currently debating is how to store wikitext
documentation alongside the structured data.
So, the label of the page and aliases and the actual content object are
stored as JSON, but then we would like to have the documentation be more or
less normal wikitext.
So in this mockup
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Early_mockups#/media/Fil…
The text "en:Multiplication is a mathematical operation that...", that's
just wikitext. And it is different per language. Obviously, it would be
great to represent that as abstract content, but we are not there yet.
Until we get to that level of inception, the question is - where and how do
we store that text and how is it combined with the structured data about
the object on the page.
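Purely as an illustration of the shape such a page object could take (this is a sketch, not one of the actual options), the per-language documentation could sit next to the structured content in the same JSON:

```
{
  "label":   { "en": "multiply", "de": "multiplizieren" },
  "aliases": { "en": ["times"] },
  "content": { "...": "the structured content object itself" },
  "documentation": {
    "en": "Multiplication is a mathematical operation that...",
    "de": "Die Multiplikation ist eine mathematische Operation, die..."
  }
}
```

The trade-off with embedding wikitext in the JSON, versus storing it in separate per-language slots or subpages, is presumably what the task is weighing.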
James wrote up a task with an overview of the options.
https://phabricator.wikimedia.org/T258953
That's a question that would benefit from input.
Thank you,
Denny
> On 30 July 2020 at 10:47 Grounder UK <grounderuk(a)gmail.com> wrote:
>
> Did you have any questions, Charles? I'm not seeing any.
>
Ah. According to https://lists.wikimedia.org/pipermail/abstract-wikipedia/2020-July/000223.h… there may have been some glitch.
Resending.
Charles
/starts
I have been interested in edtech since 2012, when I did some work on Moodle for Wikimedia UK. The AW project has, for me, an obvious place for some educational development, and I'm dropping in the main lines of my thinking with this posting.
Firstly, it is a standard form of educational material to supply some material to read, or watch, and then some questions to answer. That can either be as a test, or as knowledge review/self-test. If AW is going to supply base code for articles - or let's say an article section - then questions could be appended in related code, and rendered together with it. For example the cloze (missing word) type of test would seem relatively easy to implement.
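To illustrate how lightweight the cloze type is, here is a toy sketch (no AW machinery assumed; the generated sentence and function names are made up):

```python
import re

def make_cloze(sentence, answer):
    """Blank out `answer` in `sentence`, returning the question and its key."""
    pattern = re.compile(re.escape(answer), re.IGNORECASE)
    question = pattern.sub("_____", sentence, count=1)
    return {"question": question, "answer": answer}

# A sentence as a renderer might emit it, turned into a self-test item:
q = make_cloze("France is a country in western Europe.", "country")
```

If AW already knows which word realises which abstract constituent, the blanked-out answer could be chosen per language for free, which is the appeal of doing this at the abstract level.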
Granted that this sort of application of AW, in fully multilingual form, is not so hard to envisage, what needs to be said at the current stage of prototyping? A few points:
(1) There is actually no de facto standard for multiple choice questions. AW could address this gap in the market.
(2) My experience with Moodle (which is a long story) suggests to me the basic architectural point that a question database should be the hub of an edtech system.
(3) Instructional design, which is a bit more than just having an edtech content management system, is not so hard to enable. The function wiki could enable it without a big stretch, I'd think.
I don't want to write a manifesto here, just yet. On point (1) there is Moodle XML, but it is clearly too rigid and limited. Magnus Manske at http://magnusmanske.de/wordpress/?p=446 has shown what Wikibase can do in this area, with a tool Comprende! - the overlap with the subject of this posting is no coincidence.
/ends
Slightly off-topic, or not quite back on-topic, yet within scope and looking
forward...
I was exploring this kind of issue with ArthurPSmith yesterday on
https://meta.wikimedia.org/wiki/Talk:Abstract_Wikipedia#Hybrid_article. I
ended up drafting some possible requirements (in the context of delivering
natural-language text to Wikipedias):
- *Requirement [a]*: Implicit and/or explicit exclusion of a Wikidata
Item (explicit could be a special case of implicit but not vice versa)
- *Requirement [b]*: Wikipedias can opt out of sets of Items (including
a single Item)
- *Requirement [c]*: Wikipedias can opt out of sets of images, templates
etc (including sets of one), and of specific Wikidata claims (or types of
claim, loosely defined...)
- *Requirement [d]*: Inclusion of anything subject to [c], if
unconditional, implies opt-out by Wikipedias opting out under [c] (opt-outs
are inherited upwards by unconditional inclusion)
Do these make sense? Would they address the problem being discussed here in
full, or only in part? I would put "short description" under [c], as a
"type of claim", but I was already thinking [c] could be divided up. (Now
it looks to me as if the "explicit" part of [a] is the same as the "single
item" in [b], but they are what they are, for now..."All work is
preliminary")
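To check that [b]-[d] hang together, here is a toy model (a sketch only; the structures and names are invented, not a proposed implementation):

```python
# Hypothetical model of requirements [b]-[d]: a wiki declares a set of
# excluded items/elements; an unconditional inclusion inherits any
# opt-out of its parts upwards ([d]).

def renderable(unit, optouts):
    """True if `unit` may be delivered to a wiki with the given opt-outs.

    `unit` is a dict with "items" (the Wikidata items it uses) and
    "includes" (sub-units paired with a flag saying whether the
    inclusion is unconditional).
    """
    if any(item in optouts for item in unit.get("items", [])):
        return False  # [b]/[c]: directly opted out
    for sub, unconditional in unit.get("includes", []):
        if unconditional and not renderable(sub, optouts):
            return False  # [d]: opt-out inherited upwards
    return True

# An article that unconditionally includes an infobox using item Q42:
infobox = {"items": ["Q42"]}
article = {"items": [], "includes": [(infobox, True)]}
```

Under this reading, a wiki opting out of Q42 loses the whole article, while one opting out of some unrelated item still gets it; if that is not the intended behaviour for [d], the requirement needs tightening.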
Best regards,
Al.
On Thursday, 30 July 2020, <abstract-wikipedia-request(a)lists.wikimedia.org>
wrote:
> Send Abstract-Wikipedia mailing list submissions to
> abstract-wikipedia(a)lists.wikimedia.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> https://lists.wikimedia.org/mailman/listinfo/abstract-wikipedia
> or, via email, send a message with subject or body 'help' to
> abstract-wikipedia-request(a)lists.wikimedia.org
>
> You can reach the person managing the list at
> abstract-wikipedia-owner(a)lists.wikimedia.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Abstract-Wikipedia digest..."
>
>
> Today's Topics:
>
> 1. Re: How to store wikitext along the structured content?
> (Peter Southwood)
> 2. Re: Off-topic, was How to store wikitext along the structured
> content? (Peter Southwood)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 30 Jul 2020 13:41:17 +0200
> From: "Peter Southwood" <peter.southwood(a)telkomsa.net>
> To: "'General public mailing list for the discussion of Abstract
> Wikipedia \(aka Wikilambda\)'"
> <abstract-wikipedia(a)lists.wikimedia.org>
> Subject: Re: [Abstract-wikipedia] How to store wikitext along the
> structured content?
> Message-ID: <003101d66666$582586c0$08709440$(a)telkomsa.net>
> Content-Type: text/plain; charset="utf-8"
>
> Amir, that discussion is not so much about undeploying the {{short
> description}} feature; it is about the WMF following up on what it undertook
> to do: to disable the display of unsourced and unchecked Wikidata content
> presented in a way that suggests it is Wikipedia content, once Wikipedia met
> the WMF's unilaterally imposed conditions against Wikipedia project
> consensus. I would have expected the closing to have been worked out ahead
> of time, a mere formality, but it seems that interminable squabbling is
> possible even in as straightforward-looking a situation as this.
>
> Cheers,
>
> Peter
>
>
>
> From: Abstract-Wikipedia [mailto:abstract-wikipedia-
> bounces(a)lists.wikimedia.org] On Behalf Of Amir E. Aharoni
> Sent: 30 July 2020 12:22
> To: General public mailing list for the discussion of Abstract Wikipedia
> (aka Wikilambda)
> Subject: Re: [Abstract-wikipedia] How to store wikitext along the
> structured content?
>
>
>
>
>
> On Thu, 30 Jul 2020 at 13:08, Peter Southwood <
> peter.southwood@telkomsa.net> wrote:
>
> Amir,
>
> “some people are now thinking of undeploying the {{short description}}
> feature”
>
> Who? Where is this being discussed? I am not sure what you mean by
> “undeploying the feature”
>
> Agree that it was a badly handled problem and massive time-sink, but would
> prefer that the baby is not thrown out with the bathwater.
>
> Cheers,
>
> Peter
>
>
>
> It is being discussed on
> https://en.wikipedia.org/wiki/Wikipedia:Village_pump_(WMF) right now, and a
> lot of different and contradictory opinions are being thrown around there.
Hello all,
one of the things we have been discussing in the team is that we want to do
as much of our work as possible in the open. At the same time, we're a
distributed team that is still forming a shared understanding of the task at
hand. Due to the COVID situation, we didn't have the opportunity to have a
project kick-off, where we would meet for a few days and make sure that we
are fully aligned, use the same words, and share the same thinking.
That's both an opportunity and a risk, as it might lead to divergence in
what we are saying and writing.
We have two possible ways forward: either we vet documents and discussions
internally every time, in order to present a more unified view of the
project, or we drop that and publish our documents and plans in the open
immediately, with the understanding that they are merely preliminary and
that there might be inconsistencies. We might discuss and disagree with
each other publicly in Phabricator tasks, on this mailing list, and on the
wiki pages - but in the end, this is also an opportunity to build a common
understanding together with you, and to share the process of developing the
project vision and implementation.
So, in that light: we still have a small backlog of internal documents that
we want to get out, and by the end of this week most of the current state of
the work should be in the open. We will move more and more of our
discussions to the public, to eventually have them all in the open.
Here is a document I have been working on for a while. It is the core model
of how the evaluation and representation of data, functions, and function
calls in Wikilambda may work. Again, there is no agreement on this yet. It
differs from the AbstractText prototype implementation (there is a list of
the main differences at the end), and it does not yet have all the answers.
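For readers who have not opened the document yet, the general flavour is that everything, including a function call, is a JSON object carrying its own type. A hand-written sketch, not taken from the document (the ZnKm key convention is real, but the specific IDs here are made up):

```python
# A function call represented as a ZObject: every object carries its type
# under "Z1K1"; a call names the function and supplies its arguments under
# further ZnKm keys. The IDs "Z7" and "Z144" are illustrative only.
call = {
    "Z1K1": "Z7",     # type: function call (illustrative ID)
    "Z7K1": "Z144",   # the function being called, e.g. an addition function
    "Z144K1": "2",    # first argument
    "Z144K2": "3",    # second argument
}

def evaluate(zobject, implementations):
    """Toy evaluator: look up a concrete implementation of the named
    function and apply it to the argument values, in key order."""
    fn_id = zobject["Z7K1"]
    args = [v for k, v in sorted(zobject.items()) if k.startswith(fn_id + "K")]
    return implementations[fn_id](*args)

# A (hypothetical) Python implementation registered for "Z144":
impls = {"Z144": lambda a, b: str(int(a) + int(b))}
```

The actual model handles typing, multiple implementations, and errors far more carefully; this only conveys the shape of the data.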
Thanks particularly to Arthur P. Smith for many comments and for rewriting
some of the sections; thanks to Lucas Werkmeister for his valuable input
(and, even more importantly, for his work on GraalEneyj); thanks to Cyrus
Omar for his advice and pointers; and thanks to Adam Baso, James Forrester,
and Nick Wilson for their internal comments.
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Function_model
Feedback on this would be extremely valuable, and you can see there are
many open questions left.
Stay safe,
Denny
I'm replying to Denny's original message. I've read other replies and
James's phabricator overview and I think I understand the problem. Except I
don't. So I'm stepping back to the requirements and constraints.
Constraint 1 is that text entered by humans must be stored as
human-readable text (encoded, obviously).
So, labels and aliases and "source code" are primarily text. If you need to
store the text as JSON, that's fine, but I was imagining that the human
text, as entered, would be translated into an abstract form (with Zs and Ks
etc) and it is the abstract form that gets stored as JSON. Yes, it's
dynamic (on-the-fly) during the editing process, but the human enters text
we care about and the machine turns it into an object we care about. The
text entered is a text and the interpreted text (object) is another text.
Translations are translations of a source. You can translate the text
entered into another language, and that's another text. Or you can
translate the derived form, and that's a different text. But if the
translation is fully automated, you might treat it as "mere" presentation:
a visualisation of underlying data. Maybe it makes sense to store such a
thing, especially if a human has seen it, even more so if they have relied
on it, but do we have a Requirement to store all translations up-front? I
don't think so. We store it in the language it was entered in (preferably
with metadata that identifies the language) and maybe we store it in a
small number of different languages (always having at least two would be
nice). Beyond that, I think you're talking about sub-pages per language
(but let's not jump to solutions).
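A minimal sketch of C1 as I read it (field names invented, not a design): the text-as-entered is immutable and language-tagged, and machine-derived forms live alongside it rather than replacing it.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class EnteredText:
    """C1: human input stored exactly as entered, with its language."""
    text: str
    language: str

@dataclass
class StoredObject:
    source: EnteredText                          # the text we care about
    derived: dict = field(default_factory=dict)  # machine products, keyed freely

entry = EnteredText("the successor of n", "en")
obj = StoredObject(source=entry)
obj.derived["abstract"] = {"kind": "interpreted form"}  # another text
obj.derived["fr"] = "le successeur de n"                # a translation; yet another text
# `source` is frozen: the original can never be overwritten by a derivation.
```

The point of the frozen source is exactly the one above: translations and interpreted forms are texts of their own, never substitutes for what was entered.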
Constraint 2 is that text is bound to its context.
A good example of this is comments in source-code. The word "in" indicates
the binding. The comment doesn't point to some text, it is the text, right
there, "in" the source code. Constraint 1 ("C1") applies: it is stored as
entered, where entered. If you later want to tidy up the source code and
replace comments with pointers to comments and/or translations, that's
fine. But C1 still applies, so you have a new version of the source-code
and you still have the old version.
When it comes to documentation outside of comments, that's just a text (as
written). It might be written as a multi-lingual text, but more than
bi-lingual is stretching it a bit for most of us. The bi-lingual text may
be a collaboration with a machine translator, but I would only see that as
a Requirement when one of the languages is WMF's own synthetic language
(ZKspeak, to coin a phrase). That is, I might compose my text in DeepL and
paste its English into the ZObject documentation. For us, that is the text
as written (C1 applies). If I also paste in the text I composed in a
different language, that's fine; that's another text as written (C1
applies). (If I make a comment to that effect, that's just text where
entered, but it's interesting metadata, so there may be a Requirement to
capture the metadata. Either way (or both ways) C1 applies.)
Requirement 1 might be that any text can be entered as Wikitext.
Ah, but the JSON can't be Wikitext... Well, that isn't the Requirement. We
can enter the text as Wikitext (so C1 applies). If it must be translated
into text that can be JSON, that's fine. We still have the Wikitext and now
we also have a translation; that's another text.
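As a sanity check on the "JSON can't be Wikitext" worry (a trivial sketch): wikitext is just a string, so it passes through a JSON blob unchanged; the distinction is about roles, not encoding.

```python
import json

# Wikitext survives a round-trip through JSON byte-for-byte.
wikitext = "'''Bold''' text with a [[link]] and a {{template}}."
blob = json.dumps({"documentation": {"en": wikitext}})
restored = json.loads(blob)["documentation"]["en"]
```

So storing the text-as-entered inside a JSON structure loses nothing; the interpreted ZK form is a separate, additional text.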
Does the above guide us toward a Solution? Well, it's not A, because we
don't have many translations in the JSON blob. But maybe we have three: the
source human text, the interpreted ZK text and a translation into a second
language.
It's not B (but I don't understand B). I think we do have "secondary
wikitext" but it might be implemented as "primary wikitext" with secondary
translations as sub-pages (somewhat optionally), as in Meta. It would be
the JSON blob that would be secondary (in a logical sense): some
transformation of a primary text. If the JSON needs to be primary, you can
treat it that way; then its human source pretends to be "about" the primary
object.
It's a bit like C, but it's not a big blob and it's probably not parallel.
Maybe it's a primary Meta-like wiki that is linked by common reference
(ZID) to the JSON blobosphere. Well, that sounds a lot like D, but...
It's not D, because the Meta-like wiki page for a ZObject is not a sub-page
of a non-wiki page. Wikipedia pages are not sub-pages of their Wikidata
Item's page, but you can look at them as if they are. We can link from one
Wikipedia to another directly, or we can link through Wikidata. I know
callable functions are a bit different but, as I said at the beginning, I
don't understand the problem. Hopefully this input will still be of benefit
to somebody who does, however.
Best regards,
Al.
On Wednesday, 29 July 2020, <abstract-wikipedia-request(a)lists.wikimedia.org>
wrote:
>
>
> Today's Topics:
>
> 1. Re: Two different kinds of information? (Andy) (Denny Vrandečić)
> 2. How to store wikitext along the structured content?
> (Denny Vrandečić)
> 3. Re: Two different kinds of information? (Denny Vrandečić)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 28 Jul 2020 13:57:52 -0700
> From: Denny Vrandečić <dvrandecic(a)wikimedia.org>
> To: "General public mailing list for the discussion of Abstract
> Wikipedia (aka Wikilambda)" <abstract-wikipedia@lists.
> wikimedia.org>
> Subject: Re: [Abstract-wikipedia] Two different kinds of information?
> (Andy)
> Message-ID:
> <CA+bik1eS56HuAqtd6O-4OS-kexUzfvu0u4hsYfXtxc83Fms42w@
> mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Al,
>
> just one quick request - can you set up your answers to the mailing list in
> such a way that it doesn't break the thread? (I am not sure how, maybe
> someone else can chime in, but right now, your answers start a new thread
> instead of continuing the previous one).
>
> Louis Lecaillez had a similar issue initially, but managed to resolve it,
> for which I am thankful.
>
> If not, it is OK, but I thought I'd ask.
>
> Thank you!
> Denny
>
>
> On Tue, Jul 28, 2020 at 8:23 AM Grounder UK <grounderuk(a)gmail.com> wrote:
>
> > Hi, Andy! Welcome!
> > I do like your idea of being clear about basic "facts" and details. I
> > think it will be key in the selection of "statements" that go into an
> > "article", in whatever language is required. I don't think we can say how
> > many levels of information there might be, but we can already see
> something
> > from how Wikipedia pages are put into categories.
> >
> > "France is a country in Europe" and "in western Europe" and "in the
> > European Union", just to mention three categories. The first is an
> > important fact of geography, but is the second more helpful? All
> countries
> > in western Europe are (1) a country and (2) in Europe and (3) to the
> west.
> > (3) feels more like a detail, but if we tell you France is in Europe,
> what
> > is the first question you might ask? It might be, "Is it in the European
> > Union?" or "How big is it?" or "Do many people live there?" So I would
> > expect us to give you those facts or details (FAQs) as well.
> >
> > Facts about facts and statements about claims are a whole other topic,
> but
> > if a "fact" is disputed, we do need to know how to show this. If you look
> > at Wikidata, you will see that the United Kingdom has been a sovereign
> > state since 1927. This is untrue. But if 1927 is not the answer to the
> > question "How long has the UK been a country (or sovereign state)?", what
> > is? "Since 1707, 1801 or 1922", depending on the details. Luckily for
> you,
> > France has "always" been a country, despite now being the fifth republic
> > (since 1958).
> >
> > So, sometimes the Property of an entity is not a simple value or
> > relationship. It might be better to think about it as a relationship to a
> > "disagreement" or debate. Then, a "fact" is an entity's relationship to
> an
> > absence of "disagreement", a "consensus", as Wikipedia would call it.
> Part
> > of this consensus is the meaning of an entity's label. For example,
> English
> > Wikipedia thinks "oxygen" is the chemical element ("O") and "its most
> > stable form" ("O<sub>2</sub>", "dioxygen"). French Wikipedia thinks
> > "oxygène" is just the element. Wikidata has statements (mostly) about the
> > element but the "Identifiers" (external authorities) are for the English
> > Wikipedia concept, not the French one. The point is, it is clear that
> there
> > might be some confusion! We have a separate item for dioxygen and for
> ozone
> > and (in theory) for atomic oxygen (and there are others) so we can give
> you
> > all of the oxygen facts, mostly grouped by form (allotrope and/or state).
> > Think of that as a disambiguation page enriched with detail... It's an
> > interesting use case (or test case), I think.
> >
> > Best regards,
> > Al.
On Wed, 29 Jul 2020 at 01:51, Grounder UK <grounderuk(a)gmail.com> wrote:
> Well, let's see, Denny...
> I predict that omitting the name of the person I'm replying to will have
> the effect you desire. (How logical is that?)
> Fingers crossed! (...sorry, everyone)
> Al.
Hey all -
Yes, it is getting hard, but it is also nice to see a constant influx of
curious people joining in, and some very in-depth discussions on the list.
Expecting email replies to the list to be tidy is a bit much in 2020; the
general use of good style has been deteriorating for at least the last two
decades...
If there is going to be some effort on this front, I think it should start
with adding more instructions to the footer, and then perhaps establishing
an introduction and an FAQ list?
Best, Z
Well, let's see, Denny...
I predict that omitting the name of the person I'm replying to will have
the effect you desire. (How logical is that?)
Fingers crossed! (...sorry, everyone)
Al.