I am coming to the realization that machine learning is somewhat counter to wiki-culture, but I'm going to keep trying to help in the way I know. My new project proposal, WikiPragmatica (naming things is something I love but am poor at, so sorry), is a machine-learning-based curation of paraphrases. Its special sauce is that the curation retains the original context, hence the nod to pragmatics. When complete for any corpus, you could still read the source material in the graph itself, as the connectors (edges) point to which concepts come prior and next.
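
To make that structure concrete, here is a minimal sketch in Python of what such a graph could look like. All of the names are my own illustrative assumptions (it also assumes every node holds at least one phrasing), not an implementation:

    from dataclasses import dataclass, field

    @dataclass
    class ConceptNode:
        """A node: a cluster of sentences that paraphrase one another."""
        node_id: str
        paraphrases: set = field(default_factory=set)
        metadata: dict = field(default_factory=dict)  # e.g. linked Wikidata entries

    @dataclass
    class ParaphraseGraph:
        """Edges record which concept follows which in the source text,
        so the original discourse can still be read off the graph."""
        nodes: dict = field(default_factory=dict)   # node_id -> ConceptNode
        edges: dict = field(default_factory=dict)   # prior node_id -> [next node_ids]

        def add_transition(self, prior_id, next_id):
            self.edges.setdefault(prior_id, []).append(next_id)

        def read_forward(self, start_id):
            """Walk prior -> next edges to recover one path through the source."""
            node_id, seen = start_id, set()
            while node_id in self.nodes and node_id not in seen:
                seen.add(node_id)
                yield next(iter(self.nodes[node_id].paraphrases))
                successors = self.edges.get(node_id, [])
                if not successors:
                    break
                node_id = successors[0]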

This is important to fact checking, as well as to the more general problem of misinformation detection, as each node (a collection of statements that are paraphrases of one another) will carry Wikidata and other associated metadata. Thus, for any speech, article, or propaganda piece, we can detect deviations from pragmatics as well as statements that contradict the associated metadata. The deviation from pragmatics may stem from a piece of misinformation using a string of concepts rarely or never used together, say giving the location of a pizza restaurant, followed by a discussion of child trafficking at that restaurant. A statement such as "Vaccines cause autism" will immediately be flagged, as the associated metadata for that concept will carry an entry in the sentiment section marking it "not true," no matter how the author chooses to say "Vaccines cause autism."
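
As a hedged sketch of those two checks, assuming the graph above plus a paraphrase_index that maps any known phrasing to its node id (in practice that lookup would be a learned paraphrase model, not an exact-match dict):

    def flag_by_metadata(graph, paraphrase_index, sentence):
        """Flag a sentence whose concept cluster is marked 'not true',
        however the sentence happens to be phrased."""
        node_id = paraphrase_index.get(sentence)
        if node_id is None:
            return None  # unknown concept; fall back to pragmatic analysis
        node = graph.nodes[node_id]
        if node.metadata.get("sentiment") == "not true":
            return f"flagged: concept {node_id} is marked not true"
        return None

    def transition_is_rare(graph, prior_id, next_id, min_count=5):
        """Deviation from pragmatics: a prior -> next concept pair rarely
        or never seen in the corpus (pizza-restaurant location followed
        by child trafficking) is itself a warning sign."""
        return graph.edges.get(prior_id, []).count(next_id) < min_count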

The biggest problem with word-based approaches is that they all lack higher-order contextual clues. Autism the lexeme is not as useful for fact checking as the full context of the concept, "Vaccines cause autism." However, the concept "Vaccines cause autism" will inherit all of the context-appropriate metadata from the lexemes used to build it! We get all of the power of Wikidata for free, just by re-indexing all Wikipedias to paraphrases. This will also massively help with general search: Google and all other search indices suffer from a lack of contextual resolution. Even the near-AI of GPT-3 suffers from repetition, loss of coherence over sufficiently long passages, and contradiction, meaning it occasionally loses context (as noted in the GPT-3 paper). The combination of contextual forensics, metadata evaluation, and graph-based analytics will provide highly accurate misinformation detection.
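
A minimal sketch of that inheritance, assuming each lexeme already carries a dict of Wikidata-derived entries (again, names are illustrative only):

    def inherit_metadata(lexeme_metadata, concept_lexemes, concept_overrides=None):
        """Build a concept node's metadata as the union of its lexemes'
        metadata, so 'Vaccines cause autism' carries the entries for
        'vaccine' and 'autism' plus any concept-level annotation."""
        merged = {}
        for lexeme in concept_lexemes:
            for key, value in lexeme_metadata.get(lexeme, {}).items():
                merged.setdefault(key, value)
        merged.update(concept_overrides or {})  # e.g. {"sentiment": "not true"}
        return merged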

In my project proposal, I recommended that we ultimately re-index the web to paraphrases. While this may sound a bit Dr. Evil-ish, the paraphrase graph is really the same kind of thing as a Wikipedia: it too documents human knowledge, but it is organized around how we humans communicate that knowledge. Some of that knowledge is intended to harm others via misinformation or outright fraud. The wiki community can certainly build a reference work of all paraphrases, but sorting through roughly 500 billion sentences seems daunting to me without some electrical help.

I do think that a reference work of human communication is in Wikimedia's wheelhouse, but this reference work also has very practical applications. Thank you for taking the time to read my thoughts on this vital topic.
 
On Tue, Feb 9, 2021 at 5:35 PM Netha Hussain <nethahussain@gmail.com> wrote:
Hi all,

I am generally interested in any project that helps counter misinformation on the internet, and I think that our existing projects have limitations in calling out fake news. Wikipedia, for example, has dedicated pages surrounding misinformation in various topic areas (such as this article on Misinformation related to COVID-19) where fact checking can be incorporated. However, such articles do not contain only fact-checked statements; they deal with misinformation in a comprehensive way, covering the origin, extent, and effects of misinformation in addition to commonly circulated bits of mis(dis)information. Another possibility on Wikipedia is to create a list of commonly circulated misinformation on notable themes (such as this article on List of unproven methods against COVID-19). It turns out that such lists contain several primary sources as citations, because there are too few available secondary sources that call out misinformation.

In the realm of misinformation, the existing primary and secondary sources cover only the tip of the iceberg; there is far more misinformation circulating on the internet than is being documented by fact-checking websites and news media. Another limitation is that it is not possible to add a piece of misinformation that you found on social media to a Wikipedia page, because that amounts to original research. Searchability is also an issue: our search interface on Wikipedia is not exactly suitable for someone who wants to check whether a piece of information is true. What Wikipedia can do now is give 'good information' to readers, and I see that this has limitations when it comes to calling out misinformation.

I am also thinking about using Wikidata for incorporating misinformation-related data. What if we could model data on Wikidata to show that AUTISM (item) is NOT CAUSED BY (property) vaccination, giving a reference to research from the WHO? That COVID-19 cannot be cured by garlic, according to the CDC? In this way, we could build up Listeria lists, say of 'unproven methods against COVID-19', making it easier for readers to find misinformation. It would then be possible for search engines to 'learn' to weed out misinformation by 'reading' from Wikidata. The limitation in this case would be incorporating statements where the truth is somewhat ambiguous, or where we do not yet have sufficient evidence. Can yoga cure back pain? Can vitamin E prevent ageing? These questions do not have unambiguous answers. How can we deal with such situations? I think we'll need robust policies in place on Wikidata before we try to incorporate misinformation-related data there.
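
For illustration only, here is a rough sketch of the shape such claims could take, written in Python rather than real Wikidata syntax. The NOT CAUSED BY property does not exist on Wikidata today, and the certainty field is my own hypothetical way of carrying the ambiguous cases:

    from dataclasses import dataclass

    @dataclass
    class Claim:
        subject: str    # item, e.g. "autism"
        prop: str       # hypothetical property, e.g. "not caused by"
        value: str      # e.g. "vaccination"
        source: str     # who says so, e.g. "WHO"
        certainty: str  # "established" / "disputed" / "insufficient evidence"

    claims = [
        Claim("autism", "not caused by", "vaccination", "WHO", "established"),
        Claim("COVID-19", "not cured by", "garlic", "CDC", "established"),
        Claim("back pain", "cured by", "yoga", "various studies",
              "insufficient evidence"),
    ]

    # A fact checker (or search engine) would treat only "established"
    # claims as settled; everything else needs human judgement and policy.
    settled = [c for c in claims if c.certainty == "established"]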

In summary, I think that starting a new project for fact checking is justified, given that our existing projects have limitations in calling out misinformation. 


Regards
Netha




On Sat, 6 Feb 2021 at 03:16, Gerard Meijssen <gerard.meijssen@gmail.com> wrote:
Hoi,
Jimmy has a project that does exactly that.

Having said that, what we could do is have a project investigating the missing information in Wikidata. The bias in Wikidata is alive and well. I doubt, for instance, that there is one city in Africa all of whose mayors are known in Wikidata. For North American and European towns this is common. We do not even know all the national ministers of African countries for the twentieth and twenty-first centuries.
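
One way to start probing that gap, sketched in Python against the public Wikidata Query Service; the query simply lists African cities with no head-of-government statement at all, so treat it as a starting point, not a finished audit:

    import requests

    # African cities (instance of a subclass of city, in a country on the
    # continent Africa) that have no head of government (P6) statement.
    QUERY = """
    SELECT ?city ?cityLabel WHERE {
      ?city wdt:P31/wdt:P279* wd:Q515 .   # instance of (a subclass of) city
      ?city wdt:P17 ?country .            # country
      ?country wdt:P30 wd:Q15 .           # continent: Africa
      FILTER NOT EXISTS { ?city wdt:P6 ?anyone . }
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    LIMIT 100
    """

    resp = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "wikidata-gap-probe/0.1 (example)"},
    )
    for row in resp.json()["results"]["bindings"]:
        print(row["cityLabel"]["value"])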

Fact checking starts with having facts in dispute, and we don't even have many of the basic facts. That is a project that we could and should have.
Thanks,
      GerardM

On Thu, 4 Feb 2021 at 20:17, Leinonen Teemu <teemu.leinonen@aalto.fi> wrote:
Hi all,

Has there been any discussion to start a new Wikimedia project focusing on fact checking?

Fact checking is of course at the core of editing Wikipedia, but I was thinking about a dedicated wiki site for fact checking current events and news. Why would this be important?

(1) There are many fact-checking sites in the English-speaking world but far fewer elsewhere. I am afraid that there is an even greater need for fact checking in the rest of the world. {{Citation needed}}

(2) Our community is very well trained to do fact checking the wiki-way. Again, internationally, many of our community members are real fact champions in their home countries and language groups. The practices of Wikipedia could be applied to fact checking fast-moving current events and news, too.

(3) This could help us bring new young people into the movement, as editing the Wikipedias is no longer such an easy place to start (because they are already so good).

(4) In many parts of the world, fact checking can also be dangerous. With our anonymous and community-driven practices and services, we could protect those fact checkers.

I am not sure what the state of Wikinews is, but my impression is that it is not really working. It was a good idea, but maybe a wiki, or the wiki-way, is not the way to produce news. The beautiful idea of citizen journalism has also not really become reality. Maybe we could try whether the wiki and the wiki-way work better for fact checking.

Peace,

         - Teemu




--
Netha Hussain
(she/her)


_______________________________________________
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, <mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>