I am coming to the realization that machine learning is somewhat at odds
with wiki culture, but I'm going to keep trying to help in the way I know.
My new project proposal, WikiPragmatica (naming things is something I love
but am poor at, so sorry), is a machine-learning-based curation of
paraphrases. Its special sauce is that the curation retains the original
context, hence the nod to pragmatics. Once the graph is complete for a
corpus, you could still read the source material in the graph itself,
because the connectors (edges) point to the concepts that come before and
after.
This matters for fact checking, and for misinformation detection more
generally, because each node (a collection of sentences that paraphrase
one another) will carry Wikidata and other associated metadata. Thus, for
any speech, article, or propaganda piece, we can detect both deviations
from the usual pragmatics and statements that contradict the associated
metadata. A deviation from pragmatics may show up as a piece of
misinformation chaining together concepts that are rarely or never used in
sequence, say, giving the location of a pizza restaurant followed by a
discussion of child trafficking at that restaurant. The statement
"Vaccines cause autism" will be flagged immediately, because the metadata
for that concept will have a "not true" entry in its sentiment section, no
matter how someone chooses to phrase "Vaccines cause autism."
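To make the two detection modes above concrete, here is a minimal sketch, not the WikiPragmatica implementation. It assumes each node is a set of sentences that paraphrase one another, each edge counts how often one concept directly follows another in source documents, and each node carries a metadata dict with a hypothetical "sentiment" field. The paraphrase matching is a deliberately simple stand-in (exact lookup after normalizing case and punctuation) for the machine-learning model the proposal calls for. The names ConceptNode, ParaphraseGraph and flag, and the 0.01 rarity threshold, are all illustrative assumptions.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConceptNode:
    """A node: a set of sentences that paraphrase one another."""
    concept_id: str
    paraphrases: set[str]
    # Hypothetical metadata; a real system would link concepts to Wikidata.
    metadata: dict[str, str] = field(default_factory=dict)


class ParaphraseGraph:
    def __init__(self) -> None:
        self.nodes: dict[str, ConceptNode] = {}
        self._index: dict[str, str] = {}  # normalized sentence -> concept id
        # transitions[a][b] counts how often concept b directly follows concept a
        self.transitions: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))

    @staticmethod
    def _normalize(sentence: str) -> str:
        # Lowercase, trim surrounding spaces/periods/quotes, collapse whitespace.
        return " ".join(sentence.lower().strip(' ."').split())

    def add_node(self, node: ConceptNode) -> None:
        self.nodes[node.concept_id] = node
        for sentence in node.paraphrases:
            self._index[self._normalize(sentence)] = node.concept_id

    def lookup(self, sentence: str) -> Optional[str]:
        # Stand-in for a learned paraphrase matcher: exact match after normalizing.
        return self._index.get(self._normalize(sentence))

    def observe(self, concept_ids: list[str]) -> None:
        """Record the order of concepts in one source document (the edges)."""
        for prev, nxt in zip(concept_ids, concept_ids[1:]):
            self.transitions[prev][nxt] += 1

    def transition_share(self, prev: str, nxt: str) -> float:
        """Fraction of observed transitions out of `prev` that went to `nxt`."""
        following = self.transitions.get(prev, {})
        total = sum(following.values())
        return following.get(nxt, 0) / total if total else 0.0

    def flag(self, sentences: list[str], rare_threshold: float = 0.01) -> list[str]:
        """Flag metadata contradictions and rare concept sequences in a text."""
        flags: list[str] = []
        ids = [self.lookup(s) for s in sentences]
        for sentence, cid in zip(sentences, ids):
            if cid and self.nodes[cid].metadata.get("sentiment") == "not true":
                flags.append(f"contradicts metadata: {sentence!r}")
        for prev, nxt in zip(ids, ids[1:]):
            if prev and nxt and self.transition_share(prev, nxt) < rare_threshold:
                flags.append(f"rare concept sequence: {prev} -> {nxt}")
        return flags


graph = ParaphraseGraph()
graph.add_node(ConceptNode("restaurant_location", {"The restaurant is on Main Street."}))
graph.add_node(ConceptNode("restaurant_menu", {"The restaurant serves pizza."}))
graph.add_node(ConceptNode(
    "vaccines_autism",
    {"Vaccines cause autism.", "Autism is caused by vaccines."},
    {"sentiment": "not true"},
))
graph.observe(["restaurant_location", "restaurant_menu"])  # one "source document"

flags = graph.flag(["The restaurant is on Main Street.", "Autism is caused by vaccines."])
print(flags)
```

Here the rephrased claim is caught because it maps to the same node as "Vaccines cause autism." The jump from a restaurant's location to an unrelated concept is also flagged, because that transition was never observed in the source documents.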
The biggest problem with word-based approaches is that they all lack
higher-order contextual clues. The lexeme "autism" on its own is not as
useful for fact checking as the full concept "Vaccines cause autism."
However, the concept "Vaccines cause autism" will inherit all of the
context-appropriate metadata from the lexemes used to build it! We get all
of the power of Wikidata for free, just by re-indexing all Wikipedias to
paraphrases. This will also massively help with general search: Google and
other search indices suffer from a lack of contextual resolution. Even a
near-AI system like GPT-3 suffers from repetition, loss of coherence over
sufficiently long passages, and self-contradiction, meaning it
occasionally loses context (see the limitations section of OpenAI's GPT-3
paper, "Language Models are Few-Shot Learners"; the earlier GPT-2 paper is
here:
<https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf>).
Combining contextual forensics, metadata evaluation, and graph-based
analytics will provide highly accurate misinformation detection.
In my project proposal, I recommended that we ultimately re-index the web
to paraphrases. While this may sound a bit Dr. Evil-ish, the paraphrase
graph is essentially the same kind of thing as a Wikipedia. It also
documents human knowledge, but it is organized around how we humans
communicate that knowledge. Some of that communication is intended to harm
others through misinformation or outright fraud. The wiki community can
certainly build a reference work of all paraphrases, but sorting through
about 500 billion sentences seems daunting to me without some machine
help.
I do think that a reference work of human communication is in
Wikimedia's wheelhouse, but this reference work also has very practical
applications. Thank you for taking the time to read my thoughts on this
vital topic.
On Tue, Feb 9, 2021 at 5:35 PM Netha Hussain <nethahussain(a)gmail.com> wrote:
Hi all,
I am generally interested in any project that helps counter misinformation
on the internet, and I think that our existing projects have limitations in
calling out fake news. Wikipedia, for example, has dedicated pages on
misinformation in various topic areas (such as the article on COVID-19
misinformation <https://en.wikipedia.org/wiki/COVID-19_misinformation>)
where fact checking can be incorporated. However, such articles do not just
contain fact-checked statements; they deal with misinformation
comprehensively, covering its origin, extent and effects in addition to
commonly circulated bits of mis(dis)information. Another possibility on
Wikipedia is to create a list of commonly circulated misinformation on
notable themes (such as the List of unproven methods against COVID-19
<https://en.wikipedia.org/wiki/List_of_unproven_methods_against_COVID-19>).
It turns out that such lists cite several primary sources, because too few
secondary sources are available that call out the misinformation.
In the realm of misinformation, the existing primary and secondary sources
cover only the tip of the iceberg; far more misinformation circulates on
the internet than is documented by fact-checking websites and news media.
Another limitation is that it is not possible to add a piece of
misinformation that you found on social media to a Wikipedia page, because
that amounts to original research. Searchability is also an issue: our
search interface on Wikipedia is not well suited to someone who wants to
check whether a piece of information is true. What Wikipedia can do now is
give 'good information' to readers, and I see limitations in that when it
comes to calling out misinformation.
I am also thinking about using Wikidata to incorporate
misinformation-related data. What if we could model data on Wikidata to
show that AUTISM (item) is NOT CAUSED BY (property) vaccination, with a
reference to research from the WHO? Or that, according to the CDC,
COVID-19 cannot be cured by garlic? In this way, we could build up Listeria
lists, say, of 'unproven methods against COVID-19', making it easier for
readers to find misinformation. It would then be possible for search
engines to 'learn' to weed out misinformation by 'reading' from Wikidata.
The limitation here would be incorporating statements whose truth is
ambiguous, or for which we do not yet have sufficient evidence. Can yoga
cure back pain? Can vitamin E prevent ageing? These questions do not have
unambiguous answers. How can we deal with such situations? I think we'll
need robust policies in place on Wikidata before we try to incorporate
misinformation-related data there.
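To illustrate the modeling question raised above, here is a rough sketch in plain Python rather than actual Wikidata. It assumes each claim is stored as a subject/property/object triple with a reference and a hypothetical status field. That field separates settled claims ("refuted") from ambiguous ones ("insufficient evidence") such as the yoga and vitamin E questions. The property labels and status values are placeholders, not existing Wikidata properties. The build_list filter loosely plays the role that a Wikidata query plays behind a Listeria list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One claim stored as a subject/property/object triple (hypothetical model)."""
    subject: str
    prop: str       # placeholder property label, not a real Wikidata property ID
    obj: str
    reference: str  # source backing the statement
    status: str     # hypothetical: "refuted" or "insufficient evidence"


statements = [
    Statement("autism", "not caused by", "vaccination", "WHO", "refuted"),
    Statement("COVID-19", "not cured by", "garlic", "CDC", "refuted"),
    # Ambiguous claims are kept, but marked so they are not presented as settled.
    Statement("back pain", "cured by", "yoga", "none", "insufficient evidence"),
    Statement("ageing", "prevented by", "vitamin E", "none", "insufficient evidence"),
]


def build_list(statements: list[Statement], subject: str, status: str = "refuted") -> list[str]:
    """Collect claims about one subject that have the given status."""
    return [
        f"{s.subject} {s.prop} {s.obj} (source: {s.reference})"
        for s in statements
        if s.subject == subject and s.status == status
    ]


print(build_list(statements, "COVID-19"))
# ['COVID-19 not cured by garlic (source: CDC)']
print(build_list(statements, "back pain", status="insufficient evidence"))
# ['back pain cured by yoga (source: none)']
```

Keeping the status explicit means a generated list of "unproven methods" only ever shows claims marked as refuted. Deciding who may set that status is exactly the kind of policy question raised above.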
In summary, I think that starting a new project for fact checking is
justified, given that our existing projects have limitations in calling out
misinformation.
Regards
Netha
On Sat, 6 Feb 2021 at 03:16, Gerard Meijssen <gerard.meijssen(a)gmail.com>
wrote:
Hi,
Jimmy has a project that does exactly that.
Having said that, what we could do is have a project investigating the
missing information in Wikidata. The bias in Wikidata is alive and well. I
doubt, for instance, that there is a single city in Africa whose mayors are
all known in Wikidata. For North American and European towns this is
common. We do not even know all the national ministers of African countries
for the twentieth and twenty-first centuries.
Fact checking starts with having facts in dispute, and we don't even have
many of the basic facts. That is a project that we could and should have.
Thanks,
GerardM
On Thu, 4 Feb 2021 at 20:17, Leinonen Teemu <teemu.leinonen(a)aalto.fi>
wrote:
Hi all,
Has there been any discussion about starting a new Wikimedia project
focusing on fact checking?
Fact checking is of course at the core of editing Wikipedia, but I was
thinking about a wiki site dedicated to fact checking current events and
news. Why would this be important?
(1) There are many fact-checking sites in the English-speaking world but
far fewer elsewhere. I am afraid that the need for fact checking is even
greater in the rest of the world. {{Citation needed}}
(2) Our community is very well practiced in fact checking the wiki way.
Internationally, too, many of our community members are real fact champions
in their home countries and language groups. The practices of Wikipedia
could be applied to fact checking fast-moving current events and news, too.
(3) This could help us bring new young people into the movement, as getting
started editing Wikipedias is no longer so easy (because they are already
so good).
(4) In many parts of the world, fact checking can also be dangerous. With
our anonymous and community-driven practices and services, we could protect
fact checkers in those places.
I am not sure what the state of Wikinews is, but my impression is that it
is not really working. It was a good idea, but maybe a wiki, or the wiki
way, is not the way to produce news. The beautiful idea of citizen
journalism has also not really become reality. Maybe we could test whether
a wiki and the wiki way work better for fact checking.
Peace,
- Teemu
_______________________________________________
Wikimedia-l mailing list, guidelines at:
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and
https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: Wikimedia-l(a)lists.wikimedia.org
Unsubscribe:
https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
<mailto:wikimedia-l-request@lists.wikimedia.org?subject=unsubscribe>
--
Netha Hussain
(she/her)