[Wikitech-l] Spam filters for wikidata.org

4 Dec 2012


Hi!
Once wikidata.org allows for entry of arbitrary properties, we will need some
protection against spam. However, there is a nasty little problem with making
SpamBlacklist, AntiBot, AbuseFilter etc work with Wikidata content:
Wikibase implements editing directly via the API, without using EditPage. But
the spam filters usually hook into EditPage, typically using the EditFilter or
EditFilterMerged hooks, or EditFilterMergedContent respectively.
Wikibase has a utility class called EditEntity which implements many things
otherwise done by the EditPage: token checks, conflict detection and resolution,
permission checks, etc. We could just trigger EditFilterMergedContent there,
and also EditFilterMerged and EditFilter, though we would have to fake the
"text" for these.
There is one problem with this though: these hooks take as their first parameter
an EditPage object, and the handler functions defined in the various extensions
make use of this. Often that is just to get the context, like the page title,
but often enough also for non-trivial things, like calling EditPage::spamPage()
or even EditPage::spamPageWithContent().
How can we handle this? I see several possibilities:
1) change the definition of the hook so it just has a ContextSource as its
first parameter, and fix all extensions that use the hook. However, it is
unclear how functionality like EditPage::spamPageWithContent() can then be
implemented. EditPage::spamPage() could be moved to a utility class, or into
OutputPage.
2) emulate an EditPage object, using a proxy/stub/dummy object. This would need
a bit of coding, and it's prone to get out of sync with the real EditPage. But
things like spamPageWithContent() could be implemented nicely, in a content
model specific manner.
3) we could instantiate a dummy EditPage, and pass that to the hooks. But
EditPage doesn't support non-text content, and even if we force it, we are
likely to end up with an edit field full of json, if we are not very careful.
4) just add another hook, similar to EditFilterMergedContent, but more generic,
and call it in EditEntity (and perhaps also in EditPage!). If we want a spam
filter extension to work with non-text content, it will have to implement that
new hook.
What's the best option, do you think?
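To make option 4 a bit more concrete, here is a minimal sketch of what a
content-generic filter hook could look like. It is written in Python purely as
a language-neutral model, not as MediaWiki code: EditContext, Status,
register, run_content_filters and the blacklist are all hypothetical names
invented for this illustration. The point it tries to show is that a handler
only needs a context and the structured content, not an EditPage.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class EditContext:
    """Hypothetical stand-in for the context a filter needs (title, user)."""
    title: str
    user: str


@dataclass
class Status:
    """Collects errors reported by filters; the edit is OK while it is empty."""
    errors: List[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.errors


# A handler gets the context, the content as plain nested data, and a status.
Handler = Callable[[EditContext, object, Status], None]
_handlers: List[Handler] = []


def register(handler: Handler) -> Handler:
    """Register a filter for the (hypothetical) generic content hook."""
    _handlers.append(handler)
    return handler


def run_content_filters(context: EditContext, content: object) -> Status:
    """What an EditEntity-like class would call before saving."""
    status = Status()
    for handler in _handlers:
        handler(context, content, status)
    return status


def iter_strings(value: object) -> Iterator[str]:
    """Yield every string value found in nested dict/list content."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_strings(item)


BLACKLIST = ("spam.example",)  # assumed pattern list, for illustration only


@register
def blacklist_filter(context: EditContext, content: object, status: Status) -> None:
    """Flag any string value that contains a blacklisted pattern."""
    for text in iter_strings(content):
        for pattern in BLACKLIST:
            if pattern in text:
                status.errors.append(
                    f"blacklisted link in {context.title}: {pattern}"
                )
```

A caller would run the filters on the structured data and refuse the save if
the status is not ok, e.g. run_content_filters(EditContext("Q42", "Example"),
{"claims": [{"value": "http://spam.example/buy"}]}). No "text" has to be faked
in this model, which is the main attraction of a separate hook.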
There's another closely related problem, btw: showing captchas. How can that be
implemented at all for API-based, atomic edits? Would the API return a special
error that includes a link to the captcha image as a challenge, and then
require the captcha's solution via some special arguments to the module call?
How can an extension control this? How is this done for the API's action=edit
at present?
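For illustration, the challenge/response exchange described above could be
modelled roughly like this. This is a sketch in Python under stated
assumptions, not a description of how any existing module behaves: the
parameter names captchaid and captchaword, the error code, the URL layout and
the trigger rule are all placeholders chosen for the example.

```python
import secrets
from typing import Dict

# Challenges issued but not yet solved: captcha id -> expected answer.
_pending: Dict[str, str] = {}


def needs_captcha(params: dict) -> bool:
    """Assumed trigger rule: the edit data contains an external link."""
    data = params.get("data", "")
    return "http://" in data or "https://" in data


def issue_challenge(answer: str) -> dict:
    """Store a new challenge and describe it to the client."""
    captcha_id = secrets.token_hex(8)
    _pending[captcha_id] = answer
    # Hypothetical URL layout for the challenge image.
    return {"id": captcha_id, "url": f"/captcha/{captcha_id}.png"}


def handle_edit(params: dict, expected_answer: str = "7") -> dict:
    """Atomic edit call: either succeeds or returns a captcha challenge."""
    if needs_captcha(params):
        captcha_id = params.get("captchaid")
        word = params.get("captchaword")
        # Each challenge is single-use: it is removed when it is checked.
        answer = _pending.pop(captcha_id, None) if captcha_id else None
        if answer is None or word != answer:
            return {
                "error": {
                    "code": "captcha-required",
                    "captcha": issue_challenge(expected_answer),
                }
            }
    # The real save would happen here; the sketch only reports success.
    return {"edit": {"result": "Success"}}
```

The client would call once, receive the error with the challenge, and repeat
the same call with the two extra arguments filled in. The edit stays atomic
because nothing is saved until a call arrives that carries a valid solution.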
thanks,
daniel
