Hi!
Once wikidata.org allows entering arbitrary properties, we will need some protection against spam. However, there is a nasty little problem with making SpamBlacklist, AntiBot, AbuseFilter etc. work with Wikidata content:
Wikibase implements editing directly via the API, without using EditPage. But the spam filters usually hook into EditPage, typically via the EditFilter, EditFilterMerged or EditFilterMergedContent hooks.
Wikibase has a utility class called EditEntity which implements many things otherwise done by EditPage: token checks, conflict detection and resolution, permission checks, etc. We could just trigger EditFilterMergedContent there, and also EditFilterMerged and EditFilter, though for the latter two we would have to fake the "text" parameter, since they expect plain text.
There is one problem with this, though: these hooks take an EditPage object as their first parameter, and the handler functions defined in the various extensions make use of it. Often this is just to get context like the page title, but often enough it is also for non-trivial things, like calling EditPage::spamPage() or even EditPage::spamPageWithContent().
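To make the coupling concrete, here is a rough Python sketch (not MediaWiki's PHP code) of what happens if EditEntity fires a legacy text-based hook: it fakes the "text" by serializing the entity to JSON, and passes whatever editor object it has. All names (ApiEditor, spam_handler, run_edit_filter, spam_page_with_content) are hypothetical stand-ins for EditEntity's side, an extension's EditFilterMerged handler, and EditPage::spamPageWithContent().

```python
import json


class ApiEditor:
    """Hypothetical stand-in for what EditEntity could pass as the first hook parameter.

    It carries context (the page title) but none of EditPage's UI methods.
    """

    def __init__(self, title):
        self.title = title


def spam_handler(editor, text, errors, summary):
    """Hypothetical extension handler written against EditPage.

    Reading context (editor.title) works with any object that has a title,
    but on a match it also calls an EditPage-only UI method.
    """
    if "spam.example" in text:
        errors.append(f"Blocked link on {editor.title}")
        editor.spam_page_with_content("spam.example")  # exists only on EditPage
        return False
    return True


def run_edit_filter(editor, entity, summary):
    """Fire the legacy-style hook with faked text: the entity serialized as JSON."""
    fake_text = json.dumps(entity, sort_keys=True)
    errors = []
    ok = spam_handler(editor, fake_text, errors, summary)
    return ok, errors
```

A clean entity passes, but as soon as the handler takes its "spam found" branch it fails with an AttributeError, because the API-side editor object has no spam_page_with_content(). That is exactly the part that options 1 to 4 have to deal with.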
How can we handle this? I see several possibilities:
1) change the definition of the hook so it just has a ContextSource as its first parameter, and fix all extensions that use the hook. However, it is unclear how functionality like EditPage::spamPageWithContent() could then be implemented. EditPage::spamPage() could be moved to a utility class, or into OutputPage.
2) emulate an EditPage object, using a proxy/stub/dummy object. This would need a bit of coding, and it's prone to getting out of sync with the real EditPage. But things like spamPageWithContent() could be implemented nicely, in a content model specific manner.
3) we could instantiate a dummy EditPage and pass that to the hooks. But EditPage doesn't support non-text content, and even if we force it, we are likely to end up with an edit field full of JSON if we are not very careful.
4) just add another hook, similar to EditFilterMergedContent, but more generic, and call it in EditEntity (and perhaps also in EditPage!). If we want a spam filter extension to work with non-text content, it will have to implement that new hook.
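Option 4 could look roughly like the following minimal Python sketch. It assumes a hypothetical hook (called EditFilterGeneric here) whose handlers get only a context (title, user) plus the content, and report problems by marking a returned status instead of calling UI methods. EditEntity and EditPage would then each decide how to present a failure (an API error vs. EditPage's spam page). The hook name, EditContext, Status, register(), run_hook() and the idea that content can be walked as nested data are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterator, List


@dataclass
class EditContext:
    """Hypothetical context: what handlers need instead of an EditPage."""
    title: str
    user: str


@dataclass
class Status:
    """Result of running the hook; handlers mark failures here, no UI involved."""
    ok: bool = True
    errors: List[str] = field(default_factory=list)

    def fatal(self, message: str) -> None:
        self.ok = False
        self.errors.append(message)


Handler = Callable[[EditContext, Any, str, Status], None]
_hooks: Dict[str, List[Handler]] = {}


def register(name: str, handler: Handler) -> None:
    _hooks.setdefault(name, []).append(handler)


def run_hook(name: str, context: EditContext, content: Any, summary: str) -> Status:
    """Run every registered handler; the caller decides how to report a failure."""
    status = Status()
    for handler in _hooks.get(name, []):
        handler(context, content, summary, status)
    return status


def iter_strings(value: Any) -> Iterator[str]:
    """Yield all strings in plain text or nested dicts/lists (assumed content shape)."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def blacklist_handler(context: EditContext, content: Any, summary: str, status: Status) -> None:
    """Hypothetical spam filter that works for wikitext and entity-like content alike."""
    for text in iter_strings(content):
        if "spam.example" in text:
            status.fatal(f"spam-blacklisted link in {context.title}: {text}")


register("EditFilterGeneric", blacklist_handler)
```

The point of this design is that the handler never touches presentation, so the same spam filter can serve an atomic API edit and an interactive EditPage edit; the cost is that every filter extension has to implement the new hook.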
What's the best option, do you think?
There's another closely related problem, btw: showing captchas. How can that be implemented at all for API-based, atomic edits? Would the API return a special error that includes a link to the captcha image as a challenge? And would it then require the captcha's solution to be passed via some special arguments to the module call? How can an extension control this? How is this done for the API's action=edit at present?
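One way such a round trip could work, sketched in Python under assumptions: a request without a valid solution fails as a whole with an error carrying a captcha id and an image URL; the client shows the image and repeats the same request with two extra parameters. The parameter names (captchaid, captchaword), the error code, the URL and the fixed answer are hypothetical, not taken from ConfirmEdit or the existing action=edit module. An extension would control this by supplying issue_challenge() and solved().

```python
import secrets
from typing import Dict

# Hypothetical server-side store of issued challenges: captcha id -> expected answer.
_pending: Dict[str, str] = {}


def issue_challenge() -> Dict[str, str]:
    """Create a challenge; a real captcha extension would render an image for the answer."""
    captcha_id = secrets.token_hex(8)
    _pending[captcha_id] = "wikidata"  # fixed answer, for illustration only
    return {"id": captcha_id, "url": f"https://example.org/captcha/{captcha_id}.png"}


def solved(params: Dict[str, str]) -> bool:
    """Check and consume a solution, so each challenge can be used only once."""
    expected = _pending.pop(params.get("captchaid", ""), None)
    return expected is not None and params.get("captchaword") == expected


def api_edit(params: Dict[str, str], needs_captcha: bool) -> Dict[str, object]:
    """Hypothetical atomic API module: either the whole edit succeeds, or nothing is saved."""
    if needs_captcha and not solved(params):
        return {"error": {"code": "captcha-needed", "captcha": issue_challenge()}}
    return {"success": 1}
```

Because the edit is atomic, nothing is stored on the first attempt; the client simply resends the complete request together with the solution, and a consumed or wrong solution just yields a fresh challenge.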
thanks, daniel