Hey Guys,
Thanks for explaining it to me. Can I have your IRC handles? I still think I have many doubts.
Is there a simpler bug related to the extension, so I can get an idea of how it works?
On Fri, Mar 8, 2013 at 5:23 AM, Chris Steipp csteipp@wikimedia.org wrote:
On Thu, Mar 7, 2013 at 1:34 PM, Platonides Platonides@gmail.com wrote:
On 07/03/13 21:03, anubhav agarwal wrote:
Hey Chris
I was exploring the SpamBlacklist extension. I have some doubts I hope you could clear up.
Is there any place I can get documentation for the SpamBlacklist class in the file SpamBlacklist_body.php?
There really isn't any documentation besides the code, but there are a couple more things you should look at. Notice that in SpamBlacklist.php there is the line "$wgHooks['EditFilterMerged'][] = 'SpamBlacklistHooks::filterMerged';", which is how SpamBlacklist registers itself with MediaWiki core to filter edits. So when MediaWiki core runs the EditFilterMerged hooks (which it does in includes/EditPage.php, line 1287), every function that an extension has registered for that hook is called with the passed-in arguments, so SpamBlacklistHooks::filterMerged is run. SpamBlacklistHooks::filterMerged then just sets up and calls SpamBlacklist::filter. That is where you can start tracing what is actually in the variables, in case Platonides' summary wasn't enough.
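To make that flow concrete, here's a rough sketch of both sides of the handshake (the hook's arguments follow docs/hooks.txt; treat the core invocation as approximate, since line numbers and details vary between versions):

    // In SpamBlacklist.php, the extension registers its handler:
    $wgHooks['EditFilterMerged'][] = 'SpamBlacklistHooks::filterMerged';

    // In includes/EditPage.php, core fires the hook roughly like this;
    // every registered handler runs, and a handler returning false blocks the save:
    if ( !wfRunHooks( 'EditFilterMerged',
        array( $this, $this->textbox1, &$this->hookError, $this->summary ) )
    ) {
        // a filter (such as SpamBlacklist) rejected the edit
        return false;
    }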
In the function filter, what do the following variables represent?
$title
Title object (includes/Title.php). This is the page where it tried to save.
$text
Text being saved in the page/section
$section
Name of the section or ''
$editpage
EditPage object if EditFilterMerged was called, null otherwise
$out
A ParserOutput object (actually, this variable name was a bad choice; it looks like an OutputPage), see includes/parser/ParserOutput.php
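Putting those variables together, the chain looks roughly like this (an illustrative sketch, not the extension's exact code; the accessor names and the way the rejection is reported back are assumptions):

    // Rough shape of SpamBlacklistHooks::filterMerged (illustrative):
    public static function filterMerged( $editPage, $text, &$error, $summary ) {
        $spamObj = BaseBlacklist::getInstance( 'spam' ); // assumed accessor name
        $title = $editPage->getTitle();                  // page being saved
        // $section is '' because EditFilterMerged runs after sections are merged
        $matches = $spamObj->filter( $title, $text, '', $editPage );
        if ( $matches !== false ) {
            $error = "Matched spam: $matches"; // report via the hook's &$error
            return false;                      // reject the save
        }
        return true; // let the edit continue through the other filters
    }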
I have understood the following things from the code, please correct me if I am wrong. It extracts the edited text, and parses it to find the links.
Actually, it uses the fact that the parser will already have processed the links, so in most cases it just obtains that information.
It then replaces the links which match the whitelist regex,
This doesn't make sense as you explain it. It builds a list of links, and replaces whitelisted ones with '', i.e. removes whitelisted links from the list.
and then checks if there are some links that match the blacklist regex.
Yes
If the check is greater than zero, you return the matched content.
Right, $check will be non-0 if the links matched the blacklist.
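In code terms, that whitelist-then-blacklist pass is roughly the following (a simplified sketch; the real filter() builds large combined regexes from the blacklist pages, and the regex-list variable names here are made up):

    // Illustrative sketch of the matching inside SpamBlacklist::filter:
    $links = implode( "\n", array_keys( $out->getExternalLinks() ) );
    foreach ( $whitelistRegexes as $regex ) {
        $links = preg_replace( $regex, '', $links ); // drop whitelisted links
    }
    foreach ( $blacklistRegexes as $regex ) {
        $check = preg_match( $regex, $links, $matches );
        if ( $check ) { // non-zero: a blacklisted link survived the whitelist pass
            wfDebugLog( 'SpamBlacklist', "Match: {$matches[0]}\n" );
            return $matches[0]; // the content that matched the blacklist
        }
    }
    return false; // no blacklisted links in this edit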
it already writes an entry to the debug log if it finds a match
Yes, but that is a private log. Bug 1542 talks about making that accessible in the wiki.
Yep. For example, see
I guess the bug aims at creating an SQL table. I was thinking of logging the following fields: Title, Text, User, URLs, IP. I don't understand why you denied it.
Because we don't like to publish the IPs *in the wiki*.
The WMF privacy policy also discourages us from keeping IP addresses longer than 90 days, so if you do keep IPs, you need a way to hide or purge them. And if the logs let someone see which IP address a particular username was using, then only users with checkuser permissions are allowed to see that. So it would be easier for you not to include IPs, but if they're desired, then you'll just have to build those protections out too.
I think the approach should be to log matches using the AbuseFilter extension, if that one is loaded.
The AbuseFilter log format has a lot of data in it specific to AbuseFilter, and is used to re-test abuse filters, so adding these hits to that log might cause some issues. I think either the general log or a separate, new log table would be best. Just for some numbers: in the first 7 days of this month, we've had an average of 27,000 hits each day. So if this goes into an existing log, it's going to generate a significant amount of data.
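If a new table is the route taken, the write side could look something like this (a hypothetical sketch using MediaWiki's database abstraction; the table and column names are invented, and per the privacy points above it deliberately stores no IP):

    // Hypothetical: record a blacklist hit in a dedicated log table
    $dbw = wfGetDB( DB_MASTER );
    $dbw->insert( 'spam_blacklist_hits', array(
        'sbh_timestamp' => $dbw->timestamp(),
        'sbh_title'     => $title->getDBkey(), // page where the save was tried
        'sbh_user'      => $wgUser->getId(),   // 0 for anonymous users
        'sbh_url'       => $matches[0],        // the URL that matched
        // no IP column: avoids the checkuser / 90-day retention issues above
    ), __METHOD__ );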
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l