Hello,
we plan to fundamentally change the way AbuseFilter logs filter hits. Feel free to skip to "Actual impact to end users" if you're not interested in (or don't understand) the technical background.
Some technical background:
We have an AbuseFilterVariableHolder object that contains all variables usable by the filters. Some of these variables are stored as AFPData objects and some as AFComputedVariable objects. The values of the AFPData ones are already known, while the values of the AFComputedVariable ones are computed only when needed (lazy load variables). Right now AbuseFilter logs filter hits by saving a serialized AbuseFilterVariableHolder object without any of the lazy load variables computed. As explained above, that object includes several AFComputedVariable objects, which hold the information on how the value of a lazy load variable can be computed (e.g. the parameters and the method for AFComputedVariable::compute). This has several technical downsides. It is not forward compatible: we can never change the method names or the way the methods in AFComputedVariable::compute work, because we always have to expect that an old log entry calls them with the old parameters. This is an even bigger problem for the hooks in that function, as those have to stay backwards compatible as well. Furthermore, it means we save a lot of unneeded data to the database.
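To illustrate why storing accessor information is brittle, here is a rough Python sketch (not the actual PHP code). PData, ComputedVariable, VariableHolder and COMPUTE_METHODS are hypothetical stand-ins for AFPData, AFComputedVariable, AbuseFilterVariableHolder and AFComputedVariable::compute, and pickle stands in for PHP's serialize(). The logged object only remembers a method name and its parameters, so renaming that method later breaks every old entry:

```python
import pickle
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class PData:
    """Hypothetical stand-in for AFPData: a value that is already known."""
    value: Any


@dataclass
class ComputedVariable:
    """Hypothetical stand-in for AFComputedVariable: records only *how*
    to compute a value (method name plus parameters), not the value."""
    method: str
    params: Dict[str, Any]


# Hypothetical dispatch table playing the role of AFComputedVariable::compute.
COMPUTE_METHODS: Dict[str, Callable[..., Any]] = {
    "length": lambda text: len(text),
}


def compute(var: ComputedVariable) -> Any:
    # Dispatch on the stored method name; an old log entry still carries
    # whatever name was current when it was written.
    if var.method not in COMPUTE_METHODS:
        raise LookupError(f"unknown lazy load method: {var.method}")
    return COMPUTE_METHODS[var.method](**var.params)


@dataclass
class VariableHolder:
    """Hypothetical stand-in for AbuseFilterVariableHolder."""
    variables: Dict[str, Any] = field(default_factory=dict)

    def get(self, name: str) -> Any:
        v = self.variables[name]
        return v.value if isinstance(v, PData) else compute(v)


holder = VariableHolder({
    "page_title": PData("Example"),
    "new_size": ComputedVariable("length", {"text": "Hello world"}),
})

# Old-style logging: serialize the whole holder, accessors included.
old_log_entry = pickle.dumps(holder)

# Later the method is renamed in the code base...
COMPUTE_METHODS["text_length"] = COMPUTE_METHODS.pop("length")

# ...and the lazy variable in the old entry can no longer be computed.
restored = pickle.loads(old_log_entry)
```

Here restored.get("page_title") still works, but restored.get("new_size") raises LookupError, which is exactly the backwards compatibility burden described above.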
We are now going to stop writing serialized AbuseFilterVariableHolder objects to the database and will instead store a serialized array containing only native data types (which is much more robust). Lazy load variables will be logged only if they have been computed before the logging occurs. This also means that we will no longer log any lazy load accessor information to the database.
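A minimal Python sketch of the new approach, under the assumption that a lazy value is cached once computed. LazyHolder and dump_for_log are hypothetical names, and json stands in for the serialized array of native types. The log entry contains all known variables plus only those lazy variables a filter actually computed, and no method names or parameters at all:

```python
import json
from typing import Any, Callable, Dict


class LazyHolder:
    """Hypothetical holder: known values plus lazily computed ones.
    A lazy value is cached once it has been computed."""

    def __init__(self, known: Dict[str, Any],
                 lazy: Dict[str, Callable[[], Any]]):
        self.known = dict(known)
        self.lazy = dict(lazy)
        self.computed: Dict[str, Any] = {}

    def get(self, name: str) -> Any:
        if name in self.known:
            return self.known[name]
        if name not in self.computed:
            self.computed[name] = self.lazy[name]()
        return self.computed[name]

    def dump_for_log(self) -> str:
        # Native values only: every known variable plus the lazy ones that
        # were already computed (e.g. because a filter used them).
        # No accessor information (method names, parameters) is stored.
        return json.dumps({**self.known, **self.computed}, sort_keys=True)


holder = LazyHolder(
    known={"page_title": "Example", "page_namespace": 0, "user_name": "Alice"},
    lazy={
        "new_size": lambda: len("Hello world"),
        "old_size": lambda: len("Hello"),
    },
)

# Suppose the tested filter only referenced new_size.
holder.get("new_size")
entry = json.loads(holder.dump_for_log())
```

Here entry holds page_title, page_namespace, user_name and new_size (11), but not old_size, since nothing asked for it before logging. Because only plain values are stored, renaming a compute method later cannot break old entries.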
Actual impact to end users:
The actual impact on users will be very small, as the log page (Special:AbuseLog) will still show all non-lazy-load variables (like page title, page namespace, user name, ...) as well as the lazy load variables used by the filter(s) tested. Because of this, all the data relevant to the logged action will still be there (while irrelevant data might not be available). In some cases this might even make it easier to spot the information relevant to a filter hit, as data not involved in that hit is no longer forcibly logged.
This change will make it much simpler to make more data available for filters without having to face the headaches of the current logging format.
I hope you agree with me that this change makes sense so that we finally can move forward with the AbuseFilter extension!
Gerrit change: https://gerrit.wikimedia.org/r/42501
Note: This was cross-posted to wikitech-ambassadors.
Cheers,
Marius Hoch (hoo)
wikitech-l@lists.wikimedia.org