Hello,
we plan to fundamentally change the way AbuseFilter filter hits are
logged.
Feel free to skip to "Actual impact to end users" in case you're not
interested in or don't understand the technical background.
Some technical background:
We have an AbuseFilterVariableHolder object that contains all variables
usable by the filters. Some of these variables are stored as AFPData
objects and some as AFComputedVariable objects. The values of the
AFPData ones are already known while the values of the
AFComputedVariable ones are computed when needed (lazy load variables).
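
To illustrate the difference between the two kinds of variables, here is
a minimal sketch in Python (not the extension's PHP code). All class,
method and variable names in it are hypothetical stand-ins chosen for
this example; only the idea (known values versus values computed on
first access) comes from the description above.

```python
from typing import Any, Callable, Dict


class KnownValue:
    """Stand-in for an AFPData-style entry: the value is already known."""

    def __init__(self, value: Any) -> None:
        self.value = value


class LazyValue:
    """Stand-in for an AFComputedVariable-style entry.

    It only stores *how* to compute the value: a method name and parameters.
    """

    def __init__(self, method: str, parameters: Dict[str, Any]) -> None:
        self.method = method
        self.parameters = parameters


# Hypothetical compute methods, looked up by name (assumption for this sketch).
COMPUTE_METHODS: Dict[str, Callable[..., Any]] = {
    "length": lambda text: len(text),
}


class VariableHolder:
    """Stand-in for a holder of all variables usable by the filters."""

    def __init__(self) -> None:
        self._vars: Dict[str, Any] = {}
        self.compute_calls = 0  # counts how often a lazy value was computed

    def set_known(self, name: str, value: Any) -> None:
        self._vars[name] = KnownValue(value)

    def set_lazy(self, name: str, method: str, parameters: Dict[str, Any]) -> None:
        self._vars[name] = LazyValue(method, parameters)

    def get(self, name: str) -> Any:
        entry = self._vars[name]
        if isinstance(entry, LazyValue):
            # Compute on first access, then keep the result as a known value.
            self.compute_calls += 1
            value = COMPUTE_METHODS[entry.method](**entry.parameters)
            entry = KnownValue(value)
            self._vars[name] = entry
        return entry.value


holder = VariableHolder()
holder.set_known("page_title", "Main Page")
holder.set_lazy("new_size", "length", {"text": "hello world"})
```

In this sketch, reading "new_size" twice triggers the computation only
once, while "page_title" never needs any computation.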
Right now AbuseFilter logs filter hits by saving a serialized
AbuseFilterVariableHolder object in which none of the lazy load
variables have been computed. That object, as explained above, includes
several AFComputedVariable objects, which hold the information needed to
compute the value of a lazy load variable (e.g. the parameters and the
method name for AFComputedVariable::compute). This has several technical
downsides. It is not forward compatible: we can never change the method
names or the way the methods in AFComputedVariable::compute work,
because an old log entry may still call them with the old parameters.
The problem is even bigger for the hooks in that function, as those have
to stay backwards compatible as well. Furthermore, this means we're
saving a lot of unneeded data to the database.
What we're going to change now is that we will no longer log
AbuseFilterVariableHolder objects in serialized form to the database but
a serialized array with only native data types (which is much more
robust). Lazy load variables will be logged only if they have been
computed before the logging occurs. This furthermore implies that we
will no longer log any lazy load accessor information to the database.
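
As a rough illustration of the new format, the following Python sketch
builds a log record from native data types only. It is not the actual
change (see the Gerrit link for that): the function name, the
NOT_COMPUTED marker and the sample variables are assumptions made for
this example, and JSON stands in for PHP's serialization.

```python
import json
from typing import Any, Dict

# Marker for a lazy load variable that nobody has asked for yet
# (hypothetical, for this sketch only).
NOT_COMPUTED = object()


def dump_for_log(known: Dict[str, Any], lazy: Dict[str, Any]) -> str:
    """Serialize known values plus the lazy values that were already computed.

    Lazy variables still marked NOT_COMPUTED are skipped, and no information
    about how to compute them is written to the record.
    """
    record = dict(known)
    for name, value in lazy.items():
        if value is not NOT_COMPUTED:
            record[name] = value
    return json.dumps(record, sort_keys=True)


known_vars = {"page_title": "Main Page", "page_namespace": 0}
lazy_vars = {"new_size": 11, "added_lines": NOT_COMPUTED}

logged = dump_for_log(known_vars, lazy_vars)
```

Here "new_size" ends up in the record because a filter already computed
it, while "added_lines" is left out entirely.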
Actual impact to end users:
The impact on users will be very small, as the logging page
(Special:AbuseLog) will still show all variables that are not lazy
loaded (like page title, page namespace, user name, ...) and the lazy
load variables used by the filter(s) tested. All the data relevant to
the logged action will therefore still be there (while irrelevant data
might not be available). In some cases this might even make it easier to
spot the information relevant to a filter hit, as data not involved in
that hit is no longer logged unconditionally.
This change will make it much simpler to make more data available for
filters without having to face the headaches of the current logging
format.
I hope you agree with me that this change makes sense so that we can
finally move forward with the AbuseFilter extension!
Gerrit change: https://gerrit.wikimedia.org/r/42501
Note: This was cross-posted to wikitech-ambassadors
Cheers,
Marius Hoch (hoo)