Within the Flow extension we need to insert our own special changes into the recentchanges table so that watchlists continue to inform users of changes in the ways they are used to. Within MediaWiki, the WikiData extension has similar requirements and has implemented a solution that works for its use case. Flow is looking to extend this to handle multiple types of external change sources. The approach WikiData takes to render the lines works well and will be used by Flow, but we have some concerns about how different types of external changes will be filtered by the queries that generate the Special:RecentChanges and Special:Watchlist pages.
How does the current solution work?
The recentchanges table has a field, rc_type. All WikiData entries use the value RC_EXTERNAL (= 5) for this field. When filtering is required, queries are generated with either (rc_type = 5) or (rc_type != 5).
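As a rough illustration of this filter, the following Python sketch builds a tiny in-memory SQLite table and runs both forms of the condition. The table is cut down to three columns, and the sample rows, the helper names, and the value 0 for an ordinary edit are assumptions made for the example; only RC_EXTERNAL = 5 comes from the description above.

```python
import sqlite3

RC_EDIT = 0       # assumed value for an ordinary edit (illustration only)
RC_EXTERNAL = 5   # value used by all WikiData entries, as described above


def build_demo_db():
    """Create a cut-down, in-memory stand-in for the recentchanges table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE recentchanges ("
        "rc_id INTEGER PRIMARY KEY, rc_type INTEGER NOT NULL, rc_title TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO recentchanges (rc_type, rc_title) VALUES (?, ?)",
        [(RC_EDIT, "Main_Page"), (RC_EXTERNAL, "Berlin"), (RC_EDIT, "Sandbox")],
    )
    return conn


def titles(conn, hide_external):
    """Return titles with external changes hidden, or only external changes."""
    op = "!=" if hide_external else "="
    rows = conn.execute(
        f"SELECT rc_title FROM recentchanges WHERE rc_type {op} ? ORDER BY rc_id",
        (RC_EXTERNAL,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_demo_db()
    print(titles(db, hide_external=True))
    print(titles(db, hide_external=False))
```

Because every external source shares the same rc_type value, this condition can only hide or show all of them together, which is the limitation the options below try to address.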
Requirements:
- Currently WikiData entries in recentchanges are filtered out of Special:RecentChanges and Special:Watchlist. This is toggleable. By default we do not want to filter Flow entries, but we do want to offer a toggle much like WikiData does.
- More types of external change sources should be able to add themselves in the future without core changes.
- We should play nice with the DB slaves serving up watchlists.
There are a couple of options, each with its own tradeoffs.
1. Use rc_type = RC_EXTERNAL and add a new field to the recentchanges table, rc_external_type. This would be a varchar(16) field. Wikidata and Flow would put their respective names in the field to distinguish themselves from each other. This is conceptually simple, but makes the queries look even odder. To hide only one source, (rc_type != 5) becomes (rc_type != 5 OR rc_external_type != 'wikidata'); combining the two tests with AND would still exclude every external row, Flow included.
2. Similar to 1, but instead of creating a new field, reuse the rc_log_type field, which is only used when rc_type = RC_LOG. This seems a bit hacky, and a field rename would be enough to make it feel less so. I'm not proposing to rename the field, though, as a variety of extensions depend on the current field name and we are not going to coordinate getting them all updated at the exact same time. The fact that this field is used by various extensions may be a hint that we shouldn't reuse it.
3. Replace RC_EXTERNAL with RC_WIKIDATA and RC_FLOW constants in their respective extensions. This is also straightforward, but adds development overhead to ensure future creators of RC_* constants do not conflict with each other. It would be handled similarly to NS_* constants, with an on-wiki list. I have heard some mention that naming conflicts have occurred in the past with this solution. It would also force queries looking only for core sources of change to provide an inclusive list of RC_* values to find, rather than using rc_type != RC_EXTERNAL.
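To compare how the filter conditions for options 1 and 3 might look, here is a minimal Python sketch that builds each WHERE fragment and runs it against an in-memory SQLite table. Everything specific in it is an assumption made for illustration: the demo tables rc_opt1 and rc_opt3, the helper names, the value 0 for an ordinary edit, and the values 5 and 6 for RC_WIKIDATA and RC_FLOW. It is not how MediaWiki builds these queries.

```python
import sqlite3

RC_EDIT, RC_EXTERNAL = 0, 5
# Hypothetical values for option 3; real ones would come from the on-wiki list.
RC_WIKIDATA, RC_FLOW = 5, 6


def option1_condition(hidden_sources):
    """Option 1: shared rc_type plus an rc_external_type discriminator."""
    if not hidden_sources:
        return "1=1", []
    marks = ", ".join("?" for _ in hidden_sources)
    # NOT (...) keeps core rows, whose rc_external_type is NULL in this demo.
    cond = f"NOT (rc_type = {RC_EXTERNAL} AND rc_external_type IN ({marks}))"
    return cond, list(hidden_sources)


def option3_condition(hidden_types):
    """Option 3: one RC_* constant per external source."""
    if not hidden_types:
        return "1=1", []
    marks = ", ".join("?" for _ in hidden_types)
    return f"rc_type NOT IN ({marks})", list(hidden_types)


def build_demo_db():
    """Create one cut-down table per option, holding the same three changes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rc_opt1 (rc_title TEXT, rc_type INTEGER, rc_external_type TEXT)"
    )
    conn.execute("CREATE TABLE rc_opt3 (rc_title TEXT, rc_type INTEGER)")
    conn.executemany(
        "INSERT INTO rc_opt1 VALUES (?, ?, ?)",
        [
            ("Main_Page", RC_EDIT, None),
            ("Berlin", RC_EXTERNAL, "wikidata"),
            ("Talk:Main_Page", RC_EXTERNAL, "flow"),
        ],
    )
    conn.executemany(
        "INSERT INTO rc_opt3 VALUES (?, ?)",
        [
            ("Main_Page", RC_EDIT),
            ("Berlin", RC_WIKIDATA),
            ("Talk:Main_Page", RC_FLOW),
        ],
    )
    return conn


def select_titles(conn, table, condition, params):
    """Run the filter against one demo table and return the matching titles."""
    sql = f"SELECT rc_title FROM {table} WHERE {condition} ORDER BY rowid"
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    db = build_demo_db()
    print(select_titles(db, "rc_opt1", *option1_condition(["wikidata"])))
    print(select_titles(db, "rc_opt3", *option3_condition([RC_WIKIDATA])))
```

In this sketch both options hide the Wikidata row while keeping the core edit and the Flow row. The difference is in the shape of the condition: option 1 needs a compound test on two columns, while option 3 stays a single-column test on rc_type but depends on every extension picking a distinct constant.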
Things to consider: On smaller wikis, WikiData changes can account for > 50% of the changes. Talk namespace edits, which we expect to eventually replace with Flow edits, account for ~20% of enwiki recentchanges rows.
The standard query issued by Special:RecentChanges is:
SELECT /* lots of fields */
FROM `recentchanges` FORCE INDEX (rc_timestamp)
LEFT JOIN `watchlist` ON (wl_user = '2' AND (wl_title=rc_title) AND (wl_namespace=rc_namespace))
LEFT JOIN `tag_summary` ON ((ts_rc_id=rc_id))
WHERE (rc_timestamp >= '20130912000000') AND rc_bot = '0' AND (rc_type != 5)
ORDER BY rc_timestamp DESC
LIMIT 50
The standard query issued by Special:Watchlist is:
SELECT /* lots of fields */
FROM `recentchanges`
INNER JOIN `watchlist` ON (wl_user = '2' AND (wl_namespace=rc_namespace) AND (wl_title=rc_title))
LEFT JOIN `page` ON ((rc_cur_id=page_id))
LEFT JOIN `tag_summary` ON ((ts_rc_id=rc_id))
WHERE (rc_timestamp > '20130916175626') AND (rc_this_oldid=page_latest OR rc_type=3) AND (rc_type != 5)
ORDER BY rc_timestamp DESC
Without further input I will be implementing option 3 from above. I welcome any input on better solutions or on potential pitfalls with this one.
Erik Bernhardson