Within the Flow extension we need to insert our own special changes into
the recentchanges table so that watchlists continue to inform users of
changes in the ways they are used to. Within MediaWiki, the Wikidata
extension has similar requirements and has implemented a solution that
works for its use case. Flow is looking to extend this to handle multiple
types of external change sources. The solution taken by Wikidata to render
the lines works well and will be used by Flow, but we have some concerns
about how different types of external changes will be filtered by the
queries that generate the Special:RecentChanges and Special:Watchlist
pages.
How does the current solution work?
There is a field in the recentchanges table, rc_type. All Wikidata entries
use the value RC_EXTERNAL (= 5) for this field. Queries are generated with
either (rc_type = 5) or (rc_type != 5) when filtering is required.
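
As a minimal sketch of that filtering, the following uses an in-memory SQLite table in place of the real MySQL recentchanges table. The helper name recent_changes and the sample rows are made up for illustration; only the rc_type column and the RC_EXTERNAL value come from the description above.

```python
import sqlite3

RC_EDIT = 0      # illustrative value for an ordinary edit
RC_LOG = 3       # log entries (rc_type = 3)
RC_EXTERNAL = 5  # value stated above

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recentchanges ("
    " rc_id INTEGER PRIMARY KEY,"
    " rc_type INTEGER NOT NULL,"
    " rc_title TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO recentchanges (rc_type, rc_title) VALUES (?, ?)",
    [
        (RC_EDIT, "Main_Page"),
        (RC_EXTERNAL, "Q42"),      # stands in for a Wikidata entry
        (RC_LOG, "Special:Log"),
    ],
)


def recent_changes(conn, hide_external):
    """Return titles in insertion order, optionally dropping external rows."""
    sql = "SELECT rc_title FROM recentchanges"
    params = ()
    if hide_external:
        # Same predicate as the current solution: (rc_type != 5)
        sql += " WHERE rc_type != ?"
        params = (RC_EXTERNAL,)
    sql += " ORDER BY rc_id"
    return [row[0] for row in conn.execute(sql, params)]


print(recent_changes(conn, hide_external=True))
print(recent_changes(conn, hide_external=False))
```

With a single RC_EXTERNAL value the toggle is all-or-nothing: every external source is hidden or shown together, which is the limitation the options below try to address.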
Requirements:
- Currently Wikidata entries in recentchanges are filtered from
Special:RecentChanges and Special:Watchlist. This is toggleable. By
default we will not want to filter Flow entries, but will want to offer a
toggle much like Wikidata does.
- More types of external change sources should be able to add themselves
in the future without core changes.
- We should play nicely with the db slaves that serve up watchlists.
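
One way to picture the first two requirements is a small registry in which each external source declares its own default visibility. This is only a sketch: ExternalSource, register_source and hidden_sources are hypothetical names, not an existing MediaWiki API, and the defaults simply mirror the ones described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalSource:
    name: str
    hidden_by_default: bool


# Hypothetical registry; extensions would add themselves without core changes.
REGISTRY = {}


def register_source(name, hidden_by_default):
    REGISTRY[name] = ExternalSource(name, hidden_by_default)


def hidden_sources(user_toggles):
    """Return the sorted names of sources to filter out for one request.

    user_toggles maps a source name to True (hide) or False (show); sources
    the user has not toggled fall back to their registered default.
    """
    return sorted(
        name
        for name, source in REGISTRY.items()
        if user_toggles.get(name, source.hidden_by_default)
    )


register_source("wikidata", hidden_by_default=True)   # filtered unless shown
register_source("flow", hidden_by_default=False)      # shown unless filtered

print(hidden_sources({}))
print(hidden_sources({"flow": True}))
print(hidden_sources({"wikidata": False}))
```

The list returned here is what the query builder would have to turn into a WHERE clause, and how that clause looks depends on which of the options below is chosen.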
There are a couple of options, each with its own tradeoffs.
1. Use rc_type = RC_EXTERNAL and add a new field to the recentchanges
table, rc_external_type. This would be a varchar(16) field. Wikidata and
Flow would put their respective names in the field to distinguish between
each other. This is conceptually simple, but makes the queries look even
odder. To hide only Wikidata, (rc_type != 5) becomes (rc_type != 5 OR
rc_external_type != 'wikidata').
2. Similar to 1, but instead of creating a new field, reuse the
rc_log_type field, which is only used when rc_type = RC_LOG. This seems a
bit hacky, but would only need a field rename to not feel so hacky. I'm
not proposing to rename the field, though, as a variety of extensions
depend on the current field name and we are not going to coordinate
getting them all updated at the exact same time. The fact that this field
is used by various extensions may be a hint that we shouldn't reuse it.
3. Replace RC_EXTERNAL with RC_WIKIDATA and RC_FLOW constants in their
respective extensions. This is also straightforward, but adds development
overhead to ensure future creators of RC_* constants do not conflict with
each other. It would be handled similarly to NS_* constants with an
on-wiki list. I have heard some mention that naming conflicts have
occurred in the past with this solution. This would force queries looking
for only core sources of change to provide an inclusive list of RC_* values
to find, rather than using rc_type != RC_EXTERNAL.
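
To compare how options 1 and 3 would filter, here is a rough sketch using in-memory SQLite tables. The table names rc_option1 and rc_option3, the sample rows, and the numeric values chosen for RC_WIKIDATA and RC_FLOW are assumptions for illustration only; rc_external_type is the field proposed in option 1.

```python
import sqlite3

RC_EDIT, RC_LOG, RC_EXTERNAL = 0, 3, 5
RC_WIKIDATA, RC_FLOW = 5, 6  # hypothetical values for option 3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE rc_option1 (
        rc_title TEXT, rc_type INTEGER NOT NULL, rc_external_type TEXT);
    CREATE TABLE rc_option3 (
        rc_title TEXT, rc_type INTEGER NOT NULL);
    """
)
conn.executemany(
    "INSERT INTO rc_option1 VALUES (?, ?, ?)",
    [
        ("edit", RC_EDIT, None),   # core rows leave rc_external_type NULL
        ("log", RC_LOG, None),
        ("wikidata", RC_EXTERNAL, "wikidata"),
        ("flow", RC_EXTERNAL, "flow"),
    ],
)
conn.executemany(
    "INSERT INTO rc_option3 VALUES (?, ?)",
    [
        ("edit", RC_EDIT),
        ("log", RC_LOG),
        ("wikidata", RC_WIKIDATA),
        ("flow", RC_FLOW),
    ],
)


def titles(sql, params=()):
    return [row[0] for row in conn.execute(sql, params)]


# Option 1: hide only Wikidata. OR (not AND) keeps Flow rows and also core
# rows, whose rc_external_type is NULL and would make an AND evaluate to NULL.
option1_hide_wikidata = titles(
    "SELECT rc_title FROM rc_option1"
    " WHERE (rc_type != ? OR rc_external_type != ?) ORDER BY rowid",
    (RC_EXTERNAL, "wikidata"),
)

# Option 3: hide only Wikidata with a single comparison on rc_type.
option3_hide_wikidata = titles(
    "SELECT rc_title FROM rc_option3 WHERE rc_type != ? ORDER BY rowid",
    (RC_WIKIDATA,),
)

# Option 3: core-only changes need an inclusive list of core RC_* values.
option3_core_only = titles(
    "SELECT rc_title FROM rc_option3 WHERE rc_type IN (?, ?) ORDER BY rowid",
    (RC_EDIT, RC_LOG),
)

print(option1_hide_wikidata, option3_hide_wikidata, option3_core_only)
```

Both schemes can hide a single source; the difference is that option 1 needs a two-column predicate, while option 3 keeps the per-source filter on one column but makes "core only" queries enumerate the core types.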
Things to consider:
On smaller wikis, Wikidata changes can account for > 50% of the changes.
Talk namespace edits, which we expect to eventually replace with Flow
edits, account for ~20% of enwiki recentchanges rows.
The standard query issued by Special:RecentChanges is
SELECT /* lots of fields */
FROM `recentchanges`
FORCE INDEX (rc_timestamp)
LEFT JOIN `watchlist` ON (wl_user = '2' AND (wl_title=rc_title) AND
(wl_namespace=rc_namespace))
LEFT JOIN `tag_summary` ON ((ts_rc_id=rc_id))
WHERE (rc_timestamp >= '20130912000000') AND rc_bot = '0'
AND (rc_type != 5)
ORDER BY rc_timestamp DESC LIMIT 50
The standard query issued by Special:Watchlist is
SELECT /* lots of fields */
FROM `recentchanges`
INNER JOIN `watchlist` ON (wl_user = '2' AND (wl_namespace=rc_namespace)
AND (wl_title=rc_title))
LEFT JOIN `page` ON ((rc_cur_id=page_id))
LEFT JOIN `tag_summary` ON ((ts_rc_id=rc_id))
WHERE (rc_timestamp > '20130916175626') AND (rc_this_oldid=page_latest OR
rc_type=3) AND (rc_type != 5)
ORDER BY rc_timestamp DESC
Without further input I will be implementing option 3 from above. I welcome
any input on better solutions or on potential pitfalls with this solution.
Erik Bernhardson