On Nov 15, 2012, at 2:51 PM, MZMcBride z@mzmcbride.com wrote:
Max Semenik wrote:
On 15.11.2012, 4:06 Diederik wrote:
I think that the Analytics team would prefer either:
1) detect the source of the edit in the URL
or 2) have a hook activated after a successful edit and have the data sent to the pixel service
Having this data in a MySQL table poses a lot of challenges with respect to importing that data into the analytics cluster.
That's for analytics purposes. However, there can be other use cases for which tags in the DB are perfect, for example filtering recent changes for edits made only via a particular channel.
Max is right.
The general issue is that the revision table could use a generalized metadata store the same way that the page table has page_props[1]. This is not the same as, but is sometimes coincident with, analytical needs (I assume that if we come up with a way to attach revision-based metadata, it would be easy to expose that same data to the analytics pipeline for RevTagging).
To Amir's original suggestion, I think that hacking a rev_mobile field into the rev table sounds extremely clunky. I'd be worried that over time this will end up with an explosion of columns that will resemble our Recentchanges table[2]. I assume that's why Amir brought this up for suggestions.
On the other hand, a "revision_props" table modeled on page_props would be a terrible waste of space and performance: imagine storing a boolean in a BLOB with every revision?
Perhaps we could use a small varchar or smallint in place of the BLOB, which would not be too high-impact but would be flexible enough to handle both existing (key: mobile_edit; value: 1) and future needs? Especially if mobile_edit=0 isn't actually stored as an entry at all.
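To make that concrete, here is a rough sketch of what I have in mind, using SQLite from Python as a stand-in for the real database. The table and column names (revision_props, rp_rev, rp_propname, rp_value) are made up for illustration and are not an existing MediaWiki schema; the point is only that the value is a small integer and that a false/zero property is represented by the absence of a row.

```python
import sqlite3

# Hypothetical schema: revision_props and its rp_* columns are illustrative
# assumptions, not an existing MediaWiki table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_comment TEXT);
    CREATE TABLE revision_props (
        rp_rev      INTEGER NOT NULL REFERENCES revision(rev_id),
        rp_propname VARCHAR(32) NOT NULL,
        rp_value    SMALLINT NOT NULL,
        PRIMARY KEY (rp_rev, rp_propname)
    );
""")


def save_revision(rev_id, comment, props=None):
    """Insert a revision and write a props row only for non-zero values."""
    conn.execute("INSERT INTO revision VALUES (?, ?)", (rev_id, comment))
    for name, value in (props or {}).items():
        if value:  # mobile_edit=0 is implied by the absence of a row
            conn.execute(
                "INSERT INTO revision_props VALUES (?, ?, ?)",
                (rev_id, name, int(value)),
            )


def is_mobile(rev_id):
    """Return True only if a mobile_edit row exists with a non-zero value."""
    row = conn.execute(
        "SELECT rp_value FROM revision_props"
        " WHERE rp_rev = ? AND rp_propname = 'mobile_edit'",
        (rev_id,),
    ).fetchone()
    return bool(row and row[0])


save_revision(1, "desktop edit", {"mobile_edit": 0})
save_revision(2, "mobile edit", {"mobile_edit": 1})
save_revision(3, "another desktop edit")
```

With three revisions saved, only the mobile one costs a row in revision_props, so the overhead scales with the number of tagged edits rather than with the size of the revision table.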
Right, which is why a revision tagging system exists in MediaWiki core currently. If someone wanted to, for example, modify the MobileFrontend extension to add a "mobile" tag to edits, it would be trivial to do. The tagging infrastructure is already in place.
It's unfortunate that RevTagging got mixed into this discussion, but I hope this clarifies the distinction between mobile's needs and RevTagging.
Currently, MW has a very limited ability to attach metadata to revisions, and only in the form of new cols on the revision table (existing cols are… limited[3]). The issue is that this data is prioritized for transactional use, not necessarily analytical use (in wiki[4]: "is needed to operate the website and, in particular, to populate article revision histories").
In analytical systems, data is fed down a different pipeline in order to be "online" and have no impact on the web transactions. Naïvely, that's because analytical questions on transactional databases look like "SELECT COUNT(*) FROM sometable", which are full table scans (or thereabouts) and are expensive. Adding the metadata for analytical purposes based on the OLTP store would then be "SELECT COUNT(*) FROM sometable JOIN awholemessoftables GROUP BY datafromothertable", which are multiple full table scans, and pretty soon that would require a dedicated offline read-only DB, and still be terribly slow.
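A toy version of those two query shapes, again with SQLite from Python standing in for the production store. The rev_campaign table and its columns are hypothetical, and the data is synthetic; this only shows the shape of the questions, and asks the planner how it would answer them, rather than measuring anything about the real cluster.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER);
    -- Hypothetical side table holding analytics-only metadata.
    CREATE TABLE rev_campaign (rc_rev INTEGER, rc_campaign TEXT);
""")
conn.executemany(
    "INSERT INTO revision VALUES (?, ?)",
    [(i, i % 10) for i in range(1, 1001)],
)
conn.executemany(
    "INSERT INTO rev_campaign VALUES (?, ?)",
    [(i, "mobile" if i % 4 == 0 else "desktop") for i in range(1, 1001)],
)

# The simple analytical question: touches every row of one table.
simple = "SELECT COUNT(*) FROM revision"

# The same question broken down by metadata that lives in another table.
grouped = """
    SELECT c.rc_campaign, COUNT(*)
    FROM revision r JOIN rev_campaign c ON c.rc_rev = r.rev_id
    GROUP BY c.rc_campaign
"""

total = conn.execute(simple).fetchone()[0]
by_campaign = dict(conn.execute(grouped).fetchall())

# Print the planner's strategy for each query; the exact wording of the
# plan differs between SQLite versions.
for sql in (simple, grouped):
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print(row)
```

Every additional piece of metadata stored in its own table adds another join to the second query, which is the cost I'd rather not put on the live OLTP store.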
So there is a need to attach metadata needed for analytics (which may or may not be the same metadata "needed to operate the website") at runtime, so that it can be sent down the analytical data pipeline without continually hitting the live OLTP store with questions like "which campaign did this revision occur under?", especially when things like "campaign" probably have no importance at all to the website itself.
My thinking is that if we had a way of attaching arbitrary metadata to revisions, then, in cases where the two needs are coincident, all we have to do is expose that same metadata to analytics through their pixel service (revtagging) and we're good to go. If revtagging isn't up, or hasn't recorded it, we could still go back to the transactional store offline and fill in the missing information.
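As a sketch of that flow, assuming the pixel service takes its data as query-string parameters: the endpoint URL, the parameter names, and both function names below are hypothetical and say nothing about how the real pixel service is called. The first function is what a post-save hook would request; the second is the offline catch-up pass over whatever the transactional store has and the pipeline missed.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real pixel service URL is not given here.
PIXEL_ENDPOINT = "https://pixel.example.org/event.gif"


def pixel_url(rev_id, meta):
    """Build the URL a post-save hook would request for one revision."""
    params = {"rev_id": rev_id, **meta}
    # Sort by key so the same metadata always yields the same URL.
    return PIXEL_ENDPOINT + "?" + urlencode(sorted(params.items()))


def backfill(recorded_rev_ids, transactional_meta):
    """Return pixel URLs for revisions the analytics pipeline never saw.

    recorded_rev_ids: set of revision ids already in the analytics store.
    transactional_meta: {rev_id: metadata dict} read offline from the
    transactional store.
    """
    return {
        rev_id: pixel_url(rev_id, meta)
        for rev_id, meta in transactional_meta.items()
        if rev_id not in recorded_rev_ids
    }


# Usage: revision 1 was recorded live; revision 2 was missed.
stored = {
    1: {"mobile_edit": 1},
    2: {"mobile_edit": 1, "campaign": "example campaign"},
}
missing = backfill({1}, stored)
```

The same metadata dictionary feeds both paths, so the analytics side never needs its own query against the live revision table.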
Going back to the broader point, I'm completely lost as to why the Analytics team can't handle a structured database.
I assume this last is a bit tongue-in-cheek, but I LOL'd… for completely different reasons.
[1]: http://www.mediawiki.org/wiki/Manual:Page_props_table
[2]: http://www.mediawiki.org/wiki/Manual:Recentchanges_table
[3]: http://www.mediawiki.org/wiki/Manual:Revision_table
[4]: http://www.mediawiki.org/wiki/Revtagging
terry chay 최태리
Director of Features Engineering
Wikimedia Foundation
“Imagine a world in which every single human being can freely share in the sum of all knowledge. That's our commitment.”

p: +1 (415) 839-6885 x6832
m: +1 (408) 480-8902
e: tchay@wikimedia.org
i: http://terrychay.com/
w: http://meta.wikimedia.org/wiki/User:Tychay
aim: terrychay