I agree: in fact, we don't do it in the write pipeline. The code we wrote
implements a simple queue in which page_ids are queued for processing. The
processing job then takes a page_id out of that table and processes all the
missing revisions for that page_id. This is also useful if (say) there
is a page merge or something similar: we can just erase all authorship
information for that page, and at the next edit it will be rebuilt.
What we wrote can also work on labs, but:
- We need a way to poll the database for things like the revision_ids of
a given page. We could use the API instead, but it's less
efficient.
- We need a way to read the text of revisions. Again, the API can work,
but more direct access would be better.
- We need a place to store the authorship information. This is
several terabytes for enwiki. Basically, we need access to some text
store. Is this available on labs?
We would welcome more information on how much of the above is feasible on
labs.
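
For reference, the API fallback for listing a page's revision_ids could look roughly like the sketch below. It only builds the query URL and parses an already-decoded JSON reply; the helper names are made up for illustration, and the reply layout is assumed to be the default JSON format of action=query with prop=revisions.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_revisions_query(page_id, rvcontinue=None):
    # URL asking the API for the revision ids of one page.
    params = {
        "action": "query",
        "prop": "revisions",
        "pageids": page_id,
        "rvprop": "ids",
        "rvlimit": "max",
        "format": "json",
    }
    if rvcontinue is not None:
        # Continuation token from a previous reply, for pages with many revisions.
        params["rvcontinue"] = rvcontinue
    return API_ENDPOINT + "?" + urlencode(params)


def extract_revision_ids(response, page_id):
    # Assumed reply layout: query -> pages -> "<page_id>" -> revisions -> revid.
    page = response.get("query", {}).get("pages", {}).get(str(page_id), {})
    return [rev["revid"] for rev in page.get("revisions", [])]
```

Each reply carries a limited number of revisions, so a long page history needs several round trips, which is part of why direct database access would be more efficient.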
Luca
On Mon, Feb 25, 2013 at 7:27 PM, Matthew Flaschen
<mflaschen(a)wikimedia.org> wrote:
On 02/25/2013 09:21 PM, Luca de Alfaro wrote:
I am writing this message as we hope this might be of interest, and as we
would be quite happy to find people willing to collaborate. Is anybody
interested in developing a GUI for it and talk to us about what API we
should have for retrieving this authorship information? Is there anybody
interested in helping to move the code to production-ready stage?
Are you planning to run this live in production (i.e. 1-2 seconds on
every save)?
I think people would be reluctant to slow writes down further. You
could potentially do it deferred, or in the job queue, but I think it
might make more sense on something like Wikimedia Labs
(https://www.mediawiki.org/wiki/Wikimedia_Labs).
Did you try doing it with no caching (similar to git blame, though I
know it's a different algorithm)? I'm wondering how much benefit you
get from the cached info.
Matt Flaschen
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l