One thing I think I will add is a text byte size field on the revision table; with individual-revision compression we can no longer easily get the size without decompressing the text first. This size will generally not change either, since a given revision's source text is immutable.
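To illustrate the idea, here is a minimal sketch of recording the uncompressed byte size at the moment a revision is compressed, so that it never has to be recomputed. The function name and the `text_size` / `text_blob` keys are hypothetical and are not the actual schema; zlib stands in for whatever compression is really used.

```python
import zlib

def make_revision_row(text):
    """Build a revision record that carries the uncompressed byte size.

    The keys 'text_size' and 'text_blob' are made up for illustration.
    """
    raw = text.encode("utf-8")
    return {
        "text_size": len(raw),            # size in bytes, before compression
        "text_blob": zlib.compress(raw),  # compressed individually per revision
    }

row = make_revision_row("héllo")
print(row["text_size"])  # byte length, available without decompressing
```

Since the source text of a revision is immutable, the stored size should only need to be written once, at insert time.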
I'll need to parse the full article text anyway, for several stats: number of internal/external links, image links, word count, ...
If I run directly on the database the job will run for days. That is fine with me, but heavy queries used to be a problem every now and then. Is that still the case?
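For what it is worth, a rough sketch of the kind of per-article counting meant here. This is not the actual stats script; the regular expressions are simplified assumptions about wiki markup (they ignore nested links inside image captions, templates, nowiki sections and so on), and the word count is just whitespace-separated tokens of the raw text.

```python
import re

# Simplified patterns, assumed for illustration only.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")
EXTLINK = re.compile(r"(?<!\[)\[(?:https?|ftp)://[^\s\]]+[^\]]*\]")

def article_stats(wikitext):
    """Count internal, image and external links plus a crude word count."""
    targets = [m.group(1).strip() for m in WIKILINK.finditer(wikitext)]
    images = [t for t in targets if t.lower().startswith("image:")]
    return {
        "internal_links": len(targets) - len(images),
        "image_links": len(images),
        "external_links": len(EXTLINK.findall(wikitext)),
        "words": len(wikitext.split()),  # counts markup tokens too
    }

sample = "See [[Amsterdam]] and [[Image:Map.png|thumb]] or [http://example.org a site]."
print(article_stats(sample))
```

All of these need the full text, which is why a stored size field alone does not remove the need to decompress.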
I'd better make the counts job incremental then. It is a bit less flexible and more error prone on script updates, but it can be done. Any idea when the new scheme will be implemented?
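A minimal sketch of what "incremental" could look like, under my own assumptions: the job remembers the highest revision id it has processed and skips anything at or below it on the next run. The state layout, `SCRIPT_VERSION`, and the function names are all hypothetical. Throwing away the saved state when the script version changes is one possible way to limit the risk on script updates, at the cost of a full recount. Totals are simply summed over revisions here for illustration.

```python
import json
import os

SCRIPT_VERSION = 1  # hypothetical: bump whenever the counting logic changes

def load_state(path):
    """Load saved progress, or start fresh if missing or from an older script."""
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)
        if state.get("version") == SCRIPT_VERSION:
            return state
    return {"version": SCRIPT_VERSION, "last_rev_id": 0, "totals": {}}

def update_counts(state, revisions, count_fn):
    """Process only revisions newer than the checkpoint.

    revisions: iterable of (rev_id, text) pairs in ascending rev_id order.
    count_fn:  function mapping text to a dict of numeric counts.
    """
    for rev_id, text in revisions:
        if rev_id <= state["last_rev_id"]:
            continue  # already counted in an earlier run
        for key, value in count_fn(text).items():
            state["totals"][key] = state["totals"].get(key, 0) + value
        state["last_rev_id"] = rev_id
    return state

def save_state(path, state):
    """Write to a temp file first so a crash does not corrupt the checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)
```

A run would then be roughly: `load_state`, `update_counts` over the new revisions, `save_state`.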
Erik Zachte
On Dec 19, 2004, at 6:49 PM, Erik Zachte wrote:
I'd better make the counts job incremental then. It is a bit less flexible and more error prone on script updates, but it can be done. Any idea when the new scheme will be implemented?
1.5 is slated for February.
-- brion vibber (brion @ pobox.com)
wikitech-l@lists.wikimedia.org