On Fri, Sep 2, 2011 at 5:47 PM, Daniel Friesen lists@nadir-seen-fire.com wrote:
On 11-09-02 05:20 PM, Asher Feldman wrote:
When using for analysis, will we wish the new columns had partial indexes (first 6 characters?)
Bug 2939 is one relevant bug to this, it could probably use an index. [1] https://bugzilla.wikimedia.org/show_bug.cgi?id=2939
I generally suspect that a standard index is going to be a waste for the most urgent uses of this. It will rarely be interesting to search for common hashes between articles. The far more common case will be to search for duplicate hashes within the history of a single article. My understanding is that having a normal index on a table the size of our revision table will be far too expensive for db writes.
This is the first I've heard of partial indexes (/me researches) I don't know if a partial index is going to be cheap enough that we can use it, and useful enough that we'd want to. Would this be a faster query in a world with a partial index on the first six characters? SELECT rev_id FROM revision WHERE rev_page=12345 AND rev_sha1='4cdbd80be15fcfff139fb8a95f2ca359520939ee'
...or would we have to run a query like this to get the benefit of the index? SELECT rev_id FROM revision WHERE rev_page=12345 AND rev_sha1 like '4cdbd8%'
...and would either of these queries be considered too expensive to run without a partial index? How about with a partial index?
Rob