Base36 certainly isn't the most efficient way to store a sha1, but it's
what is in use all over mediawiki. I think there was some discussion on
this list of the tradeoffs of different methods when revision.rev_sha1 was
added, and base36 was picked as a compromise. I don't know why base36 was
picked over base62 once it was decided to stick with an ascii alpha-numeric
encoding but regardless, there was opposition to binary. Taken on its
own, an integer index would be more efficient but I don't think it makes
sense if we continue using base36.
On Tue, Sep 25, 2012 at 11:20 AM, Artur FijaĆkowski <wiki.warx(a)gmail.com>wrote;wrote:
<tl;dr>
Where an index is desired on a mysql column of base36 encoded
sha1
hashes, I recommend ADD INDEX (sha1column(10)).
Shorter indexes will be
sufficient in many cases, but this is still provides a >2/3 space savings
while covering a huge (2^51.43) space.
Isn't it better to store BIGINT containing part of (binary) sha1 and
use index on numeric column?
AJF/WarX
_______________________________________________
Wikitech-l mailing list
Wikitech-l(a)lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l