Base36 certainly isn't the most efficient way to store a sha1, but it's what is in use all over mediawiki. I think there was some discussion on this list of the tradeoffs of different methods when revision.rev_sha1 was added, and base36 was picked as a compromise. I don't know why base36 was picked over base62 once it was decided to stick with an ascii alpha-numeric encoding but regardless, there was opposition to binary. Taken on its own, an integer index would be more efficient but I don't think it makes sense if we continue using base36.
On Tue, Sep 25, 2012 at 11:20 AM, Artur FijaĆkowski wiki.warx@gmail.comwrote:
<tl;dr> Where an index is desired on a mysql column of base36 encoded
sha1
hashes, I recommend ADD INDEX (sha1column(10)). Shorter indexes will be sufficient in many cases, but this is still provides a >2/3 space savings while covering a huge (2^51.43) space.
Isn't it better to store BIGINT containing part of (binary) sha1 and use index on numeric column?
AJF/WarX
Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l