On Fri, Sep 16, 2011 at 9:48 AM, Thomas Gries <mail(a)tgries.de> wrote:
On 16.09.2011 at 11:24, Roan Kattouw wrote:
For some applications, I use the technique of representing the 128 bits
of MD5 or other checksums as base-62 character strings instead of
hexadecimal (base-16) strings.
MediaWiki already uses a similar technique, storing SHA-1 hashes of
images in base 36.
Was there a particular reason to choose base 36?
Why not recode to base 62 and save 3 bytes per checksum?
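
For reference, the 3-byte figure can be checked with a short calculation. This is only a sketch in Python; `digits_needed` is a hypothetical helper written for this illustration, not anything from MediaWiki. It counts how many digits a given base needs to represent every possible value of a checksum of a given bit length.

```python
def digits_needed(bits: int, base: int) -> int:
    """Return the smallest n such that base**n >= 2**bits (exact integer math)."""
    limit = 1 << bits          # number of distinct checksum values
    n, capacity = 0, 1         # capacity = base**n
    while capacity < limit:
        capacity *= base
        n += 1
    return n


# Compare string lengths for a 128-bit (MD5) and a 160-bit (SHA-1) checksum.
for base in (16, 36, 62):
    print(base, digits_needed(128, base), digits_needed(160, base))
```

For a 128-bit checksum this gives 32 characters in hex, 25 in base 36 and 22 in base 62, so base 62 saves 3 characters over base 36. For a 160-bit SHA-1 the lengths are 40, 31 and 27.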
This format was chosen for hashes to be used as filenames for uploaded file
storage (currently used only for storing deleted files, I think, but there
has long been a plan to switch primary image storage to this as well some
day).
For greatest compatibility with all filesystems, we only use characters that
are safe (ASCII digits and letters) and don't rely on case distinctions
which are not always preserved (Windows and Mac OS X systems default to
case-insensitive filesystems).
The reason we're not using hex here is that a more compact representation
makes the filenames, and thus any URL references including them in the path,
shorter. For img_sha1, I guess we just kept using base 36 for
compatibility/similarity with the deleted file archives?
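
To make the format concrete, here is a minimal Python sketch of turning a SHA-1 digest into a case-insensitive base-36 string. The function names are made up for this example and this is not MediaWiki's actual code; the fixed width of 31 is simply the number of base-36 digits needed to cover 160 bits, and zero-padding to that width is an assumption of the sketch.

```python
import hashlib

# Digits and lowercase letters only: safe on case-insensitive filesystems.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(value: int, width: int = 0) -> str:
    """Encode a non-negative integer in base 36, left-padded with zeros."""
    digits = []
    while value:
        value, rem = divmod(value, 36)
        digits.append(ALPHABET[rem])
    text = "".join(reversed(digits)) or "0"
    return text.rjust(width, "0")


def sha1_base36(data: bytes) -> str:
    """Hypothetical helper: SHA-1 of data as a 31-character base-36 string."""
    hex_digest = hashlib.sha1(data).hexdigest()   # 40 hex characters
    return to_base36(int(hex_digest, 16), width=31)


name = sha1_base36(b"example file contents")
print(len(name), name)
```

The result is always 31 characters long, versus 40 for the hex form, and it never depends on upper/lower case being preserved.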
-- brion