-----BEGIN PGP SIGNED MESSAGE----- Hash: RIPEMD160
I'm trying to understand Math.php to fix a bug, but I don't understand why math_inputhash and math_outputhash are being stored as binary values. It appears that Math.php is simply converting the original hex values to binary when storing them in the database, and then doing the inverse when they are read back out. What's the reason for this, as opposed to simply storing the hex values directly in the database?
- -- Greg Sabino Mullane greg@turnstep.com End Point Corporation PGP Key: 0x14964AC8 200706021603 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
Greg Sabino Mullane wrote:
> I'm trying to understand Math.php to fix a bug, but I don't understand why math_inputhash and math_outputhash are being stored as binary values. It appears that Math.php is simply converting the original hex values to binary when storing them in the database, and then doing the inverse when they are read back out. What's the reason for this, as opposed to simply storing the hex values directly in the database?
To save space?
Greg Sabino Mullane wrote:
> I'm trying to understand Math.php to fix a bug, but I don't understand why math_inputhash and math_outputhash are being stored as binary values. It appears that Math.php is simply converting the original hex values to binary when storing them in the database, and then doing the inverse when they are read back out. What's the reason for this, as opposed to simply storing the hex values directly in the database?
There's no good reason for it; perhaps premature optimization on Taw's part back in the day. :)
We could change it to hex, but that may require changing the existing entries or such.
- -- brion vibber (brion @ wikimedia.org)
Brion wrote:
> Greg Sabino Mullane wrote:
> > I'm trying to understand Math.php to fix a bug, but I don't understand why math_inputhash and math_outputhash are being stored as binary values. It appears that Math.php is simply converting the original hex values to binary when storing them in the database, and then doing the inverse when they are read back out. What's the reason for this, as opposed to simply storing the hex values directly in the database?
> There's no good reason for it; perhaps premature optimization on Taw's part back in the day. :)
> We could change it to hex, but that may require changing the existing entries or such.
Well, I haven't looked at it, but just read these messages...
If I get it right, there's no difference between storing something as hex or storing it as binary, because hex is just a shorter form of presentation for binary values, right?
So, was "hex" in the original message referring to strings used to store numeric values?
*confused*
-- chris
On 6/5/07, christoph.huesler@css.ch wrote:
> If I get it right, there's no difference between storing something as hex or storing it as binary, because hex is just a shorter form of presentation for binary values, right?
Hex takes two bytes per byte, while binary takes one byte per byte. Is there any reason not to store binary values as binary data, even though they may be used later in another representation?
Bryan
Bryan wrote:
> Hex takes two bytes per byte, while binary takes one byte per byte. Is there any reason not to store binary values as binary data, even though they may be used later in another representation?
Well, I guess you're wrong here... one hex digit is four bits, thus half a byte. A byte in hex isn't any different from a byte in binary; hex is just another representation of the same value in binary form, takes up the same space _and is stored in exactly the same physical representation as binary_.
0x10 in hex is 16 in decimal and stored as one byte. 00010000 in binary is 16 in decimal and stored as one byte.
Of course, if you think I am wrong, please say so ;)
-- chris
christoph.huesler@css.ch wrote:
> Bryan wrote:
> > Hex takes two bytes per byte, while binary takes one byte per byte. Is there any reason not to store binary values as binary data, even though they may be used later in another representation?
> Well, I guess you're wrong here... one hex digit is four bits, thus half a byte. A byte in hex isn't any different from a byte in binary; hex is just another representation of the same value in binary form, takes up the same space _and is stored in exactly the same physical representation as binary_.
> 0x10 in hex is 16 in decimal and stored as one byte. 00010000 in binary is 16 in decimal and stored as one byte.
> Of course, if you think I am wrong, please say so ;)
I think all of this is pretty much taken for granted. The problem the OP was no doubt referring to is that the PostgreSQL client library makes it relatively difficult to load and store binary values, requiring a special kind of escaping and unescaping. Cryptographic hashes are conventionally represented in hexadecimal form. Indeed the PHP hash functions output in hex form unconditionally, and the Math.php code goes to some trouble to convert them to binary. That's why the design choice was a strange one. But the designer would have been unaware of the PostgreSQL issues.
The solution to the PostgreSQL binary problem (which affects various things apart from Math.php) is to call encodeBlob() and decodeBlob() on data, which do nothing for MySQL but call pg_escape_bytea() and pg_unescape_bytea() for PostgreSQL.
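The hex-to-binary round trip described above can be sketched roughly as follows. This is Python rather than PHP, MD5 is only an assumed stand-in for whichever hash Math.php computes, and the helper names `to_binary` and `to_hex` are hypothetical. It does not show the PostgreSQL escaping step, only the conversion itself.

```python
import hashlib


def to_binary(hex_digest: str) -> bytes:
    """Convert a hex digest string to raw bytes (what gets stored)."""
    return bytes.fromhex(hex_digest)


def to_hex(raw: bytes) -> str:
    """Convert raw bytes read back from storage into a hex string."""
    return raw.hex()


# Hash functions conventionally hand back a hex string.
# The input string here is an arbitrary illustration.
hex_digest = hashlib.md5(b"x^2 + y^2").hexdigest()

stored = to_binary(hex_digest)   # value written to the database column
restored = to_hex(stored)        # value recovered after reading it back

print(restored == hex_digest)
```

The round trip is lossless, which is why storing the hex string directly would carry the same information without the extra conversion.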
-- Tim Starling
christoph.huesler@css.ch wrote:
> Bryan wrote:
> > Hex takes two bytes per byte, while binary takes one byte per byte. Is there any reason not to store binary values as binary data, even though they may be used later in another representation?
> Well, I guess you're wrong here... one hex digit is four bits, thus half a byte.
Hence, it requires two bytes of hex string to encode a single byte of binary data.
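A short sketch of the distinction, in Python for illustration only (the variable names are made up for this example): the numeric value is the same however the literal is written, but a hex *string* spends one character per four bits.

```python
# The same number written as a hex, binary, and decimal literal.
value = 0x10
assert value == 0b00010000 == 16

raw = bytes([value])   # raw binary storage: a single byte
hex_text = raw.hex()   # hex string storage: the characters "1" and "0"

print(len(raw))        # 1
print(len(hex_text))   # 2
```

The confusion in the quoted message is between the value (one byte either way) and its textual encoding (two characters when written as hex).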
> Of course, if you think I am wrong, please say so ;)
You're wrong. :)
- -- brion vibber (brion @ wikimedia.org)
Bryan Tong Minh wrote:
> On 6/5/07, christoph.huesler@css.ch wrote:
> > If I get it right, there's no difference between storing something as hex or storing it as binary, because hex is just a shorter form of presentation for binary values, right?
> Hex takes two bytes per byte, while binary takes one byte per byte.
Right.
> Is there any reason not to store binary values as binary data, even though they may be used later in another representation?
Hex is easier to work with in a number of circumstances:
1) No worries about encoding and valid byte values.
2) Dumping debug information to a console is easier when it's legible and doesn't contain non-printable characters.
3) Digging in the database to do any sort of manual work is a lot easier when you're dealing with sequences of [0-9a-f] than [\x00-\xff]. It doesn't mess up your terminal, and cut-n-paste works.
4) When the in-database keys match up with filenames in hex, it's rather handy to use the same representation in both places.
Whereas the only benefit to storing raw binary is: it saves a few bytes per row.
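Points 2 and 3 can be illustrated with a small sketch. It is in Python rather than PHP, and it uses the MD5 of an empty input purely as a convenient, well-known sample; nothing here is taken from Math.php.

```python
import hashlib
import re

digest = hashlib.md5(b"")
raw = digest.digest()          # 16 raw bytes
hex_text = digest.hexdigest()  # 32 characters

# The hex form only ever contains [0-9a-f], so it is safe to print,
# log, and paste into a terminal or a query.
hex_is_plain = re.fullmatch(r"[0-9a-f]+", hex_text) is not None

# The raw form may contain bytes outside printable ASCII (0x20-0x7e),
# such as NUL or control characters.
unprintable = [b for b in raw if not 0x20 <= b <= 0x7E]

print(hex_text)
print(hex_is_plain)
print(len(unprintable) > 0)
```

For this sample most of the raw bytes fall outside the printable range, which is the kind of data that tends to garble a terminal or a copy-and-paste.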
- -- brion vibber (brion @ wikimedia.org)
On 6/5/07, christoph.huesler@css.ch wrote:
> So, was "hex" in the original message referring to strings used to store numeric values?
Answer to all your questions right here: yes. In an ideal world we could specify the column as GINORMOUSINT(128) FORMAT HEX or something and have both the efficiency and the prettiness, but alas this isn't an ideal world, and CHAR/BINARY storage of some kind is needed if you don't want to split up over multiple columns, and there's no "FORMAT HEX" in SQL. The question is then whether to store the data as unreadable binary (debugging, innocent SELECT *'s, and changing character encodings become fun then, as Brion observes) or as nicely manageable hex characters at a whole sixteen bytes more per row.
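The "sixteen bytes more per row" figure follows from simple arithmetic for a 128-bit hash. A rough sketch in Python, assuming a single-byte character encoding for the hex column (the function name `storage_bytes` is hypothetical):

```python
def storage_bytes(hash_bits: int) -> tuple[int, int]:
    """Return (bytes as raw binary, bytes as hex text) for a hash width."""
    binary = hash_bits // 8      # 8 bits per raw byte
    hex_chars = hash_bits // 4   # 4 bits per hex character
    return binary, hex_chars


binary, hex_chars = storage_bytes(128)
overhead = hex_chars - binary

print(binary, hex_chars, overhead)  # 16 32 16
```

With two hash columns per row, as in the case of math_inputhash and math_outputhash, the difference would apply to each column.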
On 05/06/07, Simetrical Simetrical+wikilist@gmail.com wrote:
> Answer to all your questions right here: yes. In an ideal world we could specify the column as GINORMOUSINT(128) FORMAT HEX or something
Well, Oracle has VARCHAR2, so I'm sure we'll be seeing GINORMOUSINT real soon...
Rob Church
wikitech-l@lists.wikimedia.org