For a lot (all?) text input fields for reasons we have set a maxlength of 200 but some are set to 255.
Is there a special reason for 200? Or should we increase all to 255? Or reduce all to 200? I like consistency :)
Raymond.
On 1/7/09 12:59 PM, Raimond Spekking wrote:
For a lot (all?) text input fields for reasons we have set a maxlength of 200 but some are set to 255.
Is there a special reason for 200? Or should we increase all to 255? Or reduce all to 200? I like consistency :)
The big trick is that these are generally for fields which have a database length limit of 255 *bytes*, which can be anywhere from 63 to 255 *characters* of UTF-8 text.
The 'maxlength' on the input field is enforced in Unicode *characters* by the browser.
I think the 200 is an arbitrary compromise to keep the in-browser limit from overflowing the field length in most European languages... but it's insufficient for non-Latin text.
-- brion
On Wed, Jan 7, 2009 at 4:39 PM, Brion Vibber brion@wikimedia.org wrote:
The big trick is that these are generally for fields which have a database length limit of 255 *bytes*, which can be anywhere from 63 to 255 *characters* of UTF-8 text.
The 'maxlength' on the input field is enforced in Unicode *characters* by the browser.
I think the 200 is an arbitrary compromise to keep the in-browser limit from overflowing the field length in most European languages... but it's insufficient for non-Latin text.
. . . and the right solution here is probably just to convert all these fields to TEXT and allow for the possibility of much longer edit summaries if desired.
Aryeh Gregor wrote:
. . . and the right solution here is probably just to convert all these fields to TEXT and allow for the possibility of much longer edit summaries if desired.
TEXT (and related) fields have always been iffier than fixed-length types as far as performance goes, so I guess that's why everybody's steering clear of those if at all possible (think article titles and all related queries, for instance).
Gutza
On Wed, Jan 7, 2009 at 7:48 PM, Gutza gutza@moongate.ro wrote:
TEXT (and related) fields have always been iffier than fixed-length types as far as performance goes, so I guess that's why everybody's steering clear of those if at all possible (think article titles and all related queries, for instance).
There are various differences, yes. For instance, TEXT rows can't exist in MEMORY tables, so they force filesorts to go to disk (if the row contents are being explicitly sorted instead of just row pointers, anyway). If we upgraded to MySQL 5, we could use VARBINARY(65535) instead. For the time being, I suspect the performance problems would be tolerable: it's not like we're searching or indexing these, we're just retrieving them for results.
Aryeh Gregor wrote:
There are various differences, yes. For instance, TEXT rows can't exist in MEMORY tables, so they force filesorts to go to disk (if the row contents are being explicitly sorted instead of just row pointers, anyway). If we upgraded to MySQL 5, we could use VARBINARY(65535) instead. For the time being, I suspect the performance problems would be tolerable: it's not like we're searching or indexing these, we're just retrieving them for results.
As I said, think article names (and by extension page names in general), or namespaces and usernames for that matter (well, ok, I expect namespaces are pretty much abstract, but usernames are still relevant). I haven't looked this up in the code, but if there's any sanity left in this world I hope all significantly expensive operations on those fields are performed by the DBMS (as opposed to PHP caches and the like). In which case any and all optimization options/alternatives/hacks become relevant.
Gutza
On Wed, Jan 7, 2009 at 9:04 PM, Stancescu, Bogdan bogdan@moongate.ro wrote:
As I said, think article names (and by extension page names in general), or namespaces and usernames for that matter (well, ok, I expect namespaces are pretty much abstract, but usernames are still relevant).
I don't understand what you're saying. Article names and usernames are varchar(255), namespaces are int, and I expect them to remain that way. What we're discussing is only edit summaries and the like, rev_comment and log_comment and so on (if there is any so on, I can't recall offhand).
I haven't looked this up in the code, but if there's any sanity left in this world I hope all significantly expensive operations on those fields are performed by the DBMS (as opposed to PHP caches and the like). In which case any and all optimization options/alternatives/hacks become relevant.
But we're not talking about those fields. I don't understand the relevance. Obviously, yes, the database queries are optimized, we don't scan tons of rows and filter in PHP. IIRC there are <10 queries per page average with average time of <10 ms per query, depending on how recently Domas has stabbed people.
Aryeh Gregor wrote:
I don't understand what you're saying. Article names and usernames are varchar(255), namespaces are int, and I expect them to remain that way. What we're discussing is only edit summaries and the like, rev_comment and log_comment and so on (if there is any so on, I can't recall offhand).
Ah, then that's the confusion. As far as I can tell, the original poster was raising the issue of HTML text input maxlength (see subject) -- I don't think he ever meant to delve into database matters explicitly. Brion was the one to cross-reference VARCHAR limits into the matter, and you prolly picked up from there. I apologise if I misdirected you, in case you intended to open a different sub-topic -- I was sticking to the original context.
Gutza
Brion Vibber wrote:
On 1/7/09 12:59 PM, Raimond Spekking wrote:
For a lot (all?) text input fields for reasons we have set a maxlength of 200 but some are set to 255.
Is there a special reason for 200? Or should we increase all to 255? Or reduce all to 200? I like consistency :)
The big trick is that these are generally for fields which have a database length limit of 255 *bytes*, which can be anywhere from 63 to 255 *characters* of UTF-8 text.
The 'maxlength' on the input field is enforced in Unicode *characters* by the browser.
I think the 200 is an arbitrary compromise to keep the in-browser limit from overflowing the field length in most European languages... but it's insufficient for non-Latin text.
I developed a JavaScript workaround some time ago to limit edit summaries to exactly 250 bytes. It's pretty hacky, but works:
http://en.wikipedia.org/wiki/User:Ilmari_Karonen/longeditsummary.js
I'd been thinking about adding something like it to MediaWiki itself, but kind of forgot about it. While doing so, we should also modify the edit UI so that, instead of silently truncating overlong summaries, it returns the user to the edit form with a warning message and the truncated summary, allowing the user to edit it before saving. That would let us safely raise the default maxlength to 255 even for users without JavaScript.
Of course, the same feature could and should be deployed for other length-limited summary fields, such as move summaries. (And BTW, why the heck is the UI element for those a textarea, anyway?)
On 1/7/09 2:44 PM, Ilmari Karonen wrote:
Brion Vibber wrote:
The 'maxlength' on the input field is enforced in Unicode *characters* by the browser.
I developed a JavaScript workaround some time ago to limit edit summaries to exactly 250 bytes. It's pretty hacky, but works:
http://en.wikipedia.org/wiki/User:Ilmari_Karonen/longeditsummary.js
Hmm... offhand, I'm not sure how reliably this would interact with things like cut-and-paste. If in one paste operation we try to replace 80 hanzi characters with 200 ASCII characters, what happens?
I'd been thinking about adding something like it to MediaWiki itself, but kind of forgot about it. While doing so, we should also modify the edit UI so that, instead of silently truncating overlong summaries, it returns the user to the edit form with a warning message and the truncated summary, allowing the user to edit it before saving. That would let us safely raise the default maxlength to 255 even for users without JavaScript.
Could be done, though that complicates the form logic of course.
I suspect that there's not a huge reason to be using VARCHARs/TINYBLOBs for summary fields to begin with. They're not used for sorting or anything, and those that are TINYBLOBs could just be made regular ol' BLOBs without any performance degredation or extra storage usage AFAIK unless the actual values are bigger than 255 bytes.
The main area where we _need_ to enforce these funny length limits is on page titles and usernames, say for renaming pages or creating accounts. Right now I think we just spit back an "invalid title" error or some such if you do submit an overlong title.
Of course, if MySQL properly supported Unicode to begin with we'd probably just be using fixed-length Unicode fields for the titles and usernames and such where it does make sense to have a clean limit, and the 'maxlength' attribute would work fine. Ah well. :)
Of course, the same feature could and should be deployed for other length-limited summary fields, such as move summaries. (And BTW, why the heck is the UI element for those a textarea, anyway?)
Because the original single-line input field was frequently confused with the target field by impatient typers, leading to unexpected results.
-- brion
Brion Vibber wrote:
Of course, the same feature could and should be deployed for other length-limited summary fields, such as move summaries. (And BTW, why the heck is the UI element for those a textarea, anyway?)
Because the original single-line input field was frequently confused with the target field by impatient typers, leading to unexpected results.
...ouch. I see. :)
On Wed, Jan 7, 2009 at 10:41 PM, Brion Vibber brion@wikimedia.org wrote:
Hmm... offhand, I'm not sure how reliably this would interact with things like cut-and-paste. If in one paste operation we try to replace 80 hanzi characters with 200 ASCII characters, what happens?
Ilmari and I have actually discussed this before. Ilmari's script jerkily reverts the excess characters. I made a variant of the script that displays an unobtrusive error message instead: http://en.wikipedia.org/wiki/User:Remember_the_dot/Long_edit_summary.js
wikitech-l@lists.wikimedia.org